python - SciPy sparse matrix built with non-int64 (indptr, indices) for dot
Is it OK to use uint32-typed indptr and indices when manually constructing a scipy.sparse.csr_matrix? Will the dot method of the matrix still return the correct answer?

The following example seems OK, but I'm not sure whether it is officially OK.

    import numpy as np
    import scipy.sparse as spsp

    x = np.random.choice([0, 1], size=(1000, 1000), replace=True, p=[0.9, 0.1])
    x = x.astype(np.uint8)

    x_csr = spsp.csr_matrix(x)
    x_csr.indptr = x_csr.indptr.astype(np.uint32)
    x_csr.indices = x_csr.indices.astype(np.uint32)

    x_csr_selfdot = x_csr.dot(x_csr.T)
    x_selfdot = x.dot(x.T)

    print(np.sum(x_selfdot != x_csr_selfdot))

Here x_csr.data is an array of 1s. SciPy doesn't let me use a single number to replace the whole x_csr.data array.
I'm not sure what your goal is. Doing this works (sort of):

    In [237]: X = x_csr.dot(x_csr.T)
    In [238]: np.allclose(X.A, x.dot(x.T))
    Out[238]: True

That is, multiplication with the modified x_csr works.

But note that any manipulation of x_csr makes a new sparse matrix that reverts to int32 indices:

    In [240]: x_csr.indptr
    Out[240]: array([ 0, 112, 216, ..., 99652, 99751, 99853], dtype=uint32)
    In [241]: x_csr.T.indptr
    Out[241]: array([ 0, 112, 216, ..., 99652, 99751, 99853], dtype=int32)
    In [242]: X.indptr
    Out[242]: array([ 0, 1000, 2000, ..., 997962, 998962, 999962], dtype=int32)
    In [260]: x_csr[:].indptr
    Out[260]: array([ 0, 112, 216, ..., 99652, 99751, 99853], dtype=int32)

The dtype of .data is preserved, but when creating a new matrix, sparse makes its own indptr and indices arrays. It doesn't try to make a view of the originals.

And yes, the data attribute has to have a value for each nonzero element of the matrix, so data has the same size as indices. In coo format, row and col match data as well.
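To make the size relationship concrete, here is a minimal sketch on a small made-up matrix (the matrix and the variable names are only for illustration, not from the question):

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical 3x3 matrix with 4 nonzero entries.
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 0, 0]], dtype=np.uint8)
m = sp.csr_matrix(dense)

# CSR stores one value and one column index per nonzero element.
assert m.data.size == m.indices.size == m.nnz == 4

# indptr has one entry per row plus one; its last entry is the nonzero count.
assert m.indptr.size == dense.shape[0] + 1
assert m.indptr[-1] == m.nnz

# In COO format, row, col and data are parallel arrays of equal length.
c = m.tocoo()
assert c.row.size == c.col.size == c.data.size == 4
```

This is why data cannot be collapsed to a single number: every entry in indices needs a matching entry in data.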
Also, print(x_csr) gives an error when it does x_csr.tocoo():

    --> 931 _sparsetools.expandptr(major_dim,self.indptr,major_indices)
    ValueError: Output dtype not compatible with inputs.

In general, don't try to play with the indices and indptr of a csr matrix. Let the sparse code take care of those.
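If the index arrays really do start out as uint32, one alternative is to hand them to the constructor instead of overwriting the attributes afterwards. The sketch below assumes the constructor picks its own signed index dtype, which is the behavior I have seen but which may differ between SciPy versions; the small arrays are made up for illustration:

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical CSR pieces for a 3x3 matrix, with unsigned index arrays.
data = np.array([1, 1, 1, 1], dtype=np.uint8)
indices = np.array([0, 2, 2, 0], dtype=np.uint32)  # column index of each value
indptr = np.array([0, 2, 3, 4], dtype=np.uint32)   # row start offsets

# Let the constructor decide how to store the index arrays.
m = sp.csr_matrix((data, indices, indptr), shape=(3, 3))

# Inspect what was actually stored (assumed: a signed integer dtype).
print(m.indices.dtype, m.indptr.dtype, m.data.dtype)

# Conversions and products work on the resulting matrix.
coo = m.tocoo()
dense = m.toarray()
product = m.dot(m.T).toarray()
```

Built this way, the matrix has index arrays that the compiled routines accept, so tocoo() and print do not hit the dtype error shown above.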
=====================
x_csr.dot is performed by x_csr.__mul__, which, when other is sparse, is done by x_csr._mul_sparse_matrix(self, other). This uses sparse.sputils.get_index_dtype to determine the dtype for the indexes of the returned value. It chooses between the suitable index data types (int32 or int64).

It also converts all inputs to this dtype:

    np.asarray(self.indptr, dtype=idx_dtype),

So your attempt to change the x_csr.indptr dtype doesn't change the calculation method. Also note that after all this prep work, the actual multiplication is performed in compiled C code (csr_matmat_pass1, csr_matmat_pass2).
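As a rough illustration of that conversion step, the sketch below casts a uint32 array with np.asarray and then checks the index dtype of a product. The value of idx_dtype is hard-coded here as an assumption; in SciPy it is chosen internally, and the small matrix is made up:

```python
import numpy as np
import scipy.sparse as sp

# Casting with np.asarray yields a new array in the requested dtype;
# the original uint32 array is left untouched.
indptr_u32 = np.array([0, 2, 3, 4], dtype=np.uint32)
idx_dtype = np.int32  # assumed for illustration; SciPy picks int32 or int64
converted = np.asarray(indptr_u32, dtype=idx_dtype)
assert converted.dtype == np.int32
assert indptr_u32.dtype == np.uint32

# The index arrays of a sparse product come out as int32 or int64.
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 0, 0]], dtype=np.int64)
m = sp.csr_matrix(dense)
result = m.dot(m.T)
assert result.indptr.dtype.kind == "i"
assert np.array_equal(result.toarray(), dense.dot(dense.T))
```

Since the inputs are cast before the compiled routines see them, storing uint32 index arrays on the matrix saves nothing during the multiplication itself.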