python - Scipy Sparse Matrix built with non-int64 (indptr, indices) for dot
Is it OK to use the uint32 type for indptr and indices when manually constructing a scipy.sparse.csr_matrix? Will the dot method of the matrix still return the correct answer?
The following example seems to be OK, but I'm not sure whether it is officially OK.
    import numpy as np
    import scipy.sparse as spsp

    x = np.random.choice([0, 1], size=(1000, 1000), replace=True, p=[0.9, 0.1])
    x = x.astype(np.uint8)

    x_csr = spsp.csr_matrix(x)
    x_csr.indptr = x_csr.indptr.astype(np.uint32)
    x_csr.indices = x_csr.indices.astype(np.uint32)

    x_csr_selfdot = x_csr.dot(x_csr.T)
    x_selfdot = x.dot(x.T)

    print(np.sum(x_selfdot != x_csr_selfdot))
The x_csr.data array is an array of 1s. SciPy doesn't let me use a single number to replace the whole x_csr.data array.
I'm not sure what your goal is. Doing this works (sort of):
    In [237]: X = x_csr.dot(x_csr.T)
    In [238]: np.allclose(X.A, x.dot(x.T))
    Out[238]: True
That is, the multiplication with this modified x_csr works.
But note that any manipulation of x_csr that makes a new sparse matrix reverts to int32 indices:
    In [240]: x_csr.indptr
    Out[240]: array([    0,   112,   216, ..., 99652, 99751, 99853], dtype=uint32)
    In [241]: x_csr.T.indptr
    Out[241]: array([    0,   112,   216, ..., 99652, 99751, 99853], dtype=int32)
    In [242]: X.indptr
    Out[242]: array([     0,   1000,   2000, ..., 997962, 998962, 999962], dtype=int32)
    In [260]: x_csr[:].indptr
    Out[260]: array([    0,   112,   216, ..., 99652, 99751, 99853], dtype=int32)
The dtype of .data is preserved, but when creating a new matrix, sparse makes its own indptr and indices arrays. It doesn't try to make a view of the originals.
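As a small illustration of that behavior, the sketch below passes uint32 index arrays to the documented (data, indices, indptr) constructor and inspects what comes back. The 3x3 values are made up for the example, and the exact signed dtype chosen (int32 here, as in the session above) may depend on the SciPy version and matrix size.

```python
import numpy as np
import scipy.sparse as sp

# Made-up CSR pieces for a 3x3 matrix, with index arrays deliberately uint32.
data = np.array([1, 2, 3, 4, 5], dtype=np.uint8)
indices_u32 = np.array([0, 2, 2, 0, 1], dtype=np.uint32)
indptr_u32 = np.array([0, 2, 3, 5], dtype=np.uint32)

m = sp.csr_matrix((data, indices_u32, indptr_u32), shape=(3, 3))

# The value dtype is kept; the index arrays are converted to a signed
# integer dtype chosen by SciPy, not left as uint32.
print(m.data.dtype)
print(m.indices.dtype, m.indptr.dtype)
```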
And yes, the data attribute has to have a value for each nonzero element of the matrix. data has the same size as indices. In coo format, row and col also match data.
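The size relationships can be checked directly. This is a minimal sketch on a made-up 3x3 matrix, not part of the original session:

```python
import numpy as np
import scipy.sparse as sp

# Made-up dense matrix with 5 nonzero entries.
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 0]], dtype=np.uint8)

m = sp.csr_matrix(dense)
# CSR: one data value and one column index per stored element,
# plus one row pointer per row and a final end marker.
print(m.nnz, m.data.size, m.indices.size, m.indptr.size)

c = m.tocoo()
# COO: row, col and data are parallel arrays of equal length.
print(c.row.size, c.col.size, c.data.size)
```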
Also, print(x_csr) gives an error when it does x_csr.tocoo():

    --> 931 _sparsetools.expandptr(major_dim,self.indptr,major_indices)
    ValueError: Output dtype not compatible with inputs.
In general, don't try to play with the indices and indptr of a csr matrix. Let the sparse code take care of those.
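One way to follow that advice is to hand SciPy the coordinates and let it derive indptr and indices itself. The sketch below uses the (data, (row, col)) constructor form with made-up values; only the value dtype is chosen by hand.

```python
import numpy as np
import scipy.sparse as sp

# Made-up coordinates and values for a 3x3 matrix.
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 2, 2, 0, 1])
vals = np.array([1, 2, 3, 4, 5], dtype=np.uint8)

# SciPy builds indptr and indices, and picks their dtype.
m = sp.csr_matrix((vals, (rows, cols)), shape=(3, 3))
product = m.dot(m.T)

print(m.indptr, m.indptr.dtype)
print(product.toarray())
```

The small values keep every entry of the product below 256, so the uint8 data does not overflow in this example.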
=====================
x_csr.dot is performed by x_csr.__mul__, and when other is sparse, it is done by x_csr._mul_sparse_matrix(self, other). This uses sparse.sputils.get_index_dtype to determine the dtype for the indexes of the returned value. It chooses between the suitable index data types (int32 or int64).
It also converts all inputs to this dtype:

    np.asarray(self.indptr, dtype=idx_dtype),

So your attempt to change the x_csr.indptr dtype doesn't change the calculation method. Also note that after all this prep work, the actual multiplication is performed in compiled C code (csr_matmat_pass1, csr_matmat_pass2).
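To make the prep step concrete, here is a rough sketch of the idea in plain NumPy. choose_index_dtype is a hypothetical stand-in written for this example, not SciPy's get_index_dtype, and it only assumes the behavior described above: pick int32 when every index fits, otherwise int64, then cast the inputs.

```python
import numpy as np

def choose_index_dtype(arrays, maxval=None):
    """Hypothetical helper: int32 if all values fit, otherwise int64."""
    info = np.iinfo(np.int32)
    if maxval is not None and maxval > info.max:
        return np.int64
    for arr in arrays:
        arr = np.asarray(arr)
        if arr.size == 0:
            continue
        # Compare as Python ints to avoid mixed-dtype comparison issues.
        if int(arr.max()) > info.max or int(arr.min()) < info.min:
            return np.int64
    return np.int32

# A uint32 indptr like the one set by hand in the question.
indptr = np.array([0, 2, 3, 5], dtype=np.uint32)
idx_dtype = choose_index_dtype([indptr])

# The cast mirrors the np.asarray(..., dtype=idx_dtype) line quoted above.
converted = np.asarray(indptr, dtype=idx_dtype)
print(converted.dtype)
```

Under these assumptions the uint32 array is cast to a signed type before any arithmetic, which is why the hand-set dtype has no effect on the result.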