python - Scipy sparse matrix built with non-int64 (indptr, indices) for dot


Is it OK to use uint32 `indptr` and `indices` arrays when manually constructing a `scipy.sparse.csr_matrix`? Will the matrix's `dot` method return the correct answer?

The following example seems OK, but I'm not sure whether it is officially supported.

    import numpy as np
    import scipy.sparse as spsp

    x = np.random.choice([0, 1], size=(1000, 1000), replace=True, p=[0.9, 0.1])
    x = x.astype(np.uint8)

    x_csr = spsp.csr_matrix(x)
    x_csr.indptr = x_csr.indptr.astype(np.uint32)
    x_csr.indices = x_csr.indices.astype(np.uint32)

    x_csr_selfdot = x_csr.dot(x_csr.T)
    x_selfdot = x.dot(x.T)

    print(np.sum(x_selfdot != x_csr_selfdot))

Also, `x_csr.data` is an array of 1s. scipy doesn't let me use a single number to replace the whole `x_csr.data` array.

I'm not sure what your goal is, but what you are doing works (sort of):

    In [237]: X = x_csr.dot(x_csr.T)

    In [238]: np.allclose(X.A, x.dot(x.T))
    Out[238]: True

That is, multiplication with the modified `x_csr` works.

But note that any manipulation of `x_csr` that makes a new sparse matrix reverts to int32 indices:

    In [240]: x_csr.indptr
    Out[240]: array([    0,   112,   216, ..., 99652, 99751, 99853], dtype=uint32)

    In [241]: x_csr.T.indptr
    Out[241]: array([    0,   112,   216, ..., 99652, 99751, 99853], dtype=int32)

    In [242]: X.indptr
    Out[242]: array([     0,   1000,   2000, ..., 997962, 998962, 999962], dtype=int32)

    In [260]: x_csr[:].indptr
    Out[260]: array([    0,   112,   216, ..., 99652, 99751, 99853], dtype=int32)

The dtype of `.data` is preserved, but when creating a new matrix, `sparse` makes its own `indptr` and `indices` arrays. It doesn't try to make a view of the originals.
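A small self-contained check of that behaviour, assuming a recent SciPy: it repeats the uint32 hand edit on a tiny 3x3 matrix (the matrix itself is just an illustration) and inspects the transpose, which is built by the constructor from `a`'s arrays.

```python
import numpy as np
import scipy.sparse as spsp

dense = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]], dtype=np.uint8)
a = spsp.csr_matrix(dense)

# Same hand edit as in the question: force uint32 index arrays
a.indptr = a.indptr.astype(np.uint32)
a.indices = a.indices.astype(np.uint32)

t = a.T  # a new (csc) matrix constructed from a's data/indices/indptr

print(a.indptr.dtype, t.indptr.dtype)        # t's index dtype is picked by scipy
print(t.data.dtype)                          # the data dtype (uint8) is kept
print(np.shares_memory(a.indptr, t.indptr))  # the index array is a fresh copy
```

The exact signed type scipy picks (int32 here, int64 for very large matrices) is its own decision; the point is only that the uint32 arrays do not survive into the new matrix.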

And yes, the `data` attribute has to have a value for each nonzero element of the matrix, so `data` has the same size as `indices`. In `coo` format, `row` and `col` match `data` in the same way.
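A short sketch of the two usual ways to get an all-ones `data` array. The specific `indptr`/`indices` values are made up for illustration: either build `data` with one entry per stored element when calling the `(data, indices, indptr)` constructor, or fill the existing array in place.

```python
import numpy as np
import scipy.sparse as spsp

indptr = np.array([0, 2, 3, 5])
indices = np.array([0, 2, 1, 0, 1])

# data needs exactly one value per stored element, i.e. len(indices) entries
data = np.ones(len(indices), dtype=np.uint8)
m = spsp.csr_matrix((data, indices, indptr), shape=(3, 3))

# On an existing matrix, overwrite the stored values in place instead of
# assigning a scalar to the attribute
m2 = spsp.csr_matrix(np.array([[3, 0], [0, 7]]))
m2.data[:] = 1  # m2.data.fill(1) does the same
```

Writing into `m2.data` in place keeps its length tied to `indices`, which is why it works where replacing the attribute with a single number does not.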

Also, `print(x_csr)` gives an error when it calls `x_csr.tocoo()`:

    --> 931         _sparsetools.expandptr(major_dim,self.indptr,major_indices)
    ValueError: Output dtype not compatible with inputs.

In general, don't try to play with the `indices` and `indptr` of a csr matrix. Let the sparse code take care of those.
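If a matrix has already had its index arrays edited by hand, one way to "let the sparse code take care of it" is to pass the raw arrays back through the constructor. A sketch, assuming a recent SciPy; `clean` is just an illustrative name:

```python
import numpy as np
import scipy.sparse as spsp

x = np.eye(4, dtype=np.uint8)
x_csr = spsp.csr_matrix(x)
# Reproduce the hand edit from the question
x_csr.indptr = x_csr.indptr.astype(np.uint32)
x_csr.indices = x_csr.indices.astype(np.uint32)

# Hand the raw arrays back to the constructor and let it choose the index dtype
clean = spsp.csr_matrix((x_csr.data, x_csr.indices, x_csr.indptr), shape=x_csr.shape)
clean.check_format(full_check=True)  # raises if indptr/indices are inconsistent

print(clean.indptr.dtype, clean.indices.dtype)  # a signed type chosen by scipy
print(clean)  # printing goes through tocoo(), which now gets the dtype it expects
```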

=====================

`x_csr.dot` is performed by `x_csr.__mul__`; when the other operand is also sparse, that is done by `x_csr._mul_sparse_matrix(self, other)`. That method uses `sparse.sputils.get_index_dtype` to determine the dtype of the indexes of the returned value. It chooses a suitable index data type (int32 or int64).
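The core of that choice can be sketched as below. `pick_index_dtype` is a hypothetical helper that only mirrors the int32-vs-int64 rule described above; scipy's real helper is private and also inspects the input arrays, so treat this as an illustration rather than its implementation. The random 50x50 matrix is likewise just test data.

```python
import numpy as np
import scipy.sparse as spsp

def pick_index_dtype(maxval):
    # Hypothetical helper: use int32 when the largest index fits, else int64.
    return np.int32 if maxval <= np.iinfo(np.int32).max else np.int64

print(pick_index_dtype(1000) is np.int32)   # True: fits in 32 bits
print(pick_index_dtype(2**31) is np.int64)  # True: too big for int32

rng = np.random.default_rng(0)
dense = (rng.random((50, 50)) < 0.1).astype(np.float64)
a = spsp.csr_matrix(dense)
prod = a.dot(a.T)

print(prod.indptr.dtype, prod.indices.dtype)         # picked by scipy for the result
print(np.allclose(prod.toarray(), dense @ dense.T))  # True
```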

It then converts the inputs to that dtype:

    np.asarray(self.indptr, dtype=idx_dtype)
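Why that line makes the uint32 edit irrelevant, in plain NumPy (assuming, for illustration, that `idx_dtype` came out as int32): `np.asarray` returns a converted copy when the dtype differs and the very same object when it already matches.

```python
import numpy as np

indptr_u32 = np.array([0, 112, 216], dtype=np.uint32)

# dtype differs: asarray has to build a new int32 array
converted = np.asarray(indptr_u32, dtype=np.int32)
print(converted.dtype)                          # int32
print(np.shares_memory(converted, indptr_u32))  # False

# dtype already matches: asarray hands back the same object, no copy
already = np.array([0, 112, 216], dtype=np.int32)
print(np.asarray(already, dtype=np.int32) is already)  # True
```

So whatever dtype you store in `x_csr.indptr`, the compiled routines receive arrays of the dtype scipy picked.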

So your attempt to change the `x_csr.indptr` dtype doesn't change the calculation method. Note also that after this prep work, the actual multiplication is performed in compiled C code (`csr_matmat_pass1`, `csr_matmat_pass2`).
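To make the two-pass idea concrete, here is a pure-Python sketch of a CSR times CSR product: pass 1 only counts the nonzeros of each output row so the result can be allocated, pass 2 fills in column indices and values. This is my simplified illustration, not scipy's code; as far as I know the compiled version uses a linked-list accumulator instead of a dict, skips explicit zeros and does not sort column indices, and the routine names may differ in newer SciPy releases. `csr_matmat_sketch` is a hypothetical name.

```python
import numpy as np
import scipy.sparse as spsp

def csr_matmat_sketch(n_row, n_col, Ap, Aj, Ax, Bp, Bj, Bx):
    """Two-pass C = A @ B on raw CSR arrays (illustration only)."""
    # Pass 1: count the structural nonzeros of each output row to size C.
    mask = np.full(n_col, -1)
    Cp = np.zeros(n_row + 1, dtype=np.int64)
    for i in range(n_row):
        row_nnz = 0
        for jj in range(Ap[i], Ap[i + 1]):
            j = Aj[jj]
            for kk in range(Bp[j], Bp[j + 1]):
                k = Bj[kk]
                if mask[k] != i:  # first time column k shows up in row i
                    mask[k] = i
                    row_nnz += 1
        Cp[i + 1] = Cp[i] + row_nnz

    # Pass 2: allocate exactly Cp[-1] slots and accumulate the products.
    Cj = np.empty(Cp[-1], dtype=np.int64)
    Cx = np.zeros(Cp[-1], dtype=np.result_type(Ax, Bx))
    for i in range(n_row):
        sums = {}
        for jj in range(Ap[i], Ap[i + 1]):
            j = Aj[jj]
            for kk in range(Bp[j], Bp[j + 1]):
                k = Bj[kk]
                sums[k] = sums.get(k, 0) + Ax[jj] * Bx[kk]
        for offset, k in enumerate(sorted(sums)):
            Cj[Cp[i] + offset] = k
            Cx[Cp[i] + offset] = sums[k]
    return Cp, Cj, Cx

rng = np.random.default_rng(0)
dense = (rng.random((30, 20)) < 0.2).astype(np.float64)
A = spsp.csr_matrix(dense)
B = A.T.tocsr()
Cp, Cj, Cx = csr_matmat_sketch(A.shape[0], B.shape[1], A.indptr, A.indices, A.data,
                               B.indptr, B.indices, B.data)
C = spsp.csr_matrix((Cx, Cj, Cp), shape=(A.shape[0], B.shape[1]))
print(np.allclose(C.toarray(), dense @ dense.T))  # True
```

Splitting the work this way lets the output arrays be allocated once at their exact size, which is also where an index dtype for the result has to be fixed.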

