I have an MxN sparse csr_matrix, and I'd like to add a few columns containing only zeros to the right of the matrix. In principle, the arrays indptr, indices and data stay the same, so I only want to change the dimensions of the matrix. However, this does not seem to be implemented.
>>> A = csr_matrix(np.identity(5), dtype = int)
>>> A.toarray()
array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1]])
>>> A.shape
(5, 5)
>>> A.shape = ((5,7))
NotImplementedError: Reshaping not implemented for csr_matrix.
Also horizontally stacking a zero matrix does not seem to work.
>>> B = csr_matrix(np.zeros([5,2]), dtype = int)
>>> B.toarray()
array([[0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0]])
>>> np.hstack((A,B))
array([ <5x5 sparse matrix of type '<type 'numpy.int32'>'
with 5 stored elements in Compressed Sparse Row format>,
<5x2 sparse matrix of type '<type 'numpy.int32'>'
with 0 stored elements in Compressed Sparse Row format>], dtype=object)
This is what I want to achieve eventually. Is there a quick way to reshape my csr_matrix without copying everything in it?
>>> C = csr_matrix(np.hstack((A.toarray(), B.toarray())))
>>> C.toarray()
array([[1, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0]])
2 Answers
#1
4
What you want to do isn't really what numpy or scipy understand as a reshape. But for your particular case, you can create a new CSR matrix that reuses the data, indices and indptr arrays from your original one, without copying them:
import scipy.sparse as sps
a = sps.rand(10000, 10000, density=0.01, format='csr')
In [19]: %timeit sps.csr_matrix((a.data, a.indices, a.indptr),
... shape=(10000, 10020), copy=True)
100 loops, best of 3: 6.26 ms per loop
In [20]: %timeit sps.csr_matrix((a.data, a.indices, a.indptr),
... shape=(10000, 10020), copy=False)
10000 loops, best of 3: 47.3 us per loop
In [21]: %timeit sps.csr_matrix((a.data, a.indices, a.indptr),
... shape=(10000, 10020))
10000 loops, best of 3: 48.2 us per loop
So if you no longer need your original matrix a, since the default is copy=False, simply do:
a = sps.csr_matrix((a.data, a.indices, a.indptr), shape=(10000, 10020))
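Applied to the 5x5 identity example from the question, the same idea might look like the sketch below. The helper name widen_csr is hypothetical (it is not part of scipy); it only wraps the constructor call shown above and assumes the input is already in CSR format.

```python
import numpy as np
import scipy.sparse as sps


def widen_csr(m, extra_cols):
    """Hypothetical helper: return a CSR matrix with `extra_cols` empty
    columns appended on the right, reusing m's data/indices/indptr."""
    n_rows, n_cols = m.shape
    return sps.csr_matrix(
        (m.data, m.indices, m.indptr),
        shape=(n_rows, n_cols + extra_cols),
        copy=False,  # ask scipy not to copy the three arrays
    )


A = sps.csr_matrix(np.identity(5, dtype=int))
C = widen_csr(A, 2)

print(C.shape)
print(C.toarray())
# Check whether the value array is actually shared with the original.
print(np.shares_memory(A.data, C.data))
```

If the arrays are shared, an in-place change to C.data would also show up in A, so this is best used when the original matrix is no longer needed or is treated as read-only.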
#2
5
You can use scipy.sparse.vstack or scipy.sparse.hstack to do it faster:
from scipy.sparse import csr_matrix, vstack, hstack
B = csr_matrix((5, 2), dtype=int)
C = csr_matrix((5, 2), dtype=int)
D = csr_matrix((10, 10), dtype=int)
B2 = vstack((B, C))
#<10x2 sparse matrix of type '<type 'numpy.int32'>'
# with 0 stored elements in COOrdinate format>
hstack((B2, D))
#<10x12 sparse matrix of type '<type 'numpy.int32'>'
# with 0 stored elements in COOrdinate format>
Note that the output is a coo_matrix, which can be efficiently converted to the CSR or CSC formats.
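For the matrices in the question, a minimal sketch of this approach could be the following. It passes format="csr" to hstack so the result comes back in CSR regardless of the default output format, which may differ between SciPy versions.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

A = csr_matrix(np.identity(5, dtype=int))  # the 5x5 matrix from the question
B = csr_matrix((5, 2), dtype=int)          # empty 5x2 block, no dense zeros needed

# Use the sparse-aware hstack instead of np.hstack, and request CSR output.
C = hstack((A, B), format="csr")

print(C.shape)
print(C.toarray())
```

Unlike reusing the arrays directly, this builds a new matrix, so the original A is left untouched.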