I have a csr_matrix A of shape (70000, 80000) and another csr_matrix B of shape (1, 80000). How can I efficiently add B to every row of A? One idea is to somehow create a sparse matrix B' whose rows are all copies of B, but numpy.repeat does not work, and using a matrix of ones to create B' is very memory inefficient.
I also tried iterating through every row of A and adding B to it, but that is also very slow.
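One way to realize the B' idea without a dense matrix of ones is to build the CSR arrays of B' directly. Because every row of B' is a copy of B, its data and indices arrays are just B's arrays tiled n times, and its row pointer advances by B.nnz per row. This is a minimal sketch, assuming B is a one-row CSR matrix; the helper name repeat_row and the small matrices in the usage lines are hypothetical, chosen only for illustration. Memory use is the same n * B.nnz stored entries as any explicit B'.

```python
import numpy as np
import scipy.sparse as sparse


def repeat_row(B, n):
    """Hypothetical helper: return an (n, B.shape[1]) CSR matrix whose rows all equal B."""
    B = sparse.csr_matrix(B)
    if B.shape[0] != 1:
        raise ValueError("B must have exactly one row")
    nnz = B.nnz
    # Each output row stores the same nonzeros as B, so tile B's arrays n times
    data = np.tile(B.data[:nnz], n)
    indices = np.tile(B.indices[:nnz], n)
    # Row i occupies entries [i * nnz, (i + 1) * nnz) of data/indices
    indptr = np.arange(n + 1) * nnz
    return sparse.csr_matrix((data, indices, indptr), shape=(n, B.shape[1]))


# Small illustrative inputs (not the 70000 x 80000 matrices from the question)
A = sparse.csr_matrix(np.array([[1, 0, 0], [0, 2, 0]]))
B = sparse.csr_matrix(np.array([[0, 5, 7]]))
C = A + repeat_row(B, A.shape[0])  # rows: [1, 5, 7] and [0, 7, 7]
```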
Update: I tried something very simple that seems to be much more efficient than the ideas I mentioned above. The idea is to use scipy.sparse.vstack:
from scipy import sparse
C = sparse.vstack([B for x in range(A.shape[0])])  # one copy of B per row of A
A + C
This performs well for my task! A few more observations: I initially tried an iterative approach in which I called vstack multiple times; this was slower than calling it just once.
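To check that the vstack result really is "B added to every row of A", it can be compared against NumPy broadcasting on a small dense copy. The sketch below does that, and also builds the same stack with repeated vstack calls. Each repeated call copies everything stacked so far, which is a likely reason the iterative version is slower, though that explanation is an assumption rather than something measured here. The matrices are made up for illustration.

```python
import numpy as np
import scipy.sparse as sparse

# Small illustrative inputs
A = sparse.csr_matrix(np.array([[1, 0, 0], [0, 2, 0], [3, 0, 4]]))
B = sparse.csr_matrix(np.array([[0, 5, 7]]))

# A single vstack call with A.shape[0] copies of B
C_once = sparse.vstack([B for _ in range(A.shape[0])]).tocsr()

# Repeated vstack calls: every call copies the whole matrix built so far
C_loop = B
for _ in range(A.shape[0] - 1):
    C_loop = sparse.vstack([C_loop, B])

result = A + C_once
# Dense reference: NumPy broadcasts the (1, 3) row over the (3, 3) array
expected = A.toarray() + B.toarray()
```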
A + B[np.zeros(A.shape[0], dtype=int)] is another way to expand B to the same shape as A.
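Indexing a CSR matrix with an integer array selects rows, and repeating an index repeats the row, so indexing B with n zeros yields an n-row matrix whose every row is B. A small illustration with made-up values; dtype=int is passed so the index array holds integers rather than the floats np.zeros produces by default.

```python
import numpy as np
import scipy.sparse as sparse

B = sparse.csr_matrix(np.array([[0, 5, 7]]))
rows = np.zeros(4, dtype=int)  # row index 0, repeated 4 times
Bp = B[rows]                   # sparse matrix whose 4 rows all equal B
print(Bp.shape)                # (4, 3)
print(Bp.toarray())
```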
It has about the same performance and memory footprint as Warren Weckesser's solution (multiplying a column of ones by B):
import numpy as np
import scipy.sparse as sparse
N, M = 70000, 80000
A = sparse.rand(N, M, density=0.001).tocsr()
B = sparse.rand(1, M, density=0.001).tocsr()
In [185]: %timeit u = sparse.csr_matrix(np.ones((A.shape[0], 1), dtype=B.dtype)); Bp = u * B; A + Bp
1 loops, best of 3: 284 ms per loop
In [186]: %timeit A + B[np.zeros(A.shape[0])]
1 loops, best of 3: 280 ms per loop
and it appears to be faster than using sparse.vstack:
In [187]: %timeit A + sparse.vstack([B for x in range(A.shape[0])])
1 loops, best of 3: 606 ms per loop
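All three expansions build the same (N, M) matrix, so they should produce identical sums. The sketch below checks that on small random matrices; the sizes and densities are arbitrary choices, far smaller than the 70000 x 80000 case in the question. It writes the ones-column product with @, which is matrix multiplication for csr_matrix and also for SciPy's newer sparse array classes, where * means elementwise multiplication instead.

```python
import numpy as np
import scipy.sparse as sparse

rng = np.random.default_rng(0)
# Small random inputs (illustrative sizes and densities)
dense_A = rng.random((50, 40)) * (rng.random((50, 40)) < 0.05)
dense_B = rng.random((1, 40)) * (rng.random((1, 40)) < 0.2)
A = sparse.csr_matrix(dense_A)
B = sparse.csr_matrix(dense_B)
n = A.shape[0]

# 1) Column of ones times B (outer product) gives B repeated n times
u = sparse.csr_matrix(np.ones((n, 1), dtype=B.dtype))
via_ones = A + u @ B
# 2) Fancy indexing: row 0 of B selected n times
via_index = A + B[np.zeros(n, dtype=int)]
# 3) One vstack call with n copies of B
via_vstack = A + sparse.vstack([B for _ in range(n)])

# Dense reference using NumPy broadcasting
expected = dense_A + dense_B
```

For timings at the original size, each of the three A + ... expressions can be wrapped in %timeit as in the session above.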