How can I use a scipy.csr sparse matrix as a normal dense matrix without todense()?

Date: 2022-10-03 21:22:43

I have a problem with sparse matrices in scipy. I want to use them as normal matrices, but without calling the todense() function. I'm new to this field, and I don't know how to get the same result as multiplying the sparse matrix, but without it being a sparse matrix. I think sparse matrices are only used for faster computation, so it should be possible to do this without a sparse matrix:

sparse_matrix * 5 == sparse_matrix.todense() * 5 == no_sparse_matrix * 5

import numpy as np
import scipy.sparse as sp

data = np.ones(5178)
indices = [34, 12, 545, 23, ..., 25, 18, 29]  # shape: (5178,)
indptr = np.arange(5178 + 1)  # one stored entry per row

sparse_matrix = sp.csr_matrix((data, indices, indptr), shape=(5178, 3800))

Is this correct? sparse_matrix * 5 == sparse_matrix.todense() * 5 == data * 5 ?

My goal is to get the same result as multiplying the sparse matrix, but without using a sparse matrix. Is this possible? How can I do this?

Edit, about my intention: my problem is that I want to port Python code to Java, and my Java library for linear algebra does not provide sparse matrix operations.

So I have to do the same in Java without sparse matrices. I was not sure whether I can just use the data array instead of a sparse matrix.

In the original code a sparse matrix is multiplied with another matrix. To port that to Java I would just multiply the data array of the sparse matrix with the other matrix. Is this correct?

1 answer

#1


It's not entirely clear what you are asking for, but here's my guess.

Let's just experiment with a simple array:

Start with 3 arrays (I took these from another sparse matrix, but that isn't important):

In [165]: data
Out[165]: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11], dtype=int32)

In [166]: indices
Out[166]: array([1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=int32)

In [167]: indptr
Out[167]: array([ 0,  3,  7, 11], dtype=int32)

In [168]: M=sparse.csr_matrix((data,indices,indptr),shape=(3,4))

These arrays have been assigned to 3 attributes of the new matrix:

In [169]: M.data
Out[169]: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11], dtype=int32)

In [170]: M.indices
Out[170]: array([1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=int32)

In [171]: M.indptr
Out[171]: array([ 0,  3,  7, 11], dtype=int32)
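
If the goal is a port to a library with no sparse support, it may help to see how these three arrays map onto a plain 2-D array. The following is only a sketch: csr_to_dense is a hypothetical helper name, not a SciPy function, and it uses nothing but loops so it translates directly to Java.

```python
import numpy as np
from scipy import sparse

# Same three arrays as in the session above.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype=np.int32)
indices = np.array([1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=np.int32)
indptr = np.array([0, 3, 7, 11], dtype=np.int32)
n_rows, n_cols = 3, 4


def csr_to_dense(data, indices, indptr, n_rows, n_cols):
    """Expand CSR arrays into a plain 2-D array (hypothetical helper)."""
    dense = np.zeros((n_rows, n_cols), dtype=data.dtype)
    for i in range(n_rows):
        # Row i owns the slice indptr[i]:indptr[i + 1] of data and indices.
        for k in range(indptr[i], indptr[i + 1]):
            # indices[k] is the column, data[k] the value stored there.
            dense[i, indices[k]] += data[k]
    return dense


dense = csr_to_dense(data, indices, indptr, n_rows, n_cols)
M = sparse.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))
print(dense)
print(np.array_equal(dense, M.toarray()))
```

So data on its own is not the matrix: it holds only the stored values, and indices and indptr say where they go.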

Now try multiplying the .data attribute:

In [172]: M.data *= 3

Lo and behold, we have multiplied the 'whole' array:

In [173]: M.A
Out[173]: 
array([[ 0,  3,  6,  9],
       [12, 15, 18, 21],
       [24, 27, 30, 33]], dtype=int32)

Of course we can also multiply the matrix directly. That is, multiplication by a constant is defined for csr sparse matrices:

In [174]: M *= 2

In [175]: M.A
Out[175]: 
array([[ 0,  6, 12, 18],
       [24, 30, 36, 42],
       [48, 54, 60, 66]], dtype=int32)

In [176]: M.data
Out[176]: array([ 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66], dtype=int32)

Out of curiosity, let's look at the source array. It too has changed, so M.data points to the same array. Change one, change the other.

In [177]: data
Out[177]: array([ 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66], dtype=int32)

So when the matrix is created this way, it is possible to multiply it by a scalar in several different ways.
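
To tie this back to the equality in the question, here is a small check, reusing the same three arrays. It is a sketch rather than a proof: it compares scaling the sparse matrix, scaling a dense copy, and scaling the bare data array. The variable names are mine.

```python
import numpy as np
from scipy import sparse

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype=np.int32)
indices = np.array([1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=np.int32)
indptr = np.array([0, 3, 7, 11], dtype=np.int32)

M = sparse.csr_matrix((data, indices, indptr), shape=(3, 4))
dense = M.toarray()        # plain 2-D ndarray, shape (3, 4)

scaled_sparse = M * 5      # a new sparse matrix
scaled_dense = dense * 5   # a plain 2-D ndarray
scaled_data = data * 5     # 1-D, shape (11,): values only, no positions

# The scaled sparse matrix and the scaled dense array hold the same matrix.
same_matrix = np.array_equal(scaled_sparse.toarray(), scaled_dense)
# The stored values agree too, but scaled_data alone has lost the layout.
same_values = np.array_equal(scaled_sparse.data, scaled_data)
print(same_matrix, same_values, scaled_data.shape, scaled_dense.shape)
```

For a scalar, then, data * 5 gives the right stored values, but you still need indices and indptr (or a dense copy) to know which matrix they describe.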

Which is best? Directly multiplying the .data attribute might be faster than multiplying the matrix. But you should be aware of the differences between manipulating .data directly and using the defined math operations for the whole matrix. For example, M*N performs matrix multiplication rather than an elementwise product. You really should understand the matrix data structure before you try changing its internals directly.
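
For the matrix product case from the question, multiplying the bare data array with the other matrix will not work, because the shapes and positions are wrong. Below is a rough sketch of what a port without sparse support would have to do instead. csr_matmul_dense is a hypothetical helper, B is an arbitrary dense matrix I made up for the check, and @ is used for the SciPy reference product.

```python
import numpy as np
from scipy import sparse

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype=np.int32)
indices = np.array([1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=np.int32)
indptr = np.array([0, 3, 7, 11], dtype=np.int32)
n_rows = 3


def csr_matmul_dense(data, indices, indptr, n_rows, B):
    """Compute (CSR matrix) @ B with explicit loops (hypothetical helper)."""
    n_out = B.shape[1]
    C = np.zeros((n_rows, n_out), dtype=np.result_type(data, B))
    for i in range(n_rows):
        # Only the stored entries of row i contribute to row i of the result.
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]  # column of the stored entry = row of B to use
            for c in range(n_out):
                C[i, c] += data[k] * B[j, c]
    return C


B = np.arange(8, dtype=np.int32).reshape(4, 2)  # arbitrary dense matrix
C = csr_matmul_dense(data, indices, indptr, n_rows, B)

M = sparse.csr_matrix((data, indices, indptr), shape=(n_rows, 4))
print(np.array_equal(C, M @ B))
```

In the question's matrix every row stores exactly one entry with value 1 (data is all ones and indptr is arange), so row i of the product is simply row indices[i] of the other matrix. In that special case the Java side only needs a row lookup.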

The ability to modify data, the source array, depends on creating the matrix in just this way and maintaining that pointer link. If you defined it via a coo matrix (or coo-style inputs), the data link would not be maintained. And M1 = M*2 is not going to pass this link on to M1.
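
One way to check that link yourself is np.shares_memory. This is a sketch under the assumption that the constructor is called with its default copy=False and that the dtypes already match; whether memory is shared is an implementation detail I would not rely on in production code.

```python
import numpy as np
from scipy import sparse

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype=np.int32)
indices = np.array([1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=np.int32)
indptr = np.array([0, 3, 7, 11], dtype=np.int32)

# CSR-style inputs: the matrix can keep using the `data` buffer itself.
M = sparse.csr_matrix((data, indices, indptr), shape=(3, 4))
linked_csr = np.shares_memory(M.data, data)

# COO-style inputs (data, (row, col)): the arrays are rebuilt on conversion.
row = np.repeat(np.arange(3), np.diff(indptr))  # row index of each entry
M_coo_style = sparse.csr_matrix((data, (row, indices)), shape=(3, 4))
linked_coo = np.shares_memory(M_coo_style.data, data)

# A matrix produced by an arithmetic expression gets its own data array.
M1 = M * 2
linked_product = np.shares_memory(M1.data, data)

print(linked_csr, linked_coo, linked_product)
```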

Get your code working with the normal math operations that sparse has defined. Later, if you still need to squeeze out more speed, you can dig into the internals and streamline selected operations.
