How to access sparse matrix elements?

Time: 2022-11-27 21:23:07
>>> type(A)
<class 'scipy.sparse.csc.csc_matrix'>
>>> A.shape
(8529, 60877)
>>> print(A[0, :])
  (0, 25)   1.0
  (0, 7422) 1.0
  (0, 26062)    1.0
  (0, 31804)    1.0
  (0, 41602)    1.0
  (0, 43791)    1.0
>>> print(A[1, :])
  (0, 7044) 1.0
  (0, 31418)    1.0
  (0, 42341)    1.0
  (0, 47125)    1.0
  (0, 54376)    1.0
>>> print(A[:, 0])
  # nothing printed

What I don't understand is this: A[1,:] should select elements from the 2nd row, yet the printed output appears to show elements from the 1st row. And A[:,0] should return the first column, but nothing is printed. Why?

4 Answers

#1


24  

A[1,:] is itself a sparse matrix with shape (1, 60877). This is what you are printing, and it has only one row, so all the row coordinates are 0.

For example:

In [40]: from scipy.sparse import csc_matrix

In [41]: a = csc_matrix([[1, 0, 0, 0], [0, 0, 10, 11], [0, 0, 0, 99]])

In [42]: a.todense()
Out[42]: 
matrix([[ 1,  0,  0,  0],
        [ 0,  0, 10, 11],
        [ 0,  0,  0, 99]], dtype=int64)

In [43]: print(a[1, :])
  (0, 2)    10
  (0, 3)    11

In [44]: print(a)
  (0, 0)    1
  (1, 2)    10
  (1, 3)    11
  (2, 3)    99

In [45]: print(a[1, :].toarray())
[[ 0  0 10 11]]
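
To map the coordinates printed for a row slice back to the original matrix, one option is to pair the original row index with the column indices of the slice. A minimal sketch, assuming the same matrix `a` as above (the names `row_index`, `row`, and `entries` are illustrative, not from the original answer):

```python
from scipy.sparse import csc_matrix

a = csc_matrix([[1, 0, 0, 0], [0, 0, 10, 11], [0, 0, 0, 99]])

row_index = 1
row = a[row_index, :]  # a 1 x 4 sparse matrix, so its only row index is 0

# nonzero() returns coordinates relative to the slice; the row part is always 0
_, cols = row.nonzero()

# Re-attach the original row index to each nonzero column
entries = [(row_index, int(j), int(a[row_index, j])) for j in cols]
print(row.shape)
print(entries)
```

The slice keeps its own coordinate system, which is why the printed row index is 0 regardless of which row was selected.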

You can select columns, but if there are no nonzero elements in the column, nothing is displayed when it is output with print:

In [46]: a[:, 3].toarray()
Out[46]: 
array([[ 0],
       [11],
       [99]])

In [47]: print(a[:,3])
  (1, 0)    11
  (2, 0)    99

In [48]: a[:, 1].toarray()
Out[48]: 
array([[0],
       [0],
       [0]])

In [49]: print(a[:, 1])


In [50]:

The last print call shows no output because the column a[:, 1] has no nonzero elements.
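
To tell an empty column apart from a failed selection, it may help to check the slice's shape and its number of stored entries instead of relying on print. A minimal sketch with the same matrix (`col` is an illustrative name):

```python
from scipy.sparse import csc_matrix

a = csc_matrix([[1, 0, 0, 0], [0, 0, 10, 11], [0, 0, 0, 99]])

col = a[:, 1]
print(col.shape)  # the slice still has 3 rows and 1 column
print(col.nnz)    # number of entries stored in the slice

if col.nnz == 0:
    print("column 1 holds only zeros")
```

Note that `nnz` counts stored entries, so a matrix that stores explicit zeros would report them too.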

#2


9  

To answer the question in your title with a different technique from the one in the question body:

csc_matrix gives you the method .nonzero().

Given:

>>> import numpy as np
>>> from scipy.sparse import csc_matrix
>>> 
>>> row = np.array([0, 1, 3])
>>> col = np.array([0, 2, 3])
>>> data = np.array([1, 4, 16])
>>> A = csc_matrix((data, (row, col)), shape=(4, 4))

You can access the indices pointing to non-zero data with:

>>> rows, cols = A.nonzero()
>>> rows
array([0, 1, 3], dtype=int32)
>>> cols
array([0, 2, 3], dtype=int32)

You can then use these indices to access your data without ever building a dense version of the sparse matrix:

>>> [((i, j), A[i,j]) for i, j in zip(*A.nonzero())]
[((0, 0), 1), ((1, 2), 4), ((3, 3), 16)]
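
Indexing `A[i, j]` one element at a time can be slow on large matrices. As an alternative sketch (not part of the original answer), converting to COO format exposes the row, column, and value arrays directly; the names `coo` and `triples` are illustrative:

```python
import numpy as np
from scipy.sparse import csc_matrix

row = np.array([0, 1, 3])
col = np.array([0, 2, 3])
data = np.array([1, 4, 16])
A = csc_matrix((data, (row, col)), shape=(4, 4))

# COO stores three parallel arrays: row indices, column indices, values
coo = A.tocoo()
triples = [((int(i), int(j)), int(v)) for i, j, v in zip(coo.row, coo.col, coo.data)]
print(triples)
```

Unlike `.nonzero()`, the COO arrays also include any explicitly stored zeros, so the two approaches can differ on matrices that contain them.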

#3


0  

If you are calculating TF-IDF scores with TfidfTransformer, you can get the IDF values from tfidf.idf_. Then, for a sparse result named, say, 'a', call a.toarray() to get a dense array.

toarray returns an ndarray; todense returns a matrix. If you want a matrix, use todense; otherwise, use toarray.
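
The difference shows up mainly when indexing the dense result. A minimal sketch, assuming a small `csc_matrix` named `a` (the names `arr` and `mat` are illustrative):

```python
import numpy as np
from scipy.sparse import csc_matrix

a = csc_matrix([[1, 0, 0, 0], [0, 0, 10, 11], [0, 0, 0, 99]])

arr = a.toarray()  # numpy.ndarray
mat = a.todense()  # numpy.matrix

# Indexing a row of an ndarray drops a dimension
print(type(arr).__name__, arr[1].shape)
# A matrix row stays two-dimensional
print(type(mat).__name__, mat[1].shape)
```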

#4


0  

I fully acknowledge all the other given answers. This is simply a different approach.

To demonstrate this example I am creating a new sparse matrix:

from scipy.sparse import csc_matrix
a = csc_matrix([[1, 0, 0, 0], [0, 0, 10, 11], [0, 0, 0, 99]])
print(a)

Output:

(0, 0)  1
(1, 2)  10
(1, 3)  11
(2, 3)  99

To access this easily, like the way we access a list, I converted it into a list.

temp_list = []
for row in a:
    # Each row is a 1 x n sparse matrix; densify it and convert to a plain list
    temp_list.append(row.toarray()[0].tolist())

print(temp_list)

Output:

[[1, 0, 0, 0], [0, 0, 10, 11], [0, 0, 0, 99]]

This might look stupid, since I am creating a sparse matrix and converting it back, but there are some functions like TfidfVectorizer and others that return a sparse matrix as output and handling them can be tricky. This is one way to extract data out of a sparse matrix.
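
If the matrix is too large to densify, one alternative sketch is to convert it to CSR and read each row's stored entries from the `indptr`, `indices`, and `data` arrays. The helper name `row_entries` is hypothetical and not part of SciPy:

```python
from scipy.sparse import csc_matrix

a = csc_matrix([[1, 0, 0, 0], [0, 0, 10, 11], [0, 0, 0, 99]])
csr = a.tocsr()  # CSR keeps the entries of each row contiguous

def row_entries(m, i):
    """Return the (column, value) pairs stored in row i of a CSR matrix."""
    start, end = m.indptr[i], m.indptr[i + 1]
    return [(int(j), v.item()) for j, v in zip(m.indices[start:end], m.data[start:end])]

for i in range(csr.shape[0]):
    print(i, row_entries(csr, i))
```

This touches only the stored entries, so memory use stays proportional to the number of nonzeros rather than to the full shape.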
