Get all column indices equal to the max and use them to index another array: numpy vs sparse csr_matrix

Following the example here, I am able to find the column indices of a 2D numpy array and get back an array of column indices of all occurrences of the max value.

But now I want to do the same thing but on a sparse csr_matrix.

import numpy as np

x = np.array([[0,0,1,0,0,0,2],[0,0,0,4,0,0,0],[0,9,1,0,0,0,2],[0,0,1,0,0,9,2]])
# column index of every occurrence of the max value
max_col_inds = np.argwhere(x == np.max(x))[:,1]
# array([1, 5], dtype=int64)

Then I want to use that result to pick out the elements at indices 1 and 5 of a 1D array:

words[max_col_inds]

If x is a 2D numpy array and words is a 1D numpy array, this works.

But now if I replace x with a scipy.sparse.csr.csr_matrix, I get this on the call to np.argwhere():

TypeError: tuple indices must be integers, not tuple

1 Answer

#1

In [804]: x = np.array([[0,0,1,0,0,0,2],[0,0,0,4,0,0,0],[0,9,1,0,0,0,2],[0,0,1,0,0,9,2]])
In [805]: np.max(x)
Out[805]: 9
In [806]: np.where(x == 9)
Out[806]: (array([2, 3], dtype=int32), array([1, 5], dtype=int32))

argwhere is just np.transpose(np.where(...)); that is, it converts the tuple of index arrays into a 2d array and transposes it, giving one (row, column) pair per match:

In [807]: np.argwhere(x ==9)
Out[807]: 
array([[2, 1],
       [3, 5]], dtype=int32)
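
As a minimal sketch of that equivalence (the names mask, rows, cols, pairs and same are only illustrative, not from the original post), the following checks that argwhere matches the transposed nonzero tuple on the dense array, and that the second array of the tuple already holds the column indices:

```python
import numpy as np

x = np.array([[0, 0, 1, 0, 0, 0, 2],
              [0, 0, 0, 4, 0, 0, 0],
              [0, 9, 1, 0, 0, 0, 2],
              [0, 0, 1, 0, 0, 9, 2]])

# Boolean mask marking every occurrence of the max value
mask = x == x.max()

# nonzero returns one index array per dimension: (rows, cols)
rows, cols = np.nonzero(mask)

# argwhere returns the same information as (row, col) pairs
pairs = np.argwhere(mask)

# True when both forms describe the same positions
same = np.array_equal(pairs, np.transpose((rows, cols)))
```

If only the column indices are needed, taking cols from the tuple avoids building the 2d array and slicing [:,1] afterwards.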

Doing the same thing with a sparse matrix:

In [808]: xM = sparse.csr_matrix(x)
In [809]: xM == 9
Out[809]: 
<4x7 sparse matrix of type '<class 'numpy.bool_'>'
    with 2 stored elements in Compressed Sparse Row format>

Called with a single argument, np.where is the same thing as np.nonzero, and a sparse matrix has its own nonzero method:

In [810]: (xM==9).nonzero()
Out[810]: (array([2, 3], dtype=int32), array([1, 5], dtype=int32))
In [811]: np.transpose((xM==9).nonzero())
Out[811]: 
array([[2, 1],
       [3, 5]], dtype=int32)

Actually, in current numpy versions argwhere works with sparse matrices directly. That's because np.nonzero delegates to the matrix's own nonzero method:

In [813]: np.argwhere(xM==9)
Out[813]: 
array([[2, 1],
       [3, 5]], dtype=int32)
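
Putting the pieces together for the original question, here is a rough sketch that goes from the csr_matrix to the selected entries of words. The contents of words are made up for illustration (the question does not show them), and the sketch assumes the max value is nonzero, since comparing a sparse matrix against 0 is inefficient:

```python
import numpy as np
from scipy import sparse

x = np.array([[0, 0, 1, 0, 0, 0, 2],
              [0, 0, 0, 4, 0, 0, 0],
              [0, 9, 1, 0, 0, 0, 2],
              [0, 0, 1, 0, 0, 9, 2]])
xM = sparse.csr_matrix(x)

# Hypothetical 1D array with one entry per column of x
words = np.array(["a", "b", "c", "d", "e", "f", "g"])

# Sparse boolean matrix that is True where xM equals its max value
mask = xM == xM.max()

# The sparse nonzero method returns (row indices, column indices)
rows, max_col_inds = mask.nonzero()

# Index the dense 1D array with the column indices
selected = words[max_col_inds]
```

If the max can occur more than once in the same column, max_col_inds will contain repeats; wrapping it in np.unique is one way to drop them.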
