Return a subset of a NumPy array based on the first element of each row

I am trying to get the subset x of a given NumPy array alist such that the first element of each row is in the list r.

>>> import numpy 
>>> alist = numpy.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1), (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
>>> alist
array([[0, 2],
       [0, 4],
       [1, 3],
       [1, 4],
       [2, 1],
       [3, 1],
       [3, 2],
       [4, 1],
       [4, 3],
       [4, 2]])
>>> r = [1,3]
>>> x = alist[where first element of each row is in r] #this i need to figure out.
>>> x
array([[1, 3],
       [1, 4],
       [3, 1],
       [3, 2]])

Is there an easy way to do this in Python without looping? I have a large dataset.

2 solutions

#1

Slice the first column off the input array (that is, select the first element of each row), then pass it to np.in1d with r as the second argument to build a boolean mask of the valid rows. Finally, index into the rows of the array with that mask to select the valid ones.

Thus, the implementation would be like so -

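import numpy as np  # the question imports numpy without the np alias
x = alist[np.in1d(alist[:, 0], r)]  # keep rows whose first element is in r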

Sample run -

In [258]: alist   # Input array
Out[258]: 
array([[0, 2],
       [0, 4],
       [1, 3],
       [1, 4],
       [2, 1],
       [3, 1],
       [3, 2],
       [4, 1],
       [4, 3],
       [4, 2]])

In [259]: r  # Input list to be searched for
Out[259]: [1, 3]

In [260]: np.in1d(alist[:,0],r) # Mask of valid rows
Out[260]: array([False, False,  True,  True, False,  True,  True,
                        False, False, False], dtype=bool)

In [261]: alist[np.in1d(alist[:,0],r)] # Index and select for final o/p
Out[261]: 
array([[1, 3],
       [1, 4],
       [3, 1],
       [3, 2]])
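
As far as I know, np.in1d is deprecated in recent NumPy releases (from NumPy 2.0 onward) in favour of np.isin, which has been available since NumPy 1.13. A minimal sketch of the same masking approach with np.isin, assuming the alist and r from the question:

```python
import numpy as np

alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]

# Boolean mask: True where the row's first element appears in r
mask = np.isin(alist[:, 0], r)

# Boolean indexing keeps only the rows where the mask is True
x = alist[mask]
print(x)
```

For a 1-D input such as alist[:, 0], np.isin should give the same mask as np.in1d, so the rest of the answer applies unchanged.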

#2

You can construct the index array for the valid rows using some indexing tricks: we can add an additional dimension and check equality with each element of your first column:

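import numpy as np
alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]  # values allowed in the first column

# (N,1) column compared against r broadcasts to an (N,len(r)) boolean array
inds = (alist[:,0][:,None] == r).any(axis=-1)
x = alist[inds,:] # the valid rows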

The trick is that we take the first column of alist, reshape it into an (N,1)-shaped array, and rely on array broadcasting in the comparison to end up with an (N,len(r))-shaped boolean array, which is (N,2) in this example because r has two elements. If any of the values in a given row is True, we keep that row's index. The resulting index array is exactly the same as the np.in1d one in Divakar's answer.
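
To make the shapes concrete, here is a small sketch that splits the one-liner into its intermediate steps. The names first_col, col, and eq are only illustrative and do not appear in the answer itself:

```python
import numpy as np

alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]

first_col = alist[:, 0]        # shape (N,)
col = first_col[:, None]       # shape (N, 1): one row per element
eq = col == np.asarray(r)      # broadcasts to shape (N, len(r))
inds = eq.any(axis=-1)         # shape (N,): True if the row matched any value in r

print(col.shape, eq.shape, inds.shape)
x = alist[inds, :]
```

One thing to keep in mind is that the intermediate eq array holds N * len(r) booleans, so for a long r the mask-building function from the first answer may be the more memory-friendly choice.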
