I am trying to get the subset x of a given NumPy array alist such that the first element of each row is in the list r.
>>> import numpy
>>> alist = numpy.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1), (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
>>> alist
array([[0, 2],
       [0, 4],
       [1, 3],
       [1, 4],
       [2, 1],
       [3, 1],
       [3, 2],
       [4, 1],
       [4, 3],
       [4, 2]])
>>> r = [1,3]
>>> x = alist[where first element of each row is in r]  # pseudocode: this is what I need to figure out
>>> x
array([[1, 3],
       [1, 4],
       [3, 1],
       [3, 2]])
Is there an easy way to do this in Python without looping, as I have a large dataset?
2 Solutions
#1
2
Slice the first column off the input array (that is, select the first element of each row), then use np.in1d with r as the second input to create a mask of the valid rows. Finally, index into the rows of the array with that mask to select the valid ones.
Thus, the implementation would look like this:
alist[np.in1d(alist[:,0],r)]
Sample run:
In [258]: alist # Input array
Out[258]:
array([[0, 2],
       [0, 4],
       [1, 3],
       [1, 4],
       [2, 1],
       [3, 1],
       [3, 2],
       [4, 1],
       [4, 3],
       [4, 2]])
In [259]: r # Input list to be searched for
Out[259]: [1, 3]
In [260]: np.in1d(alist[:,0],r) # Mask of valid rows
Out[260]: array([False, False,  True,  True, False,  True,  True,
                 False, False, False], dtype=bool)
In [261]: alist[np.in1d(alist[:,0],r)] # Index and select for final o/p
Out[261]:
array([[1, 3],
       [1, 4],
       [3, 1],
       [3, 2]])
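A note for newer NumPy versions: np.isin (available since NumPy 1.13) performs the same membership test, and np.in1d is deprecated as of NumPy 2.0. The sketch below is not part of the original answer; it simply repeats the same mask-and-index approach with np.isin, reusing the alist and r from the question.

```python
import numpy as np

alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]

# Boolean mask: True where the value in the first column appears in r
mask = np.isin(alist[:, 0], r)

# Boolean indexing keeps only the rows where the mask is True
x = alist[mask]
print(x)
```

If you are stuck on an older NumPy where np.isin is missing, the np.in1d call above should behave the same way for a 1-D first argument like this one.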
#2
2
You can construct the index array for the valid rows using broadcasting: add an extra dimension to the first column and compare each of its elements for equality with each element of r:
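import numpy as np
alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]  # values to look for in the first column (from the question)
inds = (alist[:,0][:,None] == r).any(axis=-1)  # boolean mask, one entry per row
x = alist[inds,:] # the valid rows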
The trick is that we take the first column of alist, make it an (N,1)-shaped array, and rely on array broadcasting in the comparison with r to end up with an (N,len(r))-shaped boolean array, which is (N,2) here. If any of the values in a given row is True, we keep that row. The resulting boolean mask is exactly the same as the np.in1d one in Divakar's answer.
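To make the broadcasting step concrete, here is a small sketch that prints the intermediate shapes and checks that the mask matches a plain membership test. The names col and cmp are only illustrative, and np.isin is used for the comparison on the assumption that a reasonably recent NumPy is installed.

```python
import numpy as np

alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]

col = alist[:, 0][:, None]   # first column as a column vector, shape (N, 1)
cmp = col == r               # broadcast against r, shape (N, len(r))
inds = cmp.any(axis=-1)      # True if the row's first element equals any value in r

print(col.shape, cmp.shape, inds.shape)

# The broadcasted mask agrees with the membership-test mask
print(np.array_equal(inds, np.isin(alist[:, 0], r)))
```

One trade-off worth keeping in mind: the intermediate comparison array holds N * len(r) booleans, so for a long r the membership-test approach is likely to be lighter on memory.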