NumPy: comparing elements in two arrays

Time: 2022-07-23 12:14:33

Has anyone run into this problem? Let's say you have two arrays like the following:

a = array([1,2,3,4,5,6])
b = array([1,4,5])

Is there a way to find which elements of a also exist in b? For example:

c = a == b # Wishful example here
print c
array([1,4,5])
# Or even better
array([True, False, False, True, True, False])

I'm trying to avoid loops, as they would take ages with millions of elements. Any ideas?

Cheers

6 answers

#1


48  

Actually, there's an even simpler solution than any of these:

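import numpy as np

a = np.array([1,2,3,4,5,6])
b = np.array([1,4,5])

# Boolean mask: True where an element of a also occurs in b
# (np.in1d is deprecated since NumPy 2.0; np.isin(a, b) is the current spelling)
c = np.in1d(a,b)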

The resulting c is then:

array([ True, False, False,  True,  True, False], dtype=bool)
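
In current NumPy releases the same test is usually written with np.isin, which the NumPy documentation recommends over np.in1d for new code. A minimal sketch, reusing the question's arrays (the names mask, common and missing are only illustrative):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([1, 4, 5])

# True where an element of a also occurs somewhere in b
mask = np.isin(a, b)

# The mask selects the shared values, its negation selects the rest
common = a[mask]
missing = a[~mask]

print(mask)     # [ True False False  True  True False]
print(common)   # [1 4 5]
print(missing)  # [2 3 6]
```

This covers both outputs the question asks for: the boolean array directly, and the matching values by indexing a with it.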

#2


18  

Use np.intersect1d.

#!/usr/bin/env python
import numpy as np
a = np.array([1,2,3,4,5,6])
b = np.array([1,4,5])
c=np.intersect1d(a,b)
print(c)
# [1 4 5]

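Note that in the NumPy release this answer was written for, np.intersect1d gives the wrong answer if a or b have non-unique elements; in that case use np.intersect1d_nu. Current NumPy releases have no intersect1d_nu: np.intersect1d handles duplicates itself unless you pass assume_unique=True.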

There are also np.setdiff1d, np.setxor1d, np.setmember1d, and np.union1d. See the NumPy Example List With Doc.
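
As a quick illustration of these set routines, the sketch below assumes a current NumPy release, where (as far as I know) each of them deduplicates its inputs by default and returns a sorted array of unique values. The input arrays are made up for the example and deliberately contain repeats:

```python
import numpy as np

a = np.array([1, 2, 2, 3, 4, 5, 6, 6])
b = np.array([5, 1, 4, 4, 9])

inter = np.intersect1d(a, b)  # values present in both
diff = np.setdiff1d(a, b)     # values in a that are not in b
xor = np.setxor1d(a, b)       # values in exactly one of the two
union = np.union1d(a, b)      # values in either

print(inter)  # [1 4 5]
print(diff)   # [2 3 6]
print(xor)    # [2 3 6 9]
print(union)  # [1 2 3 4 5 6 9]
```

Unlike a boolean mask, these results lose the original order and multiplicity of a, so they answer "which values are shared" rather than "which positions match".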

#3


2  

Thanks for your reply kaizer.se. It's not quite what I was looking for, but with a suggestion from a friend and what you said I came up with the following.

import numpy as np

a = np.array([1,4,5]).astype(np.float32)
b = np.arange(10).astype(np.float32)

# Assigning matching values from a in b as np.nan
# (relies on b being sorted and every value of a being present in b)
b[b.searchsorted(a)] = np.nan

# Now generating Boolean arrays
match = np.isnan(b)
nonmatch = ~match

It's a bit of a cumbersome process, but it beats writing loops or using weave with loops.

Cheers

#4


2  

Numpy has a set function numpy.setmember1d() that works on sorted and uniqued arrays and returns exactly the boolean array that you want. If the input arrays don't match the criteria you'll need to convert to the set format and invert the transformation on the result.

import numpy as np
a = np.array([6,1,2,3,4,5,6])
b = np.array([1,4,5])

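# convert to the uniqued form
# (np.unique1d and np.setmember1d belong to old NumPy releases;
#  current releases provide np.unique and np.isin instead)
a_set, a_inv = np.unique1d(a, return_inverse=True)
b_set = np.unique1d(b)
# calculate matching elements
matches = np.setmember1d(a_set, b_set)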
# invert the transformation
result = matches[a_inv]
print(result)
# [False  True False False  True  True False]
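
np.unique1d and np.setmember1d are not available in current NumPy. A sketch of the same unique / match / invert round trip with np.unique and np.isin, assuming a recent NumPy release, could look like this (in practice np.isin(a, b) alone gives the same mask, so the round trip is shown only to mirror the steps above):

```python
import numpy as np

a = np.array([6, 1, 2, 3, 4, 5, 6])
b = np.array([1, 4, 5])

# convert to the uniqued (sorted, duplicate-free) form
a_set, a_inv = np.unique(a, return_inverse=True)
b_set = np.unique(b)

# membership test on the uniqued arrays
matches = np.isin(a_set, b_set, assume_unique=True)

# map the result back onto the original layout of a
result = matches[a_inv]
print(result)  # [False  True False False  True  True False]
```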

Edit: Unfortunately the setmember1d method in numpy is really inefficient. The searchsorted-and-assign method you proposed works faster, but if you can assign directly you might as well assign directly to the result and avoid lots of unnecessary copying. Also, your method will fail if b contains anything not in a. The following corrects those errors:

# assumes a is sorted and has no duplicates
result = np.zeros(a.shape, dtype=bool)
idxs = a.searchsorted(b)
in_range = idxs < a.shape[0] # Filter out out of range values
idxs, b_hit = idxs[in_range], b[in_range] # keep b aligned with the surviving indices
idxs = idxs[a[idxs] == b_hit] # Filter out where there isn't an actual match
result[idxs] = True
print(result)

My benchmarks show this at 91 µs, vs. 6.6 ms for your approach and 109 ms for numpy setmember1d, on a 1M-element a and a 100-element b.
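
To reproduce this kind of comparison on your own machine, a rough timing sketch with timeit might look as follows. The array sizes mirror the ones quoted above, but the helper name searchsorted_mask, the random data, and the use of np.isin as the baseline are assumptions made for this example, not the original benchmark. Absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = np.arange(1_000_000)  # sorted, duplicate-free, 1M elements
# 100 sorted values, some of which fall outside the range of a
b = np.sort(rng.choice(2_000_000, size=100, replace=False))

def searchsorted_mask(a, b):
    """Mask over sorted, duplicate-free a marking the values that occur in b."""
    result = np.zeros(a.shape, dtype=bool)
    idxs = a.searchsorted(b)
    in_range = idxs < a.shape[0]            # drop positions past the end of a
    idxs, b_hit = idxs[in_range], b[in_range]
    result[idxs[a[idxs] == b_hit]] = True   # keep only exact matches
    return result

# Both approaches should agree before they are timed
assert np.array_equal(searchsorted_mask(a, b), np.isin(a, b))

runs = 10
t_search = timeit.timeit(lambda: searchsorted_mask(a, b), number=runs) / runs
t_isin = timeit.timeit(lambda: np.isin(a, b), number=runs) / runs
print(f"searchsorted: {t_search * 1e6:.0f} us, np.isin: {t_isin * 1e6:.0f} us")
```

The searchsorted variant only does work proportional to len(b) plus one allocation for the mask, which is why it can win when b is tiny compared with a; it pays for that with the sortedness requirement on a.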

#5


0  

ebresset, your answer won't work unless a is a subset of b (and a and b are sorted). Otherwise searchsorted will return wrong indices. I had to do something similar; combining that with your code gives:

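import numpy

# Assume a and b are sorted (and b is a float array, so it can hold nan)
idxs = numpy.mod(b.searchsorted(a),len(b)) # wrap out-of-range indices back into b
idxs = idxs[b[idxs]==a] # keep only the positions that really match
b[idxs] = numpy.nan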
match = numpy.isnan(b)
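
The same idea can be written without overwriting b, returning a mask over a, which is what the question asks for. This is only a sketch: the function name in_sorted is made up, it assumes b is sorted and non-empty, and it uses clipping instead of numpy.mod to keep the indices in range.

```python
import numpy as np

def in_sorted(a, b):
    """Return a boolean mask over a: True where the value occurs in sorted b."""
    idxs = np.searchsorted(b, a)         # insertion points of a's values in b
    idxs = np.clip(idxs, 0, len(b) - 1)  # values beyond b's last entry would index out of range
    return b[idxs] == a                  # an exact hit means the value is present

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([1, 4, 5])
print(in_sorted(a, b))  # [ True False False  True  True False]
```

Because only b is searched, a does not need to be sorted here, and no float conversion is needed since nothing is replaced by nan.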

#6


-2  

Your example implies set-like behavior, caring more about existence in the array than having the right element at the right place. NumPy does this differently with its mathematical arrays and matrices: it will tell you only about items at the exact right spot. Can you make that work for you?

>>> import numpy
>>> a = numpy.array([1,2,3])
>>> b = numpy.array([1,3,3])
>>> a == b
array([ True, False,  True], dtype=bool)
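
Element-wise == can still answer the set-style question if one array is given an extra axis, so that every element of a is compared with every element of b through broadcasting. A small sketch with the question's arrays; note that the intermediate table has len(a) * len(b) entries, so this only suits modest sizes, not the millions of elements mentioned in the question.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([1, 4, 5])

# a[:, None] has shape (6, 1); comparing with b (shape (3,)) broadcasts to (6, 3)
table = a[:, None] == b

# An element of a is "in b" if any comparison in its row is True
mask = table.any(axis=1)
print(mask)  # [ True False False  True  True False]
```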
