I have a large numpy array (dtype=int) and a set of numbers which I'd like to find in that array, e.g.,
import numpy as np
values = np.array([1, 2, 3, 1, 2, 4, 5, 6, 3, 2, 1])
searchvals = [3, 1]
# result = [0, 2, 3, 8, 10]
The result array doesn't have to be sorted.
Speed is an issue, and since both values and searchvals can be large,
for searchval in searchvals:
    np.where(values == searchval)[0]
doesn't cut it.
Any hints?
3 solutions
#1
5
Is this fast enough?
>>> np.where(np.in1d(values, searchvals))
(array([ 0, 2, 3, 8, 10]),)
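For reference, here is a minimal self-contained sketch of the same membership test using np.isin, which newer NumPy releases recommend in place of np.in1d. The names mask and result are just illustrative; np.flatnonzero is used here as a shorthand for np.where(...)[0] on a 1-D mask.

```python
import numpy as np

values = np.array([1, 2, 3, 1, 2, 4, 5, 6, 3, 2, 1])
searchvals = [3, 1]

# Boolean mask: True wherever an element of values occurs in searchvals.
mask = np.isin(values, searchvals)

# Positions of the True entries, in ascending order.
result = np.flatnonzero(mask)
print(result.tolist())  # [0, 2, 3, 8, 10]
```

The mask is computed in one vectorized call, so there is no Python-level loop over searchvals.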
#2
1
I would say using np.in1d would be the intuitive way to solve such a case. Having said that, based on this solution, here's an alternative with np.searchsorted:
sidx = np.argsort(searchvals)
left_idx = np.searchsorted(searchvals,values,sorter=sidx,side='left')
right_idx = np.searchsorted(searchvals,values,sorter=sidx,side='right')
out = np.where(left_idx != right_idx)[0]
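The snippet above can be wrapped into a runnable sketch like the following. The function name indices_of is hypothetical, and the np.asarray conversion is an added assumption so that a plain list can be passed in. The idea is that a value is present in searchvals exactly when its left and right insertion points differ.

```python
import numpy as np

def indices_of(values, searchvals):
    """Return positions in values whose element occurs in searchvals."""
    searchvals = np.asarray(searchvals)
    # searchvals need not be sorted; sorter supplies the sorting permutation.
    sidx = np.argsort(searchvals)
    left_idx = np.searchsorted(searchvals, values, sorter=sidx, side='left')
    right_idx = np.searchsorted(searchvals, values, sorter=sidx, side='right')
    # Equal insertion points mean the value is absent from searchvals.
    return np.flatnonzero(left_idx != right_idx)

values = np.array([1, 2, 3, 1, 2, 4, 5, 6, 3, 2, 1])
found = indices_of(values, [3, 1])
print(found.tolist())  # [0, 2, 3, 8, 10]
```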
#3
0
Can you avoid numpy altogether? List concatenation should be much faster than relying on numpy's methods. This will still work even if values needs to be a numpy array.
result = []
for sv in searchvals:
    result += [i for i in range(len(values)) if values[i] == sv]
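As a variation on the same pure-Python idea, here is a sketch that scans values once and tests membership against a set, instead of rescanning values for every search value. The names wanted, grouped, and single_pass are illustrative. Whether either version beats the NumPy approaches on large inputs is something to measure rather than assume.

```python
values = [1, 2, 3, 1, 2, 4, 5, 6, 3, 2, 1]
searchvals = [3, 1]

# Loop from the answer: one full scan of values per search value.
grouped = []
for sv in searchvals:
    grouped += [i for i in range(len(values)) if values[i] == sv]

# Single pass: set membership is a constant-time check on average.
wanted = set(searchvals)
single_pass = [i for i, v in enumerate(values) if v in wanted]

print(grouped)      # [2, 8, 0, 3, 10]  (grouped by search value)
print(single_pass)  # [0, 2, 3, 8, 10]  (ascending positions)
```

Both lists contain the same indices; only the order differs, which is fine since the result does not have to be sorted.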