用numpy加速花哨的索引

I have two numpy arrays and each has shape of (10000,10000). One is value array and the other one is index array.

我有两个numpy数组,每个数组的形状为(10000,10000)。一个是值数组,另一个是索引数组。

Value=np.random.rand(10000,10000)
Index=np.random.randint(0,1000,(10000,10000))

I want to make a list (or 1D numpy array) by summing all the "Value array" referring the "Index array". For example, for each index i, finding matching array index and giving it to value array as argument

我想通过将所有引用“索引数组”的“值数组”相加来制作一个列表(或1D numpy数组)。例如,对于每个索引i,找到匹配的数组索引并将其作为参数赋予value数组

for i in range(1000):
    NewArray[i] = np.sum(Value[np.where(Index==i)])

However, This is too slow since I have to do this loop through 300,000 arrays. I tried to come up with some logical indexing method like

但是,这太慢了,因为我必须通过300,000个数组进行循环。我试着提出一些逻辑索引方法,比如

NewArray[Index] += Value[Index]

But it didn't work. The next thing I tried is using dictionary

但它没有用。我尝试的下一件事是使用字典

for k, v in list(zip(Index.flatten(),Value.flatten())):
    NewDict[k].append(v)

and

for i in NewDict:
    NewDict[i] = np.sum(NewDict[i])

But it was slow too

但它也很慢

Is there any smart way to speed up?

有没有聪明的方法加快?

1 个解决方案

#1

I had two thoughts. First, try masking, it speeds this up by about 4x:

我有两个想法。首先,尝试屏蔽,它将速度提高约4倍:

for i in range(1000):
    NewArray[i] = np.sum(Value[Index==i])

Alternately, you can sort your arrays to put the values you're adding together in contiguous memory space. Masking or using where() has to gather all your values together each time you call sum on the slice. By front-loading this gathering, you might be able to speed things up considerably:

或者,您可以对数组进行排序,以将要添加的值放在连续的内存空间中。每次在切片上调用sum时,屏蔽或使用where()必须将所有值聚集在一起。通过前装这次聚会,您可以大大加快速度:

# flatten your arrays
vals = Value.ravel()
inds = Index.ravel()
s = np.argsort(inds)  # these are the indices that will sort your Index array

v_sorted = vals[s].copy()  # the copy here orders the values in memory instead of just providing a view
i_sorted = inds[s].copy()
searches = np.searchsorted(i_sorted, np.arange(0, i_sorted[-1] + 2)) # 1 greater than your max, this gives you your array end...
for i in range(len(searches) -1):
    st = searches[i]
    nd = searches[i+1]
    NewArray[i] = v_sorted[st:nd].sum()

This method takes 26 sec on my computer vs 400 using the old way. Good luck. If you want to read more about contiguous memory and performance check this discussion out.

我的计算机上使用旧方法需要26秒,而使用旧方法需要400秒。祝好运。如果您想了解有关连续内存和性能的更多信息,请查看此讨论。

#1