
Time: 2021-05-29 12:33:32

My question is about a specific array operation that I want to express using numpy.


I have an array of floats w and an array of indices idx of the same length as w. I want to sum all entries of w that share the same idx value and collect the sums in an array v. As a loop, it looks like this:


for i, x in enumerate(w):
    v[idx[i]] += x

Is there a way to do this with array operations? My guess was v[idx] += w but that does not work, since idx contains the same index multiple times.
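A minimal sketch of the problem described above, using small made-up arrays (the values are illustrative, not from the question). The explicit loop accumulates every contribution, while v[idx] += w behaves like v[idx] = v[idx] + w, so an index that appears several times in idx is only updated once.

```python
import numpy as np

# Hypothetical example data: 5 weights mapped onto 3 bins.
w = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
idx = np.array([0, 1, 0, 2, 0])

# Explicit loop from the question: every w[i] is added to v[idx[i]].
v_loop = np.zeros(3)
for i, x in enumerate(w):
    v_loop[idx[i]] += x

# Fancy-indexed in-place add: evaluated as v[idx] = v[idx] + w,
# so for the repeated index 0 only one of the contributions survives.
v_fancy = np.zeros(3)
v_fancy[idx] += w

print(v_loop)  # [9. 2. 4.]
print(v_fancy)
```

Bins 1 and 2 agree in both versions because their indices occur only once; bin 0, which occurs three times, is where the two diverge.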




2 solutions



numpy.bincount was introduced for this purpose:


import numpy as np
tmp = np.bincount(idx, weights=w)  # sum of w for each index value
v[:len(tmp)] += tmp

As of NumPy 1.6 you can also pass a minlength argument to bincount, which guarantees the result has at least that many bins.
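A self-contained sketch of the bincount approach, assuming idx holds non-negative integers and v is a float array long enough to cover every index (the sample arrays are made up). With minlength=len(v), the result has the same length as v, so the slice is no longer needed.

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
idx = np.array([0, 1, 0, 2, 0])
v = np.zeros(4)  # one more bin than idx uses, to show the padding

# weights=w sums w per bin instead of counting occurrences;
# minlength=len(v) pads the result to v's length (assumes max(idx) < len(v)).
v += np.bincount(idx, weights=w, minlength=len(v))
print(v)  # [9. 2. 4. 0.]
```

Note that minlength only pads: if max(idx) >= len(v), bincount still returns a longer array and the in-place add fails with a shape mismatch.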




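This is known behavior: an in-place operation on a fancy-indexed array is evaluated as v[idx] = v[idx] + w, so a repeated index is updated only once. Though somewhat unfortunate, it had no dedicated numpy-level workaround when this was written. (bincount can be used for this if you twist its arm.) Doing the loop yourself is really your best bet.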
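Newer NumPy versions (1.8 and later) do add a direct way to do this: np.add.at, the unbuffered form of the in-place add, in which every occurrence of a repeated index contributes. A minimal sketch with made-up sample arrays, cross-checked against bincount:

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
idx = np.array([0, 1, 0, 2, 0])

# np.add.at adds w into v_at at positions idx without buffering,
# so index 0 receives all three of its contributions.
v_at = np.zeros(3)
np.add.at(v_at, idx, w)

# The same sums via bincount with weights.
v_bc = np.bincount(idx, weights=w, minlength=3)

print(v_at)  # [9. 2. 4.]
print(np.allclose(v_at, v_bc))  # True
```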


Note that your code would be a bit clearer without re-using the name w and without introducing another set of indices (the i from enumerate), like


for i, w_thing in zip(idx, w):
    v[i] += w_thing
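The zip-based loop can be wrapped in a small function and checked against bincount. The helper name accumulate_loop is hypothetical and the sample data is made up; this is a sketch, not a tuned implementation.

```python
import numpy as np

def accumulate_loop(v, idx, w):
    """Hypothetical helper: add each w_thing into v[i], in place."""
    for i, w_thing in zip(idx, w):
        v[i] += w_thing
    return v

w = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
idx = np.array([0, 1, 0, 2, 0])

v = accumulate_loop(np.zeros(3), idx, w)
print(v)  # [9. 2. 4.]
print(np.allclose(v, np.bincount(idx, weights=w)))  # True
```

Because the loop runs in Python, each iteration pays interpreter overhead, which is why moving it into C (for example via Cython) can help for large arrays.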

If you need to speed up this loop, you might have to drop down to C. Cython makes this relatively easy.
