Masking a numpy array based on index values

Time: 2023-02-08 15:42:12

How do I mask an array based on the actual index values?

That is, if I have a 10 x 10 x 30 matrix and I want to mask the array when the first and second index equal each other.

For example, [1, 1, :] should be masked because 1 and 1 equal each other, but [1, 2, :] should not because they do not.

I'm only asking this with the third dimension because it resembles my current problem and may complicate things. But my main question is, how to mask arrays based on the value of the indices?

2 solutions

#1


7  

In general, to access the value of the indices, you can use np.meshgrid:

i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij')
m.mask = (i == j)

The advantage of this method is that it works for arbitrary boolean functions on i, j, and k. It is a bit slower than the use of the identity special case.

In [56]: %%timeit
   ....: i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij')
   ....: i == j
10000 loops, best of 3: 96.8 µs per loop
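
As a minimal, self-contained sketch of the approach above: the array m below is a hypothetical stand-in for the 10 x 10 x 30 array in the question, and the band predicate is an invented illustration of another boolean function of the indices, not something from the original answer.

```python
import numpy as np

# Hypothetical stand-in for the 10 x 10 x 30 array from the question.
m = np.ma.masked_array(np.zeros((10, 10, 30)))

# Dense index grids; i, j and k each have the same shape as m.
i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij')

# Mask wherever the first and second indices are equal.
m.mask = (i == j)

# Any other boolean expression of i, j and k works the same way, e.g.
# an illustrative mask covering the diagonal and its two neighbours.
band = np.abs(i - j) <= 1
```

Here m.mask[1, 1, :] is entirely masked, while m.mask[1, 2, :] is not.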

As @Jaime points out, meshgrid supports a sparse option, which avoids most of the duplication but requires a bit more care in some cases because the returned arrays are not expanded to the full shape. It will save memory and speed things up a little. For example,

In [77]: x = np.arange(5)

In [78]: np.meshgrid(x, x)
Out[78]: 
[array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]]),
 array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])]

In [79]: np.meshgrid(x, x, sparse=True)
Out[79]: 
[array([[0, 1, 2, 3, 4]]),
 array([[0],
       [1],
       [2],
       [3],
       [4]])]

So, you can use the sparse version as he says, but you must then expand the result to the full shape yourself, like this:

i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij', sparse=True)
m.mask = np.repeat(i==j, k.size, axis=2)

And the speedup:

In [84]: %%timeit
   ....: i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij', sparse=True)
   ....: np.repeat(i==j, k.size, axis=2)
10000 loops, best of 3: 73.9 µs per loop
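
The sparse variant can also be sketched with np.broadcast_to in place of np.repeat. This is an alternative not mentioned in the original answer, and m is again a hypothetical stand-in for the array in the question; no timing is claimed for it.

```python
import numpy as np

# Hypothetical stand-in for the 10 x 10 x 30 array from the question.
m = np.ma.masked_array(np.zeros((10, 10, 30)))

# Sparse grids keep a length-1 axis wherever the index does not vary:
# i.shape == (10, 1, 1), j.shape == (1, 10, 1), k.shape == (1, 1, 30)
i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij', sparse=True)

# The comparison broadcasts i against j, giving shape (10, 10, 1).
diag = (i == j)

# Expand along the third axis; broadcast_to returns a read-only view of diag.
m.mask = np.broadcast_to(diag, m.shape)
```

The resulting mask should match the one produced by np.repeat(i==j, k.size, axis=2).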

#2


0  

In your special case of wanting to mask the diagonal, you can use the np.identity() function, which returns ones along the diagonal. Since you have a third dimension, we have to add that third dimension to the identity matrix:

m.mask = np.identity(10)[...,None]*np.ones((1,1,30))

There might be a better way of constructing that array, but it is basically stacking 30 of the np.identity(10) array. For example, this is equivalent:

np.dstack((np.identity(10),)*30)

but slower:

In [30]: timeit np.identity(10)[...,None]*np.ones((1,1,30))
10000 loops, best of 3: 40.7 µs per loop

In [31]: timeit np.dstack((np.identity(10),)*30)
1000 loops, best of 3: 219 µs per loop

And @Ophion's suggestions:

In [33]: timeit np.tile(np.identity(10)[...,None], 30)
10000 loops, best of 3: 63.2 µs per loop

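In [71]: timeit np.repeat(np.identity(10)[...,None], 30)
10000 loops, best of 3: 45.3 µs per loop

Note that without an axis argument np.repeat flattens its input, so this last call returns a 1-D array of 3000 elements rather than a (10, 10, 30) array; pass axis=2 to repeat along the third dimension.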
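
For completeness, here is a small sketch of the identity-based mask built directly as a boolean array with an explicit axis. The array m is a hypothetical stand-in for the one in the question, and the dtype=bool choice is an assumption made for this sketch, not part of the original answer.

```python
import numpy as np

# Hypothetical stand-in for the 10 x 10 x 30 array from the question.
m = np.ma.masked_array(np.zeros((10, 10, 30)))

n, _, depth = m.shape

# Boolean identity matrix: True on the diagonal, False elsewhere.
eye = np.identity(n, dtype=bool)

# Add a trailing axis, then repeat along it to get shape (10, 10, 30).
m.mask = np.repeat(eye[..., None], depth, axis=2)
```

This should give the same mask as multiplying np.identity(10)[...,None] by np.ones((1,1,30)), just without the intermediate float array.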
