Mapping a function that depends on the position of its input value over a numpy array

Date: 2021-07-22 21:46:34

Say we have an array,

import numpy as np

arr = np.random.rand(3, 3)

Usually, when mapping a function over an array, we are only interested in the values of the array elements, i.e.

f = lambda val : val**2
arr_squared = f(arr)

But what if the output of our function depends on where the input value is located in the array, i.e.

f = lambda x,y,val : x*y*val

Right now I'm using meshgrids and ravel.

X, Y = np.arange(arr.shape[0]), np.arange(arr.shape[1])
X, Y = np.meshgrid(X,Y)

result = np.zeros(arr.shape)

for x,y in zip(np.ravel(X), np.ravel(Y)):
    result[x,y] = f(x,y,arr[x,y])

This works but is pretty slow. I'm having a hard time figuring out if there is a better/faster way to do this, and searches online have not yielded much useful info.

1 answer

#1


In principle, indices behave like any other arguments.

It all boils down to whether your function is vectorized or not.

If it is, as in your example, you can pass the index arrays in directly:

>>> Y, X = np.ogrid[(*map(slice, arr.shape),)]
>>> def f(X, Y, val): return X*Y*val
... 
>>> f(X, Y, arr)
array([[0.        , 0.        , 0.        ],
       [0.        , 0.92796409, 0.20353397],
       [0.        , 1.01294541, 1.30677315]])
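
As a sanity check against the loop in the question, the following sketch compares the broadcast call with an explicit double loop. The non-square shape, the seed, and the names `rows`, `cols`, `looped`, and `broadcast` are assumptions made for illustration; they are not part of the original answer. Here the first index array is the row index.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
arr = rng.random((3, 4))        # non-square on purpose, to expose axis mix-ups

def f(x, y, val):
    return x * y * val

# Open grids: rows has shape (3, 1), cols has shape (1, 4).
rows, cols = np.ogrid[:arr.shape[0], :arr.shape[1]]

# Reference result from an explicit element-by-element loop.
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = f(i, j, arr[i, j])

# Same function, called once on whole arrays; broadcasting expands the grids.
broadcast = f(rows, cols, arr)

print(np.allclose(looped, broadcast))
```

Because `f` uses only element-wise arithmetic, it needs no changes to accept arrays instead of scalars.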

If it is not, as in this example:

>>> def g(X, Y, val): return X+Y if val>0.5 else X-Y
... 
>>> g(X, Y, arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in g
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The quick fix is np.vectorize, but it is not fast, because it is essentially a Python-level loop over the elements:

>>> np.vectorize(g)(X, Y, arr)
array([[0, 1, 2],
       [1, 2, 1],
       [2, 3, 0]])
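
To make the behaviour of np.vectorize concrete, this sketch compares it with a hand-written loop. The array shape, the seed, and the names `vectorized` and `looped` are illustrative assumptions. As in the answer, `X` is the column index and `Y` the row index.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
arr = rng.random((3, 4))
Y, X = np.ogrid[:arr.shape[0], :arr.shape[1]]  # Y: row index, X: column index

def g(x, y, val):
    # Scalar-only logic: the `if` needs a single truth value.
    return x + y if val > 0.5 else x - y

# np.vectorize broadcasts the arguments and calls g element by element.
vectorized = np.vectorize(g)(X, Y, arr)

# Equivalent explicit loop.
looped = np.empty(arr.shape, dtype=int)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = g(j, i, arr[i, j])  # x is the column, y is the row

print(np.array_equal(vectorized, looped))
```

Both versions do one Python function call per element, so np.vectorize should be seen as a convenience rather than a speed-up.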

It is better, where possible, to vectorize manually:

>>> def gv(X, Y, val): return X + (2*(val>0.5)-1) * Y
... 
>>> gv(X, Y, arr)
array([[0, 1, 2],
       [1, 2, 1],
       [2, 3, 0]])
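
The arithmetic trick in `gv` works because the two branches differ only in the sign of `Y`. When the branches have no such shared structure, np.where is a common alternative. The sketch below shows that variant; the name `gw`, the array shape, and the seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
arr = rng.random((3, 4))
Y, X = np.ogrid[:arr.shape[0], :arr.shape[1]]  # Y: row index, X: column index

def gv(X, Y, val):
    # (val > 0.5) is boolean; 2*bool - 1 maps True to +1 and False to -1.
    return X + (2 * (val > 0.5) - 1) * Y

def gw(X, Y, val):
    # Element-wise selection between the two precomputed branches.
    return np.where(val > 0.5, X + Y, X - Y)

print(np.array_equal(gv(X, Y, arr), gw(X, Y, arr)))
```

Note that np.where evaluates both branches for every element before selecting, so it suits cheap branches that are safe to compute everywhere.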

A side note on your use of meshgrid: to get index arrays, you can shorten it to

Y, X = np.indices(arr.shape)

In the example above I'm using open grids, which rely on broadcasting to save memory.
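
A rough illustration of the memory difference follows. The shape and the variable names are arbitrary choices for this sketch, and the exact byte counts depend on the platform's default integer size.

```python
import numpy as np

shape = (300, 400)  # arbitrary example shape

# Dense grids: each index array has the full shape.
dense_rows, dense_cols = np.indices(shape)

# Open grids: one column vector and one row vector.
open_rows, open_cols = np.ogrid[:shape[0], :shape[1]]

print(dense_rows.shape, dense_cols.shape)
print(open_rows.shape, open_cols.shape)

dense_bytes = dense_rows.nbytes + dense_cols.nbytes
open_bytes = open_rows.nbytes + open_cols.nbytes
print(dense_bytes, open_bytes)

# Broadcasting the open grid to the full shape reproduces the dense one.
print(np.array_equal(np.broadcast_to(open_rows, shape), dense_rows))
```

The open grids store only `shape[0] + shape[1]` indices instead of `2 * shape[0] * shape[1]`, and broadcasting expands them as needed during arithmetic.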
