Using NumPy vectorize on functions that return vectors

Time: 2022-06-27 04:16:29

numpy.vectorize takes a function f:a->b and turns it into g:a[]->b[].

This works fine when a and b are scalars, but I can't think of a reason why it wouldn't work with b as an ndarray or list, i.e. f:a->b[] and g:a[]->b[][]

For example:

import numpy as np
def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)
g = np.vectorize(f, otypes=[np.ndarray])
a = np.arange(4)
print(g(a))

This yields:

array([[ 0.  0.  0.  0.  0.],
       [ 1.  1.  1.  1.  1.],
       [ 2.  2.  2.  2.  2.],
       [ 3.  3.  3.  3.  3.]], dtype=object)

Ok, so that gives the right values, but the wrong dtype. And even worse:

g(a).shape

yields:

(4,)

So this array is pretty much useless. I know I can convert it by doing:

np.array(list(map(list, g(a))), dtype=np.float32)

to give me what I want:

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.]], dtype=float32)

but that is neither efficient nor pythonic. Can any of you guys find a cleaner way to do this?

Thanks in advance!

5 solutions

#1


27  

np.vectorize is just a convenience function. It doesn't actually make code run any faster. If it isn't convenient to use np.vectorize, simply write your own function that works as you wish.

The purpose of np.vectorize is to transform functions which are not numpy-aware (e.g. take floats as input and return floats as output) into functions that can operate on (and return) numpy arrays.

Your function f is already numpy-aware -- it uses a numpy array in its definition and returns a numpy array. So np.vectorize is not a good fit for your use case.

The solution therefore is just to roll your own function f that works the way you desire.
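As a rough illustration of "rolling your own", one option is a plain Python loop whose results are stacked into a single array. The helper name apply_rows below is hypothetical (it is not part of NumPy), and this is only a sketch under the assumption that every call to f returns an array of the same length:

```python
import numpy as np

def f(x):
    return x * np.array([1, 1, 1, 1, 1], dtype=np.float32)

def apply_rows(func, values):
    # Hypothetical helper: call func once per element of a 1-D input
    # and stack the equally sized results along a new first axis.
    return np.stack([func(v) for v in values])

result = apply_rows(f, np.arange(4))
print(result.shape)  # (4, 5)
```

Like np.vectorize, this still loops in Python, so it is about convenience and a proper 2-D result rather than speed.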

#2


2  

import numpy as np
def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)
g = np.vectorize(f, otypes=[np.ndarray])
a = np.arange(4)
b = g(a)
b = np.array(b.tolist())
print(b)  # b.shape == (4, 5)
c = np.ones((2,3,4))
d = g(c)
d = np.array(d.tolist())
print(d)  # d.shape == (2, 3, 4, 5)

This should fix the problem, and it works regardless of the shape of your input. "map" only works for one-dimensional inputs. Using ".tolist()" and creating a new ndarray solves the problem more completely and more nicely (I believe). Hope this helps.

#3


1  

I've written a function that seems to fit your needs.

import numpy as np

def amap(func, *args):
    '''array version of built-in map
    amap(function, sequence[, sequence, ...]) -> array
    Examples
    --------
    >>> amap(lambda x: x**2, 1)
    array(1)
    >>> amap(lambda x: x**2, [1, 2])
    array([1, 4])
    >>> amap(lambda x,y: y**2 + x**2, 1, [1, 2])
    array([2, 5])
    >>> amap(lambda x: (x, x), 1)
    array([1, 1])
    >>> amap(lambda x,y: [x**2, y**2], [1,2], [3,4])
    array([[1, 9], [4, 16]])
    '''
    args = np.broadcast(None, *args)
    res = np.array([func(*arg[1:]) for arg in args])
    shape = args.shape + res.shape[1:]
    return res.reshape(shape)

Let's try it:

def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)
amap(f, np.arange(4))

Outputs

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.]], dtype=float32)

You may also wrap it with a lambda or functools.partial for convenience:

g = lambda x:amap(f, x)
g(np.arange(4))

Note that the docstring of vectorize says:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

Thus we would expect amap here to have performance similar to vectorize. I didn't check it; any performance tests are welcome.

If performance is really important, you should consider something else, e.g. direct array calculation with reshape and broadcasting, to avoid looping in pure Python (both vectorize and amap fall into the latter case).
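A quick way to check the performance point is to time a looping version against a broadcasting version. The sketch below is only an assumption-laden illustration: the names looped and broadcasted are made up here, the looping side uses np.vectorize with a signature (NumPy 1.12 or later) rather than amap, and the timings will vary by machine, so no numbers are claimed:

```python
import timeit
import numpy as np

weights = np.ones(5, dtype=np.float32)

def f(x):
    return x * weights

# Looping version: np.vectorize calls f once per element in Python.
looped = np.vectorize(f, signature='()->(n)')

def broadcasted(x):
    # Broadcasting version: add a trailing axis so that (..., 1) * (5,)
    # expands to (..., 5) in a single array operation.
    return np.asarray(x)[..., np.newaxis] * weights

a = np.arange(1000)
t_loop = timeit.timeit(lambda: looped(a), number=10)
t_bcast = timeit.timeit(lambda: broadcasted(a), number=10)
print(f"np.vectorize: {t_loop:.4f} s, broadcasting: {t_bcast:.4f} s")
```

Both versions produce the same values; only the place where the loop runs differs.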

#4


1  

The new signature parameter, added in NumPy 1.12.0, does exactly what you want.

def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)

g = np.vectorize(f, signature='()->(n)')

Then g(np.arange(4)).shape will give (4, 5) (displayed as (4L, 5L) on some Python 2 builds).

Here the signature of f is specified: (n) is the shape of the return value, and () is the shape of the parameter, which is a scalar. The parameters can be arrays too. For more complex signatures, see the Generalized Universal Function API.
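To make the signature notation a bit more concrete, here is a small sketch, assuming NumPy 1.12 or later. The second function, scaled_norm, is a made-up example that is not from the question; it is only there to show a vector-valued parameter:

```python
import numpy as np

def f(x):
    return x * np.array([1, 1, 1, 1, 1], dtype=np.float32)

# Scalar in, length-n vector out: the loop dimensions of the input
# are kept and the core dimension n is appended.
g = np.vectorize(f, signature='()->(n)')
out_1d = g(np.arange(4))
out_2d = g(np.ones((2, 3)))
print(out_1d.shape)  # (4, 5)
print(out_2d.shape)  # (2, 3, 5)

# Hypothetical function taking a vector and a scalar, returning a scalar.
def scaled_norm(v, s):
    return s * np.sqrt(np.sum(v ** 2))

h = np.vectorize(scaled_norm, signature='(n),()->()')
norms = h(np.ones((4, 3)), np.arange(4))
print(norms.shape)  # (4,)
```

In the second call the last axis of the first argument is consumed as the core dimension n, while the remaining axis of length 4 is looped over together with the scalar argument.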

#5


0  

The best way to solve this would be to use a 2-D NumPy array (in this case a column array) as an input to the original function, which will then generate a 2-D output with the results I believe you were expecting.

Here is what it might look like in code:

import numpy as np
def f(x):
    return x*np.array([1, 1, 1, 1, 1], dtype=np.float32)

a = np.arange(4).reshape((4, 1))
b = f(a)
# b is a 2-D array with shape (4, 5)
print(b)

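This is a much simpler and less error-prone way to complete the operation. Rather than trying to transform the function with numpy.vectorize, this method relies on NumPy's natural ability to broadcast arrays. The trick is to shape the input so that it is broadcast-compatible with the array inside f: aligned from the trailing axis, each pair of dimensions must either be equal or have one of them equal to 1. Here a (4, 1) column against a (5,) row broadcasts to (4, 5).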
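The shape rule can be checked without doing the multiplication at all. The following is a minimal sketch with variable names chosen here for illustration; it uses np.broadcast to report the resulting shape and shows that a flat input of length 4 does not combine with a length-5 array:

```python
import numpy as np

weights = np.array([1, 1, 1, 1, 1], dtype=np.float32)  # shape (5,)
column = np.arange(4).reshape((4, 1))                   # shape (4, 1)

# np.broadcast reports the shape the operation would produce.
print(np.broadcast(column, weights).shape)  # (4, 5)

# A flat input of shape (4,) is not compatible with shape (5,).
flat = np.arange(4)
try:
    flat * weights
except ValueError as exc:
    print("broadcast failed:", exc)
```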
