What is a vectorized way to create multiple powers of a NumPy array?

Date: 2021-01-12 21:22:33

I have a numpy array:


arr = [[1, 2],
       [3, 4]]

I want to create a new array that contains element-wise powers of arr, from power 0 up to a given order, stacked side by side:


# arr_new = [arr^0, arr^1, arr^2, arr^3,...arr^order]
arr_new = [[1, 1, 1, 2, 1, 4, 1, 8],
           [1, 1, 3, 4, 9, 16, 27, 64]]

My current approach uses for loops:


# pre-allocate an array for powers
arr = np.array([[1, 2],[3,4]])
order = 3
rows, cols = arr.shape
arr_new = np.zeros((rows, (order+1) * cols))

# iterate over each exponent
for i in range(order + 1):
    arr_new[:, (i * cols) : (i + 1) * cols] = arr**i
print(arr_new)

Is there a faster (i.e. vectorized) approach to creating powers of an array?



Edit: Benchmarking

Thanks to @hpaulj and @Divakar and @Paul Panzer for the answers. I benchmarked the loop-based and broadcasting-based operations on the following test arrays.


arr = np.array([[1, 2],
                [3,4]])
order = 3

arrLarge = np.random.randint(0, 10, (100, 100))  # 100 x 100 array
orderLarge = 10

The loop_based function is:


def loop_based(arr, order):
    # pre-allocate an array for powers
    rows, cols = arr.shape
    arr_new = np.zeros((rows, (order+1) * cols))
    # iterate over each exponent
    for i in range(order + 1):
        arr_new[:, (i * cols) : (i + 1) * cols] = arr**i
    return arr_new

The broadcast_based function using hstack is:


def broadcast_based_hstack(arr, order):
    # create a 3D exponent array for a 2D input array to force broadcasting
    powers = np.arange(order + 1)[:, None, None]
    # generate values (3-rd axis contains array at various powers)
    exponentiated = arr ** powers
    # reshape and return array
    return np.hstack(exponentiated)   # <== using hstack function

The broadcast_based function using reshape is:


def broadcast_based_reshape(arr, order):
    # create a 3D exponent array for a 2D input array to force broadcasting
    powers = np.arange(order + 1)[:, None]
    # generate values (3-rd axis contains array at various powers)
    exponentiated = arr[:, None] ** powers
    # reshape and return array
    return exponentiated.reshape(arr.shape[0], -1)  # <== using reshape function

The broadcast_based function using the cumulative product (cumprod) and reshape is:


def broadcast_cumprod_reshape(arr, order):
    rows, cols = arr.shape
    # Create 3D empty array where the middle dimension is
    # the array at powers 0 through order
    out = np.empty((rows, order + 1, cols), dtype=arr.dtype)
    out[:, 0, :] = 1   # 0th power is always 1
    a = np.broadcast_to(arr[:, None], (rows, order, cols))
    # cumulatively multiply arrays so each multiplication produces the next order
    np.cumprod(a, axis=1, out=out[:,1:,:])
    return out.reshape(rows, -1)

In a Jupyter notebook, I used the %timeit magic and got these results:


Small arrays (2x2):


%timeit -n 100000 loop_based(arr, order)
7.41 µs ± 174 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit -n 100000 broadcast_based_hstack(arr, order)
10.1 µs ± 137 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit -n 100000 broadcast_based_reshape(arr, order)
3.31 µs ± 61.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit -n 100000 broadcast_cumprod_reshape(arr, order)
11 µs ± 102 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Large arrays (100x100):


%timeit -n 1000 loop_based(arrLarge, orderLarge)
261 µs ± 5.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit -n 1000 broadcast_based_hstack(arrLarge, orderLarge)
225 µs ± 4.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit -n 1000 broadcast_based_reshape(arrLarge, orderLarge)
223 µs ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit -n 1000 broadcast_cumprod_reshape(arrLarge, orderLarge)
157 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Conclusions:

The broadcast-based approach using reshape is fastest for small arrays. For large arrays, however, the cumprod-based approach scales better and is the fastest.
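As a quick cross-check (my own addition, not part of the benchmarks above), the implementations can be verified to agree on the small input; two of the functions are restated here so the snippet runs standalone:

```python
import numpy as np

# Restated from above so this snippet is self-contained.
def loop_based(arr, order):
    rows, cols = arr.shape
    arr_new = np.zeros((rows, (order + 1) * cols))
    for i in range(order + 1):
        arr_new[:, (i * cols):(i + 1) * cols] = arr ** i
    return arr_new

def broadcast_based_reshape(arr, order):
    powers = np.arange(order + 1)[:, None]
    return (arr[:, None] ** powers).reshape(arr.shape[0], -1)

arr = np.array([[1, 2], [3, 4]])
order = 3
# Same values; loop_based returns floats because np.zeros defaults to float64.
print(np.array_equal(loop_based(arr, order),
                     broadcast_based_reshape(arr, order)))  # True
```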


3 Answers

#1



Extend the arrays to higher dimensions and let broadcasting do its magic, with some help from reshaping -


In [16]: arr = np.array([[1, 2],[3,4]])

In [17]: order = 3

In [18]: (arr[:,None]**np.arange(order+1)[:,None]).reshape(arr.shape[0],-1)
Out[18]: 
array([[ 1,  1,  1,  2,  1,  4,  1,  8],
       [ 1,  1,  3,  4,  9, 16, 27, 64]])

Note that arr[:,None] is essentially arr[:,None,:]; we can omit the trailing slice for brevity.
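The shapes involved can be checked directly (a small illustration of my own):

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])                      # shape (2, 2)
# arr[:, None] inserts a new middle axis; the trailing slice is implicit.
print(arr[:, None].shape)                             # (2, 1, 2)
print(np.array_equal(arr[:, None], arr[:, None, :]))  # True

# Broadcasting (2, 1, 2) against (order+1, 1) gives (2, order+1, 2):
order = 3
powers = np.arange(order + 1)[:, None]                # shape (4, 1)
print((arr[:, None] ** powers).shape)                 # (2, 4, 2)
```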


Timings on a bigger dataset -


In [40]: np.random.seed(0)
    ...: arr = np.random.randint(0,9,(100,100))
    ...: order = 10

# @hpaulj's soln with broadcasting and stacking
In [41]: %timeit np.hstack(arr **np.arange(order+1)[:,None,None])
1000 loops, best of 3: 734 µs per loop

In [42]: %timeit (arr[:,None]**np.arange(order+1)[:,None]).reshape(arr.shape[0],-1)
1000 loops, best of 3: 401 µs per loop

That reshaping part is practically free, and that's where we gain performance here, along with the broadcasting part of course, as seen in the breakdown below -


In [52]: %timeit (arr[:,None]**np.arange(order+1)[:,None])
1000 loops, best of 3: 390 µs per loop

In [53]: %timeit (arr[:,None]**np.arange(order+1)[:,None]).reshape(arr.shape[0],-1)
1000 loops, best of 3: 401 µs per loop

#2



Use broadcasting to generate the values, and reshape or rearrange the values as desired:


In [34]: arr **np.arange(4)[:,None,None]
Out[34]: 
array([[[ 1,  1],
        [ 1,  1]],

       [[ 1,  2],
        [ 3,  4]],

       [[ 1,  4],
        [ 9, 16]],

       [[ 1,  8],
        [27, 64]]])
In [35]: np.hstack(_)
Out[35]: 
array([[ 1,  1,  1,  2,  1,  4,  1,  8],
       [ 1,  1,  3,  4,  9, 16, 27, 64]])
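Note that np.hstack here iterates over the leading axis of the 3-D result and concatenates the 2-D slices along axis 1. A sketch (my own) showing it is equivalent to a transpose followed by a reshape:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
exponentiated = arr ** np.arange(4)[:, None, None]  # shape (4, 2, 2)

# np.hstack treats the 3-D array as a sequence of 2-D arrays along its
# first axis and concatenates them along axis 1.
via_hstack = np.hstack(exponentiated)
via_reshape = exponentiated.transpose(1, 0, 2).reshape(arr.shape[0], -1)
print(np.array_equal(via_hstack, via_reshape))  # True
```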

#3



Here is a solution using cumulative multiplication, which scales better than the power-based approaches, especially if the input array is of float dtype:


import numpy as np

def f_mult(a, k):
    m, n = a.shape
    out = np.empty((m, k, n), dtype=a.dtype)
    out[:, 0, :] = 1
    a = np.broadcast_to(a[:, None], (m, k-1, n))
    a.cumprod(axis=1, out=out[:, 1:])
    return out.reshape(m, -1)
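Note that k here is the total number of power blocks, i.e. order + 1. A quick check on the question's small array (my addition; f_mult is restated so the snippet runs standalone):

```python
import numpy as np

# Restated from the answer above so this snippet is self-contained.
def f_mult(a, k):
    m, n = a.shape
    out = np.empty((m, k, n), dtype=a.dtype)
    out[:, 0, :] = 1                   # zeroth power is always 1
    # broadcast_to returns a read-only view; cumprod only reads from it
    a = np.broadcast_to(a[:, None], (m, k - 1, n))
    a.cumprod(axis=1, out=out[:, 1:])  # each step multiplies in one more power
    return out.reshape(m, -1)

arr = np.array([[1, 2], [3, 4]])
print(f_mult(arr, 4))
# [[ 1  1  1  2  1  4  1  8]
#  [ 1  1  3  4  9 16 27 64]]
```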

Timings:


int up to power 9
divakar: 0.4342731796205044 ms
hpaulj:  0.794165057130158 ms
pp:      0.20520629966631532 ms
float up to power 39
divakar: 29.056487752124667 ms
hpaulj:  31.773792404681444 ms
pp:      1.0329263447783887 ms

Code for the timings, thanks @Divakar:


def f_divakar(a, k):
    return (a[:,None]**np.arange(k)[:,None]).reshape(a.shape[0],-1)

def f_hpaulj(a, k):
    return np.hstack(a**np.arange(k)[:,None,None])


from timeit import timeit

np.random.seed(0)
a = np.random.randint(0,9,(100,100))
k = 10
print('int up to power 9')
print('divakar:', timeit(lambda: f_divakar(a, k), number=1000), 'ms')
print('hpaulj: ', timeit(lambda: f_hpaulj(a, k), number=1000), 'ms')
print('pp:     ', timeit(lambda: f_mult(a, k), number=1000), 'ms')

a = np.random.uniform(0.5,2.0,(100,100))
k = 40
print('float up to power 39')
print('divakar:', timeit(lambda: f_divakar(a, k), number=1000), 'ms')
print('hpaulj: ', timeit(lambda: f_hpaulj(a, k), number=1000), 'ms')
print('pp:     ', timeit(lambda: f_mult(a, k), number=1000), 'ms')
