矢量化这个for循环numpy

I am trying to compute matrix z (defined below) in python with numpy.

我试图用numpy计算python中的矩阵z(在下面定义)。

矢量化这个for循环numpy

Here's my current solution (using 1 for loop)

这是我目前的解决方案(使用1 for for循环)

z = np.zeros((n, k))
for i in range(n):
    v = pi * (1 / math.factorial(x[i])) * np.exp(-1 * lamb) * (lamb ** x[i])
    numerator = np.sum(v)
    c = v / numerator
    z[i, :] = c
return z

Is it possible to completely vectorize this computation? I need to do this computation for thousands of iterations, and matrix operations in numpy is much faster than huge for loops.

是否有可能完全矢量化这个计算?我需要对数千次迭代进行这种计算,而numpy中的矩阵运算要比循环中的矩阵运算快得多。

1 个解决方案

#1

Here is a vectorized version of E. It replaces the for-loop and scalar arithmetic with NumPy broadcasting and array-based arithmetic:

这是E的矢量化版本。它用NumPy广播和基于数组的算法替换for循环和标量算法:

def alt_E(x):
    x = x[:, None]
    z = pi * (np.exp(-lamb) * (lamb**x)) / special.factorial(x)
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z

I ran em.py to get a sense for the typical size of x, lamb, pi, n and k. On data of this size, alt_E is about 120x faster than E:

我运行em.py来了解x,lamb,pi,n和k的典型大小。对于此大小的数据,alt_E比E快约120倍:

In [32]: %timeit E(x)
100 loops, best of 3: 11.5 ms per loop

In [33]: %timeit alt_E(x)
10000 loops, best of 3: 94.7 µs per loop

In [34]: 11500/94.7
Out[34]: 121.43611404435057

This is the setup I used for the benchmark:

这是我用于基准测试的设置:

import math
import numpy as np
import scipy.special as special

def alt_E(x):
    x = x[:, None]
    z = pi * (np.exp(-lamb) * (lamb**x)) / special.factorial(x)
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z


def E(x):
    z = np.zeros((n, k))
    for i in range(n):
        v = pi * (1 / math.factorial(x[i])) * \
            np.exp(-1 * lamb) * (lamb ** x[i])
        numerator = np.sum(v)
        c = v / numerator
        z[i, :] = c
    return z

n = 576
k = 2

x = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5])

lamb = np.array([ 0.84835141, 1.04025989])
pi = np.array([ 0.5806958, 0.4193042])

assert np.allclose(alt_E(x), E(x))

By the way, E could also be calculated using scipy.stats.poisson:

顺便说一句,E也可以使用scipy.stats.poisson计算:

import scipy.stats as stats
pois = stats.poisson(mu=lamb)

def alt_E2(x):
    z = pi * pois.pmf(x[:,None])
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z

but this does not turn out to be faster, at least for arrays of this length:

但这并不会更快,至少对于这个长度的数组:

In [33]: %timeit alt_E(x)
10000 loops, best of 3: 94.7 µs per loop

In [102]: %timeit alt_E2(x)
1000 loops, best of 3: 278 µs per loop

For larger x, alt_E2 is faster:

对于较大的x,alt_E2更快:

In [104]: x = np.random.random(10000)

In [106]: %timeit alt_E(x)
100 loops, best of 3: 2.18 ms per loop

In [105]: %timeit alt_E2(x)
1000 loops, best of 3: 643 µs per loop

#1

Here is a vectorized version of E. It replaces the for-loop and scalar arithmetic with NumPy broadcasting and array-based arithmetic:

这是E的矢量化版本。它用NumPy广播和基于数组的算法替换for循环和标量算法:

def alt_E(x):
    x = x[:, None]
    z = pi * (np.exp(-lamb) * (lamb**x)) / special.factorial(x)
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z

I ran em.py to get a sense for the typical size of x, lamb, pi, n and k. On data of this size, alt_E is about 120x faster than E:

我运行em.py来了解x,lamb,pi,n和k的典型大小。对于此大小的数据,alt_E比E快约120倍:

In [32]: %timeit E(x)
100 loops, best of 3: 11.5 ms per loop

In [33]: %timeit alt_E(x)
10000 loops, best of 3: 94.7 µs per loop

In [34]: 11500/94.7
Out[34]: 121.43611404435057

This is the setup I used for the benchmark:

这是我用于基准测试的设置:

import math
import numpy as np
import scipy.special as special

def alt_E(x):
    x = x[:, None]
    z = pi * (np.exp(-lamb) * (lamb**x)) / special.factorial(x)
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z


def E(x):
    z = np.zeros((n, k))
    for i in range(n):
        v = pi * (1 / math.factorial(x[i])) * \
            np.exp(-1 * lamb) * (lamb ** x[i])
        numerator = np.sum(v)
        c = v / numerator
        z[i, :] = c
    return z

n = 576
k = 2

x = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5])

lamb = np.array([ 0.84835141, 1.04025989])
pi = np.array([ 0.5806958, 0.4193042])

assert np.allclose(alt_E(x), E(x))

By the way, E could also be calculated using scipy.stats.poisson:

顺便说一句,E也可以使用scipy.stats.poisson计算:

import scipy.stats as stats
pois = stats.poisson(mu=lamb)

def alt_E2(x):
    z = pi * pois.pmf(x[:,None])
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z

but this does not turn out to be faster, at least for arrays of this length:

但这并不会更快,至少对于这个长度的数组:

In [33]: %timeit alt_E(x)
10000 loops, best of 3: 94.7 µs per loop

In [102]: %timeit alt_E2(x)
1000 loops, best of 3: 278 µs per loop

For larger x, alt_E2 is faster:

对于较大的x,alt_E2更快:

In [104]: x = np.random.random(10000)

In [106]: %timeit alt_E(x)
100 loops, best of 3: 2.18 ms per loop

In [105]: %timeit alt_E2(x)
1000 loops, best of 3: 643 µs per loop

秒客网

矢量化这个for循环numpy

1 个解决方案

#1

#1

相关文章