Maybe I'm doing something odd, but I've found a surprising performance loss when using numpy, and it seems consistent regardless of the power used. For instance, when x is a random 100x100 array,
x = numpy.power(x,3)
is about 60x slower than
x = x*x*x
A plot of the speed-up for various array sizes reveals a sweet spot around arrays of size 10k, and a consistent 5-10x speed-up for other sizes.
Code to test it on your own machine (a little messy):
import numpy as np
from matplotlib import pyplot as plt
from time import time

ratios = []
sizes = []

for n in np.logspace(1, 3, 20).astype(int):
    a = np.random.randn(n, n)

    # Time repeated elementwise multiplication
    inline_times = []
    for i in range(100):
        t = time()
        b = a*a*a
        inline_times.append(time() - t)
    inline_time = np.mean(inline_times)

    # Time numpy.power with the same exponent
    pow_times = []
    for i in range(100):
        t = time()
        b = np.power(a, 3)
        pow_times.append(time() - t)
    pow_time = np.mean(pow_times)

    sizes.append(a.size)
    ratios.append(pow_time / inline_time)

plt.plot(sizes, ratios)
plt.title('Performance of inline vs numpy.power')
plt.ylabel('Nx speed-up using inline')
plt.xlabel('Array size')
plt.xscale('log')
plt.show()
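For a quick single-size sanity check, the same comparison can also be written with the standard timeit module (a minimal sketch along the same lines as the loop above, not a replacement for it):

# Quick single-size check with the standard timeit module (same comparison as above)
import timeit
import numpy as np

x = np.random.randn(100, 100)
t_pow = timeit.timeit(lambda: np.power(x, 3), number=1000)
t_mul = timeit.timeit(lambda: x*x*x, number=1000)
print('np.power: %.4fs, x*x*x: %.4fs, ratio: %.1fx' % (t_pow, t_mul, t_pow/t_mul))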
Anyone have an explanation?
3 Answers
#1 (17 votes)
It's well known that multiplication of doubles, which your processor can do in a very fancy way, is very, very fast. pow is decidedly slower.
Some performance guides out there even advise people to plan for this, perhaps even in some way that might be a bit overzealous at times.
numpy special-cases squaring to make sure it's not too, too slow, but it sends cubing right off to your libc's pow, which isn't nearly as fast as a couple multiplications.
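A rough way to see this split (a minimal sketch; whether and how squaring is special-cased can vary with your numpy version and libm):

# Compare squaring (reportedly special-cased) against cubing (falls through to pow)
import timeit
import numpy as np

a = np.random.randn(100, 100)
for label, fn in [('np.power(a, 2)', lambda: np.power(a, 2)),
                  ('a*a',            lambda: a*a),
                  ('np.power(a, 3)', lambda: np.power(a, 3)),
                  ('a*a*a',          lambda: a*a*a)]:
    print(label, timeit.timeit(fn, number=1000))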
#2 (5 votes)
I suspect the issue is that np.power always does float exponentiation, and it doesn't know how to optimize or vectorize that on your platform (or, probably, most/all platforms), while multiplication is easy to toss into SSE, and pretty fast even if you don't.
Even if np.power were smart enough to do integer exponentiation separately, unless it unrolled small values into repeated multiplication, it still wouldn't be nearly as fast.
You can verify this pretty easily by comparing the time for int-to-int, int-to-float, float-to-int, and float-to-float powers vs. multiplication for a small array; int-to-int is about 5x as fast as the others—but still 4x slower than multiplication (although I tested with PyPy with a customized NumPy, so it's probably better for someone with the normal NumPy installed on CPython to give real results…)
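A minimal sketch of that comparison (the dtypes, array size, and exponent here are arbitrary choices, and the ratios will depend heavily on your platform and NumPy build):

# Compare np.power for int/float base and exponent combinations vs. plain multiplication
import timeit
import numpy as np

a_int = np.arange(1, 10001, dtype=np.int64)
a_flt = a_int.astype(np.float64)

cases = [('int ** int',     lambda: np.power(a_int, 3)),
         ('int ** float',   lambda: np.power(a_int, 3.0)),
         ('float ** int',   lambda: np.power(a_flt, 3)),
         ('float ** float', lambda: np.power(a_flt, 3.0)),
         ('multiplication', lambda: a_flt*a_flt*a_flt)]
for label, fn in cases:
    print(label, timeit.timeit(fn, number=1000))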
#3 (5 votes)
The performance of numpy's power function scales very non-linearly with the exponent; contrast this with the naive approach, which scales linearly. The same type of scaling should exist regardless of matrix size. Basically, unless the exponent is sufficiently large, you aren't going to see any tangible benefit from np.power over repeated multiplication.
import matplotlib.pyplot as plt
import numpy as np
import functools
import time

def timeit(func):
    # Decorator that returns (result, elapsed seconds) for each call
    @functools.wraps(func)
    def newfunc(*args, **kwargs):
        startTime = time.time()
        res = func(*args, **kwargs)
        elapsedTime = time.time() - startTime
        return (res, elapsedTime)
    return newfunc

@timeit
def naive_power(m, n):
    # Repeated elementwise multiplication: n-1 multiplies
    m = np.asarray(m)
    res = m.copy()
    for i in range(1, n):
        res *= m
    return res

@timeit
def fast_power(m, n):
    # Elementwise power via numpy
    return np.power(m, n)

m = np.random.random((100, 100))
n = 400

ts1 = []
ts2 = []
for i in range(1, n):
    r1, t1 = naive_power(m, i)
    ts1.append(t1)
for i in range(1, n):
    r2, t2 = fast_power(m, i)
    ts2.append(t2)

plt.plot(ts1, label='naive')
plt.plot(ts2, label='numpy')
plt.xlabel('exponent')
plt.ylabel('time')
plt.legend(loc='upper left')
plt.show()
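As a quick spot check of the break-even point, you can reuse naive_power and fast_power from the snippet above at one small and one large exponent (the exact crossover will depend on your machine):

# Spot-check: naive repeated multiplication vs. np.power at a small and a large exponent
_, t_naive_small = naive_power(m, 3)
_, t_numpy_small = fast_power(m, 3)
_, t_naive_large = naive_power(m, 300)
_, t_numpy_large = fast_power(m, 300)
print('exponent 3:   naive %.6fs, numpy %.6fs' % (t_naive_small, t_numpy_small))
print('exponent 300: naive %.6fs, numpy %.6fs' % (t_naive_large, t_numpy_large))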