I have the following two functions:
import numpy as np

def loop(x):
    a = np.zeros(10)
    for i1 in range(10):
        for i2 in range(10):
            # accumulate sin(x[i2] - x[i1]) for i2 = 0, 1, ..., 9
            a[i1] += np.sin(x[i2] - x[i1])
    return a
and
def vectorized(x):
    b = np.zeros(10)
    for i1 in range(10):
        b += np.sin(np.roll(x, i1) - x)
    return b
However, when I run both, I find that their results slightly differ:
x = np.arange(10)
a, b = loop(x), vectorized(x)
print(b - a)
I get:
[ 2.22044605e-16 0.00000000e+00 0.00000000e+00 6.66133815e-16
-2.22044605e-16 2.22044605e-16 0.00000000e+00 2.22044605e-16
2.22044605e-16 2.22044605e-16]
which is very small, but in my case it affects the simulation. If I remove np.sin from the functions, the difference disappears. The difference also goes away if I use np.float32 for x, but this is part of an ODE that is being solved by a solver that uses float64. Is there a way to resolve this difference?
1 Answer
#1
6
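It's because the two functions do not add the terms in the same order. Floating-point addition is not associative, so a different order can round differently in the last bit.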
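To make the ordering point concrete, here is a small sketch (not part of the original answer). It first shows that regrouping a float sum changes the result, then lists the order in which each function visits the elements of x for one output index. The index j = 3 and the names loop_order and roll_order are arbitrary choices for illustration.

import numpy as np

# Floating-point addition is not associative: regrouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

x = np.arange(10)
j = 3  # arbitrary output index, chosen only for illustration

# loop() visits x[0], x[1], ..., x[9] when building element j.
loop_order = [int(x[i2]) for i2 in range(10)]

# vectorized() adds np.roll(x, i1)[j] for i1 = 0, 1, ..., 9,
# which is x[(j - i1) % 10]: the same terms, starting at j and going backwards.
roll_order = [int(np.roll(x, i1)[j]) for i1 in range(10)]

print(loop_order)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(roll_order)  # [3, 2, 1, 0, 9, 8, 7, 6, 5, 4]

Both functions sum the same ten terms, only in a different sequence, which is enough to change the last bit of some results.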
For the equivalent fully vectorized solution, do c = np.sin(np.add.outer(x, -x)).sum(axis=0).
In [8]: (c==loop(x)).all()
Out[8]: True
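For readers who want to reproduce this check outside IPython, a self-contained sketch follows. It assumes NumPy is imported as np; the function name outer_sum is a hypothetical label that does not appear in the original answer. Whether the exact comparison prints True may depend on the NumPy version and platform, so the script also prints the largest absolute difference.

import numpy as np

def loop(x):
    a = np.zeros(10)
    for i1 in range(10):
        for i2 in range(10):
            a[i1] += np.sin(x[i2] - x[i1])
    return a

def outer_sum(x):
    # np.add.outer(x, -x)[i, j] == x[i] - x[j]; summing over axis 0
    # gives, for each j, the sum over i of sin(x[i] - x[j]).
    return np.sin(np.add.outer(x, -x)).sum(axis=0)

x = np.arange(10)
a, c = loop(x), outer_sum(x)

print((c == a).all())       # exact, bit-for-bit comparison
print(np.abs(c - a).max())  # largest absolute difference between the two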
And you get the full advantage of vectorization:
In [9]: %timeit loop(x)
1000 loops, best of 3: 750 µs per loop
In [10]: %timeit vectorized(x)
1000 loops, best of 3: 347 µs per loop
In [11]: %timeit sin(x[:,None]-x).sum(axis=0)
10000 loops, best of 3: 46 µs per loop