For 1-D NumPy arrays, these two expressions should (theoretically) yield the same result:
(a*b).sum()/a.sum()
dot(a, b)/a.sum()
The latter uses dot() and is faster. But which one is more accurate, and why?
Some context follows.
I wanted to compute the weighted variance of a sample using NumPy. I found the dot() expression in another answer, with a comment stating that it should be more accurate. However, no explanation is given there.
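For concreteness, here is a minimal sketch of the kind of weighted variance computation in question. The function name `weighted_variance` and the normalization (dividing by the sum of the weights, with no bias correction) are assumptions made for illustration; the other answer may have used a different convention.

```python
import numpy as np

def weighted_variance(x, w):
    """Weighted variance normalized by the sum of the weights (assumed convention)."""
    x = np.asarray(x, dtype=np.float64)
    w = np.asarray(w, dtype=np.float64)
    total = w.sum()
    # Weighted mean, written with dot() instead of (w * x).sum()
    mean = np.dot(w, x) / total
    # Weighted mean of the squared deviations from that mean
    return np.dot(w, (x - mean) ** 2) / total

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 1.0, 1.0, 1.0])
var_equal_weights = weighted_variance(x, w)
print(var_equal_weights)
```

With equal weights this reduces to the ordinary population variance, which makes it easy to sanity-check against np.var.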
1 Answer
NumPy's dot is one of the routines that call the BLAS library NumPy was linked against at compile time (or NumPy's own fallback implementation if none was linked). This matters because a BLAS library can make use of multiply-accumulate operations (usually fused multiply-add), which limit the number of roundings the computation performs.
Take the following:
>>> a=np.ones(1000,dtype=np.float128)+1E-14
>>> (a*a).sum()
1000.0000000000199948
>>> np.dot(a,a)
1000.0000000000199948
Not exact, but close enough.
>>> a=np.ones(1000,dtype=np.float64)+1E-14
>>> np.dot(a,a)
1000.0000000000176 #off by 2.3948e-12
>>> (a*a).sum()
1000.0000000000059 #off by 1.40948e-11
Here np.dot(a, a) is the more accurate of the two, as it uses approximately half the number of floating-point roundings that the naive (a*a).sum() does.
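If np.float128 is not available on your platform, one way to measure the error of both expressions is to compare them against an exact rational reference. The sketch below uses the standard-library fractions module for that; the variable names are illustrative. Which of the two errors comes out smaller depends on the BLAS build NumPy is linked against and on how sum() orders its additions, so no particular output is shown.

```python
from fractions import Fraction

import numpy as np

a = np.ones(1000, dtype=np.float64) + 1e-14

# Exact sum of squares of the float64 values actually stored in a.
# Fraction(float) converts a double to a rational number without any rounding.
exact = sum(Fraction(float(x)) ** 2 for x in a)

# Absolute error of each expression relative to the exact reference
err_dot = abs(Fraction(float(np.dot(a, a))) - exact)
err_sum = abs(Fraction(float((a * a).sum())) - exact)

print("np.dot error:    ", float(err_dot))
print("(a*a).sum error: ", float(err_sum))
```

Note that the reference is the exact result for the stored inputs, not for the decimal value 1 + 1e-14, which is not exactly representable in float64.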
A book by Nvidia has the following example for 4 digits of precision, where rn stands for rounding to the nearest value with 4 digits after the decimal point of the significand:
x = 1.0008
x^2 = 1.00160064 # true value
rn(x^2 - 1) = 1.6006 × 10^-3 # fused multiply-add
rn(rn(x^2) - 1) = 1.6000 × 10^-3 # multiply, then add
Of course, floating-point numbers are not rounded to the 16th decimal place in base 10, but you get the idea.
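The decimal example can be reproduced with Python's standard decimal module, which provides both rounded arithmetic and a fused multiply-add. This is only an emulation in base 10, assuming a context of 5 significant digits (one before the decimal point plus four after it); it is not what the hardware does in binary.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# 5 significant digits: 1 before the decimal point + 4 after it
ctx = Context(prec=5, rounding=ROUND_HALF_EVEN)

x = Decimal("1.0008")
minus_one = Decimal("-1")

# Fused multiply-add: x*x - 1 is computed exactly, then rounded once
fused = ctx.fma(x, x, minus_one)

# Multiply, then add: the product is rounded before the subtraction
product = ctx.multiply(x, x)
separate = ctx.add(product, minus_one)

print(fused)     # 0.0016006
print(separate)  # 0.0016
```

The separate version loses the trailing digits because the product has already been rounded to 1.0016 before 1 is subtracted.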
Writing np.dot(a,a) in the above notation, with some additional pseudocode:
out=0
for x in a:
    out=rn(x*x+out) #Fused multiply add
While (a*a).sum() is:
arr=np.zeros(a.shape[0])
for x in range(len(arr)):
    arr[x]=rn(a[x]*a[x])
out=0
for x in arr:
    out=rn(x+out)
From this it is easy to see that the result is rounded twice as many times using (a*a).sum() compared to np.dot(a,a). Summed up, these small differences can change the answer slightly. Additional examples can be found here.
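The two pseudocode loops can be made runnable by letting a 5-digit decimal context play the role of rn. The sketch below is an emulation under that assumption, and the function names dot_fused and sum_of_products are hypothetical. A real BLAS may vectorize or reorder the loop, and NumPy's sum() uses pairwise summation rather than a simple left-to-right loop, so this illustrates the rounding count rather than the actual implementations.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# rn(): round to 5 significant decimal digits (an assumption for illustration)
CTX = Context(prec=5, rounding=ROUND_HALF_EVEN)

def dot_fused(a, b):
    """Dot product with one rounding per element (fused multiply-add)."""
    out = Decimal(0)
    roundings = 0
    for x, y in zip(a, b):
        out = CTX.fma(x, y, out)  # rn(x*y + out)
        roundings += 1
    return out, roundings

def sum_of_products(a, b):
    """Round every product, then round every partial sum."""
    products = [CTX.multiply(x, y) for x, y in zip(a, b)]  # rn(x*y)
    roundings = len(products)
    out = Decimal(0)
    for p in products:
        out = CTX.add(out, p)  # rn(p + out)
        roundings += 1
    return out, roundings

# Inputs chosen so that the dot product equals 1.0008**2 - 1
a = [Decimal("1"), Decimal("1.0008")]
b = [Decimal("-1"), Decimal("1.0008")]

fused_result, fused_roundings = dot_fused(a, b)
naive_result, naive_roundings = sum_of_products(a, b)

print(fused_result, fused_roundings)  # 0.0016006 2
print(naive_result, naive_roundings)  # 0.0016 4
```

For n elements the fused loop rounds n times and the two-pass version rounds 2n times, which is the factor of two described above.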