I ran into some trouble doing a NumPy 2D-array division. I have a 2D NumPy array A (shape=(N, N)). I divide it by its row sums (axis=1) to get a 2D array B, but when I compute the row sums of B (axis=1), some rows are not equal to one. The code is as follows (Python 2.7.x):
from __future__ import division
import numpy as np
A = np.array([[x_11, x_12, ..., x_1N],
              [x_21, x_22, ..., x_2N],
              ...,
              [x_N1, x_N2, ..., x_NN]])  # x_ij are some np.float64 values
B = A / np.sum(A, axis=1, keepdims=True)
Expected result:
np.count_nonzero(np.sum(B, axis=1) != 1)
# it should be 0
Actual result:
np.count_nonzero(np.sum(B, axis=1) != 1)
# something bigger than 0
I believe the reason is precision loss, even though I use dtype=np.float64. In my project, the array A (shape=(N, N), N > 8000) mixes very small values (e.g. around 1.0) with very large ones (e.g. around 2000) in the same row.
I have tried this: adding back the losses
while np.count_nonzero(np.sum(B, axis=1) != 1) != 0:
    losts = 1 - np.sum(B, axis=1)  # per-row shortfall from 1
    B[:, i] += losts  # the index i may change depending on some conditions
Although this finally works around the problem, it is not good for the next step in my project.
Could anyone help me? Thanks a lot!
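For reference, here is a minimal, self-contained reproduction of the mixed-magnitude scenario described above (the numbers are hypothetical, not the actual project data):

```python
import numpy as np

# One row mixing tiny and large values, as in the question.
A = np.array([[1.0, 1.0, 2000.0, 1.0, 1500.0]])
B = A / np.sum(A, axis=1, keepdims=True)

row_sum = np.sum(B, axis=1)[0]
print(row_sum == 1.0)              # may be False: each division rounds to float64
print(abs(row_sum - 1.0) < 1e-12)  # True: the error, if any, is a few ulps
```

The exact comparison can fail even though the normalization is correct, because each quotient is rounded to the nearest representable float64 before the row is summed.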
1 Answer
#1 (2 votes)
When working with floating-point numbers you lose precision, and floating-point results rarely match natural numbers exactly.
A simple test to demonstrate this is:
>>> 0.1 + 0.2 == 0.3
False
This is because the floating-point result of 0.1 + 0.2 is 0.30000000000000004.
To solve this you just need to switch to np.isclose or np.allclose:
import numpy as np
N = 100
A = np.random.randn(N, N)
B = A / np.sum(A, axis=1, keepdims=True)
Then:
>>> np.count_nonzero(np.sum(B, axis=1) != 1)
79
whereas
>>> np.allclose(np.sum(B, axis=1), 1)
True
In short, your rows are properly normalized, they just don't sum exactly to 1.
From the documentation, np.isclose(a, b) is equivalent to:
absolute(a - b) <= (atol + rtol * absolute(b))
with atol = 1e-8 and rtol = 1e-5 by default, which is the proper way to check that two floating-point numbers represent the same value (at least approximately).
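As a quick check that the documented formula matches what np.isclose does, reusing the 0.1 + 0.2 example from above:

```python
import numpy as np

a, b = 0.1 + 0.2, 0.3
atol, rtol = 1e-8, 1e-5  # np.isclose's default tolerances

# The documented equivalence: |a - b| <= atol + rtol * |b|
manual = abs(a - b) <= (atol + rtol * abs(b))

print(a == b)            # False: exact comparison fails
print(manual)            # True: the difference is ~5.5e-17
print(np.isclose(a, b))  # True: agrees with the manual formula
```

The same idea applies to the row sums: `np.allclose(np.sum(B, axis=1), 1)` checks every row against this tolerance at once.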