NumPy 2D array division: precision loss

Date: 2022-10-14 21:41:43

I ran into some trouble when dividing a 2D NumPy array. I have a 2D NumPy array A (shape=(N, N)), which I divide by its row sums (axis=1) to get a 2D array B. However, when I compute the row sums of B (axis=1), some rows do not sum to exactly one. The code follows (Python 2.7.x):

from __future__ import division
import numpy as np

A = np.array([[x_11, x_12, ..., x_1N],
              [x_21, x_22, ..., x_2N],
              [ ...,  ...,  ...,  ...],
              [x_N1, x_N2, ..., x_NN]])  # x_ij are some np.float64 values
B = A / np.sum(A, axis=1, keepdims=True)

Expected result:

np.count_nonzero(np.sum(B, axis=1) != 1)
# it should be 0

Actual result:

np.count_nonzero(np.sum(B, axis=1) != 1)
# something bigger than 0

I believe the reason is precision loss, even though I use dtype=np.float64, because in my project the 2D array A (shape=(N, N), N > 8000) contains mostly very small values (e.g. ≈1.0) alongside very large ones (e.g. ≈2000) in the same row.

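The effect can be reproduced with a minimal sketch (the magnitudes 1.0 and 2000 are taken from the question's description; the exact layout of the matrix is a hypothetical stand-in, and N is shrunk so it runs quickly):

```python
import numpy as np

# Hypothetical stand-in for the question's matrix: each row mixes many
# small values (~1.0) with a few large ones (~2000), as described above.
N = 100
A = np.ones((N, N))
A[:, :5] = 2000.0  # a few large entries per row

B = A / np.sum(A, axis=1, keepdims=True)

# Exact comparison may flag rows even though they are correctly normalized:
exact_mismatches = np.count_nonzero(np.sum(B, axis=1) != 1)
# Tolerance-based comparison accepts them all:
print(exact_mismatches, np.allclose(np.sum(B, axis=1), 1.0))
```

The row sums of B are all within floating-point rounding of 1, but exact `!= 1` comparison can still report mismatches.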
I have tried this workaround: add back the lost amount.

while np.count_nonzero(np.sum(B, axis=1) != 1) != 0:
    losts = 1 - np.sum(B, axis=1)
    B[:, i] += losts  # the i may change by some conditions

This does eventually make the sums exact, but it is not good for the next step in my project.

Could anyone help me? Thanks a lot!!!

1 solution

#1


When working with floating-point numbers you lose precision, and floating-point results rarely match natural numbers exactly.

A simple test to demonstrate this is:

>>> 0.1 + 0.2 == 0.3
False

This is because the floating-point representation of 0.1 + 0.2 is 0.30000000000000004.

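The mismatch can also be made visible with the standard-library `decimal` module, which prints the exact binary fraction a float actually stores (a quick illustration, not part of the original answer):

```python
from decimal import Decimal

# The literal 0.1 is stored as the nearest binary fraction,
# which is slightly larger than one tenth:
print(Decimal(0.1))

# The stored value of 0.1 + 0.2 differs from the stored value of 0.3,
# so exact equality fails:
print(Decimal(0.1 + 0.2) == Decimal(0.3))
```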
To solve this you just need to switch to np.isclose or np.allclose:

import numpy as np

N = 100
A = np.random.randn(N, N)
B = A / np.sum(A, axis=1, keepdims=True) 

Then:

>>> np.count_nonzero(np.sum(B, axis=1) != 1)
79

whereas

>>> np.allclose(np.sum(B, axis=1), 1)
True

In short, your rows are properly normalized, they just don't sum exactly to 1.


From the documentation, np.isclose(a, b) is equivalent to:

absolute(a - b) <= (atol + rtol * absolute(b))

with atol=1e-8 and rtol=1e-5 by default, which is the proper way of checking that two floating-point numbers represent (at least approximately) the same number.

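As a sanity check, the documented formula can be transcribed by hand and compared against np.isclose itself (a small sketch; the sample value pairs are arbitrary):

```python
import numpy as np

def isclose_manual(a, b, rtol=1e-5, atol=1e-8):
    # Direct transcription of the documented formula:
    # absolute(a - b) <= (atol + rtol * absolute(b))
    return abs(a - b) <= (atol + rtol * abs(b))

# The manual version agrees with np.isclose on near-equal and
# clearly-different pairs:
for a, b in [(0.1 + 0.2, 0.3), (1.0, 1.0 + 1e-6), (1.0, 2.0)]:
    assert isclose_manual(a, b) == np.isclose(a, b)
```

Note that the test is asymmetric in a and b (only `absolute(b)` scales the relative term), which is why np.isclose(a, b) and np.isclose(b, a) can in principle differ for borderline inputs.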