Why isn't (x / (y * z)) the same as (x / y / z) when using doubles? (duplicate)

Time: 2021-08-16 21:32:17

This question already has an answer here:

This is partly academic, as for my purposes I only need it rounded to two decimal places; but I am keen to know what is going on to produce two slightly different results.

This is the test that I wrote to narrow it to the simplest implementation:

@Test
public void shouldEqual() {
  double expected = 450.00d / (7d * 60);  // 1.0714285714285714
  double actual = 450.00d / 7d / 60;      // 1.0714285714285716

  assertThat(actual).isEqualTo(expected);
}

But it fails with this output:

org.junit.ComparisonFailure: 
Expected :1.0714285714285714
Actual   :1.0714285714285716

Can anyone explain in detail what is going on under the hood to result in the value at 1.000000000000000X being different?

Some of the points I'm looking for in an answer are: Where is the precision lost? Which method is preferred, and why? Which is actually correct? (In pure maths, both can't be right. Perhaps both are wrong?) Is there a better solution or method for these arithmetic operations?

5 Answers

#1


42  

I see a bunch of questions that tell you how to work around this problem, but not one that really explains what's going on, other than "floating-point roundoff error is bad, m'kay?" So let me take a shot at it. Let me first point out that nothing in this answer is specific to Java. Roundoff error is a problem inherent to any fixed-precision representation of numbers, so you get the same issues in, say, C.

Roundoff error in a decimal data type

As a simplified example, imagine we have some sort of computer that natively uses an unsigned decimal data type, let's call it float6d. The length of the data type is 6 digits: 4 dedicated to the mantissa, and 2 dedicated to the exponent. For example, the number 3.142 can be expressed as

3.142 x 10^0

which would be stored in 6 digits as

503142

The first two digits are the exponent plus 50, and the last four are the mantissa. This data type can represent any number from 0.001 x 10^-50 to 9.999 x 10^+49.

Actually, that's not true. It can't store just any number. What if you want to represent 3.141592? Or 3.1412034? Or 3.141488906? Tough luck: the data type can't store more than four digits of precision, so the compiler has to round anything with more digits to fit into the constraints of the data type. If you write

float6d x = 3.141592;
float6d y = 3.1412034;
float6d z = 3.141488906;

then the compiler converts each of these three values to the same internal representation, 3.142 x 10^0 (which, remember, is stored as 503142), so that x == y == z will hold true.

The point is that there is a whole range of real numbers which all map to the same underlying sequence of digits (or bits, in a real computer). Specifically, any x satisfying 3.1415 <= x <= 3.1425 (assuming half-even rounding) gets converted to the representation 503142 for storage in memory.

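This half-even rounding to four significant digits is easy to imitate in Java with BigDecimal. This is only an illustrative sketch of the hypothetical float6d mantissa rounding, not how a real floating-point type is implemented:

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class Float6dRounding {
    public static void main(String[] args) {
        // Round to 4 significant digits, ties to even -- the same rule
        // the hypothetical float6d type applies to its mantissa.
        MathContext float6d = new MathContext(4, RoundingMode.HALF_EVEN);

        System.out.println(new BigDecimal("3.141592").round(float6d)); // 3.142
        System.out.println(new BigDecimal("3.1415").round(float6d));   // 3.142 (tie, rounds to even)
        System.out.println(new BigDecimal("3.1425").round(float6d));   // 3.142 (tie, rounds to even)
    }
}
```

HALF_EVEN is the same tie-breaking rule IEEE 754 uses by default, which is why both endpoints of the range above land on 3.142.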

This rounding happens every time your program stores a floating-point value in memory. The first time it happens is when you write a constant in your source code, as I did above with x, y, and z. It happens again whenever you do an arithmetic operation that increases the number of digits of precision beyond what the data type can represent. Either of these effects is called roundoff error. There are a few different ways this can happen:

  • Addition and subtraction: if one of the values you're adding has a different exponent from the other, you will wind up with extra digits of precision, and if there are enough of them, the least significant ones will need to be dropped. For example, 2.718 and 121.0 are both values that can be exactly represented in the float6d data type. But if you try to add them together:

       1.210     x 10^2
    +  0.02718   x 10^2
    -------------------
       1.23718   x 10^2
    

    which gets rounded off to 1.237 x 10^2, or 123.7, dropping two digits of precision.

  • Multiplication: the number of digits in the result is approximately the sum of the number of digits in the two operands. This will produce some amount of roundoff error, if your operands already have many significant digits. For example, 121 x 2.718 gives you

       1.210     x 10^2
    x  0.02718   x 10^2
    -------------------
       3.28878   x 10^2
    

    which gets rounded off to 3.289 x 10^2, or 328.9, again dropping two digits of precision.

    However, it's useful to keep in mind that, if your operands are "nice" numbers, without many significant digits, the floating-point format can probably represent the result exactly, so you don't have to deal with roundoff error. For example, 2.3 x 140 gives

       1.40      x 10^2
    x  0.23      x 10^2
    -------------------
       3.22      x 10^2
    

    which has no roundoff problems.

  • Division: this is where things get messy. Division will pretty much always result in some amount of roundoff error unless the number you're dividing by happens to be a power of the base (in which case the division is just a digit shift, or bit shift in binary). As an example, take two very simple numbers, 3 and 7, divide them, and you get

       3.                x 10^0
    /  7.                x 10^0
    ----------------------------
       0.428571428571... x 10^0
    

    The closest value to this number which can be represented as a float6d is 4.286 x 10^-1, or 0.4286, which distinctly differs from the exact result.

As we'll see in the next section, the error introduced by rounding grows with each operation you do. So if you're working with "nice" numbers, as in your example, it's generally best to do the division operations as late as possible because those are the operations most likely to introduce roundoff error into your program where none existed before.

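The divide-as-late-as-possible advice is exactly what separates the two expressions in the question; a minimal Java check:

```java
public class DivideLast {
    public static void main(String[] args) {
        double late = 450.0 / (7.0 * 60);   // 7 * 60 = 420 is exact, so only one rounded division
        double early = 450.0 / 7.0 / 60;    // two divisions, two roundings

        System.out.println(late);           // 1.0714285714285714
        System.out.println(early);          // 1.0714285714285716
        System.out.println(late == early);  // false
    }
}
```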

Analysis of roundoff error

In general, if you can't assume your numbers are "nice", roundoff error can be either positive or negative, and it's very difficult to predict which direction it will go just based on the operation. It depends on the specific values involved. Look at this plot of the roundoff error for 2.718 z as a function of z (still using the float6d data type):

[plot: roundoff error of 2.718 z as a function of z]

In practice, when you're working with values that use the full precision of your data type, it's often easier to treat roundoff error as a random error. Looking at the plot, you might be able to guess that the magnitude of the error depends on the order of magnitude of the result of the operation. In this particular case, when z is of the order 10^-1, 2.718 z is also on the order of 10^-1, so it will be a number of the form 0.XXXX. The maximum roundoff error is then half of the last digit of precision; in this case, by "the last digit of precision" I mean 0.0001, so the roundoff error varies between -0.00005 and +0.00005. At the point where 2.718 z jumps up to the next order of magnitude, which is 1/2.718 = 0.3679, you can see that the roundoff error also jumps up by an order of magnitude.

You can use well-known techniques of error analysis to analyze how a random (or unpredictable) error of a certain magnitude affects your result. Specifically, for multiplication or division, the "average" relative error in your result can be approximated by adding the relative error in each of the operands in quadrature - that is, square them, add them, and take the square root. With our float6d data type, the relative error varies between 0.0005 (for a value like 0.101) and 0.00005 (for a value like 0.995).

[plot: relative roundoff error across the float6d range]

Let's take 0.0001 as a rough average for the relative error in values x and y. The relative error in x * y or x / y is then given by

sqrt(0.0001^2 + 0.0001^2) = 0.0001414

which is a factor of sqrt(2) larger than the relative error in each of the individual values.

When it comes to combining operations, you can apply this formula multiple times, once for each floating-point operation. So for instance, for z / (x * y), the relative error in x * y is, on average, 0.0001414 (in this decimal example) and then the relative error in z / (x * y) is

sqrt(0.0001^2 + 0.0001414^2) = 0.0001732

Notice that the average relative error grows with each operation, specifically as the square root of the number of multiplications and divisions you do.

Similarly, for z / x * y, the average relative error in z / x is 0.0001414, and the relative error in z / x * y is

sqrt(0.0001414^2 + 0.0001^2) = 0.0001732

So, the same, in this case. This means that for arbitrary values, on average, the two expressions introduce approximately the same error. (In theory, that is. I've seen these operations behave very differently in practice, but that's another story.)

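The quadrature rule is a one-liner to sketch in Java (the 0.0001 per-operation figure below is just the rough float6d average from the text):

```java
public class QuadratureError {
    // Combine independent relative errors in quadrature:
    // total = sqrt(e1^2 + e2^2 + ...).
    static double combine(double... relativeErrors) {
        double sumOfSquares = 0;
        for (double e : relativeErrors) {
            sumOfSquares += e * e;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double perOp = 0.0001;                    // rough float6d per-operation relative error
        double twoOps = combine(perOp, perOp);    // ~0.0001414, a factor of sqrt(2) larger
        double threeOps = combine(twoOps, perOp); // ~0.0001732, i.e. sqrt(3) * perOp
        System.out.println(twoOps);
        System.out.println(threeOps);
    }
}
```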

Gory details

You might be curious about the specific calculation you presented in the question, not just an average. For that analysis, let's switch to the real world of binary arithmetic. Floating-point numbers in most systems and languages are represented using IEEE standard 754. For 64-bit numbers, the format specifies 52 bits dedicated to the mantissa, 11 to the exponent, and one to the sign. In other words, when written in base 2, a floating point number is a value of the form

1.1100000000000000000000000000000000000000000000000000 x 2^00000000010
                       52 bits                             11 bits

The leading 1 is not explicitly stored, and constitutes an implicit 53rd bit. Also, you should note that the 11 bits stored to represent the exponent are actually the real exponent plus 1023. For example, this particular value is 7, which is 1.75 x 2^2. The mantissa is 1.75 in binary, or 1.11, and the exponent is 1023 + 2 = 1025 in binary, or 10000000001, so the content stored in memory is

0100000000011100000000000000000000000000000000000000000000000000
 ^          ^
 exponent   mantissa

but that doesn't really matter.

Your example also involves 450,

1.1100001000000000000000000000000000000000000000000000 x 2^00000001000

and 60,

1.1110000000000000000000000000000000000000000000000000 x 2^00000000101

You can play around with these values using an online IEEE 754 converter (there are many on the internet).

When you compute the first expression, 450/(7*60), the processor first does the multiplication, obtaining 420, or

1.1010010000000000000000000000000000000000000000000000 x 2^00000001000

Then it divides 450 by 420. This produces 15/14, which is

1.0001001001001001001001001001001001001001001001001001001001001001001001...

in binary. Now, the Java language specification says that

Inexact results must be rounded to the representable value nearest to the infinitely precise result; if the two nearest representable values are equally near, the one with its least significant bit zero is chosen. This is the IEEE 754 standard's default rounding mode known as round to nearest.

and the nearest representable value to 15/14 in 64-bit IEEE 754 format is

1.0001001001001001001001001001001001001001001001001001 x 2^00000000000

which is approximately 1.0714285714285714 in decimal. (More precisely, this is the least precise decimal value that uniquely specifies this particular binary representation.)

On the other hand, if you compute 450 / 7 first, the result is 64.2857142857..., or in binary,

1000000.01001001001001001001001001001001001001001001001001001001001001001...

for which the nearest representable value is

1.0000000100100100100100100100100100100100100100100101 x 2^00000000110

which is 64.28571428571429180465... Note the change in the last digit of the binary mantissa (compared to the exact value) due to roundoff error. Dividing this by 60 gives you

1.000100100100100100100100100100100100100100100100100110011001100110011...

Look at the end: the pattern is different! It's 0011 that repeats, instead of 001 as in the other case. The closest representable value is

1.0001001001001001001001001001001001001001001001001010 x 2^00000000000

which differs from the other order of operations in the last two bits: they're 10 instead of 01. The decimal equivalent is 1.0714285714285716.

The specific rounding that causes this difference should be clear if you look at the exact binary values:

1.0001001001001001001001001001001001001001001001001001001001001001001001...
1.0001001001001001001001001001001001001001001001001001100110011001100110...
                                                     ^ last bit of mantissa

It works out in this case that the former result, numerically 15/14, happens to be the most accurate representation of the exact value. This is an example of how leaving division until the end benefits you. But again, this rule only holds as long as the values you're working with don't use the full precision of the data type. Once you start working with inexact (rounded) values, you no longer protect yourself from further roundoff errors by doing the multiplications first.

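If you want to see the one-ulp difference described above for yourself, Java exposes the raw IEEE 754 encoding through Double.doubleToLongBits; a small sketch:

```java
public class BitPatterns {
    public static void main(String[] args) {
        long grouped = Double.doubleToLongBits(450.0 / (7.0 * 60));
        long sequential = Double.doubleToLongBits(450.0 / 7.0 / 60);

        // Print sign + exponent + mantissa as 64 binary digits.
        System.out.println(String.format("%64s", Long.toBinaryString(grouped)).replace(' ', '0'));
        System.out.println(String.format("%64s", Long.toBinaryString(sequential)).replace(' ', '0'));

        // For positive finite doubles, consecutive bit patterns are
        // consecutive representable values: these results are 1 ulp apart.
        System.out.println(sequential - grouped); // 1
    }
}
```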

#2


5  

It has to do with how the double type is implemented and the fact that the floating-point types don't make the same precision guarantees as other simpler numerical types. Although the following answer is more specifically about sums, it also answers your question by explaining how there is no guarantee of infinite precision in floating-point mathematical operations: Why does changing the sum order returns a different result?. Essentially you should never attempt to determine the equality of floating-point values without specifying an acceptable margin of error. Google's Guava library includes DoubleMath.fuzzyEquals(double, double, double) to determine the equality of two double values within a certain precision. If you wish to read up on the specifics of floating-point equality this site is quite useful; the same site also explains floating-point rounding errors. In summation: the expected and actual values of your calculation differ because of the rounding differing between the calculations due to the order of operations.

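As a sketch of what such a tolerance-based comparison looks like (Guava's DoubleMath.fuzzyEquals does essentially this, plus careful handling of NaN and infinities):

```java
public class FuzzyEquals {
    // Simplified tolerance comparison; unlike Guava's version this
    // does not special-case NaN or infinite values.
    static boolean fuzzyEquals(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        double expected = 450.0 / (7.0 * 60);
        double actual = 450.0 / 7.0 / 60;

        System.out.println(expected == actual);                  // false: exact comparison fails
        System.out.println(fuzzyEquals(expected, actual, 1e-9)); // true: within tolerance
    }
}
```

The tolerance (1e-9 here) is a choice you make per use case; there is no universally correct epsilon.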

#3


4  

Let's simplify things a bit. What you want to know is why 450d / 420 and 450d / 7 / 60 (specifically) give different results.

Let's see how division is performed in IEEE double-precision floating-point format. Without going deep into implementation details, it's basically XOR-ing the sign bits, subtracting the exponent of the divisor from the exponent of the dividend, dividing the mantissas, and normalizing the result.

First, we should represent our numbers in the proper format for double:

450    is  0 10000000111 1100001000000000000000000000000000000000000000000000

420    is  0 10000000111 1010010000000000000000000000000000000000000000000000

7      is  0 10000000001 1100000000000000000000000000000000000000000000000000

60     is  0 10000000100 1110000000000000000000000000000000000000000000000000

Let's first divide 450 by 420

First comes the sign bit, it's 0 (0 xor 0 == 0).

Then comes the exponent. 10000000111b - 10000000111b + 1023 == 10000000111b - 10000000111b + 01111111111b == 01111111111b

Looking good, now the mantissa:

1.1100001000000000000000000000000000000000000000000000 / 1.1010010000000000000000000000000000000000000000000000 == 1.1100001 / 1.101001. There are a couple of different ways to do this; I'll say a bit more about them later. The result is 1.0(001), with the group 001 repeating forever (you can verify this by long division).

Now we should normalize the result. Let's see the guard, round and sticky bit values:

0001001001001001001001001001001001001001001001001001 0 0 1

Guard bit's 0, we don't do any rounding. The result is, in binary:

0 01111111111 0001001001001001001001001001001001001001001001001001

Which gets represented as 1.0714285714285714 in decimal.

Now let's divide 450 by 7 by analogy.

Sign bit = 0

Exponent = 10000000111b - 10000000001b + 01111111111b == -01111111001b + 01111111111b + 01111111111b == 10000000101b

Mantissa = 1.1100001 / 1.11 == 1.00000(001)

Rounding:

0000000100100100100100100100100100100100100100100100 1 0 1

The guard bit is set. Because the quotient's binary expansion repeats forever, there are non-zero bits beyond the round bit, so the sticky bit is 1: the exact value lies above the halfway point, and round-to-nearest (IEEE's default mode) rounds up, adding 1 to the lsb. This gives us the rounded mantissa:

0000000100100100100100100100100100100100100100100101

The result is

0 10000000101 0000000100100100100100100100100100100100100100100101

Which gets represented as 64.28571428571429 in decimal.

Now we will have to divide it by 60... But you already know that we have lost some precision. Dividing 450 by 420 didn't require rounding at all, but here, we already had to round the result at least once. But, for completeness's sake, let's finish the job:

Dividing 64.28571428571429 by 60

Sign bit = 0

Exponent = 10000000101b - 10000000100b + 01111111111b == 10000000000b

Mantissa = 1.0000000100100100100100100100100100100100100100100101 / 1.111 == 0.10001001001001001001001001001001001001001001001001001100110011

Round and shift:

0.1000100100100100100100100100100100100100100100100100 1 1 0 0

1.0001001001001001001001001001001001001001001001001001 1 0 1

The quotient's binary expansion again goes on forever, so the sticky bit is 1; rounding up just as in the previous case, we get the mantissa: 0001001001001001001001001001001001001001001001001010.

As we shifted the mantissa left by one place to normalize it, we subtract 1 from the exponent, getting

Exponent = 01111111111b

So, the result is:

0 01111111111 0001001001001001001001001001001001001001001001001010

Which gets represented as 1.0714285714285716 in decimal.

Tl;dr:

The first division gave us:

0 01111111111 0001001001001001001001001001001001001001001001001001

And the last division gave us:

0 01111111111 0001001001001001001001001001001001001001001001001010

The difference is in the last 2 bits only, but we could have lost more - after all, to get the second result, we had to round two times instead of none!

Now, about mantissa division. Floating-point division is implemented in two major ways.

The first is IEEE-style long division: basically the regular long division you learned in school, but in binary instead of decimal. It's pretty slow, and it's what your computer did here.

There is also a faster but less accurate option: multiplication by the inverse. First, a reciprocal of the divisor is found, and then a multiplication is performed.

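As a rough illustration of the second approach (not what any particular FPU actually does; hardware uses a table-driven initial guess and a fixed iteration count), the reciprocal can be refined with Newton-Raphson iteration, x_{n+1} = x_n(2 - d*x_n), which roughly doubles the number of correct bits per step:

```java
public class ReciprocalNewton {
    // Newton-Raphson refinement of 1/d; convergence is quadratic
    // provided the initial guess satisfies |1 - d * guess| < 1.
    static double reciprocal(double d, double initialGuess, int iterations) {
        double x = initialGuess;
        for (int i = 0; i < iterations; i++) {
            x = x * (2 - d * x);
        }
        return x;
    }

    public static void main(String[] args) {
        double inv7 = reciprocal(7.0, 0.1, 8);
        System.out.println(inv7);         // approximately 1.0 / 7
        System.out.println(450.0 * inv7); // approximately 450.0 / 7
    }
}
```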

#4


1  

That's because double division often leads to a loss of precision, and that loss can vary depending on the order of the divisions.

When you divide by 7d, you already lose some precision in the intermediate result. Then you divide that already-inexact result by 60.

When you divide by 7d * 60, the multiplication 7 * 60 = 420 is exact, so you perform (and round) only one division, losing precision only once.

Note that double multiplication can lose precision too, but that's much less common.

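If you need more precision than one rounded division gives you, which touches on the asker's last question, BigDecimal lets you pick the scale and rounding mode explicitly; a sketch:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ExactQuotient {
    public static void main(String[] args) {
        // 450 / 420 = 15 / 14 is a repeating decimal, so a scale and
        // rounding mode must be given -- divide() throws otherwise.
        BigDecimal q = new BigDecimal(450)
                .divide(new BigDecimal(420), 20, RoundingMode.HALF_EVEN);
        System.out.println(q); // 1.07142857142857142857
    }
}
```

The cost is speed and convenience; for most purposes a double plus a tolerance-based comparison is enough.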

#5


0  

It's certainly the order of operations, combined with the fact that doubles aren't exact:

450.00d / (7d * 60) --> a = 7d * 60 --> result = 450.00d / a

vs

450.00d / 7d / 60 --> a = 450.00d /7d --> result = a / 60

#1


42  

I see a bunch of questions that tell you how to work around this problem, but not one that really explains what's going on, other than "floating-point roundoff error is bad, m'kay?" So let me take a shot at it. Let me first point out that nothing in this answer is specific to Java. Roundoff error is a problem inherent to any fixed-precision representation of numbers, so you get the same issues in, say, C.

我看到了一些问题,告诉你如何解决这个问题,但没有一个能真正解释发生了什么,除了“浮点舍掉错误是坏的,m'kay?”让我试试看。让我首先指出,这个答案中没有什么是针对Java的。舍入错误是任何数字的固定精度表示所固有的问题,因此您可以在C语言中得到相同的问题。

Roundoff error in a decimal data type

As a simplified example, imagine we have some sort of computer that natively uses an unsigned decimal data type, let's call it float6d. The length of the data type is 6 digits: 4 dedicated to the mantissa, and 2 dedicated to the exponent. For example, the number 3.142 can be expressed as

作为一个简化的例子,假设我们有某种计算机,本机使用无符号的十进制数据类型,我们称之为float6d。数据类型的长度为6位数:4为尾数,2为指数。例如,数字3.142可以表示为。

3.142 x 10^0

which would be stored in 6 digits as

哪一个会以6位数存储?

503142

The first two digits are the exponent plus 50, and the last four are the mantissa. This data type can represent any number from 0.001 x 10^-50 to 9.999 x 10^+49.

前两个数字是指数加50,最后四个是尾数。这个数据类型可以表示任意数量从0.001 x 10 ^ -50 - 9.999 x 10 ^ + 49。

Actually, that's not true. It can't store any number. What if you want to represent 3.141592? Or 3.1412034? Or 3.141488906? Tough luck, the data type can't store more than four digits of precision, so the compiler has to round anything with more digits to fit into the constraints of the data type. If you write

事实上,这不是真的。它不能存储任何数字。如果你想表示3.141592呢?还是3.1412034 ?还是3.141488906 ?不幸的是,数据类型不能存储超过4位数的精度,因此编译器必须将任何有更多数字的数据包围,以适应数据类型的约束。如果你写

float6d x = 3.141592;
float6d y = 3.1412034;
float6d z = 3.141488906;

then the compiler converts each of these three values to the same internal representation, 3.142 x 10^0 (which, remember, is stored as 503142), so that x == y == z will hold true.

然后,编译器将这三个值中的每一个都转换为相同的内部表示,3.142 x10 0(记住,它被存储为503142),因此x == y == z将是正确的。

The point is that there is a whole range of real numbers which all map to the same underlying sequence of digits (or bits, in a real computer). Specifically, any x satisfying 3.1415 <= x <= 3.1425 (assuming half-even rounding) gets converted to the representation 503142 for storage in memory.

重点是,有一系列的实数,它们都映射到相同的数字序列(或比特,在真实的计算机中)。具体地说,任何满足3.1415 <= x <= 3.1425(假设半舍入)的x都转换为表示内存中存储的503142表示。

This rounding happens every time your program stores a floating-point value in memory. The first time it happens is when you write a constant in your source code, as I did above with x, y, and z. It happens again whenever you do an arithmetic operation that increases the number of digits of precision beyond what the data type can represent. Either of these effects is called roundoff error. There are a few different ways this can happen:

每当程序在内存中存储浮点值时,就会发生这种情况。第一次发生的情况是,当你在源代码中写一个常量时,就像我在上面用x、y和z写的那样,当你做一个算术运算时,它会再次发生,它增加了超过数据类型所能表示的精度的位数。这些效果中的任何一种都称为舍入错误。有几种不同的方式可以发生:

  • Addition and subtraction: if one of the values you're adding has a different exponent from the other, you will wind up with extra digits of precision, and if there are enough of them, the least significant ones will need to be dropped. For example, 2.718 and 121.0 are both values that can be exactly represented in the float6d data type. But if you try to add them together:

    添加和减法:如果你添加的一个值与另一个值有不同的指数,你就会得到额外的精确数字,如果有足够多的值,那么最不重要的值就需要被删除。例如,2.718和121.0都是可以在float6d数据类型中精确表示的值。但是如果你想把它们加在一起:

       1.210     x 10^2
    +  0.02718   x 10^2
    -------------------
       1.23718   x 10^2
    

    which gets rounded off to 1.237 x 10^2, or 123.7, dropping two digits of precision.

    四舍五入到1.237 x 10 ^ 2,或123.7,两位精度的下降。

  • Multiplication: the number of digits in the result is approximately the sum of the number of digits in the two operands. This will produce some amount of roundoff error, if your operands already have many significant digits. For example, 121 x 2.718 gives you

    乘法:结果中数字的数目大约是两个操作数中位数的总和。如果您的操作数已经有许多有效数字,这将产生一定数量的舍入错误。例如,121 x 2。718给出。

       1.210     x 10^2
    x  0.02718   x 10^2
    -------------------
       3.28878   x 10^2
    

    which gets rounded off to 3.289 x 10^2, or 328.9, again dropping two digits of precision.

    四舍五入到3.289 x 10 ^ 2,或328.9,再把两个数字的精度。

    However, it's useful to keep in mind that, if your operands are "nice" numbers, without many significant digits, the floating-point format can probably represent the result exactly, so you don't have to deal with roundoff error. For example, 2.3 x 140 gives

    但是,记住,如果您的操作数是“nice”数字,而没有许多有效数字,浮点格式可能会准确地表示结果,所以您不必处理舍入错误。例如,2.3 x140给出。

       1.40      x 10^2
    x  0.23      x 10^2
    -------------------
       3.22      x 10^2
    

    which has no roundoff problems.

    这没有问题。

  • Division: this is where things get messy. Division will pretty much always result in some amount of roundoff error unless the number you're dividing by happens to be a power of the base (in which case the division is just a digit shift, or bit shift in binary). As an example, take two very simple numbers, 3 and 7, divide them, and you get

    师:这就是事情变得一团糟的地方。除法几乎总是会导致一些舍入误差,除非你所划分的数字恰好是基数的一个幂(在这种情况下,除法只是一个数字移位,或者二进制的移位)。举个例子,取两个非常简单的数字,3和7,除以它们,得到。

       3.                x 10^0
    /  7.                x 10^0
    ----------------------------
       0.428571428571... x 10^0
    

    The closest value to this number which can be represented as a float6d is 4.286 x 10^-1, or 0.4286, which distinctly differs from the exact result.

    最接近值这个数可以表示为float6d 4.286 x 10 ^ 1或0.4286,明显不同于确切的结果。

As we'll see in the next section, the error introduced by rounding grows with each operation you do. So if you're working with "nice" numbers, as in your example, it's generally best to do the division operations as late as possible because those are the operations most likely to introduce roundoff error into your program where none existed before.

正如我们在下一节中会看到的,舍入所引入的错误随着每个操作的增加而增加。因此,如果您使用“nice”数字,就像在您的示例中一样,通常最好尽可能晚地执行分区操作,因为这些操作最有可能在您的程序中引入不存在的舍入错误。

Analysis of roundoff error

In general, if you can't assume your numbers are "nice", roundoff error can be either positive or negative, and it's very difficult to predict which direction it will go just based on the operation. It depends on the specific values involved. Look at this plot of the roundoff error for 2.718 z as a function of z (still using the float6d data type):

一般来说,如果你不能假设你的数字是“好”的,那么舍入误差可以是正的,也可以是负的,并且很难预测它会根据操作的方向进行。这取决于所涉及的具体数值。以z为函数(仍然使用float6d数据类型):

当使用双打时,为什么不(x / (y * z))与(x / y / z)相同?(复制)

In practice, when you're working with values that use the full precision of your data type, it's often easier to treat roundoff error as a random error. Looking at the plot, you might be able to guess that the magnitude of the error depends on the order of magnitude of the result of the operation. In this particular case, when z is of the order 10-1, 2.718 z is also on the order of 10-1, so it will be a number of the form 0.XXXX. The maximum roundoff error is then half of the last digit of precision; in this case, by "the last digit of precision" I mean 0.0001, so the roundoff error varies between -0.00005 and +0.00005. At the point where 2.718 z jumps up to the next order of magnitude, which is 1/2.718 = 0.3679, you can see that the roundoff error also jumps up by an order of magnitude.

You can use well-known techniques of error analysis to analyze how a random (or unpredictable) error of a certain magnitude affects your result. Specifically, for multiplication or division, the "average" relative error in your result can be approximated by adding the relative error in each of the operands in quadrature - that is, square them, add them, and take the square root. With our float6d data type, the relative error varies between 0.0005 (for a value like 0.101) and 0.00005 (for a value like 0.995).

[Figure: plot of the relative roundoff error, varying between 0.0005 and 0.00005 depending on the value]

Let's take 0.0001 as a rough average for the relative error in values x and y. The relative error in x * y or x / y is then given by

sqrt(0.0001^2 + 0.0001^2) = 0.0001414

which is a factor of sqrt(2) larger than the relative error in each of the individual values.

When it comes to combining operations, you can apply this formula multiple times, once for each floating-point operation. So for instance, for z / (x * y), the relative error in x * y is, on average, 0.0001414 (in this decimal example) and then the relative error in z / (x * y) is

sqrt(0.0001^2 + 0.0001414^2) = 0.0001732

Notice that the average relative error grows with each operation, specifically as the square root of the number of multiplications and divisions you do.

Similarly, for z / x * y, the average relative error in z / x is 0.0001414, and the relative error in z / x * y is

sqrt(0.0001414^2 + 0.0001^2) = 0.0001732

So, the same, in this case. This means that for arbitrary values, on average, the two expressions introduce approximately the same error. (In theory, that is. I've seen these operations behave very differently in practice, but that's another story.)
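The quadrature arithmetic above is easy to check numerically (the 0.0001 figure is the assumed average relative error from the text):

```java
public class QuadratureError {
    public static void main(String[] args) {
        double rel = 0.0001; // assumed average relative error of each input value

        // One multiplication or division: the input errors combine in quadrature.
        double oneOp = Math.sqrt(rel * rel + rel * rel);
        System.out.println(oneOp);  // ~0.0001414, i.e. sqrt(2) * rel

        // A second operation on that result: combine in quadrature again.
        double twoOps = Math.sqrt(rel * rel + oneOp * oneOp);
        System.out.println(twoOps); // ~0.0001732, i.e. sqrt(3) * rel
    }
}
```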

Gory details

You might be curious about the specific calculation you presented in the question, not just an average. For that analysis, let's switch to the real world of binary arithmetic. Floating-point numbers in most systems and languages are represented using IEEE standard 754. For 64-bit numbers, the format specifies 52 bits dedicated to the mantissa, 11 to the exponent, and one to the sign. In other words, when written in base 2, a floating point number is a value of the form

1.1100000000000000000000000000000000000000000000000000 x 2^00000000010
                       52 bits                             11 bits

The leading 1 is not explicitly stored, and constitutes a 53rd bit. Also, you should note that the 11 bits stored to represent the exponent are actually the real exponent plus 1023. For example, this particular value is 7, which is 1.75 x 2^2. The mantissa is 1.75, or 1.11 in binary, and the exponent is 1023 + 2 = 1025, or 10000000001 in binary, so the content stored in memory is

0100000000011100000000000000000000000000000000000000000000000000
 ^          ^
 exponent   mantissa

but that doesn't really matter.
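You can inspect this layout from Java itself via Double.doubleToLongBits (a small sketch, not part of the original explanation):

```java
public class ShowBits {
    public static void main(String[] args) {
        long bits = Double.doubleToLongBits(7.0);
        String s = String.format("%64s", Long.toBinaryString(bits)).replace(' ', '0');
        // Split into sign (1 bit), exponent (11 bits), mantissa (52 bits).
        System.out.println(s.substring(0, 1) + " " + s.substring(1, 12) + " " + s.substring(12));
        // prints: 0 10000000001 1100000000000000000000000000000000000000000000000000
    }
}
```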

Your example also involves 450,

1.1100001000000000000000000000000000000000000000000000 x 2^00000001000

and 60,

1.1110000000000000000000000000000000000000000000000000 x 2^00000000101

You can play around with these values using this converter or any of many others on the internet.

When you compute the first expression, 450/(7*60), the processor first does the multiplication, obtaining 420, or

1.1010010000000000000000000000000000000000000000000000 x 2^00000001000

Then it divides 450 by 420. This produces 15/14, which is

1.0001001001001001001001001001001001001001001001001001001001001001001001...

in binary. Now, the Java language specification says that

Inexact results must be rounded to the representable value nearest to the infinitely precise result; if the two nearest representable values are equally near, the one with its least significant bit zero is chosen. This is the IEEE 754 standard's default rounding mode known as round to nearest.

and the nearest representable value to 15/14 in 64-bit IEEE 754 format is

1.0001001001001001001001001001001001001001001001001001 x 2^00000000000

which is approximately 1.0714285714285714 in decimal. (More precisely, this is the least precise decimal value that uniquely specifies this particular binary representation.)
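You can confirm this in Java: since 7 * 60 is computed exactly, 450.00d / (7d * 60) is a single correctly-rounded division, and it yields the same double as dividing 15.0 by 14.0 directly (sketch):

```java
public class NearestTo15Over14 {
    public static void main(String[] args) {
        double viaQuestion = 450.00d / (7d * 60); // one correctly-rounded division by 420
        double direct = 15.0 / 14.0;              // the same real quotient, rounded once
        System.out.println(viaQuestion);           // 1.0714285714285714
        System.out.println(viaQuestion == direct); // true
    }
}
```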

On the other hand, if you compute 450 / 7 first, the result is 64.2857142857..., or in binary,

1000000.01001001001001001001001001001001001001001001001001001001001001001...

for which the nearest representable value is

1.0000000100100100100100100100100100100100100100100101 x 2^00000000110

which is 64.28571428571429180465... Note the change in the last digit of the binary mantissa (compared to the exact value) due to roundoff error. Dividing this by 60 gives you

1.000100100100100100100100100100100100100100100100100110011001100110011...

Look at the end: the pattern is different! It's 0011 that repeats, instead of 001 as in the other case. The closest representable value is

1.0001001001001001001001001001001001001001001001001010 x 2^00000000000

which differs from the other order of operations in the last two bits: they're 10 instead of 01. The decimal equivalent is 1.0714285714285716.

The specific rounding that causes this difference should be clear if you look at the exact binary values:

1.0001001001001001001001001001001001001001001001001001001001001001001001...
1.0001001001001001001001001001001001001001001001001001100110011001100110...
                                                     ^ last bit of mantissa

It works out in this case that the former result, numerically 15/14, happens to be the most accurate representation of the exact value. This is an example of how leaving division until the end benefits you. But again, this rule only holds as long as the values you're working with don't use the full precision of the data type. Once you start working with inexact (rounded) values, you no longer protect yourself from further roundoff errors by doing the multiplications first.
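Which of the two doubles is actually closer to 15/14 can be checked with BigDecimal, whose double constructor exposes each double's exact value (a sketch):

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class WhichIsCloser {
    public static void main(String[] args) {
        // 15/14 to 34 significant digits, effectively exact for this comparison
        BigDecimal exact = new BigDecimal(450)
                .divide(new BigDecimal(420), MathContext.DECIMAL128);
        BigDecimal errLate = exact.subtract(new BigDecimal(450.00d / (7d * 60))).abs();
        BigDecimal errEarly = exact.subtract(new BigDecimal(450.00d / 7d / 60)).abs();
        System.out.println(errLate.compareTo(errEarly) < 0); // true: dividing last is closer
    }
}
```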

#2


5  

It has to do with how the double type is implemented and the fact that floating-point types don't make the same precision guarantees as other, simpler numerical types. Although the following answer is more specifically about sums, it also answers your question by explaining how there is no guarantee of infinite precision in floating-point mathematical operations: Why does changing the sum order return a different result?. Essentially, you should never attempt to determine the equality of floating-point values without specifying an acceptable margin of error. Google's Guava library includes DoubleMath.fuzzyEquals(double, double, double) to determine the equality of two double values within a certain tolerance. If you wish to read up on the specifics of floating-point equality, this site is quite useful; the same site also explains floating-point rounding errors. In summary: the expected and actual values of your calculation differ because the rounding differs between the calculations due to the order of operations.
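If you don't want to pull in Guava, a minimal stand-in for the tolerance check looks like this (a sketch; Guava's real DoubleMath.fuzzyEquals also handles NaN and infinities more carefully):

```java
public class FuzzyCompare {
    // Minimal tolerance-based equality check for doubles.
    static boolean fuzzyEquals(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        double expected = 450.00d / (7d * 60);
        double actual = 450.00d / 7d / 60;
        System.out.println(expected == actual);                  // false: exact comparison fails
        System.out.println(fuzzyEquals(expected, actual, 1e-9)); // true: equal within tolerance
    }
}
```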

#3


4  

Let's simplify things a bit. What you want to know is why 450d / 420 and 450d / 7 / 60 (specifically) give different results.

Let's see how division is performed in IEEE double-precision floating-point format. Without going deep into implementation details, it basically means XOR-ing the sign bits, subtracting the exponent of the divisor from the exponent of the dividend, dividing the mantissas, and normalizing the result.

First, we should represent our numbers in the proper format for double:

450    is  0 10000000111 1100001000000000000000000000000000000000000000000000

420    is  0 10000000111 1010010000000000000000000000000000000000000000000000

7      is  0 10000000001 1100000000000000000000000000000000000000000000000000

60     is  0 10000000100 1110000000000000000000000000000000000000000000000000

Let's first divide 450 by 420

First comes the sign bit, it's 0 (0 xor 0 == 0).

Then comes the exponent. 10000000111b - 10000000111b + 1023 == 10000000111b - 10000000111b + 01111111111b == 01111111111b
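This exponent bookkeeping can be verified by pulling the biased exponent fields out with bit operations (a sketch using masks, not part of the original answer):

```java
public class ExponentMath {
    static long biasedExponent(double d) {
        return (Double.doubleToLongBits(d) >>> 52) & 0x7FF; // 11-bit exponent field
    }

    public static void main(String[] args) {
        long e450 = biasedExponent(450.0);      // 10000000111b = 1031
        long e420 = biasedExponent(420.0);      // 10000000111b = 1031
        long quotient = e450 - e420 + 1023;     // subtract, then re-apply the bias
        System.out.println(quotient);           // 1023 = 01111111111b
        System.out.println(biasedExponent(450.0 / 420.0)); // also 1023: no normalization shift needed
    }
}
```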

Looking good, now the mantissa:

1.1100001000000000000000000000000000000000000000000000 / 1.1010010000000000000000000000000000000000000000000000 == 1.1100001 / 1.101001. There are a couple of different ways to do this, I'll talk a bit about them later. The result is 1.0(001) (you can verify it here).

Now we should normalize the result. Let's see the guard, round and sticky bit values:

0001001001001001001001001001001001001001001001001001 0 0 1

The guard bit is 0, so we round down (truncate). The result is, in binary:

0 01111111111 0001001001001001001001001001001001001001001001001001

Which gets represented as 1.0714285714285714 in decimal.

Now let's divide 450 by 7 by analogy.

Sign bit = 0

Exponent = 10000000111b - 10000000001b + 01111111111b == -01111111001b + 01111111111b + 01111111111b == 10000000101b

Mantissa = 1.1100001 / 1.11 == 1.00000(001)

Rounding:

0000000100100100100100100100100100100100100100100100 1 0 1

The guard bit is set, and so is the sticky bit: the true quotient is non-terminating, so nonzero bits follow beyond the round bit. We are rounding to nearest (the default mode for IEEE 754), which means we round up, adding 1 to the last bit of the mantissa. This gives us the rounded mantissa:

0000000100100100100100100100100100100100100100100101

The result is

0 10000000101 0000000100100100100100100100100100100100100100100101

Which gets represented as 64.28571428571429 in decimal.
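As an aside, the "ties go to even" part of round-to-nearest can be observed directly in Java. Using float values lets a double hold a result exactly halfway between two representable values (an illustrative sketch):

```java
public class TiesToEven {
    public static void main(String[] args) {
        float lo = 1.0f;            // last mantissa bit 0 (the "even" neighbour)
        float hi = Math.nextUp(lo); // adjacent float; last mantissa bit 1
        // The midpoint is exactly representable as a double, so the cast below
        // sees a value precisely halfway between lo and hi.
        double mid = ((double) lo + (double) hi) / 2;
        System.out.println((float) mid == lo); // true: the tie rounds to the even neighbour
    }
}
```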

Now we will have to divide it by 60... but you already know that we have lost some precision. Dividing 450 by 420 didn't require any rounding up, while here we have already had to round the result once. For completeness's sake, let's finish the job:

Dividing 64.28571428571429 by 60

Sign bit = 0

Exponent = 10000000101b - 10000000100b + 01111111111b == 01111111110b

Mantissa = 1.0000000100100100100100100100100100100100100100100101 / 1.111 == 0.10001001001001001001001001001001001001001001001001001100110011

Round and shift:

0.1000100100100100100100100100100100100100100100100100 1 1 0 0

1.0001001001001001001001001001001001001001001001001001 1 0 1

Rounding just as in the previous case, we get the mantissa: 0001001001001001001001001001001001001001001001001010.

As we shifted by 1, we add that to the exponent, getting

Exponent = 01111111111b

So, the result is:

0 01111111111 0001001001001001001001001001001001001001001001001010

Which gets represented as 1.0714285714285716 in decimal.

Tl;dr:

The first division gave us:

0 01111111111 0001001001001001001001001001001001001001001001001001

And the last division gave us:

0 01111111111 0001001001001001001001001001001001001001001001001010

The difference is in the last 2 bits only, but we could have lost more. After all, to get the second result, we had to round up twice instead of not at all!

Now, about mantissa division. Floating-point division is implemented in two major ways.

The way mandated by the IEEE standard is long division (here are some good examples; it's basically the familiar long division algorithm, but in binary instead of decimal), and it's pretty slow. That is what your computer did.

There is also a faster but less accurate option: multiplication by the inverse. First, a reciprocal of the divisor is found, and then a multiplication is performed.

#4


1  

That's because double division often leads to a loss of precision, and the loss can vary depending on the order of the divisions.

When you divide by 7d, you already lose some precision in the intermediate result. Then you divide that already-inexact result by 60.

When you divide by 7d * 60, you only perform a division once (the multiplication 7d * 60 is exact, since 420 is representable), thus losing precision only once.

Note that double multiplication can sometimes be inexact too, but that's much less common.
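The first loss can be made visible with BigDecimal's double constructor, which shows the exact value actually stored (a sketch):

```java
import java.math.BigDecimal;

public class FirstLoss {
    public static void main(String[] args) {
        BigDecimal stored = new BigDecimal(450.00d / 7d); // exact decimal expansion of the double
        System.out.println(stored); // slightly off from the true 64.285714...
        // Multiplying back by 7 does not recover exactly 450: precision was already lost.
        System.out.println(stored.multiply(new BigDecimal(7)).compareTo(new BigDecimal(450)) == 0); // false
    }
}
```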

#5


0  

It comes down to the order of the operations, combined with the fact that doubles aren't exact:

450.00d / (7d * 60) --> a = 7d * 60 --> result = 450.00d / a

vs

450.00d / 7d / 60 --> a = 450.00d /7d --> result = a / 60
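Spelled out with the intermediate values (a sketch mirroring the two evaluation orders):

```java
public class TwoOrders {
    public static void main(String[] args) {
        double a = 7d * 60;           // 420.0: exact, no precision lost
        double result1 = 450.00d / a; // rounded once

        double b = 450.00d / 7d;      // rounded once already
        double result2 = b / 60;      // rounded a second time

        System.out.println(result1 == result2); // false: the orders disagree in the last bits
    }
}
```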