Why are floating-point numbers printed so differently?

Date: 2022-02-24 11:34:39

It's fairly common knowledge that most decimal fractions cannot be stored exactly as floating-point numbers (when the IEEE-754 format is used). So one shouldn't do this:

0.3 - 0.2 === 0.1; // very wrong

... as it evaluates to false, unless an arbitrary-precision type/class is used instead (BigDecimal in Java/Ruby, BCMath in PHP, Math::BigInt/Math::BigFloat in Perl, to name a few).

Yet I wonder why, when one prints the result of this expression, 0.3 - 0.2, scripting languages (Perl and PHP) give 0.1, while "virtual-machine" ones (Java, JavaScript and Erlang) give something closer to 0.09999999999999998 instead?

And why is it also inconsistent in Ruby? Version 1.8.6 (codepad) gives 0.1, while version 1.9.3 (ideone) gives 0.0999...
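
Both outputs from the question can be reproduced inside a single language. The following is a minimal JavaScript sketch, assuming a standard engine such as Node.js; the choice of 15 significant digits is only an illustration of a "rounded" default, not a claim about what any particular Perl or PHP version does internally.

```javascript
const diff = 0.3 - 0.2;

// Default conversion: the shortest decimal string that reads back as the same double.
const shortest = String(diff);

// Rounded to 15 significant digits; Number() then drops the trailing zeros.
const rounded = String(Number(diff.toPrecision(15)));

console.log(shortest);     // 0.09999999999999998
console.log(rounded);      // 0.1
console.log(diff === 0.1); // false: the stored value is the same whichever way it is printed
```

The stored value never changes; only the number of digits requested from the conversion does.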

5 answers

#1


4  

Floating-point numbers are printed differently because printing is done for different purposes, so different choices are made about how to do it.

Printing a floating-point number is a conversion operation: A value encoded in an internal format is converted to a decimal numeral. However, there are choices about the details of the conversion.

(A) If you are doing precise mathematics and want to see the actual value represented by the internal format, then the conversion must be exact: It must produce a decimal numeral that has exactly the same value as the input. (Each floating-point number represents exactly one number. A floating-point number, as defined in the IEEE 754 standard, does not represent an interval.) At times, this may require producing a very large number of digits.

(B) If you do not need the exact value but do need to convert back and forth between the internal format and decimal, then you need to convert it to a decimal numeral precisely (and accurately) enough to distinguish it from any other result. That is, you must produce enough digits that the result is different from what you would get by converting numbers that are adjacent in the internal format. This may require producing a large number of digits, but not so many as to be unmanageable.

(C) If you only want to give the reader a sense of the number, and do not need to produce the exact value in order for your application to function as desired, then you only need to produce as many digits as your particular application needs.
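
The three kinds of conversion can be placed side by side. This is a sketch in JavaScript, assuming an engine that follows the current ECMAScript specification (where toFixed accepts up to 100 fraction digits); the 3-digit display precision is an arbitrary choice for illustration.

```javascript
const x = 0.1;

// (A) Exact: every digit of the stored binary value (0.1 needs 55 decimal places).
const exact = x.toFixed(55);

// (B) Round-trip: enough digits to recover the same double when parsed back.
const roundTrip = String(x);

// (C) Display only: a digit count picked by the application (3 here, arbitrarily).
const display = x.toFixed(3);

console.log(exact);     // 0.1000000000000000055511151231257827021181583404541015625
console.log(roundTrip); // 0.1
console.log(display);   // 0.100
console.log(Number(roundTrip) === x); // true
```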

Which of these should a conversion do?

Different languages have different defaults because they were developed for different purposes, or because it was not expedient during development to do all the work necessary to produce exact results, or for various other reasons.

(A) requires careful code, and some languages or implementations of them do not provide, or do not guarantee to provide, this behavior.

(B) is required by Java, I believe. However, as we saw in a recent question, it can have some unexpected behavior. (65.12 is printed as “65.12” because the latter has enough digits to distinguish it from nearby values, but 65.12-2 is printed as “63.120000000000005” because a different floating-point value is closer to 63.12, so you need the extra digits to distinguish them.)
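
The 65.12 example can be checked in JavaScript as well. This sketch assumes that the engine's default number-to-string conversion produces round-trip digits, as the ECMAScript specification describes; it illustrates behavior (B) and says nothing about how Java implements it.

```javascript
const a = 65.12;
const b = 65.12 - 2;

console.log(String(a));   // 65.12
console.log(String(b));   // 63.120000000000005
console.log(b === 63.12); // false: b is not the double nearest to 63.12

// The extra digits are exactly what makes the printed form read back unchanged.
console.log(Number(String(b)) === b); // true
```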

(C) is what some languages use by default. It is, in essence, wrong, since no single value for how many digits to print can be suitable for all applications. Indeed, we have seen over decades that it fosters continuing misconceptions about floating-point, largely by concealing the true values involved. It is, however, easy to implement, and hence is attractive to some implementors. Ideally, a language should by default print the correct value of a floating-point number. If fewer digits are to be displayed, the number of digits should be selected only by the application implementor, hopefully including consideration of the appropriate number of digits to produce the desired results.

Worse, some languages, in addition to not displaying the actual value or enough digits to distinguish it, do not even guarantee that the digits produced are correct in some sense (such as being the value you would get by rounding the exact value to the number of digits shown). When programming in an implementation that does not provide a guarantee about this behavior, you are not doing engineering.

#2


7  

As for PHP, the output depends on the precision ini setting:

ini_set('precision', 15);
print 0.3 - 0.2; // 0.1

ini_set('precision', 17);
print 0.3 - 0.2; // 0.099999999999999978

This may also be the cause in other languages.
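
The same effect can be imitated in JavaScript, where the precision is passed per call instead of through a global setting. A small sketch, with the caveat that toPrecision keeps trailing zeros, unlike the PHP output above:

```javascript
const value = 0.3 - 0.2;

const p15 = value.toPrecision(15);
const p17 = value.toPrecision(17);

console.log(p15); // 0.100000000000000
console.log(p17); // 0.099999999999999978

// Converting back to a number and printing it drops the trailing zeros.
console.log(String(Number(p15))); // 0.1
```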

#3


2  

PHP automatically rounds the number to a configured precision when printing it.

Floating-point numbers in general aren't exact (as you noted), and you should use the language-specific round() function if you need a comparison with only a few decimal places. Otherwise, take the absolute value of the difference, and test whether it is within a given tolerance.

PHP example from php.net:

$a = 1.23456789;
$b = 1.23456780;
$epsilon = 0.00001;
if(abs($a - $b) < $epsilon) {
  echo "true";
}
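
The same tolerance test can be written in JavaScript. In this sketch, nearlyEqual is a hypothetical helper name and the default tolerance is simply the epsilon from the PHP example; a suitable tolerance depends on the magnitude of the values being compared.

```javascript
// Hypothetical helper: true when a and b differ by less than the given tolerance.
function nearlyEqual(a, b, epsilon = 0.00001) {
  return Math.abs(a - b) < epsilon;
}

console.log(nearlyEqual(1.23456789, 1.23456780)); // true
console.log(nearlyEqual(0.3 - 0.2, 0.1, 1e-12));  // true
console.log(0.3 - 0.2 === 0.1);                   // false
```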

As for the Ruby issue, the two sites appear to be using different versions: Codepad uses 1.8.6, while ideone uses 1.9.3. However, it's more likely related to a config somewhere.

#4


2  

If we want this property

  • any two different floats have different printed representations

or an even stronger one, useful for a REPL,

  • the printed representation, when read back, yields the original value unchanged

then I see 3 solutions for printing a float/double with a base 2 internal representation in base 10:

  1. print the EXACT representation
  2. print enough decimal digits (with proper rounding)
  3. print the shortest decimal representation that can be reinterpreted unchanged

Solution 1 is always possible: since in base two the float number is an_integer * 2^an_exponent, its exact base 10 representation has a finite number of digits.
Unfortunately, this can result in very long strings... For example, 1.0e-10 is represented exactly as 1.0000000000000000364321973154977415791655470655996396089904010295867919921875e-10
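
The finite expansion follows from the identity m / 2^k = m * 5^k / 10^k. The sketch below works this out in JavaScript with BigInt; decompose and exactDecimal are hypothetical helper names, and the code assumes a finite, non-negative double and an engine with BigInt support.

```javascript
// Split a finite, non-negative double into { mantissa, exponent }
// such that x === mantissa * 2^exponent, using its IEEE 754 bit pattern.
function decompose(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  const bits = view.getBigUint64(0);
  const expField = Number((bits >> 52n) & 0x7ffn);
  const fraction = bits & ((1n << 52n) - 1n);
  // Normal numbers carry an implicit leading 1 bit; subnormals do not.
  const mantissa = expField === 0 ? fraction : fraction | (1n << 52n);
  const exponent = (expField === 0 ? 1 : expField) - 1075;
  return { mantissa, exponent };
}

// Exact decimal expansion of a finite, non-negative double.
function exactDecimal(x) {
  const { mantissa, exponent } = decompose(x);
  if (exponent >= 0) return (mantissa << BigInt(exponent)).toString();
  const places = -exponent;
  // mantissa / 2^places equals mantissa * 5^places / 10^places,
  // so at most `places` decimal digits are needed.
  const scaled = (mantissa * 5n ** BigInt(places)).toString().padStart(places + 1, "0");
  const whole = scaled.slice(0, scaled.length - places);
  const frac = scaled.slice(scaled.length - places).replace(/0+$/, "");
  return frac ? `${whole}.${frac}` : whole;
}

console.log(exactDecimal(0.5));   // 0.5
console.log(exactDecimal(0.1));   // 0.1000000000000000055511151231257827021181583404541015625
console.log(exactDecimal(1e-10)); // the 86-decimal-place value quoted above, in plain notation
```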

Solution 2 is easy: you use printf with 17 significant digits for an IEEE-754 double (for example %.17g)...
Drawback: it's neither exact nor the shortest! If you enter 0.1, you get 0.10000000000000001
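
A JavaScript sketch of the fixed-digit approach, using toPrecision as a stand-in for printf. It shows that 17 significant digits are enough to read the value back unchanged, while 15 digits are not always enough.

```javascript
const seventeen = (0.1).toPrecision(17);
console.log(seventeen);                 // 0.10000000000000001
console.log(Number(seventeen) === 0.1); // true: 17 digits read back as the same double

const sum = 0.1 + 0.2;
const lossy = Number(sum.toPrecision(15)) === sum; // 15 digits collapse the value to 0.3
const safe = Number(sum.toPrecision(17)) === sum;
console.log(lossy); // false
console.log(safe);  // true
```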

Solution 3 is the best one for REPL languages: if you enter 0.1, it prints 0.1.
Unfortunately it is not found in standard libraries (a shame).
At least Scheme, Python and recent Squeak/Pharo Smalltalk do it right, and I think Java does too.
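
JavaScript's default number-to-string conversion is specified to behave like solution 3, so both properties listed earlier can be checked directly. A small sketch; the sample values are arbitrary.

```javascript
// Entering 0.1 prints 0.1.
console.log(String(0.1)); // 0.1

// The next double above 0.1 (one unit in the last place, 2^-56) prints differently.
const next = 0.1 + 2 ** -56;
console.log(String(next)); // 0.10000000000000002

// Every printed form reads back as the original value.
const samples = [0.1, next, 0.3 - 0.2, 1e-10, 65.12 - 2];
const allRoundTrip = samples.every((v) => Number(String(v)) === v);
console.log(allRoundTrip); // true
```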

#5


0  

As for JavaScript, base 2 is used internally for calculations.

> 0.2 + 0.4
0.6000000000000001

Because of that, JavaScript can only deliver exact results if the resulting base-2 number is not periodic.

0.6 is 0.1001 1001 1001 1001 ... in base 2 (periodic), whereas 0.5 is not periodic and is therefore printed correctly.
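
The binary expansions can be derived by long division in base 2. In this JavaScript sketch, binaryDigits is a hypothetical helper that works on an integer fraction num/den, so the digits come from exact integer arithmetic and not from an already rounded double.

```javascript
// First `count` binary digits of num/den (with 0 <= num < den), by long division in base 2.
function binaryDigits(num, den, count) {
  let digits = "";
  let rest = num;
  for (let i = 0; i < count && rest !== 0; i++) {
    rest *= 2;
    if (rest >= den) {
      digits += "1";
      rest -= den;
    } else {
      digits += "0";
    }
  }
  return "0." + digits;
}

console.log(binaryDigits(6, 10, 16)); // 0.1001100110011001 (the 1001 group repeats forever)
console.log(binaryDigits(5, 10, 16)); // 0.1 (terminates, so 0.5 is stored exactly)

console.log(0.2 + 0.4);            // 0.6000000000000001
console.log(0.5 + 0.25 === 0.75);  // true: all three values terminate in base 2
```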
