如何打印浮点数的确切值?

First of all, this is not a floating point newbie question. I know results of floating point arithmetic (not to mention transcendental functions) usually cannot be represented exactly, and that most terminating decimals cannot be represented exactly as binary floating point numbers.

首先，这不是一个浮点新问题。我知道浮点算术(更不用说超越函数)的结果通常不能被精确地表示，而且大多数终止小数不能被精确地表示为二进制浮点数。

That said, each possible floating point value corresponds exactly to a diadic rational (a rational number p/q where q is a power of 2), which in turn has an exact decimal representation.

也就是说，每个可能的浮点值都对应一个二进有理数(一个有理数p/q, q是2的幂)，而这个有理数又有一个精确的十进制表示。

My question is: How do you find this exact decimal representation efficiently? sprintf and similar functions are usually only specified up to a number of significant digits to uniquely determine the original floating point value; they don't necessarily print the exact decimal representation. I know one algorithm I've used, but it's very slow, O(e^2) where e is the exponent. Here's an outline:

我的问题是:如何有效地找到这个精确的十进制表示?sprintf和类似的函数通常只指定若干位有效数字，以惟一地确定原始浮点值;它们不一定打印出精确的十进制表示。我知道我使用了一个算法,但是它非常缓慢,O(e ^ 2),e是指数。这是一个大纲:

Convert the mantissa to a decimal integer. You can either do this by pulling apart the bits to read the mantissa directly, or you can write a messy floating point loop that first multiplies the value by a power of two to put it in the range 1<=x<10, then pulls off a digit at a time by casting to int, subtracting, and multiplying by 10.
将尾数转换为十进制整数。你可以通过直接拉开一些阅读的尾数,或者你可以写一个混乱的浮点循环第一次繁殖2的幂的价值用范围1 < = x < 10,然后脱下一个数字一次铸造int,减去,并乘以10。
Apply the exponent by repeatedly multiplying or dividing by 2. This is an operation on the string of decimal digits you generated. Every ~3 multiplications will add an extra digit to the left. Every single dividion will add an extra digit to the right.
通过反复乘以或除以2来应用指数。这是对您生成的十进制数字字符串的操作。每做3次乘法都会在左边增加一个数字。每一个除数都会在右边增加一个数字。

Is this really the best possible? I doubt it, but I'm not a floating-point expert and I can't find a way to do the base-10 computations on the floating point representation of the number without running into a possibility of inexact results (multiplying or dividing by anything but a power of 2 is a lossy operation on floating point numbers unless you know you have free bits to work with).

这真的是最好的吗?我怀疑它,但我不是一个浮点专家,我不能找到一个办法八进制数数计算浮点表示的数量没有运行到一个不准确的结果的可能性(乘以或除以一个2的幂是一种有损操作浮点数,除非你知道你有*位一起工作)。

10 个解决方案

#1

This question has a bureaucratic part and an algorithmic part. A floating point number is stored internally as (2^e × m), where e is an exponent (itself in binary) and m is a mantissa. The bureaucratic part of the question is how to access this data, but R. seems more interested in the algorithmic part of the question, namely converting (2^e × m) to a fraction (a/b) in decimal form. The answer to the bureaucratic question in several languages is "frexp" (which is an interesting detail that I didn't know before today).

这个问题有官僚主义的部分，也有算法的部分。一个浮点数存储内部(2 e×m),e是一个指数(二进制本身)和m是一个尾数。官僚的部分问题是如何访问这些数据,但r .似乎更感兴趣的算法问题的一部分,即(2 e×m)转换成一个分数小数形式(a / b)。用几种语言回答官僚问题的答案是“frexp”(这是一个有趣的细节，我今天还不知道)。

It is true that at first glance, it takes O(e²) work just to write 2^e in decimal, and more time still for the mantissa. But, thanks to the magic of the Schonhage-Strassen fast multiplication algorithm, you can do it in Õ(e) time, where the tilde means "up to log factors". If you view Schonhage-Strassen as magic, then it's not that hard to think of what to do. If e is even, you can recursively compute 2^e/2, and then square it using fast multiplication. On the other hand if e is odd, you can recursively compute 2^e-1 and then double it. You have to be careful to check that there is a version of Schonhage-Strassen in base 10. Although it is not widely documented, it can be done in any base.

的确，乍一看，只用O(e2)就能把2e写成小数，而尾数还需要更多的时间。但是，多亏了Schonhage-Strassen快速乘法算法的魔力，你可以在O (e)时间内完成，在O (e)时间内，波浪线表示“达到log因子”。如果你认为Schonhage-Strassen是魔法，那你就不难想到该怎么做了。如果e是偶数，你可以递归地计算2e/2，然后用快速乘法求平方。另一方面，如果e是奇数，你可以递归地计算2 -1然后把它加倍。你必须小心检查是否有一个版本的Schonhage-Strassen在基地10。虽然它没有被广泛记录，但它可以在任何基础上完成。

Converting a very long mantissa from binary to base 10 is not exactly the same question, but it has a similar answer. You can divide the mantissa into two halves, m = a 2^k + b. Then recursively convert a and b to base 10, convert 2^k to base 10, and do another fast multiplication to compute m in base 10.

将一个非常长的尾数从二进制转换为基数10不是完全相同的问题，但是它有相似的答案。你可以把尾数分成两半,m = 2 k + b,然后递归地10 a和b转换为基地,10 2 ^ k转换为基地,并且做一个快速乘法计算m基地10。

The abstract result behind all of this is that you can convert integers from one base to another in Õ(N) time.

这一切背后的抽象结果是，可以在O (N)时间内将整数从一个碱基转换为另一个碱基。

If the question is about standard 64-bit floating point numbers, then it's too small for the fancy Schonhage-Strassen algorithm. In this range you can instead save work with various tricks. One approach is to store all 2048 values of 2^e in a lookup table, and then work in the mantissa with asymmetric multiplication (in between long multiplication and short multiplication). Another trick is to work in base 10000 (or a higher power of 10, depending on architecture) instead of base 10. But, as R. points out in the comments, 128-bit floating point numbers already allow large enough exponents to call into question both lookup tables and standard long multiplication. As a practical matter, long multiplication is the fastest up to a handful of digits, then in a significant medium range one can use Karatsuba multiplication or Toom-Cook multiplication, and then after that a variation of Schonhage-Strassen is best not just in theory but also in practice.

如果问题是关于标准的64位浮点数，那么它对于花哨的Schonhage-Strassen算法来说太小了。在这个范围内，您可以使用各种技巧来保存工作。一种方法是将所有2048个2e值存储在查找表中，然后使用不对称乘法(在长乘法和短乘法之间)处理尾数。另一个技巧是使用基数10000(或者更高的幂10，取决于架构)而不是基数10。但是，正如r在评论中指出的，128位浮点数已经允许足够大的指数对查找表和标准长乘法提出质疑。实际上，长乘法运算速度最快的是几位数字，然后在一个重要的中等范围内，我们可以使用卡拉托巴乘法或托姆-库克乘法，然后，Schonhage-Strassen的变异不仅在理论上是最好的，在实践中也是最好的。

Actually, the big integer package GMP already has Õ(N) time radix conversion, as well as good heuristics for which choice of multiplication algorithm. The only difference between their solution and mine is that instead of doing any big arithmetic in base 10, they compute large powers of 10 in base 2. In this solution, they also need fast division, but that can be obtained from fast multiplication in any of several ways.

实际上，大整数包GMP已经有O (N)个时间基数的转换，对于乘法算法的选择也有很好的启发式。它们的解和我的唯一不同之处在于，它们不是用10为底的大算术，而是用2为底的10为大幂。在这个解中，它们也需要快速除法，但是可以通过几种方法中的任意一种从快速乘法中得到。

#2

I see you've accepted an answer already but here are a couple of open source implementations of this conversion you might want to look at:

我看到你已经接受了答案，但是这里有几个开源的实现，你可能想看看:

David Gay's dtoa() function in dtoa.c (http://www.netlib.org/fp/dtoa.c).

David Gay的dtoa()函数在dtoa中。c(http://www.netlib.org/fp/dtoa.c)。
The function ___printf_fp() in file /stdio-common/printf_fp.c in glibc (http://ftp.gnu.org/gnu/glibc/glibc-2.11.2.tar.gz, for example).

函数___printf_fp()位于文件/stdio-common/printf_fp中。c在glibc中(例如http://ftp.gnu.org/gnu/glibc/glibc-2.11.2.tar.gz)。

Both will print as many digits as you ask for in a %f type printf (as I've written about here: http://www.exploringbinary.com/print-precision-of-dyadic-fractions-varies-by-language/ and http://www.exploringbinary.com/print-precision-of-floating-point-integers-varies-too/).

两者都将在%f类型printf中打印所需的数字(正如我在这里所写的:http://www.exploringbinary.com/print-precision-of dy- fractions-varis-varies-bylanguage/以及http://www.exploringbinary.com/print-floating-point - integers-too/)。

#3

There's been a lot of work on printing floating-point numbers. The gold standard is to print out a decimal equivalent of minimal length such that when the decimal equivalent is read back in, you get the same floating-point number that you started with, no matter what the rounding mode is during readback. You can read about the algorithm in the excellent paper by Burger and Dybvig.

印刷浮点数有很多工作要做。黄金标准是打印一个与最小长度相等的十进制数，这样当将十进制等价物读入时，您将得到与开始时相同的浮点数，无论读回期间的舍入模式是什么。你可以在Burger和Dybvig的优秀论文中读到这个算法。

#4

Although it's C# and your question is tagged with C, Jon Skeet has code to convert a double to its exact representation as a string: http://www.yoda.arachsys.com/csharp/DoubleConverter.cs

虽然它是c#，并且您的问题是用C标记的，但是Jon Skeet有代码将double转换为字符串的确切表示:http://www.yoda.arachsys.com/csharp/DoubleConverter.cs

From a quick glance, it does not appear to be too hard to port to C, and even easier to write in C++.

快速浏览一下，它看起来并不难移植到C，甚至更容易用c++编写。

Upon further reflection, it appears that Jon's algorithm is also O(e^2), as it also loops over the exponent. However, that means the algorithm is O(log(n)^2) (where n is the floating-point number), and I'm not sure you can convert from base 2 to base 10 in better than log-squared time.

在进一步反思,乔恩的算法也是O(e ^ 2),也因为它循环指数。然而,这意味着算法是O(log(n)^ 2)(其中n是浮点数),我不确定你可以从基地2转换为以10为底比log²时间。

#5

Well being no floating point expert myself, I'd defer to using a well tested open source library.

我自己也不是浮点专家，所以我建议使用经过良好测试的开源库。

The GNU MPFR is a good one.

GNU MPFR是一个很好的例子。

The MPFR library is a C library for multiple-precision floating-point computations with correct rounding. The main goal of MPFR is to provide a library for multiple-precision floating-point computation which is both efficient and has a well-defined semantics.

MPFR库是一个C库，用于进行精确四舍五入的多精度浮点计算。MPFR的主要目标是为多精度浮点计算提供一个库，既高效又具有良好定义的语义。

#6

sprintf and similar functions are usually only specified up to a number of significant digits to uniquely determine the original floating point value; they don't necessarily print the exact decimal representation.

sprintf和类似的函数通常只指定若干位有效数字，以惟一地确定原始浮点值;他们不必打印精确的小数表示。

You can ask for more significant digits than the default:

你可以要求比默认数字更有意义的数字:

printf("%.100g\n", 0.1);

prints 0.1000000000000000055511151231257827021181583404541015625.

0.1000000000000000055511151231257827021181583404541015625打印。

#7

If you want more exact results, why not use fixed point math instead? Conversions are quick. Error is known and can be worked around. Not an exact answer to your question, but a different idea for you.

如果你想要更精确的结果，为什么不用定点数学呢?快速转换。错误是已知的，可以处理。对你的问题不是一个确切的答案，而是一个不同的想法。

#8

Off the top of my head, why not break the exponent down into a sum of binary exponents first, then all your operations are loss-less.

在我的脑海中，为什么不先把指数分解成二进制指数的和，然后你所有的操作都没有损失。

I.e.

即。

10^2 = 2^6 + 2^5 + 2^2

Then sum:

然后总结:

mantissa<<6 + mantissa<<5 + mantissa<<2

I'm thinking that breaking it down would be on the O(n) on the the number of digits, the shifting is O(1), and the summing is O(n) digits...

我想把它分解成O(n)的数字，移位是O(1)求和是O(n)的数字…

You would have to have an integer class big enough to store the results, of course...

当然，您必须有一个足够大的整数类来存储结果……

Let me know - I'm curious about this, it really made me think. :-)

让我知道-我很好奇，它真的让我思考。:-)

#9

There are 3 ways

有三个方法

printing numbers in bin or hex

在bin或hex中打印数字

This is the most precise way. I prefer hex because it is more like base 10 for reading/feel like for example F.8h = 15.5 no precision loss here.

这是最精确的方法。我更喜欢十六进制，因为它更像以10为基数的阅读/感觉像是F.8h = 15.5，这里没有精度损失。
printing in dec but only the relevant digits

在dec上打印，但只有相关的数字。

With this I mean only digits which can have 1 in your number represented as close as possible.

有了这个，我的意思是只有数字中可以有1的数字代表尽可能接近。

num of integer digits are easy and precise (no precision loss):

整数位的num简单、准确(无精度损失):
```
// n10 - base 10 integer digits
// n2  - base 2 integer digits
n10=log10(2^n2)
n10=log2(2^n2)/log2(10)
n10=n2/log2(10)
n10=ceil(n2*0.30102999566398119521373889472449)
// if fist digit is 0 and n10 > 1 then n10--
```
num of fractional digits are more tricky and empirically I found this:

分数位数更棘手，根据经验我发现:
```
// n10 - base 10 fract. digits
// n2  - base 2 fract. digits >= 8
n10=0; if (n02==8) n10=1;
else if (n02==9) n10=2;
else if (n02> 9)
    {
    n10=((n02-9)%10);
         if (n10>=6) n10=2;
    else if (n10>=1) n10=1;
    n10+=2+(((n02-9)/10)*3);
    }
```
if you make a dependency table n02 <-> n10 then you see that constant 0.30102999566398119521373889472449 is still present, but at start from 8 bits because less cannot represent 0.1 with satisfactory precision (0.85 - 1.15). because of negative exponents of base 2 the dependency is not linear, instead it patterns. This code works for small bit count (<=52) but at larger bit counts there can be error because used pattern do not fit log10(2) or 1/log2(10) exactly.

如果您创建一个依赖表n02 <-> n10，那么您将看到常量0.3010299956698119521373889472449仍然存在，但是从8位开始，因为less不能以令人满意的精度表示0.1(0.85 - 1.15)。因为以2为底的负指数关系不是线性的，而是模式。这段代码适用于小位计数(<=52)，但在大位计数时可能会出现错误，因为使用的模式并不完全适合log10(2)或1/log2(10)。

for larger bit counts I use this:

对于较大的比特计数，我使用以下方法:
```
n10=7.810+(9.6366363636363636363636*((n02>>5)-1.0));
```
but that formula is 32bit aligned !!! and also bigger bit count ads error to it

但是这个公式是32位对齐的!!!还有更大的比特计数广告错误

P.S. further analysis of binary representation of decadic numbers

进一步分析十进制数的二进制表示形式
```
0.1
0.01
0.001
0.0001
...
```
may reveal the exact pattern repetition which would lead to exact number of relevant digits for any bit count.

可能会显示准确的模式重复，这将导致精确的数字，任何位计数。

for clarity:

为了清晰:
```
8 bin digits -> 1 dec digits
9 bin digits -> 2 dec digits
10 bin digits -> 3 dec digits
11 bin digits -> 3 dec digits
12 bin digits -> 3 dec digits
13 bin digits -> 3 dec digits
14 bin digits -> 3 dec digits
15 bin digits -> 4 dec digits
16 bin digits -> 4 dec digits
17 bin digits -> 4 dec digits
18 bin digits -> 4 dec digits
19 bin digits -> 5 dec digits
20 bin digits -> 6 dec digits
21 bin digits -> 6 dec digits
22 bin digits -> 6 dec digits
23 bin digits -> 6 dec digits
24 bin digits -> 6 dec digits
25 bin digits -> 7 dec digits
26 bin digits -> 7 dec digits
27 bin digits -> 7 dec digits
28 bin digits -> 7 dec digits
29 bin digits -> 8 dec digits
30 bin digits -> 9 dec digits
31 bin digits -> 9 dec digits
32 bin digits -> 9 dec digits
33 bin digits -> 9 dec digits
34 bin digits -> 9 dec digits
35 bin digits -> 10 dec digits
36 bin digits -> 10 dec digits
37 bin digits -> 10 dec digits
38 bin digits -> 10 dec digits
39 bin digits -> 11 dec digits
40 bin digits -> 12 dec digits
41 bin digits -> 12 dec digits
42 bin digits -> 12 dec digits
43 bin digits -> 12 dec digits
44 bin digits -> 12 dec digits
45 bin digits -> 13 dec digits
46 bin digits -> 13 dec digits
47 bin digits -> 13 dec digits
48 bin digits -> 13 dec digits
49 bin digits -> 14 dec digits
50 bin digits -> 15 dec digits
51 bin digits -> 15 dec digits
52 bin digits -> 15 dec digits
53 bin digits -> 15 dec digits
54 bin digits -> 15 dec digits
55 bin digits -> 16 dec digits
56 bin digits -> 16 dec digits
57 bin digits -> 16 dec digits
58 bin digits -> 16 dec digits
59 bin digits -> 17 dec digits
60 bin digits -> 18 dec digits
61 bin digits -> 18 dec digits
62 bin digits -> 18 dec digits
63 bin digits -> 18 dec digits
64 bin digits -> 18 dec digits
```
And at last do not forget to round the cut off digits !!! That means if digit after the last relevant digit is >=5 in dec than last relevant digit should be +1 ... and if it is already 9 than you must go to previous digit and so on...

最后别忘了把被剪掉的手指围起来!!!也就是说，如果在最后一个相关数字后面的数字是>=5，那么在12月的最后一个相关数字应该是+1……如果它已经是9，那么你必须去前一个数字，等等。

print exact value

打印精确值

To print exact value of fractional binary number just print fractional n digits where n is the number of fractional bits because the value represented is sum of this values so the number of fractional decimals cannot be bigger than the num of fractional digits of LSB. Stuff above (bullet #2) is relevant for storing decimal number to float (or printing just relevant decimals).

要打印分数二进制数的精确值，只需打印分数n位数，其中n是小数位数，因为表示的值是这个值的和，所以小数小数的小数位数不能大于LSB的小数位数。上面的内容(项目#2)与将十进制数存储为浮点数相关(或者只打印相关的小数)。

negative powers of two exact values...

两个精确值的负幂……

2^- 1 = 0.5
2^- 2 = 0.25
2^- 3 = 0.125
2^- 4 = 0.0625
2^- 5 = 0.03125
2^- 6 = 0.015625
2^- 7 = 0.0078125
2^- 8 = 0.00390625
2^- 9 = 0.001953125
2^-10 = 0.0009765625
2^-11 = 0.00048828125
2^-12 = 0.000244140625
2^-13 = 0.0001220703125
2^-14 = 0.00006103515625
2^-15 = 0.000030517578125
2^-16 = 0.0000152587890625
2^-17 = 0.00000762939453125
2^-18 = 0.000003814697265625
2^-19 = 0.0000019073486328125
2^-20 = 0.00000095367431640625

now negative powers of 10 printed by exact value style for 64 bit doubles:

现在，按精确值样式打印10的负功率为64位双精度:

10^+ -1 = 0.1000000000000000055511151231257827021181583404541015625
        = 0.0001100110011001100110011001100110011001100110011001101b
10^+ -2 = 0.01000000000000000020816681711721685132943093776702880859375
        = 0.00000010100011110101110000101000111101011100001010001111011b
10^+ -3 = 0.001000000000000000020816681711721685132943093776702880859375
        = 0.000000000100000110001001001101110100101111000110101001111111b
10^+ -4 = 0.000100000000000000004792173602385929598312941379845142364501953125
        = 0.000000000000011010001101101110001011101011000111000100001100101101b
10^+ -5 = 0.000010000000000000000818030539140313095458623138256371021270751953125
        = 0.000000000000000010100111110001011010110001000111000110110100011110001b
10^+ -6 = 0.000000999999999999999954748111825886258685613938723690807819366455078125
        = 0.000000000000000000010000110001101111011110100000101101011110110110001101b
10^+ -7 = 0.0000000999999999999999954748111825886258685613938723690807819366455078125
        = 0.0000000000000000000000011010110101111111001010011010101111001010111101001b
10^+ -8 = 0.000000010000000000000000209225608301284726753266340892878361046314239501953125
        = 0.000000000000000000000000001010101111001100011101110001000110000100011000011101b
10^+ -9 = 0.0000000010000000000000000622815914577798564188970686927859787829220294952392578125
        = 0.0000000000000000000000000000010001001011100000101111101000001001101101011010010101b
10^+-10 = 0.00000000010000000000000000364321973154977415791655470655996396089904010295867919921875
        = 0.00000000000000000000000000000000011011011111001101111111011001110101111011110110111011b
10^+-11 = 0.00000000000999999999999999939496969281939810930172340963650867706746794283390045166015625
        = 0.00000000000000000000000000000000000010101111111010111111111100001011110010110010010010101b
10^+-12 = 0.00000000000099999999999999997988664762925561536725284350612952266601496376097202301025390625
        = 0.00000000000000000000000000000000000000010001100101111001100110000001001011011110101000010001b
10^+-13 = 0.00000000000010000000000000000303737455634003709136034716842278413651001756079494953155517578125
        = 0.00000000000000000000000000000000000000000001110000100101110000100110100001001001011101101000001b
10^+-14 = 0.000000000000009999999999999999988193093545598986971343290729163921781719182035885751247406005859375
        = 0.000000000000000000000000000000000000000000000010110100001001001101110000110101000010010101110011011b
10^+-15 = 0.00000000000000100000000000000007770539987666107923830718560119501514549256171449087560176849365234375
        = 0.00000000000000000000000000000000000000000000000001001000000011101011111001111011100111010101100001011b
10^+-16 = 0.00000000000000009999999999999999790977867240346035618411149408467364363417573258630000054836273193359375
        = 0.00000000000000000000000000000000000000000000000000000111001101001010110010100101111101100010001001101111b
10^+-17 = 0.0000000000000000100000000000000007154242405462192450852805618492324772617063644020163337700068950653076171875
        = 0.0000000000000000000000000000000000000000000000000000000010111000011101111010101000110010001101101010010010111b
10^+-18 = 0.00000000000000000100000000000000007154242405462192450852805618492324772617063644020163337700068950653076171875
        = 0.00000000000000000000000000000000000000000000000000000000000100100111001001011101110100011101001001000011101011b
10^+-19 = 0.000000000000000000099999999999999997524592683526013185572915905567688179926555402943222361500374972820281982421875
        = 0.000000000000000000000000000000000000000000000000000000000000000111011000001111001001010011111011011011010010101011b
10^+-20 = 0.00000000000000000000999999999999999945153271454209571651729503702787392447107715776066783064379706047475337982177734375
        = 0.00000000000000000000000000000000000000000000000000000000000000000010111100111001010000100001100100100100100001000100011b

now negative powers of 10 printed by relevant decimal digits only style (i am more used to this) for 64bit doubles:

现在用相关的十进制数字只打印10的负幂(我更习惯这个)来打印64位双精度:

10^+ -1 = 0.1
10^+ -2 = 0.01
10^+ -3 = 0.001
10^+ -4 = 0.0001
10^+ -5 = 0.00001
10^+ -6 = 0.000001
10^+ -7 = 0.0000001
10^+ -8 = 0.00000001
10^+ -9 = 0.000000001
10^+-10 = 0.0000000001
10^+-11 = 0.00000000001
10^+-12 = 0.000000000001
10^+-13 = 0.0000000000001
10^+-14 = 0.00000000000001
10^+-15 = 0.000000000000001
10^+-16 = 0.0000000000000001
10^+-17 = 0.00000000000000001
10^+-18 = 0.000000000000000001
10^+-19 = 0.0000000000000000001
10^+-20 = 0.00000000000000000001

hope it helps :)

希望它能帮助:)

#10

-2

You don't. The closest you can come to that is dumping the bytes.

你不。最接近的是转储字节。

#1