Why do some numbers lose accuracy when stored as floating point numbers?
For example, the decimal number 9.2 can be expressed exactly as a ratio of two decimal integers (92/10), both of which can be expressed exactly in binary (0b1011100/0b1010). However, the same ratio stored as a floating point number is never exactly equal to 9.2:
32-bit "single precision" float: 9.19999980926513671875
64-bit "double precision" float: 9.199999999999999289457264239899814128875732421875
How can such an apparently simple number be "too big" to express in 64 bits of memory?
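For reference, the two values quoted in the question can be reproduced with a short Python sketch. It assumes a Python build where `float` is an IEEE 754 double (true for CPython on common platforms) and uses `struct` to round-trip the value through single precision; `Decimal(x)` shows the exact value a float holds.

```python
import struct
from decimal import Decimal

# Round-trip 9.2 through a 32-bit float; the result is widened back to a double.
single = struct.unpack('f', struct.pack('f', 9.2))[0]
double = 9.2

print(Decimal(single))  # exact value held by the single-precision float
print(Decimal(double))  # exact value held by the double-precision float
```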
3 Answers
#1
161
In most programming languages, floating point numbers are represented a lot like scientific notation: with an exponent and a mantissa (also called the significand). A very simple number, say 9.2, is actually this fraction:
$5179139571476070 \times 2^{-49}$
Where the exponent is -49 and the mantissa is 5179139571476070. The reason it is impossible to represent some decimal numbers this way is that both the exponent and the mantissa must be integers. In other words, all floats must be an integer multiplied by an integer power of 2.
9.2 may be simply 92/10, but 10 cannot be expressed as $2^n$ if n is limited to integer values.
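The "integer times an integer power of 2" rule can be checked directly. The sketch below uses a hypothetical helper, `has_finite_binary_expansion`, that tests whether a fraction in lowest terms has a power-of-two denominator. That condition is necessary for an exact float; the mantissa must also fit in the available bits, which this sketch does not check.

```python
from fractions import Fraction

def has_finite_binary_expansion(q: Fraction) -> bool:
    """True if q equals integer / 2**n for some integer n >= 0."""
    d = q.denominator            # Fraction always keeps q in lowest terms
    return d & (d - 1) == 0      # a power of two has exactly one bit set

print(has_finite_binary_expansion(Fraction(92, 10)))  # 9.2 reduces to 46/5
print(has_finite_binary_expansion(Fraction(95, 10)))  # 9.5 reduces to 19/2

# The fraction the float 9.2 actually stores, in lowest terms.
ratio = (9.2).as_integer_ratio()
print(ratio)
```

The stored ratio is the fraction quoted above with one common factor of 2 removed.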
Seeing the Data
First, a function to see the components that make up a 32- or 64-bit float. Gloss over it if you only care about the output (example in Python):
import struct
from itertools import islice

def float_to_bin_parts(number, bits=64):
    if bits == 32:          # single precision
        int_pack      = 'I'
        float_pack    = 'f'
        exponent_bits = 8
        mantissa_bits = 23
        exponent_bias = 127
    elif bits == 64:        # double precision. all python floats are this
        int_pack      = 'Q'
        float_pack    = 'd'
        exponent_bits = 11
        mantissa_bits = 52
        exponent_bias = 1023
    else:
        raise ValueError('bits argument must be 32 or 64')
    # Pack the float into bytes, reinterpret them as an unsigned integer,
    # and render that integer as a zero-padded string of bits.
    bin_iter = iter(bin(struct.unpack(int_pack, struct.pack(float_pack, number))[0])[2:].rjust(bits, '0'))
    # Slice off the sign, exponent and mantissa fields, in that order.
    return [''.join(islice(bin_iter, x)) for x in (1, exponent_bits, mantissa_bits)]
There's a lot of complexity behind that function, and it'd be quite the tangent to explain, but if you're interested, the important resource for our purposes is the struct module.
Python's float is a 64-bit, double-precision number. In other languages such as C, C++, Java and C#, double-precision has a separate type double, which is often implemented as 64 bits.
When we call that function with our example, 9.2, here's what we get:
>>> float_to_bin_parts(9.2)
['0', '10000000010', '0010011001100110011001100110011001100110011001100110']
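As a cross-check that does not depend on the helper, Python's built-in `float.hex()` prints the same mantissa bits in hexadecimal together with the power-of-two exponent. This is a small sketch, assuming `float` is an IEEE 754 double; the variable names are only illustrative.

```python
x = 9.2
text = x.hex()                      # '0x1.<13 hex digits>p<exponent>'
print(text)

# Pull out the 13 hex digits after the radix point and show them as 52 bits.
hex_fraction = text.split('.')[1].split('p')[0]
bits = bin(int(hex_fraction, 16))[2:].rjust(52, '0')
print(bits)

print(float.fromhex(text) == x)     # the hex form round-trips exactly
```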
Interpreting the Data
You'll see I've split the return value into three components. These components are:
- Sign
- Exponent
- Mantissa (also called Significand, or Fraction)
Sign
The sign is stored in the first component as a single bit. It's easy to explain: 0 means the float is a positive number; 1 means it's negative. Because 9.2 is positive, our sign value is 0.
Exponent
The exponent is stored in the middle component as 11 bits. In our case, 0b10000000010. In decimal, that represents the value 1026. A quirk of this component is that you must subtract a number equal to $2^{(\#\text{ of bits}) - 1} - 1$ to get the true exponent; in our case, that means subtracting 0b1111111111 (decimal number 1023) to get the true exponent, 0b00000000011 (decimal number 3).
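The bias subtraction can be sketched with plain bit operations. The snippet assumes a 64-bit IEEE 754 double and reads the bytes big-endian so the sign bit comes first; the names `raw`, `stored_exponent` and `true_exponent` are illustrative.

```python
import struct

EXPONENT_BITS = 11                       # width of the exponent field in a double
bias = 2 ** (EXPONENT_BITS - 1) - 1      # the number that must be subtracted

# Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
raw = struct.unpack('>Q', struct.pack('>d', 9.2))[0]

# Shift away the 52 mantissa bits, then mask off the sign bit.
stored_exponent = (raw >> 52) & (2 ** EXPONENT_BITS - 1)
true_exponent = stored_exponent - bias

print(stored_exponent, bias, true_exponent)
```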
Mantissa
The mantissa is stored in the third component as 52 bits. However, there's a quirk to this component as well. To understand this quirk, consider a number in scientific notation, like this:
$6.0221413 \times 10^{23}$
The mantissa would be the 6.0221413. Recall that the mantissa in scientific notation always begins with a single non-zero digit. The same holds true for binary, except that binary only has two digits: 0 and 1. So the binary mantissa of a normalized number always starts with 1! When a float is stored, the 1 at the front of the binary mantissa is omitted to save space; we have to place it back at the front of our third element to get the true mantissa:
1.0010011001100110011001100110011001100110011001100110
This involves more than just a simple addition, because the bits stored in our third component actually represent the fractional part of the mantissa, to the right of the radix point.
When dealing with decimal numbers, we "move the decimal point" by multiplying or dividing by powers of 10. In binary, we can do the same thing by multiplying or dividing by powers of 2. Since our third element has 52 bits, we divide it by $2^{52}$ to move it 52 places to the right:
0.0010011001100110011001100110011001100110011001100110
In decimal notation, that's the same as dividing 675539944105574 by 4503599627370496 to get 0.1499999999999999. (This ratio is exact in 52 binary digits, but writing it out exactly in decimal takes far more digits than shown here, so the decimal figure is truncated.)
Now that we've transformed the third component into a fractional number, adding 1 gives the true mantissa.
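The same two steps (divide the stored bits by 2^52, then add the implicit 1) can be done with exact rational arithmetic so that no rounding creeps in. This is a sketch assuming a 64-bit IEEE 754 double; `fraction_field` and `mantissa` are illustrative names.

```python
import struct
from fractions import Fraction

raw = struct.unpack('>Q', struct.pack('>d', 9.2))[0]

fraction_field = raw & (2 ** 52 - 1)                # the low 52 bits, as an integer
mantissa = 1 + Fraction(fraction_field, 2 ** 52)    # restore the implicit leading 1

print(fraction_field)
print(mantissa)             # exact true mantissa as a fraction
print(mantissa * 2 ** 3)    # times 2 to the true exponent: the exact stored value
```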
Recapping the Components
- Sign (first component): 0 for positive, 1 for negative
- Exponent (middle component): Subtract $2^{(\#\text{ of bits}) - 1} - 1$ to get the true exponent
- Mantissa (last component): Divide by $2^{(\#\text{ of bits})}$ and add 1 to get the true mantissa
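The three rules in this recap can be combined into one small decoder. The function below, `parts_to_fraction`, is a hypothetical helper (not part of the original answer) that takes the three bit strings produced earlier and rebuilds the exact value as a `Fraction`. It is a sketch for normal numbers only and ignores zero, subnormals, infinities and NaN.

```python
from fractions import Fraction

def parts_to_fraction(sign, exponent, mantissa):
    """Rebuild the exact value from the sign, exponent and mantissa bit strings."""
    bias = 2 ** (len(exponent) - 1) - 1
    true_exponent = int(exponent, 2) - bias
    true_mantissa = 1 + Fraction(int(mantissa, 2), 2 ** len(mantissa))
    value = true_mantissa * Fraction(2) ** true_exponent
    return -value if sign == '1' else value

# The three components shown earlier for 9.2.
parts = ['0', '10000000010', '0010011001100110011001100110011001100110011001100110']
exact = parts_to_fraction(*parts)

print(exact)          # the exact stored value as a fraction
print(float(exact))   # converts back to the same float
```

Because the field widths are taken from the string lengths, the same sketch works for the 32-bit layout (8 exponent bits, 23 mantissa bits).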
Calculating the Number
Putting all three parts together, we're given this binary number:
$1.0010011001100110011001100110011001100110011001100110 \times 10^{11}$
Which we can then convert from binary to decimal:
$1.1499999999999999 \times 2^{3}$ (inexact!)
And multiply to reveal the final representation of the number we started with (9.2) after being stored as a floating point value:
9.1999999999999993
Representing as a Fraction
9.2
Now that we've built the number, it's possible to reconstruct it into a simple fraction:
$1.0010011001100110011001100110011001100110011001100110 \times 10^{11}$
Shift mantissa to a whole number:
$10010011001100110011001100110011001100110011001100110 \times 10^{11-110100}$
Convert to decimal:
$5179139571476070 \times 2^{3-52}$
Subtract the exponent:
$5179139571476070 \times 2^{-49}$
Turn negative exponent into division:
$5179139571476070 / 2^{49}$
Multiply exponent:
$5179139571476070 / 562949953421312$
Which equals:
9.1999999999999993
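The fraction derived above can be compared against the float with exact rational arithmetic. This is a minimal sketch using the standard `fractions` module; `stored` is an illustrative name.

```python
from fractions import Fraction

stored = Fraction(5179139571476070, 2 ** 49)

print(stored == Fraction(9.2))       # is this exactly what the double holds?
print(stored == Fraction(92, 10))    # is it exactly the decimal 9.2?
print(Fraction(92, 10) - stored)     # the exact representation error
print('%.17g' % 9.2)                 # the float shown to 17 significant digits
```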
9.5
>>> float_to_bin_parts(9.5)
['0', '10000000010', '0011000000000000000000000000000000000000000000000000']
Already you can see the mantissa is only 4 digits followed by a whole lot of zeroes. But let's go through the paces.
Assemble the binary scientific notation:
$1.0011 \times 10^{11}$
Shift the decimal point:
$10011 \times 10^{11-100}$
Subtract the exponent:
$10011 \times 10^{-1}$
Binary to decimal:
$19 \times 2^{-1}$
Negative exponent to division:
$19 / 2^{1}$
Multiply exponent:
$19 / 2$
Equals:
9.5
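For contrast with 9.2, a quick check (assuming IEEE 754 doubles) shows that 9.5 is stored with no error at all:

```python
from decimal import Decimal

ratio = (9.5).as_integer_ratio()
print(ratio)            # (19, 2)
print(Decimal(9.5))     # 9.5
```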
Further reading
- The Floating-Point Guide: What Every Programmer Should Know About Floating-Point Arithmetic, or, Why don't my numbers add up? (floating-point-gui.de)
- What Every Computer Scientist Should Know About Floating-Point Arithmetic (Goldberg 1991)
- IEEE Double-precision floating-point format (Wikipedia)
- Floating Point Arithmetic: Issues and Limitations (docs.python.org)
- Floating Point Binary
#2
22
This isn't a full answer (mhlester already covered a lot of good ground I won't duplicate), but I would like to stress how much the representation of a number depends on the base you are working in.
Consider the fraction 2/3
In good-ol' base 10, we typically write it out as something like
- 0.666...
- 0.666
- 0.667
When we look at those representations, we tend to associate each of them with the fraction 2/3, even though only the first representation is mathematically equal to the fraction. The second and third representations/approximations have an error on the order of 0.001, which is actually much worse than the error between 9.2 and 9.1999999999999993. In fact, the second representation isn't even rounded correctly! Nevertheless, we don't have a problem with 0.666 as an approximation of the number 2/3, so we shouldn't really have a problem with how 9.2 is approximated in most programs. (Yes, in some programs it matters.)
Number bases
So here's where number bases are crucial. If we were trying to represent 2/3 in base 3, then
$(2/3)_{10} = 0.2_3$
In other words, we have an exact, finite representation for the same number by switching bases! The take-away is that even though you can convert any number to any base, all rational numbers have exact finite representations in some bases but not in others.
To drive this point home, let's look at 1/2. It might surprise you that even though this perfectly simple number has an exact representation in base 10 and 2, it requires a repeating representation in base 3.
$(1/2)_{10} = 0.5_{10} = 0.1_2 = 0.1111\ldots_3$
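To experiment with other fractions and bases, here is a small sketch that writes a non-negative fraction in any base from 2 to 36 by long division, wrapping the repeating block (if there is one) in parentheses. The function name `expand` and the parenthesis notation are choices made for this illustration only.

```python
from fractions import Fraction

DIGITS = '0123456789abcdefghijklmnopqrstuvwxyz'

def expand(q: Fraction, base: int) -> str:
    """Write a non-negative fraction in `base`; a repeating block goes in parentheses."""
    whole, rem = divmod(q.numerator, q.denominator)
    int_digits = ''
    while whole:                                 # integer part, least significant digit first
        whole, d = divmod(whole, base)
        int_digits = DIGITS[d] + int_digits
    int_digits = int_digits or '0'

    frac_digits, seen = [], {}
    while rem and rem not in seen:               # long division until the remainder repeats
        seen[rem] = len(frac_digits)
        d, rem = divmod(rem * base, q.denominator)
        frac_digits.append(DIGITS[d])
    if not frac_digits:
        return int_digits
    if rem:                                      # a remainder came back: the digits cycle
        start = seen[rem]
        frac = ''.join(frac_digits[:start]) + '(' + ''.join(frac_digits[start:]) + ')'
    else:
        frac = ''.join(frac_digits)
    return int_digits + '.' + frac

print(expand(Fraction(2, 3), 10), expand(Fraction(2, 3), 3))
print(expand(Fraction(1, 2), 10), expand(Fraction(1, 2), 2), expand(Fraction(1, 2), 3))
print(expand(Fraction(92, 10), 2))   # 9.2 in binary repeats forever
```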
Why are floating point numbers inaccurate?
Because oftentimes they are approximating rationals that cannot be represented finitely in base 2 (the digits repeat), and in general they are approximating real (possibly irrational) numbers which may not be representable in finitely many digits in any base.
#3
6
While all of the other answers are good there is still one thing missing:
It is impossible to represent irrational numbers (e.g. π, sqrt(2), log(3), etc.) precisely!
That follows from what irrational means: such a number cannot be written as a ratio of two integers, so its digits never terminate or repeat in any integer base. No amount of bit storage in the world would be enough to hold even one of them digit by digit. Only symbolic arithmetic is able to preserve their precision.
However, if you limit your math needs to rational numbers only, the problem of precision becomes manageable. You would need to store a pair of (possibly very big) integers a and b to hold the number represented by the fraction a/b. All your arithmetic would have to be done on fractions, just like in high school math (e.g. a/b * c/d = ac/bd).
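Python's standard library already offers this kind of integer-pair arithmetic in the `fractions` module, so the idea can be tried without writing any new type. A brief sketch:

```python
from fractions import Fraction

a = Fraction(92, 10)     # exactly 9.2, kept as the integer pair 46/5
b = Fraction(1, 3)

product = a * b          # a/b * c/d = ac/bd, reduced to lowest terms
print(product)
print(a + b - b == a)    # exact: nothing is lost along the way

# The same comparison with binary floats and with exact fractions.
print(0.1 + 0.2 == 0.3)
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))
```

The trade-off is that numerators and denominators can grow without bound, so this is typically much slower than hardware floating point.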
But of course you would still run into the same kind of trouble when pi, sqrt, log, sin, etc. are involved.
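A quick illustration with the standard `math` module: the result of `math.sqrt(2)` is a float, which is itself a rational number, so squaring it cannot give exactly 2 in either float or exact fraction arithmetic.

```python
import math
from fractions import Fraction

root = math.sqrt(2)                           # a double, hence a rational approximation

float_square_is_two = (root * root == 2)      # squared with float rounding
exact_square_is_two = (Fraction(root) ** 2 == 2)  # squared with exact rational arithmetic

print(float_square_is_two, exact_square_is_two)
```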
TL;DR
For hardware accelerated arithmetic only a limited number of rational numbers can be represented. Every number that is not representable is approximated. Some numbers (i.e. irrational ones) can never be represented exactly, no matter the system.