I'm reading CS:APP, and regarding casts it says that when casting from int to float, the number cannot overflow, but it may be rounded.
That seemed odd to me, as I didn't see what there was to round, so I tried it out. I expected rounding only for very large integers (near INT_MAX/INT_MIN), but it also happens at values around a hundred million. (I'm not sure exactly where it first happens.)
Why does this happen? The range of float far exceeds that of int. One might say that floating point numbers cannot be represented exactly, but when converting from int to double there is no change in value. The advantage of double over float is that it has greater range and precision. But float still has enough range to "encapsulate" integers, and precision shouldn't really matter as integers have no decimal places (well, all 0), or am I thinking about this wrong?
Here's some output that I got (here is the code: http://pastebin.com/K3E3A6Ni):
FLT_MAX = 340282346638528859811704183484516925440.000000
INT_MAX = 2147483647
(float)INT_MAX = 2147483648.000000
(double)INT_MAX = 2147483647.000000
INT_MIN = -2147483648
(float)INT_MIN = -2147483648.000000
====other values close to INT_MIN INT_MAX====
INT_MAX-1 = 2147483646
(float)INT_MAX-1 = 2147483648.000000
INT_MIN+1 = -2147483647
(float)INT_MIN+1 = -2147483648.000000
INT_MAX-2 = 2147483645
(float)INT_MAX-2 = 2147483648.000000
INT_MAX-10 = 2147483637
(float)INT_MAX-10 = 2147483648.000000
INT_MAX-100 = 2147483547
(float)INT_MAX-100 = 2147483520.000000
INT_MAX-1000 = 2147482647
(float)INT_MAX-1000 = 2147482624.000000
(float)1.234.567.809 = 1234567808.000000
(float)1.234.567.800 = 1234567808.000000
(float)1.000.000.005 = 1000000000.000000
(float)800.000.003 = 800000000.000000
(float)500.000.007 = 500000000.000000
(float)100.000.009 = 100000008.000000
2 Answers
#1
9
I'm assuming that by float you mean a 32-bit IEEE-754 binary floating point value, by double you mean a 64-bit IEEE-754 binary floating point value, and by int you mean a 32-bit integer.
Why does this happen? The range of float far exceeds that of int
Yes, but the precision of float is only 7-9 decimal digits. To be more specific, the significand is only 24 bits wide... so if you're trying to store 32 bits of information in there, you're going to have problems.
but when converting from int to double there is no change in value
Sure, because a double has a 53-bit significand - plenty of room for a 32-bit integer there!
To think of it another way, the gap between consecutive int values is always 1... whereas the gap between consecutive float values starts very, very small... but increases as the magnitude of the value increases. It gets to "more than 2" well before you hit the limit of int... so you get to the stage where not every int can be exactly represented.
To think of it yet another way, you can simply use the pigeon-hole principle... even ignoring NaN values, there can be at most 2^32 float values, and at least one of those is not the exact value of an int - take 0.5, for example. There are 2^32 int values, therefore at least one int value doesn't have an exact float representation.
#2
7
A typical float that is implemented with the 32-bit IEEE-754 representation has only 24 bits for the significand, which allows for about 7 decimal digits of precision. So you'll see rounding as soon as you get past 2^24 = 16,777,216, i.e. in the tens of millions.
(For a double, the significand has 53 bits, and 2^53 ≈ 9×10^15.)