Floating-point calculation gives a different result when using float instead of double

Time: 2022-06-02 17:08:00

I have the following line of code.

hero->onBeingHit(ENEMY_ATTACK_POINT * (1.0 - hero->getDefensePercent()));
  • void onBeingHit(int decHP) accepts an integer and updates the hero's health points.
  • float getDefensePercent() is a getter that returns the hero's defense percentage.
  • ENEMY_ATTACK_POINT is a macro constant, defined as #define ENEMY_ATTACK_POINT 20.

Let's say hero->getDefensePercent() returns 0.1. So the calculation is

20 * (1.0 - 0.1)  =  20 * (0.9)  =  18

Whenever I tried it with the following code (no f appended to 1.0)

hero->onBeingHit(ENEMY_ATTACK_POINT * (1.0 - hero->getDefensePercent()));

I got 17.

But for the following code (f appended after 1.0)

hero->onBeingHit(ENEMY_ATTACK_POINT * (1.0f - hero->getDefensePercent()));

I got 18.

What's going on? Does the f suffix matter at all, given that hero->getDefensePercent() already returns a float?

3 solutions

#1


10  

What's going on? Why isn't the integer result 18 in both cases?

The problem is that the result of the floating point expression is rounded towards zero when being converted to an integer value (in both cases).

0.1 can't be represented exactly as a binary floating point value (in either case). The compiler converts the literal to a binary IEEE754 floating point number and decides whether to round up or down to the nearest representable value. The processor then multiplies this value at runtime, and the result is truncated to get an integer value.

Ok, but since both double and float behave like that, why do I get 18 in one of the two cases, but 17 in the other case? I'm confused.

Your code takes the result of the function, 0.1f (a float), and then calculates 20 * (1.0 - 0.1f), which is a double expression, whereas 20 * (1.0f - 0.1f) is a float expression. The float version happens to round to 18.0 in single precision and converts to 18, while the double expression is slightly less than 18.0 (about 17.99999997) and is truncated to 17.
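
The difference can be reproduced in isolation. The following is a minimal sketch, assuming an IEEE 754 platform; getDefensePercent() here is a hypothetical free function standing in for the hero's getter, and the two damage functions are illustrative names, not part of the original code.

```cpp
// Hypothetical stand-in for hero->getDefensePercent(); returns a float, as in the question.
float getDefensePercent() { return 0.1f; }

// 1.0 is a double literal, so the float operand is promoted and the
// whole expression is evaluated in double: about 17.99999997.
int damageWithDoubleLiteral() {
    double raw = 20 * (1.0 - getDefensePercent());
    return static_cast<int>(raw);  // conversion to int truncates toward zero
}

// 1.0f is a float literal, so the expression is evaluated in float
// and the product rounds to 18.0f before the conversion.
int damageWithFloatLiteral() {
    float raw = 20 * (1.0f - getDefensePercent());
    return static_cast<int>(raw);
}
```

On such a platform damageWithDoubleLiteral() yields 17 and damageWithFloatLiteral() yields 18, matching the behavior described in the question.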

Unless you know exactly how IEEE754 binary floating point numbers are constructed from decimal numbers, you can't easily predict whether the stored value will be slightly less or slightly greater than the decimal number you've entered in your code. So you shouldn't count on this. Don't try to fix such an issue by appending f to one of the numbers and saying "now it works, so I'll leave this f there", because another value will behave differently again.

Why does the type of the expression depend on the presence of this f?

This is because a floating point literal in C and C++ is of type double by default. If you add the f, it's a float. The result of a floating point expression has the "greater" of the operand types. Combining a double with an int still yields a double, and combining an int with a float yields a float. So the result of your expression is either a float or a double.
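
These promotion rules can be checked at compile time. The sketch below assumes a C++11 compiler; it only restates the rules from the paragraph above as static_assert checks, so the file compiles only if every claim holds.

```cpp
#include <type_traits>

// Literal types: no suffix means double, the f suffix means float.
static_assert(std::is_same<decltype(1.0), double>::value, "1.0 is a double");
static_assert(std::is_same<decltype(1.0f), float>::value, "1.0f is a float");

// Mixed operands take the "greater" type.
static_assert(std::is_same<decltype(1.0 - 0.1f), double>::value, "double - float is double");
static_assert(std::is_same<decltype(1.0f - 0.1f), float>::value, "float - float is float");

// An int operand is converted to the floating point type of the other operand.
static_assert(std::is_same<decltype(20 * 1.0), double>::value, "int * double is double");
static_assert(std::is_same<decltype(20 * 1.0f), float>::value, "int * float is float");
```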

Ok, but I don't want to round to zero. I want to round to the nearest number.

To fix this issue, add one half to the result before converting it to an integer:

hero->onBeingHit(ENEMY_ATTACK_POINT * (1.0 - hero->getDefensePercent()) + 0.5);

In C++11, there is std::round() (declared in <cmath>) for that. Earlier versions of the standard had no such function for rounding to the nearest integer. (Please see comments for details.)
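
As a sketch of the C++11 route: std::lround, a sibling of std::round in <cmath>, rounds to the nearest integer and returns a long directly. The helper name toHitPoints is hypothetical and not part of the original code.

```cpp
#include <cmath>

// Hypothetical helper: converts a fractional damage value to whole hit points.
int toHitPoints(double rawDamage) {
    // std::lround rounds to the nearest integer, with halfway cases
    // rounded away from zero, and returns a long.
    return static_cast<int>(std::lround(rawDamage));
}
```

With this helper, a call such as hero->onBeingHit(toHitPoints(ENEMY_ATTACK_POINT * (1.0 - hero->getDefensePercent()))) would pass 18 regardless of whether the literal carries the f suffix.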

If you don't have std::round, you can write it yourself. Take care when dealing with negative numbers. When converting to an integer, the number will be truncated (rounded towards zero), which means that negative values will be rounded up, not down. So we have to subtract one half if the number is negative:

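// Rounds half away from zero. Named roundToInt to avoid clashing with round() from <cmath>.
int roundToInt(double x) {
    return static_cast<int>((x < 0.0) ? (x - 0.5) : (x + 0.5));
}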
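
To see why the sign check matters, the sketch below compares the plain "add one half" approach with the sign-aware one. Both function names are illustrative and chosen for this comparison only.

```cpp
// Adds one half and truncates. Correct only for non-negative input.
int addHalfThenTruncate(double x) {
    return static_cast<int>(x + 0.5);
}

// Moves the value away from zero by one half before truncating,
// so negative input is rounded to the nearest integer as well.
int roundHalfAwayFromZero(double x) {
    return static_cast<int>((x < 0.0) ? (x - 0.5) : (x + 0.5));
}
```

For -2.7 the first function returns -2 (since -2.2 truncates toward zero), while the second returns -3, which is the nearest integer.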

#2


4  

1.0 is interpreted as a double, as opposed to 1.0f which is seen by the compiler as a float.

The f suffix simply tells the compiler which literal is a float and which is a double.

As the name implies, a double uses twice the storage of a float (typically 64 bits versus 32), which gives it more than twice the precision: a 53-bit significand versus 24 bits. In general a double has 15 to 16 decimal digits of precision, while a float has only about 7.
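
The precision of each type can be queried through std::numeric_limits. This is a small sketch assuming an IEEE 754 platform; the constant names are illustrative. Note that digits10 is the guaranteed lower bound on decimal digits, which is why it reports 6 for float rather than the commonly quoted 7.

```cpp
#include <limits>

// Significand width in bits, including the implicit leading bit.
constexpr int floatBits = std::numeric_limits<float>::digits;    // 24 on IEEE 754
constexpr int doubleBits = std::numeric_limits<double>::digits;  // 53 on IEEE 754

// Decimal digits guaranteed to survive a decimal -> binary -> decimal round trip.
constexpr int floatDigits10 = std::numeric_limits<float>::digits10;    // 6 on IEEE 754
constexpr int doubleDigits10 = std::numeric_limits<double>::digits10;  // 15 on IEEE 754
```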

This loss of precision makes truncation errors much more likely to surface.

See MSDN (C++)

#3


4  

This happens because the result is more precise when using double, i.e. 1.0.

Try rounding your result, which gives a more accurate integer result after conversion:

hero->onBeingHit(ENEMY_ATTACK_POINT * (1.0 - hero->getDefensePercent()) + 0.5);

Note that adding 0.5 and then truncating to int rounds the result to the nearest integer (for non-negative values): if your result is 17.999..., it becomes 18.499..., which is truncated to 18.
