
Date: 2021-09-20 16:30:13

In my numerical simulation I have code similar to the following snippet


double x;
do {
  x = /* some computation */;
} while (x <= 0.0);
/* some algorithm that requires x to be (precisely) larger than 0 */

With certain compilers (e.g. gcc) on certain platforms (e.g. Linux with x87 math), x may be computed in higher than double precision ("with excess precision"). (Update: When I talk of precision here, I mean precision /and/ range.) Under these circumstances it is conceivable that the comparison (x <= 0.0) returns false even though x becomes 0 the next time it is rounded down to double precision. (And there's no guarantee that x isn't rounded down at an arbitrary point in time.)

Is there any way to perform this comparison that


  • is portable,
  • works in code that gets inlined,
  • has no performance impact and
  • doesn't exclude some arbitrary range (0, eps)?

I tried to use (x < std::numeric_limits<double>::denorm_min()) but that seemed to significantly slow down the loop when working with SSE2 math. (I know that denormals can slow down a computation, but I didn't expect them to be slower to just move around and compare.)

Update: An alternative is to use volatile to force x into memory before the comparison, e.g. by writing


} while (*((volatile double*)&x) <= 0.0);

However, depending on the application and the optimizations applied by the compiler, this solution can introduce a noticeable overhead too.


Update: The problem with any tolerance is that it's quite arbitrary, i.e. it depends on the specific application or context. I'd prefer to just do the comparison without excess precision, so that I don't have to make any additional assumptions or introduce some arbitrary epsilons into the documentation of my library functions.


5 solutions



As Arkadiy stated in the comments, an explicit cast ((double)x) <= 0.0 should work - at least according to the standard.

C99:TC3, 5.2.4.2.2 §8:


Except for assignment and cast (which remove all extra range and precision), the values of operations with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type. [...]

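If you're using GCC on x86, you can use the flags -mpc32, -mpc64 and -mpc80 to set the precision of x87 floating-point operations to single, double and extended double precision, respectively. Note that these flags only change the significand precision, not the exponent range.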
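As a minimal sketch of the explicit cast in the loop from the question: next_value is a hypothetical stand-in for "some computation" and is not part of the original code. Whether the cast really strips excess precision depends on how strictly the compiler follows the standard here, so treat this as an illustration rather than a guarantee.

```cpp
// Hypothetical stand-in for "some computation"; not from the original post.
static double next_value(int i) {
    return static_cast<double>(i) - 2.0;  // yields -2, -1, 0, 1, ...
}

// Repeats the computation until the value, converted to double, is positive.
double first_positive() {
    double x;
    int i = 0;
    do {
        x = next_value(i++);
        // The cast is meant to discard any extra range and precision
        // before the comparison takes place.
    } while (static_cast<double>(x) <= 0.0);
    return x;
}
```

With this stand-in, the loop rejects -2, -1 and 0 and stops at the first strictly positive value.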




In your question, you stated that using volatile will work but that there'll be a huge performance hit. What about using the volatile variable only during the comparison, allowing x to be held in a register?


double x; /* might have excess precision */
volatile double x_dbl; /* guaranteed to be double precision */
do {
  x = /* some computation */;
  x_dbl = x;
} while (x_dbl <= 0.0);
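The same idea can be wrapped in a small helper so that the volatile object lives only for the duration of the comparison. This is a sketch; round_to_double and is_nonpositive are hypothetical names, and the assumption is that the compiler performs a real store and reload for the volatile access.

```cpp
// Hypothetical helper: routes the value through a volatile double, which is
// expected to force a store to (and reload from) a true double-sized object.
inline double round_to_double(double x) {
    volatile double stored = x;
    return stored;
}

// Compares only after the value has passed through the volatile object.
inline bool is_nonpositive(double x) {
    return round_to_double(x) <= 0.0;
}
```

The loop condition would then read while (is_nonpositive(x)), leaving x itself free to stay in a register.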

You should also check whether you can speed up the comparison with the smallest subnormal value by using long double explicitly and caching the value, i.e.

const long double dbl_denorm_min = static_cast<long double>(std::numeric_limits<double>::denorm_min());

and then compare


x < dbl_denorm_min

I'd assume that a decent compiler would do this automatically, but one never knows...
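Put together, the cached constant and the comparison might look like the sketch below. The helper name below_smallest_double is hypothetical, and whether this is actually faster than comparing against a double subnormal is an assumption that needs measuring on the target platform.

```cpp
#include <limits>

// Smallest positive subnormal double, widened to long double once and cached.
static const long double dbl_denorm_min =
    static_cast<long double>(std::numeric_limits<double>::denorm_min());

// Hypothetical helper: true if x is smaller than every positive double.
static inline bool below_smallest_double(long double x) {
    return x < dbl_denorm_min;
}
```

The loop condition would then be while (below_smallest_double(x)).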




I wonder whether you have the right stopping criterion. It sounds like x <= 0 is an exception condition rather than a terminating condition, and that the terminating condition is easier to satisfy. Maybe there should be a break statement inside your while loop that stops the iteration when some tolerance is met. For example, many algorithms terminate when two successive iterates are sufficiently close to each other.

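To illustrate that kind of criterion, here is a sketch using Newton's iteration for a square root. The algorithm, the function name newton_sqrt and the default tolerance are assumptions chosen for the example; they are not taken from the question.

```cpp
#include <cmath>

// Hypothetical example: Newton's iteration for sqrt(a), assuming a > 0.
double newton_sqrt(double a, double rel_tol = 1e-14, int max_iter = 100) {
    double x = a > 1.0 ? a : 1.0;  // crude starting guess
    for (int i = 0; i < max_iter; ++i) {
        const double next = 0.5 * (x + a / x);
        // Terminate when two successive iterates agree to the tolerance.
        const bool converged = std::fabs(next - x) <= rel_tol * std::fabs(next);
        x = next;
        if (converged) {
            break;
        }
    }
    return x;  // last iterate, whether or not the tolerance was reached
}
```

Here the tolerance is relative to the current iterate, so it does not carve out a fixed interval around zero.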


Well, GCC has a flag, -fexcess-precision, which governs the behavior you are discussing. It also has a flag, -ffloat-store, which works around the problem you are discussing.


"Do not store floating point variables in registers. This prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have."


I doubt that this solution has no performance impact, but the cost is probably not excessive. Random googling suggests it costs about 20%. Actually, I don't think there is a solution which is both portable and free of performance impact, since forcing a chip not to use excess precision will often involve some non-free operation. However, this is probably the solution you want.
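To find out whether a given build is affected at all, one option is to look at the standard FLT_EVAL_METHOD macro from <cfloat>. The sketch below only reports what the implementation claims; eval_method_description is a hypothetical helper, and the macro may not reflect the effect of options such as -ffloat-store.

```cpp
#include <cfloat>

// Hypothetical helper: describes how this build says it evaluates
// floating-point expressions, based on FLT_EVAL_METHOD.
const char* eval_method_description() {
    switch (FLT_EVAL_METHOD) {
        case 0:  return "operations evaluated in the range and precision of their type";
        case 1:  return "float and double operations evaluated as double";
        case 2:  return "all operations evaluated as long double";
        default: return "indeterminable or implementation-defined";
    }
}
```

A value of 0 suggests that the comparison in the question needs no special treatment on that build.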




Be sure to make that check an absolute value. It needs to be an epsilon around zero, above and below.
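A sketch of such a symmetric check follows. The name near_zero is hypothetical, and eps is an application-specific tolerance that the caller has to choose.

```cpp
#include <cmath>

// Hypothetical helper: true if x lies within eps of zero on either side.
inline bool near_zero(double x, double eps) {
    return std::fabs(x) <= eps;
}
```

Note that this rejects the whole interval [-eps, eps], which is exactly the arbitrary range the question hoped to avoid.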



