How to deal with excess precision in floating-point computations?

Time: 2021-09-20 16:30:13

In my numerical simulation I have code similar to the following snippet

double x;
do {
  x = /* some computation */;
} while (x <= 0.0);
/* some algorithm that requires x to be (precisely) larger than 0 */

With certain compilers (e.g. gcc) on certain platforms (e.g. Linux with x87 math), x may be computed in higher than double precision ("with excess precision"). (Update: When I talk of precision here, I mean precision /and/ range.) Under these circumstances it is conceivable that the comparison (x <= 0) returns false even though x becomes 0 the next time it is rounded down to double precision. (And there's no guarantee that x isn't rounded down at an arbitrary point in time.)
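
The effect can be reproduced deterministically, without depending on x87 code generation, by holding the intermediate value in a long double explicitly. The following is only a minimal sketch under that assumption: the names wide_result and comparison_changes_after_rounding are made up for illustration, and the discrepancy only shows up where long double is actually wider than double (as with x87 extended precision).

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for a result still held in a wider register format:
// a quarter of the smallest positive subnormal double.
inline long double wide_result() {
  return static_cast<long double>(std::numeric_limits<double>::denorm_min()) * 0.25L;
}

// True if `wide` passes the "> 0" test as computed, but no longer does once
// it has been rounded to double.
inline bool comparison_changes_after_rounding(long double wide) {
  const bool positive_before = wide > 0.0L;
  const bool positive_after = static_cast<double>(wide) > 0.0;
  // Rounding never turns a non-positive value into a positive one.
  assert(!positive_after || positive_before);
  return positive_before && !positive_after;
}
```

Where long double is wider than double, comparison_changes_after_rounding(wide_result()) reports the mismatch described above; where the two types coincide, the product is already rounded to zero and no mismatch is reported.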

Is there any way to perform this comparison that

  • is portable,
  • works in code that gets inlined,
  • has no performance impact, and
  • doesn't exclude some arbitrary range (0, eps)?

I tried to use (x < std::numeric_limits<double>::denorm_min()) but that seemed to significantly slow down the loop when working with SSE2 math. (I know that denormals can slow down a computation, but I didn't expect them to be slower to just move around and compare.)

Update: An alternative is to use volatile to force x into memory before the comparison, e.g. by writing

} while (*((volatile double*)&x) <= 0.0);

However, depending on the application and the optimizations applied by the compiler, this solution can introduce a noticeable overhead too.

Update: The problem with any tolerance is that it's quite arbitrary, i.e. it depends on the specific application or context. I'd prefer to just do the comparison without excess precision, so that I don't have to make any additional assumptions or introduce some arbitrary epsilons into the documentation of my library functions.

5 Answers

#1


6  

As Arkadiy stated in the comments, an explicit cast ((double)x) <= 0.0 should work - at least according to the standard.

C99:TC3, 5.2.4.2.2 §8:

Except for assignment and cast (which remove all extra range and precision), the values of operations with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type. [...]
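
As a rough sketch of how the cast would be applied to the loop from the question: draw_positive and its compute parameter are hypothetical names standing in for the "some computation" placeholder. Note that this relies on the compiler actually following the quoted rule; as far as I know, GCC only promises to round on casts when -fexcess-precision=standard is in effect, so treat this as an illustration of the standard's wording rather than a guaranteed fix.

```cpp
#include <cassert>

// Repeats `compute` until its result, rounded to double, is strictly positive.
// `compute` is any callable returning double (hypothetical stand-in for the
// computation in the question).
template <typename Compute>
double draw_positive(Compute compute) {
  double x;
  do {
    x = compute();
    // The cast is a no-op if x already has plain double range and precision;
    // otherwise, per the quoted rule, it must discard the excess.
  } while (static_cast<double>(x) <= 0.0);
  assert(x > 0.0);  // precondition of the algorithm that follows the loop
  return x;
}
```

A call such as draw_positive([] { return 0.5; }) returns immediately, while a callable that yields non-positive values keeps the loop running until a positive one appears.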


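If you're using GCC on x86, you can use the flags -mpc32, -mpc64 and -mpc80 to set the precision of x87 floating-point operations to single, double and extended double precision, respectively. Note that these flags only control how the significand is rounded; the exponent range of the x87 registers stays extended, so they do not remove excess range.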

#2


2  

In your question, you stated that using volatile will work but that there'll be a huge performance hit. What about using the volatile variable only during the comparison, allowing x to be held in a register?

double x; /* might have excess precision */
volatile double x_dbl; /* guaranteed to be double precision */
do {
  x = /* some computation */;
  x_dbl = x;
} while (x_dbl <= 0.0);
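
The same idea can be wrapped in a small helper so that the volatile object stays local to the comparison. This is just a sketch; store_as_double and retry_until_positive are hypothetical names, and whether the extra store and reload is cheap enough depends on the compiler and the surrounding code.

```cpp
#include <cassert>

// Rounds a value to double by forcing it through a volatile object.
inline double store_as_double(double x) {
  volatile double stored = x;  // the store to a volatile object cannot be elided
  return stored;               // reloaded value has plain double range and precision
}

// The loop from above, with the volatile confined to the helper.
// `compute` is any callable returning double (hypothetical stand-in).
template <typename Compute>
double retry_until_positive(Compute compute) {
  double x;
  do {
    x = compute();
  } while (store_as_double(x) <= 0.0);
  assert(store_as_double(x) > 0.0);
  return x;
}
```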

You should also check whether you can speed up the comparison with the smallest subnormal value by using long double explicitly and caching this value, i.e.

const long double dbl_denorm_min = static_cast<long double>(std::numeric_limits<double>::denorm_min());

and then compare

x < dbl_denorm_min

I'd assume that a decent compiler would do this automatically, but one never knows...
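
Put together, the cached comparison might look like the sketch below. The names kDblDenormMin and below_smallest_double are illustrative only. One caveat, assuming the default round-to-nearest mode: values strictly between denorm_min()/2 and denorm_min() would round up to denorm_min() as a double, so this test rejects slightly more than the values that actually round to zero. It errs on the safe side.

```cpp
#include <cassert>
#include <limits>

// Smallest positive subnormal double, widened once so that the comparison
// below needs no conversion on each iteration.
static const long double kDblDenormMin =
    static_cast<long double>(std::numeric_limits<double>::denorm_min());

// True if x (possibly held with excess range and precision) lies below every
// positive double, i.e. the loop should run again.
inline bool below_smallest_double(long double x) {
  assert(kDblDenormMin > 0.0L);  // sanity check on the cached constant
  return x < kDblDenormMin;
}
```

The loop condition then becomes while (below_smallest_double(x)).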

#3


1  

I wonder whether you have the right stopping criterion. It sounds like x <= 0 is an exceptional condition rather than a terminating condition, and that the terminating condition is easier to satisfy. Maybe there should be a break statement inside your while loop that stops the iteration when some tolerance is met. For example, many algorithms terminate when two successive iterates are sufficiently close to each other.
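
To illustrate what such a criterion looks like, here is a sketch using Newton's iteration for a square root. It is not taken from the question; newton_sqrt, rel_tol and max_iter are hypothetical names and defaults chosen only for the example.

```cpp
#include <cassert>
#include <cmath>

// Newton iteration for sqrt(a), a > 0. Stops as soon as two successive
// iterates agree to within a relative tolerance, or after max_iter steps.
inline double newton_sqrt(double a, double rel_tol = 1e-12, int max_iter = 100) {
  assert(a > 0.0);
  double x = a;  // initial guess
  for (int i = 0; i < max_iter; ++i) {
    const double next = 0.5 * (x + a / x);
    if (std::fabs(next - x) <= rel_tol * std::fabs(next)) {
      return next;  // successive iterates are close enough
    }
    x = next;
  }
  return x;  // tolerance not met within max_iter iterations
}
```

The tolerance here is exactly the kind of context-dependent constant the question wants to avoid, so this only fits cases where the loop really is an iteration toward a limit.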

#4


0  

Well, GCC has an option, -fexcess-precision, which governs the behavior you are discussing (its default setting, -fexcess-precision=fast, permits excess precision). It also has a flag, -ffloat-store, which solves the problem you are discussing.

"Do not store floating point variables in registers. This prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have."

I doubt that this solution has no performance impact, but the cost is probably not prohibitive. Random googling suggests it costs about 20%. In fact, I don't think there is a solution that is both portable and free of performance impact, since forcing a chip not to use excess precision will often involve some non-free operation. However, this is probably the solution you want.
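
Before paying for -ffloat-store, it may be worth checking whether a given build evaluates double expressions in a wider format at all. The standard macro FLT_EVAL_METHOD from <cfloat> reports this. The sketch below uses hypothetical function names; as far as I know GCC reports 2 when targeting x87 math and 0 with -mfpmath=sse -msse2, but verify that on your own toolchain.

```cpp
#include <cassert>
#include <cfloat>

// FLT_EVAL_METHOD: 0 = operations are evaluated in their own type,
// 1 = float and double are evaluated as double,
// 2 = everything is evaluated as long double,
// negative = indeterminable or implementation-defined.
inline bool double_may_carry_excess_precision() {
  return !(FLT_EVAL_METHOD == 0 || FLT_EVAL_METHOD == 1);
}

// Human-readable summary for logging or a configure-time check.
inline const char* describe_eval_method() {
  if (FLT_EVAL_METHOD == 0) return "evaluated in the operand type";
  if (FLT_EVAL_METHOD == 1) return "float and double evaluated as double";
  if (FLT_EVAL_METHOD == 2) return "evaluated as long double";
  return "indeterminable or implementation-defined";
}

// Optional guard for code that must not run with excess precision.
inline void require_no_excess_precision() {
  assert(!double_may_carry_excess_precision());
}
```

A library could call double_may_carry_excess_precision() once and choose the slower, store-based comparison only when it returns true.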

#5


0  

Be sure to make that check an absolute value. It needs to be an epsilon around zero, above and below.
