I understand GCC's -ffast-math flag can greatly increase speed for floating-point operations and goes outside of the IEEE standard, but I can't seem to find information on what is really happening when it's on. Can anyone please explain some of the details and maybe give a clear example of how something would change if the flag was on or off?
I did try digging through S.O. for similar questions but couldn't find anything explaining the workings of ffast-math.
2 solutions
#1
52
As you mentioned, it allows optimizations that do not preserve strict IEEE compliance.
An example is this:
x = x*x*x*x*x*x*x*x;
to
x *= x;
x *= x;
x *= x;
Because floating-point arithmetic is not associative, the ordering and factoring of the operations will affect results due to round-off. Therefore, this optimization is not done under strict FP behavior.
EDIT: I haven't actually checked to see if GCC actually does this particular optimization. But the idea is the same.
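To make the round-off point concrete, here is a minimal sketch in C. The function names (`pow8_chain`, `pow8_squares`, `sum_left`, `sum_right`, `show_reassociation`) are made up for illustration. It assumes IEEE double arithmetic; whether a given GCC version actually rewrites either expression under -ffast-math is not something this sketch proves.

```c
#include <stdio.h>

/* x^8 as written in the source: seven multiplications, left to right. */
double pow8_chain(double x) { return x * x * x * x * x * x * x * x; }

/* x^8 by repeated squaring: three multiplications, so the intermediate
   roundings happen at different points than in pow8_chain. */
double pow8_squares(double x) {
    x *= x;
    x *= x;
    x *= x;
    return x;
}

/* The same three-term sum with the two possible groupings. */
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

/* Hypothetical helper: prints both forms so the output of a normal build
   can be compared with the output of a -ffast-math build. */
void show_reassociation(void) {
    /* 1.0 is far below half an ulp of 1e20, so under strict IEEE double
       rules b + c rounds back to -1e20: sum_left gives 1, sum_right gives 0. */
    double a = 1e20, b = -1e20, c = 1.0;
    printf("(a + b) + c  = %.17g\n", sum_left(a, b, c));
    printf("a + (b + c)  = %.17g\n", sum_right(a, b, c));
    printf("chain(1.1)   = %.17g\n", pow8_chain(1.1));
    printf("squares(1.1) = %.17g\n", pow8_squares(1.1));
}
```

Calling `show_reassociation()` from `main` and building once with `gcc -O2` and once with `gcc -O2 -ffast-math` lets you see whether your compiler regroups anything. Without the flag the compiler has to keep the grouping you wrote.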
#2
173
-ffast-math does a lot more than just break strict IEEE compliance.
First of all, of course, it does break strict IEEE compliance, allowing e.g. the reordering of instructions to something which is mathematically the same (ideally) but not exactly the same in floating point.
Second, it disables setting errno after single-instruction math functions, which means avoiding a write to a thread-local variable (this can make a 100% difference for those functions on some architectures).
Third, it makes the assumption that all math is finite, which means that no checks for NaN (or infinity) are made in places where they would have detrimental effects. It is simply assumed that this isn't going to happen.
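A small sketch of what the finite-math assumption can do to ordinary code. The helper names `is_nan_by_compare` and `replace_nan` are hypothetical. Under strict IEEE rules both behave as the comments say. With -ffast-math (which implies -ffinite-math-only) the compiler is allowed to assume the NaN case never happens, so it may fold either check to "false"; whether it actually does depends on the compiler version and optimization level.

```c
#include <math.h>

/* Classic self-comparison test: under IEEE rules x != x is true only
   when x is NaN. A compiler that assumes "no NaNs" may reduce this to 0. */
int is_nan_by_compare(double x) { return x != x; }

/* Replace a NaN input with a fallback value. The isnan() check is exactly
   the kind of test that may be optimized away under -ffinite-math-only. */
double replace_nan(double x, double fallback) {
    return isnan(x) ? fallback : x;
}
```

So code that relies on NaN as a "missing value" marker is a poor match for this flag.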
Fourth, it enables reciprocal approximations for division and reciprocal square root.
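To show why replacing a division by a multiplication with the reciprocal is not exact, here is a minimal sketch. The function names are made up, and it assumes plain IEEE double evaluation (for example SSE2 on x86-64, not x87 extended precision). The second function does by hand what the reciprocal option permits the compiler to do to the first.

```c
/* A correctly rounded division: one rounding step. */
double divide(double x, double y) { return x / y; }

/* The same quotient via the reciprocal: 1.0 / y is rounded first, then the
   product is rounded again, so the result can be off in the last bit. */
double times_reciprocal(double x, double y) {
    double r = 1.0 / y;
    return x * r;
}
```

With x = 3.0 and y = 10.0 and the assumption above, `divide` returns the double nearest to 0.3, while `times_reciprocal` returns 0.30000000000000004. The multiplication is usually cheaper than the division, which is the point of the trade.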
Further, it disables signed zero (code assumes signed zero does not exist, even if the target supports it) and rounding math, which enables among other things constant folding at compile-time.
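A sketch of the signed-zero case, with hypothetical function names. Under IEEE rules in the default rounding mode, (-0.0) + 0.0 is +0.0, so a compiler that honors signed zeros cannot simplify `x + 0.0` to `x`. Once signed zeros are assumed not to matter, that simplification becomes legal and the sign of a zero result may change.

```c
#include <math.h>

/* Looks like a no-op, but for x == -0.0 strict IEEE arithmetic yields +0.0.
   With signed zeros ignored, the compiler may just return x unchanged. */
double add_zero(double x) { return x + 0.0; }

/* Returns 1 if add_zero(x) has its sign bit set, 0 otherwise. */
int sign_of_add_zero(double x) { return signbit(add_zero(x)) != 0; }
```

In a strict build `sign_of_add_zero(-0.0)` is 0. In a -ffast-math build it may be 1, which matters if the value is later used as a divisor or passed to something like atan2.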
Last, it generates code that assumes that no hardware interrupts can happen due to signalling/trapping math (that is, if these cannot be disabled on the target architecture and consequently do happen, they will not be handled).