Floating point arithmetic: 32 bit mode vs 64 bit mode

Time: 2022-09-01 09:57:14

I have the same number-crunching Delphi source code compiled both as a 32 bit and as a 64 bit application. From the log files I can see that the numbers differ slightly (a relative error of about 1e-14). So I'm wondering whether the same CPU can perform floating point operations differently when running 32 bit and 64 bit code, or whether this is something the compiler is responsible for.

2 answers

#1


7  

I'm going to assume that the code does not explicitly use Extended. That data type differs between 32 and 64 bit (it's 10 bytes in 32 bit and 8 bytes in 64 bit), so any explicit use of Extended introduces an immediate difference. I'm also going to assume that you are using Double for all your variables, although the arguments below apply equally to Single.
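If you want to check this on your own setup, a minimal sketch like the following should do. The program name is made up for illustration; build it once per target platform and compare the output.

```pascal
program ExtendedSizeCheck;

{$APPTYPE CONSOLE}

begin
  // SizeOf is resolved by the compiler for the platform being targeted,
  // so the same source reports a different size per build.
  Writeln('SizeOf(Double)   = ', SizeOf(Double));
  // According to the answer above: 10 for the 32 bit build,
  // 8 for the 64 bit build.
  Writeln('SizeOf(Extended) = ', SizeOf(Extended));
end.
```

If the two builds report different sizes for Extended, any variable, field or file record declared as Extended is a candidate source of the discrepancy.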

Beyond that, the most common reason for such discrepancies is a difference in behaviour between the two floating point units.

The x87 unit, used by 32 bit code, by default holds intermediate values to 80 bit extended precision. The SSE unit, used by 64 bit code, holds intermediate values to 64 bit double precision.

Now, the x87 unit can be configured through its control word to round intermediate values to 64 bit precision. This makes no difference in terms of performance, but it will bring your 32 and 64 bit results closer together.
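As a rough sketch of what that configuration might look like, assuming the SetPrecisionMode routine and the pmDouble value from the Math unit are available for your 32 bit target (the program name and the sample calculation are only for illustration):

```pascal
program X87PrecisionDemo;

{$APPTYPE CONSOLE}

uses
  System.Math;

var
  A, B, C: Double;
begin
  {$IFDEF CPUX86}
  // Only meaningful for the x87 unit, hence the conditional.
  // Asks the FPU to round intermediate results to double precision
  // rather than extended precision. This narrows the significand only;
  // the exponent range of the x87 registers is unchanged.
  SetPrecisionMode(pmDouble);
  {$ENDIF}

  A := 1.0;
  B := 3.0;
  C := A / B * B;     // intermediate quotient lives in an FPU register
  Writeln(C: 0: 17);  // log with enough digits to compare the two builds
end.
```

Make the call once at startup, before any calculations run. Be aware that other code in the process (third party DLLs, for instance) may change the control word again.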

Even then you won't get exactly the same results on the two units. In fact, you won't get exactly the same results on all x87 units. Even though these units are all IEEE 754 conformant, that standard allows a degree of leeway for calculations.

What's more, higher order calculations such as trigonometry, logarithms and exponentiation are performed quite differently in 32 and 64 bit code. The x87 unit used by 32 bit code has more built-in functionality than the SSE unit used by 64 bit code. You'll note in the Delphi source code that the trig functions, for example, are all implemented in the RTL for 64 bit, whereas in 32 bit code they are implemented by calling x87 instructions.

The bottom line is that you will never get your 32 and 64 bit programs to agree exactly when floating point calculations are involved. You will have to accept differences within a small tolerance.
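In practice that means comparing logged values with a relative tolerance instead of testing for equality. Here is a minimal sketch; WithinRelTol, the sample values and the 1e-12 tolerance are all hypothetical choices for illustration, so pick a tolerance that suits your own calculation.

```pascal
program CompareWithTolerance;

{$APPTYPE CONSOLE}

uses
  System.Math;

// True when A and B agree to within the relative tolerance RelTol.
// Hypothetical helper, not part of the RTL.
function WithinRelTol(const A, B, RelTol: Double): Boolean;
begin
  Result := Abs(A - B) <= RelTol * Max(Abs(A), Abs(B));
end;

var
  Value32, Value64: Double;
begin
  // Stand-ins for the same quantity logged by the 32 and 64 bit builds,
  // differing by a relative error of about 1e-14 as in the question.
  Value32 := 1.0;
  Value64 := 1.0 + 1e-14;

  Writeln('Exactly equal:    ', Value32 = Value64);
  Writeln('Within tolerance: ', WithinRelTol(Value32, Value64, 1e-12));
end.
```

The Math unit also has SameValue, which takes an absolute epsilon. A relative test like the one above tends to be easier to reason about when the magnitudes of your results vary widely.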

#2


3  

Extended is equal to Double in x64. 32 bit code uses the x87 FPU, while x64 code uses the SSE registers for floating point execution.

There is also the compiler directive "Floating point precision control (Delphi for x64)", which is on by default and keeps intermediate Single values as Double.
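As far as I know the directive in question is spelled $EXCESSPRECISION, but treat that name and its placement as assumptions and check the documentation for your compiler version. A sketch of turning it off for a unit that works in Single:

```pascal
program ExcessPrecisionDemo;

{$APPTYPE CONSOLE}

{$IFDEF CPUX64}
// Assumption: this is the directive behind "Floating point precision
// control (Delphi for x64)". OFF asks the 64 bit compiler to keep
// intermediate Single results in single precision.
{$EXCESSPRECISION OFF}
{$ENDIF}

var
  X, Y, Z: Single;
begin
  X := 0.1;
  Y := 0.2;
  Z := X * Y + X;    // the product is the intermediate value affected
  Writeln(Z: 0: 9);
end.
```

This only matters for Single arithmetic. If all your variables are Double, as assumed in the first answer, the setting should have no effect on your results.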
