I have a very strange bug in my program. I was not able to isolate it in a small reproducible example, but at a certain place in my code there is:
double distance, criticalDistance;
...
if (distance > criticalDistance)
{
    std::cout << "first branch" << std::endl;
}
if (distance == criticalDistance)
{
    std::cout << "second branch" << std::endl;
}
In the debug build everything is fine: only one branch gets executed.
But in the release build all hell breaks loose, and sometimes both branches get executed.
This is very strange, since if I add an else:
if (distance > criticalDistance)
{
    std::cout << "first branch" << std::endl;
}
else if (distance == criticalDistance)
{
    std::cout << "second branch" << std::endl;
}
then this does not happen.
Please, what can be the cause of this? I am using gcc 4.8.1 on Ubuntu 13.10 on a 32-bit computer.
EDIT1:
I am using the compiler flags
- -std=gnu++11
- -gdwarf-3
EDIT2:
I do not think this is caused by a memory leak. I analyzed both the release and debug builds with the Valgrind memory analyzer, with tracking of uninitialized memory and detection of self-modifying code enabled, and I found no errors.
EDIT3:
Changing the declaration to
volatile double distance, criticalDistance;
makes the problem go away. Does this confirm woolstar's answer? Is this a compiler bug?
EDIT4:
Using the gcc option -ffloat-store also fixes the problem. If I understand this correctly, this is caused by gcc.
3 answers
#1
14
if (distance > criticalDistance)
// true
if (distance == criticalDistance)
// also true
I have seen this behavior before in my own code. It is due to the mismatch between the standard 64-bit value stored in memory and the 80-bit internal values that Intel processors (the x87 floating-point unit) use for floating-point calculation.
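One way to check whether a given build is exposed to this mismatch is the standard FLT_EVAL_METHOD macro from <cfloat>. The following is only a minimal sketch: the function name describeFloatEvaluation is made up for illustration, and the value a particular compiler reports depends on the target and flags. With gcc on 32-bit x86 using x87 arithmetic it is typically 2, and with SSE2 arithmetic or on x86-64 it is typically 0.

```cpp
#include <cfloat>
#include <string>

// Hypothetical helper (not from the original thread): describes how this
// build evaluates floating-point expressions, based on FLT_EVAL_METHOD.
std::string describeFloatEvaluation()
{
    switch (FLT_EVAL_METHOD)
    {
    case 0:
        return "each operation is evaluated in the precision of its own type";
    case 1:
        return "float and double operations are evaluated as double";
    case 2:
        return "float and double operations are evaluated as long double";
    default:
        // Negative values mean indeterminable; others are implementation-defined.
        return "indeterminable or implementation-defined";
    }
}
```

If the result corresponds to case 2, a double held in a register may carry more precision than the same double after it has been written to memory, which is the situation described above.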
Basically, when truncated to 64 bits, your values are equal, but when compared as 80-bit values, one is slightly larger than the other. In debug mode, the values are always stored to memory and then reloaded, so they are always truncated. In optimized mode, the compiler reuses the value in the floating-point register, and it doesn't get truncated.
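The volatile workaround from the question can be confined to the comparison itself, so that both operands are forced through a 64-bit memory slot before they are compared. This is only a sketch under that assumption: storeAsDouble, compareAsDoubles and the Comparison enum are hypothetical names, not part of the asker's code.

```cpp
// Hypothetical helper: writes the value to a 64-bit double in memory and
// reads it back, discarding any extra precision held in a register.
inline double storeAsDouble(double value)
{
    volatile double stored = value; // volatile access cannot be optimized away
    return stored;
}

enum class Comparison { Less, Equal, Greater, Unordered };

// Compares two values after both have been reduced to plain 64-bit doubles,
// so exactly one outcome is reported for any pair of inputs.
Comparison compareAsDoubles(double lhs, double rhs)
{
    const double a = storeAsDouble(lhs);
    const double b = storeAsDouble(rhs);
    if (a > b)  return Comparison::Greater;
    if (a == b) return Comparison::Equal;
    if (a < b)  return Comparison::Less;
    return Comparison::Unordered; // at least one operand is NaN
}
```

Branching on the single result of compareAsDoubles(distance, criticalDistance) also gives the same mutual exclusion as the else-if version in the question.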
#2
2
Please, what can be the cause of this?
Undefined behavior, a.k.a. bugs in your code.
There is no IEEE floating point value which exhibits this behavior. So what's happening is that you are doing something wrong, which violates an assumption made by your compiler.
When optimizing your code, the compiler assumes that your code can be described by the C++ standard. If you do anything that is left undefined by the C++ standard, then these assumptions are violated, resulting in "weird" execution. It could be something "simple" like an uninitialized variable or a buffer overrun resulting in parts of the stack or heap being overwritten with garbage data, or it could be something more subtle, where you rely on a specific ordering between two operations, which is not guaranteed by the standard.
That is probably why you were not able to reproduce the problem in a small test case (the smaller test code does not contain the erroneous code), and why you only see the error in optimized builds.
Of course, it is also possible that you've stumbled across a compiler bug, but a bug in your code is quite a bit more likely. :)
And best of all, it means that we don't really have a chance to debug the problem from the code snippet you've shown. We can say "the code shouldn't behave like that", but that's about all.
#3
1
You are not initializing your doubles. Are you sure that they always get a value?
I have found that uninitialized variables in debug builds are always 0, but in release builds they can be pretty much anything.