GCC, -O2 and bit fields: is this a bug or a feature?

Today I discovered some alarming behavior while experimenting with bit fields. For the sake of discussion and simplicity, here's an example program:

#include <stdio.h>

struct Node
{
  int a:16 __attribute__ ((packed));
  int b:16 __attribute__ ((packed));

  unsigned int c:27 __attribute__ ((packed));
  unsigned int d:3 __attribute__ ((packed));
  unsigned int e:2 __attribute__ ((packed));
};

int main (int argc, char *argv[])
{
  Node n;
  n.a = 12345;
  n.b = -23456;
  n.c = 0x7ffffff;
  n.d = 0x7;
  n.e = 0x3;

  printf("3-bit field cast to int: %d\n",(int)n.d);

  n.d++;  

  printf("3-bit field cast to int: %d\n",(int)n.d);
}

The program deliberately overflows the 3-bit bit-field. Here's the (correct) output when compiled with "g++ -O0":

3-bit field cast to int: 7
3-bit field cast to int: 0

Here's the output when compiled using "g++ -O2" (and -O3):

3-bit field cast to int: 7
3-bit field cast to int: 8

Checking the assembly generated for the -O2 build, I found this:

movl    $7, %esi
movl    $.LC1, %edi
xorl    %eax, %eax
call    printf
movl    $8, %esi
movl    $.LC1, %edi
xorl    %eax, %eax
call    printf
xorl    %eax, %eax
addq    $8, %rsp

The optimizer has simply inserted the constant 8, assuming 7 + 1 = 8, when in fact the 3-bit field overflows and wraps around to zero.

Fortunately, as far as I know, the code I care about doesn't overflow, but this situation scares me. Is this a known bug, a feature, or expected behavior? When can I expect gcc to get this right?

Edit (re: signed/unsigned):

It's being treated as unsigned because it's declared unsigned. If I declare it as int instead, I get this output (with -O0):

3-bit field cast to int: -1
3-bit field cast to int: 0

An even funnier thing happens with -O2 in this case:

3-bit field cast to int: 7
3-bit field cast to int: 8

I admit that attribute is a fishy thing to use; what concerns me in this case is the difference between optimization settings.

1 Answer

#1

If you want to get technical, the minute you used __attribute__ (an identifier containing two consecutive underscores), your code had undefined behavior.

If you get the same behavior with those removed, it looks to me like a compiler bug. The fact that the 3-bit field reads back as 7 means it's being treated as unsigned, so when it overflows it should behave like any other unsigned type and give you modulo arithmetic.

It would also be legitimate for it to treat the bit-field as signed. In this case the first result would be -1, -3 or -0 (which might print as just 0), and the second undefined (since overflow of a signed integer gives undefined behavior). In theory, other values might be possible under C89 or the current C++ standard since they don't limit the representations of signed integers. In C99 or C++0x, it can only be those three (C99 limits signed integers to one's complement, two's complement or sign-magnitude and C++0x is based on C99 instead of C90).

Oops: I didn't pay close enough attention. Since it's declared unsigned, it has to be treated as unsigned, which leaves little wiggle room for this being anything other than a compiler bug.
