I used the following piece of code to read data from files as part of a larger program.
double data_read(FILE *stream, int code) {
    char data[8];
    switch (code) {
    case 0x08:
        return (unsigned char)fgetc(stream);
    case 0x09:
        return (signed char)fgetc(stream);
    case 0x0b:
        data[1] = fgetc(stream);
        data[0] = fgetc(stream);
        return *(short*)data;
    case 0x0c:
        for (int i = 3; i >= 0; i--)
            data[i] = fgetc(stream);
        return *(int*)data;
    case 0x0d:
        for (int i = 3; i >= 0; i--)
            data[i] = fgetc(stream);
        return *(float*)data;
    case 0x0e:
        for (int i = 7; i >= 0; i--)
            data[i] = fgetc(stream);
        return *(double*)data;
    }
    die("data read failed");
    return 1;
}
Now I have been told to compile with -O2, and I get the following gcc warning: warning: dereferencing type-punned pointer will break strict-aliasing rules
Googling, I found two contradictory answers:
- Concluding: there's no need to worry; gcc tries to be more law obedient than the actual law.
vs
- So basically if you have an int* and a float* they are not allowed to point to the same memory location. If your code does not respect this, then the compiler's optimizer will most likely break your code.
In the end I don't want to ignore the warnings. What would you recommend?
[update] I substituted the toy example with the real function.
7 Answers
#1
26
It looks a lot as if you really want to use fread:
int data;
fread(&data, sizeof(data), 1, stream);
That said, if you do want to go the route of reading chars, then reinterpreting them as an int, the safe way to do it in C (but not in C++) is to use a union:
union
{
    char theChars[4];
    int theInt;
} myunion;
for(int i=0; i<4; i++)
    myunion.theChars[i] = fgetc(stream);
return myunion.theInt;
I'm not sure why the length of data in your original code is 3. I assume you wanted 4 bytes; at least I don't know of any systems where an int is 3 bytes.
Note that both your code and mine are highly non-portable.
Edit: If you want to read ints of various lengths from a file, portably, try something like this:
unsigned result=0;
for(int i=0; i<4; i++)
    result = (result << 8) | fgetc(stream);
(Note: In a real program, you would additionally want to test the return value of fgetc() against EOF.)
This reads a 4-byte unsigned value from the file in big-endian format (the first byte read becomes the most significant), regardless of the endianness of the system. It should work on just about any system where an unsigned is at least 4 bytes.
If you want to be endian-neutral, don't use pointers or unions; use bit-shifts instead.
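A minimal sketch of the bit-shift approach with the EOF check added. The names `be_u32_from_bytes` and `read_be_u32` are hypothetical, and the file is assumed to store the value most significant byte first:

```c
#include <stdint.h>
#include <stdio.h>

/* Combine four bytes, most significant first, into a 32-bit value.
   Only shifts are used, so the result does not depend on host endianness. */
uint32_t be_u32_from_bytes(const unsigned char b[4])
{
    uint32_t result = 0;
    for (int i = 0; i < 4; i++)
        result = (result << 8) | b[i];
    return result;
}

/* Read four bytes from stream and decode them.
   Returns 0 on success, -1 if fgetc() reports EOF or an error. */
int read_be_u32(FILE *stream, uint32_t *out)
{
    unsigned char b[4];
    for (int i = 0; i < 4; i++) {
        int c = fgetc(stream);   /* keep the int so EOF stays distinguishable */
        if (c == EOF)
            return -1;
        b[i] = (unsigned char)c;
    }
    *out = be_u32_from_bytes(b);
    return 0;
}
```

Keeping the decoding separate from the file access is only a design choice for this sketch; it makes the byte-order logic easy to check without a file.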
#2
38
The problem occurs because you access a char array through a double*:
char data[8];
...
return *(double*)data;
But gcc assumes that your program will never access variables through pointers of a different type. This assumption is called strict aliasing, and it allows the compiler to make some optimizations:
If the compiler knows that your *(double*) can in no way overlap with data[], it's allowed to do all sorts of things, like reordering your code into:
return *(double*)data;
for(int i=7;i>=0;i--)
    data[i] = fgetc(stream);
The loop is most likely optimized away and you end up with just:
return *(double*)data;
That leaves your data[] uninitialized. In this particular case the compiler might be able to see that your pointers overlap, but if you had declared it as char* data, it could have produced bugs.
But the strict-aliasing rule has an exception: a char* may be used to access an object of any type (and a void* may point at any object, although it must be converted before it is dereferenced). So you can rewrite it into:
double data;
...
*(((char*)&data) + i) = fgetc(stream);
...
return data;
Strict aliasing warnings are really important to understand or fix. They cause the kinds of bugs that are impossible to reproduce in-house because they occur only on one particular compiler on one particular operating system on one particular machine and only on full-moon and once a year, etc.
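A minimal sketch of the character-pointer approach as a complete function, assuming the source bytes are stored in the opposite order to the host's (as the loops in the question do). The name `double_from_reversed_bytes` is hypothetical:

```c
#include <stddef.h>

/* Fill a double from bytes stored in the opposite order to the host's.
   Writing an object's bytes through unsigned char* is permitted for any
   object type, so this does not trigger the strict-aliasing warning. */
double double_from_reversed_bytes(const unsigned char *src)
{
    double value = 0.0;
    unsigned char *dst = (unsigned char *)&value;
    for (size_t i = 0; i < sizeof value; i++)
        dst[i] = src[sizeof value - 1 - i];
    return value;
}
```

Note that the direction matters: a double may be accessed through a character pointer, but a char array may not be accessed through a double*.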
#3
7
Using a union is not the correct thing to do here. Reading from an unwritten member of the union is undefined - i.e. the compiler is free to perform optimisations that will break your code (like optimising away the write).
#4
6
This doc summarizes the situation: http://dbp-consulting.com/tutorials/StrictAliasing.html
There are several different solutions there, but the most portable/safe one is to use memcpy(). (The function calls may be optimized out, so it's not as inefficient as it appears.) For example, replace this:
return *(short*)data;
With this:
short temp;
memcpy(&temp, data, sizeof(temp));
return temp;
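As a hedged sketch, the same idea applied to the 8-byte case: the bytes are first combined with shifts and then copied into a double with memcpy(). The name `be_double_from_bytes` is hypothetical, and the code assumes the file stores an IEEE 754 double most significant byte first and that the host's double is 64 bits wide with the same byte order as its integers:

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double and uint64_t have the same size on this platform. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Decode eight bytes, most significant first, as a double. */
double be_double_from_bytes(const unsigned char b[8])
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | b[i];

    double value;
    memcpy(&value, &bits, sizeof value);  /* copies the object representation */
    return value;
}
```

With optimization enabled, compilers commonly turn a fixed-size memcpy() like this into a plain register move, so the copy usually costs nothing at run time.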
#5
2
Basically you can read gcc's message as "Guy, you are looking for trouble; don't say I didn't warn ya."
Casting a three-byte character array to an int is one of the worst things I have seen, ever. Normally your int has at least 4 bytes. So for the fourth byte (and maybe more if int is wider) you get random data. And then you cast all of this to a double.
Just do none of that. The aliasing problem that gcc warns about is innocent compared to what you are doing.
#6
0
The authors of the C Standard wanted to let compiler writers generate efficient code in circumstances where it would be theoretically possible but unlikely that a global variable might have its value accessed using a seemingly-unrelated pointer. The idea wasn't to forbid type punning by casting and dereferencing a pointer in a single expression, but rather to say that given something like:
int x;
int foo(double *d)
{
    x++;
    *d=1234;
    return x;
}
a compiler would be entitled to assume that the write to *d won't affect x. The authors of the Standard wanted to list situations where a function like the one above, receiving a pointer from an unknown source, would have to assume that it might alias a seemingly unrelated global, without requiring that the types match perfectly.
Unfortunately, while the rationale strongly suggests that the authors of the Standard intended to describe a minimum standard of conformance for cases where a compiler would otherwise have no reason to believe that things might alias, the rule fails to require that compilers recognize aliasing in cases where it is obvious. The authors of gcc have decided that they would rather generate the smallest program they can while conforming to the poorly written language of the Standard than generate code that is actually useful. Instead of recognizing aliasing in cases where it is obvious (while still being able to assume that things that don't look like they'll alias, won't), they would rather require that programmers use memcpy. That in turn requires the compiler to allow for the possibility that pointers of unknown origin might alias just about anything, which impedes optimization.
#7
-4
Apparently the standard allows sizeof(char*) to be different from sizeof(int*), so gcc complains when you try a direct cast. void* is a little special in that every object pointer can be converted to void* and back. In practice I don't know of many architectures or compilers where pointers are not the same size for all types, but gcc is right to emit a warning even if it is annoying.
I think the safe way would be
int i, *p = &i;
char *q = (char*)&p[0];
or
char *q = (char*)(void*)p;
In C++, you can also try this and see what you get:
char *q = reinterpret_cast<char*>(p);