Fixing "dereferencing type-punned pointer will break strict-aliasing rules"

Date: 2022-03-14 02:14:27

I'm trying to fix two warnings when compiling a specific program using GCC. The warnings are:

warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

and the two culprits are:

unsigned int received_size = ntohl (*((unsigned int*)dcc->incoming_buf));

and

*((unsigned int*)dcc->outgoing_buf) = htonl (dcc->file_confirm_offset);

incoming_buf and outgoing_buf are defined as follows:

char                    incoming_buf[LIBIRC_DCC_BUFFER_SIZE];

char                    outgoing_buf[LIBIRC_DCC_BUFFER_SIZE];

This seems subtly different than the other examples of that warning I've been examining. I would prefer to fix the problem rather than disable strict-aliasing checks.

There have been many suggestions to use a union - what might be a suitable union for this case?

5 Answers

#1


28  

First off, let's examine why you get the aliasing violation warnings.

Aliasing rules simply say that you can only access an object through its own type, its signed / unsigned variant type, or through a character type (char, signed char, unsigned char).

C says violating aliasing rules invokes undefined behavior (so don't!).
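
As a minimal sketch of an access the rules do permit (sum_bytes is a hypothetical helper, not part of your program, and 8-bit bytes are assumed): looking at the bytes of a uint32_t through unsigned char is fine. It is the opposite direction, reading a char array through an unsigned int pointer, that is not allowed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds up the bytes that make up a uint32_t.
 * Viewing any object through a character type is allowed by the
 * aliasing rules, so this cast does not trigger -Wstrict-aliasing. */
static unsigned sum_bytes(const uint32_t *value)
{
    const unsigned char *bytes = (const unsigned char *)value;
    unsigned sum = 0;
    size_t i;

    assert(value != NULL);
    for (i = 0; i < sizeof *value; ++i)
        sum += bytes[i];
    return sum;
}
```

With word set to 0x01020304, sum_bytes(&word) adds 1 + 2 + 3 + 4 regardless of the host byte order, because only the order of the bytes in memory differs between hosts.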

In this line of your program:

unsigned int received_size = ntohl (*((unsigned int*)dcc->incoming_buf));

although the elements of the incoming_buf array are of type char, you are accessing them as unsigned int. Indeed the result of the dereference operator in the expression *((unsigned int*)dcc->incoming_buf) is of unsigned int type.

This violates the aliasing rules because (see the rules summary above) the elements of the incoming_buf array may only be accessed as char, signed char or unsigned char.

Notice you have exactly the same aliasing issue in your second culprit:

*((unsigned int*)dcc->outgoing_buf) = htonl (dcc->file_confirm_offset);

You access the char elements of outgoing_buf through unsigned int, so it's an aliasing violation.

Proposed solution

To fix your issue, you could try to have the elements of your arrays directly defined in the type you want to access:

unsigned int incoming_buf[LIBIRC_DCC_BUFFER_SIZE / sizeof (unsigned int)];
unsigned int outgoing_buf[LIBIRC_DCC_BUFFER_SIZE / sizeof (unsigned int)];

(By the way, the width of unsigned int is implementation-defined, so you should consider using uint32_t if your program assumes unsigned int is 32 bits wide.)

This way you can store unsigned int objects in your arrays, and you can still access their individual bytes through the type char without violating the aliasing rules, like this:

*((char *) outgoing_buf) =  expr_of_type_char;

or

char_lvalue = *((char *) incoming_buf);
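
Putting the pieces together, here is a rough sketch of how the two culprits might look with this approach. The struct dcc_session, the DCC_BUFFER_SIZE value and the three helper functions are made up for illustration (your real struct surely has more members), and ntohl/htonl are taken from the POSIX header <arpa/inet.h>.

```c
#include <arpa/inet.h>  /* ntohl, htonl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DCC_BUFFER_SIZE 1024  /* hypothetical stand-in for LIBIRC_DCC_BUFFER_SIZE */

/* Hypothetical, trimmed-down session: the buffers are declared as
 * uint32_t, so their elements really are 32-bit unsigned integers. */
struct dcc_session {
    uint32_t incoming_buf[DCC_BUFFER_SIZE / sizeof(uint32_t)];
    uint32_t outgoing_buf[DCC_BUFFER_SIZE / sizeof(uint32_t)];
};

/* Byte-oriented input still works: writing the storage through a
 * character pointer (here via memcpy) is always permitted. */
static void store_incoming_bytes(struct dcc_session *dcc,
                                 const unsigned char *src, size_t len)
{
    assert(len <= sizeof dcc->incoming_buf);
    memcpy(dcc->incoming_buf, src, len);
}

/* First culprit: the word is read through its declared type, no cast. */
static uint32_t received_size(const struct dcc_session *dcc)
{
    return ntohl(dcc->incoming_buf[0]);
}

/* Second culprit: the word is written through its declared type. */
static void confirm_offset(struct dcc_session *dcc, uint32_t offset)
{
    dcc->outgoing_buf[0] = htonl(offset);
}
```

The code that calls recv() or send() can keep passing the buffers as (char *) pointers, since that direction of the conversion is the allowed one.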

EDIT:

I've entirely reworked my answer; in particular, I now explain why the program gets the aliasing warnings from the compiler.

#2


15  

To fix the problem, don't pun and alias! The only "correct" way to read a type T is to allocate a type T and populate its representation if needed:

uint32_t n;
memcpy(&n, dcc->incoming_buf, 4);

In short: If you want an integer, you need to make an integer. There's no way to cheat around that in a language-condoned way.

The only pointer conversion which you are allowed (for purposes of I/O, generally) is to treat the address of an existing variable of type T as a char*, or rather, as the pointer to the first element of an array of chars of size sizeof(T).
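
For the two lines in the question, the memcpy idea could be wrapped up roughly like this. load_net32 and store_net32 are hypothetical names chosen for this sketch, and ntohl/htonl are assumed to come from the POSIX header <arpa/inet.h>.

```c
#include <arpa/inet.h>  /* ntohl, htonl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a 32-bit network-order value from a char
 * buffer by copying its bytes into a real uint32_t object first. */
static uint32_t load_net32(const char *buf)
{
    uint32_t raw;

    assert(buf != NULL);
    memcpy(&raw, buf, sizeof raw);  /* no aliasing or alignment issue */
    return ntohl(raw);
}

/* Hypothetical helper: writes a host-order value to a char buffer in
 * network byte order. */
static void store_net32(char *buf, uint32_t host_value)
{
    uint32_t raw = htonl(host_value);

    assert(buf != NULL);
    memcpy(buf, &raw, sizeof raw);
}
```

The two culprits would then read received_size = load_net32(dcc->incoming_buf); and store_net32(dcc->outgoing_buf, dcc->file_confirm_offset);. As far as I know, GCC and Clang usually turn a fixed-size memcpy like this into a plain load or store when optimizing, so it should not cost anything at run time.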

#3


3  

union
{
    const unsigned int * int_val_p;
    const char* buf;
} xyz;

xyz.buf = dcc->incoming_buf;
unsigned int received_size = ntohl(*(xyz.int_val_p));

Simplified explanation:
1. The C++ standard states that you should attempt to align data yourself; g++ goes the extra mile to generate warnings on the subject.
2. You should only attempt it if you completely understand the data alignment on your architecture/system and inside your code (for example, the code above is a sure thing on Intel 32/64; alignment 1; Win/Linux/BSD/Mac).
3. The only practical reason to use the code above is to avoid compiler warnings, when and if you know what you are doing.
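
For comparison, here is a sketch of a union that overlays the data itself rather than two pointer types, which is what the question seems to be asking for. The union net32 and load_net32_via_union are hypothetical names. This relies on C (C99 and later) allowing a union member other than the last one written to be read; C++ does not make that promise, so treat it as C-only.

```c
#include <arpa/inet.h>  /* ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical union: both members name the same four bytes of storage. */
union net32 {
    uint32_t      word;
    unsigned char bytes[sizeof(uint32_t)];
};

/* Copies four bytes from the char buffer into the union, then reads
 * them back as one 32-bit word. The union object has the alignment of
 * uint32_t, so the source buffer's alignment does not matter. */
static uint32_t load_net32_via_union(const char *buf)
{
    union net32 u;

    assert(buf != NULL);
    memcpy(u.bytes, buf, sizeof u.bytes);
    return ntohl(u.word);
}
```

With this, the first culprit could become received_size = load_net32_via_union(dcc->incoming_buf);.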

#4


0  

If I may, IMHO the problem in this case is the design of ntohl, htonl and the related function APIs. They should not have been written to take a numeric argument and return a numeric value (and yes, I understand the macro optimization point). They should have been designed with the 'n' side being a pointer to a buffer. When this is done, the whole problem goes away and the routine is accurate whatever the endianness of the host. For example (with no attempt to optimize):

static inline void safe_htonl(unsigned char *netside, unsigned long value) {
    /* Store the low 32 bits of value in network (big-endian) byte order. */
    netside[3] = value & 0xFF;
    netside[2] = (value >> 8) & 0xFF;
    netside[1] = (value >> 16) & 0xFF;
    netside[0] = (value >> 24) & 0xFF;
}

#5


-2  

Cast pointer to unsigned and then back to pointer.

unsigned int received_size = ntohl (*((unsigned *)((unsigned) dcc->incoming_buf)) );
