I'm writing a program in C for Linux on an ARM9 processor. The program is to access network packets which include a sequence of tagged data like:
<fieldID><length><data><fieldID><length><data> ...
The fieldID and length fields are both uint16_t. The data can be 1 or more bytes (up to 64k if the full range of length were used, but in practice it isn't).
As long as <data> has an even number of bytes, I don't see a problem. But if I have a 1-, 3- or 5-byte <data> section, then the next 16-bit fieldID ends up not on a 16-bit boundary, and I anticipate alignment issues. It's been a while since I've done anything like this from scratch, so I'm a little unsure of the details. Any feedback welcome. Thanks.
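To make the problem concrete: each field occupies 2 bytes of fieldID, 2 bytes of length and then length bytes of data, so the next fieldID starts at offset + 4 + length. It is 16-bit aligned only if every preceding <data> length was even, assuming the packet buffer itself starts at an even address. The helpers below are hypothetical names used purely to illustrate that arithmetic:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: offset of the next <fieldID>, given the offset of
 * the current one and the <length> of its <data>. Each field is
 * 2 bytes (fieldID) + 2 bytes (length) + length bytes of data. */
static size_t next_field_offset(size_t offset, uint16_t length)
{
    return offset + 4u + length;
}

/* Nonzero if a 16-bit field at this offset is 2-byte aligned, assuming
 * the packet buffer itself starts at an even address. */
static int offset_is_16bit_aligned(size_t offset)
{
    return offset % 2 == 0;
}
```

With data lengths of 2 and then 3, the fieldIDs land at offsets 0, 6 and 13; the third one is at an odd offset and therefore misaligned.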
4 answers
#1
To avoid alignment issues in this case, access all data through an unsigned char *. So:
unsigned char *p;
//...
uint16_t id = p[0] | (p[1] << 8);
p += 2;
The above example assumes "little endian" data layout, where the least significant byte comes first in a multi-byte number.
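Applied to the whole <fieldID><length><data> stream, the byte-pointer approach gives a parser that never issues a 16-bit load, so odd offsets do not matter. This is a sketch, assuming little-endian fields as in the example above and a caller-supplied buffer length; read_le16 and find_field_le are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian uint16_t one byte at a time; safe at any address. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: search a <fieldID><length><data>... buffer for id.
 * Returns 1 and sets *data / *data_len on success, 0 if the field is not
 * found or the buffer is truncated. Assumes little-endian fieldID/length. */
static int find_field_le(const unsigned char *buf, size_t buf_len, uint16_t id,
                         const unsigned char **data, uint16_t *data_len)
{
    size_t off = 0;

    while (buf_len - off >= 4) {            /* room for fieldID + length */
        uint16_t field_id = read_le16(buf + off);
        uint16_t length = read_le16(buf + off + 2);

        if (buf_len - off - 4 < length)     /* <data> runs past the end */
            return 0;
        if (field_id == id) {
            *data = buf + off + 4;
            *data_len = length;
            return 1;
        }
        off += 4u + length;                 /* may now be odd; that's fine */
    }
    return 0;
}
```

Because length is checked against the remaining bytes before it is used, a corrupt or truncated packet cannot make the loop read past the end of the buffer.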
#2
You should have functions (inline and/or templated if the language you're using supports those features) that will read the potentially unaligned data and return the data type you're interested in. Something like:
#include <stdint.h>

static inline uint16_t unaligned_uint16(const void *p)
{
    // this assumes big-endian values in data stream
    // (which is common, but not universal in network
    // communications) - this may or may not be
    // appropriate in your case
    const unsigned char *pByte = p;   // byte access is alignment-safe
    uint16_t val = (uint16_t)((pByte[0] << 8) | pByte[1]);
    return val;
}
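The same pattern extends to other widths. Here is a sketch of a 16-bit and a 32-bit reader under the same big-endian assumption (the names unaligned_be16 and unaligned_be32 are made up for illustration). Note the cast to uint32_t before shifting: b[0] << 24 on a plain int would overflow when the top byte is 0x80 or more.

```c
#include <stdint.h>

/* Big-endian readers in the style of unaligned_uint16(); names are
 * illustrative. Only byte loads are used, so any address is fine. */
static inline uint16_t unaligned_be16(const void *p)
{
    const unsigned char *b = p;
    return (uint16_t)((b[0] << 8) | b[1]);
}

static inline uint32_t unaligned_be32(const void *p)
{
    const unsigned char *b = p;
    /* Widen each byte to uint32_t first so no shift touches an int's sign bit. */
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```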
#3
The easy way is to manually rebuild the uint16_ts, at the expense of speed:
uint8_t *packet = ...;
uint16_t fieldID = (packet[0] << 8) | packet[1]; // assumes big-endian (network-order) data in the packet
uint16_t length = (packet[2] << 8) | packet[3];
uint8_t *data = packet + 4;
packet += 4 + length;
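The snippet above trusts length and never checks how many bytes remain in the packet. A bounds-checked variant might look like this; it is a sketch assuming big-endian fields, and next_field is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pointer-advancing parser for one big-endian field.
 * Returns a pointer to the next field, or NULL if fewer bytes remain
 * than the 4-byte header or the declared <length> requires. */
static const uint8_t *next_field(const uint8_t *packet, const uint8_t *end,
                                 uint16_t *field_id, uint16_t *length,
                                 const uint8_t **data)
{
    if (end - packet < 4)
        return NULL;                         /* no room for fieldID + length */
    *field_id = (uint16_t)((packet[0] << 8) | packet[1]);
    *length   = (uint16_t)((packet[2] << 8) | packet[3]);
    if (end - packet - 4 < *length)
        return NULL;                         /* <data> would run past the end */
    *data = packet + 4;
    return packet + 4 + *length;
}
```

Call it in a loop until the returned pointer equals end; a NULL return means the packet is truncated or malformed.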
If your processor supports unaligned access, you can type-pun the pointer or use a union (but beware of strict aliasing):
uint16_t fieldID = ntohs(*(uint16_t *)packet);        // ntohs: network (big-endian) to host order
uint16_t length = ntohs(*(uint16_t *)(packet + 2));
Note that unaligned accesses aren't always supported (e.g. they might generate a fault of some sort), and on architectures where they are supported, there may be a performance penalty.
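If the packet isn't aligned, you could always copy it into a suitably aligned static buffer and then read it. Keep in mind that this only aligns the start of the packet: a field that follows an odd-length <data> is still at an odd offset within the buffer.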
static uint16_t static_buffer[65540 / 2];    // uint16_t elements, so the buffer is 16-bit aligned
memcpy(static_buffer, packet, packet_size);  // make sure packet_size <= sizeof static_buffer
uint16_t fieldId = ntohs(static_buffer[0]);
uint16_t length = ntohs(static_buffer[1]);
Personally, I'd just go for the first option (rebuilding the values byte by byte), since it's the most portable.
#4
Alignment is always going to be fine, although perhaps not super-efficient, if you go through a byte pointer.
Setting aside issues of endianness, you can memcpy from the 'real' byte pointer into whatever properly aligned object you want or need, and you will be fine.
(This works because the generated code will load/store the data as bytes, which is alignment-safe. Things only fall apart when the generated assembly contains instructions that load or store 16/32/64 bits of memory at a misaligned address.)
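A minimal sketch of the memcpy approach, combined with ntohs/ntohl from <arpa/inet.h> to handle the byte-order question set aside above. It assumes the fields are in network (big-endian) byte order, and load_be16 and load_be32 are hypothetical names. With a constant size, compilers commonly inline the memcpy, so this need not cost a function call:

```c
#include <arpa/inet.h>   /* ntohs, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>      /* memcpy */

/* Copy the bytes into a properly aligned local, then convert from network
 * (big-endian) order to host order. memcpy makes no alignment assumption
 * about its source pointer. */
static uint16_t load_be16(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return ntohs(v);
}

static uint32_t load_be32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return ntohl(v);
}
```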