Binary data layout guaranteed by the C++ standard

Date: 2021-10-12 16:28:21

This is purely a theoretical problem, not one I have actually run into, but it has piqued my curiosity and I wanted to see if anyone has a better solution for it:

How do you portably guarantee that a specific file format, network protocol or whatever conforms to a specific bit pattern?

Say we have a file format that uses a 64 bit header struct immediately followed by a variable length array of 32 bit structures:

Header:  magic : 32 bit
         count : 32 bit

Field :  id   : 16 bit
         data : 16 bit

My first instinct would be to write something like:

#include <cstdint>

struct Field
{
    std::uint16_t id;   // 16 bits on the wire
    std::uint16_t data; // 16 bits on the wire
};

Except that our compiler may decide that padding is advisable and we end up with a 64 bit structure. So our next bet is:

using Field = uint16_t[2];

and work on that.
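
For what it's worth, both worries above can at least be turned into build errors. This is only a sketch, assuming a platform where uint16_t exists and bytes are 8 bits wide; the name FieldArray is made up here to avoid clashing with the struct:

```cpp
#include <climits>
#include <cstdint>

struct Field
{
    std::uint16_t id;
    std::uint16_t data;
};

// Hypothetical alias for the array form of the same record.
using FieldArray = std::uint16_t[2];

// Refuse to compile rather than silently produce a different layout.
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
static_assert(sizeof(Field) == 4, "Field must occupy exactly 32 bits");
static_assert(sizeof(FieldArray) == 4, "array form must occupy exactly 32 bits");
```

This does not make the layout portable; it only makes a non-matching platform fail loudly at compile time.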

That is, unless someone has carefully read the standard and noticed that uint16_t is optional. At this point our next best friend is uint_least16_t, which is guaranteed to be at least 16 bits long, but for all we know could be 20 bits long on a processor with 10-bit chars.

At this point, the only real solution I can come up with is some sort of bit stream, capable of reading and writing specific amounts of bits, and adaptable by std::numeric_limits.
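
To make the idea concrete, here is a rough sketch of what such a bit stream might look like. BitWriter and BitReader are hypothetical names, and the most-significant-bit-first ordering is an assumption of this sketch, not something the format above specifies. Each stored element only ever holds a value in 0..255, so the logical octet sequence does not depend on how wide unsigned char happens to be:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs values into a sequence of octets, most significant bit first.
class BitWriter
{
public:
    // Append the low `bits` bits of `value`.
    void write(std::uint_least64_t value, unsigned bits)
    {
        assert(bits <= 64);
        for (unsigned i = bits; i-- > 0;)
        {
            if (used_ == 0)
                octets_.push_back(0); // start a fresh octet
            const unsigned bit = static_cast<unsigned>((value >> i) & 1u);
            octets_.back() = static_cast<unsigned char>(octets_.back() | (bit << (7 - used_)));
            used_ = (used_ + 1) % 8;
        }
    }

    const std::vector<unsigned char>& octets() const { return octets_; }

private:
    std::vector<unsigned char> octets_;
    unsigned used_ = 0; // bits already filled in the last octet
};

// Reads values back out of an octet sequence written by BitWriter.
class BitReader
{
public:
    explicit BitReader(const std::vector<unsigned char>& octets) : octets_(octets) {}

    std::uint_least64_t read(unsigned bits)
    {
        assert(bits <= 64 && pos_ + bits <= octets_.size() * 8);
        std::uint_least64_t value = 0;
        for (unsigned i = 0; i < bits; ++i, ++pos_)
        {
            const unsigned bit = (octets_[pos_ / 8] >> (7 - pos_ % 8)) & 1u;
            value = (value << 1) | bit;
        }
        return value;
    }

private:
    std::vector<unsigned char> octets_; // private copy, so the source may go away
    std::size_t pos_ = 0;               // position in bits from the start
};
```

Writing the header would then be two 32-bit writes, and each field two 16-bit writes. Getting the octets onto a medium that itself has non-8-bit bytes is a separate problem this sketch does not touch.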

So, is there someone out there who has very carefully read the standard and found the point I'm missing? Or is this the only real way of having a portable guarantee?

Notes:
- I've just realized that endianness would probably add another layer of complexity.
- I'm using the current working draft of the ISO standard (N3797).

4 solutions

#1


2  

How do you portably guarantee that a specific file format, network protocol or whatever conforms to a specific bit pattern?

You can't. Not in C++, which was standardized against an abstract platform where little more than the existence of a "byte" that is made up of bits can be assumed. We can't even say for certain, looking only at the Standard, how many bits are in a char. You can use bitfields for everything, as bits are indivisible, but then you'll have padding to contend with at the least.

Sometimes it is best to give up on the idea of absolute Standards conformance for the sake of conformance, and look to other means to get the job done efficiently and effectively. In this case, platform specifics in combination with almost absolute Standards conformance (aka, good programming practices) will set you free.

Every platform I work on regularly (Linux & Windows) provides a means to regulate the padding the compiler will actually apply. For network communications, under Linux & Windows I use:

#pragma pack (push, 1)

as a preface to all the data structures I'm going to send over the wire. Endianness is indeed another challenge, but one more or less easily dealt with using other resources provided by every platform: ntohl and the like.
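
Applied to the format from the question, that approach might look roughly like the sketch below. It assumes a compiler that honours #pragma pack (GCC, Clang and MSVC do) and a platform that supplies htonl/ntohl; store_header and load_header are made-up helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(_WIN32)
#include <winsock2.h>   // htonl, ntohl (link with ws2_32)
#else
#include <arpa/inet.h>  // htonl, ntohl
#endif

#pragma pack(push, 1)   // no padding between or after members
struct Header
{
    std::uint32_t magic;
    std::uint32_t count;
};

struct Field
{
    std::uint16_t id;
    std::uint16_t data;
};
#pragma pack(pop)       // restore the previous packing

static_assert(sizeof(Header) == 8, "Header must be 64 bits on the wire");
static_assert(sizeof(Field) == 4, "Field must be 32 bits on the wire");

// Copy a header into a raw buffer in network (big-endian) byte order.
inline void store_header(unsigned char* out, std::uint32_t magic, std::uint32_t count)
{
    Header wire;
    wire.magic = htonl(magic);
    wire.count = htonl(count);
    std::memcpy(out, &wire, sizeof wire);
}

// Read a header back from a raw buffer, converting to host byte order.
inline Header load_header(const unsigned char* in)
{
    Header wire;
    std::memcpy(&wire, in, sizeof wire);
    Header host;
    host.magic = ntohl(wire.magic);
    host.count = ntohl(wire.count);
    return host;
}
```

The static_asserts are there so that a compiler which ignores the pragma fails the build instead of quietly changing the layout.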

Standards conformance is a laudable goal, and indeed in a code review I would reject most code that is non-conformant. The lack of conformance is really just a moniker for the rejection however; not the reason itself. The actual reason for the rejection is in large part difficulty in maintaining and porting non-conformant code when moving to another platform, or indeed even just upgrading the compiler on the same platform. Non-conformant code might compile and even appear to work, but it will very often fail in subtle and miserable ways when you least expect it, even after thorough testing.

The moral of the story is:

You should always write Standards-conformant code, except when you shouldn't.

This really is just a re-imagining of Einstein's articulation of Occam's Razor:

Make everything as simple as possible, but no simpler.

#2


2  

If you want to ensure portability to everything standard-conforming, including platforms for which CHAR_BIT isn't 8, well, you've got your work cut out for you.

If you are comfortable limiting yourself to 98% of the computers you'll ever program, I recommend writing explicit serialization for anything that has to adhere to a particular wire-format. That includes breaking integers into bytes, etc.

Write appropriate abstractions around things and the code won't be too bad. Don't put shifts and masks everywhere. Encapsulate it.
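
As an illustration of that advice, something along these lines keeps the shifts and masks in two small helpers. It is a sketch that assumes 8-bit bytes and a big-endian wire format (the question does not say which byte order the format uses); put_u16, get_u16, serialize and deserialize are hypothetical names:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <vector>

static_assert(CHAR_BIT == 8, "this sketch targets platforms with 8-bit bytes");

struct Field
{
    std::uint16_t id;
    std::uint16_t data;
};

// The only two places that know about shifts and masks.
inline void put_u16(std::vector<unsigned char>& out, std::uint16_t v)
{
    out.push_back(static_cast<unsigned char>((v >> 8) & 0xFFu)); // high byte first
    out.push_back(static_cast<unsigned char>(v & 0xFFu));
}

inline std::uint16_t get_u16(const unsigned char* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Wire format of a Field: id, then data, 4 bytes total, no padding.
inline void serialize(std::vector<unsigned char>& out, const Field& f)
{
    put_u16(out, f.id);
    put_u16(out, f.data);
}

// Expects at least 4 readable bytes at p.
inline Field deserialize(const unsigned char* p)
{
    Field f;
    f.id = get_u16(p);
    f.data = get_u16(p + 2);
    return f;
}
```

Because each member is written explicitly, any padding or host byte order inside Field never reaches the wire.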

#3


0  

I would use network types and network byte order. See this link: http://www.beej.us/guide/bgnet/output/html/multipage/htonsman.html. The example uses uint16_t. You can write the values one field at a time to prevent padding. Or, if you want to read and write the entire structure at once, see the question "C++ struct alignment question".

#4


0  

Make the structure easy for the program to use.

Provide input methods that extract data from the input and write to the data members. This removes the issue of padding, alignment boundaries and endianness. Similarly with output.

For example, if your input data is 16 bits wide, but your platform is 32 bits wide, declare the structure using 32-bit fields. Copy the 16 bits from the input into the 32-bit fields.

Most programs read into a structure fewer times than they access the data members. Your program is not reading the input 100% of the time.
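
A possible shape for such input and output methods is sketched below. The big-endian wire order, the helper octet and the choice of uint_fast32_t for the in-memory fields are all assumptions made for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Interpret one char from the stream as an octet value in 0..255.
inline std::uint_fast32_t octet(char c)
{
    return static_cast<unsigned char>(c);
}

// In-memory form: fields use a convenient native width, not the wire width.
struct Field
{
    std::uint_fast32_t id;
    std::uint_fast32_t data;

    // Read two 16-bit big-endian values (4 octets). Returns false on short input.
    bool read(std::istream& in)
    {
        char raw[4];
        if (!in.read(raw, sizeof raw))
            return false;
        id   = (octet(raw[0]) << 8) | octet(raw[1]);
        data = (octet(raw[2]) << 8) | octet(raw[3]);
        return true;
    }

    // Write the low 16 bits of each member, high octet first.
    void write(std::ostream& out) const
    {
        const unsigned char octets[4] = {
            static_cast<unsigned char>((id >> 8) & 0xFFu),
            static_cast<unsigned char>(id & 0xFFu),
            static_cast<unsigned char>((data >> 8) & 0xFFu),
            static_cast<unsigned char>(data & 0xFFu)
        };
        for (unsigned char o : octets)
            out.put(static_cast<char>(o));
    }
};
```

With a std::istringstream holding the four octets 00 07 12 34, read would yield id 7 and data 0x1234, and write would emit the same four octets again, whatever padding or width the compiler chose for Field itself.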
