Serializing C-style structs (using C++)

Is it evil to serialize struct objects using memcpy?

In one of my projects I am doing the following: I memcpy a struct object, base64 encode it, and write it to a file. I do the inverse when parsing the data. It seems to work OK, but in certain situations (for example, when using the WINDOWPLACEMENT for the HWND of Windows Media Player) the size of the decoded data does not match sizeof(WINDOWPLACEMENT).

Here are some code fragments:

// Using WINDOWPLACEMENT from Windows API headers:
typedef struct tagWINDOWPLACEMENT {
    UINT  length;
    UINT  flags;
    UINT  showCmd;
    POINT ptMinPosition;
    POINT ptMaxPosition;
    RECT  rcNormalPosition;
#ifdef _MAC
    RECT  rcDevice;
#endif
} WINDOWPLACEMENT;


static std::string EncodeWindowPlacement(const WINDOWPLACEMENT & inWindowPlacement)
{
    std::stringstream ss;
    {
        Poco::Base64Encoder encoder(ss); // From the Poco C++ libraries
        const char * offset = reinterpret_cast<const char*>(&inWindowPlacement);
        std::vector<char> buffer(offset, offset + sizeof(inWindowPlacement));
        for (size_t idx = 0; idx != buffer.size(); ++idx)
        {
            encoder << buffer[idx];
        }
        encoder.close();
    }
    return ss.str();
}


static WINDOWPLACEMENT DecodeWindowPlacement(const std::string & inEncoded)
{
    std::string decodedString;
    {
        std::istringstream istr(inEncoded);
        Poco::Base64Decoder decoder(istr); // From the Poco C++ libraries
        decoder >> decodedString;
        assert(decoder.eof());
        if (decoder.fail())
        {
            throw std::runtime_error("Failed to parse Window placement data from the configuration file.");
        }
    }

    if (decodedString.size() != sizeof(WINDOWPLACEMENT))
    {
        // !! Occurs frequently !!
        throw std::runtime_error("Errors occurred during parsing of the Window placement.");
    }

    WINDOWPLACEMENT windowPlacement;
    memcpy(&windowPlacement, &decodedString[0], decodedString.size());
    return windowPlacement;
}

I'm aware that copying classes in C++ using memcpy is likely to cause trouble because the copy constructors are not properly executed. I'm not sure if this also applies to C-style structs. Or is serialization by memory dumping simply not done?

Update: A bug in Poco's Base64Encoder/Decoder is not impossible, but unlikely. Its test cases seem pretty thorough: Base64Test.cpp.

5 Answers

#1

You will run into problems if you need to transfer these files between machines that do not all share the same endianness and word size, or if you add or remove fields from the structs in future versions and need to retain binary compatibility.
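
To illustrate one way around the endianness and word-size concerns, here is a minimal sketch that writes each field explicitly as a fixed-width little-endian integer instead of dumping the struct's memory. The `Placement` struct and the `putU32`, `getU32`, `encode`, and `decode` helpers are hypothetical names made up for this example; they are not part of the Windows API or Poco.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical record used only for illustration.
struct Placement {
    std::uint32_t flags;
    std::int32_t  left;
    std::int32_t  top;
};

// Append a 32-bit value as four bytes, least significant byte first.
void putU32(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFFu));
}

// Read a 32-bit little-endian value starting at byte offset pos.
std::uint32_t getU32(const std::vector<unsigned char>& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    return v;
}

// Serialize field by field; the byte layout no longer depends on the
// host's endianness or on struct padding.
std::vector<unsigned char> encode(const Placement& p) {
    std::vector<unsigned char> out;
    putU32(out, p.flags);
    putU32(out, static_cast<std::uint32_t>(p.left));
    putU32(out, static_cast<std::uint32_t>(p.top));
    return out;
}

Placement decode(const std::vector<unsigned char>& in) {
    if (in.size() != 12)
        throw std::runtime_error("unexpected encoded size");
    Placement p;
    p.flags = getU32(in, 0);
    // Converting back to a signed type assumes two's complement,
    // which holds on all mainstream platforms.
    p.left = static_cast<std::int32_t>(getU32(in, 4));
    p.top  = static_cast<std::int32_t>(getU32(in, 8));
    return p;
}
```

Adding or removing fields later still needs a version marker of some kind; this sketch only addresses byte order and layout.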

#2

I'm not sure how operator>>() is implemented in Poco::Base64Decoder. If it is the same as istream's operator>>(), then after decoder >> decodedString; decodedString may not contain all characters from the input. For example, if there is any whitespace character in the encoded string, then decoder >> decodedString; will read only up to that whitespace.
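
The difference can be shown with the standard library alone. The sketch below assumes the decoder behaves like an ordinary std::istream (not verified here for Poco) and uses a plain std::istringstream as a stand-in. Formatted extraction with operator>> stops at the first whitespace byte in the data it reads, so binary data containing a byte such as 0x20 gets truncated, while std::istreambuf_iterator copies every byte. `readWithExtraction` and `readAllBytes` are hypothetical helper names.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Formatted extraction: skips leading whitespace and stops at the next
// whitespace byte (space, tab, newline, ...).
std::string readWithExtraction(std::istream& in) {
    std::string s;
    in >> s;
    return s;
}

// Unformatted read: copies every remaining byte, whitespace included.
std::string readAllBytes(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

For a six-byte input `{0x2C, 0, 0, 0, 0x20, 1}`, `readWithExtraction` returns only the first four bytes because the fifth byte is a space, whereas `readAllBytes` returns all six. A struct field whose value happens to contain 0x20, 0x09, 0x0A, or a similar byte would therefore produce the kind of size mismatch described in the question.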

#3

Doing a memcpy of classes/structs is okay if they're just Plain Old Data (POD), but if that's the case, then you could rely on C++ doing the copying for you via copy constructors (which exist for both struct and class types in C++).
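
As a rough sketch of how the "plain old data" requirement can be enforced at compile time, the helpers below refuse to compile for types that are not trivially copyable and check the size before copying back. `toBytes`, `fromBytes`, and the `Rect` struct are hypothetical names for this example; `Rect` merely stands in for a C-style struct such as WINDOWPLACEMENT.

```cpp
#include <cstring>
#include <stdexcept>
#include <string>
#include <type_traits>

// Copy the object representation of a trivially copyable value into a string.
template <typename T>
std::string toBytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy serialization requires a trivially copyable type");
    std::string bytes(sizeof(T), '\0');
    std::memcpy(&bytes[0], &value, sizeof(T));
    return bytes;
}

// Rebuild a value from its bytes; rejects input of the wrong size.
template <typename T>
T fromBytes(const std::string& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy deserialization requires a trivially copyable type");
    if (bytes.size() != sizeof(T))
        throw std::runtime_error("size mismatch while decoding");
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}

// Hypothetical stand-in for a C-style struct.
struct Rect {
    long left, top, right, bottom;
};
```

Calling `toBytes` on a type such as std::string, which owns heap memory, fails at compile time instead of silently producing a dangling pointer on decode.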

Certainly you can do it the way you have been doing it - one of the products I've worked on serializes data using memcpy, sends the data over the wire, and client applications decode the bytestream to get the data back.

But if you have a choice, you might want something higher level like Boost.Serialization, which offers more flexibility and deep-pointer copying. The aforementioned Google Protocol Buffers would work nicely too.

Here are some threads discussing serialization methods in C++:

boost serialization vs google protocol buffers?
C++ Serialization Performance

#4

I wouldn't go as far as to say that it's evil, but I think it is asking for trouble and weird problems in many cases.

I know it has been done and it can work (I've seen people serialize structs like that to send over a network connection), but it has a number of drawbacks that have been pointed out already (inflexibility, endianness problems, structs containing pointers, packing, etc.).
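
The packing drawback can be made concrete with a small sketch. The `Sample` struct, `kPayloadSize`, and `paddingBytes` below are hypothetical names for illustration. On typical compilers the 4-byte member is aligned to a 4-byte boundary, so sizeof(Sample) comes out larger than the sum of its members, and the exact amount depends on the compiler and its packing settings.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical struct for illustration only.
struct Sample {
    char         tag;    // 1 byte
    std::int32_t value;  // 4 bytes, usually aligned to a 4-byte boundary
};

// Sum of the member sizes, i.e. the size if there were no padding at all.
constexpr std::size_t kPayloadSize = sizeof(char) + sizeof(std::int32_t);

// Number of padding bytes the compiler added to Sample.
constexpr std::size_t paddingBytes() {
    return sizeof(Sample) - kPayloadSize;
}
```

A memory dump of `Sample` includes those padding bytes, whose contents are unspecified, so two builds with different packing settings would disagree about both the size and the field offsets.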

I'd recommend a more robust way of serializing and deserializing your data. I've heard lots of good things about Google Protocol Buffers; something like that will be a lot more flexible and will probably save you headaches in the end.

#5

Serializing data the way you've done it is not particularly evil if you know you're staying on a machine with the same byte size, word size, endianness, etc. Since you're serializing window placement information, you probably don't care about portability between different machines and only want to save this information between sessions on the same machine. I'd hazard a guess that you're storing this in the Registry. If you want portability for other data that is actually useful when ported to other architectures, you can look at the other suggestions already posted here, such as Google Protocol Buffers.

Whitespace is a red herring: it is irrelevant in a base64-encoded data stream, and all decoders should ignore it (Poco does). I am curious to know the sizes of the string and the structure when it fails. Knowing this might give you some insight into the problem.
