Reading and writing a C++ vector to a file

For some graphics work I need to read in a large amount of data as quickly as possible, and I would ideally like to read and write the data structures directly to and from disk. Basically, I have a load of 3D models in various file formats that take too long to load, so I want to write them out in their "prepared" format as a cache that will load much faster on subsequent runs of the program.

Is it safe to do it like this? My worry is about reading directly into the vector's data. I've removed error checking, hard-coded 4 as the size of an int, and so on, so that I can give a short working example. I know it's bad code; my real question is whether it is safe in C++ to read a whole array of structures directly into a vector like this. I believe it is, but C++ has so many traps and so much undefined behaviour once you go low level and deal directly with raw memory like this.

I realise that number formats and sizes may change across platforms and compilers, but this will only ever be read and written by the same compiled program, to cache data that may be needed on a later run of that same program.

#include <fstream>
#include <vector>

using namespace std;

struct Vertex
{
    float x, y, z;
};

typedef vector<Vertex> VertexList;

int main()
{
    // Create a list for testing
    VertexList list;
    Vertex v1 = {1.0f, 2.0f,   3.0f}; list.push_back(v1);
    Vertex v2 = {2.0f, 100.0f, 3.0f}; list.push_back(v2);
    Vertex v3 = {3.0f, 200.0f, 3.0f}; list.push_back(v3);
    Vertex v4 = {4.0f, 300.0f, 3.0f}; list.push_back(v4);

    // Write out a list to a disk file
    ofstream os ("data.dat", ios::binary);

    int size1 = list.size();
    os.write((const char*)&size1, 4);
    os.write((const char*)&list[0], size1 * sizeof(Vertex));
    os.close();


    // Read it back in
    VertexList list2;

    ifstream is("data.dat", ios::binary);
    int size2;
    is.read((char*)&size2, 4);
    list2.resize(size2);

    // Is it safe to read a whole array of structures directly into the vector?
    is.read((char*)&list2[0], size2 * sizeof(Vertex));

}

6 answers

#1

As Laurynas says, std::vector is guaranteed to be contiguous, so that should work, but it is potentially non-portable.

On most systems, sizeof(Vertex) will be 12, but it's not uncommon for the struct to be padded, so that sizeof(Vertex) == 16. If you were to write the data on one system and then read that file in on another, there's no guarantee that it will work correctly.
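
If the cache is only ever meant to be read by the same build, one way to make the layout assumption explicit is to check it at compile time. The following is a minimal sketch, assuming a C++11 compiler; `payload_bytes` is a hypothetical helper name and is not part of the question's code.

```cpp
#include <cstddef>
#include <type_traits>

struct Vertex
{
    float x, y, z;
};

// Raw byte I/O is only meaningful for trivially copyable types.
static_assert(std::is_trivially_copyable<Vertex>::value,
              "Vertex must be trivially copyable for raw byte I/O");

// Assumption for this sketch: no padding, i.e. three packed floats.
// If a compiler pads the struct, the build fails here instead of
// silently writing files with a different layout than expected.
static_assert(sizeof(Vertex) == 3 * sizeof(float),
              "unexpected padding in Vertex");

// Number of bytes a raw dump of `count` vertices occupies (hypothetical helper).
constexpr std::size_t payload_bytes(std::size_t count)
{
    return count * sizeof(Vertex);
}
```

This does not make the file portable; it only turns a silent layout change into a compile error on the machine where the layout differs.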

#2

You might be interested in the Boost.Serialization library. It knows how to save/load STL containers to/from disk, among other things. It might be overkill for your simple example, but it might become more useful if you do other types of serialization in your program.

Here's some sample code that does what you're looking for:

#include <cassert>
#include <fstream>
#include <vector>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/serialization/vector.hpp>

using namespace std;

struct Vertex
{
    float x, y, z;
};

bool operator==(const Vertex& lhs, const Vertex& rhs)
{
    return lhs.x==rhs.x && lhs.y==rhs.y && lhs.z==rhs.z;
}

namespace boost { namespace serialization {
    template<class Archive>
    void serialize(Archive & ar, Vertex& v, const unsigned int version)
    {
        ar & v.x; ar & v.y; ar & v.z;
    }
} }

typedef vector<Vertex> VertexList;

int main()
{
    // Create a list for testing
    const Vertex v[] = {
        {1.0f, 2.0f,   3.0f},
        {2.0f, 100.0f, 3.0f},
        {3.0f, 200.0f, 3.0f},
        {4.0f, 300.0f, 3.0f}
    };
    VertexList list(v, v + (sizeof(v) / sizeof(v[0])));

    // Write out a list to a disk file
    {
        ofstream os("data.dat", ios::binary);
        boost::archive::binary_oarchive oar(os);
        oar << list;
    }

    // Read it back in
    VertexList list2;

    {
        ifstream is("data.dat", ios::binary);
        boost::archive::binary_iarchive iar(is);
        iar >> list2;
    }

    // Check if vertex lists are equal
    assert(list == list2);

    return 0;
}

Note that I had to implement a serialize function for your Vertex in the boost::serialization namespace. This lets the serialization library know how to serialize Vertex members.

I've browsed through the boost::archive::binary_oarchive source code, and it seems that it reads/writes the raw vector array data directly from/to the stream buffer. So it should be pretty fast.

#3

std::vector is guaranteed to be contiguous in memory, so, yes.
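
Because the elements are contiguous, the count and the element bytes can each be transferred with a single call. Below is a sketch with the error checks put back in, assuming C++11 and the same build on both the writing and the reading side. `write_vertices` and `read_vertices` are hypothetical names, and the count is stored as a fixed-width `std::uint64_t` rather than a hard-coded 4 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <type_traits>
#include <vector>

struct Vertex
{
    float x, y, z;
};

static_assert(std::is_trivially_copyable<Vertex>::value,
              "raw byte I/O needs a trivially copyable element type");

// Writes the element count followed by the raw element bytes.
// Returns false if the file cannot be opened or a write fails.
bool write_vertices(const std::string& path, const std::vector<Vertex>& list)
{
    std::ofstream os(path, std::ios::binary);
    if (!os)
        return false;

    const std::uint64_t count = list.size();
    os.write(reinterpret_cast<const char*>(&count), sizeof(count));
    if (count != 0) // nothing to write for an empty vector
        os.write(reinterpret_cast<const char*>(list.data()),
                 static_cast<std::streamsize>(count * sizeof(Vertex)));
    os.flush();
    return os.good();
}

// Reads a file produced by write_vertices into `list`.
// Returns false if the file is missing or shorter than its count claims.
bool read_vertices(const std::string& path, std::vector<Vertex>& list)
{
    std::ifstream is(path, std::ios::binary);
    std::uint64_t count = 0;
    if (!is.read(reinterpret_cast<char*>(&count), sizeof(count)))
        return false;

    list.resize(static_cast<std::size_t>(count));
    if (count != 0 &&
        !is.read(reinterpret_cast<char*>(list.data()),
                 static_cast<std::streamsize>(count * sizeof(Vertex))))
        return false;
    return true;
}
```

Note that the sketch trusts the stored count; a corrupt file could request a very large `resize`, so a real cache loader would probably want to sanity-check the count against the file size first.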

#4

I just ran into this exact same problem.

First off, these statements are broken:

os.write((const char*)&list[0], size1 * sizeof(Vertex));
is.read((char*)&list2[0], size2 * sizeof(Vertex));

There is other stuff in the vector data structure, so this will fill your new vector with garbage.

Solution:
When you are writing your vector to a file, don't worry about the size of your Vertex class; just write the entire vector to the file directly.

os.write((const char*)&list, sizeof(list));

And then you can read the entire vector into memory at once:

is.seekg(0,ifstream::end);
long size2 = is.tellg();
is.seekg(0,ifstream::beg);
list2.resize(size2);
is.read((char*)&list2, size2);
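
Before relying on this, it may be worth checking what `sizeof(list)` actually measures. `sizeof` is evaluated at compile time, so for a `std::vector` it gives the size of the vector object itself (its internal bookkeeping, typically a few pointers) and does not change with the number of elements. The elements live in a separate buffer starting at `&list[0]`. A small sketch to compare the two quantities; `object_bytes` and `element_bytes` are hypothetical helper names:

```cpp
#include <cstddef>
#include <vector>

struct Vertex
{
    float x, y, z;
};

// Size of the vector object itself; independent of how many elements it holds.
std::size_t object_bytes(const std::vector<Vertex>& v)
{
    return sizeof(v);
}

// Size of the element storage that a raw dump of the contents would need.
std::size_t element_bytes(const std::vector<Vertex>& v)
{
    return v.size() * sizeof(Vertex);
}
```

If `object_bytes` returns the same value for an empty vector and for one holding a thousand vertices, then writing `sizeof(list)` bytes from `&list` cannot be storing the vertices, and reading those bytes back into another vector would overwrite its internal pointers.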

#5

Another alternative to explicitly reading and writing your vector<> from and to a file is to replace the underlying allocator with one that allocates memory from a memory-mapped file. This would allow you to avoid an intermediate copy related to the read/write. However, this approach does have some overhead. Unless your file is very large, it may not make sense for your particular case. Profile as usual to determine whether this approach is a good fit.

There are also some caveats to this approach that are handled very well by the Boost.Interprocess library. Of particular interest to you may be its allocators and containers.

#6

If this is used for caching by the same code, I don't see any problem with this. I've used this same technique on multiple systems without a problem (all Unix-based). As an extra precaution, you might want to write a struct with known values at the beginning of the file, and check that it reads back correctly. You might also want to record the size of the struct in the file. This will save a lot of debugging time in the future if the padding ever changes.
