C++ - Is std::string suitable for storing a large text file, and if not, what is the best data type for doing so?

Time: 2020-12-25 16:56:38

I was just wondering, what is the best data type for storing the contents of a text-based file? Is std::string suitable for keeping the contents of a larger file in memory?

I'm making an editor of sorts right now so I'd like to know, I can't seem to find a good answer.


Edit: Yeah, this was a very vague question, and I didn't expect it to get quite this much attention. Calling it an editor was a poor description. I was just wondering how to store read-only text in memory, and whether std::string is a bad (that is, inefficient) way to do so.

3 Answers



The "editor of sorts" is probably the important thing: if the text were read-only, you could consider using mmap. I don't have enough experience with memory-mapped files to know whether they're appropriate for text editors, however.
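For the read-only case, a memory-mapped file could look roughly like the sketch below. It assumes a POSIX system (open, fstat, mmap, munmap); the MappedFile class and its member names are made up for illustration, and error handling is reduced to an ok() flag.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper around a read-only mapping (POSIX only).
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        assert(path != nullptr);
        int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;                       // ok() stays false
        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            std::size_t len = static_cast<std::size_t>(st.st_size);
            void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const char*>(p);
                size_ = len;
            }
        }
        ::close(fd);                              // the mapping outlives the descriptor
    }
    ~MappedFile() {
        if (data_) ::munmap(const_cast<char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool ok() const { return data_ != nullptr; }
    const char* data() const { return data_; }    // not null-terminated
    std::size_t size() const { return size_; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

The mapped bytes are not null-terminated, so always pair data() with size(); in C++17 the pair can be wrapped in a std::string_view. Pages are loaded lazily by the operating system, so a large file does not have to be copied into the process up front.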


There are data structures better suited to modifying large chunks of text. A rope is a binary tree with short text strings at the leaf nodes. An operation such as appending some text might cause a leaf node to be split, with the appended text added to the new right-hand node. This has the advantage that existing strings don't always need to be repeatedly moved or grown as the document is modified.
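To make the rope idea concrete, here is a minimal sketch. It is not the SGI rope: all names (RopeNode, make_leaf, concat, split, insert, flatten) are invented for illustration, nodes are immutable and shared, and the tree is never rebalanced.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

struct RopeNode {
    std::shared_ptr<const RopeNode> left, right;  // both null for a leaf
    std::string leaf;                             // text, used only by leaves
    std::size_t length = 0;                       // characters under this node
};
using Rope = std::shared_ptr<const RopeNode>;

Rope make_leaf(std::string text) {
    auto n = std::make_shared<RopeNode>();
    n->length = text.size();
    n->leaf = std::move(text);
    return n;
}

// Joining two ropes allocates one node; no text is copied.
Rope concat(Rope a, Rope b) {
    auto n = std::make_shared<RopeNode>();
    n->length = a->length + b->length;
    n->left = std::move(a);
    n->right = std::move(b);
    return n;
}

// Walk down the tree, using the left subtree's length to pick a side.
char char_at(const Rope& r, std::size_t i) {
    assert(r && i < r->length);
    const RopeNode* n = r.get();
    while (n->left) {
        if (i < n->left->length) {
            n = n->left.get();
        } else {
            i -= n->left->length;
            n = n->right.get();
        }
    }
    return n->leaf[i];
}

// Split into [0, pos) and [pos, length); only one leaf's text is copied.
std::pair<Rope, Rope> split(const Rope& r, std::size_t pos) {
    assert(r && pos <= r->length);
    if (!r->left) {
        return {make_leaf(r->leaf.substr(0, pos)), make_leaf(r->leaf.substr(pos))};
    }
    std::size_t left_len = r->left->length;
    if (pos < left_len) {
        std::pair<Rope, Rope> parts = split(r->left, pos);
        return {parts.first, concat(parts.second, r->right)};
    }
    std::pair<Rope, Rope> parts = split(r->right, pos - left_len);
    return {concat(r->left, parts.first), parts.second};
}

Rope insert(const Rope& r, std::size_t pos, std::string text) {
    std::pair<Rope, Rope> parts = split(r, pos);
    return concat(concat(parts.first, make_leaf(std::move(text))), parts.second);
}

// Collect the whole text into `out` (in-order traversal).
void flatten(const Rope& r, std::string& out) {
    if (!r) return;
    if (!r->left) { out += r->leaf; return; }
    flatten(r->left, out);
    flatten(r->right, out);
}
```

Because nodes are never modified, insert returns a new rope and the old one remains valid, which can be handy for undo. A real implementation would also rebalance the tree and merge small leaves.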


Another alternative is a simpler structure called a gap buffer. This effectively uses three strings to hold your text: a prefix, a postfix and a pre-sized gap. When the user starts work on a section of text, the document is split into the prefix and postfix strings, and a new gap buffer is allocated. The text the user adds is pushed into the gap buffer, which may be expanded as needed. When they move on to a different point in the document, the gap buffer is merged with the other strings and a new gap is created. The assumption here is that most of the document will be static, with most edits occurring around a specific location in the document at any given time, minimising string copies, moves and reallocations.
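A gap buffer can be sketched in a few dozen lines. The version below uses the common single-array formulation (prefix, gap and postfix all live in one std::vector<char>) instead of three separate strings; the class and member names are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Layout: [ prefix | gap | postfix ]. The cursor sits at gap_start_.
class GapBuffer {
public:
    explicit GapBuffer(const std::string& text, std::size_t gap = 16)
        : buf_(text.size() + gap),
          gap_start_(text.size()),
          gap_end_(text.size() + gap) {
        std::copy(text.begin(), text.end(), buf_.begin());
    }

    // Move the gap so that it starts at logical position `pos`.
    void move_cursor(std::size_t pos) {
        assert(pos <= size());
        if (pos < gap_start_) {
            // Shift [pos, gap_start_) to the far side of the gap.
            std::size_t n = gap_start_ - pos;
            std::memmove(buf_.data() + gap_end_ - n, buf_.data() + pos, n);
            gap_start_ = pos;
            gap_end_ -= n;
        } else if (pos > gap_start_) {
            // Shift the first n postfix characters to the near side.
            std::size_t n = pos - gap_start_;
            std::memmove(buf_.data() + gap_start_, buf_.data() + gap_end_, n);
            gap_start_ += n;
            gap_end_ += n;
        }
    }

    // Insert at the cursor; cheap as long as the gap is big enough.
    void insert(const std::string& text) {
        if (gap_end_ - gap_start_ < text.size()) grow(text.size());
        std::copy(text.begin(), text.end(), buf_.begin() + gap_start_);
        gap_start_ += text.size();
    }

    // Backspace: widen the gap by one character to the left.
    void erase_before_cursor() {
        if (gap_start_ > 0) --gap_start_;
    }

    std::size_t size() const { return buf_.size() - (gap_end_ - gap_start_); }

    std::string str() const {
        std::string out(buf_.begin(), buf_.begin() + gap_start_);
        out.append(buf_.begin() + gap_end_, buf_.end());
        return out;
    }

private:
    // Reallocate with a gap of at least `needed` characters.
    void grow(std::size_t needed) {
        std::size_t tail = buf_.size() - gap_end_;
        std::size_t new_gap = std::max(needed, buf_.size());
        std::vector<char> bigger(gap_start_ + new_gap + tail);
        std::copy(buf_.begin(), buf_.begin() + gap_start_, bigger.begin());
        std::copy(buf_.begin() + gap_end_, buf_.end(),
                  bigger.begin() + gap_start_ + new_gap);
        buf_.swap(bigger);
        gap_end_ = gap_start_ + new_gap;
    }

    std::vector<char> buf_;
    std::size_t gap_start_;
    std::size_t gap_end_;
};
```

Typing at one spot only writes into the gap; text is moved only when the cursor jumps elsewhere or the gap fills up, which matches the assumption that edits cluster around one location.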


Emacs uses gap buffers, which suggests they're not a bad place to start. There's plenty of discussion (and comparison) of the two data structures out there, and you may even be able to find perfectly usable implementations already available. Implementing your own gap buffer should be dead easy.


Possibly useful reading: Gap Buffers, or, Don’t Get Tied Up With Ropes? (which includes some profiling information), original SGI C++ library Rope docs



Well, for a vague question, my answer is that std::string will probably suit you well. But there are many ways to store this; it depends on what your development requirements are.

Edit: Complementary answer (to the edited question): No, it's not inefficient at all. It's quite suitable for general use and excellent for read-only access.
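As an illustration of the read-only case, one common way to load a whole file into a std::string is shown below. This is a minimal sketch; read_file is a made-up helper name, and it assumes the file fits comfortably in memory.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: read the entire file at `path` into `out`.
// Returns false if the file cannot be opened.
bool read_file(const std::string& path, std::string& out) {
    assert(!path.empty());
    std::ifstream in(path, std::ios::binary);  // binary: no newline translation
    if (!in) return false;
    std::ostringstream contents;
    contents << in.rdbuf();                    // stream the whole file buffer
    out = contents.str();
    return true;
}
```

Once loaded, the text is contiguous in memory, so indexing, iteration and std::string::find all work directly on it.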




This is a vague question, which is why you can't find a good answer. It is more about what you do with this text file. If the text file is small enough to be stored in memory, then sure, you can store it in a string. But then how are you going to use it? What does this do for you? Are you going to use a regex to find certain words? Then sure, you can do that, but it may be slow.
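For a sense of what searching an in-memory string looks like, here is a small sketch comparing a plain substring scan with a regex word search. The function names are invented for illustration, and count_word assumes the word contains no regex metacharacters.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Positions of every (possibly overlapping) occurrence of `needle`.
std::vector<std::size_t> find_all(const std::string& text, const std::string& needle) {
    assert(!needle.empty());
    std::vector<std::size_t> hits;
    for (std::size_t pos = text.find(needle);
         pos != std::string::npos;
         pos = text.find(needle, pos + 1)) {
        hits.push_back(pos);
    }
    return hits;
}

// Number of whole-word matches, using \b word boundaries.
// Assumption: `word` has no regex metacharacters.
std::size_t count_word(const std::string& text, const std::string& word) {
    assert(!word.empty());
    std::regex re("\\b" + word + "\\b");
    std::sregex_iterator first(text.begin(), text.end(), re), last;
    return static_cast<std::size_t>(std::distance(first, last));
}
```

For fixed substrings, std::string::find is usually the simpler choice; a regex is worth it only when the pattern needs it, such as the word-boundary match here.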


Is the text file a webpage (source)? Then sure, you can do that and search for the tags you are looking for. There might be better ways, like putting it into an XML tree and searching for the tags, but the ONE string should still work.


Anyway, this is a tough question to answer because we don't know what you are using the string for in the first place.


If you just need it whole and intact, and you have enough memory to hold it in a string, then sure.




