I currently write a set of doubles from a vector to a text file like this:
std::ofstream fout;
fout.open("vector.txt");
for (l = 0; l < vector.size(); l++)
fout << std::setprecision(10) << vector.at(l) << std::endl;
fout.close();
But this is taking a lot of time to finish. Is there a faster or more efficient way to do this? I would love to see and learn it.
6 answers
#1
31
Your algorithm has two parts:
- Serialize double numbers to a string or character buffer.
- Write results to a file.
The first item can be improved (> 20%) by using sprintf or fmt. The second item can be sped up by caching results in a buffer, or by enlarging the output file stream's buffer, before writing the results to the output file. You should not use std::endl, because it flushes the stream on every call and is therefore much slower than "\n". If you still want more speed, write your data in binary format. Below is my complete code sample, which includes my proposed solutions and one from Edgar Rokyan. I also included Ben Voigt's and Matthieu M.'s suggestions in the test code.
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
// https://github.com/fmtlib/fmt
#include "fmt/format.h"
// http://uscilab.github.io/cereal/
#include "cereal/archives/binary.hpp"
#include "cereal/archives/json.hpp"
#include "cereal/archives/portable_binary.hpp"
#include "cereal/archives/xml.hpp"
#include "cereal/types/string.hpp"
#include "cereal/types/vector.hpp"
// https://github.com/DigitalInBlue/Celero
#include "celero/Celero.h"
template <typename T> const char* getFormattedString();
template<> const char* getFormattedString<double>(){return "%g\n";}
template<> const char* getFormattedString<float>(){return "%g\n";}
template<> const char* getFormattedString<int>(){return "%d\n";}
template<> const char* getFormattedString<size_t>(){return "%zu\n";}
namespace {
constexpr size_t LEN = 32;
template <typename T> std::vector<T> create_test_data(const size_t N) {
std::vector<T> data(N);
for (size_t idx = 0; idx < N; ++idx) {
data[idx] = idx;
}
return data;
}
template <typename Iterator> auto toVectorOfChar(Iterator begin, Iterator end) {
char aLine[LEN];
std::vector<char> buffer;
buffer.reserve(std::distance(begin, end) * LEN);
const char* fmtStr = getFormattedString<typename std::iterator_traits<Iterator>::value_type>();
std::for_each(begin, end, [&buffer, &aLine, &fmtStr](const auto value) {
sprintf(aLine, fmtStr, value);
for (size_t idx = 0; aLine[idx] != 0; ++idx) {
buffer.push_back(aLine[idx]);
}
});
return buffer;
}
template <typename Iterator>
auto toStringStream(Iterator begin, Iterator end, std::stringstream &buffer) {
char aLine[LEN];
const char* fmtStr = getFormattedString<typename std::iterator_traits<Iterator>::value_type>();
std::for_each(begin, end, [&buffer, &aLine, &fmtStr](const auto value) {
sprintf(aLine, fmtStr, value);
buffer << aLine;
});
}
template <typename Iterator> auto toMemoryWriter(Iterator begin, Iterator end) {
fmt::MemoryWriter writer;
std::for_each(begin, end, [&writer](const auto value) { writer << value << "\n"; });
return writer;
}
// A modified version of the original approach.
template <typename Container>
void original_approach(const Container &data, const std::string &fileName) {
std::ofstream fout(fileName);
for (size_t l = 0; l < data.size(); l++) {
fout << data[l] << std::endl;
}
fout.close();
}
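// Replace std::endl with "\n" and enlarge the stream buffer.
// Note: the effect of pubsetbuf with a user buffer is implementation-defined.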
template <typename Iterator>
void improved_original_approach(Iterator begin, Iterator end, const std::string &fileName) {
std::ofstream fout(fileName);
const size_t len = std::distance(begin, end) * LEN;
std::vector<char> buffer(len);
fout.rdbuf()->pubsetbuf(buffer.data(), len);
for (Iterator it = begin; it != end; ++it) {
fout << *it << "\n";
}
fout.close();
}
// Edgar Rokyan's solution: std::copy with std::ostream_iterator
template <typename Iterator>
void edgar_rokyan_solution(Iterator begin, Iterator end, const std::string &fileName) {
std::ofstream fout(fileName);
std::copy(begin, end, std::ostream_iterator<double>(fout, "\n"));
}
// Cache to a string stream before writing to the output file
template <typename Iterator>
void stringstream_approach(Iterator begin, Iterator end, const std::string &fileName) {
std::stringstream buffer;
for (Iterator it = begin; it != end; ++it) {
buffer << *it << "\n";
}
// Now write to the output file.
std::ofstream fout(fileName);
fout << buffer.str();
fout.close();
}
// Use sprintf
template <typename Iterator>
void sprintf_approach(Iterator begin, Iterator end, const std::string &fileName) {
std::stringstream buffer;
toStringStream(begin, end, buffer);
std::ofstream fout(fileName);
fout << buffer.str();
fout.close();
}
// Use fmt::MemoryWriter (https://github.com/fmtlib/fmt)
template <typename Iterator>
void fmt_approach(Iterator begin, Iterator end, const std::string &fileName) {
auto writer = toMemoryWriter(begin, end);
std::ofstream fout(fileName);
fout << writer.str();
fout.close();
}
// Use std::vector<char>
template <typename Iterator>
void vector_of_char_approach(Iterator begin, Iterator end, const std::string &fileName) {
std::vector<char> buffer = toVectorOfChar(begin, end);
std::ofstream fout(fileName);
fout.write(buffer.data(), buffer.size()); // buffer is not null-terminated
fout.close();
}
// Use cereal (http://uscilab.github.io/cereal/).
template <typename Container, typename OArchive = cereal::BinaryOutputArchive>
void use_cereal(Container &&data, const std::string &fileName) {
std::stringstream buffer;
{
OArchive oar(buffer);
oar(data);
}
std::ofstream fout(fileName);
fout << buffer.str();
fout.close();
}
}
// Performance test input data.
constexpr int NumberOfSamples = 5;
constexpr int NumberOfIterations = 2;
constexpr int N = 3000000;
const auto double_data = create_test_data<double>(N);
const auto float_data = create_test_data<float>(N);
const auto int_data = create_test_data<int>(N);
const auto size_t_data = create_test_data<size_t>(N);
CELERO_MAIN
BASELINE(DoubleVector, original_approach, NumberOfSamples, NumberOfIterations) {
const std::string fileName("origsol.txt");
original_approach(double_data, fileName);
}
BENCHMARK(DoubleVector, improved_original_approach, NumberOfSamples, NumberOfIterations) {
const std::string fileName("improvedsol.txt");
improved_original_approach(double_data.cbegin(), double_data.cend(), fileName);
}
BENCHMARK(DoubleVector, edgar_rokyan_solution, NumberOfSamples, NumberOfIterations) {
const std::string fileName("edgar_rokyan_solution.txt");
edgar_rokyan_solution(double_data.cbegin(), double_data.cend(), fileName);
}
BENCHMARK(DoubleVector, stringstream_approach, NumberOfSamples, NumberOfIterations) {
const std::string fileName("stringstream.txt");
stringstream_approach(double_data.cbegin(), double_data.cend(), fileName);
}
BENCHMARK(DoubleVector, sprintf_approach, NumberOfSamples, NumberOfIterations) {
const std::string fileName("sprintf.txt");
sprintf_approach(double_data.cbegin(), double_data.cend(), fileName);
}
BENCHMARK(DoubleVector, fmt_approach, NumberOfSamples, NumberOfIterations) {
const std::string fileName("fmt.txt");
fmt_approach(double_data.cbegin(), double_data.cend(), fileName);
}
BENCHMARK(DoubleVector, vector_of_char_approach, NumberOfSamples, NumberOfIterations) {
const std::string fileName("vector_of_char.txt");
vector_of_char_approach(double_data.cbegin(), double_data.cend(), fileName);
}
BENCHMARK(DoubleVector, use_cereal, NumberOfSamples, NumberOfIterations) {
const std::string fileName("cereal.bin");
use_cereal(double_data, fileName);
}
// Benchmark double vector
BASELINE(DoubleVectorConversion, toStringStream, NumberOfSamples, NumberOfIterations) {
std::stringstream output;
toStringStream(double_data.cbegin(), double_data.cend(), output);
}
BENCHMARK(DoubleVectorConversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toMemoryWriter(double_data.cbegin(), double_data.cend()));
}
BENCHMARK(DoubleVectorConversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toVectorOfChar(double_data.cbegin(), double_data.cend()));
}
// Benchmark float vector
BASELINE(FloatVectorConversion, toStringStream, NumberOfSamples, NumberOfIterations) {
std::stringstream output;
toStringStream(float_data.cbegin(), float_data.cend(), output);
}
BENCHMARK(FloatVectorConversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toMemoryWriter(float_data.cbegin(), float_data.cend()));
}
BENCHMARK(FloatVectorConversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toVectorOfChar(float_data.cbegin(), float_data.cend()));
}
// Benchmark int vector
BASELINE(int_conversion, toStringStream, NumberOfSamples, NumberOfIterations) {
std::stringstream output;
toStringStream(int_data.cbegin(), int_data.cend(), output);
}
BENCHMARK(int_conversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toMemoryWriter(int_data.cbegin(), int_data.cend()));
}
BENCHMARK(int_conversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toVectorOfChar(int_data.cbegin(), int_data.cend()));
}
// Benchmark size_t vector
BASELINE(size_t_conversion, toStringStream, NumberOfSamples, NumberOfIterations) {
std::stringstream output;
toStringStream(size_t_data.cbegin(), size_t_data.cend(), output);
}
BENCHMARK(size_t_conversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toMemoryWriter(size_t_data.cbegin(), size_t_data.cend()));
}
BENCHMARK(size_t_conversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toVectorOfChar(size_t_data.cbegin(), size_t_data.cend()));
}
Below are the performance results obtained on my Linux box using clang-3.9.1 with the -O3 flag. I used Celero to collect all performance results.
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec |
-----------------------------------------------------------------------------------------------------------------------------------------------
DoubleVector | original_approa | Null | 10 | 4 | 1.00000 | 3650309.00000 | 0.27 |
DoubleVector | improved_origin | Null | 10 | 4 | 0.47828 | 1745855.00000 | 0.57 |
DoubleVector | edgar_rokyan_so | Null | 10 | 4 | 0.45804 | 1672005.00000 | 0.60 |
DoubleVector | stringstream_ap | Null | 10 | 4 | 0.41514 | 1515377.00000 | 0.66 |
DoubleVector | sprintf_approac | Null | 10 | 4 | 0.35436 | 1293521.50000 | 0.77 |
DoubleVector | fmt_approach | Null | 10 | 4 | 0.34916 | 1274552.75000 | 0.78 |
DoubleVector | vector_of_char_ | Null | 10 | 4 | 0.34366 | 1254462.00000 | 0.80 |
DoubleVector | use_cereal | Null | 10 | 4 | 0.04172 | 152291.25000 | 6.57 |
Complete.
I also benchmarked the numeric-to-string conversion algorithms to compare the performance of std::stringstream, fmt::MemoryWriter, and std::vector<char>.
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec |
-----------------------------------------------------------------------------------------------------------------------------------------------
DoubleVectorCon | toStringStream | Null | 10 | 4 | 1.00000 | 1272667.00000 | 0.79 |
FloatVectorConv | toStringStream | Null | 10 | 4 | 1.00000 | 1272573.75000 | 0.79 |
int_conversion | toStringStream | Null | 10 | 4 | 1.00000 | 248709.00000 | 4.02 |
size_t_conversi | toStringStream | Null | 10 | 4 | 1.00000 | 252063.00000 | 3.97 |
DoubleVectorCon | toMemoryWriter | Null | 10 | 4 | 0.98468 | 1253165.50000 | 0.80 |
DoubleVectorCon | toVectorOfChar | Null | 10 | 4 | 0.97146 | 1236340.50000 | 0.81 |
FloatVectorConv | toMemoryWriter | Null | 10 | 4 | 0.98419 | 1252454.25000 | 0.80 |
FloatVectorConv | toVectorOfChar | Null | 10 | 4 | 0.97369 | 1239093.25000 | 0.81 |
int_conversion | toMemoryWriter | Null | 10 | 4 | 0.11741 | 29200.50000 | 34.25 |
int_conversion | toVectorOfChar | Null | 10 | 4 | 0.87105 | 216637.00000 | 4.62 |
size_t_conversi | toMemoryWriter | Null | 10 | 4 | 0.13746 | 34649.50000 | 28.86 |
size_t_conversi | toVectorOfChar | Null | 10 | 4 | 0.85345 | 215123.00000 | 4.65 |
Complete.
From the above tables we can see that:
- Edgar Rokyan's solution is 10% slower than the stringstream solution. The solution that uses the fmt library is the best overall for the three data types studied: double, int, and size_t. The sprintf + std::vector solution is about 1% faster than the fmt solution for the double data type. However, I do not recommend the sprintf-based solutions for production code, because they are not elegant (still written in C style) and do not work out of the box for different data types such as int or size_t.
- The benchmark results also show that fmt is the superior choice for serializing integral data types, since it is roughly 7x faster than the other approaches.
- We can speed up this algorithm by about 10x if we use a binary format. This approach is significantly faster than writing a formatted text file because we only do a raw copy from memory to the output. If you want a more flexible and portable solution, try cereal, boost::serialization, or Protocol Buffers. According to this performance study, cereal seems to be the fastest.
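To illustrate the raw-copy point, here is a minimal sketch that uses only the standard library instead of cereal. The names write_doubles_binary and read_doubles_binary are made up for this example and are not part of the benchmark above. The file it produces should only be read back on a machine with the same endianness and double representation.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Dumps the raw bytes of the vector to a file; returns false on I/O failure.
bool write_doubles_binary(const std::vector<double>& values, const std::string& path) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(double)));
    return static_cast<bool>(out);
}

// Reads the file back, inferring the element count from the file size.
// Returns an empty vector if the file cannot be opened or read.
std::vector<double> read_doubles_binary(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamoff bytes = in.tellg();
    if (bytes < 0) return {};
    in.seekg(0);
    std::vector<double> values(static_cast<std::size_t>(bytes) / sizeof(double));
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(values.size() * sizeof(double)));
    if (!in) values.clear();
    return values;
}
```

No formatting happens at all here, which is why this kind of output is so much cheaper than any of the text variants. The price is that the file is neither human-readable nor portable.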
#2
72
std::ofstream fout("vector.txt");
fout << std::setprecision(10);
for(auto const& x : vector)
fout << x << '\n';
Everything I changed had theoretically worse performance in your version of the code, but std::endl was the real killer. std::vector::at (with bounds checking, which you don't need) would be second, followed by the fact that you did not use iterators.
Why default-construct a std::ofstream and then call open, when you can do it in one step? Why call close when RAII (the destructor) takes care of it for you? You can also call
fout << std::setprecision(10)
just once, before the loop.
As noted in the comment below, if your vector holds elements of a fundamental type, you might get better performance with for(auto x : vector). Measure the running time and inspect the assembly output.
Just to point out another thing that caught my eye, this:
for(l = 0; l < vector.size(); l++)
What is this l? Why declare it outside the loop? It seems you don't need it in the outer scope, so don't declare it there. Also, prefer pre-increment to post-increment.
The result:
for(size_t l = 0; l < vector.size(); ++l)
Sorry for turning this post into a code review.
#3
20
You can also use a rather neat way of outputting the contents of any vector to a file, with the help of iterators and the copy function.
std::ofstream fout("vector.txt");
fout.precision(10);
std::copy(numbers.begin(), numbers.end(),
std::ostream_iterator<double>(fout, "\n"));
This solution is practically the same as LogicStuff's solution in terms of execution time. But it also illustrates how to print the contents with just a single copy call, which, I think, looks pretty good.
#4
11
OK, I'm sad that there are three solutions that attempt to give you a fish, but no solution that attempts to teach you how to fish.
When you have a performance problem, the solution is to use a profiler, and fix whatever the problem the profiler shows.
Converting double-to-string for 300,000 doubles will not take 3 minutes on any computer that has shipped in the last 10 years.
Writing 3 MB of data to disk (an average size of 300,000 doubles) will not take 3 minutes on any computer that has shipped in the last 10 years.
If you profile this, my guess is that you'll find that fout gets flushed 300,000 times, and that flushing is slow, because it may involve blocking, or semi-blocking, I/O. Thus, you need to avoid the blocking I/O. The typical way of doing that is to prepare all your I/O to a single buffer (create a stringstream, write to that) and then write that buffer to a physical file in one go. This is the solution hungptit describes, except I think that what's missing is explaining WHY that solution is a good solution.
Or, to put it another way: What the profiler will tell you is that calling write() (on Linux) or WriteFile() (on Windows) is much slower than just copying a few bytes into a memory buffer, because it's a user/kernel level transition. If std::endl causes this to happen for each double, you're going to have a bad (slow) time. Replace it with something that just stays in user space and puts data in RAM!
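One way to see this without a profiler is to count the flush requests directly. The sketch below is only an illustration: CountingBuf, flushes_with_endl and flushes_with_newline are hypothetical names, and the buffer lives in memory, so it shows how often the stream is asked to flush, not how long a real write() takes.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>

// An in-memory stream buffer that counts how often it is asked to flush.
class CountingBuf : public std::stringbuf {
public:
    std::size_t flushes = 0;

protected:
    // Called by std::ostream::flush(), which std::endl invokes after the '\n'.
    int sync() override {
        ++flushes;
        return 0;
    }
};

// Writes n lines terminated by std::endl and reports the flush count.
std::size_t flushes_with_endl(std::size_t n) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (std::size_t i = 0; i < n; ++i) out << 1.5 << std::endl;
    return buf.flushes;
}

// Writes n lines terminated by '\n' and reports the flush count.
std::size_t flushes_with_newline(std::size_t n) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (std::size_t i = 0; i < n; ++i) out << 1.5 << '\n';
    return buf.flushes;
}
```

With a std::ofstream in place of the counting buffer, each of those flush requests can turn into a system call, which is where the time goes.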
If that's still not fast enough, it may be that the specific-precision version of operator<<() on streams is slow or involves unnecessary overhead. If so, you may be able to speed up the code further by using sprintf() or some other potentially faster function to format the data into the in-memory buffer, before you finally write the entire buffer to a file in one go.
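A rough sketch of that idea, assuming the ten significant digits from the question (std::setprecision(10) in the default float format corresponds to "%.10g"). The names format_doubles and write_doubles_text are invented for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Formats every value as "%.10g\n" into a single in-memory buffer.
std::string format_doubles(const std::vector<double>& values) {
    std::string buffer;
    buffer.reserve(values.size() * 18);  // "%.10g\n" yields at most 18 characters
    char line[32];
    for (double v : values) {
        const int n = std::snprintf(line, sizeof line, "%.10g\n", v);
        if (n > 0) buffer.append(line, static_cast<std::size_t>(n));
    }
    return buffer;
}

// Hands the whole buffer to the stream in one write call.
bool write_doubles_text(const std::vector<double>& values, const std::string& path) {
    const std::string buffer = format_doubles(values);
    std::ofstream out(path);
    out.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    return static_cast<bool>(out);
}
```

Keeping the formatting step separate from the file write also makes it easy to time the two parts independently, which is what a profiler would show you anyway.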
#5
5
You have two main bottlenecks in your program: output and formatting text.
To increase performance, you will want to increase the amount of data output per call. For example, 1 output transfer of 500 characters is faster than 500 transfers of 1 character.
My recommendation is you format the data to a big buffer, then block write the buffer.
Here's an example:
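// Needs <cstdio> (snprintf) and <iostream> (cout).
static char buffer[1024 * 1024]; // static: 1 MiB may not fit on the stack
unsigned int buffer_index = 0;
const unsigned int size = my_vector.size();
for (unsigned int i = 0; i < size; ++i)
{
// "%.10f" can produce over 300 characters for huge doubles, so
// flush the block while there is still room for the longest line.
if (sizeof(buffer) - buffer_index < 512)
{
cout.write(&buffer[0], buffer_index);
buffer_index = 0;
}
signed int characters_formatted = snprintf(&buffer[buffer_index],
sizeof(buffer) - buffer_index,
"%.10f\n", my_vector[i]);
if (characters_formatted > 0)
{
buffer_index += (unsigned int) characters_formatted;
}
}
cout.write(&buffer[0], buffer_index);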
You should first try changing optimization settings in your compiler before messing with the code.
#6
2
Here is a slightly different solution: save your doubles in binary form.
// POSIX: needs <fcntl.h> and <unistd.h>; check both return values in real code.
int fd = ::open("/path/to/the/file", O_WRONLY | O_CREAT | O_TRUNC, 0644);
::write(fd, &vector[0], vector.size() * sizeof(vector[0]));
::close(fd);
Since you mentioned that you have 300k doubles, which comes to 300k * 8 bytes = 2.4 MB, you can save all of them to a local disk file in less than 0.1 second. The only drawback of this method is that the saved file is not as readable as a string representation, but a hex editor can solve that problem.
If you prefer a more robust way, there are plenty of serialization libraries and tools available online. They provide more benefits, such as being language-neutral and machine-independent, offering flexible compression algorithms, etc. These are the two I usually use:
- google/protobuf
- NetCDF
#1
31
Your algorithm has two parts:
你的算法有两个部分:
-
Serialize double numbers to a string or character buffer.
将双号序列化为字符串或字符缓冲区。
-
Write results to a file.
将结果写入文件。
The first item can be improved (> 20%) by using sprintf or fmt. The second item can be sped up by caching results to a buffer or extending the output file stream buffer size before writing results to the output file. You should not use std::endl because it is much slower than using "\n". If you still want to make it faster then write your data in binary format. Below is my complete code sample which includes my proposed solutions and one from Edgar Rokyan. I also included Ben Voigt and Matthieu M suggestions in test code.
第一项可以通过使用sprintf或fmt改进(>20%)。第二项可以通过将结果缓存到缓冲区或在将结果写入输出文件之前扩展输出文件流缓冲区大小来加快速度。您不应该使用std::endl,因为它比使用“\n”要慢得多。如果你仍然想让它更快,那就用二进制格式写你的数据。下面是我的完整代码示例,其中包括我提出的解决方案和埃德加·罗基安提出的解决方案。我还在测试代码中加入了Ben Voigt和Matthieu M的建议。
#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <vector>
// https://github.com/fmtlib/fmt
#include "fmt/format.h"
// http://uscilab.github.io/cereal/
#include "cereal/archives/binary.hpp"
#include "cereal/archives/json.hpp"
#include "cereal/archives/portable_binary.hpp"
#include "cereal/archives/xml.hpp"
#include "cereal/types/string.hpp"
#include "cereal/types/vector.hpp"
// https://github.com/DigitalInBlue/Celero
#include "celero/Celero.h"
template <typename T> const char* getFormattedString();
template<> const char* getFormattedString<double>(){return "%g\n";}
template<> const char* getFormattedString<float>(){return "%g\n";}
template<> const char* getFormattedString<int>(){return "%d\n";}
template<> const char* getFormattedString<size_t>(){return "%lu\n";}
namespace {
constexpr size_t LEN = 32;
template <typename T> std::vector<T> create_test_data(const size_t N) {
std::vector<T> data(N);
for (size_t idx = 0; idx < N; ++idx) {
data[idx] = idx;
}
return data;
}
template <typename Iterator> auto toVectorOfChar(Iterator begin, Iterator end) {
char aLine[LEN];
std::vector<char> buffer;
buffer.reserve(std::distance(begin, end) * LEN);
const char* fmtStr = getFormattedString<typename std::iterator_traits<Iterator>::value_type>();
std::for_each(begin, end, [&buffer, &aLine, &fmtStr](const auto value) {
sprintf(aLine, fmtStr, value);
for (size_t idx = 0; aLine[idx] != 0; ++idx) {
buffer.push_back(aLine[idx]);
}
});
return buffer;
}
template <typename Iterator>
auto toStringStream(Iterator begin, Iterator end, std::stringstream &buffer) {
char aLine[LEN];
const char* fmtStr = getFormattedString<typename std::iterator_traits<Iterator>::value_type>();
std::for_each(begin, end, [&buffer, &aLine, &fmtStr](const auto value) {
sprintf(aLine, fmtStr, value);
buffer << aLine;
});
}
template <typename Iterator> auto toMemoryWriter(Iterator begin, Iterator end) {
fmt::MemoryWriter writer;
std::for_each(begin, end, [&writer](const auto value) { writer << value << "\n"; });
return writer;
}
// A modified version of the original approach.
template <typename Container>
void original_approach(const Container &data, const std::string &fileName) {
std::ofstream fout(fileName);
for (size_t l = 0; l < data.size(); l++) {
fout << data[l] << std::endl;
}
fout.close();
}
// Replace std::endl by "\n"
template <typename Iterator>
void improved_original_approach(Iterator begin, Iterator end, const std::string &fileName) {
std::ofstream fout(fileName);
const size_t len = std::distance(begin, end) * LEN;
std::vector<char> buffer(len);
fout.rdbuf()->pubsetbuf(buffer.data(), len);
for (Iterator it = begin; it != end; ++it) {
fout << *it << "\n";
}
fout.close();
}
//
template <typename Iterator>
void edgar_rokyan_solution(Iterator begin, Iterator end, const std::string &fileName) {
std::ofstream fout(fileName);
std::copy(begin, end, std::ostream_iterator<double>(fout, "\n"));
}
// Cache to a string stream before writing to the output file
template <typename Iterator>
void stringstream_approach(Iterator begin, Iterator end, const std::string &fileName) {
std::stringstream buffer;
for (Iterator it = begin; it != end; ++it) {
buffer << *it << "\n";
}
// Now write to the output file.
std::ofstream fout(fileName);
fout << buffer.str();
fout.close();
}
// Use sprintf
template <typename Iterator>
void sprintf_approach(Iterator begin, Iterator end, const std::string &fileName) {
std::stringstream buffer;
toStringStream(begin, end, buffer);
std::ofstream fout(fileName);
fout << buffer.str();
fout.close();
}
// Use fmt::MemoryWriter (https://github.com/fmtlib/fmt)
template <typename Iterator>
void fmt_approach(Iterator begin, Iterator end, const std::string &fileName) {
auto writer = toMemoryWriter(begin, end);
std::ofstream fout(fileName);
fout << writer.str();
fout.close();
}
// Use std::vector<char>
template <typename Iterator>
void vector_of_char_approach(Iterator begin, Iterator end, const std::string &fileName) {
std::vector<char> buffer = toVectorOfChar(begin, end);
std::ofstream fout(fileName);
fout << buffer.data();
fout.close();
}
// Use cereal (http://uscilab.github.io/cereal/).
template <typename Container, typename OArchive = cereal::BinaryOutputArchive>
void use_cereal(Container &&data, const std::string &fileName) {
std::stringstream buffer;
{
OArchive oar(buffer);
oar(data);
}
std::ofstream fout(fileName);
fout << buffer.str();
fout.close();
}
}
// Performance test input data.
constexpr int NumberOfSamples = 5;
constexpr int NumberOfIterations = 2;
constexpr int N = 3000000;
const auto double_data = create_test_data<double>(N);
const auto float_data = create_test_data<float>(N);
const auto int_data = create_test_data<int>(N);
const auto size_t_data = create_test_data<size_t>(N);
CELERO_MAIN
BASELINE(DoubleVector, original_approach, NumberOfSamples, NumberOfIterations) {
const std::string fileName("origsol.txt");
original_approach(double_data, fileName);
}
BENCHMARK(DoubleVector, improved_original_approach, NumberOfSamples, NumberOfIterations) {
const std::string fileName("improvedsol.txt");
improved_original_approach(double_data.cbegin(), double_data.cend(), fileName);
}
BENCHMARK(DoubleVector, edgar_rokyan_solution, NumberOfSamples, NumberOfIterations) {
const std::string fileName("edgar_rokyan_solution.txt");
edgar_rokyan_solution(double_data.cbegin(), double_data.end(), fileName);
}
BENCHMARK(DoubleVector, stringstream_approach, NumberOfSamples, NumberOfIterations) {
const std::string fileName("stringstream.txt");
stringstream_approach(double_data.cbegin(), double_data.cend(), fileName);
}
BENCHMARK(DoubleVector, sprintf_approach, NumberOfSamples, NumberOfIterations) {
const std::string fileName("sprintf.txt");
sprintf_approach(double_data.cbegin(), double_data.cend(), fileName);
}
BENCHMARK(DoubleVector, fmt_approach, NumberOfSamples, NumberOfIterations) {
const std::string fileName("fmt.txt");
fmt_approach(double_data.cbegin(), double_data.cend(), fileName);
}
BENCHMARK(DoubleVector, vector_of_char_approach, NumberOfSamples, NumberOfIterations) {
const std::string fileName("vector_of_char.txt");
vector_of_char_approach(double_data.cbegin(), double_data.cend(), fileName);
}
BENCHMARK(DoubleVector, use_cereal, NumberOfSamples, NumberOfIterations) {
const std::string fileName("cereal.bin");
use_cereal(double_data, fileName);
}
// Benchmark double vector
BASELINE(DoubleVectorConversion, toStringStream, NumberOfSamples, NumberOfIterations) {
std::stringstream output;
toStringStream(double_data.cbegin(), double_data.cend(), output);
}
BENCHMARK(DoubleVectorConversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toMemoryWriter(double_data.cbegin(), double_data.cend()));
}
BENCHMARK(DoubleVectorConversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toVectorOfChar(double_data.cbegin(), double_data.cend()));
}
// Benchmark float vector
BASELINE(FloatVectorConversion, toStringStream, NumberOfSamples, NumberOfIterations) {
std::stringstream output;
toStringStream(float_data.cbegin(), float_data.cend(), output);
}
BENCHMARK(FloatVectorConversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toMemoryWriter(float_data.cbegin(), float_data.cend()));
}
BENCHMARK(FloatVectorConversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toVectorOfChar(float_data.cbegin(), float_data.cend()));
}
// Benchmark int vector
BASELINE(int_conversion, toStringStream, NumberOfSamples, NumberOfIterations) {
std::stringstream output;
toStringStream(int_data.cbegin(), int_data.cend(), output);
}
BENCHMARK(int_conversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toMemoryWriter(int_data.cbegin(), int_data.cend()));
}
BENCHMARK(int_conversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toVectorOfChar(int_data.cbegin(), int_data.cend()));
}
// Benchmark size_t vector
BASELINE(size_t_conversion, toStringStream, NumberOfSamples, NumberOfIterations) {
std::stringstream output;
toStringStream(size_t_data.cbegin(), size_t_data.cend(), output);
}
BENCHMARK(size_t_conversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toMemoryWriter(size_t_data.cbegin(), size_t_data.cend()));
}
BENCHMARK(size_t_conversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
celero::DoNotOptimizeAway(toVectorOfChar(size_t_data.cbegin(), size_t_data.cend()));
}
Below are the performance results obtained in my Linux box using clang-3.9.1 and -O3 flag. I use Celero to collect all performance results.
下面是在我的Linux box中使用clang-3.9.1和-O3标志获得的性能结果。我使用Celero来收集所有性能结果。
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec |
-----------------------------------------------------------------------------------------------------------------------------------------------
DoubleVector | original_approa | Null | 10 | 4 | 1.00000 | 3650309.00000 | 0.27 |
DoubleVector | improved_origin | Null | 10 | 4 | 0.47828 | 1745855.00000 | 0.57 |
DoubleVector | edgar_rokyan_so | Null | 10 | 4 | 0.45804 | 1672005.00000 | 0.60 |
DoubleVector | stringstream_ap | Null | 10 | 4 | 0.41514 | 1515377.00000 | 0.66 |
DoubleVector | sprintf_approac | Null | 10 | 4 | 0.35436 | 1293521.50000 | 0.77 |
DoubleVector | fmt_approach | Null | 10 | 4 | 0.34916 | 1274552.75000 | 0.78 |
DoubleVector | vector_of_char_ | Null | 10 | 4 | 0.34366 | 1254462.00000 | 0.80 |
DoubleVector | use_cereal | Null | 10 | 4 | 0.04172 | 152291.25000 | 6.57 |
Complete.
I also benchmark for numeric to string conversion algorithms to compare the performance of std::stringstream, fmt::MemoryWriter, and std::vector.
我还对数字到字符串转换算法进行了基准测试,以比较std: stringstream、fmt::MemoryWriter和std::vector的性能。
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec |
-----------------------------------------------------------------------------------------------------------------------------------------------
DoubleVectorCon | toStringStream | Null | 10 | 4 | 1.00000 | 1272667.00000 | 0.79 |
FloatVectorConv | toStringStream | Null | 10 | 4 | 1.00000 | 1272573.75000 | 0.79 |
int_conversion | toStringStream | Null | 10 | 4 | 1.00000 | 248709.00000 | 4.02 |
size_t_conversi | toStringStream | Null | 10 | 4 | 1.00000 | 252063.00000 | 3.97 |
DoubleVectorCon | toMemoryWriter | Null | 10 | 4 | 0.98468 | 1253165.50000 | 0.80 |
DoubleVectorCon | toVectorOfChar | Null | 10 | 4 | 0.97146 | 1236340.50000 | 0.81 |
FloatVectorConv | toMemoryWriter | Null | 10 | 4 | 0.98419 | 1252454.25000 | 0.80 |
FloatVectorConv | toVectorOfChar | Null | 10 | 4 | 0.97369 | 1239093.25000 | 0.81 |
int_conversion | toMemoryWriter | Null | 10 | 4 | 0.11741 | 29200.50000 | 34.25 |
int_conversion | toVectorOfChar | Null | 10 | 4 | 0.87105 | 216637.00000 | 4.62 |
size_t_conversi | toMemoryWriter | Null | 10 | 4 | 0.13746 | 34649.50000 | 28.86 |
size_t_conversi | toVectorOfChar | Null | 10 | 4 | 0.85345 | 215123.00000 | 4.65 |
Complete.
From the above tables we can see that:
-
Edgar Rokyan's solution is 10% slower than the stringstream solution. The solution that uses the fmt library is the best overall for the three data types studied: double, int, and size_t. The sprintf + std::vector<char> solution is roughly 1-2% faster than the fmt solution for the double data type. However, I do not recommend sprintf-based solutions for production code because they are not elegant (still written in C style) and do not work out of the box for different data types such as int or size_t.
-
The benchmark results also show that fmt is the superior choice for serializing integral data types: in the conversion table it is roughly 6x to 8.5x faster than the other approaches.
-
We can speed up this algorithm by roughly 10x if we use a binary format. This approach is significantly faster than writing formatted text because we only do a raw copy from memory to the output. If you want a more flexible and portable solution, try cereal, boost::serialization, or protocol-buffer. According to this performance study, cereal seems to be the fastest.
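To make the "raw copy from memory" idea concrete, here is a minimal sketch using only the standard library rather than cereal. The function names write_doubles_binary and read_doubles_binary are made up for illustration, and the file it produces is only readable on a machine with the same double representation and byte order.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: dump the raw bytes of the doubles; returns false on I/O failure.
bool write_doubles_binary(const std::string& path, const std::vector<double>& values) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(double)));
    return static_cast<bool>(out);
}

// Hypothetical helper: read the file back, inferring the element count from the file size.
std::vector<double> read_doubles_binary(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open positioned at the end
    if (!in) return {};
    const std::streamoff bytes = in.tellg();
    if (bytes <= 0) return {};
    std::vector<double> values(static_cast<std::size_t>(bytes) / sizeof(double));
    in.seekg(0);
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(values.size() * sizeof(double)));
    return values;
}
```

Because no text formatting happens, the values read back are bit-for-bit identical to the ones written, which a fixed-precision text file does not guarantee.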
#2
72
std::ofstream fout("vector.txt");
fout << std::setprecision(10);
for(auto const& x : vector)
fout << x << '\n';
Everything I changed had theoretically worse performance in your version of the code, but the std::endl was the real killer. std::vector::at (with bounds checking, which you don't need) would be the second, then the fact that you did not use iterators.
Why default-construct a std::ofstream and then call open, when you can do it in one step? Why call close when RAII (the destructor) takes care of it for you? You can also call fout << std::setprecision(10) just once, before the loop.
As noted in the comment below, if your vector holds elements of a fundamental type, you might get better performance with for(auto x : vector). Measure the running time or inspect the assembly output.
Just to point out another thing that caught my eye, this:
for(l = 0; l < vector.size(); l++)
What is this l? Why declare it outside the loop? It seems you don't need it in the outer scope, so don't. Also prefer pre-increment to post-increment.
The result:
for(size_t l = 0; l < vector.size(); ++l)
I'm sorry for turning this post into a code review.
#3
20
You can also use a rather neat way of writing the contents of any vector to a file, with the help of iterators and the std::copy function.
std::ofstream fout("vector.txt");
fout.precision(10);
std::copy(numbers.begin(), numbers.end(),
std::ostream_iterator<double>(fout, "\n"));
This solution is practically the same as LogicStuff's solution in terms of execution time. But it also illustrates how to print the contents with just a single call to the copy function, which, I suppose, looks pretty good.
#4
11
OK, I'm sad that there are three solutions that attempt to give you a fish, but no solution that attempts to teach you how to fish.
When you have a performance problem, the solution is to use a profiler, and fix whatever the problem the profiler shows.
Converting double-to-string for 300,000 doubles will not take 3 minutes on any computer that has shipped in the last 10 years.
Writing 3 MB of data to disk (an average size of 300,000 doubles) will not take 3 minutes on any computer that has shipped in the last 10 years.
If you profile this, my guess is that you'll find that fout gets flushed 300,000 times, and that flushing is slow, because it may involve blocking, or semi-blocking, I/O. Thus, you need to avoid the blocking I/O. The typical way of doing that is to prepare all your I/O to a single buffer (create a stringstream, write to that) and then write that buffer to a physical file in one go. This is the solution hungptit describes, except I think that what's missing is explaining WHY that solution is a good solution.
Or, to put it another way: What the profiler will tell you is that calling write() (on Linux) or WriteFile() (on Windows) is much slower than just copying a few bytes into a memory buffer, because it's a user/kernel level transition. If std::endl causes this to happen for each double, you're going to have a bad (slow) time. Replace it with something that just stays in user space and puts data in RAM!
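If you want to see the flushing without a profiler, one option is a stream buffer that counts how often it is asked to sync. This is only a sketch: FlushCountingBuf and count_flushes are hypothetical names, and the buffer throws the characters away instead of writing them anywhere.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <vector>

// Hypothetical helper: a stream buffer that discards characters and counts flushes.
class FlushCountingBuf : public std::streambuf {
public:
    std::size_t flushes = 0;

protected:
    int_type overflow(int_type ch) override {
        return traits_type::not_eof(ch);  // accept and discard the character
    }
    int sync() override {
        ++flushes;  // a std::filebuf would hand its pending bytes to the OS here
        return 0;
    }
};

// Writes every value followed by either std::endl or '\n' and reports the flush count.
std::size_t count_flushes(const std::vector<double>& values, bool use_endl) {
    FlushCountingBuf buf;
    std::ostream out(&buf);
    for (double x : values) {
        if (use_endl) {
            out << x << std::endl;  // newline plus an explicit flush
        } else {
            out << x << '\n';       // newline only
        }
    }
    return buf.flushes;
}
```

With std::endl the counter goes up once per element; with '\n' it stays at zero, and a real file stream would flush only when its buffer fills up or the stream is closed.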
If that's still not fast enough, it may be that the specific-precision version of operator<<() on strings is slow or involves unnecessary overhead. If so, you may be able to further speed up the code by using sprintf() or some other potentially faster function to generate data into the in-memory buffer, before you finally write the entire buffer to a file in one go.
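A minimal sketch of that last idea, assuming the data fits comfortably in memory: format everything with snprintf into one std::string and hand it to the file with a single write. The name write_doubles_text is made up, and the "%.10g" format is my assumption for mirroring std::setprecision(10) from the question.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: format all values in memory, then write the buffer in one go.
bool write_doubles_text(const std::string& path, const std::vector<double>& values) {
    std::string buffer;
    buffer.reserve(values.size() * 16);  // rough guess to limit reallocations
    char line[64];                       // ample for one "%.10g" value plus newline
    for (double x : values) {
        // "%.10g" corresponds to std::setprecision(10) with the default float format.
        const int n = std::snprintf(line, sizeof(line), "%.10g\n", x);
        if (n > 0 && static_cast<std::size_t>(n) < sizeof(line)) {
            buffer.append(line, static_cast<std::size_t>(n));
        }
    }
    std::ofstream out(path, std::ios::binary);
    out.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    return static_cast<bool>(out);
}
```

Whether this beats a plain ofstream with '\n' depends on the standard library and platform, so measure both before committing to it.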
#5
5
You have two main bottlenecks in your program: output and formatting text.
To increase performance, you will want to increase the amount of data output per call. For example, 1 output transfer of 500 characters is faster than 500 transfers of 1 character.
My recommendation is you format the data to a big buffer, then block write the buffer.
Here's an example:
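#include <cstdio>    // snprintf
#include <iostream>  // std::cout

// Static storage: a 1 MiB array can overflow the stack on some platforms.
static char buffer[1024 * 1024];
std::size_t buffer_index = 0;
for (std::size_t i = 0; i < my_vector.size(); ++i)
{
    // "%.10f" of a double needs at most about 325 bytes, so flush when fewer than 512 remain.
    if (sizeof(buffer) - buffer_index < 512)
    {
        std::cout.write(buffer, buffer_index);
        buffer_index = 0;
    }
    const int characters_formatted = snprintf(&buffer[buffer_index],
                                              sizeof(buffer) - buffer_index,
                                              "%.10f\n", my_vector[i]);
    if (characters_formatted > 0)
    {
        buffer_index += static_cast<std::size_t>(characters_formatted);
    }
}
std::cout.write(buffer, buffer_index);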
You should first try changing optimization settings in your compiler before messing with the code.
#6
2
Here is a slightly different solution: save your doubles in binary form.
int fd = ::open("/path/to/the/file", O_WRONLY | O_CREAT | O_TRUNC, 0644);  // needs <fcntl.h>
::write(fd, &vector[0], vector.size() * sizeof(vector[0]));                // needs <unistd.h>
::close(fd);
Since you mentioned that you have 300k doubles, which amounts to 300k * 8 bytes = 2.4 MB, you can save all of them to a local disk file in less than 0.1 second. The only drawback of this method is that the saved file is not as readable as a string representation, but a hex editor can solve that problem.
If you prefer a more robust way, there are plenty of serialization libraries/tools available online. They provide more benefits, such as being language-neutral and machine-independent, offering flexible compression algorithms, etc. These are the two I usually use:
- google/protobuf
- NetCDF