I have a file with millions of lines, each line has 3 floats separated by spaces. It takes a lot of time to read the file, so I tried to read them using memory mapped files only to find out that the problem is not with the speed of IO but with the speed of the parsing.
我有一个有数百万行的文件,每行有3个浮点分隔的空间。读取文件需要花费大量时间,所以我尝试使用内存映射文件读取它们,结果发现问题不在于IO的速度,而在于解析的速度。
My current parsing is to take the stream (called file) and do the following
我当前的解析是获取流(称为file)并执行以下操作
float x,y,z;
file >> x >> y >> z;
Someone in Stack Overflow recommended to use Boost.Spirit but I couldn't find any simple tutorial to explain how to use it.
栈溢出中有人建议使用Boost。但是我找不到任何简单的教程来解释如何使用它。
I'm trying to find a simple and efficient way to parse a line that looks like this:
我想找到一种简单而有效的方法来解析这样一行:
"134.32 3545.87 3425"
I will really appreciate some help. I wanted to use strtok to split it, but I don't know how to convert strings to floats, and I'm not quite sure it's the best way.
我会很感激你的帮助。我想用strtok来分割它,但是我不知道如何将字符串转换成浮点数,我不确定这是最好的方法。
I don't mind if the solution will be Boost or not. I don't mind if it won't be the most efficient solution ever, but I'm sure that it is possible to double the speed.
我不介意解决方案是否会得到提振。我不介意它是不是有史以来最有效的解决方案,但我相信有可能把速度提高一倍。
Thanks in advance.
提前谢谢。
7 个解决方案
#1
18
If the conversion is the bottle neck (which is quite possible), you should start by using the different possiblities in the standard. Logically, one would expect them to be very close, but practically, they aren't always:
如果转换是瓶颈(这是完全可能的),您应该从使用标准中不同的可能性开始。从逻辑上讲,人们会认为它们非常接近,但实际上,它们并不总是:
-
You've already determined that
std::ifstream
is too slow.您已经确定了std::ifstream太慢。
-
Converting your memory mapped data to an
std::istringstream
is almost certainly not a good solution; you'll first have to create a string, which will copy all of the data.将内存映射数据转换为std::istringstream几乎肯定不是一个好的解决方案;首先必须创建一个字符串,该字符串将复制所有数据。
-
Writing your own
streambuf
to read directly from the memory, without copying (or using the deprecatedstd::istrstream
) might be a solution, although if the problem really is the conversion... this still uses the same conversion routines.编写自己的streambuf可以直接从内存中读取,而不需要复制(或者使用已弃用的std: istrstream)可能是一个解决方案,但如果问题真的是转换……这仍然使用相同的转换例程。
-
You can always try
fscanf
, orscanf
on your memory mapped stream. Depending on the implementation, they might be faster than the variousistream
implementations.您可以在内存映射流上尝试fscanf或scanf。根据实现的不同,它们可能比各种istream实现要快。
-
Probably faster than any of these is to use
strtod
. No need to tokenize for this:strtod
skips leading white space (including'\n'
), and has an out parameter where it puts the address of the first character not read. The end condition is a bit tricky, your loop should probably look a bit like:使用strtod可能比以上任何一种都要快。无需为此进行标记:strtod跳过领先的空格(包括'\n'),并有一个out参数,将第一个字符的地址置为不读。最后的条件有点棘手,你的循环应该看起来有点像:
char* begin; // Set to point to the mmap'ed data... // You'll also have to arrange for a '\0' // to follow the data. This is probably // the most difficult issue. char* end; errno = 0; double tmp = strtod( begin, &end ); while ( errno == 0 && end != begin ) { // do whatever with tmp... begin = end; tmp = strtod( begin, &end ); }
If none of these are fast enough, you'll have to consider the actual data. It probably has some sort of additional constraints, which means that you can potentially write a conversion routine which is faster than the more general ones; e.g. strtod
has to handle both fixed and scientific, and it has to be 100% accurate even if there are 17 significant digits. It also has to be locale specific. All of this is added complexity, which means added code to execute. But beware: writing an efficient and correct conversion routine, even for a restricted set of input, is non-trivial; you really do have to know what you are doing.
如果这些都不够快,您将不得不考虑实际的数据。它可能有一些额外的约束,这意味着你可以写一个转换例程比一般的程序更快;例如,strtod必须同时处理固定的和科学的数据,即使有17个有效数字,它也必须100%准确。它还必须是特定于地区的。所有这些都增加了复杂性,这意味着要执行的代码增加了。但是要注意:编写一个高效且正确的转换例程,即使是有限的输入,也是非常重要的;你真的必须知道你在做什么。
EDIT:
编辑:
Just out of curiosity, I've run some tests. In addition to the afore mentioned solutions, I wrote a simple custom converter, which only handles fixed point (no scientific), with at most five digits after the decimal, and the value before the decimal must fit in an int
:
出于好奇,我做了一些测试。除了上述的解决方案外,我还编写了一个简单的自定义转换器,它只处理不动点(没有科学性),小数点后最多5位,小数点前的值必须符合int:
double
convert( char const* source, char const** endPtr )
{
char* end;
int left = strtol( source, &end, 10 );
double results = left;
if ( *end == '.' ) {
char* start = end + 1;
int right = strtol( start, &end, 10 );
static double const fracMult[]
= { 0.0, 0.1, 0.01, 0.001, 0.0001, 0.00001 };
results += right * fracMult[ end - start ];
}
if ( endPtr != nullptr ) {
*endPtr = end;
}
return results;
}
(If you actually use this, you should definitely add some error handling. This was just knocked up quickly for experimental purposes, to read the test file I'd generated, and nothing else.)
(如果您确实使用了这个,您肯定应该添加一些错误处理。这只是为了实验目的,为了读取我生成的测试文件,而很快地就完成了。
The interface is exactly that of strtod
, to simplify coding.
该接口正是strtod的接口,以简化编码。
I ran the benchmarks in two environments (on different machines, so the absolute values of any times aren't relevant). I got the following results:
我在两个环境中运行基准(在不同的机器上,所以任何时间的绝对值都不相关)。我得到了以下结果:
Under Windows 7, compiled with VC 11 (/O2):
在Windows 7下,用vc11 (/O2)编译:
Testing Using fstream directly (5 iterations)...
6.3528e+006 microseconds per iteration
Testing Using fscan directly (5 iterations)...
685800 microseconds per iteration
Testing Using strtod (5 iterations)...
597000 microseconds per iteration
Testing Using manual (5 iterations)...
269600 microseconds per iteration
Under Linux 2.6.18, compiled with g++ 4.4.2 (-O2, IIRC):
在Linux 2.6.18下,用g++ 4.4.2 (-O2, IIRC)编译:
Testing Using fstream directly (5 iterations)...
784000 microseconds per iteration
Testing Using fscanf directly (5 iterations)...
526000 microseconds per iteration
Testing Using strtod (5 iterations)...
382000 microseconds per iteration
Testing Using strtof (5 iterations)...
360000 microseconds per iteration
Testing Using manual (5 iterations)...
186000 microseconds per iteration
In all cases, I'm reading 554000 lines, each with 3 randomly generated floating point in the range [0...10000)
.
在所有情况下,我读取的是554000行,每一行都有3个随机生成的浮点数(0…10000)。
The most striking thing is the enormous difference between fstream
and fscan
under Windows (and the relatively small difference between fscan
and strtod
). The second thing is just how much the simple custom conversion function gains, on both platforms. The necessary error handling would slow it down a little, but the difference is still significant. I expected some improvement, since it doesn't handle a lot of things the the standard conversion routines do (like scientific format, very, very small numbers, Inf and NaN, i18n, etc.), but not this much.
最引人注目的是fstream和Windows下的fscan之间的巨大差异(以及fscan和strtod之间相对较小的差异)。第二件事是简单的自定义转换函数在两个平台上的收益。必要的错误处理会使它慢一些,但是差异仍然很大。我期望有一些改进,因为它不能处理很多标准转换例程所做的事情(比如科学格式、非常非常小的数字、Inf和NaN、i18n等),但不能处理这么多。
#2
44
UPDATE
Since Spirit X3 is available for testing, I've updated the benchmarks. Meanwhile I've used Nonius to get statistically sound benchmarks.
因为Spirit X3可以进行测试,所以我更新了基准。与此同时,我使用了Nonius来获得统计上的声音基准。
All charts below are available interactive online
以下所有图表均可在线互动
Benchmark CMake project + testdata used is on github: https://github.com/sehe/bench_float_parsing
使用的基准CMake项目+ testdata在github上:https://github.com/sehe/bench_float_parser
Summary:
Spirit parsers are fastest. If you can use C++14 consider the experimental version Spirit X3:
精神解析器是最快的。如果你能用c++ 14考虑实验版精神X3:
The above is measures using memory mapped files. Using IOstreams, it will be slower accross the board,
上面是使用内存映射文件的度量。使用IOstreams,它将在棋盘上运行得更慢,
but not as slow as scanf
using C/POSIX FILE*
function calls:
但不要像使用C/POSIX文件的scanf那样慢,函数调用:
What follows is parts from the OLD answer
以下是旧答案的一部分
I implemented the Spirit version, and ran a benchmark comparing to the other suggested answers.
我实现了Spirit版本,并与其他建议的答案进行了比较。
Here's my results, all tests run on the same body of input (515Mb of
input.txt
). See below for exact specs.这是我的结果,所有测试运行在同一个输入体(515Mb的input.txt)上。具体规格见下文。
(wall clock time in seconds, average of 2+ runs)(挂钟时间以秒为单位,平均2次以上)
To my own surprise, Boost Spirit turns out to be fastest, and most elegant:
令我惊讶的是,Boost Spirit竟然是最快、最优雅的:
- handles/reports errors
- 处理/报告错误
- supports +/-Inf and NaN and variable whitespace
- 支持+/-Inf和NaN以及可变空格
- no problems at all detecting the end of input (as opposed to the other mmap answer)
- 完全不存在检测输入结束的问题(与其他mmap答案相反)
looks nice:
看起来不错:
bool ok = phrase_parse(f,l, // source iterators (double_ > double_ > double_) % eol, // grammar blank, // skipper data); // output attribute
Note that
boost::spirit::istreambuf_iterator
was unspeakably much slower (15s+). I hope this helps!注意boost:::spirit: istreambuf_iterator的速度非常慢(15s+)。我希望这可以帮助!
Benchmark details
All parsing done into
vector
ofstruct float3 { float x,y,z; }
.将所有解析完成为struct float3的向量{float x,y,z;}。
Generate input file using
生成输入文件使用
od -f -A none --width=12 /dev/urandom | head -n 11000000
This results in a 515Mb file containing data like
这将导致包含类似数据的515Mb文件
-2627.0056 -1.967235e-12 -2.2784738e+33 -1.0664798e-27 -4.6421956e-23 -6.917859e+20 -1.1080849e+36 2.8909405e-33 1.7888695e-12 -7.1663235e+33 -1.0840628e+36 1.5343362e-12 -3.1773715e-17 -6.3655537e-22 -8.797282e+31 9.781095e+19 1.7378472e-37 63825084 -1.2139188e+09 -5.2464635e-05 -2.1235992e-38 3.0109424e+08 5.3939846e+30 -6.6146894e-20
Compile the program using:
编译程序使用:
g++ -std=c++0x -g -O3 -isystem -march=native test.cpp -o test -lboost_filesystem -lboost_iostreams
Measure wall clock time using
测量挂钟的使用时间
time ./test < input.txt
Environment:
- Linux desktop 4.2.0-42-generic #49-Ubuntu SMP x86_64
- Linux桌面4.2.0-42-泛型#49-Ubuntu SMP x86_64
- Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
- 英特尔(R) Core(TM) i7-3770K CPU @ 3.50GHz。
- 32GiB RAM
- 32镶条内存
Full Code
Full code to the old benchmark is in the edit history of this post, the newest version is on github
旧基准的完整代码在本文的编辑历史中,最新的版本在github上
#3
13
Before you start, verify that this is the slow part of your application and get a test harness around it so you can measure improvements.
在开始之前,请验证这是应用程序的缓慢部分,并围绕它获得一个测试约束,以便您可以度量改进。
boost::spirit
would be overkill for this in my opinion. Try fscanf
在我看来,这种精神会让人受不了。尝试fscanf
FILE* f = fopen("yourfile");
if (NULL == f) {
printf("Failed to open 'yourfile'");
return;
}
float x,y,z;
int nItemsRead = fscanf(f,"%f %f %f\n", &x, &y, &z);
if (3 != nItemsRead) {
printf("Oh dear, items aren't in the right format.\n");
return;
}
#4
2
I would check out this related post Using ifstream to read floats or How do I tokenize a string in C++ particularly the posts related to C++ String Toolkit Library. I've used C strtok, C++ streams, Boost tokenizer and the best of them for the ease and use is C++ String Toolkit Library.
我将使用ifstream来阅读这篇相关的文章,或者如何在c++中对字符串进行标记,特别是与c++字符串工具包库相关的文章。我使用过C strtok、c++流、Boost tokenizer,其中最容易使用的是c++ String Toolkit库。
#5
0
a nitty-gritty solution would be to throw more cores at the problem, spawning multiple threads. If the bottleneck is just the CPU you can halve down the running time by spawning two threads (on multicore CPUs)
一个具体的解决方案是向这个问题扔更多的内核,生成多个线程。如果瓶颈仅仅是CPU,那么通过生成两个线程(在多核CPU上),可以将运行时间减半
some other tips:
其他一些小贴士:
-
try to avoid parsing functions from library such boost and/or std. They are bloated with error checking conditions and much of the processing time is spent doing these checks. For just a couple conversions they are fine but fail miserably when it comes to process millions of values. If you already know that your data is well-formatted you can write (or find) a custom optimized C function which does only the data conversion
尽量避免从库中解析这样的boost和/或std函数。对于几个转换,它们是可以的,但是当涉及到处理数百万个值时,它们会失败得很惨。如果您已经知道您的数据具有良好的格式,那么您可以编写(或查找)一个自定义优化的C函数,该函数只执行数据转换
-
use a large memory buffer (let's say 10 Mbytes) in which you load chunks of your file and do the conversion on there
使用一个大的内存缓冲区(比方说10mbytes),在其中加载文件块并在其上进行转换
-
divide et impera: split your problem into smaller easier ones: preprocess your file, make it single line single float, split each line by the "." character and convert integers instead of float, then merge the two integers to create the float number
分割et impera:把你的问题分割成更小的问题:预处理你的文件,使它成为单行的浮点数,将每一行分割成“。”字符,而不是浮点数,然后合并两个整数来创建浮点数
#6
0
I believe one most important rule in the string processing is "read only once, one character at a time". It is always simpler, faster and more reliable, I think.
我认为字符串处理中最重要的一条规则是“一次只读一个字符”。我认为它总是更简单、更快、更可靠。
I made simple benchmark program to show how simple it is. My test says this code runs 40% faster than strtod
version.
我做了一个简单的基准程序来展示它有多简单。我的测试显示这段代码比strtod版本快40%。
#include <iostream>
#include <sstream>
#include <iomanip>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <sys/time.h>
using namespace std;
string test_generate(size_t n)
{
srand((unsigned)time(0));
double sum = 0.0;
ostringstream os;
os << std::fixed;
for (size_t i=0; i<n; ++i)
{
unsigned u = rand();
int w = 0;
if (u > UINT_MAX/2)
w = - (u - UINT_MAX/2);
else
w = + (u - UINT_MAX/2);
double f = w / 1000.0;
sum += f;
os << f;
os << " ";
}
printf("generated %f\n", sum);
return os.str();
}
void read_float_ss(const string& in)
{
double sum = 0.0;
const char* begin = in.c_str();
char* end = NULL;
errno = 0;
double f = strtod( begin, &end );
sum += f;
while ( errno == 0 && end != begin )
{
begin = end;
f = strtod( begin, &end );
sum += f;
}
printf("scanned %f\n", sum);
}
double scan_float(const char* str, size_t& off, size_t len)
{
static const double bases[13] = {
0.0, 10.0, 100.0, 1000.0, 10000.0,
100000.0, 1000000.0, 10000000.0, 100000000.0,
1000000000.0, 10000000000.0, 100000000000.0, 1000000000000.0,
};
bool begin = false;
bool fail = false;
bool minus = false;
int pfrac = 0;
double dec = 0.0;
double frac = 0.0;
for (; !fail && off<len; ++off)
{
char c = str[off];
if (c == '+')
{
if (!begin)
begin = true;
else
fail = true;
}
else if (c == '-')
{
if (!begin)
begin = true;
else
fail = true;
minus = true;
}
else if (c == '.')
{
if (!begin)
begin = true;
else if (pfrac)
fail = true;
pfrac = 1;
}
else if (c >= '0' && c <= '9')
{
if (!begin)
begin = true;
if (pfrac == 0)
{
dec *= 10;
dec += c - '0';
}
else if (pfrac < 13)
{
frac += (c - '0') / bases[pfrac];
++pfrac;
}
}
else
{
break;
}
}
if (!fail)
{
double f = dec + frac;
if (minus)
f = -f;
return f;
}
return 0.0;
}
void read_float_direct(const string& in)
{
double sum = 0.0;
size_t len = in.length();
const char* str = in.c_str();
for (size_t i=0; i<len; ++i)
{
double f = scan_float(str, i, len);
sum += f;
}
printf("scanned %f\n", sum);
}
int main()
{
const int n = 1000000;
printf("count = %d\n", n);
string in = test_generate(n);
{
struct timeval t1;
gettimeofday(&t1, 0);
printf("scan start\n");
read_float_ss(in);
struct timeval t2;
gettimeofday(&t2, 0);
double elapsed = (t2.tv_sec - t1.tv_sec) * 1000000.0;
elapsed += (t2.tv_usec - t1.tv_usec) / 1000.0;
printf("elapsed %.2fms\n", elapsed);
}
{
struct timeval t1;
gettimeofday(&t1, 0);
printf("scan start\n");
read_float_direct(in);
struct timeval t2;
gettimeofday(&t2, 0);
double elapsed = (t2.tv_sec - t1.tv_sec) * 1000000.0;
elapsed += (t2.tv_usec - t1.tv_usec) / 1000.0;
printf("elapsed %.2fms\n", elapsed);
}
return 0;
}
Below is console output from i7 Mac Book Pro (compiled in XCode 4.6).
下面是i7 Mac Book Pro的控制台输出(在XCode 4.6中编译)。
count = 1000000
generated -1073202156466.638184
scan start
scanned -1073202156466.638184
elapsed 83.34ms
scan start
scanned -1073202156466.638184
elapsed 53.50ms
#7
0
using C is going to be the fastest solution.
Split into tokens using
convert to float with strtok
and then
strtof
. Or if you know the exact format use fscanf
.
使用C将是最快的解决方案。使用strtok将其拆分为令牌,然后使用strtof将其转换为float。或者如果你知道确切的格式使用fscanf。
#1
18
If the conversion is the bottle neck (which is quite possible), you should start by using the different possiblities in the standard. Logically, one would expect them to be very close, but practically, they aren't always:
如果转换是瓶颈(这是完全可能的),您应该从使用标准中不同的可能性开始。从逻辑上讲,人们会认为它们非常接近,但实际上,它们并不总是:
-
You've already determined that
std::ifstream
is too slow.您已经确定了std::ifstream太慢。
-
Converting your memory mapped data to an
std::istringstream
is almost certainly not a good solution; you'll first have to create a string, which will copy all of the data.将内存映射数据转换为std::istringstream几乎肯定不是一个好的解决方案;首先必须创建一个字符串,该字符串将复制所有数据。
-
Writing your own
streambuf
to read directly from the memory, without copying (or using the deprecatedstd::istrstream
) might be a solution, although if the problem really is the conversion... this still uses the same conversion routines.编写自己的streambuf可以直接从内存中读取,而不需要复制(或者使用已弃用的std: istrstream)可能是一个解决方案,但如果问题真的是转换……这仍然使用相同的转换例程。
-
You can always try
fscanf
, orscanf
on your memory mapped stream. Depending on the implementation, they might be faster than the variousistream
implementations.您可以在内存映射流上尝试fscanf或scanf。根据实现的不同,它们可能比各种istream实现要快。
-
Probably faster than any of these is to use
strtod
. No need to tokenize for this:strtod
skips leading white space (including'\n'
), and has an out parameter where it puts the address of the first character not read. The end condition is a bit tricky, your loop should probably look a bit like:使用strtod可能比以上任何一种都要快。无需为此进行标记:strtod跳过领先的空格(包括'\n'),并有一个out参数,将第一个字符的地址置为不读。最后的条件有点棘手,你的循环应该看起来有点像:
char* begin; // Set to point to the mmap'ed data... // You'll also have to arrange for a '\0' // to follow the data. This is probably // the most difficult issue. char* end; errno = 0; double tmp = strtod( begin, &end ); while ( errno == 0 && end != begin ) { // do whatever with tmp... begin = end; tmp = strtod( begin, &end ); }
If none of these are fast enough, you'll have to consider the actual data. It probably has some sort of additional constraints, which means that you can potentially write a conversion routine which is faster than the more general ones; e.g. strtod
has to handle both fixed and scientific, and it has to be 100% accurate even if there are 17 significant digits. It also has to be locale specific. All of this is added complexity, which means added code to execute. But beware: writing an efficient and correct conversion routine, even for a restricted set of input, is non-trivial; you really do have to know what you are doing.
如果这些都不够快,您将不得不考虑实际的数据。它可能有一些额外的约束,这意味着你可以写一个转换例程比一般的程序更快;例如,strtod必须同时处理固定的和科学的数据,即使有17个有效数字,它也必须100%准确。它还必须是特定于地区的。所有这些都增加了复杂性,这意味着要执行的代码增加了。但是要注意:编写一个高效且正确的转换例程,即使是有限的输入,也是非常重要的;你真的必须知道你在做什么。
EDIT:
编辑:
Just out of curiosity, I've run some tests. In addition to the afore mentioned solutions, I wrote a simple custom converter, which only handles fixed point (no scientific), with at most five digits after the decimal, and the value before the decimal must fit in an int
:
出于好奇,我做了一些测试。除了上述的解决方案外,我还编写了一个简单的自定义转换器,它只处理不动点(没有科学性),小数点后最多5位,小数点前的值必须符合int:
double
convert( char const* source, char const** endPtr )
{
char* end;
int left = strtol( source, &end, 10 );
double results = left;
if ( *end == '.' ) {
char* start = end + 1;
int right = strtol( start, &end, 10 );
static double const fracMult[]
= { 0.0, 0.1, 0.01, 0.001, 0.0001, 0.00001 };
results += right * fracMult[ end - start ];
}
if ( endPtr != nullptr ) {
*endPtr = end;
}
return results;
}
(If you actually use this, you should definitely add some error handling. This was just knocked up quickly for experimental purposes, to read the test file I'd generated, and nothing else.)
(如果您确实使用了这个,您肯定应该添加一些错误处理。这只是为了实验目的,为了读取我生成的测试文件,而很快地就完成了。
The interface is exactly that of strtod
, to simplify coding.
该接口正是strtod的接口,以简化编码。
I ran the benchmarks in two environments (on different machines, so the absolute values of any times aren't relevant). I got the following results:
我在两个环境中运行基准(在不同的机器上,所以任何时间的绝对值都不相关)。我得到了以下结果:
Under Windows 7, compiled with VC 11 (/O2):
在Windows 7下,用vc11 (/O2)编译:
Testing Using fstream directly (5 iterations)...
6.3528e+006 microseconds per iteration
Testing Using fscan directly (5 iterations)...
685800 microseconds per iteration
Testing Using strtod (5 iterations)...
597000 microseconds per iteration
Testing Using manual (5 iterations)...
269600 microseconds per iteration
Under Linux 2.6.18, compiled with g++ 4.4.2 (-O2, IIRC):
在Linux 2.6.18下,用g++ 4.4.2 (-O2, IIRC)编译:
Testing Using fstream directly (5 iterations)...
784000 microseconds per iteration
Testing Using fscanf directly (5 iterations)...
526000 microseconds per iteration
Testing Using strtod (5 iterations)...
382000 microseconds per iteration
Testing Using strtof (5 iterations)...
360000 microseconds per iteration
Testing Using manual (5 iterations)...
186000 microseconds per iteration
In all cases, I'm reading 554000 lines, each with 3 randomly generated floating point in the range [0...10000)
.
在所有情况下,我读取的是554000行,每一行都有3个随机生成的浮点数(0…10000)。
The most striking thing is the enormous difference between fstream
and fscan
under Windows (and the relatively small difference between fscan
and strtod
). The second thing is just how much the simple custom conversion function gains, on both platforms. The necessary error handling would slow it down a little, but the difference is still significant. I expected some improvement, since it doesn't handle a lot of things the the standard conversion routines do (like scientific format, very, very small numbers, Inf and NaN, i18n, etc.), but not this much.
最引人注目的是fstream和Windows下的fscan之间的巨大差异(以及fscan和strtod之间相对较小的差异)。第二件事是简单的自定义转换函数在两个平台上的收益。必要的错误处理会使它慢一些,但是差异仍然很大。我期望有一些改进,因为它不能处理很多标准转换例程所做的事情(比如科学格式、非常非常小的数字、Inf和NaN、i18n等),但不能处理这么多。
#2
44
UPDATE
Since Spirit X3 is available for testing, I've updated the benchmarks. Meanwhile I've used Nonius to get statistically sound benchmarks.
因为Spirit X3可以进行测试,所以我更新了基准。与此同时,我使用了Nonius来获得统计上的声音基准。
All charts below are available interactive online
以下所有图表均可在线互动
Benchmark CMake project + testdata used is on github: https://github.com/sehe/bench_float_parsing
使用的基准CMake项目+ testdata在github上:https://github.com/sehe/bench_float_parser
Summary:
Spirit parsers are fastest. If you can use C++14 consider the experimental version Spirit X3:
精神解析器是最快的。如果你能用c++ 14考虑实验版精神X3:
The above is measures using memory mapped files. Using IOstreams, it will be slower accross the board,
上面是使用内存映射文件的度量。使用IOstreams,它将在棋盘上运行得更慢,
but not as slow as scanf
using C/POSIX FILE*
function calls:
但不要像使用C/POSIX文件的scanf那样慢,函数调用:
What follows is parts from the OLD answer
以下是旧答案的一部分
I implemented the Spirit version, and ran a benchmark comparing to the other suggested answers.
我实现了Spirit版本,并与其他建议的答案进行了比较。
Here's my results, all tests run on the same body of input (515Mb of
input.txt
). See below for exact specs.这是我的结果,所有测试运行在同一个输入体(515Mb的input.txt)上。具体规格见下文。
(wall clock time in seconds, average of 2+ runs)(挂钟时间以秒为单位,平均2次以上)
To my own surprise, Boost Spirit turns out to be fastest, and most elegant:
令我惊讶的是,Boost Spirit竟然是最快、最优雅的:
- handles/reports errors
- 处理/报告错误
- supports +/-Inf and NaN and variable whitespace
- 支持+/-Inf和NaN以及可变空格
- no problems at all detecting the end of input (as opposed to the other mmap answer)
- 完全不存在检测输入结束的问题(与其他mmap答案相反)
looks nice:
看起来不错:
bool ok = phrase_parse(f,l, // source iterators (double_ > double_ > double_) % eol, // grammar blank, // skipper data); // output attribute
Note that
boost::spirit::istreambuf_iterator
was unspeakably much slower (15s+). I hope this helps!注意boost:::spirit: istreambuf_iterator的速度非常慢(15s+)。我希望这可以帮助!
Benchmark details
All parsing done into
vector
ofstruct float3 { float x,y,z; }
.将所有解析完成为struct float3的向量{float x,y,z;}。
Generate input file using
生成输入文件使用
od -f -A none --width=12 /dev/urandom | head -n 11000000
This results in a 515Mb file containing data like
这将导致包含类似数据的515Mb文件
-2627.0056 -1.967235e-12 -2.2784738e+33 -1.0664798e-27 -4.6421956e-23 -6.917859e+20 -1.1080849e+36 2.8909405e-33 1.7888695e-12 -7.1663235e+33 -1.0840628e+36 1.5343362e-12 -3.1773715e-17 -6.3655537e-22 -8.797282e+31 9.781095e+19 1.7378472e-37 63825084 -1.2139188e+09 -5.2464635e-05 -2.1235992e-38 3.0109424e+08 5.3939846e+30 -6.6146894e-20
Compile the program using:
编译程序使用:
g++ -std=c++0x -g -O3 -isystem -march=native test.cpp -o test -lboost_filesystem -lboost_iostreams
Measure wall clock time using
测量挂钟的使用时间
time ./test < input.txt
Environment:
- Linux desktop 4.2.0-42-generic #49-Ubuntu SMP x86_64
- Linux桌面4.2.0-42-泛型#49-Ubuntu SMP x86_64
- Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
- 英特尔(R) Core(TM) i7-3770K CPU @ 3.50GHz。
- 32GiB RAM
- 32镶条内存
Full Code
Full code to the old benchmark is in the edit history of this post, the newest version is on github
旧基准的完整代码在本文的编辑历史中,最新的版本在github上
#3
13
Before you start, verify that this is the slow part of your application and get a test harness around it so you can measure improvements.
在开始之前,请验证这是应用程序的缓慢部分,并围绕它获得一个测试约束,以便您可以度量改进。
boost::spirit
would be overkill for this in my opinion. Try fscanf
在我看来,这种精神会让人受不了。尝试fscanf
FILE* f = fopen("yourfile");
if (NULL == f) {
printf("Failed to open 'yourfile'");
return;
}
float x,y,z;
int nItemsRead = fscanf(f,"%f %f %f\n", &x, &y, &z);
if (3 != nItemsRead) {
printf("Oh dear, items aren't in the right format.\n");
return;
}
#4
2
I would check out this related post Using ifstream to read floats or How do I tokenize a string in C++ particularly the posts related to C++ String Toolkit Library. I've used C strtok, C++ streams, Boost tokenizer and the best of them for the ease and use is C++ String Toolkit Library.
我将使用ifstream来阅读这篇相关的文章,或者如何在c++中对字符串进行标记,特别是与c++字符串工具包库相关的文章。我使用过C strtok、c++流、Boost tokenizer,其中最容易使用的是c++ String Toolkit库。
#5
0
a nitty-gritty solution would be to throw more cores at the problem, spawning multiple threads. If the bottleneck is just the CPU you can halve down the running time by spawning two threads (on multicore CPUs)
一个具体的解决方案是向这个问题扔更多的内核,生成多个线程。如果瓶颈仅仅是CPU,那么通过生成两个线程(在多核CPU上),可以将运行时间减半
some other tips:
其他一些小贴士:
-
try to avoid parsing functions from library such boost and/or std. They are bloated with error checking conditions and much of the processing time is spent doing these checks. For just a couple conversions they are fine but fail miserably when it comes to process millions of values. If you already know that your data is well-formatted you can write (or find) a custom optimized C function which does only the data conversion
尽量避免从库中解析这样的boost和/或std函数。对于几个转换,它们是可以的,但是当涉及到处理数百万个值时,它们会失败得很惨。如果您已经知道您的数据具有良好的格式,那么您可以编写(或查找)一个自定义优化的C函数,该函数只执行数据转换
-
use a large memory buffer (let's say 10 Mbytes) in which you load chunks of your file and do the conversion on there
使用一个大的内存缓冲区(比方说10mbytes),在其中加载文件块并在其上进行转换
-
divide et impera: split your problem into smaller easier ones: preprocess your file, make it single line single float, split each line by the "." character and convert integers instead of float, then merge the two integers to create the float number
分割et impera:把你的问题分割成更小的问题:预处理你的文件,使它成为单行的浮点数,将每一行分割成“。”字符,而不是浮点数,然后合并两个整数来创建浮点数
#6
0
I believe one most important rule in the string processing is "read only once, one character at a time". It is always simpler, faster and more reliable, I think.
我认为字符串处理中最重要的一条规则是“一次只读一个字符”。我认为它总是更简单、更快、更可靠。
I made simple benchmark program to show how simple it is. My test says this code runs 40% faster than strtod
version.
我做了一个简单的基准程序来展示它有多简单。我的测试显示这段代码比strtod版本快40%。
#include <iostream>
#include <sstream>
#include <iomanip>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <sys/time.h>
using namespace std;
string test_generate(size_t n)
{
srand((unsigned)time(0));
double sum = 0.0;
ostringstream os;
os << std::fixed;
for (size_t i=0; i<n; ++i)
{
unsigned u = rand();
int w = 0;
if (u > UINT_MAX/2)
w = - (u - UINT_MAX/2);
else
w = + (u - UINT_MAX/2);
double f = w / 1000.0;
sum += f;
os << f;
os << " ";
}
printf("generated %f\n", sum);
return os.str();
}
void read_float_ss(const string& in)
{
double sum = 0.0;
const char* begin = in.c_str();
char* end = NULL;
errno = 0;
double f = strtod( begin, &end );
sum += f;
while ( errno == 0 && end != begin )
{
begin = end;
f = strtod( begin, &end );
sum += f;
}
printf("scanned %f\n", sum);
}
double scan_float(const char* str, size_t& off, size_t len)
{
static const double bases[13] = {
0.0, 10.0, 100.0, 1000.0, 10000.0,
100000.0, 1000000.0, 10000000.0, 100000000.0,
1000000000.0, 10000000000.0, 100000000000.0, 1000000000000.0,
};
bool begin = false;
bool fail = false;
bool minus = false;
int pfrac = 0;
double dec = 0.0;
double frac = 0.0;
for (; !fail && off<len; ++off)
{
char c = str[off];
if (c == '+')
{
if (!begin)
begin = true;
else
fail = true;
}
else if (c == '-')
{
if (!begin)
begin = true;
else
fail = true;
minus = true;
}
else if (c == '.')
{
if (!begin)
begin = true;
else if (pfrac)
fail = true;
pfrac = 1;
}
else if (c >= '0' && c <= '9')
{
if (!begin)
begin = true;
if (pfrac == 0)
{
dec *= 10;
dec += c - '0';
}
else if (pfrac < 13)
{
frac += (c - '0') / bases[pfrac];
++pfrac;
}
}
else
{
break;
}
}
if (!fail)
{
double f = dec + frac;
if (minus)
f = -f;
return f;
}
return 0.0;
}
void read_float_direct(const string& in)
{
double sum = 0.0;
size_t len = in.length();
const char* str = in.c_str();
for (size_t i=0; i<len; ++i)
{
double f = scan_float(str, i, len);
sum += f;
}
printf("scanned %f\n", sum);
}
int main()
{
const int n = 1000000;
printf("count = %d\n", n);
string in = test_generate(n);
{
struct timeval t1;
gettimeofday(&t1, 0);
printf("scan start\n");
read_float_ss(in);
struct timeval t2;
gettimeofday(&t2, 0);
double elapsed = (t2.tv_sec - t1.tv_sec) * 1000000.0;
elapsed += (t2.tv_usec - t1.tv_usec) / 1000.0;
printf("elapsed %.2fms\n", elapsed);
}
{
struct timeval t1;
gettimeofday(&t1, 0);
printf("scan start\n");
read_float_direct(in);
struct timeval t2;
gettimeofday(&t2, 0);
double elapsed = (t2.tv_sec - t1.tv_sec) * 1000000.0;
elapsed += (t2.tv_usec - t1.tv_usec) / 1000.0;
printf("elapsed %.2fms\n", elapsed);
}
return 0;
}
Below is console output from i7 Mac Book Pro (compiled in XCode 4.6).
下面是i7 Mac Book Pro的控制台输出(在XCode 4.6中编译)。
count = 1000000
generated -1073202156466.638184
scan start
scanned -1073202156466.638184
elapsed 83.34ms
scan start
scanned -1073202156466.638184
elapsed 53.50ms
#7
0
using C is going to be the fastest solution.
Split into tokens using
convert to float with strtok
and then
strtof
. Or if you know the exact format use fscanf
.
使用C将是最快的解决方案。使用strtok将其拆分为令牌,然后使用strtof将其转换为float。或者如果你知道确切的格式使用fscanf。