
Time: 2021-10-23 03:34:33

I have a file with millions of lines, each line has 3 floats separated by spaces. It takes a lot of time to read the file, so I tried to read them using memory mapped files only to find out that the problem is not with the speed of IO but with the speed of the parsing.


My current parsing is to take the stream (called file) and do the following


float x,y,z;
file >> x >> y >> z;

Someone on Stack Overflow recommended using Boost.Spirit, but I couldn't find any simple tutorial that explains how to use it.


I'm trying to find a simple and efficient way to parse a line that looks like this:


"134.32 3545.87 3425"

I will really appreciate some help. I wanted to use strtok to split it, but I don't know how to convert strings to floats, and I'm not quite sure it's the best way.


I don't mind if the solution will be Boost or not. I don't mind if it won't be the most efficient solution ever, but I'm sure that it is possible to double the speed.


Thanks in advance.


7 answers



If the conversion is the bottleneck (which is quite possible), you should start by using the different possibilities in the standard. Logically, one would expect them to be very close, but practically, they aren't always:


  • You've already determined that std::ifstream is too slow.


  • Converting your memory mapped data to an std::istringstream is almost certainly not a good solution; you'll first have to create a string, which will copy all of the data.


  • Writing your own streambuf to read directly from the memory, without copying (or using the deprecated std::istrstream) might be a solution, although if the problem really is the conversion... this still uses the same conversion routines.


  • You can always try fscanf, or scanf on your memory mapped stream. Depending on the implementation, they might be faster than the various istream implementations.


  • Probably faster than any of these is to use strtod. No need to tokenize for this: strtod skips leading white space (including '\n'), and has an out parameter where it puts the address of the first character not read. The end condition is a bit tricky, your loop should probably look a bit like:


    char* begin;    //  Set to point to the mmap'ed data...
                    //  You'll also have to arrange for a '\0'
                    //  to follow the data.  This is probably
                    //  the most difficult issue.
    char* end;
    errno = 0;
    double tmp = strtod( begin, &end );
    while ( errno == 0 && end != begin ) {
        //  do whatever with tmp...
        begin = end;
        tmp = strtod( begin, &end );
    }
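
A minimal, self-contained sketch of the loop above, assuming the data is already available as a NUL-terminated buffer and that every record has exactly three values. The names Float3 and parse_triples are hypothetical, not from the answer.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <vector>

struct Float3 { float x, y, z; };   // hypothetical record type

// Parses whitespace-separated numbers from a NUL-terminated buffer,
// three per record. Stops at the first token strtod cannot convert
// (or at the terminating '\0'); an incomplete trailing record is dropped.
std::vector<Float3> parse_triples(const char* begin)
{
    assert(begin != nullptr);
    std::vector<Float3> out;
    float v[3];
    int n = 0;
    char* end = nullptr;
    errno = 0;
    double tmp = std::strtod(begin, &end);
    while (errno == 0 && end != begin) {
        v[n++] = static_cast<float>(tmp);
        if (n == 3) {               // one full record collected
            out.push_back({v[0], v[1], v[2]});
            n = 0;
        }
        begin = end;
        tmp = std::strtod(begin, &end);
    }
    return out;
}
```

Calling parse_triples("1 2 3\n4 5 6\n") would yield two records. Because strtod treats '\n' like any other white space, line boundaries are not checked here.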

If none of these are fast enough, you'll have to consider the actual data. It probably has some sort of additional constraints, which means that you can potentially write a conversion routine which is faster than the more general ones; e.g. strtod has to handle both fixed and scientific, and it has to be 100% accurate even if there are 17 significant digits. It also has to be locale specific. All of this is added complexity, which means added code to execute. But beware: writing an efficient and correct conversion routine, even for a restricted set of input, is non-trivial; you really do have to know what you are doing.




Just out of curiosity, I've run some tests. In addition to the aforementioned solutions, I wrote a simple custom converter, which only handles fixed point (no scientific), with at most five digits after the decimal, and the value before the decimal must fit in an int:


double
convert( char const* source, char const** endPtr )
{
    //  Note: the fraction is added, not subtracted, so negative
    //  values with a fractional part come out wrong.
    char* end;
    int left = strtol( source, &end, 10 );
    double results = left;
    if ( *end == '.' ) {
        char* start = end + 1;
        int right = strtol( start, &end, 10 );
        static double const fracMult[] 
            = { 0.0, 0.1, 0.01, 0.001, 0.0001, 0.00001 };
        results += right * fracMult[ end - start ];
    }
    if ( endPtr != nullptr ) {
        *endPtr = end;
    }
    return results;
}

(If you actually use this, you should definitely add some error handling. This was just knocked up quickly for experimental purposes, to read the test file I'd generated, and nothing else.)


The interface is exactly that of strtod, to simplify coding.
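
As a rough illustration of what "some error handling" might look like for such a converter, here is a sketch under the same restrictions (fixed point only, at most five fractional digits). The name convert_fixed is hypothetical, and reporting failure by setting *endPtr to source is an assumption borrowed from how strtod signals "no conversion". It does not check the integer part for overflow.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical variant of the converter above. Accepts [+-]digits[.digits]
// with at most five fractional digits. On failure it returns 0.0 and sets
// *endPtr to source. Overflow of the integer part is NOT checked.
double convert_fixed(const char* source, const char** endPtr)
{
    assert(source != nullptr && endPtr != nullptr);
    static const double fracMult[] = { 1.0, 0.1, 0.01, 0.001, 0.0001, 0.00001 };
    const char* p = source;
    while (std::isspace(static_cast<unsigned char>(*p))) ++p;

    bool negative = false;
    if (*p == '+' || *p == '-') { negative = (*p == '-'); ++p; }

    const char* digits = p;
    long whole = 0;
    while (*p >= '0' && *p <= '9') { whole = whole * 10 + (*p - '0'); ++p; }
    bool anyDigits = (p != digits);

    long frac = 0;
    int fracDigits = 0;
    if (*p == '.') {
        ++p;
        while (*p >= '0' && *p <= '9') {
            if (fracDigits == 5) { *endPtr = source; return 0.0; }  // too many digits
            frac = frac * 10 + (*p - '0');
            ++fracDigits;
            ++p;
        }
        anyDigits = anyDigits || fracDigits > 0;
    }
    if (!anyDigits) { *endPtr = source; return 0.0; }   // nothing to convert

    const double result = whole + frac * fracMult[fracDigits];
    *endPtr = p;
    return negative ? -result : result;
}
```

The sign is applied after the integer and fractional parts are combined, which avoids the problem the strtol-based version has with negative values.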


I ran the benchmarks in two environments (on different machines, so the absolute values of any times aren't relevant). I got the following results:


Under Windows 7, compiled with VC 11 (/O2):


Testing Using fstream directly (5 iterations)...
    6.3528e+006 microseconds per iteration
Testing Using fscan directly (5 iterations)...
    685800 microseconds per iteration
Testing Using strtod (5 iterations)...
    597000 microseconds per iteration
Testing Using manual (5 iterations)...
    269600 microseconds per iteration

Under Linux 2.6.18, compiled with g++ 4.4.2 (-O2, IIRC):


Testing Using fstream directly (5 iterations)...
    784000 microseconds per iteration
Testing Using fscanf directly (5 iterations)...
    526000 microseconds per iteration
Testing Using strtod (5 iterations)...
    382000 microseconds per iteration
Testing Using strtof (5 iterations)...
    360000 microseconds per iteration
Testing Using manual (5 iterations)...
    186000 microseconds per iteration

In all cases, I'm reading 554000 lines, each with 3 randomly generated floating-point values in the range [0...10000).


The most striking thing is the enormous difference between fstream and fscanf under Windows (and the relatively small difference between fscanf and strtod). The second thing is just how much the simple custom conversion function gains, on both platforms. The necessary error handling would slow it down a little, but the difference is still significant. I expected some improvement, since it doesn't handle a lot of things the standard conversion routines do (like scientific format, very, very small numbers, Inf and NaN, i18n, etc.), but not this much.





Since Spirit X3 is available for testing, I've updated the benchmarks. Meanwhile I've used Nonius to get statistically sound benchmarks.


All charts below are also available interactively online.


Benchmark CMake project + testdata used is on github: https://github.com/sehe/bench_float_parsing




Spirit parsers are fastest. If you can use C++14 consider the experimental version Spirit X3:



The above was measured using memory-mapped files. Using IOstreams, it will be slower across the board, but not as slow as scanf using C/POSIX FILE* function calls.



What follows is parts from the OLD answer


I implemented the Spirit version, and ran a benchmark comparing to the other suggested answers.


Here are my results; all tests were run on the same body of input (515Mb of input.txt). See below for exact specs.


(wall clock time in seconds, average of 2+ runs)


To my own surprise, Boost Spirit turns out to be fastest, and most elegant:


  • handles/reports errors
  • supports +/-Inf and NaN and variable whitespace
  • no problems at all detecting the end of input (as opposed to the other mmap answer)
  • looks nice:


    bool ok = phrase_parse(f,l,               // source iterators
         (double_ > double_ > double_) % eol, // grammar
         blank,                               // skipper
         data);                               // output attribute

Note that boost::spirit::istreambuf_iterator was unspeakably much slower (15s+). I hope this helps!


Benchmark details

All parsing done into vector of struct float3 { float x,y,z; }.


Generate input file using


od -f -A none --width=12 /dev/urandom | head -n 11000000

This results in a 515Mb file containing data like


     -2627.0056   -1.967235e-12  -2.2784738e+33
  -1.0664798e-27  -4.6421956e-23   -6.917859e+20
  -1.1080849e+36   2.8909405e-33   1.7888695e-12
  -7.1663235e+33  -1.0840628e+36   1.5343362e-12
  -3.1773715e-17  -6.3655537e-22   -8.797282e+31
    9.781095e+19   1.7378472e-37        63825084
  -1.2139188e+09  -5.2464635e-05  -2.1235992e-38
   3.0109424e+08   5.3939846e+30  -6.6146894e-20

Compile the program using:


g++ -std=c++0x -g -O3 -isystem -march=native test.cpp -o test -lboost_filesystem -lboost_iostreams

Measure wall clock time using


time ./test < input.txt 


  • Linux desktop 4.2.0-42-generic #49-Ubuntu SMP x86_64
  • Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
  • 32GiB RAM

Full Code

Full code to the old benchmark is in the edit history of this post, the newest version is on github




Before you start, verify that this is the slow part of your application and get a test harness around it so you can measure improvements.
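
A test harness for this does not need to be elaborate. The following is a minimal sketch, assuming wall-clock time averaged over a few iterations is good enough; time_per_iteration is a hypothetical helper name.

```cpp
#include <cassert>
#include <chrono>

// Runs fn `iterations` times and returns the mean wall-clock time per
// iteration in microseconds. time_per_iteration is a hypothetical helper.
template <typename Fn>
double time_per_iteration(Fn fn, int iterations)
{
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;   // monotonic, suited to intervals
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const std::chrono::duration<double, std::micro> elapsed = clock::now() - start;
    return elapsed.count() / iterations;
}
```

When timing a parser this way, make sure the parsed values are actually used (for example, summed and printed), otherwise an optimizing compiler may remove part of the work being measured.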


boost::spirit would be overkill for this in my opinion. Try fscanf


FILE* f = fopen("yourfile", "r");
if (NULL == f) {
   printf("Failed to open 'yourfile'");
   return;
}
float x,y,z;
int nItemsRead = fscanf(f,"%f %f %f\n", &x, &y, &z);
if (3 != nItemsRead) {
   printf("Oh dear, items aren't in the right format.\n");
   return;
}
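
The snippet above reads a single record. A sketch of the same idea in a loop, assuming every record is exactly three floats, might look like this (Float3 and read_triples are hypothetical names):

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

struct Float3 { float x, y, z; };   // hypothetical record type

// Reads "x y z" records until EOF or the first malformed record.
// Returns true only if the whole stream was consumed without errors.
bool read_triples(std::FILE* f, std::vector<Float3>& out)
{
    assert(f != nullptr);
    Float3 p;
    int n;
    // %f skips leading white space, including the newline after each record.
    while ((n = std::fscanf(f, "%f %f %f", &p.x, &p.y, &p.z)) == 3) {
        out.push_back(p);
    }
    return n == EOF && !std::ferror(f);
}
```

A return value of false means fscanf stopped on something that is not a complete record; the records read up to that point are still in the vector.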



I would check out the related posts "Using ifstream to read floats" and "How do I tokenize a string in C++", particularly the answers about the C++ String Toolkit Library. I've used C strtok, C++ streams and Boost tokenizer, and the best of them for ease of use is the C++ String Toolkit Library.




A nitty-gritty solution would be to throw more cores at the problem by spawning multiple threads. If the bottleneck is just the CPU, you can halve the running time by spawning two threads (on multicore CPUs).
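
Before threads can work on the data independently, the input has to be cut so that no line straddles two pieces. A sketch of just that splitting step, assuming the whole input is already in memory (split_at_newlines is a hypothetical helper):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Splits [0, text.size()) into at most `parts` ranges, each ending just
// after a '\n' (or at the end of the text), so no line straddles two ranges.
// split_at_newlines is a hypothetical helper name.
std::vector<std::pair<std::size_t, std::size_t>>
split_at_newlines(const std::string& text, std::size_t parts)
{
    assert(parts > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t step = text.size() / parts + 1;   // nominal range size
    std::size_t begin = 0;
    while (begin < text.size()) {
        std::size_t end = begin + step;
        if (end >= text.size()) {
            end = text.size();
        } else {
            // Extend the range to the end of the line it lands in.
            const std::size_t nl = text.find('\n', end);
            end = (nl == std::string::npos) ? text.size() : nl + 1;
        }
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}
```

Each [first, second) range could then be handed to its own thread. Note that the ranges are not NUL-terminated, so the per-range parser has to respect the end offset rather than rely on strtod stopping by itself.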


Some other tips:


  • try to avoid parsing functions from libraries such as Boost and/or std. They are bloated with error-checking conditions, and much of the processing time is spent doing these checks. For just a couple of conversions they are fine, but they fail miserably when it comes to processing millions of values. If you already know that your data is well formatted, you can write (or find) a custom optimized C function that does only the data conversion


  • use a large memory buffer (let's say 10 Mbytes) in which you load chunks of your file and do the conversion on there


  • divide et impera: split your problem into smaller, easier ones: preprocess your file so that each line holds a single float, split each line at the "." character and convert integers instead of floats, then merge the two integers to create the floating-point number
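
The "large memory buffer" tip above has one subtlety: a block read from the file usually ends in the middle of a line. A minimal sketch of one way to handle that, assuming C stdio and a caller-supplied callback (for_each_line_block is a hypothetical helper, and it favors clarity over avoiding copies):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Reads f in blocks of `chunk` bytes and passes each run of complete lines
// to `consume`; a trailing partial line is carried into the next block.
void for_each_line_block(std::FILE* f, std::size_t chunk,
                         const std::function<void(const std::string&)>& consume)
{
    assert(f != nullptr && chunk > 0);
    std::vector<char> buf(chunk);
    std::string pending;                 // partial line left from earlier blocks
    std::size_t got;
    while ((got = std::fread(buf.data(), 1, buf.size(), f)) > 0) {
        pending.append(buf.data(), got);
        const std::size_t nl = pending.rfind('\n');
        if (nl == std::string::npos) continue;    // no complete line yet
        consume(pending.substr(0, nl + 1));       // only whole lines
        pending.erase(0, nl + 1);
    }
    if (!pending.empty()) consume(pending);       // last line without '\n'
}
```

Each string handed to the callback is NUL-terminated via c_str(), so a strtod-style loop can run over it directly. With a buffer of around 10 Mbytes, as suggested above, the carry-over copy is small compared to the block itself.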




I believe the most important rule in string processing is "read only once, one character at a time". It is always simpler, faster and more reliable, I think.


I made a simple benchmark program to show how simple it is. My test says this code runs 40% faster than the strtod version.


#include <iostream>
#include <sstream>
#include <iomanip>
#include <string>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>
#include <math.h>
#include <time.h>
#include <sys/time.h>

using namespace std;

string test_generate(size_t n)
{
    double sum = 0.0;
    ostringstream os;
    os << std::fixed;
    for (size_t i=0; i<n; ++i)
    {
        unsigned u = rand();
        int w = 0;
        if (u > UINT_MAX/2)
            w = - (u - UINT_MAX/2);
        else
            w = + (u - UINT_MAX/2);
        double f = w / 1000.0;
        sum += f;

        os << f;
        os << " ";
    }
    printf("generated %f\n", sum);
    return os.str();
}

void read_float_ss(const string& in)
{
    double sum = 0.0;
    const char* begin = in.c_str();
    char* end = NULL;
    errno = 0;
    double f = strtod( begin, &end );
    sum += f;

    while ( errno == 0 && end != begin )
    {
        begin = end;
        f = strtod( begin, &end );
        sum += f;
    }
    printf("scanned %f\n", sum);
}

double scan_float(const char* str, size_t& off, size_t len)
{
    // bases[i] is 10^i; index 0 is never used.
    static const double bases[13] = {
        0.0, 10.0, 100.0, 1000.0, 10000.0,
        100000.0, 1000000.0, 10000000.0, 100000000.0,
        1000000000.0, 10000000000.0, 100000000000.0, 1000000000000.0,
    };

    bool begin = false;
    bool fail = false;
    bool minus = false;
    int pfrac = 0;      // 0 before the '.', then the index of the next fractional digit

    double dec = 0.0;
    double frac = 0.0;
    for (; !fail && off<len; ++off)
    {
        char c = str[off];
        if (c == '+')
        {
            if (!begin)
                begin = true;
            else
                fail = true;
        }
        else if (c == '-')
        {
            if (!begin)
                begin = true;
            else
                fail = true;
            minus = true;
        }
        else if (c == '.')
        {
            if (!begin)
                begin = true;
            else if (pfrac)
                fail = true;
            pfrac = 1;
        }
        else if (c >= '0' && c <= '9')
        {
            if (!begin)
                begin = true;
            if (pfrac == 0)
            {
                dec *= 10;
                dec += c - '0';
            }
            else if (pfrac < 13)
            {
                frac += (c - '0') / bases[pfrac];
                ++pfrac;
            }
        }
        else
        {
            // Any other character (e.g. the separating space) ends the number.
            break;
        }
    }

    if (!fail)
    {
        double f = dec + frac;
        if (minus)
            f = -f;
        return f;
    }

    return 0.0;
}

void read_float_direct(const string& in)
{
    double sum = 0.0;
    size_t len = in.length();
    const char* str = in.c_str();
    // scan_float leaves i on the separator; ++i skips it.
    for (size_t i=0; i<len; ++i)
    {
        double f = scan_float(str, i, len);
        sum += f;
    }
    printf("scanned %f\n", sum);
}

int main()
{
    const int n = 1000000;
    printf("count = %d\n", n);

    string in = test_generate(n);
    {
        struct timeval t1;
        gettimeofday(&t1, 0);
        printf("scan start\n");

        read_float_ss(in);

        struct timeval t2;
        gettimeofday(&t2, 0);
        double elapsed = (t2.tv_sec - t1.tv_sec) * 1000.0;   // seconds to ms
        elapsed += (t2.tv_usec - t1.tv_usec) / 1000.0;       // microseconds to ms
        printf("elapsed %.2fms\n", elapsed);
    }

    {
        struct timeval t1;
        gettimeofday(&t1, 0);
        printf("scan start\n");

        read_float_direct(in);

        struct timeval t2;
        gettimeofday(&t2, 0);
        double elapsed = (t2.tv_sec - t1.tv_sec) * 1000.0;   // seconds to ms
        elapsed += (t2.tv_usec - t1.tv_usec) / 1000.0;       // microseconds to ms
        printf("elapsed %.2fms\n", elapsed);
    }
    return 0;
}

Below is console output from i7 Mac Book Pro (compiled in XCode 4.6).


count = 1000000
generated -1073202156466.638184
scan start
scanned -1073202156466.638184
elapsed 83.34ms
scan start
scanned -1073202156466.638184
elapsed 53.50ms



Using C is going to be the fastest solution. Split into tokens using strtok and then convert to float with strtof. Or, if you know the exact format, use fscanf.
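
A minimal sketch of the strtok plus strtof approach, assuming one line at a time in a writable buffer (strtok overwrites the separators it finds). The name parse_line and the -1 error convention are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Splits one mutable, NUL-terminated line on white space and converts up to
// `max` tokens with strtof. Returns the number of floats stored, or -1 if a
// token is not a complete number. parse_line is a hypothetical helper.
int parse_line(char* line, float* out, int max)
{
    assert(line != nullptr && out != nullptr);
    int n = 0;
    for (char* tok = std::strtok(line, " \t\r\n");
         tok != nullptr && n < max;
         tok = std::strtok(nullptr, " \t\r\n")) {
        char* end = nullptr;
        const float v = std::strtof(tok, &end);
        if (end == tok || *end != '\0') return -1;   // not a (whole) number
        out[n++] = v;
    }
    return n;
}
```

For a line such as "134.32 3545.87 3425" this stores three floats and returns 3. strtok keeps internal state between calls, so this form is not suitable for parsing from several threads at once.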




