As mentioned in the title, I'm looking for something that can give me more performance than atoi. Presently, the fastest way I know is
正如标题中所提到的,我正在寻找能够比atoi更具性能的东西。目前,我所知道的最快的方式是
atoi(mystring.c_str())
Finally, I would prefer a solution that doesn't rely on Boost. Does anybody have good performance tricks for doing this?
最后,我更喜欢不依赖Boost的解决方案。这样做有没有人有很好的表现技巧?
Additional Information: int will not exceed 2 billion, it is always positive, the string has no decimal places in it.
附加信息:int不会超过20亿,它总是正数,字符串中没有小数位。
10 个解决方案
#1
18
I experimented with solutions using lookup tables, but found them fraught with issues, and actually not very fast. The fastest solution turned out to be the least imaginitive:
我使用查找表试验了解决方案,但发现它们充满了问题,实际上并不是很快。最快的解决方案结果是最不明显的:
int fast_atoi( const char * str )
{
int val = 0;
while( *str ) {
val = val*10 + (*str++ - '0');
}
return val;
}
Running a benchmark with a million randomly generated strings:
使用一百万随机生成的字符串运行基准测试:
fast_atoi : 0.0097 seconds
atoi : 0.0414 seconds
To be fair, I also tested this function by forcing the compiler not to inline it. The results were still good:
公平地说,我还通过强制编译器不要内联它来测试此函数。结果仍然很好:
fast_atoi : 0.0104 seconds
atoi : 0.0426 seconds
Provided your data conforms to the requirements of the fast_atoi
function, that is pretty reasonable performance. The requirements are:
如果您的数据符合fast_atoi函数的要求,那就是非常合理的性能。要求是:
- Input string contains only numeric characters, or is empty
- Input string represents a number from 0 up to
INT_MAX
输入字符串仅包含数字字符,或为空
输入字符串表示从0到INT_MAX的数字
#2
15
atoi
can be improved upon significantly, given certain assumptions. This was demonstrated powerfully in a presentation by Andrei Alexandrescu at the C++ and Beyond 2012 conference. Hi s replacement used loop unrolling and ALU parallelism to achieve orders of magnitude in perf improvement. I don't have his materials, but this link uses a similar technique: http://tombarta.wordpress.com/2008/04/23/specializing-atoi/
在给定某些假设的情况下,atoi可以得到显着改善。 Andrei Alexandrescu在C ++和Beyond 2012大会上的演讲中证明了这一点。您的替换使用循环展开和ALU并行性来实现性能提升的数量级。我没有他的材料,但这个链接使用了类似的技术:http://tombarta.wordpress.com/2008/04/23/specializing-atoi/
#3
9
This page compares conversion speed between different string->int functions using different compilers. The naive function, which offers no error checking, offers speeds roughly twice as fast as atoi(), according to the results presented.
此页面使用不同的编译器比较不同string-> int函数之间的转换速度。根据所提供的结果,提供无错误检查的天真功能提供的速度大约是atoi()的两倍。
// Taken from http://tinodidriksen.com/uploads/code/cpp/speed-string-to-int.cpp
int naive(const char *p) {
int x = 0;
bool neg = false;
if (*p == '-') {
neg = true;
++p;
}
while (*p >= '0' && *p <= '9') {
x = (x*10) + (*p - '0');
++p;
}
if (neg) {
x = -x;
}
return x;
}
it is always positive
它始终是积极的
Remove the negative checks in the above code for a micro optimization.
删除上述代码中的否定检查以进行微优化。
If you can guarantee the string will not have anything but numeric characters, you can micro optimize further by changing the loop
如果您可以保证字符串除了数字字符之外没有任何内容,您可以通过更改循环进行微观优化
while (*p >= '0' && *p <= '9') {
to
while (*p != '\0' ) {
Which leaves you with
这让你失望
unsigned int naive(const char *p) {
unsigned int x = 0;
while (*p != '\0') {
x = (x*10) + (*p - '0');
++p;
}
return x;
}
#4
5
Quite a few of the code examples here are quite complex and do unnecessary work, meaning the code could be slimmer and faster.
这里的一些代码示例相当复杂,并且做了不必要的工作,这意味着代码可以更轻薄,更快。
Conversion loops are often written to do three different things with each character:
转换循环通常用于对每个字符执行三种不同的操作:
- bail out if it is the end-of-string character
- bail out if it is not a digit
- convert it from its code point to the actual digit value
如果它是字符串结尾字符,则纾困
如果它不是数字,则纾困
将其从代码点转换为实际的数字值
First observation: there is no need to check for the end-of-string character separately, since it is not a digit. Hence the check for 'digitness' covers the EOS condition implicitly.
首先观察:不需要单独检查字符串结尾字符,因为它不是数字。因此,检查“数字性”隐含地涵盖了EOS条件。
Second observation: double conditions for range testing as in (c >= '0' && c <= '9')
can be converted to a single test condition by using an unsigned type and anchoring the range at zero; that way there can be no unwanted values below the beginning of the range, all unwanted values are mapped to the range above the upper limit: (uint8_t(c - '0') <= 9)
第二个观察:范围测试的双重条件如(c> ='0'&& c <='9')可以通过使用无符号类型并将范围固定为零来转换为单个测试条件;这样,在范围的开头可以没有不需要的值,所有不需要的值都映射到高于上限的范围:(uint8_t(c - '0')<= 9)
It just so happens that c - '0'
needs to be computed here anyway...
碰巧c - '0'需要在这里计算......
Hence the inner conversion loop can be slimmed to
因此,内部转换循环可以缩小到
uint64_t n = digit_value(*p);
unsigned d;
while ((d = digit_value(*++p)) <= 9)
{
n = n * 10 + d;
}
The code here is called with the precondition that p
be pointing at a digit, which is why the first digit is extracted without further ado (which also avoids a superfluous MUL).
这里的代码被调用,前提是p指向一个数字,这就是为什么第一个数字被提取而没有进一步的ado(这也避免了多余的MUL)。
That precondition is less outlandish than might appear at first, since p
pointing at a digit is the reason why this code is called by the parser in the first place. In my code the whole shebang looks like this (assertions and other production-quality noise elided):
这个先决条件比起初可能出现的要小一些,因为p指向一个数字是解析器首先调用此代码的原因。在我的代码中,整个shebang看起来像这样(断言和其他生产质量噪声省略):
unsigned digit_value (char c)
{
return unsigned(c - '0');
}
bool is_digit (char c)
{
return digit_value(c) <= 9;
}
uint64_t extract_uint64 (char const **read_ptr)
{
char const *p = *read_ptr;
uint64_t n = digit_value(*p);
unsigned d;
while ((d = digit_value(*++p)) <= 9)
{
n = n * 10 + d;
}
*read_ptr = p;
return n;
}
The first call to digit_value()
is often elided by the compiler, if the code gets inlined and the calling code has already computed that value by calling is_digit()
.
如果代码被内联并且调用代码已经通过调用is_digit()计算了该值,则编译器通常会忽略对digit_value()的第一次调用。
n * 10
happens to be faster than manual shifting (e.g. n = (n << 3) + (n << 1) + d
), at least on my machine with gcc 4.8.1 and VC++ 2013. My guess is that both compilers use LEA
with index scaling for adding up to three values in one go and scaling one of them by 2, 4, or 8.
n * 10碰巧比手动换档更快(例如n =(n << 3)+(n << 1)+ d),至少在我的机器上使用gcc 4.8.1和VC ++ 2013.我的猜测是两者都是编译器使用带索引缩放的LEA,一次最多可添加三个值,并将其中一个值缩放2,4或8。
In any case that's exactly how it should be: we write nice clean code in separate functions and express the desired logic (n * 10, x % CHAR_BIT, whatever) and the compiler converts it to shifting, masking, LEAing and so on, inlines everything into the big bad parser loop and takes care of all the required messiness under the hood to make things fast. We don't even have to stick inline
in front of everything anymore. If anything then we have to do the opposite, by using __declspec(noinline)
judiciously when compilers get over-eager.
在任何情况下,它应该是它应该是什么:我们在单独的函数中编写漂亮的干净代码并表达所需的逻辑(n * 10,x%CHAR_BIT,无论如何),编译器将其转换为移位,屏蔽,LEAing等,内联一切都进入了大的糟糕的解析器循环,并在引擎盖下处理所有必要的混乱,以使事情变得快速。我们甚至不必再将内联插入所有内容。如果有的话,我们必须做相反的事情,在编译器过度使用时明智地使用__declspec(noinline)。
I'm using the above code in a program that reads billions of numbers from text files and pipes; it converts 115 million uints per second if the length is 9..10 digits, and 60 million/s for length 19..20 digits (gcc 4.8.1). That's more than ten times as fast as strtoull()
(and just barely enough for my purposes, but I digress...). That's the timing for converting text blobs containing 10 million numbers each (100..200 MB), meaning that memory timings make these numbers appear a bit worse than they would be in a synthetic benchmark running from cache.
我在一个从文本文件和管道读取数十亿个数字的程序中使用上面的代码;如果长度为9..10位,则每秒转换1.15亿个uint;对于长度为19..20位数(gcc 4.8.1),它转换为6000万/ s。这比strtoull()的速度快十倍(对于我的目的来说勉强够用,但我离题了......)。这是转换包含每个1000万个数字(100..200 MB)的文本blob的时间,这意味着内存时序使这些数字看起来比从缓存中运行的综合基准测试中看起来更糟糕。
#5
3
Why not use a stringstream? I'm not sure of its particular overhead, but you could define:
为什么不使用stringstream?我不确定它的特殊开销,但你可以定义:
int myInt;
string myString = "1561";
stringstream ss;
ss(myString);
ss >> myInt;
Of course, you'd need to
当然,你需要
#include <stringstream>
#6
2
Paddy's implementation of fast_atoi is faster than atoi - without the shadow of the doubt - however it works only for unsigned integers.
Paddy对fast_atoi的实现比atoi更快 - 没有怀疑的阴影 - 但它只适用于无符号整数。
Below, I put evaluated version of Paddy's fast_atoi that also allows only unsigned integers but speeds conversion up even more by replacing costly operation * with +
下面,我把Paddy的fast_atoi的评估版本也只允许使用无符号整数,但通过用+替换昂贵的操作来加快转换速度
unsigned int fast_atou(const char *str)
{
unsigned int val = 0;
while(*str) {
val = (val << 1) + (val << 3) + *(str++) - 48;
}
return val;
}
Here, I put complete version of fast_atoi() that i'm using sometimes which converts singed integers as well:
在这里,我把完整版本的fast_atoi()放在我有时使用的同时转换了烧结的整数:
int fast_atoi(const char *buff)
{
int c = 0, sign = 0, x = 0;
const char *p = buff;
for(c = *(p++); (c < 48 || c > 57); c = *(p++)) {if (c == 45) {sign = 1; c = *(p++); break;}}; // eat whitespaces and check sign
for(; c > 47 && c < 58; c = *(p++)) x = (x << 1) + (x << 3) + c - 48;
return sign ? -x : x;
}
#7
1
Here's the entirety of the atoi function in gcc:
这是gcc中atoi函数的全部内容:
long atoi(const char *str)
{
long num = 0;
int neg = 0;
while (isspace(*str)) str++;
if (*str == '-')
{
neg=1;
str++;
}
while (isdigit(*str))
{
num = 10*num + (*str - '0');
str++;
}
if (neg)
num = -num;
return num;
}
The whitespace and negative check are superfluous in your case, but also only use nanoseconds.
在你的情况下,空格和负面检查是多余的,但也只使用纳秒。
isdigit is almost certainly inlined, so that's not costing you any time.
isdigit几乎肯定是内联的,所以这不会让你花费任何时间。
I really don't see room for improvement here.
我真的没有在这里看到改进的余地。
#8
0
The only definitive answer is with checking with your compiler, your real data.
唯一明确的答案是检查您的编译器,您的真实数据。
Something I'd try (even if it's using memory accesses so it may be slow depending on caching) is
我试过的东西(即使它正在使用内存访问,因此根据缓存可能会很慢)
int value = t1[s[n-1]];
if (n > 1) value += t10[s[n-2]]; else return value;
if (n > 2) value += t100[s[n-3]]; else return value;
if (n > 3) value += t1000[s[n-4]]; else return value;
... continuing for how many digits you need to handle ...
if t1
, t10
and so on are statically allocated and constant the compiler shouldn't fear any aliasing and the machine code generated should be quite decent.
如果t1,t10等静态分配并且常量,编译器不应该担心任何别名,并且生成的机器代码应该相当不错。
#9
0
Here is mine. Atoi is the fastest I could come up with. I compiled with msvc 2010 so it might be possible to combine both templates. In msvc 2010, when I combined templates it made the case where you provide a cb argument slower.
这是我的。 Atoi是我能想到的最快的。我使用msvc 2010编译,因此可以组合这两个模板。在msvc 2010中,当我组合模板时,它提供了一个慢速提供cb参数的情况。
Atoi handles nearly all the special atoi cases, and is as fast or faster than this:
Atoi处理几乎所有特殊的atoi案例,并且速度或速度都快于此:
int val = 0;
while( *str )
val = val*10 + (*str++ - '0');
Here is the code:
这是代码:
#define EQ1(a,a1) (BYTE(a) == BYTE(a1))
#define EQ1(a,a1,a2) (BYTE(a) == BYTE(a1) && EQ1(a,a2))
#define EQ1(a,a1,a2,a3) (BYTE(a) == BYTE(a1) && EQ1(a,a2,a3))
// Atoi is 4x faster than atoi. There is also an overload that takes a cb argument.
template <typename T>
T Atoi(LPCSTR sz) {
T n = 0;
bool fNeg = false; // for unsigned T, this is removed by optimizer
const BYTE* p = (const BYTE*)sz;
BYTE ch;
// test for most exceptions in the leading chars. Most of the time
// this test is skipped. Note we skip over leading zeros to avoid the
// useless math in the second loop. We expect leading 0 to be the most
// likely case, so we test it first, however the cpu might reorder that.
for ( ; (ch=*p-'1') >= 9 ; ++p) { // unsigned trick for range compare
// ignore leading 0's, spaces, and '+'
if (EQ1(ch, '0'-'1', ' '-'1', '+'-'1'))
continue;
// for unsigned T this is removed by optimizer
if (!((T)-1 > 0) && ch==BYTE('-'-'1')) {
fNeg = !fNeg;
continue;
}
// atoi ignores these. Remove this code for a small perf increase.
if (BYTE(*p-9) > 4) // \t, \n, 11, 12, \r. unsigned trick for range compare
break;
}
// deal with rest of digits, stop loop on non digit.
for ( ; (ch=*p-'0') <= 9 ; ++p) // unsigned trick for range compare
n = n*10 + ch;
// for unsigned T, (fNeg) test is removed by optimizer
return (fNeg) ? -n : n;
}
// you could go with a single template that took a cb argument, but I could not
// get the optimizer to create good code when both the cb and !cb case were combined.
// above code contains the comments.
template <typename T>
T Atoi(LPCSTR sz, BYTE cb) {
T n = 0;
bool fNeg = false;
const BYTE* p = (const BYTE*)sz;
const BYTE* p1 = p + cb;
BYTE ch;
for ( ; p<p1 && (ch=*p-'1') >= 9 ; ++p) {
if (EQ1(ch,BYTE('0'-'1'),BYTE(' '-'1'),BYTE('+'-'1')))
continue;
if (!((T)-1 > 0) && ch == BYTE('-'-'1')) {
fNeg = !fNeg;
continue;
}
if (BYTE(*p-9) > 4) // \t, \n, 11, 12, \r
break;
}
for ( ; p<p1 && (ch=*p-'0') <= 9 ; ++p)
n = n*10 + ch;
return (fNeg) ? -n : n;
}
#10
0
a faster convert function
更快的转换功能
multiplication is always slower that sum and shift, therefore change multiply with shift
乘法总是慢于和和移,因此随着移位而变化
int fast_atoi( const char * str )
{
int val = 0;
while( *str ) {
val = (val << 4) - (val << 2) - (val << 1) + (*str++ - '0');
}
return val;
}
#1
18
I experimented with solutions using lookup tables, but found them fraught with issues, and actually not very fast. The fastest solution turned out to be the least imaginitive:
我使用查找表试验了解决方案,但发现它们充满了问题,实际上并不是很快。最快的解决方案结果是最不明显的:
int fast_atoi( const char * str )
{
int val = 0;
while( *str ) {
val = val*10 + (*str++ - '0');
}
return val;
}
Running a benchmark with a million randomly generated strings:
使用一百万随机生成的字符串运行基准测试:
fast_atoi : 0.0097 seconds
atoi : 0.0414 seconds
To be fair, I also tested this function by forcing the compiler not to inline it. The results were still good:
公平地说,我还通过强制编译器不要内联它来测试此函数。结果仍然很好:
fast_atoi : 0.0104 seconds
atoi : 0.0426 seconds
Provided your data conforms to the requirements of the fast_atoi
function, that is pretty reasonable performance. The requirements are:
如果您的数据符合fast_atoi函数的要求,那就是非常合理的性能。要求是:
- Input string contains only numeric characters, or is empty
- Input string represents a number from 0 up to
INT_MAX
输入字符串仅包含数字字符,或为空
输入字符串表示从0到INT_MAX的数字
#2
15
atoi
can be improved upon significantly, given certain assumptions. This was demonstrated powerfully in a presentation by Andrei Alexandrescu at the C++ and Beyond 2012 conference. Hi s replacement used loop unrolling and ALU parallelism to achieve orders of magnitude in perf improvement. I don't have his materials, but this link uses a similar technique: http://tombarta.wordpress.com/2008/04/23/specializing-atoi/
在给定某些假设的情况下,atoi可以得到显着改善。 Andrei Alexandrescu在C ++和Beyond 2012大会上的演讲中证明了这一点。您的替换使用循环展开和ALU并行性来实现性能提升的数量级。我没有他的材料,但这个链接使用了类似的技术:http://tombarta.wordpress.com/2008/04/23/specializing-atoi/
#3
9
This page compares conversion speed between different string->int functions using different compilers. The naive function, which offers no error checking, offers speeds roughly twice as fast as atoi(), according to the results presented.
此页面使用不同的编译器比较不同string-> int函数之间的转换速度。根据所提供的结果,提供无错误检查的天真功能提供的速度大约是atoi()的两倍。
// Taken from http://tinodidriksen.com/uploads/code/cpp/speed-string-to-int.cpp
int naive(const char *p) {
int x = 0;
bool neg = false;
if (*p == '-') {
neg = true;
++p;
}
while (*p >= '0' && *p <= '9') {
x = (x*10) + (*p - '0');
++p;
}
if (neg) {
x = -x;
}
return x;
}
it is always positive
它始终是积极的
Remove the negative checks in the above code for a micro optimization.
删除上述代码中的否定检查以进行微优化。
If you can guarantee the string will not have anything but numeric characters, you can micro optimize further by changing the loop
如果您可以保证字符串除了数字字符之外没有任何内容,您可以通过更改循环进行微观优化
while (*p >= '0' && *p <= '9') {
to
while (*p != '\0' ) {
Which leaves you with
这让你失望
unsigned int naive(const char *p) {
unsigned int x = 0;
while (*p != '\0') {
x = (x*10) + (*p - '0');
++p;
}
return x;
}
#4
5
Quite a few of the code examples here are quite complex and do unnecessary work, meaning the code could be slimmer and faster.
这里的一些代码示例相当复杂,并且做了不必要的工作,这意味着代码可以更轻薄,更快。
Conversion loops are often written to do three different things with each character:
转换循环通常用于对每个字符执行三种不同的操作:
- bail out if it is the end-of-string character
- bail out if it is not a digit
- convert it from its code point to the actual digit value
如果它是字符串结尾字符,则纾困
如果它不是数字,则纾困
将其从代码点转换为实际的数字值
First observation: there is no need to check for the end-of-string character separately, since it is not a digit. Hence the check for 'digitness' covers the EOS condition implicitly.
首先观察:不需要单独检查字符串结尾字符,因为它不是数字。因此,检查“数字性”隐含地涵盖了EOS条件。
Second observation: double conditions for range testing as in (c >= '0' && c <= '9')
can be converted to a single test condition by using an unsigned type and anchoring the range at zero; that way there can be no unwanted values below the beginning of the range, all unwanted values are mapped to the range above the upper limit: (uint8_t(c - '0') <= 9)
第二个观察:范围测试的双重条件如(c> ='0'&& c <='9')可以通过使用无符号类型并将范围固定为零来转换为单个测试条件;这样,在范围的开头可以没有不需要的值,所有不需要的值都映射到高于上限的范围:(uint8_t(c - '0')<= 9)
It just so happens that c - '0'
needs to be computed here anyway...
碰巧c - '0'需要在这里计算......
Hence the inner conversion loop can be slimmed to
因此,内部转换循环可以缩小到
uint64_t n = digit_value(*p);
unsigned d;
while ((d = digit_value(*++p)) <= 9)
{
n = n * 10 + d;
}
The code here is called with the precondition that p
be pointing at a digit, which is why the first digit is extracted without further ado (which also avoids a superfluous MUL).
这里的代码被调用,前提是p指向一个数字,这就是为什么第一个数字被提取而没有进一步的ado(这也避免了多余的MUL)。
That precondition is less outlandish than might appear at first, since p
pointing at a digit is the reason why this code is called by the parser in the first place. In my code the whole shebang looks like this (assertions and other production-quality noise elided):
这个先决条件比起初可能出现的要小一些,因为p指向一个数字是解析器首先调用此代码的原因。在我的代码中,整个shebang看起来像这样(断言和其他生产质量噪声省略):
unsigned digit_value (char c)
{
return unsigned(c - '0');
}
bool is_digit (char c)
{
return digit_value(c) <= 9;
}
uint64_t extract_uint64 (char const **read_ptr)
{
char const *p = *read_ptr;
uint64_t n = digit_value(*p);
unsigned d;
while ((d = digit_value(*++p)) <= 9)
{
n = n * 10 + d;
}
*read_ptr = p;
return n;
}
The first call to digit_value()
is often elided by the compiler, if the code gets inlined and the calling code has already computed that value by calling is_digit()
.
如果代码被内联并且调用代码已经通过调用is_digit()计算了该值,则编译器通常会忽略对digit_value()的第一次调用。
n * 10
happens to be faster than manual shifting (e.g. n = (n << 3) + (n << 1) + d
), at least on my machine with gcc 4.8.1 and VC++ 2013. My guess is that both compilers use LEA
with index scaling for adding up to three values in one go and scaling one of them by 2, 4, or 8.
n * 10碰巧比手动换档更快(例如n =(n << 3)+(n << 1)+ d),至少在我的机器上使用gcc 4.8.1和VC ++ 2013.我的猜测是两者都是编译器使用带索引缩放的LEA,一次最多可添加三个值,并将其中一个值缩放2,4或8。
In any case that's exactly how it should be: we write nice clean code in separate functions and express the desired logic (n * 10, x % CHAR_BIT, whatever) and the compiler converts it to shifting, masking, LEAing and so on, inlines everything into the big bad parser loop and takes care of all the required messiness under the hood to make things fast. We don't even have to stick inline
in front of everything anymore. If anything then we have to do the opposite, by using __declspec(noinline)
judiciously when compilers get over-eager.
在任何情况下,它应该是它应该是什么:我们在单独的函数中编写漂亮的干净代码并表达所需的逻辑(n * 10,x%CHAR_BIT,无论如何),编译器将其转换为移位,屏蔽,LEAing等,内联一切都进入了大的糟糕的解析器循环,并在引擎盖下处理所有必要的混乱,以使事情变得快速。我们甚至不必再将内联插入所有内容。如果有的话,我们必须做相反的事情,在编译器过度使用时明智地使用__declspec(noinline)。
I'm using the above code in a program that reads billions of numbers from text files and pipes; it converts 115 million uints per second if the length is 9..10 digits, and 60 million/s for length 19..20 digits (gcc 4.8.1). That's more than ten times as fast as strtoull()
(and just barely enough for my purposes, but I digress...). That's the timing for converting text blobs containing 10 million numbers each (100..200 MB), meaning that memory timings make these numbers appear a bit worse than they would be in a synthetic benchmark running from cache.
我在一个从文本文件和管道读取数十亿个数字的程序中使用上面的代码;如果长度为9..10位,则每秒转换1.15亿个uint;对于长度为19..20位数(gcc 4.8.1),它转换为6000万/ s。这比strtoull()的速度快十倍(对于我的目的来说勉强够用,但我离题了......)。这是转换包含每个1000万个数字(100..200 MB)的文本blob的时间,这意味着内存时序使这些数字看起来比从缓存中运行的综合基准测试中看起来更糟糕。
#5
3
Why not use a stringstream? I'm not sure of its particular overhead, but you could define:
为什么不使用stringstream?我不确定它的特殊开销,但你可以定义:
int myInt;
string myString = "1561";
stringstream ss;
ss(myString);
ss >> myInt;
Of course, you'd need to
当然,你需要
#include <stringstream>
#6
2
Paddy's implementation of fast_atoi is faster than atoi - without the shadow of the doubt - however it works only for unsigned integers.
Paddy对fast_atoi的实现比atoi更快 - 没有怀疑的阴影 - 但它只适用于无符号整数。
Below, I put evaluated version of Paddy's fast_atoi that also allows only unsigned integers but speeds conversion up even more by replacing costly operation * with +
下面,我把Paddy的fast_atoi的评估版本也只允许使用无符号整数,但通过用+替换昂贵的操作来加快转换速度
unsigned int fast_atou(const char *str)
{
unsigned int val = 0;
while(*str) {
val = (val << 1) + (val << 3) + *(str++) - 48;
}
return val;
}
Here, I put complete version of fast_atoi() that i'm using sometimes which converts singed integers as well:
在这里,我把完整版本的fast_atoi()放在我有时使用的同时转换了烧结的整数:
int fast_atoi(const char *buff)
{
int c = 0, sign = 0, x = 0;
const char *p = buff;
for(c = *(p++); (c < 48 || c > 57); c = *(p++)) {if (c == 45) {sign = 1; c = *(p++); break;}}; // eat whitespaces and check sign
for(; c > 47 && c < 58; c = *(p++)) x = (x << 1) + (x << 3) + c - 48;
return sign ? -x : x;
}
#7
1
Here's the entirety of the atoi function in gcc:
这是gcc中atoi函数的全部内容:
long atoi(const char *str)
{
long num = 0;
int neg = 0;
while (isspace(*str)) str++;
if (*str == '-')
{
neg=1;
str++;
}
while (isdigit(*str))
{
num = 10*num + (*str - '0');
str++;
}
if (neg)
num = -num;
return num;
}
The whitespace and negative check are superfluous in your case, but also only use nanoseconds.
在你的情况下,空格和负面检查是多余的,但也只使用纳秒。
isdigit is almost certainly inlined, so that's not costing you any time.
isdigit几乎肯定是内联的,所以这不会让你花费任何时间。
I really don't see room for improvement here.
我真的没有在这里看到改进的余地。
#8
0
The only definitive answer is with checking with your compiler, your real data.
唯一明确的答案是检查您的编译器,您的真实数据。
Something I'd try (even if it's using memory accesses so it may be slow depending on caching) is
我试过的东西(即使它正在使用内存访问,因此根据缓存可能会很慢)
int value = t1[s[n-1]];
if (n > 1) value += t10[s[n-2]]; else return value;
if (n > 2) value += t100[s[n-3]]; else return value;
if (n > 3) value += t1000[s[n-4]]; else return value;
... continuing for how many digits you need to handle ...
if t1
, t10
and so on are statically allocated and constant the compiler shouldn't fear any aliasing and the machine code generated should be quite decent.
如果t1,t10等静态分配并且常量,编译器不应该担心任何别名,并且生成的机器代码应该相当不错。
#9
0
Here is mine. Atoi is the fastest I could come up with. I compiled with msvc 2010 so it might be possible to combine both templates. In msvc 2010, when I combined templates it made the case where you provide a cb argument slower.
这是我的。 Atoi是我能想到的最快的。我使用msvc 2010编译,因此可以组合这两个模板。在msvc 2010中,当我组合模板时,它提供了一个慢速提供cb参数的情况。
Atoi handles nearly all the special atoi cases, and is as fast or faster than this:
Atoi处理几乎所有特殊的atoi案例,并且速度或速度都快于此:
int val = 0;
while( *str )
val = val*10 + (*str++ - '0');
Here is the code:
这是代码:
#define EQ1(a,a1) (BYTE(a) == BYTE(a1))
#define EQ1(a,a1,a2) (BYTE(a) == BYTE(a1) && EQ1(a,a2))
#define EQ1(a,a1,a2,a3) (BYTE(a) == BYTE(a1) && EQ1(a,a2,a3))
// Atoi is 4x faster than atoi. There is also an overload that takes a cb argument.
template <typename T>
T Atoi(LPCSTR sz) {
T n = 0;
bool fNeg = false; // for unsigned T, this is removed by optimizer
const BYTE* p = (const BYTE*)sz;
BYTE ch;
// test for most exceptions in the leading chars. Most of the time
// this test is skipped. Note we skip over leading zeros to avoid the
// useless math in the second loop. We expect leading 0 to be the most
// likely case, so we test it first, however the cpu might reorder that.
for ( ; (ch=*p-'1') >= 9 ; ++p) { // unsigned trick for range compare
// ignore leading 0's, spaces, and '+'
if (EQ1(ch, '0'-'1', ' '-'1', '+'-'1'))
continue;
// for unsigned T this is removed by optimizer
if (!((T)-1 > 0) && ch==BYTE('-'-'1')) {
fNeg = !fNeg;
continue;
}
// atoi ignores these. Remove this code for a small perf increase.
if (BYTE(*p-9) > 4) // \t, \n, 11, 12, \r. unsigned trick for range compare
break;
}
// deal with rest of digits, stop loop on non digit.
for ( ; (ch=*p-'0') <= 9 ; ++p) // unsigned trick for range compare
n = n*10 + ch;
// for unsigned T, (fNeg) test is removed by optimizer
return (fNeg) ? -n : n;
}
// you could go with a single template that took a cb argument, but I could not
// get the optimizer to create good code when both the cb and !cb case were combined.
// above code contains the comments.
template <typename T>
T Atoi(LPCSTR sz, BYTE cb) {
T n = 0;
bool fNeg = false;
const BYTE* p = (const BYTE*)sz;
const BYTE* p1 = p + cb;
BYTE ch;
for ( ; p<p1 && (ch=*p-'1') >= 9 ; ++p) {
if (EQ1(ch,BYTE('0'-'1'),BYTE(' '-'1'),BYTE('+'-'1')))
continue;
if (!((T)-1 > 0) && ch == BYTE('-'-'1')) {
fNeg = !fNeg;
continue;
}
if (BYTE(*p-9) > 4) // \t, \n, 11, 12, \r
break;
}
for ( ; p<p1 && (ch=*p-'0') <= 9 ; ++p)
n = n*10 + ch;
return (fNeg) ? -n : n;
}
#10
0
a faster convert function
更快的转换功能
multiplication is always slower that sum and shift, therefore change multiply with shift
乘法总是慢于和和移,因此随着移位而变化
int fast_atoi( const char * str )
{
int val = 0;
while( *str ) {
val = (val << 4) - (val << 2) - (val << 1) + (*str++ - '0');
}
return val;
}