I wrote a function, Str::Compare
, that is basically a strcmp
rewritten in another way. While comparing the two function, in a loop repeted 500'000'000 times, strcmp
execute too much fast, about x750 times faster.
我写了一个函数,Str :: Compare,它基本上是用另一种方式重写的strcmp。在比较这两个函数时,在循环中重复500'000'000次,strcmp执行速度太快,大约快x750倍。
This code was compiled in a C library with -Os
parameter active:
此代码在C库中编译,-Os参数处于活动状态:
int Str::Compare(char* String_1, char* String_2)
{
char TempChar_1, TempChar_2;
do
{
TempChar_1 = *String_1++;
TempChar_2 = *String_2++;
} while(TempChar_1 && TempChar_1 == TempChar_2);
return TempChar_1 - TempChar_2;
}
The execution time of that function is 3.058s
, while strcmp
only 0.004s
.
该函数的执行时间为3.058s,而strcmp仅为0.004s。
Why this happen?
为什么会这样?
Also this is how I implemented the benchmark loop:
这也是我实现基准测试循环的方式:
int main()
{
char Xx[] = {"huehuehuehuehuehuehuehuehuehuehuehuehuehue"},
Yy[] = {"huehuehuehuehuehuehuehuehuehuehuehuehuehue"};
for(int i = 0; i < 500000000; ++i)
Str::Compare(Xx, Yy);
}
Edit: While testing some code I wrote and optimization that improved drastically Str::Compare
speed. If before strcmp
was x750 times faster now is only x250. This is the new code:
编辑:测试我编写的一些代码和优化,大大提高了Str :: Compare的速度。如果之前strcmp快了x750倍,现在只有x250。这是新代码:
int Str::Compare(char* String_1, char* String_2)
{
char TempChar_1, TempChar_2, TempChar_3;
while(TempChar_1 && !TempChar_3)
{
TempChar_1 = *String_1++;
TempChar_2 = *String_2++;
TempChar_3 = TempChar_1 ^ TempChar_2;
}
return TempChar_1 - TempChar_2;
}
New execution time is 0.994s
.
新执行时间为0.994秒。
3 个解决方案
#1
24
I was curious about it and build a test program:
我很好奇并建立了一个测试程序:
#include <string.h>
compare(char* String_1, char* String_2)
{
char TempChar_1,
TempChar_2;
do
{
TempChar_1 = *String_1++;
TempChar_2 = *String_2++;
} while(TempChar_1 && TempChar_1 == TempChar_2);
return TempChar_1 - TempChar_2;
}
int main(){
int i=strcmp("foo","bar");
int j=compare("foo","bar");
return i;
}
I compiled it to assembler with gcc -S -Os test.c
using gcc 4.7.3 resulting in the following assembler:
我使用gcc -S -Os test.cusing gcc 4.7.3将它编译成汇编程序,产生以下汇编程序:
.file "test.c"
.text
.globl compare
.type compare, @function
compare:
.LFB24:
.cfi_startproc
xorl %edx, %edx
.L2:
movsbl (%rdi,%rdx), %eax
movsbl (%rsi,%rdx), %ecx
incq %rdx
cmpb %cl, %al
jne .L4
testb %al, %al
jne .L2
.L4:
subl %ecx, %eax
ret
.cfi_endproc
.LFE24:
.size compare, .-compare
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "bar"
.LC1:
.string "foo"
.section .text.startup,"ax",@progbits
.globl main
.type main, @function
main:
.LFB25:
.cfi_startproc
movl $.LC0, %esi
movl $.LC1, %edi
call compare
movl $1, %eax
ret
.cfi_endproc
.LFE25:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
.section .note.GNU-stack,"",@progbits
I am not that good in x86 assembler but as far as I see it the call to the strcmp is removed and simply replaced by a constant expression ( movl $1, %eax
). So if you use a constant expression for your tests, gcc probably optimizes the strcmp to a constant.
我在x86汇编程序中表现不佳,但据我所知,对strcmp的调用被删除并简单地用常量表达式替换(movl $ 1,%eax)。因此,如果为测试使用常量表达式,gcc可能会将strcmp优化为常量。
#2
5
When comparing performance, I've found that it's best to put the test functions and the test driver in separate compilation units. Put your test functions in separate compilation units, and compile those to whatever optimization level you want, but compile the test driver unoptimized. Otherwise you will run into exactly the kind of problem you've seen here.
在比较性能时,我发现最好将测试函数和测试驱动程序放在不同的编译单元中。将测试函数放在单独的编译单元中,并将它们编译为您想要的任何优化级别,但编译未优化的测试驱动程序。否则你会遇到你在这里看到的那种问题。
The problem is that strcmp
compares two const
C-style strings. If you loop 500,000,000 times over strcmp(string_a, string_b)
, an optimizing compiler is going to be smart enough to reduce that loop to optimize that loop away, and then perhaps smart enough to optimize away the one remaining call to strcmp
.
问题是strcmp比较了两个const C风格的字符串。如果你在strcmp(string_a,string_b)上循环500,000,000次,那么优化编译器将足够聪明以减少该循环以优化该循环,然后可能足够聪明以优化对strcmp的剩余调用。
Your compare function takes two non-const strings. As far as the compiler is concerned, your function may well have side effects. The compiler doesn't know, so it can't optimize the loop down to nothing. It has to generate code to perform the comparison 500,000,000 times.
您的compare函数需要两个非const字符串。就编译器而言,您的功能可能会产生副作用。编译器不知道,因此无法将循环优化为零。它必须生成代码才能执行500,000,000次比较。
#3
0
I believe most of the standard libraries are written in assembly language. That could be the reason you see the standard library is faster than yours.
我相信大多数标准库都是用汇编语言编写的。这可能是您看到标准库比您的更快的原因。
#1
24
I was curious about it and build a test program:
我很好奇并建立了一个测试程序:
#include <string.h>
compare(char* String_1, char* String_2)
{
char TempChar_1,
TempChar_2;
do
{
TempChar_1 = *String_1++;
TempChar_2 = *String_2++;
} while(TempChar_1 && TempChar_1 == TempChar_2);
return TempChar_1 - TempChar_2;
}
int main(){
int i=strcmp("foo","bar");
int j=compare("foo","bar");
return i;
}
I compiled it to assembler with gcc -S -Os test.c
using gcc 4.7.3 resulting in the following assembler:
我使用gcc -S -Os test.cusing gcc 4.7.3将它编译成汇编程序,产生以下汇编程序:
.file "test.c"
.text
.globl compare
.type compare, @function
compare:
.LFB24:
.cfi_startproc
xorl %edx, %edx
.L2:
movsbl (%rdi,%rdx), %eax
movsbl (%rsi,%rdx), %ecx
incq %rdx
cmpb %cl, %al
jne .L4
testb %al, %al
jne .L2
.L4:
subl %ecx, %eax
ret
.cfi_endproc
.LFE24:
.size compare, .-compare
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "bar"
.LC1:
.string "foo"
.section .text.startup,"ax",@progbits
.globl main
.type main, @function
main:
.LFB25:
.cfi_startproc
movl $.LC0, %esi
movl $.LC1, %edi
call compare
movl $1, %eax
ret
.cfi_endproc
.LFE25:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
.section .note.GNU-stack,"",@progbits
I am not that good in x86 assembler but as far as I see it the call to the strcmp is removed and simply replaced by a constant expression ( movl $1, %eax
). So if you use a constant expression for your tests, gcc probably optimizes the strcmp to a constant.
我在x86汇编程序中表现不佳,但据我所知,对strcmp的调用被删除并简单地用常量表达式替换(movl $ 1,%eax)。因此,如果为测试使用常量表达式,gcc可能会将strcmp优化为常量。
#2
5
When comparing performance, I've found that it's best to put the test functions and the test driver in separate compilation units. Put your test functions in separate compilation units, and compile those to whatever optimization level you want, but compile the test driver unoptimized. Otherwise you will run into exactly the kind of problem you've seen here.
在比较性能时,我发现最好将测试函数和测试驱动程序放在不同的编译单元中。将测试函数放在单独的编译单元中,并将它们编译为您想要的任何优化级别,但编译未优化的测试驱动程序。否则你会遇到你在这里看到的那种问题。
The problem is that strcmp
compares two const
C-style strings. If you loop 500,000,000 times over strcmp(string_a, string_b)
, an optimizing compiler is going to be smart enough to reduce that loop to optimize that loop away, and then perhaps smart enough to optimize away the one remaining call to strcmp
.
问题是strcmp比较了两个const C风格的字符串。如果你在strcmp(string_a,string_b)上循环500,000,000次,那么优化编译器将足够聪明以减少该循环以优化该循环,然后可能足够聪明以优化对strcmp的剩余调用。
Your compare function takes two non-const strings. As far as the compiler is concerned, your function may well have side effects. The compiler doesn't know, so it can't optimize the loop down to nothing. It has to generate code to perform the comparison 500,000,000 times.
您的compare函数需要两个非const字符串。就编译器而言,您的功能可能会产生副作用。编译器不知道,因此无法将循环优化为零。它必须生成代码才能执行500,000,000次比较。
#3
0
I believe most of the standard libraries are written in assembly language. That could be the reason you see the standard library is faster than yours.
我相信大多数标准库都是用汇编语言编写的。这可能是您看到标准库比您的更快的原因。