Why does one code sample take more time to execute than the other?

Sample 1

for(int i = 0 ; i <= 99 ; i++)
    printf("Hello world");

Sample 2

printf("Hello world"); // 1st print
printf("Hello world"); // 2nd print
.
.
.
printf("Hello world"); // 100th print

I know that sample 1 takes more time to execute than sample 2, and that sample 2 takes more memory in the text segment.

But I want to know what is going on behind the scenes.

2 solutions

#1

Imagine sample 1 written out as this sequence of operations:

i = 0
if (i <= 99)
print
i++
jump
if (i <= 99)
print
i++
jump
if (i <= 99)
print
i++
jump
...

The second sample, by contrast, is simply:

print
print
print
print
...

This is extremely simplified, but you should get the idea: the first sample executes many more instructions because of the loop bookkeeping (compare, increment, jump) around every print.
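The two listings above can be turned into a rough operation count. The sketch below is only a toy cost model, not a measurement: it assumes every line of the simplified listing costs one "operation", and the function names `loop_ops` and `straight_ops` are made up for illustration.

```c
#include <assert.h>

/* Toy cost model (an assumption): each line of the simplified
 * listing counts as exactly one operation. */

/* Operations executed by the loop version for n prints. */
long loop_ops(long n)
{
    assert(n >= 0);
    long ops = 1;               /* i = 0 */
    long i = 0;
    for (;;) {
        ops++;                  /* if (i <= n - 1) */
        if (i > n - 1)
            break;              /* the final, failing comparison */
        ops++;                  /* print */
        ops++;                  /* i++ */
        i++;
        ops++;                  /* jump back to the comparison */
    }
    return ops;
}

/* Operations executed by the straight-line version for n prints. */
long straight_ops(long n)
{
    assert(n >= 0);
    return n;                   /* one print per statement, nothing else */
}
```

Under this model, 100 prints cost 402 operations in the loop version (1 initialization, 101 comparisons, 100 prints, 100 increments, 100 jumps) against 100 in the straight-line version. Real hardware does not charge one unit per operation, and the `printf` calls themselves dominate the run time, so treat the ratio as an illustration of where the extra work comes from rather than as a prediction.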

As a side note, this is one of the optimizations a compiler will frequently perform: it unrolls the loop and compiles it as if there were no loop. To do that, it first has to conclude that unrolling is worthwhile. Note that sample 2 compiles into a much greater total number of instructions and takes much more space in memory (and therefore takes longer to load).
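To make the unrolling idea concrete, here is a hand-written sketch of what a partial unroll by a factor of 4 could look like. It is an assumption about the general shape of the transformation, not the output of any particular compiler. The helper names (`put`, `fill_rolled`, `fill_unrolled4`) are hypothetical, and the functions write into a caller-supplied buffer instead of calling `printf` so the result can be checked. Each function returns the number of loop tests it performed.

```c
#include <assert.h>
#include <string.h>

#define MSG "Hello world"
#define MSG_LEN (sizeof MSG - 1)

/* Copy MSG to p and return the position just past it. */
static char *put(char *p)
{
    memcpy(p, MSG, MSG_LEN);
    return p + MSG_LEN;
}

/* Rolled loop: one loop test per copy, plus the final failing test.
 * out must have room for n * MSG_LEN + 1 bytes. */
int fill_rolled(char *out, int n)
{
    assert(n >= 0);
    int tests = 0;
    int i = 0;
    for (;;) {
        tests++;
        if (i >= n)
            break;
        out = put(out);
        i++;
    }
    *out = '\0';
    return tests;
}

/* Unrolled by 4: one loop test per four copies, then a remainder loop
 * for the last n % 4 copies. Same buffer requirement as above. */
int fill_unrolled4(char *out, int n)
{
    assert(n >= 0);
    int tests = 0;
    int i = 0;
    for (;;) {
        tests++;
        if (i + 4 > n)
            break;
        out = put(out);
        out = put(out);
        out = put(out);
        out = put(out);
        i += 4;
    }
    for (;;) {                  /* remainder loop */
        tests++;
        if (i >= n)
            break;
        out = put(out);
        i++;
    }
    *out = '\0';
    return tests;
}
```

For 100 copies the rolled version performs 101 loop tests and the unrolled version 27, while both produce the same 1100-character string. The price is a larger function body, which is the code-size trade-off the compiler has to weigh.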

#2

The code in sample 2 can be quicker if it is compiled well.

As you have described, there are 100 calls to printf("...") with the same string as the parameter. With a calling convention that passes arguments on the stack, an optimizing compiler can in principle detect that you are passing exactly the same parameter and not pop the pointer after the call, so it does not need to push it again for the next call.

Also, the difference in speed between the two samples is the time spent jumping back to the beginning of the loop. On present architectures the loop can even be an advantage: the whole loop body is cached by the CPU (this cannot be done with a large set of similar calls), so no memory access is needed to fetch the instructions, which compensates for the time spent executing the loop-control instructions.

But even then, a good optimizing compiler can detect that you have written the same statement 100 times and fold them into a loop with a hidden control variable (as in sample 1), so you won't see a difference in execution time.

Optimizing compilers are designed to detect this kind of construct and to change the code to be more efficient.

A good reference for this kind of material is Compilers: Principles, Techniques, and Tools: http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools
