How does GCC optimize an unused variable incremented inside a loop?

I wrote this simple C program:

int main() {
    int i;
    int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }
}

I wanted to see how the gcc compiler optimizes this loop (clearly, adding 1 to count 2000000000 times should become a single "add 2000000000"). So:
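The expected rewrite can be stated in C. This is a sketch, not the asker's code: the function names are hypothetical, and the bound is a parameter `n` rather than the literal 2000000000 so it can be checked quickly. It places the loop next to the single-step closed form it is equivalent to.

```c
/* The question's loop, with the bound made a parameter (hypothetical). */
int count_with_loop(int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        count = count + 1;   /* one addition per iteration */
    }
    return count;
}

/* The "add n once" form the question expects. When n <= 0 the loop
   body above never runs, so the result is 0 in that case. */
int count_closed_form(int n) {
    return n > 0 ? n : 0;
}
```

An optimizing compiler is allowed to replace the first function with the second, because both return the same value for every `int` input.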

Compiling with gcc test.c and then running time on a.out gives:

real 0m7.717s  
user 0m7.710s  
sys 0m0.000s

Compiling with gcc -O2 test.c and then running time on a.out gives:

real 0m0.003s  
user 0m0.000s  
sys 0m0.000s

Then I generated the assembly for both with gcc -S. The first one seems quite clear:

    .file "test.c"  
    .text  
.globl main
    .type   main, @function  
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movl    $0, -8(%rbp)
    movl    $0, -4(%rbp)
    jmp .L2
.L3:
    addl    $1, -8(%rbp)
    addl    $1, -4(%rbp)
.L2:
    cmpl    $1999999999, -4(%rbp)
    jle .L3
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section    .note.GNU-stack,"",@progbits

.L3 increments count (at -8(%rbp)) and i (at -4(%rbp)); .L2 compares i at -4(%rbp) with 1999999999 and jumps back to .L3 while i <= 1999999999, i.e. while i < 2000000000.
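That reading of the listing can be made concrete by translating it back into C with goto. This is a sketch assuming the frame layout read from the listing (count at -8(%rbp), i at -4(%rbp)). The function name and the `limit` parameter are hypothetical stand-ins for `main` and the constant 2000000000.

```c
/* Hand translation of the -O0 listing into C (a sketch). The labels
   mirror the assembler labels .L3 (loop body) and .L2 (loop test). */
int unoptimized_loop(int limit) {
    int count = 0;        /* movl $0, -8(%rbp) */
    int i = 0;            /* movl $0, -4(%rbp) */
    goto L2;              /* jmp .L2: test the condition before the first pass */
L3:
    count = count + 1;    /* addl $1, -8(%rbp) */
    i = i + 1;            /* addl $1, -4(%rbp) */
L2:
    if (i < limit)        /* cmpl $1999999999, -4(%rbp); jle .L3 */
        goto L3;          /* i <= 1999999999 is the same test as i < 2000000000 */
    return count;         /* the real main returns nothing; added here for checking */
}
```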

Now the optimized one:

    .file "test.c"  
    .text
    .p2align 4,,15
.globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    rep
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section .note.GNU-stack,"",@progbits

I can't understand at all what's going on there! I've got little knowledge of assembly, but I expected something like

addl $2000000000, -8(%rbp)

I even tried gcc -c -g -Wa,-a,-ad -O2 test.c to see the C code together with the assembly it was converted to, but the result was no clearer than the previous one.

Can someone briefly explain:

1. The gcc -S -O2 output.
2. Whether the loop is optimized as I expected (one addition instead of many).

2 Answers

#1

The compiler is even smarter than that. :)

In fact, it realizes that you never use the result of the loop, so it removed the entire loop!

This is called Dead Code Elimination.
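If the goal is to time the loop itself rather than have it eliminated, one common approach is to declare the counter `volatile`. Accesses to volatile objects count as observable behavior in C, so the compiler has to perform every load and store. A minimal sketch (the function name and the parameter `n`, standing in for 2000000000, are hypothetical):

```c
/* Marking count volatile forces a real read and write on every
   iteration, so the loop cannot be removed as dead code. */
int count_volatile(int n) {
    volatile int count = 0;
    for (int i = 0; i < n; i++) {
        count = count + 1;   /* must load count, add 1, and store it back */
    }
    return count;
}
```

The trade-off is that this measures a loop with forced memory traffic, which is not the same code the optimizer would produce for a non-volatile counter.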

A better test is to print the result:

#include <stdio.h>
int main(void) {
    int i; int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }

    //  Print result to prevent Dead Code Elimination
    printf("%d\n", count);
}

EDIT: I've added the required #include <stdio.h>. The MSVC assembly listing corresponds to a version without the #include, but the output should be the same.

I don't have GCC in front of me at the moment, since I'm booted into Windows, but here's the assembly output for the printf() version compiled with MSVC:

EDIT: I had posted the wrong assembly output. Here's the correct one.

; 57   : int main(){

$LN8:
    sub rsp, 40                 ; 00000028H

; 58   : 
; 59   : 
; 60   :     int i; int count = 0;
; 61   :     for(i = 0; i < 2000000000; i++){
; 62   :         count = count + 1;
; 63   :     }
; 64   : 
; 65   :     //  Print result to prevent Dead Code Elimination
; 66   :     printf("%d\n",count);

    lea rcx, OFFSET FLAT:??_C@_03PMGGPEJJ@?$CFd?6?$AA@
    mov edx, 2000000000             ; 77359400H
    call    QWORD PTR __imp_printf

; 67   : 
; 68   : 
; 69   : 
; 70   :
; 71   :     return 0;

    xor eax, eax

; 72   : }

    add rsp, 40                 ; 00000028H
    ret 0

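So yes, Visual Studio does this optimization: the loop is gone, and the constant 2000000000 (77359400H) is loaded straight into edx as the argument to printf. I'd assume GCC does too.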

And yes, GCC performs a similar optimization. Here's an assembly listing for the same program with gcc -S -O2 test.c (gcc 4.5.2, Ubuntu 11.10, x86):

        .file   "test.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%d\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        subl    $16, %esp
        movl    $2000000000, 8(%esp)
        movl    $.LC0, 4(%esp)
        movl    $1, (%esp)
        call    __printf_chk
        leave
        ret
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
        .section        .note.GNU-stack,"",@progbits

#2

Compilers have a few tools at their disposal to make code more efficient or more "efficient":

1. If the result of a computation is never used, the code that performs the computation can be omitted (if the computation acts upon volatile values, those values must still be read, but the results of the reads may be ignored). If the results of the computations that fed it aren't used either, the code that performs those can be omitted as well. If such omission makes the code on both paths of a conditional branch identical, the condition may be regarded as unused and omitted too. This has no effect on the behavior (other than execution time) of any program that doesn't make out-of-bounds memory accesses or invoke what Annex L of the C11 standard would call "Critical Undefined Behaviors".

2. If the compiler determines that the machine code computing a value can only produce results in a certain range, it may omit any conditional tests whose outcome can be predicted on that basis. As above, this does not affect behavior other than execution time unless the code invokes "Critical Undefined Behaviors".

3. If the compiler determines that certain inputs would invoke any form of Undefined Behavior with the code as written, the Standard allows the compiler to omit any code that would only be relevant when such inputs are received, even if the execution platform's natural behavior for those inputs would have been benign and the compiler's rewrite makes it dangerous.
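The three cases can be illustrated with small C functions. These are sketches with hypothetical names; the comments describe what an optimizer is permitted to do, not what any particular GCC version is guaranteed to emit.

```c
#include <limits.h>

/* Case 1 (dead code): sum is never read after the loop, so an
   optimizer may delete the loop and the variable entirely. */
void dead_loop(int n) {
    unsigned sum = 0;              /* unsigned, so it wraps instead of overflowing */
    for (int i = 0; i < n; i++)
        sum += (unsigned)i;
    (void)sum;                     /* silences "set but not used" warnings */
}

/* Case 2 (value ranges): an unsigned char can never exceed UCHAR_MAX,
   so this comparison can be folded to the constant 1. */
int always_in_range(unsigned char c) {
    return c <= UCHAR_MAX;
}

/* Case 3 (undefined behavior): signed overflow is undefined, so a compiler
   may assume x + 1 never wraps and fold this to 1, even though a wrapping
   two's-complement add at x == INT_MAX would make the comparison false. */
int plus_one_is_greater(int x) {
    return x + 1 > x;
}
```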

Good compilers do #1 and #2. For some reason, however, #3 has become fashionable.
