Why does a C program run slower when the loop contains a condition?

I am rephrasing this question based on the comments received.

I have a loop that runs 30 billion times (10 passes over 3 billion bytes) and writes a value to each byte of a block of memory allocated with malloc().

When the loop contains a condition it runs much slower than when the condition is not present. Review the scenarios below:

Scenario A: Condition is present and program is slow (43 sec)

Scenario B: Condition is not present and program is much faster (4 sec)

// gcc -O3 -c block.c && gcc -o block block.o



#include <stdio.h>
#include <stdlib.h>
#include <time.h>   /* time() */


#define LEN 3000000000

int main (int argc, char** argv){

    long i,j;

    unsigned char *n = NULL;
    unsigned char *m = NULL;

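    m = (unsigned char *) malloc (sizeof(char) * LEN);

    /* a ~3 GB request can fail, so check before writing */
    if (m == NULL){
        fprintf(stderr, "malloc failed\n");
        return 1;
    }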

    n = m;

    srand ((unsigned) time(NULL));  

    int t = (unsigned) time(NULL);

    for (j = 0; j < 10; j++){

        n = m;

        for (i = 0; i < LEN; i++){


            //////////// A: THIS IS SLOW
            /*
            if (i % 2){
                *n = 1;         

            } else {
                *n = 0;
            }   
            */
            /////////// END OF A


            /////////// B: THIS IS FAST

            *n = 0;

            i % 2;

            *n = 1;

            /////////// END OF B

            n += 1;

        }
    }


    printf("Done. %u sec \n", ((unsigned) time(NULL)) - t );

    free(m);

    return 0;
}

Regards, KD

1 Solution

#1

You can use gcc -S -O3 to look at the generated assembly. Here is an example from an Intel machine:

Fast version:

    movl    %eax, %r12d
    .p2align 4,,10
    .p2align 3
.L2:
    movl    $3000000000, %edx
    movl    $1, %esi
    movq    %rbp, %rdi
    call    memset
    subq    $1, %rbx
    jne .L2

Slow version:

    movl    $10, %edi
    movl    %eax, %ebp
    movl    $3000000000, %esi
    .p2align 4,,10
    .p2align 3
.L2:
    xorl    %edx, %edx
    .p2align 4,,10
    .p2align 3
.L5:
    movq    %rdx, %rcx
    andl    $1, %ecx
    movb    %cl, (%rbx,%rdx)
    addq    $1, %rdx
    cmpq    %rsi, %rdx
    jne     .L5
    subq    $1, %rdi
    jne     .L2

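Conclusion: the compiler is smarter than you think. In version B the store of 0 is immediately overwritten by the store of 1, and the statement i % 2; has no effect, so the inner loop simply writes 1 to every byte. The compiler recognizes this and replaces the whole inner loop with a call to memset, which is faster because it uses SSE/AVX or REP instructions on Intel. With the condition in place this optimization cannot kick in, because the value written alternates between 0 and 1. The compiler still removes the branch (andl $1, %ecx computes i % 2), but it has to store one byte per iteration.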
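To make the difference concrete, here is a minimal sketch that runs both loop bodies on a small buffer and checks what they leave in memory. The function names fill_a, fill_b and check_equivalence are made up for this illustration and are not part of the original program. It only shows why one loop can be expressed as a single memset and the other cannot; it does not measure timing.

```c
#include <stdlib.h>
#include <string.h>

/* Scenario B body: the 0 is overwritten right away and the original
   `i % 2;` statement has no effect, so every byte ends up as 1. */
void fill_b(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = 0;
        buf[i] = 1;
    }
}

/* Scenario A body: the stored value depends on i, so no single
   memset call can produce the same buffer. */
void fill_a(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (i % 2) {
            buf[i] = 1;
        } else {
            buf[i] = 0;
        }
    }
}

/* Hypothetical helper: returns 1 if fill_b matches memset(buf, 1, len)
   and fill_a matches the branch-free form (i & 1), 0 if either check
   fails, and -1 if allocation fails. */
int check_equivalence(size_t len)
{
    unsigned char *got = malloc(len);
    unsigned char *ref = malloc(len);
    int ok = -1;

    if (got != NULL && ref != NULL) {
        fill_b(got, len);
        memset(ref, 1, len);
        ok = (memcmp(got, ref, len) == 0);

        fill_a(got, len);
        for (size_t i = 0; i < len; i++) {
            if (got[i] != (unsigned char)(i & 1)) {
                ok = 0;
            }
        }
    }

    free(got);
    free(ref);
    return ok;
}
```

Calling check_equivalence(1000) from a main function returns 1: the B loop is byte-for-byte identical to memset with the value 1, while the A loop produces the alternating pattern that the andl $1 / movb pair in the slow listing writes one byte at a time.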
