在浅谈尾调用和尾递归这篇博文中，我谈了什么是尾递归以及编译器如何优化尾递归。这篇文章，咱来个具体的例子，通过汇编代码来看看优化和不优化的区别。

求阶乘的尾递归写法

// file_name : factorial.c
#include <stdio.h>

int factorial_tail(int n, int product_from_n)
{
if (n == 1)
return product_from_n;
else
return factorial_tail(n - 1, n * product_from_n);
}

int main(void)
{
int result = factorial_tail(5,1);
printf("%d\n",result);
}

实验思路是将上面的源码编译为普通版本和优化版本，通过对比2个版本的汇编代码来了解编译器如何优化尾递归。

编译为2个版本

gcc -S factorial.c -o factorial.s         #普通版本
gcc -S factorial.c -O2 -o factorial_O2.s  #优化版本

注：我的gcc版本是 4.8.4

对比文件

普通版本（未优化）

浅谈尾递归的优化

为了说明问题，我删除了一些伪指令，并加了些注释：

factorial_tail:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp
    movl    %edi, -4(%rbp) // 传递参数 n
    movl    %esi, -8(%rbp) // 传递参数 product_from_n
    cmpl    $1, -4(%rbp)   //把参数n和1比较
    jne .L2                //不相等就跳转到.L2
    movl    -8(%rbp), %eax //相等的话 eax = product_from_n
    jmp .L3                //跳转到.L3
.L2:
    movl    -4(%rbp), %eax // eax = n
    imull   -8(%rbp), %eax // eax = product_from_n*n
    movl    -4(%rbp), %edx // edx = n
    subl    $1, %edx       // edx--;
    movl    %eax, %esi     //esi = eax = (product_from_n*n)
    movl    %edx, %edi     //edi = edx = (n-1)
    call    factorial_tail //递归调用
.L3:
    leave                  //leave和ret用于返回main函数
    ret

main:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp
    movl    $1, %esi
    movl    $5, %edi
    call    factorial_tail

可以看到，编译器就是把C代码翻译成汇编了，没有做什么优化。递归依然是递归。

优化版本

浅谈尾递归的优化

我整理后的代码如下：

factorial_tail:
    cmpl    $1, %edi   //把edi(edi=参数n)和1比较
    movl    %esi, %eax // eax = esi =product_from_n
    je  .L2            //if(edi==1),则跳到.L2
.L3:
    imull   %edi, %eax  // eax = (edi * eax)= (n * product_from_n)
    subl    $1, %edi    // edi--; (即n--;)
    cmpl    $1, %edi    // edi和1相比较，即n和1相比较
    jne .L3             // if(edi != 1) 则跳转到.L3
.L2:
    rep ret   //返回主调函数，返回值在eax中

由此可见，编译器把递归转化成循环了。从第5行到第9行是典型的循环结构。

再多说一句，最后一行的rep指令是什么意思呢？我搜到的结果如下。

Some AMD processors have a one cycle pipeline stall when a “ret” instruction is the target of a conditional jump or is immediately preceded by a conditional jump. To avoid this, gcc generates “rep; ret” in those cases intead. Since the “rep” prefix means nothing with the “ret” instruction, this effectively becomes a two byte “ret” instruction, and that is sufficient to avoid the pipeline bubble.

秒客网

浅谈尾递归的优化

求阶乘的尾递归写法

编译为2个版本

对比文件

普通版本（未优化）

优化版本

相关文章