Is there any substantial optimization benefit in omitting the frame pointer? If I have understood this page correctly, -fomit-frame-pointer is used when we want to avoid saving, setting up, and restoring frame pointers.
Is this done only once per function call, and if so, is it really worth avoiding a few instructions in every function? Isn't that trivial as an optimization? What are the actual implications of using this option, apart from the debugging limitations?
I compiled the following C code with and without this option:
int myf(int a, int b);   /* declared before its first use in main */

int main(void)
{
    int i;
    i = myf(1, 2);
}

int myf(int a, int b)
{
    return a + b;
}
# gcc -S -fomit-frame-pointer code.c -o withoutfp.s
# gcc -S code.c -o withfp.s
Running diff -u on the two files revealed the following assembly code:
--- withfp.s 2009-12-22 00:03:59.000000000 +0000
+++ withoutfp.s 2009-12-22 00:04:17.000000000 +0000
@@ -7,17 +7,14 @@
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
- pushl %ebp
- movl %esp, %ebp
pushl %ecx
- subl $36, %esp
+ subl $24, %esp
movl $2, 4(%esp)
movl $1, (%esp)
call myf
- movl %eax, -8(%ebp)
- addl $36, %esp
+ movl %eax, 20(%esp)
+ addl $24, %esp
popl %ecx
- popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
@@ -25,11 +22,8 @@
.globl myf
.type myf, @function
myf:
- pushl %ebp
- movl %esp, %ebp
- movl 12(%ebp), %eax
- addl 8(%ebp), %eax
- popl %ebp
+ movl 8(%esp), %eax
+ addl 4(%esp), %eax
ret
.size myf, .-myf
.ident "GCC: (GNU) 4.2.1 20070719
Could someone please shed light on the key points in the above code where -fomit-frame-pointer actually made a difference?
Edit: objdump's output replaced with gcc -S's.
4 answers
#1
25
-fomit-frame-pointer allows one extra register to be available for general-purpose use. I would assume this is really only a big deal on 32-bit x86, which is a bit starved for registers.*
One would expect to see EBP no longer saved and adjusted on every function call, and probably some additional use of EBP in normal code, and fewer stack operations on occasions where EBP gets used as a general-purpose register.
Your code is far too simple to see any benefit from this sort of optimization: you're not using enough registers. Also, you haven't turned on the optimizer, which might be necessary to see some of these effects.
* ISA registers, not micro-architecture registers.
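To observe the effect described in this answer, the function needs more live values than a single addition. The following is a minimal sketch and not part of the original post: `mix8` is a hypothetical function that keeps eight accumulators live across a loop, which is more than 32-bit x86 can comfortably hold in registers. Whether the compiler actually puts EBP to use here depends on the GCC version, target, and optimization level, so treat it as something to inspect rather than a guaranteed result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: eight accumulators stay live across the loop,
   so the register allocator is under pressure on 32-bit x86. */
uint32_t mix8(const uint32_t *v, size_t n)
{
    uint32_t a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 8;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++) {
        /* Each update depends on other accumulators, keeping all eight live. */
        a += v[i] ^ h;
        b += a ^ g;
        c += b ^ f;
        d += c ^ e;
        e += d ^ a;
        f += e ^ b;
        g += f ^ c;
        h += g ^ d;
    }
    return a ^ b ^ c ^ d ^ e ^ f ^ g ^ h;
}
```

One way to compare, assuming a GCC installation with 32-bit support, is to state both variants explicitly, since optimization levels on recent GCC versions may already enable the option:

$ gcc -m32 -O2 -fno-omit-frame-pointer -S mix8.c -o withfp.s
$ gcc -m32 -O2 -fomit-frame-pointer -S mix8.c -o withoutfp.s
$ diff -u withfp.s withoutfp.s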
#2
9
The only downside of omitting it is that debugging is much more difficult.
The major upside is that there is one extra general-purpose register, which can make a big difference to performance. Obviously this extra register is used only when needed (probably not in your very simple function); in some functions it makes more of a difference than in others.
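The debugging cost comes from the chain of saved frame pointers disappearing. As a rough illustration, the sketch below models that chain with ordinary C structs instead of touching the real stack. `frame_record` and `walk_frames` are hypothetical names, and the layout only mirrors what "pushl %ebp; movl %esp, %ebp" sets up in the question's assembly. With the frame pointer omitted no such links exist, so a debugger has to rely on other unwind information.

```c
#include <assert.h>
#include <stddef.h>

/* Model of one frame record as set up by "pushl %ebp; movl %esp, %ebp":
   the saved caller frame pointer sits next to the return address. */
struct frame_record {
    const struct frame_record *saved_fp; /* caller's record, NULL at the outermost frame */
    const char *return_site;             /* stands in for the return address */
};

/* Walk the chain the way a frame-pointer-based unwinder would.
   Stores up to max call sites in out and returns the number of frames visited. */
size_t walk_frames(const struct frame_record *fp, const char **out, size_t max)
{
    size_t depth = 0;

    assert(out != NULL || max == 0);
    while (fp != NULL && depth < max) {
        out[depth++] = fp->return_site;
        fp = fp->saved_fp;   /* follow the link that the saved %ebp provides */
    }
    return depth;
}
```

Linking a record for myf to a record for main and calling walk_frames on the innermost one yields the call sites from the inside out, which is essentially the backtrace a debugger prints.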
#3
7
You can often get more meaningful assembly code from GCC by using the -S argument to output the assembly:
$ gcc code.c -S -o withfp.s
$ gcc code.c -S -o withoutfp.s -fomit-frame-pointer
$ diff -u withfp.s withoutfp.s
With -S, GCC does not emit addresses, so we can compare the generated instructions directly. For your leaf function, this gives:
myf:
- pushl %ebp
- movl %esp, %ebp
- movl 12(%ebp), %eax
- addl 8(%ebp), %eax
- popl %ebp
+ movl 8(%esp), %eax
+ addl 4(%esp), %eax
ret
GCC no longer generates the code to push the frame pointer onto the stack, and this changes the relative addresses of the arguments passed on the stack: they are read at offsets 4 and 8 from %esp instead of offsets 8 and 12 from %ebp.
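The shift in offsets can be worked out with simple arithmetic. The sketch below is an illustration only, with hypothetical helper names, and it assumes 4-byte stack slots and arguments passed on the stack as in the 32-bit code shown above. It applies at function entry, before the function adjusts %esp any further.

```c
#include <assert.h>

enum { WORD = 4 };  /* size of one stack slot on 32-bit x86 */

/* Offset of argument `index` (0-based) from %esp at function entry:
   slot 0 holds the return address, the arguments follow it. */
int arg_offset_from_esp(int index)
{
    assert(index >= 0);
    return WORD * (index + 1);
}

/* Offset from %ebp after "pushl %ebp; movl %esp, %ebp":
   the saved %ebp occupies one more slot below the return address. */
int arg_offset_from_ebp(int index)
{
    return arg_offset_from_esp(index) + WORD;
}
```

For myf(a, b) this gives 4 and 8 relative to %esp, and 8 and 12 relative to %ebp, matching the two versions in the diff.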
#4
4
Profile your program to see if there is a significant difference.
Next, profile your development process. Is debugging easier or more difficult? Do you spend more time developing or less?
Optimizations without profiling are a waste of time and money.
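As a starting point for such a measurement, here is a minimal timing sketch, not a substitute for a real profiler. `time_myf` is a hypothetical helper that uses only the standard clock() function; myf is the function from the question. The idea is to build the same file twice, with and without -fomit-frame-pointer, and compare the reported times over many runs.

```c
#include <assert.h>
#include <time.h>

/* Function under test, taken from the question. */
int myf(int a, int b)
{
    return a + b;
}

/* Returns the CPU seconds spent calling myf `iters` times. The final value
   is written to *sink so the compiler cannot discard the loop entirely. */
double time_myf(long iters, volatile int *sink)
{
    clock_t start, end;
    long i;
    int acc = 0;

    assert(sink != NULL);
    start = clock();
    for (i = 0; i < iters; i++)
        acc = myf(acc & 0xFF, 1);   /* masking keeps acc small, so no overflow */
    end = clock();
    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

At higher optimization levels the compiler may inline myf, in which case there is no call overhead left to measure. Checking the -S output of each build shows what is actually being timed.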