I'm doing some experimenting with x86-64 assembly. Having compiled this dummy function:
    long myfunc(long a, long b, long c, long d,
                long e, long f, long g, long h)
    {
        long xx = a * b * c * d * e * f * g * h;
        long yy = a + b + c + d + e + f + g + h;
        long zz = utilfunc(xx, yy, xx % yy);
        return zz + 20;
    }
With gcc -O0 -g, I was surprised to find the following at the beginning of the function's assembly:
    0000000000400520 <myfunc>:
      400520:  55                 push   rbp
      400521:  48 89 e5           mov    rbp,rsp
      400524:  48 83 ec 50        sub    rsp,0x50
      400528:  48 89 7d d8        mov    QWORD PTR [rbp-0x28],rdi
      40052c:  48 89 75 d0        mov    QWORD PTR [rbp-0x30],rsi
      400530:  48 89 55 c8        mov    QWORD PTR [rbp-0x38],rdx
      400534:  48 89 4d c0        mov    QWORD PTR [rbp-0x40],rcx
      400538:  4c 89 45 b8        mov    QWORD PTR [rbp-0x48],r8
      40053c:  4c 89 4d b0        mov    QWORD PTR [rbp-0x50],r9
      400540:  48 8b 45 d8        mov    rax,QWORD PTR [rbp-0x28]
      400544:  48 0f af 45 d0     imul   rax,QWORD PTR [rbp-0x30]
      400549:  48 0f af 45 c8     imul   rax,QWORD PTR [rbp-0x38]
      40054e:  48 0f af 45 c0     imul   rax,QWORD PTR [rbp-0x40]
      400553:  48 0f af 45 b8     imul   rax,QWORD PTR [rbp-0x48]
      400558:  48 0f af 45 b0     imul   rax,QWORD PTR [rbp-0x50]
      40055d:  48 0f af 45 10     imul   rax,QWORD PTR [rbp+0x10]
      400562:  48 0f af 45 18     imul   rax,QWORD PTR [rbp+0x18]
Strangely, gcc spills all the argument registers onto the stack and then reads them back from memory for the subsequent operations.
This only happens at -O0 (with -O1 there are no problems), but still, why? This looks like an anti-optimization to me. Why would gcc do that?
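For anyone who wants to reproduce the listing, here is a self-contained sketch of the test file. The body of utilfunc is not shown above, so the definition below is a hypothetical stand-in (an assumption), chosen only so that the file compiles on its own.

```c
/* Hypothetical stand-in: the real utilfunc is not shown in the question. */
long utilfunc(long x, long y, long z)
{
    return x + y + z;
}

/* The function from the question, unchanged. */
long myfunc(long a, long b, long c, long d,
            long e, long f, long g, long h)
{
    long xx = a * b * c * d * e * f * g * h;
    long yy = a + b + c + d + e + f + g + h;   /* must be non-zero for xx % yy */
    long zz = utilfunc(xx, yy, xx % yy);
    return zz + 20;
}
```

Compiling it with gcc -O0 -g -c and disassembling the object file with objdump -d -M intel should give a prologue similar to the one above, though the exact offsets and addresses depend on the GCC version.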
2 Answers
#1
7
I am by no means a GCC internals expert, but I'll give it a shot. Unfortunately, most of the information on GCC's register allocation and spilling seems to be out of date (referencing files like local-alloc.c that don't exist anymore).
I'm looking at the source code of gcc-4.5-20110825.
GNU C Compiler Internals mentions that the initial function code is generated by expand_function_start in gcc/function.c. There we find the following for handling parameters:
    /* Initialize rtx for parameters and local variables.
       In some cases this requires emitting insns.  */
    assign_parms (subr);
In assign_parms, the code that decides where each argument is stored is the following:
    if (assign_parm_setup_block_p (&data))
      assign_parm_setup_block (&all, parm, &data);
    else if (data.passed_pointer || use_register_for_decl (parm))
      assign_parm_setup_reg (&all, parm, &data);
    else
      assign_parm_setup_stack (&all, parm, &data);
assign_parm_setup_block_p handles aggregate data types and is not applicable in this case, and since the data is not passed as a pointer, GCC checks use_register_for_decl.
The relevant part of use_register_for_decl is:
    if (optimize)
      return true;

    if (!DECL_REGISTER (decl))
      return false;
DECL_REGISTER tests whether the variable was declared with the register keyword. And now we have our answer: when optimizations are not enabled, most parameters live on the stack and are handled by assign_parm_setup_stack. The route taken through the source code before it ends up spilling the value is slightly more complicated for pointer arguments, but can be traced in the same file if you're curious.
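One way to probe the DECL_REGISTER check is to compare a plain parameter with one declared register. This is a minimal sketch with hypothetical function names, assuming a C compiler (the register storage class is no longer valid in C++17). If the reading of use_register_for_decl above is right, the first function should show the spill stores at -O0 and the second should not, but the exact output will vary between GCC versions.

```c
/* Plain parameters: at -O0 these are expected to go through
   assign_parm_setup_stack and be stored to stack slots. */
long add_plain(long a, long b)
{
    return a + b;
}

/* Parameters declared with the register keyword: DECL_REGISTER is set,
   so the early "return false" shown above is not taken for them. */
long add_register(register long a, register long b)
{
    return a + b;
}
```

Compiling with gcc -O0 -S and comparing the two prologues in the generated assembly is enough to see whether the spills differ on a given compiler.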
Why does GCC spill all arguments and local variables with optimizations disabled? To help debugging. Consider this simple function:
    1  extern int bar(int);
    2  int foo(int a) {
    3      int b = bar(a | 1);
    4      b += 42;
    5      return b;
    6  }
Compiled with gcc -O1 -c, this generates the following on my machine:
       0:  48 83 ec 08        sub    $0x8,%rsp
       4:  83 cf 01           or     $0x1,%edi
       7:  e8 00 00 00 00     callq  c <foo+0xc>
       c:  83 c0 2a           add    $0x2a,%eax
       f:  48 83 c4 08        add    $0x8,%rsp
      13:  c3                 retq
This is fine, except that if you break on line 5 and try to print the value of a, you get
(gdb) print a
$1 = <value optimized out>
This is because the argument gets overwritten, since it is not used after the call to bar.
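To try this in a debugger, foo needs a definition of bar to link against. The sketch below supplies one; it is a hypothetical stand-in, since the original only declares bar as extern, and the GCC-specific noinline attribute is an assumption added so that the call stays visible at -O1.

```c
/* Hypothetical stand-in for the external bar(); noinline keeps the call
   in place at -O1 so the code resembles the listing above. */
__attribute__((noinline)) int bar(int x)
{
    return x * 2;
}

int foo(int a)
{
    int b = bar(a | 1);   /* a is last used here */
    b += 42;
    return b;             /* break here and try "print a" in gdb */
}
```

Linked into a small program that calls foo and built once with gcc -O0 -g and once with gcc -O1 -g, the -O0 build should always be able to print a at the return statement, because the value sits in its stack slot. What the -O1 build prints depends on the GCC and gdb versions.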
#2
6
A couple of reasons:
- In the general case, an argument to a function has to be treated like a local variable, because it could be stored to or have its address taken within the function. Therefore, it is simplest to just allocate a stack slot for every argument.
- Debug information becomes much simpler to emit with stack locations: the argument's value is always at one specific location, instead of moving around between registers and memory.
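As a small illustration of the first point, here is a sketch (with hypothetical function names) of a parameter that is both assigned to and has its address taken, which is perfectly legal C and forces the value to have a memory location:

```c
/* Increments the long that p points to. */
static void bump(long *p)
{
    *p += 1;
}

long count_up(long n)
{
    n = n * 2;    /* the parameter is stored to, like any local variable */
    bump(&n);     /* its address is taken, so it must live in memory */
    return n;
}
```

A compiler that gives every argument a stack slot up front handles this case without any extra analysis, which fits a mode whose goal is simplicity rather than speed.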
When you're looking at -O0 code in general, consider that the compiler's top priorities are reducing compile time as much as possible and generating high-quality debugging information.