In the following code snippet
char *str1 = "abcd";
char str2[] = "defg";
I understand that the first statement stores a pointer to a string literal that lives in the read-only section of the executable, while the second one stores the string in a read-write section. Examining the generated instructions, I verified that the first statement stores the address of "abcd" (located in the rodata section) into str1.
What was interesting was the second statement. The compiler inserted code to store the values directly into the stack:
char *str1 = "abcd";
8048420: c7 44 24 10 20 85 04 movl $0x8048520,0x10(%esp)
8048427: 08
char str2[] = "defg";
8048428: c7 44 24 17 64 65 66 movl $0x67666564,0x17(%esp)
804842f: 67
8048430: c6 44 24 1b 00 movb $0x0,0x1b(%esp)
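For reference, the immediate in the second movl is just the four characters packed in little-endian order (0x64 is 'd', 0x65 is 'e', 0x66 is 'f', 0x67 is 'g'), and the movb writes the terminating NUL right after them. A minimal sketch that unpacks such an immediate; the helper name unpack_le32 is made up for illustration:

```c
#include <stdint.h>

/* Unpack a 32-bit immediate into the bytes that a little-endian
   32-bit store writes, lowest address first, then append the
   terminator that the separate movb stores. */
void unpack_le32(uint32_t imm, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((imm >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```

Passing 0x67666564 fills the buffer with "defg".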
How does the compiler decide when to do which out of the following?
- The string literal is stored in rodata section
- The string literal is stored in the data section (rw)
- The string memory is implicit in the stack and the instructions are generated to fill the stack?
- Are there any other possibilities as well and variations among hardware?
Note: I am running a precise32 Vagrant box, using gcc with debug symbols and -O0.
2 Answers
#1
1
If your
char str2[] = "defg";
definition is inside a function, then the compiler will generate instructions to put the data on the stack (ignoring possible optimizations, e.g. keeping values purely in registers). This works just as for other automatic (stack) variables.
It also has the option of copying the data from somewhere else to the stack instead of e.g. having the data values as immediate operands to instructions. It might choose to do this for longer strings to avoid code bloat.
Regardless of what the compiler does, modifications to the contents of str2 must not be visible to the next invocation of the function (just as for other automatic variables).
If str2 is global (which gives it static storage duration), then the data will end up in the read/write data segment. This also happens if you give the array static storage duration inside the function, as in
static char str2[] = "defg";
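The difference between the two storage durations can be sketched as follows. The function names bump_automatic and bump_static are invented for this illustration; each one increments the first character of its array and returns it:

```c
/* Automatic array: re-initialized from the literal on every call,
   so the change below never survives to the next invocation. */
char bump_automatic(void)
{
    char str2[] = "defg";
    str2[0]++;              /* modifies this call's private copy only */
    return str2[0];
}

/* Static array: initialized once before the program starts and kept
   in writable static storage, so the change persists across calls. */
char bump_static(void)
{
    static char str2[] = "defg";
    str2[0]++;
    return str2[0];
}
```

Calling each function twice, the automatic version returns 'e' both times, while the static version returns 'e' and then 'f'.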
When initializing a pointer with a string literal, as in
char *s = "defg";
the string data typically ends up in the read-only data segment, and the rules for how the pointer itself is initialized with the address of that data are the same as above.
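A small sketch of the practical consequence, assuming nothing beyond standard C. The function name array_copy_is_independent is hypothetical; it returns 1 when the array could be modified while the literal stayed intact:

```c
#include <string.h>

int array_copy_is_independent(void)
{
    const char *s = "defg";   /* points at the literal; treat it as read-only */
    char a[] = "defg";        /* separate, writable copy of the characters */

    a[0] = 'X';               /* fine: only the array is modified */
    /* Writing through a non-const pointer to the literal would be
       undefined behaviour; declaring s as const char * lets the
       compiler reject such a write. */

    return strcmp(a, "Xefg") == 0 && strcmp(s, "defg") == 0;
}
```

Declaring the pointer as const char * is a design choice here, not something the original snippet does; it simply makes accidental writes to the literal a compile-time error.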
#2
1
When an aggregate object in memory is initialized with a compile-time aggregate value (which is not limited to string literals), the compiler always has a choice between two approaches:
- Pre-build the complete initializer in a read-only data section at compile time, and then just copy the whole thing into the modifiable target object using memcpy at run time.
- Generate code that will directly build the target value "in-place" piece-by-piece at run time.
Basically, the first is the "data-based" approach and the second is the "code-based" approach. In your case the compiler uses the code-based approach, probably because the literal is short. Use a longer literal and, I suspect, it will eventually switch to the first approach.
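Written out by hand, the two strategies look roughly like this. This is only a sketch of what a compiler might emit, not what gcc actually generates; the names defg_image, init_data_based and init_code_based are invented for the illustration:

```c
#include <string.h>

/* "Data-based": a prebuilt read-only image of the initializer that is
   copied into the target in one go. */
static const char defg_image[5] = "defg";

void init_data_based(char dst[5])
{
    memcpy(dst, defg_image, sizeof defg_image);
}

/* "Code-based": no stored image; each byte is a constant embedded in
   the instructions that build the target in place. */
void init_code_based(char dst[5])
{
    dst[0] = 'd';
    dst[1] = 'e';
    dst[2] = 'f';
    dst[3] = 'g';
    dst[4] = '\0';
}
```

Both functions leave the same five bytes in dst; the only difference is whether the bytes come from a data section or from the instruction stream.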
One can probably imagine that in some cases a mixed approach might be used by some compiler: part of the data is pre-built somewhere and memcpy-ed from there, while the rest of the data is built on the fly.