When converting C to assembly, where is the null terminator of the strings?

I made two programs to output two strings, one in assembly and the other one in C. This is the program in assembly:

.section .data
string1:
.ascii "Hola\0"
string2:
.ascii "Adios\0"

.section .text
.globl _start
_start:

pushl $string1
call puts
addl $4, %esp

pushl $string2
call puts
addl $4, %esp

movl $1, %eax
movl $0, %ebx
int $0x80

I build the program with

as test.s -o test.o
ld -dynamic-linker /lib/ld-linux.so.2 -o test test.o -lc

And the output is as expected

Hola
Adios

This is the C program:

#include <stdio.h>
int main(void)
{
    puts("Hola");
    puts("Adios");
    return 0;
}

And I get the expected output, but when converting this C program to assembly with gcc -S (OS is Debian 32 bit), the output assembly source code does not include the null character in either string, as you can see here:

    .file   "testc.c"
    .section    .rodata
.LC0:
    .string "Hola"
.LC1:
    .string "Adios"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    leal    4(%esp), %ecx
    .cfi_def_cfa 1, 0
    andl    $-16, %esp
    pushl   -4(%ecx)
    pushl   %ebp
    .cfi_escape 0x10,0x5,0x2,0x75,0
    movl    %esp, %ebp
    pushl   %ecx
    .cfi_escape 0xf,0x3,0x75,0x7c,0x6
    subl    $4, %esp
    subl    $12, %esp
    pushl   $.LC0
    call    puts
    addl    $16, %esp
    subl    $12, %esp
    pushl   $.LC1
    call    puts
    addl    $16, %esp
    movl    $0, %eax
    movl    -4(%ebp), %ecx
    .cfi_def_cfa 1, 0
    leave
    .cfi_restore 5
    leal    -4(%ecx), %esp
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Debian 4.9.2-10) 4.9.2"
    .section    .note.GNU-stack,"",@progbits

My two questions are:

1) Why does the gcc-generated assembly code not append the null character at the end of both strings? I thought that C did this automatically.

2) If I skip the null characters in my hand-made assembly code, I get this output:

HolaAdios
Adios

I understand why I get the "HolaAdios" part at the first line, but why does the program end successfully after the "Adios" part if it is not null-terminated?

2 Answers

#1

.string always appends a null terminator, as seen here.
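
The same fact can be checked from the C side. The following is a minimal sketch, not part of the original answer, and the names hola, hola_storage_size and hola_length are made up for illustration. It shows that the literal "Hola" occupies five bytes of storage, which is what .string "Hola" emits in the gcc output:

```c
#include <stddef.h>
#include <string.h>

/* The literal "Hola" is 4 characters plus the '\0' the compiler appends,
   so this array has 5 elements. */
static const char hola[] = "Hola";

/* Storage size in bytes, including the terminator. */
size_t hola_storage_size(void) { return sizeof hola; }

/* Length as strlen() sees it, excluding the terminator. */
size_t hola_length(void) { return strlen(hola); }
```

So the terminator is there in the C program; it is simply implied by the .string directive rather than written out as \0 in the assembly source.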

As for the second question, you can check it yourself. puts just keeps going until it sees a null byte. Zero bytes (\x00) are very common in memory, so there was most likely one right after the string, which is why it worked (probably padding from section alignment).
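
As a rough illustration of that scan, here is a sketch in C. scan_to_nul is a hypothetical stand-in for what puts does internally, not glibc's actual implementation, and the data array only models the hand-written .data section with the \0 left off; the final 0 represents the zero byte that happened to follow in memory:

```c
#include <stddef.h>

/* Hypothetical stand-in for the scan puts() performs:
   count bytes until the first zero byte. */
size_t scan_to_nul(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Model of the .data section without "\0": "Hola" is immediately
   followed by "Adios". The trailing 0 stands for the zero byte that
   happened to come next in memory. */
static const char data[] = { 'H', 'o', 'l', 'a', 'A', 'd', 'i', 'o', 's', 0 };
```

Scanning from data runs through both strings (9 bytes, "HolaAdios"), while scanning from data + 4 stops after "Adios" (5 bytes), which matches the two lines of output in the question.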

#2

Just to add a bit more detail:

Your second string is zero-terminated by chance, because there's nothing after it in your .data section. You dynamically link glibc, which also has a .data section which gets mapped into your process's address space. It's a private mapping, but I think it is mapped, not copied, so it's page-aligned. The rest of the page holding your executable's data segment is padded with zeros. (The ABI may not guarantee this, but Linux has to do something to avoid leaking kernel data).

When your executable is loaded into memory, the data segment is loaded separately from the text segment. See this answer about the difference between sections (which the linker cares about) and executable segments (which the program loader cares about).

Note that gcc puts string constants in the .rodata section, which the linker places in the text segment of the executable, along with the .text section: read-only so it can be shared between multiple processes running the same executable. Sections are aligned by default with padding, so even if you put your strings in .rodata without zero terminators, there would be a zero of padding after the 2nd.

There would be no such padding if the string data happened to end exactly on an alignment boundary (e.g. its total length was a multiple of 16, or something).
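
To make the alignment remark concrete, here is a small sketch of the arithmetic. padding_after is a hypothetical helper, and the alignment values in the usage note are illustrative only; the alignment the linker really applies depends on the section and the linker script:

```c
#include <stddef.h>

/* Bytes of padding needed after `size` bytes of data to reach the
   next multiple of `align` (align must be non-zero). */
size_t padding_after(size_t size, size_t align) {
    return (align - size % align) % align;
}
```

For example, "Hola" plus "Adios" without terminators is 9 bytes: padding_after(9, 4) is 3, so at least one zero byte would follow. With exactly 16 bytes of string data and 16-byte alignment, padding_after(16, 16) is 0, and nothing guarantees a zero byte after the string.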

BTW, you can confirm that there weren't any non-printing garbage characters after the string, using strace ./string-test. You can see: write(1, "Adios\n", 6) = 6

.string is a synonym for .asciz. The manual uses different language to describe the fact that they process backslash escape sequences, and append a zero-byte, but they do the same thing. The GNU assembler has a lot of synonyms for compatibility with many different Unix vendor-supplied assemblers, so it can be confusing to realize there's actually no difference when gcc uses .zero but clang uses .skip, or something like that.

I build the program with...

The commands you used will only work on a 32-bit system. On a 64-bit host, you'd build a 64-bit binary which still uses the 32-bit system call ABI. (And the 32-bit dynamic linker path, so it wouldn't even work by accident, even though static data addresses are in the low 32 bits, so could be passed to the 32-bit wrapper for sys_write.)

Also, I'd recommend calling your source file test.S; capital-S is the usual convention for hand-written asm source. gcc -m32 -nostartfiles test.S -o test assembles and links it the same way as you were doing manually.

See this Q&A for the full details on building asm on Linux: Assembling 32-bit binaries on a 64-bit system (GNU toolchain)

See also the x86 tag wiki for lots of interesting links.
