x86程序集中堆栈对齐的响应性

I am trying to get a clear picture of who (caller or callee) is reponsible of stack alignment. The case for 64-bit assembly is rather clear, that it is by caller.

我试图清楚地了解谁(调用者或被调用者)负责堆栈对齐。 64位汇编的情况相当清楚,它来自调用者。

Referring to System V AMD64 ABI, section 3.2.2 The Stack Frame:

参考System V AMD64 ABI,第3.2.2节Stack Stack:

The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary.

输入参数区域的末尾应在16(32,如果在堆栈上传递__m256)字节边界上对齐。

In other words, it should be safe to assume, that for every entry point of called function:

换句话说,应该可以安全地假设,对于被调用函数的每个入口点:

16 | (%rsp + 8)

16 | (%rsp + 8)

holds (extra eight is because call implicitely pushes return address on stack).

保持(额外八个是因为调用隐含地在栈上推送返回地址)。

How it looks in 32-bit world (assuming cdecl)? I noticed that gcc places the alignment inside the called function with following construct:

它在32位世界中的表现(假设为cdecl)?我注意到gcc使用以下构造将对齐放置在被调用函数中:

and esp, -16

which seems to indicate, that is callee's responsibility.

这似乎表明,这是被告的责任。

To put it clearer, consider following code:

为了更清楚,请考虑以下代码:

global main
extern printf
extern scanf
section .rodata
    s_fmt   db "%d %d", 0
    s_res   db `%d with remainder %d\n`, 0
section .text
main:
    start   0, 0
    sub     esp, 8
    mov     DWORD [ebp-4], 0 ; dividend
    mov     DWORD [ebp-8], 0 ; divisor

    lea     eax, [ebp-8]
    push    eax
    lea     eax, [ebp-4]
    push    eax
    push    s_fmt
    call    scanf
    add     esp, 12

    mov     eax, [ebp-4]
    cdq
    idiv    DWORD [ebp-8]

    push    edx
    push    eax
    push    s_res
    call    printf

    xor     eax, eax
    leave
    ret

Is it required to align the stack before scanf is called? If so, then this would require to decrease %esp by four bytes before pushing these two arguments to scanf as:

是否需要在调用scanf之前对齐堆栈?如果是这样,那么在将这两个参数推送到scanf之前,这需要将%esp减少四个字节:

4 bytes (return address)
4 bytes (%ebp of previous stack frame)
8 bytes (for two variables)
12 bytes (three arguments for scanf)
= 28

1 个解决方案

#1

gcc is just taking a defensive approach with -m32, by not assuming that main is called with a properly 16B-aligned stack.

gcc只是采用-m32的防御方法,不假设使用正确的16B对齐堆栈调用main。

The i386 System V ABI has guaranteed/required for years that ESP+4 is 16B-aligned on entry to a function. (i.e. ESP must be 16B-aligned before a CALL instruction, so args on the stack start at a 16B boundary. This is the same as for x86-64 System V.)

多年来,i386 System V ABI保证/要求ESP + 4在进入功能时进行16B对齐。 (即ESP必须在CALL指令之前对齐16B,因此堆栈上的args从16B边界开始。这与x86-64系统V相同。)

The ABI also guarantees that new 32-bit processes start with ESP aligned on a 16B boundary (e.g. at _start, the ELF entry point, where ESP points at argc, not a return address), and the glibc CRT code maintains that alignment.

ABI还保证新的32位进程以ESP在16B边界上对齐开始(例如在_start,ELF入口点,其中ESP指向argc,而不是返回地址),并且glibc CRT代码保持该对齐。

As far as the calling convention is concerned, EBP is just another call-preserved register. But yes, compiler output with -fno-omit-frame-pointer does take care to push ebp before other call-preserved registers (like EBX), and do so even if the function doesn't need to use EBP, so the saved EBP values form a linked list.

就调用约定而言,EBP只是另一个调用保留寄存器。但是,带有-fno-omit-frame-pointer的编译器输出确实会在其他调用保留寄存器(如EBX)之前推送ebp,即使该函数不需要使用EBP也这样做,所以保存的EBP值形成链表。

Perhaps gcc is defensive because an extremely ancient Linux kernel (from before that revision to the i386 ABI, when the required alignment was only 4B) could violate that assumption, and it's only an extra couple instructions that run once in the life-time of the process (assuming the program doesn't call main recursively).

也许gcc是防御性的,因为一个非常古老的Linux内核(从i386 ABI的修订之前,当所需的对齐只有4B时)可能违反了这个假设,并且它只是在生命周期中运行一次的额外几个指令。进程(假设程序没有递归调用main)。

Unlike gcc, clang assumes the stack is properly aligned on entry to main. (clang also assumes that narrow args have been sign or zero-extended to 32 bits, even though the current ABI revision doesn't specify that behaviour (yet). gcc and clang both emit code that does in the caller side, but only clang depends on it in the callee. This happens in 64-bit code, but I didn't check 32-bit.)

与gcc不同,clang假设堆栈在进入main时正确对齐。 (clang还假设窄args已经签名或零扩展为32位,即使当前的ABI版本没有指定该行为(尚未).gcc和clang都发出在调用者端执行的代码,但只是铿锵取决于它在被调用者。这发生在64位代码,但我没有检查32位。)

Look at compiler output on http://gcc.godbolt.org/ for main and functions other than main if you're curious.

如果你很好奇,请查看http://gcc.godbolt.org/上的编译器输出,了解main以外的main和函数。

I just updated the ABI links in the x86 tag wiki the other day. http://x86-64.org/ is still dead and seems to be not coming back, so I updated the System V links to point to the PDFs of the current revision in HJ Lu's github repo, and his page with links.

我刚刚更新了x86标签wiki中的ABI链接。 http://x86-64.org/仍然死了,似乎没有回来,所以我更新了System V链接以指向HJ Lu的github repo中当前版本的PDF,以及带有链接的页面。

Note that the last version on SCO's site is not the current revision, and doesn't include the 16B-stack-alignment requirement.

请注意,SCO站点上的最新版本不是当前版本,并且不包括16B堆栈对齐要求。

#1