What do the CFI directives mean? (and some more questions)

Date: 2021-01-19 03:17:41

OK, this is going to be a long question. I'm trying to understand how a "buffer overflow" works. I am reading "Smashing the Stack for Fun and Profit" by Aleph1 and have just got the assembly for the following code:

void function(int a, int b, int c) {
   char buffer1[5];
   char buffer2[10];
}

void main() {
  function(1,2,3);
}

Compiling with GCC's -S flag gives me this assembly:

    .file   "example1.c"
    .text
    .globl  function
    .type   function, @function
function:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $48, %rsp
    movl    %edi, -36(%rbp)
    movl    %esi, -40(%rbp)
    movl    %edx, -44(%rbp)
    movq    %fs:40, %rax
    movq    %rax, -8(%rbp)
    xorl    %eax, %eax
    movq    -8(%rbp), %rax
    xorq    %fs:40, %rax
    je  .L2
    call    __stack_chk_fail
.L2:
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   function, .-function
    .globl  main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $3, %edx
    movl    $2, %esi
    movl    $1, %edi
    call    function
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
    .section    .note.GNU-stack,"",@progbits

The .cfi directives are not in Aleph1's paper, and I guess they were not used back then. I have read this question on SO and gather that GCC uses them for exception handling. I have also read another question on SO and gather that .LFB0, .LFE0, .LFB1 and .LFE1 are labels. However, I have the following doubts:

  1. I get that .cfi directives are used for exception handling, but I don't understand what they mean. I have been here and I see some definitions like:

.cfi_def_cfa register, offset

.cfi_def_cfa defines a rule for computing CFA as: take address from register and add offset to it.

However, if you take a look at the disassembly that I have put above, you don't find any register name (like EAX, EBX and so on); instead you find a number there (I have generally found '6'), and I don't know how that is supposed to be a register. In particular, can anyone explain what .cfi_def_cfa_offset 16, .cfi_offset 6, -16, .cfi_def_cfa_register 6 and .cfi_def_cfa 7, 8 mean? Also, what does CFA mean? I am asking this because in most books and papers the procedure prolog looks like this:

 pushl %ebp
 movl %esp,%ebp
 subl $20,%esp

However, now I think the procedure prolog in modern computers is as follows:

    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $48, %rsp

Initially I thought that the CFI directives were used instead of the sub mnemonic to set the offset, but that's not the case; the sub instruction is still used alongside the CFI directives.

  2. I understood that there are labels for each procedure. However, why are there multiple nested labels inside a procedure? In my case main has the .LFB1 and .LFE1 labels. What is the need for multiple labels? Similarly, the function procedure has the labels .LFB0, .L2 and .LFE0.

  3. The last 3 lines of both procedures seem to be used for some housekeeping (telling the size of the procedure, maybe?) but I am not sure what they mean. Can anyone explain what they mean and what they are used for?

EDIT:

(adding one more question)

  1. Do the CFI directives take up any space? In the procedure "function", each int parameter takes up 4 bytes and there are 3 of them, so the parameters take 12 bytes in memory. Next, the first char array takes 8 bytes (5 bytes rounded up to 8), and the second char array takes 12 bytes (10 bytes rounded up to 12), so the two char arrays take 20 bytes. Summing these, the parameters and local variables only need 12+20=32 bytes.

    But in the procedure "function", the compiler subtracts 48 bytes to store values. Why?

3 answers

#1


1  

As per your request on Reverse Engineering, I am putting the contents of my comments here as an answer (I don't know if this is going to remain, as I see severe competition to down-vote and up-vote your question there).

Lindy Dancer answered what CFI and CFA mean: call frame information, and canonical frame address (often loosely called call frame address).

.L<num> denotes a label. As per various tidbits on Google, GCC on x64 names its labels in this format: they start with .L and end with a numeral, so .L1, .L2, and so on are all labels.

According to Google and some earlier SO answers, FB<num> indicates function begin and FE<num> indicates function end, so .LFB0, .LFB1, ... and .LFE0, .LFE1, ... mark where each function begins and ends. The compiler probably needs them for some internal purpose, so you can forget about them for now unless there is a very grave need to dig into compiler internals.

The other label, .L2, exists as the target of the branch instruction je in your function:

je  .L2

Also, every compiler aligns and pads arguments and locals to some boundary.

I can't be sure, but I think the default alignment for GCC on x64 is 16 bytes, so if you request an odd-sized reservation like

char foo[5] or
BYTE blah [10]

the sizes 5 and 10 are not aligned, even on x86. For the 5-byte array an x86 compiler will typically reserve 8 bytes, and for the 10-byte array 16 bytes. Likewise, x64 GCC might reserve 16 bytes for each of your requests.

You actually shouldn't worry too much about why the compiler does what it does. When you are trying to understand the logic of the assembly, just concentrate on the addresses: if the compiler decided to put a variable at rbp +/- X, it will access it at that same location throughout the scope or lifetime of that variable.

#2


9  

CFI stands for call frame information. It's the way the compiler describes what happens in a function. It can be used by the debugger to present a call stack, by the linker to synthesise exception tables, for stack depth analysis and other things like that.

Effectively, it describes where resources such as processor registers are stored and where the return address is.

CFA stands for canonical frame address (often called the call frame address): the value the stack pointer had in the caller function just before the call instruction. It is needed to pick up information about the next frame on the stack.

#3


1  

The 48 is to skip over the saved arguments, the locals and the stack-protector canary, rounded up for alignment. In the listing above the three int arguments are stored with movl at -36, -40 and -44(%rbp), so they take 4 bytes each (12 in total), and the canary loaded from %fs:40 takes 8 bytes at -8(%rbp). That leaves the 24 bytes from -32 to -9 for the two arrays with their padding, most likely with the 5-byte array in an 8-byte slot and the 10-byte array in a 16-byte slot. So 12 + 24 + 8 gives 44, which is rounded up to 48 to keep the stack 16-byte aligned. You can see it in gdb just by asking for the address of each of those things.
