Executable File Format on Linux

Date: 2022-12-30 08:40:00

1. Introduction

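To load a program, an operating system needs more than the compiled code itself. The toolchain places a file header in front of the code that records where everything belongs, so that the loader can put the code segment and the data segment at the correct memory locations. Some compilers also emit debugging information such as a symbol table. A .o file is called a relocatable file: it has not been linked, its addresses still have to be relocated, and it cannot be executed. An executable file has been processed by the linker and can be run directly; the virtual addresses in it are final. The operating system can set the segment base address used for loading, which means it can place the whole executable anywhere, but it must load each segment where the file's headers say it belongs and keep the relative distances between segments unchanged, or the code will not run correctly.

An executable with a file header depends on a loader, for example the one behind the execve() system call. In the earliest stage of an operating system there is no loader yet, and all we can do is jump straight to some instruction and start executing. That calls for a raw binary file, whose entry point is simply the first instruction in the file. The objcopy tool can convert an executable into a raw binary. Here we study the format of a 64-bit executable, and use objdump to disassemble the machine code back into assembly, to see what information an executable carries.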

2. GNU assembly code for finding the maximum: max.s

Lines starting with # are comments, here and below.

# data section

.section .data

data_items:

        .long 'H','E','L','L','O','_','W','O','R','L','D','!','!',0  # .long (4 bytes per item) makes the byte order, big- or little-endian, visible in a dump

# code section

.section .text

# declare the entry point as globally visible; symbols are local by default

.globl  _start

_start:

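        # in GNU (AT&T) syntax the source operand is on the left and the destination on the right; Intel syntax is the opposite

        # constants take a $ prefix; a symbol without $ is treated as an address; registers take a % prefix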

        movl $0, %edi

        movl data_items(,%edi,4), %eax  # (data_items+ 4*edi) →  eax

        # load the first item of data_items into ebx; ebx holds the maximum found so far

        movl %eax, %ebx# eax → ebx

start_loop:

        # a value of 0 ends the loop: it marks the end of the data

        cmpl $0, %eax

        je loop_exit

        incl %edi

        movl data_items(,%edi,4), %eax# (data_items+ 4*edi) →  eax

        cmpl %ebx, %eax

        jle start_loop# eax <= ebx

        movl %eax, %ebx  # eax > ebx: new maximum, copy eax → ebx

        jmp start_loop

loop_exit:

        movl $1, %eax  # system call number 1 in the int $0x80 (32-bit) interface: exit(ebx), ends the process

        int $0x80

3. Building and running

Environment: Ubuntu 15.04

Assemble: gcc -c -o max.o max.s

Link: ld -o max max.o

Run: ./max

After the program has run, echo $? shows its exit status. That status is the maximum value, 95, the ASCII code of '_'.
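As a sanity check on that exit status, the loop in max.s can be modelled in a few lines of Python. This is only a sketch: the variable names mirror the registers for illustration, and find_max is a hypothetical helper, not part of the original program.

```python
# Python model of the loop in max.s; variables are named after the registers.
data_items = [ord(c) for c in "HELLO_WORLD!!"] + [0]  # the .long values, 0 terminates


def find_max(items):
    edi = 0
    eax = items[edi]          # movl data_items(,%edi,4), %eax
    ebx = eax                 # ebx holds the running maximum
    while eax != 0:           # cmpl $0, %eax / je loop_exit
        edi += 1              # incl %edi
        eax = items[edi]
        if eax > ebx:         # jle start_loop skips the update otherwise
            ebx = eax
    return ebx


# The shell only reports the low 8 bits of the value passed to exit().
exit_status = find_max(data_items) & 0xFF
print(exit_status, chr(exit_status))
```

Because the status is truncated to 8 bits, the same program would report a misleading result for items larger than 255; the ASCII values used here all fit.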

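gcc has the -m32 option for building 32-bit code. In that case the code and data segments are no longer aligned to 0x200000, so the distance between them becomes much shorter. ld then needs the matching option -m elf_i386 to select the 32-bit target.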

ld has the -Ttext option for setting the load address of the code section; for example, -Ttext 0 makes the load address 0.

4. Format of the executable file

4.1 Viewing the ELF header and layout information of max

Command: readelf -a max

-a displays all of the ELF information.

This produces the following output:

ELF Header:

  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 #ELF magic number and identification bytes

  Class:                             ELF64

  Data:                              2's complement, little endian

  Version:                           1 (current)

  OS/ABI:                            UNIX - System V

  ABI Version:                       0

  Type:                              EXEC (Executable file)#an executable file

  Machine:                           Advanced Micro Devices X86-64

  Version:                           0x1

  Entry point address:               0x4000b0 #program entry point (a virtual address)

  Start of program headers:   64 (bytes into file)#file offset of the program headers

  Start of section headers:   656 (bytes into file)#file offset of the section headers

  Flags:                             0x0

  Size of this header:               64 (bytes)#size of the ELF header

  Size of program headers:           56 (bytes)#size of one program header entry

  Number of program headers:         2 #number of program headers

  Size of section headers:           64 (bytes) #size of one section header entry

  Number of section headers:         6#number of section headers

  Section header string table index: 3

 

Section Headers:

  [Nr] Name              Type             Address           Offset

       Size              EntSize          Flags  Link  Info  Align

  [ 0]                   NULL             0000000000000000  00000000

       0000000000000000  0000000000000000           0     0     0

#.text (code): virtual address 0x4000b0, file offset 0xb0, size 0x2d

  [ 1] .text             PROGBITS         00000000004000b0  000000b0

       000000000000002d  0000000000000000  AX       0     0     1

#.data: virtual address 0x6000dd, file offset 0xdd, size 0x38

  [ 2] .data             PROGBITS         00000000006000dd  000000dd

       0000000000000038  0000000000000000  WA       0     0     1

#section name string table: address 0x0 (not loaded), file offset 0x115, size 0x27

  [ 3] .shstrtab         STRTAB           0000000000000000  00000115

       0000000000000027  0000000000000000           0     0     1

#symbol table: address 0x0 (not loaded), file offset 0x140, size 0x108

  [ 4] .symtab           SYMTAB           0000000000000000  00000140

       0000000000000108  0000000000000018           5     7     8

#string table: address 0x0 (not loaded), file offset 0x248, size 0x46

  [ 5] .strtab           STRTAB           0000000000000000  00000248

       0000000000000046  0000000000000000           0     0     1

Key to Flags:

  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)

  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)

  O (extra OS processing required) o (OS specific), p (processor specific)

 

There are no section groups in this file.

#the program headers describe where each segment is loaded

Program Headers:

  Type           Offset             VirtAddr           PhysAddr

                 FileSiz            MemSiz              Flags  Align

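#code segment: readable and executable, virtual address 0x400000, file offset 0

#size 0xdd, alignment 0x200000; the PhysAddr column simply repeats the virtual address here

#covers the ELF header, the program headers and the .text section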

 LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000

                 0x00000000000000dd 0x00000000000000dd  R E    200000

#data segment: readable and writable, virtual address 0x6000dd, file offset 0xdd, size 0x38, alignment 0x200000

  LOAD           0x00000000000000dd 0x00000000006000dd 0x00000000006000dd

                 0x0000000000000038 0x0000000000000038  RW     200000

 

 Section to Segment mapping:

  Segment Sections...

   00     .text

   01     .data

 

There is no dynamic section in this file.

 

There are no relocations in this file.

 

The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.

#symbol table: the symbols in the program and their addresses

Symbol table '.symtab' contains 11 entries:

   Num:    Value          Size Type    Bind   Vis      Ndx Name

     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND

     1: 00000000004000b0     0 SECTION LOCAL  DEFAULT    1

     2: 00000000006000dd     0 SECTION LOCAL  DEFAULT    2

     3: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS max.o

     4: 00000000006000dd     0 NOTYPE  LOCAL  DEFAULT    2 data_items

     5: 00000000004000bf     0 NOTYPE  LOCAL  DEFAULT    1 start_loop

     6: 00000000004000d6     0 NOTYPE  LOCAL  DEFAULT    1 loop_exit

     7: 00000000004000b0     0 NOTYPE  GLOBAL DEFAULT    1 _start

     8: 0000000000600115     0 NOTYPE  GLOBAL DEFAULT    2 __bss_start

     9: 0000000000600115     0 NOTYPE  GLOBAL DEFAULT    2 _edata

    10: 0000000000600118     0 NOTYPE  GLOBAL DEFAULT    2 _end

 

No version information found in this file.
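The header fields reported by readelf can be recovered directly from the raw bytes. The sketch below decodes the ELF header and the two program headers with Python's struct module, using the first 0xb0 bytes of max as transcribed from the xxd dump in section 4.3. The field layouts follow the ELF64 definitions (Elf64_Ehdr and Elf64_Phdr); the variable names and the segments list are illustrative choices, not something the original tooling defines.

```python
import struct

# First 0xb0 bytes of max, transcribed from the xxd dump in section 4.3.
HEAD_HEX = """
7f454c46020101000000000000000000
02003e0001000000b000400000000000
40000000000000009002000000000000
00000000400038000200400006000300
01000000050000000000000000000000
00004000000000000000400000000000
dd00000000000000dd00000000000000
00002000000000000100000006000000
dd00000000000000dd00600000000000
dd006000000000003800000000000000
38000000000000000000200000000000
"""
HEAD = bytes.fromhex("".join(HEAD_HEX.split()))

# Elf64_Ehdr, little endian: e_ident[16], type, machine, version, entry,
# phoff, shoff, flags, ehsize, phentsize, phnum, shentsize, shnum, shstrndx.
(e_ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
 e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
 e_shstrndx) = struct.unpack_from("<16sHHIQQQIHHHHHH", HEAD, 0)

print(f"entry={e_entry:#x} phoff={e_phoff} shoff={e_shoff} "
      f"phnum={e_phnum} shnum={e_shnum} shstrndx={e_shstrndx}")

segments = []
for i in range(e_phnum):
    # Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz, memsz, align.
    (p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz,
     p_align) = struct.unpack_from("<IIQQQQQQ", HEAD, e_phoff + i * e_phentsize)
    perms = "".join(ch if p_flags & bit else "-"
                    for ch, bit in (("R", 4), ("W", 2), ("X", 1)))
    # Loadable segments keep file offset and address congruent modulo p_align.
    congruent = p_vaddr % p_align == p_offset % p_align
    segments.append((p_offset, p_vaddr, p_filesz, p_align, perms, congruent))
    print(f"LOAD offset={p_offset:#x} vaddr={p_vaddr:#x} "
          f"filesz={p_filesz:#x} align={p_align:#x} {perms} congruent={congruent}")
```

The congruence check explains the odd-looking data address: the data starts at file offset 0xdd, so its virtual address 0x6000dd also ends in 0xdd once the 0x200000 alignment is applied.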


4.2 Disassembling the code

Command: objdump -d max

-d means disassemble.

Output:

file format elf64-x86-64

Disassembly of section .text:

#per the program headers, the code segment is mapped at 0x400000, which places .text (and _start) at 0x4000b0

00000000004000b0 <_start>:

  4000b0: bf 00 00 00 00                     mov    $0x0,%edi

#data_items has been replaced by 0x6000dd, the start address of the data section

  4000b5: 67 8b 04 bd dd 00 60  mov    0x6000dd(,%edi,4),%eax

  4000bc: 00

  4000bd: 89 c3                       mov    %eax,%ebx

#start_loop and loop_exit have likewise been replaced by addresses

00000000004000bf <start_loop>:

  4000bf: 83 f8 00                           cmp    $0x0,%eax

  4000c2: 74 12                       je     4000d6 <loop_exit>

  4000c4: ff c7                              inc    %edi

  4000c6: 67 8b 04 bd dd 00 60  mov    0x6000dd(,%edi,4),%eax

  4000cd: 00

  4000ce: 39 d8                       cmp    %ebx,%eax

  4000d0: 7e ed                       jle    4000bf <start_loop>

  4000d2: 89 c3                       mov    %eax,%ebx

  4000d4: eb e9                       jmp    4000bf <start_loop>

00000000004000d6 <loop_exit>:

  4000d6: b8 01 00 00 00                     mov    $0x1,%eax

  4000db: cd 80                       int    $0x80
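The jump instructions in this listing do not store absolute addresses. Each short jump is an opcode followed by a signed 8-bit displacement, measured from the address of the next instruction. The sketch below recomputes the targets objdump prints; rel8_target is a hypothetical helper written for this illustration.

```python
def rel8_target(addr, code):
    """Target of a 2-byte short jump at addr: next instruction + signed rel8."""
    disp = int.from_bytes(code[1:2], "little", signed=True)
    return addr + len(code) + disp


# Address and encoding of each jump, copied from the objdump listing.
jumps = {
    0x4000c2: bytes([0x74, 0x12]),  # je  loop_exit
    0x4000d0: bytes([0x7e, 0xed]),  # jle start_loop
    0x4000d4: bytes([0xeb, 0xe9]),  # jmp start_loop
}
for addr, code in jumps.items():
    print(f"{addr:#x} -> {rel8_target(addr, code):#x}")

# The mov instructions, in contrast, embed the absolute address of data_items
# as a little-endian 32-bit value: dd 00 60 00.
data_items_addr = int.from_bytes(bytes([0xdd, 0x00, 0x60, 0x00]), "little")
print(f"data_items = {data_items_addr:#x}")
```

This is why only the data reference depends on where the data section is finally placed, while the jumps would stay valid wherever the code is loaded.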

4.3 Binary contents of max and how they map to the structures above

Command: xxd -g 1 max

This dumps the whole file; the starting offset defaults to 0.

Output:


0000000: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00  .ELF............#ELF header

0000010: 02 00 3e 00 01 00 00 00 b0 00 40 00 00 00 00 00  ..>.......@.....#offset: 0

0000020: 40 00 00 00 00 00 00 00 90 02 00 00 00 00 00 00  @...............

0000030: 00 00 00 00 40 00 38 00 02 00 40 00 06 00 03 00  ....@.8...@.....#size: 64 bytes

 

0000040: 01 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00  ................#program headers

0000050: 00 00 40 00 00 00 00 00 00 00 40 00 00 00 00 00  ..@.......@.....#offset: 0x40

0000060: dd 00 00 00 00 00 00 00 dd 00 00 00 00 00 00 00  ................#size: 56 bytes x 2

0000070: 00 00 20 00 00 00 00 00 01 00 00 00 06 00 00 00  .. .............

0000080: dd 00 00 00 00 00 00 00 dd 00 60 00 00 00 00 00  ..........`.....

0000090: dd 00 60 00 00 00 00 00 38 00 00 00 00 00 00 00  ..`.....8.......

00000a0: 38 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00  8......... .....

 

00000b0: bf 00 00 00 00 67 8b 04 bd dd 00 60 00 89 c3 83  .....g.....`....#code section (.text)

00000c0: f8 00 74 12 ff c7 67 8b 04 bd dd 00 60 00 39 d8  ..t...g.....`.9.#offset: 0xb0

00000d0: 7e ed 89 c3 eb e9 b8 01 00 00 00 cd 80 48 00 00  ~............H..#size: 0x2d bytes

00000e0: 00 45 00 00 00 4c 00 00 00 4c 00 00 00 4f 00 00  .E...L...L...O..#data section (.data)

00000f0: 00 5f 00 00 00 57 00 00 00 4f 00 00 00 52 00 00  ._...W...O...R..#offset: 0xdd

0000100: 00 4c 00 00 00 44 00 00 00 21 00 00 00 21 00 00  .L...D...!...!..#size: 0x38 bytes

0000110: 00 00 00 00 00 00 2e 73 79 6d 74 61 62 00 2e 73  .......symtab..s#section name table .shstrtab

0000120: 74 72 74 61 62 00 2e 73 68 73 74 72 74 61 62 00  trtab..shstrtab.#offset: 0x115

0000130: 2e 74 65 78 74 00 2e 64 61 74 61 00 00 00 00 00  .text..data.....#size: 0x27

 

0000140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................#symbol table .symtab

0000150: 00 00 00 00 00 00 00 00 00 00 00 00 03 00 01 00  ................#11 entries x 24 bytes

0000160: b0 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@.............#holds the addresses of the symbols noted below

0000170: 00 00 00 00 03 00 02 00 dd 00 60 00 00 00 00 00  ..........`.....#offset: 0x140

0000180: 00 00 00 00 00 00 00 00 01 00 00 00 04 00 f1 ff  ................#size: 0x108

0000190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00001a0: 07 00 00 00 00 00 02 00 dd 00 60 00 00 00 00 00  ..........`.....#data_items

00001b0: 00 00 00 00 00 00 00 00 12 00 00 00 00 00 01 00  ................#start_loop

00001c0: bf 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@.............

00001d0: 1d 00 00 00 00 00 01 00 d6 00 40 00 00 00 00 00  ..........@.....#loop_exit

00001e0: 00 00 00 00 00 00 00 00 27 00 00 00 10 00 01 00  ........'.......#_start

00001f0: b0 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@.............

0000200: 2e 00 00 00 10 00 02 00 15 01 60 00 00 00 00 00  ..........`.....#__bss_start

0000210: 00 00 00 00 00 00 00 00 3a 00 00 00 10 00 02 00  ........:.......

0000220: 15 01 60 00 00 00 00 00 00 00 00 00 00 00 00 00  ..`.............#_edata

0000230: 41 00 00 00 10 00 02 00 18 01 60 00 00 00 00 00  A.........`.....#_end

0000240: 00 00 00 00 00 00 00 00 00 6d 61 78 2e 6f 00 64  .........max.o.d#string table .strtab

0000250: 61 74 61 5f 69 74 65 6d 73 00 73 74 61 72 74 5f  ata_items.start_#offset: 0x248

0000260: 6c 6f 6f 70 00 6c 6f 6f 70 5f 65 78 69 74 00 5f  loop.loop_exit._#size: 0x46

0000270: 73 74 61 72 74 00 5f 5f 62 73 73 5f 73 74 61 72  start.__bss_star

0000280: 74 00 5f 65 64 61 74 61 00 5f 65 6e 64 00 00 00  t._edata._end...

 

0000290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................#section headers

00002a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................#offset: 0x290

00002b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................#size: 64 bytes x 6

00002c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00002d0: 1b 00 00 00 01 00 00 00 06 00 00 00 00 00 00 00  ................

00002e0: b0 00 40 00 00 00 00 00 b0 00 00 00 00 00 00 00  ..@.............#.text

00002f0: 2d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  -...............

0000300: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000310: 21 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00  !...............

0000320: dd 00 60 00 00 00 00 00 dd 00 00 00 00 00 00 00  ..`.............

0000330: 38 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  8...............#.data

0000340: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000350: 11 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  ................

0000360: 00 00 00 00 00 00 00 00 15 01 00 00 00 00 00 00  ................

0000370: 27 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  '...............#.shstrtab

0000380: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000390: 01 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00  ................

00003a0: 00 00 00 00 00 00 00 00 40 01 00 00 00 00 00 00  ........@.......

00003b0: 08 01 00 00 00 00 00 00 05 00 00 00 07 00 00 00  ................#.symtab

00003c0: 08 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00  ................

00003d0: 09 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  ................

00003e0: 00 00 00 00 00 00 00 00 48 02 00 00 00 00 00 00  ........H.......#.strtab

00003f0: 46 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  F...............

0000400: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................


5. Relationship diagrams

5.1 Relationships within the executable file

[Figure: Executable File Format on Linux (image not available)]

Note: the arrows do not necessarily indicate ordering.

5.2 Layout of the file

ELF header : 64B

program headers : 56B x 2

.text : 45B

.data : 56B

.shstrtab : 39B

.symtab : 24B x 11

.strtab : 70B

section headers : 64B x 6
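Adding up the sizes in this list reproduces the file offsets reported by readelf. The sketch below walks the parts in order, rounding each start offset up to the part's alignment. The 8-byte alignment of .symtab comes from the Align column of the section headers; the 8-byte alignment of the section header table is an assumption that happens to match the dump (0x28e rounded up to 0x290).

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


# (name, size in bytes, alignment) in file order.
parts = [
    ("ELF header", 64, 1),
    ("program headers", 56 * 2, 1),
    (".text", 45, 1),
    (".data", 56, 1),
    (".shstrtab", 39, 1),
    (".symtab", 24 * 11, 8),
    (".strtab", 70, 1),
    ("section headers", 64 * 6, 8),  # alignment assumed, see the note above
]

layout = {}
offset = 0
for name, size, alignment in parts:
    offset = align_up(offset, alignment)
    layout[name] = offset
    offset += size
file_size = offset

for name, start in layout.items():
    print(f"{name:16s} {start:#06x}")
print(f"file size        {file_size:#06x}")
```

The computed total, 0x410 bytes, matches the last line of the xxd dump, which ends at offset 0x40f.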


6. Converting an executable to a raw binary file

6.1 Extracting the code and data sections

To convert an executable that carries a file header and debugging information into a raw binary file, use the following command:

objcopy -O binary -R .note -R .comment max max_copy

This writes max out as a raw binary file named max_copy, leaving out the .note and .comment sections.

6.2 Viewing the code section

Command: xxd -g 1 -l 256 max_copy

This shows the first 256 bytes.

The code section appears at the start:

0000000: bf 00 00 00 00 67 8b 04 bd dd 00 60 00 89 c3 83  .....g.....`....

0000010: f8 00 74 12 ff c7 67 8b 04 bd dd 00 60 00 39 d8  ..t...g.....`.9.

0000020: 7e ed 89 c3 eb e9 b8 01 00 00 00 cd 80 00 00 00  ~...............

0000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................



6.3 Viewing the data section

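Command: xxd -g 1 -l 256 -s 0x20002d max_copy

-s sets the offset at which the dump starts, here 0x20002d (= data section load address 0x6000dd - code section load address 0x4000b0); -g 1 groups the hex output one byte at a time; -l 256 limits the dump to 256 bytes.

This gives the data section: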

020002d: 48 00 00 00 45 00 00 00 4c 00 00 00 4c 00 00 00  H...E...L...L...

020003d: 4f 00 00 00 5f 00 00 00 57 00 00 00 4f 00 00 00  O..._...W...O...

020004d: 52 00 00 00 4c 00 00 00 44 00 00 00 21 00 00 00  R...L...D...!...

020005d: 21 00 00 00 00 00 00 00                          !.......

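As the dumps show, max_copy contains only the code section and the data section, with the code section at the very start of the file. The two keep the same distance they have in memory, so the 0x200000 bytes between the end of the code (0x2d) and the start of the data (0x20002d) are padding; the part visible in the first dump is all zeros.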
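The offsets seen in max_copy follow from simple address arithmetic, and the data bytes show the little-endian layout that the .long directive was chosen to expose. The sketch below is an illustration under the assumption that the raw image starts at the lowest section address and keeps the in-memory distances, which is what the two dumps above suggest; the constant names are made up for this example.

```python
import struct

TEXT_VADDR, TEXT_SIZE = 0x4000b0, 0x2d
DATA_VADDR, DATA_SIZE = 0x6000dd, 0x38

base = min(TEXT_VADDR, DATA_VADDR)             # the image starts at the lowest address
data_offset = DATA_VADDR - base                # where .data lands in max_copy
image_size = DATA_VADDR + DATA_SIZE - base     # expected size of max_copy
padding = data_offset - TEXT_SIZE              # gap between the two sections
print(f"data_offset={data_offset:#x} image_size={image_size:#x} padding={padding:#x}")

# The 0x38 data bytes, copied from the dump at offset 0x20002d.
DATA = bytes.fromhex(
    "48000000 45000000 4c000000 4c000000 4f000000 5f000000 57000000"
    "4f000000 52000000 4c000000 44000000 21000000 21000000 00000000"
)
little = struct.unpack("<14I", DATA)           # how x86-64 reads the values
big = struct.unpack(">14I", DATA)              # what a big-endian reader would see
text = "".join(chr(v) for v in little if v)
print(text, max(little))
print(f"first item: little-endian {little[0]:#x}, big-endian {big[0]:#x}")
```

Under these assumptions the raw image is a little over 2 MB even though it holds only 101 bytes of code and data, which is one reason the shorter segment distance of a 32-bit build can be preferable when producing raw binaries.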