How does the kernel get an executable binary file running under Linux?
It seems like a simple question, but can anyone help me dig deeper? How is the file loaded into memory, and how does the code start executing?
Can anyone explain what happens, step by step?
4 answers
#1
17
Best moments of the exec system call on Linux 4.0
- fs/exec.c defines the system call at SYSCALL_DEFINE3(execve
  Simply forwards to do_execve.
- do_execve
  Forwards to do_execveat_common.
- do_execveat_common
  To find the next major function, track where the return value retval is last modified.
  Starts building a struct linux_binprm *bprm to describe the program, and passes it to exec_binprm to execute.
- exec_binprm
  Once again, follow the return value to find the next major call.
- search_binary_handler
  - Handlers are determined by the first magic bytes of the executable.
    The two most common handlers are those for interpreted files (#! magic) and for ELF (\x7fELF magic), but there are others built into the kernel, e.g. a.out. Users can also register their own through /proc/sys/fs/binfmt_misc.
    The ELF handler is defined at fs/binfmt_elf.c.
  - The formats list contains all the handlers.
    Each handler file contains something like:

        static int __init init_elf_binfmt(void)
        {
            register_binfmt(&elf_format);
            return 0;
        }

    and elf_format is a struct linux_binfmt defined in that file.
    __init is magic and puts that code into a magic section that gets called when the kernel starts. See: What does __init mean in the Linux kernel code?
    Linker-level dependency injection!
  - There is also a recursion counter, in case an interpreter executes itself infinitely.
    Try this:

        echo '#!/tmp/a' > /tmp/a
        chmod +x /tmp/a
        /tmp/a
  - Once again we follow the return value to see what comes next, and see that it comes from:

        retval = fmt->load_binary(bprm);

    where load_binary is defined for each handler on the struct: C-style polymorphism.
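As a rough userland illustration of the handler search described in this step (not the kernel's code), the sketch below keeps a list of handlers, tries each one's load_binary callback until one stops returning -ENOEXEC, and caps the recursion when a script names another file as its interpreter. The names fake_file, binfmt, register_fmt and exec_path, the in-memory "filesystem", and the MAX_DEPTH value of 5 are all assumptions made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 5             /* assumed limit, for illustration only */

struct fake_file {              /* hypothetical in-memory "filesystem" entry */
    const char *path;
    const char *data;
};

struct binfmt {                 /* userland stand-in for struct linux_binfmt */
    struct binfmt *next;
    int (*load_binary)(const struct fake_file *fs, size_t n,
                       const char *data, int depth);
};

static struct binfmt *formats;  /* head of the list of registered handlers */

void register_fmt(struct binfmt *fmt)
{
    assert(fmt != NULL && fmt->load_binary != NULL);
    fmt->next = formats;
    formats = fmt;
}

int exec_path(const struct fake_file *fs, size_t n, const char *path, int depth);

/* ELF handler: accepts anything that starts with the 4-byte ELF magic. */
int load_elf(const struct fake_file *fs, size_t n, const char *data, int depth)
{
    (void)fs; (void)n; (void)depth;
    return strncmp(data, "\177ELF", 4) == 0 ? 0 : -ENOEXEC;
}

/* Script handler: on "#!", restart the search on the interpreter path.
 * Simplification: no whitespace is skipped after "#!". */
int load_script(const struct fake_file *fs, size_t n, const char *data, int depth)
{
    char interp[64];
    size_t len;

    if (strncmp(data, "#!", 2) != 0)
        return -ENOEXEC;
    len = strcspn(data + 2, " \n");
    if (len == 0 || len >= sizeof(interp))
        return -ENOEXEC;
    memcpy(interp, data + 2, len);
    interp[len] = '\0';
    return exec_path(fs, n, interp, depth + 1);
}

int exec_path(const struct fake_file *fs, size_t n, const char *path, int depth)
{
    if (depth > MAX_DEPTH)
        return -ELOOP;          /* an interpreter chain that never ends */
    for (size_t i = 0; i < n; i++) {
        if (strcmp(fs[i].path, path) != 0)
            continue;
        for (struct binfmt *fmt = formats; fmt != NULL; fmt = fmt->next) {
            int retval = fmt->load_binary(fs, n, fs[i].data, depth);
            if (retval != -ENOEXEC)
                return retval;  /* this handler claimed the file */
        }
        return -ENOEXEC;        /* no handler recognized the magic */
    }
    return -ENOENT;
}
```

With both handlers registered, a fake /tmp/a whose contents are "#!/tmp/a" makes exec_path return -ELOOP, mirroring the self-referencing script from the shell experiment above.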
- fs/binfmt_elf.c:load_binary (the load_binary member of elf_format, implemented by load_elf_binary)
  Does the actual work:
  - parses the ELF file according to the specs
  - sets up the process's initial program state based on the parsed ELF (memory into a struct linux_binprm, registers into a struct pt_regs)
  - calls start_thread, after which the process can really start getting scheduled
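To make "parses the ELF file according to the specs" a little more concrete, here is a minimal userland sketch of the kind of header sanity checks an ELF loader performs before going further. It assumes a 64-bit ELF and the glibc <elf.h> definitions; parse_elf64_header is a hypothetical name, and the real kernel code performs more checks than this (architecture, for example).

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 0 and stores the entry point in *entry if buf looks like a
 * loadable 64-bit ELF file, or -1 otherwise. */
int parse_elf64_header(const unsigned char *buf, size_t len, Elf64_Addr *entry)
{
    Elf64_Ehdr eh;

    assert(buf != NULL && entry != NULL);
    if (len < sizeof(eh))
        return -1;
    memcpy(&eh, buf, sizeof(eh));               /* avoid unaligned access */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;                              /* magic is not "\177ELF" */
    if (eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;                              /* this sketch handles 64-bit only */
    if (eh.e_type != ET_EXEC && eh.e_type != ET_DYN)
        return -1;                              /* not an executable or PIE/shared object */
    if (eh.e_phentsize != sizeof(Elf64_Phdr) || eh.e_phnum == 0)
        return -1;                              /* no usable program header table */
    *entry = eh.e_entry;
    return 0;
}
```

The entry point returned here is the address that would eventually end up in the instruction pointer for a statically linked program; for a dynamically linked one, the interpreter's entry point is used first.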
TODO: continue source analysis further. What I expect to happen next:
- the kernel parses the INTERP header of the ELF to find the dynamic loader (usually set to /lib64/ld-linux-x86-64.so.2).
- the kernel mmaps the dynamic loader and the ELF to be executed into memory
- the dynamic loader is started, taking a pointer to the ELF in memory.
- now in userland, the loader somehow parses the ELF headers, and does dlopen on them
- dlopen uses a configurable search path to find those libraries (ldd and friends), mmaps them into memory, and somehow informs the ELF where to find its missing symbols
- the loader calls the _start of the ELF
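The first expected step, finding the dynamic loader through the INTERP header, can be sketched in userland as a walk over the program header table looking for a PT_INTERP entry. This is only an illustration under the assumption of a 64-bit ELF image already read into memory; find_interp is a hypothetical helper, not a kernel or libc function.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the NUL-terminated PT_INTERP path inside img,
 * or NULL if the image has none or is malformed. */
const char *find_interp(const unsigned char *img, size_t len)
{
    Elf64_Ehdr eh;

    assert(img != NULL);
    if (len < sizeof(eh))
        return NULL;
    memcpy(&eh, img, sizeof(eh));
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_phentsize != sizeof(Elf64_Phdr))
        return NULL;
    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        size_t off = eh.e_phoff + i * sizeof(ph);

        if (off > len || len - off < sizeof(ph))
            return NULL;                        /* table runs past the image */
        memcpy(&ph, img + off, sizeof(ph));
        if (ph.p_type != PT_INTERP)
            continue;
        if (ph.p_filesz == 0 || ph.p_offset > len ||
            len - ph.p_offset < ph.p_filesz)
            return NULL;                        /* path lies outside the image */
        if (img[ph.p_offset + ph.p_filesz - 1] != '\0')
            return NULL;                        /* path must be NUL-terminated */
        return (const char *)(img + ph.p_offset);
    }
    return NULL;                                /* e.g. statically linked */
}
```

On a typical dynamically linked x86-64 binary, the string found this way would be the /lib64/ld-linux-x86-64.so.2 path mentioned in the list above; a statically linked binary has no PT_INTERP entry at all.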
#2
8
After reading the ELF docs already referenced, you should just read the kernel code that actually does it.
If you have trouble understanding that code, build a UML (User-Mode Linux) kernel, and you can step through that code in a debugger.
#3
8
Two system calls of the Linux kernel are relevant. The fork system call (or perhaps vfork or clone) is used to create a new process, similar to the calling one (every Linux user-land process except init is created by fork or friends). The execve system call replaces the process address space with a fresh one (essentially by sort-of mmap-ing segments from the ELF executable and anonymous segments, then initializing the registers, including the stack pointer). The x86-64 ABI supplement and the Linux Assembly HOWTO give details.
The dynamic linking happens after execve and involves the /lib/x86_64-linux-gnu/ld-2.13.so file, which for ELF is viewed as an "interpreter".
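Seen from userland, the fork-then-execve pattern described above looks roughly like the following sketch. The helper name spawn_and_wait and the choice of an empty environment are assumptions for the example; the system calls themselves (fork, execve, waitpid) are the standard POSIX ones.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs path with argv in a child process and returns its exit status,
 * or -1 if the child could not be created or did not exit normally. */
int spawn_and_wait(const char *path, char *const argv[])
{
    char *const envp[] = { NULL };      /* empty environment for the child */
    int status;
    pid_t pid;

    assert(path != NULL && argv != NULL);
    pid = fork();                       /* new process, copy of the caller */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);       /* replaces the child's address space */
        _exit(127);                     /* only reached if execve failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Everything the first answer walks through happens inside that single execve call in the child; on success it never returns to the code that called it.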
#4
2
You can start by understanding executable file formats, such as ELF. http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
The ELF file contains several sections with headers that describe how and where parts of the binary should be loaded into memory.
Then, I suggest reading up on the part of Linux that loads binaries and handles dynamic linking, ld-linux. This is also a good description of ld-linux: http://www.cs.virginia.edu/~dww4s/articles/ld_linux.html
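One way to see the result of that loading from inside a running process is glibc's dl_iterate_phdr, which visits the main program and every shared object currently mapped. The sketch below, which assumes glibc on Linux, counts those objects and their PT_LOAD program headers; load_stats, count_cb and loaded_objects are names invented for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* dl_iterate_phdr is a GNU extension */
#endif
#include <assert.h>
#include <elf.h>
#include <link.h>
#include <stddef.h>

struct load_stats {
    size_t objects;             /* main program plus mapped shared objects */
    size_t load_segs;           /* total PT_LOAD program headers seen */
};

/* Called once per loaded object; returning 0 continues the iteration. */
static int count_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct load_stats *st = data;

    (void)size;
    assert(info != NULL && st != NULL);
    st->objects++;
    for (size_t i = 0; i < info->dlpi_phnum; i++)
        if (info->dlpi_phdr[i].p_type == PT_LOAD)
            st->load_segs++;
    return 0;
}

struct load_stats loaded_objects(void)
{
    struct load_stats st = { 0, 0 };

    dl_iterate_phdr(count_cb, &st);
    return st;
}
```

The exact counts depend on the system and on how the program was linked, so the sketch deliberately makes no claim about specific numbers.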