How can we implement a system call using sysenter/syscall directly on x86 Linux? Can anybody help? It would be even better if you could also show the code for the amd64 platform.
I know that on x86 we can use

    __asm__(
        "    movl $1, %eax    \n"
        "    movl $0, %ebx    \n"
        "    call *%gs:0x10   \n"
    );

to route to sysenter indirectly.
But how can we write code that uses sysenter/syscall directly to issue a system call?
I found some material at http://damocles.blogbus.com/tag/sysenter/ but I still find it difficult to figure out.
1 solution
I'm going to show you how to execute system calls by writing a program that writes Hello world! to standard output using the write() system call. Here's the source of the program without an implementation of the actual system call:
    #include <sys/types.h>

    ssize_t my_write(int fd, const void *buf, size_t size);

    int main(void)
    {
        const char hello[] = "Hello world!\n";

        /* sizeof(hello) - 1 leaves out the terminating NUL byte */
        my_write(1, hello, sizeof(hello) - 1);
        return 0;
    }
You can see that I named my custom system call function my_write to avoid a name clash with the "normal" write provided by libc. The rest of this answer contains the source of my_write for i386 and amd64.
i386
System calls in i386 Linux are implemented using interrupt vector 128 (0x80), i.e. by executing int 0x80 in your assembly code after setting up the parameters accordingly. It is possible to do the same via SYSENTER, but that instruction is normally executed on your behalf by the VDSO, which the kernel maps into every running process. SYSENTER was never meant as a direct replacement for the int 0x80 API, so userland applications do not normally execute it themselves. Instead, when an application needs to enter the kernel, it calls the routine mapped in the VDSO (that's what the call *%gs:0x10 in your code is for), which contains all the code supporting the SYSENTER instruction. There's quite a lot of it because of how the instruction works: SYSENTER does not record a return address or the user stack pointer, so the VDSO routine and the kernel have to cooperate to get back to the caller.
If you want to read more about this, have a look at this link. It contains a fairly brief overview of the techniques applied in the kernel and the VDSO.
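To make the "VDSO mapped into each process" part a bit more concrete, here is a minimal sketch that asks the C library where the kernel mapped it. It assumes glibc 2.16 or later (for getauxval()) and Linux; the function names vdso_base and looks_like_elf are made up for this illustration. As far as I know, on i386 the related AT_SYSINFO entry holds the address of the VDSO entry routine, which is the pointer that call *%gs:0x10 goes through.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Hypothetical helper: returns the address at which the kernel mapped the
 * VDSO into this process, or 0 if the auxiliary vector has no
 * AT_SYSINFO_EHDR entry. */
uintptr_t vdso_base(void)
{
    return (uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: returns 1 if the mapping at `base` starts with the
 * four ELF magic bytes, 0 otherwise (including when base is 0). */
int looks_like_elf(uintptr_t base)
{
    return base != 0 && memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}
```

Calling looks_like_elf(vdso_base()) from main should return 1 on a typical Linux system, which shows that the VDSO is an ordinary ELF image sitting in the process's address space.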
    #define __NR_write 4

    ssize_t my_write(int fd, const void *buf, size_t size)
    {
        ssize_t ret;

        asm volatile
        (
            "int $0x80"
            : "=a" (ret)                        /* result comes back in eax */
            : "0" (__NR_write),                 /* syscall number goes in eax */
              "b" (fd), "c" (buf), "d" (size)   /* arguments in ebx, ecx, edx */
            : "cc", "edi", "esi", "memory"
        );
        return ret;
    }
As you can see, using the int 0x80 API is relatively simple. The number of the syscall goes into the eax register, while the parameters go into ebx, ecx, edx, esi, edi, and ebp, in that order. System call numbers can be found in the file /usr/include/asm/unistd_32.h. Prototypes and descriptions of the functions are available in section 2 of the manual, so in this case write(2). To be conservative about what the kernel may change, I put the unused argument registers edi and esi on the clobber list, as well as cc, since the eflags register is also likely to change. In practice the Linux int 0x80 path restores every register except eax, so the extra register clobbers are harmless rather than required. Keep in mind that the clobber list also contains memory, which tells the compiler that the asm statement accesses memory that is not listed among its operands (here, the buffer that buf points to).
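One thing my_write does not do is set errno: it hands back whatever the kernel left in the result register. The kernel reports failure by returning a negated error code in the range -4095 to -1. The sketch below shows one way to turn that raw value into the usual libc convention; decode_syscall_result is a name I made up for this example, not a libc or kernel function.

```c
#include <errno.h>

/* Hypothetical helper: converts the raw value returned by the kernel into
 * the libc convention. Raw values in [-4095, -1] are negated errno codes;
 * anything else is a successful result and is passed through unchanged. */
long decode_syscall_result(long raw)
{
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

With this, decode_syscall_result(my_write(fd, buf, size)) behaves like write(): it returns the byte count on success, or -1 with errno set on failure.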
amd64
Things look very different on the AMD64 architecture, which provides a dedicated instruction called SYSCALL. It is very different from the original SYSENTER instruction, and definitely much easier to use from userland applications. It really resembles a normal CALL, and adapting the old int 0x80 code to SYSCALL is pretty much trivial.
In this case, the number of the system call is still passed in the rax register, but the registers that hold the arguments are completely different. They are used in the following order: rdi, rsi, rdx, r10, r8 and r9. The kernel is allowed to destroy the contents of rcx and r11, because SYSCALL itself stores the return address (rip) in rcx and the saved rflags in r11.
    #define __NR_write 1

    ssize_t my_write(int fd, const void *buf, size_t size)
    {
        ssize_t ret;

        asm volatile
        (
            "syscall"
            : "=a" (ret)                        /* result comes back in rax */
            : "0" (__NR_write),                 /* syscall number goes in rax */
              "D" (fd), "S" (buf), "d" (size)   /* arguments in rdi, rsi, rdx */
            : "cc", "rcx", "r11", "memory"
        );
        return ret;
    }
Notice that practically the only things that needed changing were the register names and the instruction used to make the call. This is mostly thanks to the input/output lists of gcc's extended inline assembly syntax: the compiler automatically emits the move instructions needed to load the operands into the right registers before the instruction list runs.
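The write() example only needs three arguments, so it never touches r10, r8 or r9. Those registers have no constraint letters in gcc's inline assembly, so a wrapper that takes all six arguments has to bind them with local register variables. The following is a rough sketch of that idea for x86-64 Linux only; raw_syscall6 is a hypothetical name, and the result is the raw kernel value (a negated error code on failure).

```c
#if !defined(__x86_64__) || !defined(__linux__)
#error "this sketch only applies to x86-64 Linux"
#endif

/* Hypothetical helper: issues system call `nr` with up to six arguments
 * and returns the raw value the kernel leaves in rax. */
long raw_syscall6(long nr, long a1, long a2, long a3,
                  long a4, long a5, long a6)
{
    long ret;

    /* r10, r8 and r9 have no constraint letters, so pin them explicitly. */
    register long r10 __asm__("r10") = a4;
    register long r8  __asm__("r8")  = a5;
    register long r9  __asm__("r9")  = a6;

    __asm__ volatile (
        "syscall"
        : "=a" (ret)
        : "0" (nr), "D" (a1), "S" (a2), "d" (a3),
          "r" (r10), "r" (r8), "r" (r9)
        : "rcx", "r11", "cc", "memory"
    );
    return ret;
}
```

For example, raw_syscall6(1, 1, (long)"hi\n", 3, 0, 0, 0) performs the same write() as my_write(1, "hi\n", 3); unused argument slots can simply be passed as 0.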