Are system calls on Windows inherently slower than on Linux?

Time: 2021-07-13 03:03:13

My understanding of system calls is that in Linux the system call mechanism (int 0x80 or whatever) is documented and guaranteed to be stable across different kernel versions. Using this information, the system calls are implemented directly in the CRT library, so that when I call e.g. printf("a"), this involves a single function call into the CRT, where the system call is set up and issued. In theory this can be improved further by statically linking the CRT (not common on Linux, but a possibility), so that even that single function call may be inlined.

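To make this concrete, here is a minimal sketch (assuming Linux with glibc; the choice of write as the example is mine) of the same system call made through the libc wrapper and through glibc's generic syscall() function:

    /* Minimal sketch (Linux with glibc assumed): the same write system call
       made through the libc wrapper and through glibc's generic syscall()
       function. Both end up executing the same kernel-entry instruction;
       the wrapper only adds one thin user-space call on top. */
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void)
    {
        write(1, "a", 1);                      /* libc wrapper: loads registers, enters the kernel */
        syscall(SYS_write, 1, "a", (size_t)1); /* same call, bypassing the wrapper */
        return 0;
    }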

On the other hand, Windows does not document or even guarantee consistency of the system call mechanism. The only way to make a system call on Windows is to call into ntdll.dll (or maybe some other *.dll) which is done from the CRT, so there are two function calls involved. If the CRT is used statically and the function gets inlined (slightly more common on Windows than Linux) we still have the single function call into ntdll.dll that we can't get rid of.

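For comparison, a minimal sketch of the Windows side (assuming a Windows build environment; NtWriteFile is used here only as one example of an ntdll export): user code never issues the kernel-transition instruction itself, it always goes through a stub exported by ntdll.dll:

    /* Minimal sketch (Windows assumed): the actual kernel transition lives in
       a stub exported by ntdll.dll; user-mode code calls that stub rather than
       issuing the system call instruction itself. Here we merely locate one
       such stub at runtime to illustrate the extra layer of indirection. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");       /* ntdll.dll is mapped into every process */
        FARPROC stub  = GetProcAddress(ntdll, "NtWriteFile"); /* the stub that WriteFile ultimately calls into */
        printf("NtWriteFile stub is at %p inside ntdll.dll\n", (void *)stub);
        return 0;
    }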

So it seems to me that theoretically system calls on Windows will be inherently slower since they always have to do one function call more than their Linux equivalents. Is this understanding (and my explanation above) true?

Note: I am asking this purely theoretically. I understand that when doing a system call (which I think always involves 2 context switches - one in each direction) the cost of an extra function call is probably completely negligible.

1 Answer

#1


9  

On IA-32 there are two ways to make a system call:

  • using int/iret instructions (a minimal sketch of this mechanism follows the list)

  • using sysenter/sysexit instructions
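
The legacy int-based mechanism looks roughly like this (a minimal sketch, assuming GCC inline assembly on 32-bit Linux, where int 0x80 is the int-based kernel entry):

    /* Minimal sketch (IA-32 Linux assumed): the write(2) system call issued
       directly with the legacy int 0x80 mechanism, bypassing the C library.
       Register convention: eax = syscall number (4 = write on i386),
       ebx/ecx/edx = the arguments. */
    #include <stddef.h>

    static long raw_write(int fd, const void *buf, size_t len)
    {
        long ret;
        __asm__ volatile ("int $0x80"
                          : "=a" (ret)
                          : "a" (4), "b" (fd), "c" (buf), "d" (len)
                          : "memory");
        return ret;
    }

    int main(void)
    {
        raw_write(1, "a\n", 2);
        return 0;
    }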

A pure int/iret-based system call takes about 211 CPU cycles (and even much more on modern processors), while sysenter/sysexit takes about 46 CPU cycles. As you can see, executing just the pair of instructions used to enter and leave the kernel already introduces significant overhead. But any system call implementation also involves some work on the kernel side (setting up the kernel context, dispatching the call and its arguments, etc.). A more or less realistic, highly optimized system call will take roughly 250 CPU cycles with int/iret and roughly 100 with sysenter/sysexit. In Linux and Windows a typical system call takes around 500 cycles.

At the same time, a function call (based on call/ret) costs around 2-4 cycles, plus roughly 1 cycle per argument.

As you can see, the overhead introduced by a function call is negligible in comparison to the cost of the system call itself.

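As a rough sanity check using the answer's own figures: an extra user-space call of a few cycles on top of a roughly 500-cycle system call adds well under 1% to the total cost of each call, which is far below measurement noise.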

On the other hand, if you embed raw system calls in your application, you make it highly hardware dependent. For example, what happens if an application with an embedded sysenter/sysexit-based raw system call is executed on an old PC that does not support these instructions? In addition, your application becomes sensitive to the system call calling convention used by the OS.

Libraries such as ntdll.dll and glibc are commonly used because they provide a well-known, hardware-independent interface to the system services and hide the details of the communication with the kernel behind the scenes.

Linux and Windows have approximately the same system call cost when they use the same way of crossing the user/kernel boundary (the difference is negligible). Both try to use the fastest mechanism available on each particular machine. All modern Windows versions, starting at least from Windows XP, are prepared for sysenter/sysexit. Some old and/or specific versions of Linux may still use int/iret-based calls. The x64 versions of both operating systems rely on the syscall/sysret instructions, which work like sysenter/sysexit and are available as part of the AMD64 instruction set.

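For completeness, the 64-bit counterpart of the earlier int 0x80 sketch (again a minimal sketch, assuming GCC inline assembly on x86-64 Linux, where syscall/sysret is the entry mechanism):

    /* Minimal sketch (x86-64 Linux assumed): the same write call issued with
       the 64-bit syscall instruction. Register convention: rax = syscall
       number (1 = write on x86-64), rdi/rsi/rdx = the arguments; the syscall
       instruction itself clobbers rcx and r11. */
    #include <stddef.h>

    static long raw_write64(int fd, const void *buf, size_t len)
    {
        long ret;
        __asm__ volatile ("syscall"
                          : "=a" (ret)
                          : "a" (1), "D" ((long)fd), "S" (buf), "d" (len)
                          : "rcx", "r11", "memory");
        return ret;
    }

    int main(void)
    {
        raw_write64(1, "a\n", 2);
        return 0;
    }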
