I am porting a Windows application to Linux. On Windows I use CreateProcess to run child processes and redirect all of their standard streams (in, out, error). The stream redirection is critical: the main process sends data to the children and receives their output and error messages. The main process is a very big one, with a lot of memory and threads, and the child processes are small ones. On Linux I see that the fork function has similar functionality to CreateProcess on Windows. However, the manual says that fork "creates a copy of the parent process", including code, data and stack. Does it mean that if I create a copy of a huge process that uses 1 GB of memory just to run a very simple command-line tool that itself uses 1 MB of memory, I will first need to duplicate 1 GB of memory with fork, and then replace this 1 GB with the 1 MB process? So, if I have 100 threads, will it require 100 GB of memory to run 100 processes that need just 100 MB of memory in total? Also, what about the other threads in the parent process that "don't know" about the fork; what will they do? What does fork do "under the hood", and is it really an effective way to create a lot of small child processes from a huge parent?
5 answers
#1
When you call fork(), initially only your virtual memory (VM) mappings are copied, and all pages are marked copy-on-write. Your new child process has a logical copy of the parent process's VM, but apart from bookkeeping such as page tables it does not consume additional RAM until one of the processes actually writes to a page; only the page being written is then copied.
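A minimal C sketch of the copy-on-write semantics, assuming Linux/glibc; the function name cow_demo and the 16 MB buffer size are illustrative choices. It cannot show the RAM savings directly, but it shows the behaviour: the child overwrites a buffer it inherited, and the parent still sees its own data afterwards.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fill a buffer, fork, let the child overwrite it.
 * Returns the byte the parent still sees in buf[0] afterwards (it should be
 * 'P'), or -1 on error. size must be greater than 0. */
int cow_demo(size_t size)
{
    char *buf = malloc(size);
    if (buf == NULL)
        return -1;
    memset(buf, 'P', size);              /* the parent's data */

    pid_t pid = fork();                  /* no data page is copied here */
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* Child: the first write to each page makes the kernel give the
         * child its own private copy of that page. */
        memset(buf, 'C', size);
        _exit(buf[0] == 'C' ? 0 : 1);
    }

    int status;
    int ok = waitpid(pid, &status, 0) == pid && WIFEXITED(status)
             && WEXITSTATUS(status) == 0;
    int seen = buf[0];                   /* the parent's pages were never touched */
    free(buf);
    return ok ? seen : -1;
}
```

For example, cow_demo((size_t)16 << 20) should return 'P'.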
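As for threads, fork creates only one thread in the child process: a copy of the calling thread. The parent's other threads keep running in the parent, unaffected; they simply do not exist in the child. Because any lock one of them held at the moment of the fork stays locked in the child, a multithreaded program should call only async-signal-safe functions between fork and exec.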
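The thread behaviour can be checked with a small Linux-only C sketch (compile with -pthread). It counts the entries in /proc/self/task in the parent and in the child; the helper names are hypothetical, and the expected counts assume the caller is otherwise single-threaded.

```c
#include <dirent.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Count the threads of the calling process (one entry per thread, Linux). */
static int count_threads(void)
{
    DIR *d = opendir("/proc/self/task");
    if (d == NULL)
        return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (e->d_name[0] != '.')          /* skip "." and ".." */
            n++;
    closedir(d);
    return n;
}

/* Extra thread that blocks until the write end of its pipe is closed. */
static void *idle_thread(void *arg)
{
    char c;
    while (read(*(int *)arg, &c, 1) > 0)
        ;
    return NULL;
}

/* Hypothetical demo: start one extra thread, fork, and report how many
 * threads the parent and the child each see. Returns 0 on success. */
int fork_thread_demo(int *parent_threads, int *child_threads)
{
    int release[2];
    pthread_t t;
    if (pipe(release) < 0)
        return -1;
    if (pthread_create(&t, NULL, idle_thread, &release[0]) != 0) {
        close(release[0]);
        close(release[1]);
        return -1;
    }

    *parent_threads = count_threads();   /* calling thread + idle thread */

    pid_t pid = fork();
    if (pid == 0) {
        /* Only the forking thread exists here. opendir() is not
         * async-signal-safe; it is tolerable in this demo only because the
         * idle thread holds no locks while it sits in read(). */
        _exit(count_threads());
    }

    int status;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        *child_threads = WEXITSTATUS(status);
    else
        *child_threads = -1;

    close(release[1]);                   /* EOF lets the idle thread return */
    pthread_join(t, NULL);
    close(release[0]);
    return pid > 0 ? 0 : -1;
}
```

In a single-threaded caller, the parent should report 2 threads and the child 1.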
Also, as soon as you call any of the exec family of functions (which I assume you want to do), your entire process image is replaced with a new one. Little of the old process survives; most importantly, open file descriptors are kept unless they are marked close-on-exec.
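Since the stream redirection is what matters for the port, here is a minimal sketch of the usual fork + dup2 + exec pattern in C. It assumes a POSIX system with the tr utility on the PATH (an arbitrary example filter); run_filter is a hypothetical name. It writes all input before reading, which is only safe for small inputs. stderr can be redirected the same way with a third pipe duplicated onto STDERR_FILENO, but reading two pipes at once then calls for poll() to avoid deadlocks.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `tr a-z A-Z`, feed it `input` on stdin and
 * collect its stdout into `out`. Returns the byte count, or -1 on error. */
ssize_t run_filter(const char *input, char *out, size_t out_size)
{
    int to_child[2], from_child[2];
    if (pipe(to_child) < 0)
        return -1;
    if (pipe(from_child) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: put the pipe ends on fds 0 and 1, drop the originals,
         * then replace this process image with the filter program. */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        execlp("tr", "tr", "a-z", "A-Z", (char *)NULL);
        _exit(127);                       /* only reached if exec failed */
    }

    /* Parent: close the ends that now belong to the child. */
    close(to_child[0]);
    close(from_child[1]);

    size_t len = strlen(input), sent = 0;
    while (sent < len) {
        ssize_t w = write(to_child[1], input + sent, len - sent);
        if (w <= 0)
            break;
        sent += (size_t)w;
    }
    close(to_child[1]);                   /* EOF tells the filter to finish */

    size_t total = 0;
    ssize_t n;
    while (total < out_size
           && (n = read(from_child[0], out + total, out_size - total)) > 0)
        total += (size_t)n;
    close(from_child[0]);

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)
        || WEXITSTATUS(status) != 0 || sent != len)
        return -1;
    return (ssize_t)total;
}
```

For example, run_filter("hello\n", buf, sizeof buf) should return 6 with buf holding "HELLO\n".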
If your parent process has a lot of open file descriptors, I suggest that in the child you go through /proc/self/fd and close all the file descriptors you don't need.
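A sketch of that clean-up in C, assuming Linux's /proc; close_inherited_fds and child_still_has_fd are hypothetical names. Strictly speaking, opendir and readdir are not async-signal-safe, so in a heavily multithreaded parent, opening descriptors with O_CLOEXEC, or using close_range(2) on newer kernels and glibc, may be the safer route.

```c
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Close every descriptor above stderr, listing them from /proc/self/fd.
 * Meant to run in the child between fork() and exec(). */
static void close_inherited_fds(void)
{
    DIR *d = opendir("/proc/self/fd");
    if (d == NULL)
        return;
    int own = dirfd(d);                  /* the listing's own descriptor */
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')         /* skip "." and ".." */
            continue;
        int fd = atoi(e->d_name);
        if (fd > STDERR_FILENO && fd != own)
            close(fd);
    }
    closedir(d);
}

/* Hypothetical demo: fork, clean up in the child, and report whether `fd`
 * is still open there (1) or not (0); -1 on error. */
int child_still_has_fd(int fd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close_inherited_fds();
        _exit(fcntl(fd, F_GETFD) != -1);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The parent's own copy of the descriptor is unaffected; only the child's copy is closed.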
#2
The copies are "copy-on-write", so if your child process does not modify the data, it will not use any memory beyond what the parent process already uses. Typically, after fork(), the child process calls exec() to replace its program with a different one, and at that point the copied address space is discarded anyway.
#3
fork basically splits your process into two, with both the parent and the child process continuing at the instruction after the fork function call. However, the return value in the child process is 0, whilst in the parent process it is the process ID of the child process.
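A minimal C sketch of the two return paths; fork_return_demo and the exit code 42 are illustrative. The child uses _exit rather than exit so that it does not flush stdio buffers or run atexit handlers it inherited from the parent.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the child's exit status as seen by the parent,
 * or -1 on error. */
int fork_return_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed: no child was created */
    if (pid == 0)
        _exit(42);                 /* child: fork() returned 0 */

    /* Parent: fork() returned the child's PID, which waitpid() reaps. */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling fork_return_demo() in the parent should yield 42.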
The creation of the child process is extremely quick, since it uses the same pages as the parent. The pages are marked as copy-on-write (COW), so if either process changes a page, the other won't be affected. Once the child process exists, it usually calls one of the exec functions to replace itself with a new program image. Windows doesn't have an equivalent to fork; instead, the CreateProcess call only allows you to start a new process running a new program.
There is an alternative to fork called clone, which gives you much more control over what happens when the new process is started, such as which resources (memory, file descriptors, and so on) it shares with the parent. For example, you can specify a function to call in the new process.
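A minimal sketch of the glibc clone wrapper, assuming Linux and glibc (hence _GNU_SOURCE); the names, stack size and values are illustrative. With only SIGCHLD in the flags, the child gets a copy-on-write address space just as with fork, starts running in child_main, and that function's return value becomes its exit status.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Entry point of the new process; its return value becomes the exit status. */
static int child_main(void *arg)
{
    int *value = arg;
    *value += 1;          /* without CLONE_VM this changes the child's copy only */
    return *value;
}

/* Hypothetical demo: run child_main in a new process started with clone().
 * Returns the child's exit status and stores the parent's value afterwards
 * in *value_after, or returns -1 on error. */
int clone_demo(int *value_after)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    int value = 41;
    /* The stack grows downwards on the usual Linux architectures, so pass its
     * top. SIGCHLD makes the child reapable with waitpid() like a forked one. */
    pid_t pid = clone(child_main, stack + stack_size, SIGCHLD, &value);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status;
    pid_t r = waitpid(pid, &status, 0);
    free(stack);
    if (r != pid || !WIFEXITED(status))
        return -1;
    *value_after = value;
    return WEXITSTATUS(status);
}
```

Adding CLONE_VM to the flags would instead make the child share the parent's memory, so the parent would then see the incremented value.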
#4
I haven't used CreateProcess, but fork() is not an exact copy of the process. It creates a child process, but the child starts its execution at the same instruction at which the parent called fork, and continues from there.
I recommend taking a look at Chapter 5 of the book Operating Systems: Three Easy Pieces. It may get you started, and you might find the child-spawning call you're looking for.
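One such call, our pick rather than something this answer names, is POSIX posix_spawn, which is the closest analogue to CreateProcess: it creates the child and starts the new program in one call, and "file actions" describe the redirections. A minimal sketch, assuming echo is on the PATH; spawn_and_capture is a hypothetical name. As far as I know, recent glibc versions implement it with clone(CLONE_VM | CLONE_VFORK), so the parent's page tables are not copied at all, which is attractive for a huge parent.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run `echo msg` with its stdout redirected into a pipe
 * and copy what it prints into out. Returns the byte count or -1. */
ssize_t spawn_and_capture(const char *msg, char *out, size_t out_size)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    /* File actions run in the child, in order, before the program starts. */
    posix_spawn_file_actions_t fa;
    posix_spawn_file_actions_init(&fa);
    posix_spawn_file_actions_adddup2(&fa, p[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&fa, p[0]);
    posix_spawn_file_actions_addclose(&fa, p[1]);

    char *const argv[] = { "echo", (char *)msg, NULL };
    pid_t pid;
    int rc = posix_spawnp(&pid, "echo", &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    close(p[1]);                          /* parent keeps only the read end */
    if (rc != 0) {
        close(p[0]);
        return -1;
    }

    size_t total = 0;
    ssize_t n;
    while (total < out_size
           && (n = read(p[0], out + total, out_size - total)) > 0)
        total += (size_t)n;
    close(p[0]);

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)
        || WEXITSTATUS(status) != 0)
        return -1;
    return (ssize_t)total;
}
```

For example, spawn_and_capture("hi", buf, sizeof buf) should return 3 with buf holding "hi\n".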
#5
The forked child process gets a copy of almost all of the parent's state: memory, file descriptors, program text, etc. The most notable exception is the parent's threads: they are not copied, and only the thread that called fork exists in the child.