[Repost] Learning Python modules: subprocess, creating child processes

[Reposted from] http://blog.sciencenet.cn/blog-600900-499638.html

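Recently my boss asked me to write a watchdog program to guard a server process: if the server happens to die, the watchdog should restart the application right away. After some Googling I found that Python has several modules that can create processes. I settled on subprocess, because the Python manual says:

This module intends to replace several other, older modules and functions, such as: os.system, os.spawn*, os.popen*, popen2.*, commands.*

In other words, subprocess is the recommended module for this job.

Here is a very simple example that creates a new process, runs app1.exe with a few arguments, and prints the process's return code: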

import subprocess
returnCode = subprocess.call('app1.exe -a -b -c -d')
print('returncode: %d' % returnCode)
#----- Output --------
#Python is powerful
#app1.exe
#-a
#-b
#-c
#-d
#returncode: 0


app1.exe is a very simple console program that just prints the arguments passed to it. Its code is:

#include <iostream>
using namespace std;
int main(int argc, const char *argv[])
{
    cout << "Python is powerful" << endl;
    for (int i = 0; i < argc; i++)
    {
        cout << argv[i] << endl;
    }
    return 0;
}


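Enough chatter; let's look at the subprocess module in detail. The module is built around a single class, Popen, which creates a process and lets you interact with it in fairly complex ways. Its constructor is: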

subprocess.Popen(args, bufsize=0, executable=None, stdin=None, stdout=None, stderr=None, preexec_fn=None, close_fds=False, shell=False, cwd=None, env=None, universal_newlines=False, startupinfo=None, creationflags=0)

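The args parameter can be a string or a sequence (such as a list or tuple) and specifies the executable and its arguments. If it is a sequence, the first element is normally the path of the executable. The executable parameter can also be used to name the executable explicitly. On Windows, Popen creates the child by calling CreateProcess(), which takes a single command-line string; if args is a sequence, it is first converted to a string with list2cmdline().
bufsize sets the buffering of the pipe file objects. I am still not clear on the details of this parameter and would welcome pointers. According to the manual, 0 means unbuffered, 1 means line buffered, any other positive value means a buffer of about that size, and a negative value means the system default.
executable names the program to run. Normally the program is set through args. If shell is True, executable specifies which shell to use. On Windows the default shell is given by the COMSPEC environment variable.
stdin, stdout and stderr specify the program's standard input, standard output and standard error handles. Each can be PIPE, a file descriptor or a file object. None means no redirection, so the child inherits the parent's handle.
preexec_fn is only effective on Unix. It names a callable object that is called in the child process just before the child program is executed.
close_fds: on Windows, if close_fds is True, the new child process does not inherit the parent's input, output and error handles. close_fds cannot be set to True while also redirecting the child's stdin, stdout or stderr.
If shell is True, the program is run through the shell.
cwd sets the child's current directory.
env is a dictionary specifying the child's environment variables. If env is None, the child inherits the parent's environment.
universal_newlines: different operating systems use different line endings, for example '\r\n' on Windows and '\n' on Linux. If this parameter is True, Python treats all of these line endings as '\n'.
startupinfo and creationflags only take effect on Windows. They are passed to the underlying CreateProcess() function and set properties of the child such as the appearance of its main window and its priority.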
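To make a few of these parameters concrete, here is a minimal sketch that passes args as a list and sets cwd and env. It is written in Python 3 syntax and uses the running Python interpreter as a stand-in child program. The variable name DEMO_VAR and the temporary working directory are made up for the illustration.

```python
import os
import subprocess
import sys
import tempfile

# args as a list: the first element is the executable, the rest are its arguments.
child_code = "import os; print(os.getcwd()); print(os.environ['DEMO_VAR'])"
args = [sys.executable, "-c", child_code]

# Copy the parent's environment and add one variable (DEMO_VAR is a made-up name).
env = dict(os.environ, DEMO_VAR="hello")

# cwd: the child starts in this directory instead of the parent's.
workdir = tempfile.mkdtemp()

p = subprocess.Popen(args, cwd=workdir, env=env,
                     stdout=subprocess.PIPE, universal_newlines=True)
out, _ = p.communicate()
lines = out.splitlines()
print(lines)  # first line: the child's working directory, second line: DEMO_VAR
```

Copying os.environ rather than building env from scratch keeps variables the child may need to start at all, such as PATH.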

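subprocess.PIPE

When creating a Popen object, subprocess.PIPE can be passed as the stdin, stdout or stderr argument. It indicates that a pipe to the corresponding standard stream of the child should be opened.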

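subprocess.STDOUT

When creating a Popen object, this value can be passed as the stderr argument to send the child's error output to the same stream as its standard output.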
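A small sketch of the two constants together, in Python 3 syntax: stdout goes to a pipe and stderr is merged into it. The child here is the Python interpreter running a one-liner, chosen only so the example is self-contained.

```python
import subprocess
import sys

# The child writes one line to stdout and one line to stderr.
child_code = ("import sys; sys.stdout.write('out\\n'); sys.stdout.flush(); "
              "sys.stderr.write('err\\n')")

p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdout=subprocess.PIPE,     # capture standard output
                     stderr=subprocess.STDOUT,   # send standard error to the same pipe
                     universal_newlines=True)
merged, err = p.communicate()

print(merged)  # contains both lines
print(err)     # None, because stderr was not given its own pipe
```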

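Popen methods:

Popen.poll()

Checks whether the child process has finished. Sets and returns the returncode attribute.

Popen.wait()

Waits for the child process to finish. Sets and returns the returncode attribute.

Popen.communicate(input=None)

Interacts with the child process: sends data to stdin and reads data from stdout and stderr. The optional input argument is the data to send to the child. communicate() returns a tuple (stdoutdata, stderrdata). Note: to send data through the child's stdin, the Popen object must be created with stdin set to PIPE. Likewise, to get data from stdout and stderr, those must be set to PIPE as well.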
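As a sketch of communicate() in Python 3 syntax: the child below is a Python one-liner that reads two integers and prints their sum, a stand-in I made up so the example runs without any external program.

```python
import subprocess
import sys

# Stand-in child: read two integers from stdin and print their sum.
child_code = "x = int(input()); y = int(input()); print('%d + %d = %d' % (x, y, x + y))"

p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdin=subprocess.PIPE,    # required to send input
                     stdout=subprocess.PIPE,   # required to read output
                     stderr=subprocess.PIPE,
                     universal_newlines=True)

# communicate() sends the input, closes stdin, reads both streams to EOF and waits.
stdoutdata, stderrdata = p.communicate(input="3\n4\n")
print(stdoutdata)
```

Because communicate() closes stdin and waits for the child, it suits one-shot exchanges rather than a back-and-forth conversation.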

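Popen.send_signal(signal)

Sends a signal to the child process.

Popen.terminate()

Stops the child process. On Windows this calls the Windows API function TerminateProcess() to end the child.

Popen.kill()

Kills the child process.

Popen.stdin

If stdin was set to PIPE when the Popen object was created, Popen.stdin is a file object for sending input to the child. Otherwise it is None.

Popen.stdout

If stdout was set to PIPE when the Popen object was created, Popen.stdout is a file object for reading the child's standard output. Otherwise it is None.

Popen.stderr

If stderr was set to PIPE when the Popen object was created, Popen.stderr is a file object for reading the child's error output. Otherwise it is None.

Popen.pid

The process ID of the child process.

Popen.returncode

The return code of the process. It is None as long as the process has not been seen to finish, that is, until poll() or wait() observes the exit.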
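The relationship between poll(), wait() and returncode can be sketched as follows, in Python 3 syntax. The child is a Python one-liner that sleeps briefly and exits with code 3; both the sleep time and the exit code are arbitrary choices for the demonstration.

```python
import subprocess
import sys
import time

# Child sleeps for half a second, then exits with code 3.
p = subprocess.Popen([sys.executable, "-c",
                      "import sys, time; time.sleep(0.5); sys.exit(3)"])

first_poll = p.poll()        # child is still running, so this is None

# Poll until the child has finished; the parent could do other work in this loop.
while p.poll() is None:
    time.sleep(0.1)

final = p.returncode         # set by the poll() call that saw the exit
print(first_poll, final, p.pid)
```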

Below is a very simple example showing how the subprocess module interacts with a console application.

import subprocess
p = subprocess.Popen("app2.exe", stdin = subprocess.PIPE,
                     stdout = subprocess.PIPE, stderr = subprocess.PIPE, shell = False)
p.stdin.write('3\n')
p.stdin.write('4\n')
print(p.stdout.read())
#---- Output ----
#input x:
#input y:
#3 + 4 = 7


app2.exe is also a very simple console program. It reads two numbers from standard input, adds them, and prints the result to the console. Its code is:

#include <iostream>
using namespace std;
int main(int argc, const char *argv[])
{
    int x, y;
    cout << "input x: " << endl;
    cin >> x;
    cout << "input y: " << endl;
    cin >> y;
    cout << x << " + " << y << " = " << x + y << endl;
    return 0;
}


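The subprocess module also provides a few convenience functions for creating processes.

subprocess.call(*popenargs, **kwargs)

Runs a command. The function waits until the child process finishes and returns its returncode. The example at the start of this article uses call. If you do not need to interact with the child, this is the function to use.

subprocess.check_call(*popenargs, **kwargs)

Works like subprocess.call(*popenargs, **kwargs), except that if the child's returncode is not 0 it raises CalledProcessError. The exception object carries the process's returncode.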
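The difference between the two functions, sketched in Python 3 syntax with a child that always exits with code 2 (the exit code is an arbitrary choice):

```python
import subprocess
import sys

# A command that always fails with exit code 2.
failing = [sys.executable, "-c", "import sys; sys.exit(2)"]

# call() just hands back the return code.
plain_code = subprocess.call(failing)

# check_call() turns a nonzero return code into an exception.
try:
    subprocess.check_call(failing)
    caught = None
except subprocess.CalledProcessError as e:
    caught = e.returncode

print(plain_code, caught)
```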

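That is all there is to the subprocess module. The Python manual also gives examples of using subprocess to replace the older modules and functions; take a look if you are interested.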
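Coming back to the watchdog that motivated this article, a bare-bones version might look like the sketch below (Python 3 syntax). The function name watch, the max_runs limit and the restart delay are my own illustrative choices. A real watchdog would loop forever and probably log each restart.

```python
import subprocess
import sys
import time

def watch(args, max_runs=3, delay=0.1):
    """Start args and restart it each time it exits; return the exit codes seen."""
    exit_codes = []
    while len(exit_codes) < max_runs:
        child = subprocess.Popen(args)
        exit_codes.append(child.wait())   # blocks until the guarded process dies
        time.sleep(delay)                 # short pause before restarting
    return exit_codes

# Stand-in for a server that keeps crashing with exit code 1.
codes = watch([sys.executable, "-c", "import sys; sys.exit(1)"], max_runs=3)
print(codes)
```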

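After getting familiar with Qt's QProcess, coming back to Python's subprocess no longer feels as scary as it used to.

Like QProcess, the goal of subprocess is to start a new process and communicate with it.

subprocess.Popen

The module mainly provides one class, Popen: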

class subprocess.Popen( args, bufsize=0, executable=None, stdin=None, stdout=None, stderr=None, preexec_fn=None, close_fds=False, shell=False, cwd=None, env=None, universal_newlines=False, startupinfo=None, creationflags=0)

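This pile of parameters is enough to drive anyone crazy:

args	string or list
bufsize	0: unbuffered; 1: line buffered; other positive value: buffer size; negative value: system default (usually fully buffered)
executable	rarely needed; the first item of the args string or list names the program
stdin stdout stderr	None: no redirection, inherit from the parent; PIPE: create a pipe; a file object; a file descriptor (integer); stderr may also be set to STDOUT
preexec_fn	hook function run between fork and exec (Unix)
close_fds	Unix: whether to close all file descriptors other than 0/1/2 before running the new process; Windows: whether the child inherits the parent's handles
shell	if true, on Unix it is as if "/bin/sh" "-c" were prepended to args; on Windows, as if "cmd.exe /c" were prepended
cwd	sets the working directory
env	sets the environment variables
universal_newlines	all line-ending styles are normalized to '\n'
startupinfo	Windows: structure passed to CreateProcess
creationflags	Windows: for example, pass CREATE_NEW_CONSOLE to give the child its own console window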

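What confused me most at first was the args parameter. It can be a string, or it can be a list.

subprocess.Popen(["gedit","abc.txt"])
subprocess.Popen("gedit abc.txt")

Of these two, the second will not work on Unix. If args is a string, the whole string is taken as the path of the program to run. (Think of the Unix exec API functions, which take a list of strings.)

But the following does work:

subprocess.Popen("gedit abc.txt", shell=True)

because it is equivalent to

subprocess.Popen(["/bin/sh", "-c", "gedit abc.txt"])

Everything becomes an argument to sh, so the problem goes away.

On Windows, however, both of the following work:

subprocess.Popen(["notepad.exe", "abc.txt"])
subprocess.Popen("notepad.exe abc.txt")

This is because the Windows API function CreateProcess takes a single string. Even arguments given as a list have to be joined into a string before being passed to the API.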
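The two directions of this conversion can be tried without starting any process. In the sketch below (Python 3 syntax), shlex.split is my suggestion for turning a command string into the list form that Unix expects; it is not something the text above uses. list2cmdline is the subprocess helper that joins a list into a Windows-style command line. The file name "my notes.txt" is made up to show how quoting is handled.

```python
import shlex
import subprocess

# String -> list, following shell-like quoting rules (useful on Unix without shell=True).
as_list = shlex.split('gedit "my notes.txt"')
print(as_list)      # ['gedit', 'my notes.txt']

# List -> string, the way subprocess builds the command line for CreateProcess.
as_string = subprocess.list2cmdline(["notepad.exe", "my notes.txt"])
print(as_string)    # notepad.exe "my notes.txt"
```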

Likewise, on Windows

subprocess.Popen("notepad.exe abc.txt", shell=True)

is equivalent to

subprocess.Popen("cmd.exe /C " + "notepad.exe abc.txt")

subprocess.call*

The module also provides a few convenience functions (which are themselves good examples of how to use Popen).

call() runs a program and waits for it to finish

def call(*popenargs, **kwargs):
    return Popen(*popenargs, **kwargs).wait()

check_call() calls call() above and raises an exception if the return value is nonzero

def check_call(*popenargs, **kwargs):
    retcode = call(*popenargs, **kwargs)
    if retcode:
        cmd = kwargs.get("args")
        raise CalledProcessError(retcode, cmd)
    return 0

check_output() runs a program and returns its standard output

def check_output(*popenargs, **kwargs):
    process = Popen(*popenargs, stdout=PIPE, **kwargs)
    output, unused_err = process.communicate()
    retcode = process.poll()
    if retcode:
        cmd = kwargs.get("args")
        raise CalledProcessError(retcode, cmd, output=output)
    return output

Popen object

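A Popen object offers quite a few methods and attributes. wait(), poll() and communicate() have already been used above.

poll()	check whether the process has finished; sets the return code
wait()	wait for the process to finish; sets the return code
communicate()	argument is the data for standard input; returns standard output and standard error
send_signal()	send a signal (mainly useful on Unix)
terminate()	terminate the process: SIGTERM on Unix; on Windows calls the API function TerminateProcess()
kill()	kill the process: SIGKILL on Unix; on Windows same as terminate()
stdin stdout stderr	usable when PIPE was specified for the corresponding argument
pid	process ID
returncode	process return code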
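A short sketch of terminate() followed by wait(), in Python 3 syntax. The child is a Python one-liner that would otherwise sleep for a minute. The exact exit code differs by platform, so the example only prints it.

```python
import subprocess
import sys

# Child that would otherwise run for a minute.
p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

p.terminate()          # SIGTERM on Unix, TerminateProcess() on Windows
exit_code = p.wait()   # reap the child and collect its exit status

# On Unix a negative value means "killed by that signal number".
print(exit_code)
```

Calling wait() after terminate() matters on Unix: without it the dead child lingers as a zombie until the parent collects its status.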


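Python vs BAT: using Python for batch processing

I have started converting some of my old Windows batch scripts to Python. The benefit is obvious: Python scripts are far easier to maintain than Windows batch scripts. The conversion is not entirely straightforward, though, so here are some notes:

Command-line arguments

A Windows batch file normally gets its parameters from the command line or from environment variables. The former is the familiar %1, %2, shift business in bat files. In Python this can be handled with OptionParser, the most convenient command-line parsing module I have used. There are Chinese-language write-ups by other users you can consult, or just read the code below, which makes the usage fairly clear: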

import os
import sys

try:
    from optparse import OptionParser
except ImportError:
    try:
        from optik import OptionParser
    except ImportError:
        raise ImportError('Requires Python 2.3 or the Optik option parsing library.')

parser = OptionParser(usage=u"This script is for testing")
parser.add_option('-p', '--project', dest='project',
                  default=os.path.normpath(os.path.join(os.getcwd(), '../..')),
                  help=u'project directory; default: two levels above the current directory')
parser.add_option("-s", '--dosvn', action="store_true", dest="dosvn",
                  help=u'check svn; default: do not check')
parser.add_option('-w', '--waittimeout', dest='waittimeout', type="int", default=300,
                  help=u'startup wait timeout; default: 300 seconds')
(options, args) = parser.parse_args(sys.argv[1:])

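For environment variables, bat uses the %environ% form, while in Python you can use os.environ.get("prompt"); the mapping is direct and natural.

Windows also supports some rather odd variables (officially called modifiers), such as %~dp0, which expands to the directory containing the current batch file. Windows batch is simply too limited and has no functions, so very common operations can only be expressed with these cryptic symbols.

Starting processes

The handiest feature of batch files is starting one process after another, including cmd's built-in commands (dir and the like) and calling other batch files. In Python all of this is handled by the subprocess module. The official documentation already shows how to replace calls such as os.system, os.spawn and os.popen with subprocess, because subprocess is flexible and powerful enough. For instance, capturing a child process's output into a variable in bat requires hard-to-read syntax like this, an example I gave in an earlier post: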

for /F %%A in ('svnlook author -r %REV% %1') do @set AUTHOR=%%A

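With Python's subprocess it is much simpler:

def get_output(target):  # wrapper name is illustrative; the original shows only the body
    process = subprocess.Popen(target, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    (stdoutput, erroutput) = process.communicate()
    return stdoutput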

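Getting the process return code

Batch files often check ERRORLEVEL to determine a process's return value and decide what to do next. This is no problem in Python: a child created with Popen has a wait method that waits for the child to finish and returns its return code.

When invoking a bat file, however, you need to run it through call to get a return value that matches ERRORLEVEL. The documentation does not mention this, yet it is very important. The function below is an example (in the vast majority of cases, enabling both useCall and useShell is right):

def run(self, target, useCall=True, useShell=True, cwd=None):
    if useCall:
        target = "call " + target
    process = subprocess.Popen(target, shell=useShell, cwd=cwd)
    process.wait()
    return process.returncode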

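Redirection

This is one of the harder parts of moving from bat to Python. Windows largely supports Unix-like stdin, stdout, stderr and pipes, and they are easy to use flexibly when writing batch files. In Python they take more work, though it is not too hard once understood: mainly you combine subprocess with Python's own file handling. Below is a more involved example that uses tee.exe to send a child's standard output and standard error to both the screen and a file:

self.tee = subprocess.Popen(["tee", LOG_FILE], stdin=subprocess.PIPE)
process = subprocess.Popen(target, shell=True, stdout=self.tee.stdin.fileno(), stderr=subprocess.STDOUT)

tee comes from unixutils and is widely used to copy standard input to both standard output and a file. I do not know of a simple way to do this in Python without tee, but with tee the code above is quite direct.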
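One possible way to do without tee, as a sketch in Python 3 syntax: read the child's merged output line by line and write each line to both the screen and a log file. The function name run_and_tee and the log location are made up for this illustration. Unlike the tee.exe approach, the parent has to stay in the read loop until the child finishes.

```python
import os
import subprocess
import sys
import tempfile

def run_and_tee(args, log_path):
    """Run args, echoing its merged stdout/stderr to the screen and to log_path."""
    with open(log_path, "w") as log:
        p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                             universal_newlines=True)
        for line in p.stdout:          # ends when the child closes its output
            sys.stdout.write(line)     # copy to the screen
            log.write(line)            # copy to the file
        p.stdout.close()
        return p.wait()

log_path = os.path.join(tempfile.mkdtemp(), "demo.log")  # hypothetical log location
rc = run_and_tee([sys.executable, "-c", "print('line 1'); print('line 2')"], log_path)
```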

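Other

What else needs attention when moving from batch to Python? Not much, it seems. Basically, this kind of conversion is effort spent once for a long-lasting payoff.

subprocess revisited

I have already written a post about Python's subprocess, the standard Python module for creating processes and communicating with them. Here are some additions. Note: I am still sticking with Python 2.x, so this may not apply to Python 3.

Simple subprocess usage

This is the simplest usage:

p=subprocess.Popen("dir", shell=True)
p.wait()

Whether to set shell depends on the command you run. dir above is a built-in command of the Windows shell, so shell=True is required. p.wait() returns the command's return code.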

Communicating with the process

If you want the process's output, a pipe is a convenient way:

p=subprocess.Popen("dir", shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
(stdoutput,erroutput) = p.communicate()

p.communicate waits until the process exits and returns its standard output and standard error, which is how you get the child's output. Above, standard output and standard error are separate. They can also be merged by setting the stderr argument to subprocess.STDOUT, like this:

p=subprocess.Popen("dir", shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
(stdoutput,erroutput) = p.communicate()

Processing the child's output line by line is no problem either:

p=subprocess.Popen("dir", shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
while True:
    buff = p.stdout.readline()
    if buff == '' and p.poll() is not None:
        break
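The same line-by-line idea in Python 3 syntax, as a runnable sketch. Instead of dir it uses a Python one-liner as the child so that it behaves the same on every platform, and it stops at end of file rather than combining readline() with poll(); that substitution is mine.

```python
import subprocess
import sys

p = subprocess.Popen([sys.executable, "-c", "for i in range(3): print(i)"],
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                     universal_newlines=True)

lines = []
# readline() returns '' only at end of file, i.e. once the child has closed its output.
for line in iter(p.stdout.readline, ""):
    lines.append(line.rstrip("\n"))   # handle each line as it arrives

p.stdout.close()
rc = p.wait()
print(lines, rc)
```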

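Deadlock

If you use a pipe but never read from it, be careful: once the child produces too much output, a deadlock occurs. For example:

p=subprocess.Popen("longprint", shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
p.wait()

longprint is an imaginary process that produces a lot of output. In my environment (XP, Python 2.5), the deadlock occurred once the output reached 4096 bytes: the pipe buffer is full, the child blocks on writing, and the parent blocks in wait(). If we drain the output with p.stdout.readline or p.communicate, no deadlock occurs however much output there is. Not using a pipe at all, either by not redirecting or by redirecting to a file, also avoids the deadlock.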
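Both ways out of the deadlock can be sketched as follows, in Python 3 syntax. The child is a Python one-liner standing in for the imaginary longprint; the one-million-character output size is an arbitrary figure chosen to exceed typical pipe buffers.

```python
import subprocess
import sys
import tempfile

# Stand-in for "longprint": writes far more than a pipe buffer can hold.
big_child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]

# Option 1: keep the pipe, but let communicate() drain it while waiting.
p = subprocess.Popen(big_child, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
data, _ = p.communicate()

# Option 2: no pipe at all; redirect to a file, after which wait() is safe.
with tempfile.TemporaryFile() as f:
    rc = subprocess.Popen(big_child, stdout=f, stderr=subprocess.STDOUT).wait()
    f.seek(0)
    file_size = len(f.read())

print(len(data), file_size, rc)
```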

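Asynchronous subprocess

Whether you use readline or communicate, there is a catch: both are synchronous, so you cannot do anything else while waiting for the child's output. The standard subprocess does not support asynchronous interaction with the child. Fortunately, someone has provided an asynchronous approach for Python 3, which I ported to Python 2.5, so it can be used like this:

p=subprocess.Popen("dir", shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
while True:
    buff = p.asyncread(timeout=0.5)  # asyncread comes from the patched subprocess, not the standard module
    if buff == '' and p.poll() is not None:
        break

This reads the child's output with a timeout. If the timeout expires with no output, that is fine: the parent can go do something else. It looks great. The modified subprocess code is rather long, so I will not paste it here.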
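Since the patched asyncread is not shown, here is a different technique that gives a similar read-with-timeout effect using only the standard library: a background thread pumps the child's output into a queue, and the parent polls the queue with a timeout. This is a sketch in Python 3 syntax, not the patch described above. The helper name start_reader and the timing values are made up.

```python
import queue
import subprocess
import sys
import threading

def start_reader(stream):
    """Read lines from stream on a background thread and put them on a queue."""
    q = queue.Queue()
    def pump():
        for line in iter(stream.readline, ""):
            q.put(line)
        q.put(None)               # sentinel: the child closed its output
    t = threading.Thread(target=pump)
    t.daemon = True
    t.start()
    return q

# Child prints three lines with a pause between them.
child_code = "import time\nfor i in range(3):\n    print(i, flush=True)\n    time.sleep(0.2)"
p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                     universal_newlines=True)
lines_q = start_reader(p.stdout)

received, idle_ticks = [], 0
while True:
    try:
        line = lines_q.get(timeout=0.05)
    except queue.Empty:
        idle_ticks += 1           # no output yet: the parent is free to do other work
        continue
    if line is None:
        break
    received.append(line.strip())
p.wait()
print(received, idle_ticks)
```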

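Termination
Python 2.6 added a small interface to the subprocess module, terminate, for ending a process. Unfortunately, on Windows terminate can only kill the process that subprocess created, not that process's own children. If you know for certain that the created process has no children, this interface is fine; if you are not sure, it is of little use.

A simple example: if subprocess creates the process with shell=True, there is an extra cmd process. terminate then ends that cmd process, while the process we actually wanted to run is not ended.

There are many ways to deal with this, but a simple one is the taskkill command that ships with Windows. Its /T option kills an entire process tree, which is exactly what we need. A process created by subprocess has a pid attribute; pass that pid to taskkill and you are done.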
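Passing the pid to taskkill might look like the sketch below (Python 3 syntax). The function names are made up. The /F flag, which forces termination, is my addition and an assumption about what is wanted; the text above only mentions /T. The kill itself can only run on Windows, so the sketch refuses to run elsewhere.

```python
import subprocess
import sys

def taskkill_command(pid):
    """Build the taskkill command that ends pid and its whole process tree."""
    # /T: include child processes; /F: force termination (assumption, see lead-in).
    return ["taskkill", "/F", "/T", "/PID", str(pid)]

def kill_tree(pid):
    """Kill pid and its descendants. Windows only, since taskkill is a Windows tool."""
    if not sys.platform.startswith("win"):
        raise RuntimeError("taskkill is only available on Windows")
    return subprocess.call(taskkill_command(pid))

print(taskkill_command(1234))
```

With a Popen object p created with shell=True, the call would be kill_tree(p.pid), which takes down the intermediate cmd process together with the program it started.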