I have some Python code that executes an external app which works fine when the app has a small amount of output, but hangs when there is a lot. My code looks like:
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
errcode = p.wait()
retval = p.stdout.read()
errmess = p.stderr.read()
if errcode:
    log.error('cmd failed <%s>: %s' % (errcode, errmess))
There are comments in the docs that seem to indicate the potential issue. Under wait, there is:
Warning: This will deadlock if the child process generates enough output to a stdout or stderr pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
But under communicate, I see:
Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
So it is unclear to me whether I should use either of these if I have a large amount of data. They don't indicate what method I should use in that case.
I do need the return value from the exec, and I parse and use both stdout and stderr.
So what is an equivalent method in Python to exec an external app that is going to have large output?
7 answers
You're doing blocking reads on two pipes; the first needs to complete before the second starts. If the application writes a lot to stderr and nothing to stdout, then your process will sit waiting for data on stdout that isn't coming, while the program you're running sits there waiting for the stuff it wrote to stderr to be read (which it never will be, since you're waiting for stdout). Your code also calls p.wait() before reading either pipe, so it can deadlock as soon as either pipe buffer fills: the child blocks on its write and never exits, while you wait for it to exit.
There are a few ways you can fix this.
The simplest is to not intercept stderr: leave stderr=None. Errors will then go directly to your own process's stderr. You can't intercept them and display them as part of your own message. For command-line tools, this is often OK. For other apps, it can be a problem.
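A minimal sketch of this first option, assuming a POSIX shell (the command string is a stand-in for your own cmd). With only one pipe, reading it to EOF before calling wait() is enough to avoid the deadlock:

```python
import subprocess

# Stand-in command that writes to both streams; replace with your own cmd
cmd = "echo out; echo err 1>&2"

# Only stdout is piped. stderr is left as None, so the child inherits this
# process's stderr and its error output can never fill up a pipe.
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
retval = p.stdout.read()  # read the single pipe to EOF first...
p.stdout.close()
errcode = p.wait()        # ...then collect the exit status

print(errcode, retval)
```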
Another simple approach is to redirect stderr to stdout, so you only have one incoming pipe: set stderr=subprocess.STDOUT. This means you can't distinguish regular output from error output. This may or may not be acceptable, depending on how the application writes output.
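A sketch of the merged-stream option under the same assumptions (POSIX shell, stand-in command). Both streams arrive interleaved on one pipe, so a single read to EOF drains everything:

```python
import subprocess

# Stand-in command that writes to both streams; replace with your own cmd
cmd = "echo out; echo err 1>&2"

# stderr is merged into the stdout pipe, so there is only one pipe to drain
p = subprocess.Popen(cmd, shell=True,
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
combined = p.stdout.read()
p.stdout.close()
errcode = p.wait()

print(errcode)
print(combined.decode())
```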
The complete and complicated way of handling this is select (http://docs.python.org/library/select.html). This lets you read in a non-blocking way: you get data whenever data appears on either stdout or stderr. I'd only recommend this if it's really necessary. It won't work on Windows, where select only supports sockets, not pipes.
"A lot of output" is subjective, so it's a little difficult to make a recommendation. If the amount of output is really large, then you likely don't want to grab it all with a single read() call anyway. You may want to try writing the output to a file and then reading the data back incrementally, like so:
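import subprocess

# Send stdout straight to a file so it can't fill a pipe buffer
with open('data.out', 'w') as f:
    p = subprocess.Popen(cmd, shell=True, stdout=f, stderr=subprocess.PIPE)
    # Drain stderr before waiting: with only one pipe left this can't deadlock,
    # whereas calling wait() first still could if stderr output is large
    errmess = p.stderr.read()
    errcode = p.wait()
if errcode:
    log.error('cmd failed <%s>: %s' % (errcode, errmess))
with open('data.out') as data:
    for line in data:
        pass  # do something with each line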
Glenn Maynard is right in his comment about deadlocks. However, the best way of solving this problem is to create two threads, one for stdout and one for stderr, which read those respective streams until exhausted and do whatever you need with the output.
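A minimal sketch of the two-thread approach using only the standard library. Each thread drains one pipe until EOF; here the chunks are simply collected, but a reader could just as well process them as they arrive. The helper names drain and run and the stand-in child command are illustrative, not part of any library:

```python
import subprocess
import sys
import threading


def drain(pipe, chunks):
    """Read a pipe until EOF, appending each chunk to the given list."""
    # iter(callable, sentinel) stops once read() returns b"" (EOF)
    for chunk in iter(lambda: pipe.read(8192), b""):
        chunks.append(chunk)  # or process each chunk here as it arrives
    pipe.close()


def run(cmd):
    """Run cmd, returning (exit status, stdout bytes, stderr bytes)."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_chunks, err_chunks = [], []
    readers = [threading.Thread(target=drain, args=(p.stdout, out_chunks)),
               threading.Thread(target=drain, args=(p.stderr, err_chunks))]
    for t in readers:
        t.start()
    for t in readers:
        t.join()  # both pipes have reached EOF
    errcode = p.wait()  # safe now: nothing is left unread
    return errcode, b"".join(out_chunks), b"".join(err_chunks)


# Stand-in child writing 1 MB to each stream, far more than a typical pipe buffer
child = "import sys; sys.stdout.write('x' * 10**6); sys.stderr.write('y' * 10**6)"
errcode, out, err = run([sys.executable, "-c", child])
print(errcode, len(out), len(err))
```

Because each pipe has its own reader, neither can fill up while the other is being waited on, which is exactly the failure mode described in the first answer.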
The suggestion of using temporary files may or may not work for you depending on the size of output etc. and whether you need to process the subprocess' output as it is generated.
As Heikki Toivonen has suggested, you should look at the communicate method. However, this buffers the stdout/stderr of the subprocess in memory and you get those returned from the communicate call - this is not ideal for some scenarios. But the source of the communicate method is worth looking at.
Another example is in a package I maintain, python-gnupg, where the gpg executable is spawned via subprocess to do the heavy lifting, and the Python wrapper spawns threads to read gpg's stdout and stderr and consume them as data is produced by gpg. You may be able to get some ideas by looking at the source there, as well. Data produced by gpg to both stdout and stderr can be quite large, in the general case.
Reading stdout and stderr independently with very large output (i.e., lots of megabytes) using select:
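import os
import select
import subprocess

# cmd (an argument list) and outpath come from your own code
proc = subprocess.Popen(cmd, shell=False,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
errchunks = []  # illustrative: collect stderr here, or handle it as it arrives
with open(outpath, "wb") as outf:
    # Map each still-open pipe's file descriptor to a handler for its data
    handlers = {proc.stdout.fileno(): outf.write,
                proc.stderr.fileno(): errchunks.append}
    while handlers:
        ready, _, _ = select.select(list(handlers), [], [], 1.0)
        for fd in ready:
            # os.read returns whatever is available (up to 1024 bytes) instead
            # of blocking until a full 1024 bytes arrive, as file.read may
            data = os.read(fd, 1024)
            if data:
                handlers[fd](data)
            else:  # Read of zero bytes means EOF: stop watching this pipe
                del handlers[fd]
proc.stdout.close()
proc.stderr.close()
returncode = proc.wait()  # both pipes are drained, so this can't deadlock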
You could try communicate and see if that solves your problem. If not, I'd redirect the output to a temporary file.
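As a sketch, here is the question's code rewritten around communicate(), which reads stdout and stderr concurrently and only then waits for the process, so it avoids the deadlock (at the cost of holding both outputs in memory). The command is a stand-in, and print() replaces the question's log object:

```python
import subprocess

# Stand-in command: output on both streams and a non-zero exit status
cmd = "echo some output; echo some error 1>&2; exit 3"

p = subprocess.Popen(cmd, shell=True,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# communicate() drains both pipes together, then waits for the child to exit
retval, errmess = p.communicate()
errcode = p.returncode
if errcode:
    print('cmd failed <%s>: %s' % (errcode, errmess.decode()))
```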
I had the same problem. If you have to handle large output, another good option is to use files for stdout and stderr and pass those files as parameters.
Check the tempfile module in Python: https://docs.python.org/2/library/tempfile.html.
Something like this might work:
out = tempfile.NamedTemporaryFile(delete=False)
Then you would do:
Popen(... stdout=out,...)
Then you can read the file, and erase it later.
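Spelled out, the temporary-file approach might look like the sketch below. Because the child writes straight into real files, no pipe buffer can fill up, and stdout can be read back line by line without loading it all into memory. The stand-in child command and the line counting are placeholders for your own command and processing:

```python
import os
import subprocess
import sys
import tempfile

# Stand-in child that writes many lines to stdout and a little to stderr
cmd = [sys.executable, "-c",
       "import sys\nfor i in range(100000): print(i)\nsys.stderr.write('done\\n')"]

out = tempfile.NamedTemporaryFile(delete=False)
err = tempfile.NamedTemporaryFile(delete=False)
try:
    errcode = subprocess.call(cmd, stdout=out, stderr=err)
    out.close()
    err.close()
    # Read stdout back incrementally, one line at a time
    line_count = 0
    with open(out.name) as f:
        for line in f:
            line_count += 1  # placeholder: process each line here
    with open(err.name) as f:
        errmess = f.read()
finally:
    # delete=False means the files must be removed explicitly
    os.remove(out.name)
    os.remove(err.name)

print(errcode, line_count, errmess)
```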
Here is a simple approach that captures both regular output and error output, all within Python, so limitations in stdout don't apply:
import subprocess

com_str = 'uname -a'
command = subprocess.Popen(com_str, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
(output, error) = command.communicate()
print(output.decode())
Linux 3.11.0-20-generic SMP Fri May 2 21:32:55 UTC 2014
com_str = 'id'
command = subprocess.Popen(com_str, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
(output, error) = command.communicate()
print(output.decode())
uid=1000(myname) gid=1000(mygrp) groups=1000(cell),0(root)