I'm writing a Python program for running user-uploaded arbitrary code (and thus, in the worst case, unsafe, erroneous and crashing code) on a Linux server. Security questions aside, my objective is to determine whether the code (which might be in any language, compiled or interpreted) writes the correct things to stdout, stderr and other files, given input fed into the program's stdin. After this, I need to display the results to the user.
The current solution
Currently, my solution is to spawn the child process using subprocess.Popen(...) with file handles for stdout, stderr and stdin. The file behind the stdin handle contains the inputs that the program reads during operation, and after the program has terminated, the stdout and stderr files are read and checked for correctness.
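A minimal sketch of this file-based setup, assuming Python 3 and using a Python 3 port of the example program from the question as the child. The helper name run_with_files and the CHILD string are illustrative, not part of any library:

```python
import subprocess
import sys
import tempfile

# Illustrative Python 3 port of the question's example program.
CHILD = (
    'print("Hello.")\n'
    'name = input("Type your name: ")\n'
    'print("Nice to meet you, %s!" % name)\n'
)

def run_with_files(argv, stdin_text, timeout=10):
    """Hypothetical helper: run argv with stdin/stdout/stderr bound to temp files."""
    with tempfile.TemporaryFile("w+") as fin, \
         tempfile.TemporaryFile("w+") as fout, \
         tempfile.TemporaryFile("w+") as ferr:
        fin.write(stdin_text)
        fin.flush()
        fin.seek(0)  # the child shares the file offset, so rewind first
        proc = subprocess.Popen(argv, stdin=fin, stdout=fout, stderr=ferr)
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
            raise
        fout.seek(0)
        ferr.seek(0)
        return proc.returncode, fout.read(), ferr.read()

code, out, err = run_with_files([sys.executable, "-c", CHILD], "Anonymous\n")
print(repr(out))
```

With stdin and stdout redirected to files, Python 3's input() writes the prompt to stdout without a newline, so the captured stdout is "Hello.\nType your name: Nice to meet you, Anonymous!\n": the input itself appears nowhere in it.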
The problem
Otherwise this approach works perfectly, but when I display the results, I can't combine the given inputs and outputs so that the inputs appear in the same places as they would when running the program from a terminal. I.e. for a program like
print "Hello."
name = raw_input("Type your name: ")
print "Nice to meet you, %s!" % (name)
the contents of the file containing the program's stdout would, after running, be:
Hello.
Type your name:
Nice to meet you, Anonymous!
given that the contents of the file containing the stdin were Anonymous<LF>. So, in short, for the given example code (and, equivalently, for any other code) I want to achieve a result like:
Hello.
Type your name: Anonymous
Nice to meet you, Anonymous!
Thus, the problem is to detect when the program is waiting for input.
Tried methods
I've tried the following methods for solving the problem:
Popen.communicate(...)
This allows the parent process to send data along a pipe separately, but it can only be called once, and is therefore not suitable for programs with multiple outputs and inputs, as the documentation implies.
Directly reading from Popen.stdout and Popen.stderr and writing to Popen.stdin
The documentation warns against this, and the .read() and .readline() calls on Popen.stdout seem to block indefinitely when the program starts waiting for input.
Using select.select(...) to see if the file handles are ready for I/O
This doesn't seem to improve anything. Apparently the pipes are always ready for reading or writing, so select.select(...) doesn't help much here.
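One likely reason readline() appears to hang is that it waits for a newline, which a prompt such as "Type your name: " never sends. A sketch, assuming Python 3 on a POSIX system (the helper name read_available is hypothetical), that combines select.select(...) on the read end with a raw os.read(...), which returns whatever partial line is buffered instead of waiting for a newline. It can still only see the prompt if the child has actually flushed it:

```python
import os
import select

def read_available(fd, timeout):
    """Hypothetical helper: return bytes readable from fd within `timeout`
    seconds, or b"" if nothing arrived. Does not wait for a newline."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return b""
    # select() reported fd as readable, so this returns at once with whatever
    # is buffered (possibly a partial line), or b"" at EOF.
    return os.read(fd, 4096)

r, w = os.pipe()
print(read_available(r, 0.1))      # nothing written yet
os.write(w, b"Type your name: ")   # a prompt without a trailing newline
print(read_available(r, 1.0))
os.close(w)
os.close(r)
```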
Using a different thread for non-blocking reading
As suggested in this answer, I have tried creating a separate Thread() that stores the results of reading from stdout in a Queue(). The output lines before a line demanding user input are displayed nicely, but the line on which the program starts to wait for user input ("Type your name: " in the example above) never gets read.
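A sketch of a thread-and-queue reader that may behave better, assuming Python 3 and a child that flushes its output (the -u flag does that for a Python child; other languages need their own unbuffered mode). It reads raw chunks with os.read(...) instead of lines and, rather than detecting that the child is waiting, it waits for a known prompt string, which is roughly what pexpect's expect() does. interact, its dialogue parameter and CHILD are hypothetical names:

```python
import os
import queue
import subprocess
import sys
import threading

# Illustrative Python 3 port of the question's example program.
CHILD = (
    'print("Hello.")\n'
    'name = input("Type your name: ")\n'
    'print("Nice to meet you, %s!" % name)\n'
)

def _pump(fd, q):
    # Raw chunks, not lines: the prompt has no trailing newline,
    # so a readline()-based reader would never hand it over.
    while True:
        chunk = os.read(fd, 1024)
        q.put(chunk)  # an empty chunk marks EOF
        if not chunk:
            return

def _read_until(q, marker, timeout):
    # Collect chunks until `marker` shows up; marker=None means "until EOF".
    # q.get raises queue.Empty if the child stays silent for `timeout` seconds.
    data = b""
    while marker is None or marker not in data:
        chunk = q.get(timeout=timeout)
        if not chunk:
            break
        data += chunk
    return data

def interact(argv, dialogue, timeout=10):
    """Hypothetical helper: dialogue is a list of (expected_prompt, reply) pairs."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    q = queue.Queue()
    threading.Thread(target=_pump, args=(proc.stdout.fileno(), q),
                     daemon=True).start()
    transcript = b""
    try:
        for prompt, reply in dialogue:
            transcript += _read_until(q, prompt.encode(), timeout)
            proc.stdin.write(reply.encode())
            proc.stdin.flush()
            # Splice the input in where a user at a terminal would have typed it.
            transcript += reply.encode()
        proc.stdin.close()
        transcript += _read_until(q, None, timeout)
    except BaseException:
        proc.kill()
        raise
    finally:
        proc.wait()
    return transcript.decode()

print(interact([sys.executable, "-u", "-c", CHILD],
               [("Type your name: ", "Anonymous\n")]))
```

The catch is that the prompts must be known in advance, which sidesteps the detection problem rather than solving it.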
Using a PTY slave as the child process' file handles
As directed here, I've tried pty.openpty() to create a pseudoterminal with master and slave file descriptors. After that, I've passed the slave file descriptor as the stdout, stderr and stdin arguments of the subprocess.Popen(...) call. Reading through the master file descriptor opened with os.fdopen(...) yields the same result as using a different thread: the line demanding input doesn't get read.
Edit: Using @Antti Haapala's example of pty.fork() for child process creation instead of subprocess.Popen(...) seems to let me read the output created by raw_input(...) as well.
Using pexpect
I've also tried the read(), read_nonblocking() and readline() methods (documented here) of a process spawned with pexpect. The best result, which I got with read_nonblocking(), is the same as with a PTY created with pty.fork(): the line demanding input does get read.
Edit: Using sys.stdout.write(...) and sys.stdout.flush() instead of print in my master program (the one that creates the child) seemed to fix the prompt line not getting displayed; it actually got read in both cases, though.
Others
I've also tried select.poll(...), but it seemed that the pipe or PTY master file descriptors are always ready for writing.
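This is expected: for a pipe (or a PTY master), "ready for writing" only means there is room in the kernel buffer, not that anyone is blocked in read() on the other end. A small sketch, assuming Python 3 on Linux (the function name is hypothetical), shows poll() reporting POLLOUT on a pipe whose read end nobody is reading:

```python
import os
import select

def pipe_write_end_reports_ready(timeout_ms=100):
    """Hypothetical demo: poll the write end of a pipe nobody reads from."""
    r, w = os.pipe()
    try:
        poller = select.poll()
        poller.register(w, select.POLLOUT)
        events = poller.poll(timeout_ms)
        # POLLOUT is reported because the pipe buffer has free space,
        # even though no process is waiting on the read end.
        return any(fd == w and mask & select.POLLOUT for fd, mask in events)
    finally:
        os.close(r)
        os.close(w)

print(pipe_write_end_reports_ready())
```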
Notes
Other solutions
- Another idea is to feed the next input once some time has passed without any new output. This, however, is risky, because there's no way to know whether the program is just in the middle of a heavy calculation.
- As @Antti Haapala mentioned in his answer, the read() system call wrapper from glibc could be replaced to communicate the inputs to the master program. However, this doesn't work with statically linked or assembly programs. (Although, now that I think of it, any such calls could be intercepted in the source code and replaced with a patched version of read(); that could still be painstaking to implement.)
- Modifying the Linux kernel code to communicate the read() syscalls to the master program is probably insane...
PTYs
I think the PTY is the way to go, since it fakes a terminal and interactive programs are run on terminals everywhere. The question is, how?
2 answers
#1 (score: 5)
Have you noticed that raw_input writes the prompt string to stderr if stdout is a terminal (isatty)? If stdout is not a terminal, the prompt is written to stdout too, but stdout will then be in fully buffered mode.
With stdout on a tty
write(1, "Hello.\n", 7) = 7
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(2, "Type your name: ", 16) = 16
fstat(0, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 3), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb114059000
read(0, "abc\n", 1024) = 4
write(1, "Nice to meet you, abc!\n", 23) = 23
With stdout not on a tty
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff8d9d3410) = -1 ENOTTY (Inappropriate ioctl for device)
# oops, python noticed that stdout is NOTTY.
fstat(0, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 3), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f29895f0000
read(0, "abc\n", 1024) = 4
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f29891c4bd0}, {0x451f62, [], SA_RESTORER, 0x7f29891c4bd0}, 8) = 0
write(1, "Hello.\nType your name: Nice to m"..., 46) = 46
# squeeze all output at the same time into stdout... pfft.
Thus all the writes are squeezed into stdout at the same time and, what is worse, only after the input has been read.
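The deciding factor in both traces is the TCGETS ioctl on fd 1: it succeeds on a terminal and fails with ENOTTY on a pipe, and the program picks its buffering (and, for raw_input, where the prompt goes) from that. A sketch to observe the difference from Python 3, assuming Linux; the PROBE string and the function name are hypothetical:

```python
import os
import pty
import subprocess
import sys

# Hypothetical probe: the child reports whether its stdout is a terminal.
PROBE = "import sys; print(sys.stdout.isatty())"

def child_sees_tty(use_pty):
    if not use_pty:
        result = subprocess.run([sys.executable, "-c", PROBE],
                                stdout=subprocess.PIPE, text=True, check=True)
        return result.stdout.strip()
    master, slave = pty.openpty()
    try:
        proc = subprocess.Popen([sys.executable, "-c", PROBE], stdout=slave)
    finally:
        os.close(slave)  # the parent only needs the master side
    data = b""
    try:
        while True:
            try:
                chunk = os.read(master, 1024)
            except OSError:  # EIO on Linux once the child has closed the slave
                break
            if not chunk:
                break
            data += chunk
    finally:
        os.close(master)
        proc.wait()
    return data.decode().strip()  # strip() also drops the pty's "\r\n"

print("pipe:", child_sees_tty(False))
print("pty: ", child_sees_tty(True))
```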
The real solution is thus to use a pty. However, you are doing it wrong: for the pty to work, you must use pty.fork(), not subprocess (this will be very tricky). I have some working code that goes like this:
import os
import pty
import select  # needed if you use the select() call sketched below

program = "python"
# by convention, argv[0] is the command name
argv = ["python", "foo.py"]

pid, master_fd = pty.fork()

if pid == pty.CHILD:
    # we are in the child process: execute the program; its stdin,
    # stdout and stderr are all connected to the pty slave
    os.execlp(program, *argv)

# else we are still in the parent, and pty.fork() returned the pid of
# the child. Now you can read and write master_fd, or use select:
# rfds, wfds, xfds = select.select([master_fd], [], [], timeout)
Notice that, depending on the terminal mode set by the child program, different kinds of line endings might come out (for example \r\n instead of \n), etc.
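For example, with the default settings a Linux pty has output post-processing (OPOST with ONLCR) enabled, so a child's "\n" reaches the master as "\r\n". A small sketch, assuming Python 3 on Linux (the function name is hypothetical), that writes to the slave side directly and shows the difference when OPOST is cleared with termios; tty.setraw() also clears OPOST, among other flags:

```python
import os
import pty
import termios

def through_pty(payload, post_processing=True):
    """Hypothetical demo: write `payload` to a pty slave, read it at the master."""
    master, slave = pty.openpty()
    try:
        if not post_processing:
            attrs = termios.tcgetattr(slave)
            attrs[1] &= ~termios.OPOST  # attrs[1] holds the output-mode flags
            termios.tcsetattr(slave, termios.TCSANOW, attrs)
        os.write(slave, payload)
        data = b""
        while not data.endswith(b"\n"):  # the pty may deliver it in pieces
            data += os.read(master, 1024)
        return data
    finally:
        os.close(master)
        os.close(slave)

print(through_pty(b"Hello.\n"))
print(through_pty(b"Hello.\n", post_processing=False))
```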
Now, about the "waiting for input" problem: that cannot really be helped, as one can always write to a pseudoterminal; the characters will just wait in the buffer. Likewise, a pipe always allows one to write up to 4K or 32K or some other implementation-defined amount before blocking. One ugly way is to strace the program and notice whenever it enters the read system call with fd = 0; another would be to make a C module with a replacement read() function and have the dynamic linker load it before glibc (this fails if the executable is statically linked or makes system calls directly in assembly), which would then signal Python whenever read(0, ...) is executed. All in all, it is probably not worth the trouble.
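Putting the pty.fork() code above together with the idle-timeout idea from the question gives something like the following sketch, assuming Python 3 on Linux. It is a heuristic, not real detection: it sends the next input line whenever the child has printed something and then stayed quiet for `idle` seconds, so a long calculation would still fool it. Because the terminal echoes what is written to the master, the input lands in the transcript right after the prompt. run_in_pty, its parameters and CHILD are hypothetical names:

```python
import os
import pty
import select
import signal
import sys
import time

# Illustrative Python 3 port of the question's example program.
CHILD = (
    'print("Hello.")\n'
    'name = input("Type your name: ")\n'
    'print("Nice to meet you, %s!" % name)\n'
)

def run_in_pty(argv, inputs, idle=0.5, deadline=30.0):
    """Hypothetical helper: run argv on a pty, feeding `inputs` on idle."""
    pid, master_fd = pty.fork()
    if pid == pty.CHILD:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    pending = list(inputs)
    transcript = bytearray()
    output_since_write = False
    end = time.monotonic() + deadline
    try:
        while time.monotonic() < end:
            ready, _, _ = select.select([master_fd], [], [], idle)
            if ready:
                try:
                    chunk = os.read(master_fd, 1024)
                except OSError:  # EIO on Linux once the child has exited
                    break
                if not chunk:
                    break
                transcript += chunk
                output_since_write = True
            elif pending and output_since_write:
                # Quiet after some output: guess that the child is blocked in
                # read(). The terminal echo puts the input into the transcript.
                os.write(master_fd, pending.pop(0).encode() + b"\n")
                output_since_write = False
    finally:
        os.close(master_fd)
        if os.waitpid(pid, os.WNOHANG) == (0, 0):
            os.kill(pid, signal.SIGKILL)  # still running: give up on it
            os.waitpid(pid, 0)
    # With the default terminal settings the pty turns "\n" into "\r\n".
    return transcript.decode(errors="replace").replace("\r\n", "\n")

print(run_in_pty([sys.executable, "-c", CHILD], ["Anonymous"]))
```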
#2 (score: 0)
Instead of trying to detect when the child process is waiting for input, you can use the Linux script command. From the man page for script:
The script utility makes a typescript of everything printed on your terminal.
On a terminal, you would use it like this:
$ script -q <outputfile> <command>
So in Python you can try giving this command to the Popen routine instead of just <command>.
Edit: I made the following program:
#include <stdio.h>

int main() {
    int i;
    scanf("%d", &i);
    printf("i + 1 = %d\n", i + 1);
}
and then ran it as follows:
$ echo 9 > infile
$ script -q output ./a.out < infile
$ cat output
9
i + 1 = 10
So I think it can be done in Python this way, instead of using the stdout, stderr and stdin arguments of Popen.