Grep library output in Python

When calling a program from the command line, I can pipe the output to grep to select the lines I want to see, e.g.

printf "hello\ngood day\nfarewell\n" | grep day

I am looking for the same kind of line selection, but for a C library called from Python. Consider the following example:

import os

# Function which emulates a C library call
def call_library():
    os.system('printf "hello\ngood day\nfarewell\n"')

# Pure Python stuff
print('hello from Python')
# C library stuff
call_library()

When running this Python code, I want the output of the C part to be grep'ed for the string 'day', so that the output of the code becomes:

hello from Python
good day

So far I have fiddled around with redirecting stdout, using the methods described here and here. I am able to make the C output vanish completely, or to save it to a str and print it later (which is what the two links are mainly concerned with). I am not, however, able to select which lines get printed based on their content. Importantly, I want the output in real time while the C library is being called, so I cannot just redirect stdout to some buffer and process that buffer after the fact.

The solution only needs to work with Python 3.x on Linux. If, in addition to line selection, the solution also allows line editing, that would be even better.

I think the following should be possible, but I do not know how to set it up:

1. Redirect stdout to a "file" in memory.
2. Spawn a new thread which constantly reads from this file, does the selection based on line content, and writes the wanted lines to the screen, i.e. the original destination of stdout.
3. Call the C library.
4. Join the two threads back together and redirect stdout back to its original destination (the screen).

I do not have a firm enough grasp of file descriptors and the like to be able to do this, nor to even know if this is the best way of doing it.

Edit

Note that the solution cannot simply re-implement the code in call_library. The code must call call_library, totally agnostic to the actual code which then gets executed.

2 solutions

#1

I'm a little confused about exactly what your program is doing, but it sounds like you have a C library that writes to the C stdout (not the Python sys.stdout) and you want to capture this output and postprocess it, and you already have a Python binding for the C library, which you would prefer to use rather than a separate C program.
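
To illustrate the distinction between the C-level stdout and Python's sys.stdout, here is a minimal sketch. The helper names capture_python_level and noisy are made up for this example, and os.system stands in for the C library as in the question. Swapping sys.stdout only affects Python-level writes; anything written directly to file descriptor 1 bypasses it.

```python
import contextlib
import io
import os


def capture_python_level(func):
    """Run func with sys.stdout swapped for an in-memory buffer."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func()
    return buffer.getvalue()


def noisy():
    # Hypothetical stand-in for code that mixes Python and C-level output.
    print("from Python")                 # goes through sys.stdout
    os.system('echo "from the C side"')  # the child writes straight to fd 1


captured = capture_python_level(noisy)
# Only the Python-level print ends up in `captured`; the echo output
# never passes through sys.stdout and still reaches the terminal.
```

This is why filtering the library's output has to happen at the file-descriptor level, or in a separate process, rather than by replacing sys.stdout.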

First off, you must use a child process to do this; nothing else will work reliably. This is because stdout is process-global, so there's no reliable way to capture only one thread's writes to stdout.

Second off, you can use subprocess.Popen, because you can re-invoke the current script using it! This is what the Python multiprocessing module does under the hood, and it's not terribly hard to do yourself. I would use a special, hidden command line argument to distinguish the child, like this:

import argparse
import subprocess
import sys

def subprocess_call_c_lib():
    import c_lib
    c_lib.do_stuff()

def invoke_c_lib():
    proc = subprocess.Popen([sys.executable, __file__,
                             "--internal-subprocess-call-c-lib"
                             # , ...
                             ],
                            stdin=subprocess.DEVNULL,
                            stdout=subprocess.PIPE)
    for line in proc.stdout:
        # filter output from the library here (proc.stdout yields bytes)
        # to display to "screen", write to sys.stdout as usual
        if b"day" in line:
            sys.stdout.write(line.decode())
            sys.stdout.flush()

    if proc.wait():
        raise subprocess.CalledProcessError(proc.returncode, "c_lib")

def main():
    ap = argparse.ArgumentParser(...)
    ap.add_argument("--internal-subprocess-call-c-lib", action="store_true",
                    help=argparse.SUPPRESS)
    # ... more arguments ...

    args = ap.parse_args()
    if args.internal_subprocess_call_c_lib:
        subprocess_call_c_lib()
        sys.exit(0)

    # otherwise, proceed as before ...

main()
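
For experimenting without a real C binding, the same idea can be sketched in a self-contained way. Everything here is an assumption made for illustration: CHILD_CODE emulates the library call with the printf command from the question, grep_child is a hypothetical helper name, and the optional edit callback shows how line editing could be added on top of line selection.

```python
import subprocess
import sys

# Hypothetical stand-in for the child process: it emulates the library call.
CHILD_CODE = r'''
import os
os.system('printf "hello\ngood day\nfarewell\n"')
'''


def grep_child(needle, edit=str):
    """Run the child, echo lines containing `needle`, and return them."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                            stdin=subprocess.DEVNULL,
                            stdout=subprocess.PIPE,
                            text=True)
    shown = []
    with proc.stdout:
        for line in proc.stdout:      # lines arrive while the child runs
            if needle in line:
                line = edit(line)     # optional line editing
                sys.stdout.write(line)
                sys.stdout.flush()
                shown.append(line)
    if proc.wait():
        raise subprocess.CalledProcessError(proc.returncode, proc.args)
    return shown


if __name__ == "__main__":
    print("hello from Python")
    grep_child("day")
```

One caveat: C stdio usually switches to full buffering when stdout is a pipe rather than a terminal, so a real library's lines may arrive in chunks unless the library flushes its output.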

#2

It is possible if the grepping thread prints to stderr, at least:

import io
import os
import sys
import threading
import time

# Function which emulates a C library call
def call_library():
    os.system("echo hello")
    time.sleep(1.0)
    os.system("echo good day")
    time.sleep(1.0)
    os.system("echo farewell")
    time.sleep(1.0)
    os.system("echo done")


class GrepThread(threading.Thread):
    def __init__(self, r):
        threading.Thread.__init__(self)
        self.r = r

    def run(self):
        while True:
            s = self.r.readline()
            if not s:
                break
            if "day" in s:
                print(s, end="", file=sys.stderr)

original_stdout_fd = sys.stdout.fileno()
# file descriptors r, w for reading and writing
r, w = os.pipe() 
r = os.fdopen(r)
os.dup2(w, original_stdout_fd)
sys.stdout = io.TextIOWrapper(os.fdopen(original_stdout_fd, 'wb'))

thread = GrepThread(r)
thread.start()
print("Starting", file=sys.stderr)
call_library()

Note that this does not close the thread nor clean things up, but it seems to work on my computer. It will print the lines as the function executes, not afterwards.
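
A variant with cleanup could look roughly like the sketch below, which follows the four steps from the question. The names filtered_stdout and grep_day are made up for this example, and it assumes that file descriptor 1 is the stdout shared by Python and the library. Instead of writing to stderr, it writes the selected lines to a saved copy of the original stdout.

```python
import contextlib
import os
import sys
import threading

STDOUT_FD = 1  # assumed to be the stdout shared by Python and the C library


@contextlib.contextmanager
def filtered_stdout(keep):
    """Pass everything written to fd 1 through keep(line).

    keep returns the text to show (possibly edited) or None to drop the line.
    """
    sys.stdout.flush()
    saved_fd = os.dup(STDOUT_FD)       # remember the real destination
    read_fd, write_fd = os.pipe()
    os.dup2(write_fd, STDOUT_FD)       # fd 1 now feeds the pipe
    os.close(write_fd)

    def pump():
        with os.fdopen(read_fd, "r") as reader:
            for line in reader:        # stops at EOF, once fd 1 is restored
                out = keep(line)
                if out is not None:
                    os.write(saved_fd, out.encode())

    thread = threading.Thread(target=pump)
    thread.start()
    try:
        yield
    finally:
        sys.stdout.flush()
        os.dup2(saved_fd, STDOUT_FD)   # restore; this closes the pipe's write end
        thread.join()
        os.close(saved_fd)


def grep_day(line):
    return line if "day" in line else None


if __name__ == "__main__":
    print("hello from Python")
    with filtered_stdout(grep_day):
        os.system('printf "hello\ngood day\nfarewell\n"')
```

Python-level prints made inside the with block are filtered too, since they also end up on fd 1. A real C extension that holds the GIL while writing more than the pipe buffer can hold could block, because the reader thread would not get a chance to run; os.system does not have that problem.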
