How do I append the output of parallel grep to a file?

Time: 2022-05-13 17:51:46

I have a 500 MB file and a 20 MB pattern file. Grepping the 1.2 million patterns against the file's 5 million lines was taking too long, so I split the pattern file into 100 parts. I tried to run grep in parallel over the pattern files as below.

for pat1 in vailtar_*
do
    parallel --block 75M --pipe grep $pat1 infile >> outfile
done;

But I cannot get the output to append to a file. I also tried it without the --block option, and with the variants below:

cat infile | parallel --block 75M --pipe grep $pat1 >> outfile
< infile parallel --block 75M --pipe grep $pat1 >> outfile

Is there any way to make the parallel grep append its output to a file? Thanks in advance.

1 solution

#1


Perhaps it will work better like this?

for pat1 in vailtar_*
do
    parallel --block 75M --pipe grep -f $pat1 < infile
done > outfile

That will take all the output from everything inside the for loop, and put it in outfile.
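To see the redirection behaviour on its own, here is a minimal sketch with made-up sample data. It runs plain grep sequentially instead of parallel --pipe, purely so it works without GNU parallel installed; the file contents are assumptions for illustration, only the file names come from the question.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'apple\nbanana\ncherry\n' > infile
printf 'apple\n' > vailtar_00
printf 'cherry\n' > vailtar_01

# The redirection belongs to the loop as a whole, so outfile is opened
# once and receives the output of every iteration.
for pat1 in vailtar_*
do
    grep -f "$pat1" infile
done > outfile

cat outfile
```

Because the file is opened once for the whole loop, there is no need for >> inside the loop body.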

Incidentally, I think you meant to use infile as stdin rather than as an argument to grep, and I think you meant -f $pat1 rather than just the filename as the pattern. I've fixed both issues in my version.
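A small sketch of why those two details matter, again with invented sample data (the contents of infile and vailtar_00 here are assumptions, not the asker's files):

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'foo 1\nbar 2\nvailtar_00 3\n' > infile
printf 'foo\nbar\n' > vailtar_00

# Without -f, the file NAME is the pattern: only lines containing the
# text "vailtar_00" match.
grep vailtar_00 infile > by_name.txt

# With -f, grep reads one pattern per line from the file.
grep -f vailtar_00 infile > by_file.txt

# When grep is given a file argument it does not read stdin at all,
# so whatever is piped in (as --pipe would do) is ignored.
printf 'foo from stdin\n' | grep -f vailtar_00 infile > ignores_stdin.txt

cat by_name.txt by_file.txt ignores_stdin.txt
```

The last command shows the stdin issue: the piped line never appears in the output, because grep only looks at infile.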


However, if I were trying to solve this problem I might do it like this:

parallel 'grep -f {} infile' ::: vailtar_*

(I've not tested that.)
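To get that one-liner's results into a file, redirecting parallel's own stdout should be enough. A hedged sketch on made-up sample data follows; the sequential fallback branch is my addition so the script still runs where GNU parallel is not installed, and is not part of the suggestion above.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'apple\nbanana\ncherry\n' > infile
printf 'apple\n' > vailtar_00
printf 'cherry\n' > vailtar_01

if parallel --version 2>/dev/null | grep -q 'GNU parallel'
then
    # One grep job per pattern file; everything the jobs print goes to
    # parallel's stdout, which is redirected once.
    parallel 'grep -f {} infile' ::: vailtar_* > outfile
else
    # Fallback for machines without GNU parallel (runs sequentially).
    for pat1 in vailtar_*
    do
        grep -f "$pat1" infile
    done > outfile
fi

# Jobs may finish in any order, so sort before inspecting the result.
sort outfile
```

If the order of the output should follow the order of the pattern files, GNU parallel's -k (--keep-order) option is worth a look. Note also that a line matching patterns in two different pattern files will appear twice in outfile with this approach.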
