I have a 500 MB file and a 20 MB pattern file. Since grepping the 5-million-line file for 1.2 million patterns was taking too long, I split the pattern file into 100 parts and tried to run grep in parallel over the pattern files, as below:
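The question doesn't say how the pattern file was split. One way to get 100 pieces without cutting a pattern in half is GNU coreutils `split -n l/100`. This is a minimal sketch on toy data; `patterns.txt` is a hypothetical file name, and only the `vailtar_` prefix comes from the question.

```shell
#!/usr/bin/env bash
# Sketch: split a pattern file into 100 pieces at line boundaries.
# "patterns.txt" is a hypothetical name; "vailtar_" is the prefix from the question.
set -euo pipefail

# Toy pattern file with 1,000 lines, for demonstration only.
seq 1 1000 | sed 's/^/pat/' > patterns.txt

# GNU split: -n l/100 writes 100 chunks and never splits a line.
split -n l/100 patterns.txt vailtar_

ls vailtar_* | wc -l
```

The pieces come out as `vailtar_aa`, `vailtar_ab`, and so on, so the shell glob `vailtar_*` visits them in their original order. Older BSD/macOS `split` may not support `-n`.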
for pat1 in vailtar_*
do
parallel --block 75M --pipe grep $pat1 infile >> outfile
done;
But I can't get the output to append to a file. I also tried it without the --block option, and in the following ways:
cat infile | parallel --block 75M --pipe grep $pat1 >> outfile
< infile parallel --block 75M --pipe grep $pat1 >> outfile
Is there any way to make the parallel grep append its output to a file? Thanks in advance.
1 Answer
Perhaps it will work better like this?
for pat1 in vailtar_*
do
parallel --block 75M --pipe grep -f "$pat1" < infile
done > outfile
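That will take all the output from everything inside the for loop and put it in outfile. Because the redirection applies to the whole loop, outfile is opened (and truncated) only once, so there is no need to append with >>.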
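To see the loop-level redirection on its own, here is a minimal sketch on toy data, with plain `grep` standing in for `parallel` so it runs anywhere. The file contents are made up; only the names `infile`, `vailtar_*`, and `outfile` come from the question.

```shell
#!/usr/bin/env bash
# Sketch: a single redirection on the whole loop collects every iteration's output.
# Plain grep stands in for parallel; the file contents are made up.
set -euo pipefail

printf 'apple\nbanana\ncherry\ndate\n' > infile
printf 'apple\n' > vailtar_aa
printf 'cherry\n' > vailtar_ab

for pat1 in vailtar_*
do
    # grep exits 1 when a pattern file matches nothing; "|| true" keeps set -e from aborting.
    grep -f "$pat1" < infile || true
done > outfile

cat outfile
```

Running it twice leaves the same two lines in `outfile`, because the file is truncated once per run rather than appended to on every iteration.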
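Incidentally, I think you meant to use infile as stdin instead of as an argument to grep (with --pipe, parallel feeds the chunks to grep on stdin, but grep ignores stdin when it is given a file name), and I think you meant to have -f $pat1, not just the file name as the pattern. I've fixed both issues in my version.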
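Both points can be checked with plain `grep` on a few made-up lines. In this sketch the file contents are hypothetical; only the names follow the question.

```shell
#!/usr/bin/env bash
# Sketch: why "-f" and reading infile from stdin matter. Toy data only.
set -euo pipefail

printf 'apple\nbanana\nvailtar_aa is a file name\ncherry\n' > infile
printf 'apple\ncherry\n' > vailtar_aa

# 1. Without -f, the file NAME is the pattern: only the line containing
#    the text "vailtar_aa" matches.
grep vailtar_aa < infile > without_f.txt

# 2. With -f, every line of vailtar_aa is a pattern: "apple" and "cherry" match.
grep -f vailtar_aa < infile > with_f.txt

# 3. Given a file operand, grep reads that file and ignores stdin,
#    so "apple pie" (supplied on stdin) is never searched.
grep -f vailtar_aa infile <<< 'apple pie' > file_operand.txt
```

Case 3 is what happens inside `parallel --pipe grep $pat1 infile`: the chunks parallel sends on stdin are never read by grep.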
However, if I were trying to solve this problem I might do it like this:
parallel 'grep -f {} infile' ::: vailtar_*
(I've not tested that.)
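The one-liner above runs one job per pattern file, each reading the whole of infile, and leaves the results on stdout; redirecting parallel's output once collects them in a file. This is a sketch on toy data, assuming GNU parallel (the `parallel` shipped with moreutils uses a different syntax), with a sequential loop as a fallback so it still runs where GNU parallel isn't installed.

```shell
#!/usr/bin/env bash
# Sketch: one grep job per pattern file, all output redirected once.
# Toy data; assumes GNU parallel, with a plain-loop fallback if it is absent.
printf 'apple\nbanana\ncherry\ndate\n' > infile
printf 'apple\n' > vailtar_aa
printf 'date\n' > vailtar_ab

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # -k (--keep-order) prints the jobs' output in argument order.
    # GNU parallel buffers each job's output by default (--group),
    # so lines from different jobs are not interleaved.
    parallel -k 'grep -f {} infile' ::: vailtar_* > outfile
else
    # Sequential fallback producing the same output.
    for f in vailtar_*; do
        grep -f "$f" infile
    done > outfile
fi

cat outfile
```

grep exits with status 1 when a pattern file matches nothing, and parallel counts that as a failed job, so a nonzero exit status from this command does not by itself mean something went wrong.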