How do I find the words from one file in another file?

Time: 2022-10-31 16:05:08

In one text file, I have 150 words. I have another text file, which has about 100,000 lines.

How can I check for each of the words belonging to the first file whether it is in the second or not?

I thought about using grep, but I could not find out how to use it to read each of the words in the original text.

Is there any way to do this using awk? Or another solution?

I tried with this shell script, but it matches almost every line:

#!/usr/bin/env sh
cat words.txt | while read line; do  
    if grep -F "$FILENAME" text.txt
    then
        echo "Se encontró $line"
    fi
done

Another way I found is:

fgrep -w -o -f "words.txt" "text.txt"

2 solutions

#1


5  

You can use fgrep -f:

fgrep -f "first-file" "second-file"

Or, to match whole words only:

fgrep -w -f "first-file" "second-file"
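Note that `fgrep` is an obsolescent spelling of `grep -F`, and recent GNU grep prints a warning when it is used. The commands above print the matching lines of the second file; if what you want is a yes/no answer for each word, a loop along these lines may help. This is only a minimal sketch, assuming one word per line in the first file; the function name `report_words` is made up for illustration, and `-w` is an extension supported by GNU and BSD grep rather than a POSIX option.

```shell
#!/usr/bin/env sh
# report_words WORDS TEXT
# For every non-empty line of WORDS, print "found: <word>" or
# "missing: <word>" depending on whether it occurs in TEXT as a whole word.
report_words() {
    while IFS= read -r word; do
        # Skip empty lines: an empty pattern would match every line.
        [ -n "$word" ] || continue
        # -q: no output, exit status only; -w: whole words; -F: fixed string
        if grep -qwF -- "$word" "$2"; then
            printf 'found: %s\n' "$word"
        else
            printf 'missing: %s\n' "$word"
        fi
    done < "$1"
}
```

It would be called as `report_words words.txt text.txt`. It runs grep once per word, which should be acceptable for 150 words, but it scans the big file 150 times.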

UPDATE: As per the comments:

awk 'FNR==NR{a[$1];next} ($1 in a){delete a[$1]; print $1}' file1 file2
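The awk one-liner stores the first field of each line of file1 as an array key, then prints (once) every stored word that appears as the first field of a line in file2. If the words can appear anywhere on a line, a variant that tests every field might look like the sketch below. The function name `found_words` is hypothetical, fields are split on whitespace, so a word with punctuation attached (such as `dog!!!`) will not match, and the `FNR == NR` idiom assumes the first file is not empty.

```shell
#!/usr/bin/env sh
# found_words WORDS TEXT
# Print each word listed in WORDS (one per line) that occurs as a
# whitespace-separated field anywhere in TEXT, at most once per word.
found_words() {
    awk '
        FNR == NR { want[$1]; next }      # first file: remember each word
        {
            for (i = 1; i <= NF; i++)     # second file: test every field
                if ($i in want) {
                    print $i
                    delete want[$i]       # report each word only once
                }
        }
    ' "$1" "$2"
}
```

Because the second file is read only once, this should scale better than running grep separately for each word.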

#2


2  

Use grep like this:

grep -f firstfile secondfile

SECOND OPTION

Thank you to Ed Morton for pointing out that grep -f treats the words in the file "reserved" as regular-expression patterns. If that is an issue (it may or may not be), the OP could use something like this, which does a plain substring search instead:

File "reserved"

cat
dog
fox

and file "text"

The cat jumped over the lazy
fox but didn't land on the
moon at all.
However it did land on the dog!!!

The awk script is:

awk 'BEGIN{i=0}FNR==NR{res[i++]=$1;next}{for(j=0;j<i;j++)if(index($0,res[j]))print $0}' reserved text

with output:

The cat jumped over the lazy
fox but didn't land on the
However it did land on the dog!!!
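To see why the pattern issue can matter, here is a small sketch comparing plain `grep -f` with `grep -F -f` on a word that contains a regex metacharacter. The temporary files and the variable names `regex_hits` and `fixed_hits` are invented for this illustration only.

```shell
#!/usr/bin/env sh
# Build two tiny throwaway files: one "word" containing a dot, and two lines of text.
tmp=$(mktemp -d)
printf 'a.c\n' > "$tmp/words"
printf 'abc\na.c\n' > "$tmp/text"

# As a regular expression, "." matches any character, so both lines match.
regex_hits=$(grep -c -f "$tmp/words" "$tmp/text")

# With -F the word is a fixed string, so only the literal "a.c" line matches.
fixed_hits=$(grep -c -F -f "$tmp/words" "$tmp/text")

printf 'regex: %s, fixed: %s\n' "$regex_hits" "$fixed_hits"
rm -r "$tmp"
```

So `grep -F -f` is another way to avoid pattern matching, if plain substring search is what is wanted.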

THIRD OPTION

Alternatively, it can be done quite simply, but more slowly, with a loop in bash:

while IFS= read -r r; do grep -- "$r" secondfile; done < firstfile
