Finding matches in two text files

Time: 2022-05-30 11:29:55

I have two files. The first contains records of this type:

77437234:AAAAAA
34434342:BBBBBB
65434343:AAAAAA
99543545:GGGGGG

The second file contains a lot of data (words), some of which also appear in the first file (for example AAAAAA and GGGGGG). I need to find the matching lines in the first file and copy them into a new final file.

(The entire line from the first file needs to be copied.)

1 solution

#1

You can use grep to match against a word file:

$ cat file
77437234:AAAAAA
34434342:BBBBBB
65434343:AAAAAA
99543545:GGGGGG
$ cat words
AAAAAA
GGGGGG
$ grep -Fwf words file 
77437234:AAAAAA
65434343:AAAAAA
99543545:GGGGGG

To save the output in a new file, use redirection:

$ grep -Fwf words file > final

Options:

-w, --word-regexp

Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.

-f FILE, --file=FILE

Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing. (-f is specified by POSIX.)

-F, --fixed-strings

Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX.)
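
To see what -w changes in practice, here is a small sketch. The extra line 11111111:AAAAAAB and the temporary directory are assumptions added for illustration; they are not part of the original data.

```shell
# Hypothetical sample data, written to a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' '77437234:AAAAAA' '11111111:AAAAAAB' '99543545:GGGGGG' > "$tmp/file"
printf '%s\n' 'AAAAAA' 'GGGGGG' > "$tmp/words"

# Without -w, AAAAAA is also found inside AAAAAAB, so all three lines match.
grep -Ff "$tmp/words" "$tmp/file" > "$tmp/loose"

# With -w, the match must be a whole word, so the AAAAAAB line is dropped.
grep -Fwf "$tmp/words" "$tmp/file" > "$tmp/strict"

cat "$tmp/loose"
cat "$tmp/strict"
```

Because of -F, each line of words is taken literally, so characters such as . or * in a word would not be treated as regular expression operators.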

If you want to match against an exact field you could use the following awk script:

$ awk -F: 'NR==FNR{words[$0];next}$2 in words' words file
77437234:AAAAAA
65434343:AAAAAA
99543545:GGGGGG
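
The awk one-liner above may be easier to follow when spread over several lines. The sketch below is the same program with comments added; the temporary directory and the output name final inside it are assumptions for illustration.

```shell
# Recreate the sample files in a temporary directory (hypothetical paths).
tmp=$(mktemp -d)
printf '%s\n' '77437234:AAAAAA' '34434342:BBBBBB' '65434343:AAAAAA' '99543545:GGGGGG' > "$tmp/file"
printf '%s\n' 'AAAAAA' 'GGGGGG' > "$tmp/words"

awk -F: '
    # NR == FNR holds only while the first file (words) is being read:
    # store each whole line as an array key, then move on to the next line.
    NR == FNR { words[$0]; next }
    # For the second file, print lines whose second field is a stored key.
    $2 in words
' "$tmp/words" "$tmp/file" > "$tmp/final"

cat "$tmp/final"
```

Unlike grep -w, this compares the whole second field, so AAAAAA would not match a line such as 1:AAAAAA-2, which grep -Fw would still select.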
