How can I use awk to test whether a column value is in another file?

Date: 2021-05-14 19:30:28

I want to do something like

if ($2 in another file) { print $0 }

So say I have file A.txt which contains

aa
bb
cc

and I have B.txt, which contains

00,aa
11,bb
00,dd

I want to print

00,aa
11,bb

How do I test that in awk? I am not familiar with the tricks of processing two files at a time.

4 solutions

#1


2  

You could use something like this:

awk -F, 'NR == FNR { a[$0]; next } $2 in a' A.txt B.txt

This saves each line from A.txt as a key in the array a and then prints any lines from B.txt whose second field is in the array.

NR == FNR is the standard way to target the first file passed to awk, as NR (the total record number) is only equal to FNR (the record number for the current file) for the first file. next skips to the next record so the $2 in a part is never reached until the second file.
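
To see why NR == FNR only holds for the first file, it can help to print both counters for every record. The following is a minimal sketch that recreates the sample files from the question in a scratch directory (the directory and the variable names trace and result are only for illustration) and then runs the one-liner from this answer:

```shell
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' aa bb cc > "$dir/A.txt"
printf '%s\n' 00,aa 11,bb 00,dd > "$dir/B.txt"

# Print file name, NR and FNR for every record to show where they diverge.
# The cd happens inside the command substitution, so it does not leak out.
trace=$(cd "$dir" && awk '{ print FILENAME, NR, FNR }' A.txt B.txt)
printf '%s\n' "$trace"

# The lookup itself: keys come from A.txt, B.txt is filtered on field 2.
result=$(awk -F, 'NR == FNR { a[$0]; next } $2 in a' "$dir/A.txt" "$dir/B.txt")
printf '%s\n' "$result"

rm -r "$dir"
```

In the trace, the two counters match on the three A.txt records (1 1, 2 2, 3 3) and then diverge on B.txt (4 1, 5 2, 6 3), which is why the first block only runs for A.txt.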

#2


1  

There seem to be two schools of thought on the matter. Some prefer to use the BEGIN-based idiom, and others the FNR-based idiom.

Here's the essence of the former:

awk -v infile=INFILE '
  BEGIN { while( (getline < infile)>0 ) { .... } }
  ... '
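
Applied to the files from the question, the BEGIN-based idiom might look something like the sketch below. The names keys, line, seen and getline_result are hypothetical and chosen just for this example; the elided parts of the outline above could of course be filled in differently.

```shell
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' aa bb cc > "$dir/A.txt"
printf '%s\n' 00,aa 11,bb 00,dd > "$dir/B.txt"

# keys is a hypothetical variable name holding the path of the lookup file.
getline_result=$(awk -F, -v keys="$dir/A.txt" '
  BEGIN {
    # Load every line of the lookup file as an array key before input starts.
    while ((getline line < keys) > 0) seen[line]
    close(keys)
  }
  # Only B.txt is regular input; print records whose field 2 was loaded above.
  $2 in seen
' "$dir/B.txt")
printf '%s\n' "$getline_result"

rm -r "$dir"
```

Because the lookup file is read entirely in BEGIN, the main rule never has to distinguish between files, at the cost of passing the file name in through a variable.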

For the latter, just search for:

awk 'FNR==NR'

#3


1  

An alternative with join, if both files are sorted on the join field:

$ join -t, -1 1 -2 2 -o2.1,2.2 file1 file2

00,aa
11,bb

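This sets the delimiter to a comma and joins the first field of the first file with the second field of the second file; here file1 is A.txt and file2 is B.txt. The -o option prints the second file's fields in their original order, since by default join would print the join field first. If the files are not sorted you need to sort them first, but then awk might be a better choice.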
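
For input that is not already sorted, the sorting step could be sketched as follows, using the sample files from the question. The intermediate file names A.sorted and B.sorted and the variable join_result are made up for this example.

```shell
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' aa bb cc > "$dir/A.txt"
printf '%s\n' 00,aa 11,bb 00,dd > "$dir/B.txt"

# join needs both inputs ordered on their join field, so sort each one first:
# A.txt on the whole line, B.txt on its second comma-separated field.
sort "$dir/A.txt" > "$dir/A.sorted"
sort -t, -k2,2 "$dir/B.txt" > "$dir/B.sorted"

join_result=$(join -t, -1 1 -2 2 -o 2.1,2.2 "$dir/A.sorted" "$dir/B.sorted")
printf '%s\n' "$join_result"

rm -r "$dir"
```

Note that the output then follows the sort order of the join field rather than the original line order of B.txt, which is one more reason awk may be the simpler choice.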

#4


-1  

Another way to do it:

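   awk -F, -v file_name=A.txt '{ if (system("grep -qxF -- \"" $2 "\" " file_name) == 0) print $0 }' B.txt

The -x and -F options make grep match the whole line as a fixed string, so a value such as a does not match the line aa. This runs one grep process per line of B.txt.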
