Ignore the first line but still print it during pattern matching with AWK

Time: 2022-06-01 22:07:23

I have a straightforward problem. If an element in the first column (ID) of "file" matches an element in the first column (ID) of "subfile", the matching element of "file" should be replaced by the corresponding element in the second column (i.e., Symbol) of "subfile".

The following code works for the data rows, but it treats the first header name (A), which sits above the second column, as the first field of the line. As a result, A is dropped from the final output during matching, and the remaining header names shift one cell to the left, leaving the last header cell blank.

I presume a possible solution would be to ignore the first row. Any suggestions, please?

awk 'FNR==NR {a[$1]=$2;next} {$1=a[$1]}1' OFS="\t" subfile file
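The symptom follows from awk's default field splitting: leading blanks are skipped, so on the header line the first field is "A" rather than an empty ID column. A minimal sketch, assuming a POSIX sh and awk, that shows this on the header line alone (the `|` OFS is used here only to make the emptied field visible; the question uses a tab):

```shell
#!/bin/sh
set -eu
# Header line of the question's "file": leading spaces, then three names.
header='             A               B                C'

# Default FS ignores leading blanks and splits on runs of blanks,
# so the header has 3 fields and $1 is "A".
nf=$(printf '%s\n' "$header" | awk '{ print NF }')
first=$(printf '%s\n' "$header" | awk '{ print $1 }')
echo "NF=$nf first=$first"    # NF=3 first=A

# a["A"] was never set, so $1 becomes empty and the record is rebuilt
# with OFS; "A" disappears and B, C each move one field to the left.
rebuilt=$(printf '%s\n' "$header" | awk 'BEGIN { OFS = "|" } { $1 = a[$1] } 1')
echo "$rebuilt"               # |B|C
```

With OFS set to a tab, the same rebuild produces an empty first cell followed by B and C, which is the shifted header shown under "Output Obtained".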

file

             A               B                C
204639_at    1.4063964497   1.9690376378    -0.5856006063
209027_s_at -0.6184167971  -0.3803235873     0.6532643621
224864_at    0.9290801469   0.0020026866    -1.2993653537
224637_at    0.4688503882  -0.137487333     -0.453195703
226482_s_at -0.0615034202   0.4300315287    -0.6852205341

subfile

204639_at   ADA
209027_s_at ABI1
224864_at   SRA1
224637_at   OST4
226482_s_at TSTD1

Output Obtained:

      B             C   
ADA   1.4063964497  1.9690376378  -0.5856006063
ABI1 -0.6184167971 -0.3803235873   0.6532643621
SRA1  0.9290801469  0.0020026866  -1.2993653537
OST4  0.4688503882 -0.137487333   -0.453195703
TSTD1 -0.0615034202 0.4300315287  -0.6852205341

Output Needed:

      A              B                C
ADA   1.4063964497  1.9690376378  -0.5856006063
ABI1 -0.6184167971 -0.3803235873   0.6532643621
SRA1  0.9290801469  0.0020026866  -1.2993653537
OST4  0.4688503882 -0.137487333   -0.453195703
TSTD1 -0.0615034202 0.4300315287  -0.6852205341

2 Solutions

#1

This replaces the first field only when it has a match in subfile; if there is no match (as with the header line), the line is printed unchanged.

awk 'FNR==NR {a[$1]=$2;next} a[$1]{$1=a[$1]}1' OFS="\t" subfile file
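A minimal end-to-end sketch of this answer, assuming a POSIX sh and awk and using two rows of the question's sample data (the temp directory is illustrative). It swaps the answer's `a[$1]` test for `($1 in a)`, a common variant that checks key membership without creating empty array entries and would still match if a Symbol were "0"; for the question's data both behave the same.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' \
    '             A               B                C' \
    '204639_at    1.4063964497   1.9690376378    -0.5856006063' \
    '209027_s_at -0.6184167971  -0.3803235873     0.6532643621' > "$dir/file"
printf '%s\n' '204639_at   ADA' '209027_s_at ABI1' > "$dir/subfile"

# Pass 1 (FNR==NR, reading subfile): store ID -> Symbol.
# Pass 2 (reading file): rewrite $1 only when the ID is a stored key.
# The header's first field is "A", which is not a key, so $0 is never
# rebuilt and the header keeps its original spacing when printed by 1.
awk 'FNR==NR { a[$1] = $2; next } ($1 in a) { $1 = a[$1] } 1' OFS="\t" \
    "$dir/subfile" "$dir/file" > "$dir/out"
cat "$dir/out"
```

Because the header is printed as-is, its spacing will not line up exactly with the tab-separated data rows; only rewritten rows use OFS.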

#2

I don't understand your question at all, and I can't even see GSM155673 anywhere in it. However, if, as you suggest, ignoring the first line of one of your input files will help, you could try this to delete line 1:

awk '{...}' subfile <(sed 1d file)
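Note that deleting line 1 also removes the header from the output, while the question wants it kept. A minimal sketch, assuming a POSIX sh, that prints the header first with `head` and then feeds only the data rows to the question's original awk program. Instead of bash process substitution it pipes `sed 1d` into awk and passes `-` as the second file operand, which POSIX awk reads from standard input; this is a swapped-in variant of the answer's command, not the answer itself.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' \
    '             A               B                C' \
    '204639_at    1.4063964497   1.9690376378    -0.5856006063' \
    '209027_s_at -0.6184167971  -0.3803235873     0.6532643621' > "$dir/file"
printf '%s\n' '204639_at   ADA' '209027_s_at ABI1' > "$dir/subfile"

{
    # Print the header untouched...
    head -n 1 "$dir/file"
    # ...then give awk only the data rows; "-" makes stdin the second
    # input, so FNR==NR still identifies subfile.
    sed 1d "$dir/file" |
        awk 'FNR==NR { a[$1] = $2; next } { $1 = a[$1] } 1' OFS="\t" "$dir/subfile" -
} > "$dir/out"
cat "$dir/out"
```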

Or, if you don't like bash process substitution, you can ignore it within awk by adding this as the very first part of your script:

FNR==1 && NR>1 {print; next}
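A sketch of the complete command with that rule placed first, assuming a POSIX sh and awk and one row of the question's sample data. The `NR>1` guard skips line 1 of subfile (the first file read), so only the first line of "file" is passed through unchanged.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' \
    '             A               B                C' \
    '224864_at    0.9290801469   0.0020026866    -1.2993653537' > "$dir/file"
printf '%s\n' '224864_at   SRA1' > "$dir/subfile"

# Rule order matters: the header rule must run before the lookup rule.
awk '
    FNR==1 && NR>1 { print; next }        # line 1 of the 2nd file: pass through
    FNR==NR        { a[$1] = $2; next }   # subfile: build ID -> Symbol map
                   { $1 = a[$1] }         # data rows: swap ID for Symbol
    1                                     # print every remaining record
' OFS="\t" "$dir/subfile" "$dir/file" > "$dir/out"
cat "$dir/out"
```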
