I have a file like the following (but with 52 columns and 4,000 rows):
1NA2 1NB2 2RA2 2RB2
Vibrionaceae 0.22 0.25 0.36 1.02
Bacillaceae 2.0 1.76 0.55 0.23
Enterobacteriaceae 0.55 0.52 2.40 1.23
Vibrionaceae 0.22 0.25 0.36 1.02
Bacillaceae 2.0 1.76 0.55 0.23
Enterobacteriaceae 0.55 0.52 2.40 1.23
And I want it to look like this:
1NA2 1NB2 2RA2 2RB2
Vibrionaceae 0.44 0.50 0.72 2.04
Bacillaceae 4.0 3.52 1.10 0.46
Enterobacteriaceae 1.10 1.04 4.80 2.46
edit: I'm sorry, I don't want to delete the remaining rows and columns. Every row name is repeated several times, so I want each name to appear only once, with the total for every column. I have tried the following:
awk '{a[$1]+=$2}END{for(i in a) print i,a[i]}' file
but it only sums the first data column ($2), and I want it to work for all 52 columns.
1 solution
#1
4
With GNU awk and a 2D array:
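awk 'NR==1                      # print the header line unchanged
NR>1{
  for(i=2; i<=NF; i++){
    a[$1][i]+=$i                # running total per row name and column
  }
}
END{
  for(i in a){                  # row order is unspecified
    printf("%-19s", i)
    for(j=2; j<=NF; j++){       # NF still holds the last record's field count
      printf("%.2f ", a[i][j])
    }
    print ""
  }
}' file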
or as a one-liner:
awk 'NR==1; NR>1{for(i=2; i<=NF; i++){a[$1][i]+=$i}} END{for(i in a){printf("%-19s", i); for(j=2; j<=NF; j++){printf("%.2f ", a[i][j])} print ""}}' file
Output:
1NA2 1NB2 2RA2 2RB2
Bacillaceae        4.00 3.52 1.10 0.46
Vibrionaceae       0.44 0.50 0.72 2.04
Enterobacteriaceae 1.10 1.04 4.80 2.46
NR is the line number.
NF is the number of fields in a row.
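Arrays of arrays (`a[$1][i]`) are a GNU awk extension, so the commands above fail on awks that lack them. If gawk is not available, the same per-name, per-column totals can be sketched in POSIX awk with a composite key (`sum[$1, i]`). This is only a sketch: the function name `sum_by_name` and the array names are made up for illustration, and it assumes whitespace-separated fields with a single header line. Unlike `for (i in a)`, it prints row names in first-seen order.

```shell
# Hypothetical helper (not from the original answer): sum every numeric
# column per row name using only POSIX awk features.
sum_by_name() {
  awk '
    NR == 1 { print; next }            # pass the header line through unchanged
    {
      if (!($1 in seen)) {             # remember row names in first-seen order
        seen[$1] = 1
        order[++n] = $1
      }
      if (NF > maxnf) maxnf = NF       # track the widest data row
      for (i = 2; i <= NF; i++)
        sum[$1, i] += $i               # composite key: name SUBSEP column
    }
    END {
      for (k = 1; k <= n; k++) {
        printf "%s", order[k]
        for (i = 2; i <= maxnf; i++)
          printf " %.2f", sum[order[k], i]
        print ""
      }
    }
  ' "$1"
}
```

Call it as `sum_by_name file`. The output is space-separated without column padding; piping it through `column -t` is one way to align it where that utility is installed.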