Suppose I have a row like this:
LOCUS NG_052676 31180 bp DNA linear PRI 08-AUG-2017
It is selected with match($0, /LOCUS\s*([^\n]*)/, o)
and printed with print o[1]
But this captures and prints the rest of the row, because [^\n]* matches everything up to the newline, whitespace included:
NG_052676 31180 bp DNA linear PRI 08-AUG-2017
How can I capture the first two strings into an array o, such that o[1] = NG_052676 and o[2] = 31180?
NB: I don't want to change the FS variable, as it is being used for something else.
NB2: This is the entire awk script I am using:
BEGIN { RS = "//"; FS = OFS = "|" }
{
    # The 3-argument match() (GNU awk) stores capture groups in the named array.
    match($0, /LOCUS\s*([^\n]*)/, o)
    match($0, /\(([^)]+)\)/, a)
    match($0, /\/gene="([^"]+)"/, b)
    match($0, /\/product="([^"]+)"/, c)
    match($0, /\/chromosome="([^"]+)"/, d)
    match($0, /\/map="([^"]+)"/, e)
    match($0, /Summary:\s([^\[]+)/, f)
    # gensub() with "g" collapses every run of whitespace to a single space.
    print o[1] " ", a[1] " ", b[1] " ", gensub(/\s\s+/, " ", "g", c[1]) " ",
          d[1] " ", e[1] " ",
          gensub(/\s\s+/, " ", "g", f[1])
}
2 solutions
#1
1
With GNU awk (which you're already using) for the 3rd arg to match():
$ awk 'match($0, /LOCUS\s+(\S+)\s+(\S+)/, o) { print o[1], o[2] }' file
NG_052676 31180
#2
1
Since awk uses whitespace as FS by default, why not consider the simplest awk form?
$ f1="LOCUS NG_052676 31180 bp DNA linear PRI 08-AUG-2017"
$ awk '{o[1]=$2;o[2]=$3}{print o[1],o[2]}' <(echo "$f1")
NG_052676 31180
You can still combine it with your regex:
$ awk '/LOCUS/{o[1]=$2;o[2]=$3;print o[1],o[2]}' <(echo "$f1")
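If FS has to stay as "|" (as the question's NB says), the same whitespace-splitting idea can probably be kept by calling split() with an explicit " " separator on just the LOCUS line. This is a sketch of that variation rather than part of the answer above; the array name w is an arbitrary choice, and it uses only the 2-argument match(), so it should not need GNU awk.

```shell
f1="LOCUS NG_052676 31180 bp DNA linear PRI 08-AUG-2017"

out=$(printf '%s\n' "$f1" | awk '
BEGIN { FS = OFS = "|" }              # FS stays as in the question
match($0, /LOCUS[^\n]*/) {
    # Split only the matched LOCUS line on runs of blanks.
    # The explicit " " separator means FS is not used here.
    n = split(substr($0, RSTART, RLENGTH), w, " ")
    if (n >= 3) {                     # w[1] is the word LOCUS itself
        o[1] = w[2]; o[2] = w[3]
        print o[1], o[2]
    }
}')
printf '%s\n' "$out"
```

The two values are joined by OFS, so this prints them with a "|" between them instead of a space.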