I have a file on which I am trying to use awk to remove the text before the (), but keep the text inside the (). I am also trying to remove the whitespace and text after the _# and then output the entire line. Maybe sed is a better choice, but I am not certain how to use it here.
file
chr4 100009839 100009851 426_1201_128(ADH5)_1 0 -
chr4 100006265 100006367 426_1202_128(ADH5)_2 0 -
chr4 100003125 100003267 426_1203_128(ADH5)_3 0 -
desired output
chr4 100009839 100009851 ADH5_1
chr4 100006265 100006367 ADH5_2
chr4 100003125 100003267 ADH5_3
awk
awk -F'()_*' '{print $1,$2,$3,$4}' file
2 solutions
#1
awk -F'[\t()]' -v OFS='\t' '{print $1, $2, $3, $5 $6}' file
Output:
chr4 100009839 100009851 ADH5_1
chr4 100006265 100006367 ADH5_2
chr4 100003125 100003267 ADH5_3
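The command above splits on tabs, so it relies on the file being tab-delimited. If the columns might be separated by spaces instead, a variant that uses awk's default whitespace splitting and edits the fourth field with sub() may be more forgiving. This is only a sketch under that assumption; the temporary file and the variable names input and result are made up for the illustration.

```shell
# Sample input from the question, written to a temporary file (hypothetical path).
input=$(mktemp)
printf '%s\n' \
  'chr4 100009839 100009851 426_1201_128(ADH5)_1 0 -' \
  'chr4 100006265 100006367 426_1202_128(ADH5)_2 0 -' \
  'chr4 100003125 100003267 426_1203_128(ADH5)_3 0 -' > "$input"

# Default field splitting treats runs of spaces or tabs as separators.
# First sub(): drop everything in field 4 up to and including "(".
# Second sub(): drop the ")" that follows the gene name.
result=$(awk '{ sub(/^[^(]*\(/, "", $4); sub(/\)/, "", $4); print $1, $2, $3, $4 }' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

The output here is space-separated; adding -v OFS='\t' to the awk call should give tab-separated output instead.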
#2
Using sed with a substitution:
$ sed 's/[^ ]*(\([^)]*\))\(_[^ ]*\).*$/\1\2/' infile
chr4 100009839 100009851 ADH5_1
chr4 100006265 100006367 ADH5_2
chr4 100003125 100003267 ADH5_3
Taking apart the regex:
[^ ]*( # Non-spaces up to and including opening parenthesis
\( # Start first capture group
[^)]* # Content between parentheses: everything but a closing parenthesis
\) # End of first capture group
) # Closing parenthesis, not captured
\( # Start second capture group
_[^ ]* # Underscore and non-spaces, '_1' etc.
\) # End of second capture group
.*$ # Rest of line, not captured
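The same substitution can probably be written with extended regular expressions (sed -E), where capture groups are plain ( ) and literal parentheses are escaped. The sketch below also swaps [^ ] for [^[:space:]] so that tab-separated lines are handled too; that change is my assumption about the input, not part of the original answer, and the tab-separated sample line is invented for the illustration.

```shell
# One space-separated and one tab-separated sample line
# (the tab-separated variant is an assumption for illustration).
result=$(printf 'chr4 100009839 100009851 426_1201_128(ADH5)_1 0 -\nchr4\t100006265\t100006367\t426_1202_128(ADH5)_2\t0\t-\n' |
  sed -E 's/[^[:space:]]*\(([^)]*)\)(_[^[:space:]]*).*/\1\2/')

# \1 is the text inside the parentheses, \2 is the "_N" suffix;
# everything after the suffix is matched by .* and dropped.
printf '%s\n' "$result"
```

Because the match starts at the fourth column, the first three columns and their original separators are left untouched.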