I have a file such as
head testSed.fastq
@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:NGTCACTN+TATCCTCTCTTGAAGA
NGTCACTN
+
#>AAAAF#
@M01551:51:000000000-BCB7H:1:1101:15605:1331 1:N:0:NATCAGCN+TAGATCGCCAAGTTAA
NATCAGCN
+
#>>AA?C#
@M01551:51:000000000-BCB7H:1:1101:15557:1332 1:N:0:NCAGCAGN+TATCTTCTATAAATAT
NCAGCAGN
I am trying to use a regular expression to replace the string after the final colon with 0
(lines 1, 5, and 9 in this example, but on every such line in the file).
I have checked my regex using egrep:
egrep '[ATGCN]{8}\+[ATGCN]{16}$' testSed.fastq
which returns all the lines I would expect.
However, when I try to use
sed -i 's/[ATGCN]{8}\+[ATGCN]{16}$/0/g' testSed.fastq
the original file is unchanged and no replacement occurs.
How can I fix this? Is my regex not specific enough?
2 solutions
#1
1
Your regex is written as an ERE (extended regular expression), but sed interprets patterns as BRE (basic regular expressions) by default. Not all sed implementations support ERE, but you can check man sed in your environment to see whether yours does: look for the -r or -E options. Alternatively, you can keep using bounds in BRE by preceding the curly braces with backslashes.
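As a rough sketch of both options, assuming a sed that accepts -E (GNU and BSD sed do; check your man page): in BRE the unescaped braces in the original pattern are matched as literal text, and GNU sed additionally treats \+ as a one-or-more quantifier, so nothing matches. The scratch file and variable names below are only for illustration.

```shell
# Scratch file holding one header line and one sequence line from the question.
tmp=$(mktemp)
printf '%s\n' '@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:NGTCACTN+TATCCTCTCTTGAAGA' 'NGTCACTN' > "$tmp"

# ERE: braces are interval operators and \+ is a literal plus sign.
ere_out=$(sed -E 's/[ATGCN]{8}\+[ATGCN]{16}$/0/' "$tmp")

# BRE: braces must be escaped to act as intervals; a bare + is already literal.
bre_out=$(sed 's/[ATGCN]\{8\}+[ATGCN]\{16\}$/0/' "$tmp")

printf '%s\n' "$ere_out"
rm -f "$tmp"
```

Both forms should turn the header's last field into 0 and leave the sequence line alone.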
That said, rather than matching the precise text in the last field, why not just look for the string that starts with a colon and contains no further colons before the end of the line? The following RE is both BRE and ERE compatible.
$ sed '/^@/s/:[^:]*$/:0/' testq
@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:0
NGTCACTN
+
#>AAAAF#
@M01551:51:000000000-BCB7H:1:1101:15605:1331 1:N:0:0
NATCAGCN
+
#>>AA?C#
@M01551:51:000000000-BCB7H:1:1101:15557:1332 1:N:0:0
NCAGCAGN
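The command above only prints to standard output. Since the question used -i, here is a minimal sketch of the same substitution applied in place, assuming a sed that supports -i with an attached backup suffix (both GNU and BSD sed accept this form). The scratch file and variable names are illustrative.

```shell
# Scratch file with one full record from the question.
fq=$(mktemp)
printf '%s\n' '@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:NGTCACTN+TATCCTCTCTTGAAGA' 'NGTCACTN' '+' '#>AAAAF#' > "$fq"

# Edit in place; the attached suffix keeps the untouched original as "$fq.bak".
sed -i.bak '/^@/s/:[^:]*$/:0/' "$fq"

edited=$(cat "$fq")
backup_header=$(head -n 1 "$fq.bak")
printf '%s\n' "$edited"
rm -f "$fq" "$fq.bak"
```

Keeping the .bak copy makes it easy to compare against the original before deleting it.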
#2
2
Do you need a regex for this?
awk -F: -v OFS=: '/^@/ {$NF = "0"} 1' testfile
That won't save in place. If you have GNU awk, you can use its inplace extension:
gawk -F: -v OFS=: -i inplace '...' file
ref: https://www.gnu.org/software/gawk/manual/html_node/Extension-Sample-Inplace.html
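If GNU awk is not available, one common workaround is to write to a temporary file and move it over the original. This is only a sketch: the scratch file and variable names are illustrative, and because mv replaces the file, the original's permissions are not carried over.

```shell
# Scratch copy with one header line and one sequence line from the question.
file=$(mktemp)
printf '%s\n' '@M01551:51:000000000-BCB7H:1:1101:15605:1331 1:N:0:NATCAGCN+TAGATCGCCAAGTTAA' 'NATCAGCN' > "$file"

# Write the edited output to a temporary file, then replace the original
# only if awk exited successfully.
tmp=$(mktemp)
awk -F: -v OFS=: '/^@/ {$NF = "0"} 1' "$file" > "$tmp" && mv "$tmp" "$file"

result=$(cat "$file")
printf '%s\n' "$result"
rm -f "$file"
```

Assigning to $NF makes awk rebuild the record with OFS, which is why OFS is set to a colon as well.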