This is an excerpt from the file I want to edit:
>chr1|-|9|S|somatic ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG >chr1|+|9|Y|somatic ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG
I would like a new text file in which a line break is added before ">" and after "somatic" or "germline". How can I do this in R or Unix?
Expected output:
>chr1|-|9|S|somatic
ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG
>chr1|+|9|Y|somatic
ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG
3 Answers
#1
1
By the looks of your input, you could simply replace spaces with newlines:
tr -s ' ' '\n' <infile >outfile
(Some tr dialects don't like \n. Try '\012' or a literal newline: opening quote, newline, closing quote.)
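As a quick sanity check, here is a minimal sketch of the tr approach using the octal escape, which should work even where \n is not understood. The sequences are shortened stand-ins for the ones in the question, and the variable names are only for illustration.

```shell
# Shortened stand-in for the one-line input from the question
input='>chr1|-|9|S|somatic ACCACAGC >chr1|+|9|Y|somatic ACCACAGC'

# Turn every space into a newline; -s squeezes the repeated newlines
# that a run of several spaces would otherwise produce
output=$(printf '%s\n' "$input" | tr -s ' ' '\012')

printf '%s\n' "$output"
```

Note that this only works because headers and sequences contain no spaces of their own; a header with an embedded space would be split too.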
If that won't work, you can easily do this in sed. If somatic is static, just hard-code it:
sed -e 's/somatic */somatic\n/g' -e 's/ >/\n>/g' file >newfile
The usual caveats about different sed dialects apply. Some versions don't like \n for newline, some want a newline or a semicolon instead of multiple -e arguments.
On Linux, you can modify the file in-place:
sed -i 's/somatic */somatic\
/g
s/ >/\
>/g' file
(For variation, I'm showing how to do this if your sed doesn't recognize \n but allows literal newlines, and how to put the script in a single multi-line string.)
On *BSD (including macOS) you always need to pass an argument to -i: sed -i '' ...
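If you would rather not depend on how a particular sed treats -i, one option is to write to a temporary file and move it over the original. This is only a sketch: the working directory, file name and shortened sequences below are made up for the demonstration.

```shell
# Hypothetical working file created just for this demonstration
workdir=$(mktemp -d)
file="$workdir/records.txt"
printf '%s\n' '>chr1|-|9|S|somatic ACCACAGC >chr1|+|9|Y|somatic ACCACAGC' >"$file"

# Write the result to a temporary file next to the original,
# and replace the original only if sed succeeded
tmp=$(mktemp "$workdir/tmp.XXXXXX")
sed -e 's/somatic */somatic\
/g' -e 's/ >/\
>/g' "$file" >"$tmp" && mv "$tmp" "$file"

cat "$file"
```

The backslash followed by a literal newline is used instead of \n so the same script should behave alike on GNU and BSD sed.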
If somatic is variable, but you always want to replace the first space after a wedge, try something like
sed 's/\(>[^ ]*\) /\1\n/g'
>[^ ]* matches a wedge (>) followed by zero or more non-space characters. The parentheses capture the matched string into \1. Again, some sed variants don't want backslashes in front of the parentheses, or are otherwise just ... different.
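To illustrate the capture-group idea on input where the last header field varies, here is a small sketch that combines it with the " >" substitution from earlier. The function name split_fasta, the shortened sequences and the "germline" record are assumptions made for the example; literal newlines are used in place of \n for portability.

```shell
# Sample line; the second header deliberately ends in "germline"
input='>chr1|-|9|S|somatic ACCACAGC >chr1|+|9|Y|germline ACCACAGC'

# Hypothetical helper: break after each header, then before each following ">"
split_fasta() {
  sed -e 's/\(>[^ ]*\) /\1\
/g' -e 's/ >/\
>/g'
}

output=$(printf '%s\n' "$input" | split_fasta)
printf '%s\n' "$output"
```

Because the pattern keys on the wedge rather than on a fixed word, it does not need to know whether a header ends in somatic or germline.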
If you have very long lines, you might bump into a sed which has problems with that. Maybe try Perl instead. (Luckily, no dialects to worry about!)
perl -i -pe 's/(>[^ ]*) /$1\n/g;s/ >/\n>/g' file
(Skip the -i option if you don't want to modify the input file. Then output will be to standard output.)
#2
2
(\bsomatic\b|\bgermline\b)|(?=>)
Try this. Replace by $1\n. See demo.
http://regex101.com/r/tF5fT5/53
If there's no support for lookahead then try
(\bsomatic\b|\bgermline\b)
Try this. Replace by $1\n. See demo.
http://regex101.com/r/tF5fT5/50
and
(>)
Replace by \n$1. See demo.
http://regex101.com/r/tF5fT5/51
#3
0
Thank you everyone! I used:
tr -s ' ' '\n' <infile >outfile
as suggested by tripleee and it worked perfectly!