How do I add line breaks before and after a regex match in a text file?

This is an excerpt from the file I want to edit:

>chr1|-|9|S|somatic ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG >chr1|+|9|Y|somatic ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG

I would like to create a new text file with a line break inserted before ">" and after "somatic" or "germline". How can I do this in R or Unix?

Expected output:

>chr1|-|9|S|somatic
ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG
>chr1|+|9|Y|somatic
ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG

3 solutions

#1

By the looks of your input, you could simply replace spaces with newlines:

tr -s ' ' '\n' <infile >outfile

(Some tr dialects don't like \n. Try '\012' or a literal newline: opening quote, newline, closing quote.)
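
To try the tr approach before running it on a real file, a small sketch like the following may help. The sample string and the function name split_on_spaces are made up for illustration. It uses '\012' rather than '\n' so that it should also work with tr versions that reject '\n'.

```shell
# Hypothetical two-record sample (sequences shortened for readability).
sample='>chr1|-|9|S|somatic ACCACAGC >chr1|+|9|Y|somatic ACCACAGC'

# Translate every space to a newline; -s then squeezes runs of
# newlines into one, so repeated spaces do not create empty lines.
split_on_spaces() {
    tr -s ' ' '\012'
}

printf '%s\n' "$sample" | split_on_spaces
```

Because -s squeezes every run of newlines, blank lines already present in the input would be removed as well. That is harmless for a file like the one in the question, but worth knowing.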

If that won't work, you can easily do this in sed. If somatic is static, just hard-code it:

sed -e 's/somatic */&\n/g' -e 's/ >/\n>/g' file >newfile

The usual caveats about different sed dialects apply. Some versions don't like \n for newline, some want a newline or a semicolon instead of multiple -e arguments.
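
Since the question mentions "germline" as well as "somatic", here is a sketch that hard-codes both keywords and avoids \n entirely by using backslash-newline in the replacement, which should work in any POSIX sed. The function name add_breaks is only illustrative, and the sample line is invented.

```shell
# Reads records on stdin and writes the reformatted text to stdout.
# Basic regular expressions have no portable alternation, so each
# keyword gets its own substitution.
add_breaks() {
    sed -e 's/somatic */somatic\
/g' -e 's/germline */germline\
/g' -e 's/  *>/\
>/g'
}

printf '%s\n' '>chr1|-|9|S|somatic ACGT >chr2|+|3|Y|germline TTGA' | add_breaks
```

The last expression requires at least one space before the ">", so a ">" at the very start of a line does not get an extra empty line in front of it.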

On Linux, you can modify the file in-place:

sed -i 's/somatic */&\
/g
s/ >/\
>/g' file

(For variation, I'm showing how to do this if your sed doesn't recognize \n but allows literal newlines, and how to put the script in a single multi-line string.)

On *BSD (including macOS), -i always requires an argument: sed -i '' ...

If somatic is variable, but you always want to replace the first space after a wedge, try something like

sed 's/\(>[^ ]*\) /\1\n/g'

>[^ ]* matches a wedge followed by zero or more non-space characters. The parentheses capture the matched string into \1. Again, some sed variants don't want backslashes in front of the parentheses, or are otherwise just ... different.
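
As an illustration of the "no backslashes" variant, the following sketch uses extended regular expressions via -E, which GNU and BSD sed both accept (assumed here; very old seds may not). The function name split_headers and the "unknown" label in the sample are hypothetical; the point is that no keyword is hard-coded.

```shell
# With -E the grouping parentheses are written without backslashes
# and + means "one or more". The newline in each replacement is a
# literal backslash-newline for portability.
split_headers() {
    sed -E -e 's/(>[^ ]*) +/\1\
/g' -e 's/ +>/\
>/g'
}

printf '%s\n' '>chr1|-|9|S|unknown ACGT >chr2|+|3|Y|germline TTGA' | split_headers
```

The first expression breaks the line after each header, whatever its last field is; the second puts every following ">" on a line of its own.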

If you have very long lines, you might bump into a sed which has problems with that. Maybe try Perl instead. (Luckily, no dialects to worry about!)

perl -i -pe 's/(>[^ ]*) /$1\n/g;s/ >/\n>/g' file

(Skip the -i option if you don't want to modify the input file. Then output will be to standard output.)

#2

(\bsomatic\b|\bgermline\b)|(?=>)

Try this, replacing each match with $1\n. See the demo:

http://regex101.com/r/tF5fT5/53

If there's no support for lookahead then try

(\bsomatic\b|\bgermline\b)

Replace each match with $1\n. See the demo:

http://regex101.com/r/tF5fT5/50

and

(>)

Replace each match with \n$1. See the demo:

http://regex101.com/r/tF5fT5/51

#3

Thank you everyone! I used:

tr -s ' ' '\n' <infile >outfile

as suggested by tripleee and it worked perfectly!
