Regex to insert newlines at specific positions in a large block of text

Date: 2022-02-05 11:13:24

I have a rather large text file that has a bunch of missing newlines, meaning that it's a mess. I need to break it up into appropriate lines.

The text looks something like this now:

12345 This is a chunk 23456 This is another chunk 34567 This is yet another chunk 45678 This is yet more chunk 56789 Yet another piece of text

I need a regex that will insert a newline (CR/LF pair) before each group of five digits, resulting in something like this:

12345 This is a chunk 
23456 This is another chunk 
34567 This is yet another chunk 
45678 This is yet more chunk 
56789 Yet another piece of text

It can insert one before the first group of digits or not; that I can deal with.

Any ideas? Thanks.

2 solutions

#1


Very simple (but not as "flashy" as possible, since I'm too lazy to use lookaheads):

s/(\d{5})/\r\n\1/gs

#2


s/(?<=\D)(\d{5})(?=\D|$)/\n\1/g
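For anyone doing this in Python rather than Perl, the same idea can be sketched with `re.sub`. This is only an illustration, not the answerer's code: the helper name `break_before_ids` is made up, and it swaps `(?<=\D)` and `(?=\D|$)` for the negative lookarounds `(?<!\d)` and `(?!\d)` so that a group at the very start of the text matches too. The stray leading newline that this produces is stripped afterwards.

```python
import re

# Exactly five digits that are not part of a longer run of digits.
FIVE_DIGITS = re.compile(r"(?<!\d)(\d{5})(?!\d)")


def break_before_ids(text: str) -> str:
    """Insert "\n" before each standalone five-digit group (hypothetical helper)."""
    broken = FIVE_DIGITS.sub(r"\n\1", text)
    # The first group also gets a newline in front of it; drop that one.
    return broken.lstrip("\n")


if __name__ == "__main__":
    sample = "12345 This is a chunk 23456 This is another chunk 34567 This is yet another chunk"
    print(break_before_ids(sample))
```

As in the question's expected output, the space that preceded each group stays at the end of the previous line.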

On "\n" vs. "\r\n"

This may depend on the programming language at hand, but Perl and Python already translate \n to \r\n when writing to a file in text mode on Windows. In that case it is a mistake to replace \n by \r\n in the above regex: the output would end up containing \r\r\n.
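The newline translation can be observed in Python on any platform. The sketch below is an assumption-laden demonstration rather than part of the answer: the helper `written_bytes` is hypothetical, and it passes `newline="\r\n"` to `open` to imitate what text mode does by default on Windows, then reads the file back in binary mode to show the bytes that were actually stored.

```python
import os
import tempfile


def written_bytes(text: str) -> bytes:
    """Write text through a CRLF-translating stream and return the raw bytes (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline="\r\n" imitates Windows text mode: every "\n" written
        # is translated to "\r\n" on its way to the file.
        with open(path, "w", newline="\r\n") as handle:
            handle.write(text)
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(written_bytes("12345 a\n23456 b"))    # "\n" is stored as a CR/LF pair
    print(written_bytes("12345 a\r\n23456 b"))  # "\r\n" is stored as CR CR LF
```

So when the output goes through a translating text stream, a plain `\n` in the replacement is enough to get CR/LF line endings.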

