Replace multiple blank lines with one blank line using RegEx search and replace

Time: 2022-03-30 16:51:15

I have a file that I need to reformat and remove "extra" blank lines.

I am using the Perl syntax regular expression search and replace functionality of UltraEdit and need the regular expression to put in the "Find What:" field.

Here is a sample of the file I need to re-format.

All current text

REPLACE with all the following:


Winter 2011 Class Schedule 

Winter 2011 Class Registration Dates:  Dec. 6, 2010 – Jan. 1, 2011
Winter 2011 Class Session Dates:  Jan. 5 – Feb. 12, 2011

DANCE

Adventures in Ballet & Tap      
3 – 6 years Instructor:  Ann Newby
Tots ages 3 – 6 years old develop a greater sense of rhythm, flexibility and coordination as they explore the basic elements of movement.
Saturdays   9 - 10 a.m.     Jan. 8 – Feb. 12        Six-week fees:   $30 


African Storytelling
3 – 6 years Instructor:  Ann Newby
Tots ages 3 – 6 years old explore storytelling and fables through spoken word, music, movement and visual arts experiences.
Saturdays   10 – 11 a.m.    Jan. 8 – Feb. 12        Six-week fee:   $30


African Dance / Children

You'll notice that some of the double blank lines have spaces or tabs or both in them.

After the search and replace has been run I should have a file that looks like this.

All current text

REPLACE with all the following:

Winter 2011 Class Schedule 

Winter 2011 Class Registration Dates:  Dec. 6, 2010 – Jan. 1, 2011
Winter 2011 Class Session Dates:  Jan. 5 – Feb. 12, 2011

DANCE

Adventures in Ballet & Tap      
3 – 6 years Instructor:  Ann Newby
Tots ages 3 – 6 years old develop a greater sense of rhythm, flexibility and coordination as they explore the basic elements of movement.
Saturdays   9 - 10 a.m.     Jan. 8 – Feb. 12        Six-week fees:   $30 

African Storytelling
3 – 6 years Instructor:  Ann Newby
Tots ages 3 – 6 years old explore storytelling and fables through spoken word, music, movement and visual arts experiences.
Saturdays   10 – 11 a.m.    Jan. 8 – Feb. 12        Six-week fee:   $30

African Dance / Children

10 Solutions

#1


21  

Replacing

^(\s*\r\n){2,}

With

\r\n

Is what I ended up with.

This only selects blank lines in multiples of two or more and replaces them with one.
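
A minimal sketch of this replacement, using Python's `re` module as a stand-in for UltraEdit's Perl engine (an assumption; the two engines are not identical). `re.MULTILINE` is needed so that `^` matches at the start of every line. The name `squeeze_crlf` and the sample text are made up for illustration.

```python
import re

# Answer #1's pattern: a run of two or more blank lines with CRLF endings.
BLANK_RUN = re.compile(r"^(\s*\r\n){2,}", re.MULTILINE)

def squeeze_crlf(text: str) -> str:
    """Collapse each run of two or more blank lines into one blank line."""
    return BLANK_RUN.sub("\r\n", text)

sample = (
    "DANCE\r\n"
    "\r\n"
    " \t\r\n"          # a "blank" line that contains a space and a tab
    "Adventures in Ballet & Tap\r\n"
    "\r\n"             # a single blank line, left untouched
    "African Storytelling\r\n"
)
print(repr(squeeze_crlf(sample)))
```

A single blank line never matches because the group has to repeat at least twice.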

#2


17  

It depends on what the line endings are. Assuming \n, replace this:

([ \t]*\n){3,}

with \n\n.
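
The same idea as a short Python sketch (Python's `re` is an assumption here, standing in for the editor's engine). Three or more line ends in a row means the end of a text line plus at least two blank lines. `squeeze_lf` and the sample are illustrative names only.

```python
import re

# Answer #2's pattern: three or more LF line ends, each optionally
# preceded by spaces or tabs.
EXTRA_BLANKS = re.compile(r"([ \t]*\n){3,}")

def squeeze_lf(text: str) -> str:
    """Replace a text line end plus 2+ blank lines with exactly one blank line."""
    return EXTRA_BLANKS.sub("\n\n", text)

sample = "Six-week fees:   $30 \n\n \t\nAfrican Storytelling\n\nAfrican Dance\n"
print(repr(squeeze_lf(sample)))
```

Note that the trailing space after `$30` is removed as well, because `[ \t]*` also matches whitespace in front of the first line end.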

#3


3  

In Vim, using

:%!cat -s

So far, I find this is the easiest way to delete extra empty lines.
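
`cat -s` squeezes runs of completely empty lines; a line that still contains spaces or tabs is not empty and passes through unchanged. For comparison, a rough Python sketch of the same line-by-line approach that, unlike `cat -s`, also treats whitespace-only lines as blank (an assumption about what is wanted for the sample file). `squeeze_lines` is a hypothetical helper name.

```python
def squeeze_lines(lines):
    """Yield lines, emitting at most one blank line per run of blank lines."""
    previous_blank = False
    for line in lines:
        blank = line.strip() == ""
        if blank and previous_blank:
            continue  # skip the 2nd, 3rd, ... blank line of a run
        # A whitespace-only line is normalized to a truly empty one.
        yield "" if blank else line
        previous_blank = blank

text = "DANCE\n\n \t\nAdventures in Ballet & Tap\n\n\nAfrican Storytelling"
print("\n".join(squeeze_lines(text.split("\n"))))
```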

#4


2  

Replacing

\n\s*\n\s* 

with

\n\n

should do the trick

#5


2  

Try this Perl one-liner: perl -00pe0. If you want in-place editing, just add the -i option.

#6


1  

For completeness, I want to reference the long post "Remove / delete blank and empty lines" in the UltraEdit user forums. At the bottom, after all the explanations for newcomers, it contains the solution for reducing two or more lines containing nothing (empty lines) or just whitespace (blank lines) to one empty line, independent of the line terminator type.

And some words on what Alan Moore wrote in his answer:

UltraEdit's Perl regular expression support is not crippled by its line-based architecture. Perl regular expression engines have a flag that determines whether a dot matches all characters except newline characters such as carriage return (CR) and line feed (LF), or really all characters including CR and LF. This decides whether a text file is treated as one large byte stream or as a sequence of lines for Perl regular expression finds and replaces. In UltraEdit the flag is set by default so that a dot in the search string does not match \r (CR) and \n (LF). This behavior can easily be changed by starting the regular expression string with (?s), which changes the value of the flag match_not_dot_newline, as posted in the UltraEdit user forums in the topic '"." in Perl regular expressions doesn't include CRLFs?'
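
The effect of the (?s) modifier can be tried outside UltraEdit. The sketch below uses Python's `re` module, which is an assumption for illustration only: it supports the same inline modifier, but its default dot excludes only \n, not \r.

```python
import re

text = "first line\r\nsecond line"

# By default the dot stops at the line feed, so the match cannot
# continue onto the second line.
print(re.search(r"first.*second", text))  # None

# With (?s) at the start of the pattern the dot matches every
# character, including the line terminator.
print(re.search(r"(?s)first.*second", text) is not None)  # True
```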

A Perl regular expression replace working for files with

  • carriage return + line feed (DOS/Windows) or
  • only line feed (Unix, Mac OS 10.0 and later versions) or
  • only carriage return (Mac OS 9 and previous versions)

as line ending, with optional trailing spaces and tabs at the end of a paragraph (one or more lines) and with two or more empty lines (nothing) or blank lines (only whitespace) below the paragraph, can be done with the search string \h*(\r?\n|\r)(?:\h*\1){2,} and \1\1 as the replace string.

Explanation:

\h* matches any horizontal whitespace character, as defined by Unicode, zero or more times. This first part of the search expression matches horizontal whitespace at the end of a line, such as horizontal tabs, normal spaces, no-break spaces and some other rarely used spaces.

Using \s is not a good idea, because this character class matches any whitespace character, including the vertical whitespace characters carriage return and line feed.

(\r?\n|\r) ... is an OR expression with two arguments in a marking group. The first argument matches a line feed with an optional preceding carriage return, while the second argument matches just a carriage return. So this expression correctly matches all three common types of line termination. It is important for the rest of the search and for the replace that it always matches either CR+LF (both together), just LF, or just CR.

(?:\h*\1) ... is a non-marking group which matches zero or more horizontal whitespace characters followed by the same newline as found before, back-referenced with \1, i.e. CR+LF or just LF or just CR. So this part of the expression finds an empty or blank line.

{2,} ... is a multiplier for the preceding non-marking group and means at least two times. So after the end of a paragraph there must be two or more empty or blank lines. Just one empty or blank line below a paragraph is not enough for a positive match of the search expression.

The replace string \1\1 inserts the first line break found twice.

The advantage of this regular expression compared to the others posted here is that the line ending type need not be known. The search expression finds it out, and the line ending found is referenced in the replace string. Any trailing whitespace at the end of a paragraph and any whitespace on the following blank lines is also removed by this replace if there are two or more empty or blank lines below a paragraph.

{2,} can be replaced by + in the search string if this Perl regular expression replace should also trim whitespace at the end of a paragraph and on the next blank line when only one empty or blank line follows. But please note that in this case the replace also makes replacements that change nothing at all, namely when there is no trailing whitespace at the end of a paragraph and the next line is an empty line.
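
A sketch of both variants in Python, for experimenting outside UltraEdit. Python's `re` module is an assumption here and has no \h escape, so `[ \t]` is swapped in for it; that covers only spaces and tabs, not the other Unicode horizontal whitespace characters mentioned above. The names `SQUEEZE`, `SQUEEZE_AND_TRIM` and `squeeze` are illustrative.

```python
import re

# [ \t] replaces \h, which Python's re does not support.
SQUEEZE = re.compile(r"[ \t]*(\r?\n|\r)(?:[ \t]*\1){2,}")
# Variant with + instead of {2,}: also trims around a single blank line.
SQUEEZE_AND_TRIM = re.compile(r"[ \t]*(\r?\n|\r)(?:[ \t]*\1)+")

def squeeze(text: str) -> str:
    """Reduce 2+ empty/blank lines to one, reusing the line ending found."""
    return SQUEEZE.sub(r"\1\1", text)

# The same pattern handles CR+LF, LF-only and CR-only files.
for eol in ("\r\n", "\n", "\r"):
    sample = f"DANCE \t{eol}{eol} {eol}Adventures{eol}{eol}African"
    print(repr(squeeze(sample)))

# With +, whitespace around a single blank line is trimmed too.
print(repr(SQUEEZE_AND_TRIM.sub(r"\1\1", "a  \n \nb")))
```

Because the replacement refers back to group 1, no line ending has to be hard-coded in the replace string.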

#7


0  

I'm not sure what UltraEdit lets you get away with in the "replace" area, but if you cannot use a newline (I've had this problem before) but can use capture references, this might work:

Find    : \s*(\r\n)\s*(\r\n)\s*\r\n
Replace : $1$2

Not tested extensively, but seems to work on the sample you provided.
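
A quick way to try the pattern outside UltraEdit is the Python sketch below. Python's `re` is an assumption standing in for the editor's engine, and it writes the capture references as \1\2 rather than $1$2. `squeeze_with_captures` is a hypothetical name.

```python
import re

# Answer #7's pattern: three CRLF line ends with optional whitespace
# in between; the first two line ends are captured.
PATTERN = re.compile(r"\s*(\r\n)\s*(\r\n)\s*\r\n")

def squeeze_with_captures(text: str) -> str:
    """Rebuild the line breaks from the captures instead of literal newlines."""
    return PATTERN.sub(r"\1\2", text)

sample = "DANCE\r\n\r\n \t\r\nAdventures\r\n\r\nAfrican"
print(repr(squeeze_with_captures(sample)))
```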

#8


0  

See this thread for what's causing the problem. As I understand it, UltraEdit regexes are greedy at the character level (i.e., within a line), but non-greedy at the line level (roughly speaking). I don't have access to UE, but I would try writing the regex so it has to match something concrete after the last blank line. For example:

search:   (\r\n[ \t]*){2,}(\S)
replace:  $1$2

This matches and captures two or more instances of a line separator and any horizontal whitespace that follows it, but it only retains the last one. The \S should force it to keep matching until it finds a line with at least one non-whitespace character.
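
The "only retains the last one" behaviour of a repeated capturing group can be checked with a small Python sketch. Python's `re` is an assumption here, not UltraEdit's engine, and references are written \1\2 instead of $1$2. In this check the original replacement leaves a single line break, so no blank line remains between the paragraphs; the last line shows a variation, not part of the answer, that emits the line break twice.

```python
import re

# Answer #8's pattern: 2+ line separators with trailing spaces/tabs,
# followed by the first non-whitespace character of the next text line.
PATTERN = re.compile(r"(\r\n[ \t]*){2,}(\S)")

sample = "Six-week fee:   $30\r\n\r\n \t\r\nAfrican Dance / Children"

# A repeated group keeps only its last iteration, here a bare "\r\n".
print(repr(PATTERN.search(sample).group(1)))

# Original replacement: one line break is kept.
print(repr(PATTERN.sub(r"\1\2", sample)))

# Variation (an assumption, not from the answer): keep one blank line.
print(repr(PATTERN.sub(r"\1\1\2", sample)))
```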

I admit that I don't have a whole lot of confidence in this solution; UltraEdit's regex support is crippled by its line-based architecture. If you want an editor that does regexes right, and you don't want to learn a whole new regex syntax (like vim's), get EditPad Pro.

#9


0  

Should also work with spaces on blank lines

  • Search - /\n^\s*\n/
  • Replace - \n\n
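
As a sketch, the same search and replace in Python (an assumption; the answer does not say which tool it was written for). The surrounding slashes are only delimiters, and `re.MULTILINE` is required so that `^` can match directly after the first \n.

```python
import re

# Answer #9's pattern: a line end, then a run of blank lines that may
# contain spaces or tabs, up to the final line end of the run.
PATTERN = re.compile(r"\n^\s*\n", re.MULTILINE)

sample = "DANCE\n  \n\t\nAdventures\n\nAfrican"
print(repr(PATTERN.sub("\n\n", sample)))
```

A single empty line also matches, but it is replaced by the same two line feeds and therefore stays as it is.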

#10


0  

In my IntelliJ IDE, what worked was to search for \n\n and replace it with \n.
