I am hoping this is quite simple. I am trying to remove a footer from a block of text using a regular expression. The match has to include the two line breaks that precede the footer, which is where my problem lies.
Message body blah blah balh
{Line Break}
{Line Break}
----------------------------------
Custom footer text
I have been experimenting with variations of /\?(\r\n)(\r\n)([-{34}])/.* but nothing is working.
1 solution
#1
3
I made a test and this works:
[\r\n]*-{34}[\w\s\n\r]*
Here's the code:
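using System;
using System.Text.RegularExpressions;
var input = @"Message body blah blah balh
----------------------------------
Custom footer text";
// Any leading line breaks, the 34-dash separator, then the footer text.
var pattern = @"[\r\n]*-{34}[\w\s\n\r]*";
// RegexOptions.Multiline only changes how ^ and $ behave, so it is not needed here.
var clean = Regex.Replace(input, pattern, "");
Console.WriteLine(clean);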
The output is the expected one:
Message body blah blah balh
There were several problems with the initial approach. Some of them were pointed out by abc667 in the comment above.
Here are two others:
- When you write (\r\n), you are expecting the exact character sequence CR, LF. On some operating systems, however, a line break is represented by a single \n (LF). To make the pattern work in both cases, you can use a character class with a quantifier, like so: [\r\n]*. This means: "any run, possibly empty, of \n and/or \r characters, in any order".
- The dot (.) matches any single character except \n by default (see docs). In .NET it matches newlines as well only when RegexOptions.Singleline (or the inline (?s) flag) is set. This is why I replaced the .* that was supposed to match everything after the dashed line with [\w\s\r\n]*, which matches any word characters and whitespace characters (\s already covers CR and LF, so listing them again is redundant but harmless). Note that this class does not match punctuation, so a footer that contains punctuation is removed only up to the first such character. If that matters, [\s\S]*, or .* with RegexOptions.Singleline, matches everything to the end of the text.
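The two points above can be checked with a small sketch. It is only an illustration under a few assumptions: the names FooterDemo, StripFooter and StripFooterAny are hypothetical, the sample strings are made up, and the [\s\S]* variant is an alternative for footers with punctuation rather than part of the tested answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FooterDemo
{
    // Hypothetical helper: the pattern from the answer, unchanged.
    public static string StripFooter(string text) =>
        Regex.Replace(text, @"[\r\n]*-{34}[\w\s\n\r]*", "");

    // Hypothetical variant: [\s\S]* matches any character at all,
    // so punctuation in the footer is removed too.
    public static string StripFooterAny(string text) =>
        Regex.Replace(text, @"[\r\n]*-{34}[\s\S]*", "");

    public static void Main()
    {
        string dashes = new string('-', 34);

        // The same message with LF-only and with CRLF line breaks.
        string unix = "Message body\n\n" + dashes + "\nCustom footer text";
        string windows = "Message body\r\n\r\n" + dashes + "\r\nCustom footer text";
        Console.WriteLine(StripFooter(unix));     // Message body
        Console.WriteLine(StripFooter(windows));  // Message body

        // By default the dot stops at \n; Singleline lets it cross line breaks.
        Console.WriteLine(Regex.Match("a\nb", "a.*").Value.Length);                            // 1
        Console.WriteLine(Regex.Match("a\nb", "a.*", RegexOptions.Singleline).Value.Length);   // 3

        // A footer with punctuation: [\w\s]* stops at the first '.'.
        string punct = "Message body\n\n" + dashes + "\nSent from example.com, thanks!";
        Console.WriteLine(StripFooter(punct));     // Message body.com, thanks!
        Console.WriteLine(StripFooterAny(punct));  // Message body
    }
}
```

If the footer is known to contain only letters, digits and whitespace, the original pattern is enough; the [\s\S]* variant is the safer choice when the footer text is not under your control.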