I have a problem fine-tuning my regex

Time: 2022-09-20 18:40:38

I've got a regex that seemed fine, but as it turned out, it doesn't work well in some situations.

Keep an eye on the message preview, because the message editor does some tricky things with "\".

[\[]?[\^%#\$\*@\-;].*?[\^%#\$\*@\-;][\]]

Its task is to find a pattern that, in general, looks like this:

[ABA]

  • A - a char from the set ^,%,#,$,*,@,-,;
  • B - some text
  • [ and ] are included in the pattern

It is expected to find all occurrences of this pattern in a test string such as:

Black fox [#sample1#] [%sample2%] - [#sample3#] eats blocks.

But instead of the expected list of matches

  • "[#sample1#]"
  • "[%sample2%]"
  • "[#sample3#]"

I get this:

  • "[#sample1#]"
  • "[%sample2%]"
  • "- [#sample3#]"

It seems this problem will also occur with the other chars in set "A". Could somebody suggest changes to my regex to make it work as I need?
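A minimal C# sketch that reproduces this, assuming .NET's System.Text.RegularExpressions (C# is the language answer #1 uses; the class name OriginalRegexDemo is made up for illustration). It runs the regex above unchanged on the test string and prints the same three matches, including "- [#sample3#]". The cause is the leading [\[]?: because the opening bracket is optional, a match may begin at any delimiter character, and the lone "-" between the tags is one of them.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: reproduces the question's matches with the original regex.
static class OriginalRegexDemo
{
    // The regex from the question, unchanged. Note the optional opening bracket [\[]?.
    public const string Pattern = @"[\[]?[\^%#\$\*@\-;].*?[\^%#\$\*@\-;][\]]";

    public static List<string> FindAll(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    static void Main()
    {
        string s = "Black fox [#sample1#] [%sample2%] - [#sample3#] eats blocks.";
        // Prints each match in quotes so the leading "- " of the third one is visible.
        foreach (string match in FindAll(s))
        {
            Console.WriteLine("\"" + match + "\"");
        }
    }
}
```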

And a less important question: how do I make my regex exclude patterns that look like this:

[ABC]

  • A - a char from the set ^,%,#,$,*,@,-,;
  • B - some text
  • C - a char from the set ^,%,#,$,*,@,-,; other than A
  • [ and ] are included in the pattern

For example:

[$sample1#] [%sample2@] [%sample3;]

Thanks in advance.

MTH

3 solutions

#1

\[([%#$*@;^-]).+?\1\]

Applied to the text:

Black fox [#sample1#] [%sample2%] - [#sample3#] [%sample4;] eats blocks.

It matches:

  • [#sample1#]
  • [%sample2%]
  • [#sample3#]
  • but not [%sample4;]

EDIT

This works for me (the output is as expected, and C# accepts the regex):

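using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Group 1 captures the opening delimiter; \1 requires the same char before "]".
        Regex re = new Regex(@"\[([%#$*@;^-]).+?\1\]");
        string s = "Black fox [#sample1#] [%sample2%] - [#sample3#] [%sample4;] eats blocks.";
        foreach (Match m in re.Matches(s))
        {
            Console.WriteLine(m.Value);
        }
    }
}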
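A caveat worth checking (my own observation, not part of the answer): .+? may also consume "]", so when a tag isn't closed by its own delimiter, the match can run on into a later tag. The sketch below assumes .NET and uses a made-up class name. It compares the answer's pattern with an assumed variant that uses [^\]]+? for B, so a match can't leave its brackets.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper comparing the answer's pattern with a bracket-bounded variant.
static class BackreferenceDemo
{
    // Answer #1's pattern: group 1 captures the opening delimiter, \1 requires it again before "]".
    public const string Answer = @"\[([%#$*@;^-]).+?\1\]";

    // Assumed variant: B may not contain "]", so a match can't run past the end of a tag.
    public const string NoBracketInBody = @"\[([%#$*@;^-])[^\]]+?\1\]";

    public static List<string> FindAll(string pattern, string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    static void Main()
    {
        string tricky = "[%a] x [#b%]";
        // The answer's pattern matches the whole string because .+? crosses the first "]".
        Console.WriteLine(string.Join(", ", FindAll(Answer, tricky)));
        // The variant finds nothing, since neither tag is a valid [ABA].
        Console.WriteLine(FindAll(NoBracketInBody, tricky).Count);
    }
}
```

On the answer's own sample text, both patterns return the same three matches. They only differ on malformed input like "[%a] x [#b%]".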

#2

Why the first "?" in "[\[]?"? It makes the opening bracket optional, so a match can start at a delimiter outside any brackets. Without it,

\[[\^%#\$\*@\-;].*?[\^%#\$\*@\-;]\]

would detect your different strings just fine
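A quick check of this pattern in C#, assuming .NET (MandatoryBracketDemo is a hypothetical name). With \[ required, a match can only start at "[", so the stray "-" no longer begins one. The opening and closing delimiters are still not compared, so an [ABC] tag such as [%sample4;] is also accepted. The capture-group versions that follow address that.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for answer #2's first pattern.
static class MandatoryBracketDemo
{
    // Same as the question's regex, but "\[" is no longer optional.
    public const string Pattern = @"\[[\^%#\$\*@\-;].*?[\^%#\$\*@\-;]\]";

    public static List<string> FindAll(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    static void Main()
    {
        // Every match now has to start at "[".
        foreach (string tag in FindAll("Black fox [#sample1#] [%sample2%] - [#sample3#] eats blocks."))
        {
            Console.WriteLine(tag);
        }
        // Opening and closing delimiters are not compared, so [ABC] is accepted too.
        Console.WriteLine(string.Join(", ", FindAll("[%sample4;]")));
    }
}
```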

To be more precise:

\[([\^%#\$\*@\-;])([^\]]*?)(?=\1)([\^%#\$\*@\-;])\]

would detect [ABA]

\[([\^%#\$\*@\-;])([^\]]*?)(?!\1)([\^%#\$\*@\-;])\]

would detect [ABC]
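The two lookahead patterns can be tried side by side in a short C# sketch, assuming .NET (the class name and the combined sample string are made up). Group 1 captures the opening delimiter. (?=\1) or (?!\1) then checks the next character without consuming it, and group 3 takes that character as the closing delimiter. On a string that mixes the question's examples, it should print [#sample1#] [%sample2%] [#sample3#] for [ABA] and [$sample1#] [%sample2@] [%sample3;] for [ABC].

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper running answer #2's lookahead patterns.
static class LookaheadDemo
{
    // [ABA]: (?=\1) requires the closing delimiter to equal the opening one (group 1).
    public const string Aba = @"\[([\^%#\$\*@\-;])([^\]]*?)(?=\1)([\^%#\$\*@\-;])\]";

    // [ABC]: (?!\1) requires the closing delimiter to differ from the opening one.
    public const string Abc = @"\[([\^%#\$\*@\-;])([^\]]*?)(?!\1)([\^%#\$\*@\-;])\]";

    public static List<string> FindAll(string pattern, string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    static void Main()
    {
        // Sample text combining the question's [ABA] and [ABC] examples.
        string s = "[#sample1#] [%sample2%] - [#sample3#] [$sample1#] [%sample2@] [%sample3;]";
        Console.WriteLine("ABA: " + string.Join(" ", FindAll(Aba, s)));
        Console.WriteLine("ABC: " + string.Join(" ", FindAll(Abc, s)));
    }
}
```

Because group 2 is [^\]]*?, neither pattern can stretch across a "]", which avoids the cross-tag matches that .*? allows.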

#3

You have an optional match of the opening square bracket:

[\[]?

so a match can begin at a delimiter character that isn't inside brackets, such as the "-" in your test string.

For the second part of your question (and perhaps to simplify), try this:

\[\%[^\%]+\%\]|\[\#[^\#]+\#\]|\[\$[^\$]+\$\]

In this case there is a subpattern for each delimiter (only %, # and $ are covered here). The | character means "OR", so the expression matches if any of the 3 subexpressions matches.

Each subexpression matches:

  • Opening bracket
  • Special char
  • One or more chars other than that same special char (1)
  • The same special char
  • Closing bracket

(1) You may need to add extra exclusions such as ']' or '[' so it doesn't accidentally match across a large body of text, like:

[%MyVar#] blah blah [$OtherVar%]
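A sketch of footnote (1) in C#, assuming .NET and a made-up class name. The answer's pattern matches the whole string [%MyVar#] blah blah [$OtherVar%] as a single tag. The Bounded variant is my own guess at what the extra exclusions could look like. It also excludes "[" and "]" from the body, so neither malformed tag matches, while well-formed tags are still found.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper comparing answer #3's alternation with a bracket-bounded variant.
static class AlternationDemo
{
    // Answer #3's pattern: one alternative per delimiter (only %, # and $ are covered).
    public const string Answer = @"\[\%[^\%]+\%\]|\[\#[^\#]+\#\]|\[\$[^\$]+\$\]";

    // Assumed variant per footnote (1): the body also excludes "[" and "]".
    public const string Bounded = @"\[%[^%\[\]]+%\]|\[#[^#\[\]]+#\]|\[\$[^\$\[\]]+\$\]";

    public static List<string> FindAll(string pattern, string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    static void Main()
    {
        string s = "[%MyVar#] blah blah [$OtherVar%]";
        // The answer's pattern runs from the first "[%" to the final "%]".
        Console.WriteLine(string.Join(", ", FindAll(Answer, s)));
        // With brackets excluded, neither malformed tag matches.
        Console.WriteLine(FindAll(Bounded, s).Count);
        // Well-formed tags are still found.
        Console.WriteLine(string.Join(", ", FindAll(Bounded, "[#sample1#] [%sample2%] [$sample3$]")));
    }
}
```

Covering the remaining delimiters this way means one more alternative per character. The backreference in answer #1 avoids that repetition.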

Rob
