Regex match that excludes a specific context

Posted: 2022-09-13 16:28:37

I'm trying to search a string for words within single quotes, but only if those single quotes are not within parentheses.

Example string: something, 'foo', something ('bar')

So for the given example I'd like to match foo, but not bar.

After searching for regex examples I'm able to match within single quotes (see the code snippet below), but I'm not sure how to exclude matches that occur in the context described above.

string line = "something, 'foo', something ('bar')";
Match name = Regex.Match(line, @"'([^']*)");
if (name.Success)
{
    string matchedName = name.Groups[1].Value;
    Console.WriteLine(matchedName);
}

2 answers

#1


3  

I would recommend using lookarounds instead (a negative lookbehind before the opening quote and a negative lookahead after the closing quote):

(?<!\()'([^']*)'(?!\))

Or with C#:

string line = "something, 'foo', something ('bar')";
Match name = Regex.Match(line, @"(?<!\()'([^']*)'(?!\))");
if (name.Success)
{
    Console.WriteLine(name.Groups[1].Value);
}
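
As a hedged follow-up to the snippet above: Regex.Match returns only the first hit, so collecting every hit could be sketched as below. The names AdjacentQuoteFinder and Find are made up for illustration and are not part of the original answer. One assumption worth stating: these lookarounds inspect only the single character next to each quote, so a quoted word separated from the parenthesis by a space is still matched, as the second call shows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class AdjacentQuoteFinder
{
    // Same pattern as above: a quoted run that is not directly preceded
    // by '(' and not directly followed by ')'.
    private static readonly Regex Pattern = new Regex(@"(?<!\()'([^']*)'(?!\))");

    // Returns the Group 1 capture of every match in the input.
    public static List<string> Find(string input)
    {
        return Pattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}

public static class AdjacentQuoteDemo
{
    public static void Main()
    {
        // Prints: foo
        Console.WriteLine(string.Join(", ",
            AdjacentQuoteFinder.Find("something, 'foo', something ('bar')")));

        // Prints: bar (the spaces hide the parentheses from the one-character lookarounds)
        Console.WriteLine(string.Join(", ",
            AdjacentQuoteFinder.Find("( 'bar' )")));
    }
}
```

If the input may contain whitespace or other text between a parenthesis and the quote, a pattern that looks at the whole parenthesized span is likely a better fit.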

#2


2  

The easiest way to get what you need is to use an alternation: match and capture what you need, and match without capturing what you do not need:

\([^()]*\)|'([^']*)'

Details:

  • \( - a (
  • [^()]* - 0+ chars other than ( and )
  • \) - a )
  • | - or
  • ' - a single quote
  • ([^']*) - Group 1 capturing 0+ chars other than a single quote
  • ' - a single quote.

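In C#, use .Groups[1].Value to get the values you need, keeping only the matches in which Group 1 actually participated (the parenthesized matches leave it empty):

using System.Linq;
using System.Text.RegularExpressions;

var str = "something, 'foo', something ('bar')";
var result = Regex.Matches(str, @"\([^()]*\)|'([^']*)'")
    .Cast<Match>()
    .Where(m => m.Groups[1].Success)   // skip matches of the \([^()]*\) branch
    .Select(m => m.Groups[1].Value)
    .ToList();                         // contains only "foo"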
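
To see why the alternation works, it may help to look at every match the pattern produces and check whether Group 1 took part in it. The following is a minimal sketch under that idea; AlternationInspector and Describe are hypothetical names introduced here, and the input string is an invented example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class AlternationInspector
{
    private static readonly Regex Pattern = new Regex(@"\([^()]*\)|'([^']*)'");

    // Returns one line per match: the matched text and, when Group 1
    // participated, the captured value.
    public static List<string> Describe(string input)
    {
        var lines = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            Group g = m.Groups[1];
            lines.Add(g.Success
                ? $"kept    {m.Value} -> {g.Value}"
                : $"skipped {m.Value}");
        }
        return lines;
    }
}

public static class AlternationDemo
{
    public static void Main()
    {
        foreach (string line in AlternationInspector.Describe("a 'foo' (b 'bar' c) 'baz'"))
            Console.WriteLine(line);
        // kept    'foo' -> foo
        // skipped (b 'bar' c)
        // kept    'baz' -> baz
    }
}
```

The parenthesized span is consumed as a whole by the first branch, so the quote inside it never gets a chance to start a match of its own. Because [^()]* cannot cross a parenthesis, nested parentheses would need a different pattern.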

Another alternative is the lookaround approach mentioned by Thomas, but since this is .NET, you may use an infinite-width lookbehind:

(?<!\([^()]*)'([^']*)'(?![^()]*\))

Details:

  • (?<!\([^()]*) - a negative lookbehind that fails the match if the text ending at the current position is a ( followed by 0+ chars other than ( and ), i.e. the quote comes after a ( that has not been closed yet
  • '([^']*)' - a single quote, 0+ chars other than a single quote captured into Group 1, and another single quote
  • (?![^()]*\)) - a negative lookahead that fails the match if 0+ chars other than ( and ) followed by a ) appear right after the closing ' of the preceding subpattern.

Since you'd want to exclude the surrounding ' characters, the value is again taken from Group 1, so the same code as above applies.
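
For completeness, here is a hedged sketch of that same code with the infinite-width lookbehind pattern plugged in. LookbehindQuoteFinder and Find are hypothetical names, and the input string is an invented example with spaces inside the parentheses. Variable-length lookbehind is a .NET regex feature, so the pattern is not expected to port unchanged to engines that require fixed-width lookbehind.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LookbehindQuoteFinder
{
    // Lookbehind: no "(" followed only by non-parenthesis chars before the quote.
    // Lookahead: no ")" reachable through non-parenthesis chars after the closing quote.
    private static readonly Regex Pattern =
        new Regex(@"(?<!\([^()]*)'([^']*)'(?![^()]*\))");

    // Returns the Group 1 capture of every match in the input.
    public static List<string> Find(string input)
    {
        return Pattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}

public static class LookbehindDemo
{
    public static void Main()
    {
        string line = "something, 'foo', something ( 'bar' ) 'baz'";

        // Prints: foo, baz
        Console.WriteLine(string.Join(", ", LookbehindQuoteFinder.Find(line)));
    }
}
```

Every match here has Group 1 set, so no Success filter is needed. As with the alternation, the [^()]* parts stop at the nearest parenthesis, so nested parentheses are outside what this sketch handles.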
