Complex Regex Split pattern needed

Date: 2022-09-29 21:40:43

I'd like to split the following string


// Comments
KeyA : SomeType { SubKey : SubValue } KeyB:'This\'s a string'
KeyC : [ 1 2 3 ] // array value

into

KeyA
:
SomeType
{ SubKey : SubValue }
KeyB
:
This's a string
KeyC
:
[ 1 2 3 ]

(: and blank spaces are the delimiters although : is kept in the result; comments are ignored; no splitting between {}, [], or '')


Can I achieve that with Regex Split or Match? If so, what would be the right pattern? Comments to the pattern string would be appreciated.


Moreover, it would be desirable to throw an exception or return an error message if the input string is not valid (see the comment below).

Thanks.

3 solutions

#1


1  

You can use this pattern...


// In a C# verbatim string (@"..."), a double quote is written as "" rather than \".
string pattern = @"(\w+)\s*:\s*((?>[^\w\s""'{[:]+|\w+\b(?!\s*:)|\s(?!\w+\s*:|$)|\[[^]]*]|{[^}]*}|""(?>[^""\\]|\\.)*""|'(?>[^'\\]|\\.)*')+)\s*";

... in two ways:


  1. with the Match method, which gives you what you are looking for, with the keys in group 1 and the values in group 2
  2. with the Split method, but you must remove all the empty results.
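A minimal sketch of both usages, assuming `//` comments have already been removed from the input (the pattern itself does not strip a trailing comment). The names `PairDemo`, `MatchPairs` and `SplitPairs` are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PairDemo
{
    // Same pattern as above; inside a verbatim string a double quote is written "".
    public const string Pattern =
        @"(\w+)\s*:\s*((?>[^\w\s""'{[:]+|\w+\b(?!\s*:)|\s(?!\w+\s*:|$)|\[[^]]*]|{[^}]*}|""(?>[^""\\]|\\.)*""|'(?>[^'\\]|\\.)*')+)\s*";

    // Match approach: group 1 holds the key, group 2 holds the value.
    public static List<KeyValuePair<string, string>> MatchPairs(string input)
    {
        return Regex.Matches(input, Pattern)
            .Cast<Match>()
            .Select(m => new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value))
            .ToList();
    }

    // Split approach: Regex.Split also returns the captured groups, with empty
    // strings between consecutive matches, so the empty entries are filtered out.
    public static string[] SplitPairs(string input)
    {
        return Regex.Split(input, Pattern).Where(s => s.Length > 0).ToArray();
    }
}
```

Calling `PairDemo.MatchPairs` on `KeyA : SomeType { SubKey : SubValue } KeyB:'This\'s a string' KeyC : [ 1 2 3 ]` should give three key/value pairs. Quoted values keep their quotes and backslashes, so unquoting them is a separate step, and a value such as `SomeType { SubKey : SubValue }` is returned as one piece.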

How is the second part of the pattern (after the :) built?

The idea is, first of all, to avoid the problematic characters: [^\w\s"'{[:]+ Then you allow each of these characters, but only in a specific situation:

  • \w+\b(?!\s*:) a word that is not the key
  • \s(?!\w+\s*:|$) whitespace that is not at the end of the value (so that the value is trimmed)
  • \[[^]]*] content surrounded by square brackets
  • {[^}]*} the same with curly brackets
  • "(?>[^"\\]|\\.)*" content between double quotes (escaped double quotes are allowed)
  • '(?>[^'\\]|\\.)*' the same with single quotes
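Since the question asks for a commented pattern, the alternatives listed above can also be written out with `RegexOptions.IgnorePatternWhitespace`, which lets the pattern span several lines with `#` comments. This is only a sketch of the same pattern laid out differently; the class name `CommentedPattern` and the field name `Pair` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentedPattern
{
    // Whitespace outside character classes is ignored and '#' starts a comment.
    // Inside the verbatim string a double quote is still written "".
    public static readonly Regex Pair = new Regex(@"
        (\w+) \s* : \s*                 # group 1: the key, then the colon
        (                               # group 2: the value
          (?>
              [^\w\s""'{[:]+            # run of characters that are never problematic
            | \w+\b (?!\s*:)            # a word that is not the next key
            | \s (?!\w+\s*:|$)          # whitespace, unless a key or the end follows
            | \[ [^]]* ]                # content in square brackets
            | { [^}]* }                 # content in curly brackets
            | "" (?> [^""\\] | \\. )* ""   # double-quoted text, escapes allowed
            | '  (?> [^'\\]  | \\. )* '    # single-quoted text, escapes allowed
          )+
        )
        \s*                             # trailing whitespace, kept out of group 2
        ", RegexOptions.IgnorePatternWhitespace);
}
```

`CommentedPattern.Pair.Matches(input)` can then be used exactly like the one-line pattern.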

Note that the problem with colon inside brackets or quotes is avoided.
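The question also asks for an error on invalid input, which the pattern alone does not provide. One possible approach, not part of the answer itself, is to check that successive matches cover the whole input without gaps. The sketch below assumes that any uncovered text counts as invalid; `PairValidator` and `ParseStrict` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PairValidator
{
    private static readonly Regex Pair = new Regex(
        @"(\w+)\s*:\s*((?>[^\w\s""'{[:]+|\w+\b(?!\s*:)|\s(?!\w+\s*:|$)|\[[^]]*]|{[^}]*}|""(?>[^""\\]|\\.)*""|'(?>[^'\\]|\\.)*')+)\s*");

    // Returns the key/value pairs, or throws FormatException when some part
    // of the input is not covered by a match (assumption: gaps mean invalid input).
    public static List<KeyValuePair<string, string>> ParseStrict(string input)
    {
        var result = new List<KeyValuePair<string, string>>();
        int pos = 0;

        // Leading whitespace is not consumed by the pattern, so skip it here.
        while (pos < input.Length && char.IsWhiteSpace(input[pos])) pos++;

        foreach (Match m in Pair.Matches(input))
        {
            if (m.Index != pos)
                throw new FormatException($"Unexpected text at offset {pos}.");
            result.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value));
            pos = m.Index + m.Length;
        }

        if (pos != input.Length)
            throw new FormatException($"Unexpected text at offset {pos}.");
        return result;
    }
}
```

For example, `ParseStrict("KeyA : 1 KeyB : 2")` returns two pairs, while an unclosed brace such as `KeyA : { unclosed` produces no match at all and therefore raises the exception.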


#2


0  

I'm not quite sure what you're looking for when you get to KeyC. How do you know when the string value for KeyB ends and the string for KeyC begins? Is there a colon after 'this\'s is a string' or a line break? Here's a sample to get you started:


using System.Text.RegularExpressions;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class SplitStringTests
{
    [TestMethod]
    public void SplitString()
    {
        string splitMe = "KeyA : SubComponent { SubKey : SubValue } KeyB:This's is a string";
        // Fixed shape: key ':' type '{...}' key ':' rest of the line.
        string pattern = "^(.*):(.*)({.*})(.*):(.*)";

        Match match = Regex.Match(splitMe, pattern);

        Assert.IsTrue(match.Success);
        Assert.AreEqual(6, match.Groups.Count); // 1st group is the entire match
        Assert.AreEqual("KeyA", match.Groups[1].Value.Trim());
        Assert.AreEqual("SubComponent", match.Groups[2].Value.Trim());
        Assert.AreEqual("{ SubKey : SubValue }", match.Groups[3].Value.Trim());
        Assert.AreEqual("KeyB", match.Groups[4].Value.Trim());
        Assert.AreEqual("This's is a string", match.Groups[5].Value.Trim());
    }
}

#3


0  

This regex pattern should work for you:

\s*:\s*(?![^\[]*\])(?![^{]*})(?=(([^"]*"[^"]*){2})*$|[^"]+$)

when replaced with


\n$0\n
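A small sketch of how this replacement might be applied from C#, assuming the `\n` in the replacement is meant as a real newline (so it is written in an ordinary string literal). The names `ColonIsolator` and `Isolate` are hypothetical. Note that the pattern only accounts for double quotes, and it only puts the top-level colons on their own lines; it does not split on spaces.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ColonIsolator
{
    // The pattern from this answer, written as a verbatim string ("" is one double quote).
    public const string Pattern =
        @"\s*:\s*(?![^\[]*\])(?![^{]*})(?=(([^""]*""[^""]*){2})*$|[^""]+$)";

    // Surrounds every colon that is outside [], {} and double quotes with newlines.
    // $0 is the whole match, including the whitespace around the colon.
    public static string Isolate(string input)
    {
        return Regex.Replace(input, Pattern, "\n$0\n");
    }
}
```

With the input `KeyA : SomeType { SubKey : SubValue } KeyB : "x : y"`, only the colons after `KeyA` and `KeyB` are matched; the ones inside the braces and inside the quotes are left alone.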

