Regex: split by spaces, but keep text inside enclosing characters together

Date: 2022-09-13 16:28:01

How can one perform this split with the Regex.Split(input, pattern) method?

This is a [normal string ] made up of # different types # of characters

Expected output (array of strings):

1. This 
2. is
3. a
4. [normal string ]
5. made
6. up
7. of
8. # different types #
9. of
10. characters

It should also keep the leading spaces: I want to preserve everything. If the input string contains 20 characters, the elements of the output array should contain 20 characters in total.

What I have tried:

Regex.Split(text, @"(?<=[ ]|# #)")

Regex.Split(text, @"(?<=[ ])(?<=# #")
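
A minimal sketch of one way to meet the "preserve everything" requirement, using matching instead of Regex.Split: each token carries its own leading whitespace, so joining the tokens gives back the original string. It assumes every [ has a matching ] and every # has a matching #; the pattern and variable names are illustrative, not taken from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumption: every [ has a matching ] and every # has a matching #.
// Each token is optional leading whitespace followed by one item:
//   \[[^\]]*\]   a [ ... ] group
//   #[^#]*#      a # ... # group
//   [^\s\[#]+    a plain word (no whitespace, [ or #)
// The last alternative, \s+$, keeps any whitespace at the very end.
const string Pattern = @"\s*(?:\[[^\]]*\]|#[^#]*#|[^\s\[#]+)|\s+$";

string input = "This is a [normal string ] made up of # different types # of characters";

string[] tokens = Regex.Matches(input, Pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Quote each token so the preserved leading spaces are visible.
foreach (string tok in tokens)
    Console.WriteLine($"\"{tok}\"");

// Nothing is lost: joining the tokens gives back the input exactly.
Console.WriteLine(string.Concat(tokens) == input); // True
```

Attaching the whitespace to the following item (rather than the previous one) matches the "keep the leading spaces" wording; an unmatched [ or a lone # would be silently skipped, which is why the well-formedness assumption matters.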

3 answers

#1

I suggest matching, i.e. extracting words, not splitting:

using System;
using System.Linq;
using System.Text.RegularExpressions;

string source = @"This is a [normal string ] made up of # different types # of characters";

// Three possibilities:
//   - plain word [A-Za-z]+
//   - # ... # quotation
//   - [ ... ] quotation  
string pattern = @"[A-Za-z]+|(#.*?#)|(\[.*?\])";

var words = Regex
  .Matches(source, pattern)
  .OfType<Match>()
  .Select(match => match.Value)
  .ToArray();

Console.WriteLine(string.Join(Environment.NewLine, words
  .Select((w, i) => $"{i + 1}. {w}")));

Outcome:

1. This
2. is
3. a
4. [normal string ]
5. made
6. up
7. of
8. # different types #
9. of
10. characters
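
The [A-Za-z]+ alternative only picks up ASCII letters, so tokens such as It's or 20% are broken up or silently dropped. A sketch of a variant, assuming a "word" may be any run of characters other than whitespace, [ and #; the helper names Tokenize and LettersOnly are hypothetical:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: [ ... ] and # ... # groups as in the answer, but a plain
// word is any run of characters other than whitespace, '[' and '#'.
static string[] Tokenize(string source) =>
    Regex.Matches(source, @"\[[^\]]*\]|#[^#]*#|[^\s\[#]+")
        .Cast<Match>()
        .Select(match => match.Value)
        .ToArray();

// Hypothetical helper: the answer's letters-only pattern, for comparison.
static string[] LettersOnly(string source) =>
    Regex.Matches(source, @"[A-Za-z]+|(#.*?#)|(\[.*?\])")
        .Cast<Match>()
        .Select(match => match.Value)
        .ToArray();

string sample = "It's 20% off [per item ] today";
Console.WriteLine(string.Join(" | ", LettersOnly(sample))); // It | s | off | [per item ] | today
Console.WriteLine(string.Join(" | ", Tokenize(sample)));    // It's | 20% | off | [per item ] | today
```

Because [ and # are excluded from plain words, a word glued to a group, such as off[x], still comes out as off and [x]. The trade-off is that a word like C# comes out as C, since a lone # is neither a word character here nor a complete # ... # group.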

#2

You may use

var res = Regex.Split(s, @"(\[[^][]*]|#[^#]*#)|\s+")
    .Where(x => !string.IsNullOrEmpty(x));

The (\[[^][]*]|#[^#]*#) part is a capturing group: Regex.Split adds the text captured by a group to the resulting array along with the split items, so the [ ... ] and # ... # chunks are kept rather than discarded.
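
A tiny illustration of that Regex.Split behavior, independent of the question's input:

```csharp
using System;
using System.Text.RegularExpressions;

// Without a capturing group, Regex.Split drops the delimiter text.
string[] plain = Regex.Split("a-b-c", "-");

// With a capturing group, the captured delimiter is kept between the parts.
string[] captured = Regex.Split("a-b-c", "(-)");

Console.WriteLine(string.Join(",", plain));    // a,b,c
Console.WriteLine(string.Join(",", captured)); // a,-,b,-,c
```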

Pattern details

  • (\[[^][]*]|#[^#]*#) - Group 1: either of two patterns:
    • \[[^][]*] - a [, followed by 0+ chars other than [ and ], and then ] (.NET reads a ] placed right after [^ as a literal, so [^][] means "neither ] nor [")
    • #[^#]*# - a #, then 0+ chars other than #, and then #
  • | - or
  • \s+ - 1+ whitespace characters

C# demo:

using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "This is a [normal string ] made up of # different types # of characters";
var results = Regex.Split(s, @"(\[[^][]*]|#[^#]*#)|\s+")
    .Where(x => !string.IsNullOrEmpty(x));
Console.WriteLine(string.Join("\n", results));

Result:

This
is
a
[normal string ]
made
up
of
# different types #
of
characters
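
The .Where(...) filter matters because Regex.Split returns an empty string wherever two matches are adjacent, here a space sitting right next to a [ ... ] or # ... # group. A sketch that prints the raw array to make this visible:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "This is a [normal string ] made up of # different types # of characters";

// The answer's split, without the .Where(...) filter.
string[] raw = Regex.Split(s, @"(\[[^][]*]|#[^#]*#)|\s+");

// Quote each item so the empty strings are visible, e.g. "a" "" "[normal string ]" "".
Console.WriteLine(string.Join(" ", raw.Select(x => $"\"{x}\"")));

// Dropping the empty strings leaves the 10 expected items.
string[] items = raw.Where(x => !string.IsNullOrEmpty(x)).ToArray();
Console.WriteLine(items.Length); // 10
```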

#3

It would be easier to use the matching approach, but it can also be done with negative lookaheads:

[ ](?![^\]\[]*\])(?![^#]*\#([^#]*\#[^#]*\#)*[^#]*$)

This matches a space that is not followed by:

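  • a run of characters other than [ and ], followed by ] (that is, the space is inside [ ... ])
  • a # followed by an even number of further # characters up to the end of the string (that is, the space is inside a # ... # pair)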
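
A sketch of applying this lookahead idea with Regex.Split in C#. One assumption to flag: the # lookahead below counts pairs as [^#]*#[^#]*#, because \#{2} matches only two adjacent # characters, which wrongly splits inside the first # ... # pair once the input contains more than one pair. Like any split, this drops the separating spaces, so it does not preserve the total length the question asks for.

```csharp
using System;
using System.Text.RegularExpressions;

// Split on a space only when it is outside [ ... ] and outside # ... #.
// Lookahead 1: the space is NOT followed by "non-bracket chars, then ]".
// Lookahead 2: the space is NOT followed by one #, then zero or more PAIRS
//   of #, then no more # before the end (an odd count means "inside a pair").
const string Pattern = @"[ ](?![^\]\[]*\])(?![^#]*#(?:[^#]*#[^#]*#)*[^#]*$)";

string[] parts = Regex.Split("x # a b # y # c d # z", Pattern);
Console.WriteLine(string.Join(" | ", parts)); // x | # a b # | y | # c d # | z
```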
