How can one perform this split with the Regex.Split(input, pattern)
method?
This is a [normal string ] made up of # different types # of characters
Expected output (array of strings):
1. This
2. is
3. a
4. [normal string ]
5. made
6. up
7. of
8. # different types #
9. of
10. characters
It should also keep the leading spaces, because I want to preserve every character: if the input string contains 20 characters, the elements of the resulting array should total 20 characters.
What I have tried:
Regex.Split(text, @"(?<=[ ]|# #)")
Regex.Split(text, @"(?<=[ ])(?<=# #")
3 solutions
#1
2
I suggest matching, i.e. extracting words, not splitting:
using System;
using System.Linq;
using System.Text.RegularExpressions;

string source = @"This is a [normal string ] made up of # different types # of characters";

// Three possibilities:
//   - plain word [A-Za-z]+
//   - # ... # quotation
//   - [ ... ] quotation
string pattern = @"[A-Za-z]+|(#.*?#)|(\[.*?\])";

// Collect the text of every match, in order of appearance.
var words = Regex
    .Matches(source, pattern)
    .OfType<Match>()
    .Select(match => match.Value)
    .ToArray();

// Print the items as a numbered list.
Console.WriteLine(string.Join(Environment.NewLine, words
    .Select((w, i) => $"{i + 1}. {w}")));
Outcome:
1. This
2. is
3. a
4. [normal string ]
5. made
6. up
7. of
8. # different types #
9. of
10. characters
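The question also asks that no characters be lost, while the match above drops the spaces between items. A minimal sketch of a matching variant that attaches the preceding whitespace to each item follows. The class name, method name, and pattern are illustrative assumptions rather than part of the answer, and whitespace at the very end of the input would still be dropped.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenSplitter
{
    // Hypothetical pattern (not from the answer): optional leading whitespace,
    // then a [ ... ] section, a # ... # section, or a run of non-whitespace.
    private const string Pattern = @"\s*(?:\[[^\]]*\]|#[^#]*#|\S+)";

    // Returns the matched items, each one including the whitespace before it.
    public static string[] SplitKeepingSpaces(string source) =>
        Regex.Matches(source, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        string source = "This is a [normal string ] made up of # different types # of characters";
        string[] items = SplitKeepingSpaces(source);

        // Quote each item so that the leading spaces are visible.
        foreach (string item in items)
            Console.WriteLine($"\"{item}\"");

        // The lengths of the items should add up to the length of the input.
        Console.WriteLine(items.Sum(i => i.Length) == source.Length);
    }
}
```

Because every item starts where the previous one ended, concatenating the items gives back the input for strings like the sample.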
#2
1
You may use
var res = Regex.Split(s, @"(\[[^][]*]|#[^#]*#)|\s+")
    .Where(x => !string.IsNullOrEmpty(x));
The (\[[^][]*]|#[^#]*#) part is a capturing group, so its value is added to the resulting list along with the split items.
Pattern details
- (\[[^][]*]|#[^#]*#) - Group 1: either of the two patterns:
  - \[[^][]*] - a [, followed by 0+ chars other than [ and ], and then ]
  - #[^#]*# - a #, then 0+ chars other than #, and then #
- | - or
- \s+ - 1+ whitespaces
C# demo:
var s = "This is a [normal string ] made up of # different types # of characters";
var results = Regex.Split(s, @"(\[[^][]*]|#[^#]*#)|\s+")
    .Where(x => !string.IsNullOrEmpty(x));
Console.WriteLine(string.Join("\n", results));
Result:
This
is
a
[normal string ]
made
up
of
# different types #
of
characters
#3
0
It would be easier with the matching approach; however, it can also be done by splitting, using negative lookaheads:
[ ](?![^\]\[]*\])(?![^#]*\#([^#]*\#{2})*[^#]*$)
matches a space not followed by
- any sequence of characters other than [ or ], followed by ]
- a # followed by an even number of #
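To show how this pattern could be passed to Regex.Split, here is a minimal sketch. The class and method names are hypothetical, and the inner group is written as non-capturing, a small change from the pattern above, so that Regex.Split cannot add captured text to the result. Note that the separating spaces are consumed by the split, so this variant does not preserve them.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadSplitDemo
{
    // The pattern from the answer, with the inner group made non-capturing (an assumption
    // made for this sketch): a space that is not inside [ ... ] and not inside # ... #.
    private const string Pattern =
        @"[ ](?![^\]\[]*\])(?![^#]*\#(?:[^#]*\#{2})*[^#]*$)";

    // Splits on the spaces matched by the pattern; the spaces themselves are discarded.
    public static string[] SplitOutsideSections(string source) =>
        Regex.Split(source, Pattern);

    public static void Main()
    {
        string source = "This is a [normal string ] made up of # different types # of characters";

        // Print one item per line.
        foreach (string part in SplitOutsideSections(source))
            Console.WriteLine(part);
    }
}
```

The lookaheads only inspect the text to the right of each space, so the sketch relies on the brackets and # markers being balanced, as they are in the sample string.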