Find all substrings in a string in C# (Regex, char array?) [duplicate]

This question already has an answer here:

.NET - How can you split a “caps” delimited string into an array? 16 answers

I need to identify substrings found in a string such as:

"CityABCProcess Test" or "cityABCProcess Test"

to yield: [ "City/city", "ABC", "Process", "Test" ]

The first substring can begin with either a lowercase or an uppercase letter.
Any run of consecutive uppercase letters is a substring that ends when a lowercase letter or a space is found: "ABCProcess" -> "ABC", "ABC Process" -> "ABC".
If an uppercase letter is followed by a lowercase letter, the substring is everything up to the next uppercase letter.

Can this be handled by regex, or should I convert my strings to a character array and check these cases manually with some indexing logic? Would a lambda solution work here? What is the best way to go about this?

1 Solution

#1

Pay no attention to the naysayers! Even something like this really isn't that complicated with RegEx. I believe this pattern should do the trick:

[A-Z][a-z]+|[A-Z]+\b|[A-Z]+(?=[A-Z])|[a-z]+

See here for a working demonstration. The pattern is just a series of alternatives (ORs) that the regex engine tries in order, left to right, at each position. Here's the breakdown:

[A-Z][a-z]+ - Any word that starts with an uppercase letter and then is followed by all lowercase letters
[A-Z]+\b - Any word that is in all uppercase (so as to include the last uppercase letter which would be excluded in the following option)
[A-Z]+(?=[A-Z]) - Any word that is in all uppercase, but not including the first uppercase letter of the next word
[a-z]+ - Any word that's all lowercase
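To make the role of the second alternative concrete, here is a minimal sketch that runs the pattern with and without `[A-Z]+\b` on "ABC Process". The class name `AlternativeDemo` and the helper `Tokens` are made up for this illustration; only `Regex.Matches` comes from .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AlternativeDemo
{
    // The full pattern from the answer.
    public const string Full = @"[A-Z][a-z]+|[A-Z]+\b|[A-Z]+(?=[A-Z])|[a-z]+";

    // The same pattern with the [A-Z]+\b alternative removed (for comparison only).
    public const string WithoutBoundary = @"[A-Z][a-z]+|[A-Z]+(?=[A-Z])|[a-z]+";

    // Hypothetical helper: collects every match of `pattern` in `input`, in order.
    public static List<string> Tokens(string input, string pattern)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        // With the boundary alternative the whole run "ABC" is captured.
        Console.WriteLine(string.Join(", ", Tokens("ABC Process", Full)));            // ABC, Process

        // Without it, the lookahead alternative must leave one capital behind,
        // and the trailing "C" matches nothing, so it is silently dropped.
        Console.WriteLine(string.Join(", ", Tokens("ABC Process", WithoutBoundary))); // AB, Process
    }
}
```

The lookahead alternative always stops one letter short of the end of an uppercase run, which is what you want in "ABCProcess" but not when the run is a whole word.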

For instance:

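using System;
using System.Text.RegularExpressions;

string input = "CityABCProcess TEST";

// Verbatim strings (@"...") keep \b as a regex word boundary;
// in a regular C# string literal "\b" would be a backspace character.
string pattern = string.Join("|",
    @"[A-Z][a-z]+",      // capitalized word: City, Process
    @"[A-Z]+\b",         // all-caps run ending at a word boundary: TEST
    @"[A-Z]+(?=[A-Z])",  // all-caps run, stopping before the next word's capital: ABC
    @"[a-z]+");          // all-lowercase word

// Prints City, ABC, Process and TEST, one per line.
foreach (Match m in Regex.Matches(input, pattern))
{
    Console.WriteLine(m.Value);
}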
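If the substrings are needed as a collection rather than printed, the same pattern can be wrapped in a small helper. This is only a sketch: `CapsSplitter` and `Split` are hypothetical names, not part of .NET. It also touches on the lambda question: the lambda passed to LINQ's `Select` just pulls the text out of each match, while the splitting itself is still done by the regex.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CapsSplitter
{
    // The pattern from the answer, built once and reused across calls.
    private static readonly Regex Pattern =
        new Regex(@"[A-Z][a-z]+|[A-Z]+\b|[A-Z]+(?=[A-Z])|[a-z]+");

    // Hypothetical helper: returns the matched substrings in order of appearance.
    public static List<string> Split(string input)
    {
        return Pattern.Matches(input)
                      .Cast<Match>()        // Cast keeps this working on older .NET Framework too
                      .Select(m => m.Value) // the lambda only extracts each match's text
                      .ToList();
    }

    public static void Main()
    {
        // The two inputs from the question.
        Console.WriteLine(string.Join(", ", Split("CityABCProcess Test"))); // City, ABC, Process, Test
        Console.WriteLine(string.Join(", ", Split("cityABCProcess Test"))); // city, ABC, Process, Test
    }
}
```

Note that the pattern only covers ASCII letters; digits, underscores, and non-ASCII letters are not matched, and because they count as word characters they can also change where `\b` applies.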
