This question already has an answer here:
- .NET - How can you split a “caps” delimited string into an array? 16 answers
I need to identify substrings found in a string such as:
"CityABCProcess Test" or "cityABCProcess Test"
to yield: [ "City/city", "ABC", "Process", "Test" ]
- The first string in the substring can be lowercase or uppercase
- Any run of consecutive uppercase letters is a single substring, which ends when a lowercase letter or a space is found: "ABCProcess" -> "ABC", "ABC Process" -> "ABC"
- If there is an uppercase letter followed by a lowercase letter the substring will be everything until the next uppercase letter.
Can this be handled by regex? Or should I convert my strings to a character array and manually check these cases using some indexing logic? Would a lambda solution work here? What is the best way to go about this?
1 answer
#1
3
Pay no attention to the naysayers! Even something like this really isn't that complicated with RegEx. I believe this pattern should do the trick:
[A-Z][a-z]+|[A-Z]+\b|[A-Z]+(?=[A-Z])|[a-z]+
See here for a working demonstration. It's just a bunch of ORs processed in order. Here's the breakdown:
- [A-Z][a-z]+ - Any word that starts with an uppercase letter and then is followed by all lowercase letters
- [A-Z]+\b - Any word that is in all uppercase (so as to include the last uppercase letter, which would be excluded by the following option)
- [A-Z]+(?=[A-Z]) - Any word that is in all uppercase, but not including the first uppercase letter of the next word
- [a-z]+ - Any word that's all lowercase
For instance:
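using System;
using System.Text;
using System.Text.RegularExpressions;

string input = "CityABCProcess TEST";
// Build the alternation piece by piece; alternatives are tried left to right.
StringBuilder builder = new StringBuilder();
builder.Append("[A-Z][a-z]+");      // capitalized word, e.g. "City"
builder.Append("|");
builder.Append("[A-Z]+\\b");        // all-caps run ending at a word boundary (\b, as in the pattern above, not $)
builder.Append("|");
builder.Append("[A-Z]+(?=[A-Z])");  // all-caps run that stops before the next word's capital
builder.Append("|");
builder.Append("[a-z]+");           // all-lowercase word
foreach (Match m in Regex.Matches(input, builder.ToString()))
{
    Console.WriteLine(m.Value);     // City, ABC, Process, TEST (one per line)
}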
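To check the pattern against the inputs from the question, here is a minimal sketch that wraps it in a helper. The class name `CapsSplitter` and the method name `SplitWords` are made up for illustration and are not part of any library. It uses the `\b` form of the pattern from the breakdown, which is what lets "ABC Process" yield "ABC" even though the input does not end there.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class CapsSplitter
{
    // Same alternation as in the breakdown, written as a verbatim string
    // so that \b does not need to be escaped.
    private static readonly Regex Token =
        new Regex(@"[A-Z][a-z]+|[A-Z]+\b|[A-Z]+(?=[A-Z])|[a-z]+");

    // Hypothetical helper: returns every match of the pattern, in order.
    public static string[] SplitWords(string input) =>
        Token.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    static void Main()
    {
        var samples = new[] { "CityABCProcess Test", "cityABCProcess Test", "ABC Process" };
        foreach (var s in samples)
        {
            Console.WriteLine(string.Join(", ", SplitWords(s)));
        }
        // Prints:
        // City, ABC, Process, Test
        // city, ABC, Process, Test
        // ABC, Process
    }
}
```

The `Cast<Match>()` call is there because `MatchCollection` only implements the non-generic `IEnumerable` on older .NET Framework versions; on newer runtimes it can probably be dropped.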