Find all substrings in a string in C# (Regex, char array?) [duplicate]

Time: 2022-10-30 13:10:06

I need to identify substrings found in a string such as:

"CityABCProcess Test" or "cityABCProcess Test"

to yield: ["City" (or "city"), "ABC", "Process", "Test"]

  1. The first substring may start with either a lowercase or an uppercase letter ("City" or "city").
  2. A run of consecutive uppercase letters is one substring, ending at a space or where a lowercase letter starts; the uppercase letter directly before that lowercase letter begins the next substring: "ABCProcess -> ABC", "ABC Process -> ABC".
  3. If there is an uppercase letter followed by a lowercase letter, the substring is everything up to (not including) the next uppercase letter.

Can this be handled with a regex? Or should I convert my strings to a character array and check these cases manually with some indexing logic? Would a lambda solution work here? What is the best way to go about this?

1 answer

#1

Pay no attention to the naysayers! Even something like this isn't that complicated with a regex. I believe this pattern should do the trick:

[A-Z][a-z]+|[A-Z]+\b|[A-Z]+(?=[A-Z])|[a-z]+

See here for a working demonstration. It's just a series of alternatives joined with OR (|), tried left to right at each position in the input. Here's the breakdown:

  • [A-Z][a-z]+ - A capitalized word: one uppercase letter followed by one or more lowercase letters ("City", "Process")
  • [A-Z]+\b - A word that is entirely uppercase and ends at a word boundary (this keeps the last uppercase letter, which the next option would leave out) ("TEST")
  • [A-Z]+(?=[A-Z]) - A run of uppercase letters that stops just before the first uppercase letter of the next word ("ABC" in "ABCProcess")
  • [a-z]+ - A word that is entirely lowercase ("city")
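To check which alternative produces each token, here is a small C# sketch that wraps each branch of the pattern above in a named group. The group names and the Explain helper are hypothetical, chosen only for illustration; it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// The four alternatives from the breakdown, each in a named group so the
// branch that produced a token can be reported. Group names are illustrative.
const string Pattern =
    @"(?<Capitalized>[A-Z][a-z]+)" +
    @"|(?<AllCapsWord>[A-Z]+\b)" +
    @"|(?<CapsBeforeWord>[A-Z]+(?=[A-Z]))" +
    @"|(?<Lowercase>[a-z]+)";

string[] branchNames = { "Capitalized", "AllCapsWord", "CapsBeforeWord", "Lowercase" };

// Hypothetical helper: returns "token:branch" pairs, e.g. "ABC:CapsBeforeWord".
string[] Explain(string input)
{
    MatchCollection matches = Regex.Matches(input, Pattern);
    string[] result = new string[matches.Count];
    for (int i = 0; i < matches.Count; i++)
    {
        Match m = matches[i];
        // Exactly one named group succeeds for each match.
        string branch = Array.Find(branchNames, name => m.Groups[name].Success);
        result[i] = m.Value + ":" + branch;
    }
    return result;
}

foreach (string line in Explain("cityABCProcess Test"))
{
    Console.WriteLine(line);
}
```

For "cityABCProcess Test" this reports city from the lowercase branch, ABC from the lookahead branch, and Process and Test from the capitalized-word branch.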

For instance:

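using System;
using System.Text;
using System.Text.RegularExpressions;

string input = "CityABCProcess TEST";
// Build the pattern from the four alternatives; they are tried left to right.
StringBuilder builder = new StringBuilder();
builder.Append("[A-Z][a-z]+");      // capitalized word: "City", "Process"
builder.Append("|");
builder.Append(@"[A-Z]+\b");        // all-uppercase word: "TEST"
builder.Append("|");
builder.Append("[A-Z]+(?=[A-Z])");  // uppercase run before the next word: "ABC"
builder.Append("|");
builder.Append("[a-z]+");           // all-lowercase word: "city"
foreach (Match m in Regex.Matches(input, builder.ToString()))
{
    Console.WriteLine(m.Value);     // prints City, ABC, Process, TEST on separate lines
}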
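The breakdown ends the all-uppercase branch with \b. An end-of-input anchor $ is not a drop-in replacement: $ only matches at the end of the string, so an all-caps word followed by a space falls through to the lookahead branch, and "ABC Process" comes out as AB and Process, losing the C. A small comparison sketch (the Tokens helper is hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same alternatives, differing only in how the all-uppercase branch is anchored.
const string WithDollar = "[A-Z][a-z]+|[A-Z]+$|[A-Z]+(?=[A-Z])|[a-z]+";
const string WithBoundary = @"[A-Z][a-z]+|[A-Z]+\b|[A-Z]+(?=[A-Z])|[a-z]+";

// Hypothetical helper: joins all matches with commas for easy comparison.
string Tokens(string input, string pattern)
{
    var values = new List<string>();
    foreach (Match m in Regex.Matches(input, pattern))
    {
        values.Add(m.Value);
    }
    return string.Join(",", values);
}

Console.WriteLine(Tokens("ABC Process", WithDollar));   // AB,Process
Console.WriteLine(Tokens("ABC Process", WithBoundary)); // ABC,Process
```

Both versions agree on "CityABCProcess TEST", where the uppercase word sits at the very end of the input.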

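For comparison, the question also asks about scanning a character array by index. Below is a minimal sketch of that approach, assuming the three rules in the question are the whole specification; the SplitWords function is hypothetical. It uses char.IsUpper and char.IsLower, so unlike the ASCII-only [A-Z] and [a-z] classes it also treats non-ASCII cased letters as letters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical manual scanner for the question's rules, without regex.
List<string> SplitWords(string s)
{
    var words = new List<string>();
    int i = 0;
    while (i < s.Length)
    {
        char c = s[i];
        if (!char.IsUpper(c) && !char.IsLower(c))
        {
            i++; // skip spaces and anything that is not a cased letter
            continue;
        }

        int start = i;
        if (char.IsUpper(c) && i + 1 < s.Length && char.IsLower(s[i + 1]))
        {
            // Capitalized word: one uppercase letter, then lowercase letters.
            i++;
            while (i < s.Length && char.IsLower(s[i])) i++;
        }
        else if (char.IsUpper(c))
        {
            // Run of uppercase letters; if a lowercase letter follows, the last
            // uppercase letter starts the next word ("ABCProcess" -> "ABC").
            while (i < s.Length && char.IsUpper(s[i])) i++;
            if (i < s.Length && char.IsLower(s[i])) i--;
        }
        else
        {
            // All-lowercase word, such as a leading "city".
            while (i < s.Length && char.IsLower(s[i])) i++;
        }
        words.Add(s.Substring(start, i - start));
    }
    return words;
}

Console.WriteLine(string.Join(", ", SplitWords("cityABCProcess Test")));
```

On the question's examples this yields the same tokens as the regex. The regex is shorter, while the loop spells out each rule and may be easier to extend if more cases appear.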