How do I match words made of more than one distinct character?

I would like to use a regular expression to match all words that contain more than one distinct character, as opposed to words made entirely of the same character.

This should not match: ttttt, rrrrr, ggggggggggggg

This should match: rttttttt, word, wwwwwwwwwu

5 solutions

#1

The following expression will do the trick.

^(?<FIRST>[a-zA-Z])[a-zA-Z]*?(?!\k<FIRST>)[a-zA-Z]+$

Capture the first character into the group FIRST.

Match some more characters (lazily, so that the first differing character is found with little backtracking).

Use a negative lookahead assertion to ensure that the next character is different from FIRST.

Match all remaining characters. At least one is required, because the lookahead on its own would also succeed at the end of the string.

Note that it is sufficient to look for a character that is different from the first one: if no character differs from the first one, all characters are equal.
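
The steps above can be sketched as a commented pattern. This is only an illustration in Python rather than .NET, so it assumes Python's named-group spelling, `(?P<first>...)` and `(?P=first)`, in place of `(?<FIRST>...)` and `\k<FIRST>`. The helper name `has_two_distinct_letters` is made up for this example.

```python
import re

# Verbose form of the pattern, one comment per step.
# Assumption: Python syntax for the named group and its backreference.
PATTERN = re.compile(
    r"""
    ^(?P<first>[a-zA-Z])   # capture the first character
    [a-zA-Z]*?             # lazily skip zero or more characters
    (?!(?P=first))         # the next character must differ from the first
    [a-zA-Z]+$             # at least one character must remain
    """,
    re.VERBOSE,
)


def has_two_distinct_letters(word: str) -> bool:
    """Return True if word is all letters and not one letter repeated."""
    return PATTERN.match(word) is not None


for word in ["ttttt", "rrrrr", "rttttttt", "word", "wwwwwwwwwu"]:
    print(word, has_two_distinct_letters(word))
```

Only the first differing character matters, which is why a single lookahead against the first character is enough.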

You can shorten the expression to the following.

^(\w)\w*?(?!\1)\w+$

Note that \w matches more than [a-zA-Z]: it also accepts other word characters, such as digits and the underscore.
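
As a rough illustration, the shortened pattern can be tried against the examples from the question. The sketch assumes Python's `re` module, where `\w` on `str` patterns matches letters, digits, and the underscore; the name `is_mixed` is hypothetical.

```python
import re

SHORT = re.compile(r"^(\w)\w*?(?!\1)\w+$")


def is_mixed(word: str) -> bool:
    """True if the word contains at least two different word characters."""
    return SHORT.match(word) is not None


should_match = ["rttttttt", "word", "wwwwwwwwwu"]
should_not_match = ["ttttt", "rrrrr", "ggggggggggggg"]

print([is_mixed(w) for w in should_match])      # [True, True, True]
print([is_mixed(w) for w in should_not_match])  # [False, False, False]

# \w is broader than [a-zA-Z]: digits and underscores count as well.
print(is_mixed("a1"), is_mixed("__x"))  # True True
print(is_mixed("1111"))                 # False
```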

#2

I would add all the unique words to a list and then use this regex

\b(\w)\1+\b

to grab all the words made of a single repeated character and get rid of them.
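
One way to read this suggestion, sketched in Python and assuming the words have already been split out of the text: collect the unique words, then drop every word the pattern matches in full. The function name `drop_repeated_char_words` is hypothetical. Because `\1+` requires at least one repetition, single-character words such as `a` are not matched and therefore stay in the list.

```python
import re

REPEATED = re.compile(r"\b(\w)\1+\b")


def drop_repeated_char_words(words):
    """Keep unique words, in first-seen order, that the pattern does not fully match."""
    unique = list(dict.fromkeys(words))  # de-duplicate while preserving order
    return [w for w in unique if not REPEATED.fullmatch(w)]


words = ["ttttt", "word", "rrrrr", "word", "rttttttt", "a"]
print(drop_repeated_char_words(words))  # ['word', 'rttttttt', 'a']
```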

#3

This doesn't use a regular expression, but I believe it will do what you require:

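// Requires: using System.Linq;
public bool Match(string str)
{
    // A null or empty string has no two distinct characters, so it does not match.
    return !string.IsNullOrEmpty(str)
               && str.Skip(1)
                     .Any(c => c != str[0]); // some later character differs from the first
}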

#4

The following RE does the opposite of what you're asking for: it matches words composed of a single repeated character. It may still be useful to you, though.

\b(\w)\1*\b
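
Since this pattern matches exactly the words to reject, negating the result gives the requested test. A minimal sketch in Python, assuming each candidate is already a single word made of word characters; `is_mixed_by_negation` is an illustrative name, not part of the answer.

```python
import re

SAME_CHAR = re.compile(r"\b(\w)\1*\b")


def is_mixed_by_negation(word: str) -> bool:
    """True if the word is non-empty and is not one character repeated.

    Assumes word contains only word characters (no spaces or punctuation).
    """
    return bool(word) and SAME_CHAR.fullmatch(word) is None


for word in ["ttttt", "a", "word", "wwwwwwwwwu"]:
    print(word, is_mixed_by_negation(word))
```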

#5

\b\w*?(\w)\1*(?:(?!\1)\w)\w*\b

\b(\w)(?!\1*\b)\w*\b

This assumes you're plucking the words out of some larger text; that's why it needs the word boundaries and the padding. If you have a list of words and you're just trying to validate the ones that meet the criteria, a much simpler regex would probably do:

(.)(?:(?!\1).)

...because you already know each word contains only word characters. On the other hand, depending on your definition of "word" you might need to replace \w in the first two regexes with something more specific, like [A-Za-z].
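
To make the difference between the two use cases concrete, here is a small sketch in Python (an assumption; the patterns themselves are taken unchanged from this answer). The sample text and the variable names are invented for illustration.

```python
import re

# The two word-boundary patterns: find qualifying words inside running text.
FIRST_FORM = re.compile(r"\b\w*?(\w)\1*(?:(?!\1)\w)\w*\b")
SECOND_FORM = re.compile(r"\b(\w)(?!\1*\b)\w*\b")
# The simple pattern: some character followed by a different one.
ADJACENT_DIFF = re.compile(r"(.)(?:(?!\1).)")

text = "ttttt rttttttt word rrrrr wwwwwwwwwu a"

found_first = [m.group(0) for m in FIRST_FORM.finditer(text)]
found_second = [m.group(0) for m in SECOND_FORM.finditer(text)]
print(found_first)   # ['rttttttt', 'word', 'wwwwwwwwwu']
print(found_second)  # ['rttttttt', 'word', 'wwwwwwwwwu']

# Validating a list of known words: a plain search is enough.
candidates = ["ttttt", "word", "wwwwwwwwwu"]
valid = [w for w in candidates if ADJACENT_DIFF.search(w)]
print(valid)  # ['word', 'wwwwwwwwwu']
```

The simple pattern only reports that two adjacent characters differ, so it is suited to a yes/no check on a single word rather than to extracting whole words from text.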
