I need to match a word that appears twice in an English text. I tried
(^|\ )([^\ ][^\b]*\b).*\ \2\b
but this doesn't match all lines.
1 answer
#1
3
There are a few problems with your regex. For example, \b word boundaries cannot be used in a character class (in many flavors, \b inside a class means a backspace character instead), so [^\b]* will not work as intended.
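A small sketch of that first point, assuming Python's re module (the question does not say which regex flavor is in use, and other flavors may instead reject \b inside a class). The sample string is made up for illustration:

```python
import re

# Inside a character class, Python's re module reads \b as a backspace
# character (\x08), not as a word boundary.
backspace_class = re.compile(r"[\b]")
negated_class = re.compile(r"[^\b]*")

matches_backspace = backspace_class.fullmatch("\x08") is not None

# [^\b]* therefore means "anything except backspace", so it runs straight
# across spaces and punctuation instead of stopping at the end of a word.
consumed = negated_class.match("foo bar, baz").group(0)

print(matches_backspace, repr(consumed))
```

So the class never stops at a word end, which is why the original pattern behaves unpredictably.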
You probably want something like
(?s)\b(\w+)\b.*\b\1\b
This will match the entire text from the first occurrence of the word to the last. This might not be what you actually intended.
Another idea:
(?s)\b(\w+)\b.*?\b\1\b
This will match only the text from the first occurrence of the word to the next.
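To see the difference between the greedy and the lazy variant, here is a minimal sketch, again assuming Python's re module. The sample text is invented for the comparison:

```python
import re

text = "one two one three one"

# Greedy .* runs to the end and backtracks to the LAST repetition.
greedy = re.search(r"(?s)\b(\w+)\b.*\b\1\b", text)

# Lazy .*? stops at the NEXT repetition.
lazy = re.search(r"(?s)\b(\w+)\b.*?\b\1\b", text)

print(repr(greedy.group(0)))
print(repr(lazy.group(0)))
```

In both cases group 1 holds the repeated word; only the extent of the overall match differs.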
The problem with both of these approaches is that, for example, in a text like
foo bar bar foo
the regex will match from foo to foo, blindly ignoring that there is a duplicate bar in between.
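This limitation can be reproduced with a short sketch, assuming Python's re module and the lazy pattern from above:

```python
import re

text = "foo bar bar foo"

# The single match spans the whole string, so the scan never restarts
# inside it and the duplicated "bar" goes unreported.
spans = [m.group(0) for m in re.finditer(r"(?s)\b(\w+)\b.*?\b\1\b", text)]
words = re.findall(r"(?s)\b(\w+)\b.*?\b\1\b", text)

print(spans)
print(words)
```

Because matches cannot overlap, everything consumed between the two foo occurrences is lost to later matches.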
So if you actually want to find all words that occur more than once, use
(?s)\b(\w+)\b(?=.*?\b\1\b)
Explanation:
(?s) # Allow the dot to match newlines
\b(\w+)\b # Match an entire word
(?= # Assert that the following regex can be matched from here:
.*? # Any number of characters
\b\1\b # followed by the word that was previously captured
) # End of lookahead
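As a usage sketch of the lookahead pattern, assuming Python's re module: the helper name duplicated_words is hypothetical and not part of the answer. Since the lookahead consumes nothing, a word that occurs three times is reported twice, so this sketch also removes repeats while keeping first-seen order.

```python
import re

# Pattern from the answer: a whole word, followed somewhere later
# by the same whole word (checked without consuming any text).
DUPLICATE_WORD = re.compile(r"(?s)\b(\w+)\b(?=.*?\b\1\b)")


def duplicated_words(text):
    """Return each word that occurs more than once, in first-seen order."""
    hits = DUPLICATE_WORD.findall(text)
    # dict.fromkeys drops repeated hits but preserves insertion order.
    return list(dict.fromkeys(hits))


print(duplicated_words("foo bar bar foo"))
print(duplicated_words("a b a c a"))
```

Note that the comparison is case-sensitive as written, so "The" and "the" count as different words unless a case-insensitive flag is added.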