How do I write a Pattern (Java) to match any sequence of characters except a given list of words?
I need to find out whether a given piece of code has any text, other than a given list of words, surrounded by tags like <span></span>. For example, I want to check whether there are any words other than "one" and "two" surrounded by the <span> tag.
"This is the first tag <span>one</span> and this is the third <span>three</span>"
The pattern should match the above string because the word "three" is surrounded by the <span> tag and is not in the given list of words ("one", "two").
3 answers
#1 (7 votes)
A negative lookahead can do this:
\b(?!(?:your|given|list|of|exclusions)\b)\w+\b
Matches
- a word boundary (start-of-word)
- not followed by any of the whole words "your", "given", "list", "of", "exclusions" (the \b inside the lookahead means a word such as "often" is not rejected just because it starts with "of")
- followed by one or more word characters
- followed by a word boundary (end-of-word)
In effect, this matches any word that is not excluded.
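As a rough illustration, the sketch below applies the same lookahead in Java but builds it from a list of excluded words instead of hard-coding them. The class and method names (ExclusionLookahead, buildPattern, findAllowed) are hypothetical, and it assumes Java 9+ for List.of. Each word goes through Pattern.quote so characters such as . or + in the list are treated literally, and the \b after the alternation keeps a word like "twofold" from being rejected just because it begins with "two".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ExclusionLookahead {
    // Builds \b(?!(?:w1|w2|...)\b)\w+\b from a list of excluded words.
    // Pattern.quote() escapes each word so regex metacharacters are matched literally.
    static Pattern buildPattern(List<String> excluded) {
        String alternation = excluded.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?!(?:" + alternation + ")\\b)\\w+\\b");
    }

    // Returns every word in the text that is not one of the excluded words.
    static List<String> findAllowed(String text, List<String> excluded) {
        List<String> result = new ArrayList<>();
        Matcher m = buildPattern(excluded).matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> excluded = List.of("one", "two");
        System.out.println(findAllowed("one two three twofold", excluded));
        // prints [three, twofold]
    }
}
```

On its own this pattern reports every non-excluded word anywhere in the text, not only words inside <span> tags; limiting it to tag contents needs the surrounding markup in the pattern, as answer #2 does.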
#2 (4 votes)
This should get you started.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.*;

public class SpanWordFinder {
    // >(?!one<|two<)(\w+)</
    //
    // Match the character “>” literally «>»
    // Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!one<|two<)»
    //     Match either the regular expression below (attempting the next alternative only if this one fails) «one<»
    //         Match the characters “one<” literally «one<»
    //     Or match regular expression number 2 below (the entire group fails if this one fails to match) «two<»
    //         Match the characters “two<” literally «two<»
    // Match the regular expression below and capture its match into backreference number 1 «(\w+)»
    //     Match a single character that is a “word character” (letters, digits, etc.) «\w+»
    //         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    // Match the characters “</” literally «</»
    public static void main(String[] args) {
        // Example input from the question
        String subjectString = "This is the first tag <span>one</span> and this is the third <span>three</span>";
        List<String> matchList = new ArrayList<String>();
        try {
            Pattern regex = Pattern.compile(">(?!one<|two<)(\\w+)</");
            Matcher regexMatcher = regex.matcher(subjectString);
            while (regexMatcher.find()) {
                matchList.add(regexMatcher.group(1));
            }
        } catch (PatternSyntaxException ex) {
            // Syntax error in the regular expression
        }
        System.out.println(matchList); // [three]
    }
}
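The code above hard-codes both the tag and the two allowed words. A minimal sketch of generalizing the same negative-lookahead idea to any tag name and word list follows, assuming tags have no attributes and no nested markup (a regex is not an HTML parser, so real HTML is better handled by one). TagContentCheck, buildPattern and unexpectedContents are hypothetical names, and List.of assumes Java 9+. The lookahead rejects a tag whose entire content is one of the allowed words, and the [^<]* capture, unlike \w+, also reports multi-word content such as "one two".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TagContentCheck {
    // Builds <tag>(?!(?:w1|w2|...)</tag>)([^<]*)</tag> with every literal quoted.
    static Pattern buildPattern(String tag, List<String> allowed) {
        String open = "<" + Pattern.quote(tag) + ">";
        String close = "</" + Pattern.quote(tag) + ">";
        String words = allowed.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Negative lookahead: skip tags whose whole content is an allowed word.
        return Pattern.compile(open + "(?!(?:" + words + ")" + close + ")([^<]*)" + close);
    }

    // Returns the content of every tag that is not exactly one of the allowed words.
    static List<String> unexpectedContents(String text, String tag, List<String> allowed) {
        List<String> found = new ArrayList<>();
        Matcher m = buildPattern(tag, allowed).matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "This is the first tag <span>one</span> and this is the third <span>three</span>";
        List<String> extra = unexpectedContents(text, "span", List.of("one", "two"));
        System.out.println(extra);            // [three]
        System.out.println(!extra.isEmpty()); // true: some tag holds a word outside the list
    }
}
```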
#3 (2 votes)
Use this:
import java.util.regex.Pattern;

String input = "some text to check"; // hypothetical input
if (!Pattern.matches(".*(word1|word2|word3).*", input)) {
    System.out.println("We're good.");
}
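This checks that the pattern does not match the string. Pattern.matches() must match the entire input, which is why the word list is wrapped in .*, so the message is printed only when none of the listed words occurs in the input.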
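One caveat with this approach: Pattern.matches() has to consume the whole input, and by default . does not match line terminators, so the .* wrapping fails on multi-line text. The sketch below (ContainsCheck and its methods are hypothetical names) compares it with Matcher.find(), which searches for the alternation anywhere in the input.

```java
import java.util.regex.Pattern;

public class ContainsCheck {
    // The check from this answer: matches() must consume the whole input,
    // so the word list is wrapped in .* on both sides.
    static boolean containsViaMatches(String input) {
        return Pattern.matches(".*(word1|word2|word3).*", input);
    }

    // Same intent using find(), which looks for the alternation anywhere.
    // It is not affected by line terminators, because it never needs '.'
    // to cross them.
    static boolean containsViaFind(String input) {
        return Pattern.compile("word1|word2|word3").matcher(input).find();
    }

    public static void main(String[] args) {
        String multiLine = "first line\nword2 on the second line";
        System.out.println(containsViaMatches(multiLine)); // false
        System.out.println(containsViaFind(multiLine));    // true
    }
}
```

Compiling the .* version with the Pattern.DOTALL flag would also let . cross line breaks.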