Java pattern for matching any sequence of characters except a given list

Time: 2022-10-29 18:46:50

How do I write a Pattern (Java) to match any sequence of characters except a given list of words?

I need to find out whether a given piece of code has any text surrounded by tags like <span></span> other than a given list of words. For example, I want to check whether any words besides "one" and "two" are surrounded by the <span> tag.

"This is the first tag <span>one</span> and this is the third <span>three</span>"

The pattern should match the above string because the word "three" is surrounded by the <span> tag and is not in the list of given words ("one", "two").

3 answers

#1


7  

A negative lookahead can do this:

\b(?!(?:your|given|list|of|exclusions)\b)\w+\b

This matches:

  • a word boundary (start-of-word)
  • not followed by any of "your", "given", "list", "of", "exclusions" as a whole word (the \b inside the lookahead keeps words such as "offer" from being rejected just because they start with "of")
  • followed by one or more word characters
  • followed by a word boundary (end-of-word)

In effect, this matches any word that is not excluded.
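
To see what the \b inside the lookahead changes, here is a minimal sketch that runs both forms of the pattern with the exclusion list reduced to "one" and "two". LookaheadDemo and findAll are hypothetical names used only for illustration. Without the inner \b, words that merely start with an excluded word, such as "onerous" and "twofold", are skipped too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Hypothetical helper: collects every match of regex in text.
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "one onerous two twofold three";
        // No inner \b: any word that merely starts with "one" or "two" is rejected.
        System.out.println(findAll("\\b(?!one|two)\\w+\\b", text));        // [three]
        // Inner \b: only the whole words "one" and "two" are rejected.
        System.out.println(findAll("\\b(?!(?:one|two)\\b)\\w+\\b", text)); // [onerous, twofold, three]
    }
}
```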

#2


4  

This should get you started.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        String subjectString = "This is the first tag <span>one</span> and this is the third <span>three</span>";

        // >(?!one<|two<)(\w+)</
        //
        // Match the character “>” literally «>»
        // Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!one<|two<)»
        //    Match either the regular expression below (attempting the next alternative only if this one fails) «one<»
        //       Match the characters “one<” literally «one<»
        //    Or match regular expression number 2 below (the entire group fails if this one fails to match) «two<»
        //       Match the characters “two<” literally «two<»
        // Match the regular expression below and capture its match into backreference number 1 «(\w+)»
        //    Match a single character that is a “word character” (letters, digits, etc.) «\w+»
        //       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
        // Match the characters “</” literally «</»
        List<String> matchList = new ArrayList<String>();
        try {
            Pattern regex = Pattern.compile(">(?!one<|two<)(\\w+)</");
            Matcher regexMatcher = regex.matcher(subjectString);
            while (regexMatcher.find()) {
                matchList.add(regexMatcher.group(1));
            }
        } catch (PatternSyntaxException ex) {
            // Syntax error in the regular expression
        }
        System.out.println(matchList); // [three]
    }
}
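
If the list of allowed words is not fixed, a pattern of the same shape can be assembled at run time. The sketch below is one way to do that, assuming the words should be matched literally (hence Pattern.quote) and that tag bodies contain no '<'. TagFilter, buildPattern and disallowed are hypothetical names. Unlike the pattern above, it anchors on the full opening and closing tags rather than on a bare '>' and '</'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TagFilter {
    // Builds <tag>(?!(?:w1|w2|...)</tag>)([^<]+)</tag> with every piece quoted literally.
    static Pattern buildPattern(String tag, List<String> allowed) {
        String alternatives = allowed.stream()
                .map(Pattern::quote)              // escape regex metacharacters in each word
                .collect(Collectors.joining("|"));
        String open = Pattern.quote("<" + tag + ">");
        String close = Pattern.quote("</" + tag + ">");
        // Reject a body that is exactly one allowed word followed by the closing tag;
        // otherwise capture the body (no '<') up to the closing tag.
        return Pattern.compile(open + "(?!(?:" + alternatives + ")" + close + ")([^<]+)" + close);
    }

    // Returns the bodies of all <tag> elements that are not exactly an allowed word.
    static List<String> disallowed(String text, String tag, List<String> allowed) {
        Matcher m = buildPattern(tag, allowed).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "This is the first tag <span>one</span> and this is the third <span>three</span>";
        System.out.println(disallowed(s, "span", Arrays.asList("one", "two"))); // [three]
    }
}
```

Since the question only asks whether such text exists, buildPattern(...).matcher(text).find() returning true is enough on its own; collecting the bodies just shows which words triggered the match.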

#3


2  

Use this:

if (!Pattern.matches(".*(word1|word2|word3).*", "word1")) {
    System.out.println("We're good.");
}

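You're checking that the pattern does not match the string. Pattern.matches must match the entire input, so .*(word1|word2|word3).* succeeds whenever the string contains one of the listed words (on a single line, since . does not match line terminators by default), and the ! negates that result. With the input "word1" used here, the condition is false and nothing is printed.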
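
One way to apply the same negate-a-positive-match idea to the original question is to pull out each <span> body and require a full match against the allowed words, reporting a hit when that match fails. This is a minimal sketch, not the answerer's code; NegatedMatchDemo and hasOtherWord are hypothetical names, and it assumes span bodies contain no '<'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedMatchDemo {
    // Captures the body of every <span>...</span> (assumes no '<' inside the body).
    private static final Pattern SPAN = Pattern.compile("<span>([^<]*)</span>");
    // Allowed words; matches() below requires the whole body to equal one of them.
    private static final Pattern ALLOWED = Pattern.compile("one|two");

    // True if at least one <span> body is not exactly an allowed word.
    static boolean hasOtherWord(String html) {
        Matcher m = SPAN.matcher(html);
        while (m.find()) {
            if (!ALLOWED.matcher(m.group(1)).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasOtherWord("first <span>one</span>, third <span>three</span>")); // true
        System.out.println(hasOtherWord("<span>one</span> and <span>two</span>"));            // false
    }
}
```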
