Java: matching a whole word in a string

Date: 2022-09-19 18:54:33

I have an ArrayList<String> which I iterate through to find the correct index given a String. Basically, given a String, the program should search through the list and find the index where the whole word matches. For example:

ArrayList<String> foo = new ArrayList<String>();
foo.add("AAAB_11232016.txt");
foo.add("BBB_12252016.txt");
foo.add("AAA_09212017.txt");

So if I give the String AAA, I should get back index 2 (the last one). So I can't use the contains() method as that would give me back index 0.

I tried with this code:

String str = "AAA";
String pattern = "\\b" + str + "\\b";
Pattern p = Pattern.compile(pattern);

for(int i = 0; i < foo.size(); i++) {
    // Check each entry of list to find the correct value
    Matcher match = p.matcher(foo.get(i));

    if(match.find() == true) {
        return i;
    }
}

Unfortunately, the condition of the if statement inside the loop is never true, so the code never returns an index. I'm not sure what I'm doing wrong.

Note: This should also work if I searched for AAA_0921, the full name AAA_09212017.txt, or any part of the String that is unique to it.

1 solution

#1


Since a word boundary does not match between a word character and an underscore (the underscore is itself a word character), you need:

String pattern = "(?<=_|\\b)" + str + "(?=_|\\b)";

Here, (?<=_|\b) positive lookbehind requires a word boundary or an underscore to appear before the str, and the (?=_|\b) positive lookahead requires an underscore or a word boundary to appear right after the str.
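
To make the lookaround pattern above concrete, here is a minimal sketch that plugs it into the loop from the question. The class name WholeWordIndex, the method name indexOfWord, and the -1 return value for "not found" are assumptions made for illustration; they are not part of the original answer. The search string is inserted into the regex as-is, so this sketch assumes it contains no regex metacharacters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class WholeWordIndex {

    // Returns the index of the first entry in which str appears delimited
    // by an underscore or a word boundary on both sides, or -1 if none does.
    static int indexOfWord(List<String> names, String str) {
        // str is concatenated unquoted: assumed to hold no regex metacharacters.
        Pattern p = Pattern.compile("(?<=_|\\b)" + str + "(?=_|\\b)");
        for (int i = 0; i < names.size(); i++) {
            if (p.matcher(names.get(i)).find()) {
                return i;
            }
        }
        return -1; // assumed convention for "no match"
    }

    public static void main(String[] args) {
        List<String> foo = Arrays.asList(
                "AAAB_11232016.txt",
                "BBB_12252016.txt",
                "AAA_09212017.txt");
        System.out.println(indexOfWord(foo, "AAA"));              // 2
        System.out.println(indexOfWord(foo, "AAA_09212017.txt")); // 2
        System.out.println(indexOfWord(foo, "AAA_0921"));         // -1
    }
}
```

With this pattern the entry AAAB_11232016.txt is skipped for AAA, because the B that follows is neither an underscore nor a boundary. For the same reason a partial key such as AAA_0921 is not found in this list: the character after it is a digit, so the trailing lookahead fails.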

See this regex demo.

If your word may contain special characters, you might want to use a more straightforward word boundary:

"(?<![^\\W_])" + Pattern.quote(str) + "(?![^\\W_])"

Here, the negative lookbehind (?<![^\W_]) fails the match if the character immediately before str is a word character other than an underscore, that is, a letter or a digit. ([^...] is a negated character class that matches any character other than the characters and ranges defined inside it, so [^\W_] matches any character that is neither a non-word character \W nor a _.) Likewise, the negative lookahead (?![^\W_]) fails the match if a word character other than an underscore follows str.

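Note that the second example quotes the search string with Pattern.quote, so any regex metacharacters in it are matched literally: AA.A_str.txt is matched by AA.A, with the dot treated as a literal dot rather than as a wildcard.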
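
As a minimal sketch of the quoted variant, the following wraps the second pattern in a lookup method. The class name QuotedWordIndex, the method name indexOfQuoted, the -1 return value, and the sample entry AAXA_str.txt are assumptions added for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class QuotedWordIndex {

    // Returns the index of the first entry containing str literally, with no
    // letter or digit directly before or after it, or -1 if none does.
    static int indexOfQuoted(List<String> names, String str) {
        Pattern p = Pattern.compile(
                "(?<![^\\W_])" + Pattern.quote(str) + "(?![^\\W_])");
        for (int i = 0; i < names.size(); i++) {
            if (p.matcher(names.get(i)).find()) {
                return i;
            }
        }
        return -1; // assumed convention for "no match"
    }

    public static void main(String[] args) {
        // AAXA_str.txt is an invented entry: an unquoted "AA.A" would match it.
        List<String> names = Arrays.asList("AAXA_str.txt", "AA.A_str.txt");
        System.out.println(indexOfQuoted(names, "AA.A")); // 1
    }
}
```

Without Pattern.quote, the dot in AA.A would match any character and the first entry would be returned instead.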

See another regex demo
