I'm doing some string searching with the Java Pattern class. I'm trying to match a string (txt) that contains "c++" or "c#".
String txt="c++ / c# developer";
Pattern p = Pattern.compile(".*\\b(c\\+\\+|c#)\\b.*" , Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(txt);
while (m.find()) {
...
break;
}
m.find() always returns false. What am I doing wrong? Thanks, Ofer
2 solutions
#1
6
\\b is a word boundary, which means it matches only between a word character and a non-word character (or at the start or end of the input, next to a word character). + and # are both non-word characters, so the trailing \\b requires c++ or c# to be followed by a letter, digit or underscore. Try removing the trailing \\b or replacing it with \\B (which requires that the + or # is followed by another non-word character or by the end of the input).
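To make the boundary behaviour concrete, here is a minimal sketch that runs the three variants against the string from the question. The class name BoundaryDemo and the helper found are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // Hypothetical helper: true if the regex is found anywhere in the input (case-insensitive).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    public static void main(String[] args) {
        String txt = "c++ / c# developer";
        // Trailing \b: needs a word character right after "+" or "#", but a space follows.
        System.out.println(found("\\b(c\\+\\+|c#)\\b", txt)); // false
        // No trailing boundary at all.
        System.out.println(found("\\b(c\\+\\+|c#)", txt));    // true
        // Trailing \B: satisfied because "+" and the following space are both non-word characters.
        System.out.println(found("\\b(c\\+\\+|c#)\\B", txt)); // true
    }
}
```

The leading \\b is harmless here, since c is a word character and is preceded either by the start of the input or by a space.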
Note that, when you are using find, you don't need the .* either. find will happily return partial matches. With the wildcards, your pattern would give you the last occurrence of either c++ or c# in the first capturing group, because the leading .* is greedy. If that is not what you want, remove the parentheses and wildcards.
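As a rough illustration of the difference, the sketch below compares the wildcard version with a plain find loop. The names FindAllDemo, findAll and lastWithWildcards are invented for this example, and the trailing \\b has already been dropped so that the patterns can match at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Hypothetical helper: collects every match that repeated find() calls return.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    // With the wildcards kept, the greedy leading .* pushes group 1 to the last occurrence.
    static String lastWithWildcards(String input) {
        Matcher m = Pattern.compile(".*\\b(c\\+\\+|c#).*", Pattern.CASE_INSENSITIVE).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String txt = "c++ / c# developer";
        System.out.println(findAll("\\b(?:c\\+\\+|c#)", txt)); // [c++, c#]
        System.out.println(lastWithWildcards(txt));             // c#
    }
}
```

Without the wildcards, each call to find returns the next occurrence, so a while loop visits all of them in order.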
EDIT: If you are adding other alternatives that do end in word characters (like java), the cleanest solution would be not to use a trailing \\b or \\B at all, but to create your own boundary condition using a negative lookahead. This way you are simply saying "match if there is no word character next":
\\b(c\\+\\+|c#|java)(?!\\w)
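A small sketch of how that lookahead pattern behaves, assuming a made-up class LookaheadDemo with a helper named keywords; the sample inputs are also invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // The pattern from the answer: leading \b, then a "no word character next" condition.
    static final Pattern KEYWORDS =
            Pattern.compile("\\b(c\\+\\+|c#|java)(?!\\w)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns every keyword found in the input, in order.
    static List<String> keywords(String input) {
        Matcher m = KEYWORDS.matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(keywords("c++ / c# / Java developer")); // [c++, c#, Java]
        // "java" is followed by the word character 's', so the lookahead rejects it.
        System.out.println(keywords("javascript developer"));      // []
    }
}
```

The lookahead consumes nothing, so it works the same way whether the alternative ends in a word character (java) or a non-word character (c++, c#).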
#2
0
You can try using ^.*c(\+{2}|\#).*$ (in a Java string literal, each backslash has to be doubled). It says: find a c followed by either two +'s or a #. You can see an example here.
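For reference, this is roughly how that regex could be used from Java. The class WholeStringDemo and the method mentions are hypothetical names; since the pattern is anchored with ^ and $, the sketch uses matches rather than find.

```java
import java.util.regex.Pattern;

class WholeStringDemo {
    // The regex from this answer as a Java string literal: every backslash is doubled.
    static final Pattern P = Pattern.compile("^.*c(\\+{2}|\\#).*$");

    // Hypothetical helper: true if the whole input matches the pattern.
    static boolean mentions(String input) {
        return P.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(mentions("c++ / c# developer")); // true
        System.out.println(mentions("java developer"));     // false
        // There is no boundary check before the c, so this also matches.
        System.out.println(mentions("abc# tools"));         // true
    }
}
```

Note that this version is case-sensitive and does not check what precedes the c, unlike the pattern in the first answer.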