I'm using this regex:
x.split("[^a-zA-Z0-9']+");
This returns an array of strings made up of letters and/or digits.
If I use this:
String name = "CEN01_Automated_TestCase.java";
String[] names = name.split("[^a-zA-Z0-9']+");
I get:
CEN01
Automated
TestCase
java
But if I use this:
String name = "CEN01_Automação_Caso_Teste.java";
String[] names = name.split("[^a-zA-Z0-9']+");
I get:
CEN01
Automa
o
Caso
Teste
java
How can I modify this regex to include accented characters (á, ã, õ, etc.)?
5 answers
#1
9
From http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html:
Categories that behave like the java.lang.Character boolean ismethodname methods (except for the deprecated ones) are available through the same \p{prop} syntax where the specified property has the name javamethodname.
Since the Character class contains an isAlphabetic method, you can use
name.split("[^\\p{IsAlphabetic}0-9']+");
You can also use
name.split("(?U)[^\\p{Alpha}0-9']+");
but this relies on the UNICODE_CHARACTER_CLASS flag, which is enabled here by the (?U) prefix in the regex.
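To make the two variants above easy to compare, here is a minimal sketch that runs both on the file name from the question. The class and method names (UnicodeSplitDemo, splitAlphabetic, splitUnicodeAlpha) are made up for illustration, and the accented letters are written as Unicode escapes only so the example does not depend on the source file encoding.

```java
import java.util.Arrays;

class UnicodeSplitDemo {
    // Splits on runs of characters that are neither alphabetic
    // (Unicode Alphabetic property), ASCII digits, nor apostrophes.
    static String[] splitAlphabetic(String s) {
        return s.split("[^\\p{IsAlphabetic}0-9']+");
    }

    // Same idea, but uses the POSIX class \p{Alpha}; the embedded (?U) flag
    // (UNICODE_CHARACTER_CLASS) makes it cover non-ASCII letters too.
    static String[] splitUnicodeAlpha(String s) {
        return s.split("(?U)[^\\p{Alpha}0-9']+");
    }

    public static void main(String[] args) {
        // "CEN01_Automação_Caso_Teste.java" with the accented letters escaped
        String name = "CEN01_Automa\u00E7\u00E3o_Caso_Teste.java";
        // Both calls yield the tokens CEN01, Automação, Caso, Teste, java
        System.out.println(Arrays.toString(splitAlphabetic(name)));
        System.out.println(Arrays.toString(splitUnicodeAlpha(name)));
    }
}
```

Without (?U), \p{Alpha} only covers ASCII letters, so the second pattern would behave like the regex in the question.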
#2
2
I would check out the Java documentation on regular expressions. It has a Unicode section, which I believe is what you are looking for.
EDIT: Example
Another way is to match on the code of the character you are looking for, for example \uFFFF, where FFFF is the hexadecimal code of the character you are trying to match.
Example: \u00E0 matches à
Note that the backslash needs to be escaped in Java if you write the regex as a string literal.
Read more about it here.
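As a rough illustration of the character-code approach, the sketch below adds a range of code points to the class from the question. The range \u00C0-\u00FF is my own assumption (it covers the accented letters of Latin-1, but also the signs × and ÷), and the names UnicodeEscapeDemo and splitLatin1 are hypothetical.

```java
import java.util.Arrays;

class UnicodeEscapeDemo {
    // The doubled backslash leaves the escape for the regex engine to interpret.
    // Assumed range: U+00C0..U+00FF (Latin-1 accented letters, plus × and ÷).
    static String[] splitLatin1(String s) {
        return s.split("[^a-zA-Z0-9'\\u00C0-\\u00FF]+");
    }

    public static void main(String[] args) {
        // A single escaped code point matches the corresponding character (à)
        System.out.println("\u00E0".matches("\\u00E0")); // true

        // "CEN01_Automação_Caso_Teste.java" with the accented letters escaped
        String name = "CEN01_Automa\u00E7\u00E3o_Caso_Teste.java";
        // Yields the tokens CEN01, Automação, Caso, Teste, java
        System.out.println(Arrays.toString(splitLatin1(name)));
    }
}
```

A fixed range like this only helps for the characters it lists; letters outside Latin-1 would still be treated as separators.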
#3
2
You can use this:
String[] names = name.split("[^a-zA-Z0-9'\\p{L}]+");
System.out.println(Arrays.toString(names));
This will output:
[CEN01, Automação, Caso, Teste, java]
See this for more information.
#4
1
Why not split on the separator characters?
String[] names = name.split("[_.]");
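One thing to watch with this approach is consecutive separators: splitting on a single separator character leaves empty strings between them. A small sketch of the difference, where the class and method names are invented for illustration and the "+" variant is my addition, not part of the answer:

```java
import java.util.Arrays;

class SeparatorSplitDemo {
    // Splits on every single '_' or '.', as in the answer above.
    static String[] splitOnSeparators(String s) {
        return s.split("[_.]");
    }

    // Assumed variant: treats a run of separators as one delimiter.
    static String[] splitOnSeparatorRuns(String s) {
        return s.split("[_.]+");
    }

    public static void main(String[] args) {
        String sample = "a__b.c";
        System.out.println(Arrays.toString(splitOnSeparators(sample)));    // [a, , b, c]
        System.out.println(Arrays.toString(splitOnSeparatorRuns(sample))); // [a, b, c]
    }
}
```

For the file name in the question, which never has two separators in a row, both variants give the same tokens.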
#5
0
Instead of blacklisting all the characters you don't want, you could always whitelist the characters you want, like:
^[^<>%$]*$
The expression [^(many characters here)] just matches any character that is not listed.
But that is a personal opinion.
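One possible reading of the whitelist idea is to describe what a token may contain and collect the matches, rather than splitting on everything else. This is only a sketch under that assumption: the token class [\p{L}0-9'] and the names TokenMatchDemo and tokens are mine, not taken from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenMatchDemo {
    // Assumed token definition: letters in any script, ASCII digits, apostrophes.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}0-9']+");

    // Collects every maximal run of allowed characters, in order of appearance.
    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // "_CEN01_Automação.java" with the accented letters escaped
        System.out.println(tokens("_CEN01_Automa\u00E7\u00E3o.java"));
    }
}
```

Unlike split, this never produces an empty leading string when the input starts with a separator.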