This question already has an answer here:
- String.split() — How do I treat consecutive delimiters as one? 4 answers
I am splitting the string below on multiple delimiters. The delimiters are:
, . @ ? ! _ ' and white space etc.
Below is my code:
String[] tokens = s.split("[!|?|,|.|_|'|@ |\\s]");
For input:
He is a very very good boy, isn't he?
Expected output after split is: 10 tokens
He
is
a
very
very
good
boy
isn
t
he
But I am getting the output below: 11 tokens
He
is
a
very
very
good
boy
(empty string)
isn
t
he
Because two delimiters, the comma and the whitespace after it, are adjacent, the split produces an empty string between them, which gives 11 tokens. How can I get the expected output?
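To reproduce the problem, here is a minimal sketch that runs the pattern from the question on the sample input. The class name EmptyTokenDemo and the method name splitOriginal are made up for illustration.

```java
class EmptyTokenDemo {
    // The pattern from the question: it matches exactly one delimiter
    // character at a time, so ", " counts as two separate matches.
    static String[] splitOriginal(String s) {
        return s.split("[!|?|,|.|_|'|@ |\\s]");
    }

    public static void main(String[] args) {
        String[] tokens = splitOriginal("He is a very very good boy, isn't he?");
        System.out.println(tokens.length);       // prints 11
        System.out.println(tokens[7].isEmpty()); // prints true: the token between ',' and ' '
    }
}
```

The trailing "?" does not add a twelfth token because split(String) drops trailing empty strings.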
2 solutions
#1
3
You can add the + quantifier so that a run of consecutive delimiters is matched as a single delimiter, which avoids the empty strings that such runs otherwise produce:
s.split("[,.@?!_'\\s]+")
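As a quick check, here is a small sketch that applies this pattern to the input from the question. The class name SplitDemo and the method name tokenize are only illustrative.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits on runs of one or more of the listed delimiter characters.
    static String[] tokenize(String s) {
        return s.split("[,.@?!_'\\s]+");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("He is a very very good boy, isn't he?");
        System.out.println(tokens.length);            // prints 10
        System.out.println(Arrays.toString(tokens));  // prints [He, is, a, very, very, good, boy, isn, t, he]
    }
}
```

Note that split(String) only drops trailing empty strings. If the input can start with a delimiter, the first token will be an empty string and may need to be filtered out separately.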
NOTE: As I mentioned in the comments, a character class already works as an OR condition over its characters. So there is no need to use | inside a character class for alternation; inside the brackets it just matches a literal |.
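To illustrate the note, this sketch compares the pattern from the question with the cleaned-up one on an input that contains a | character. The class and method names are hypothetical.

```java
import java.util.Arrays;

class PipeInClassDemo {
    // Pattern from the question: the | characters are inside the brackets,
    // so they are ordinary members of the character class.
    static String[] splitWithPipes(String s) {
        return s.split("[!|?|,|.|_|'|@ |\\s]");
    }

    // Pattern from this answer: no | inside the character class.
    static String[] splitWithoutPipes(String s) {
        return s.split("[,.@?!_'\\s]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWithPipes("a|b")));    // prints [a, b]
        System.out.println(Arrays.toString(splitWithoutPipes("a|b"))); // prints [a|b]
    }
}
```

So with the original pattern, | is silently treated as one more delimiter, which is probably not what was intended.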
#2
3
To match one or more consecutive delimiters as a single delimiter, use the + quantifier:
s.split("[,.@?!_'\\s]+");
Another regex that you should consider using is:
s.split("[\\W_]+");
This treats any run of non-word characters or underscores as a delimiter (the underscore has to be listed explicitly because \W does not match it). That is broader than what your question specifies, but it produces the output you expect as well.
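Here is a rough comparison of the two patterns, assuming the sample sentence from the question plus one extra input ("well-known") that I made up to show where they differ. The class and method names are illustrative only.

```java
import java.util.Arrays;

class NonWordSplitDemo {
    // Any run of non-word characters or underscores is a delimiter.
    static String[] splitNonWord(String s) {
        return s.split("[\\W_]+");
    }

    // Only the delimiters listed in the question.
    static String[] splitListed(String s) {
        return s.split("[,.@?!_'\\s]+");
    }

    public static void main(String[] args) {
        String input = "He is a very very good boy, isn't he?";
        System.out.println(splitNonWord(input).length);                   // prints 10
        System.out.println(Arrays.toString(splitNonWord("well-known")));  // prints [well, known]
        System.out.println(Arrays.toString(splitListed("well-known")));   // prints [well-known]
    }
}
```

One thing to keep in mind: by default \W in Java is ASCII-based, so non-ASCII letters are also treated as delimiters unless the pattern is compiled with Pattern.UNICODE_CHARACTER_CLASS.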