Java: split a string on spaces that are not inside [] brackets

Time: 2021-05-31 21:43:24

How do I split a string on whitespace, except for whitespace that is inside square brackets [ ]?

So the string " book [new interesting book] buy it " should be split into

book
new interesting book
buy
it

or

book
[new interesting book]
buy
it

Thank you!

4 answers

#1 (score: 3)

Does it have to be a regex? You can do it in a single pass by tracking how many brackets are still open before each space, to decide whether that space should be replaced with a newline.

String data = "book [new [interesting] book] buy it";
StringBuilder buffer = new StringBuilder();
int bracketCounter = 0; // current nesting depth of [ ]
for (char c : data.toCharArray()) {
    if (c == '[') bracketCounter++;
    if (c == ']') bracketCounter--;
    // only spaces outside every bracket become line breaks
    if (c == ' ' && bracketCounter == 0)
        buffer.append("\n");
    else
        buffer.append(c);
}
System.out.println(buffer);

Out:

book
[new [interesting] book]
buy
it
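
If you need the pieces as a list rather than as newline-separated text, the same depth counter can collect the tokens directly. This is a minimal sketch, not the answerer's code; the class name DepthSplitter is hypothetical. Unlike the loop above, it treats any whitespace character (not just ' ') as a separator, skips the empty tokens that leading, trailing, or repeated spaces would otherwise produce, and ignores a stray ] so the depth never goes negative.

```java
import java.util.ArrayList;
import java.util.List;

final class DepthSplitter {

    // Splits on whitespace that is outside all [ ] brackets by tracking the
    // nesting depth, as in the loop above. Empty tokens are skipped.
    static List<String> split(String data) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : data.toCharArray()) {
            if (c == '[') depth++;
            if (c == ']' && depth > 0) depth--; // ignore unmatched ]
            if (Character.isWhitespace(c) && depth == 0) {
                // a top-level space ends the current token, if there is one
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // last token has no trailing space
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split(" book [new [interesting] book] buy it "));
    }
}
```

For that input, main prints [book, [new [interesting] book], buy, it].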

#2 (score: 2)

It's difficult to use String.split() here because it's hard to distinguish spaces inside brackets from spaces outside them. Instead, call Matcher.find() repeatedly on your string until it runs out of tokens.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> tokens = new ArrayList<String>();
Pattern p = Pattern.compile("\\s*(\\[.*\\]|[^\\s]+)\\s*");
Matcher m = p.matcher(" book [new interesting book] buy it ");
while (m.find()) {
    tokens.add(m.group());
}
System.out.println(tokens);
// Prints: [ book , [new interesting book] , buy , it ]

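The regex above consumes the whitespace around each token and captures either (1) everything from a [ to a ], or (2) any run of non-whitespace characters. m.group() still includes that surrounding whitespace, as the printed output shows; m.group(1) returns just the captured token.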
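
Because .* in the pattern above is greedy, input with two bracketed groups such as "a [b c] d [e f]" comes back with "[b c] d [e f]" as a single token. One way around that, sketched below under my own assumptions (the class name BracketTokenizer is hypothetical), is to forbid ] inside the bracketed alternative and drop the \s* wrappers, so find() simply skips the whitespace between tokens. It does not handle nested brackets such as [new [interesting] book]; the depth-counting loop in #1 does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BracketTokenizer {

    // Either a [...] group containing no ']' (so two groups are never merged),
    // or a run of non-whitespace characters. Alternatives are tried in order.
    private static final Pattern TOKEN = Pattern.compile("\\[[^\\]]*\\]|\\S+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group()); // whitespace is never part of a match
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(" book [new interesting book] buy it "));
        System.out.println(tokenize("a [b c] d [e f]"));
    }
}
```

main prints [book, [new interesting book], buy, it] and then [a, [b c], d, [e f]].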

#3 (score: 2)

I have changed @cheeken's answer slightly to improve it. I'm posting it as a separate answer because of the code formatting:

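import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> tokens = new ArrayList<String>();
Pattern p = Pattern.compile("\\s*(\\[.*\\]|\\S*)\\s*");
Matcher m = p.matcher(" book [new interesting book] buy it ");
while (m.find()) {
    String token = m.group(1); // the token without the surrounding spaces
    if (!token.isEmpty()) {    // \S* can also match the empty string, e.g. at the end
        tokens.add(token);
    }
}
System.out.println(tokens);
// Prints: [book, [new interesting book], buy, it]

I changed the second part of the pattern to use the predefined class \S instead of his negated class [^\s]. Because \S* can also match the empty string (for example at the very end of the input), I skip empty tokens, and I take m.group(1) instead of m.group() so the tokens no longer carry the leading and trailing spaces that his answer keeps.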
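
The empty-token check in the loop above is only needed because \S* may match nothing; with \S+ every match is a real token. On Java 9 or later the loop can also be written with Matcher.results(), which returns a Stream<MatchResult>. This is a sketch assuming Java 9+; the class name ResultsTokenizer is hypothetical, and the bracket alternative \[.*\] is kept as in the answer, so it is still greedy.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ResultsTokenizer {

    // Same shape as the pattern above, but \S+ (one or more) instead of \S*,
    // so the second alternative can never produce an empty match.
    private static final Pattern TOKEN =
            Pattern.compile("\\s*(\\[.*\\]|\\S+)\\s*");

    static List<String> tokenize(String input) {
        return TOKEN.matcher(input)
                .results()              // Stream<MatchResult>, Java 9+
                .map(r -> r.group(1))   // token without the surrounding spaces
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokenize(" book [new interesting book] buy it "));
    }
}
```

main prints [book, [new interesting book], buy, it].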

#4 (score: 0)

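import java.util.regex.Pattern;

String input = "foo [bar bar] foo";
Pattern p = Pattern.compile("\\[|\\]"); // [ and ] are escaped for the regex, and \ for the Java string
String[] s = p.split(input);           // ["foo ", "bar bar", " foo"]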

Now we have the part to the left of the [, the part inside the brackets, and the part to the right of the ]. You can then go through these parts (if necessary) and split them further.
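
A sketch of that follow-up step, assuming the brackets are balanced and not nested: after splitting on [ and ], the even-indexed parts lie outside brackets and the odd-indexed parts inside, so only the even ones need to be split on whitespace. The class name SplitThenSplit is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SplitThenSplit {

    private static final Pattern BRACKET = Pattern.compile("\\[|\\]");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // Assumes balanced, non-nested brackets: after splitting on [ and ],
    // even-indexed parts are outside brackets and odd-indexed parts inside.
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        String[] parts = BRACKET.split(input);
        for (int i = 0; i < parts.length; i++) {
            if (i % 2 == 1) {
                tokens.add(parts[i]); // inside brackets: keep the part whole
            } else {
                for (String word : SPACES.split(parts[i].trim())) {
                    if (!word.isEmpty()) { // splitting "" yields one empty string
                        tokens.add(word);
                    }
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split(" book [new interesting book] buy it "));
    }
}
```

For the question's input this prints [book, new interesting book, buy, it], the first of the two outputs asked for; the brackets themselves are dropped by the first split.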
