Using a comma with String.split

Time: 2022-07-04 21:37:50

I'm trying to perform some super simple parsing of log files, so I'm using the String.split method like this:

String [] parts = input.split(",");

It works great for input like:

a,b,c

Or

type=simple, output=Hello, repeat=true 

Just to say something.

How can I escape the comma, so it doesn't match intermediate commas?

For instance, if I want to include a comma in one of the parts:

type=simple, output=Hello, world, repeate=true

I was thinking of something like:

type=simple, output=Hello\, world, repeate=true

But I don't know how to create the split to avoid matching the comma.

I've tried:

String [] parts = input.split("[^\,],");

But, well, it is not working.

4 answers

#1


23  

You can solve it using a negative lookbehind.

String[] parts = str.split("(?<!\\\\), ");

Basically it says: split on each ", " that is not preceded by a backslash.

String str = "type=simple, output=Hello\\, world, repeate=true";
String[] parts = str.split("(?<!\\\\), ");
for (String s : parts)
    System.out.println(s);

Output:

type=simple
output=Hello\, world
repeate=true


If you happen to be stuck with the non-escaped comma-separated values, you could do the following (similar) hack:

String[] parts = str.split(", (?=\\w+=)");

This says: split on each ", " that is followed by some word characters and an =.
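
As a rough, runnable sketch of the lookahead hack above (the class and method names are made up for illustration), the following splits only where ", " is followed by something that looks like "key=". It assumes values never contain text of the form ", word=", since that would be mistaken for a field boundary.

```java
public class LookaheadSplitDemo {
    // Splits on ", " only when it is immediately followed by word characters and '='.
    // Hypothetical helper; not part of the original answer.
    static String[] splitOnKeyBoundaries(String input) {
        return input.split(", (?=\\w+=)");
    }

    public static void main(String[] args) {
        String str = "type=simple, output=Hello, world, repeate=true";
        for (String part : splitOnKeyBoundaries(str)) {
            System.out.println(part);
        }
    }
}
```

With the sample input, the comma inside "Hello, world" is left alone because "world" is followed by a comma rather than an =.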

#2


4  

I'm afraid there's no perfect solution with String.split. Using a matcher for the three parts would work. In case the number of parts is not constant, I'd recommend a loop with matcher.find. Something like this, maybe:

final String s = "type=simple, output=Hello, world, repeat=true";
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,|$)");
final Matcher m = p.matcher(s);
while (m.find()) System.out.println(m.group(1));

You'll probably want to skip the spaces after the comma as well:

final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,\\s*|$)");

It's not really complicated; just note that you need four backslashes in the Java string literal in order to match a single literal backslash.
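
One possible way to flesh out the matcher.find loop is sketched below. The class name, the parseFields helper, the second capturing group around the separator, and the unescaping step are all additions for illustration. The second group is there because, after the last field, the pattern can match once more as an empty string at the end of the input; stopping as soon as the separator group is empty avoids collecting that extra empty field. The sketch assumes well-formed input (no dangling backslash at the end).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedCommaMatcherDemo {
    // Group 1: a run of characters that are neither backslash nor comma, or backslash-escaped characters.
    // Group 2: the separator (comma plus optional whitespace), or empty at the end of the input.
    private static final Pattern FIELD =
            Pattern.compile("((?:[^\\\\,]|\\\\.)*)(,\\s*|$)");

    // Hypothetical helper: returns the fields with escape backslashes removed.
    static List<String> parseFields(String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            // Turn "\," into "," (and, more generally, "\x" into "x").
            fields.add(m.group(1).replaceAll("\\\\(.)", "$1"));
            if (m.group(2).isEmpty()) {
                break; // the match ended at end of input, so this was the last field
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String s = "type=simple, output=Hello\\, world, repeate=true";
        for (String field : parseFields(s)) {
            System.out.println(field);
        }
    }
}
```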

#3


2  

Escaping works with the opposite of aioobe's answer, a negative lookbehind (update: aioobe now uses the same construct, but I didn't know that when I wrote this):

final String s = "type=simple, output=Hello\\, world, repeate=true";
final String[] tokens = s.split("(?<!\\\\),\\s*");
for(final String item : tokens){
    System.out.println("'" + item.replace("\\,", ",") + "'");
}

Output:

'type=simple'
'output=Hello, world'
'repeate=true'

#4


0  

I think

input.split("[^\\\\],");

should work. It will split at all commas that are not preceded by a backslash. Note, however, that the character class consumes the character before each comma, so that character is dropped from the resulting part. BTW, if you are working with Eclipse, I can recommend the QuickRex plugin to test and debug regexes.
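
To see what this pattern actually consumes, here is a small comparison sketch (the class and method names are invented for illustration). It runs the character-class pattern next to a negative lookbehind on the same input; the character class matches the character before the comma as part of the delimiter, whereas the lookbehind only inspects it.

```java
import java.util.Arrays;

public class CharClassVsLookbehindDemo {
    // The delimiter is "any non-backslash character followed by a comma",
    // so the character before the comma is removed together with the comma.
    static String[] splitWithCharClass(String input) {
        return input.split("[^\\\\],");
    }

    // The delimiter is just the comma; the lookbehind checks the preceding
    // character without making it part of the match.
    static String[] splitWithLookbehind(String input) {
        return input.split("(?<!\\\\),");
    }

    public static void main(String[] args) {
        String input = "type=simple, output=Hello\\, world, repeate=true";
        System.out.println(Arrays.toString(splitWithCharClass(input)));
        System.out.println(Arrays.toString(splitWithLookbehind(input)));
    }
}
```

With the character class, the first part comes back as "type=simpl"; with the lookbehind it stays "type=simple".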
