I'm trying to do some very simple parsing of log files, so I'm using the String.split method like this:
String [] parts = input.split(",");
And it works great for input like:
a,b,c
Or
type=simple, output=Hello, repeat=true
Just to say something.
How can I escape a comma so that the split doesn't match commas inside a value?
For instance, if I want to include a comma in one of the parts:
type=simple, output=Hello, world, repeate=true
I was thinking of something like:
type=simple, output=Hello\, world, repeate=true
But I don't know how to write the split so that it skips the escaped comma.
I've tried:
String [] parts = input.split("[^\,],");
But, well, it's not working.
4 answers
#1
23
You can solve it using a negative lookbehind.
String[] parts = str.split("(?<!\\\\), ");
Basically, it says: split on each ", " that is not preceded by a backslash.
String str = "type=simple, output=Hello\\, world, repeate=true";
String[] parts = str.split("(?<!\\\\), ");
for (String s : parts)
System.out.println(s);
Output:
type=simple
output=Hello\, world
repeate=true
If you happen to be stuck with non-escaped comma-separated values, you could use the following (similar) hack:
String[] parts = str.split(", (?=\\w+=)");
This says: split on each ", " that is followed by some word characters and an =.
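As a rough illustration of how the lookahead split could feed a key/value lookup, here is a minimal sketch. The class name `KeyValueSplit` and the method `parse` are hypothetical and not part of the original answer, and it assumes every part has the form `key=value`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueSplit {
    // Hypothetical helper: splits on ", " only when a "key=" follows,
    // then cuts each part at its first '='.
    static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String part : line.split(", (?=\\w+=)")) {
            int eq = part.indexOf('=');
            if (eq < 0) {
                continue; // assumption: parts without '=' are ignored
            }
            result.put(part.substring(0, eq), part.substring(eq + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse("type=simple, output=Hello, world, repeate=true");
        System.out.println(fields.get("output")); // prints: Hello, world
    }
}
```

Note that a value which itself contains something like ", x=" would still be split, so this only works while values never look like the start of a new key.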
#2
4
I'm afraid there's no perfect solution for String.split. Using a matcher for the three parts would work. If the number of parts is not constant, I'd recommend a loop with matcher.find. Something like this, maybe:
final String s = "type=simple, output=Hello\\, world, repeat=true";
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,|$)");
final Matcher m = p.matcher(s);
while (m.find()) System.out.println(m.group(1));
You'll probably want to skip the spaces after the comma as well:
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,\\s*|$)");
It's not really complicated; just note that you need four backslashes in a Java string literal in order to match one literal backslash.
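The matcher loop can be wrapped into a small method that collects the fields into a list. The following is only a sketch under some assumptions: the names `EscapedCsv` and `fields` are made up for illustration, the backslashes are stripped from the result, and every backslash in the input is assumed to be followed by another character. It also skips the extra empty match that `find()` reports at the end of the input after the last field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedCsv {
    // A field is a run of "any char except backslash or comma" or
    // "backslash followed by any char"; it ends at a comma (plus optional
    // whitespace) or at the end of the input.
    private static final Pattern FIELD =
            Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,\\s*|$)");

    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        boolean endedWithComma = true; // so that an empty input yields one empty field
        while (m.find()) {
            // After a field that ran to the end of the input, find() matches
            // once more with an empty field; that match is not a real field.
            if (m.start() == line.length() && !endedWithComma) {
                break;
            }
            // Drop the escaping backslash: "\," becomes ",".
            out.add(m.group(1).replaceAll("\\\\(.)", "$1"));
            endedWithComma = m.end() > m.end(1);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String f : fields("type=simple, output=Hello\\, world, repeat=true")) {
            System.out.println(f);
        }
    }
}
```

For the sample line this yields the three fields `type=simple`, `output=Hello, world` and `repeat=true`.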
#3
2
Escaping works with the opposite of aioobe's answer (update: aioobe now uses the same construct, but I didn't know that when I wrote this): negative lookbehind.
final String s = "type=simple, output=Hello\\, world, repeate=true";
final String[] tokens = s.split("(?<!\\\\),\\s*");
for(final String item : tokens){
System.out.println("'" + item.replace("\\,", ",") + "'");
}
Output:
'type=simple'
'output=Hello, world'
'repeate=true'
Reference:
- Pattern: Special Constructs
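One limitation of the lookbehind approach is that it treats every comma preceded by a backslash as escaped, so a value ending in an escaped backslash (`\\` followed by a separator comma) would not be split there. If that case matters, a plain character scan is one possible alternative. This is a sketch only: `EscapeAwareSplitter` and `split` are hypothetical names, and trimming whitespace around each part is an assumption, not something the answers above specify.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeAwareSplitter {
    // Walks the line once: a backslash makes the next character literal,
    // an unescaped comma ends the current part.
    static List<String> split(String line) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // keep the escaped char, drop the backslash
            } else if (c == ',') {
                parts.add(current.toString().trim()); // assumption: surrounding spaces are not significant
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString().trim());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("type=simple, output=Hello\\, world, repeate=true"));
    }
}
```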
#4
0
I think
input.split("[^\\\\],");
should work. It will split at all commas that are not preceded by a backslash. BTW, if you are working with Eclipse, I can recommend the QuickRex Plugin to test and debug regexes.
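One caveat with the character-class pattern: `[^\\\\],` matches the character in front of the comma as part of the delimiter, so that character is removed from the result. A lookbehind checks the preceding character without consuming it. The small comparison below is a sketch; the class and method names are made up for illustration.

```java
import java.util.Arrays;

class CharClassVsLookbehind {
    // The character class consumes the char before the comma.
    static String[] withCharClass(String input) {
        return input.split("[^\\\\],");
    }

    // The lookbehind only inspects the char before the comma.
    static String[] withLookbehind(String input) {
        return input.split("(?<!\\\\),");
    }

    public static void main(String[] args) {
        String input = "type=simple, output=Hello\\, world, repeate=true";
        // [type=simpl,  output=Hello\, worl,  repeate=true]
        System.out.println(Arrays.toString(withCharClass(input)));
        // [type=simple,  output=Hello\, world,  repeate=true]
        System.out.println(Arrays.toString(withLookbehind(input)));
    }
}
```

With the character class, the trailing "e" of "simple" and the "d" of "world" are lost, which is why the lookbehind variants in the other answers are the safer choice.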