I have to split a string using a comma (,) as the separator and ignore any comma that is inside quotes (").
fieldSeparator : ,
fieldGrouper : "
The string to split is: "1","2",3,"4,5"
I am able to achieve it as follows:
String record = "\"1\",\"2\",3,\"4,5\"";
String[] tokens = record.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
Output:
"1"
"2"
3
"4,5"
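The lookahead in the split keeps a comma as a separator only when an even number of quotes follows it, which is what places that comma outside any quoted field. Since the tokens already come out correctly grouped, one option is to keep that split and strip the surrounding quotes afterwards. A minimal sketch, assuming quoted fields never contain escaped (doubled) quotes; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitThenStrip {
    // Splits on commas outside quotes, then removes one pair of surrounding quotes per token.
    static List<String> splitRecord(String record) {
        // The lookahead requires an even number of quotes between the comma and the end of the input.
        String[] tokens = record.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
        List<String> fields = new ArrayList<>();
        for (String token : tokens) {
            // "^\"" removes a leading quote, "\"$" removes a trailing one.
            fields.add(token.replaceAll("^\"|\"$", ""));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitRecord("\"1\",\"2\",3,\"4,5\"")); // [1, 2, 3, 4,5]
    }
}
```

This keeps the regex itself unchanged; the quote removal is a separate, simple step rather than part of the split pattern.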
Now the challenge is that the fieldGrouper (") should not be part of the split tokens. I am unable to figure out the regex for this.
The expected output of the split is:
1
2
3
4,5
4 answers
#1 (score: 4)
Update:
String[] tokens = record.split( "(,*\",*\"*)" );
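Tracing this pattern on the sample record by hand suggests two things worth knowing before relying on it: the opening quote at index 0 is a non-empty match, so `String.split` keeps an empty first token, and a comma with no quote next to it (for example `3,4`) is not treated as a separator at all, because the pattern always needs a `"`. A small sketch with hypothetical class and method names that shows the raw result and one way to drop empty tokens (note that this would also drop genuinely empty fields):

```java
import java.util.ArrayList;
import java.util.List;

public class UpdateSplitDemo {
    // Splits with the pattern from this answer; a quote at index 0 yields an empty first token.
    static String[] rawSplit(String record) {
        return record.split("(,*\",*\"*)");
    }

    // Same split, with empty tokens filtered out.
    static List<String> nonEmptyTokens(String record) {
        List<String> out = new ArrayList<>();
        for (String token : rawSplit(record)) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String record = "\"1\",\"2\",3,\"4,5\"";
        System.out.println(rawSplit(record).length); // 5 (the first element is "")
        System.out.println(nonEmptyTokens(record));  // [1, 2, 3, 4,5]
    }
}
```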
Initial solution (doesn't work with the .split method):
This regex pattern will isolate the sections you want:
(?:\\")(.*?)(?:\\")
It uses non-capturing groups to isolate the pairs of escaped quotes, and a capturing group to isolate everything in between.
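This pattern is meant for `Matcher.find()` rather than `split`. The `\\"` in each non-capturing group matches a literal backslash followed by a quote, which suggests the demo ran against the escaped text of the Java literal; against the actual string value, which holds plain quotes, the same idea would be the pattern `"(.*?)"` (written `"\"(.*?)\""` in Java source). A sketch under that assumption, with illustrative class and method names; note that it only collects quoted fields, so the unquoted `3` is skipped:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedOnly {
    // A quote, a lazy capture of everything up to the next quote, then that quote.
    private static final Pattern QUOTED = Pattern.compile("\"(.*?)\"");

    static List<String> quotedValues(String record) {
        List<String> out = new ArrayList<>();
        Matcher m = QUOTED.matcher(record);
        while (m.find()) {
            out.add(m.group(1)); // text between the quotes
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(quotedValues("\"1\",\"2\",3,\"4,5\"")); // [1, 2, 4,5]
    }
}
```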
#2 (score: 2)
My suggestion:
"([^"]+)"|(?<=,|^)([^,]*)
See the regex demo. It matches "..."-like strings and captures only what is between the quotes into Group 1; otherwise it matches and captures into Group 2 a sequence of characters other than a comma that starts at the beginning of the string or right after a comma.
Here is some sample Java code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedCsvSplit {
    public static void main(String[] args) {
        String s = "value1,\"1\",\"2\",3,\"4,5\",value2";
        Pattern pattern = Pattern.compile("\"([^\"]+)\"|(?<=,|^)([^,]*)");
        Matcher matcher = pattern.matcher(s);
        List<String> res = new ArrayList<>();
        while (matcher.find()) {                // Run the matcher
            if (matcher.group(1) != null) {     // Group 1 matched: a quoted field
                res.add(matcher.group(1));      // Add its content without the quotes
            } else {
                res.add(matcher.group(2));      // Otherwise Group 2 matched: an unquoted field
            }
        }
        System.out.println(res); // => [value1, 1, 2, 3, 4,5, value2]
    }
}
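One detail worth checking in this pattern: `[^"]+` needs at least one character between the quotes, so an empty quoted field `""` falls through to the Group 2 branch and comes back with its quotes. Changing `+` to `*` is one possible tweak (my own assumption, not part of the original answer). A sketch with a hypothetical helper that runs both variants on the same input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyQuotedField {
    // Collects Group 1 (quoted content) or, failing that, Group 2 (unquoted field) for each match.
    static List<String> parse(String regex, String s) {
        Matcher matcher = Pattern.compile(regex).matcher(s);
        List<String> res = new ArrayList<>();
        while (matcher.find()) {
            res.add(matcher.group(1) != null ? matcher.group(1) : matcher.group(2));
        }
        return res;
    }

    public static void main(String[] args) {
        String s = "\"\",x";
        System.out.println(parse("\"([^\"]+)\"|(?<=,|^)([^,]*)", s)); // ["", x] (quotes kept)
        System.out.println(parse("\"([^\"]*)\"|(?<=,|^)([^,]*)", s)); // [, x]
    }
}
```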
#3 (score: 1)
I would try this kind of workaround:
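String record = "\"1\",\"2\",3,\"4,5\"";
// Turn every separator comma (with any neighbouring quotes) and every remaining quote into a space.
// The negative lookbehind skips a comma preceded by a quote plus word characters, such as the one inside "4,5".
record = record.replaceAll("\"?(?<!\"\\w{1,9999}),\"?|\"", " ");
String[] tokens = record.trim().split(" ");
for (String str : tokens) {
    System.out.println(str);
}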
Output:
1
2
3
4,5
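This workaround uses a space as its temporary delimiter, so it assumes the values themselves contain no spaces; likewise, the lookbehind only protects a comma preceded by word characters. A sketch (hypothetical class name) that wraps the same two calls in a method and shows a quoted value containing a space being split apart:

```java
public class SpaceDelimiterCaveat {
    // Same replaceAll/split steps as the workaround, wrapped in a method.
    static String[] tokens(String record) {
        String replaced = record.replaceAll("\"?(?<!\"\\w{1,9999}),\"?|\"", " ");
        return replaced.trim().split(" ");
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", tokens("\"1\",\"2\",3,\"4,5\""))); // 1|2|3|4,5
        System.out.println(String.join("|", tokens("\"a b\",c")));             // a|b|c
    }
}
```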
#4 (score: 0)
My proposal:
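String record = "\"1\",\"2\",3,\"4,5\"";
record = record.replaceAll("\",", "|");     // closing quote + comma -> |
record = record.replaceAll(",\\\"", "|");   // comma + opening quote -> |
record = record.replaceAll("\"", "");       // drop the remaining quotes
String[] tokens = record.split("\\|");
for (String token : tokens) {
    System.out.println(token);
}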
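These replacements only turn a comma into `|` when a quote sits right next to it, so two adjacent unquoted fields stay joined (and a `|` inside the data would also be treated as a separator). A sketch (hypothetical class name) running the same steps on the question's record and on a record where `2` and `3` are unquoted:

```java
import java.util.Arrays;

public class ReplaceChainCaveat {
    // The three replaceAll calls and the split from this answer, wrapped in a method.
    static String[] tokens(String record) {
        record = record.replaceAll("\",", "|");
        record = record.replaceAll(",\\\"", "|");
        record = record.replaceAll("\"", "");
        return record.split("\\|");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("\"1\",\"2\",3,\"4,5\""))); // [1, 2, 3, 4,5]
        System.out.println(Arrays.toString(tokens("\"1\",2,3")));            // [1, 2,3]
    }
}
```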