Java: Splitting a String Using Regex

Date: 2021-05-17 21:42:30

I have to split a string using a comma (,) as the separator, ignoring any comma that appears inside double quotes (").

fieldSeparator : ,
fieldGrouper : "

The string to split is : "1","2",3,"4,5"

I am able to achieve this as follows:

String record = "\"1\",\"2\",3,\"4,5\"";
String[] tokens = record.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

Output :

"1"
"2"
3
"4,5"

Now the challenge is that the fieldGrouper (") should not be part of the split tokens, and I am unable to figure out a regex for this.

The expected output of the split is:

1
2
3
4,5
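
For comparison, here is a minimal sketch (not taken from the original post) of one way to get this output while keeping the split pattern above unchanged: split first, then strip one leading and one trailing quote from each token. The class name SplitThenStrip and the helper split are hypothetical, and the sketch assumes fields never contain escaped quotes.

```java
import java.util.Arrays;

class SplitThenStrip {
    // Same separator pattern as in the question: a comma that is followed
    // by an even number of quotes up to the end of the input.
    private static final String SEPARATOR = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: split on the separator, then drop the grouper quotes.
    static String[] split(String record) {
        String[] tokens = record.split(SEPARATOR);
        for (int i = 0; i < tokens.length; i++) {
            // Remove one leading and one trailing quote, if present.
            tokens[i] = tokens[i].replaceAll("^\"|\"$", "");
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("\"1\",\"2\",3,\"4,5\"")));
        // prints [1, 2, 3, 4,5]
    }
}
```

This uses two passes instead of a single regex, which keeps each pattern easy to read.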

4 Solutions

#1


4  

Update:

String[] tokens = record.split( "(,*\",*\"*)" );
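
A small sketch of how this pattern behaves on the record from the question. The class name QuoteSplitDemo and the wrapper method are hypothetical. Two caveats are worth checking before relying on it: the match at the very start of the string produces an empty first element, and two adjacent unquoted fields (for example 3,4) are not separated, because the pattern needs at least one quote to match.

```java
import java.util.Arrays;

class QuoteSplitDemo {
    // Hypothetical wrapper around the split call from the update.
    static String[] split(String record) {
        // Optional commas, one quote, optional commas, optional quotes.
        return record.split("(,*\",*\"*)");
    }

    public static void main(String[] args) {
        String[] tokens = split("\"1\",\"2\",3,\"4,5\"");
        System.out.println(Arrays.toString(tokens));
        // prints [, 1, 2, 3, 4,5]  (note the empty first element)
    }
}
```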

Initial solution (does not work with the .split method):

This regex pattern will isolate the sections you want:
(?:\\")(.*?)(?:\\")

It uses non-capturing groups to isolate the pairs of escaped quotes, and a capturing group to isolate everything in between.

Check it out here: Live Demo
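
A minimal sketch of applying the same idea with a Matcher instead of .split. It assumes the pattern is adapted to match plain quotes, since the runtime string contains " rather than the \" pairs seen in Java source. The class name QuotedFieldFinder and the method quotedFields are hypothetical. Because only quoted sections are matched, the unquoted 3 is skipped, so this alone does not produce the expected output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedFieldFinder {
    // Assumed adaptation of the pattern above: plain quotes in non-capturing
    // groups, with a lazy capturing group for the text in between.
    private static final Pattern QUOTED = Pattern.compile("(?:\")(.*?)(?:\")");

    // Hypothetical helper: collect the text between each pair of quotes.
    static List<String> quotedFields(String record) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(record);
        while (matcher.find()) {
            fields.add(matcher.group(1)); // content of one quoted section
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(quotedFields("\"1\",\"2\",3,\"4,5\""));
        // prints [1, 2, 4,5]  (the unquoted 3 is not matched)
    }
}
```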

#2


2  

My suggestion:

"([^"]+)"|(?<=,|^)([^,]*)

See the regex demo. The first alternative matches "..."-style strings and captures only the text between the quotes into Group 1. The second alternative matches, and captures into Group 2, a sequence of characters other than , that sits at the start of the string or right after a comma.

Here is some sample Java code:

// Required imports
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "value1,\"1\",\"2\",3,\"4,5\",value2";
Pattern pattern = Pattern.compile("\"([^\"]+)\"|(?<=,|^)([^,]*)");
Matcher matcher = pattern.matcher(s);
List<String> res = new ArrayList<>();
while (matcher.find()) {                     // Run the matcher
    if (matcher.group(1) != null) {          // If Group 1 matched (quoted field)
        res.add(matcher.group(1));           // Add it to the resulting list
    } else {
        res.add(matcher.group(2));           // Otherwise Group 2 matched (unquoted field)
    }
}
System.out.println(res); // => [value1, 1, 2, 3, 4,5, value2]

#3


1  

I would try this kind of workaround:

String record = "\"1\",\"2\",3,\"4,5\"";
record = record.replaceAll("\"?(?<!\"\\w{1,9999}),\"?|\""," ");
String[] tokens = record.trim().split(" ");
for(String str : tokens){
    System.out.println(str);
}

Output:

1
2
3
4,5

#4


0  

My proposal:

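String record = "\"1\",\"2\",3,\"4,5\"";

record = record.replaceAll("\",", "|");   // quote + comma  -> |   gives "1|"2|3,"4,5"
record = record.replaceAll(",\"", "|");   // comma + quote  -> |   gives "1|"2|3|4,5"
record = record.replaceAll("\"", "");     // drop remaining quotes, gives 1|2|3|4,5

String[] tokens = record.split("\\|");

for (String token : tokens) {
    System.out.println(token);
}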
