Java: regular expression to ignore a delimiter inside quotes

This sample data is returned by a web service:

200,6, "California, USA"

I want to split it using split(",") and tried to check the result with some simple code:

String loc = "200,6,\"California, USA\"";
String[] s = loc.split(",");

for (String f : s) {
    System.out.println(f);
}

Unfortunately, this is the result:

200
6
"California
 USA"

The expected result is:

200
6
"California, USA"

I tried different regular expressions with no luck. Is it possible to make the regular expression ignore a delimiter that appears inside double quotes ("")?

UPDATE 1: Added C# Code

UPDATE 2: Removed C# Code

4 Answers

#1

,(?=(?:[^"]|"[^"]*")*$)

This is the regex you want. (To use it in the split function, you'll need to escape the double quotes inside the Java string literal.)

Explanation

You need to find all ',' characters that are not inside quotes. To do that, you need a lookahead (http://www.regular-expressions.info/lookaround.html) to check whether the comma currently being matched is inside or outside quotes.

To do that, we use a lookahead to ensure that the ',' currently being matched is followed by an EVEN number of '"' characters (meaning that it lies outside quotes).

So (?:[^"]|"[^"]*")*$ matches only when everything up to the end of the string consists of non-quote characters or of complete pairs of quotes with anything (except another quote) between them.

(?=(?:[^"]|"[^"]*")*$) wraps that pattern in a lookahead, so the rest of the string is checked without being consumed.

Finally, ,(?=(?:[^"]|"[^"]*")*$) matches every ',' that satisfies the lookahead above.
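
To see the even-number-of-quotes rule in action, here is a minimal sketch that compiles the pattern once and splits two inputs, one with the quoted field at the end and one with it in the middle. The class name QuoteAwareSplit and the helper splitFields are made up for illustration, and the sketch assumes the quotes in the input are balanced.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteAwareSplit {
    // A comma is a delimiter only if the rest of the input consists of
    // non-quote characters and complete "..." pairs (an even number of quotes).
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=(?:[^\"]|\"[^\"]*\")*$)");

    // Hypothetical helper: splits one line on commas outside double quotes.
    static String[] splitFields(String line) {
        return COMMA_OUTSIDE_QUOTES.split(line);
    }

    public static void main(String[] args) {
        // prints [200, 6, "California, USA"]
        System.out.println(Arrays.toString(splitFields("200,6,\"California, USA\"")));
        // prints [1, "a, b", 2]
        System.out.println(Arrays.toString(splitFields("1,\"a, b\",2")));
    }
}
```

The comma inside "a, b" is followed by a single quote before the end of the input, so the lookahead fails there and the field stays in one piece.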

#2

An easier solution might be to use an existing library, such as OpenCSV, to parse your data. With this library it takes two lines:

CSVParser parser = new CSVParser();
String[] data = parser.parseLine(inputLine);

This becomes especially important if more complex CSV values come back in the future (multiline values, values with escaped quotes inside an element, and so on). If you don't want to add the dependency, you could use its code as a reference (though it is not based on regular expressions).
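
For a rough idea of what a non-regex approach can look like, here is a minimal hand-written sketch that walks the line character by character and tracks whether it is inside quotes. This is not OpenCSV's code; the class name SimpleCsvLine and its split method are invented for illustration, and the sketch does not handle multiline values or strip the quotes.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleCsvLine {
    // Splits on commas that are outside double quotes; quotes stay in the output.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;          // every quote toggles the state
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // delimiter: close the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());         // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // prints [200, 6, "California, USA"]
        System.out.println(split("200,6,\"California, USA\""));
    }
}
```

Unlike String.split, this sketch keeps trailing empty fields, so "a,b," yields three entries.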

#3

If there's a good lexer/parser library for Java, you could define a lexer along the lines of the following pseudo-lexer code:

Delimiter: ,
Item: ([^,"]+) | ("[^"]*")
Data: Item Delimiter Data | Item

A lexer like this starts at the top-level token definition (in this case Data) and attempts to form tokens out of the string until it either cannot continue or the string is used up. For your string, the following would happen:

I want to make Data out of 200,6, "California, USA".
I can make Data out of an Item, a Delimiter and Data.
I looked - 200 is an Item and then , is a Delimiter, so I can tokenize that and keep going.
I want to make Data out of 6, "California, USA".
I can make Data out of an Item, a Delimiter and Data.
I looked - 6 is an Item and then , is a Delimiter, so I can tokenize that and keep going.
I want to make Data out of "California, USA".
I can make Data out of an Item, a Delimiter and Data.
I looked - "California, USA" is an Item, but I see no Delimiter after it, so let's try something else.
I can make Data out of an Item.
I looked - "California, USA" is an Item, so I can tokenize that and keep going.
The string is empty. I'm done. Here are your tokens.

(I learned how lexers work from the guide to PLY, a Python lexer/parser: http://www.dabeaz.com/ply/ply.html)
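
The walkthrough above can be sketched in plain Java without a lexer library. This is only an illustration under a few assumptions: the class name DataTokenizer and its tokenize method are invented, the Item rule is taken to allow commas inside a quoted item, whitespace around delimiters is not skipped, and the recursive Data rule is unrolled into a loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DataTokenizer {
    // Item: an unquoted run without commas or quotes, or a double-quoted run.
    private static final Pattern ITEM = Pattern.compile("[^,\"]+|\"[^\"]*\"");

    // Data: Item Delimiter Data | Item
    static List<String> tokenize(String input) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        int pos = 0;
        while (true) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {               // an Item must start exactly at pos
                throw new IllegalArgumentException("Expected an Item at index " + pos);
            }
            items.add(m.group());
            pos = m.end();
            if (pos == input.length()) {
                return items;                   // Data: Item (string is used up)
            }
            if (input.charAt(pos) != ',') {
                throw new IllegalArgumentException("Expected a Delimiter at index " + pos);
            }
            pos++;                              // Data: Item Delimiter Data
        }
    }

    public static void main(String[] args) {
        // prints [200, 6, "California, USA"]
        System.out.println(tokenize("200,6,\"California, USA\""));
    }
}
```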

#4

Hello, try this expression:

public class Test {

    public static void main(String[] args) {
        String loc = "200,6,\"Paris, France\"";
        // Split on commas that are followed by an even number of double quotes
        String[] str1 = loc.split(",(?=(?:[^\"]|\"[^\"]*\")*$)");

        for (String tmp : str1) {
            System.out.println(tmp);
        }
    }
}
