I have a string [{"Id":"1","msg":""Lorem Ipsum""}] in which I need to escape only the quotes inside the quotes, like this: [{"Id":"1","msg":"\"Lorem Ipsum\""}]. I don't have access to the generator code, so I can't modify it; I'm looking for a regex solution or an efficient Java solution.
I tried selecting matches with \"[^\"]*?(\"*)[^\"]*?\" but it was of no use. Any help is really appreciated. Thanks in advance.
Note that the pattern is not guaranteed to be two double quotes together; it can also be something like "Lorem "Ipsum" test", which should become "Lorem \"Ipsum\" test".
PS: I've already looked at "Regular expression to escape double quotes within double quotes".
3 solutions
#1
3
The problem
A finite automaton - the theoretical equivalent of a regex - can't parse recursive structures. Since you can have inner quotes, and possibly inner-inner quotes, your problem can't be solved with a plain regex.
Although modern regex engines can overcome this limitation with several extensions, don't waste your time hunting for quotes within quotes. You'll soon find that you're actually building a full-blown JSON parser.
As @johnchen902 stated, even a Turing-machine-powered parser cannot resolve ambiguities, so you'd better not try to guess a fix for the broken JSON.
Solutions
Create the JSON using a dedicated utility
The given string is not valid JSON. It was probably created using string concatenation, which is generally a bad idea because it does not escape values correctly. You should use a JSON library that can build JSON from a Java data structure, like gson. Create a list of Objects, add an Object-to-Object dictionary to it, and let the library do the escaping and conversions.
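To make the difference from string concatenation concrete, here is a minimal, stdlib-only sketch of the escaping step that a library such as gson performs for you when it serializes a list of maps. The class JsonBuildSketch and the helpers quote and toJson are hypothetical names made up for this illustration; they are not part of gson or any other library, and in real code you would call the library instead of writing this yourself.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class JsonBuildSketch {
    // Hypothetical helper: wraps a value in quotes and escapes the
    // characters that JSON does not allow unescaped inside a string.
    static String quote(String v) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : v.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Hypothetical helper: serializes a list of string-to-string maps
    // as a JSON array of objects, escaping every key and value.
    static String toJson(List<Map<String, String>> rows) {
        return rows.stream()
            .map(row -> row.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}")))
            .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("Id", "1");
        row.put("msg", "\"Lorem Ipsum\""); // the value itself contains quotes
        System.out.println(toJson(List.of(row)));
        // prints [{"Id":"1","msg":"\"Lorem Ipsum\""}]
    }
}
```

The point is that escaping happens while the data is still structured, when it is still known where each value begins and ends. Once the values have been concatenated unescaped, that information is gone.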
Ask the creator to use a validator
If you received the String from an external source, it's perfectly legitimate to ask for valid JSON you can work with. I guess that the creator stitched Strings together, which is the wrong way to produce a structured format. Ask the original creator to use a standard library for creating JSON, or at least to run the output through a validator. All modern programming languages offer these mechanisms.
#2
2
No, you can't, because the same string may have several valid interpretations.
For example:
[{"Id":"1","msg":""Lorem Ipsum""}]
may also be read so that everything from the 1 through the final Ipsum" is the value of "Id":
[{"Id":"1","msg":""Lorem Ipsum""}]
That is, it can be escaped (parsed) as
[{"Id":"1\",\"msg\":\"\"Lorem Ipsum\""}]
There's no way for a program to determine its meaning unless more rules are given.
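The ambiguity can be checked mechanically. The sketch below assumes the generator's bug is simply that it forgot the backslash in front of embedded quotes; the class AmbiguityDemo and the helper unescapeQuotes are hypothetical names used only for this illustration. Two different, correctly escaped JSON documents collapse to exactly the same broken string, so the broken string alone cannot tell you which one was meant.

```java
class AmbiguityDemo {
    // Hypothetical helper: drops the backslash in front of every escaped
    // quote, mimicking a generator that forgot to escape.
    static String unescapeQuotes(String json) {
        return json.replace("\\\"", "\"");
    }

    public static void main(String[] args) {
        // The broken input from the question.
        String broken   = "[{\"Id\":\"1\",\"msg\":\"\"Lorem Ipsum\"\"}]";
        // Reading A: two keys, msg contains quoted text.
        String readingA = "[{\"Id\":\"1\",\"msg\":\"\\\"Lorem Ipsum\\\"\"}]";
        // Reading B: one key, Id holds the whole remainder.
        String readingB = "[{\"Id\":\"1\\\",\\\"msg\\\":\\\"\\\"Lorem Ipsum\\\"\"}]";

        System.out.println(unescapeQuotes(readingA).equals(broken)); // true
        System.out.println(unescapeQuotes(readingB).equals(broken)); // true
    }
}
```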
#3
0
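// Covers only values wrapped in exactly two double quotes (""...""), not cases like "Lorem "Ipsum" test".
String escaped = str.replaceAll(":\"\"(.+?)\"\"([,}])", ":\"\\\\\"$1\\\\\"\"$2");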
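If you are willing to add one extra rule, the more general case from the question ("Lorem "Ipsum" test") can be handled with a small scanner instead of a regex. The following is only a sketch under a loudly stated assumption: a quote inside a string is treated as the closing quote only when the next non-whitespace character is one of : , } ] or the input ends. The class QuoteRepair and both method names are hypothetical. As the other answers point out, this is a guess about intent and not a real parse; a message that itself contains text like "a", "b" will be split in the wrong place.

```java
class QuoteRepair {
    // Hypothetical helper: escapes quotes that appear inside a string value.
    static String escapeInnerQuotes(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        boolean inString = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inString && c == '\\' && i + 1 < s.length()) {
                // Already-escaped character: copy both chars unchanged.
                out.append(c).append(s.charAt(++i));
            } else if (c != '"') {
                out.append(c);
            } else if (!inString) {
                inString = true;       // opening quote
                out.append(c);
            } else if (closesString(s, i + 1)) {
                inString = false;      // closing quote
                out.append(c);
            } else {
                out.append("\\\"");    // inner quote: escape it
            }
        }
        return out.toString();
    }

    // Assumption: a quote closes the string only if the next non-whitespace
    // character is a JSON delimiter (: , } ]) or the input ends.
    static boolean closesString(String s, int from) {
        int j = from;
        while (j < s.length() && Character.isWhitespace(s.charAt(j))) j++;
        return j == s.length() || ":,}]".indexOf(s.charAt(j)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(escapeInnerQuotes("[{\"Id\":\"1\",\"msg\":\"\"Lorem Ipsum\"\"}]"));
        // prints [{"Id":"1","msg":"\"Lorem Ipsum\""}]
        System.out.println(escapeInnerQuotes("\"Lorem \"Ipsum\" test\""));
        // prints "Lorem \"Ipsum\" test"
    }
}
```

It runs in a single pass over the input, and quotes that are already escaped are left alone. Validate the result with a real JSON parser before trusting it.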