How do I split a string using a regular expression that excludes escaped versions of the token?

Time: 2021-03-15 21:39:50

In Java, I'm using the String split method to split a string containing values separated by semicolons.

Currently, I have the following line that works in 99% of all cases.

String[] fields = optionsTxt.split(";");

However, the requirement has been added to include escaped semicolons as part of the string. So, the following strings should parse out to the following values:

"Foo foo;Bar bar" => [Foo foo] [Bar bar]
"Foo foo\; foo foo;Bar bar bar" => [Foo foo\; foo foo] [Bar bar bar]

This should be painfully simple, but I'm totally unsure about how to go about it. I just want to not tokenize when there is a \; and only tokenize when there is a ;.

Does anyone out there know the magic formula?

4 answers

#1


2  

Try this:

String[] fields = optionsTxt.split("(?<!\\\\);");
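For anyone who wants to check this against the two strings from the question, here is a minimal sketch. The class and method names are made up for illustration; the pattern is the one above, a negative lookbehind that only matches a semicolon when the character before it is not a backslash.

```java
import java.util.Arrays;

final class LookbehindSplitDemo {
    // Regex (?<!\\); matches ';' only when it is not immediately preceded by '\'.
    // The lookbehind is zero-width, so only the ';' itself is removed.
    static String[] splitUnescaped(String optionsTxt) {
        return optionsTxt.split("(?<!\\\\);");
    }

    public static void main(String[] args) {
        // prints [Foo foo, Bar bar]
        System.out.println(Arrays.toString(splitUnescaped("Foo foo;Bar bar")));
        // prints [Foo foo\; foo foo, Bar bar bar]
        System.out.println(Arrays.toString(splitUnescaped("Foo foo\\; foo foo;Bar bar bar")));
    }
}
```

Keep in mind that String.split drops trailing empty fields by default; pass a limit of -1 as the second argument if those need to be kept.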

#2


1  

There's probably a better way, but the quick-and-dirty method is to first replace every \; with a string that won't appear in your input, such as {{ESCAPED_SEMICOLON}}, then split on ;, and finally reverse the substitution in each token to put the \; back.
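Roughly, that three-step idea could look like the sketch below. The class and method names are hypothetical, and it assumes the placeholder text really never occurs in the input.

```java
import java.util.Arrays;

final class PlaceholderSplitDemo {
    // Assumption: this marker never appears in real input.
    private static final String PLACEHOLDER = "{{ESCAPED_SEMICOLON}}";

    static String[] splitWithPlaceholder(String optionsTxt) {
        // String.replace works on literal text, so no regex escaping is needed here.
        String masked = optionsTxt.replace("\\;", PLACEHOLDER);
        // Only unescaped semicolons are left, so a plain split is enough.
        String[] fields = masked.split(";");
        // Put the escaped semicolons back into each token.
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].replace(PLACEHOLDER, "\\;");
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints [Foo foo\; foo foo, Bar bar bar]
        System.out.println(Arrays.toString(splitWithPlaceholder("Foo foo\\; foo foo;Bar bar bar")));
    }
}
```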

#3


1  

Using a regular expression (java.util.regex)

[^\\];

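should be what you are looking for without doing a double replace. Be aware that, used as a split() delimiter, it also consumes the character before each semicolon; moving the class into a lookbehind, (?<=[^\\]);, avoids that.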

Try it out using a regex testing tool.
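If no tester is at hand, a few lines of Java show how the character-class pattern behaves when handed to String.split, next to a lookbehind variant of the same test. This is only a sketch; the class and method names are invented for the comparison.

```java
import java.util.Arrays;

final class CharClassVsLookbehind {
    // Pattern from this answer: one non-backslash character followed by ';'.
    // Both characters are part of the match, so both are removed by split.
    static String[] splitCharClass(String s) {
        return s.split("[^\\\\];");
    }

    // Same condition as a zero-width lookbehind, so only ';' is removed.
    static String[] splitLookbehind(String s) {
        return s.split("(?<=[^\\\\]);");
    }

    public static void main(String[] args) {
        // prints [Foo fo, Bar bar] (the last 'o' of the first field is lost)
        System.out.println(Arrays.toString(splitCharClass("Foo foo;Bar bar")));
        // prints [Foo foo, Bar bar]
        System.out.println(Arrays.toString(splitLookbehind("Foo foo;Bar bar")));
        // prints [Foo foo\; foo foo, Bar bar bar]
        System.out.println(Arrays.toString(splitLookbehind("Foo foo\\; foo foo;Bar bar bar")));
    }
}
```

The lookbehind variant still needs some character in front of the semicolon, so a semicolon at the very start of the input is not treated as a delimiter.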

#4


0  

Using only your provided examples, you can use objects' code from above. If you want the split to happen only when there's an even number of backslashes before your semi-colon, try this:

String[] fields = optionsTxt.split("((?<!\\\\)|(?<=[^\\\\](\\\\\\\\){0,15}));");

I've picked 15 arbitrarily. Change it to a higher number if need be.
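If the fixed limit is a concern, one alternative (not the regex above, but a plain character scan) is to count the backslashes in front of each semicolon by hand. The sketch below is only an illustration with made-up names; it assumes, as in the question, that the escape characters stay in the output.

```java
import java.util.ArrayList;
import java.util.List;

final class EscapeAwareSplitter {
    // Splits on ';' when it is preceded by an even number of backslashes
    // (zero included), with no upper bound on the length of the run.
    static List<String> split(String text) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int backslashRun = 0; // consecutive backslashes right before index i
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == ';' && backslashRun % 2 == 0) {
                fields.add(current.toString()); // unescaped delimiter: close the field
                current.setLength(0);
            } else {
                current.append(c); // ordinary character or escaped ';'
            }
            backslashRun = (c == '\\') ? backslashRun + 1 : 0;
        }
        fields.add(current.toString()); // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("Foo foo\\; foo foo;Bar bar bar")); // [Foo foo\; foo foo, Bar bar bar]
        System.out.println(split("a\\\\;b"));                        // [a\\, b]
    }
}
```

Unlike String.split, this version keeps trailing empty fields, so "a;" yields two entries.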
