Splitting a complex string with a regular expression in Java

Time: 2021-05-29 21:42:19

I have the following string:

field 'data' OR field2 'data2 complex' AND (field3 'data3' OR field3 'data4')

I need to split it into this form:

[field,
data,
OR,
field2,
data2 complex,
AND,
(,
field3,
data3,
OR,
field3,
data4,
)]

Is it possible to do this with a regex? Please help me write a correct one for this task. Thanks a lot.

2 solutions

#1


1  

You could also use this regex:

    String[] list = s.split("'|(\\b(?![^']*?\\w'))");  

The output is:

[field, , data, , OR, , field2, , data2 complex, , AND, (, field3, , data3, , OR, , field3, , data4, )]

The idea is to split at word boundaries (\\b) only if the next ' is an opening apostrophe, not a closing one (because then you would be inside a quoted string).

I tried to get rid of the blank entries without breaking the regex, but I couldn't find a way (I'm new to regex), so feel free to edit it if you can.
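One way to drop the blank entries without touching the regex is to post-process the array: trim each piece and discard the ones that end up empty. This is only a minimal sketch, assuming Java 8 or later (for streams and for the way `split` treats a zero-width match at the start of the input). The class name `SplitCleanup` and the method name `tokenize` are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SplitCleanup {
    // Same split regex as above; only the post-processing is new.
    static List<String> tokenize(String s) {
        return Arrays.stream(s.split("'|(\\b(?![^']*?\\w'))"))
                .map(String::trim)           // e.g. " (" becomes "("
                .filter(t -> !t.isEmpty())   // drop pieces that were only whitespace
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "field 'data' OR field2 'data2 complex' AND (field3 'data3' OR field3 'data4')";
        System.out.println(tokenize(s));
    }
}
```

Note that `trim` would also strip leading or trailing spaces from quoted data, so this only fits inputs where such spaces do not matter.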

#2


0  

If I read your requirements correctly, you want "single quote delimited sequences" OR parentheticals OR alphanumeric words.

Thus you could use this regex (set global to true so you can tokenize it one at a time):

/('[^']*?'|\w+|[\(\)])/g

[Note: this simple regex does not account for nested or escaped single quotes in the string. Handling those properly is possible with regex but much more complicated.]
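The /.../g notation above is JavaScript-style. In Java the same token-at-a-time idea would usually be written with `Pattern` and a `Matcher.find()` loop. Here is a minimal sketch under that assumption. The class name `QuoteTokenizer` and the method name `tokenize` are hypothetical, and the capture group has been moved inside the quotes so that the quote characters are dropped from the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteTokenizer {
    // A quoted sequence (contents captured in group 1), or a word, or a parenthesis.
    private static final Pattern TOKEN = Pattern.compile("'([^']*)'|\\w+|[()]");

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            // group(1) is non-null only when the quoted alternative matched
            tokens.add(m.group(1) != null ? m.group(1) : m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "field 'data' OR field2 'data2 complex' AND (field3 'data3' OR field3 'data4')";
        // prints [field, data, OR, field2, data2 complex, AND, (, field3, data3, OR, field3, data4, )]
        System.out.println(tokenize(s));
    }
}
```

Because this matches tokens rather than splitting on separators, whitespace between tokens is simply skipped and no blank entries appear.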

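If you wanted a single match and then to access your match groups to get your data, just account for the space delimiters (bear in mind that in Java, as in most regex engines, a repeated capturing group keeps only its last capture, so this form is mainly useful for checking the whole string):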

/(?:('[^']*?'|\w+|[\(\)])\s*)+/
