Exploding a string with regular expressions

Time: 2022-11-09 20:15:59

I have a string like the one below. The letters in the example could be numbers or text, in uppercase, lowercase, or both. If a value is a sentence, it is enclosed in single quotes:

$string="a,b,c,(d,e,f),g,'h, i j.',k";

How can I explode that to get the following result?

Array([0]=>"a",[1]=>"b",[2]=>"c",[3]=>"(d,e,f)",[4]=>"g",[5]=>"'h, i j.'",[6]=>"k")

I think a regular expression would be a fast and clean solution. Any ideas?

EDIT: This is what I have so far. It is very slow for strings that have a long part between parentheses:

$separator="*"; // any character that is never used in the string
// (ereg() and ereg_replace() were deprecated in PHP 5.3 and removed in PHP 7.0)
$Pattern="'[^,]([^']+),([^']+)[^,]'";
while(ereg($Pattern,$String,$Regs)){
    $String=ereg_replace($Pattern,"'\\1$separator\\2'",$String);
}

$Pattern="\(([^(^']+),([^)^']+)\)";
while(ereg($Pattern,$String,$Regs)){
    $String=ereg_replace($Pattern,"(\\1$separator\\2)",$String);
}

return $String;

This replaces all the commas inside the quotes and parentheses with $separator. Then I can explode the string on commas and replace $separator with the original comma.
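The protect-then-explode idea described above can be done in a single pass with preg_replace_callback() instead of looping over ereg_replace(). Below is a minimal sketch, assuming PHP 7.4+ (for arrow functions) and assuming the NUL byte never occurs in the input; explodeProtected() is a hypothetical helper name, and, like the approach in the question, it does not handle nested parentheses.

```php
<?php
// Hypothetical helper: protect commas inside '...' and (...), explode, then restore them.
// Assumes $separator (default: NUL byte) never appears in the input.
function explodeProtected(string $string, string $separator = "\x00"): array
{
    // Pass 1: replace every comma inside a quoted or parenthesized part with the separator.
    $protected = preg_replace_callback(
        '~\'[^\']*\'|\([^)]*\)~',
        fn(array $m): string => str_replace(',', $separator, $m[0]),
        $string
    );

    // Pass 2: every remaining comma is a real delimiter.
    $parts = explode(',', $protected);

    // Pass 3: put the protected commas back.
    return array_map(
        fn(string $part): string => str_replace($separator, ',', $part),
        $parts
    );
}

print_r(explodeProtected("a,b,c,(d,e,f),g,'h, i j.',k"));
```

The callback only rewrites the matched quoted or parenthesized regions, so the string is scanned once rather than once per protected comma.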

1 个解决方案

#1


4  

You can do the job with preg_match_all: instead of splitting on the commas, match the values themselves.

$string="a,b,c,(d,e,f),g,'h, i j.',k";

preg_match_all('~\'[^\']++\'|\([^)]++\)|[^,]++~', $string,$result);
print_r($result[0]);

Explanation:

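The trick is to put the quoted and parenthesized alternatives before the [^,]++ alternative. At each position the regex engine tries them first, so a quoted or parenthesized value is consumed whole, commas included. A comma between two values matches none of the alternatives, so preg_match_all simply skips it.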

~           pattern delimiter
'           an opening single quote
[^']++      one or more characters that are not a single quote, in possessive mode
'           the closing single quote
|           or
\([^)]++\)  the same idea with parentheses
|           or
[^,]++      one or more characters that are not a comma
~           pattern delimiter
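To keep the explanation next to the code, the same three alternatives can be written with the x (extended) modifier, which ignores whitespace outside character classes and treats # as the start of a comment. This is a sketch using the same input string as above; the layout and comments are illustrative, and the matching behavior is meant to be identical to the one-line pattern.

```php
<?php
// Same alternatives as the one-line pattern, spread out with the x modifier.
// Whitespace outside character classes is ignored and # starts a comment.
$pattern = '~
      \' [^\']++ \'    # a single-quoted value, commas included
    | \( [^)]++ \)     # a parenthesized value, commas included
    | [^,]++           # anything else, up to the next comma
~x';

$string = "a,b,c,(d,e,f),g,'h, i j.',k";
preg_match_all($pattern, $string, $result);
print_r($result[0]);
```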

If you have more than one kind of delimiter that, like quotes, is the same character for opening and closing, you can capture the opening delimiter in a group and use a backreference to match the closing one:

$string="a,b,c,(d,e,f),g,'h, i j.',k,°l,m°,#o,p#,@q,r@,s";

// the u modifier makes ° one UTF-8 character instead of two separate bytes in the class
preg_match_all('~([\'#@°]).*?\1|\([^)]++\)|[^,]++~u', $string, $result);
print_r($result[0]);

Explanation:

(['#@°])   one character from the class, captured in group 1
.*?        any characters, zero or more times, in lazy mode
\1         a backreference to group 1, i.e. the same delimiter character again
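If the set of delimiters varies, the character class can be built from a list. A minimal sketch: splitWithDelimiters() is a hypothetical helper, preg_quote() escapes characters that would otherwise be special inside the class, and the u modifier assumes the input and the delimiters are UTF-8, so a multibyte delimiter such as ° is treated as one character rather than two bytes.

```php
<?php
// Hypothetical helper: split on commas, keeping values wrapped in any of $delimiters
// (same character to open and close) or in parentheses intact.
// Assumes $delimiters is non-empty and everything is UTF-8.
function splitWithDelimiters(string $string, array $delimiters): array
{
    // Escape each delimiter so characters like ] ^ - \ stay literal inside the class.
    $class = implode('', array_map(
        fn(string $d): string => preg_quote($d, '~'),
        $delimiters
    ));

    // Group 1 captures the opening delimiter; \1 requires the same character to close.
    $pattern = '~([' . $class . ']).*?\1|\([^)]++\)|[^,]++~u';

    preg_match_all($pattern, $string, $result);
    return $result[0];
}

print_r(splitWithDelimiters(
    "a,b,c,(d,e,f),g,'h, i j.',k,°l,m°,#o,p#,@q,r@,s",
    ["'", '#', '@', '°']
));
```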

With nested parentheses:

$string="a,b,(c,(d,(e),f),t),g,'h, i j.',k,°l,m°,#o,p#,@q,r@,s";

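// (?-1) is a relative reference to the most recently opened group (the parentheses group itself),
// so it matches nested (...) recursively; u treats ° as one UTF-8 character
preg_match_all('~([\'#@°]).*?\1|(\((?:[^()]++|(?-1))*\))|[^,]++~u', $string, $result);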
print_r($result[0]);
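Recursion is needed here because [^)]++ stops at the first closing parenthesis. The short comparison below, a sketch on a made-up input string, runs a non-recursive parenthesis alternative and a recursive one side by side; only the parenthesis and plain-value alternatives are kept so the difference is easy to see.

```php
<?php
$string = "a,(b,(c),d),e";

// Non-recursive: [^)]++ stops at the first ')', so the nested group is cut in two.
preg_match_all('~\([^)]++\)|[^,]++~', $string, $flat);

// Recursive: (?-1) re-enters the enclosing group, so each '(' is paired with its own ')'.
preg_match_all('~(\((?:[^()]++|(?-1))*\))|[^,]++~', $string, $nested);

print_r($flat[0]);   // ['a', '(b,(c)', 'd)', 'e']
print_r($nested[0]); // ['a', '(b,(c),d)', 'e']
```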
