Following on from my previous question about preg_split, which was answered super fast (thanks to nick), I would really like to extend the scenario so that the string is not split when a delimiter is within quotes. For example:
If I have the string foo = bar AND bar=foo OR foobar="foo bar", I'd like to split it on every space or = character while keeping the = characters in the returned array (which currently works great), but I don't want to split the string when either delimiter is within quotes.
I've got this so far:
<!doctype html>
<?php
$string = 'foo = bar AND bar=foo';
$array = preg_split('/ +|(=)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
?>
<pre>
<?php
print_r($array);
?>
</pre>
Which gets me:
Array
(
[0] => foo
[1] => =
[2] => bar
[3] => AND
[4] => bar
[5] => =
[6] => foo
)
But if I changed the string to:
$string = 'foo = bar AND bar=foo OR foobar = "foo bar"';
I'd really like the array to be:
Array
(
[0] => foo
[1] => =
[2] => bar
[3] => AND
[4] => bar
[5] => =
[6] => foo
[7] => OR
[8] => foobar
[9] => =
[10] => "foo bar"
)
Notice that "foo bar" wasn't split on the space because it's in quotes.
I'm really not sure how to do this within the regex, or whether there is a better way altogether, but any help would be very much appreciated!
Thank you all in advance!
3 Solutions
#1
2
I was able to do this by adding quoted strings as an additional delimiter, like so:
"(.*?)"| +|(=)
The quoted part will be captured. This seems a bit fragile and I have not tested it extensively, but it at least works on your example.
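As a rough sketch of how this pattern could be plugged into the question's preg_split call (the variable names $withoutQuotes and $withQuotes are made up for illustration): because the capture group sits inside the quote characters, the returned token appears to lose its quotes. The second pattern is an assumed variant, not part of the original answer, that moves the quotes inside the group so they are kept.

```php
<?php
$string = 'foo = bar AND bar=foo OR foobar = "foo bar"';
$flags  = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

// Pattern from the answer: the group captures only the text between
// the quotes, so the token comes back without its quote characters.
$withoutQuotes = preg_split('/"(.*?)"| +|(=)/', $string, -1, $flags);

// Assumed variant (not from the answer): capture the quotes as well,
// so the token matches the output the question asks for.
$withQuotes = preg_split('/("[^"]*")| +|(=)/', $string, -1, $flags);

print_r($withoutQuotes); // last element: foo bar
print_r($withQuotes);    // last element: "foo bar"
```

Neither pattern handles escaped quotes inside a quoted value; that would need a more elaborate expression.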
#2
5
Try
$array = preg_split('/(?: +|(=))(?=(?:[^"]*"[^"]*")*[^"]*$)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
The
(?=(?:[^"]*"[^"]*")*[^"]*$)
part is a lookahead assertion that makes sure an even number of quote characters remains ahead in the string; it therefore fails whenever the current position is between quotes:
(?= # Assert that the following can be matched:
(?: # A group containing...
[^"]*" # any number of non-quote characters followed by one quote
[^"]*" # the same (to ensure an even number of quotes)
)* # ...repeated zero or more times,
[^"]* # followed by any number of non-quotes
$ # until the end of the string
)
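To see the lookahead at work, here is a minimal sketch that runs the pattern against the longer string from the question. It assumes the quotes are balanced and never escaped; with an odd number of quotes the even-count check no longer lines up with the quoted regions.

```php
<?php
$string  = 'foo = bar AND bar=foo OR foobar = "foo bar"';

// Split on spaces or "=", but only where an even number of quotes
// follows, i.e. where the current position is outside a quoted region.
$pattern = '/(?: +|(=))(?=(?:[^"]*"[^"]*")*[^"]*$)/';

$tokens = preg_split($pattern, $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

print_r($tokens); // the final element is "foo bar", quotes included
```

Because the lookahead rescans the rest of the string at every candidate delimiter, this approach may become slow on very long inputs.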
#3
0
But why bother splitting?
Having looked at this old question, a simple solution comes to mind: use preg_match_all rather than preg_split. Instead of describing the delimiters, we can use this simple regex to describe the tokens we want:
"[^"]*"|\b\w+\b|=
See online demo.
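A minimal sketch of the matching approach, using the string from the question (the $tokens variable name is only illustrative). The full matches are collected in $matches[0]:

```php
<?php
$string = 'foo = bar AND bar=foo OR foobar = "foo bar"';

// Match tokens directly: a quoted string, a bare word, or an "=" sign.
// The quoted alternative comes first so it wins over the word match.
preg_match_all('/"[^"]*"|\b\w+\b|=/', $string, $matches);

$tokens = $matches[0];
print_r($tokens); // the final element is "foo bar", quotes included
```

Note that anything matching none of the three alternatives (other punctuation, for example) is silently skipped rather than returned.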