I would like to parse a shortcode into an array using preg_split.
This is an example shortcode:
[contactform id="8411" label="This is \" first label" label2='This is second \' label']
and this should be the resulting array:
Array ( [id] => 8411 [label] => This is \" first label [label2] => This is second \' label )
I have this regexp:
$atts_arr = preg_split('~\s+(?=(?:[^\'"]*[\'"][^\'"]*[\'"])*[^\'"]*$)~', trim($shortcode, '[]'));
Unfortunately, this works only if the quotes are not escaped (\' or \").
Thanks in advance!
1 solution
#1
Using preg_split is not always handy or appropriate, in particular when you have to deal with escaped quotes. A better approach is to use preg_match_all, for example:
$pattern = <<<'EOD'
~
(\w+) \s*=
(?|
\s* "([^"\\]*(?:\\.[^"\\]*)*)"
|
\s* '([^'\\]*(?:\\.[^'\\]*)*)'
# | uncomment if you want to handle unquoted attributes
# ([^]\s]*)
)
~xs
EOD;
if (preg_match_all($pattern, $yourshortcode, $matches))
$attributes = array_combine($matches[1], $matches[2]);
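As a quick check, here is a minimal sketch that runs the same pattern against the sample shortcode from the question. The variable name $shortcode is only illustrative, and the sample is written as a nowdoc so that the backslashes reach the regex engine untouched.

```php
<?php
// Same pattern as above (without the commented-out unquoted branch).
$pattern = <<<'EOD'
~
(\w+) \s*=
(?|
    \s* "([^"\\]*(?:\\.[^"\\]*)*)"
  |
    \s* '([^'\\]*(?:\\.[^'\\]*)*)'
)
~xs
EOD;

// Nowdoc: no escape processing, so \" and \' stay exactly as typed.
$shortcode = <<<'EOD'
[contactform id="8411" label="This is \" first label" label2='This is second \' label']
EOD;

$attributes = [];
if (preg_match_all($pattern, $shortcode, $matches)) {
    // $matches[1] holds the attribute names, $matches[2] the raw values.
    $attributes = array_combine($matches[1], $matches[2]);
}

print_r($attributes);
```

The values keep their backslashes, as in the array requested in the question. If you want the unescaped text, you could pass each value through stripslashes() afterwards.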
The pattern uses the branch reset feature (?|...(..)...|...(...)..), which gives the same number(s) to the capture groups in each branch.
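To see the difference branch reset makes, here is a small sketch with two toy patterns (not the shortcode pattern itself; $plain and $reset are just illustrative names). Both match a quoted string, once with an ordinary non-capturing group and once with a branch reset group.

```php
<?php
// Ordinary group: each alternative gets its own capture number,
// so the single-quoted content lands in group 2.
preg_match('~(?:"([^"]*)"|\'([^\']*)\')~', "'abc'", $plain);

// Branch reset group: both alternatives share capture number 1,
// so the content is in group 1 whichever branch matched.
preg_match('~(?|"([^"]*)"|\'([^\']*)\')~', "'abc'", $reset);

var_dump($plain[1], $plain[2], $reset[1]);
```

With the ordinary group you would have to look in two different places for the value, which is why array_combine($matches[1], $matches[2]) only works with the branch reset version.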
I was speaking about the \G anchor in my comment. This anchor succeeds if the current position is immediately after the previous match. It is useful if you also want to check the syntax of your shortcode from start to end while extracting the attributes (otherwise it is totally useless). Example:
$pattern2 = <<<'EOD'
~
(?:
\G(?!\A) # anchor for the position after the last match
# it ensures that all matches are contiguous
|
\[(?<tagName>\w+) # beginning of the shortcode
)
\s+
(?<key>\w+) \s*=
(?|
\s* "(?<value>[^"\\]*(?:\\.[^"\\]*)*)"
|
\s* '([^'\\]*(?:\\.[^'\\]*)*)'
# | uncomment if you want to handle unquoted attributes
# ([^]\s]*)
)
(?<end>\s*+]\z)? # check that the end has been reached
~xs
EOD;
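// preg_match_all() always creates $matches['end'] (padded with empty strings),
// so test whether the last match actually captured the closing bracket
if (preg_match_all($pattern2, $yourshortcode, $matches) && end($matches['end']) !== '')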
$attributes = array_combine($matches['key'], $matches['value']);
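Here is a sketch of how the \G version could be wrapped into a function that returns null for a malformed shortcode. The function name parse_shortcode_checked is hypothetical, and the sketch makes two assumptions that are not in the pattern above: the shortcode is the whole input string (hence the added \A before the opening bracket), and the syntax test looks at the last "end" capture, because preg_match_all() fills groups that did not take part with empty strings.

```php
<?php
// Hypothetical helper: returns the attributes, or null if the syntax check fails.
function parse_shortcode_checked(string $shortcode): ?array
{
    $pattern = <<<'EOD'
~
(?:
    \G(?!\A)                # continue right after the previous match
  |
    \A\[(?<tagName>\w+)     # or start at the opening bracket (assumption: start of string)
)
\s+
(?<key>\w+) \s*=
(?|
    \s* "(?<value>[^"\\]*(?:\\.[^"\\]*)*)"
  |
    \s* '([^'\\]*(?:\\.[^'\\]*)*)'
)
(?<end>\s*+]\z)?            # captured only when the closing bracket ends the string
~xs
EOD;

    if (!preg_match_all($pattern, $shortcode, $matches)) {
        return null; // not even one attribute matched
    }
    // Only the last contiguous match can capture "end"; '' means it was not reached.
    if (end($matches['end']) === '') {
        return null;
    }
    return array_combine($matches['key'], $matches['value']);
}

$ok      = parse_shortcode_checked('[contactform id="8411" label=\'second\']');
$broken  = parse_shortcode_checked('[contactform id="8411" oops label="x"]');
$unclosed = parse_shortcode_checked('[contactform id="8411"');
var_dump($ok, $broken, $unclosed);
```

Note that, like the pattern it is based on, this requires at least one attribute, so a bare [contactform] is reported as invalid.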