I'm trying to write a regular expression that splits a string into the separate elements enclosed in matching curly braces. First, it needs to be recursive; second, it has to return the offsets (as with PREG_OFFSET_CAPTURE).
I suspect a regex is not the most efficient way to process this data, but I don't know of an easier, better-performing technique. (If you've got one, I would love to hear it!)
So, the input can be in this format:
Hello {#name}! I'm a {%string|sentence|bit of {#random} text}
Processing the data is easy enough if it's in this format:
Hello {#name}! I'm a {%string|sentence|bit of random text}
The problem is curly braces nested inside another set of curly braces. I'm using the following code to split the string:
preg_match_all("/(?<={)[^}]*(?=})/m", $string, $braces, PREG_OFFSET_CAPTURE);
As mentioned, this works well for the simple form, but not for the nested one. The intention (and I have it working in a non-recursive form) is to replace each brace-delimited area with its content as processed by functions, working from the innermost braces outwards.
Ideally, I'd like to be able to write the following and still have it be manageable:
Hello {#name}! I'm a {%string|sentence|bit of {?(random == "strange") ? {#random} : "strange"}} text}
Any help would be very much appreciated.
1 solution
#1
2
You can combine two PCRE features, a capturing group inside a look-ahead and a subroutine call, to capture the nested {...} substrings, including the ones that overlap.
A regex demo is available here.
$re = "#(?=(\{(?>[^{}]|(?1))*+\}))#";
$str = "Hello {#name}! I'm a {%string|sentence|bit of {#random} text}";
preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE);
print_r($matches[1]);
See IDEONE demo
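To make the pattern easier to read, here is a sketch of the same expression written with the x (extended) modifier and a comment on each part. The variable name $reVerbose and the ~ delimiter are my own choices (# starts a comment in extended mode, so it cannot double as the delimiter here); the logic is meant to be identical to $re above.

```php
<?php
// Same idea as $re, spelled out with the x modifier so each piece can be commented.
$reVerbose = '~
    (?=                # look-ahead: consumes nothing, so overlapping (nested) blocks are all found
        (              # group 1: one balanced {...} block
            \{         # opening brace
            (?>        # atomic group: no backtracking into what it matched
                [^{}]  # any single character that is not a brace
              |        # or
                (?1)   # subroutine call: match group 1 again for a nested {...}
            )*+        # repeated possessively
            \}         # the closing brace that balances the opening one
        )
    )
~x';

$str = "Hello {#name}! I'm a {%string|sentence|bit of {#random} text}";
preg_match_all($reVerbose, $str, $matches, PREG_OFFSET_CAPTURE);

// Each entry of $matches[1] is a [substring, byte offset] pair.
foreach ($matches[1] as [$text, $offset]) {
    echo $offset, "\t", $text, "\n";
}
```

Because the whole match is a zero-width look-ahead, the regex engine moves forward one character at a time, which is why the inner {#random} is reported even though it lies inside an already captured block.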
The preg_match_all call returns an array holding each captured {...} substring and its byte offset in $str:
Array
(
    [0] => Array
        (
            [0] => {#name}
            [1] => 6
        )

    [1] => Array
        (
            [0] => {%string|sentence|bit of {#random} text}
            [1] => 21
        )

    [2] => Array
        (
            [0] => {#random}
            [1] => 46
        )

)
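If the offsets are only needed to splice the processed text back into the string, it may be simpler to let preg_replace_callback do the splicing and to recurse inside the callback, so that inner blocks are resolved before their parent. The following is only a minimal sketch under assumptions: the render function, the $vars table, and the meaning given to the # and % sigils (variable lookup, and "take the last |-separated alternative") are all hypothetical stand-ins for whatever processing functions you already have.

```php
<?php
// Hypothetical helper: replaces every balanced {...} block, innermost first.
function render(string $text, callable $handler): string
{
    // One balanced {...} block; group 1 is its content without the outer braces.
    // (?R) recurses into the whole pattern to skip over nested blocks.
    $re = '~\{((?>[^{}]|(?R))*+)\}~';

    $result = preg_replace_callback($re, function (array $m) use ($handler): string {
        // Resolve nested blocks in the content first, then process the flattened content.
        return $handler(render($m[1], $handler));
    }, $text);

    // preg_replace_callback returns null on a PCRE error; fall back to the input then.
    return $result ?? $text;
}

// Assumed sample data and sigil semantics, for illustration only.
$vars = ['name' => 'World', 'random' => 'strange'];
$handler = function (string $body) use ($vars): string {
    $sigil = $body[0] ?? '';
    $rest  = substr($body, 1);
    if ($sigil === '#') {
        return $vars[$rest] ?? '';              // assumed: variable lookup
    }
    if ($sigil === '%') {
        $options = explode('|', $rest);
        return $options[count($options) - 1];   // assumed: pick the last alternative
    }
    return $body;                               // unknown sigil: leave content as is
};

$str = "Hello {#name}! I'm a {%string|sentence|bit of {#random} text}";
echo render($str, $handler), "\n";
// prints: Hello World! I'm a bit of strange text
```

The handler only ever sees content with no braces left in it, which is the "simple form" from the question. Braces that are not balanced are left untouched by this sketch.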