Writing an expression to recursively extract data between brackets

Time: 2021-11-12 22:42:23

I'm trying to write a regular expression that splits a string into the separate elements found inside matching curly braces. First, it needs to handle nesting (recursion); second, it has to return the offsets of the matches (as with PREG_OFFSET_CAPTURE).

I suspect a regex is not the most efficient way to process this data, but I don't know of a simpler, faster technique. (If you have one, I'd love to hear it!)

So, the input can be in this format:

Hello {#name}! I'm a {%string|sentence|bit of {#random} text}

Processing the data is easy enough if it's in this format:

Hello {#name}! I'm a {%string|sentence|bit of random text}

The problem is curly braces nested inside another set of curly braces. I'm using the following code to split the string:

preg_match_all("/(?<={)[^}]*(?=})/m", $string, $braces, PREG_OFFSET_CAPTURE);
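To make the failure concrete, here is a small sketch that runs this pattern on the nested sample input and prints each match with its offset. The loop and its variable names are only illustrative.

```php
<?php
$string = "Hello {#name}! I'm a {%string|sentence|bit of {#random} text}";
preg_match_all("/(?<={)[^}]*(?=})/m", $string, $braces, PREG_OFFSET_CAPTURE);

// Each entry of $braces[0] is a [matched text, byte offset] pair.
foreach ($braces[0] as [$text, $offset]) {
    echo $offset, ': ', $text, "\n";
}
// 7: #name
// 22: %string|sentence|bit of {#random
```

The second match stops at the first closing brace, so the inner {#random} group is swallowed and the outer group is cut short.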

As mentioned, this works well for the simple form, but not for the nested one. The goal (which I already have working in a non-recursive form) is to replace each braced section with the output of the function that processes it, working from the innermost section outwards.

Ideally, I'd like to be able to write Hello {#name}! I'm a {%string|sentence|bit of {?(random == "strange") ? {#random} : "strange"} text} and still have it be manageable.

Any help would be very much appreciated.

1 solution

#1



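You can use PCRE's support for capturing groups inside lookaheads, combined with subroutine calls, to extract the nested {...} substrings. Because the capture sits inside a lookahead, no characters are consumed, so an inner group is still found after its enclosing group has matched; (?1) re-runs group 1 to match balanced inner braces.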

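// The lookahead consumes nothing, so nested groups are reported too; (?1) recurses into group 1.
$re = '#(?=(\{(?>[^{}]|(?1))*+\}))#';
$str = "Hello {#name}! I'm a {%string|sentence|bit of {#random} text}";
preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE);
print_r($matches[1]); // each entry is [substring, byte offset]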
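The pattern is dense, so it may help to see it spelled out. The following is a sketch of the same expression written with the x (extended) flag; the variable names $annotated and $m are only illustrative, and the delimiter is changed to ~ because # starts a comment in extended mode.

```php
<?php
// Same expression as above, written with the x (extended) flag.
// "~" is the delimiter because "#" starts a comment in x mode.
$annotated = '~
    (?=              # lookahead: matches at this position without consuming text
      (              # group 1: one balanced brace block
        \{           #   opening brace
        (?>          #   atomic group, repeated possessively
          [^{}]      #     a single character that is not a brace
          | (?1)     #     or a nested block, matched by recursing into group 1
        )*+
        \}           #   closing brace
      )
    )
~x';

$str = "Hello {#name}! I'm a {%string|sentence|bit of {#random} text}";
preg_match_all($annotated, $str, $m, PREG_OFFSET_CAPTURE);
// $m[1] holds the same [substring, offset] pairs as the compact pattern.
```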

print_r($matches[1]) outputs each captured {...} substring together with its offset in the input string:

Array
(
    [0] => Array
        (
            [0] => {#name}
            [1] => 6
        )

    [1] => Array
        (
            [0] => {%string|sentence|bit of {#random} text}
            [1] => 21
        )

    [2] => Array
        (
            [0] => {#random}
            [1] => 46
        )

)
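If the end goal is to replace each braced section with processed content, working from the inside out, the offsets may not be needed at all. The sketch below uses a different technique from the lookahead pattern above: it repeatedly replaces only the innermost groups (those containing no braces) until none remain. renderTag and renderTemplate are hypothetical names, and renderTag is a placeholder for whatever function actually processes a tag.

```php
<?php
// Hypothetical handler, not from the original post: it only wraps the
// content in square brackets so the order of evaluation is visible.
function renderTag(string $inner): string
{
    return '[' . $inner . ']';
}

// Replace innermost {...} groups first, then repeat, so every nested
// group is resolved before the group that contains it.
function renderTemplate(string $text): string
{
    do {
        $text = preg_replace_callback(
            '/\{([^{}]*)\}/',
            function (array $m): string {
                return renderTag($m[1]);
            },
            $text,
            -1,
            $count   // number of replacements made in this pass
        );
    } while ($count > 0);

    return $text;
}

echo renderTemplate("Hello {#name}! I'm a {%string|sentence|bit of {#random} text}"), "\n";
// Hello [#name]! I'm a [%string|sentence|bit of [#random] text]
```

One caveat with this approach: if the handler itself returns text containing a balanced {...} pair, that text is processed again on the next pass, so a real handler should avoid emitting braces or escape them.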
