Using a REGEX with escaped quotes inside quotes

Time: 2022-09-15 16:14:16

I have a PHP preg_match_all and REGEX question.

I have the following code:

<?php
$string = 'attribute1="some_value" attribute2="<h1 class=\"title\">Blahhhh</h1>"';
preg_match_all('/(.*?)\s*=\s*(\'|"|&#?\w+;)(.*?)\2/s', trim($string), $matches);
print_r($matches);
?>

That does not seem to pick up escaped quotes in the case where I want to pass in HTML that contains quotes. I have tried numerous solutions for this, using the basic quotes-inside-quotes REGEX fixes, but none of them work for me. I can't seem to place them correctly inside this pre-existing REGEX.

I am not a REGEX master, can someone please point me in the right direction?

The result I am trying to achieve is this:

Array
(
    [0] => Array
        (
            [0] => attribute1="some_value"
            [1] =>  attribute2="<h1 class=\"title\">Blahhhh</h1>"
        )

    [1] => Array
        (
            [0] => attribute1
            [1] =>  attribute2
        )

    [2] => Array
        (
            [0] => "
            [1] => "
        )

    [3] => Array
        (
            [0] => some_value
            [1] => <h1 class=\"title\">Blahhhh</h1>
        )
)

Thanks.

1 solution

#1

You can solve this with a negative lookbehind assertion:

'/(.*?)\s*=\s*(\'|"|&#?\w+;)(.*?)(?<!\\\\)\2/'
                                 ^^^^^^^^^

The closing quote must not be preceded by \. This gives me:

Array
(
    [0] => Array
        (
            [0] => attribute1="some_value"
            [1] =>  attribute2="<h1 class=\"title\">Blahhhh</h1>"
        )

    [1] => Array
        (
            [0] => attribute1
            [1] =>  attribute2
        )

    [2] => Array
        (
            [0] => "
            [1] => "
        )

    [3] => Array
        (
            [0] => some_value
            [1] => <h1 class=\"title\">Blahhhh</h1>
        )
)
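
For reference, here is a minimal, self-contained sketch of how the lookbehind pattern could be dropped into the question's script. The `$pattern` variable name is only for illustration, and the `s` flag is carried over from the question's original pattern on the assumption that values may span several lines.

```php
<?php
// Input from the question: the second value contains backslash-escaped quotes.
$string = 'attribute1="some_value" attribute2="<h1 class=\"title\">Blahhhh</h1>"';

// Inside a single-quoted PHP string, \\\\ reaches PCRE as \\ (one literal
// backslash), so (?<!\\\\) means "not immediately preceded by a backslash".
// The trailing s flag is kept from the question's pattern (an assumption here).
$pattern = '/(.*?)\s*=\s*(\'|"|&#?\w+;)(.*?)(?<!\\\\)\2/s';

preg_match_all($pattern, trim($string), $matches);

// $matches[1] holds the names, $matches[2] the delimiters,
// $matches[3] the values without their surrounding quotes.
print_r($matches);
```

Note that the doubled backslashes are needed twice over: once for PHP's single-quoted string syntax and once for the regex engine.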

This regex isn't perfect because of the entity you put in there as a delimiter: like the quotes, it can also be escaped with \. No idea if that is really intended.
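
A related caveat with the lookbehind: it also rejects a closing quote that follows an escaped backslash (a value ending in \\), because it only looks at the one character before the quote. If that matters, a different technique is to consume escape pairs explicitly instead of looking behind. The following is only a sketch under some assumptions that are not part of the answer above: attribute names are plain `\w+` words, only ' and " act as delimiters (the entity alternative is dropped), and the sample input is made up for illustration.

```php
<?php
// Hypothetical input: the first value ends in an escaped backslash, so its
// closing quote is preceded by "\" and would be skipped by the lookbehind.
$string = 'path="C:\\\\" title="say \"hi\""';

// Value body, repeated: either a backslash followed by any character
// (an escape pair), or any character that is neither the opening
// delimiter nor a backslash.
$pattern = '/(\w+)\s*=\s*(["\'])((?:\\\\.|(?!\2)[^\\\\])*)\2/s';

preg_match_all($pattern, $string, $matches);

// $matches[1] holds the names, $matches[3] the raw (still escaped) values.
print_r($matches[1]);
print_r($matches[3]);
```

Because every backslash is consumed together with the character after it, a quote counts as the terminator only when it is not the second half of an escape pair.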

See also this great question/answer: Split string by delimiter, but not if it is escaped.
