How do I make a string "XML safe"?

Time: 2021-02-11 16:04:54

I am responding to an AJAX call by echoing an XML document from PHP. To build this XML document, I loop through the records of a database. The problem is that some records contain '<' characters, so the browser throws a parse error at that spot. How can this be fixed?

7 solutions

#1


55  

By either escaping those characters with htmlspecialchars, or, perhaps more appropriately, using a library for building XML documents, such as DOMDocument or XMLWriter.
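
As a rough illustration of the library route, here is a minimal sketch using XMLWriter. The $rows array and the element names are made up for the example and stand in for whatever your database loop returns.

```php
<?php
// Hypothetical rows standing in for database records.
$rows = [
    ['id' => 1, 'title' => '5 < 6 & 6 > 5'],
    ['id' => 2, 'title' => 'She said "hi"'],
];

$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('records');
foreach ($rows as $row) {
    $writer->startElement('record');
    // Attribute values are escaped by XMLWriter.
    $writer->writeAttribute('id', (string) $row['id']);
    // text() escapes markup characters in element content.
    $writer->text($row['title']);
    $writer->endElement();
}
$writer->endElement();
$writer->endDocument();
$xml = $writer->outputMemory();

echo $xml;
```

The point of the sketch is that the raw strings are handed to the writer untouched, so there is no manual escaping step to forget.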

Another alternative would be to use CDATA sections, but then you'd have to look out for occurrences of ]]>.
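
One common way to deal with ]]> is to split it across two CDATA sections. The helper below is only a sketch, and cdataWrap is a name invented for this example.

```php
<?php
// Hypothetical helper: wrap text in a CDATA section, splitting any "]]>"
// so the section cannot be closed early by the data itself.
function cdataWrap(string $text): string
{
    // "]]>" becomes "]]" + end of section + start of new section + ">".
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

echo '<mytag>' . cdataWrap('5<6 and a[b[0]]>c') . '</mytag>';
```

A parser joins the adjacent sections back together, so the reader of the document sees the original text.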

Also bear in mind that you must respect the encoding you declare for the XML document (UTF-8 by default).
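
To make the encoding point concrete, here is a small sketch that normalizes a value to UTF-8 before escaping it. The toUtf8 helper and the ISO-8859-1 fallback are assumptions for the example; use whatever encoding your database actually returns.

```php
<?php
// Hypothetical helper: make sure a value is valid UTF-8 before it goes
// into a document declared as UTF-8. The fallback encoding is a guess.
function toUtf8(string $value, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, leave it alone
    }
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// "\xE9" is an e-acute in ISO-8859-1 and is not valid UTF-8 on its own.
echo '<name>' . htmlspecialchars(toUtf8("Caf\xE9"), ENT_XML1, 'UTF-8') . '</name>';
```

This needs the mbstring extension.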

#2


44  

Since PHP 5.4 you can use:

htmlspecialchars($string, ENT_XML1);

You should specify the encoding, such as:

htmlspecialchars($string, ENT_XML1, 'UTF-8');

Update

Note that the above will only convert:

  • & to &amp;
  • < to &lt;
  • > to &gt;

If you want to escape text for use in an attribute enclosed in double quotes:

htmlspecialchars($string, ENT_XML1 | ENT_COMPAT, 'UTF-8');

will convert " to &quot; in addition to &, < and >.

And if your attributes are enclosed in single quotes:

htmlspecialchars($string, ENT_XML1 | ENT_QUOTES, 'UTF-8');

will convert ' to &apos; in addition to &, <, > and ".

(Of course you can use this even outside of attributes).

See the manual entry for htmlspecialchars.
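
Putting the variants above together, a sketch might look like this. xmlText and xmlAttr are illustrative names, not built-in functions.

```php
<?php
// Hypothetical helpers; the names are illustrative only.
function xmlText(string $s): string
{
    // Escapes &, < and > for element content.
    return htmlspecialchars($s, ENT_XML1, 'UTF-8');
}

function xmlAttr(string $s): string
{
    // Also escapes " and ' so the value is safe in either quote style.
    return htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

$title = 'Tom & Jerry: 5 < 6';
$note  = 'say "hi", it\'s fine';

echo '<item note="' . xmlAttr($note) . '">' . xmlText($title) . '</item>';
```

Using ENT_QUOTES for every attribute value means you do not have to remember which quote style a given attribute uses.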

#3


7  

1) You can wrap your text as CDATA like this:

<mytag>
    <![CDATA[Your text goes here. Btw: 5<6 and 6>5]]>
</mytag>

see http://www.w3schools.com/xml/xml_cdata.asp

2) As someone already said, escape those characters, for example:

5&lt;6 and 6&gt;5

#4


5  

If at all possible, it's always a good idea to create your XML using the XML classes rather than string manipulation. One of the benefits is that the classes will automatically escape characters as needed.
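
For instance, a minimal sketch with DOMDocument could look like the following. The element and attribute names are invented for the example. It uses createTextNode() rather than the optional value argument of createElement(), which as far as I know does not escape & for you.

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->createElement('records');
$doc->appendChild($root);

$record = $doc->createElement('record');
// createTextNode() takes the raw text; escaping happens on serialization.
$record->appendChild($doc->createTextNode('5 < 6 & 6 > 5'));
// Attribute values are passed raw as well.
$record->setAttribute('note', 'say "hi"');
$root->appendChild($record);

$xml = $doc->saveXML();
echo $xml;
```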

#5


4  

Try this:

$str = htmlentities($str,ENT_QUOTES,'UTF-8');

After filtering your data with the htmlentities() function, you can use it inside an XML tag like this:

<mytag>$str</mytag>
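
One thing worth checking before relying on this: without ENT_XML1, htmlentities() draws on the HTML entity table, so non-ASCII characters can come out as named entities that a plain XML parser does not know. The sketch below compares it with htmlspecialchars() and ENT_XML1 on a made-up sample string.

```php
<?php
$str = "caf\u{E9} < bar"; // "café < bar", written with an escape to avoid file-encoding issues

// Without ENT_XML1, htmlentities() uses HTML named entities.
$escaped = htmlentities($str, ENT_QUOTES, 'UTF-8'); // caf&eacute; &lt; bar

// Collect parser errors instead of printing warnings.
libxml_use_internal_errors(true);
$parsed = simplexml_load_string('<mytag>' . $escaped . '</mytag>');
var_dump($parsed === false); // whether the parser rejected the document

// htmlspecialchars() with ENT_XML1 leaves the e-acute alone.
$xmlSafe = htmlspecialchars($str, ENT_XML1 | ENT_QUOTES, 'UTF-8');
var_dump(simplexml_load_string('<mytag>' . $xmlSafe . '</mytag>') !== false);
```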

#6


4  

Adding this in case it helps someone.

As I am working with Japanese characters, encoding has also been set appropriately. However, from time to time, I find that htmlentities and htmlspecialchars are not sufficient.

Some user inputs contain special characters that are not stripped by the above functions. In those cases I have to do this:

preg_replace('/[\x00-\x1f]/','',htmlspecialchars($string))

This will also remove certain xml-unsafe control characters like Null character or EOT. You can use this table to determine which characters you wish to omit.
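
Note that the pattern above also removes tab, line feed and carriage return, which XML 1.0 does allow. If you want to keep those, a variant that strips only code points outside the XML 1.0 character ranges might look like this sketch. stripInvalidXmlChars is a made-up name, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: drop code points outside the XML 1.0 Char ranges
// (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF).
function stripInvalidXmlChars(string $s): string
{
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $s
    );
    // With the /u modifier, preg_replace() returns null for invalid UTF-8.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

// NUL and EOT are dropped, the tab is kept.
echo htmlspecialchars(stripInvalidXmlChars("a\x00b\x04c\td"), ENT_XML1, 'UTF-8');
```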

#7


0  

I prefer the way Golang does quote escaping for XML (plus a few extras like newline escaping and escaping some other characters), so I have ported its XML escape function to PHP below:

function isInCharacterRange(int $r): bool {
    return $r == 0x09 ||
            $r == 0x0A ||
            $r == 0x0D ||
            $r >= 0x20 && $r <= 0xD7FF ||
            $r >= 0xE000 && $r <= 0xFFFD ||
            $r >= 0x10000 && $r <= 0x10FFFF;
}

function xml(string $s, bool $escapeNewline = true): string {
    $w = '';

    $Last = 0;
    $l = strlen($s);
    $i = 0;

    while ($i < $l) {
        $r = mb_substr(substr($s, $i), 0, 1);
        $Width = strlen($r);
        $i += $Width;
        switch ($r) {
            case '"':
                $esc = '&#34;';
                break;
            case "'":
                $esc = '&#39;';
                break;
            case '&':
                $esc = '&amp;';
                break;
            case '<':
                $esc = '&lt;';
                break;
            case '>':
                $esc = '&gt;';
                break;
            case "\t":
                $esc = '&#x9;';
                break;
            case "\n":
                if (!$escapeNewline) {
                    continue 2;
                }
                $esc = '&#xA;';
                break;
            case "\r":
                $esc = '&#xD;';
                break;
            default:
                if (!isInCharacterRange(mb_ord($r)) || (mb_ord($r) === 0xFFFD && $Width === 1)) {
                    $esc = "\u{FFFD}";
                    break;
                }

                continue 2;
        }
        $w .= substr($s, $Last, $i - $Last - $Width) . $esc;
        $Last = $i;
    }
    $w .= substr($s, $Last);
    return $w;
}

Note that you'll need at least PHP 7.2 because of the mb_ord usage, or you'll have to swap it out for a polyfill, but these functions are working great for us!

For anyone curious, here is the relevant Go source https://golang.org/src/encoding/xml/xml.go?s=44219:44263#L1887
