Generating an XML document with PHP (escaping characters)

Date: 2021-12-08 22:27:00

I'm generating an XML document from a PHP script and need to escape the XML special characters. I know which characters should be escaped, but what is the correct way to do it?

Should the characters simply be escaped with a backslash (\'), or is there a proper way to do it? Is there a built-in PHP function that can handle this for me?

10 answers

#1


33  

Use the DOM classes to generate your whole XML document. It will handle encodings and decodings that we don't even want to care about.

Edit: This was criticized by @Tchalvak:

The DOM object creates a full XML document, it doesn't easily lend itself to just encoding a string on its own.

That is wrong: DOMDocument can output just a fragment rather than the whole document:

$doc->saveXML($fragment);

which gives:

Test &amp; <b> and encode </b> :)
Test &amp;amp; &lt;b&gt; and encode &lt;/b&gt; :)

as in:

$doc = new DOMDocument();
$fragment = $doc->createDocumentFragment();

// adding XML verbatim:
$xml = "Test &amp; <b> and encode </b> :)\n";
$fragment->appendXML($xml);

// adding text:
$text = $xml;
$fragment->appendChild($doc->createTextNode($text));

// output the result
echo $doc->saveXML($fragment);
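
For the "whole document" case the answer recommends, a minimal sketch might look like the following. The element and attribute names (note, title, body) are made up for illustration; the point is only that setAttribute() and createTextNode() take raw strings and the serializer does the escaping.

```php
<?php
// Build a small document; all element and attribute names are hypothetical.
$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->createElement('note');
$doc->appendChild($root);

// Raw, unescaped strings go in; DOM escapes them when serializing.
$root->setAttribute('title', 'Tom & "Jerry"');
$body = $doc->createElement('body');
$body->appendChild($doc->createTextNode('1 < 2 & 3 > 2'));
$root->appendChild($body);

$xml = $doc->saveXML();
echo $xml;
```

Parsing the result back should return the original strings unchanged, which is a convenient way to check that nothing was escaped twice.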

#2


35  

I created a simple function that escapes the five "predefined entities" in XML:

function xml_entities($string) {
    return strtr(
        $string, 
        array(
            "<" => "&lt;",
            ">" => "&gt;",
            '"' => "&quot;",
            "'" => "&apos;",
            "&" => "&amp;",
        )
    );
}

Usage example:

$text = "Test &amp; <b> and encode </b> :)";
echo xml_entities($text);

Output:

Test &amp;amp; &lt;b&gt; and encode &lt;/b&gt; :)

A similar effect can be achieved with str_replace, but it is fragile: the replacements are applied one after another, so & must come first or the entities will be escaped twice (untested, not recommended):

function xml_entities($string) {
    return str_replace(
        array("&",     "<",    ">",    '"',      "'"),
        array("&amp;", "&lt;", "&gt;", "&quot;", "&apos;"), 
        $string
    );
}
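
The difference between the two versions comes down to how many passes are made over the string. strtr() with an array scans the input once, while str_replace() with arrays applies each pair in turn to the result of the previous one. A small sketch of the failure mode, using a deliberately wrong order for illustration:

```php
<?php
$input = 'a < b';

// str_replace applies the pairs sequentially: '<' becomes '&lt;',
// then the '&' of that new entity is escaped a second time.
$wrongOrder = str_replace(array('<', '&'), array('&lt;', '&amp;'), $input);

// strtr with an array makes a single pass, so the order does not matter.
$singlePass = strtr($input, array('<' => '&lt;', '&' => '&amp;'));

echo $wrongOrder, "\n"; // a &amp;lt; b
echo $singlePass, "\n"; // a &lt; b
```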

#3


16  

What about the htmlspecialchars() function?

htmlspecialchars($input, ENT_QUOTES | ENT_XML1, $encoding);

Note: the ENT_XML1 flag is only available if you have PHP 5.4.0 or higher.

htmlspecialchars() with these parameters replaces the following characters:

  • & (ampersand) becomes &amp;
  • " (double quote) becomes &quot;
  • ' (single quote) becomes &apos;
  • < (less than) becomes &lt;
  • > (greater than) becomes &gt;

You can get the translation table by using the get_html_translation_table() function.
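
As a quick check of the list above, this sketch escapes a sample string with the same flags and then fetches the matching translation table. The sample text is arbitrary, and UTF-8 is assumed for the encoding argument.

```php
<?php
$input = "Tom & \"Jerry\" <b>it's</b>";

$escaped = htmlspecialchars($input, ENT_QUOTES | ENT_XML1, 'UTF-8');
echo $escaped, "\n";
// Tom &amp; &quot;Jerry&quot; &lt;b&gt;it&apos;s&lt;/b&gt;

// The translation table for the same flags shows the replacements in use.
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_XML1, 'UTF-8');
print_r($table);
```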

#4


12  

After struggling with the XML entity issue, I solved it this way:

htmlspecialchars($value, ENT_QUOTES, 'UTF-8')

#5


5  

To produce valid XML, you need to escape the XML special characters and write the text in the same encoding that the XML declaration states (the "encoding" in the <?xml line). Accented characters don't need to be escaped as long as they are encoded in the document's encoding.

However, in many situations simply escaping the input with htmlspecialchars may lead to double-encoded entities (for example, &eacute; would become &amp;eacute;), so I suggest decoding HTML entities first:

function xml_escape($s)
{
    $s = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
    $s = htmlspecialchars($s, ENT_QUOTES, 'UTF-8', false);
    return $s;
}
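
To see what the decode-then-encode step does, here is the function above applied to input that already contains an HTML entity. The function body is repeated only to keep the sketch self-contained, and the sample input is arbitrary.

```php
<?php
function xml_escape($s)
{
    $s = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
    $s = htmlspecialchars($s, ENT_QUOTES, 'UTF-8', false);
    return $s;
}

// The input mixes an HTML named entity with raw special characters.
$input = 'caf&eacute; & <b>';
$out = xml_escape($input);

// &eacute; is decoded to a literal UTF-8 character (U+00E9) and left alone;
// the raw & and the angle brackets are escaped once.
echo $out, "\n";
```

One side effect to be aware of: because everything is decoded first, input that is meant to contain the literal text &lt; comes out as &lt; (which a parser reads as <) rather than &amp;lt;.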

Now you need to make sure all accented characters are valid in the XML document encoding. I strongly encourage always encoding XML output in UTF-8, since not all XML parsers respect the encoding stated in the XML declaration. If your input might come from a different charset, try using utf8_encode() (which converts from ISO-8859-1 only and is deprecated as of PHP 8.2).

There's a special case: your input may come from one of these encodings: ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R. PHP treats them all the same, but there are slight differences between them, some of which even iconv() cannot handle. I could only solve this encoding issue by complementing the behavior of utf8_encode():

function encode_utf8($s)
{
    $cp1252_map = array(
    "\xc2\x80" => "\xe2\x82\xac",
    "\xc2\x82" => "\xe2\x80\x9a",
    "\xc2\x83" => "\xc6\x92",
    "\xc2\x84" => "\xe2\x80\x9e",
    "\xc2\x85" => "\xe2\x80\xa6",
    "\xc2\x86" => "\xe2\x80\xa0",
    "\xc2\x87" => "\xe2\x80\xa1",
    "\xc2\x88" => "\xcb\x86",
    "\xc2\x89" => "\xe2\x80\xb0",
    "\xc2\x8a" => "\xc5\xa0",
    "\xc2\x8b" => "\xe2\x80\xb9",
    "\xc2\x8c" => "\xc5\x92",
    "\xc2\x8e" => "\xc5\xbd",
    "\xc2\x91" => "\xe2\x80\x98",
    "\xc2\x92" => "\xe2\x80\x99",
    "\xc2\x93" => "\xe2\x80\x9c",
    "\xc2\x94" => "\xe2\x80\x9d",
    "\xc2\x95" => "\xe2\x80\xa2",
    "\xc2\x96" => "\xe2\x80\x93",
    "\xc2\x97" => "\xe2\x80\x94",
    "\xc2\x98" => "\xcb\x9c",
    "\xc2\x99" => "\xe2\x84\xa2",
    "\xc2\x9a" => "\xc5\xa1",
    "\xc2\x9b" => "\xe2\x80\xba",
    "\xc2\x9c" => "\xc5\x93",
    "\xc2\x9e" => "\xc5\xbe",
    "\xc2\x9f" => "\xc5\xb8"
    );
    $s=strtr(utf8_encode($s), $cp1252_map);
    return $s;
}
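
On installations where the mbstring extension is available (an assumption, since the answer does not mention it), a shorter sketch of the same cp1252-to-UTF-8 conversion is to name the source encoding explicitly. This swaps in mb_convert_encoding() for the hand-written map above; it is not the author's method, and it only helps when you actually know the input is Windows-1252.

```php
<?php
// In Windows-1252, bytes 0x93 and 0x94 are curly double quotes
// and 0x80 is the euro sign.
$cp1252 = "\x93quoted\x94 \x80 5";

// Convert by naming the source encoding instead of patching utf8_encode().
$utf8 = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');

// Print the bytes as hex so the result is visible on any terminal.
echo bin2hex($utf8), "\n";
```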

#6


2  

If you need proper XML output, SimpleXML is the way to go:

http://www.php.net/manual/en/simplexmlelement.asxml.php
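
A minimal sketch of what that might look like. The element and attribute names are invented for illustration, and the values are set through property and array-style assignment, which take the raw string and leave the escaping to asXML().

```php
<?php
// Start from an empty root element; all names here are hypothetical.
$xml = new SimpleXMLElement('<note/>');

// Assigning to a property creates the child and stores the raw text.
$xml->body = '1 < 2 & 3 > 2';

// Array-style access sets an attribute on that child.
$xml->body['title'] = 'Tom & "Jerry"';

$out = $xml->asXML();
echo $out;
```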

#7


1  

Proper escaping is the way to get correct XML output, but you need to handle escaping differently for attributes and elements. (That is, Tomas' answer is incorrect.)

I wrote/stole some Java code a while back that differentiates between attribute and element escaping. The reason is that the XML parser considers all white space special particularly in attributes.

It should be trivial to port that over to PHP (you can use Tomas Jancik's approach with the appropriate escaping described above). You don't have to worry about escaping extended entities if you're using UTF-8.
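
The Java code itself is not shown here, so the following is only a guess at what such a port could look like, not the author's implementation. Both function names are hypothetical. The attribute variant additionally escapes quotes and writes tab, newline, and carriage return as numeric character references, because an XML parser otherwise normalizes that white space to plain spaces inside attribute values.

```php
<?php
// Escape text that will sit between tags (element content).
function xml_escape_text($s)
{
    return strtr($s, array('&' => '&amp;', '<' => '&lt;', '>' => '&gt;'));
}

// Escape text that will sit inside a quoted attribute value.
function xml_escape_attr($s)
{
    return strtr($s, array(
        '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
        '"' => '&quot;', "'" => '&apos;',
        "\t" => '&#9;', "\n" => '&#10;', "\r" => '&#13;',
    ));
}

// The attribute value contains a newline that should survive parsing.
$value = "line 1\nline 2 & \"more\"";
$xml = '<item note="' . xml_escape_attr($value) . '">'
     . xml_escape_text('1 < 2') . '</item>';
echo $xml, "\n";
```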

If you don't want to port my Java code you can look at XMLWriter which is stream based and uses libxml so it should be very efficient.
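
A short sketch of the XMLWriter route, writing to memory rather than to a file. The element and attribute names are made up for illustration; writeAttribute() and writeElement() receive raw strings and the writer escapes them for the context they land in.

```php
<?php
$w = new XMLWriter();
$w->openMemory();
$w->startDocument('1.0', 'UTF-8');

// All names below are hypothetical.
$w->startElement('note');
$w->writeAttribute('title', 'Tom & "Jerry"');
$w->writeElement('body', '1 < 2 & 3 > 2');
$w->endElement();

$w->endDocument();
$xml = $w->outputMemory();
echo $xml;
```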

#8


0  

You can use this function: http://php.net/manual/en/function.htmlentities.php

In that way all entities (html/xml) are escaped and you can put your string inside XML tags
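
One caveat worth checking before relying on this: with its default flags htmlentities() also emits HTML-only named entities such as &eacute;, while XML predefines just five entities (&amp;, &lt;, &gt;, &quot;, &apos;). The sketch below, which assumes UTF-8 input and uses an arbitrary sample string, shows a parser rejecting such output when the document declares no DTD.

```php
<?php
// "\u{e9}" is the accented e, written as an escape so the file encoding does not matter.
$escaped = htmlentities("caf\u{e9}", ENT_QUOTES, 'UTF-8');
echo $escaped, "\n"; // caf&eacute;

// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$ok = $doc->loadXML('<a>' . $escaped . '</a>');
var_dump($ok); // bool(false): the entity is not defined in XML
```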

#9


-1  

function replace_char($arr1)
{
    // Escape & first so the entities added below are not escaped again.
    $arr1 = str_replace('&', '&amp;', $arr1);
    $arr1 = str_replace('>', '&gt;', $arr1);
    $arr1 = str_replace('<', '&lt;', $arr1);
    $arr1 = str_replace('"', '&quot;', $arr1);
    $arr1 = str_replace('\'', '&apos;', $arr1);

    return $arr1;
}

#10


-1  

Based on the solution of sadeghj the following code worked for me:

/**
 * @param $arr1 the single string that shall be masked
 * @return the resulting string with the masked characters
 */
function replace_char($arr1)
{
    if (strpos ($arr1,'&')!== FALSE) { //test if the character appears 
        $arr1=preg_replace('/&/','&amp;', $arr1); // do this first
    }

    // just encode the
    if (strpos ($arr1,'>')!== FALSE) {
        $arr1=preg_replace('/>/','&gt;', $arr1);
    }
    if (strpos ($arr1,'<')!== FALSE) {
        $arr1=preg_replace('/</','&lt;', $arr1);
    }

    if (strpos ($arr1,'"')!== FALSE) {
        $arr1=preg_replace('/"/','&quot;', $arr1);
    }

    if (strpos ($arr1,'\'')!== FALSE) {
        $arr1=preg_replace('/\'/','&apos;', $arr1);
    }

    return $arr1;
}
