How do you deal with broken data in XML files? For example, if I had
<text>Some &improper; text here.</text>
I'm trying to do:
$doc = new DOMDocument();
$doc->validateOnParse = false;
$doc->formatOutput = false;
$doc->load(...xml');
and it fails miserably, because there's an unknown entity. Note, I can't use CDATA due to the way the software is written. I'm writing a module which reads and writes XML, and sometimes the user inserts improper text.
I've noticed that DOMDocument->loadHTML() nicely encodes everything, but how could I continue from there?
3 solutions
#1
0
Perhaps you can use preg_replace_callback
to do the heavy lifting with entities for you:
http://php.net/manual/en/function.preg-replace-callback.php
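function fixEntities($matches) {
    // preg_replace_callback passes an array: [0] is the whole match, [1] is the entity name.
    switch ($matches[1]) {
        case 'amp':
        case 'lt':
        case 'gt':
        case 'quot': // etc., etc., etc.
            return $matches[0]; // known entity: keep it unchanged
    }
    return ''; // unknown entity: drop it
}
$xml = preg_replace_callback('/&([a-zA-Z0-9#]*);/', 'fixEntities', $xml);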
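The callback above deletes any entity it does not recognise. If the user's text should survive instead, one possible variation is to escape the ampersand of every unknown entity so that the parser sees it as literal text. This is only a sketch: the helper name escapeUnknownEntities is hypothetical, and the whitelist (the five predefined XML entities plus numeric character references) is an assumption about what should pass through untouched.

```php
<?php
// Hypothetical helper: leave the predefined XML entities and numeric
// character references alone, and turn every other "&" into "&amp;".
function escapeUnknownEntities(string $xml): string
{
    return preg_replace(
        '/&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)/',
        '&amp;',
        $xml
    );
}

$xml = '<text>Some &improper; text &amp; more here.</text>';

$doc = new DOMDocument();
// After escaping, the unknown entity is ordinary text and the parse succeeds.
$doc->loadXML(escapeUnknownEntities($xml));

echo $doc->documentElement->textContent, "\n";
```

Because the pattern uses a negative lookahead, a bare "&" with no trailing semicolon is escaped as well, which the original pattern would not have matched.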
#2
1
Use htmlspecialchars to escape the special XML characters before pushing the input into your XML/XHTML DOM. Although its name is prefixed with "html", the only characters it replaces (&, <, > and quotes) are the ones that need escaping in XML, so it is truly useful for XML data serialization.
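A minimal sketch of that approach, assuming the user input is available as a plain string before it is placed into the XML (the variable names are illustrative, and the ENT_XML1 flag assumes PHP 5.4 or later):

```php
<?php
// Illustrative user input containing an unknown entity and other special characters.
$userInput = 'Some &improper; text with <tags> & "quotes".';

// ENT_XML1 selects XML 1.0 escaping rules; ENT_QUOTES also escapes quotes.
$escaped = htmlspecialchars($userInput, ENT_XML1 | ENT_QUOTES, 'UTF-8');

$xml = '<text>' . $escaped . '</text>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Reading the node back yields the original, unescaped user text.
echo $doc->documentElement->textContent, "\n";
```

The escaping has to happen at the point where the text is inserted. Running htmlspecialchars over a whole XML file would escape the markup as well.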
#3
0
If you are the one who writes the XML, there should be no problem, as you can encode any user input into entities before putting it into the XML.
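One way to do that encoding on the writing side, sketched here under the assumption that the module builds its output with DOMDocument, is to add user text through createTextNode, which escapes it during serialization. The element name and input string are taken from the question; everything else is illustrative.

```php
<?php
$userInput = 'Some &improper; text here.';

$doc = new DOMDocument('1.0', 'UTF-8');
$text = $doc->createElement('text');

// A text node is escaped automatically when the document is serialized.
$text->appendChild($doc->createTextNode($userInput));
$doc->appendChild($text);

// Serialize only the <text> element, without the XML declaration.
$xml = $doc->saveXML($text);
echo $xml, "\n";

// The written XML parses cleanly and round-trips to the original input.
$check = new DOMDocument();
$check->loadXML($xml);
echo $check->documentElement->textContent, "\n";
```

Passing the text as the second argument of createElement is not equivalent, since that argument is not escaped for ampersands in the same way.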