I am having some grief with an XML feed that I am being sent. I know it is invalid, but the sending program's development cycle is slow enough that it is not worth waiting for them to fix the error. So I am looking for a workaround: some way to get PHP to read the XML and merge or drop the invalid duplicate attribute entries while keeping all the others.
The fault is that an XML node has duplicate attributes. I have been using SimpleXML to read the files and process them into useful values, but this line breaks it outright. The offending XML looks like this:
<dCategory dec="1102" dup="45" dup="4576" loc="274" mov="31493" prf="23469" unq="240031" xxx="7861" />
What I would really like is a PHP equivalent of C#'s XmlReader.MoveToNextAttribute(). I can't find anything that doesn't just blow up when presented with the duplicate attribute.
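For reference, PHP's XMLReader class does have a moveToNextAttribute() method that mirrors the C# one. The sketch below (listAttributes is a hypothetical helper name) walks every attribute of every element with it. It assumes the input is already well-formed: XMLReader is built on libxml2 like SimpleXML, so I would expect it to stop at the duplicate attribute too, and it only becomes useful after the feed has been cleaned up.

```php
<?php
// Hypothetical helper: collect "element attr=value" strings using
// XMLReader::moveToNextAttribute(), PHP's counterpart to C#'s
// XmlReader.MoveToNextAttribute(). Assumes well-formed input.
function listAttributes(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $found = [];
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue; // only element nodes carry attributes
        }
        $element = $reader->name;
        // On an element node, the first call moves to its first attribute.
        while ($reader->moveToNextAttribute()) {
            $found[] = $element . ' ' . $reader->name . '=' . $reader->value;
        }
        $reader->moveToElement(); // go back to the element before reading on
    }
    $reader->close();
    return $found;
}

foreach (listAttributes('<root><dCategory dec="1102" dup="45" loc="274" /></root>') as $line) {
    echo $line, PHP_EOL;
}
```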
Can anyone help with this?
The answers that were linked address errors in the character content of the XML, e.g. a bare & that should have been escaped as &amp;. The problem here is that the structure of the XML is broken, not the content. The answer in that thread returns
parser error : Attribute attr1 redefined
when given the XML
<open-1 attr1="atr1" attr1="atr1">Text</open-1>
which is exactly the kind of input I am trying to parse.
Answer
You could use PHP's tidy extension to clean up your input:
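<?php
// Requires PHP's tidy extension (ext-tidy).
$buffer = '<?xml version="1.0" encoding="UTF-8"?><open-1 attr1="atr1" attr1="atr1">Text</open-1>';
$config = [
    'indent'     => true,
    'output-xml' => true, // emit XML rather than HTML
    'input-xml'  => true, // parse the input as XML
];
// Parse, then let tidy repair the markup (this drops the repeated attribute).
$tidy = tidy_parse_string($buffer, $config, 'utf8');
$tidy->cleanRepair();
echo $tidy;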
This will output:
<?xml version="1.0" encoding="utf-8"?>
<open-1 attr1="atr1">Text</open-1>
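If the tidy extension is not available, or you want explicit control over which duplicate survives, a plain-PHP text pass before SimpleXML is another option. This is a minimal sketch, not an XML parser: the helper name dropDuplicateAttributes is hypothetical, it keeps the first occurrence of each attribute name, and it assumes every attribute value is quoted (a start tag with an unquoted value simply will not match and is left unchanged).

```php
<?php
// Hypothetical helper: remove repeated attributes from every start tag,
// keeping the first occurrence of each name. Text-level workaround that
// assumes quoted attribute values; it does not validate the XML.
function dropDuplicateAttributes(string $xml): string
{
    // Groups: 1 = tag name, 2 = run of name="value" pairs, 3 = optional " /".
    $startTag = '/<([A-Za-z_][\w.\-:]*)((?:\s+[^\s=\/>]+\s*=\s*(?:"[^"]*"|\'[^\']*\'))*)(\s*\/?)>/';
    $result = preg_replace_callback(
        $startTag,
        function (array $m): string {
            preg_match_all('/\s+([^\s=\/>]+)\s*=\s*("[^"]*"|\'[^\']*\')/', $m[2], $attrs, PREG_SET_ORDER);
            $seen = [];
            $kept = '';
            foreach ($attrs as $a) {
                if (isset($seen[$a[1]])) {
                    continue; // repeated name: drop this occurrence
                }
                $seen[$a[1]] = true;
                $kept .= ' ' . $a[1] . '=' . $a[2];
            }
            return '<' . $m[1] . $kept . $m[3] . '>';
        },
        $xml
    );
    return $result ?? $xml; // fall back to the input if the regex engine fails
}

$raw = '<root><dCategory dec="1102" dup="45" dup="4576" loc="274" /></root>';
$xml = simplexml_load_string(dropDuplicateAttributes($raw));
echo $xml->dCategory['dup'], PHP_EOL; // keeps the first value: 45
```

Keeping the first value is an arbitrary choice here; to keep the last one instead, overwrite an entry keyed by attribute name rather than skipping repeats.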