Getting PHP to accept XML errors

Time: 2022-01-31 20:45:51

I am having some grief with an XML feed that I am being sent. I know it is invalid, but the development cycle of the sending program is such that it is not worth waiting for them to correct the error. So I am looking for a workaround: some way to get PHP to let me read the XML and merge or drop the invalid attribute entries while keeping all the others.

The fault is that I have duplicate attributes on an XML node. I have been using SimpleXML to read the files and process them into useful values, but this line just breaks the system outright. The offending XML looks like this:

<dCategory dec="1102" dup="45" dup="4576" loc="274" mov="31493" prf="23469" unq="240031" xxx="7861" />

What I would really like is the PHP equivalent of C#'s .MoveToNextAttribute() on the XML reader. I can't seem to find anything that doesn't just blow up when presented with the duplicate attribute.

Can anyone help out on this?

The linked answers address errors in characters within the XML itself, e.g. & not appearing as &amp;. The problem here is that the structure of the XML is broken, not the content. The answer in that thread returns

 parser error : Attribute attr1 redefined

when presented with the XML

<open-1 attr1="atr1" attr1="atr1">Text</open-1>

Which is what I am trying to parse.
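For reference, here is a minimal sketch of how the failure can be captured and inspected instead of surfacing as PHP warnings. It uses PHP's standard libxml error functions; the variable names are only illustrative. It does not repair anything, it just makes the "Attribute ... redefined" error available to the script.

```php
<?php

// The malformed sample from above: attr1 appears twice on one element.
$xml = '<open-1 attr1="atr1" attr1="atr1">Text</open-1>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

// Returns false because the document is not well-formed.
$doc = simplexml_load_string($xml);

$messages = [];
if ($doc === false) {
    foreach (libxml_get_errors() as $error) {
        // Each $error is a LibXMLError carrying level, code, line and message.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
}

// Reset the error buffer and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

print_r($messages);
```

With the errors collected this way, the script can decide to log the bad feed, skip it, or hand it to a repair step before parsing again.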

1 solution

#1


You could use tidy to clean up your input:

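<?php

// Sample input with a duplicated attr1 attribute.
$buffer = '<?xml version="1.0" encoding="UTF-8"?><open-1 attr1="atr1" attr1="atr1">Text</open-1>';

// Treat both the input and the output as XML rather than HTML.
$config = [
    'indent' => true,
    'output-xml' => true,
    'input-xml' => true,
];

// Parse the string, repair it, and print the cleaned document.
$tidy = tidy_parse_string($buffer, $config, 'UTF8');
$tidy->cleanRepair();
echo $tidy;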

This will output:

 <?xml version="1.0" encoding="utf-8"?>
 <open-1 attr1="atr1">Text</open-1>
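To connect this back to the SimpleXML workflow from the question, the repaired string can be passed straight to simplexml_load_string. The following is only a sketch under a few assumptions: the tidy extension is installed, loadRepairedXml is a hypothetical helper name, and the feed line is the sample from the question. Which of the two duplicate values survives is decided by tidy (as far as I know this is governed by its repeated-attributes option), so check the result against your data before relying on it.

```php
<?php

/**
 * Hypothetical helper: repair a raw XML string with tidy, then parse it
 * with SimpleXML. Throws if tidy is missing or the result still fails.
 */
function loadRepairedXml(string $raw): SimpleXMLElement
{
    if (!extension_loaded('tidy')) {
        throw new RuntimeException('The tidy extension is not available.');
    }

    // Same XML-mode options as in the answer above.
    $config = [
        'input-xml'  => true,
        'output-xml' => true,
    ];

    $clean = tidy_repair_string($raw, $config, 'utf8');
    if ($clean === false) {
        throw new RuntimeException('tidy could not repair the input.');
    }

    $xml = simplexml_load_string($clean);
    if ($xml === false) {
        throw new RuntimeException('The repaired XML still failed to parse.');
    }

    return $xml;
}

// The offending line from the question (dup appears twice).
$feed = '<dCategory dec="1102" dup="45" dup="4576" loc="274" mov="31493" prf="23469" unq="240031" xxx="7861" />';

try {
    $node = loadRepairedXml($feed);
    foreach ($node->attributes() as $name => $value) {
        echo $name, ' = ', $value, PHP_EOL;
    }
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

Keeping the repair step in one function means the rest of the SimpleXML processing stays unchanged, and the function can be dropped once the sender fixes the feed.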
