How to remove all namespaces (tags and attributes) from XML in PHP

Time: 2022-05-24 15:42:33

I've recently had serious grief with XML namespaces and with handling them effectively in PHP. Here's a sample of the worst kind of culprit:

<dc:type xsi:type="TypeName" xsi:identifier="NN">Others</dc:type>

What I did manage to do with preg_replace was "un-namespace" the tags (without breaking URLs):

$xml = preg_replace(
  '/<(\/?)([^:" ].*):([^>\/ ].*)(\/?)>/msiU',
  '<$1$2_$3$4>',
  $x->readOuterXML()
);

# <dc_type xsi:type="TypeName" xsi:identifier="NN">Others</dc_type>

What I couldn't do, through lack of regular expression wizardry, was convert all namespaced attributes into the same format. I managed to convert the first occurrence, but I don't know how to make the replacement repeat for every attribute. I deleted the code because it didn't work (and I can't remember what I did), but the result looked like this:

<dc_type xsi_type="TypeName" xsi:identifier="NN">Others</dc_type>

Whereas what would be beautiful is this:

<dc_type xsi_type="TypeName" xsi_identifier="NN">Others</dc_type>

Are there any regex masters out there who can help?

2 solutions

#1


1  

To rewrite a complete XML document, for example to rename elements or attributes or to change namespace-related data such as xmlns attributes, you can use PHP's expat-based XML Parser extension.

This works by parsing the file and changing the output on the fly. The parser invokes callback functions (so-called handlers) that receive the pre-parsed data, for example the element's name as a string and its attributes as an array.

You can then change these values on the fly and output the (potentially changed) data.

Done this way, you no longer need to care about regular expressions (getting them right for proper XML parsing is non-trivial).

You can find some boilerplate code to get started in a previous answer of mine.
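That earlier answer is not reproduced here, so the following is only a minimal sketch of the handler approach described above, not the author's original code. It assumes the prefixes should simply be folded into the names (`dc:type` becomes `dc_type`, as in the question) and that `xmlns` declarations should be dropped. `flatten_name()` and `flatten_namespaces_sax()` are hypothetical helper names, and the `bogus.*` URIs in the usage lines are placeholders.

```php
<?php
// Hypothetical helper (an assumption, not from the answer): "dc:type" -> "dc_type".
function flatten_name(string $name): string
{
    return str_replace(':', '_', $name);
}

// Hypothetical helper: re-emits the document through parser callbacks.
function flatten_namespaces_sax(string $xml): string
{
    $out = '';

    // Plain (non-namespace-aware) parser, so names arrive with their prefix.
    $parser = xml_parser_create();
    // Case folding is on by default and would upper-case every name.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Start tag: name as a string, attributes as an array.
        function ($parser, string $name, array $attrs) use (&$out) {
            $out .= '<' . flatten_name($name);
            foreach ($attrs as $attr => $value) {
                // Skip namespace declarations entirely.
                if ($attr === 'xmlns' || strncmp($attr, 'xmlns:', 6) === 0) {
                    continue;
                }
                $out .= ' ' . flatten_name($attr) . '="'
                      . htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8') . '"';
            }
            $out .= '>';
        },
        // End tag.
        function ($parser, string $name) use (&$out) {
            $out .= '</' . flatten_name($name) . '>';
        }
    );

    // Text content is re-escaped and passed through unchanged otherwise.
    xml_set_character_data_handler(
        $parser,
        function ($parser, string $data) use (&$out) {
            $out .= htmlspecialchars($data, ENT_NOQUOTES | ENT_XML1, 'UTF-8');
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        $message = xml_error_string(xml_get_error_code($parser));
        throw new RuntimeException('XML parse error: ' . $message);
    }

    return $out;
}

// Usage with placeholder namespace URIs.
$sample = '<root xmlns:dc="bogus.dc" xmlns:xsi="bogus.xsi">'
        . '<dc:type xsi:type="TypeName" xsi:identifier="NN">Others</dc:type>'
        . '</root>';
echo flatten_namespaces_sax($sample), "\n";
```

Only elements, attributes and text are re-emitted in this sketch. Comments, processing instructions and the XML declaration would need their own handlers (for example via `xml_set_default_handler()`), and empty elements come out as an open/close pair rather than in self-closing form.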

#2


4  

I was looking for the same thing, but I know better than to use regular expressions against XML. (Search for just about any Stack Overflow question about parsing XML/HTML with regex and read the whole answer to find out why. You'll know it when you see it!)

Here is the code I came up with:

<?php
// Some test XML
$xml = <<<XML
<root xmlns:a="bogus.a" xmlns:b="bogus.b">
    <a:first>
        <b:second>text</b:second>
    </a:first>
</root>
XML;

$sxe = new SimpleXMLElement($xml);
$dom_sxe = dom_import_simplexml($sxe);

$dom = new DOMDocument('1.0');
$dom_sxe = $dom->importNode($dom_sxe, true);
$dom_sxe = $dom->appendChild($dom_sxe);

$element = $dom->childNodes->item(0);

// See what the XML looks like before the transformation
echo "<pre>\n" . htmlspecialchars($dom->saveXML()) . "\n</pre>";
foreach ($sxe->getDocNamespaces() as $name => $uri) {
    $element->removeAttributeNS($uri, $name);
}
// See what the XML looks like after the transformation
echo "<pre>\n" . htmlspecialchars($dom->saveXML()) . "\n</pre>";
?>
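
For the output format asked for in the question (`dc_type`, `xsi_type`, `xsi_identifier`), another option is to rebuild the tree with DOM and fold each prefix into the name. This is a minimal sketch under that assumption, not part of the answer above. `flatten_element()` and `flatten_namespaces_dom()` are hypothetical names, and the `bogus.*` URIs are placeholders in the style of the test XML above.

```php
<?php
// Hypothetical helper: returns a copy of $source, created in $doc, whose
// element and attribute names have ":" replaced by "_".
function flatten_element(DOMElement $source, DOMDocument $doc): DOMElement
{
    // nodeName is the qualified name, e.g. "dc:type".
    $copy = $doc->createElement(str_replace(':', '_', $source->nodeName));

    foreach ($source->attributes as $attr) {
        $copy->setAttribute(str_replace(':', '_', $attr->nodeName), $attr->value);
    }

    foreach ($source->childNodes as $child) {
        $copy->appendChild(
            $child instanceof DOMElement
                ? flatten_element($child, $doc)      // recurse into child elements
                : $doc->importNode($child, true)     // text, CDATA, comments: copy as-is
        );
    }

    return $copy;
}

// Hypothetical wrapper: parses a string and returns the flattened root element.
function flatten_namespaces_dom(string $xml): string
{
    $in = new DOMDocument();
    if (!$in->loadXML($xml)) {
        throw new RuntimeException('Could not parse XML');
    }

    $out = new DOMDocument('1.0');
    $out->appendChild(flatten_element($in->documentElement, $out));

    // Serialise only the root element, without the XML declaration.
    return $out->saveXML($out->documentElement);
}

// Usage with placeholder namespace URIs.
$sample = '<root xmlns:dc="bogus.dc" xmlns:xsi="bogus.xsi">'
        . '<dc:type xsi:type="TypeName" xsi:identifier="NN">Others</dc:type>'
        . '</root>';
echo flatten_namespaces_dom($sample), "\n";
```

In PHP's DOM extension the `xmlns` declarations do not show up in `$source->attributes`, so they are left behind without any special handling. The trade-off is that the whole document is held in memory twice, which may matter for large files.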
