How to import an XML string into a PHP DOMDocument

Date: 2022-10-20 19:13:56

For example, I create a DOMDocument like this:

<?php

$implementation = new DOMImplementation();

$dtd =
  $implementation->createDocumentType
  (
    'html',                                     // qualifiedName
    '-//W3C//DTD XHTML 1.0 Transitional//EN',   // publicId
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-'
      .'transitional.dtd'                       // systemId
  );

$document = $implementation->createDocument('', '', $dtd);

$elementHtml     = $document->createElement('html');
$elementHead     = $document->createElement('head');
$elementBody     = $document->createElement('body');
$elementTitle    = $document->createElement('title');
$textTitre       = $document->createTextNode('My bweb page');
$attrLang        = $document->createAttribute('lang');
$attrLang->value = 'en';

$document->appendChild($elementHtml);
$elementHtml->appendChild($elementHead);
$elementHtml->appendChild($attrLang);
$elementHead->appendChild($elementTitle);
$elementTitle->appendChild($textTitre);
$elementHtml->appendChild($elementBody);

Now, suppose I have an XHTML string like this:

<?php
$xhtml = '<h1>Hello</h1><p>World</p>';

How can I import it into the <body> node of my DOMDocument?

So far, the only solution I have found is something like this:

<?php
$simpleXmlElement = new SimpleXMLElement($xhtml);

$domElement = dom_import_simplexml($simpleXmlElement);

$domElement = $document->importNode($domElement, true);

$elementBody->appendChild($domElement);

This solution seems very bad to me, and it causes problems, for example when I try it with a string like this:

<?php
$xhtml = '<p>Hello&nbsp;World</p>';

OK, I can work around this problem by converting the XHTML entities into numeric Unicode entities, but that is so ugly...

Any help?

Thanks in advance!

Related question:

2 answers

#1


9  

The problem is that DOM does not know it should consider the XHTML DTD unless you validate the document against it. Until you do that, DOM doesn't know any of the entities defined in the DTD, nor any of its other rules. Fortunately, we sorted out how to do the validation in that other question, so armed with that knowledge you can do:

$document->validate(); // anywhere before importing the other DOM

And then import with:

$fragment = $document->createDocumentFragment();
$fragment->appendXML('<h1>Hello</h1><p>Hello&nbsp;World</p>');
$document->getElementsByTagName('body')->item(0)->appendChild($fragment);
$document->formatOutput = TRUE;
echo $document->saveXml();

Output:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>My bweb page</title>
  </head>
  <body>
    <h1>Hello</h1>
    <p>Hello&nbsp;World</p>
  </body>
</html>

The other way to import XML into another DOM is to use importNode():

$one = new DOMDocument;
$two = new DOMDocument;
$one->loadXml('<root><foo>one</foo></root>');
$two->loadXml('<root><bar><sub>two</sub></bar></root>');
$bar = $two->documentElement->firstChild; // we want to import the bar tree
$one->documentElement->appendChild($one->importNode($bar, TRUE));
echo $one->saveXml();

Output:

<?xml version="1.0"?>
<root><foo>one</foo><bar><sub>two</sub></bar></root>

However, this cannot work with

<h1>Hello</h1><p>Hello&nbsp;World</p>

because when you load a document into DOM, DOM overwrites everything you told it before about the document. Thus, when using load, libxml (and therefore SimpleXml, DOM and XMLReader) does not know you mean XHTML. It does not know any of the entities defined in the XHTML DTD and will complain about them instead. But even if the string did not contain the entity, it would still not be well-formed XML, because it lacks a root node. That's why you use the fragment.
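The missing-root-node point can also be worked around without a fragment. The following is a minimal sketch, not part of the original answer: the helper name `appendXmlString` and the temporary `<wrapper>` element are assumptions made for illustration, and `$parent` is assumed to be an element that already belongs to a document. It still fails on undeclared entities such as `&nbsp;`, for the reasons given above.

```php
<?php
// Hypothetical helper (not from the original answer): parses a rootless XML
// string by wrapping it in a temporary root, then imports each child node.
function appendXmlString(DOMNode $parent, string $xml): bool
{
    // Assumes $parent is an element, so ownerDocument is not null.
    $target = $parent->ownerDocument;

    $source = new DOMDocument();
    // The <wrapper> element exists only to give the parser a single root.
    if (!$source->loadXML('<wrapper>' . $xml . '</wrapper>')) {
        return false; // not well-formed, or contains undeclared entities
    }

    // importNode() copies, so $source's child list is not modified here.
    foreach ($source->documentElement->childNodes as $child) {
        $parent->appendChild($target->importNode($child, true));
    }
    return true;
}

$document = new DOMDocument();
$body = $document->appendChild($document->createElement('body'));

appendXmlString($body, '<h1>Hello</h1><p>World</p>');
echo $document->saveXML($body), "\n";
```

Because importNode() returns a copy owned by the target document, the appended nodes never trigger a "Wrong Document" error.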

#2


1  

You can use a DOMDocumentFragment for this:

$fragment = $document->createDocumentFragment();
$fragment->appendXml($xhtml);
$elementBody->appendChild($fragment);

That's all there is to it...
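Note that appendXml() only accepts well-formed XML, so a string containing a named entity such as `&nbsp;` is rejected unless the DTD that declares it has been loaded. A minimal sketch of the entity-conversion workaround mentioned in the question follows. The helper name `decodeNamedEntities` is hypothetical, and as a simplification it substitutes the literal UTF-8 character rather than a numeric reference.

```php
<?php
// Hypothetical helper: replaces named HTML entities (e.g. &nbsp;) with the
// UTF-8 characters they stand for, leaving the five XML built-ins alone.
function decodeNamedEntities(string $xhtml): string
{
    $xmlBuiltins = ['amp', 'lt', 'gt', 'quot', 'apos'];

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($xmlBuiltins): string {
            if (in_array($m[1], $xmlBuiltins, true)) {
                return $m[0]; // keep markup-significant entities escaped
            }
            // Unknown names come back unchanged from html_entity_decode().
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $xhtml
    );
}

$document = new DOMDocument('1.0', 'UTF-8');
$body = $document->appendChild($document->createElement('body'));

$fragment = $document->createDocumentFragment();
$fragment->appendXML(decodeNamedEntities('<p>Hello&nbsp;World &amp; more</p>'));
$body->appendChild($fragment);

echo $document->saveXML($body), "\n";
```

Leaving `&amp;`, `&lt;` and the other built-ins untouched matters: decoding them before parsing would turn escaped text into markup.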

Edit: Well, if you must accept XHTML (rather than valid XML), you could use this dirty workaround:

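function xhtmlToDomNode($xhtml) {
    $dom = new DOMDocument();
    // The HTML parser knows the HTML entities and tolerates a missing root.
    $dom->loadHTML('<html><body>'.$xhtml.'</body></html>');
    $fragment = $dom->createDocumentFragment();
    $body = $dom->getElementsByTagName('body')->item(0);
    // Move the children one at a time; iterating childNodes while
    // removing nodes from it would skip nodes.
    while ($body->firstChild) {
        $fragment->appendChild($body->firstChild);
    }
    return $fragment;
}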

Usage:

$fragment = xhtmlToDomNode($xhtml);
// importNode() returns a copy owned by $document; append that copy.
$fragment = $document->importNode($fragment, true);
$elementBody->appendChild($fragment);
