Exporting a specific element from a DOMDocument as a string

Time: 2022-10-20 15:38:57

I'm importing some arbitrary HTML into a DOMDocument using the loadHTML() function, e.g.:

$html = '<p><a href="test.php">Test</a></p>';
$doc = new DOMDocument;
$doc->loadHTML($html);

I then want to change a few attributes and node values using DOMDocument methods, which I can do without any problem.

Once I've made these changes I'd like to export the HTML string (using ->saveHTML()), without the <html><body>... tags that the DOMDocument automatically adds to the HTML.

I understand why these are added (to ensure a valid document), but how would I go about just getting my edited HTML back (essentially everything between the <body> tags)?

I have read this post and, while it offers some solutions, I would rather do this 'properly', i.e. without using a string replace on the <body> tags. Validity of the HTML is not an issue, as it is run through an HTML purifier beforehand.

Any ideas? Thanks.

EDIT

I'm aware of the $node parameter added to saveHTML() in PHP 5.3.6; unfortunately, I'm stuck with 5.2.
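
For readers who are not tied to 5.2, here is a minimal sketch of the $node route, assuming PHP 5.3.6 or later. The helper name bodyInnerHtml() is made up for illustration and is not part of DOMDocument; it serializes each child of <body> separately, so the wrapper tags never reach the output.

```php
<?php
// Assumes PHP 5.3.6 or later, where saveHTML() accepts a node argument.
// bodyInnerHtml() is a hypothetical helper name, not part of DOMDocument.
function bodyInnerHtml(DOMDocument $doc)
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        // Nothing was parsed into a <body>, so there is nothing to export.
        return '';
    }

    $html = '';
    // Serialize each child of <body> on its own; the <html> and <body>
    // wrappers are skipped because they are never passed to saveHTML().
    foreach ($body->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }

    return $html;
}

$doc = new DOMDocument;
$doc->loadHTML('<p><a href="test.php">Test</a></p>');
echo bodyInnerHtml($doc);
```

On 5.2 the same loop should work with saveXML($child) in place of saveHTML($child), at the cost of XML-style serialization.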

3 solutions

#1


4  

Perhaps the source code of this project will help; they use a regex to strip out the unnecessary strings:

http://beerpla.net/projects/smartdomdocument-a-smarter-php-domdocument-class/

// Strip the DOCTYPE and the <html><body> wrapper that saveHTML() adds.
$content = preg_replace(
    array('/^<!DOCTYPE.*?<html><body>/si', '!</body></html>$!si'),
    '',
    $this->saveHTML()
);

return $content;

saveHTMLExact() - DOMDocument has an extremely badly designed "feature" where if the HTML code you are loading does not contain <html> and <body> tags, it adds them automatically (yup, there are no flags to turn this behavior off).

Thus, when you call $doc->saveHTML(), your newly saved content now has <html><body> and DOCTYPE in it. Not very handy when trying to work with code fragments (XML has a similar problem).

SmartDOMDocument contains a new function called saveHTMLExact() which does exactly what you would want – it saves HTML without adding that extra garbage that DOMDocument does.
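
To show how the regex above fits into a class, here is a minimal sketch, not the actual SmartDOMDocument source. The class name FragmentDocument is hypothetical; only the method name saveHTMLExact() is taken from the project description. It assumes the wrapper is emitted as a bare <html><body> pair, which is what loadHTML() produces for a simple fragment like the one in the question.

```php
<?php
// A minimal sketch of the regex approach. FragmentDocument is a hypothetical
// class name; only the method name saveHTMLExact() comes from SmartDOMDocument.
class FragmentDocument extends DOMDocument
{
    public function saveHTMLExact()
    {
        // Remove the leading DOCTYPE plus <html><body>, and the trailing
        // </body></html>, that saveHTML() wraps around a fragment.
        $content = preg_replace(
            array('/^<!DOCTYPE.*?<html><body>/si', '!</body></html>$!si'),
            '',
            $this->saveHTML()
        );

        // Trim the newline that is left over after the closing tags are removed.
        return trim($content);
    }
}

$doc = new FragmentDocument;
$doc->loadHTML('<p><a href="test.php">Test</a></p>');
echo $doc->saveHTMLExact();
```

Note that the patterns would not match if the serialized document had attributes on <body> or a <head> element, so this is still a string replacement rather than a DOM-level solution.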

Also, other questions have asked similar things:

How to saveHTML of DOMDocument without HTML wrapper?

#2


2  

Try using DOMDocument->saveXML()?

<?php
$html = '<p><a href="test.php">Test</a></p>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$domnodelist = $doc->getElementsByTagName('p');
$domnode = $domnodelist->item(0);
echo $doc->saveXML($domnode);
?>

It outputs <p><a href="test.php">Test</a></p>

#3


-1  

Thanks, but I won't necessarily know the type of the first tag in the body; it needs to be generic.

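// getElementsByTagName('*')->item(0) would return <html>, so start from <body>.
$body = $doc->getElementsByTagName('body')->item(0);
// Serialize every child of <body>, whatever its tag name.
foreach ($body->childNodes as $child) {
    echo $doc->saveXML($child);
}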
