I'm using the DOM extension to parse a string. I need a function that strips span tags together with their contents. For example, if I have:
This is some text that contains photo.
<span class='title'> photobyile</span>
I would like the function to return:
This is some text that contains photo.
This is what I tried:
$dom = new domDocument;
$dom->loadHTML($string);
$dom->preserveWhiteSpace = false;
$spans = $dom->getElementsByTagName('span');
foreach ($spans as $span)
{
    $naslov = $span->nodeValue;
    echo $naslov;
    $string = preg_replace("/$naslov/", " ", $string);
}
I'm aware that $span->nodeValue returns the text content of the span and not the whole tag, but I don't know how to get the whole tag, including its class attribute.
Thanks, Ile
2 Solutions
#1
9
Try removing the spans directly from the DOM tree.
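$dom = new DOMDocument();
// preserveWhiteSpace is a parser option, so set it before loading.
$dom->preserveWhiteSpace = false;
$dom->loadHTML($string);
// getElementsByTagName() returns a live list: it shrinks as nodes are removed.
$elements = $dom->getElementsByTagName('span');
// Keep removing the first remaining span until none are left.
while ($span = $elements->item(0)) {
    $span->parentNode->removeChild($span);
}
// saveHTML() with no argument serializes the whole document, wrappers included.
echo $dom->saveHTML();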
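Because saveHTML() without arguments serializes the whole document, the snippet above prints a doctype and html/body wrappers around the cleaned text. Below is a minimal sketch of a helper that returns only the cleaned fragment. The function name strip_spans and the wrapping div are illustrative assumptions, not part of the original answer, and the sketch relies on the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags being available (PHP 5.4+ built against libxml 2.7.8 or later).

```php
<?php
// Hypothetical helper: remove every <span> (and its contents) from an HTML fragment.
function strip_spans(string $html): string
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Wrap the fragment in a single root element (an assumption of this sketch)
    // so that no html/body wrappers or doctype need to be added by the parser.
    $dom->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The node list is live, so always take the first remaining span.
    $spans = $dom->getElementsByTagName('span');
    while ($span = $spans->item(0)) {
        $span->parentNode->removeChild($span);
    }

    // Serialize only the children of the wrapper div.
    $result = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}

$string = "This is some text that contains photo.\n<span class='title'> photobyile</span>";
echo trim(strip_spans($string)), "\n";
```

One caveat: loadHTML() does not assume UTF-8 when the input declares no encoding, so input with non-ASCII characters would probably need an explicit encoding declaration before being passed in.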
#2
1
@ile - I've had that problem. It happens because the foreach iterator's index keeps incrementing, while calling removeChild() on the DOM also seems to remove the node from the DOMNodeList ($spans). So for every span you remove, the node list shrinks by one element and the foreach counter still advances by one. Net result: a span gets skipped after each removal.
I'm sure there is a more elegant way, but this is how I did it: I copied the references from the DOMNodeList into a second array, where they are not affected by the removeChild() operation.
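// Copy the node references out of the live list first.
$nodes = [];
foreach ($spans as $span) {
    $nodes[] = $span;
}
// Removing nodes no longer affects what is being iterated.
foreach ($nodes as $span) {
    $span->parentNode->removeChild($span);
}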
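The same snapshot idea can be written more compactly with iterator_to_array(), which copies the node references out of the live list in a single call. This is a sketch with a made-up three-span input, not code from the original answer.

```php
<?php
$dom = new DOMDocument();
// Illustrative input with three spans (an assumption for this example).
$dom->loadHTML('<p>a<span>1</span>b<span>2</span>c<span>3</span></p>');

// Take a static copy of the live DOMNodeList before changing the tree.
$snapshot = iterator_to_array($dom->getElementsByTagName('span'));

// The array does not shrink during removal, so no span is skipped.
foreach ($snapshot as $span) {
    $span->parentNode->removeChild($span);
}

// Count the spans still present in the document.
$remaining = $dom->getElementsByTagName('span')->length;
echo $remaining, "\n";
```

Compared with the while loop over item(0), this keeps the familiar foreach shape at the cost of one extra array.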