PHP DOM: How to get child elements by tag name in an elegant manner?

Date: 2022-10-27 11:37:34

I'm parsing some XML with the PHP DOM extension in order to store the data in some other form. Quite unsurprisingly, when I parse an element I often need to obtain all child elements with a given name. There is the method DOMElement::getElementsByTagName($name), but it returns all descendants with that name, not just immediate children. There is also the property DOMNode::$childNodes, but (1) it contains a node list, not an element list, and (2) even if I managed to turn the list items into elements, I'd still need to check each of them for the name. Is there really no elegant way to get only the children with a specific name, or am I missing something in the documentation?

Some illustration:

<?php

$document = new DOMDocument();
$document->loadXML(<<<EndOfXML
<a>
  <b>1</b>
  <b>2</b>
  <c>
    <b>3</b>
    <b>4</b>
  </c>
</a>
EndOfXML
);

$bs = $document
    ->getElementsByTagName('a')
    ->item(0)
    ->getElementsByTagName('b');

foreach($bs as $b){
    echo $b->nodeValue . "\n";
}

// Returns:
//   1
//   2
//   3
//   4
// I'd like to obtain only:
//   1
//   2

?>

3 solutions

#1


4  

One elegant approach I can imagine is a FilterIterator built for the job. For example, the DOMElementFilter from Iterator Garden works on a DOMNodeList and optionally accepts a tag name to filter by:

$a = $document->getElementsByTagName('a')->item(0);

$bs = new DOMElementFilter($a->childNodes, 'b');

foreach($bs as $b){
    echo $b->nodeValue . "\n";
}

This will give the results you're looking for:

1
2

You can find DOMElementFilter in the Development branch now. It may be worth allowing * to match any tag name, as getElementsByTagName("*") does. But that's just an aside.

Here is a working usage example online: https://eval.in/57170
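For readers who do not want to pull in Iterator Garden, the idea can be sketched with SPL alone. This is only a minimal sketch and not the library's implementation: the class name ChildElementFilter is hypothetical, and the inline XML is a compact copy of the document from the question.

```php
<?php
// Hypothetical class for illustration; not the Iterator Garden implementation.
final class ChildElementFilter extends FilterIterator
{
    private ?string $tagName;

    public function __construct(DOMNodeList $nodes, ?string $tagName = null)
    {
        // DOMNodeList is Traversable but not an Iterator, so wrap it first.
        parent::__construct(new IteratorIterator($nodes));
        $this->tagName = $tagName;
    }

    public function accept(): bool
    {
        $node = $this->getInnerIterator()->current();

        // Skip text nodes, comments and anything else that is not an element.
        if (!$node instanceof DOMElement) {
            return false;
        }

        // With no tag name given, every element passes.
        return $this->tagName === null || $node->nodeName === $this->tagName;
    }
}

$document = new DOMDocument();
$document->loadXML('<a><b>1</b><b>2</b><c><b>3</b><b>4</b></c></a>');
$a = $document->documentElement;

foreach (new ChildElementFilter($a->childNodes, 'b') as $b) {
    echo $b->nodeValue . "\n";
}
// prints 1 and 2, each on its own line
```

Because the filter only ever sees $a->childNodes, the nested b elements inside c are never visited.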

#2


2  

A simple iteration over the child nodes:

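// $p is assumed to be a <p> element obtained earlier; scan its parent's children
$parent = $p->parentNode;

foreach ($parent->childNodes as $pp) {
    // childNodes holds direct children only, so deeper descendants are never visited
    if ($pp->nodeName === 'p' && strlen($pp->nodeValue)) {
        echo "{$pp->nodeValue}\n";
    }
}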
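The same loop can be wrapped in a small reusable function. The sketch below is one possible shape for it: the function name childElementsByTagName is hypothetical, and the instanceof check is added here so that text and comment nodes are skipped explicitly.

```php
<?php
/**
 * Hypothetical helper: collect the direct child elements of $parent
 * whose tag name equals $tagName.
 *
 * @return DOMElement[]
 */
function childElementsByTagName(DOMNode $parent, string $tagName): array
{
    $matches = [];

    foreach ($parent->childNodes as $child) {
        // Keep elements only, and only those with the requested name.
        if ($child instanceof DOMElement && $child->nodeName === $tagName) {
            $matches[] = $child;
        }
    }

    return $matches;
}

$document = new DOMDocument();
$document->loadXML('<a><b>1</b><b>2</b><c><b>3</b><b>4</b></c></a>');

foreach (childElementsByTagName($document->documentElement, 'b') as $b) {
    echo $b->nodeValue . "\n";
}
// prints 1 and 2, each on its own line
```

Returning a plain array rather than a live DOMNodeList means the result stays stable even if the document is modified while looping over it.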

#3


0  

My solution, used in production:

It finds a needle (node) in a haystack (DOM):

function getAttachableNodeByAttributeName(\DOMElement $parent, string $elementTagName, ?string $attributeName = null, ?string $attributeValue = null)
{
    // Note: getElementsByTagName() matches all descendants, not only direct children
    $candidates = $parent->getElementsByTagName($elementTagName);
    $length = $candidates->length;

    // Walk the matches from last to first
    for ($i = $length; --$i >= 0;) {
        $needle = $candidates->item($i);

        // No attribute specified: return the node only if it is the sole match
        if (!$attributeName && !$attributeValue && 1 === $length) {
            return $needle;
        }

        // Attribute specified: return the first node found whose attribute value matches
        if ($attributeName && $attributeValue && $needle->getAttribute($attributeName) === $attributeValue) {
            return $needle;
        }
    }

    // Nothing matched
    return null;
}

Usage:

$countryNode = getAttachableNodeByAttributeName($countriesNode, 'country', 'iso', 'NL');

Returns the country element whose iso attribute equals 'NL' from the parent countries node, much like a real search would. It is like looking up a particular country by its name in an array or object.

Another usage example:

$productNode = getAttachableNodeByAttributeName($products, 'partner-products');

Returns the single element with that tag name, without searching by any attribute. Note: for this to work, the tag name must be unique under the parent, e.g. in countries->country[ISO] the countries node is unique and is the parent of all the country nodes.
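Since getElementsByTagName() also matches nested elements, a lookup that must stay among the direct children can be expressed as a relative XPath query instead. This is a minimal sketch using PHP's DOMXPath, not the author's function; the sample XML, including the region wrapper, is made up for illustration.

```php
<?php
// Hypothetical sample data: one nested <country> that should NOT be matched.
$document = new DOMDocument();
$document->loadXML(
    '<countries>'
    . '<country iso="NL">Netherlands</country>'
    . '<country iso="BE">Belgium</country>'
    . '<region><country iso="NL">nested</country></region>'
    . '</countries>'
);

$xpath = new DOMXPath($document);
$countriesNode = $document->documentElement;

// A relative path is evaluated from the context node ($countriesNode),
// so it selects matching direct children only.
$result = $xpath->query('country[@iso="NL"]', $countriesNode);

foreach ($result as $country) {
    echo $country->nodeValue . "\n";
}
// prints Netherlands
```

The same pattern answers the original question: with the a element as context node, the expression 'b' selects only its immediate b children.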
