LibXML: How do I get the name of a tag?

Time: 2022-11-09 08:26:44

I have the following:

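use strict;
use warnings;
use XML::LibXML;
use HTML::Entities;

my $string = '<entry><name>Bob</name><zip>90210</zip></entry>';

# Escape only '&' and "'" so the markup itself stays intact.
my $encodedXml = encode_entities($string, '&\'');

my $parser = XML::LibXML->new();
my $doc    = $parser->parse_string($encodedXml);

# Select every text node in the document and print its content.
foreach my $text ($doc->findnodes('//text()')) {
    print $text->to_literal, "\n";
}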

This prints 'Bob' and '90210'.

How do I get the actual node names? I need a way to get all the nodes in my XML tree, i.e. 'name' and 'zip'.

2 Answers

#1


6  

Text nodes don't have names. Perhaps you want the name of the parent?

I think this will work:

for my $node ($doc->findnodes('//text()')) {
   print $node->parentNode()->nodeName(), ": ", $node->nodeValue(), "\n";
}

I would use this instead:

for my $node ($doc->findnodes('//*[text()]')) {
   print $node->nodeName(), ": ", $node->textContent(), "\n";
}

Note: The latter version concatenates all the text inside each matched element (textContent also includes text from nested elements), so the two are not equivalent if an element has more than one text child. They should be equivalent for your document, though.
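
To make that difference concrete, here is a minimal sketch that runs both loops against a small mixed-content document. The sample XML and the helper names per_text_node and per_element are hypothetical, chosen only for illustration, and load_xml assumes a reasonably recent XML::LibXML.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample: <p> holds two text children plus an <i> element.
my $xml = '<p>Beginning of <i>sentence</i> and now the end</p>';
my $doc = XML::LibXML->load_xml(string => $xml);

# First approach: one entry per text node, labelled with its parent's name.
sub per_text_node {
    my ($doc) = @_;
    my @lines;
    for my $node ($doc->findnodes('//text()')) {
        push @lines, $node->parentNode->nodeName . ': ' . $node->nodeValue;
    }
    return @lines;
}

# Second approach: one entry per element that has at least one text child.
sub per_element {
    my ($doc) = @_;
    my @lines;
    for my $node ($doc->findnodes('//*[text()]')) {
        push @lines, $node->nodeName . ': ' . $node->textContent;
    }
    return @lines;
}

print "$_\n" for per_text_node($doc);
print "--\n";
print "$_\n" for per_element($doc);
```

With this input the first loop yields three entries (two for p, one for i), while the second yields two, and the p entry contains the whole sentence, including the text of the nested i element.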

#2


1  

What your code does is select the text nodes, which exist as children of the nodes you are looking for. A text node is a separate entity, and it does not have a name. You need to navigate to the text node's parent and that node will contain the tag name.

Things get trickier with mixed-content nodes that contain both text and element nodes, such as

<p>Beginning of <i>sentence</i> and now the end</p>

In this case the structure is

<p>
 |
 +---text (Beginning of )
 |
 +---<i>
 |    |
 |    +---text (sentence)
 |
 +---text ( and now the end)
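
The structure above can be reproduced with a short recursive walk. This is only a sketch: the outline helper is a hypothetical name, and it handles just element and text nodes, ignoring comments, attributes and other node types. It also shows one way to list the element names directly with the XPath //*.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml);   # imports XML_ELEMENT_NODE, XML_TEXT_NODE, ...

# Return an indented outline of the element and text nodes below $node.
sub outline {
    my ($node, $depth) = @_;
    my $indent = '  ' x $depth;
    my @lines;
    for my $child ($node->childNodes) {
        if ($child->nodeType == XML_ELEMENT_NODE) {
            # Elements have a name; recurse into their children.
            push @lines, $indent . '<' . $child->nodeName . '>';
            push @lines, outline($child, $depth + 1);
        }
        elsif ($child->nodeType == XML_TEXT_NODE) {
            # Text nodes have no tag name of their own, only a value.
            push @lines, $indent . 'text (' . $child->nodeValue . ')';
        }
    }
    return @lines;
}

my $doc = XML::LibXML->load_xml(
    string => '<p>Beginning of <i>sentence</i> and now the end</p>'
);

print "$_\n" for outline($doc, 0);

# Element names only, in document order.
my @names = map { $_->nodeName } $doc->findnodes('//*');
print join(', ', @names), "\n";
```

Applied to the question's document, the same //* query would return entry, name and zip, which may be closer to what was asked for than going through the text nodes.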
