Getting the text value from XML in Python

from xml.dom.minidom import parseString
dom = parseString(data)
data = dom.getElementsByTagName('data')

The 'data' variable comes back as an element object, but I can't for the life of me find in the documentation how to grab the text value of the element.

For example:

<something><data>I WANT THIS</data></something>

Does anyone have any ideas?

2 solutions

#1

This should do the trick:

from xml.dom.minidom import parseString
dom = parseString('<something><data>I WANT THIS</data></something>')
data = dom.getElementsByTagName('data')[0].childNodes[0].data

i.e. you need to wade deeper into the DOM structure to get at the text child node and then access its value.
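The one-liner above raises an IndexError if there is no matching element or if the element is empty. A slightly more defensive sketch is shown here; the helper name first_text is made up for illustration and is not part of minidom.

```python
from xml.dom.minidom import parseString


def first_text(xml_string, tag_name):
    """Return the text of the first <tag_name> element, or None.

    first_text is a hypothetical helper for illustration only.
    """
    dom = parseString(xml_string)
    elements = dom.getElementsByTagName(tag_name)
    if not elements:
        # No element with that tag name in the document.
        return None
    child = elements[0].firstChild
    if child is None or child.nodeType != child.TEXT_NODE:
        # Empty element, or its first child is not a text node.
        return None
    return child.data


text = first_text('<something><data>I WANT THIS</data></something>', 'data')
print(text)
```

Returning None for the missing and empty cases is just one possible choice; raising an exception may suit stricter code better.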

#2

So the way to look at it is that "I WANT THIS" is actually another node. It's a text child of "data".

from xml.dom.minidom import parseString
dom = parseString(data)
nodes = dom.getElementsByTagName('data')

At this point, "nodes" is a NodeList and in your example, it has one item in it which is the "data" element. Correspondingly the "data" element also only has one child which is a text node "I WANT THIS".
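To see that structure for yourself, you can poke at the objects directly. This is only a quick inspection sketch using the sample document from the question; the variable names are arbitrary.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

dom = parseString('<something><data>I WANT THIS</data></something>')
nodes = dom.getElementsByTagName('data')

# The NodeList holds one element, the <data> tag.
element_count = len(nodes)
element = nodes[0]

# That element has a single child, which is a text node.
child_count = len(element.childNodes)
text_node = element.firstChild
is_text = text_node.nodeType == Node.TEXT_NODE

print(element_count, element.tagName, child_count, is_text)
```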

So you could just do something like this:

print(nodes[0].firstChild.nodeValue)

Note that in the case where you have more than one tag called "data" in your input, you should use some sort of iteration technique on "nodes" rather than index it directly.
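One way to do that iteration might look like the sketch below. The helper name all_texts and the sample input are invented for illustration. It joins every direct text child of each element, so an empty element yields an empty string instead of failing on firstChild being None.

```python
from xml.dom.minidom import parseString


def all_texts(xml_string, tag_name):
    """Collect the text of every <tag_name> element (hypothetical helper)."""
    dom = parseString(xml_string)
    texts = []
    for element in dom.getElementsByTagName(tag_name):
        # Keep only direct text children; nested elements are skipped.
        parts = [
            child.data
            for child in element.childNodes
            if child.nodeType == child.TEXT_NODE
        ]
        texts.append(''.join(parts))
    return texts


sample = '<something><data>first</data><data>second</data><data/></something>'
values = all_texts(sample, 'data')
print(values)
```

Note that this only looks at direct children, so text inside nested tags or CDATA sections is not included.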
