Python: How to get the text content of an XML element using xml.dom.minidom?

Time: 2022-01-12 22:38:44

I've called elems = xmldoc.getElementsByTagName('myTagName') on an XML document that I parsed with minidom.parse(xmlObj). Now I'm trying to get the text content of this element. I spent a while looking through dir() and trying things out, but I haven't found the right call yet. As an example of what I want to accomplish, given:

<myTagName> Hello there </myTagName>

I would like to extract just "Hello there". (Obviously I could parse this myself, but I expect there is some built-in functionality.)
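A minimal reproduction of this setup may help explain why there is no obvious "get text" method. It assumes the XML comes from a string via `minidom.parseString` (the question uses `minidom.parse` on a file-like object, which builds the same kind of tree). The element node's own `nodeValue` is `None`; the text lives in a child Text node.

```python
from xml.dom import minidom

# Assumption: parse from a string instead of a file object so the demo is self-contained.
xmldoc = minidom.parseString("<root><myTagName> Hello there </myTagName></root>")
elems = xmldoc.getElementsByTagName("myTagName")

elem = elems[0]
print(elem.nodeValue)                              # None: element nodes carry no value
print(elem.firstChild.nodeType == elem.TEXT_NODE)  # True: the text is a child node
print(repr(elem.firstChild.data))                  # ' Hello there '
```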

Thanks

3 answers

#1


23  

Try this:

xmldoc.getElementsByTagName('myTagName')[0].firstChild.nodeValue
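Two details are worth guarding against with that one-liner: `nodeValue` keeps the surrounding whitespace (`' Hello there '` rather than `'Hello there'`), and `firstChild` is `None` for an empty element such as `<myTagName/>`. A minimal sketch of a helper that handles both; the name `first_text` and the choice to return an empty string for empty elements are illustrative assumptions, not part of minidom.

```python
from xml.dom import minidom

def first_text(elem):
    """Return the stripped text of elem's first child, or '' if there is none.

    Hypothetical helper: like the one-liner, it only looks at firstChild.
    """
    child = elem.firstChild
    if child is None or child.nodeType != child.TEXT_NODE:
        return ""
    return child.nodeValue.strip()

xmldoc = minidom.parseString(
    "<root><myTagName> Hello there </myTagName><myTagName/></root>"
)
elems = xmldoc.getElementsByTagName("myTagName")
print([first_text(e) for e in elems])  # ['Hello there', '']
```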

#2


4  

for elem in elems:
    print(elem.firstChild.nodeValue)  # firstChild is the Text node holding the text

That prints the text of each myTagName element (more precisely, the value of its first child node).
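`firstChild` only returns the first run of text. If an element mixes text with child elements, its text is split across several Text nodes and the loop prints just the first piece. A minimal sketch that instead joins all of the element's direct Text children (the helper name `direct_text` is hypothetical); note that it still skips text nested inside child elements.

```python
from xml.dom import minidom

def direct_text(elem):
    """Concatenate the data of elem's direct Text children (not deeper descendants)."""
    return "".join(
        child.data for child in elem.childNodes
        if child.nodeType == child.TEXT_NODE
    )

xmldoc = minidom.parseString("<root><myTagName>Hello <b>big</b> there</myTagName></root>")
for elem in xmldoc.getElementsByTagName("myTagName"):
    print(repr(elem.firstChild.nodeValue))  # 'Hello ' (first text run only)
    print(repr(direct_text(elem)))          # 'Hello  there' (text inside <b> is skipped)
```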

James

#3


3  

Wait a moment... do you want ALL the text under a given node? Then you need some kind of subtree traversal. It doesn't have to be recursive, but this recursive version works fine:

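    def get_all_text(node):
        # Text nodes hold the actual characters; any other node may have children.
        if node.nodeType == node.TEXT_NODE:
            return node.data
        text_string = ""
        # Concatenate the text of every child subtree in document order.
        for child_node in node.childNodes:
            text_string += get_all_text(child_node)
        return text_string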
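Since the traversal does not have to be recursive, here is a sketch of an iterative variant that uses an explicit stack, which avoids Python's recursion limit on very deep trees. Like the recursive version, it only collects `TEXT_NODE` data; the function name is hypothetical.

```python
from xml.dom import minidom

def get_all_text_iterative(node):
    """Collect all descendant text of node in document order, without recursion."""
    parts = []
    stack = [node]
    while stack:
        current = stack.pop()
        if current.nodeType == current.TEXT_NODE:
            parts.append(current.data)
        else:
            # Push children in reverse so the leftmost child is popped first.
            stack.extend(reversed(current.childNodes))
    return "".join(parts)

xmldoc = minidom.parseString("<root><myTagName>Hello <b>big</b> there</myTagName></root>")
elem = xmldoc.getElementsByTagName("myTagName")[0]
print(get_all_text_iterative(elem))  # Hello big there
```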