XML parsing in Python doesn't work when the root element has an xmlns attribute

Date: 2021-10-11 23:00:31

It's my first time trying to parse XML with Python, so the answer may be simple, but I can't figure it out.

I'm using ElementTree to parse an XML file. The problem is that I can't get any results from the tree when the root element has this attribute:

<package xmlns="http://apple.com/itunes/importer" version="software5.1">

When I remove this attribute, that is, when I change the root element's opening tag to:

<package>

everything works.

What am I doing wrong?

Here is my code:

import xml.etree.ElementTree as ET

tree = ET.parse('metadataCopy.xml')
root = tree.getroot()

p = root.find(".//intervals/interval")

print(p)  # None when the root element carries xmlns="..."
for interval in root.iterfind(".//intervals/interval"):
    start_date = interval.find('start_date').text
    end_date = interval.find('end_date').text
    print(start_date, end_date)

Please help. Thanks!

UPDATE: The XML file:

<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://apple.com/itunes/importer" version="software5.1">
<metadata_token>TOKEN</metadata_token>
<provider>Provider Name</provider>
<team_id>Team_ID_Here</team_id>
<software>
    <!--Apple ID: 01234567-->
    <vendor_id>vendorSKU</vendor_id>
    <read_only_info>
        <read_only_value key="apple-id">01234567</read_only_value>
    </read_only_info>
    <software_metadata>
        <versions>
            <version string="1.0">
                <locales>
                    <locale name="en-US">
                        <title>title text</title>
                        <description>Description text</description>
                        <keywords>
                            <keyword>key1</keyword>
                            <keyword>key2</keyword>
                        </keywords>
                        <version_whats_new>New things here</version_whats_new>
                        <support_url>http://someurl.com</support_url>
                        <software_screenshots>
                            <software_screenshot display_target="iOS-3.5-in" position="1">

                            </software_screenshot>
                            <software_screenshot display_target="iOS-4-in" position="1">

                            </software_screenshot>
                        </software_screenshots>
                    </locale>
                </locales>
            </version>
        </versions>
        <products>
            <product>
                <territory>WW</territory>
                <cleared_for_sale>true</cleared_for_sale>
                <sales_start_date>2013-01-05</sales_start_date>
                <intervals>
                    <interval>
                        <start_date>2013-08-25</start_date>
                        <end_date>2014-09-01</end_date>
                        <wholesale_price_tier>5</wholesale_price_tier>
                    </interval>
                    <interval>
                        <start_date>2014-09-01</start_date>
                        <wholesale_price_tier>6</wholesale_price_tier>
                    </interval>
                </intervals>
                <allow_volume_discount>true</allow_volume_discount>
            </product>
        </products>
    </software_metadata>
</software>

1 Solution

#1 (4 votes)

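This is because the root element declares a default namespace with xmlns. A default namespace applies to that element and all of its descendants, and ElementTree stores each namespaced tag as {uri}localname, e.g. {http://apple.com/itunes/importer}interval rather than plain interval. An unqualified path such as .//intervals/interval only matches elements that have no namespace, so it finds nothing. To look elements up, every step of the path needs a namespace prefix, which you map to the URI through the namespaces argument: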
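To see what the default namespace does to tag names, here is a minimal sketch that parses a trimmed-down, hypothetical excerpt of the file from a string (the XML and ns_uri names are illustrative, not from the question) and compares an unqualified lookup with one written in {uri}localname form:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for metadataCopy.xml with the same default namespace.
XML = """<package xmlns="http://apple.com/itunes/importer" version="software5.1">
  <intervals>
    <interval><start_date>2013-08-25</start_date></interval>
  </intervals>
</package>"""

root = ET.fromstring(XML)

# The default namespace is folded into every tag name.
print(root.tag)  # {http://apple.com/itunes/importer}package

# An unqualified path only looks for elements with no namespace, so nothing matches.
print(root.find(".//intervals/interval"))  # None

# Qualifying each step with the full URI in braces does match.
ns_uri = "{http://apple.com/itunes/importer}"
match = root.find(f".//{ns_uri}intervals/{ns_uri}interval")
print(match.tag)  # {http://apple.com/itunes/importer}interval
```

The {uri}localname form is what ElementTree expands a prefixed step such as pns:interval into, using the namespaces mapping passed to find().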

import xml.etree.ElementTree as ET

# Map an arbitrary prefix ("pns") to the namespace URI declared by xmlns in the file.
namespaces = {"pns": "http://apple.com/itunes/importer"}
tree = ET.parse('metadataCopy.xml')
root = tree.getroot()

# Every step of the path carries the prefix, since every element is in the default namespace.
p = root.find(".//pns:intervals/pns:interval", namespaces=namespaces)

print(p)
for interval in root.iterfind(".//pns:intervals/pns:interval", namespaces=namespaces):
    start_date = interval.find('pns:start_date', namespaces=namespaces)
    end_date = interval.find('pns:end_date', namespaces=namespaces)
    # Some intervals have no end_date, so guard against find() returning None.
    st_text = end_text = None
    if start_date is not None:
        st_text = start_date.text
    if end_date is not None:
        end_text = end_date.text
    print(st_text, end_text)

The XML file you shared is not well-formed: it is missing the closing </package> tag at the end. After adding it, the program prints:

<Element '{http://apple.com/itunes/importer}interval' at 0x178b350>
2013-08-25 2014-09-01
2014-09-01 None
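The missing closing tag is reported by the parser itself. A small sketch, using a hypothetical helper is_well_formed and a truncated string in place of the real file, shows ElementTree raising ET.ParseError until </package> is appended:

```python
import xml.etree.ElementTree as ET

# Hypothetical truncated document: like the file in the question,
# the root <package> element is opened but never closed.
BROKEN = """<package xmlns="http://apple.com/itunes/importer" version="software5.1">
  <provider>Provider Name</provider>
"""


def is_well_formed(text):
    """Return True if text parses as XML; otherwise print the parser error and return False."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple locating the failure.
        print("not well-formed:", err, err.position)
        return False
    return True


print(is_well_formed(BROKEN))                 # prints the parser error, then False
print(is_well_formed(BROKEN + "</package>"))  # True
```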

If you can switch libraries, consider lxml, which has good support for namespaces. See the short namespaces section of its tutorial: http://lxml.de/tutorial.html#namespaces
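If you'd rather stay with the standard library, ElementTree on Python 3.8 and later also accepts a {*} wildcard that matches a tag name in any namespace (or none), and findtext() returns None when the child is missing. A sketch on a hypothetical inline excerpt, assuming Python 3.8+ (the XML and dates names are illustrative):

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of metadataCopy.xml; the second interval has no end_date.
XML = """<package xmlns="http://apple.com/itunes/importer">
  <intervals>
    <interval><start_date>2013-08-25</start_date><end_date>2014-09-01</end_date></interval>
    <interval><start_date>2014-09-01</start_date></interval>
  </intervals>
</package>"""

root = ET.fromstring(XML)

dates = []
# "{*}name" matches an element called "name" in any namespace, so no prefix map is needed.
for interval in root.iterfind(".//{*}intervals/{*}interval"):
    # findtext() returns its default (None) when the child is absent,
    # replacing the explicit "is not None" checks.
    start = interval.findtext("{*}start_date")
    end = interval.findtext("{*}end_date")
    dates.append((start, end))
    print(start, end)
```

The wildcard trades precision for convenience: it would also match same-named elements from a different namespace, so the explicit prefix map remains the safer choice for documents that mix namespaces.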
