This is my first time parsing XML with Python, so the answer may be simple, but I can't figure it out.
I'm using ElementTree to parse an XML file. The problem is that my searches return nothing from the tree when the root element has this attribute:
<package xmlns="http://apple.com/itunes/importer" version="software5.1">
When I remove this attribute, everything works. To be clear, I mean changing the first line of the XML file to:
<package>
With that change, everything works.
What am I doing wrong?
Here is my code:
import xml.etree.ElementTree as ET

tree = ET.parse('metadataCopy.xml')
root = tree.getroot()

p = root.find(".//intervals/interval")
print(p)

for interval in root.iterfind(".//intervals/interval"):
    start_date = interval.find('start_date').text
    end_date = interval.find('end_date').text
    print(start_date, end_date)
Please help. Thanks!
UPDATE: The XML file:
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://apple.com/itunes/importer" version="software5.1">
    <metadata_token>TOKEN</metadata_token>
    <provider>Provider Name</provider>
    <team_id>Team_ID_Here</team_id>
    <software>
        <!--Apple ID: 01234567-->
        <vendor_id>vendorSKU</vendor_id>
        <read_only_info>
            <read_only_value key="apple-id">01234567</read_only_value>
        </read_only_info>
        <software_metadata>
            <versions>
                <version string="1.0">
                    <locales>
                        <locale name="en-US">
                            <title>title text</title>
                            <description>Description text</description>
                            <keywords>
                                <keyword>key1</keyword>
                                <keyword>key2</keyword>
                            </keywords>
                            <version_whats_new>New things here</version_whats_new>
                            <support_url>http://someurl.com</support_url>
                            <software_screenshots>
                                <software_screenshot display_target="iOS-3.5-in" position="1">
                                </software_screenshot>
                                <software_screenshot display_target="iOS-4-in" position="1">
                                </software_screenshot>
                            </software_screenshots>
                        </locale>
                    </locales>
                </version>
            </versions>
            <products>
                <product>
                    <territory>WW</territory>
                    <cleared_for_sale>true</cleared_for_sale>
                    <sales_start_date>2013-01-05</sales_start_date>
                    <intervals>
                        <interval>
                            <start_date>2013-08-25</start_date>
                            <end_date>2014-09-01</end_date>
                            <wholesale_price_tier>5</wholesale_price_tier>
                        </interval>
                        <interval>
                            <start_date>2014-09-01</start_date>
                            <wholesale_price_tier>6</wholesale_price_tier>
                        </interval>
                    </intervals>
                    <allow_volume_discount>true</allow_volume_discount>
                </product>
            </products>
        </software_metadata>
    </software>
1 Answer
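This happens because ElementTree does not resolve the document's default namespace for you. The xmlns attribute on the root puts every element in the file into the http://apple.com/itunes/importer namespace, so an unqualified path such as .//intervals/interval matches nothing. Every element name in the search path has to be qualified with a prefix that you map to that namespace URI.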
import xml.etree.ElementTree as ET

# Map a prefix of our choosing to the document's default namespace URI.
namespaces = {"pns": "http://apple.com/itunes/importer"}

tree = ET.parse('metadataCopy.xml')
root = tree.getroot()

p = root.find(".//pns:intervals/pns:interval", namespaces=namespaces)
print(p)

for interval in root.iterfind(".//pns:intervals/pns:interval", namespaces=namespaces):
    start_date = interval.find('pns:start_date', namespaces=namespaces)
    end_date = interval.find('pns:end_date', namespaces=namespaces)
    # A child may be absent (the second interval has no end_date), so check before reading .text.
    st_text = end_text = None
    if start_date is not None:
        st_text = start_date.text
    if end_date is not None:
        end_text = end_date.text
    print(st_text, end_text)
The XML file as posted is not well-formed: it is missing the closing </package> tag at the end. With that tag added, the program prints:
<Element '{http://apple.com/itunes/importer}interval' at 0x178b350>
2013-08-25 2014-09-01
2014-09-01 None
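As a rough sketch of why the unqualified search comes back empty, and of an alternative to the prefix map: ElementTree stores every tag as {namespace-uri}localname, and the same brace form can be written directly in a search path. The snippet below uses a trimmed-down, well-formed stand-in for the posted file (the XML string and the NS constant are illustrative and not from the original post) and assumes Python 3.6+ for the f-string.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, well-formed stand-in for metadataCopy.xml (illustrative only).
XML = """<package xmlns="http://apple.com/itunes/importer" version="software5.1">
  <intervals>
    <interval>
      <start_date>2013-08-25</start_date>
      <end_date>2014-09-01</end_date>
    </interval>
    <interval>
      <start_date>2014-09-01</start_date>
    </interval>
  </intervals>
</package>"""

root = ET.fromstring(XML)

# The parser stores each tag as "{namespace-uri}localname".
print(root.tag)

# An unqualified path therefore finds nothing.
print(root.find(".//intervals/interval"))

# Hypothetical helper constant: the namespace URI wrapped in braces.
NS = "{http://apple.com/itunes/importer}"

dates = []
for interval in root.iterfind(f".//{NS}intervals/{NS}interval"):
    # findtext returns None when the child element is missing.
    dates.append((interval.findtext(NS + "start_date"),
                  interval.findtext(NS + "end_date")))
print(dates)
```

The brace form avoids passing a namespaces argument to every call, at the cost of longer path strings; which one reads better is largely a matter of taste.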
If it's possible to switch libraries, consider lxml, which has good support for working with namespaces. See the short tutorial at http://lxml.de/tutorial.html#namespaces
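If switching libraries is not an option, ElementTree in Python 3.8 and later also accepts a {*} wildcard that matches an element name in any namespace. A minimal sketch, again on a cut-down stand-in for the posted XML (the sample string is an assumption for illustration):

```python
import xml.etree.ElementTree as ET

# Cut-down, well-formed stand-in for the posted file (illustrative only).
XML = """<package xmlns="http://apple.com/itunes/importer" version="software5.1">
  <intervals>
    <interval>
      <start_date>2013-08-25</start_date>
      <end_date>2014-09-01</end_date>
    </interval>
    <interval>
      <start_date>2014-09-01</start_date>
    </interval>
  </intervals>
</package>"""

root = ET.fromstring(XML)

# "{*}name" matches an element called "name" in any namespace (Python 3.8+).
intervals = root.findall(".//{*}intervals/{*}interval")

# Collect the start date of each interval; findtext returns None if it is absent.
starts = [interval.findtext("{*}start_date") for interval in intervals]
print(starts)
```

This is convenient for quick scripts, but it also matches same-named elements from other namespaces, so the explicit prefix map is the safer choice when a document mixes namespaces.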