xml.etree.ElementTree is used to parse and build XML documents. The examples below use the following sample file, saved as demo.xml:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Parsing an XML file

The parse() function reads an XML file and returns an ElementTree.

    from xml.etree.ElementTree import parse

    tree = parse('demo.xml')  # get the ElementTree
    root = tree.getroot()     # get the root element

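When the XML is already in memory, no file is needed. The following is a minimal sketch under that assumption: the variable name xml_text and the shortened document are illustrative and not part of the original notes. fromstring() returns the root Element directly, while parse() also accepts a file-like object and returns an ElementTree.

```python
import io
from xml.etree.ElementTree import fromstring, parse

# Shortened, illustrative version of the sample document.
xml_text = """<data>
    <country name="Liechtenstein"><rank>1</rank></country>
    <country name="Singapore"><rank>4</rank></country>
</data>"""

# fromstring() parses a string and returns the root Element directly.
root = fromstring(xml_text)

# parse() also accepts a file-like object and returns an ElementTree,
# so getroot() is still needed to reach the root Element.
tree = parse(io.StringIO(xml_text))
same_root = tree.getroot()

print(root.tag, same_root.tag, len(root))
```

Both paths end at an equivalent root element; which one to use mostly depends on where the XML comes from.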
Element.tag, Element.attrib, and Element.text give an element's tag name, its attribute dictionary, and its text content

    In [6]: root.tag
    Out[6]: 'data'

    In [7]: root.attrib
    Out[7]: {}

    In [25]: root.text
    Out[25]: '\n '

Iterating with "for child in root" yields the child elements

    In [8]: for child in root:
       ...:     print(child.tag, child.attrib)
       ...:
    country {'name': 'Liechtenstein'}
    country {'name': 'Singapore'}
    country {'name': 'Panama'}

Element.get() returns the value of an attribute

    In [27]: for child in root:
        ...:     print(child.tag, child.get('name'))
        ...:
    country Liechtenstein
    country Singapore
    country Panama

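get() also takes an optional default, which is useful when an attribute may be absent. A small sketch, assuming a made-up document in which one neighbor has no direction attribute (this input is illustrative and not from the sample file):

```python
from xml.etree.ElementTree import fromstring

# Illustrative input: the second neighbor has no "direction" attribute.
root = fromstring(
    '<data>'
    '<neighbor name="Austria" direction="E"/>'
    '<neighbor name="Malaysia"/>'
    '</data>'
)

# With a default, get() never returns None for a missing attribute.
directions = [child.get('direction', 'unknown') for child in root]

# Without a default, a missing attribute yields None.
missing = root[1].get('direction')

print(directions, missing)
```

Indexing child.attrib['direction'] directly would instead raise KeyError for the second element.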
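root.getchildren() returns the direct child elements. Note that getchildren() was deprecated and then removed in Python 3.9; on current versions, use list(root) instead. The session below was recorded on an older Python.

    In [21]: root.getchildren()
    Out[21]:
    [<Element 'country' at 0x7f673581c728>,
     <Element 'country' at 0x7f673581ca98>,
     <Element 'country' at 0x7f673581cc28>]
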
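Because an Element behaves like a sequence of its children, list(root) gives the same result on any Python 3 version. A minimal sketch, using a shortened inline copy of the sample (the inline string is an assumption made to keep the example self-contained):

```python
from xml.etree.ElementTree import fromstring

# Shortened, illustrative version of the sample document.
root = fromstring(
    '<data>'
    '<country name="Liechtenstein"/>'
    '<country name="Singapore"/>'
    '<country name="Panama"/>'
    '</data>'
)

# list(root) copies the direct children into a plain list.
children = list(root)
names = [child.get('name') for child in children]

# len(root) is the number of direct children.
print(len(root), names)
```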
root[0][1] looks up child elements by index

    In [9]: root[0][1].text
    Out[9]: '2008'

    In [10]: root[1][0].text
    Out[10]: '4'

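Indexing follows the usual sequence rules, so negative indices and slices also work, and text always comes back as a string that has to be converted before doing arithmetic. A short sketch on an inline, shortened copy of the first country (the variable names are illustrative):

```python
from xml.etree.ElementTree import fromstring

# Shortened, illustrative version of the sample document.
root = fromstring(
    '<data><country name="Liechtenstein">'
    '<rank>1</rank><year>2008</year><gdppc>141100</gdppc>'
    '</country></data>'
)

country = root[0]

# .text is a str; convert it explicitly when a number is needed.
year = int(country[1].text)

# Negative indices count from the last child.
last_tag = country[-1].tag

# Slicing returns a list of child elements.
first_two = [child.tag for child in country[:2]]

print(year, last_tag, first_two)
```

Index-based access depends on the exact order of children, so it is best kept for documents whose layout is fixed.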
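root.find() looks up an element by tag name or path and returns the first match, or None if nothing matches. A bare tag name matches direct children only.

    In [13]: root.find('country').attrib
    Out[13]: {'name': 'Liechtenstein'}
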
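root.findall() takes the same kind of argument and returns a list of all matching elements. A bare tag name again matches direct children only.

    In [16]: for country in root.findall('country'):
        ...:     print(country.attrib)
        ...:
    {'name': 'Liechtenstein'}
    {'name': 'Singapore'}
    {'name': 'Panama'}
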
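Since find() returns None when nothing matches, chaining .attrib or .text onto it can fail. A hedged sketch of two ways to handle that, using an inline one-country document (the tag "city" is a deliberately absent, hypothetical tag):

```python
from xml.etree.ElementTree import fromstring

# Shortened, illustrative version of the sample document.
root = fromstring(
    '<data><country name="Singapore"><rank>4</rank></country></data>'
)

country = root.find('country')

# No <city> child exists, so find() returns None.
missing = root.find('city')

# Compare against None explicitly rather than relying on truthiness.
city_name = missing.get('name') if missing is not None else None

# findtext() returns the text of the first match, or the default.
rank = country.findtext('rank')
gdppc = country.findtext('gdppc', default='n/a')

print(city_name, rank, gdppc)
```

findall() has no such problem: it simply returns an empty list when there are no matches.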
root.iterfind() works like findall(), but returns a generator that yields the matching elements instead of a list

    In [22]: root.iterfind('country')
    Out[22]: <generator object prepare_child.<locals>.select at 0x7f6736dccfc0>

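The generator produces matches lazily, so it can be consumed one element at a time. A minimal sketch on a shortened inline copy of the sample (variable names are illustrative):

```python
from xml.etree.ElementTree import fromstring

# Shortened, illustrative version of the sample document.
root = fromstring(
    '<data>'
    '<country name="Liechtenstein"/>'
    '<country name="Singapore"/>'
    '<country name="Panama"/>'
    '</data>'
)

matches = root.iterfind('country')

# next() pulls just the first match from the generator.
first = next(matches)

# The remaining matches are still available for later iteration.
rest = [country.get('name') for country in matches]

print(first.get('name'), rest)
```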
Supported XPath expressions (XML Path Language). ElementTree implements only a subset of XPath.

    In [19]: root.findall('.//rank')  # match elements at any depth
    Out[19]:
    [<Element 'rank' at 0x7f673581c8b8>,
     <Element 'rank' at 0x7f673581c6d8>,
     <Element 'rank' at 0x7f673581cc78>]

    In [32]: root.findall('country/*')  # all grandchildren under country
    Out[32]:
    [<Element 'rank' at 0x7f673581c8b8>,
     <Element 'year' at 0x7f673581cbd8>,
     <Element 'gdppc' at 0x7f673581c958>,
     <Element 'neighbor' at 0x7f673581c688>,
     <Element 'neighbor' at 0x7f673581cb38>,
     <Element 'rank' at 0x7f673581c6d8>,
     <Element 'year' at 0x7f673581c5e8>,
     <Element 'gdppc' at 0x7f673581c868>,
     <Element 'neighbor' at 0x7f673581cb88>,
     <Element 'rank' at 0x7f673581cc78>,
     <Element 'year' at 0x7f673581ccc8>,
     <Element 'gdppc' at 0x7f673581cd18>,
     <Element 'neighbor' at 0x7f673581cd68>,
     <Element 'neighbor' at 0x7f673581cdb8>]

    In [33]: root.findall('.//rank/..')  # .. selects the parent element
    Out[33]:
    [<Element 'country' at 0x7f673581c728>,
     <Element 'country' at 0x7f673581ca98>,
     <Element 'country' at 0x7f673581cc28>]

    In [34]: root.findall('country[@name]')  # country elements that have a name attribute
    Out[34]:
    [<Element 'country' at 0x7f673581c728>,
     <Element 'country' at 0x7f673581ca98>,
     <Element 'country' at 0x7f673581cc28>]

    In [35]: root.findall('country[@name="Singapore"]')  # country whose name attribute is Singapore
    Out[35]: [<Element 'country' at 0x7f673581ca98>]

    In [36]: root.findall('country[rank]')  # country elements that have a rank child
    Out[36]:
    [<Element 'country' at 0x7f673581c728>,
     <Element 'country' at 0x7f673581ca98>,
     <Element 'country' at 0x7f673581cc28>]

    In [37]: root.findall('country[rank="68"]')  # country with a rank child whose text is 68
    Out[37]: [<Element 'country' at 0x7f673581cc28>]

    In [38]: root.findall('country[1]')  # the first country
    Out[38]: [<Element 'country' at 0x7f673581c728>]

    In [39]: root.findall('country[last()]')  # the last country
    Out[39]: [<Element 'country' at 0x7f673581cc28>]

    In [40]: root.findall('country[last()-1]')  # the second-to-last country
    Out[40]: [<Element 'country' at 0x7f673581ca98>]

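These pieces can be combined in a single path. The sketch below is one possible way to answer a few questions about the sample data; the inline XML is a shortened copy of the sample and the variable names are illustrative.

```python
from xml.etree.ElementTree import fromstring

# Shortened, illustrative version of the sample document.
root = fromstring("""<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>""")

# Attribute predicate followed by a child step: neighbors of Singapore.
singapore_neighbors = [
    n.get('name')
    for n in root.findall("country[@name='Singapore']/neighbor")
]

# Predicate on the neighbor, then ".." to step back up to its country.
has_west_neighbor = [
    c.get('name')
    for c in root.findall("country/neighbor[@direction='W']/..")
]

# Predicate on the text of a child element.
top_ranked = [c.get('name') for c in root.findall("country[rank='1']")]

print(singapore_neighbors, has_west_neighbor, top_ranked)
```

Note that predicate values such as rank='1' are compared as strings, not numbers.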
root.iter() recursively iterates over the element itself and all of its descendants, optionally restricted to a given tag

    In [29]: root.iter()
    Out[29]: <_elementtree._element_iterator at 0x7f67355dd728>

    In [30]: list(root.iter())
    Out[30]:
    [<Element 'data' at 0x7f673581c778>,
     <Element 'country' at 0x7f673581c728>,
     <Element 'rank' at 0x7f673581c8b8>,
     <Element 'year' at 0x7f673581cbd8>,
     <Element 'gdppc' at 0x7f673581c958>,
     <Element 'neighbor' at 0x7f673581c688>,
     <Element 'neighbor' at 0x7f673581cb38>,
     <Element 'country' at 0x7f673581ca98>,
     <Element 'rank' at 0x7f673581c6d8>,
     <Element 'year' at 0x7f673581c5e8>,
     <Element 'gdppc' at 0x7f673581c868>,
     <Element 'neighbor' at 0x7f673581cb88>,
     <Element 'country' at 0x7f673581cc28>,
     <Element 'rank' at 0x7f673581cc78>,
     <Element 'year' at 0x7f673581ccc8>,
     <Element 'gdppc' at 0x7f673581cd18>,
     <Element 'neighbor' at 0x7f673581cd68>,
     <Element 'neighbor' at 0x7f673581cdb8>]

    In [31]: list(root.iter('rank'))
    Out[31]:
    [<Element 'rank' at 0x7f673581c8b8>,
     <Element 'rank' at 0x7f673581c6d8>,
     <Element 'rank' at 0x7f673581cc78>]
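iter() is handy when the nesting depth does not matter. A small sketch that collects every neighbor name regardless of which country it belongs to, again on a shortened inline copy of the sample (names are illustrative):

```python
from xml.etree.ElementTree import fromstring

# Shortened, illustrative version of the sample document.
root = fromstring(
    '<data>'
    '<country name="Singapore">'
    '<neighbor name="Malaysia" direction="N"/>'
    '</country>'
    '<country name="Panama">'
    '<neighbor name="Costa Rica" direction="W"/>'
    '<neighbor name="Colombia" direction="E"/>'
    '</country>'
    '</data>'
)

# iter('neighbor') walks the whole subtree in document order.
neighbor_names = [n.get('name') for n in root.iter('neighbor')]

# Without a tag, iter() starts with the element it was called on.
all_tags = [element.tag for element in root.iter()]

print(neighbor_names)
print(all_tags)
```

Unlike findall('neighbor'), which would return nothing here because neighbor is not a direct child of root, iter('neighbor') searches the whole subtree.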