Learning Python

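lxml is the most feature-rich and easy-to-use library for processing XML and HTML in Python.

lxml is a Pythonic binding for the two C libraries libxml2 and libxslt. What makes it unique is that it combines the speed and feature completeness of those libraries with the simplicity of a Python API. It is compatible with the ElementTree API, but more capable.

Programming against libxml2 directly has been likened to the thrilling embrace of an exotic stranger: it looks able to fulfil your wildest dreams, but a voice in the back of your head keeps warning that you may get burned in the worst possible way. That is why lxml exists.

I have never used ElementTree, so I skipped the comparison between the two and went straight to the tutorial.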

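Tutorial

Author: Stefan Behnel

This is a tutorial on processing XML with lxml.etree. It gives a brief overview of the main concepts of the ElementTree API, plus a few simple enhancements that make a programmer's life easier.

Link to the original: /

I may not translate the whole official guide. This is my first attempt at translation, so there are bound to be problems; criticism and corrections are welcome.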

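First, import the module:
>>> from lxml import etree

To help with writing portable code, the examples in this tutorial make it clear which parts of the API are lxml.etree extensions on top of the original ElementTree API (as defined by Fredrik Lundh's ElementTree library). The official documentation links to it.

Chapter 1: The Element class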

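Element is the main container class of the ElementTree API. Most XML tree functionality is accessed through this class. Creating an Element is easy:
>>> root = etree.Element("root")

The XML tag name of an element is accessed through its tag attribute:
>>> print(root.tag)
root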

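Elements are organised into an XML tree structure. To create a child element and add it to a parent element, use the append() method:
>>> root.append(etree.Element("child1"))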

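There is a shorter and more efficient way to do this: the SubElement factory. It takes the same arguments as Element, but requires the parent element as the first argument:
>>> child2 = etree.SubElement(root, "child2")
>>> child3 = etree.SubElement(root, "child3")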

You can serialise the tree you have built:
>>> print(etree.tostring(root, pretty_print=True).decode())
<root>
  <child1/>
  <child2/>
  <child3/>
</root>
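
The steps so far can be put together in one small script. This is only a sketch and not part of the official tutorial: the helper name build_tree and the variable xml_text are made up for illustration, and the fallback import assumes that the standard library's xml.etree.ElementTree is an acceptable stand-in when lxml is not installed (the two share this part of the API).

```python
# Prefer lxml; fall back to the standard library, which shares this part of the API.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def build_tree():
    """Build <root> with three children (hypothetical helper for illustration)."""
    root = etree.Element("root")
    root.append(etree.Element("child1"))  # create the element first, then attach it
    etree.SubElement(root, "child2")      # create and attach in one step
    etree.SubElement(root, "child3")
    return root


root = build_tree()
# encoding="unicode" asks for a str instead of bytes
xml_text = etree.tostring(root, encoding="unicode")
print(xml_text)
```

Asking for encoding="unicode" avoids the b'...' prefix that appears when a bytes result is printed under Python 3.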

To make access to these child elements easy and intuitive, elements mimic the behaviour of normal Python lists:

>>> child = root[0]
>>> print(child.tag)
child1

>>> print(len(root))
3

>>> root.index(root[1])  # lxml.etree only!
1

>>> children = list(root)

>>> for child in root:
...     print(child.tag)
child1
child2
child3

>>> root.insert(0, etree.Element("child0"))
>>> start = root[:1]
>>> end = root[-1:]

>>> print(start[0].tag)
child0
>>> print(end[0].tag)
child3
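
As a recap of the list-like behaviour, here is a self-contained sketch. The variable names (first, count, tags, last_two) are made up for illustration, and it deliberately sticks to operations that the standard library's ElementTree also supports, so the fallback import is an assumption about portability rather than something from the tutorial. root.index() is left out because it is marked as lxml.etree only.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

root = etree.Element("root")
for name in ("child1", "child2", "child3"):
    etree.SubElement(root, name)

first = root[0]                                 # indexing, like a list
count = len(root)                               # number of direct children (before the insert)
root.insert(0, etree.Element("child0"))         # insert a new element at the front
tags = [child.tag for child in root]            # iteration visits children in order
last_two = [child.tag for child in root[-2:]]   # slicing returns a list of elements

print(first.tag, count, tags, last_two)
```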

Older code also checked the truth value of an element to see whether it has children:
if root:  # this no longer works!
    print("The root element has children")
Using len(element) is more explicit and less error-prone:
>>> print(etree.iselement(root))  # test if it's some kind of Element
True
>>> if len(root):  # test if it has children
...     print("The root element has children")
The root element has children
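
One way to keep this check explicit in your own code is to wrap it in a tiny helper. The function name has_children and the element name leaf are made up for this sketch, and the fallback import again assumes the standard library is an acceptable substitute when lxml is missing.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def has_children(element):
    """Return True if the element has at least one child (hypothetical helper)."""
    return len(element) > 0


root = etree.Element("root")
leaf = etree.SubElement(root, "leaf")

# iselement() answers "is this an Element at all?", len() answers "does it have children?"
print(etree.iselement(root), has_children(root), has_children(leaf))
```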

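There is one more important feature. The wording in the original is hard to translate, but the example should make it clear: in lxml.etree, assigning an element to another position moves it, whereas a plain Python list just copies the reference.

>>> for child in root:
...     print(child.tag)
child0
child1
child2
child3
>>> root[0] = root[-1]  # this moves the element in lxml.etree!
>>> for child in root:
...     print(child.tag)
child3
child1
child2

>>> l = [0, 1, 2, 3]
>>> l[0] = l[-1]
>>> l
[3, 1, 2, 3]

>>> root is root[0].getparent()  # lxml.etree only!
True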

If you want to copy an element to a different position in lxml.etree, consider creating an independent deep copy using the copy module from Python's standard library:

>>> from copy import deepcopy

>>> element = etree.Element("neu")
>>> element.append( deepcopy(root[1]) )

>>> print(element[0].tag)
child1
>>> print([ c.tag for c in root ])
['child3', 'child1', 'child2']
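
The copy-instead-of-move pattern can be wrapped in a small function. This is a sketch under my own assumptions: copy_child_into and its parameter names are invented for illustration, and only deepcopy() and append() come from the text above.

```python
from copy import deepcopy

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def copy_child_into(source_parent, index, target_parent):
    """Append an independent deep copy of source_parent[index] to target_parent."""
    clone = deepcopy(source_parent[index])  # the original stays where it is
    target_parent.append(clone)
    return clone


root = etree.Element("root")
for name in ("child1", "child2", "child3"):
    etree.SubElement(root, name)

other = etree.Element("neu")
clone = copy_child_into(root, 1, other)

# The source tree keeps all three children; the target gets a separate copy.
print([c.tag for c in root], [c.tag for c in other])
```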

XML elements support attributes. You can create them directly in the Element factory:
>>> root = etree.Element("root", interesting="totally")
>>> etree.tostring(root)
b'<root interesting="totally"/>'

Attributes are unordered name-value pairs, so they can be handled through the dictionary-like interface of elements:
>>> print(root.get("interesting"))
totally

>>> print(root.get("hello"))
None
>>> root.set("hello", "Huhu")
>>> print(root.get("hello"))
Huhu

>>> etree.tostring(root)
b'<root interesting="totally" hello="Huhu"/>'

>>> sorted(root.keys())
['hello', 'interesting']

>>> for name, value in sorted(root.items()):
...     print('%s = %r' % (name, value))
hello = 'Huhu'
interesting = 'totally'
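
Here is the same attribute interface as a standalone sketch. The variable names (missing, fallback, pairs) and the default value "n/a" are made up for illustration; the second argument to get() is the usual dictionary-style default.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

root = etree.Element("root", interesting="totally")

missing = root.get("hello")          # no such attribute yet, so None
fallback = root.get("hello", "n/a")  # supply a default instead of None
root.set("hello", "Huhu")            # create (or overwrite) an attribute

# Attributes are unordered, so sort before comparing or printing.
pairs = sorted(root.items())
for name, value in pairs:
    print("%s = %r" % (name, value))
```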

If you need a real dict-like object, use the attrib property:
>>> attributes = root.attrib

>>> print(attributes["interesting"])
totally
>>> print(attributes.get("no-such-attribute"))
None

>>> attributes["hello"] = "Guten Tag"
>>> print(attributes["hello"])
Guten Tag
>>> print(root.get("hello"))
Guten Tag

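Note that attrib is a dict-like object backed by the element itself. This means that any change to the element is reflected in attrib, and vice versa. It also means that the XML tree stays in memory as long as the attrib of any of its elements is still in use. To get a snapshot of the attributes that is independent of the XML tree, copy it into a dict:
>>> d = dict(root.attrib)
>>> sorted(d.items())
[('hello', 'Guten Tag'), ('interesting', 'totally')]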
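
The difference between the live attrib view and a dict snapshot is easiest to see side by side. In this sketch the names live and snapshot, and the new value "somewhat", are made up for illustration.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

root = etree.Element("root", interesting="totally")

live = root.attrib            # backed by the element, follows later changes
snapshot = dict(root.attrib)  # plain dict, frozen at this moment

root.set("interesting", "somewhat")

# The live view sees the change; the snapshot keeps the old value.
print(live["interesting"], snapshot["interesting"])
```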
