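1. What is XML, and what are its features?
XML (Extensible Markup Language) is used to mark up data and define data types. It is a metalanguage: it lets users define their own markup languages.
Example: del.xml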
<?xml version="1.0" encoding="utf-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='123456'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>
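Structurally, XML looks a lot like HTML, but the two were designed for different purposes. HTML was designed to display data, so its focus is how the data looks. XML was designed to transport and store data, so its focus is what the data is.
XML has the following features:
• It is made up of tag pairs: <aa></aa>
• Tags can have attributes: <aa id='123'></aa>
• A tag pair can enclose data: <aa>abc</aa>
• Tags can contain child tags (forming a hierarchy)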
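The four features above can be seen in one small document. The following is a minimal sketch, assuming Python 3 and the standard-library xml.dom.minidom module; the SAMPLE string is made up for illustration and is not part of del.xml.

```python
from xml.dom import minidom

# Hypothetical sample document, made up for this illustration.
SAMPLE = "<aa id='123'>abc<bb>child</bb></aa>"

doc = minidom.parseString(SAMPLE)
aa = doc.documentElement

tag_name = aa.tagName                       # the tag pair <aa>...</aa>
attr_id = aa.getAttribute("id")             # the attribute id='123'
text = aa.firstChild.data                   # the data enclosed by the tag pair
child = aa.getElementsByTagName("bb")[0]    # the nested child tag <bb>

print(tag_name, attr_id, text, child.tagName, child.firstChild.data)
```

Each variable corresponds to one bullet in the list: the tag itself, its attribute, its enclosed data, and its child tag.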
2. Getting node properties
# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")  # parse the XML file into a Document
root = dom.documentElement  # the root element of the document (<catalog>)
# Every node has nodeName, nodeValue and nodeType properties
print("nodeName:", root.nodeName)
# nodeValue is the node's value; it is None for element nodes
print("nodeValue:", root.nodeValue)
print("nodeType:", root.nodeType)
print("ELEMENT_NODE:", root.ELEMENT_NODE)
nodeType is the node's type. catalog is of type ELEMENT_NODE.
The following node types are defined:
'ATTRIBUTE_NODE'
'CDATA_SECTION_NODE'
'COMMENT_NODE'
'DOCUMENT_FRAGMENT_NODE'
'DOCUMENT_NODE'
'DOCUMENT_TYPE_NODE'
'ELEMENT_NODE'
'ENTITY_NODE'
'ENTITY_REFERENCE_NODE'
'NOTATION_NODE'
'PROCESSING_INSTRUCTION_NODE'
'TEXT_NODE'
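Because nodeType is just an integer, it can help to translate it back into one of the constant names listed above. The following is a minimal sketch, assuming Python 3; NODE_TYPE_NAMES and describe are hypothetical names introduced here, and the sample document is made up for illustration.

```python
from xml.dom import Node, minidom

# Reverse lookup from the numeric nodeType to its constant name,
# built from the *_NODE constants defined on xml.dom.Node.
NODE_TYPE_NAMES = {
    getattr(Node, name): name
    for name in dir(Node)
    if name.endswith("_NODE")
}


def describe(node):
    """Return the constant name for node.nodeType (hypothetical helper)."""
    return NODE_TYPE_NAMES[node.nodeType]


# Hypothetical sample document, made up for this illustration.
doc = minidom.parseString("<catalog><!-- note --><maxid>4</maxid></catalog>")
root = doc.documentElement

print(describe(doc))                        # the Document itself
print(describe(root))                       # the <catalog> element
print(describe(root.firstChild))            # the comment
print(describe(root.lastChild.firstChild))  # the text "4" inside <maxid>
```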
Output of the script:
nodeName: catalog
nodeValue: None
nodeType: 1
ELEMENT_NODE: 1
3. Getting child tags
# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement
# getElementsByTagName returns a NodeList of all matching descendant elements
bb = root.getElementsByTagName('maxid')
print(type(bb))
print(bb)
b = bb[0]
print(b.nodeName)
print(b.nodeValue)  # None: an element node has no value of its own
Output:
<class 'xml.dom.minicompat.NodeList'>
[<DOM Element: maxid at 0x2707a48>]
maxid
None
4. Getting attribute values
# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement

itemlist = root.getElementsByTagName('login')
item = itemlist[0]
print(item.getAttribute("username"))
print(item.getAttribute("passwd"))

itemlist = root.getElementsByTagName("item")
item = itemlist[0]  # elements are told apart by their position in itemlist
print(item.getAttribute("id"))
item2 = itemlist[1]  # elements are told apart by their position in itemlist
print(item2.getAttribute("id"))
Output:
pytest
123456
4
2
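Note that the first id printed is 4, not 2: getElementsByTagName searches all descendants in document order, so the item nested inside login is found before the item directly under catalog. If only direct children are wanted, one option is to filter childNodes by hand. The following is a minimal sketch, assuming Python 3; child_elements is a hypothetical helper, and the XML string is a trimmed copy of del.xml.

```python
from xml.dom import minidom

# Trimmed copy of del.xml, inlined so the example is self-contained.
XML = """<catalog>
  <login username="pytest" passwd="123456">
    <item id="4"><caption>test</caption></item>
  </login>
  <item id="2"><caption>Zope</caption></item>
</catalog>"""


def child_elements(parent, tag):
    """Return only the direct child elements of parent named tag (hypothetical helper)."""
    return [
        node for node in parent.childNodes
        if node.nodeType == node.ELEMENT_NODE and node.tagName == tag
    ]


root = minidom.parseString(XML).documentElement

# All descendants named "item", in document order.
all_ids = [n.getAttribute("id") for n in root.getElementsByTagName("item")]
# Only the "item" elements that sit directly under <catalog>.
direct_ids = [n.getAttribute("id") for n in child_elements(root, "item")]

print(all_ids)     # ['4', '2']
print(direct_ids)  # ['2']
```

The nodeType check matters because the whitespace between tags is itself stored as text nodes in childNodes.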
5. Getting the data between a tag pair
# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement

itemlist = root.getElementsByTagName('caption')
item = itemlist[0]
# The text between the tags is stored in a child text node
print(item.firstChild.data)
item2 = itemlist[1]
print(item2.firstChild.data)
Output:
Python
test
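item.firstChild.data works here because each caption contains exactly one text node. For an empty element firstChild is None, and a comment inside the element splits the text into several nodes. The following is a minimal sketch of a more defensive approach, assuming Python 3; get_text is a hypothetical helper and the sample document is made up for illustration.

```python
from xml.dom import minidom


def get_text(element):
    """Join the data of all text/CDATA children of element (hypothetical helper)."""
    return "".join(
        child.data for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


# Hypothetical sample: a normal caption, an empty one, and one split by a comment.
doc = minidom.parseString(
    "<catalog>"
    "<caption>Python</caption>"
    "<caption/>"
    "<caption>a<!-- x -->b</caption>"
    "</catalog>"
)
captions = doc.getElementsByTagName("caption")
texts = [get_text(c) for c in captions]

print(texts)                   # ['Python', '', 'ab']
print(captions[1].firstChild)  # None, so .firstChild.data would raise AttributeError
```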
6. Example
<?xml version="1.0" encoding="UTF-8"?>
<users>
    <user id="1000001">
        <username>Admin</username>
        <email>admin@live.cn</email>
        <age>23</age>
        <sex>boy</sex>
    </user>
    <user id="1000002">
        <username>Admin2</username>
        <email>admin2@live.cn</email>
        <age>22</age>
        <sex>boy</sex>
    </user>
    <user id="1000003">
        <username>Admin3</username>
        <email>admin3@live.cn</email>
        <age>27</age>
        <sex>boy</sex>
    </user>
    <user id="1000004">
        <username>Admin4</username>
        <email>admin4@live.cn</email>
        <age>25</age>
        <sex>girl</sex>
    </user>
    <user id="1000005">
        <username>Admin5</username>
        <email>admin5@live.cn</email>
        <age>20</age>
        <sex>boy</sex>
    </user>
    <user id="1000006">
        <username>Admin6</username>
        <email>admin6@live.cn</email>
        <age>23</age>
        <sex>girl</sex>
    </user>
</users>
Task: print each user's name, email, age and sex.
Reference code:
# -*- coding: utf-8 -*-
from xml.dom import minidom


def get_attrvalue(node, attrname):
    """Return the value of attribute attrname, or '' if node is missing."""
    return node.getAttribute(attrname) if node else ''


def get_nodevalue(node, index=0):
    """Return the value of the child at index (the text of a simple element)."""
    return node.childNodes[index].nodeValue if node else ''


def get_xmlnode(node, name):
    """Return all descendant elements of node with the given tag name."""
    return node.getElementsByTagName(name) if node else []


def get_xml_data(filename='user.xml'):
    """Parse filename and return a list of dicts, one per <user> element."""
    doc = minidom.parse(filename)
    root = doc.documentElement

    user_nodes = get_xmlnode(root, 'user')
    print("user_nodes:", user_nodes)
    user_list = []
    for node in user_nodes:
        user_id = get_attrvalue(node, 'id')
        node_name = get_xmlnode(node, 'username')
        node_email = get_xmlnode(node, 'email')
        node_age = get_xmlnode(node, 'age')
        node_sex = get_xmlnode(node, 'sex')

        user = {
            'id': int(user_id),
            'username': get_nodevalue(node_name[0]),
            'email': get_nodevalue(node_email[0]),
            'age': int(get_nodevalue(node_age[0])),
            'sex': get_nodevalue(node_sex[0]),
        }
        user_list.append(user)
    return user_list


def test_load_xml():
    user_list = get_xml_data()
    for user in user_list:
        print('-----------------------------------------------------')
        if user:
            user_str = 'No.:\t%d\nname:\t%s\nsex:\t%s\nage:\t%s\nEmail:\t%s' % (
                user['id'], user['username'], user['sex'], user['age'], user['email'])
            print(user_str)


if __name__ == "__main__":
    test_load_xml()
Output:
C:\Users\wzh94434\Desktop\xml>python user.py
user_nodes: [<DOM Element: user at 0x2758c48>, <DOM Element: user at 0x2756288>, <DOM Element: user at 0x2756888>, <DOM Element: user at 0x2756e88>, <DOM Element: user at 0x275e4c8>, <DOM Element: user at 0x275eac8>]
-----------------------------------------------------
No.:    1000001
name:   Admin
sex:    boy
age:    23
Email:  admin@live.cn
-----------------------------------------------------
No.:    1000002
name:   Admin2
sex:    boy
age:    22
Email:  admin2@live.cn
-----------------------------------------------------
No.:    1000003
name:   Admin3
sex:    boy
age:    27
Email:  admin3@live.cn
-----------------------------------------------------
No.:    1000004
name:   Admin4
sex:    girl
age:    25
Email:  admin4@live.cn
-----------------------------------------------------
No.:    1000005
name:   Admin5
sex:    boy
age:    20
Email:  admin5@live.cn
-----------------------------------------------------
No.:    1000006
name:   Admin6
sex:    girl
age:    23
Email:  admin6@live.cn
7. Summary
minidom.parse(filename)
    Loads and parses an XML file, returning a Document.
doc.documentElement
    Gets the root element of the document.
node.getAttribute(AttributeName)
    Gets the value of an attribute of the node.
node.getElementsByTagName(TagName)
    Gets the collection of descendant elements with that tag name.
node.childNodes
    Returns the list of child nodes.
node.childNodes[index].nodeValue
    Gets the value of a child node.
node.firstChild
    Accesses the first child node. Equivalent to node.childNodes[0] when the node has children; None otherwise.
doc = minidom.parse(filename)
doc.toxml('UTF-8')
    Returns the XML representation of the node as text.
a = node.attributes["id"]
a.name     # the attribute name, "id" in this case
a.value    # the attribute value
    Accesses an element's attributes.
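The last two entries of the summary are not exercised elsewhere in the article, so here is a minimal sketch of them, assuming Python 3. The one-user document is a cut-down, made-up variant of the example file.

```python
from xml.dom import minidom

# Hypothetical one-user document, cut down from the example file.
doc = minidom.parseString('<user id="1000001"><username>Admin</username></user>')
user = doc.documentElement

# attributes behaves like a mapping from attribute name to Attr node.
a = user.attributes["id"]
print(a.name, a.value)

# items() yields (name, value) pairs for every attribute of the element.
attrs = dict(user.attributes.items())

# Without an encoding, toxml() returns a str.
xml_text = user.toxml()
# With an encoding, Python 3 returns bytes; on a Document this includes
# an XML declaration naming that encoding.
xml_bytes = doc.toxml("UTF-8")

print(xml_text)
print(xml_bytes)
```

Under Python 2, which the original scripts appear to have been written for, toxml with an encoding returned a byte string as well, so code that writes the result to a file opened in binary mode should behave the same way.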
Original article: http://www.cnblogs.com/kaituorensheng/p/4493306.html