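While converting a dataset from one format to another, I needed to turn JSON into XML files. The lxml package makes this very convenient.
1. Writing an XML file
a) Using etree and objectify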
from lxml import etree, objectify

# annotate=False keeps the py:pytype type annotations out of the output
E = objectify.ElementMaker(annotate=False)
anno_tree = E.annotation(
    E.folder('VOC2014_instance'),
    E.filename("test.jpg"),
    E.source(
        E.database('COCO'),
        E.annotation('COCO'),
        E.image('COCO'),
        E.url("http://test.jpg")
    ),
    E.size(
        E.width(800),
        E.height(600),
        E.depth(3)
    ),
    E.segmented(0),
)
# Write to test.xml (the original snippet wrote to "text.xml" by mistake)
etree.ElementTree(anno_tree).write("test.xml", pretty_print=True)
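The resulting test.xml file looks like this:
<annotation>
  <folder>VOC2014_instance</folder>
  <filename>test.jpg</filename>
  <source>
    <database>COCO</database>
    <annotation>COCO</annotation>
    <image>COCO</image>
    <url>http://test.jpg</url>
  </source>
  <size>
    <width>800</width>
    <height>600</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
</annotation>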
To add further tags on top of anno_tree, just use append:
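E2 = objectify.ElementMaker(annotate=False)
anno_tree2 = E2.object(
    E.name("person"),
    E.bndbox(
        E.xmin(100),
        E.ymin(200),
        E.xmax(300),
        E.ymax(400)
    ),
    E.difficult(0)
)
anno_tree.append(anno_tree2)
# Write the file again so that test.xml includes the new <object> element
etree.ElementTree(anno_tree).write("test.xml", pretty_print=True)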
The output above then becomes:
<annotation>
  <folder>VOC2014_instance</folder>
  <filename>test.jpg</filename>
  <source>
    <database>COCO</database>
    <annotation>COCO</annotation>
    <image>COCO</image>
    <url>http://test.jpg</url>
  </source>
  <size>
    <width>800</width>
    <height>600</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>100</xmin>
      <ymin>200</ymin>
      <xmax>300</xmax>
      <ymax>400</ymax>
    </bndbox>
    <difficult>0</difficult>
  </object>
</annotation>
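When an image has several boxes, the append step above can be repeated in a loop. The following is only a minimal sketch: the make_object helper and the box values are hypothetical and do not come from the original post, and etree.tostring is used instead of writing a file so the result can be inspected directly.

```python
from lxml import etree, objectify

E = objectify.ElementMaker(annotate=False)


def make_object(name, xmin, ymin, xmax, ymax, difficult=0):
    """Build one <object> element (hypothetical helper for illustration)."""
    return E.object(
        E.name(name),
        E.bndbox(E.xmin(xmin), E.ymin(ymin), E.xmax(xmax), E.ymax(ymax)),
        E.difficult(difficult),
    )


anno_tree = E.annotation(E.filename("test.jpg"))

# Made-up boxes, purely for illustration.
boxes = [("person", 100, 200, 300, 400), ("dog", 10, 20, 30, 40)]
for box in boxes:
    anno_tree.append(make_object(*box))

# Serialize to a string instead of a file to check the result.
xml_text = etree.tostring(anno_tree, pretty_print=True).decode("utf-8")
print(xml_text)
```

Each call to make_object returns a fresh element, so the same helper can be reused for every box in an annotation.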
b) Using etree and SubElement
from lxml import etree

annotation = etree.Element("annotation")
etree.SubElement(annotation, "folder").text = "VOC2014_instance"
etree.SubElement(annotation, "filename").text = "test.jpg"
source = etree.SubElement(annotation, "source")
etree.SubElement(source, "database").text = "COCO"
etree.SubElement(source, "annotation").text = "COCO"
etree.SubElement(source, "image").text = "COCO"
etree.SubElement(source, "url").text = "http://test.jpg"
size = etree.SubElement(annotation, "size")
etree.SubElement(size, "width").text = '800'  # .text must be a string
etree.SubElement(size, "height").text = '600'
etree.SubElement(size, "depth").text = '3'
etree.SubElement(annotation, "segmented").text = '0'
key_object = etree.SubElement(annotation, "object")
etree.SubElement(key_object, "name").text = "person"
bndbox = etree.SubElement(key_object, "bndbox")
etree.SubElement(bndbox, "xmin").text = str(100)
etree.SubElement(bndbox, "ymin").text = str(200)
etree.SubElement(bndbox, "xmax").text = str(300)
etree.SubElement(bndbox, "ymax").text = str(400)
etree.SubElement(key_object, "difficult").text = '0'
doc = etree.ElementTree(annotation)
# Pass the file name: lxml writes bytes, so a text-mode file object fails on Python 3
doc.write("test.xml", pretty_print=True)
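To tie this back to the JSON-to-XML conversion mentioned at the start, here is a rough sketch of how a JSON record might be mapped onto the same SubElement calls. The record layout (the keys file_name, width, height, boxes, label) and the dict_to_voc function are assumptions made up for this example. They are not the COCO schema and not the code used in the original conversion.

```python
import json

from lxml import etree


def dict_to_voc(record):
    """Turn a JSON-like record into an <annotation> element.

    The key names used here are hypothetical; adapt them to the real data.
    """
    annotation = etree.Element("annotation")
    etree.SubElement(annotation, "filename").text = record["file_name"]
    size = etree.SubElement(annotation, "size")
    # Numbers must be converted to strings before assigning to .text
    etree.SubElement(size, "width").text = str(record["width"])
    etree.SubElement(size, "height").text = str(record["height"])
    for box in record["boxes"]:
        obj = etree.SubElement(annotation, "object")
        etree.SubElement(obj, "name").text = box["label"]
        bndbox = etree.SubElement(obj, "bndbox")
        for key in ("xmin", "ymin", "xmax", "ymax"):
            etree.SubElement(bndbox, key).text = str(box[key])
    return annotation


# A made-up JSON document standing in for one image's annotations.
json_text = """
{"file_name": "test.jpg", "width": 800, "height": 600,
 "boxes": [{"label": "person", "xmin": 100, "ymin": 200,
            "xmax": 300, "ymax": 400}]}
"""
root = dict_to_voc(json.loads(json_text))
print(etree.tostring(root, pretty_print=True).decode("utf-8"))
```

Writing the result to disk works the same way as above: wrap root in etree.ElementTree and call write with a file name.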
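2. Reading XML
Here XPath can be used to extract the values of the elements you need directly. For example, to get the x and y coordinates from the test.xml file above: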
from lxml import etree

tree = etree.parse("test.xml")
# get bbox
for bbox in tree.xpath('//bndbox'):  # select every bndbox element
    for corner in bbox:  # iterate over the child elements of bndbox
        print(corner.text)  # the text is a string
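Since the element text always comes back as a string, the coordinates usually need converting before use. The sketch below shows one possible way to collect each box as a tuple of integers. It parses an inline XML string (a trimmed-down, made-up copy of the file above) so that it runs without test.xml on disk.

```python
from lxml import etree

# Trimmed-down sample standing in for test.xml.
xml_text = """
<annotation>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>100</xmin>
      <ymin>200</ymin>
      <xmax>300</xmax>
      <ymax>400</ymax>
    </bndbox>
  </object>
</annotation>
"""
root = etree.fromstring(xml_text)

boxes = []
for bndbox in root.xpath("//bndbox"):
    # findtext returns the text of the first matching child, as a string
    box = tuple(int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
    boxes.append(box)

print(boxes)  # [(100, 200, 300, 400)]

# XPath can also return the text nodes themselves
xmins = root.xpath("//bndbox/xmin/text()")
```

Looking up each tag by name, rather than relying on child order, keeps the result correct even if a file lists the corners in a different order.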
Reference
https://*.com/questions/12657043/parse-xml-with-lxml-extract-element-value
Original post: http://www.cnblogs.com/arkenstone/p/7338978.html