Modifying an XML file with a Python script

How do I modify the following xml snippet

<routes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://sumo.dlr.de/xsd/routes_file.xsd">
    <vType id="car1_73" length="4.70" minGap="1.00" maxSpeed="12.76" probability="0.00" vClass="passenger" guiShape="passenger/van">
        <carFollowing-Krauss accel="2.40" decel="4.00" sigma="0.55"/>
    </vType>
    <vehicle id="0" type="vTypeDist" depart="0.00" departLane="best" departPos="random" departSpeed="random">
        <routeDistribution last="1">
            <route cost="108.41" probability="0.44076116" edges="bottom7to7/0 7/0to6/0 6/0to6/1 6/1to5/1 5/1to5/2 5/2to6/2"/>
            <route cost="76.56" probability="0.55923884" edges="bottom7to7/0 7/0to6/0 6/0to5/0 5/0to5/1 5/1to5/2 5/2to6/2"/>
        </routeDistribution>
    </vehicle>
</routes>

so that the resulting one looks like this:

<routes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://sumo.dlr.de/xsd/routes_file.xsd">
    <vehicle id="0" type="vTypeDist" depart="0.00" departLane="best" departPos="random" departSpeed="random">
        <route edges="bottom7to7/0 7/0to6/0 6/0to5/0 5/0to5/1 5/1to5/2 5/2to6/2"/>
    </vehicle>
</routes>

Basically the following has to be done

remove the <vType> (and the <carFollowing...> elements within it) completely,
remove the <routeDistribution...>,
create a <route> element which keeps only the edges attribute of the last <route> within the <routeDistribution...> element.

EDIT: Here I provide my version using xml.etree.ElementTree. Why all the downvotes though... :/

import xml.etree.ElementTree as ET


if __name__ == "__main__":
    tree = ET.parse('total-test.xml')
    root = tree.getroot()

    # remove each <vType>; its <carFollowing-Krauss> children are removed with it.
    # findall() returns a separate list, so removing from root inside the loop
    # does not skip elements (iterating over root directly while removing can)
    for vType in root.findall("vType"):
        root.remove(vType)

    # from root get into <vehicle>
    for vehicle in root.findall("vehicle"):
        # for each <vehicle> get reference to its <routeDistribution>s
        for routeDist in vehicle.findall("routeDistribution"):
            # for each route distribution get reference to its <route>s
            routes = routeDist.findall("route")
            if routes:
                # pick the <route> with the lowest cost; compare as numbers,
                # not as strings ("76.560" != str(76.56))
                best = min(routes, key=lambda route: float(route.get('cost')))
                # remove the other attributes of the <route>, we only want edges
                edges = best.get('edges')
                best.attrib.clear()
                best.set('edges', edges)
                # move the route one level up to <vehicle>
                routeDist.remove(best)
                vehicle.append(best)
            # remove the <routeDistribution> together with the remaining routes
            vehicle.remove(routeDist)
        vehicle.set('type', 'vTypeDist')

    tree.write('output.xml')

1 Answer

#1

You probably need something a bit more generic. The following script takes your input (in.xml) and generates the new output (out.xml). This is certainly not great code, but it should get you started with the syntax and help you generalize it for your needs.

from xml.dom.minidom import parse

dom = parse("in.xml")   # parse an XML file
docRoot = dom.documentElement

# delete all vType nodes (their carFollowing children go with them)
for vTypeNode in docRoot.getElementsByTagName('vType'):
    vTypeNode.parentNode.removeChild(vTypeNode)

# keep only the last route node, as the question asks;
# the two routes in the sample differ, so the choice matters
routeNode = docRoot.getElementsByTagName('route')[-1]

# remove all old element children of the first vehicle; iterate over a copy,
# because childNodes changes while nodes are being removed
vehicleNode = docRoot.getElementsByTagName('vehicle')[0]
for child in list(vehicleNode.childNodes):
    if child.nodeType == child.ELEMENT_NODE:
        vehicleNode.removeChild(child)

# create a new route node
newRouteNode = dom.createElement("route")
newRouteNode.setAttribute("edges", routeNode.getAttribute("edges"))

# append new node
vehicleNode.appendChild(newRouteNode)

# print output
# print(dom.toprettyxml())

# write to file (text mode, because writexml emits str in Python 3)
with open("out.xml", "w", encoding="utf-8") as outFile:
    dom.writexml(outFile)

N.B.: this is just a quick and dirty sketch to get you started!
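
As a rough idea of how this could be generalized to every vehicle in the file, here is a minimal sketch using xml.etree.ElementTree instead of minidom. The function name simplify_routes and the shortened sample document are made up for illustration, and the sketch assumes that "last" means the last <route> in document order inside each <routeDistribution>.

```python
import xml.etree.ElementTree as ET


def simplify_routes(xml_text):
    """Hypothetical helper: drop vTypes and keep one plain route per vehicle."""
    root = ET.fromstring(xml_text)

    # Drop every <vType>; nested <carFollowing-...> elements go with it.
    for vtype in root.findall("vType"):
        root.remove(vtype)

    for vehicle in root.findall("vehicle"):
        for dist in vehicle.findall("routeDistribution"):
            routes = dist.findall("route")
            vehicle.remove(dist)
            if routes:
                # Assumption: keep only the edges of the last <route>.
                ET.SubElement(vehicle, "route", edges=routes[-1].get("edges", ""))

    return ET.tostring(root, encoding="unicode")


# Shortened stand-in for the document in the question.
sample = """<routes>
    <vType id="car1_73" vClass="passenger">
        <carFollowing-Krauss accel="2.40"/>
    </vType>
    <vehicle id="0" type="vTypeDist" depart="0.00">
        <routeDistribution last="1">
            <route cost="108.41" probability="0.44" edges="a b c"/>
            <route cost="76.56" probability="0.56" edges="a d c"/>
        </routeDistribution>
    </vehicle>
</routes>"""

result = simplify_routes(sample)
print(result)
```

To work on files instead of strings, ET.parse() and tree.write() can be used in place of ET.fromstring() and ET.tostring().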

EDIT:

minidom output is always quite messy, as it contains a lot of useless whitespace. This is a well-known problem, but it can easily be fixed in different ways. You might be interested in having a look here:

problem with the new lines when I use toprettyxml()
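
One possible fix, shown here only as a sketch and not necessarily the approach taken in the linked question, is to remove the whitespace-only text nodes before calling toprettyxml(). The helper name strip_whitespace_nodes and the inline sample are invented for this example.

```python
from xml.dom.minidom import parseString


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    # Iterate over a copy: childNodes changes as nodes are removed.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


# Small stand-in document that already contains indentation whitespace.
dom = parseString(
    '<routes>\n    <vehicle id="0">\n        <route edges="a b"/>\n    </vehicle>\n</routes>'
)
strip_whitespace_nodes(dom.documentElement)
pretty = dom.toprettyxml(indent="    ")
print(pretty)
```

With the stray text nodes gone, toprettyxml() only emits the indentation it adds itself, so the blank lines between elements should disappear.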
