Parsing an XML file with lxml

Time: 2022-12-01 11:05:00

I'm trying to edit an xml file by finding each Watts tag and changing the text in it. So far I've managed to change all tags, but not the Watts tag specifically.

My parser is:

from lxml import etree
tree = etree.parse("cycling.xml")
root = tree.getroot()

for watt in root.iter():
    if watt.tag == "Watts":
        watt.text = "strong"

tree.write("output.xml")

This leaves the document unchanged. A snippet from output.xml (identical to cycling.xml, since nothing was modified) is:

<TrainingCenterDatabase xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Activities>
    <Activity Sport="Biking">
      <Id>2018-05-06T20:49:56Z</Id>
      <Lap StartTime="2018-05-06T20:49:56Z">
        <TotalTimeSeconds>2495.363</TotalTimeSeconds>
        <DistanceMeters>15345</DistanceMeters>
        <MaximumSpeed>18.4</MaximumSpeed>
        <Calories>0</Calories>
        <Intensity>Active</Intensity>
        <TriggerMethod>Manual</TriggerMethod>
        <Track>
          <Trackpoint>
            <Time>2018-05-06T20:49:56Z</Time>
            <Position>
              <LatitudeDegrees>49.319297</LatitudeDegrees>
              <LongitudeDegrees>-123.024128</LongitudeDegrees>
            </Position>
            <HeartRateBpm>
              <Value>99</Value>
            </HeartRateBpm>
            <Extensions>
              <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">
                <Watts>0</Watts>
                <Speed>2</Speed>
              </TPX>
            </Extensions>
          </Trackpoint>

If I change my parser to change all tags with:

for watt in root.iter():
    if watt.tag != "Watts":
        watt.text = "strong"

Then my output.xml file becomes:

<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">strong<Activities>strong<Activity Sport="Biking">strong<Id>strong</Id>
      <Lap StartTime="2018-05-06T20:49:56Z">strong<TotalTimeSeconds>strong</TotalTimeSeconds>
        <DistanceMeters>strong</DistanceMeters>
        <MaximumSpeed>strong</MaximumSpeed>
        <Calories>strong</Calories>
        <Intensity>strong</Intensity>
        <TriggerMethod>strong</TriggerMethod>
        <Track>strong<Trackpoint>strong<Time>strong</Time>
            <Position>strong<LatitudeDegrees>strong</LatitudeDegrees>
              <LongitudeDegrees>strong</LongitudeDegrees>
            </Position>
            <HeartRateBpm>strong<Value>strong</Value>
            </HeartRateBpm>
            <Extensions>strong<TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">strong<Watts>strong</Watts>
                <Speed>strong</Speed>
              </TPX>
            </Extensions>
          </Trackpoint>
          <Trackpoint>strong<Time>strong</Time>
            <Position>strong<LatitudeDegrees>strong</LatitudeDegrees>
              <LongitudeDegrees>strong</LongitudeDegrees>
            </Position>
            <AltitudeMeters>strong</AltitudeMeters>
            <HeartRateBpm>strong<Value>strong</Value>
            </HeartRateBpm>
            <Extensions>strong<TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">strong<Watts>strong</Watts>
                <Speed>strong</Speed>
              </TPX>
            </Extensions>
          </Trackpoint>

  1. How can I change just the Watts tag?
  2. I don't understand what root = tree.getroot() does. I thought I'd ask this at the same time, although I'm not sure it matters for my particular problem.

2 answers

#1


2  

Your document defines a default XML namespace. Look at the xmlns= attribute at the end of the opening tag:

<TrainingCenterDatabase
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">

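This means there is no element named plain "Watts" in your document; every tag name must be qualified with its namespace. Note that the TPX element declares its own default namespace, so Watts belongs to http://www.garmin.com/xmlschemas/ActivityExtension/v2 rather than to the namespace declared on the root element. If you print the value of watt.tag in your loop, you will see: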

$ python filter.py 
{http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2}TrainingCenterDatabase
[...]
{http://www.garmin.com/xmlschemas/ActivityExtension/v2}Watts
{http://www.garmin.com/xmlschemas/ActivityExtension/v2}Speed
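
To see where the braces come from, here is a minimal sketch that prints each tag split into its namespace and local name. The inline SAMPLE document and the split_tag helper are illustrative assumptions, not part of the original files; the fallback import is only there so the sketch also runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: the stdlib API is close enough for this demonstration.
    import xml.etree.ElementTree as etree

# Hypothetical, trimmed-down stand-in for cycling.xml.
SAMPLE = b"""<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Extensions>
    <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">
      <Watts>0</Watts>
      <Speed>2</Speed>
    </TPX>
  </Extensions>
</TrainingCenterDatabase>"""


def split_tag(tag):
    """Split '{namespace}local' into (namespace, local); namespace is '' if absent."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


root = etree.fromstring(SAMPLE)
# Comments and processing instructions have non-string tags in lxml, so skip them.
tags = [split_tag(el.tag) for el in root.iter() if isinstance(el.tag, str)]
for namespace, local in tags:
    print(f"{local:25} {namespace}")
```

The parser stores each element's name as the namespace URI in braces followed by the local name, which is why comparing against the bare string "Watts" never matches.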

With this in mind, you can modify your filter so that it looks like this:

from lxml import etree
tree = etree.parse("cycling.xml")
root = tree.getroot()

for watt in root.iter():
    if watt.tag == "{http://www.garmin.com/xmlschemas/ActivityExtension/v2}Watts":
        watt.text = "strong"

tree.write("output.xml")

You can read more about namespace handling in the lxml documentation.
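
Two other ways to select only the Watts elements can be sketched as follows. This is an illustration under assumptions: SAMPLE is a hypothetical trimmed-down document, the prefix "tpx" is an arbitrary name chosen here, and the fallback import only exists so the sketch runs without lxml.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: the stdlib API is close enough for this demonstration.
    import xml.etree.ElementTree as etree

TPX_NS = "http://www.garmin.com/xmlschemas/ActivityExtension/v2"

# Hypothetical, trimmed-down stand-in for cycling.xml.
SAMPLE = b"""<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Extensions>
    <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">
      <Watts>0</Watts>
      <Speed>2</Speed>
    </TPX>
  </Extensions>
</TrainingCenterDatabase>"""

root = etree.fromstring(SAMPLE)

# Option 1: let iter() filter on the fully qualified tag name.
for watt in root.iter("{%s}Watts" % TPX_NS):
    watt.text = "strong"

# Option 2: map a prefix of your choosing to the namespace URI
# and use it in a path expression.
ns = {"tpx": TPX_NS}
watts = root.findall(".//tpx:Watts", namespaces=ns)
speeds = root.findall(".//tpx:Speed", namespaces=ns)
print([w.text for w in watts], [s.text for s in speeds])
```

As for the second question: etree.parse() returns an object representing the whole document, and tree.getroot() returns its top-level element (here TrainingCenterDatabase). It does not affect this problem; calling tree.iter() directly should visit the same elements.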

#2


0  

Alternatively, since your goal is to edit XML and you are already using lxml, consider XSLT (the XML transformation language), where you can declare a namespace prefix and change every Watts element in the document without looping. You can also pass values into XSLT from Python.

XSLT (save as .xsl file)

<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"              
               xmlns:doc="http://www.garmin.com/xmlschemas/ActivityExtension/v2" version="1.0">
    <xsl:output version="1.0" encoding="UTF-8" omit-xml-declaration="no" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- VALUE TO BE PASSED INTO FROM PYTHON -->
    <xsl:param name="python_value"/>

    <!-- Identity Transform -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- ADJUST WATTS TEXT -->
    <xsl:template match="doc:Watts">
        <xsl:copy><xsl:value-of select="$python_value"/></xsl:copy>
    </xsl:template>

</xsl:transform>

Python

from lxml import etree

# LOAD XML AND XSL
doc = etree.parse("cycling.xml")
xsl = etree.parse('XSLT_Script.xsl')

# CONFIGURE TRANSFORMER
transform = etree.XSLT(xsl)    

# RUN TRANSFORMATION WITH PARAM
n = etree.XSLT.strparam('Strong')
result = transform(doc, python_value=n)

# PRINT TO CONSOLE
print(result) 

# SAVE TO FILE
with open('Output.xml', 'wb') as f:
    f.write(result)
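
To try the transformation without creating any files, the same idea can be sketched with inline strings. The XML and XSL literals below are shortened, hypothetical stand-ins for cycling.xml and the stylesheet above, and the prefix "tpx" used for the final check is an arbitrary choice; this sketch requires lxml, since the standard library has no XSLT processor.

```python
from lxml import etree

# Hypothetical, trimmed-down stand-in for cycling.xml.
XML = b"""<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Extensions>
    <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">
      <Watts>0</Watts>
      <Speed>2</Speed>
    </TPX>
  </Extensions>
</TrainingCenterDatabase>"""

# Shortened version of the stylesheet: identity transform plus a Watts override.
XSL = b"""<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               xmlns:doc="http://www.garmin.com/xmlschemas/ActivityExtension/v2" version="1.0">
    <xsl:param name="python_value"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="doc:Watts">
        <xsl:copy><xsl:value-of select="$python_value"/></xsl:copy>
    </xsl:template>
</xsl:transform>"""

transform = etree.XSLT(etree.fromstring(XSL))

# strparam() quotes the Python string so XSLT treats it as a string literal.
result = transform(etree.fromstring(XML), python_value=etree.XSLT.strparam("Strong"))

# Check which elements were rewritten and which were copied unchanged.
ns = {"tpx": "http://www.garmin.com/xmlschemas/ActivityExtension/v2"}
out_root = result.getroot()
watts_text = [w.text for w in out_root.findall(".//tpx:Watts", namespaces=ns)]
speed_text = [s.text for s in out_root.findall(".//tpx:Speed", namespaces=ns)]
print(watts_text, speed_text)
```

Only the template matching doc:Watts replaces text; every other node passes through the identity template untouched.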
