Python: extract data from XML and save it to Excel

I would like to extract some data from an XML file and save it in a table format, such as XLS or DBF.

Here is the XML file I have:

<?xml version="1.0" encoding="utf-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Header />
  <SOAP-ENV:Body>
    <ADD_LandIndex_001>
      <CNTROLAREA>
        <BSR>
          <VERB>ADD</VERB>
          <NOUN>LandIndex</NOUN>
          <REVISION>001</REVISION>
        </BSR>
      </CNTROLAREA>
      <DATAAREA>
        <LandIndex>
          <reportId>AMI100031</reportId>
          <requestKey>R3278458</requestKey>
          <SubmittedBy>EN4871</SubmittedBy>
          <submittedOn>2015/01/06 4:20:11 PM</submittedOn>
          <LandIndex>
            <agreementdetail>
              <agreementid>001       4860</agreementid>
              <agreementtype>NATURAL GAS</agreementtype>
              <currentstatus>
                <status>ACTIVE</status>
                <statuseffectivedate>1965/02/18</statuseffectivedate>
                <termdate>1965/02/18</termdate>
              </currentstatus>
              <designatedrepresentative>
              </designatedrepresentative>
            </agreementdetail>
          </LandIndex>
        </LandIndex>
      </DATAAREA>
    </ADD_LandIndex_001>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

I am interested in the information inside the agreementdetail tag, which is located under DATAAREA/LandIndex/LandIndex/.

UPDATE:

Thanks to MattDMo, I have made some progress on this task and wrote the script below. It parses the file, finds every agreementdetail element, and outputs the agreementid and agreementtype of each.

import xml.etree.ElementTree as ET
import arcpy

xmlfile = 'D:/Working/Test/Test.xml'
element_tree = ET.parse(xmlfile)
root = element_tree.getroot()
agreement = root.findall(".//agreementdetail")
result = []
elements = ('agreementid', 'agreementtype')

for a in agreement:
    obj = {}
    for e in elements:
        obj[e] = a.find(e).text
    result.append(obj)

arcpy.AddMessage(result)

The output I receive consists of a number of entries like this one: {'agreementid': '001 4860', 'agreementtype': 'NATURAL GAS'}

Now I need to convert this output into a table format (.csv, .dbf, .xls, etc.) so that agreementid and agreementtype are columns:

agreementid    | agreementtype 
001       4860 | NATURAL GAS

I would be very grateful for guidance on how to accomplish this, or for an example.

P.S. The Python version is 2.7.

2 solutions

#1

The following should work:

import xml.etree.ElementTree as ET
import arcpy

xmlfile = 'D:/Working/Test/Test.xml'
element_tree = ET.parse(xmlfile)
root = element_tree.getroot()
agreement = root.find(".//agreementid").text
arcpy.AddMessage(agreement)

The root.find() call uses an XPath expression (the xml.etree.ElementTree section of the Python documentation has a quick reference for the supported syntax) to find the first element named agreementid at any depth below the current element. If your file contains several elements with that name, use root.findall() and iterate over the results. If, for example, there are three agreementid elements and you want the second one, root.findall(".//agreementid")[1] should work.
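
As a minimal sketch of the find()/findall() difference, the snippet below parses a shortened, inline copy of the question's XML instead of the real file. The second agreementdetail entry and its values are invented for illustration, and the helper name find_all_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's file; the second agreementdetail
# entry is invented purely for illustration.
SAMPLE_XML = """<DATAAREA>
  <LandIndex>
    <LandIndex>
      <agreementdetail>
        <agreementid>001       4860</agreementid>
        <agreementtype>NATURAL GAS</agreementtype>
      </agreementdetail>
      <agreementdetail>
        <agreementid>002       1234</agreementid>
        <agreementtype>PETROLEUM</agreementtype>
      </agreementdetail>
    </LandIndex>
  </LandIndex>
</DATAAREA>"""


def find_all_text(root, tag):
    """Return the text of every descendant element named `tag`, in document order."""
    return [element.text for element in root.findall(".//" + tag)]


root = ET.fromstring(SAMPLE_XML)

# findall() returns every match, so it can be iterated or indexed.
agreement_ids = find_all_text(root, "agreementid")

# find() returns only the first match.
first_id = root.find(".//agreementid").text

# Indexing the findall() result picks a specific occurrence (here the second).
second_id = root.findall(".//agreementid")[1].text
```

Note that find() returns None when nothing matches, so real input with optional elements would need a check before reading .text.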

#2

MattDMo has already answered the parsing part of the problem, but I want to point out that Python has a csv module that makes it easy to write comma-separated data, which can then be read into applications such as databases or spreadsheets.

From the docs:

import csv
with open('eggs.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
