Python version 2.7: XML ElementTree: How to iterate through certain elements of a child element in order to find a match

I'm a programming novice and only rarely use Python, so please bear with me as I try to explain what I am trying to do :)

I have the following XML:

<?xml version = "1.0" encoding = "utf-8"?>
<Patients>
    <Patient>
               <PatientCharacteristics>
                   <patientCode>3</patientCode>
               </PatientCharacteristics>
               <Visits>
                   <Visit>
                          <DAS>
                               <CRP>14</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28>20</SWOL28>
                                       <TEN28>20</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2010-02-17</VisitDate>
                   </Visit>
                   <Visit>
                          <DAS>
                               <CRP>10</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28>15</SWOL28>
                                       <TEN28>20</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2010-02-10</VisitDate>
                   </Visit>
               </Visits>
    </Patient>
    <Patient>
        <PatientCharacteristics>
                   <patientCode>3</patientCode>
        </PatientCharacteristics>
               <Visits>
                   <Visit>
                          <DAS>
                               <CRP>14</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28>34</SWOL28>
                                       <TEN28>0</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2010-08-17</VisitDate>
                   </Visit>
                   <Visit>
                          <DAS>
                               <CRP>10</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28></SWOL28>
                                       <TEN28>2</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2010-07-10</VisitDate>
                   </Visit>
                   <Visit>
                          <DAS>
                               <CRP>9</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28>56</SWOL28>
                                       <TEN28>6</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2009-07-10</VisitDate>
                   </Visit>
               </Visits>

    </Patient>
</Patients>

All I want to do here is update certain 'SWOL28' values if they match the patientCode and VisitDate that I have stored in a text file. As I understand it, ElementTree elements do not keep a reference to their parent; if they did, I could just use findall() from the root to locate the matches and work backwards from there. As it stands, here is my pseudocode:

1. For each line in the text file:
2.     Put Visit_Date Patient_Code New_SWOL28 into variables
3.     For each patient element:
4.         If patientCode = Patient_Code
5.             For each Visit element:
6.                 If VisitDate = Visit_Date
7.                     If SWOL28 element exists for this visit
8.                         Update SWOL28 to New_SWOL28

But I am stuck at step number 5. How do I get a list of visits to iterate through? Apologies if this is a very dumb question, but I have searched high and low for an answer, I assure you! I have stripped my code down to a bare example of the part I need to fix:

import xml.etree.ElementTree as ET
tree = ET.parse('DB3.xml')
root = tree.getroot()
for child in root: # THIS GETS ME ALL THE PATIENT ATTRIBUTES
    print child.tag 
    for x in child/Visit: # THIS IS WHAT I CANNOT FIND THE CORRECT SYNTAX FOR
        # I WOULD THEN PERFORM STEPS 6, 7 AND 8 HERE

I would be deeply appreciative of any ideas any of you may have on this. I am not a programming natural that's for sure!

Thanks in advance, Sarah

Edit 1:

On the advice of SVK below I tried the following:

import xml.etree.ElementTree as ET
tree = ET.parse('Untitled.xml')
root = tree.getroot()
for child in root:
    print child.tag 
    child.find( "visits" )
    for x in child.iter("visit"):
        print x.tag, x.text

But the only output I get is: Patient Patient and none of the lower tags. Any ideas?

4 Solutions

#1

This is untested but it should be fairly close to what you want.

for patient in root:
    patient_code = patient.find('PatientCharacteristics/patientCode')
    if patient_code is not None and patient_code.text == code:
        for visit in patient.find('Visits'):
            visit_date = visit.find('VisitDate')
            if visit_date is not None and visit_date.text == date:
                swol28 = visit.find('DAS/Joints/SWOL28')
                # Assign to .text; set() would add an attribute to the element instead
                if swol28 is not None:
                    swol28.text = new_swol28
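
The loop above handles one (code, date, new_swol28) triple. As a rough sketch of the whole job described in the question, the following reads several update lines and applies each one. It assumes each line of the text file holds three whitespace-separated fields in the order Visit_Date Patient_Code New_SWOL28 (the question does not show the file, so that layout is a guess), and it uses a small inline XML sample in place of DB3.xml. The helper name apply_updates is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for DB3.xml (illustrative data only)
SAMPLE_XML = """<Patients>
  <Patient>
    <PatientCharacteristics><patientCode>3</patientCode></PatientCharacteristics>
    <Visits>
      <Visit>
        <DAS><Joints><SWOL28>20</SWOL28></Joints></DAS>
        <VisitDate>2010-02-17</VisitDate>
      </Visit>
      <Visit>
        <DAS><Joints><SWOL28></SWOL28></Joints></DAS>
        <VisitDate>2010-02-10</VisitDate>
      </Visit>
    </Visits>
  </Patient>
</Patients>"""

# Assumed layout of each text-file line: Visit_Date Patient_Code New_SWOL28
UPDATE_LINES = ["2010-02-17 3 99", "2010-02-10 3 7", "2011-01-01 3 5"]


def apply_updates(root, lines):
    """Set SWOL28 on every visit matching a (date, code) pair; return the count."""
    updated = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        visit_date, patient_code, new_swol28 = fields
        for patient in root.findall('Patient'):
            if patient.findtext('PatientCharacteristics/patientCode') != patient_code:
                continue
            for visit in patient.findall('Visits/Visit'):
                if visit.findtext('VisitDate') != visit_date:
                    continue
                swol28 = visit.find('DAS/Joints/SWOL28')
                if swol28 is not None:  # the element exists, even if it is empty
                    swol28.text = new_swol28
                    updated += 1
    return updated


root = ET.fromstring(SAMPLE_XML)
count = apply_updates(root, UPDATE_LINES)
values = [element.text for element in root.iter('SWOL28')]
```

Iterating over an open file yields its lines, so a file object could be passed as lines in place of the list. With a tree obtained from ET.parse(), calling tree.write() afterwards would save the changes.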

#2

You can iterate over all the "Visit" elements anywhere below an element "element" like this (tag names are case-sensitive, so "visit" would match nothing in this document):

for x in element.iter("Visit"):

You can find the first direct child of an element that matches a certain tag with:

element.find("Visits")

It looks like you will first have to locate the "Visits" element, which is the parent of the "Visit" elements, and then iterate through its "Visit" children. Putting those together, you'd have something like this:

for patient_element in root:
    print patient_element.tag
    # Tag names are case-sensitive: the document uses "Visits" and "Visit"
    visits_element = patient_element.find("Visits")
    for visit_element in visits_element.iter("Visit"):
        print visit_element.tag, visit_element.text
        # ... further processing of each visit element here

In general, look at the section "Finding interesting elements" in the documentation for xml.etree.ElementTree: http://docs.python.org/2/library/xml.etree.elementtree.html#finding-interesting-elements
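
Tag matching in ElementTree is case-sensitive, which is a likely reason the attempt in Edit 1 of the question (with "visits" and "visit") printed only the Patient tags. The sketch below uses a cut-down inline sample rather than the real file. It shows the lowercase lookups coming back empty, and contrasts iter(), which walks all descendants, with findall(), which matches only direct children unless it is given a path.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for one Patient element (illustrative data only)
patient = ET.fromstring(
    "<Patient><Visits>"
    "<Visit><VisitDate>2010-02-17</VisitDate></Visit>"
    "<Visit><VisitDate>2010-02-10</VisitDate></Visit>"
    "</Visits></Patient>"
)

# Lowercase tags do not match "Visits" / "Visit"
lower_find = patient.find("visits")
lower_iter = list(patient.iter("visit"))

# iter() searches the whole subtree, so it reaches the grandchildren
via_iter = [visit.findtext("VisitDate") for visit in patient.iter("Visit")]

# findall() with a bare tag looks at direct children only; Visit is a grandchild
direct = patient.findall("Visit")

# A path reaches the same elements through their parent
via_path = patient.findall("Visits/Visit")
```

Here lower_find is None and lower_iter is empty, while via_iter and via_path both reach the two visits.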

#3

If you parse the file with lxml, you could use a CSSSelector to get the nodes you want from the Patient element:

from lxml.cssselect import CSSSelector
visitSelector = CSSSelector('Visit')
visits =  visitSelector(child)

You can do the same to get the patientCode and SWOL28 elements; then you can access and modify their text using element.text.

#4

If you use lxml.etree, you can use XPath to find the elements you need to update.

E.g.

doc.xpath('Patient[PatientCharacteristics/patientCode=$patient]/Visits/Visit[VisitDate=$visit]',patient="3",visit="2009-07-10")

So:

from lxml import etree

doc = etree.parse("DB3.xml")

changes = [
  dict(patient='3',visit='2010-08-17',swol28="99"),
]

def update_doc(x,d):
  for row in d:
    for visit in x.xpath('Patient[PatientCharacteristics/patientCode=$patient]/Visits/Visit[VisitDate=$visit]',**row):
      for swol28 in visit.xpath('DAS/Joints/SWOL28'):
        swol28.text = row['swol28']

update_doc(doc,changes)

print etree.tostring(doc)

Should yield you something that contains:

<Patient>
  <PatientCharacteristics>
    <patientCode>3</patientCode>
  </PatientCharacteristics>
  <Visits>
    <Visit>
      <DAS>
        <CRP>14</CRP>
        <ESR/>
        <Joints>
          <DAS_PROFILE>28/28</DAS_PROFILE>
          <SWOL28>99</SWOL28>
          <TEN28>0</TEN28>
        </Joints>
      </DAS>
      <VisitDate>2010-08-17</VisitDate>
    </Visit>
  </Visits>
</Patient>
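
If lxml is not available, the "find the match and work backwards" idea from the question can be approximated with the standard library by first building a child-to-parent map. The sketch below is one possible way to do that and is not part of the answer above; the names parent_map and ancestor are made up for illustration, and the inline XML is a cut-down stand-in for the real file.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for DB3.xml (illustrative data only)
root = ET.fromstring(
    "<Patients><Patient>"
    "<PatientCharacteristics><patientCode>3</patientCode></PatientCharacteristics>"
    "<Visits><Visit>"
    "<DAS><Joints><SWOL28>34</SWOL28></Joints></DAS>"
    "<VisitDate>2010-08-17</VisitDate>"
    "</Visit></Visits>"
    "</Patient></Patients>"
)

# Map every child element to its parent, since elements hold no parent link
parent_map = {child: parent for parent in root.iter() for child in parent}


def ancestor(element, tag):
    """Walk up through parent_map until an element with the given tag is found."""
    while element is not None and element.tag != tag:
        element = parent_map.get(element)
    return element


# Start from each SWOL28 element and look upwards for its Visit and Patient
for swol28 in root.iter('SWOL28'):
    visit = ancestor(swol28, 'Visit')
    patient = ancestor(visit, 'Patient')
    if (visit.findtext('VisitDate') == '2010-08-17'
            and patient.findtext('PatientCharacteristics/patientCode') == '3'):
        swol28.text = '99'

result = root.findtext('Patient/Visits/Visit/DAS/Joints/SWOL28')
```

The map has to be rebuilt if elements are added or moved, which is one reason the top-down loops in the earlier answers may be simpler for this task.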
