用python用xml解析CDATA

时间:2022-12-01 13:15:49

I need to parse an XML file with a number of blocks of CDATA that I need to retain for later plotting:

我需要用一些CDATA块解析一个XML文件,这些块需要保留下来以便以后绘图:

<process id="process1"> <log name="name1" device="device1"><![CDATA[timestamp value]]]></log> <log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]]></log> </process>

时间戳值,时间戳值,时间戳]

I will need to do this repeatedly and quickly, and I am looking for the best way to do this. I've read that ElementTree is the faster of the methods, but I am open to other suggestions.

我需要反复快速地做这件事,我正在寻找最好的方法来做这件事。我已经读过ElementTree是更快的方法,但是我愿意接受其他的建议。

1 个解决方案

#1


10  

Here are two examples of how to do it:

以下是如何做到这一点的两个例子:

from lxml import etree
import xml.etree.ElementTree as ElementTree

CONTENT = """
<process id="process1">
 <log name="name1" device="device1"><![CDATA[timestamp value]]></log>
 <log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]></log>
</process>
"""

def parse_with_lxml():
    root = etree.fromstring(CONTENT)
    for log in root.xpath("//log"):
        print log.text

def parse_with_stdlib():
    root = ElementTree.fromstring(CONTENT)
    for log in root.iter('log'):
        print log.text

if __name__ == '__main__':
    parse_with_lxml()
    parse_with_stdlib()

Output:

输出:

timestamp value
timestamp value, timestamp value, timestamp
timestamp value
timestamp value, timestamp value, timestamp

The text attribute it handles it in both cases.

它在这两种情况下处理的文本属性。

#1


10  

Here are two examples of how to do it:

以下是如何做到这一点的两个例子:

from lxml import etree
import xml.etree.ElementTree as ElementTree

CONTENT = """
<process id="process1">
 <log name="name1" device="device1"><![CDATA[timestamp value]]></log>
 <log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]></log>
</process>
"""

def parse_with_lxml():
    root = etree.fromstring(CONTENT)
    for log in root.xpath("//log"):
        print log.text

def parse_with_stdlib():
    root = ElementTree.fromstring(CONTENT)
    for log in root.iter('log'):
        print log.text

if __name__ == '__main__':
    parse_with_lxml()
    parse_with_stdlib()

Output:

输出:

timestamp value
timestamp value, timestamp value, timestamp
timestamp value
timestamp value, timestamp value, timestamp

The text attribute it handles it in both cases.

它在这两种情况下处理的文本属性。