Reading an XML file with Python

Time: 2021-03-05 18:20:18

I am reading a file with a .jml extension. The code is very simple:

import xml.etree.ElementTree as ET
tree = ET.parse('VOAPoints_2010_M25.jml')
root = tree.getroot()

But I get a parsing error:

ParseError: not well-formed (invalid token): line 75, column 16

The file I am trying to read is a dataset that has been used before, so I am confident that there are no problems with it.
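The line and column in the error message can be used to look at the bytes the parser rejected. The following is a minimal sketch, not part of the original post: the helper name `locate_parse_error` is hypothetical, and it relies on the `position` attribute that `xml.etree.ElementTree.ParseError` exposes.

```python
import xml.etree.ElementTree as ET


def locate_parse_error(path, context=10):
    """Return (line, column, nearby_bytes) for the first parse error, or None.

    Hypothetical helper for illustration only.
    """
    try:
        ET.parse(path)
    except ET.ParseError as err:
        line_no, col = err.position  # line is 1-based
        # Read raw bytes so that an unexpected encoding cannot raise here
        with open(path, 'rb') as fh:
            lines = fh.read().splitlines()
        bad_line = lines[line_no - 1]
        start = max(col - context, 0)
        return line_no, col, bad_line[start:col + context]
    return None
```

Calling `locate_parse_error('VOAPoints_2010_M25.jml')` and printing the returned bytes should show whether a non-ASCII byte sits near the reported column.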

The file is: [image] [image]. Can anyone help?

2 solutions

#1


Since the pound sign was the issue, you can escape it with the numeric character reference &#163;. Python can automate the replacement by reading the XML file line by line and replacing each pound sign:

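import xml.etree.ElementTree as ET

oldfile = "VOAPoints_2010_M25.jml"
newfile = "VOAPoints_2010_M25_new.jml"

# Assumption: the pound sign is stored as a single Latin-1 byte, which is
# why the parser rejects it as invalid UTF-8. Adjust the encoding if needed.
with open(oldfile, 'r', encoding='latin-1') as otxt, \
        open(newfile, 'w', encoding='utf-8') as ntxt:
    for rline in otxt:
        # Replace each pound sign with its numeric character reference
        ntxt.write(rline.replace("£", "&#163;"))

tree = ET.parse(newfile)
root = tree.getroot()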
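If a second file on disk is not wanted, the same replacement can probably be done in memory. This is only a sketch under the same assumption as above (the pound sign is a single Latin-1 byte); the function name `parse_with_escaped_pound` and its `source_encoding` parameter are hypothetical and do not come from the original answer.

```python
import xml.etree.ElementTree as ET


def parse_with_escaped_pound(path, source_encoding="latin-1"):
    """Read the file, escape pound signs, and parse the result in memory."""
    with open(path, "rb") as fh:
        text = fh.read().decode(source_encoding)
    # Swap each pound sign for its numeric character reference
    text = text.replace("£", "&#163;")
    # fromstring returns the root element directly
    return ET.fromstring(text)
```

Here `root = parse_with_escaped_pound("VOAPoints_2010_M25.jml")` would take the place of the `ET.parse(newfile)` and `tree.getroot()` calls.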

#2


Sorry for using an answer as a question, but formatting this inside a comment is painful. Does the code below solve your problem?

import xml.etree.ElementTree as ET
myParser = ET.XMLParser(encoding="utf-8")
tree = ET.parse('VOAPoints_2010_M25.jml', parser=myParser)
root = tree.getroot()
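If the file turns out not to be UTF-8, a variation on this idea is to tell the parser which encoding to expect. The sketch below assumes the pound sign is stored as a single ISO-8859-1 (Latin-1) byte; that is a guess about the file, and the function name `parse_as_latin1` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_as_latin1(path):
    """Parse the file while overriding the input encoding with ISO-8859-1."""
    # The encoding argument overrides whatever the document itself declares
    parser = ET.XMLParser(encoding="ISO-8859-1")
    return ET.parse(path, parser=parser).getroot()
```

With this approach the file is left untouched; only the parser's view of its encoding changes.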
