I am creating an XML file in Python, and there is a field in my XML into which I put the contents of a text file. I do it like this:
import xml.etree.ElementTree as ET

f = open('myText.txt', "r")
data = f.read()
f.close()
root = ET.Element("add")
doc = ET.SubElement(root, "doc")
field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data
tree = ET.ElementTree(root)
tree.write("output.xml")
And then I get a UnicodeDecodeError. I already tried putting the special comment # -*- coding: utf-8 -*- at the top of my script, but I still got the error. I also tried forcing the encoding of my variable with data.encode('utf-8'), but I still got the error. I know this issue is very common, but none of the solutions I found in other questions worked for me.
UPDATE
Traceback when using only the special coding comment on the first line of the script:
Traceback (most recent call last):
  File "D:\Python\lse\createxml.py", line 151, in <module>
    tree.write("D:\\python\\lse\\xmls\\" + items[ctr][0] + ".xml")
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 243: ordinal not in range(128)
Traceback when using .encode('utf-8'):
Traceback (most recent call last):
  File "D:\Python\lse\createxml.py", line 148, in <module>
    field.text = data.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 227: ordinal not in range(128)
When I used .decode('utf-8') instead, no error appeared and the XML file was created successfully. The problem now is that my browser cannot display the XML.
4 answers
#1 (59 votes)
You need to decode the data read from the input into a Unicode string before using it, to avoid encoding problems:
field.text = data.decode("utf8")
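A minimal sketch of that idea, assuming the file contents are UTF-8. The byte string `raw` is a hypothetical stand-in for what `f.read()` returns in Python 2 (or what a file opened with `"rb"` returns in Python 3); the sample text is invented for illustration. It decodes before assigning to the element and lets ElementTree encode on output. The calls used exist in both Python 2.7 and Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw bytes standing in for f.read(): UTF-8 for "Price: <pound sign> 5".
# 0xC2 is the UTF-8 lead byte that tripped the ASCII codec in the question.
raw = b'Price: \xc2\xa3 5'

# Decode the bytes into a Unicode string *before* handing them to ElementTree.
text = raw.decode('utf-8')

root = ET.Element('add')
doc = ET.SubElement(root, 'doc')
field = ET.SubElement(doc, 'field')
field.set('name', 'text')
field.text = text

# ElementTree encodes the Unicode text back to UTF-8 bytes when serializing.
xml_bytes = ET.tostring(root, encoding='utf-8')
print(xml_bytes)  # printed as a byte repr, so the console encoding does not matter
```

Decoding once, at the point where bytes enter the program, keeps every later step working with a single string type.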
#2 (12 votes)
I was running into a similar error in pywikipediabot. The .decode method is a step in the right direction, but for me it didn't work without adding 'ignore':
fix_encoding = lambda s: s.decode('utf8', 'ignore')
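A small sketch of what the `'ignore'` error handler actually does. The input bytes are hypothetical: valid UTF-8 followed by a stray `0xFF` byte that is not valid UTF-8. The default `'strict'` handler raises, `'ignore'` silently drops the bad byte, and `'replace'` (shown only for comparison, not part of the answer above) substitutes U+FFFD so the loss stays visible.

```python
# Hypothetical input: UTF-8 for "cafe" with an acute e, then an invalid 0xFF byte.
raw = b'caf\xc3\xa9 \xff!'

def fix_encoding(s):
    # Same behaviour as the lambda above: undecodable bytes are discarded.
    return s.decode('utf8', 'ignore')

strict_failed = False
try:
    raw.decode('utf8')  # default 'strict' handler raises on the 0xFF byte
except UnicodeDecodeError:
    strict_failed = True

ignored = fix_encoding(raw)               # the 0xFF byte simply disappears
replaced = raw.decode('utf8', 'replace')  # U+FFFD marks where the bad byte was
print(repr(ignored), repr(replaced))
```

Because `'ignore'` discards data without any warning, it is worth checking why the input is not valid UTF-8 before relying on it.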
#3 (7 votes)
Python 2
The error occurs because, when writing the tree out, ElementTree does not expect the XML to contain non-ASCII byte strings. In Python 2, ElementTree encodes each text value with text.encode(encoding, "xmlcharrefreplace"), and calling .encode() on a byte string (str) first decodes it implicitly with the ASCII codec, which fails on byte 0xC2. That is why the traceback shows a UnicodeDecodeError, and also why data.encode('utf-8') fails in the same way. You should use Unicode strings for non-ASCII text instead. Unicode strings can be created either with the u prefix on string literals, e.g. u'€', or by decoding a byte string with mystr.decode('utf-8'), using the appropriate encoding.
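A short sketch showing that the two routes described above give the same Unicode string. The euro sign is written as the escape `\u20ac` so the source file stays ASCII-only and needs no coding declaration; this runs on Python 2.7 and Python 3.3+.

```python
# Route 1: a u-prefixed literal (escape used instead of the raw euro character).
from_literal = u'\u20ac'

# Route 2: decode the UTF-8 byte sequence for the euro sign.
from_bytes = b'\xe2\x82\xac'.decode('utf-8')

# Encoding the Unicode string goes back to the original UTF-8 bytes.
round_trip = from_literal.encode('utf-8')
print(from_literal == from_bytes, repr(round_trip))
```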
The best practice is to decode all text data as it is read, rather than somewhere in the middle of the program. The io module provides an open() function that decodes text data to Unicode strings as it is read.
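A self-contained sketch of decoding at read time with `io.open`. A temporary file with invented UTF-8 content stands in for the question's `myText.txt`; `tempfile` and `os` are used only to create and clean up that file.

```python
import io
import os
import tempfile

# Create a throwaway UTF-8 file (a stand-in for myText.txt).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as out:
    out.write(b'na\xc3\xafve caf\xc3\xa9\n')  # UTF-8 bytes with two accented letters

# io.open decodes while reading, so `data` is already a Unicode string
# (unicode in Python 2, str in Python 3); no .decode() call is needed later.
with io.open(path, 'r', encoding='utf-8') as f:
    data = f.read()

os.remove(path)
print(repr(data))
```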
ElementTree handles Unicode strings without trouble and encodes them correctly when you call its write() method.
Also, for best compatibility and readability, have ElementTree encode to UTF-8 in write() and add the XML declaration header (encoding='utf-8', xml_declaration=True).
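A sketch comparing the output of `ElementTree.write()` with different arguments, writing into an in-memory `io.BytesIO` instead of a file so the bytes can be inspected. The `serialize` helper is hypothetical, defined only for this comparison. With no encoding, ElementTree falls back to US-ASCII and turns the euro sign into the character reference `&#8364;`. With `encoding='utf-8'` it writes the raw UTF-8 bytes, and only `xml_declaration=True` adds the `<?xml ...?>` header.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element('add')
root.text = u'\u20ac'  # euro sign as a Unicode string

def serialize(**kwargs):
    # Hypothetical helper: write the tree to memory and return the bytes.
    buf = io.BytesIO()
    ET.ElementTree(root).write(buf, **kwargs)
    return buf.getvalue()

default = serialize()                                      # US-ASCII, character reference
plain = serialize(encoding='utf-8')                        # UTF-8, no declaration
declared = serialize(encoding='utf-8', xml_declaration=True)
for out in (default, plain, declared):
    print(repr(out))
```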
Presuming your input file is UTF-8 encoded (0xC2 is a common UTF-8 lead byte), and putting everything together with a with statement, your code should look like this:
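import io
import xml.etree.ElementTree as ET

# io.open decodes the file's UTF-8 bytes into a Unicode string while reading
with io.open('myText.txt', "r", encoding='utf-8') as f:
    data = f.read()

root = ET.Element("add")
doc = ET.SubElement(root, "doc")
field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data
tree = ET.ElementTree(root)
# Encode the output as UTF-8 and write the <?xml ...?> declaration
tree.write("output.xml", encoding='utf-8', xml_declaration=True)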
Output:
<?xml version='1.0' encoding='utf-8'?>
<add><doc><field name="text">data€</field></doc></add>
#4 (1 vote)
#!/usr/bin/python
# encoding=utf8
Try adding this at the start of your Python file.