I'm having problems reading from a file, processing its contents as a string, and saving the result to a UTF-8 file.
Here is the code:
try:
    filehandle = open(filename, "r")
except:
    print("Could not open file " + filename)
    quit()
text = filehandle.read()
filehandle.close()
I then do some processing on the variable text.
Then I write it out:
try:
    writer = open(output, "w")
except:
    print("Could not open file " + output)
    quit()
#data = text.decode("iso 8859-15")
#writer.write(data.encode("UTF-8"))
writer.write(text)
writer.close()
This outputs the file perfectly, but according to my editor it is saved as ISO-8859-15. Since the same editor recognizes the input file (the one named in the variable filename) as UTF-8, I don't know why this happens. As far as my research shows, the commented-out lines should solve the problem. However, when I use those lines, the resulting file contains gibberish in the special characters, mainly words with a tilde, since the text is in Spanish. I would really appreciate any help, as I am stumped.
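A likely explanation for the gibberish, assuming the input file really is UTF-8 as the editor reports: the commented-out lines decode UTF-8 bytes as ISO-8859-15, so each two-byte character such as ñ turns into two wrong characters, which are then re-encoded as UTF-8. A minimal Python 3 sketch of that effect (the sample word is only an illustration, not the asker's data):

```python
# Python 3 sketch: what happens when UTF-8 bytes are decoded as ISO-8859-15.
original = "año"                          # sample Spanish word containing ñ
utf8_bytes = original.encode("utf-8")     # ñ becomes the two bytes 0xC3 0xB1

# Misreading those bytes as ISO-8859-15 yields two characters per ñ.
misdecoded = utf8_bytes.decode("iso-8859-15")
print(misdecoded)                         # aÃ±o

# Re-encoding the mis-decoded text as UTF-8 "double-encodes" it.
print(misdecoded.encode("utf-8"))         # b'a\xc3\x83\xc2\xb1o'

# Decoding with the file's real encoding round-trips correctly.
print(utf8_bytes.decode("utf-8") == original)  # True
```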
3 answers
#1
Process text to and from Unicode at the I/O boundaries of your program using the codecs module:
import codecs
with codecs.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# process Unicode text
# write to the separate output file rather than overwriting the input
with codecs.open(output, 'w', encoding='utf8') as f:
    f.write(text)
Edit: The io module is now recommended instead of codecs, and its open is compatible with Python 3's open syntax (in Python 3, io.open is the same function as the built-in open):
import io
with io.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# process Unicode text
# write to the separate output file rather than overwriting the input
with io.open(output, 'w', encoding='utf8') as f:
    f.write(text)
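A self-contained Python 3 sketch of the io approach above. It uses a temporary directory and a made-up Spanish sample instead of the question's real files, and the upper-casing is a hypothetical stand-in for whatever processing the question does on text:

```python
import io
import os
import tempfile

sample = "El niño comió piña en la mañana."  # hypothetical input text

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "input.txt")   # hypothetical input path
    output = os.path.join(tmp, "output.txt")    # hypothetical output path

    # Create a UTF-8 input file to read from.
    with io.open(filename, "w", encoding="utf8") as f:
        f.write(sample)

    # Decode at the input boundary: text is a str (Unicode) from here on.
    with io.open(filename, "r", encoding="utf8") as f:
        text = f.read()

    text = text.upper()  # placeholder for the real processing step

    # Encode at the output boundary.
    with io.open(output, "w", encoding="utf8") as f:
        f.write(text)

    # Read the raw bytes back to confirm the file is UTF-8.
    with io.open(output, "rb") as f:
        raw = f.read()

print(raw.decode("utf-8"))  # EL NIÑO COMIÓ PIÑA EN LA MAÑANA.
```

Because decoding and encoding happen only at the two open calls, the processing step works on ordinary str values and never has to think about bytes.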
#2
You can't do that with the built-in open in Python 2; use codecs instead.
In Python 2, the built-in open function works with byte strings: it does no decoding when reading, and writing a unicode string to it implicitly encodes with the ASCII codec, which fails on characters such as ñ. To write the file in UTF-8, try this:
import codecs
file = codecs.open('data.txt','w','utf-8')
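A small Python 3 sketch of this codecs approach, using a with block so the file is closed, then reading the raw bytes back to check that ñ was written as the UTF-8 pair 0xC3 0xB1. The name data.txt comes from the answer; placing it in a temporary directory and the sample word are assumptions made so the example runs anywhere:

```python
import codecs
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")

    # codecs.open encodes each str written to it with the given codec.
    with codecs.open(path, "w", "utf-8") as file:
        file.write("señor")

    # Open in binary mode to see the exact bytes on disk.
    with open(path, "rb") as raw_file:
        raw = raw_file.read()

print(raw)  # b'se\xc3\xb1or'
```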
#3
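In Python 3, you can also pass the encoding straight to the built-in open. Note that errors="ignore" silently drops any bytes that are not valid UTF-8 instead of raising an error:
with open(completefilepath, 'r', encoding='utf8', errors='ignore') as file:
    text = file.read()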
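To illustrate what errors="ignore" does, here is a Python 3 sketch that decodes a byte string which is not valid UTF-8 (the word año encoded as ISO-8859-15, a made-up example) with three different error handlers:

```python
bad_bytes = "año".encode("iso-8859-15")   # b'a\xf1o', not valid UTF-8

# errors="ignore" silently drops the undecodable byte.
print(bad_bytes.decode("utf-8", errors="ignore"))    # ao

# errors="replace" keeps a visible marker (U+FFFD) where the byte was.
print(bad_bytes.decode("utf-8", errors="replace"))

# The default, errors="strict", raises instead of hiding the problem.
try:
    bad_bytes.decode("utf-8")
except UnicodeDecodeError:
    print("strict mode raised UnicodeDecodeError")
```

With "ignore" the ñ disappears without any warning, so "replace" or the default "strict" is usually safer while you are still unsure of the input encoding.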