Error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Time: 2021-08-15 20:52:57

https://github.com/affinelayer/pix2pix-tensorflow/tree/master/tools

An error occurred when running "process.py" from the repository above.

python tools/process.py --input_dir data --operation resize --output_dir data2/resize
data/0.jpg -> data2/resize/0.png

Traceback (most recent call last):
  File "tools/process.py", line 235, in <module>
    main()
  File "tools/process.py", line 167, in main
    src = load(src_path)
  File "tools/process.py", line 113, in load
    contents = open(path).read()
  File "/home/user/anaconda3/envs/tensorflow_2/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

What is the cause of the error? The Python version is 3.5.2.

9 solutions

#1


49  

Python tries to convert a byte array (a bytes object that it assumes holds a UTF-8-encoded string) to a Unicode string (str). This is a decoding step that follows UTF-8 rules. While decoding, it hits a byte that is not allowed in UTF-8-encoded data (namely the 0xff at position 0).
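To make the failure concrete, here is a minimal sketch that reproduces the same exception without any file. The sample bytes are an assumption: they mimic the first bytes of a JPEG (which begins with 0xFF 0xD8), since the file in the question is data/0.jpg.

```python
# Illustrative bytes only: JPEG data starts with 0xFF 0xD8 (assumed stand-in for data/0.jpg).
data = b"\xff\xd8\xff\xe0"

try:
    data.decode("utf-8")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    # The exception records the codec, the offset of the bad byte, and the reason.
    details = (exc.encoding, exc.start, exc.reason)

print(failed, details)
```

The three fields in details correspond to the parts of the error message in the question: the codec name, the position, and "invalid start byte".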

Since you did not provide any code we could look at, we can only guess at the rest.

From the stack trace we can assume that the triggering action was reading from a file (contents = open(path).read()). I suggest rewriting it like this:

with open(path, 'rb') as f:
  contents = f.read()

The b in the mode specifier passed to open() states that the file is to be treated as binary, so contents stays a bytes object. No decoding is attempted this way.
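A small self-contained sketch of the difference between the two modes, under stated assumptions: the temporary file and its few bytes are hypothetical stand-ins for an image, and the text-mode read passes encoding="utf-8" explicitly to mirror the codec named in the traceback.

```python
import os
import tempfile

# Hypothetical stand-in for an image file: a few bytes starting with 0xFF.
fd, path = tempfile.mkstemp(suffix=".jpg")
with os.fdopen(fd, "wb") as f:
    f.write(b"\xff\xd8\xff\xe0demo")

try:
    # Text mode decodes while reading, so invalid UTF-8 raises here.
    with open(path, encoding="utf-8") as f:
        f.read()
    text_mode_ok = True
except UnicodeDecodeError:
    text_mode_ok = False

# Binary mode returns the raw bytes with no decoding step at all.
with open(path, "rb") as f:
    contents = f.read()

os.remove(path)
print(text_mode_ok, type(contents).__name__, contents[:2])
```

Whether the rest of process.py then accepts bytes instead of str is a separate question that depends on what it does with contents.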

#2


13  

This solution strips out (ignores) the offending characters and returns the string without them. Use it only if you need to strip them, not convert them.

with open(path, encoding="utf8", errors='ignore') as f:
    contents = f.read()

With errors='ignore' you just lose some characters. In my case I didn't care about them, since they seemed to be extra characters caused by bad formatting and programming in the clients connecting to my socket server, so this was an easy, direct solution.
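To see what errors='ignore' actually discards, this sketch decodes a made-up byte string (an assumption, not data from the thread) with two different error handlers. errors='replace' is shown alongside it as a way to keep a visible marker where bytes were invalid.

```python
# Hypothetical input: valid ASCII text wrapped in two bytes that are invalid in UTF-8.
raw = b"\xffabc\xfe"

ignored = raw.decode("utf-8", errors="ignore")    # invalid bytes are silently dropped
replaced = raw.decode("utf-8", errors="replace")  # each invalid byte becomes U+FFFD

print(repr(ignored), repr(replaced))
```

With 'ignore' there is no trace that data was lost, which is why it only suits cases where the dropped bytes really are junk.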

#3


11  

I had a similar issue and ended up using UTF-16 to decode. My code is below.

with open(path_to_file, 'rb') as f:
    contents = f.read()
# Decode before stripping: in Python 3, bytes.rstrip() does not accept a str argument.
contents = contents.decode("utf-16").rstrip("\r\n")
contents = contents.split("\r\n")

This reads the file contents as raw bytes, decodes them as UTF-16, and then splits the result into lines.

#4


8  

I came across this thread while hitting the same error. After doing some research I can confirm that this error happens when you try to decode a UTF-16 file as UTF-8.

With UTF-16 the first character (2 bytes in UTF-16) is a Byte Order Mark (BOM), which is used as a decoding hint and doesn't appear as a character in the decoded string. This means the first byte will be either FE or FF, and the second byte will be the other one.
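A rough sketch of checking for that BOM before choosing a codec. The helper name sniff_utf16_bom is hypothetical; codecs.BOM_UTF16_LE and codecs.BOM_UTF16_BE are the standard-library constants for the two byte orders. Note that a leading 0xFF alone does not prove UTF-16: binary data such as a JPEG also starts with 0xFF.

```python
import codecs

def sniff_utf16_bom(data):
    """Return 'utf-16' if data starts with a UTF-16 BOM, else None (hypothetical helper)."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None

sample = "hello".encode("utf-16")   # Python's utf-16 encoder prepends a BOM
jpeg_like = b"\xff\xd8\xff\xe0"     # starts with 0xFF but is not UTF-16 text

print(sniff_utf16_bom(sample), sniff_utf16_bom(jpeg_like))
print(sample.decode("utf-16"))      # the BOM is consumed, not returned as a character
```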

Heavily edited after I found out the real answer.

#5


1  

Check the path of the file being read. My code kept giving me errors until I changed the path to the present working directory. The error was:

newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

#6


1  

use only

base64.b64decode(a) 

instead of

base64.b64decode(a).decode('utf-8')
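A short sketch of why the extra .decode('utf-8') fails for some inputs: base64.b64decode always returns bytes, and only some payloads are UTF-8 text. Both payloads below are made up for illustration.

```python
import base64

# Hypothetical binary payload (JPEG-like header): keep the decoded result as bytes.
encoded = base64.b64encode(b"\xff\xd8\xff\xe0")
payload = base64.b64decode(encoded)

# Hypothetical text payload: here a further .decode('utf-8') is appropriate.
text_encoded = base64.b64encode("héllo".encode("utf-8"))
text = base64.b64decode(text_encoded).decode("utf-8")

print(payload, text)
```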

#7


0  

Hi there, you should first get the "GoogleNews-vectors-negative300.bin.gz" file, then extract it with this command in Ubuntu: gunzip -k GoogleNews-vectors-negative300.bin.gz (extracting it manually is never recommended). Then run these commands in Python 3:

import gensim
# Newer gensim releases expose this loader as gensim.models.KeyedVectors.load_word2vec_format.
model = gensim.models.Word2Vec.load_word2vec_format('./model/GoogleNews-vectors-negative300.bin', binary=True)

I hope it will be useful.

#8


-1  

I had a similar problem. I tried to run an example in tensorflow/models/objective_detection and got the same message. Try switching from Python 3 to Python 2.

#9


-2  

If possible, open the file in a text editor and try to change the encoding to UTF-8. Otherwise, do it programmatically at the OS level.
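One possible way to do the conversion programmatically, sketched in Python rather than with an OS tool. The function name convert_to_utf8 is hypothetical, and the source encoding has to be known in advance (UTF-16 is assumed in the demo). This only makes sense for text files; it does not apply to binary files such as images.

```python
import os
import shutil
import tempfile

def convert_to_utf8(src_path, dst_path, src_encoding):
    """Re-encode a text file as UTF-8; src_encoding must be known (assumption)."""
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src_path, encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)

# Demo on temporary files.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "in.txt")
dst = os.path.join(tmpdir, "out.txt")
with open(src, "w", encoding="utf-16", newline="") as f:
    f.write("héllo\n")

convert_to_utf8(src, dst, "utf-16")
with open(dst, "rb") as f:
    converted = f.read()

shutil.rmtree(tmpdir)
print(converted)
```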
