UnicodeEncodeError: 'decimal' codec can't encode character u'\x00' in position 8: invalid decimal Unicode string

This line is giving me a UnicodeEncodeError:

studentID = int(studentID.unicode_markup.encode('utf-8').decode('utf-8', 'ignore'))

Specifically, the error is:

UnicodeEncodeError: 'decimal' codec can't encode character u'\x00' in position 8: invalid decimal Unicode string

If I change the line to this:

studentID = int(studentID.unicode_markup.encode('utf-8'))

I get this error:

ValueError: invalid literal for int() with base 10: '\xc2\xa0\xc2\xa0100\xc2\xa0\xc2\xa0'

I have tried specifying a different encoding (like 'ascii') but it still gives me the same error.

Help is greatly appreciated.

1 Solution

#1

You have some non-visible characters in your string before and after the 100: each '\xc2\xa0' byte pair in the error message is a UTF-8-encoded non-breaking space (U+00A0). The int function therefore fails because it can't convert this string into an int.
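
To see what those characters are, the bytes from the ValueError message can be decoded and inspected one by one. This is only an illustrative sketch written for Python 3 (the question appears to use Python 2); the names raw and text are made up for the example.

```python
import unicodedata

# Bytes copied from the ValueError message in the question.
raw = b'\xc2\xa0\xc2\xa0100\xc2\xa0\xc2\xa0'
text = raw.decode('utf-8')

# Print each character with its official Unicode name.
for ch in text:
    print(repr(ch), unicodedata.name(ch, 'UNKNOWN'))
```

The first and last two characters are reported as NO-BREAK SPACE, which is why the string looks like a plain "100" when printed.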

Try the following approach to parse out any numbers before attempting to convert to int:

import re

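# Find the first run of digits in the string.
m = re.search(r'\d+', studentID.unicode_markup)
if m is None:
    raise ValueError('no digits found in student ID')
numeric = m.group()  # the matched digit string, e.g. u'100'
studentID = int(numeric)  # 100 for the string in the question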
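
If the same extraction is needed in several places, it could be wrapped in a small helper. This is a sketch under assumptions, not part of the original answer: parse_student_id is a hypothetical name, the code is written for Python 3, and it assumes the ID is the first run of ASCII digits in the text.

```python
import re


def parse_student_id(text):
    """Return the first run of ASCII digits in text as an int, or None if there is none."""
    match = re.search(r'[0-9]+', text)
    return int(match.group()) if match else None


# Non-breaking spaces and a NUL character around the digits are ignored.
print(parse_student_id('\xa0\xa0100\xa0\xa0\x00'))  # prints 100
print(parse_student_id('n/a'))                      # prints None
```

Returning None instead of raising leaves it to the caller to decide how to handle a cell that contains no number.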
