This line is giving me a UnicodeEncodeError:
studentID = int(studentID.unicode_markup.encode('utf-8').decode('utf-8', 'ignore'))
Specifically, the error is: UnicodeEncodeError: 'decimal' codec can't encode character u'\x00' in position 8: invalid decimal Unicode string
If I change the line to this:
studentID = int(studentID.unicode_markup.encode('utf-8'))
I get this error:
ValueError: invalid literal for int() with base 10: '\xc2\xa0\xc2\xa0100\xc2\xa0\xc2\xa0'
I have tried specifying a different encoding (such as 'ascii'), but it still gives me the same error.
Help is greatly appreciated.
1 answer
#1
Your string has invisible characters before and after the 100: non-breaking spaces (U+00A0), which UTF-8 encodes as the byte pair \xc2\xa0. That is why the int function fails: it can't convert a string containing those bytes into an int.
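To confirm what those invisible characters are, here is a small check. It is written for Python 3 (the question's code looks like Python 2, so the b'...' bytes literal is an assumption of this sketch) and decodes the bytes shown in the ValueError message:

```python
import unicodedata

# The bytes from the ValueError message in the question.
raw = b'\xc2\xa0\xc2\xa0100\xc2\xa0\xc2\xa0'

text = raw.decode('utf-8')          # back to a Unicode string
print(repr(text))                    # '\xa0\xa0100\xa0\xa0'
print(unicodedata.name(text[0]))     # NO-BREAK SPACE
print(text[0].isspace())             # True: U+00A0 counts as whitespace
```

Because U+00A0 counts as whitespace once the text is decoded, working on the Unicode string rather than its UTF-8 bytes lets the usual whitespace handling apply.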
Try the following approach to extract the number before converting it to an int:
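import re
# Find the first run of consecutive digits in the string.
m = re.search(r'\d+', studentID.unicode_markup)
if m is None:
    raise ValueError('no digits found in %r' % (studentID.unicode_markup,))
numeric = m.group()  # the matched digit string, e.g. '100'
studentID = int(numeric)  # 100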
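If the value is normally just a number padded with whitespace, another option is to strip the decoded text first and only fall back to the regex when that is not enough. This is a minimal Python 3 sketch; parse_student_id is a hypothetical helper name, not part of any library:

```python
import re


def parse_student_id(text):
    """Hypothetical helper: convert a scraped, whitespace-padded string to an int."""
    # str.strip() with no arguments removes Unicode whitespace, including U+00A0.
    stripped = text.strip()
    if stripped.isdecimal():
        return int(stripped)
    # Fall back to the first run of ASCII digits anywhere in the string.
    m = re.search(r'[0-9]+', text)
    if m is None:
        raise ValueError('no digits found in %r' % (text,))
    return int(m.group())


print(parse_student_id('\xa0\xa0100\xa0\xa0'))  # 100
print(parse_student_id('ID: 42 '))              # 42
```

Stripping first keeps a strict path for clean values. The regex fallback silently ignores anything after the first run of digits, so whether that is acceptable depends on your data.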