Encoding and decoding problems encountered while writing a crawler

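While learning to write crawlers recently, I tried to scrape data from a website and hit the error below. It turned out to be an encoding problem:
TypeError: cannot use a string pattern on a bytes-like object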

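The TypeError means a str regex pattern was applied to bytes: in Python 3, urlopen() returns a response object whose read() gives bytes, not str.
Calling decode('utf-8') on the html converts it from bytes to str, after which a str pattern can be used.
That only works when the page really is UTF-8. If the server (or a proxy server in front of it) sends GBK, the bytes have to be decoded with decode('gbk') instead.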
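The bytes/str mismatch can be reproduced without any network access. The sketch below is only an illustration: `raw_html` is a made-up stand-in for what `urlopen(...).read()` would return, and `find_title` is a hypothetical helper, not code from the original crawler.

```python
import re

# Stand-in for the bytes that urlopen(...).read() would return (made-up data).
raw_html = "<title>Example page</title>".encode("utf-8")


def find_title(html):
    """Return the text inside <title>...</title>, or None if absent."""
    match = re.search(r"<title>(.*?)</title>", html)  # str pattern
    return match.group(1) if match else None


# Applying the str pattern to bytes raises the TypeError from this article.
try:
    find_title(raw_html)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Decoding first turns bytes into str, so the same pattern now works.
title = find_title(raw_html.decode("utf-8"))
```

An alternative is to keep the data as bytes and use a bytes pattern such as `rb"<title>(.*?)</title>"`, but decoding once up front is usually easier when the text is processed further.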

So I tried decoding the html with decode('utf-8'), which raised: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd7 in position 263: invalid continuation byte
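UTF-8 cannot decode this html, so UTF-8 is the wrong codec here. Switching to GBK decoded it successfully, which indicates that the html returned by this server is GBK-encoded.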
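The trial-and-error above (try UTF-8, fall back to GBK) can be wrapped in a small helper. This is a minimal sketch under assumptions: `decode_html` and its candidate list are hypothetical names chosen for illustration, and the sample bytes are generated locally rather than fetched from the site in question.

```python
def decode_html(raw, encodings=("utf-8", "gbk")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes, try the next one
    raise ValueError("none of the candidate encodings could decode the data")


# Locally generated sample: Chinese text encoded as GBK is not valid UTF-8.
gbk_sample = "<title>中文网页</title>".encode("gbk")
gbk_text, gbk_used = decode_html(gbk_sample)

# The same helper leaves genuine UTF-8 data on the first candidate.
utf8_sample = "<title>中文网页</title>".encode("utf-8")
utf8_text, utf8_used = decode_html(utf8_sample)
```

UTF-8 is tried first because it is strict and tends to reject text in other encodings, whereas GBK will often accept UTF-8 bytes and silently produce garbled characters. When the server sends a charset in its Content-Type header or the page declares one in a meta tag, using that declared value is generally more reliable than guessing.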
