tree = etree.parse(page_text,parser=parser)
File "src\lxml\", line 3521, in
File "src\lxml\", line 1859, in ._parseDocument
File "src\lxml\", line 1885, in ._parseDocumentFromURL
File "src\lxml\", line 1789, in ._parseDocFromFile
File "src\lxml\", line 1177, in ._BaseParser._parseDocFromFile
File "src\lxml\", line 615, in ._ParserContext._handleParseResultDoc
File "src\lxml\", line 725, in ._handleParseResult
File "src\lxml\", line 652, in ._raiseParseError
OSError: Error reading file
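Cause
Calling etree.parse(page_text) directly on HTML fetched from the web, rather than on a file, raises this error. etree.parse() expects a file name, URL, or file-like object, so it treats the string as a path to open instead of as markup to parse.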
tree = etree.parse(page_text,parser=parser)
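The failure can be reproduced with a small sketch. The page_text value here is a made-up stand-in for a scraped page, not the original data, and the exact error message may differ between lxml versions.

```python
from lxml import etree

# Hypothetical stand-in for HTML fetched from the web
page_text = "<html><body><ul class='house-list-wrap'><li>House A</li></ul></body></html>"

parser = etree.HTMLParser()
error = None
try:
    # etree.parse() interprets the string as a file name and tries to open it
    etree.parse(page_text, parser=parser)
except OSError as exc:
    error = exc

print(type(error).__name__, error)
```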
Solution:
First parse the fetched HTML with etree.HTML(page_text), then call xpath() on the result to extract the data.
html = etree.HTML(page_text)
content_list = html.xpath("//ul[@class='house-list-wrap']/li")
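A self-contained sketch of the fix, assuming a small inline page_text in place of the real scraped page (the list items are invented for illustration). It also shows an alternative that is not in the original note: etree.parse() does accept the string once it is wrapped in a file-like object such as io.StringIO.

```python
from io import StringIO
from lxml import etree

# Hypothetical stand-in for HTML fetched from the web
page_text = """
<html><body>
<ul class="house-list-wrap">
  <li>House A</li>
  <li>House B</li>
</ul>
</body></html>
"""

# Option 1: etree.HTML() parses an HTML string directly
html = etree.HTML(page_text)
content_list = html.xpath("//ul[@class='house-list-wrap']/li")
titles = [li.text for li in content_list]

# Option 2: etree.parse() works when given a file-like object instead of a raw string
parser = etree.HTMLParser()
tree = etree.parse(StringIO(page_text), parser=parser)
titles_from_parse = [li.text for li in tree.xpath("//ul[@class='house-list-wrap']/li")]

print(titles)
print(titles_from_parse)
```

Option 1 is the shorter choice when the HTML is already in memory as a string. Option 2 may be handy if the rest of the code expects the tree object that etree.parse() returns.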