I am trying to create a web page scraper and I want to use BeautifulSoup to do so. I installed BeautifulSoup 4.3.2, as the website said it was compatible with Python 3.x. I used
pip install beautifulsoup4
to install it. But when I run
from bs4 import BeautifulSoup
import requests
url = input("Enter a URL (start with www): ")
link = "http://" + url
data = requests.get(link).content
soup = BeautifulSoup(data)
for link in soup.find_all('a'):
    print(link.get('href'))
I get an error that says
Traceback (most recent call last):
  File "/Users/user/Desktop/project.py", line 1, in <module>
    from bs4 import BeautifulSoup
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/bs4/__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/bs4/builder/__init__.py", line 308, in <module>
    from .. import _htmlparser
ImportError: cannot import name _htmlparser
3 Answers
#1
1
I just installed Python 3.x on my end and tested the latest download of BS4. It didn't work. However, a fix can be found here: https://github.com/il-vladislav/BeautifulSoup4 (credit to GitHub user Il Vladislav, whoever you are).
Download the zip, overwrite the bs4 folder inside your BeautifulSoup download, then reinstall it via python setup.py install. It now works on my end, as you can see in the screenshot below, where an error is evident before it works completely.
Code:
from bs4 import BeautifulSoup
import requests
url = input("Enter a URL (start with www): ")
link = "http://" + url
data = requests.get(link).content
soup = BeautifulSoup(data)
for link in soup.find_all('a'):
    print(link.get('href'))
Screenshot: [image not available]
Relevant SO topic found here, showing that BS4 is not totally compatible with Python 3.x yet (even after 2 years).
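After reinstalling, it can help to confirm which copy of bs4 the interpreter actually picks up, since an old copy left in site-packages would still produce the same error. The check below is only a rough sketch: `describe_bs4` is a made-up helper name, and it reads `__version__` defensively in case the installed package does not define it.

```python
import sys


def describe_bs4():
    """Return a one-line summary of the bs4 import, whether it works or not."""
    try:
        import bs4
    except Exception as exc:  # ImportError, or SyntaxError from unconverted sources
        return "import failed: %s: %s" % (type(exc).__name__, exc)
    version = getattr(bs4, "__version__", "unknown")
    location = getattr(bs4, "__file__", "unknown location")
    return "bs4 %s loaded from %s" % (version, location)


if __name__ == "__main__":
    # Run this with the same interpreter that runs the scraper.
    print(sys.version)
    print(describe_bs4())
```

If the reported path is not the one you just installed into, the interpreter is probably loading a different copy than the one you patched.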
#2
1
I think there might be an error in the source file, specifically here:
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/bs4/builder/__init__.py", line 308, in <module>
    from .. import _htmlparser
In my installation, line 308 of bs4/builder/__init__.py is
from . import _htmlparser
You could probably just fix it there and see whether bs4 then imports successfully. I'm not sure which version of bs4 you have installed, but mine is 4.3.2, and _htmlparser.py is also in bs4/builder.
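To compare your copy against mine without opening the file by hand, something like the following might work. It is a sketch, not a tested fix: `locate_builder_init` and `find_lines` are names invented here, and it assumes a Python recent enough to have `importlib.util.find_spec` (3.4 or later). It locates the package without importing it, so it still runs when `import bs4` fails.

```python
import importlib.util
import os


def locate_builder_init(package="bs4"):
    """Return the path to <package>/builder/__init__.py, or None if not found.

    find_spec() on a top-level name locates the package without importing it.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    for pkg_dir in spec.submodule_search_locations:
        candidate = os.path.join(pkg_dir, "builder", "__init__.py")
        if os.path.isfile(candidate):
            return candidate
    return None


def find_lines(path, needle="_htmlparser"):
    """Return (line_number, stripped_text) for every line containing needle."""
    matches = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, text in enumerate(handle, start=1):
            if needle in text:
                matches.append((number, text.strip()))
    return matches


if __name__ == "__main__":
    init_path = locate_builder_init()
    if init_path is None:
        print("bs4/builder/__init__.py not found for this interpreter")
    else:
        print(init_path)
        for number, text in find_lines(init_path):
            print("%d: %s" % (number, text))
```

If the output shows `from .. import _htmlparser` where my copy has `from . import _htmlparser`, that is the line to look at.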
#3
0
I just edited bs4/builder/_htmlparser.py so that:
A) HTMLParseError is no longer imported:
from html.parser import HTMLParser
B) The HTMLParseError class is defined in the file itself:
class HTMLParseError(Exception):
    """Exception raised for all parse errors."""

    def __init__(self, msg, position=(None, None)):
        assert msg
        self.msg = msg
        self.lineno = position[0]
        self.offset = position[1]

    def __str__(self):
        result = self.msg
        if self.lineno is not None:
            result = result + ", at line %d" % self.lineno
        if self.offset is not None:
            result = result + ", column %d" % (self.offset + 1)
        return result
This probably isn't the best approach, since HTMLParseError is never going to be raised by the parser. But your exception would just be uncaught and unhandled anyway.
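A slightly gentler variant of the same edit would be to keep the import when it works and only fall back to the local class when it does not. This is a sketch of that idea rather than what I actually have in my file; it relies on the fact that `HTMLParseError` was removed from `html.parser` in Python 3.5, so the `except` branch is the one that runs on newer interpreters.

```python
from html.parser import HTMLParser

try:
    # Available in older Python 3 releases; removed in Python 3.5.
    from html.parser import HTMLParseError
except ImportError:
    class HTMLParseError(Exception):
        """Fallback with the same message format as the class defined above."""

        def __init__(self, msg, position=(None, None)):
            assert msg
            self.msg = msg
            self.lineno = position[0]
            self.offset = position[1]

        def __str__(self):
            result = self.msg
            if self.lineno is not None:
                result = result + ", at line %d" % self.lineno
            if self.offset is not None:
                result = result + ", column %d" % (self.offset + 1)
            return result
```

Either way, code that does `except HTMLParseError` keeps working, and the name resolves to the real class wherever the standard library still provides it.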