There are some questions like this online, but none of them have helped me. I am currently working on a script that pulls an item name from http://www.supremenewyork.com/shop/all/accessories.
I want it to pull this information from Supreme UK, but I'm having trouble with the proxy part. Right now, though, I'm struggling with the script itself: every time I run it I get the error listed in the title.
Here is my script:
import requests
from bs4 import BeautifulSoup

URL = ('http://www.supremenewyork.com/shop/all/accessories')

proxy_script = requests.get(URL).text
soup = BeautifulSoup(proxy_script, 'lxml')

for item in soup.find_all('div', class_='inner-article'):
    name = soup.find('h1', itemprop='name').text
    print(name)
I always get this error, and when I run the script without the .text at the end of the itemprop='name' line, I just get a bunch of Nones, like this:
None
None
None etc......
There are exactly as many Nones as there are items available to print.
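For reference, the variant without .text looks roughly like this (only that one line is changed, the rest of the script stays the same):

for item in soup.find_all('div', class_='inner-article'):
    # without .text this prints whatever find() returned; here it comes out as None every time
    name = soup.find('h1', itemprop='name')
    print(name)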
1 Solution
#1
Here we go, I've commented the code I used below. The reason we write class_='something' is that the word class is a reserved keyword in Python.
import requests
from bs4 import BeautifulSoup

URL = ('http://www.supremenewyork.com/shop/all/accessories')
#UK_Proxy1 = '178.62.13.163:8080'
#proxies = {
#    'http': 'http://' + UK_Proxy1,
#    'https': 'https://' + UK_Proxy1
#}
#proxy_script = requests.get(URL, proxies=proxies).text
proxy_script = requests.get(URL).text
soup = BeautifulSoup(proxy_script, 'lxml')
thetable = soup.find('div', class_='turbolink_scroller')
items = thetable.find_all('div', class_='inner-article')
for item in items:
    only_text = item.h1.a.text
    # by doing .<tag> we extract information just from that tag
    # example: bsobject = <html><body><b>ey</b></body></html>
    # if we print bsobject.body.b it will return `<b>ey</b>`
    color = item.p.a.text
    print(only_text, color)
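If you want the UK proxy back in, the commented-out lines can be re-enabled roughly like this. This is only a minimal sketch, assuming the proxy at 178.62.13.163:8080 is still reachable; any other working HTTP proxy address would do:

import requests
from bs4 import BeautifulSoup

URL = ('http://www.supremenewyork.com/shop/all/accessories')

UK_Proxy1 = '178.62.13.163:8080'
# requests accepts a proxies dict mapping scheme -> proxy URL
proxies = {
    'http': 'http://' + UK_Proxy1,
    'https': 'https://' + UK_Proxy1
}

proxy_script = requests.get(URL, proxies=proxies, timeout=10).text
soup = BeautifulSoup(proxy_script, 'lxml')

If the proxy is dead, requests will raise a ProxyError or time out, so it may be worth wrapping the get() call in a try/except while you test different proxies.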