I am trying to parse a list of urls from a webpage. I did the following things:
- Got a list of all "a" tags.
- Used a for loop to call get("href") on each tag.
- While looping, I kept assigning each value returned by get to a new, initially empty list called links.
But I kept getting an index out of range error. I thought it might be because of the way I was incrementing the index of links, but I am sure that is not the case. This is the error-prone code:
import urllib
import bs4
url = "http://tellerprimer.ucdavis.edu/pdf/"
response = urllib.urlopen(url)
webpage = response.read()
soup = bs4.BeautifulSoup(webpage, 'html.parser')
i = 0
links = []
for tags in soup.find_all('a'):
    links[i] = str(tags.get('href'))
    i += 1
print i, links
Giving links a fixed length made the error go away, like so:
links = [0]*89 #89 is the length of soup.find_all('a')
I want to know what was causing this problem.
2 Answers
#1
4
You are attempting to assign something to a non-existent index. When you create links, you create it as an empty list.
Then you do links[i], but links is empty, so there is no i-th index.
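To see the failure in isolation, here is a minimal sketch with no web page involved. It uses Python 3 syntax, and the file name "a.pdf" and the length 3 are made-up values for illustration only. It shows that assigning to an index of an empty list raises IndexError, while the same assignment works once the slots already exist, which is why the fixed-length workaround ran.

```python
links = []
try:
    links[0] = "a.pdf"  # the list has no slot 0 yet, so this raises
except IndexError as exc:
    error_message = str(exc)

# Preallocating creates the slots up front, so index assignment succeeds.
preallocated = [0] * 3
preallocated[0] = "a.pdf"

print(error_message)
print(preallocated)
```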
The proper way to do this is:
links.append(str(tags.get('href')))
This also means that you can eliminate your i variable. It's not needed.
for tags in soup.find_all('a'):
    links.append(str(tags.get('href')))
print links
This will print all 89 links in your links list.
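One detail worth checking with this approach: get('href') returns None for an "a" tag that has no href attribute, and str(None) then stores the text 'None' in the list. The sketch below is one possible way to skip such tags. It is written in Python 3, the helper name collect_links is hypothetical, and plain dicts stand in for bs4 tags (both expose .get) so that it runs without fetching the page.

```python
def collect_links(tags):
    """Return the href of every tag that has one, in the order given."""
    links = []
    for tag in tags:
        href = tag.get("href")  # None when the attribute is absent
        if href is not None:
            links.append(str(href))
    return links


# Stand-in data: dicts instead of bs4 tags, with invented file names.
sample_tags = [{"href": "ch1.pdf"}, {"name": "top"}, {"href": "ch2.pdf"}]
links = collect_links(sample_tags)
print(len(links), links)
```

With a real page, the result of soup.find_all('a') could be passed in place of sample_tags.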
#2
1
The list is initially empty, so you're trying to assign values to non-existing index locations in the list.
Use append() to add items to a list:
links = []
for tags in soup.find_all('a'):
    links.append(str(tags.get('href')))
Or use map() instead:
links = map(lambda tags: str(tags.get('href')), soup.find_all('a'))
Or use a list comprehension:
links = [str(tags.get('href')) for tags in soup.find_all('a')]
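The three variants should build the same list. The following sketch compares them on stand-in data (plain dicts with invented file names instead of bs4 tags). It assumes Python 3, where map() returns a lazy iterator rather than a list, so the result is wrapped in list() here. In Python 2, which the code above uses, map() already returns a list.

```python
sample_tags = [{"href": "ch1.pdf"}, {"href": "ch2.pdf"}]

# Variant 1: explicit loop with append()
appended = []
for tag in sample_tags:
    appended.append(str(tag.get("href")))

# Variant 2: map(), materialized with list() for Python 3
mapped = list(map(lambda tag: str(tag.get("href")), sample_tags))

# Variant 3: list comprehension
comprehended = [str(tag.get("href")) for tag in sample_tags]

print(appended == mapped == comprehended)
```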