Why do I need to specify the size of this list to avoid a list index out of range error?

I am trying to parse a list of urls from a webpage. I did the following things:

Got a list of all "a" tags.
Used a for loop to call get("href") on each tag.
While looping, I kept assigning the returned value to a new empty list called links.

But I kept getting an index out of range error. I thought it might be because of the way I was incrementing the index of links, but I am sure that is not the case. This is the code that raises the error:

import urllib
import bs4
url = "http://tellerprimer.ucdavis.edu/pdf/"
response = urllib.urlopen(url)
webpage = response.read()
soup = bs4.BeautifulSoup(webpage, 'html.parser')
i = 0
links = []

for tags in soup.find_all('a'):
    links[i] = str(tags.get('href'))
    i +=1
print i, links

I gave links a fixed length and that fixed it, like so:

links = [0]*89 #89 is the length of soup.find_all('a')

I want to know what was causing this problem.

2 Answers

#1

You are attempting to assign something to a non-existent index. When you create links, you create it as an empty list.

Then you do links[i], but links is empty, so there is no ith index.
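
A minimal sketch of the difference, using made-up strings in place of the real href values. Assigning by index can only replace an element that already exists, which is also why pre-sizing the list with [0]*89 made the error go away:

```python
# Hypothetical sample values standing in for the href strings.
hrefs = ["a.pdf", "b.pdf", "c.pdf"]

links = []
try:
    links[0] = hrefs[0]       # the empty list has no index 0 yet
except IndexError as exc:
    error_message = str(exc)  # the message Python attaches to the error

# Pre-sizing creates the slots, so assignment by index works.
presized = [0] * len(hrefs)
for i, href in enumerate(hrefs):
    presized[i] = href

# append() grows the list one element at a time; no index is needed.
appended = []
for href in hrefs:
    appended.append(href)
```

Both presized and appended end up with the same contents, but append() does not require knowing the number of tags in advance.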

The proper way to do this is:

links.append(str(tags.get('href')))

This also means that you can eliminate your i variable. It's not needed.

for tags in soup.find_all('a'):
    links.append(str(tags.get('href')))
print links

This will print all 89 links in your links list.

#2

The list is initially empty, so you're trying to assign values to non-existing index locations in the list.

Use append() to add items to a list:

links = []

for tags in soup.find_all('a'):
    links.append(str(tags.get('href')))

Or use map() instead:

links = map(lambda tags: str(tags.get('href')), soup.find_all('a'))
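
Note that the code in this thread is Python 2, where map() returns a list. On Python 3, map() returns a lazy iterator instead, so it has to be wrapped in list() to get the same result. A small sketch, using dicts as hypothetical stand-ins for the tags (a dict supports the same .get('href') call):

```python
# Hypothetical stand-ins for the Tag objects from soup.find_all('a');
# dicts are used only because they also support .get('href').
tags = [{"href": "a.pdf"}, {"href": "b.pdf"}]

lazy = map(lambda tag: str(tag.get("href")), tags)
is_list = isinstance(lazy, list)  # False on Python 3, True on Python 2

links = list(lazy)                # materialize the results into a list
```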

Or use a list comprehension:

links = [str(tags.get('href')) for tags in soup.find_all('a')]
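
One caveat that applies to all three versions, sketched here under the assumption that some "a" tags on the page have no href attribute: get('href') then returns None, and str(None) puts the text 'None' into the list. The dicts below are again hypothetical stand-ins for the tags.

```python
# The middle entry mimics an <a> tag that has no href attribute.
tags = [{"href": "a.pdf"}, {"name": "top"}, {"href": "b.pdf"}]

# As written in the answers: the tag without href becomes the string 'None'.
links = [str(tag.get("href")) for tag in tags]

# Keeping only the tags that actually carry an href.
links_with_href = [
    str(tag.get("href")) for tag in tags if tag.get("href") is not None
]
```

With BeautifulSoup itself, soup.find_all('a', href=True) should select only the tags that have the attribute, which would make the extra condition unnecessary.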
