Why do I need to specify the size of this list to avoid a "list index out of range" error?

Posted: 2021-01-02 16:38:38

I am trying to parse a list of urls from a webpage. I did the following things:

  1. Got a list of all "a" tags.
  2. Used a for loop to call get("href") on each tag.
  3. While looping, I kept assigning each returned value to a new empty list called links.

But I kept getting an "index out of range" error. I thought it might be because of the way I was incrementing the index of links, but I am sure that is not the case. This is the error-prone code:

import urllib
import bs4
url = "http://tellerprimer.ucdavis.edu/pdf/"
response = urllib.urlopen(url)
webpage = response.read()
soup = bs4.BeautifulSoup(webpage, 'html.parser')
i = 0
links = []

for tags in soup.find_all('a'):
    links[i] = str(tags.get('href'))
    i +=1
print i, links

I gave links a fixed length and it fixed it, like so:

links = [0] * 89  # 89 is the length of soup.find_all('a')

I want to know what was causing this problem.

2 Answers

#1


4  

You are attempting to assign something to a non-existent index. When you create links, you create it as an empty list.

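Then you do links[i] = ..., but links is empty, so there is no index i to assign to and Python raises an IndexError. Assigning by index can only replace an element that already exists; it never grows the list.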
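
To see the failure in isolation, here is a minimal sketch. It assumes Python 3, and the hrefs list is a made-up stand-in for the values pulled from the page, not something from the original script:

```python
# Hypothetical stand-in for the href values found on the page.
hrefs = ["a.pdf", "b.pdf", "c.pdf"]

links = []
try:
    links[0] = hrefs[0]  # an empty list has no slot 0 to overwrite
except IndexError as exc:
    error_message = str(exc)

# Pre-sizing creates the slots first, which is why [0]*89 made the error go away.
presized = [0] * len(hrefs)
for i, href in enumerate(hrefs):
    presized[i] = href  # every index now exists, so assignment works
```

Pre-sizing works, but it forces you to know the number of tags before the loop starts.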

The proper way to do this is:

links.append(str(tags.get('href')))

This also means that you can eliminate your i variable. It's not needed.

for tags in soup.find_all('a'):
    links.append(str(tags.get('href')))
print(links)

This will print all 89 links in your links list.
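
If you want to try the append pattern without fetching the page, something like the sketch below should do. Note that it swaps BeautifulSoup for the standard library's html.parser so that it runs with no extra installs; SAMPLE_HTML and the LinkCollector class are made up for illustration, and it assumes Python 3:

```python
from html.parser import HTMLParser

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = (
    '<a href="one.pdf">One</a>'
    '<a href="two.pdf">Two</a>'
    '<a name="top">No href here</a>'
)


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, growing the list with append()."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # dict(attrs).get mirrors tags.get('href'): None when href is absent,
            # which str() then turns into the string 'None'.
            self.links.append(str(dict(attrs).get("href")))


collector = LinkCollector()
collector.feed(SAMPLE_HTML)
links = collector.links
count = len(links)  # replaces the manual i counter
print(count, links)
```

Since len(links) gives the count, there is nothing left for i to do. The third entry also shows that wrapping get('href') in str() stores the text 'None' for anchors that have no href, so you may want to filter those out.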

#2


1  

The list is initially empty, so you're trying to assign values to non-existing index locations in the list.

Use append() to add items to a list:

links = []

for tags in soup.find_all('a'):
    links.append(str(tags.get('href')))

Or use map() instead:

links = map(lambda tags: str(tags.get('href')), soup.find_all('a'))

Or use a list comprehension:

links = [str(tags.get('href')) for tags in soup.find_all('a')]
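
One thing to watch if you run this on Python 3 (the question's code is Python 2): map() returns a lazy iterator there rather than a list, so wrap it in list() if you need indexing or len(). A rough comparison of the three options, using plain dicts as hypothetical stand-ins for the tags (dict.get behaves like tags.get for this purpose):

```python
# Hypothetical stand-ins for the <a> tags returned by soup.find_all('a').
tags_found = [{"href": "one.pdf"}, {"href": "two.pdf"}]

# Option 1: append inside a loop.
appended = []
for tags in tags_found:
    appended.append(str(tags.get("href")))

# Option 2: map(); on Python 3 this is an iterator until list() consumes it.
mapped = map(lambda tags: str(tags.get("href")), tags_found)
mapped_is_list = isinstance(mapped, list)
mapped_list = list(mapped)

# Option 3: list comprehension.
comprehended = [str(tags.get("href")) for tags in tags_found]
```

All three end up with the same list, so the choice is mostly about readability.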
