I'm looking for a way to use findAll to get two tags, in the order they appear on the page.
Currently I have:
import requests
from bs4 import BeautifulSoup

def get_soup(url):
    request = requests.get(url)
    page = request.text
    soup = BeautifulSoup(page, 'html.parser')
    get_tags = soup.findAll('hr' and 'strong')
    for each in get_tags:
        print(each)
If I use that on a page with only 'em' or 'strong' in it, it gets me all of those tags. If I use it on a page with both, it only gets the 'strong' tags.
Is there a way to do this? My main concern is preserving the order in which the tags are found.
2 solutions
#1
55
You could pass a list to find either hr or strong tags:
tags = soup.find_all(['hr', 'strong'])
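For context on why the original call only returned strong tags: in Python, 'hr' and 'strong' evaluates to 'strong', so findAll only ever received one tag name. Here is a minimal sketch of the list form, assuming bs4 with the built-in html.parser; the HTML snippet is made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = "<p><strong>first</strong></p><hr/><p><strong>second</strong></p>"
soup = BeautifulSoup(html, "html.parser")

# `and` returns its last operand when both operands are truthy,
# so the original call only ever searched for 'strong'.
assert ('hr' and 'strong') == 'strong'

# A list matches any of the named tags, in document order.
tags = soup.find_all(['hr', 'strong'])
names = [tag.name for tag in tags]
print(names)  # ['strong', 'hr', 'strong']
```

The results come back in the order the tags appear in the document, which is what the question asks for.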
#2
4
Use regular expressions:
import re
get_tags = soup.findAll(re.compile(r'(hr|strong)'))
The expression r'(hr|strong)' will find either hr tags or strong tags.
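One caveat worth checking: as far as I know, Beautiful Soup tests a compiled pattern against tag names with search(), so an unanchored pattern can also match longer tag names that merely contain hr. A small sketch, assuming bs4 with html.parser; the thread tag is a made-up custom tag for illustration:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; <thread> is an invented tag whose name contains "hr".
html = "<thread>t</thread><strong>a</strong><hr/><strong>b</strong>"
soup = BeautifulSoup(html, "html.parser")

# Unanchored pattern: matches any tag name containing "hr" or "strong".
loose = [t.name for t in soup.find_all(re.compile(r'(hr|strong)'))]

# Anchored pattern: matches only tags named exactly "hr" or "strong".
strict = [t.name for t in soup.find_all(re.compile(r'^(hr|strong)$'))]

print(loose)
print(strict)
```

Anchoring with ^ and $ keeps the match to exactly the two tag names, and the order of results still follows the document.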