I am working on an application that scrapes information from a number of different sites. To find the deep link for the desired topic on a site (e.g. "Forum"), I rely on the sitemap the site provides. As I expand, I have come across some sites that don't provide a sitemap, so I was wondering whether there is any way to generate one within Rails, starting from the top-level domain?
I am using Nokogiri and Mechanize to retrieve data, so a solution that builds on either of them would be easier to integrate.
1 solution
#1
0
This can be done with the Spidr gem like so:
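require 'spidr'

# Maps each destination URL to the list of pages that link to it.
url_map = Hash.new { |hash, key| hash[key] = [] }

# Crawl the pages on this host, recording each link as it is found.
Spidr.site('http://intranet.com/') do |spider|
  spider.every_link do |origin, dest|
    url_map[dest] << origin
  end
end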
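Once the crawl has filled url_map, its keys are the discovered URLs, and they can be written out in the sitemaps.org XML format. The following is only a minimal sketch under a few assumptions: build_sitemap and xml_escape are hypothetical helper names (they are not part of Spidr), the url_map shown is a hand-written stand-in for real crawl output, and only URLs on the given host are kept.

```ruby
require 'uri'

# Replacement table for the five XML special characters.
XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;'
}.freeze

# Hypothetical helper: escapes text for use inside an XML element.
def xml_escape(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: builds a sitemaps.org-style document from a list
# of URLs, keeping only those whose host matches `host`.
def build_sitemap(urls, host:)
  same_host = urls.map(&:to_s).uniq.sort.select do |url|
    begin
      URI.parse(url).host == host
    rescue URI::InvalidURIError
      false # skip anything that does not parse as a URI
    end
  end

  entries = same_host.map { |url| "  <url><loc>#{xml_escape(url)}</loc></url>" }

  header = ['<?xml version="1.0" encoding="UTF-8"?>',
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  (header + entries + ['</urlset>']).join("\n")
end

# Stand-in for the url_map filled by the crawl (destination => linking pages).
url_map = {
  'http://intranet.com/forum?page=2&sort=new' => ['http://intranet.com/'],
  'http://intranet.com/forum' => ['http://intranet.com/'],
  'http://example.org/elsewhere' => ['http://intranet.com/forum']
}

sitemap = build_sitemap(url_map.keys, host: 'intranet.com')
puts sitemap
```

With a real crawl, url_map.keys would hold URI objects rather than strings, which is why the helper calls to_s on each entry. To find a topic such as "Forum", the same key list could be filtered by path before or instead of writing the XML.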