I have a client request on one of my projects: they want to be able to enter a URL, have the app pull some information from the site at that URL, and save it in the database.
So the user enters http://www.example.com/2342342, and my controller visits that site, gets the content of the first <h1> tag on the page, and saves it in the database. Is this possible? If so, how would I go about doing it? Would I use some Rails commands, or something else, like jQuery?
#1
Nokogiri is a great HTML parser, and together with Ruby's open-uri library it can parse a page straight from a URL.
So there are two steps:
- Instantiate a Nokogiri document from the page at that URL.
- Parse the HTML page to extract what you need (here, the first <h1>).
Find instructions here: http://nokogiri.org/tutorials/parsing_an_html_xml_document.html
Because you'll be working with another website, keep two pieces of advice in mind:
- Wrap your requests so that you can rescue errors if the website is down.
- Consider using an Ajax request, because fetching the page could take a long time.
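For the first piece of advice, one way to wrap the request is a helper that rescues the usual network errors and returns nil instead of crashing the request. This sketch uses only Ruby's standard library (open-uri and net/http). The names safe_fetch and FETCH_ERRORS, and the timeout values, are hypothetical choices. They are not prescribed by Rails or Nokogiri.

```ruby
require "open-uri"
require "net/http" # defines Net::OpenTimeout / Net::ReadTimeout and loads SocketError
require "timeout"

# Errors that usually mean "the remote site is down or misbehaving".
FETCH_ERRORS = [
  OpenURI::HTTPError, # non-2xx response such as 404 or 500
  SocketError,        # DNS lookup failed
  SystemCallError,    # e.g. Errno::ECONNREFUSED, Errno::ECONNRESET
  Timeout::Error      # parent of Net::OpenTimeout and Net::ReadTimeout
].freeze

# Fetch +url+ and return the response body as a String,
# or nil if the URL is unusable or the site could not be reached.
# The default timeouts (in seconds) are arbitrary example values.
def safe_fetch(url, open_timeout: 5, read_timeout: 10)
  uri = URI.parse(url)
  return nil unless uri.is_a?(URI::HTTP) # reject file paths and other schemes

  uri.read(open_timeout: open_timeout, read_timeout: read_timeout)
rescue URI::InvalidURIError, *FETCH_ERRORS => e
  warn "Could not fetch #{url}: #{e.class}: #{e.message}"
  nil
end
```

A nil result lets the controller show a "could not reach that site" message instead of an error page. Because even a successful fetch can be slow, this is also the call you would move behind an Ajax request, as the second piece of advice suggests.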
#2
I would check out this Railscast:
http://railscasts.com/episodes/190-screen-scraping-with-nokogiri
It explains very well how to use Nokogiri to scrape content from other sites.