Rails 3: pulling data from another site

Time: 2021-10-11 09:46:03

I have a client request on one of my projects: they want to be able to enter a URL, have the app pull in some information from the site at that URL, and save it in the database.

So the user enters http://www.example.com/2342342, and my controller visits that site, gets the content of the first <h1>Tag</h1> on the page, and saves it in the database. Is this possible? If so, how would I go about doing it? Would I use some Rails commands to do it, or something else, like jQuery?

2 answers

#1

Nokogiri is a great parser, and it can work directly with a URL.

So there are two steps:

  1. Instantiate a Nokogiri object with the URL as the parameter

  2. Parse the HTML page to extract what you need

Find instructions here: http://nokogiri.org/tutorials/parsing_an_html_xml_document.html


Because you'll be relying on another website, keep two pieces of advice in mind:

  • wrap your requests so that you can rescue the error if the website is down

  • consider using an Ajax request, because fetching the remote page could take a long time


#2

I would check out the Railscast here:

http://railscasts.com/episodes/190-screen-scraping-with-nokogiri

It explains very well how to use Nokogiri to scrape content from other sites.
