I have a client request on one of my projects: they want to be able to enter a URL and have the app pull in some information from the site whose URL they entered and save it in the database.

So the user enters http://www.example.com/2342342, and my controller visits that page, gets the content of the first <h1>Tag</h1> on it, and saves it in the database. Is this possible? If so, how would I go about doing it? Would I use some Rails commands to do it, or something else, like jQuery?

2 solutions

#1

Nokogiri is a great parser, and together with open-uri it can work directly on a page fetched from a URL.

So there are two steps:

1. Instantiate a Nokogiri document from the page at the URL
2. Parse the HTML page to extract what you need

Find instructions here: http://nokogiri.org/tutorials/parsing_an_html_xml_document.html

Because you'll be working with another website, keep two pieces of advice in mind:

1. Wrap your requests so that you can rescue the error if the website is down
2. Consider using an Ajax request, because fetching the page could take a long time
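
As a rough sketch of the first piece of advice, the fetch can be wrapped so that a dead or slow site yields nil instead of an unhandled exception. The helper name `fetch_html`, the 5-second default timeout, and the list of rescued errors are assumptions made for this example (the list is not exhaustive), and it uses Ruby's standard Net::HTTP rather than anything specific to Rails.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (name and defaults are assumptions): returns the
# response body as a String, or nil when the URL is malformed, the site is
# unreachable, it answers too slowly, or it responds with a non-2xx status.
def fetch_html(url, timeout: 5)
  uri = URI.parse(url)
  # Only http:// and https:// URLs with a host are accepted.
  return nil unless uri.is_a?(URI::HTTP) && uri.host

  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == "https",
                             open_timeout: timeout,
                             read_timeout: timeout) do |http|
    http.get(uri.request_uri)
  end
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
rescue URI::InvalidURIError, SocketError, SystemCallError, Timeout::Error,
       IOError, Net::ProtocolError, Net::HTTPBadResponse
  # Treat bad URLs, DNS failures, refused connections, and timeouts alike.
  nil
end
```

A caller could then write `html = fetch_html(params[:url])` and show a validation error when the result is nil, rather than letting the request fail.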

#2

I would check out the Railscast here:

http://railscasts.com/episodes/190-screen-scraping-with-nokogiri

It explains very well how to use Nokogiri to scrape content from other sites.
