How do I parse an AngularJS page with Nokogiri?

I'm using the Nokogiri gem to parse a web page. However, the site I'm targeting uses AngularJS (I believe), and the document I load into Nokogiri doesn't match what I see when I visit the page in my browser. How can I get the page to load as rendered so I can parse it with CSS selectors?

URL: http://www.ukathletics.com/sport/m-footbl/roster/#/2015/Players/table

Code

require 'open-uri'
require 'nokogiri'
require 'capybara-webkit'
require 'capybara/dsl'
require 'byebug'
require './ncaa_school_sites'
require './functions'

include Capybara::DSL
Capybara.current_driver = :webkit
Capybara::Webkit.configure do |config|
  config.block_unknown_urls
  NcaaSchoolSite.where(code: 'KYUN').order(:code).each do |school|
    config.allow_url("*#{school.website_url}")
  end
end

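# `school` is assumed to be an NcaaSchoolSite record already in scope here.
visit(school.roster_url)
# `body` is the Capybara DSL method that returns the current page's HTML.
doc = Nokogiri::HTML.parse(body)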

byebug
roster_table = doc.css("div.player_table table")
headers      = retrieve_headers(roster_table.css("thead tr"))
process_player_rows(roster_table, headers, school,"tbody td",1)

1 Answer

#1

I'm confused as to why you're using Capybara and then trying to parse the page with Nokogiri. Using Capybara alone, you can do things like:

roster_table = page.find(:css, 'div.player_table table')
headers = roster_table.all(:css, 'thead tr')

etc...
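
To illustrate how the Capybara-only approach might be carried through to the player rows, here is a minimal sketch. It assumes the header cells are th elements inside thead tr and that each player is one tbody tr row; neither is confirmed by the question. roster_records is a hypothetical helper name, not part of Capybara, and it stands in for the asker's retrieve_headers and process_player_rows, whose behavior is not shown.

```ruby
# Turns a rendered roster table into an array of hashes, one per player.
# `table` is assumed to be a Capybara element, e.g. the result of
# page.find(:css, 'div.player_table table'). Any object that responds to
# all(:css, selector) and whose results respond to `text` will work.
def roster_records(table)
  # Assumption: header labels live in <th> cells of the header row.
  headers = table.all(:css, 'thead tr th').map { |cell| cell.text.strip }

  # Assumption: each <tr> in the table body is one player.
  table.all(:css, 'tbody tr').map do |row|
    values = row.all(:css, 'td').map { |cell| cell.text.strip }
    # Pair each header with the cell in the same column position.
    headers.zip(values).to_h
  end
end

# Hypothetical usage with the session set up in the question:
#   visit(school.roster_url)
#   table = page.find(:css, 'div.player_table table')
#   roster_records(table).each { |player| puts player.inspect }
```

Capybara's find retries until the element appears or Capybara.default_max_wait_time elapses, which generally suits pages that build their content with JavaScript after the initial load.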
