I'm using the Nokogiri gem to parse a web page. The site appears to use AngularJS, and the HTML I load into Nokogiri doesn't match what I see when I visit the page in my browser. How can I get the page to load as rendered so that I can parse it with CSS selectors?
URL: http://www.ukathletics.com/sport/m-footbl/roster/#/2015/Players/table
Code
require 'open-uri'
require 'nokogiri'
require 'capybara-webkit'
require 'capybara/dsl'
require 'byebug'
require './ncaa_school_sites'
require './functions'
include Capybara::DSL
Capybara.current_driver = :webkit
Capybara::Webkit.configure do |config|
  config.block_unknown_urls
  NcaaSchoolSite.where(code: 'KYUN').order(:code).each do |school|
    config.allow_url("*#{school.website_url}")
  end
end
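# `school` is only defined inside a block, so this assumes the same query is iterated again here.
NcaaSchoolSite.where(code: 'KYUN').order(:code).each do |school|
  visit(school.roster_url)
  doc = Nokogiri::HTML.parse(body)
  byebug
  roster_table = doc.css("div.player_table table")
  headers = retrieve_headers(roster_table.css("thead tr"))
  process_player_rows(roster_table, headers, school, "tbody td", 1)
end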
1 Answer
#1
I'm confused as to why you're using Capybara and then parsing the page with Nokogiri. With Capybara alone you can do things like:
roster_table = page.find(:css, 'div.player_table table')
headers = roster_table.all(:css, 'thead tr')
etc...