html - Nokogiri screen scrape CSS selector issue
I'm trying to get a CSS selector working in a rake task.
    require "open-uri"

    namespace :task do
      task test: :environment do
        ticketmaster_url = "http://www.ticketmaster.co.uk/derren-brown-miracle-glasgow-04-07-2016/event/370050789149169e?artistid=1408737&majorcatid=10002&minorcatid=53&tpab=-1"
        doc = Nokogiri::HTML(open(ticketmaster_url))
        #psec-p label
        doc.css("#psec-p").each do |price|
          puts price.at_css("#psec-p")
          byebug
        end
      end
    end
However, this is what I get back:
    #<Nokogiri::XML::Element:0x3fd226469e60 name="fieldset" attributes=[#<Nokogiri::XML::Attr:0x3fd2281c953c name="class" value="group-price widget-group">, #<Nokogiri::XML::Attr:0x3fd2281c9528 name="id" value="psec-p">] children=[#<Nokogiri::XML::Text:0x3fd2281c8d44 "\n ">, #<Nokogiri::XML::Element:0x3fd2281c8c7c name="legend" attributes=[#<Nokogiri::XML::Attr:0x3fd2281c8c18 name="id" value="psec-p-legend">] children=[#<Nokogiri::XML::Text:0x3fd2281c8614 "price:">]>, #<Nokogiri::XML::Text:0x3fd2281c8448 "\n ">]>
I'm guessing I have selected the wrong element by choosing #psec-p.
Could someone let me know where I'm going wrong?
I've been following Railscast 190.
The prices on http://www.ticketmaster.co.uk are applied to the HTML dynamically, via JavaScript. This is partially done to hinder scraping efforts. You cannot use Nokogiri to scrape this type of content from the domain, because Nokogiri processes raw HTML/XML and does not execute JavaScript in the process. Other tools exist for this, but they require an entirely different approach.
For learning purposes, you should choose a less dynamic site. For instance, http://www.wallacesuk.com has a nice, parseable site. You can learn basic web scraping techniques on a site that presents its information inline in the page, such as this one.
Scraping http://ticketmaster.co.uk will require advanced scraping techniques, beyond what Railscast 190 is demonstrating.