ruby on rails - Tried web scraping with XPath, Nokogiri, Mechanize
I am trying to parse some information from a secure web site and am having trouble getting it to work.
If I can get the first value, then I can customize it to get the rest ...
This example should return CARRIER, the value of the Entity Type field.
Source:
http://safer.fmcsa.dot.gov/query.asp?searchtype=ANY&query_type=queryCarrierSnapshot&query_param=MC_MX&query_string=733709
Mechanize w/ Hpricot

    require 'rubygems'
    require 'mechanize'
    require 'hpricot'

    agent = Mechanize.new
    page = agent.get('http://safer.fmcsa.dot.gov/query.asp?searchtype=ANY&query_type=queryCarrierSnapshot&query_param=MC_MX&query_string=733709')
    @response = page.content
    doc = Hpricot(@response)
    a = (doc/"/html/body/p/table/tbody/tr[2]/td/table/tbody/tr[2]/td/center[1]/table/tbody/tr[2]/td")[0].innerHTML
    puts a
Nokogiri

    require 'nokogiri'
    require 'open-uri'

    doc = Nokogiri::HTML(open("http://safer.fmcsa.dot.gov/query.asp?searchtype=ANY&query_type=queryCarrierSnapshot&query_param=MC_MX&query_string=733709"))
    ebit = doc.at("/html/body/p/table/tbody/tr[2]/td/table/tbody/tr[2]/td/center[1]/table/tbody/tr[2]/td").text
    puts ebit
The values in that column all have the same CSS class, so it is possible to search using it. This works for me:
    require 'nokogiri'
    require 'open-uri'

    # URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs
    doc = Nokogiri::HTML(URI.open("http://safer.fmcsa.dot.gov/query.asp?searchtype=ANY&query_type=queryCarrierSnapshot&query_param=MC_MX&query_string=733709"))

    # Get the Entity Type field (the first element with the shared CSS class)
    ebit = doc.at('.queryfield').text

    # Get rid of all white space, including non-breaking spaces (U+00A0)
    ebit = ebit.gsub("\u00A0", "").strip
    puts ebit
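
The white-space cleanup step can be tried on its own, without hitting the site. Below is a minimal sketch using only core Ruby. The helper name clean_cell and the sample string are made up for illustration and are not part of the original answer. It shows that String#strip alone leaves non-breaking spaces in place, and why the non-bang gsub is the safer thing to chain.

```ruby
# Hypothetical helper (not from the original answer): tidy the text of a
# scraped table cell.
NBSP = "\u00A0" # non-breaking space, the character &nbsp; decodes to

def clean_cell(text)
  # String#strip does not treat U+00A0 as whitespace, so remove it
  # explicitly first, then trim ordinary spaces, tabs and newlines.
  text.gsub(NBSP, "").strip
end

# Made-up sample resembling a cell padded with &nbsp; entities.
raw = "#{NBSP}CARRIER #{NBSP}"

puts raw.strip == raw   # true: strip alone changes nothing here
puts clean_cell(raw)    # CARRIER

# gsub! returns nil when nothing was replaced, so chaining .strip! onto it
# raises NoMethodError for cells that contain no NBSP at all.
puts "CARRIER".dup.gsub!(NBSP, "").inspect   # nil
```

With a helper like this, the same cleanup could be applied to each value found via the shared CSS class, assuming the other cells are padded the same way.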