python - Access element using XPath?


I want to get the links of the elements in the first column of this page (http://en.wikipedia.org/wiki/list_of_school_districts_in_alabama).

I am comfortable using BeautifulSoup, but it seems less well-suited to this task (I've been trying to access the first child of the contents of each tr, but that hasn't been working well).

The XPaths follow a regular pattern, with the row number updating for each new row in the following expression:

xpath = '//*[@id="mw-content-text"]/table[1]/tbody/tr[' + str(counter) + ']/td[1]/a' 

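Would anyone mind posting a means of iterating through the rows to get the links?

I was thinking along these lines:

urls = []
counter = 1  # XPath positions are 1-based
while counter < 100:
    # get_xpath is a placeholder for whatever evaluates the expression
    urls.append(get_xpath('//*[@id="mw-content-text"]/table[1]/tbody/tr[' + str(counter) + ']/td[1]/a'))
    counter += 1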

Thanks!

Here's an example of how you can get all of the links from the first column:

from lxml import etree
import requests

url = "http://en.wikipedia.org/wiki/list_of_school_districts_in_alabama"
response = requests.get(url)

parser = etree.HTMLParser()
tree = etree.fromstring(response.text, parser)

# tbody is left out of the path on purpose: lxml does not add it
for row in tree.xpath('//*[@id="mw-content-text"]/table[1]/tr'):
    links = row.xpath('./td[1]/a')
    if links:
        link = links[0]
        print(link.text, link.attrib.get('href'))
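As a variation on the loop above, a single XPath ending in /@href can return the link targets directly, and urljoin can turn them into absolute URLs. This is only a sketch: SAMPLE_HTML and BASE_URL are made-up stand-ins for the downloaded page, so it runs without a network connection.

```python
from urllib.parse import urljoin

from lxml import etree

# Hypothetical stand-in for the downloaded page (not the real Wikipedia markup)
SAMPLE_HTML = """
<html><body><div id="mw-content-text">
<table>
  <tr><th>District</th><th>County</th></tr>
  <tr><td><a href="/wiki/District_A">District A</a></td>
      <td><a href="/wiki/County_X">County X</a></td></tr>
  <tr><td><a href="/wiki/District_B">District B</a></td>
      <td>County Y</td></tr>
</table>
</div></body></html>
"""
BASE_URL = "http://en.wikipedia.org/"  # assumed base for resolving relative links

tree = etree.fromstring(SAMPLE_HTML, etree.HTMLParser())

# Select the href attribute of every link in the first cell of each row
hrefs = tree.xpath('//*[@id="mw-content-text"]/table[1]/tr/td[1]/a/@href')

# Resolve the relative hrefs against the base URL
urls = [urljoin(BASE_URL, href) for href in hrefs]
print(urls)
```

The header row contains only th cells, so it contributes nothing, and links in the second column are ignored because the path is restricted to td[1].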

Note that tbody is appended by the browser; lxml won't see this tag (just skip it in the XPath).

Hope that helps.
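A quick way to check the tbody point is to parse a small table and compare both paths. The snippet below is a minimal sketch using a made-up HTML fragment (SAMPLE_HTML is not taken from the page in question):

```python
from lxml import etree

# Hypothetical fragment: the source markup has no tbody element
SAMPLE_HTML = "<table><tr><td>a</td></tr><tr><td>b</td></tr></table>"

tree = etree.fromstring(SAMPLE_HTML, etree.HTMLParser())

# Path copied from a browser inspector, which shows an implied tbody
with_tbody = tree.xpath('//table/tbody/tr')

# Path matching the markup as lxml actually parses it
without_tbody = tree.xpath('//table/tr')

print(len(with_tbody), len(without_tbody))
```

If the source HTML does contain an explicit tbody, the first path would match instead, so it is worth checking the raw markup rather than the browser's view of it.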

