python - Access element using xpath?
I want to get the links of the elements in the first column of this page (http://en.wikipedia.org/wiki/list_of_school_districts_in_alabama).
I am comfortable using BeautifulSoup, but it seems less well-suited to this task (I've been trying to access the first child of the contents of each tr, but that hasn't been working well).
The XPaths follow a regular pattern, with the row number updating for each new row in the following expression:
xpath = '//*[@id="mw-content-text"]/table[1]/tbody/tr[' + str(counter) + ']/td[1]/a'
Would someone mind posting a means of iterating through the rows to get the links?
I am thinking along these lines:
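urls = []
counter = 1  # XPath positions are 1-based
while counter < 100:
    # get_xpath is a placeholder for whatever evaluates the expression
    urls.append(get_xpath('//*[@id="mw-content-text"]/table[1]/tbody/tr[' + str(counter) + ']/td[1]/a'))
    counter += 1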
Thanks!
Here's an example of how you can get all of the links from the first column:
from lxml import etree
import requests

url = "http://en.wikipedia.org/wiki/list_of_school_districts_in_alabama"
response = requests.get(url)

# Parse the page with lxml's HTML parser
parser = etree.HTMLParser()
tree = etree.fromstring(response.text, parser)

# Walk the rows of the first table; rows with no link in the
# first cell (such as header rows) are skipped
for row in tree.xpath('//*[@id="mw-content-text"]/table[1]/tr'):
    links = row.xpath('./td[1]/a')
    if links:
        link = links[0]
        print(link.text, link.attrib.get('href'))
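If only the href values are needed, the row loop can probably be folded into a single XPath expression that selects the attribute directly. This is a minimal sketch, not a replacement for the code above: SAMPLE_HTML is a made-up stand-in for the page (not the real Wikipedia markup), and first_column_hrefs is a hypothetical helper name.

```python
from lxml import etree

# Hypothetical stand-in for the page; the real markup will differ.
SAMPLE_HTML = """
<div id="mw-content-text">
  <table>
    <tr><th>District</th><th>County</th></tr>
    <tr><td><a href="/wiki/A">A</a></td><td><a href="/wiki/X">X</a></td></tr>
    <tr><td>No link</td><td>Y</td></tr>
    <tr><td><a href="/wiki/B">B</a></td><td>Z</td></tr>
  </table>
</div>
"""

def first_column_hrefs(html):
    """Return the href of the first link in the first cell of each row."""
    tree = etree.fromstring(html, etree.HTMLParser())
    # /@href makes XPath return the attribute values instead of elements;
    # rows whose first cell has no link simply contribute nothing.
    return tree.xpath('//*[@id="mw-content-text"]/table[1]/tr/td[1]/a[1]/@href')

hrefs = first_column_hrefs(SAMPLE_HTML)
print(hrefs)
```

With a live page, response.text from requests would be passed in place of SAMPLE_HTML.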
Note that tbody is appended by the browser; lxml won't see this tag (just skip it in the XPath).
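The tbody point can be checked on a small inline snippet. This is only a sketch under the assumption that the source HTML itself contains no tbody tag; SAMPLE_HTML is invented for illustration. If a page's markup does include tbody, the first expression would match instead, and the descendant form //tr should cover both cases.

```python
from lxml import etree

# Hypothetical markup with no explicit tbody, as in the raw page source
SAMPLE_HTML = '<table><tr><td><a href="/wiki/A">A</a></td></tr></table>'
tree = etree.fromstring(SAMPLE_HTML, etree.HTMLParser())

# The path copied from a browser's inspector includes tbody and finds nothing
with_tbody = tree.xpath('//table/tbody/tr/td[1]/a')
# Dropping tbody matches the tree lxml actually builds
without_tbody = tree.xpath('//table/tr/td[1]/a')
# Using //tr matches whether or not a tbody is present
tolerant = tree.xpath('//table//tr/td[1]/a')

print(len(with_tbody), len(without_tbody), len(tolerant))
```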
Hope that helps.