This is Selenium with Python. The first part below works fine:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://www.palottery.state.pa.us/Games/Past-Winning-Numbers.aspx?id=8')
elm = browser.find_element_by_xpath(".//*[@id='p_lt_zoneMain_pageplaceholder1_p_lt_zoneContent_pageplaceholder_p_lt_zoneContent_PaLotteryPastWinningNumbers_Button1']")
elm.click()
elm2 = browser.find_element_by_xpath(".//*[@id='page-content']/div[2]/div/a/img")
elm2.click()
browser.implicitly_wait(10)
Here I get the error:
Dtable = browser.find_element_by_xpath('.//*[@id="p_lt_zoneLeft_PaLotteryPastWinningNumbers_Results"]/tbody')
for i in Dtable.find_elements_by_xpath('.//tr'):
    print(i.get_attribute('innerHTML'))
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: {"method":"xpath","selector":".//*[@id=\"p_lt_zoneLeft_PaLotteryPastWinningNumbers_Results\"]/tbody"}
Update: I still can't get all 250 rows of the table. For some reason I only get 10 rows...
def getWinNums():
    l = []
    from selenium import webdriver
    browser = webdriver.Firefox()
    browser.get('http://www.palottery.state.pa.us/Games/Past-Winning-Numbers.aspx?id=8')
    elm = browser.find_element_by_xpath(".//*[@id='p_lt_zoneMain_pageplaceholder1_p_lt_zoneContent_pageplaceholder_p_lt_zoneContent_PaLotteryPastWinningNumbers_Button1']")
    elm.click()
    elm2 = browser.find_element_by_xpath(".//*[@id='page-content']/div[2]/div/a/img")
    elm2.click()
    browser.implicitly_wait(10)
    Dtable = browser.find_element_by_xpath(".//*[@id='page-content']//table/tbody")
    # create a list where elements are dates, each followed by the 5 numbers for that date
    l = [i.text.strip() for i in Dtable.find_elements_by_xpath('.//td') if i.text != "Payout"]
    browser.close()
    # convert the list into a list of tuples (date, 5 numbers)
    l = zip(*[iter(l)]*2)
    return l

def main():
    l = getWinNums()
    for el in l:
        print(el)

if __name__ == "__main__":
    main()
Output:
('09/08/2015', '2 32 35 36 39')
('09/07/2015', '14 17 19 24 43')
('09/06/2015', '10 13 15 36 38')
('09/05/2015', '4 5 24 29 34')
('09/04/2015', '1 12 18 34 36')
('09/03/2015', '4 9 15 28 40')
('09/02/2015', '14 16 17 18 34')
('09/01/2015', '7 26 33 36 41')
('08/31/2015', '17 20 22 32 41')
('08/30/2015', '11 14 23 24 38')
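The `zip(*[iter(l)]*2)` line in `getWinNums` is the "grouper" idiom: the argument list holds the same iterator twice, so every tuple `zip` builds consumes two consecutive items. A small standalone sketch of the same pairing follows; `pair_up` is a hypothetical helper name, and the sample cells are copied from the output above. In Python 3 `zip` returns a lazy iterator, so it is wrapped in `list()` here.

```python
def pair_up(cells):
    """Group a flat list [date, numbers, date, numbers, ...] into (date, numbers) tuples."""
    it = iter(cells)
    # zip pulls from the same iterator twice per tuple, so items are consumed in pairs;
    # this is equivalent to zip(*[iter(cells)] * 2)
    return list(zip(it, it))


cells = [
    "09/08/2015", "2 32 35 36 39",
    "09/07/2015", "14 17 19 24 43",
    "09/06/2015", "10 13 15 36 38",
]

for date, numbers in pair_up(cells):
    # split the space-separated numbers into individual strings
    print(date, numbers.split())
```

Because `zip` stops at its shortest input, an unpaired trailing item is silently dropped rather than raising an error.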
Update #2
The CSS selector below works, but again, Dtable.find_elements_by_xpath('.//td') only yields 10 of the 251 rows.
Dtable = browser.find_element_by_css_selector("table>tbody")
Update #3
Now I can get 50 rows of the table with the code below:
l_result = []
for i in range(1, 6):
    # click the i-th page link in the pager
    link3 = browser.find_element_by_xpath(".//*[@id='p_lt_zoneMain_pageplaceholder1_p_lt_zoneContent_pageplaceholder_p_lt_zoneContent_PaLotteryPastWinningNumbers_Results_paginate']/span/a[{i}]".format(i=i))
    link3.click()
    Dtable = browser.find_element_by_css_selector("table>tbody>tr")
    # collect the text of every cell except the "Payout" cells
    l = [i.text.strip() for i in Dtable.find_elements_by_xpath('//td') if i.text != "Payout"]
    l_result += l
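In the loop above, `Dtable` is only the first `tr`, yet `Dtable.find_elements_by_xpath('//td')` still returns cells from the whole page: in browser XPath a path starting with `//` is absolute and is evaluated from the document root even when called on an element, whereas `.//td` is relative to that element. The sketch below illustrates only the relative-path scoping, using Python's standard `xml.etree.ElementTree` (not Selenium, and supporting just a subset of XPath) on a tiny hypothetical table built from two of the rows shown earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mirroring the results table (two rows only)
HTML = """
<table id="results">
  <tbody>
    <tr><td>09/08/2015</td><td>2 32 35 36 39</td></tr>
    <tr><td>09/07/2015</td><td>14 17 19 24 43</td></tr>
  </tbody>
</table>
"""

table = ET.fromstring(HTML.strip())
first_row = table.find(".//tr")

# './/td' evaluated from the first row: only that row's cells
row_cells = [td.text for td in first_row.findall(".//td")]

# './/td' evaluated from the table: every cell in every row
all_cells = [td.text for td in table.findall(".//td")]

print(row_cells)
print(all_cells)
```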
The remaining problem is how to get to the next 50 rows by clicking the pagination button. I can get the button's XPath, which is:
.//*[@id='p_lt_zoneMain_pageplaceholder1_p_lt_zoneContent_pageplaceholder_p_lt_zoneContent_PaLotteryPastWinningNumbers_Results_next']
But clicking it and repeating the for loop above doesn't produce any new rows from the table.
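One common way to walk a paginated table is: read the rows on the current page, click "next", and stop when there is no next page or when the click did not change the table. The sketch below keeps that loop free of Selenium so it can be run on its own. `collect_all_pages`, `FakePager`, `max_pages`, and the callbacks `read_rows`, `has_next`, and `go_next` are all hypothetical names. With a real browser, `read_rows` should re-locate the rows on every call instead of reusing old element references, and `go_next` should click the `..._Results_next` button and then wait (for example with Selenium's `WebDriverWait`) until the table has redrawn; otherwise the "unchanged page" check may stop the loop early. `has_next` might check whether that button's `class` attribute contains `disabled`, but that is an assumption about how this page's pager marks the last page, not something confirmed here.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Row = Tuple[str, ...]


def collect_all_pages(
    read_rows: Callable[[], Sequence[Row]],
    has_next: Callable[[], bool],
    go_next: Callable[[], None],
    max_pages: int = 100,
) -> List[Row]:
    """Collect rows from every page of a paginated table (hypothetical helper)."""
    collected: List[Row] = []
    previous: Optional[List[Row]] = None
    for _ in range(max_pages):  # hard cap so a broken pager cannot loop forever
        rows = list(read_rows())
        if rows == previous:
            # Clicking "next" did not change the table: stop rather than
            # collecting the same page twice.
            break
        collected.extend(rows)
        previous = rows
        if not has_next():
            break
        go_next()
    return collected


class FakePager:
    """In-memory stand-in for the web page, used only to exercise the loop."""

    def __init__(self, pages: List[List[Row]]) -> None:
        self.pages = pages
        self.index = 0

    def read_rows(self) -> List[Row]:
        return self.pages[self.index]

    def has_next(self) -> bool:
        return self.index < len(self.pages) - 1

    def go_next(self) -> None:
        self.index += 1


pages = [
    [("09/08/2015", "2 32 35 36 39"), ("09/07/2015", "14 17 19 24 43")],
    [("09/06/2015", "10 13 15 36 38")],
]
pager = FakePager(pages)
all_rows = collect_all_pages(pager.read_rows, pager.has_next, pager.go_next)
print(len(all_rows))  # prints 3
```

Passing callbacks instead of a driver keeps the stopping logic testable without a browser; only the three callbacks would need Selenium calls.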
Posted on 2015-09-09 07:35:41
I guess you want to change the selector you use to get the table from:
Dtable = browser.find_element_by_xpath('.//*[@id="p_lt_zoneLeft_PaLotteryPastWinningNumbers_Results"]/tbody')
to:
Dtable = browser.find_element_by_css_selector("table[id^='p_lt_zoneLeft']")
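The selector `table[id^='p_lt_zoneLeft']` uses the CSS attribute-prefix operator `^=`, which matches any `table` whose `id` begins with the given string, so it does not depend on the exact full id. The pure-Python sketch below only emulates that matching rule over id strings taken from the question; `matches_exact` and `matches_prefix` are hypothetical helpers, and this is not how Selenium evaluates selectors.

```python
# Element ids that appear in the question (the real page may contain others)
ids = [
    "p_lt_zoneLeft_PaLotteryPastWinningNumbers_Results",
    "p_lt_zoneMain_pageplaceholder1_p_lt_zoneContent_pageplaceholder_p_lt_zoneContent_PaLotteryPastWinningNumbers_Results_paginate",
    "p_lt_zoneMain_pageplaceholder1_p_lt_zoneContent_pageplaceholder_p_lt_zoneContent_PaLotteryPastWinningNumbers_Results_next",
]


def matches_exact(element_id, value):
    """Emulates [id='value']: the whole id must be equal."""
    return element_id == value


def matches_prefix(element_id, prefix):
    """Emulates [id^='prefix']: the id only has to start with prefix.

    In CSS an empty prefix matches nothing, so that case returns False.
    """
    return bool(prefix) and element_id.startswith(prefix)


print([i for i in ids if matches_prefix(i, "p_lt_zoneLeft")])
print([i for i in ids if matches_prefix(i, "p_lt_zoneMain")])
```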
https://stackoverflow.com/questions/32468726