Q: Selenium error when trying to extract data from a web table

Stack Overflow user

Asked 2015-09-09 07:26:03

Answers: 1, Views: 582, Following: 0, Votes: 3

This is Selenium with Python. The first lines below work fine:

    from selenium import webdriver
    browser = webdriver.Firefox()
    browser.get('http://www.palottery.state.pa.us/Games/Past-Winning-Numbers.aspx?id=8')
    elm = browser.find_element_by_xpath(".//*[@id='p_lt_zoneMain_pageplaceholder1_p_lt_zoneContent_pageplaceholder_p_lt_zoneContent_PaLotteryPastWinningNumbers_Button1']")
    elm.click()
    elm2 = browser.find_element_by_xpath(".//*[@id='page-content']/div[2]/div/a/img")
    elm2.click()
    browser.implicitly_wait(10)

This is where I get the error:

    Dtable = browser.find_element_by_xpath('.//*[@id="p_lt_zoneLeft_PaLotteryPastWinningNumbers_Results"]/tbody')

    for i in Dtable.find_elements_by_xpath('.//tr'):
        print(i.get_attribute('innerHTML'))

selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: {"method":"xpath","selector":".//*[@id=\"p_lt_zoneLeft_PaLotteryPastWinningNumbers_Results\"]/tbody"}

Update: I still cannot get all 250 rows of the table. For some reason I only get 10 rows...

def getWinNums():

    l = []

    from selenium import webdriver
    browser = webdriver.Firefox()
    browser.get('http://www.palottery.state.pa.us/Games/Past-Winning-Numbers.aspx?id=8')

    elm = browser.find_element_by_xpath(".//*[@id='p_lt_zoneMain_pageplaceholder1_p_lt_zoneContent_pageplaceholder_p_lt_zoneContent_PaLotteryPastWinningNumbers_Button1']")
    elm.click()
    elm2 = browser.find_element_by_xpath(".//*[@id='page-content']/div[2]/div/a/img")
    elm2.click()
    browser.implicitly_wait(10)

    Dtable = browser.find_element_by_xpath(".//*[@id='page-content']//table/tbody")


    # create a flat list in which each date is followed by the 5 numbers for that date
    l = [i.text.strip() for i in Dtable.find_elements_by_xpath('.//td') if i.text != "Payout"]

    browser.close()

    # convert the flat list into a list of tuples (date, 5 numbers);
    # list() is needed on Python 3, where zip returns a one-shot iterator
    l = list(zip(*[iter(l)]*2))

    return l


def main():

    l = getWinNums()

    for el in l:
        print(el)


if __name__ == "__main__":
    main()

Output:

    ('09/08/2015', '2   32   35   36   39')
    ('09/07/2015', '14   17   19   24   43')
    ('09/06/2015', '10   13   15   36   38')
    ('09/05/2015', '4   5   24   29   34')
    ('09/04/2015', '1   12   18   34   36')
    ('09/03/2015', '4   9   15   28   40')
    ('09/02/2015', '14   16   17   18   34')
    ('09/01/2015', '7   26   33   36   41')
    ('08/31/2015', '17   20   22   32   41')
    ('08/30/2015', '11   14   23   24   38')
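The pairing step in getWinNums relies on the `zip(*[iter(l)]*2)` idiom, which may be easier to follow in isolation. The sketch below is a minimal standalone illustration, not part of the original script; the helper name `pair_up` and the sample `cells` list are made up for the example, with values taken from the output shown in the question.

```python
def pair_up(flat):
    """Group a flat list into consecutive 2-tuples, e.g. (date, numbers)."""
    it = iter(flat)
    # zip pulls from the same iterator twice per tuple, so items are consumed
    # in pairs; a trailing unpaired item is silently dropped.
    # list() materializes the result, because on Python 3 a bare zip object
    # can only be iterated once.
    return list(zip(it, it))


# Sample data shaped like the table cells scraped in the question.
cells = ["09/08/2015", "2   32   35   36   39",
         "09/07/2015", "14   17   19   24   43"]
pairs = pair_up(cells)

for date, numbers in pairs:
    # split() with no argument collapses the runs of spaces between numbers
    print(date, [int(n) for n in numbers.split()])
```

Because the pairing is purely positional, it only works if every row contributes exactly two cells after the "Payout" cells are filtered out.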

Update #2

The CSS selector shown below works, but again, Dtable.find_elements_by_xpath('.//td') yields only 10 of the 251 rows.

    Dtable = browser.find_element_by_css_selector("table>tbody")

Update #3

I can now get 50 table rows with the code below:

    for i in range(1, 6):
        link3 = browser.find_element_by_xpath(".//*[@id='p_lt_zoneMain_pageplaceholder1_p_lt_zoneContent_pageplaceholder_p_lt_zoneContent_PaLotteryPastWinningNumbers_Results_paginate']/span/a[{i}]".format(i=i))
        link3.click()
        Dtable = browser.find_element_by_css_selector("table>tbody>tr")
        # note: '//td' (no leading dot) searches the whole document, not just Dtable
        l = [cell.text.strip() for cell in Dtable.find_elements_by_xpath('//td') if cell.text != "Payout"]
        l_result += l

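The remaining problem is how to move on to the next 50 rows by clicking the pagination button. I can get the button's XPath, which is:

.//*[@id='p_lt_zoneMain_pageplaceholder1_p_lt_zoneContent_pageplaceholder_p_lt_zoneContent_PaLotteryPastWinningNumbers_Results_next']

But clicking it and repeating the for loop above does not produce any new rows from the table.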
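One possible way to walk the pages is to click only the "next" control and re-read the table until its content stops changing, instead of re-clicking the numbered links. The sketch below is a hedged illustration, not a tested fix for this site. It assumes `driver` is a Selenium WebDriver with the Selenium 4 style `find_elements(strategy, selector)` method (the question uses the older `find_elements_by_*` calls), and it assumes the "next" button stays in the page on the last page and simply stops changing the table. The names `read_page` and `scrape_all_pages` are hypothetical.

```python
import time


def read_page(driver):
    """Return the stripped text of every table cell currently in the page."""
    # "css selector" is the strategy string behind Selenium's By.CSS_SELECTOR.
    cells = driver.find_elements("css selector", "table > tbody > tr > td")
    texts = [c.text.strip() for c in cells]
    return [t for t in texts if t != "Payout"]


def scrape_all_pages(driver, next_xpath, max_pages=50, timeout=10.0, poll=0.2):
    """Collect cell text page by page, clicking `next_xpath` between pages."""
    rows = []
    page = read_page(driver)
    for _ in range(max_pages):
        rows.extend(page)

        # find_elements returns an empty list instead of raising
        buttons = driver.find_elements("xpath", next_xpath)
        if not buttons:
            break
        buttons[0].click()

        # Poll until the table content differs from the page just collected.
        deadline = time.monotonic() + timeout
        new_page = read_page(driver)
        while new_page == page and time.monotonic() < deadline:
            time.sleep(poll)
            new_page = read_page(driver)

        if new_page == page:
            break  # nothing changed: assume this was the last page
        page = new_page
    return rows
```

With a live browser this might be called as `scrape_all_pages(browser, next_xpath)`, where `next_xpath` is the `..._Results_next` XPath quoted above. If the table is redrawn while a cell is being read, Selenium can raise StaleElementReferenceException, which this sketch does not handle.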

python

selenium

xpath

webdriver

1 Answer

Stack Overflow user

Accepted answer

Answered 2015-09-09 07:35:41

I'd guess you want to change the selector used to get the table, from this:

 Dtable = browser.find_element_by_xpath('.//*[@id="p_lt_zoneLeft_PaLotteryPastWinningNumbers_Results"]/tbody')

to:

 Dtable = browser.find_element_by_css_selector("table[id^='p_lt_zoneLeft']")

Votes: 2
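The `[id^='...']` form in the accepted answer is a CSS attribute selector that matches any id starting with the given prefix. When a lookup by exact id fails with NoSuchElementException, it can help to list the ids the page really contains before guessing at a prefix. The sketch below is an illustration under assumptions, not the answerer's code: `driver` is assumed to be a Selenium WebDriver with the Selenium 4 style `find_elements(strategy, selector)` method, and the helper names `table_ids` and `find_table_by_id_prefix` are hypothetical.

```python
def table_ids(driver):
    """Return the id attribute of every <table> currently in the page."""
    tables = driver.find_elements("css selector", "table")
    return [t.get_attribute("id") for t in tables]


def find_table_by_id_prefix(driver, prefix):
    """Return the first <table> whose id starts with `prefix`, or None."""
    # ^= is the CSS "attribute value starts with" operator
    selector = "table[id^='{}']".format(prefix)
    matches = driver.find_elements("css selector", selector)
    return matches[0] if matches else None
```

Printing `table_ids(browser)` after the clicks shows which table ids exist at that point, and `find_table_by_id_prefix(browser, "p_lt_zoneLeft")` mirrors the selector from the answer while returning None instead of raising when nothing matches.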

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/32468726
