I am trying to iterate over a list that refreshes every 10 seconds.
This is what I have tried:
driver.get("https://www.winmasters.ro/ro/live-betting/")
events = driver.find_elements_by_css_selector('.event-wrapper.v1.event-live.odds-hidden.event-sport-1')
for i in range(len(events)):
    try:
        event = events[i]
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')  # the error occurs here
    except:  # NoSuchElementException or StaleElementReferenceException
        time.sleep(3)  # I have tried up to 20 seconds
        event = events[i]
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')
That did not work, so I tried again with a different except block:
    except:  # second try that also did not work
        element = WebDriverWait(driver, 20).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.event-details-team-name.event-details-team-a'))
        )
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')
For now, I am assigning a placeholder name that I will never use, for example:
    try:
        event = events[i]
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')
    except:
        name = "blablabla"
With this code, when the page refreshes, I get about 7 or 8 "blablabla" values until my selector is found on the page again.
Posted on 2018-12-26 21:31:02
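You can use JavaScript to fetch all the required data in a single call.
The code below gives you a list of events (built with map) that includes all the details, without any NoSuchElementException or StaleElementReferenceException errors. Each event has the following fields: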
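me_id: unique identifier of the event
href: link that can be used to open the event details
team_a: name of the first team
team_a_score: score of the first team
team_b: name of the second team
team_b_score: score of the second team
event_status: status of the event
event_clock: clock (elapsed time) of the event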
# Run one JavaScript snippet in the page and get plain dicts back
events = driver.execute_script('''
    return [...document.querySelectorAll('[data-uat="live-betting-overview-leagues"] .events-for-league .event-live')].map(e => {
        return {
            me_id: e.getAttribute("me_id"),
            href: e.querySelector("a.event-details-live").href,
            team_a: e.querySelector(".event-details-team-a").textContent,
            team_a_score: e.querySelector(".event-details-score-1").textContent,
            team_b: e.querySelector(".event-details-team-b").textContent,
            team_b_score: e.querySelector(".event-details-score-2").textContent,
            event_status: e.querySelector('[data-uat="event-status"]').textContent,
            event_clock: e.querySelector('[data-uat="event-clock"]').textContent
        };
    });
''')
for event in events:
    print(event.get('me_id'))
    print(event.get('href'))  # using href you can open the event details: driver.get(event.get('href'))
    print(event.get('team_a'))
    print(event.get('team_a_score'))
    print(event.get('team_b'))
    print(event.get('team_b_score'))
    print(event.get('event_status'))
    print(event.get('event_clock'))
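Because the script returns plain Python dicts rather than live element references, two successive snapshots can be compared without touching the DOM again. The following is only a rough sketch of one way to do that, keyed on me_id. The function name diff_snapshots and the sample data (including "Example FC") are made up for illustration and are not taken from the site.

```python
def diff_snapshots(previous, current):
    """Compare two event snapshots (lists of dicts), keyed on 'me_id'."""
    prev_by_id = {e["me_id"]: e for e in previous}
    curr_by_id = {e["me_id"]: e for e in current}
    # Events present now but not before
    added = [e for k, e in curr_by_id.items() if k not in prev_by_id]
    # Events that disappeared since the previous snapshot
    removed = [e for k, e in prev_by_id.items() if k not in curr_by_id]
    # Events present in both snapshots whose fields differ
    changed = [e for k, e in curr_by_id.items()
               if k in prev_by_id and prev_by_id[k] != e]
    return added, removed, changed


# Hypothetical sample data shaped like the dicts returned by the script above.
before = [
    {"me_id": "1", "team_a": "Hapoel Haifa", "team_a_score": "0"},
    {"me_id": "2", "team_a": "FC Ashdod", "team_a_score": "1"},
]
after = [
    {"me_id": "1", "team_a": "Hapoel Haifa", "team_a_score": "1"},
    {"me_id": "3", "team_a": "Example FC", "team_a_score": "0"},
]
added, removed, changed = diff_snapshots(before, after)
```

In a real script you would presumably call driver.execute_script every 10 seconds or so and pass the previous and the latest result to this function.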
Posted on 2018-12-26 19:49:35
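One major problem is that you fetch all the elements up front and then iterate over that list. Because the page itself updates frequently, the elements you are holding go "stale", meaning they are no longer attached to the current DOM. When you try to use those stale elements, Selenium throws a StaleElementReferenceException, because it cannot do anything with objects that no longer exist in the page.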
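One way to overcome this is to fetch each element only at the moment you need it, rather than grabbing them all up front. Personally, I think the cleanest way to do that is with the CSS :nth-child() pseudo-class: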
from selenium import webdriver

def main():
    base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.winmasters.ro/ro/live-betting/")
        # Get a list of all elements
        events = driver.find_elements_by_css_selector(base_css)
        print("Found {} events".format(len(events)))
        # Iterate through the list, keeping track of the index
        # note that nth-child referencing begins at index 1, not 0
        for index, _ in enumerate(events, 1):
            name = driver.find_element_by_css_selector("{}:nth-child({}) {}".format(
                base_css,
                index,
                '.event-details-team-name.event-details-team-a'
            ))
            print(name.text)
    finally:
        driver.quit()

if __name__ == "__main__":
    main()
If you run the script above, you get output like this:
$ python script.py
Found 2 events
Hapoel Haifa
FC Ashdod
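To make the string formatting in the loop easier to follow, here is a small sketch that prints the selectors it builds. The helper name nth_child_selector is hypothetical. One caveat worth keeping in mind: :nth-child(n) counts every sibling under the same parent, not only those matching the class chain, so if the event wrappers share a parent with other elements, or sit under several different parents, some indexes may match nothing or match more than one element.

```python
def nth_child_selector(base_css, index, child_css):
    """Build '<base>:nth-child(<index>) <child>' (index is 1-based, as in CSS)."""
    if index < 1:
        raise ValueError("CSS :nth-child() indexes start at 1")
    return "{}:nth-child({}) {}".format(base_css, index, child_css)


base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
team_css = '.event-details-team-name.event-details-team-a'

# Selectors for the first two events, matching the "Found 2 events" run above
selectors = [nth_child_selector(base_css, i, team_css) for i in range(1, 3)]
for selector in selectors:
    print(selector)
```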
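Now, because the underlying page really does update a lot, there is still a decent chance you will hit a StaleElementReferenceException (SERE). To handle that, you can use a retry decorator (pip install retry to get the package) that catches the SERE and re-fetches the element: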
import retry
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException

@retry.retry(StaleElementReferenceException, tries=3)
def get_name(driver, selector):
    elem = driver.find_element_by_css_selector(selector)
    return elem.text

def main():
    base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.winmasters.ro/ro/live-betting/")
        events = driver.find_elements_by_css_selector(base_css)
        print("Found {} events".format(len(events)))
        for index, _ in enumerate(events, 1):
            name = get_name(
                driver,
                "{}:nth-child({}) {}".format(
                    base_css,
                    index,
                    '.event-details-team-name.event-details-team-a'
                )
            )
            print(name)
    finally:
        driver.quit()

if __name__ == "__main__":
    main()
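If you would rather not add a dependency, the retry behaviour can be approximated with a small hand-written decorator. This is only a sketch of the general idea, not how the retry package is implemented. The names retry_on, StaleError, and flaky_lookup are invented for the example, and StaleError stands in for StaleElementReferenceException so that the snippet runs without a browser.

```python
import functools
import time


def retry_on(exc_type, tries=3, delay=0.0):
    """Re-run the wrapped function when exc_type is raised, up to `tries` attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == tries:
                        raise  # out of attempts: let the caller see the error
                    time.sleep(delay)
        return wrapper
    return decorator


class StaleError(Exception):
    """Stand-in for StaleElementReferenceException (assumption for this demo)."""


calls = {"count": 0}


@retry_on(StaleError, tries=3)
def flaky_lookup():
    # Fails twice, then succeeds, imitating an element that gets re-rendered.
    calls["count"] += 1
    if calls["count"] < 3:
        raise StaleError("element went stale")
    return "Hapoel Haifa"


result = flaky_lookup()
```

The important detail is that the lookup itself happens inside the decorated function, so every attempt queries the page again instead of reusing a stale reference.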
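All that said, I think there is still a problem with your CSS selector, and that is the main cause of the NoSuchElementException. Without a better description of what you are actually trying to accomplish with this script, I cannot help with that part.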
https://stackoverflow.com/questions/53932295