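I want to scrape the table from this page: "https://www.premierleague.com/stats/top/players/red_card?se=42&cl=2". When I inspect the element in Chrome, I can find the table entries in the DOM tree shown by the browser. But when I run the code below, I get a different table, the one that belongs to https://www.premierleague.com/stats/top/players/red_card.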

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

BASEURL = "https://www.premierleague.com/stats/top/players/"
driver = webdriver.Chrome("/Users/manpreet/Downloads/chromedriver")
driver.get("https://www.premierleague.com/stats/top/players/red_card?se=42&cl=2")
##for i in range(5000):
##    print i
##    time.sleep(1)
try:
    elem = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="mainContent"]/div[2]/div/div[2]/div[1]/div[2]/table'))
    )
finally:
    print('10 secs over')
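print(elem.text)

I have tried WebDriverWait with timeouts of up to 30 seconds, but I still do not get the correct table. I noticed that when I use WebDriverWait, the browser opened by Selenium shows the table from https://www.premierleague.com/stats/top/players/red_card for the entire 30 seconds. When I do not use WebDriverWait, the driver first shows the table from https://www.premierleague.com/stats/top/players/red_card, the page loads for a few seconds, and then it shows the table from https://www.premierleague.com/stats/top/players/red_card?se=42&cl=2. The whole process takes 5-6 seconds at most. I think the Ajax call gets stuck when I use WebDriverWait. That may be why Selenium does not return the correct table, since Selenium scrapes whatever is currently displayed.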
Can anyone tell me how to get the correct table?

Answered 2018-03-19 15:08:45
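You need a few more waits.
1. Wait for the stats filter dropdown to close. You can wait for the value of the CSS "transform" property to change. See the custom wait in the code of this answer, element_transform_changed.
2. Wait for all the filters to be displayed. A plain WebDriverWait with EC is enough for this.
3. Sleep for a few seconds to let the JavaScript run, using time.sleep().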

import time
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

BASEURL = "https://www.premierleague.com/stats/top/players/"
driver = webdriver.Chrome()
driver.get("https://www.premierleague.com/stats/top/players/red_card?se=42&cl=2")
##for i in range(5000):
##    print i
##    time.sleep(1)

class element_transform_changed(object):
    """Custom wait: truthy once the element's CSS "transform" differs from `text`."""
    def __init__(self, locator, text):
        self.locator = locator
        self.text = text

    def __call__(self, driver):
        wait = WebDriverWait(driver, 20)
        element = wait.until(EC.presence_of_element_located(self.locator))
        newText = element.value_of_css_property("transform")
        if newText is None or len(newText) == 0:
            return False
        print("OLD: " + self.text + ", NEW: " + newText)
        if len(self.text) == 0 or (self.text != newText.strip()):
            return element
        else:
            return False

elem = None  # stays None if any of the waits below times out
try:
    # 1. Wait until the dropdown's transform is no longer the given matrix.
    WebDriverWait(driver, 40).until(element_transform_changed((By.CSS_SELECTOR, "[data-script='pl_stats'] [class*='topStatsFilterDropdown'] ul"), "matrix(1, 0, 0, 1, 0, 0)"))
    # 2. Wait for each of the four filters to be present.
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "[data-dropdown-current='FOOTBALL_COMPSEASON']")))
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "[data-dropdown-current='FOOTBALL_CLUB']")))
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "[data-dropdown-current='Nationality']")))
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "[data-dropdown-current='Position']")))
    # 3. Give the page's JavaScript a few seconds to refresh the table.
    time.sleep(5)
    elem = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="mainContent"]/div[2]/div/div[2]/div[1]/div[2]/table')))
except TimeoutException:
    print('ERROR')
if elem is not None:
    print(elem.text)
time.sleep(10)

https://stackoverflow.com/questions/49355710
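
As a possible alternative to the fixed time.sleep(5) in step 3, one could wait until the table's text stops matching the table that was shown first. The following is only a sketch under assumptions: table_text_changed and poll are hypothetical names, not part of Selenium, and it assumes the filtered table's text differs from the default one. FakeDriver and FakeElement are stubs so that the example runs without a browser.

```python
import time


class table_text_changed:
    """Wait condition: truthy once the located element's text differs from old_text.

    Follows the Selenium convention that a condition is a callable that takes
    the driver and returns a falsy value until the wait should end.
    """

    def __init__(self, locator, old_text):
        self.locator = locator      # e.g. (By.XPATH, "...") with a real driver
        self.old_text = old_text    # text of the table that was shown first

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        if not elements:
            return False            # table is not in the DOM (yet)
        text = elements[0].text.strip()
        if text and text != self.old_text:
            return elements[0]      # content was replaced: stop waiting
        return False


def poll(condition, driver, timeout=10.0, interval=0.5):
    """Tiny stand-in for WebDriverWait(driver, timeout).until(condition)."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


# Hypothetical stubs, only here so the sketch runs without a browser.
class FakeElement:
    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Serves the default table for the first two lookups, then the filtered one."""

    def __init__(self):
        self.calls = 0

    def find_elements(self, by, value):
        self.calls += 1
        text = "default table" if self.calls <= 2 else "filtered table"
        return [FakeElement(text)]


if __name__ == "__main__":
    driver = FakeDriver()
    locator = ("xpath", "//table")  # By.XPATH is the string "xpath" in Selenium
    table = poll(table_text_changed(locator, "default table"), driver, interval=0.01)
    print(table.text)
```

With a real driver, the idea would be to read the table's text once right after driver.get(...), then call WebDriverWait(driver, 20).until(table_text_changed((By.XPATH, ...), first_text)). If the page replaces the table node while the condition is reading it, passing ignored_exceptions=(StaleElementReferenceException,) to WebDriverWait may be needed.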