
How to find all the elements on a web page by scrolling, using Selenium WebDriver and Python

Stack Overflow user
Asked 2018-12-08 07:45:38
2 answers · 2.5K views · 0 followers · 1 vote

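I can't seem to get all the elements on a web page, no matter what I try with Selenium. I'm sure I'm missing something. Here is my code. The URL has at least 30 elements, but whenever I scrape it, only 6 elements are returned. What am I missing?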

import requests
import webbrowser
import time
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException



headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
url = 'https://www.adidas.com/us/men-shoes-new_arrivals'

res = requests.get(url, headers = headers)
page_soup = bs(res.text, "html.parser")


containers = page_soup.findAll("div", {"class": "gl-product-card-container show-variation-carousel"})


print(len(containers))
#for each container find shoe model
shoe_colors = []

for container in containers:
    if container.find("div", {'class': 'gl-product-card__reviews-number'}) is not None:
        shoe_model = container.div.div.img["title"]
        review = container.find('div', {'class':'gl-product-card__reviews-number'})
        review = int(review.text)



driver = webdriver.Chrome()
driver.get(url)
time.sleep(5)
shoe_prices = driver.find_elements_by_css_selector('.gl-price')

for price in shoe_prices:
    print(price.text)
print(len(shoe_prices))

2 Answers

Stack Overflow user

Accepted answer

Answered 2018-12-08 19:39:31

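Going by your code trials, there seems to be some discrepancy in the results.

  • You find 30 items with requests and 6 items with Selenium
  • Whereas I found 40 items with requests and 4 items with Selenium

The items on this website are generated dynamically through lazy loading, so you have to scroll down and wait for the new elements to be rendered in the HTML DOM. You can use the following solution: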

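  • Code block:

    import requests
    from bs4 import BeautifulSoup as bs
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    from selenium.common.exceptions import TimeoutException

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
    url = 'https://www.adidas.com/us/men-shoes-new_arrivals'

    # Static HTML fetched with requests: only what the server sends initially
    res = requests.get(url, headers=headers)
    page_soup = bs(res.text, "html.parser")
    containers = page_soup.findAll("div", {"class": "gl-product-card-container show-variation-carousel"})
    print(len(containers))
    shoe_colors = []
    for container in containers:
        if container.find("div", {'class': 'gl-product-card__reviews-number'}) is not None:
            shoe_model = container.div.div.img["title"]
            review = container.find('div', {'class': 'gl-product-card__reviews-number'})
            review = int(review.text)

    options = Options()
    options.add_argument('start-maximized')
    # Two further 'disable...' flags and the chromedriver path are truncated
    # in this copy; chromedriver is assumed to be on PATH here.
    driver = webdriver.Chrome(options=options)
    driver.get(url)

    # Wait for the first batch of prices, then remember how many are visible
    titles = WebDriverWait(driver, 20).until(
        EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span.gl-price")))
    myLength = len(titles)

    while True:
        # Scroll down by 400px to trigger lazy loading of the next products
        driver.execute_script("window.scrollBy(0,400)", "")
        try:
            # Wait until more prices are in the DOM than were counted before
            WebDriverWait(driver, 20).until(
                lambda driver: len(driver.find_elements(By.CSS_SELECTOR, "span.gl-price")) > myLength)
            titles = driver.find_elements(By.CSS_SELECTOR, "span.gl-price")
            myLength = len(titles)
        except TimeoutException:
            # No new prices appeared within 20 seconds: assume the list is complete
            break

    print(myLength)
    for title in titles:
        print(title.text)
    driver.quit()
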
  • 控制台输出: $100 $100 $100 $100 $100 $100 $100 $100 $100 $100 $100 $100 $100 $100 $100$100$100$100$100$100$100$100$100 $180 $130 $130 $130 $130 $120 $120 $120 $120 $100 $140 $100$100 $180 $100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100$100
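
The scroll-and-wait idea in this answer can also be isolated from the browser, which makes it easier to reason about when the loop stops. The following is only a minimal sketch: `scroll_until_stable`, its parameters, and `count_prices_with_selenium` are hypothetical names that do not come from the answer, and the `span.gl-price` selector and 400px step are simply carried over from it.

```python
import time
from typing import Callable


def scroll_until_stable(
    scroll: Callable[[], None],
    count: Callable[[], int],
    wait_seconds: float = 20.0,
    poll_seconds: float = 0.5,
) -> int:
    """Scroll repeatedly until count() stops growing; return the final count.

    `scroll` performs one scroll step and `count` reports how many target
    elements are currently present. Both are caller-supplied (hypothetical)
    callables, so this loop itself has no Selenium dependency.
    """
    seen = count()
    while True:
        scroll()
        deadline = time.monotonic() + wait_seconds
        grew = False
        # Poll until the count grows or the waiting time runs out
        while time.monotonic() < deadline:
            current = count()
            if current > seen:
                seen = current
                grew = True
                break
            time.sleep(poll_seconds)
        if not grew:
            # Nothing new appeared after this scroll step: treat as complete
            return seen


def count_prices_with_selenium(url: str) -> int:
    """Hypothetical wiring of the loop above to a Chrome session."""
    # Imported lazily so the helper above stays usable without Selenium
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumes chromedriver can be located
    try:
        driver.get(url)
        return scroll_until_stable(
            scroll=lambda: driver.execute_script("window.scrollBy(0, 400)"),
            count=lambda: len(driver.find_elements(By.CSS_SELECTOR, "span.gl-price")),
        )
    finally:
        driver.quit()
```

As in the answer, the loop ends the first time a scroll step reveals nothing new within the timeout, so a step that is smaller than one product row could end it early; a larger step or a shorter timeout may suit other pages better.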
Votes: 0

Stack Overflow user

Answered 2018-12-08 09:55:30

You have to scroll down the page slowly. The site only requests the price data via AJAX when a product comes into view.

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument('--start-maximized')
driver = webdriver.Chrome(options=options)

url = 'https://www.adidas.com/us/men-shoes-new_arrivals'
driver.get(url)

# One scroll step per product row (the grid shows 4 products per row)
scroll_times = len(driver.find_elements(By.CLASS_NAME, 'col-s-6')) / 4
scrolled = 0
scroll_size = 400

while scrolled < scroll_times:
    driver.execute_script('window.scrollTo(0, arguments[0]);', scroll_size)
    scrolled += 1
    scroll_size += 400
    time.sleep(1)  # give the AJAX price request time to complete

shoe_prices = driver.find_elements(By.CLASS_NAME, 'gl-price')

for price in shoe_prices:
    print(price.text)

print(len(shoe_prices))
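
The arithmetic behind that loop can be checked without a browser. This is a small sketch under the answer's own assumptions (4 products per row, one 400px step per row); `scroll_offsets` is a hypothetical helper name, not part of the answer.

```python
import math
from typing import List


def scroll_offsets(item_count: int, columns: int = 4, step_px: int = 400) -> List[int]:
    """Return the absolute scrollTo offsets, one per product row.

    Assumes `columns` products per row and that one `step_px` scroll
    brings the next row into view; both values are taken from the answer.
    """
    rows = math.ceil(item_count / columns)
    return [step_px * (row + 1) for row in range(rows)]


# 30 products in 4 columns need 8 steps, ending at 3200px
print(scroll_offsets(30))
```

Each offset could then be passed to `window.scrollTo(0, offset)` with a short pause in between, which is what the loop above does incrementally.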
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/53680597
