How do I scroll a web page infinitely using Selenium WebDriver in Python?

Stack Overflow user
Asked 2022-01-29 14:31:05
Answers: 2 · Views: 749 · Followers: 0 · Votes: 1

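I am new to Selenium and am trying to figure out how to scroll a page infinitely. I have tried almost everything suggested in other Stack Overflow answers: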

1.

import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko")
driver = webdriver.Chrome('chromedriver', options=chrome_options)
driver.set_window_size(1320, 550)

exchange_link = "https://icodrops.com/ico-stats/"
driver.get(exchange_link)
wait = WebDriverWait(driver, 10)

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

from selenium.webdriver.common.keys import Keys
Number = wait.until(EC.presence_of_element_located((By.XPATH,'html[1]/body[1]/div[1]/div[1]/div[1]/main[1]/div[1]/div[4]/div[2]/div[1]/div[1]/div[1]')))
lastElement = Number.find_elements(By.XPATH,'div')[-1]
lastElement.send_keys(Keys.NULL)

Number = wait.until(EC.presence_of_element_located((By.XPATH,'html[1]/body[1]/div[1]/div[1]/div[1]/main[1]/div[1]/div[4]/div[2]/div[1]/div[1]/div[1]')))
lastElement = Number.find_elements(By.XPATH,'div')[-1]
lastElement.location_once_scrolled_into_view

driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")

driver.execute_script("document.getElementById('mydiv').scrollIntoView();")

Is there anything else I can try? I have spent a lot of time trying to fix this.

Thank you for the replies, but nothing has worked. I tried both suggestions:

while True:
    if j == 900:
        break

    try:
        ele = wait.until(EC.visibility_of_element_located((By.XPATH, f"(//div[@id='market-ico-stat-container']/div)[{j}]")))
        driver.execute_script("arguments[0].scrollIntoView(true);", ele)
        ico_name = wait.until(EC.presence_of_element_located((By.XPATH,f'/html[1]/body[1]/div[1]/div[1]/div[1]/main[1]/div[1]/div[5]/div[2]/div[1]/div[1]/div[1]/div[{j}]/a[1]/div[1]/div[1]/div[2]/h3/a'))).get_attribute("textContent")
        print(j)
        print(ico_name)
        j += 1

    except:
        break

But the result is the same: nothing can be scraped from item 51 onward, which means the page is not scrolling down.


2 Answers

Stack Overflow user

Answered 2022-01-29 15:27:34

You should scroll to each web element one by one using execute_script.

Code:

driver = webdriver.Chrome(driver_path)  # driver_path: path to the chromedriver executable

driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://icodrops.com/ico-stats/")

j = 1
while True:
    ele = wait.until(EC.visibility_of_element_located((By.XPATH, f"(//div[@id='market-ico-stat-container']/div)[{j}]")))
    driver.execute_script("arguments[0].scrollIntoView(true);", ele)
    time.sleep(0.5)
    name = ele.find_element(By.XPATH, ".//descendant::h3//a").get_attribute('innerText')
    print(name)
    j = j + 1

    # Stop after 50 items so the loop does not run indefinitely
    if j > 50:
        break

Output:

Ambire Wallet
Himo World
Highstreet
Decimated
Planet Sandbox
BENQI
DeHorizon
Mines Of Dalarnia
MonoX
Lobis
AntEx
Titan Hunters
Tempus
The Realm Defenders
Aurora
XDEFI Wallet
Libre DeFi
Genopets
Mytheria
ReSource
Defactor
PlaceWar
CryptoXpress
Cryowar
Numbers Protocol
Dragon Kart
Trusted Node
Cere Network
Elemon
Meta Spatial
YIN Finance
Ardana
CropBytes
Good Games Guild
Ariadne
ThorSwap
Solend
GooseFX
Galactic Arena
DotOracle
Scallop
AcknoLedger
Clearpool
Sandclock
ArtWallet
Aurory
BloXmove
WonderHero
Lazio Fan Token
Hero Arena

Process finished with exit code 0

Imports:

import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Without a break condition, the loop above has no exit of its own and keeps executing. To avoid that, introduce a maximum limit such as:

if j == 500:
    break

However, the web application appears to detect the Selenium script.
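
As an alternative to a fixed cap, the loop can stop when the next row never appears. The sketch below shows only that control flow and runs without a browser. The names `collect_rows`, `fetch_row`, `stop_on` and `fake_fetch` are hypothetical and not part of the answer above. With Selenium, `fetch_row` would presumably wrap the `wait.until(...)` lookup, and `stop_on` would be `TimeoutException` from `selenium.common.exceptions`.

```python
from typing import Callable, List, Optional, Tuple, Type


def collect_rows(
    fetch_row: Callable[[int], str],
    stop_on: Tuple[Type[BaseException], ...] = (LookupError,),
    max_rows: Optional[int] = 500,
) -> List[str]:
    """Call fetch_row(1), fetch_row(2), ... until it raises or max_rows is reached."""
    names: List[str] = []
    j = 1
    while max_rows is None or j <= max_rows:
        try:
            names.append(fetch_row(j))
        except stop_on:
            break  # row j never appeared: treat this as the end of the list
        j += 1
    return names


# Hypothetical stand-in for the page: three rows, then nothing more loads.
_fake_rows = {1: "Ambire Wallet", 2: "Himo World", 3: "Highstreet"}


def fake_fetch(j: int) -> str:
    return _fake_rows[j]  # raises KeyError (a LookupError) once j > 3


print(collect_rows(fake_fetch))  # ['Ambire Wallet', 'Himo World', 'Highstreet']
```

Keeping `max_rows` as a safety cap still guards against a page that keeps loading rows forever.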

Votes: 1

Stack Overflow user

Answered 2022-01-29 16:13:44

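I was able to scroll the page with the following code changes:

  1. Added extra options so that the script is not detected (it was being blocked before).
  2. Added the keyboard action ARROW_UP, which did the trick: content started loading after the JS scroll.
  3. Added a 5-second pause to let new content load.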

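import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

chrome_options = webdriver.ChromeOptions()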
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko")
#extra options
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_argument('--disable-blink-features=AutomationControlled')

driver = webdriver.Chrome('chromedriver', options=chrome_options)
driver.set_window_size(1320, 550)

exchange_link = "https://icodrops.com/ico-stats/"
driver.get(exchange_link)

SCROLL_PAUSE_TIME = 5 #5 seconds
time.sleep(SCROLL_PAUSE_TIME)

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

for x in range(0, 10):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    driver.find_element(By.XPATH, "//body").send_keys(Keys.ARROW_UP)
    time.sleep(SCROLL_PAUSE_TIME)
    new_height = driver.execute_script("return document.body.scrollHeight")
    print('current Y: ' + str(new_height))
    if new_height == last_height:
        break
    last_height = new_height
driver.close()

Output:

current Y: 9792
current Y: 32542
current Y: 68942
current Y: 82592
current Y: 82592

I tested this with Selenium 4 and Chrome 97 on Windows.

This code could probably be improved and optimized, but I hope it at least works.
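
One possible tidy-up, offered only as a sketch, is to pull the scroll, pause and compare loop into a function that takes plain callables, so the stopping logic can be checked without a browser. The names `scroll_until_stable`, `FakePage`, `get_height` and `scroll_step` are hypothetical, and the starting height of 1000 in the demo is made up. With Selenium, `get_height` would presumably run `return document.body.scrollHeight` through `driver.execute_script`, and `scroll_step` would do the JS scroll followed by the ARROW_UP key press.

```python
import time
from typing import Callable, List


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_step: Callable[[], None],
    pause: float = 5.0,
    max_rounds: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Scroll, wait, and re-measure until the page height stops growing."""
    heights: List[int] = []
    last_height = get_height()
    for _ in range(max_rounds):
        scroll_step()              # e.g. JS scroll to bottom plus a key nudge
        sleep(pause)               # give lazily loaded content time to arrive
        new_height = get_height()
        heights.append(new_height)
        if new_height == last_height:
            break                  # nothing new loaded: assume the end
        last_height = new_height
    return heights


class FakePage:
    """Hypothetical stand-in for a page whose height grows on each scroll."""

    def __init__(self, heights: List[int]) -> None:
        self._heights = heights
        self._i = 0

    def height(self) -> int:
        return self._heights[self._i]

    def scroll(self) -> None:
        # Advance to the next height, staying on the last one at the end.
        self._i = min(self._i + 1, len(self._heights) - 1)


page = FakePage([1000, 9792, 32542, 68942, 82592])
demo_heights = scroll_until_stable(page.height, page.scroll, pause=0, sleep=lambda s: None)
print(demo_heights)  # [9792, 32542, 68942, 82592, 82592]
```

Passing `sleep` as a parameter is a design choice that lets the demo skip the real 5-second waits.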

Votes: 0
Original page content provided by Stack Overflow; translation supported by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/70906433
