I want to scrape all the reviews from the Google Play Store for a specific app. I wrote the following script:
# App Reviews Scraper
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
url = "https://play.google.com/store/apps/details?id=com.android.chrome&hl=en&showAllReviews=true"
# make request
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)
SCROLL_PAUSE_TIME = 5
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
time.sleep(SCROLL_PAUSE_TIME)
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
# Get everything inside the <html> tag, including JavaScript-rendered content
html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
soup = BeautifulSoup(html, 'html.parser')
reviewer = []
date = []
# reviewer name
for span in soup.find_all("span", class_="X43Kjb"):
    reviewer.append(span.text)
# review date
for span in soup.find_all("span", class_="p2TkOb"):
    date.append(span.text)
print(len(reviewer))
print(len(date))
However, it always prints only 203, while the app has 35,474,218 reviews. How can I download all of them?
Posted on 2021-12-12 10:10:51
wait = WebDriverWait(driver, 1)
try:
    wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Show More']"))).click()
except:
    continue
Just add this inside the infinite-scroll loop to check for the "Show More" element.
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
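For reference, here is one way the scroll loop and the "Show More" click could fit together. This is only a sketch, not a tested scraper: `scroll_until_stable`, `make_selenium_hooks`, and the `max_rounds` cap are hypothetical names introduced for illustration, and the XPath is the one from the snippet above, so it assumes the button text is still "Show More". The loop stops only when the page height is unchanged and no button was clicked in the same round. It removes the early exit, but it does not guarantee that the page will serve every review.

```python
import time


def scroll_until_stable(get_height, scroll_to_bottom, click_show_more,
                        pause=0.0, max_rounds=1000):
    """Scroll until the page stops growing and no 'Show More' button is left.

    The three callables are hypothetical hooks (not part of Selenium):
      get_height()       -> current page height
      scroll_to_bottom() -> scrolls the page to the bottom
      click_show_more()  -> True if a button was found and clicked
    Returns the number of rounds performed.
    """
    last_height = get_height()
    rounds = 0
    while rounds < max_rounds:  # safety cap so the loop always terminates
        rounds += 1
        scroll_to_bottom()
        time.sleep(pause)  # give the page time to load more reviews
        clicked = click_show_more()
        new_height = get_height()
        # Stop only when nothing grew AND there was no button to click.
        if new_height == last_height and not clicked:
            break
        last_height = new_height
    return rounds


def make_selenium_hooks(driver, timeout=1):
    """Build the three hooks for a real Selenium driver.

    Selenium is imported here so that the loop above can be exercised
    with plain Python fakes when Selenium is not installed.
    """
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def get_height():
        return driver.execute_script("return document.body.scrollHeight")

    def scroll_to_bottom():
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    def click_show_more():
        try:
            WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable(
                    (By.XPATH, "//span[text()='Show More']"))
            ).click()
            return True
        except TimeoutException:
            # No clickable button appeared within the timeout.
            return False

    return get_height, scroll_to_bottom, click_show_more
```

With the `driver` from the question, the call would look like `scroll_until_stable(*make_selenium_hooks(driver), pause=5)`, followed by the same BeautifulSoup parsing as before. Catching `TimeoutException` instead of using a bare `except` keeps unrelated errors visible.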
https://stackoverflow.com/questions/70322053