
Web Scraping AliExpress - Lazy Loading

Stack Overflow user
Asked 2021-04-07 16:18:18
1 answer · 229 views · 0 followers · 0 votes

I'm trying to scrape AliExpress using Selenium and Python. I followed a YouTube tutorial step by step, but I just can't get it to work.

I also tried requests with BeautifulSoup, but AliExpress seems to lazy-load its product listings. I tried a window-scroll script, but it doesn't help; the content apparently doesn't load until I scroll the page myself.

This is the URL of the page I want to scrape: https://www.aliexpress.com/wholesale?trafficChannel=main&d=y&CatId=0&SearchText=dog+supplies

This is the code I have so far. It returns nothing in the output. I assume that's because it tries to walk through all the product listings but can't find any, since they never loaded…

Any advice or help would be much appreciated, and apologies for the bad formatting and bad code.

Thanks!

"""
To do
HOT PRODUCT FINDER Enter: Keyword, to generate a url

Product Name
Product Image
Product Link
Sales Number
Price
Create an excel file that contains these data
Sort the list by top selling orders
Develop an algorithm for the velocity of the product (total sales increased / time?)
Scrape site every day """

import csv
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import requests
import lxml

#Starting Up the web driver
driver = webdriver.Chrome()

# grab Keywords
search_term = input('Keywords: ')

# url generator

def get_url(search_term):
    """Generate a url link using search term provided"""
    url_template = 'https://www.aliexpress.com/wholesale?trafficChannel=main&d=y&CatId=0&SearchText={}&ltype=wholesale&SortType=default&g=n'
    search_term = search_term.replace(" ", "+")
    return url_template.format(search_term)

url = get_url(search_term)  # pass the variable, not the literal string 'search_term'
driver.get(url)

#scrolling down to the end of the page
time.sleep(2)
driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')

#Extracting the Collection
r = requests.get(url)
soup = BeautifulSoup(r.content,'lxml')
productlist = soup.find_all('div', class_='list product-card')
print(productlist)

1 Answer

Stack Overflow user
Posted 2021-04-07 16:56:10

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("disable-infobars")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('--disable-blink-features=AutomationControlled')

driver = webdriver.Chrome(options=chrome_options)

# grab keywords
search_term = input('Keywords: ')

# search via the site's own search box instead of building a URL
driver.get('https://www.aliexpress.com')
driver.implicitly_wait(10)

p = driver.find_element(By.NAME, 'SearchText')
p.send_keys(search_term)
p.send_keys(Keys.ENTER)

productlist = []
product = driver.find_element(By.XPATH, '//*[@id="root"]/div/div/div[2]/div[2]/div/div[2]/ul')

# scroll down in small steps so the lazy loader renders each batch of products
height = driver.execute_script("return document.body.scrollHeight")
for scrol in range(100, height - 1800, 100):
    driver.execute_script(f"window.scrollTo(0,{scrol})")
    time.sleep(0.5)

# walk the first 15 product containers and collect each item's title link text
for z in range(1, 16):
    container = driver.find_element(
        By.XPATH, f'//*[@id="root"]/div/div/div[2]/div[2]/div/div[2]/ul/div[{z}]')
    for item in container.find_elements(By.CLASS_NAME, 'list-item'):
        title = item.find_element(By.CLASS_NAME, 'item-title-wrap')
        link = title.find_element(By.TAG_NAME, 'a')
        productlist.append(link.text)
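The original to-do list also mentions writing the data to an Excel file. A simple starting point is the standard csv module, which Excel opens directly (a sketch; the filename, the hard-coded sample titles, and the single-column layout are illustrative):

```python
import csv

# In the real script this list would come from the Selenium loop above.
productlist = ['Dog Leash', 'Chew Toy']

with open('products.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Product Name'])                    # header row
    writer.writerows([title] for title in productlist)   # one row per product
```

To also store image, link, sales, and price, each row would become a list of those fields, with matching header columns.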
Votes: 0
Original content provided by Stack Overflow; translation supported by Tencent Cloud's translation engine.
Original link: https://stackoverflow.com/questions/66982115
