I'm trying to web scrape AliExpress using Selenium and Python. I followed a YouTube tutorial step by step, but I just can't get it to work.
I also tried requests with BeautifulSoup, but AliExpress seems to use a lazy loader for its product listings. I tried a window-scroll script, but it didn't help; the content doesn't appear to load until I scroll manually.
Here is the URL of the page I want to scrape: https://www.aliexpress.com/wholesale?trafficChannel=main&d=y&CatId=0&SearchText=dog+supplies
This is the code I have so far. It returns nothing in the output. I think that's because it tries to go through all the product listings, but since they never load, it can't find any...
Any suggestions/help would be much appreciated, and apologies for the bad formatting and bad code.
"""
To do
HOT PRODUCT FINDER Enter: Keyword, to generate a url
Product Name
Product Image
Product Link
Sales Number
Price
Create an Excel file that contains this data
Sort the list by top selling orders
Develop an algorithm for the velocity of the product (total sales increased / time?)
Scrape site every day """
import csv
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import requests
import lxml
#Starting Up the web driver
driver = webdriver.Chrome()
# grab Keywords
search_term = input('Keywords: ')
# url generator
def get_url(search_term):
    """Generate a url link using the search term provided"""
    url_template = 'https://www.aliexpress.com/wholesale?trafficChannel=main&d=y&CatId=0&SearchText={}&ltype=wholesale&SortType=default&g=n'
    search_term = search_term.replace(" ", "+")
    return url_template.format(search_term)

url = get_url(search_term)
driver.get(url)
#scrolling down to the end of the page
time.sleep(2)
driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
#Extracting the Collection
r = requests.get(url)
soup = BeautifulSoup(r.content,'lxml')
productlist = soup.find_all('div', class_='list product-card')
print(productlist)
Posted on 2021-04-07 16:56:10
import csv
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import requests
import lxml
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("disable-infobars")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('--disable-blink-features=AutomationControlled')
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome(executable_path='chromedriver.exe', options=chrome_options)
# grab Keywords
search_term = input('Keywords: ')
# url generator
driver.get('https://www.aliexpress.com')
driver.implicitly_wait(10)
p = driver.find_element_by_name('SearchText')
p.send_keys(search_term)
p.send_keys(Keys.ENTER)
productlist = []
product = driver.find_element_by_xpath('//*[@id="root"]/div/div/div[2]/div[2]/div/div[2]/ul')
height = driver.execute_script("return document.body.scrollHeight")
# Scroll down in small steps so the lazy loader fetches every listing
for scrol in range(100, height - 1800, 100):
    driver.execute_script(f"window.scrollTo(0,{scrol})")
    time.sleep(0.5)
# driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
div = []
list_i = []
item_title = []
a = []
# Each search-result row sits under ul/div[1] .. ul/div[15]
for z in range(1, 16):
    div.append(product.find_element_by_xpath(f'//*[@id="root"]/div/div/div[2]/div[2]/div/div[2]/ul/div[{z}]'))
for pr in div:
    list_i.append(pr.find_elements_by_class_name('list-item'))
for pc in list_i:
    for p in pc:
        item_title.append(p.find_element_by_class_name('item-title-wrap'))
for pt in item_title:
    a.append(pt.find_element_by_tag_name('a'))
for prt in a:
    productlist.append(prt.text)
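To cover the "Create an Excel file" item from the question's to-do list, the collected results can be written out with the standard `csv` module, which Excel opens directly. A minimal sketch; the link and price columns here are placeholder assumptions, since `productlist` above only holds the scraped titles:

```python
import csv

# Placeholder rows standing in for scraped results; only the title column
# comes from productlist in the answer, the rest are hypothetical fields.
rows = [
    ("Dog Leash", "https://example.com/item/1", "$3.99"),
    ("Pet Bowl", "https://example.com/item/2", "$5.49"),
]

# newline="" prevents blank lines on Windows; utf-8 keeps non-ASCII titles intact
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Product Name", "Product Link", "Price"])  # header row
    writer.writerows(rows)
```

Sorting by sales before writing (another to-do item) would just be `rows.sort(key=...)` once a sales-count field is actually scraped.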
https://stackoverflow.com/questions/66982115