I realized that Selenium has removed some attributes, and my code can no longer use the each_item.find_element(By.CSS_SELECTOR, ...) statement:
    for i in range(pagenum):
        driver.get(f"https://www.adiglobaldistribution.us/search?attributes=dd1a8f50-5ac8-ec11-a837-000d3a006ffb&page={i}&criteria=Tp-link")
        time.sleep(5)
        wait = WebDriverWait(driver, 10)
        search_items = driver.find_elements(By.CSS_SELECTOR, "[class='rd-thumb-details-price']")
        for each_item in search_items:
            item_title = each_item.find_element(By.CSS_SELECTOR, "span[class='rd-item-name-desc']").text
            item_name = each_item.find_element(By.CSS_SELECTOR, "span[class='item-num-mfg']").text[7:]
            item_link = each_item.find_element(By.CSS_SELECTOR, "div[class='item-thumb'] a").get_attribute('href')
            item_price = each_item.find_element(By.CSS_SELECTOR, "div[class='rd-item-price rd-item-price--list']").text[2:].replace("\n", ".")
            item_stock = each_item.find_element(By.CSS_SELECTOR, "div[class='rd-item-price']").text[19:]
            table = {"title": item_title, "name": item_name, "Price": item_price, "Stock": item_stock, "link": item_link}
            data_adi.append(table)
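Error:
Posted on 2022-09-21 02:04:05
You may be approaching the whole problem the wrong way. The products are loaded into the page by JavaScript after the page itself has loaded, so you can scrape the API endpoint directly and avoid the complexity (and slowness) of Selenium. Here is a solution based on requests and pandas that scrapes the API endpoint (which you can find under Dev tools - Network tab):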
    import requests
    import pandas as pd

    # Send a browser-like User-Agent header with every request
    headers = {
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
    }

    full_df = pd.DataFrame()
    # Request result pages 1 to 3 and append each page's products to full_df
    for x in range(1, 4):
        r = requests.get(f'https://www.adiglobaldistribution.us/api/v2/adiglobalproducts/?applyPersonalization=true&boostIds=&categoryId=16231864-9ed5-4536-a8b3-ae870078e9f7&expand=pricing,brand&getAllAttributeFacets=false&hasMarketingTileContent=false&includeAttributes=IncludeOnProduct&includeSuggestions=false&makeBrandUrls=false&page={x}&pageSize=36&previouslyPurchasedProducts=false&query=&searchWithin=&sort=Bestseller', headers=headers)
        # Flatten the nested JSON records under 'products' into columns
        df = pd.json_normalize(r.json()['products'])
        full_df = pd.concat([full_df, df], axis=0, ignore_index=True)
    # print([x for x in full_df.columns])
    print(full_df[['basicListPrice', 'modelNumber', 'name', 'properties.countrY_OF_ORIGIN', 'productDetailUrl', 'properties.minimuM_QTY', 'properties.onsalenow']])
Result printed in the terminal:
basicListPrice modelNumber name properties.countrY_OF_ORIGIN productDetailUrl properties.minimuM_QTY properties.onsalenow
0 51.99 TL-SG1005P TP-Link TL-SG1005P 5-Port Gigabit Desktop Switch with 4-Port PoE China /Catalog/shop-brands/tp-link/FP-TLSG1005P 1 0
1 81.99 C7 TP-Link ARCHER C7 AC1750 Wireless Dual Band Gigabit Router China /Catalog/shop-brands/tp-link/FP-ARCHERC7 1 0
2 18.99 TL-POE150S TP-Link TL-POE150S PoE Injector, IEEE 802.3af Compliant China /Catalog/shop-brands/tp-link/FP-TLPOE150S 1 0
3 19.99 TL-WR841N TP-Link TL-WR841N 300Mbps Wireless N Router China /Catalog/shop-brands/tp-link/FP-TLWR841N 1 0
4 43.99 TL-PA4010 KIT TP-Link TL-PA4010KIT AV600 600Mbps Powerline Starter Kit China /Catalog/shop-brands/tp-link/FP-TLPA4010K 1 0
... ... ... ... ... ... ... ...
85 76.99 TL-SL1311MP TP-Link TL-SL1311MP 8-Port 10/100mbps + 3-Port Gigabit Desktop Switch With 8-Port PoE+ /Catalog/shop-brands/tp-link/FP-TSL1311MP 1 0
86 35.99 C20 TP-Link ARCHER C20 IEEE 802.11ac Ethernet Wireless Router China /Catalog/shop-brands/tp-link/FP-ARCHERC20 1 0
87 29.99 TL-WR802N TP-Link TL-WR802N 300Mbps Wireless N Nano Router, Pocket Size China /Catalog/shop-brands/tp-link/FP-TLWR802N 1 0
88 100.99 EAP610 TP-Link EAP610_V2 AX1800 CEILING MOUNT WI-FI 6" China /Catalog/shop-brands/tp-link/FP-EAP610V2 1 0
89 130.99 EAP650 TP-Link EAP650 AX3000 Ceiling Mount Wi-Fi 6 Access Point China /Catalog/shop-brands/tp-link/FP-EAP650 1 0
90 rows × 7 columns
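The long query string in the request above is easier to read and change when it is kept as a dictionary. The sketch below rebuilds the same URL with the standard library's `urllib.parse.urlencode`; the parameter names and values are copied from the request URL, while `API_URL`, `BASE_PARAMS` and `build_page_url` are hypothetical names introduced only for illustration. It assumes the server does not care about parameter order (here `page` ends up last).

```python
from urllib.parse import urlencode

API_URL = "https://www.adiglobaldistribution.us/api/v2/adiglobalproducts/"

# Query parameters copied from the request URL used in the answer.
BASE_PARAMS = {
    "applyPersonalization": "true",
    "boostIds": "",
    "categoryId": "16231864-9ed5-4536-a8b3-ae870078e9f7",
    "expand": "pricing,brand",
    "getAllAttributeFacets": "false",
    "hasMarketingTileContent": "false",
    "includeAttributes": "IncludeOnProduct",
    "includeSuggestions": "false",
    "makeBrandUrls": "false",
    "pageSize": 36,
    "previouslyPurchasedProducts": "false",
    "query": "",
    "searchWithin": "",
    "sort": "Bestseller",
}


def build_page_url(page: int) -> str:
    """Return the API URL for one result page (hypothetical helper)."""
    params = {**BASE_PARAMS, "page": page}
    # safe="," leaves the comma in expand=pricing,brand unescaped,
    # matching the original URL.
    return f"{API_URL}?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    for page in range(1, 4):
        print(build_page_url(page))
```

requests also accepts such a dictionary through its `params` argument, so the URL does not have to be assembled by hand at all.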
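You can inspect the JSON response further to see whether it contains more information that is useful to you. Relevant pandas documentation: https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html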
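To get a feel for where dotted column names such as `properties.countrY_OF_ORIGIN` come from, here is a minimal standard-library sketch that flattens one nested record the same way. It is only an illustration, not pandas' implementation; `flatten` is a hypothetical helper, and `sample_product` is a made-up record whose nesting and value types are assumptions inferred from the column names and the printed output.

```python
def flatten(record: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing their keys
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Made-up sample shaped like one entry of r.json()['products'] (assumption)
sample_product = {
    "name": "TP-Link TL-WR841N 300Mbps Wireless N Router",
    "modelNumber": "TL-WR841N",
    "basicListPrice": 19.99,
    "properties": {"countrY_OF_ORIGIN": "China", "minimuM_QTY": "1"},
}

flat = flatten(sample_product)
for column in sorted(flat):
    print(column, "=", flat[column])
```

With the real response, uncommenting the `print([x for x in full_df.columns])` line in the answer lists every flattened column that is actually available.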
https://stackoverflow.com/questions/73793786