I'm trying to write a script that tracks prices on Amazon, but I don't understand why it gives me this error:
Traceback (most recent call last):
  File "scraping_amazon.py", line 12, in <module>
    price = soup.find('span', class_ = 'a-size-medium a-color-price priceBlockBuyingPriceString').text
AttributeError: 'NoneType' object has no attribute 'text'
Here is my script so far:
import requests
from bs4 import BeautifulSoup
URL = 'https://www.amazon.de/Sony-Vollformat-Digitalkamera-Megapixel-SEL-2870/dp/B00FWUDEEC/ref=sr_1_4?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=sony+a7&qid=1604245969&quartzVehicle=5-672&replacementKeywords=sony&sr=8-4'
page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html.parser')
price = soup.find('span', class_ = 'a-size-medium a-color-price priceBlockBuyingPriceString').text
print(price)
I followed the same process as in my other web scraping scripts. Those work, but this one doesn't.
Any ideas? Thanks.
Answered 2020-11-02 00:11:02
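The page content is loaded dynamically with JavaScript, so `soup.find` returns None on the HTML that requests receives. You have to use something like Selenium to scrape dynamically loaded pages. Here is the complete code to do that: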
from bs4 import BeautifulSoup
from selenium import webdriver
import time
URL = 'https://www.amazon.de/Sony-Vollformat-Digitalkamera-Megapixel-SEL-2870/dp/B00FWUDEEC/ref=sr_1_4?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=sony+a7&qid=1604245969&quartzVehicle=5-672&replacementKeywords=sony&sr=8-4'
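driver = webdriver.Chrome()
driver.get(URL)
time.sleep(4)  # crude fixed wait so the JavaScript-rendered content can load
soup = BeautifulSoup(driver.page_source, 'html5lib')  # needs the html5lib package installed
price = soup.find('span', class_='a-size-medium a-color-price priceBlockBuyingPriceString').text
print(price)
driver.quit()  # end the browser session, not just the current window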
Output:
962,16 €
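Since the goal is to track prices, the scraped string `962,16 €` still has to be turned into a number before it can be compared with earlier readings. The following is a minimal sketch of that step, assuming the price always arrives in German notation (`.` for thousands, `,` for decimals) with a trailing euro sign. The helper name `parse_euro_price` is hypothetical and not part of any library.

```python
from decimal import Decimal


def parse_euro_price(text: str) -> Decimal:
    """Convert a German-formatted price such as '962,16 €' to a Decimal.

    Assumption: '.' groups thousands and ',' marks the decimals.
    """
    # Drop the currency sign and surrounding whitespace (incl. non-breaking spaces).
    cleaned = text.replace('€', '').replace('\xa0', ' ').strip()
    # Remove thousands separators, then switch the decimal comma to a dot.
    cleaned = cleaned.replace('.', '').replace(',', '.')
    return Decimal(cleaned)


price = parse_euro_price('962,16 €')
print(price)                    # 962.16
print(price < Decimal('1000'))  # True
```

`Decimal` is used here instead of `float` so that repeated comparisons of currency amounts do not pick up binary rounding noise.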
Answered 2020-11-02 00:06:29
Use HTMLSession together with a dummy User-Agent header:
from bs4 import BeautifulSoup
from requests_html import HTMLSession
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'
}
session = HTMLSession()
response = session.get('https://www.amazon.de/Sony-Vollformat-Digitalkamera-Megapixel-SEL-2870/dp/B00FWUDEEC/ref=sr_1_4?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=sony+a7&qid=1604245969&quartzVehicle=5-672&replacementKeywords=sony&sr=8-4',headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
price = soup.find('span', class_ = 'a-size-medium a-color-price priceBlockBuyingPriceString').text
print(price)
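Whichever way the page is fetched, `soup.find` returns None when the element is missing, and calling `.text` on that result is what raises the AttributeError from the question. Below is a small sketch of guarding against that case. The function name `extract_price` and both HTML samples are made up for illustration; they are not Amazon's real markup. It also matches on a single class, because a multi-class string passed to `class_` only matches when the attribute value is exactly that string.

```python
from bs4 import BeautifulSoup


def extract_price(html, css_class='a-color-price'):
    """Return the price text, or None if no matching <span> exists."""
    soup = BeautifulSoup(html, 'html.parser')
    # Matching one class is more tolerant than matching the full class string.
    node = soup.find('span', class_=css_class)
    if node is None:
        # Element not found: the page may be a bot check or have a different layout.
        return None
    return node.get_text(strip=True)


# Hypothetical samples: one with a price element, one without.
sample_with_price = '<div><span class="a-size-medium a-color-price">962,16 €</span></div>'
sample_without_price = '<div><p>Enter the characters you see below</p></div>'

print(extract_price(sample_with_price))     # 962,16 €
print(extract_price(sample_without_price))  # None
```

When the function returns None, printing the first few hundred characters of the fetched HTML is usually the quickest way to see what the server actually sent.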
https://stackoverflow.com/questions/64633914