BeautifulSoup AttributeError

Stack Overflow user
Asked 2021-05-25 21:55:30
Answers: 1 · Views: 45 · Followers: 0 · Votes: 0

I am trying to scrape Google Shopping using BeautifulSoup and requests. This is my code, and it is very simple:

from bs4 import BeautifulSoup
import requests
import lxml
import json

def gshop(q):
    q = q.replace(' ', '+')
    
    headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
    }
    
    response = requests.get(f'https://www.google.com/search?q={q}&tbm=shop', headers=headers).text

    soup = BeautifulSoup(response, 'lxml')
    data = []

    for container in soup.findAll('div', class_='sh-dgr__content'):
        title = container.find('h4', class_='A2sOrd').text
        price = container.find('span', class_='a8Pemb').text
        supplier = container.find('div', class_='aULzUe IuHnof').text
        buy = 'https://google.com'+(container.find('a', class_='eaGTj mQaFGe shntl')['href'])
        rating = container.find('span', class_='Rsc7Yb').text
        data.append({
            "Title": title,
            "Price": price,
            "Rating": rating,
            "Supplier": supplier,
            "Link": buy
        })

    return json.dumps(data, indent = 2, ensure_ascii = False)

print(gshop('toys'))

This throws an error:

Traceback (most recent call last):
  File "c:/Users/Maanav/Desktop/ValRal/main.py", line 45, in <module>
    print(gshop('toys'))
  File "c:/Users/Maanav/Desktop/ValRal/main.py", line 34, in gshop
    rating = container.find('span', class_='Rsc7Yb').text
AttributeError: 'NoneType' object has no attribute 'text'

Please look at the page source of a Google Shopping URL to better understand my code. What is going wrong?


1 Answer

Stack Overflow user

Answered 2021-05-25 23:10:12

Solved by @simpleApp in the comments:

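Sometimes a product in a Google Shopping listing has no rating, or the seller has not added a supplier name. In that case `find` returns `None`, and accessing `.text` on it raises the `AttributeError` shown in the traceback, which stops the program. To prevent this, we have to use exception handling: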

from bs4 import BeautifulSoup
import requests
import lxml
import json

def gshop(q):
    q = q.replace(' ', '+')
    
    headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
    }
    
    response = requests.get(f'https://www.google.com/search?q={q}&tbm=shop', headers=headers).text

    soup = BeautifulSoup(response, 'lxml')
    data = []

    for container in soup.findAll('div', class_='sh-dgr__content'):
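        # find() returns None when an element is missing, so .text raises
        # AttributeError; in that case the field falls back to None.
        try:
            title = container.find('h4', class_='A2sOrd').text
        except AttributeError:
            title = None
        try:
            price = container.find('span', class_='a8Pemb').text
        except AttributeError:
            price = None
        try:
            supplier = container.find('div', class_='aULzUe IuHnof').text
        except AttributeError:
            supplier = None
        # Indexing None raises TypeError; a tag without href raises KeyError.
        try:
            buy = 'https://google.com'+(container.find('a', class_='eaGTj mQaFGe shntl')['href'])
        except (TypeError, KeyError):
            buy = None
        try:
            rating = container.find('span', class_='Rsc7Yb').text
        except AttributeError:
            rating = None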
        data.append({
            "Title": title,
            "Price": price,
            "Rating": rating,
            "Supplier": supplier,
            "Link": buy
        })

    return json.dumps(data, indent = 2, ensure_ascii = False)
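
As an alternative sketch (not part of the original answer), the same "missing element becomes None" behavior can be had without repeating try/except, by checking the result of `find` before using it. The helper names `text_or_none`, `href_or_none`, and `parse_products`, and the `SAMPLE_HTML` markup, are hypothetical and exist only for illustration. The sample reuses the class names from the question, which Google may change at any time, and it uses the built-in `html.parser` so the snippet runs offline without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a Google Shopping results page.
# The second product deliberately has no rating and no link.
SAMPLE_HTML = """
<div class="sh-dgr__content">
  <h4 class="A2sOrd">Toy car</h4>
  <span class="a8Pemb">$9.99</span>
  <span class="Rsc7Yb">4.5</span>
  <a class="eaGTj mQaFGe shntl" href="/shopping/product/1">Buy</a>
</div>
<div class="sh-dgr__content">
  <h4 class="A2sOrd">Toy train</h4>
  <span class="a8Pemb">$19.99</span>
</div>
"""


def text_or_none(container, name, class_):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = container.find(name, class_=class_)
    return tag.get_text(strip=True) if tag is not None else None


def href_or_none(container, class_, prefix="https://google.com"):
    """Return prefix + href of the first matching <a>, or None if absent."""
    tag = container.find("a", class_=class_)
    href = tag.get("href") if tag is not None else None
    return prefix + href if href else None


def parse_products(html):
    """Collect one dict per product container, using None for missing fields."""
    soup = BeautifulSoup(html, "html.parser")
    data = []
    for container in soup.find_all("div", class_="sh-dgr__content"):
        data.append({
            "Title": text_or_none(container, "h4", "A2sOrd"),
            "Price": text_or_none(container, "span", "a8Pemb"),
            "Rating": text_or_none(container, "span", "Rsc7Yb"),
            "Supplier": text_or_none(container, "div", "aULzUe IuHnof"),
            "Link": href_or_none(container, "eaGTj mQaFGe shntl"),
        })
    return data


products = parse_products(SAMPLE_HTML)
for product in products:
    print(product)
```

Compared with the try/except version, an explicit `None` check only handles the "element not found" case, so unrelated errors (for example a typo in a variable name) still surface instead of being silently swallowed.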
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/67689486
