BeautifulSoup returns 'NoneType' on any soup command

Stack Overflow user
Asked 2021-05-27 06:42:08
Answers: 1 · Views: 225 · Followers: 0 · Votes: 1

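I'm using BeautifulSoup to scrape the Wall Street Journal, but it never seems to find the element with id="top-news", which is always present on the homepage. I've tried find(), find_all(), and various other methods; the lookup returns None, so every method I then call on the result fails because it is being called on a NoneType object.

I'm trying to extract metadata about the top-news articles, mainly the article headline and URL. The metadata for each article sits under a class named "WSJTheme--headline--7VCzo7Ay", but I only want the ones inside the "top-news" element.

Here is my code: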

import requests
from bs4 import BeautifulSoup
from shutil import copyfile

URL = 'https://www.wsj.com'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='top-news')

topArticles = results.find_all('div', class_='WSJTheme--headline--7VCzo7Ay ')

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-05-27 07:34:13

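Specify a User-Agent header to get the correct response from the server. Without it, the server apparently returns a different page that lacks the top-news element, which is why find() returns None: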

import requests
from bs4 import BeautifulSoup


url = "https://www.wsj.com/"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

for headline in soup.select('#top-news span[class*="headline"]'):
    print(headline.text)

Prints:

Oil Giants Dealt Defeats as Climate Pressures Intensify
At Least Eight Killed in San Jose Shooting
HSBC to Exit Most U.S. Retail Banking
Amazon-MGM Deal Marks Win for Hedge Funds
Cities Reverse Defunding the Police Amid Rising Crime
Federal Prosecutors Have Asked Banks for Information About Archegos Meltdown
Why a Grand Plan to Vaccinate the World Against Covid Unraveled
Inside the Israel-Hamas Conflict and One of Its Deadliest Hours in Gaza
Eric Carle, ‘The Very Hungry Caterpillar’ Author, Dies at 91
Wynn May Face U.S. Action for Role in China’s Push to Expel Businessman
Walmart to Sell New Line of Gap-Branded Homegoods
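
Since the question asks for both the headline and the URL, and the original failure came from calling a method on None, the following is a minimal sketch of a guarded extraction. It is not part of the original answer. The function name top_headlines, the sample markup, and the assumption that each headline element wraps an <a> tag are all hypothetical; the real WSJ markup may differ. It parses inline HTML so it runs without a network request; in practice the html argument would be requests.get(url, headers=headers).content as shown above.

```python
from bs4 import BeautifulSoup


def top_headlines(html):
    """Return title/url dicts for headlines inside #top-news, or [] if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id="top-news")
    if container is None:
        # find() returns None when nothing matches. Calling .find_all() on that
        # None is what raises the AttributeError described in the question.
        return []

    articles = []
    for tag in container.select('[class*="headline"]'):
        # Assumption: the headline element wraps an <a> tag carrying the URL.
        link = tag.find("a")
        articles.append({
            "title": tag.get_text(strip=True),
            "url": link.get("href") if link is not None else None,
        })
    return articles


# Hypothetical markup standing in for the two kinds of server response.
blocked_page = "<html><body><p>Access denied</p></body></html>"
normal_page = (
    '<div id="top-news">'
    '<span class="WSJTheme--headline--x">'
    '<a href="https://example.com/a">First story</a></span>'
    '<span class="WSJTheme--headline--x">Second story</span>'
    "</div>"
)

print(top_headlines(blocked_page))
print(top_headlines(normal_page))
```

Returning an empty list when the container is missing keeps the caller's loop working and makes a blocked or altered response easy to detect, instead of failing later with a NoneType error.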
Votes: 1
Original content provided by Stack Overflow.
Source: https://stackoverflow.com/questions/67716937
