How do I get the next tag?

Stack Overflow user
Asked on 2019-12-31 20:27:09
3 answers · 56 views · 0 followers · 1 vote

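I am trying to get the headlines that follow elements with a particular class. The headlines are wrapped in h2 tags, and each headline appears after the span tag I am selecting.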

from bs4 import BeautifulSoup
import requests
r = requests.get("https://www.dailypost.ng/hot-news")
soup = BeautifulSoup(r.content, "html.parser")
mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})
mytags = mydivs.findNext('h2')
for tag in mytags:
    print(tag.text.strip())

3 Answers

Stack Overflow user

Accepted answer

Answered on 2019-12-31 20:40:37

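You have to iterate over mydivs before you can use findNext().

mydivs is a list of elements (a ResultSet of Tag objects). findNext() only works on a single element, so you have to loop over the list and call findNext() on each item.

Just add this line: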

for div in mydivs:

and put it before this one:

mytags = div.findNext('h2')

Here is the complete working code for your program:

from bs4 import BeautifulSoup
import requests
r = requests.get("https://www.dailypost.ng/hot-news")
soup = BeautifulSoup(r.content, "html.parser")
mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})
for div in mydivs:
    # findNext returns a single Tag (or None), not a list of tags
    heading = div.findNext('h2')
    if heading is not None:
        print(heading.text.strip())
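
The loop-then-findNext pattern can also be tried offline. The following is only a minimal sketch: the `SAMPLE_HTML` markup and the helper name `headlines_after_spans` are made up for illustration and merely imitate the "date span followed by an h2" layout described in the question; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the date-span-then-headline layout.
SAMPLE_HTML = """
<div><span class="mvp-cd-date left relative">1 hour ago</span><h2>First headline</h2></div>
<div><span class="mvp-cd-date left relative">2 hours ago</span><h2> Second headline </h2></div>
"""


def headlines_after_spans(html):
    """Return the text of the first h2 that follows each matching span."""
    soup = BeautifulSoup(html, "html.parser")
    spans = soup.find_all("span", {"class": "mvp-cd-date left relative"})
    titles = []
    for span in spans:
        # find_next is called on one Tag at a time, never on the whole list.
        heading = span.find_next("h2")
        if heading is not None:
            titles.append(heading.get_text(strip=True))
    return titles


print(headlines_after_spans(SAMPLE_HTML))
```

Collecting the titles in a list rather than printing inside the loop makes it easier to reuse or check the result later.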
Votes: 0

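Stack Overflow user

Answered on 2019-12-31 20:34:48

soup.findAll() returns a list-like ResultSet (empty when nothing matches, not None), so you cannot call findNext() on it. You can, however, iterate over the tags and call find_next() on each one: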

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.dailypost.ng/hot-news")
soup = BeautifulSoup(r.content, "html.parser")
mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})
for tag in mydivs:
    print(tag.find_next('h2').get_text(strip=True))

Prints:

BREAKING: Another federal lawmaker dies in Dubai hospital
Cross-Over Night: Enugu Govt bans burning of tyres on roads
Dadiyata: DSS breaks silence as Nigerian govt critic remains missing
CAC: Nigerian govt appoints new Acting Registrar-General
What Buhari told me – Dabiri-Erewa
What soldiers should expect in 2020 – Buratai
Only earthquake can erase Amosun’s legacies in Ogun – Akinlade
Civil War: Militia leader sentenced to 20yrs in prison
2020: Prophet Omale releases prophecies on Buhari, Aisha, Kyari, govs, coup plot
BREAKING: EFCC arrests Shehu Sani
Armed Forces Day: Yobe Governor Buni, donates N40 million for emblem appeal fund
Zamfara govt bans illegal gathering in the state
Agbenu Kacholalo: Colours of culture at Idoma International Carnival 2019 [PHOTOS]
Men of God are too fearful, weak to challenge government activities
2020: Peter Obi sends message to Nigerians
TETFUND: EFCC, ICPC asked to probe agency over alleged corruption
Two inmates regain freedom from Uyo prison
Buhari meets President of AfDB, Adeshina at Aso Rock
New Kogi CP resumes office, promises crime free state
Nothing stops you from paying N30,000 minimum wage to workers – APC challenges Makinde
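
The list-versus-element distinction this answer relies on can be checked without any network access. The snippet below is a small sketch using a hypothetical one-item document (the `date` class and the markup are invented for the demonstration); it shows that the result of find_all is a list subclass, that a search with no matches is empty, and that calling find_next on the result set itself fails while calling it on an element works.

```python
from bs4 import BeautifulSoup

# Hypothetical one-item page, used only for illustration.
soup = BeautifulSoup(
    '<span class="date">today</span><h2>Only headline</h2>', "html.parser"
)

spans = soup.find_all("span", {"class": "date"})
missing = soup.find_all("span", {"class": "no-such-class"})

# find_all gives back a list-like ResultSet; no match means empty.
print(isinstance(spans, list), len(spans), len(missing))

# The ResultSet itself has no find_next, so the call raises AttributeError.
try:
    spans.find_next("h2")
    raised = False
except AttributeError:
    raised = True
print(raised)

# Each element inside the ResultSet is a Tag, which does have find_next.
print(spans[0].find_next("h2").get_text(strip=True))
```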

EDIT: This script will scrape headlines from several pages:

import requests
from bs4 import BeautifulSoup

url = 'https://dailypost.ng/hot-news/page/{}/'

for page in range(1, 5):    # pages 1 to 4; adjust the range to scrape more or fewer pages
    print('Page no.{}'.format(page))
    soup = BeautifulSoup(requests.get(url.format(page)).content, "html.parser")
    mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})
    for tag in mydivs:
        print(tag.find_next('h2').get_text(strip=True))
    print('-' * 80)
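
One possible variation on the multi-page script separates downloading from parsing, so the parsing step can be exercised without hitting the site. This is a sketch under assumptions, not part of the original answer: the names `extract_headlines`, `fetch_with_requests` and `scrape_pages` are hypothetical, and the 10-second timeout is an arbitrary choice.

```python
from bs4 import BeautifulSoup

DATE_CLASS = "mvp-cd-date left relative"


def extract_headlines(html):
    """Return the text of the h2 that follows each date span in html."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for span in soup.find_all("span", {"class": DATE_CLASS}):
        heading = span.find_next("h2")
        if heading is not None:
            titles.append(heading.get_text(strip=True))
    return titles


def fetch_with_requests(url):
    """Download one page; raises on HTTP error statuses."""
    import requests  # imported lazily so the parsing code can be tried offline

    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()
    return response.content


def scrape_pages(url_template, pages, fetch=fetch_with_requests):
    """Map each page number to the list of headlines found on that page."""
    return {
        page: extract_headlines(fetch(url_template.format(page)))
        for page in pages
    }
```

Calling `scrape_pages('https://dailypost.ng/hot-news/page/{}/', range(1, 5))` would mirror the loop above, while passing a different `fetch` callable allows the same code to be run against saved HTML.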
Votes: 0

Stack Overflow user

Answered on 2019-12-31 20:36:41

Try replacing the last 3 lines with:

for div in mydivs:
    # findNext returns a single Tag (or None), so no inner loop is needed
    heading = div.findNext('h2')
    if heading is not None:
        print(heading.text.strip())
Votes: 0

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/59548220
