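I am trying to get the headlines that come after elements with a certain class. Each headline is wrapped in an h2 tag and appears after a span tag with the class "mvp-cd-date left relative". Here is my code: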
from bs4 import BeautifulSoup
import requests

r = requests.get("https://www.dailypost.ng/hot-news")
soup = BeautifulSoup(r.content, "html.parser")
mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})
mytags = mydivs.findNext('h2')
for tag in mytags:
    print(tag.text.strip())
Posted on 2019-12-31 20:40:37
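You have to iterate over mydivs to use findNext(). mydivs is a list of elements (BeautifulSoup Tag objects), and findNext() only works on a single element. Loop over the divs and call findNext() on each one.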
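To see the difference in isolation, here is a small sketch that runs against an inline HTML snippet instead of the live site. The markup is made up to imitate the page structure and is an assumption, not the real page: calling findNext() on the result of findAll() raises an AttributeError, while calling it on each element works.

```python
from bs4 import BeautifulSoup

# Made-up markup imitating the page (an assumption): a date span followed by an h2 headline
HTML = """
<div><span class="mvp-cd-date left relative">1 hour ago</span><h2>First headline</h2></div>
<div><span class="mvp-cd-date left relative">2 hours ago</span><h2>Second headline</h2></div>
"""

soup = BeautifulSoup(HTML, "html.parser")
mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})

# findAll() returns a ResultSet, which is a list of Tag objects
print(type(mydivs).__name__, len(mydivs))

# Calling findNext() on the list itself fails
try:
    mydivs.findNext("h2")
    raised = False
except AttributeError:
    raised = True
print("AttributeError raised:", raised)

# Calling findNext() on each Tag returns the first h2 after that span
headlines = [div.findNext("h2").get_text(strip=True) for div in mydivs]
print(headlines)
```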
Just add this line:
for div in mydivs:
and put it before this line, indented inside the loop:
    mytags = div.findNext('h2')
Here is the full working code:
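from bs4 import BeautifulSoup
import requests

r = requests.get("https://www.dailypost.ng/hot-news")
soup = BeautifulSoup(r.content, "html.parser")
mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})
for div in mydivs:
    # findNext() is called on one Tag at a time and returns the first h2 after it
    mytag = div.findNext('h2')
    # get_text(strip=True) returns the headline text without surrounding whitespace
    print(mytag.get_text(strip=True))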
Posted on 2019-12-31 20:34:48
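soup.findAll() returns a list (a ResultSet, which is empty when nothing matches), so you can't call findNext() on it. However, you can iterate over the tags and call find_next() (the newer name for findNext()) on each one individually: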
import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.dailypost.ng/hot-news")
soup = BeautifulSoup(r.content, "html.parser")
mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})
for tag in mydivs:
    print(tag.find_next('h2').get_text(strip=True))
Prints:
BREAKING: Another federal lawmaker dies in Dubai hospital
Cross-Over Night: Enugu Govt bans burning of tyres on roads
Dadiyata: DSS breaks silence as Nigerian govt critic remains missing
CAC: Nigerian govt appoints new Acting Registrar-General
What Buhari told me – Dabiri-Erewa
What soldiers should expect in 2020 – Buratai
Only earthquake can erase Amosun’s legacies in Ogun – Akinlade
Civil War: Militia leader sentenced to 20yrs in prison
2020: Prophet Omale releases prophecies on Buhari, Aisha, Kyari, govs, coup plot
BREAKING: EFCC arrests Shehu Sani
Armed Forces Day: Yobe Governor Buni, donates N40 million for emblem appeal fund
Zamfara govt bans illegal gathering in the state
Agbenu Kacholalo: Colours of culture at Idoma International Carnival 2019 [PHOTOS]
Men of God are too fearful, weak to challenge government activities
2020: Peter Obi sends message to Nigerians
TETFUND: EFCC, ICPC asked to probe agency over alleged corruption
Two inmates regain freedom from Uyo prison
Buhari meets President of AfDB, Adeshina at Aso Rock
New Kogi CP resumes office, promises crime free state
Nothing stops you from paying N30,000 minimum wage to workers – APC challenges Makinde
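One thing to keep in mind with {"class": "mvp-cd-date left relative"}: when the value contains spaces, BeautifulSoup compares it against the exact string value of the class attribute, so the same classes written in a different order would not match. If the site's markup ever changed in that way, a CSS selector through soup.select() matches elements that have all the listed classes, in any order. A small sketch with made-up markup (the second span is an assumption, not taken from the real page):

```python
from bs4 import BeautifulSoup

# Made-up markup: the second span has the same classes in a different order
HTML = """
<span class="mvp-cd-date left relative">a</span><h2>One</h2>
<span class="left mvp-cd-date relative">b</span><h2>Two</h2>
"""
soup = BeautifulSoup(HTML, "html.parser")

# Exact string match on the class attribute: only the first span matches
exact = soup.find_all("span", {"class": "mvp-cd-date left relative"})

# CSS selector: matches any span that has all three classes, in any order
css = soup.select("span.mvp-cd-date.left.relative")

print([s.find_next("h2").get_text(strip=True) for s in exact])
print([s.find_next("h2").get_text(strip=True) for s in css])
```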
EDIT: This script scrapes the headlines from several pages:
import requests
from bs4 import BeautifulSoup

url = 'https://dailypost.ng/hot-news/page/{}/'

for page in range(1, 5):  # <-- pages 1 to 4; change the range to scrape more or fewer pages
    print('Page no.{}'.format(page))
    soup = BeautifulSoup(requests.get(url.format(page)).content, "html.parser")
    mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})
    for tag in mydivs:
        print(tag.find_next('h2').get_text(strip=True))
    print('-' * 80)
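A possible refinement of the multi-page loop is to keep fetching separate from parsing, so the parsing can be tried without hitting the site, and to skip spans that have no h2 after them: find_next() returns None in that case, and calling get_text() on None would raise an AttributeError. In this sketch, scrape_pages and its fetch parameter are hypothetical names introduced here, and fake_fetch returns invented HTML for an offline demo; with requests installed, fetch could instead be lambda url: requests.get(url, timeout=10).content.

```python
from bs4 import BeautifulSoup

URL = "https://dailypost.ng/hot-news/page/{}/"


def scrape_pages(fetch, first=1, last=4):
    """Hypothetical helper: map each page number to its list of headlines.

    `fetch` takes a URL and returns that page's HTML (bytes or str).
    """
    results = {}
    for page in range(first, last + 1):
        soup = BeautifulSoup(fetch(URL.format(page)), "html.parser")
        spans = soup.find_all("span", {"class": "mvp-cd-date left relative"})
        headlines = []
        for span in spans:
            h2 = span.find_next("h2")
            if h2 is None:  # no h2 anywhere after this span, so skip it
                continue
            headlines.append(h2.get_text(strip=True))
        results[page] = headlines
    return results


def fake_fetch(url):
    # Invented HTML for the offline demo: the second span has no h2 after it
    return (
        '<span class="mvp-cd-date left relative">now</span>'
        "<h2>Headline from %s</h2>"
        '<span class="mvp-cd-date left relative">later</span>' % url
    )


for page, headlines in scrape_pages(fake_fetch, 1, 2).items():
    print("Page no.{}".format(page))
    for headline in headlines:
        print(headline)
    print("-" * 80)
```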
Posted on 2019-12-31 20:36:41
Try replacing the last 3 lines with:
for div in mydivs:
    # call findNext() on each span, then read the text of the h2 it returns
    mytag = div.findNext('h2')
    print(mytag.get_text(strip=True))
https://stackoverflow.com/questions/59548220