
Question: BeautifulSoup: extract all title text from a div class

Stack Overflow user

Asked on 2017-12-25 12:21:58

2 answers · 2.5K views · 0 followers · 1 vote

import requests
from bs4 import BeautifulSoup

res = requests.get('http://aicd.companydirectors.com.au/events/events-calendar')
soup = BeautifulSoup(res.text,"lxml")


event_containers = soup.find_all('div', class_ = "col-xs-12 col-sm-6 col-md-8")

first_event = event_containers[0]  
print(first_event.h3.text)

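With this code I can extract the name of the first event. How can I loop over all the events and extract every event name and date? I am also trying to extract the location information, which is only accessible after clicking the readmore link.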

python

web-scraping

beautifulsoup

2 Answers

Stack Overflow user

Accepted answer

Posted on 2017-12-25 12:29:09

event_containers is a bs4.element.ResultSet object, which is essentially a list of Tag objects.
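
To make the point about ResultSet concrete, here is a minimal sketch. The HTML snippet and the class name "event" are made up for illustration, and the built-in "html.parser" is used instead of lxml only so that the example runs without extra dependencies.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet

# Hypothetical markup standing in for the events page.
html = """
<div class="event"><h3>Event A</h3></div>
<div class="event"><h3>Event B</h3></div>
"""

soup = BeautifulSoup(html, "html.parser")
containers = soup.find_all("div", class_="event")

# find_all returns a ResultSet, which is a subclass of list,
# so indexing, len() and iteration all behave as they do for a list.
is_result_set = isinstance(containers, ResultSet)
is_list = isinstance(containers, list)
titles = [tag.h3.text for tag in containers]

print(is_result_set, is_list, len(containers), titles)
```

Because the result behaves like a list, anything that works for containers[0] can be applied to every element in a for loop or a list comprehension.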

Just loop over the tags in event_containers and select h3 for the title, div.date for the date, and a for the URL, for example:

for tag in event_containers:
    print(tag.h3.text)
    print(tag.select_one('div.date').text)
    print(tag.a['href'])
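
The loop above assumes every container has an h3, a div.date and a link; if one is missing, tag.h3.text or tag.a['href'] raises an exception. A more defensive variant might look like the sketch below. The sample HTML and the helper text_or_none are hypothetical, and "html.parser" is used only to keep the example dependency-free.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second container deliberately lacks a date and a link.
html = """
<div class="col-xs-12 col-sm-6 col-md-8">
  <div class="date">10 Jan 2018 | Sydney</div>
  <h3><a href="/events/a">Event A</a></h3>
</div>
<div class="col-xs-12 col-sm-6 col-md-8">
  <h3>Event B</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


events = []
for tag in soup.find_all("div", class_="col-xs-12 col-sm-6 col-md-8"):
    link = tag.find("a", href=True)  # None when the container has no link
    events.append({
        "title": text_or_none(tag.h3),
        "date": text_or_none(tag.select_one("div.date")),
        "href": link["href"] if link is not None else None,
    })

for event in events:
    print(event)
```

Collecting the values into dictionaries rather than printing them directly also makes it easier to write the results to CSV or JSON later.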

Now, for the location information, you have to visit each URL and collect the text in div.event-add.

Full code:

import requests
from bs4 import BeautifulSoup

res = requests.get('http://aicd.companydirectors.com.au/events/events-calendar')
soup = BeautifulSoup(res.text,"lxml")
event_containers = soup.find_all('div', class_ = "col-xs-12 col-sm-6 col-md-8")
base_url = 'http://aicd.companydirectors.com.au'

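for tag in event_containers:
    # Each listing links to a detail page that holds the location.
    link = base_url + tag.a['href']
    # Use a separate name so the listing soup above is not overwritten.
    event_soup = BeautifulSoup(requests.get(link).text, "lxml")
    # Drop the first and last strings of div.event-add and join the rest.
    location = ', '.join(list(event_soup.select_one('div.event-add').stripped_strings)[1:-1])
    print('Title:', tag.h3.text)
    print('Date:', tag.select_one('div.date').text)
    print('Link:', link)
    print('Location:', location)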
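
The two steps that do the real work in the full code, building the detail URL and slicing the strings of div.event-add, can be tried offline. The sketch below is only an illustration: DETAIL_HTML is an invented fragment (the real page may be structured differently), extract_location is a hypothetical helper, and urljoin is swapped in for the string concatenation so that both relative and absolute href values are handled.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

LISTING_URL = "http://aicd.companydirectors.com.au/events/events-calendar"

# Hypothetical detail-page fragment; the real markup may differ.
DETAIL_HTML = """
<div class="event-add">
  <strong>Venue</strong>
  <p>Hotel X<br>1 Main St<br>Sydney NSW 2000</p>
  <a href="#">View map</a>
</div>
"""


def extract_location(detail_html):
    """Join the inner strings of div.event-add with ', ', or return None."""
    soup = BeautifulSoup(detail_html, "html.parser")
    block = soup.select_one("div.event-add")
    if block is None:
        return None
    parts = list(block.stripped_strings)
    # As in the answer, drop the first and last strings (here the label
    # and the trailing map link) and keep what is in between.
    return ", ".join(parts[1:-1])


link = urljoin(LISTING_URL, "/events/a")  # "/events/a" is a made-up href
location = extract_location(DETAIL_HTML)

print(link)
print(location)
```

In a live run, DETAIL_HTML would be replaced by the text of requests.get(link, timeout=10), and it is worth checking the response status before parsing.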

Votes: 1

Stack Overflow user

Posted on 2017-12-25 12:29:10

Try this to get all the events and dates you are after:

import requests
from bs4 import BeautifulSoup

res = requests.get('http://aicd.companydirectors.com.au/events/events-calendar')
soup = BeautifulSoup(res.text,"lxml")
for item in soup.find_all(class_='lead'):
    date = item.find_previous_sibling().text.split(" |")[0]
    print(item.text,date)
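
This approach relies on the date element sitting immediately before each element with class lead. The sketch below shows the same idea on invented markup: the HTML is hypothetical, the sibling lookup names the div tag explicitly (a small addition, not part of the answer), and a missing sibling is handled instead of raising an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second title has no date element before it.
html = """
<div class="row">
  <div class="date">10 Jan 2018 | Sydney</div>
  <h3 class="lead">Event A</h3>
</div>
<div class="row">
  <h3 class="lead">Event B</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

pairs = []
for item in soup.find_all(class_="lead"):
    # Look backwards among the siblings of the title for the nearest div.
    previous = item.find_previous_sibling("div")
    if previous is not None:
        # Keep only the part before the "|" separator.
        date = previous.get_text(strip=True).split("|")[0].strip()
    else:
        date = None
    pairs.append((item.get_text(strip=True), date))

print(pairs)
```

Siblings are only searched within the same parent, which is why the second title yields None rather than picking up the date of the first event.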

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/47968568
