7.01-beautiful_soup3

# pip install beautifulsoup4 lxml

from bs4 import BeautifulSoup

html_doc = """
<html><head>
<title id="one">The Dormouse's story</title>
</head>
<body>
<p class="story"><!--...--></p>
<p class="title">
    content of the p tag
    <b>The Dormouse's story</b>
</p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

# 1. Parse the HTML string into a bs4.BeautifulSoup object
soup = BeautifulSoup(html_doc, 'lxml')

# 2. General-purpose search methods

# find: returns the first tag object that matches the query
result = soup.find(name="p")
result = soup.find(attrs={"class": "title"})
# string= replaces the deprecated text= argument; this returns a NavigableString, not a tag
result = soup.find(string="Tillie")
result = soup.find(
    name='p',
    attrs={"class": "story"},
)

# find_all: returns a list of every matching tag object
result = soup.find_all('a')
result = soup.find_all("a", limit=1)[0]
result = soup.find_all(attrs={"class": "sister"})

# select_one: CSS selector, returns the first matching tag
result = soup.select_one('.sister')

# select: CSS selector, returns a list of matching tags
result = soup.select('.sister')
result = soup.select('#one')
result = soup.select('head title')
result = soup.select('title,.title')
result = soup.select('a[id="link3"]')

# Text wrapped by a tag: select() returns a list, get_text() on one element returns a str
result = soup.select('.title')[0].get_text()


# Tag attributes
# result = soup.select('#link1')[0].get('href')
print(result)
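
The script above reassigns `result` on every line, so only the last uncommented value is printed. As a rough sketch of what each kind of query returns, the snippet below runs the same calls against a shortened copy of the document. It assumes the built-in `html.parser` backend rather than `lxml` so that nothing beyond `beautifulsoup4` is needed; the `sample` string and the variable names are illustrative and not part of the original script.

```python
from bs4 import BeautifulSoup

# Shortened copy of the document above (an assumption for illustration).
sample = """
<html><head><title id="one">The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""

# html.parser ships with Python; the original script uses lxml instead.
soup = BeautifulSoup(sample, "html.parser")

first_link = soup.find("a")             # a single Tag (the first match)
all_links = soup.find_all("a")          # list-like ResultSet of Tags
missing = soup.find("table")            # None when nothing matches
no_matches = soup.find_all("table")     # empty result when nothing matches
by_css = soup.select("a.sister")        # list of Tags matching the CSS selector
one_by_css = soup.select_one("#link3")  # a single Tag, or None

print(first_link["id"])
print(len(all_links))
print(missing)
print(len(no_matches))
print([a["id"] for a in by_css])
print(one_by_css.get_text())
```

Because `find` and `select_one` return `None` when nothing matches, it is usually worth checking the result before calling methods on it.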

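Reading a tag's text and its attributes often go together, for example when collecting every link on a page. The following is a minimal sketch, again assuming `html.parser`; the `snippet` markup is a trimmed variant of the sample in which the third link deliberately has no `href`, and `links` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Trimmed variant of the sample; the third <a> has no href on purpose.
snippet = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a class="sister" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(snippet, "html.parser")

links = []
for a in soup.select("a.sister"):
    # get() returns None for a missing attribute; a["href"] would raise KeyError.
    links.append((a.get_text(strip=True), a.get("href")))

print(links)

first = soup.select_one("#link1")
print(first["href"])   # subscript access, raises KeyError if the attribute is absent
print(first["class"])  # class is multi-valued, so it comes back as a list
```

Using `get()` keeps the loop from failing on tags that lack the attribute, at the cost of having to handle `None` later.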