
Advanced Crawling: Scraping imooc with Scrapy

Author: happyJared
Published: 2018-10-10 10:14:37

Preface

  Scrapy is used to crawl imooc's free and hands-on ("实战") course listings. The environment:

  • scrapy v1.5.1
  • redis
  • psycopg2 (to store the scraped data in PostgreSQL)

Data Tables

  The overall workflow of a crawler project is roughly: analyze the page structure -> decide which fields to extract -> design the corresponding tables -> write the spider -> persist the data. For storage you can pick a document database such as MongoDB or a relational database such as MySQL; PostgreSQL is used here. Skipping the page analysis for now, the two table designs are as follows:

tb_imooc_course

-- ----------------------------
-- Table structure for tb_imooc_course
-- ----------------------------
DROP TABLE IF EXISTS "public"."tb_imooc_course";
CREATE TABLE "public"."tb_imooc_course" (
  "id" serial4 PRIMARY KEY,
  "course_id" int4 NOT NULL,
  "name" varchar(100) COLLATE "pg_catalog"."default" NOT NULL,
  "difficult" varchar(30) COLLATE "pg_catalog"."default" NOT NULL,
  "student" int4 NOT NULL,
  "desc" varchar(250) COLLATE "pg_catalog"."default" NOT NULL,
  "label" varchar(50) COLLATE "pg_catalog"."default" NOT NULL,
  "image_urls" varchar(250) COLLATE "pg_catalog"."default" NOT NULL,
  "detail" varchar(250) COLLATE "pg_catalog"."default" NOT NULL,
  "duration" varchar(50) COLLATE "pg_catalog"."default" NOT NULL,
  "overall_score" float4,
  "content_score" float4,
  "concise_score" float4,
  "logic_score" float4,
  "summary" varchar(800) COLLATE "pg_catalog"."default",
  "teacher_nickname" varchar(30) COLLATE "pg_catalog"."default",
  "teacher_avatar" varchar(250) COLLATE "pg_catalog"."default",
  "teacher_job" varchar(30) COLLATE "pg_catalog"."default",
  "tip" varchar(500) COLLATE "pg_catalog"."default",
  "can_learn" varchar(500) COLLATE "pg_catalog"."default",
  "update_time" timestamp(6) NOT NULL,
  "create_time" timestamp(6) NOT NULL
)
;
COMMENT ON COLUMN "public"."tb_imooc_course"."id" IS 'auto-increment primary key';
COMMENT ON COLUMN "public"."tb_imooc_course"."course_id" IS 'course id';
COMMENT ON COLUMN "public"."tb_imooc_course"."name" IS 'course name';
COMMENT ON COLUMN "public"."tb_imooc_course"."difficult" IS 'difficulty level';
COMMENT ON COLUMN "public"."tb_imooc_course"."student" IS 'number of learners';
COMMENT ON COLUMN "public"."tb_imooc_course"."desc" IS 'course description';
COMMENT ON COLUMN "public"."tb_imooc_course"."label" IS 'category labels';
COMMENT ON COLUMN "public"."tb_imooc_course"."image_urls" IS 'cover image';
COMMENT ON COLUMN "public"."tb_imooc_course"."detail" IS 'detail page URL';
COMMENT ON COLUMN "public"."tb_imooc_course"."duration" IS 'course duration';
COMMENT ON COLUMN "public"."tb_imooc_course"."overall_score" IS 'overall score';
COMMENT ON COLUMN "public"."tb_imooc_course"."content_score" IS 'score: practical content';
COMMENT ON COLUMN "public"."tb_imooc_course"."concise_score" IS 'score: concise and clear';
COMMENT ON COLUMN "public"."tb_imooc_course"."logic_score" IS 'score: clear logic';
COMMENT ON COLUMN "public"."tb_imooc_course"."summary" IS 'course summary';
COMMENT ON COLUMN "public"."tb_imooc_course"."teacher_nickname" IS 'teacher nickname';
COMMENT ON COLUMN "public"."tb_imooc_course"."teacher_avatar" IS 'teacher avatar';
COMMENT ON COLUMN "public"."tb_imooc_course"."teacher_job" IS 'teacher job title';
COMMENT ON COLUMN "public"."tb_imooc_course"."tip" IS 'course prerequisites';
COMMENT ON COLUMN "public"."tb_imooc_course"."can_learn" IS 'what you can learn';
COMMENT ON COLUMN "public"."tb_imooc_course"."update_time" IS 'update time';
COMMENT ON COLUMN "public"."tb_imooc_course"."create_time" IS 'insert time';
COMMENT ON TABLE "public"."tb_imooc_course" IS 'free courses';

-- ----------------------------
-- Indexes structure for table tb_imooc_course
-- ----------------------------
CREATE UNIQUE INDEX "uni_cid" ON "public"."tb_imooc_course" USING btree (
  "course_id" "pg_catalog"."int4_ops" ASC NULLS LAST
);

tb_imooc_coding

-- ----------------------------
-- Table structure for tb_imooc_coding
-- ----------------------------
DROP TABLE IF EXISTS "public"."tb_imooc_coding";
CREATE TABLE "public"."tb_imooc_coding" (
  "id" serial4 PRIMARY KEY,
  "coding_id" int4 NOT NULL,
  "name" varchar(100) COLLATE "pg_catalog"."default" NOT NULL,
  "difficult" varchar(30) COLLATE "pg_catalog"."default" NOT NULL,
  "student" int4 NOT NULL,
  "desc" varchar(250) COLLATE "pg_catalog"."default" NOT NULL,
  "image_urls" varchar(250) COLLATE "pg_catalog"."default" NOT NULL,
  "price" varchar(50) COLLATE "pg_catalog"."default" NOT NULL,
  "detail" varchar(250) COLLATE "pg_catalog"."default" NOT NULL,
  "overall_score" varchar(20) COLLATE "pg_catalog"."default",
  "teacher_nickname" varchar(30) COLLATE "pg_catalog"."default",
  "teacher_avatar" varchar(250) COLLATE "pg_catalog"."default",
  "duration" varchar(50) COLLATE "pg_catalog"."default" NOT NULL,
  "video" varchar(250) COLLATE "pg_catalog"."default",
  "small_title" varchar(250) COLLATE "pg_catalog"."default",
  "detail_desc" varchar(800) COLLATE "pg_catalog"."default",
  "teacher_job" varchar(50) COLLATE "pg_catalog"."default",
  "suit_crowd" varchar(500) COLLATE "pg_catalog"."default",
  "skill_require" varchar(500) COLLATE "pg_catalog"."default",
  "update_time" timestamp(6) NOT NULL,
  "create_time" timestamp(6) NOT NULL
)
;
COMMENT ON COLUMN "public"."tb_imooc_coding"."id" IS 'auto-increment primary key';
COMMENT ON COLUMN "public"."tb_imooc_coding"."coding_id" IS 'course id';
COMMENT ON COLUMN "public"."tb_imooc_coding"."name" IS 'course name';
COMMENT ON COLUMN "public"."tb_imooc_coding"."difficult" IS 'difficulty level';
COMMENT ON COLUMN "public"."tb_imooc_coding"."student" IS 'number of learners';
COMMENT ON COLUMN "public"."tb_imooc_coding"."desc" IS 'course description';
COMMENT ON COLUMN "public"."tb_imooc_coding"."image_urls" IS 'cover image';
COMMENT ON COLUMN "public"."tb_imooc_coding"."price" IS 'course price';
COMMENT ON COLUMN "public"."tb_imooc_coding"."detail" IS 'detail page URL';
COMMENT ON COLUMN "public"."tb_imooc_coding"."overall_score" IS 'rating';
COMMENT ON COLUMN "public"."tb_imooc_coding"."teacher_nickname" IS 'teacher nickname';
COMMENT ON COLUMN "public"."tb_imooc_coding"."teacher_avatar" IS 'teacher avatar';
COMMENT ON COLUMN "public"."tb_imooc_coding"."duration" IS 'course duration';
COMMENT ON COLUMN "public"."tb_imooc_coding"."video" IS 'demo video';
COMMENT ON COLUMN "public"."tb_imooc_coding"."small_title" IS 'detail page title';
COMMENT ON COLUMN "public"."tb_imooc_coding"."detail_desc" IS 'detail page summary';
COMMENT ON COLUMN "public"."tb_imooc_coding"."teacher_job" IS 'teacher job title';
COMMENT ON COLUMN "public"."tb_imooc_coding"."suit_crowd" IS 'target audience';
COMMENT ON COLUMN "public"."tb_imooc_coding"."skill_require" IS 'skill requirements';
COMMENT ON COLUMN "public"."tb_imooc_coding"."update_time" IS 'update time';
COMMENT ON COLUMN "public"."tb_imooc_coding"."create_time" IS 'insert time';
COMMENT ON TABLE "public"."tb_imooc_coding" IS 'hands-on courses';

-- ----------------------------
-- Indexes structure for table tb_imooc_coding
-- ----------------------------
CREATE UNIQUE INDEX "uni_coding_id" ON "public"."tb_imooc_coding" USING btree (
  "coding_id" "pg_catalog"."int4_ops" ASC NULLS LAST
);

Creating the Project

  Create the project with scrapy startproject imooc, then define the corresponding Items in items.py:

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

from scrapy.item import Item, Field


# Free courses
class CourseItem(Item):
    # define the fields for your item here like:
    # name = scrapy.Field()

    name = Field()  # course name
    difficult = Field()  # difficulty level
    student = Field()  # number of learners
    desc = Field()  # course description
    label = Field()  # category labels
    image_urls = Field()  # cover image
    detail = Field()  # detail page URL
    course_id = Field()  # course id
    duration = Field()  # course duration
    overall_score = Field()  # overall score
    content_score = Field()  # score: practical content
    concise_score = Field()  # score: concise and clear
    logic_score = Field()  # score: clear logic
    summary = Field()  # course summary
    teacher_nickname = Field()  # teacher nickname
    teacher_avatar = Field()  # teacher avatar
    teacher_job = Field()  # teacher job title
    tip = Field()  # course prerequisites
    can_learn = Field()  # what you can learn

# Hands-on courses
class CodingItem(Item):
    name = Field()  # course name
    difficult = Field()  # difficulty level
    student = Field()  # number of learners
    desc = Field()  # course description
    image_urls = Field()  # cover image
    price = Field()  # course price
    detail = Field()  # detail page URL
    coding_id = Field()  # course id
    overall_score = Field()  # rating
    teacher_nickname = Field()  # teacher nickname
    teacher_avatar = Field()  # teacher avatar
    duration = Field()  # course duration
    video = Field()  # demo video
    small_title = Field()  # detail page title
    detail_desc = Field()  # detail page summary
    teacher_job = Field()  # teacher job title
    suit_crowd = Field()  # target audience
    skill_require = Field()  # skill requirements
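  For orientation, the project layout at this point is the standard scrapy startproject output (only the files touched in this post are shown):

```
imooc/
├── scrapy.cfg          # deploy/config entry point
└── imooc/
    ├── items.py        # the Item definitions above
    ├── pipelines.py    # persistence (covered below)
    ├── settings.py     # project settings
    └── spiders/        # the spiders go here
```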

Writing the "Free Courses" Spider

  Let's walk through the spider for imooc's free course pages. A quick look at the pages shows that the fields of the tb_imooc_course table defined above have to be collected from both the list page and the course detail page (red boxes in the screenshots below):

Free course list page

Free course detail page

  Create this spider under the project's spiders directory: scrapy genspider course www.imooc.com. For the page-parsing part, debugging with scrapy shell <url> is strongly recommended. Here is the code for this part:

# -*- coding: utf-8 -*-
import re
from urllib import parse

import scrapy
from scrapy.http import Request

from imooc.items import *


class CourseSpider(scrapy.Spider):
    name = 'course'
    allowed_domains = ['www.imooc.com']
    start_urls = ['https://www.imooc.com/course/list/?page=0']
    https = "https:"

    def parse(self, response):
        """Parse the course list page."""

        url = response.url
        self.logger.info("Response url is %s" % url)

        # Scrapy schedules requests LIFO (depth-first) by default, so submit
        # the next-page request before the detail-page requests
        next_btn = response.xpath('//a[contains(.//text(),"下一页")]/@href').extract_first()
        if next_btn:
            # A "next page" button exists, so crawl the next page
            next_page = parse.urljoin(url, next_btn)
            yield Request(next_page, callback=self.parse)

        course_list = response.xpath('//div[@class="course-card-container"]')
        for index, course in enumerate(course_list):
            course_item = CourseItem()
            # Course name
            course_item['name'] = course.xpath('.//h3[@class="course-card-name"]/text()').extract_first()
            # Difficulty level
            course_item['difficult'] = course.xpath(
                './/div[@class="course-card-info"]/span[1]/text()').extract_first()
            # Number of learners
            course_student = course.xpath('.//div[@class="course-card-info"]/span[2]/text()').extract_first()
            course_item['student'] = int(course_student)
            # Course description
            course_item['desc'] = course.xpath('.//p[@class="course-card-desc"]/text()').extract_first()
            # Category labels
            course_label = course.xpath('.//div[@class="course-label"]/label/text()').extract()
            course_item['label'] = ', '.join(course_label)
            # Cover image
            course_banner = course.xpath('.//div[@class="course-card-top"]/img/@src').extract_first()
            course_item['image_urls'] = ["{0}{1}".format(CourseSpider.https, course_banner)]
            # Detail page URL
            course_detail = course.xpath('.//a/@href').extract_first()
            course_item['detail'] = parse.urljoin(url, course_detail)
            # Course id
            course_id = re.split('/', course_detail)[-1]
            course_item['course_id'] = int(course_id)
            self.log("Item: %s" % course_item)
            # Crawl the detail page
            yield Request(course_item['detail'], callback=self.parse_detail, meta={'course_item': course_item})

    def parse_detail(self, response):
        """Parse the course detail page."""

        url = response.url
        self.logger.info("Response url is %s" % url)

        course_item = response.meta['course_item']
        meta_value = response.xpath('//span[@class="meta-value"]/text()').extract()
        # Course duration
        course_item['duration'] = meta_value[1].strip()
        # Overall score
        course_item['overall_score'] = meta_value[2]
        # Score: practical content
        course_item['content_score'] = meta_value[3]
        # Score: concise and clear
        course_item['concise_score'] = meta_value[4]
        # Score: clear logic
        course_item['logic_score'] = meta_value[5]
        # Course summary
        course_item['summary'] = response.xpath('//div[@class="course-description course-wrap"]/text()') \
            .extract_first().strip()
        # Teacher nickname
        course_item['teacher_nickname'] = response.xpath('//span[@class="tit"]/a/text()').extract_first()
        # Teacher avatar
        avatar = response.xpath('//img[@class="js-usercard-dialog"]/@src').extract_first()
        if avatar:
            course_item['teacher_avatar'] = "{0}{1}".format(CourseSpider.https, avatar)
        # Teacher job title
        course_item['teacher_job'] = response.xpath('//span[@class="job"]/text()').extract_first()
        # Course prerequisites
        course_item['tip'] = response.xpath('//dl[@class="first"]/dd/text()').extract_first()
        # What you can learn
        course_item['can_learn'] = response.xpath(
            '//div[@class="course-info-tip"]/dl[not(@class)]/dd/text()').extract_first()
        yield course_item
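  The URL and id handling above relies on urllib.parse.urljoin and re.split; here is a quick standard-library sanity check of what they produce (the course link /learn/1234 is a hypothetical example):

```python
import re
from urllib import parse

url = 'https://www.imooc.com/course/list/?page=0'

# Relative detail links are resolved against the list-page URL
detail = parse.urljoin(url, '/learn/1234')  # hypothetical course link
assert detail == 'https://www.imooc.com/learn/1234'

# The numeric course id is the last path segment
course_id = int(re.split('/', '/learn/1234')[-1])
assert course_id == 1234

# "Next page" links such as '?page=2' resolve relative to the same path
assert parse.urljoin(url, '?page=2') == 'https://www.imooc.com/course/list/?page=2'
```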

  Persisting the data happens in pipelines.py, which is covered below together with the "hands-on courses" spider.
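  A side note on the LIFO comment in parse() above: Scrapy crawls depth-first by default, and its documentation describes a settings.py switch for breadth-first order instead (a sketch, not used in this project):

```python
# settings.py — switch Scrapy from depth-first (default) to breadth-first
DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = 'scrapy.squeues.PickleFifoDiskQueue'
SCHEDULER_MEMORY_QUEUE = 'scrapy.squeues.FifoMemoryQueue'
```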

Writing the "Hands-on Courses" Spider

  Moving on to imooc's hands-on course pages. A quick look shows that, as before, the fields of the tb_imooc_coding table have to be collected from both the list page and the course detail page (red boxes in the screenshots below):

Hands-on course list page

Hands-on course detail page

  Likewise, create the spider under the spiders directory: scrapy genspider coding coding.imooc.com. Here is the code for this part:

# -*- coding: utf-8 -*-
import re
from urllib import parse

import scrapy
from scrapy.http import Request

from imooc.items import *


class CodingSpider(scrapy.Spider):
    name = 'coding'
    allowed_domains = ['coding.imooc.com']
    start_urls = ['https://coding.imooc.com/?page=0']
    https = "https:"

    def parse(self, response):
        """Parse the course list page."""

        url = response.url
        self.logger.info("Response url is %s" % url)

        next_btn = response.xpath('//a[contains(.//text(),"下一页")]/@href').extract_first()
        if next_btn:
            next_page = parse.urljoin(url, next_btn)
            yield Request(next_page, callback=self.parse)

        coding_list = response.xpath('//div[@class="shizhan-course-wrap l "]')
        for index, coding in enumerate(coding_list):
            coding_item = CodingItem()
            # Course name
            coding_item['name'] = coding.xpath('.//p[@class="shizan-name"]/text()').extract_first()
            # Difficulty level
            coding_item['difficult'] = coding.xpath('.//span[@class="grade"]/text()').extract_first()
            # Number of learners
            coding_student = coding.xpath('.//div[@class="shizhan-info"]/span[2]/text()').extract_first()
            coding_item['student'] = int(coding_student)
            # Course description
            coding_item['desc'] = coding.xpath('.//p[@class="shizan-desc"]/text()').extract_first()
            # Cover image
            coding_banner = coding.xpath('.//img[@class="shizhan-course-img"]/@src').extract_first()
            coding_item['image_urls'] = ["{0}{1}".format(CodingSpider.https, coding_banner)]
            # Course price
            coding_item['price'] = coding.xpath('.//div[@class="course-card-price"]/text()').extract_first()
            # Detail page URL
            coding_detail = coding.xpath('.//a/@href').extract_first()
            coding_item['detail'] = parse.urljoin(url, coding_detail)
            # Course id
            coding_id = re.split('/', coding_detail)[-1].replace('.html', '')
            coding_item['coding_id'] = int(coding_id)
            # Rating
            coding_item['overall_score'] = coding.xpath('.//span[@class="r"]/text()').extract_first().replace('评价:', '')
            # Teacher nickname
            coding_item['teacher_nickname'] = coding.xpath('.//div[@class="lecturer-info"]/span/text()').extract_first()
            # Teacher avatar
            avatar = coding.xpath('.//div[@class="lecturer-info"]/img/@src').extract_first()
            coding_item['teacher_avatar'] = "{0}{1}".format(CodingSpider.https, avatar)

            self.log("Item: %s" % coding_item)
            # Crawl the detail page
            yield Request(coding_item['detail'], callback=self.parse_detail, meta={'coding_item': coding_item})

    def parse_detail(self, response):
        """Parse the course detail page."""

        url = response.url
        self.logger.info("Response url is %s" % url)

        coding_item = response.meta['coding_item']
        # Course duration
        coding_item['duration'] = response.xpath(
            '//div[@class="static-item static-time"]/span/strong/text()').extract_first()
        # Demo video
        video = response.xpath('//div[@id="js-video-content"]/@data-vurl').extract_first()
        coding_item['video'] = parse.urljoin(CodingSpider.https, video)
        # Detail page title
        coding_item['small_title'] = response.xpath('//div[@class="title-box "]/h2/text()').extract_first()
        # Detail page summary
        coding_item['detail_desc'] = response.xpath('//div[@class="info-desc"]/text()').extract_first()
        # Teacher job title
        coding_item['teacher_job'] = response.xpath('//div[@class="teacher"]/p/text()').extract_first()
        yield coding_item
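  The small string cleanups in the list-page parse, stripping the '评价:' prefix and the '.html' suffix, can be checked in isolation (the sample values are hypothetical):

```python
import re

# Strip the rating prefix the way the spider does (sample value hypothetical)
score = '评价:9.8'.replace('评价:', '')
assert score == '9.8'

# Derive the numeric coding_id from a detail link such as '/class/123.html'
coding_id = int(re.split('/', '/class/123.html')[-1].replace('.html', ''))
assert coding_id == 123
```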

Saving the Data

  The project uses Redis to make a quick decision on whether a record should be inserted or updated; a versioned document database such as MongoDB might suit this even better:

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
from utils import rds  # project helper module wrapping the Redis client
from utils import pgs  # project helper module wrapping psycopg2
from imooc import items
from datetime import datetime


class ImoocPipeline(object):
    def __init__(self):
        # Change the PostgreSQL and Redis connection settings for your environment
        # PostgreSQL
        host = 'localhost'
        port = 12432
        db_name = 'scrapy'
        username = db_name
        password = db_name
        self.postgres = pgs.Pgs(host=host, port=port, db_name=db_name, user=username, password=password)
        # Redis
        self.redis = rds.Rds(host=host, port=12379, db=1, password='redis6379').redis_cli

    def process_item(self, item, spider):
        name = item['name']
        difficult = item['difficult']
        student = item['student']
        desc = item['desc']
        image_urls = item['image_urls'][0]
        detail = item['detail']
        duration = item['duration']
        overall_score = item['overall_score']
        teacher_nickname = item['teacher_nickname']
        teacher_avatar = item.get('teacher_avatar')
        teacher_job = item['teacher_job']
        now = datetime.now()
        str_now = now.strftime('%Y-%m-%d %H:%M:%S')

        if isinstance(item, items.CourseItem):
            # Free course
            course_id = item['course_id']
            label = item['label']
            overall_score = float(overall_score)
            content_score = float(item['content_score'])
            concise_score = float(item['concise_score'])
            logic_score = float(item['logic_score'])
            summary = item['summary']
            tip = item['tip']
            can_learn = item['can_learn']

            key = 'imooc:course:{0}'.format(course_id)
            if self.redis.exists(key):
                params = (student, overall_score, content_score, concise_score, logic_score, now, course_id)
                self.postgres.handler(update_course(), params)
            else:
                params = (course_id, name, difficult, student, desc, label, image_urls, detail, duration,
                          overall_score, content_score, concise_score, logic_score, summary, teacher_nickname,
                          teacher_avatar, teacher_job, tip, can_learn, now, now)
                effect_count = self.postgres.handler(add_course(), params)
                if effect_count > 0:
                    self.redis.set(key, str_now)

        if isinstance(item, items.CodingItem):
            # Hands-on course
            price = item['price']
            coding_id = item['coding_id']
            video = item['video']
            small_title = item['small_title']
            detail_desc = item['detail_desc']

            key = 'imooc:coding:{0}'.format(coding_id)
            if self.redis.exists(key):
                params = (student, price, overall_score, now, coding_id)
                self.postgres.handler(update_coding(), params)
            else:
                params = (coding_id, name, difficult, student, desc, image_urls, price, detail,
                          overall_score, teacher_nickname, teacher_avatar, duration, video, small_title,
                          detail_desc, teacher_job, now, now)
                effect_count = self.postgres.handler(add_coding(), params)
                if effect_count > 0:
                    self.redis.set(key, str_now)

        return item

    def close_spider(self, spider):
        self.postgres.close()


def add_course():
    sql = 'insert into tb_imooc_course(course_id,"name",difficult,student,"desc",label,image_urls,' \
          'detail,duration,overall_score,content_score,concise_score,logic_score,summary,' \
          'teacher_nickname,teacher_avatar,teacher_job,tip,can_learn,update_time,create_time) ' \
          'values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)'
    return sql


def update_course():
    sql = 'update tb_imooc_course set student=%s,overall_score=%s,content_score=%s,concise_score=%s,' \
          'logic_score=%s,update_time=%s where course_id = %s'
    return sql


def add_coding():
    sql = 'insert into tb_imooc_coding(coding_id,"name",difficult,student,"desc",image_urls,price,detail,' \
          'overall_score,teacher_nickname,teacher_avatar,duration,video,small_title,detail_desc,teacher_job,' \
          'update_time,create_time) values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)'
    return sql


def update_coding():
    sql = 'update tb_imooc_coding set student=%s,price=%s,overall_score=%s,update_time=%s where coding_id = %s'
    return sql
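  The Redis existence check in process_item() decides between insert and update. As an alternative sketch (not used in this project): because course_id carries a unique index, PostgreSQL's INSERT ... ON CONFLICT could fold both branches into a single upsert. The column list below is abbreviated for illustration; the real table has more NOT NULL columns.

```python
# Sketch only: an upsert that would replace the Redis exists()/set() branching
def upsert_course():
    return (
        'insert into tb_imooc_course '
        '(course_id, "name", difficult, student, "desc", label, image_urls, '
        'detail, duration, update_time, create_time) '
        'values (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s) '
        'on conflict (course_id) do update set '
        'student = excluded.student, update_time = excluded.update_time'
    )
```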

  Finally, don't forget to enable the pipeline in settings.py:
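  The registration is a one-liner in settings.py (the class path matches the ImoocPipeline defined above; 300 is a conventional middle priority):

```python
# settings.py — enable the pipeline so yielded items reach process_item()
ITEM_PIPELINES = {
    'imooc.pipelines.ImoocPipeline': 300,
}
```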

Configuring pipelines

Running the Spiders

  The spiders above can be started with scrapy crawl course and scrapy crawl coding respectively. If typing these gets tedious, Scrapy's API lets you wrap the launch command in a .py script and run it with python instead, for example:

from scrapy.cmdline import execute

# Note: execute() blocks until the crawl finishes and then exits the
# process, so run only one of these per script invocation.
# Free courses
execute(['scrapy', 'crawl', 'course'])
# Hands-on courses (equivalent list-splitting form)
# execute('scrapy crawl coding'.split(' '))

Results

  No anti-scraping measures were encountered in practice, probably because the data volume is small (about 900 free courses and just over 100 hands-on courses). With Scrapy's request concurrency (the CONCURRENT_REQUESTS setting in settings.py, default 16), the crawl finishes quickly:

Free courses

Hands-on courses
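  Should a larger crawl ever trip rate limiting, the usual throttling knobs in settings.py look like this (illustrative values, not taken from the project):

```python
# settings.py — throttling (illustrative values)
CONCURRENT_REQUESTS = 16  # Scrapy's default concurrency
DOWNLOAD_DELAY = 0.5      # seconds to wait between requests to the same site
```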

Example code - GitHub

This article was originally published on the author's personal site/blog on 2018-09-02 and shared through the Tencent Cloud self-media program.
