Scraping the novel Dragon Raja V (龙族V) with requests + BeautifulSoup

Author: User 1558882 (用户1558882)

Published 2019-01-30 17:13:55

Included in the column: Rgc

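These past few days I wanted to read the latest Dragon Raja book, but after a long search I found no website offering a download. I only wanted a downloaded copy to read offline (writing code is hard enough on the eyes already), so I had no choice but to scrape it myself.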

I'm recording the script here so that the next time I want to read it, I can simply run it to download the novel.

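The novel is downloaded from http://longzu5.co. To save it somewhere else, change the value of the FILE_URL constant (despite its name, it holds a local file path, not a URL).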
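
Editing the constant works, but if you change the location often, a command-line option can override it. This is a minimal sketch, assuming a hypothetical `-o/--output` flag and a hypothetical helper `resolve_output_path`; neither exists in the original script. It uses only the standard library's `argparse`, and keeps `FILE_URL` as the default.

```python
import argparse

# Default taken from the original script; the raw string keeps "\l" literal.
FILE_URL = r'E:\lz.txt'


def resolve_output_path(argv=None):
    """Return the output file path.

    Hypothetical helper: a -o/--output flag overrides the FILE_URL default,
    so the path can change without editing the source. When argv is None,
    argparse reads sys.argv[1:] as usual.
    """
    parser = argparse.ArgumentParser(description='Download the novel to a text file.')
    parser.add_argument('-o', '--output', default=FILE_URL,
                        help='text file that the chapters are written to')
    args = parser.parse_args(argv)
    return args.output
```

With this in place, `get_son_text` would open the path returned by `resolve_output_path()` instead of `FILE_URL`, and the script could be run as, for example, `python lz.py -o D:/books/lz.txt` (the script name here is also just an example).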

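If the script stops retrieving anything, the site has most likely added anti-scraping measures or changed the HTML elements it uses to render its pages.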
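
In the script as written, either failure usually surfaces as an unhelpful `'NoneType' object has no attribute 'find_all'`. A small check for the expected elements makes the cause obvious. This is a sketch under assumptions: `parse_chapter_links` and `LayoutChangedError` are hypothetical names, and the code assumes the same `<ul class="booklist">` markup the script relies on, reversing the links just as the script does.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


class LayoutChangedError(RuntimeError):
    """Raised when a page lacks the elements the scraper expects."""


def parse_chapter_links(html, base_url):
    """Return absolute chapter URLs from the index page HTML.

    Assumes the chapters are <a> tags inside <ul class="booklist">; the list
    is reversed, mirroring section_list.reverse() in the original script.
    """
    soup = BeautifulSoup(html, 'html.parser')
    booklist = soup.find('ul', 'booklist')
    if booklist is None:
        # Either the site served something else (e.g. a block or challenge
        # page) or its HTML structure changed; say so instead of crashing.
        raise LayoutChangedError('no <ul class="booklist"> found on the index page')
    links = [urljoin(base_url, a['href']) for a in booklist.find_all('a', href=True)]
    links.reverse()
    return links
```

Raising a dedicated exception lets a caller distinguish "the site changed" from ordinary network errors, which `requests` reports with its own exception types.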

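```python
# -*- coding: utf-8 -*-
# (C) rgc, 2018
# All rights reserved
# requirements list: [python3.6, requests, bs4]

from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

URL = "http://longzu5.co"
# Output file path; the raw string stops "\l" from being read as an escape sequence
FILE_URL = r'E:\lz.txt'


def get_son_text(strs):
    """Extract one chapter's title and paragraphs and append them to FILE_URL."""
    soup = BeautifulSoup(strs, 'html.parser')
    body_soup = soup.find('div', 'post-body')
    result = body_soup.find_all('p')
    title = soup.find('h2', 'post-title').text
    final_txt = title + '\n'

    # One paragraph per line, instead of gluing all paragraphs together
    for item in result:
        final_txt += item.text + '\n'
    final_txt += '\n\n'
    # 'a' appends, so delete the old file before re-running to avoid duplicates
    with open(FILE_URL, 'a', encoding='utf-8') as f:
        f.write(final_txt)


def get_father_text():
    """
    Fetch the chapter list, then download each chapter in order.
    :return: None
    """
    res = requests.get(URL + "/", timeout=10)
    res.raise_for_status()
    # Guess the charset from the content so Chinese text is not garbled
    res.encoding = res.apparent_encoding
    soup = BeautifulSoup(res.text, 'html.parser')

    ul_soup = soup.find('ul', 'booklist')
    section_list = []
    for item in ul_soup.find_all('a', href=True):
        # urljoin copes with both relative and absolute hrefs
        section_list.append(urljoin(URL + "/", item['href']))

    # Reverse the list (the index appears to list the newest chapter first)
    section_list.reverse()
    for url in section_list:
        print(url)
        section = requests.get(url, timeout=10)
        section.raise_for_status()
        section.encoding = section.apparent_encoding
        get_son_text(section.text)


if __name__ == '__main__':
    get_father_text()

# If this infringes any copyright, please contact me and I will delete it promptly. Apologies for any offense.
```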
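
Because `get_son_text` both parses and writes, it can only be tested against the live site. Splitting parsing from I/O makes it testable offline and also avoids the duplicate-append problem. This is a sketch, not the original design: `extract_chapter` and `download_novel` are hypothetical names, `fetch` is any callable that returns a page's HTML, and the one-second `delay` between requests is an arbitrary illustrative value. The selectors (`h2.post-title`, `div.post-body`) are the ones the script already uses.

```python
import time

from bs4 import BeautifulSoup


def extract_chapter(html):
    """Return one chapter as text: the title line, then one line per non-empty <p>."""
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.find('h2', 'post-title')
    body = soup.find('div', 'post-body')
    if title is None or body is None:
        raise ValueError('chapter page lacks h2.post-title or div.post-body')
    paragraphs = [p.get_text(strip=True) for p in body.find_all('p')]
    lines = [title.get_text(strip=True)] + [p for p in paragraphs if p]
    return '\n'.join(lines) + '\n\n'


def download_novel(urls, fetch, path, delay=1.0):
    """Write every chapter in `urls` to `path`, replacing any earlier output.

    `fetch(url)` must return the page HTML as a string; `delay` seconds are
    slept between requests so the site is not hammered.
    """
    # 'w' (not 'a') so that re-running the script does not duplicate chapters
    with open(path, 'w', encoding='utf-8') as f:
        for i, url in enumerate(urls):
            if i:
                time.sleep(delay)
            f.write(extract_chapter(fetch(url)))
```

For example, `fetch` could be `lambda url: requests.get(url, timeout=10).text`, and `urls` the reversed `section_list` built in `get_father_text`. Opening the file once in `'w'` mode trades the original's incremental appends for a clean file on every run.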

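This article is part of the Tencent Cloud self-media sync program and is shared from the author's personal site/blog.

Originally published: 2019-01-15. In case of infringement, contact cloudcommunity@tencent.com to have it removed.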