
Scraping Neihan Duanzi (内涵段子) jokes with a Python crawler

IT架构圈
Published 2018-06-01 11:11:05
This article is included in the column: IT架构圈
#!/usr/bin/env python
#coding:utf-8
import io
import time

import requests
from bs4 import BeautifulSoup
def neihanjoke():
    """Fetch the neihanshequ.com front page and append its jokes to neihanjok.txt."""
    headers = {
        'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding':'gzip, deflate',
        'Accept-Language':'zh-CN,zh;q=0.9',
        'Cookie':'tt_webid=6536425014367225358; uuid="w:1057f146c0254dafbd487a6da58210b7"; _ga=GA1.2.64952905.1521880043; _gid=GA1.2.1818828277.1521880043; csrftoken=111d911d1b2b2a61b5cad8282ee5b16e; _gat=1',
        'Host':'neihanshequ.com',
        'Referer':'https://www.baidu.com/link?url=DP5I6qLhobaPUAJ321iP0PzTkPBvbUE0-YdK4x6H01Wuq_PuPpwErjcv4dICWag3&wd=&eqid=82195f930001ef0c000000035ab61073',
        'Upgrade-Insecure-Requests':'1',
        'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36',
    }
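    # Send the request through the session; the original created a session
    # but then bypassed it by calling requests.get() directly.
    s = requests.session()
    s.keep_alive = False
    response = s.get('http://neihanshequ.com/', headers=headers, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")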
    jokedct = {}
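    for joke in soup.find_all('div', class_='detail-wrapper'):
        # Value format: "发布时间" (publish time) from the second <span>'s title
        # attribute, then "段子内容" (joke content) from the <p> text.
        # get_text() is used because .string is None when <p> has nested tags.
        value = u'发布时间:' + joke.find_all('span')[1]['title'] + '   ' + u'段子内容:' + joke.p.get_text()
        # Key is the poster's user name (text of the first <span>), so a later
        # joke by the same user replaces an earlier one within a single fetch.
        jokedct[joke.span.text] = value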
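    # Append to the output file; "with" closes it even if a write fails.
    with io.open('neihanjok.txt', 'a', encoding='utf-8') as f:
        for joke in jokedct:
            # Line format: "用户" (user) followed by the value built above.
            f.write(u"用户: %s   %s  \n" % (joke, jokedct[joke]))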
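if __name__ == '__main__':
    # Make sure the output file exists, so the first line count cannot fail.
    io.open('neihanjok.txt', 'a', encoding='utf-8').close()
    with io.open('neihanjok.txt', encoding='utf-8') as f:
        lines = len(f.readlines())
    # Fetch every 3 seconds until the file reaches the target line count.
    while lines < 52113.14:
        neihanjoke()
        with io.open('neihanjok.txt', encoding='utf-8') as f:
            lines = len(f.readlines())
        time.sleep(3)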
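Because the loop fetches the same front page every 3 seconds, consecutive fetches are likely to append jokes that are already in the file. The following is a minimal sketch of one way to skip them, assuming that an exact match on the whole output line is an acceptable duplicate test. The helper `append_new_lines` and its parameters are hypothetical and not part of the original script.

```python
import io


def append_new_lines(path, new_lines):
    """Append only the lines not already present in the file at `path`.

    Hypothetical helper for illustration. Returns the number of lines written.
    """
    # Collect the lines already saved; a missing file means nothing is saved yet.
    try:
        with io.open(path, encoding='utf-8') as f:
            seen = set(line.rstrip('\n') for line in f)
    except IOError:
        seen = set()

    written = 0
    with io.open(path, 'a', encoding='utf-8') as f:
        for line in new_lines:
            key = line.rstrip('\n')
            if key in seen:
                continue  # already saved by an earlier fetch
            f.write(key + u'\n')
            seen.add(key)
            written += 1
    return written
```

In the script above, the formatted lines could be collected into a list and passed to this helper instead of being written directly. The helper rereads the whole file on every call, which is simple but may become slow as the file grows.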
Originally published 2018-03-26; shared from the WeChat official account 编程坑太多.
