Scrapy Notes 3: Automatic Multi-Page Crawling of Every Post on This WordPress Blog

Author: 十四君
Published: 2019-11-27 23:37:43
Column: Urlteam

Based on http://blog.csdn.net/u012150179/article/details/34486677

I rewrote parts of the code so that it can crawl this blog.

0. Create the project

scrapy startproject URLteam

1. items.py

# -*- coding:utf-8 -*-
 
from scrapy.item import Item, Field
 
class UrlteamItem(Item):
 
    article_name = Field()
    article_url = Field()

2. pipelines.py

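import json
import codecs

class UrlteamPipeline(object):

    def __init__(self):
        # Open the output file once; codecs.open encodes written text as UTF-8.
        self.file = codecs.open('urlteam_data.json', mode='wb', encoding='utf-8')

    def process_item(self, item, spider):
        # Serialize the item as one JSON object per line.
        line = json.dumps(dict(item)) + '\n'
        # json.dumps escapes non-ASCII characters as \uXXXX; decoding with
        # unicode_escape (Python 2) turns them back into readable Chinese text.
        self.file.write(line.decode("unicode_escape"))

        return item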
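
For readers on Python 3, where `str` has no `decode` method, the same pipeline can be sketched as below. This is a minimal sketch under assumptions, not the author's code: the class name `JsonLinesPipeline` and its `path` parameter are hypothetical, and `ensure_ascii=False` is swapped in for the `unicode_escape` round-trip so that Chinese text is written directly. `open_spider` and `close_spider` are the hooks Scrapy calls on a pipeline when a spider starts and finishes, so the file is also closed properly.

```python
import json
import os
import tempfile


class JsonLinesPipeline:
    """Hypothetical Python 3 variant: one UTF-8 JSON object per line."""

    def __init__(self, path="urlteam_data.json"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes; flushes and closes the file.
        self.file.close()

    def process_item(self, item, spider):
        # ensure_ascii=False keeps non-ASCII text readable in the output.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item


# Stand-alone demo without Scrapy: write one item to a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.json")
pipeline = JsonLinesPipeline(demo_path)
pipeline.open_spider(spider=None)
pipeline.process_item(
    {"article_name": ["背包九讲之01背包详解"], "article_url": "https://www.urlteam.org/"},
    spider=None,
)
pipeline.close_spider(spider=None)

with open(demo_path, encoding="utf-8") as f:
    written = f.read()
```

To use such a class in the project, the `ITEM_PIPELINES` entry in settings.py would have to point at it instead of `UrlteamPipeline`.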

3. settings.py

# -*- coding: utf-8 -*-
 
# Scrapy settings for urlteam project
#
# For simplicity, this file contains only the most important settings by
# default. All the other settings are documented here:
#
#     http://doc.scrapy.org/en/latest/topics/settings.html
#
 
BOT_NAME = 'URLteam'
 
SPIDER_MODULES = ['URLteam.spiders']
NEWSPIDER_MODULE = 'URLteam.spiders'
 
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'urlteam (+https://www.urlteam.org)'
# Disable cookies to reduce the chance of being banned
COOKIES_ENABLED = False  
  
ITEM_PIPELINES = {  
    'URLteam.pipelines.UrlteamPipeline':300  
}  
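
The value 300 in `ITEM_PIPELINES` is an ordering number: when several pipelines are enabled, Scrapy passes each item through them from the lowest value to the highest, and values are customarily chosen in the 0 to 1000 range. The fragment below is only an illustration of that rule. `DedupePipeline` is a hypothetical second pipeline that does not exist in this project, and `run_order` is an illustration variable that Scrapy itself ignores. The fragment also shows `DOWNLOAD_DELAY`, the project-wide setting that corresponds to the `download_delay` attribute set on the spider.

```python
# Sketch of a settings.py fragment, for illustration only.
ITEM_PIPELINES = {
    'URLteam.pipelines.UrlteamPipeline': 300,
    'URLteam.pipelines.DedupePipeline': 100,  # hypothetical, not in the project
}

# Project-wide alternative to download_delay = 1 on the spider class.
DOWNLOAD_DELAY = 1

# Illustration only: the order in which the pipelines above would see an item.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
```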

4. Create urlteam.py in the spiders directory

#!/usr/bin/python
# -*- coding:utf-8 -*-
 
# from scrapy.contrib.spiders import  CrawlSpider,Rule
 
from scrapy.spider import Spider
from scrapy.http import Request
from scrapy.selector import Selector
from URLteam.items import UrlteamItem
 
 
class URLteamSpider(Spider):
    name = "urlteam"
    # Slow down the crawl: wait 1 second between requests
    download_delay = 1
    allowed_domains = ["urlteam.org"]
    start_urls = [
        "https://www.urlteam.org/2016/06/scrapy-%E5%85%A5%E9%97%A8%E9%A1%B9%E7%9B%AE-%E7%88%AC%E8%99%AB%E6%8A%93%E5%8F%96w3c%E7%BD%91%E7%AB%99/"
    ]
 
    def parse(self, response):
        sel = Selector(response)
 
        #items = []
        # Extract the article URL and title
        item = UrlteamItem()
 
        article_url = str(response.url)
        article_name = sel.xpath('//h1/text()').extract()
 
        item['article_name'] = [n.encode('utf-8') for n in article_name]
        item['article_url'] = article_url.encode('utf-8')
 
        yield item
 
        # Get the URL of the next article to crawl (the nav-previous link)
        urls = sel.xpath('//div[@class="nav-previous"]/a/@href').extract()
 
        for url in urls:
            print url
            yield Request(url, callback=self.parse)
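
The crawl logic above can be tried without a network connection. The following is a minimal Python 3 sketch, assuming a hypothetical in-memory site (`PAGES`, with made-up example.org URLs) and using the standard library's ElementTree in place of Scrapy's `Selector`. It only mimics what `parse` does: emit one item per page, then follow the `nav-previous` link until a page has none. The `seen` set stands in for the duplicate-request filtering that Scrapy applies by default.

```python
import xml.etree.ElementTree as ET

# Hypothetical site: URL -> well-formed XHTML page (not real blog content).
PAGES = {
    "https://example.org/post-3/": (
        "<html><body><h1>Post 3</h1>"
        '<div class="nav-previous"><a href="https://example.org/post-2/">older</a></div>'
        "</body></html>"
    ),
    "https://example.org/post-2/": (
        "<html><body><h1>Post 2</h1>"
        '<div class="nav-previous"><a href="https://example.org/post-1/">older</a></div>'
        "</body></html>"
    ),
    # The oldest post has no nav-previous link, so the crawl stops here.
    "https://example.org/post-1/": "<html><body><h1>Post 1</h1></body></html>",
}


def parse(url, html):
    """Return (item, links_to_follow) for one page, like the spider's parse()."""
    root = ET.fromstring(html)
    item = {
        "article_name": [h1.text for h1 in root.iter("h1")],
        "article_url": url,
    }
    links = [a.get("href") for a in root.findall(".//div[@class='nav-previous']/a")]
    return item, links


def crawl(start_url):
    """Walk the chain of nav-previous links, yielding one item per page."""
    pending = [start_url]
    seen = set()
    while pending:
        url = pending.pop()
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        item, links = parse(url, PAGES[url])
        yield item
        pending.extend(links)


items = list(crawl("https://example.org/post-3/"))
```

Because each page links to exactly one older post, the spider visits the posts strictly one after another, which is part of why a full crawl of the blog is slow.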

5. Running the spider and results

Run the following from the root directory of the URLteam project:

scrapy crawl urlteam

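Results: expect the crawl to take a long time. Mine was still running after half an hour, but the output file can be inspected before the crawl finishes. It is fairly long, and essentially every title and URL was captured. The blog has a little over 100 pages, and after half an hour only 90+ had been crawled.

The contents of my JSON file: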

{"article_name": ["scrapy-笔记一 入门项目 爬虫抓取w3c网站"], "article_url": "https://www.urlteam.org/2016/06/scrapy-%E5%85%A5%E9%97%A8%E9%A1%B9%E7%9B%AE-%E7%88%AC%E8%99%AB%E6%8A%93%E5%8F%96w3c%E7%BD%91%E7%AB%99/"}
{"article_name": ["视力大作战_web静态小游戏制作"], "article_url": "https://www.urlteam.org/2016/06/%e8%a7%86%e5%8a%9b%e5%a4%a7%e4%bd%9c%e6%88%98_web%e9%9d%99%e6%80%81%e5%b0%8f%e6%b8%b8%e6%88%8f%e5%88%b6%e4%bd%9c/"}
{"article_name": ["解决 启动mysql 提示 stop: Unknown instance"], "article_url": "https://www.urlteam.org/2016/06/%e8%a7%a3%e5%86%b3-%e5%90%af%e5%8a%a8mysql-%e6%8f%90%e7%a4%ba-stop-unknown-instance/"}
{"article_name": ["树莓派 python 百度语音控制 gpio 控制开关灯"], "article_url": "https://www.urlteam.org/2016/06/%e6%a0%91%e8%8e%93%e6%b4%be-python-%e7%99%be%e5%ba%a6%e8%af%ad%e9%9f%b3%e6%8e%a7%e5%88%b6-gpio-%e6%8e%a7%e5%88%b6%e5%bc%80%e5%85%b3%e7%81%af/"}
{"article_name": ["IT方向各种资源集合"], "article_url": "https://www.urlteam.org/2016/05/it%e6%96%b9%e5%90%91%e5%90%84%e7%a7%8d%e8%b5%84%e6%ba%90%e9%9b%86%e5%90%88/"}
{"article_name": ["交互式艺术设计–壁纸设计生成器"], "article_url": "https://www.urlteam.org/2016/05/%e4%ba%a4%e4%ba%92%e5%bc%8f%e8%89%ba%e6%9c%af%e8%ae%be%e8%ae%a1/"}
{"article_name": ["为何总是忙碌却不知干了什么? 关于栈式与优先队列式的处事风格"], "article_url": "https://www.urlteam.org/2016/05/%e4%b8%ba%e4%bd%95%e6%80%bb%e6%98%af%e5%bf%99%e7%a2%8c%e5%8d%b4%e4%b8%8d%e7%9f%a5%e5%b9%b2%e4%ba%86%e4%bb%80%e4%b9%88-%e5%85%b3%e4%ba%8e%e6%a0%88%e5%bc%8f%e4%b8%8e%e4%bc%98%e5%85%88%e9%98%9f%e5%88%97/"}
{"article_name": ["模拟EXCEL排序 c++ sort排序 多重排序 题解"], "article_url": "https://www.urlteam.org/2016/05/%e6%a8%a1%e6%8b%9fexcel%e6%8e%92%e5%ba%8f-c-sort%e6%8e%92%e5%ba%8f-%e5%a4%9a%e9%87%8d%e6%8e%92%e5%ba%8f-%e9%a2%98%e8%a7%a3/"}
{"article_name": ["ubuntu 14|15下服务器下搭建 hustoj 比赛平台 附多题库与问题解析"], "article_url": "https://www.urlteam.org/2016/05/ubuntu-1415%e4%b8%8b%e6%9c%8d%e5%8a%a1%e5%99%a8%e4%b8%8b%e6%90%ad%e5%bb%ba-hustoj-%e6%af%94%e8%b5%9b%e5%b9%b3%e5%8f%b0-%e9%99%84%e5%a4%9a%e9%a2%98%e5%ba%93%e4%b8%8e%e9%97%ae%e9%a2%98%e8%a7%a3/"}
{"article_name": ["ACM算法模板收藏 支持免费下载"], "article_url": "https://www.urlteam.org/2016/05/acm%e7%ae%97%e6%b3%95%e6%a8%a1%e6%9d%bf%e6%94%b6%e8%97%8f-%e6%94%af%e6%8c%81%e5%85%8d%e8%b4%b9%e4%b8%8b%e8%bd%bd/"}
{"article_name": ["银行业务队列简单模拟 STL队列 题解", "STL__sqeue 队列", "5-18 银行业务队列简单模拟   (25分)"], "article_url": "https://www.urlteam.org/2016/05/%e9%93%b6%e8%a1%8c%e4%b8%9a%e5%8a%a1%e9%98%9f%e5%88%97%e7%ae%80%e5%8d%95%e6%a8%a1%e6%8b%9f-stl%e9%98%9f%e5%88%97-%e9%a2%98%e8%a7%a3/"}
{"article_name": ["5-26 Windows消息队列   (25分) 结构体优先队列解法", "优先队列:", "5-26 Windows消息队列   (25分)", "题解:"], "article_url": "https://www.urlteam.org/2016/05/5-26-windows%e6%b6%88%e6%81%af%e9%98%9f%e5%88%97-25%e5%88%86-%e7%bb%93%e6%9e%84%e4%bd%93%e4%bc%98%e5%85%88%e9%98%9f%e5%88%97%e8%a7%a3%e6%b3%95/"}
{"article_name": ["遥 控 器 | 河南省第五届省赛题解"], "article_url": "https://www.urlteam.org/2016/05/%e9%81%a5-%e6%8e%a7-%e5%99%a8-%e6%b2%b3%e5%8d%97%e7%9c%81%e7%ac%ac%e4%ba%94%e5%b1%8a%e7%9c%81%e8%b5%9b%e9%a2%98%e8%a7%a3/"}
{"article_name": ["2016″百度之星” – 资格赛(更新中)", "Problem A", "题解:", "Problem B", "Problem D"], "article_url": "https://www.urlteam.org/2016/05/2016%e7%99%be%e5%ba%a6%e4%b9%8b%e6%98%9f-%e8%b5%84%e6%a0%bc%e8%b5%9b%e6%9b%b4%e6%96%b0%e4%b8%ad/"}
{"article_name": ["python 规范审查 pylint 的使用"], "article_url": "https://www.urlteam.org/2016/05/python-%e8%a7%84%e8%8c%83%e5%ae%a1%e6%9f%a5-pylint-%e7%9a%84%e4%bd%bf%e7%94%a8/"}
{"article_name": ["python-pep8 编码规范"], "article_url": "https://www.urlteam.org/2016/05/python-pep8-%e7%bc%96%e7%a0%81%e8%a7%84%e8%8c%83/"}
{"article_name": ["Merchant’s Guide To The Galaxy笔试题解析 python解决 罗马数字转阿拉伯数字"], "article_url": "https://www.urlteam.org/2016/05/merchants-guide-to-the-galaxy%e7%ac%94%e8%af%95%e9%a2%98%e8%a7%a3%e6%9e%90/"}
{"article_name": ["背包九讲之分组背包-HDU1712题解"], "article_url": "https://www.urlteam.org/2016/05/%e8%83%8c%e5%8c%85%e4%b9%9d%e8%ae%b2%e4%b9%8b%e5%88%86%e7%bb%84%e8%83%8c%e5%8c%85%ef%bc%8d%ef%bd%88%ef%bd%84%ef%bd%95%ef%bc%91%ef%bc%97%ef%bc%91%ef%bc%92%e9%a2%98%e8%a7%a3/"}
{"article_name": ["python语音智能对话聊天机器人,linux&&树莓派双平台兼容"], "article_url": "https://www.urlteam.org/2016/05/python%e8%af%ad%e9%9f%b3%e6%99%ba%e8%83%bd%e5%af%b9%e8%af%9d%e8%81%8a%e5%a4%a9%e6%9c%ba%e5%99%a8%e4%ba%ba%ef%bc%8clinux%e6%a0%91%e8%8e%93%e6%b4%be%e5%8f%8c%e5%b9%b3%e5%8f%b0%e5%85%bc%e5%ae%b9/"}
{"article_name": ["河南省第一届ACM程序设计大赛题解"], "article_url": "https://www.urlteam.org/2016/04/%e6%b2%b3%e5%8d%97%e7%9c%81%e7%ac%ac%e4%b8%80%e5%b1%8a%ef%bd%81%ef%bd%83%ef%bd%8d%e7%a8%8b%e5%ba%8f%e8%ae%be%e8%ae%a1%e5%a4%a7%e8%b5%9b%e9%a2%98%e8%a7%a3/"}
{"article_name": ["动态规划-各种子序列问题集合"], "article_url": "https://www.urlteam.org/2016/04/%e5%8a%a8%e6%80%81%e8%a7%84%e5%88%92%ef%bc%8d%e5%90%84%e7%a7%8d%e5%ad%90%e5%ba%8f%e5%88%97%e9%97%ae%e9%a2%98%e9%9b%86%e5%90%88/"}
{"article_name": ["背包九讲-问法的灵活变化"], "article_url": "https://www.urlteam.org/2016/04/%e8%83%8c%e5%8c%85%e4%b9%9d%e8%ae%b2%ef%bc%8d%e9%97%ae%e6%b3%95%e7%9a%84%e7%81%b5%e6%b4%bb%e5%8f%98%e5%8c%96/"}
{"article_name": ["背包九讲之二维费用的背包", "FATE"], "article_url": "https://www.urlteam.org/2016/04/%e8%83%8c%e5%8c%85%e4%b9%9d%e8%ae%b2%e4%b9%8b%e4%ba%8c%e7%bb%b4%e8%b4%b9%e7%94%a8%e7%9a%84%e8%83%8c%e5%8c%85/"}
{"article_name": ["写代码没激情怎么办?atom教你酷炫掉咋天"], "article_url": "https://www.urlteam.org/2016/04/%e5%86%99%e4%bb%a3%e7%a0%81%e6%b2%a1%e6%bf%80%e6%83%85%e6%80%8e%e4%b9%88%e5%8a%9e%ef%bc%9fatom%e6%95%99%e4%bd%a0%e9%85%b7%e7%82%ab%e6%8e%89%e5%92%8b%e5%a4%a9/"}
{"article_name": ["解决.htaccess: Invalid command ‘RewriteEngine’,问题"], "article_url": "https://www.urlteam.org/2016/04/%e8%a7%a3%e5%86%b3-htaccess-invalid-command-rewriteengine%e9%97%ae%e9%a2%98/"}
{"article_name": ["背包九讲之多重背包&&混合背包详解"], "article_url": "https://www.urlteam.org/2016/04/%e8%83%8c%e5%8c%85%e4%b9%9d%e8%ae%b2%e4%b9%8b%e5%a4%9a%e9%87%8d%e8%83%8c%e5%8c%85%e8%af%a6%e8%a7%a3/"}
{"article_name": ["STL set"], "article_url": "https://www.urlteam.org/2016/04/stl-set/"}
{"article_name": ["c语言函数库学习~sscanf~格式化输入"], "article_url": "https://www.urlteam.org/2016/04/c%e8%af%ad%e8%a8%80%e5%87%bd%e6%95%b0%e5%ba%93%e5%ad%a6%e4%b9%a0sscanf%e6%a0%bc%e5%bc%8f%e5%8c%96%e8%be%93%e5%85%a5/"}
{"article_name": ["C的|、||、&、&&、异或、~、!运算 位运算"], "article_url": "https://www.urlteam.org/2016/04/c%e7%9a%84%e3%80%81%e3%80%81%e3%80%81%e3%80%81%e5%bc%82%e6%88%96%e3%80%81%e3%80%81%ef%bc%81%e8%bf%90%e7%ae%97-%e4%bd%8d%e8%bf%90%e7%ae%97/"}
{"article_name": ["STL 算法部分 原创入门教程,要详细资料请百度"], "article_url": "https://www.urlteam.org/2016/04/stl-%e7%ae%97%e6%b3%95%e9%83%a8%e5%88%86-%e5%8e%9f%e5%88%9b%e5%85%a5%e9%97%a8%e6%95%99%e7%a8%8b%ef%bc%8c%e8%a6%81%e8%af%a6%e7%bb%86%e8%b5%84%e6%96%99%e8%af%b7%e7%99%be%e5%ba%a6/"}
{"article_name": ["acm比赛刷题小技巧"], "article_url": "https://www.urlteam.org/2016/04/acm%e6%af%94%e8%b5%9b%e5%88%b7%e9%a2%98%e5%b0%8f%e6%8a%80%e5%b7%a7/"}
{"article_name": ["shell脚本实现监控服务器mysql,解决服务器内存不足自动关闭mysql问题"], "article_url": "https://www.urlteam.org/2016/04/shell%e8%84%9a%e6%9c%ac%e5%ae%9e%e7%8e%b0%e7%9b%91%e6%8e%a7%e6%9c%8d%e5%8a%a1%e5%99%a8mysql%ef%bc%8c%e8%a7%a3%e5%86%b3%e6%9c%8d%e5%8a%a1%e5%99%a8%e5%86%85%e5%ad%98%e4%b8%8d%e8%b6%b3%e8%87%aa%e5%8a%a8/"}
{"article_name": ["linux运维常用状态检测工具集锦"], "article_url": "https://www.urlteam.org/2016/04/linux%e8%bf%90%e7%bb%b4%e5%b8%b8%e7%94%a8%e7%8a%b6%e6%80%81%e6%a3%80%e6%b5%8b%e5%b7%a5%e5%85%b7%e9%9b%86%e9%94%a6/"}
{"article_name": ["背包九讲之完全背包详解"], "article_url": "https://www.urlteam.org/2016/04/%e8%83%8c%e5%8c%85%e4%b9%9d%e8%ae%b2%e4%b9%8b%e5%ae%8c%e5%85%a8%e8%83%8c%e5%8c%85%e8%af%a6%e8%a7%a3/"}
{"article_name": ["背包九讲之01背包详解"], "article_url": "https://www.urlteam.org/2016/04/%e8%83%8c%e5%8c%85%e4%b9%9d%e8%ae%b2%e4%b9%8b01%e8%83%8c%e5%8c%85%e8%af%a6%e8%a7%a3/"}
{"article_name": ["使用linux下的dd指令为树莓派做备份"], "article_url": "https://www.urlteam.org/2016/04/%e4%bd%bf%e7%94%a8linux%e4%b8%8b%e7%9a%84dd%e6%8c%87%e4%bb%a4%e4%b8%ba%e6%a0%91%e8%8e%93%e6%b4%be%e5%81%9a%e5%a4%87%e4%bb%bd/"}
{"article_name": ["linux服务器ssh文件传输—scp使用指南"], "article_url": "https://www.urlteam.org/2016/03/linux%e6%9c%8d%e5%8a%a1%e5%99%a8ssh%e6%96%87%e4%bb%b6%e4%bc%a0%e8%be%93-scp%e4%bd%bf%e7%94%a8%e6%8c%87%e5%8d%97/"}
{"article_name": ["2016-3.28.重新整理学习生活方针"], "article_url": "https://www.urlteam.org/2016/03/2016-3-28-%e9%87%8d%e6%96%b0%e6%95%b4%e7%90%86%e5%ad%a6%e4%b9%a0%e7%94%9f%e6%b4%bb%e6%96%b9%e9%92%88/"}
{"article_name": ["wordpress解决谷歌字体问题–与谷歌字体的战争!"], "article_url": "https://www.urlteam.org/2016/03/%e6%94%b9%e5%86%99%e4%b8%80%e5%8f%a5wordpress%e8%a7%a3%e5%86%b3%e8%b0%b7%e6%ad%8c%e5%ad%97%e4%bd%93%e9%97%ae%e9%a2%98/"}
{"article_name": ["mysql基本操作以及python控制mysql(3)–python控制"], "article_url": "https://www.urlteam.org/2016/03/mysql%e5%9f%ba%e6%9c%ac%e6%93%8d%e4%bd%9c%e4%bb%a5%e5%8f%8apython%e6%8e%a7%e5%88%b6mysql%ef%bc%883%ef%bc%89-python%e6%8e%a7%e5%88%b6/"}
{"article_name": ["mysql基本操作以及python控制mysql(2)–mysql基础操作"], "article_url": "https://www.urlteam.org/2016/03/mysql%e5%9f%ba%e6%9c%ac%e6%93%8d%e4%bd%9c%e4%bb%a5%e5%8f%8apython%e6%8e%a7%e5%88%b6mysql%ef%bc%882%ef%bc%89-mysql%e5%9f%ba%e7%a1%80%e6%93%8d%e4%bd%9c/"}
{"article_name": ["mysql基本操作以及python控制mysql(1)–环境安装"], "article_url": "https://www.urlteam.org/2016/03/mysql%e5%9f%ba%e6%9c%ac%e6%93%8d%e4%bd%9c%e4%bb%a5%e5%8f%8apython%e6%8e%a7%e5%88%b6mysql%ef%bc%881%ef%bc%89-%e7%8e%af%e5%a2%83%e5%ae%89%e8%a3%85/"}
{"article_name": ["python解决UnicodeEncodeError: ‘ascii’ codec can’t encode characters in position问题"], "article_url": "https://www.urlteam.org/2016/03/python%e8%a7%a3%e5%86%b3unicodeencodeerror-ascii-codec-cant-encode-characters-in-position%e9%97%ae%e9%a2%98/"}
{"article_name": ["人脸识别考勤系统-第二版本研发手札"], "article_url": "https://www.urlteam.org/2016/03/%e4%ba%ba%e8%84%b8%e8%af%86%e5%88%ab%e8%80%83%e5%8b%a4%e7%b3%bb%e7%bb%9f%ef%bc%8d%e7%ac%ac%e4%ba%8c%e7%89%88%e6%9c%ac%e7%a0%94%e5%8f%91%e6%89%8b%e6%9c%ad/"}
{"article_name": ["与高兄信"], "article_url": "https://www.urlteam.org/2016/03/%e4%b8%8e%e9%ab%98%e5%85%84%e4%bf%a1/"}
{"article_name": ["仔细想想,该认真再纠正下生活的路线了."], "article_url": "https://www.urlteam.org/2016/03/%e4%bb%94%e7%bb%86%e6%83%b3%e6%83%b3%ef%bc%8c%e8%af%a5%e8%ae%a4%e7%9c%9f%e5%86%8d%e7%ba%a0%e6%ad%a3%e4%b8%8b%e7%94%9f%e6%b4%bb%e7%9a%84%e8%b7%af%e7%ba%bf%e4%ba%86%ef%bc%8e/"}
{"article_name": ["python_os 系统文件夹操作"], "article_url": "https://www.urlteam.org/2016/03/python_os%e3%80%80%e7%b3%bb%e7%bb%9f%e6%96%87%e4%bb%b6%e5%a4%b9%e6%93%8d%e4%bd%9c/"}
{"article_name": ["python_face++ 上传本地图片进行解析"], "article_url": "https://www.urlteam.org/2016/03/python_face%e3%80%80%e4%b8%8a%e4%bc%a0%e6%9c%ac%e5%9c%b0%e5%9b%be%e7%89%87%e8%bf%9b%e8%a1%8c%e8%a7%a3%e6%9e%90/"}
{"article_name": ["通过python在两台linux服务器间传递文件"], "article_url": "https://www.urlteam.org/2016/03/%e9%80%9a%e8%bf%87python%e5%9c%a8%e4%b8%a4%e5%8f%b0linux%e6%9c%8d%e5%8a%a1%e5%99%a8%e9%97%b4%e4%bc%a0%e9%80%92%e6%96%87%e4%bb%b6/"}
{"article_name": ["在新服务器上搭建wordpress网站"], "article_url": "https://www.urlteam.org/2016/03/%e5%9c%a8%e6%96%b0%e6%9c%8d%e5%8a%a1%e5%99%a8%e4%b8%8a%e6%90%ad%e5%bb%bawordpress%e7%bd%91%e7%ab%99/"}
{"article_name": ["让树莓派开机运行Python脚本"], "article_url": "https://www.urlteam.org/2016/03/%e8%ae%a9%e6%a0%91%e8%8e%93%e6%b4%be%e5%bc%80%e6%9c%ba%e8%bf%90%e8%a1%8cpython%e8%84%9a%e6%9c%ac/"}
{"article_name": ["人脸识别考勤机开发计划"], "article_url": "https://www.urlteam.org/2016/02/%e4%ba%ba%e8%84%b8%e8%af%86%e5%88%ab%e8%80%83%e5%8b%a4%e6%9c%ba%e5%bc%80%e5%8f%91%e8%ae%a1%e5%88%92/"}
{"article_name": ["face++人脸识别与人脸库匹配python实现笔记二"], "article_url": "https://www.urlteam.org/2016/02/face%e4%ba%ba%e8%84%b8%e8%af%86%e5%88%ab%e4%b8%8e%e4%ba%ba%e8%84%b8%e5%ba%93%e5%8c%b9%e9%85%8dpython%e5%ae%9e%e7%8e%b0%e7%ac%94%e8%ae%b0%e4%b8%80/"}
{"article_name": ["face++人脸识别与人脸库匹配python实现笔记一"], "article_url": "https://www.urlteam.org/2016/02/face%e4%ba%ba%e8%84%b8%e8%af%86%e5%88%ab%e4%b8%8e%e9%85%8d%e5%a4%87python%e5%ae%9e%e7%8e%b0%e7%ac%94%e8%ae%b0/"}
{"article_name": ["python-opencv人脸识别与树莓派摄像头转头跟随()"], "article_url": "https://www.urlteam.org/2016/02/python-opencv%e4%ba%ba%e8%84%b8%e8%af%86%e5%88%ab%e4%b8%8e%e6%a0%91%e8%8e%93%e6%b4%be%e6%91%84%e5%83%8f%e5%a4%b4%e8%bd%ac%e5%a4%b4%e8%b7%9f%e9%9a%8f/"}
{"article_name": ["树莓派局域网实时照片监控"], "article_url": "https://www.urlteam.org/2016/01/%e6%a0%91%e8%8e%93%e6%b4%be%e5%b1%80%e5%9f%9f%e7%bd%91%e5%ae%9e%e6%97%b6%e7%85%a7%e7%89%87%e7%9b%91%e6%8e%a7/"}
{"article_name": ["创新实验室python&linux零下五度小组技能树规划"], "article_url": "https://www.urlteam.org/2016/01/%e5%88%9b%e6%96%b0%e5%ae%9e%e9%aa%8c%e5%ae%a4python%ef%bc%86linux%e9%9b%b6%e4%b8%8b%e4%ba%94%e5%ba%a6%e5%b0%8f%e7%bb%84%e6%8a%80%e8%83%bd%e6%a0%91%e8%a7%84%e5%88%92/"}
{"article_name": ["python多线程100进程一起ping演习笔记"], "article_url": "https://www.urlteam.org/2016/01/python%e5%a4%9a%e7%ba%bf%e7%a8%8b100%e8%bf%9b%e7%a8%8b%e4%b8%80%e8%b5%b7ping%e6%bc%94%e4%b9%a0%e7%ac%94%e8%ae%b0/"}
{"article_name": ["python-aiml人工智能+百度语音对话"], "article_url": "https://www.urlteam.org/2016/01/python-aiml%e4%ba%ba%e5%b7%a5%e6%99%ba%e8%83%bd%ef%bc%8b%e7%99%be%e5%ba%a6%e8%af%ad%e9%9f%b3%e5%af%b9%e8%af%9d/"}
{"article_name": ["python根据ip获取地理位置再查询天气情况调百度语音合成朗读"], "article_url": "https://www.urlteam.org/2016/01/python%e5%b0%8f%e7%a8%8b%e5%ba%8f-%e6%a0%b9%e6%8d%ae%e6%9c%ac%e6%9c%ba%e5%a4%96%e7%bd%91ip%e8%87%aa%e5%8a%a8%e6%9f%a5%e8%af%a2%e5%a4%a9%e6%b0%94%e6%83%85%e5%86%b5/"}
{"article_name": ["python调用百度天气api查询城市天气情况"], "article_url": "https://www.urlteam.org/2016/01/python%e8%b0%83%e7%94%a8%e7%99%be%e5%ba%a6%e5%a4%a9%e6%b0%94api%e6%9f%a5%e8%af%a2%e5%9f%8e%e5%b8%82%e5%a4%a9%e6%b0%94%e6%83%85%e5%86%b5/"}
{"article_name": ["python小游戏,猫抓老鼠"], "article_url": "https://www.urlteam.org/2016/01/python%e5%b0%8f%e6%b8%b8%e6%88%8f%ef%bc%8c%e7%8c%ab%e6%8a%93%e8%80%81%e9%bc%a0/"}
{"article_name": ["HTK隐马尔可夫模型–语音识别–项目笔记(含资源源码文件测试等)"], "article_url": "https://www.urlteam.org/2016/01/%e6%b5%8b%e8%af%95%e8%b5%84%e6%ba%90%e7%8e%af%e5%a2%83%e4%b8%8b%e8%bd%bd/"}
{"article_name": ["Linux查看实时带宽流量情况以及查看端口信息", "端口信息"], "article_url": "https://www.urlteam.org/2016/01/linux%e6%9f%a5%e7%9c%8b%e5%ae%9e%e6%97%b6%e5%b8%a6%e5%ae%bd%e6%b5%81%e9%87%8f%e6%83%85%e5%86%b5/"}
{"article_name": ["linux下free查看内存命令详细解析"], "article_url": "https://www.urlteam.org/2016/01/linux%e4%b8%8bfree%e6%9f%a5%e7%9c%8b%e5%86%85%e5%ad%98%e5%91%bd%e4%bb%a4%e8%af%a6%e7%bb%86%e8%a7%a3%e6%9e%90/"}
{"article_name": ["配置apache2使用不同端口或者域名访问网站", "第二种方法用域名绑定"], "article_url": "https://www.urlteam.org/2016/01/%e9%85%8d%e7%bd%aeapache2%e4%bd%bf%e7%94%a8%e4%b8%8d%e5%90%8c%e7%ab%af%e5%8f%a3%e6%88%96%e8%80%85%e5%9f%9f%e5%90%8d%e8%ae%bf%e9%97%ae%e7%bd%91%e7%ab%99/"}
{"article_name": ["apache2的几个核心设置优化"], "article_url": "https://www.urlteam.org/2015/12/apache2%e7%9a%84%e5%87%a0%e4%b8%aa%e6%a0%b8%e5%bf%83%e8%ae%be%e7%bd%ae%e4%bc%98%e5%8c%96/"}
{"article_name": ["pygame-游戏开发学习笔记(九)–pygame.向量实现"], "article_url": "https://www.urlteam.org/2015/12/pygame-%e6%b8%b8%e6%88%8f%e5%bc%80%e5%8f%91%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%ef%bc%88%e4%b9%9d%ef%bc%89-pygame-%e5%90%91%e9%87%8f%e5%ae%9e%e7%8e%b0/"}
{"article_name": ["python常用函数总结"], "article_url": "https://www.urlteam.org/2015/12/python%e5%b8%b8%e7%94%a8%e5%87%bd%e6%95%b0%e6%80%bb%e7%bb%93/"}
{"article_name": ["pygame-游戏开发学习笔记(八)–pygame.time&&fps 动画制作"], "article_url": "https://www.urlteam.org/2015/12/pygame-%e6%b8%b8%e6%88%8f%e5%bc%80%e5%8f%91%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%ef%bc%88%e4%ba%94%ef%bc%89-pygame-timefps-%e5%8a%a8%e7%94%bb%e5%88%b6%e4%bd%9c/"}
{"article_name": ["pygame-游戏开发学习笔记(七)–pygame.draw,画图。"], "article_url": "https://www.urlteam.org/2015/12/pygame-%e6%b8%b8%e6%88%8f%e5%bc%80%e5%8f%91%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%ef%bc%88%e4%b8%83%ef%bc%89-pygame-draw%ef%bc%8c%e7%94%bb%e5%9b%be%e3%80%82/"}
{"article_name": ["学习-用Python和Pygame写游戏-从入门到精通(6)"], "article_url": "https://www.urlteam.org/2015/12/%e5%ad%a6%e4%b9%a0-%e7%94%a8python%e5%92%8cpygame%e5%86%99%e6%b8%b8%e6%88%8f-%e4%bb%8e%e5%85%a5%e9%97%a8%e5%88%b0%e7%b2%be%e9%80%9a%ef%bc%886%ef%bc%89/"}
{"article_name": ["使用pyaiml机器人模块快速做个和你智能对话的大脑"], "article_url": "https://www.urlteam.org/2015/12/%e4%bd%bf%e7%94%a8pyaiml%e6%9c%ba%e5%99%a8%e4%ba%ba%e6%a8%a1%e5%9d%97%e5%bf%ab%e9%80%9f%e5%81%9a%e4%b8%aa%e5%92%8c%e4%bd%a0%e6%99%ba%e8%83%bd%e5%af%b9%e8%af%9d%e7%9a%84%e5%a4%a7%e8%84%91/"}
{"article_name": ["学习—用 Python 和 OpenCV 检测和跟踪运动对象"], "article_url": "https://www.urlteam.org/2015/12/%e5%ad%a6%e4%b9%a0-%e7%94%a8-python-%e5%92%8c-opencv-%e6%a3%80%e6%b5%8b%e5%92%8c%e8%b7%9f%e8%b8%aa%e8%bf%90%e5%8a%a8%e5%af%b9%e8%b1%a1/"}
{"article_name": ["百度语音识别api使用python进行调用"], "article_url": "https://www.urlteam.org/2015/12/%e7%99%be%e5%ba%a6%e8%af%ad%e9%9f%b3%e8%af%86%e5%88%abapi%e4%bd%bf%e7%94%a8python%e8%bf%9b%e8%a1%8c%e8%b0%83%e7%94%a8/"}
{"article_name": ["树莓派开机左上角光标闪烁无法进图形系统问题解决"], "article_url": "https://www.urlteam.org/2015/12/%e6%a0%91%e8%8e%93%e6%b4%be%e5%bc%80%e6%9c%ba%e5%b7%a6%e4%b8%8a%e8%a7%92%e5%85%89%e6%a0%87%e9%97%aa%e7%83%81%e6%97%a0%e6%b3%95%e8%bf%9b%e5%9b%be%e5%bd%a2%e7%b3%bb%e7%bb%9f%e9%97%ae%e9%a2%98%e8%a7%a3/"}
{"article_name": ["解决gitpush的时候因为误加入特大文件,导致push失败"], "article_url": "https://www.urlteam.org/2015/12/%e8%a7%a3%e5%86%b3gitpush%e7%9a%84%e6%97%b6%e5%80%99%e5%9b%a0%e4%b8%ba%e8%af%af%e5%8a%a0%e5%85%a5%e7%89%b9%e5%a4%a7%e6%96%87%e4%bb%b6%ef%bc%8c%e5%af%bc%e8%87%b4push%e5%a4%b1%e8%b4%a5/"}
{"article_name": ["git–在树莓派(新电脑)重新用git进行pull以及push"], "article_url": "https://www.urlteam.org/2015/12/git-%e5%9c%a8%e6%a0%91%e8%8e%93%e6%b4%be%ef%bc%88%e6%96%b0%e7%94%b5%e8%84%91%ef%bc%89%e9%87%8d%e6%96%b0%e7%94%a8git%e8%bf%9b%e8%a1%8cpull%e4%bb%a5%e5%8f%8apush/"}
{"article_name": ["还没想好"], "article_url": "https://www.urlteam.org/2015/12/%e8%bf%98%e6%b2%a1%e6%83%b3%e5%a5%bd/"}
{"article_name": ["服务器内存占用过高导致数据库服务关闭,网站无法登陆的错误详解-制作swap交换区加大内存"], "article_url": "https://www.urlteam.org/2015/12/%e6%9c%8d%e5%8a%a1%e5%99%a8%e5%86%85%e5%ad%98%e5%8d%a0%e7%94%a8%e8%bf%87%e9%ab%98%e5%af%bc%e8%87%b4%e6%95%b0%e6%8d%ae%e5%ba%93%e6%9c%8d%e5%8a%a1%e5%85%b3%e9%97%ad%ef%bc%8c%e7%bd%91%e7%ab%99%e6%97%a0/"}
{"article_name": ["基于树莓派以及语音与人脸识别的迎宾机器人交互系统开发计划"], "article_url": "https://www.urlteam.org/2015/12/%e5%9f%ba%e4%ba%8e%e6%a0%91%e8%8e%93%e6%b4%be%e4%bb%a5%e5%8f%8a%e4%ba%ba%e8%84%b8%e8%af%86%e5%88%ab%e7%9a%84%e8%bf%8e%e5%ae%be%e6%9c%ba%e5%99%a8%e4%ba%ba%e4%ba%a4%e4%ba%92%e7%b3%bb%e7%bb%9f%e5%bc%80/"}
{"article_name": ["pygame-游戏开发学习笔记(五)–pygame.Font,字体与中文以及错误检测的问题"], "article_url": "https://www.urlteam.org/2015/12/pygame-%e6%b8%b8%e6%88%8f%e5%bc%80%e5%8f%91%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%ef%bc%88%e4%ba%94%ef%bc%89-pygame-font%ef%bc%8c%e5%ad%97%e4%bd%93%e4%b8%8e%e4%b8%ad%e6%96%87%e4%bb%a5%e5%8f%8a/"}
{"article_name": ["pygame-游戏开发学习笔记(四)–pygame.display.set_mode()显示的问题"], "article_url": "https://www.urlteam.org/2015/12/pygame-%e6%b8%b8%e6%88%8f%e5%bc%80%e5%8f%91%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%ef%bc%88%e5%9b%9b%ef%bc%89-pygame-display-set_mode%e6%98%be%e7%a4%ba%e7%9a%84%e9%97%ae%e9%a2%98/"}
{"article_name": ["pygame-游戏开发学习笔记(三)–event事件捕捉"], "article_url": "https://www.urlteam.org/2015/12/pygame-%e6%b8%b8%e6%88%8f%e5%bc%80%e5%8f%91%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%ef%bc%88%e4%b8%89%ef%bc%89-event%e4%ba%8b%e4%bb%b6%e6%8d%95%e6%8d%89/"}
{"article_name": ["pygame-游戏开发学习笔记(二)–模块表与背景图样例。"], "article_url": "https://www.urlteam.org/2015/12/pygame-%e6%b8%b8%e6%88%8f%e5%bc%80%e5%8f%91%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%ef%bc%88%e4%ba%8c%ef%bc%89-%e6%a8%a1%e5%9d%97%e8%a1%a8%e4%b8%8e%e8%83%8c%e6%99%af%e5%9b%be%e6%a0%b7%e4%be%8b/"}
{"article_name": ["pygame-游戏开发学习笔记(一)–SDL和pygame等环境安装"], "article_url": "https://www.urlteam.org/2015/12/pygame-%e6%b8%b8%e6%88%8f%e5%bc%80%e5%8f%91%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%ef%bc%88%e4%b8%80%ef%bc%89/"}
{"article_name": ["少年壮志不言愁:清华施一公演讲"], "article_url": "https://www.urlteam.org/2015/12/%e5%b0%91%e5%b9%b4%e5%a3%ae%e5%bf%97%e4%b8%8d%e8%a8%80%e6%84%81%ef%bc%9a%e6%b8%85%e5%8d%8e%e6%96%bd%e4%b8%80%e5%85%ac%e6%bc%94%e8%ae%b2/"}
{"article_name": ["python_sqlite–简单的数据库增删查改测试"], "article_url": "https://www.urlteam.org/2015/12/python_sqlite-%e7%ae%80%e5%8d%95%e7%9a%84%e6%95%b0%e6%8d%ae%e5%ba%93%e5%a2%9e%e5%88%a0%e6%9f%a5%e6%94%b9%e6%b5%8b%e8%af%95/"}
{"article_name": ["python–GUI–制作简单的文本文档"], "article_url": "https://www.urlteam.org/2015/12/python-gui-%e7%ae%80%e5%8d%95%e7%9a%84%e6%96%87%e6%9c%ac%e6%96%87%e6%a1%a3/"}
{"article_name": ["爬虫首尝试—爬取百度贴吧图片"], "article_url": "https://www.urlteam.org/2015/12/%e7%88%ac%e8%99%ab%e9%a6%96%e5%b0%9d%e8%af%95-%e7%88%ac%e5%8f%96%e7%99%be%e5%ba%a6%e8%b4%b4%e5%90%a7%e5%9b%be%e7%89%87/"}
{"article_name": ["python包包大全!经典资源"], "article_url": "https://www.urlteam.org/2015/12/python%e5%8c%85%e5%8c%85%e5%a4%a7%e5%85%a8%ef%bc%81%e7%bb%8f%e5%85%b8%e8%b5%84%e6%ba%90/"}
{"article_name": ["操作系统-多进程和多线程-python"], "article_url": "https://www.urlteam.org/2015/12/%e6%93%8d%e4%bd%9c%e7%b3%bb%e7%bb%9f-%e5%a4%9a%e8%bf%9b%e7%a8%8b%e5%92%8c%e5%a4%9a%e7%ba%bf%e7%a8%8b-python/"}
{"article_name": ["2015.十一月工作学习计划-ly–(初计划)"], "article_url": "https://www.urlteam.org/2015/12/2015-%e5%8d%81%e4%b8%80%e6%9c%88%e5%b7%a5%e4%bd%9c%e5%ad%a6%e4%b9%a0%e8%ae%a1%e5%88%92-ly-%ef%bc%88%e5%88%9d%e8%ae%a1%e5%88%92%ef%bc%89/"}
{"article_name": ["心情调节器开发记录"], "article_url": "https://www.urlteam.org/2015/11/%e5%bf%83%e6%83%85%e8%b0%83%e8%8a%82%e5%99%a8%e5%bc%80%e5%8f%91%e8%ae%b0%e5%bd%95/"}
{"article_name": ["AYProcessView启动动画实现"], "article_url": "https://www.urlteam.org/2015/11/ayprocessview%e5%90%af%e5%8a%a8%e5%8a%a8%e7%94%bb%e5%ae%9e%e7%8e%b0/"}
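
Some records above carry several strings in `article_name`, because `//h1/text()` matches every `h1` on a page, not only the post title. The output can be tidied afterwards with a short script. This is a sketch under assumptions: the helper `tidy` is hypothetical, it assumes the first `h1` is the post title (which holds for the records shown here), and it decodes the percent-encoded URL for readability. The two sample lines are copied from the output above.

```python
import json
from urllib.parse import unquote


def tidy(line):
    """Keep only the first title and decode the percent-encoded URL."""
    record = json.loads(line)
    names = record["article_name"]
    return {
        "title": names[0] if names else None,
        "url": unquote(record["article_url"]),
    }


# Two lines in the same shape as urlteam_data.json (one JSON object per line).
SAMPLE_LINES = [
    '{"article_name": ["STL set"], "article_url": "https://www.urlteam.org/2016/04/stl-set/"}',
    '{"article_name": ["背包九讲之二维费用的背包", "FATE"], "article_url": "https://www.urlteam.org/2016/04/%e8%83%8c%e5%8c%85%e4%b9%9d%e8%ae%b2%e4%b9%8b%e4%ba%8c%e7%bb%b4%e8%b4%b9%e7%94%a8%e7%9a%84%e8%83%8c%e5%8c%85/"}',
]

records = [tidy(line) for line in SAMPLE_LINES]
```

An alternative is to narrow the XPath in the spider itself so that only the title heading matches, but the right expression depends on the blog theme's markup, which is not shown here.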

Original article. When reposting, please credit: reposted from URl-team

Permalink: Scrapy Notes 3: Automatic Multi-Page Crawling of Every Post on This WordPress Blog

Related posts:

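  1. Scrapy Notes 1: Getting-started project, crawling the w3c site
  2. Scrapy Notes 2: Handling Chinese text and saving Chinese data
  3. Scrapy Notes 4: Automatic page crawling with CrawlSpider
  4. Scrapy Notes 5: Crawling images from the 妹子图 site, explained in detail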
Originally published: 2016-06-202 (shared from the author's personal site/blog).
