专栏首页finleyMaScrapy1.6 爬虫框架2 提取数据

Scrapy1.6 爬虫框架2 提取数据

使用 scrapy shell 提取数据

scrapy shell 是 scrapy 提供的命令行工具,可以方便的调试 比如执行 scrapy shell "http://quotes.toscrape.com/page/1/"

image.png

提示我们会暴露出来很多有用的对象,比如response对象包含了css和xpath方法,可以进一步提取页面的title。

image.png

修改上节中建立的 quotes_spider.py 我们分别提取 text, author 和 tags

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
        'http://quotes.toscrape.com/page/2/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                # ::text 选中文本节点
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
                'tags': quote.css('div.tags a.tag::text').getall(),
            }

执行命令scrapy crawl quotes -o quotes.json 输出结果到 json 文件 结果类似:

[
{"text": "\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d", "author": "Albert Einstein", "tags": ["change", "deep-thoughts", "thinking", "world"]},
{"text": "\u201cIt is our choices, Harry, that show what we truly are, far more than our abilities.\u201d", "author": "J.K. Rowling", "tags": ["abilities", "choices"]},
{"text": "\u201cThere are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.\u201d", "author": "Albert Einstein", "tags": ["inspirational", "life", "live", "miracle", "miracles"]}, {"text": "\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d", "author": "Jane Austen", "tags": ["aliteracy", "books", "classic", "humor"]}, {"text": "\u201cImperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.\u201d", "author": "Marilyn Monroe", "tags": ["be-yourself", "inspirational"]}, {"text": "\u201cTry not to become a man of success. Rather become a man of value.\u201d", "author": "Albert Einstein", "tags": ["adulthood", "success", "value"]}, {"text": "\u201cIt is better to be hated for what you are than to be loved for what you are not.\u201d", "author": "Andr\u00e9 Gide", "tags": ["life", "love"]}, {"text": "\u201cI have not failed. I've just found 10,000 ways that won't work.\u201d", "author": "Thomas A. Edison", "tags": ["edison", "failure", "inspirational", "paraphrased"]}, {"text": "\u201cA woman is like a tea bag; you never know how strong it is until it's in hot water.\u201d", "author": "Eleanor Roosevelt", "tags": ["misattributed-eleanor-roosevelt"]}, {"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin", "tags": ["humor", "obvious", "simile"]}, {"text": "\u201cThis life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can't give up because if you give up, you'll never find your soulmate. You'll never find that half who makes you whole and that goes for everything. Just because you fail once, doesn't mean you're gonna fail at everything. Keep trying, hold on, and always, always, always believe in yourself, because if you don't, then who will, sweetie? So keep your head high, keep your chin up, and most importantly, keep smiling, because life's a beautiful thing and there's so much to smile about.\u201d", "author": "Marilyn Monroe", "tags": ["friends", "heartbreak", "inspirational", "life", "love", "sisters"]}, {"text": "\u201cIt takes a great deal of bravery to stand up to our enemies, but just as much to stand up to our friends.\u201d", "author": "J.K. Rowling", "tags": ["courage", "friends"]}, {"text": "\u201cIf you can't explain it to a six year old, you don't understand it yourself.\u201d", "author": "Albert Einstein", "tags": ["simplicity", "understand"]}, {"text": "\u201cYou may not be her first, her last, or her only. She loved before she may love again. But if she loves you now, what else matters? She's not perfect\u2014you aren't either, and the two of you may never be perfect together but if she can make you laugh, cause you to think twice, and admit to being human and making mistakes, hold onto her and give her the most you can. She may not be thinking about you every second of the day, but she will give you a part of her that she knows you can break\u2014her heart. So don't hurt her, don't change her, don't analyze and don't expect more than she can give. Smile when she makes you happy, let her know when she makes you mad, and miss her when she's not there.\u201d", "author": "Bob Marley", "tags": ["love"]}, {"text": "\u201cI like nonsense, it wakes up the brain cells. Fantasy is a necessary ingredient in living.\u201d", "author": "Dr. Seuss", "tags": ["fantasy"]}, {"text": "\u201cI may not have gone where I intended to go, but I think I have ended up where I needed to be.\u201d", "author": "Douglas Adams", "tags": ["life", "navigation"]}, {"text": "\u201cThe opposite of love is not hate, it's indifference. The opposite of art is not ugliness, it's indifference. The opposite of faith is not heresy, it's indifference. And the opposite of life is not death, it's indifference.\u201d", "author": "Elie Wiesel", "tags": ["activism", "apathy", "hate", "indifference", "inspirational", "love", "opposite", "philosophy"]}, {"text": "\u201cIt is not a lack of love, but a lack of friendship that makes unhappy marriages.\u201d", "author": "Friedrich Nietzsche", "tags": ["friendship", "lack-of-friendship", "lack-of-love", "love", "marriage", "unhappy-marriage"]}, {"text": "\u201cGood friends, good books, and a sleepy conscience: this is the ideal life.\u201d", "author": "Mark Twain", "tags": ["books", "contentment", "friends", "friendship", "life"]},
{"text": "\u201cLife is what happens to us while we are making other plans.\u201d", "author": "Allen Saunders", "tags": ["fate", "life", "misattributed-john-lennon", "planning", "plans"]}
]

也可以输出到 csv,如 scrapy crawl quotes -o quotes.csv

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

我来说两句

0 条评论
登录 后参与评论

相关文章

  • 【坑】dubbo+zookeeper服务端重复调用三次

    版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.n...

    用户1149268
  • Java API批量操作Elasticsearch

    版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.n...

    用户1149268
  • 零代码第一步,做个添加数据的服务先。node.js + mysql

    1、 数据库设计、建表这类的直接略过,假设有了一张表,我们要往这个表里面添加数据

    用户1174620
  • 物联网专业术语

    6LoWPAN是“IPv6 over low-power personal area networks”的缩写。是低速无线个域网标准,它基于IEEE 802....

    魏晓蕾
  • 扩展jwt解决oauth2 性能瓶颈

    如上步骤在实际使用,会造成认证中心的负载压力过大,成为造成整个系统瓶颈的关键点。

    冷冷
  • openshift/origin工作记录(7)——持久化openshift内部镜像仓库

    openshift内部镜像仓库Registry组件默认是非持久化的。之前重启过一次openshift集群,导致重启前通过s2i构建的镜像都不能用了。所以需要持...

    胡了了
  • python︱处理与使用json格式的数据(json/UltraJSON/Demjson)、pickle模块

    版权声明:博主原创文章,微信公众号:素质云笔记,转载请注明来源“素质云博客”,谢谢合作!! ...

    素质
  • 基于nodejs的流水线式的CRUD服务。依赖注入可以支持插件。

    当我们刚开始学习数据库编程的时候,我们会先写一段代码,实现往一个表里添加数据的功能。这段代码是必须写的,不写怎么会?

    用户1174620
  • jQuery学习笔记之jQuery的Ajax(3)

    ———————————————————————————— 6.0 jQuery ajax 浏览器给我们提供的XMLHttpRequest的作用是来发送h...

    王小雷
  • python三方库之requests-快速上手

    其他请求接口与HTTP请求类型一致,如PUT, DELETE, HEAD, OPTIONS等。

    枇杷李子橙橘柚

扫码关注云+社区

领取腾讯云代金券