Error when running a Scrapy web spider

Stack Overflow user
Asked on 2014-07-01 04:08:32
1 answer · 408 views · 0 followers · 3 votes
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        for sel in response.xpath('//ul/li'):
            title = sel.xpath('a/text()').extract()
            link = sel.xpath('a/@href').extract()
            desc = sel.xpath('text()').extract()
            print title, link, desc

However, when I try to run the spider, I get the following error message:

[example] ERROR: Spider error processing <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
    Traceback (most recent call last):
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/base.py", line 1178, in mainLoop
        self.runUntilCurrent()
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/base.py", line 800, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 368, in callback
        self._startRunCallbacks(result)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 464, in _startRunCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 551, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/Users/andy2/Documents/Python/tutorial/tutorial/spiders/example.py", line 18, in parse
        print title, link, desc
    exceptions.NameError: global name 'link' is not defined

Is there anything I can do to make this code work?

Can anyone help me?

Thanks!


1 Answer

Stack Overflow user

Accepted answer

Answered on 2014-07-01 04:12:32

You need to instantiate a Selector and pass the response to it as an argument. Also, your imports are incorrect. Here is the fixed version of the spider:

from scrapy.selector import Selector
from scrapy.spider import Spider


class ExampleSpider(Spider):
    name = "example"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        sel = Selector(response)
        for li in sel.xpath('//ul/li'):
            title = li.xpath('a/text()').extract()
            link = li.xpath('a/@href').extract()
            desc = li.xpath('text()').extract()
            print title, link, desc
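For reference, given the tutorial project layout visible in the traceback (tutorial/tutorial/spiders/example.py), the spider can typically be started from the project root with Scrapy's built-in crawl command, using the name defined in the spider class:

scrapy crawl example

Each matched li element's title, link, and description should then be printed without the NameError.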
3 votes
Original content provided by Stack Overflow.
Original question: https://stackoverflow.com/questions/24502315
