For loop with Scrapy

Stack Overflow user
Asked 2018-06-10 02:30:18
1 answer, 1.5K views, 0 followers, 0 votes

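Hey everyone, I've been working hard to learn Scrapy and am now on my first project. I wrote this code to scrape NFL player news from http://www.rotoworld.com/playernews/nfl/football/?rw=1. I tried to set up a loop to grab each container from the site, but when I run the code it doesn't scrape anything. The code runs without errors and even outputs a CSV file when I ask for one. It just doesn't scrape what I thought I told it to scrape. Any help would be great! Thanks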

import scrapy
from Roto_Player_News.items import NFLNews

class Roto_News_Spider2(scrapy.Spider):
    name="PlayerNews2"
    allowed_domains = ["rotoworld.com"]
    start_urls = ('http://www.rotoworld.com/playernews/nfl/football/',)

    def parse(self,response):

        containers= response.xpath('//*[@id="cp1_pnlNews"]/div/div[2]')

        def parse(self, response):

            for container in containers:
                def parse(self, response):           
                    item=NFLNews()
                    item['player']= response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="report"]/text()')
                    item['headline'] = response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="report"]/p/text()').extract()
                    item['info'] = response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="impact"]/text()').extract()
                    item['date'] = response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="info"]/div[@class="date"]/text()').extract()
                    item['source'] = response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="info"]/div[@class="source"]/a/text()').extract()

                    yield item

1 Answer

Stack Overflow user

Accepted answer

Posted 2018-06-10 03:21:06

The XPaths you defined don't look right. Try this instead. It should fetch the content you want to scrape. Just copy and paste it.

import scrapy


class Roto_News_Spider2(scrapy.Spider):
    name = "PlayerNews2"

    start_urls = [
        'http://www.rotoworld.com/playernews/nfl/football/',
    ]

    def parse(self, response):
        # Each news blurb sits in its own <div class="pb"> container.
        for item in response.xpath("//div[@class='pb']"):
            # The leading "." keeps every query relative to the current container.
            player = item.xpath(".//div[@class='player']/a/text()").extract_first()
            report = item.xpath(".//div[@class='report']/p/text()").extract_first()
            date = item.xpath(".//div[@class='date']/text()").extract_first()
            # default="" avoids calling .strip() on None when a blurb has no impact text.
            impact = item.xpath(".//div[@class='impact']/text()").extract_first(default="").strip()
            source = item.xpath(".//div[@class='source']/a/text()").extract_first()
            yield {"Player": player, "Report": report, "Date": date, "Impact": impact, "Source": source}
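
A side note on why the spider in the question ran cleanly but produced an empty CSV: besides the XPaths, its loop and its yield sit inside nested `def parse` functions that nothing ever calls, so the outer `parse` simply returns None. The sketch below reproduces that in plain Python without Scrapy. The names `broken_parse`, `fixed_parse`, and `fake_response` are hypothetical stand-ins made up for illustration, not part of the original code or of Scrapy's API.

```python
def broken_parse(response):
    """Mimics the question's structure: the loop lives in a nested function."""
    containers = response["containers"]

    def inner(response):
        # Defined here, but nothing ever calls it, so this loop never runs.
        for container in containers:
            yield {"player": container}

    # Execution falls off the end of the outer function: implicit `return None`.


def fixed_parse(response):
    """The loop and the yield sit directly in the callback body."""
    for container in response["containers"]:
        yield {"player": container}


# Hypothetical stand-in for a Scrapy response; real code would use response.xpath(...).
fake_response = {"containers": ["Player A", "Player B"]}

broken_result = broken_parse(fake_response)
fixed_result = list(fixed_parse(fake_response))

print(broken_result)  # None
print(fixed_result)   # [{'player': 'Player A'}, {'player': 'Player B'}]
```

A callback that returns None gives Scrapy nothing to export, which matches the symptom of a run with no errors and no scraped rows.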
0 votes
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei engine for the IT domain.
Original link:

https://stackoverflow.com/questions/50777358
