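Hey everyone, I've been working on learning Scrapy and am now on my first project. I wrote this code to scrape NFL player news from http://www.rotoworld.com/playernews/nfl/football/?rw=1. I tried to set up a loop that grabs each container from the site, but when I run the code it doesn't scrape anything. The code runs without errors and even outputs a CSV file when I ask for one. It just isn't scraping what I thought I told it to scrape. Any help would be great! Thanks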
import scrapy
from Roto_Player_News.items import NFLNews


class Roto_News_Spider2(scrapy.Spider):
    name="PlayerNews2"
    allowed_domains = ["rotoworld.com"]
    start_urls = ('http://www.rotoworld.com/playernews/nfl/football/',)

    def parse(self,response):
        containers= response.xpath('//*[@id="cp1_pnlNews"]/div/div[2]')
        def parse(self, response):
            for container in containers:
                def parse(self, response):
                    item=NFLNews()
                    item['player']= response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="report"]/text()')
                    item['headline'] = response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="report"]/p/text()').extract()
                    item['info'] = response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="impact"]/text()').extract()
                    item['date'] = response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="info"]/div[@class="date"]/text()').extract()
                    item['source'] = response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="info"]/div[@class="source"]/a/text()').extract()
                    yield item
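A note on why this can run cleanly and still produce nothing: the posted code repeats `def parse` inside the callback. The indentation did not survive on this page, so this is an assumption, but if those inner definitions are nested inside the outer `parse`, they are only defined and never called. The outer method then returns `None` and no items are emitted. The sketch below shows the effect in plain Python, without Scrapy; `parse_nested`, `parse_flat`, and the `containers` list are hypothetical stand-ins, not part of the original spider.

```python
def parse_nested(response):
    """Mimics the posted layout: the real work sits in an inner function."""
    containers = ["block-1", "block-2"]  # stand-in for response.xpath(...)

    def parse(response):  # defined here, but never called
        for container in containers:
            yield {"container": container}
    # No return or yield at this level, so the outer function returns None.


def parse_flat(response):
    """Same loop, placed directly in the callback body."""
    containers = ["block-1", "block-2"]
    for container in containers:
        yield {"container": container}


nested_result = parse_nested(None)
flat_result = list(parse_flat(None))

print(nested_result)  # None
print(flat_result)    # two dicts, one per container
```

Moving the loop and the `yield` directly into the single `parse` method is what lets items reach the output.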
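Answered on 2018-06-10 03:21:06
The XPath expressions you defined don't look right. Try this instead. It should fetch the content you want to scrape. Just copy and paste it.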
import scrapy


class Roto_News_Spider2(scrapy.Spider):
    name = "PlayerNews2"
    start_urls = [
        'http://www.rotoworld.com/playernews/nfl/football/',
    ]

    def parse(self, response):
        # Loop over each <div class="pb"> block on the page.
        for item in response.xpath("//div[@class='pb']"):
            # The leading ".//" keeps every query inside the current block.
            player = item.xpath(".//div[@class='player']/a/text()").extract_first()
            report = item.xpath(".//div[@class='report']/p/text()").extract_first()
            date = item.xpath(".//div[@class='date']/text()").extract_first()
            # default="" avoids calling .strip() on None when the impact text is missing.
            impact = item.xpath(".//div[@class='impact']/text()").extract_first(default="").strip()
            source = item.xpath(".//div[@class='source']/a/text()").extract_first()
            yield {"Player": player,"Report":report,"Date":date,"Impact":impact,"Source":source}
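The key difference from the question's code is that every field is queried relative to the current block (`item.xpath(".//...")`) rather than against the whole response. Here is a minimal sketch of why that matters. It uses the standard library's `xml.etree.ElementTree` instead of Scrapy selectors so it runs offline, and the `SAMPLE` markup and the `text_of` helper are hypothetical simplifications, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is larger and may differ.
SAMPLE = """
<html><body>
  <div class="pb">
    <div class="player"><a>Player A</a></div>
    <div class="impact"> Impact A </div>
  </div>
  <div class="pb">
    <div class="player"><a>Player B</a></div>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)
blocks = root.findall(".//div[@class='pb']")

# Relative lookup: each query starts from the current block.
relative = [b.find(".//div[@class='player']/a").text for b in blocks]

# Document-wide lookup inside the loop: every pass returns the same first match.
absolute = [root.find(".//div[@class='player']/a").text for _ in blocks]


def text_of(node, path):
    """Return stripped text for the first match, or "" when nothing matches."""
    found = node.find(path)
    return found.text.strip() if found is not None and found.text else ""


impacts = [text_of(b, ".//div[@class='impact']") for b in blocks]

print(relative)  # one player per block
print(absolute)  # the first player repeated
print(impacts)   # empty string where the block has no impact div
```

The same idea carries over to Scrapy: querying `response` inside the loop repeats the first match, while querying the loop variable with a leading `.//` stays inside one block.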
https://stackoverflow.com/questions/50777358