
You did everything right except for one very small mistake.

The field that contains the image is named <code>Image</code>, not <code>image</code>.

Try:

<pre><code>yield {
    'Image': item.get('Image')
}
</code></pre>
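Python dictionary lookups are case-sensitive, so `item.get('image')` quietly returns `None` when the API key is `Image`, and nothing fails loudly. A minimal sketch of the difference, assuming a payload shaped like the one the spider decodes (only the `results` list and the `Image` key come from the post; the URL is made up):

```python
import json

# Hypothetical response body: a "results" list whose entries carry an "Image" key.
body = b'{"results": [{"Image": "https://example.com/wine.png"}]}'
data = json.loads(body.decode("utf-8"))
item = data["results"][0]

wrong = item.get("image")  # lowercase key is absent, so .get() returns None
right = item.get("Image")  # exact key returns the URL

print(wrong)
print(right)
```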

There is probably also something wrong with <code>ITEM_PIPELINES</code> in your settings.py file.
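`ITEM_PIPELINES` maps the full dotted import path of each pipeline class to an integer; items pass through pipelines from lower to higher values, and values are customarily kept in the 0 to 1000 range. If the path names a class that does not exist in `pipelines.py`, Scrapy cannot load that pipeline. A sketch of the `settings.py` entry, using the path the asker reports ending up with (adjust it to your own project and class names):

```python
# settings.py (fragment)
# Key: <project package>.<module>.<class name>; the class must exist in pipelines.py.
# Value: the run order relative to other pipelines (lower runs first).
ITEM_PIPELINES = {
    'Maiscarrinho.pipelines.MaiscarrinhoPipeline': 300,
}
```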

<a href="https://i.stack.imgur.com/F5qtf.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/F5qtf.png" alt="Working with python requests"></a>

<a href="https://i.stack.imgur.com/0BRV2.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/0BRV2.png" alt="Works even with scrapy:"></a>


Well, answering my own question: after digging into my code for a while, I realized the problems were indentation errors and some syntax errors.

Another issue was the pipeline: I forgot to change the last part of the path to the real name of my pipeline class, so instead of <code>'Maiscarrinho.pipelines.SomePipeline': 300</code> I now have <code>'Maiscarrinho.pipelines.MaiscarrinhoPipeline': 300</code>.
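For that path to resolve, `pipelines.py` has to define a class with exactly that name. A minimal sketch using the classic `process_item(self, item, spider)` method; the whitespace-trimming step is a hypothetical example of pipeline work, not something from the post:

```python
# pipelines.py (sketch): the class name must match the last part of
# 'Maiscarrinho.pipelines.MaiscarrinhoPipeline' in ITEM_PIPELINES.
class MaiscarrinhoPipeline:
    def process_item(self, item, spider):
        # Hypothetical clean-up: trim stray whitespace around the image URL.
        image = item.get('Image')
        if isinstance(image, str):
            item['Image'] = image.strip()
        # Return the item so later pipelines and feed exports receive it.
        return item
```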

The code below extracts the images as I want, but there is still one problem. Since the page uses infinite scrolling, I have a condition that checks whether there is an element named <code>'Image'</code>, but for some reason I'm not getting the desired result. It should extract 40 pages, each with 10 images.

<pre><code>import scrapy
import json

class WineSpider(scrapy.Spider):
    name = "SpidyWine"

    url = 'https://maiscarrinho.com/api/search?q=vinho&amp;pageNumber=%s&amp;pageSize=10'
    start_urls = [url % 1]
    i = 1

    def parse(self, response):
        data = json.loads(response.body.decode('utf-8'))
        for item in data['results']:
            yield {
                'Image': item.get('Image')
            }
            if item.get('Image'):
                WineSpider.i += 1
                yield scrapy.Request(self.url % WineSpider.i, callback=self.parse)
</code></pre>
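One likely cause: `WineSpider.i` is incremented and a new request is yielded once per item that has an image, so a single page of 10 results schedules up to 10 further pages, and the shared counter keeps advancing as responses arrive in whatever order Scrapy delivers them. A common alternative is to advance the page number once per response and stop when a page comes back empty or a page limit is reached. The sketch keeps that decision in a plain function so it can be checked without running Scrapy; `handle_page` and `MAX_PAGES` are hypothetical names, while the URL, the 40-page target and the `results`/`Image` keys come from the post:

```python
import json

PAGE_URL = 'https://maiscarrinho.com/api/search?q=vinho&pageNumber=%s&pageSize=10'
MAX_PAGES = 40  # assumption: the asker expects 40 pages of results


def handle_page(body, page, max_pages=MAX_PAGES):
    """Return (images, next_url) for one API response.

    next_url is None when the page had no results or the page limit was reached,
    which is where the crawl should stop.
    """
    data = json.loads(body.decode('utf-8'))
    results = data.get('results') or []
    # Keep only entries that actually carry an image URL.
    images = [item.get('Image') for item in results if item.get('Image')]
    if not results or page >= max_pages:
        return images, None
    # Exactly one follow-up request per page, never one per item.
    return images, PAGE_URL % (page + 1)
```

Inside the spider, `parse` could be declared as `def parse(self, response, page=1)`, yield `{'Image': url}` for each returned image, and yield a single `scrapy.Request(next_url, callback=self.parse, cb_kwargs={'page': page + 1})` when `next_url` is not `None`. Passing the page number through `cb_kwargs` avoids a class-level counter shared by all responses.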

I'm trying to extract JSON data from a website with Scrapy, but I'm running into some issues: when I run my spider, it gives no error and reports that it crawled 0 pages. I also use the command that stores the output in a JSON file, so I can inspect the output.

The following code is my spider:

<pre><code>import scrapy

class WineSpider(scrapy.Spider):
    name = "SpidyWine"
    i = 0
    url = 'https://maiscarrinho.com/api/search?q=vinho&amp;pageNumber=%s&amp;pageSize=10'
    start_urls = [url % 1]

    def parse(self, response):
        data = json.loads(response.body)
        for item in data['results']:
            yield {
                'Image': item.get('image')
            }
        if data['Image']:
            i = i + 1
            yield scrapy.Request(self.url % i, callback=self.parse)
</code></pre>
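Besides the key casing, two lines in this first version fail at runtime once `parse` actually runs (and the snippet also calls `json.loads` without `import json`, which the later version adds). `i = i + 1` does not update the class attribute `i`: assigning to `i` makes it a local name for the whole method, so reading it on the right-hand side raises `UnboundLocalError`. And `data['Image']` looks for `Image` at the top level, while it sits inside each entry of `results`. A Scrapy-free sketch of both pitfalls; the `Counter` class and the payload are hypothetical:

```python
class Counter:
    i = 0  # class attribute, playing the role of WineSpider.i

    def bump_wrong(self):
        # The assignment makes `i` local to this method, so reading it first fails.
        i = i + 1
        return i

    def bump_right(self):
        # Naming the class explicitly updates the shared class attribute.
        Counter.i += 1
        return Counter.i


c = Counter()
try:
    c.bump_wrong()
except UnboundLocalError as exc:
    print("bump_wrong:", type(exc).__name__)
print("bump_right:", c.bump_right())

# Hypothetical decoded payload: 'Image' lives inside each result, not at the top level.
data = {"results": [{"Image": "https://example.com/wine.png"}]}
try:
    data['Image']
except KeyError:
    print("no top-level 'Image' key")
print(data['results'][0].get('Image'))
```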

And my item class:

<pre><code>import scrapy

class MaiscarrinhoItem(scrapy.Item):

    image = scrapy.Field()
    price = scrapy.Field()
    supermarket = scrapy.Field()
    promotion = scrapy.Field()
    wineName = scrapy.Field()
    brand = scrapy.Field()
</code></pre>
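This item class declares a lowercase `image` field, while the API sends `Image`; a `scrapy.Item` raises `KeyError` when assigned a field it does not declare, so the value has to be copied across under the item's own field name. A Scrapy-free sketch of that mapping step; `API_TO_ITEM` and `to_item_fields` are hypothetical, and only the `Image` to `image` pair is confirmed by the post, so the other fields are left out:

```python
# Map keys returned by the search API to MaiscarrinhoItem field names.
# Only 'Image' -> 'image' is known from the post; other pairs would be added the same way.
API_TO_ITEM = {'Image': 'image'}


def to_item_fields(api_item):
    """Return a dict keyed by item field names, skipping keys the API did not send."""
    return {field: api_item[key] for key, field in API_TO_ITEM.items() if key in api_item}


print(to_item_fields({'Image': 'https://example.com/wine.png'}))
```

In the spider, the result could then be passed to the item class as keyword arguments, e.g. `MaiscarrinhoItem(**to_item_fields(item))`.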

For now, I'm only using the Image field in my spider to keep things simple.
Also, my idea behind the if statement in my spider was to handle the infinite scrolling: when the JSON API returns 'Image', that page has content.

<a href="https://i.stack.imgur.com/KrHI4.png" rel="nofollow noreferrer">Output in Console</a>

Thanks in advance

Scrapy Spider Crawl 0 pages
