If you're attempting to use this for rate limiting, you probably just want to use <a href="http://doc.scrapy.org/en/latest/topics/settings.html#std:setting-DOWNLOAD_DELAY" rel="noreferrer">DOWNLOAD_DELAY</a> instead.
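As a rough sketch of that option, the delay is set in the project's settings module. The value of 10 seconds is an assumption chosen only to match the pause discussed in the question.

```python
# settings.py (fragment); the values are illustrative assumptions
DOWNLOAD_DELAY = 10  # seconds to wait between consecutive requests to the same site

# By default Scrapy randomizes the wait between 0.5x and 1.5x of DOWNLOAD_DELAY;
# turn that off if a fixed interval is wanted.
RANDOMIZE_DOWNLOAD_DELAY = False
```

The same keys can also go into a spider's `custom_settings` dictionary if only one spider should be throttled.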

Scrapy is just a framework on top of Twisted. For the most part, you can treat it the same as any other Twisted app. Instead of calling sleep, return the next request to make and tell Twisted to wait a bit. For example:

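<pre><code>from scrapy import Request
from twisted.internet import reactor, defer

def non_stop_function(self, response):
    # Return a Deferred instead of sleeping; it fires after 10 seconds
    # with the next Request, so the reactor is never blocked.
    d = defer.Deferred()
    reactor.callLater(10.0, d.callback, Request(
        'some url',
        callback=self.non_stop_function
    ))
    return d
</code></pre>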

A <code>Request</code> object has a <code>callback</code> parameter; try using it for this purpose.
I mean, create a <code>Deferred</code> which wraps <code>self.second_parse_function</code> and <code>pause</code>.

Here is my quick, untested example; the changed lines are marked.

<pre><code>from scrapy import Request, Spider
from twisted.internet.defer import Deferred

class ScrapySpider(Spider):
    name = 'live_function'

    def start_requests(self):
        yield Request('some url', callback=self.non_stop_function)

    def non_stop_function(self, response):
        # pause() is a helper that is not defined in this example
        parse_and_pause = Deferred()  # changed
        parse_and_pause.addCallback(self.second_parse_function)  # changed
        parse_and_pause.addCallback(pause, seconds=10)  # changed

        for url in ['url1', 'url2', 'url3', 'more urls']:
            yield Request(url, callback=parse_and_pause)  # changed

        yield Request('some url', callback=self.non_stop_function)  # Call itself

    def second_parse_function(self, response):
        pass
</code></pre>

If this approach works for you, you can write a function that builds such a <code>Deferred</code> object. It could be implemented like this:

<pre><code>def get_perform_and_pause_deferred(seconds, fn, *args, **kwargs):
    d = Deferred()
    d.addCallback(fn, *args, **kwargs)
    d.addCallback(pause, seconds=seconds)
    return d
</code></pre>

And here is a possible usage:

<pre><code>class ScrapySpider(Spider):
    name = 'live_function'

    def start_requests(self):
        yield Request('some url', callback=self.non_stop_function)

    def non_stop_function(self, response):
        for url in ['url1', 'url2', 'url3', 'more urls']:
            # changed
            yield Request(url, callback=get_perform_and_pause_deferred(10, self.second_parse_function))

        yield Request('some url', callback=self.non_stop_function)  # Call itself

    def second_parse_function(self, response):
        pass
</code></pre>

The asker already provides an answer in the question's update, but I want to give a slightly better version so it's reusable for any request.

<pre><code># removed...
from twisted.internet import reactor, defer

class MySpider(scrapy.Spider):
    # removed...

    def request_with_pause(self, response):
        d = defer.Deferred()
        reactor.callLater(response.meta['time'], d.callback, scrapy.Request(
            response.url,
            callback=response.meta['callback'],
            dont_filter=True, meta={'dont_proxy': response.meta['dont_proxy']}))
        return d

    def parse(self, response):
        # removed....
        yield scrapy.Request(the_url, meta={
            'time': 86400,
            'callback': self.the_parse,
            'dont_proxy': True
        }, callback=self.request_with_pause)
</code></pre>

By way of explanation: Scrapy uses Twisted to manage requests asynchronously, so we need Twisted's tools to make a delayed request too.

I have a problem. I need to pause the execution of one function for a while without stopping the parsing as a whole. That is, I need a non-blocking pause.

It looks like this:

<pre><code>class ScrapySpider(Spider):
    name = 'live_function'

    def start_requests(self):
        yield Request('some url', callback=self.non_stop_function)

    def non_stop_function(self, response):
        for url in ['url1', 'url2', 'url3', 'more urls']:
            yield Request(url, callback=self.second_parse_function)

        # Here I need some function for sleep only this function like time.sleep(10)

        yield Request('some url', callback=self.non_stop_function)  # Call itself

    def second_parse_function(self, response):
        pass
</code></pre>

The function <code>non_stop_function</code> needs to pause for a while, but it should not block the rest of the processing.

If I insert <code>time.sleep()</code>, it stops the whole parser, which I don't want. Is it possible to pause just one function using <code>twisted</code> or something else?

Reason: I need a non-blocking function that parses a page of the website every n seconds. It collects URLs there and then sleeps for 10 seconds. The URLs already obtained should continue to be processed, but the main function needs to sleep.

UPDATE:

Thanks to TkTech and viach. One answer helped me understand how to make a pending <code>Request</code>, and the other how to activate it. The two answers complement each other, and I made an excellent non-blocking pause for Scrapy:

<pre><code>from twisted.internet import reactor
from twisted.internet.defer import Deferred

def call_after_pause(self, response):
    d = Deferred()
    reactor.callLater(10.0, d.callback, Request(
        'https://example.com/',
        callback=self.non_stop_function,
        dont_filter=True))
    return d
</code></pre>

And I use this function as the callback for my request:

<pre><code>yield Request('https://example.com/', callback=self.call_after_pause, dont_filter=True)
</code></pre>

Scrapy: non-blocking pause
