Normally, I run my Scrapy crawler with the following command:
scrapy crawl <spider_name>
Once it is running, it scrapes the required elements from the target resource, but I have to watch the results printed on the screen to spot errors (if any) and stop the crawler manually.
How can I automate this process? Is there a way to stop the crawler automatically when it fails to scrape the required elements?
Posted on 2021-11-09 06:42:24
spider.py:

import scrapy
from scrapy.exceptions import CloseSpider

class SomeSpider(scrapy.Spider):
    name = 'somespider'
    allowed_domains = ['example.com']
    start_urls = ['https://example.com']

    def parse(self, response):
        try:
            something()
        except Exception as e:
            print(e)
            raise CloseSpider("Some error")
        # if you want to catch a bad status you can also do:
        # if response.status != 200: .....
Posted on 2021-11-10 08:52:49
I think you are looking for logging. There is documentation on it here.
What I find useful is:
import logging
import scrapy

logger = logging.getLogger('mycustomlogger')

class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['https://scrapy.org']

    def parse(self, response):
        logger.info('Parse function called on %s', response.url)
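If the goal is to stop watching the console entirely, one option is to combine the two answers: log the failure and raise CloseSpider, while sending log output to a file. A rough sketch, assuming Scrapy's standard LOG_FILE / LOG_LEVEL settings and a hypothetical CSS selector for the required element:

import logging
import scrapy
from scrapy.exceptions import CloseSpider

logger = logging.getLogger('mycustomlogger')

class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['https://scrapy.org']
    # Write WARNING-and-above messages to a file so failures can be
    # reviewed later instead of being watched on screen.
    custom_settings = {
        'LOG_FILE': 'myspider.log',
        'LOG_LEVEL': 'WARNING',
    }

    def parse(self, response):
        title = response.css('title::text').get()  # hypothetical target element
        if not title:
            # Log the failure and stop the crawl automatically.
            logger.error('Required element missing on %s', response.url)
            raise CloseSpider('required element missing')
        yield {'title': title}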
https://stackoverflow.com/questions/69893034