Scrapy Splash: No module named scrapy_splash. Broken pipe | Scrapy: No module named '' | No module named 'scrapy.conf' - Tencent Cloud Developer Community

python, module, scrapy, python-import, scrapy-splash

I don't know why, but recently I started getting an error: File "C:\Users\name\PycharmProjects\splash\project\project\spiders\scrapy.py", line 5, in <module> class ScrapySpider(scrapy.Spider): AttributeError: 'module' object has no attribute 'Spider' My full code: import scrapy from scrapy_splash import SplashRe

Views: 7 | Asked: 2019-12-17 | Votes: 1

Accepted answer

1 answer
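The traceback points at spiders\scrapy.py, and the wording `'module' object has no attribute 'Spider'` is Python 2's. In Python 2, `import scrapy` inside a package finds a sibling module named scrapy.py first, so the spider file most likely imports itself instead of the Scrapy library. This is an inference from the traceback, not the text of the accepted answer. The usual fix is to rename the file and delete any leftover scrapy.pyc. Below is a minimal stdlib sketch that lists files shadowing a package name. The entries in `SHADOWED_NAMES` other than scrapy are illustrative assumptions.

```python
from pathlib import Path

# Module names a project file must not reuse. "scrapy" comes from the traceback;
# the other two entries are illustrative additions.
SHADOWED_NAMES = {"scrapy", "scrapy_splash", "twisted"}


def find_shadowing_files(project_dir, names=SHADOWED_NAMES):
    """List .py/.pyc files under project_dir whose stem equals a package name."""
    hits = []
    for path in Path(project_dir).rglob("*"):
        # A stale scrapy.pyc left next to a renamed scrapy.py shadows just as well.
        if path.is_file() and path.suffix in (".py", ".pyc") and path.stem in names:
            hits.append(path)
    return sorted(hits)
```

Calling `find_shadowing_files(".")` from the project root prints nothing to fix when the list comes back empty.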

Scrapy with Splash: No module named scrapy_splash

python, scrapy, splash-screen

I'm learning how to use Splash with Scrapy, following this tutorial. I have created a Scrapy project. When I run: $ scrapy crawl spider1 everything works fine. However, when I add the following to my settings.py file: DOWNLOADER_MIDDLEWARES = { 'scrapy_splash.SplashCookiesMiddleware': 723, 'scrapy_splash.SplashMiddleware': 725, 'scrapy.downloadermiddlewares.httpcompression.HttpComp

Views: 19 | Asked: 2017-01-11 | Votes: 0

1 answer
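The middleware paths in settings.py are plain strings. Scrapy only imports scrapy_splash when it builds the downloader middleware chain. If the package is missing from the interpreter that runs `scrapy crawl`, this is where `No module named scrapy_splash` appears. The package itself is installed with `pip install scrapy-splash` (with a hyphen). For reference, the settings block suggested in the scrapy-splash README looks like this. The `SPLASH_URL` value is an assumption about where your Splash container listens.

```python
# settings.py: the block recommended by the scrapy-splash README.
# SPLASH_URL is an assumption; point it at the host/port your Splash container uses.
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Splash-aware duplicate filter and HTTP cache storage.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```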

How to send a POST request with SplashRequest in scrapy_splash

python, scrapy, scrapy-splash

I tried to send a POST request using SplashRequest with endpoint='execute' and the code below, but the result shows that it didn't succeed. import re import sys import os import scrapy from scrapy_splash import SplashRequest from crawler.items import CrawlerItem class Exp10itSpider(scrapy.Spider): name = "test" lua_script = """ function main(s

Views: 3 | Asked: 2017-10-25 | Votes: 0

Accepted answer

1 answer
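Here is a sketch of one way to POST from Splash. It assumes that `splash:go` accepts `http_method` and `formdata` in its table-call form, as the Splash scripting reference describes. The field names in `FORM_FIELDS` and the helper `post_args` are hypothetical. The dict is meant to be passed as `args=` to a SplashRequest with `endpoint='execute'`, where each key becomes available in Lua as `args.<key>`.

```python
# Hypothetical form fields; replace them with the real ones for the target form.
FORM_FIELDS = {"username": "alice", "q": "splash"}

# Lua for Splash's /execute endpoint. formdata is sent url-encoded as the POST body.
POST_SCRIPT = """
function main(splash, args)
  assert(splash:go{
    args.url,
    http_method = "POST",
    formdata = args.formdata,
  })
  assert(splash:wait(1))
  return {html = splash:html(), url = splash:url()}
end
"""


def post_args(form_fields):
    """Build the args dict for SplashRequest(..., endpoint='execute', args=...)."""
    return {"lua_source": POST_SCRIPT, "formdata": dict(form_fields)}

# In the spider (not executed here):
#     yield SplashRequest(url, self.parse, endpoint="execute", args=post_args(FORM_FIELDS))
```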

No module named scrapy_splash

python, scrapy, scrapy-splash

I'm on OS X. I installed it with pip install scrapy splash, and that went fine. I added this code block to my spider: SPLASH_URL = 'http://localhost:8050' DOWNLOADER_MIDDLEWARES = { 'scrapy_splash.SplashCookiesMiddleware': 723, 'scrapy_splash.SplashMiddleware': 725, 'scrapy.downloadermiddlewares.httpcompression.

Views: 2 | Asked: 2020-03-31 | Votes: 0

1 answer

Scrapy: IndentationError: unindent does not match any outer indentation level

python, scrapy, scrapy-splash

I wrote a crawler with Scrapy Splash, and I started getting this error: File "C:\Users\Name\PycharmProjects\splash\project\project\spiders\scrapy.py", line 5 start_urls = [ ^ IndentationError: unindent does not match any outer indentation level Here are the lines of code that produce it: import scrapy from scrapy_splash import SplashRequest clas

Views: 25 | Asked: 2019-12-17 | Votes: 1

1 answer
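`unindent does not match any outer indentation level` means the flagged line is indented to a column that no enclosing block uses. Common causes are mixed tabs and spaces, or a line that is off by one space. Here is a small stdlib sketch (function names are hypothetical). It reproduces the error with `compile()` and lists the lines whose indentation mixes tabs and spaces.

```python
def find_mixed_indentation(source):
    """Return 1-based line numbers whose leading whitespace mixes tabs and spaces."""
    bad = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            bad.append(lineno)
    return bad


def check_compiles(source, filename="<spider>"):
    """Return None if the source compiles, else the IndentationError message."""
    try:
        compile(source, filename, "exec")
    except IndentationError as exc:  # TabError is a subclass and is caught too
        return exc.msg
    return None
```

The standard library also ships `python -m tabnanny <path>`, which reports ambiguous tab/space indentation for a file or a whole directory.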

Re-rendering a dynamic page as HTML with Scrapy and Splash

python, scrapy, scrapy-splash

With the code below I'm trying to use Scrapy-Splash to render a JavaScript page as HTML, but when I run the spider I get the following error (TCP connection timeout, 10060): 2021-12-26 18:57:19 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.tesla.com/en_ca/models/design#overview via http://172.17.0.1:8050/render.html> Traceback (most recent call last): File "

Views: 19 | Asked: 2021-12-27 | Votes: 0

2 answers
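Windows error 10060 is a TCP connect timeout, so Scrapy never reached Splash at 172.17.0.1:8050. That address is typically the default Docker bridge gateway. It is reachable from containers but usually not from a Windows host. With `-p 8050:8050`, `http://localhost:8050` (or the docker-machine IP under Docker Toolbox) is the more likely address. That is an inference, not a confirmed diagnosis. Here is a stdlib sketch that tests whatever host and port your `SPLASH_URL` uses before you start the crawl. `splash_reachable` is a hypothetical helper.

```python
import socket


def splash_reachable(host, port=8050, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and socket timeouts
        return False
```

For example, `splash_reachable("localhost")` should return True while the Splash container is running with its port published.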

Scrapy Splash: No module named scrapy_splash. Broken pipe

python, scrapy

I installed scrapy_splash with pip install, and python3 -m pip freeze shows scrapy-splash==0.7.2. However, when I run the spider I get the following error: ImportError: No module named scrapy_splash I suspect I have a problem with some environment path. which python /usr/bin/python echo $PATH /usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/bin/python3 But somehow pip fails: pip --version bas

Views: 95 | Asked: 2019-03-20 | Votes: 0

Accepted answer

2 answers
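Two details in the output suggest an interpreter mismatch. First, `No module named scrapy_splash` without quotes is the Python 2 wording, while `pip freeze` ran under `python3` and `which python` resolves to /usr/bin/python. Second, `$PATH` should list directories, so the trailing `/usr/bin/python3` entry has no effect. Installing with the interpreter that actually runs Scrapy (for example `python3 -m pip install scrapy-splash`) avoids the mismatch. Here is a stdlib sketch to run under each interpreter so you can compare the results. `module_report` is a hypothetical helper.

```python
import importlib.util
import sys


def module_report(name):
    """Describe whether the top-level module `name` is importable by *this* interpreter."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "version": sys.version.split()[0],
        "module": name,
        "found": spec is not None,
        "origin": spec.origin if spec else None,
    }
```

Running `print(module_report("scrapy_splash"))` under both `python` and `python3` shows which one actually has the package.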

Scrapy Splash on an Ubuntu server: got an unexpected keyword argument 'encoding'

python, web-scraping, scrapy, scrapy-splash, splash-js-render

My Scrapy setup works fine on my local machine, but when I run it on my Ubuntu server it returns this error. Why does this happen? Is it because of insufficient memory? File "/usr/local/lib64/python2.7/site-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks result = g.send(result) File "/usr/local/lib/python2.7/site-packages/scrapy/core/downloader/middlewar

Views: 9 | Asked: 2017-03-12 | Votes: 0

Accepted answer

1 answer

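How to follow the redirect after a form POST with the scrapy_splash package?

python, lua, scrapy, scrapy-splash

I'm using Python, Scrapy, Splash and the scrapy_splash package to scrape websites. I was able to log in using scrapy_splash. Logging in creates a cookie that gives me access to the portal page, and everything works up to this point. On the portal page, a form element wraps many buttons. When a button is clicked, the form's action URL is updated and the form is submitted. The submission results in a 302 redirect. I tried the same approach with SplashRequest, but I could not capture the SSO query parameter returned by the redirect. I tried reading the Location header, without success. I also tried combining a Lua script with the SplashRequest object, but I still couldn't access the redirect's Location object

Views: 3 | Asked: 2017-05-18 | Votes: 2

Accepted answer

1 answer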

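Can't get the full JavaScript response with Scrapy

python, html, http, scrapy, splash-js-render

I can't seem to render a complete HTML response from this link: http://gabgoh.github.io/COVID I'm using the Splash extension because nothing I tried with the usual Scrapy practices worked, but this doesn't work either. Here is my Python code (I'm running the Splash Docker container with docker run -p 8050:8050 scrapinghub/splash): import scrapy from scrapy.utils.log import configure_logging import scrapy_splash from scrapy_splash import SplashRe

Views: 10 | Asked: 2020-04-01 | Votes: 0

Accepted answer

1 answer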

I'm scraping the bet3000 livescore sports site, which relies on JavaScript, through scrapy-splash, and it gives me an error.

python, web-scraping, scrapy, scrapy-splash

This is the spider file: from scrapy.spiders import Spider from scrapy_splash import SplashRequest from ..items import GameItem class Splash1Spider(Spider): name = 'scrapy_splash_1' start_urls = ['https://www.livescore.bet3000.com'] def start_requests(self): for url in self.start_urls: yield SplashRequest

Views: 2 | Asked: 2020-11-27 | Votes: 0

2 answers

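attrs gives TypeError: attrs() got an unexpected keyword argument 'eq'

python, scrapy-splash

I'm using a cloud Splash instance from ScrapingHub. I'm trying to make a simple request with the Scrapy-Splash library, but I keep getting this error: @attr.s(hash=False, repr=False, eq=False) TypeError: attrs() got an unexpected keyword argument 'eq' Any ideas or clues about why this error occurs would be greatly appreciated. The code I'm using is below (Python 3.6 and Scrapy v2.1.0): import scrapy from scrapy_splash import SplashRequest cl

Views: 181 | Asked: 2020-05-20 | Votes: 8

1 answer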
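attrs added the `eq` and `order` arguments in release 19.2.0. So `attrs() got an unexpected keyword argument 'eq'` usually means some package in the stack expects attrs >= 19.2.0 while an older attrs is installed. Upgrading with `pip install -U attrs` is the usual remedy. That is an assumption about this environment, not a verified answer. Here is a stdlib sketch of the version check. Pass it `attr.__version__` from the environment that runs the spider. `attrs_supports_eq` is a hypothetical helper.

```python
import re


def attrs_supports_eq(version):
    """True if an attrs version string is >= 19.2.0, the release that added eq=/order=."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    parts += [0] * (3 - len(parts))  # treat "20.1" as 20.1.0
    return tuple(parts) >= (19, 2, 0)
```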

Scraping JavaScript content with Scrapy and Splash

python, scrapy, scrapy-splash

I'm using Scrapy and Splash to scrape this link, but I can't extract the data. My code: import scrapy from scrapy_splash import SplashRequest class ManuPySpider(scrapy.Spider): name = 'manulife' def start_requests(self): yield SplashRequest( url = 'https://manulife.taleo.net/careersection/external_global/jobsearc

Views: 1 | Asked: 2017-10-25 | Votes: 0

Accepted answer

1 answer

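Scrapy Splash error: Gave up retrying, 504 Gateway Time-out

python, web-scraping, lua, scrapy, scrapy-splash

I'm learning Splash, and I get this 504 Gateway Time-out error when I use Splash together with Scrapy to crawl this site. Can you help me? Splash is running in a Docker container on port 8050. Spider file: import scrapy from scrapy_splash import SplashRequest class LaptopSpider(scrapy.Spider): name = 'laptop' allowed_domains = ['www.lazada.com.my'] def start_requests(self):

Views: 5 | Asked: 2022-02-16 | Votes: 0

Accepted answer

1 answer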
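Splash answers with 504 when a render does not finish within its `timeout` argument. According to the Splash docs, that timeout defaults to 30 seconds and cannot exceed 90 seconds unless the container is started with `--max-timeout`. Whether that is the cause here is an assumption, since a slow target site or too many concurrent renders are also common triggers. Here is a sketch of a hypothetical helper that builds the `wait`/`timeout` args and refuses values Splash would reject.

```python
# Splash's default cap on the render timeout, unless the container is started with
# e.g. `docker run -p 8050:8050 scrapinghub/splash --max-timeout 300`.
DEFAULT_MAX_TIMEOUT = 90


def render_args(wait=2.0, timeout=60, max_timeout=DEFAULT_MAX_TIMEOUT):
    """Build SplashRequest args with a render timeout Splash will accept."""
    if timeout > max_timeout:
        raise ValueError(
            f"timeout={timeout} exceeds Splash's max-timeout={max_timeout}; "
            "restart Splash with a larger --max-timeout"
        )
    if wait >= timeout:
        raise ValueError("wait must be shorter than the overall render timeout")
    return {"wait": wait, "timeout": timeout}

# In the spider (not executed here):
#     yield SplashRequest(url, self.parse, args=render_args(wait=5, timeout=90))
```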

How to use splash:mouse_press in Scrapy-Splash

splash-js-render

I'm trying to click a "show" button on a website without success, and I really don't know how to do it. However, I came across something that might work: splash:mouse_press. Will this work with scrapy-splash? If so, how do I implement it? import scrapy from scrapy.spiders import Spider from scrapy_splash import SplashRequest from ..items import NameItem class LoginSpider(scrapy.Spider): name = "LoginSpider" start_

Views: 30 | Asked: 2019-06-28 | Votes: 0

Accepted answer

1 answer
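Splash's Lua API has `splash:mouse_press(x, y)` as well as element-level helpers. For a button, selecting the element and calling its `mouse_click()` method is usually simpler. Through scrapy-splash, a script only runs when the request uses `endpoint='execute'` and passes the script as `lua_source`. Here is a sketch. The selector `button.show-more` and the helper `click_args` are hypothetical.

```python
# Lua for Splash's /execute endpoint. element:mouse_click() clicks the element;
# splash:mouse_click(x, y) / splash:mouse_press(x, y) work on raw coordinates instead.
CLICK_SCRIPT = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(1))
  local button = splash:select(args.selector)
  if button then
    button:mouse_click()
    assert(splash:wait(args.wait_after))
  end
  return {html = splash:html()}
end
"""


def click_args(selector, wait_after=1.0):
    """Args for SplashRequest(url, cb, endpoint='execute', args=click_args(...))."""
    return {"lua_source": CLICK_SCRIPT, "selector": selector, "wait_after": wait_after}

# Hypothetical selector for the "show" button:
#     yield SplashRequest(url, self.parse, endpoint="execute", args=click_args("button.show-more"))
```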

Python - Scrapy Splash can't render this page

python, web-scraping, scrapy, scrapy-splash

This is the page I want to scrape. When I open it with SplashRequest, I get a different page with the same source code. Here are my Splash settings: ROBOTSTXT_OBEY = False SPLASH_URL = 'http://192.168.99.100:8050' DOWNLOADER_MIDDLEWARES = { 'scrapy_splash.SplashCookiesMiddleware': 723, 'scrapy_splash.SplashMiddleware': 725, 'scrapy.downloadermiddlewa

Views: 2 | Asked: 2018-08-18 | Votes: 0

4 answers

Allow duplicate downloads with the Scrapy images pipeline?

python, scrapy, pipeline

Below is a version of my code sample that uses the Scrapy Images Pipeline to download/scrape images from a site: import scrapy from scrapy_splash import SplashRequest from imageExtract.items import ImageextractItem class ExtractSpider(scrapy.Spider): name = 'extract' start_urls = ['url'] def parse(self, response): image = Imageextr

Views: 0 | Asked: 2017-07-18 | Votes: 2

Accepted answer

1 answer

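Unable to import 'scrapy_splash' pylint(import-error)

python-3.x, scrapy, pylint, scrapy-splash

When I try to import SplashRequest in VS Code, I get the following error message: Unable to import 'scrapy_splash' pylint(import-error) Any idea why this happens? I have Splash up and running, and the package is installed in my environment. I'm using Python 3.7. Here is a screenshot

Views: 28 | Asked: 2019-04-16 | Votes: 2

Accepted answer

1 answer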

Passing the real URL through Scrapy-Splash into a dictionary

python, scrapy, scrapy-splash

When I try to save the URL into a dictionary with ('url': response.request.url), Scrapy saves the same URL for every item () I tried adding extra arguments to pass the real URL, but that had no effect. from scrapy import Spider from scrapy.http import FormRequest from scrapy.utils.response import open_in_browser from scrapy import Request import scrapy from scrapy_splash import SplashReque

Views: 2 | Asked: 2019-02-01 | Votes: 2

Accepted answer

1 answer

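Nothing returned when scraping a website with Splash

python, scrapy

I'm new to Splash, and I have this problem: I'm trying to scrape this website with Splash: https://iboard.ssi.com.vn/bang-gia/vn30. The response is 200, but when I include my XPath it returns nothing. Here is my code (I have already changed the downloader middlewares): import scrapy from scrapy_splash import SplashRequest class VndirectScrapeSpider(scrapy.Spider): name = 'vndirect_scrape' allowed_domains = ['iboard

Views: 60 | Asked: 2021-07-23 | Votes: 1

1 answer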

Lua script in Scrapy

python, lua, scrapy, scrapy-splash

I'm using Scrapy 1.6 and Splash 3.2: import scrapy import random from scrapy_splash import SplashRequest from scrapy.utils.response import open_in_browser from scrapy.linkextractors import LinkExtractor USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:48.0) Gecko/20100101 Firefox/48.0' c

Views: 11 | Asked: 2019-06-25 | Votes: 1

Accepted answer

2 answers

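Splash won't execute the Lua script

scrapy, scrapy-splash, splash-js-render

My Lua script refuses to execute. The response returned from the ScrapyRequest call seems to be HTML, whereas I'm expecting a document title. I assume the Lua script is never called, because it has no visible effect on the response. I've dug through the documentation a lot, and it's not clear what I'm missing. Does anyone have any suggestions? from urlparse import urljoin import scrapy from scrapy_splash import SplashRequest GOOGLE_BASE_URL = 'https://www.google.com/' GOOGLE_QUERY_PARAMETERS =

Views: 4 | Asked: 2016-08-12 | Votes: 4

Accepted answer

2 answers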

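Splash doesn't execute JavaScript when using a proxy

scrapy, web-crawler, scrapy-splash

I'm using scrapy_splash to scrape pages that need JavaScript to load the right content. When I use SplashRequest without proxy settings, everything works fine. But when I add the proxy settings, the JavaScript isn't rendered, and I get the pre-JavaScript HTML content without the data I need. Does anyone know how to fix this? I'm sure the proxy IP isn't blacklisted.

Views: 20 | Asked: 2017-02-04 | Votes: 0

1 answer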

Parsing Scrapy output

python, scrapy, scrapy-splash

I'm testing a Splash instance with Scrapy 1.6. My spider: import scrapy from scrapy_splash import SplashRequest from scrapy.utils.response import open_in_browser class MySpider(scrapy.Spider): start_urls = ["http://yahoo.com"] name = 'mytest' def start_requests(self): for url in

Views: 1 | Asked: 2019-06-20 | Votes: 0

Accepted answer

1 answer

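Scrapy: HTTP status code is not handled or not allowed

python, scrapy, scrapy-splash, scrapyd

I'm using scrapy-splash to scrape a car dealer website that loads its results with JavaScript, but I keep getting a 504 Gateway Time-out error. I run Docker on Windows 10. I don't think the problem is the Docker configuration, because I can scrape another site with the same code. import scrapy from scrapy_splash import SplashRequest from scrapy.loader import ItemLoader from ..items import AutoItem class Main_Spider(scrapy.Spider): name =

Views: 24 | Asked: 2021-02-10 | Votes: 0

1 answer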

Scraping JavaScript-generated results - scrapy-splash

python, web-scraping, scrapy, scrapy-splash

I'm trying to scrape the results that a JavaScript function generates when you enter an abbreviation in a dictionary search box. This is the code I'm using: import scrapy from scrapy_splash import SplashFormRequest class SedomSpider(scrapy.Spider): name = 'sedom-spider' url_s = 'https://www.sedom.es/diccionario/' formdata = {'sigla': 'AA'} def parse(

Views: 1 | Asked: 2021-05-11 | Votes: 0

1 answer

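Scrapinghub/Zyte: Unhandled error in Deferred: No module named 'scrapy_user_agents'

python, scrapy, scrapy-splash, scrapinghub, zyte

I'm deploying my Scrapy spider to Zyte (formerly ScrapingHub) from my local machine, and the deploy succeeds. When I run the spider, I get the output below. I've already checked. The Zyte team doesn't seem very responsive on their own site, but I've found that developers are usually more active here :) My scrapinghub.yml looks like this: projects: default: <myid> requirements: file: requirements.txt I tried adding these lines to requirements.txt, but whichever line I use, the same error is produced. git+git://gi

Views: 28 | Asked: 2021-09-29 | Votes: 0

2 answers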

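Login works in the Splash API but not when using SplashRequest

web-scraping, scrapy, scrapy-splash

I'm relatively new to Splash. I'm trying to scrape a website that requires a login. I started with the Splash API, where I could log in without problems. However, when I put my code into a Scrapy spider script using SplashRequest, it fails to log in. import scrapy from scrapy_splash import SplashRequest class Payer1Spider(scrapy.Spider): name = "payer1" start_url = "https://provider.wellcare.com/provider/claims/search"

Views: 50 | Asked: 2019-07-25 | Votes: 0

1 answer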

Scrapy CrawlerProcess doesn't use the proxy

proxy, scrapy, scrapy-splash

I built a crawler that uses Scrapy, Splash, and a proxy. When I run a single spider, everything works fine. However, when I use CrawlerProcess, my spider doesn't use the proxy, and it quickly gets banned. Spider code: # -*- coding: utf-8 -*- import scrapy from scrapy_splash import SplashRequest from scrapy.crawler import CrawlerProcess from my_fake_useragent import UserAgent ua = UserAgent() class AdsSpiderSpide

Views: 1 | Asked: 2021-09-20 | Votes: 1

3 answers

Splash doesn't wait in Scrapy with endpoint='render.json'

python-3.x, scrapy, splash-screen, scrapy-splash

I'm trying to get content from an iframe, so I changed the SplashRequest endpoint from execute to render.json. However, splash.wait doesn't work at all. Here is the spider code: import scrapy from scrapy_splash import SplashRequest from scrapy.http import HtmlResponse src=""" function main(splash, args) assert(splash:go(args.url)) assert(splash:wait(10)) return { html = spla

Views: 0 | Asked: 2017-09-27 | Votes: 0

1 answer
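A Lua script only runs on the `execute` endpoint. `render.json` ignores `lua_source` and is driven by plain arguments instead, so `splash:wait(10)` in the script never runs there. As I understand the Splash HTTP API, the equivalent render.json arguments are `wait` plus `html=1`. Adding `iframes=1` includes each child frame, with its HTML, under `childFrames` in `response.data`. Here is a sketch. `render_json_args` is a hypothetical helper.

```python
def render_json_args(wait=10, include_iframes=True):
    """Arguments for SplashRequest(url, cb, endpoint='render.json', args=...)."""
    args = {"html": 1, "wait": wait}
    if include_iframes:
        # Adds a "childFrames" list describing each iframe (with its html when html=1).
        args["iframes"] = 1
    return args

# In the spider (not executed here):
#     yield SplashRequest(url, self.parse, endpoint="render.json", args=render_json_args())
```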

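10049: The requested address is not valid in its context. Scrapy Splash isn't reading the URL correctly

python-3.x, scrapy, splash-screen, scrapy-splash

I'm trying to use Splash to read the code of a web page for a more complex site, but I can't even get the code to run for this simple site. I'm running Docker, and in my settings.py file I mapped port 8050 to 0.0.0.0. Any help would be greatly appreciated. Please state the versions of any packages you use, because I suspect versions may be the problem. I've tried many fixes along the way, including changing the versions of Splash, Scrapy and Twisted. On Python 3.x, Scrapy only works with newer versions of Twisted, but Splash says it isn't compatible with Twisted > 16.2. So I tried switching some versions, without success. import scrapy import scra

Views: 1 | Asked: 2019-01-10 | Votes: 1

1 answer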
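Windows error 10049 (WSAEADDRNOTAVAIL) typically appears when a client tries to connect to 0.0.0.0. That address is valid for a server to listen on, as in Docker's port mapping, but not as a destination. If `SPLASH_URL` in settings.py points at 0.0.0.0, using `http://localhost:8050` instead is the likely fix. This is an inference from the description, not a confirmed answer. Here is a stdlib sketch of a hypothetical `check_splash_url` helper.

```python
from urllib.parse import urlsplit

# Addresses that only make sense as *listen* addresses, never as destinations.
UNSPECIFIED_HOSTS = {"0.0.0.0", "::"}


def check_splash_url(url):
    """Return a list of problems with a SPLASH_URL value (an empty list means it looks fine)."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme {parts.scheme!r}")
    if parts.hostname in UNSPECIFIED_HOSTS:
        problems.append("bind address used as destination; try localhost or the Docker host IP")
    if parts.port is None:
        problems.append("no port given (Splash listens on 8050 by default)")
    return problems
```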

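Scrapy Splash not crawling

python, scrapy, scrapy-splash

I'm trying to do a very basic print of the links returned by searching for a company's annual reports, at this link: https://www.mergentarchives.com/searchResults.php?searchType=annualReports&companyName=3Com+Corp.&compNumber=37958&aracompNumber=0 I need Splash to render the page, because the site is built with JavaScript and the search results load dynamically. When I try to print the list of links, the scraper just doesn't crawl. Here is my very simple code: import scrapy from scrapy_splash impor

Views: 17 | Asked: 2021-06-28 | Votes: 0

1 answer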

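scrapy-splash rendering more than the first page

javascript, scrapy, splash-js-render

I'm trying to scrape a website, but I need to use Splash on every page because the content is generated dynamically. Right now it only renders the first page, not the content pages or the pagination pages. Here is the code: import scrapy from scrapy_splash import SplashRequest import scrapy_splash class ShutSpider(scrapy.Spider): name = 'Shut' def start_requests(self): yield SplashRequest(url='ROOTURL',callb

Views: 15 | Asked: 2017-12-15 | Votes: 1

Accepted answer

1 answer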

Splash - Scrapy - HAR data

python, scrapy, scrapy-splash, scrapinghub, splash-js-render

In general, I know how to parse HTML with Scrapy and XPath. However, I don't know how to get HAR data. import scrapy from scrapy_splash import SplashRequest class QuotesSpider(scrapy.Spider): name = 'quotes' allowed_domains = ['quotes.toscrape.com'] start_urls = ['http://quotes.toscrape.com/js'] def start_req

Views: 32 | Asked: 2020-01-17 | Votes: 3

1 answer

How to yield the current response URL in scrapy_splash

web-scraping, scrapy, scrapy-splash

If I try to yield the URL in my parse() method with response.request.url, it returns: http://192.168.99.100:8050/execute Returning the URL from the Lua script works, but I don't know how to yield it in the parse() method. import scrapy from scrapy_splash import SplashRequest class ComputersSpider(scrapy.Spider): name = 'computers' allowed_domains = ['http://daraz.pk'

Views: 7 | Asked: 2020-01-22 | Votes: 0

1 answer
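With the `execute` endpoint, `response.request.url` is the Splash endpoint, as the question shows. According to the scrapy-splash README, a table returned by the Lua script arrives decoded in `response.data`. With the default `magic_response=True`, a `url` key in that table is also used to set `response.url`. Here is a sketch of a script that returns the final URL, plus a hypothetical `page_url` helper that reads it defensively.

```python
# Lua for Splash's /execute endpoint: return the URL the browser ended up on.
URL_SCRIPT = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(1))
  return {url = splash:url(), html = splash:html()}
end
"""


def page_url(data, fallback):
    """Pick the rendered page URL out of response.data, else use fallback."""
    url = data.get("url") if isinstance(data, dict) else None
    return url or fallback

# In parse() (not executed here):
#     yield {"url": page_url(response.data, response.url)}
```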

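How to scrape HTML code from the received response?

python, html, scrapy, web-crawler, scrape

I'm trying to crawl and scrape a website with Scrapy and Splash. I want to scrape specific HTML from the response shown in the image. Here is the response and its headers: [image] And here is the response (the HTML I want to scrape): [image] I can find that HTML with the Inspect tool. My code returns the HTML I see with the "View page source" tool. That means JavaScript modifies the code before it is embedded. But isn't Splash's job to run the JavaScript and return the HTML? response.body returns the page source, not the HTML I need from the response mentioned above. import scrapy from scrapy_splash import

Views: 18 | Asked: 2019-05-30 | Votes: 1

1 answer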

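Scraping multiple different URLs at the same time with Python's Scrapy Splash

python, web-scraping, scrapy, scrapy-splash

I need to scrape multiple URLs at the same time using Scrapy and Splash. I wrote the code below, but I still get no results. I've attached the URLs here: '', '', '' So I need to iterate over these URLs and scrape them with Scrapy. I can't get data from multiple URLs, and it shows an error. Please help. My question is: how do I go on to scrape this list of URLs? import scrapy from scrapy_splash import SplashRequest import scrapy_proxies class WundergroundSpider(scrapy.Spider): name = 'wu

Views: 1 | Asked: 2020-05-05 | Votes: 0

1 answer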

Scrapy - wait for Splash to finish?

python, scrapy, splash-screen

Below is a simplified version of my code. When it runs, the text "FINISHED" is printed long before "run": import scrapy from scrapy_splash import SplashRequest class ExtractSpider(scrapy.Spider): name = 'extract' start_urls = ['SomeURL'] def parse(self, response): url_list = response.css('a.title::attr(href)').e

Views: 0 | Asked: 2017-07-19 | Votes: 0

Accepted answer

1 answer

Scrapy: AttributeError: 'str' object has no attribute 'setdefault'

python, scrapy, scrapy-splash

I'm testing Scrapy and Splash. I have a spider: class MySpider(scrapy.Spider): # start_urls = ["http://yahoo.com"] name = 'mytest' def __init__(self, state='CA', city='San_Francisco', *args, **kwargs): super().__init__(*args, **kwargs) self.state = state

Views: 60 | Asked: 2019-06-23 | Votes: 0

Accepted answer

1 answer

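Missing items when scraping a JavaScript-rendered page with Scrapy and Splash

python, web-scraping, scrapy, scrapy-splash, splash-js-render

I'm trying to scrape basic real-estate listing information from the following site. Parts of the site's content are loaded dynamically from a backend API as JavaScript scrolls the page down. To handle this, I tried using Scrapy and Splash to render the JavaScript. The problem is that it doesn't return all the listings, only the first 8. I think the page isn't scrolling down, so it never gets populated and the divs I need aren't rendered. I then tried adding some Lua code (which I have no experience with) to scroll the page down, hoping it would populate, but it didn't work. Here is my spider: import scrapy from scrapy.shell import inspect_response import panda

Views: 9 | Asked: 2021-05-28 | Votes: 1

Accepted answer

1 answer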
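A common approach is to scroll from Lua in a loop, waiting after each scroll so the page's API calls can complete. Whether this populates every listing depends on the site. Some pages listen for scroll events on an inner container rather than on the window, and the number of scrolls needed is a guess. Here is a sketch that builds such a script for the `execute` endpoint. `scroll_script` is a hypothetical helper.

```python
def scroll_script(scrolls=5, pause=1.0):
    """Build a Splash Lua script that scrolls to the bottom `scrolls` times."""
    # Doubled braces produce literal Lua table braces inside the f-string.
    return f"""
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait({pause}))
  for _ = 1, {scrolls} do
    splash:runjs("window.scrollTo(0, document.body.scrollHeight);")
    assert(splash:wait({pause}))
  end
  return {{html = splash:html()}}
end
"""
```

Pass the script as `args={"lua_source": scroll_script(8, 2.0)}` together with `endpoint="execute"` on the SplashRequest, then raise the scroll count until the item count stops growing.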

Scrapy screenshots?

python, lua, scrapy, splash-screen

I'm trying to scrape a website while taking a screenshot of every page. So far I've managed to piece together the following code: import json import base64 import scrapy from scrapy_splash import SplashRequest class ExtractSpider(scrapy.Spider): name = 'extract' def start_requests(self): url = 'https://stackoverflow.com/' splash_args = {

Views: 0 | Asked: 2017-07-19 | Votes: 9

Accepted answer

1 answer
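With `endpoint='render.json'` and args such as `{'html': 1, 'png': 1}`, Splash returns the screenshot base64-encoded under the `png` key. scrapy-splash exposes it as `response.data['png']`. Here is a sketch of the decoding step. The helper name and the output path are hypothetical.

```python
import base64
from pathlib import Path


def save_screenshot(data, path):
    """Decode the base64 'png' field of a render.json result and write it to path."""
    png_bytes = base64.b64decode(data["png"])
    Path(path).write_bytes(png_bytes)
    return len(png_bytes)

# In parse() (not executed here):
#     save_screenshot(response.data, "page.png")
```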

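Scrapy - Splash: getting dynamic data

python, web-scraping, scrapy, scrapy-splash

I'm trying to get the dynamically loaded phone number from this page (and others): https://www.europages.fr/LEMMERFULLWOOD-GMBH/DEU241700-00101.html The phone number appears after clicking the div element with the class page-action click-tel. I'm trying to get this data by performing the click with a Lua script through scrapy_splash. After starting Splash on my Ubuntu machine with sudo docker run -d -p 8050:8050 scrapinghub/splash, my code so far is as follows (I'm using a proxy service): class company(scrapy.Sp

Views: 21 | Asked: 2021-10-13 | Votes: 0

1 answer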

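scrapy-splash: SplashRequest response object differs between a scrapy crawl call and a CrawlerProcess call

python, web-scraping, scrapy, web-crawler, scrapy-splash

I want to use Scrapy to get the HTML and a PNG screenshot of a target page, and I need to be able to call it programmatically. According to [link], specifying endpoint='render.json' and passing the argument 'png': 1 should produce a response object ('scrapy_splash.response.SplashJsonResponse') whose .data attribute holds the decoded JSON data, including a PNG screenshot of the target page. When the spider (named 'search' here) is invoked with scrapy crawl search the result is as expected: response.data['png'] contains

Views: 2 | Asked: 2019-03-10 | Votes: 1

1 answer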

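TypeError: close_spider() missing 1 required positional argument: 'reason'

python-3.x, scrapy, scrapy-splash, scrapy-pipeline

When it runs, the spider extracts data from the page, but a problem occurs when the pipeline starts. I get the following error: Traceback (most recent call last): File "C:\Users\EAgnelli\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks current.result = callback(current.result, *args, **kw) TypeError: close_spider() missing 1 required positional argument: 'reason' I send the requests through Scrapy,

Views: 1 | Asked: 2019-03-05 | Votes: 1

1 answer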
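Scrapy calls an item pipeline's `close_spider` with the spider only. A `reason` argument belongs to the spider's own `closed(self, reason)` method. A pipeline whose `close_spider` declares `reason` fails with exactly this TypeError. Here is a plain-Python sketch of the hook signatures a pipeline is expected to have. The class name and the JSON-lines output are hypothetical.

```python
import json


class JsonLinesPipeline:
    """Writes each item as one JSON line; shows the hook signatures Scrapy calls."""

    def __init__(self, path="items.jl"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        self.file.write(json.dumps(dict(item)) + "\n")
        return item  # pass the item on to later pipelines

    def close_spider(self, spider):  # no `reason` parameter here
        self.file.close()
```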

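Access Denied: You don't have permission to access "http://www.airbnb.ca/rooms/48058366/" on this server

python, web-scraping, scrapy, scrapy-splash

Is there any way to avoid this error? I'm using Splash to get the HTML, but the returned response.body gives me an Access Denied page. I can see the data in the tool, but because of this error the HTML can't be extracted. Also, when I use Splash on its own, I see the full HTML! I've posted my GitHub link for anyone interested: Access Denied\n Access Denied \n You don't have permission to access "" on this server. Reference #18.66cc94d1.1643648347.66b47664\n ' import scrapy from scrapy_splash import SplashRequest class SimpleSpider(scrapy

Views: 7 | Asked: 2022-01-31 | Votes: 0

Accepted answer

1 answer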

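How to scrape dynamically generated data from Google Web Store search results

python-3.x, dynamic, scrapy, web-crawler, scrapy-splash

I want to scrape a page that shows Google Web Store search results. The link is static for a given keyword, and I want to check an extension's ranking periodically. The problem is that I can't render the dynamic data that the JavaScript code generates from the server's response. I tried rendering the page with Scrapy and Scrapy-Splash, but I still get the same response. I use Docker to run an instance of the scrapinghub/splash container on port 8050. I even opened http://localhost:8050 and entered my URL manually, but it couldn't render the data, even though the message said it succeeded. Here is the code I wrote for the crawler. It doesn't really do anything; its only job is to get the HTML content of the page. import scra

Views: 12 | Asked: 2019-07-07 | Votes: 0

1 answer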

Why does a scrapy_splash CrawlSpider take the same time as scraping with Selenium?

python, scrapy, scrapy-splash

I have the following Scrapy CrawlSpider: import logger as lg from scrapy.crawler import CrawlerProcess from scrapy.http import Response from scrapy.spiders import CrawlSpider, Rule from scrapy_splash import SplashTextResponse from urllib.parse import urlencode from scrapy.linkextractors import LinkExtractor from scrapy.h

Views: 8 | Asked: 2022-01-10 | Votes: 2

Accepted answer

3 answers

Changing the Scrapy/Splash user agent

python-3.x, web-scraping, scrapy, splash-screen

How can I set Scrapy's user agent in a similar way to this: import requests from bs4 import BeautifulSoup ua = {"User-Agent":"Mozilla/5.0"} url = "http://www.example.com" page = requests.get(url, headers=ua) soup = BeautifulSoup(page.text, "lxml") My spider looks like this: import scrapy from scrapy_splash impor

Views: 0 | Asked: 2017-09-04 | Votes: 5

Accepted answer

1 answer
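Scrapy's `USER_AGENT` setting covers the requests Scrapy sends itself. The browser inside Splash can be given a user agent from Lua with `splash:set_user_agent()`. Here is a sketch that combines both. The UA string comes from the question, and `ua_args` is a hypothetical helper for a SplashRequest with `endpoint='execute'`.

```python
# settings.py: project-wide default for requests Scrapy sends itself.
USER_AGENT = "Mozilla/5.0"

# Lua for Splash's /execute endpoint: the Splash browser uses this UA for the page
# and for the resources it loads.
UA_SCRIPT = """
function main(splash, args)
  splash:set_user_agent(args.user_agent)
  assert(splash:go(args.url))
  assert(splash:wait(1))
  return {html = splash:html()}
end
"""


def ua_args(user_agent=USER_AGENT):
    """Args for SplashRequest(url, cb, endpoint='execute', args=ua_args())."""
    return {"lua_source": UA_SCRIPT, "user_agent": user_agent}
```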

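Problematic Scrapy-Splash script: I only get one result, and my scraper doesn't parse the other pages.

python, lua, scrapy, scrapy-splash

I'm trying to parse a list from a JavaScript website. When I run the spider, it returns only one entry per column and then closes. I've already set up my middleware settings, and I don't know exactly what is wrong. Thanks in advance! import scrapy from scrapy_splash import SplashRequest class MalrusSpider(scrapy.Spider): name = 'malrus' allowed_domains = ['backgroundscreeninginrussia.com'] start_urls = ['http:

Views: 4 | Asked: 2020-01-31 | Votes: 0

Accepted answer

1 answer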

How can I fix this? It doesn't return any text

web-scraping, scrapy, scrapy-splash

I'm scraping this website and having trouble extracting the text. I've tried various methods, but none of them work. import scrapy from scrapy_splash import SplashRequest class QuotesSpider(scrapy.Spider): name = "hi" start_urls = [ 'https://cadres.apec.fr/home/mes-offres/recherche-des-offres-demploi/liste-des-offres-demploi.html?motsCles=commercial&

Views: 1 | Asked: 2019-08-07 | Votes: 1