Q: Scrapy: how do I save response.body as an HTML file?

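Stack Overflow user

Asked 2017-09-06 13:15:11

Answers: 2 | Views: 17.4K | Followers: 0 | Votes: 8

My spider works, but I can't save the body of the website I crawl to an .html file. If I write self.html_file.write('test'), it works fine. I don't know how to convert the tuple to a string.

I am using Python 3.6.

Spider: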

class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ['google.com']
    start_urls = ['http://google.com/']

    def __init__(self):
        self.path_to_html = html_path + 'index.html'
        self.path_to_header = header_path + 'index.html'
        self.html_file = open(self.path_to_html, 'w')

    def parse(self, response):
        url = response.url
        self.html_file.write(response.body)
        self.html_file.close()
        yield {
            'url': url
        }

Traceback:

Traceback (most recent call last):
  File "c:\python\python36-32\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "c:\Users\kv\AtomProjects\example_project\example_bot\example_bot\spiders\example.py", line 35, in parse
    self.html_file.write(response.body)
TypeError: write() argument must be str, not bytes

python

django

scrapy

web-crawler

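2 Answers

Stack Overflow user

Accepted answer

Posted 2017-09-06 13:28:31

The actual problem is that response.body is bytes, not str, so you need to convert it to a string before writing it to a file opened in text mode. There are several ways to convert bytes to a string. You can use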

 self.html_file.write(response.body.decode("utf-8"))

instead of

  self.html_file.write(response.body)

You can also use

  self.html_file.write(response.text)
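
As a minimal sketch of why this works (not part of the original answer): a file opened in text mode ('w') accepts only str, so the bytes must either be decoded first or written to a file opened in binary mode ('wb'). The sample `body` value and the temporary file names below are made up for illustration; in a spider, `response.body` would take the place of `body`.

```python
import os
import tempfile

# Stand-in for response.body: raw bytes as received from the server (hypothetical sample).
body = "<html><body>héllo</body></html>".encode("utf-8")

tmp_dir = tempfile.mkdtemp()
text_path = os.path.join(tmp_dir, "text_mode.html")
binary_path = os.path.join(tmp_dir, "binary_mode.html")

# Text mode rejects bytes, which reproduces the error from the question.
try:
    with open(text_path, "w", encoding="utf-8") as f:
        f.write(body)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = str(exc)

# Fix 1: decode the bytes to str and write in text mode.
with open(text_path, "w", encoding="utf-8") as f:
    f.write(body.decode("utf-8"))

# Fix 2: keep the bytes untouched and open the file in binary mode.
with open(binary_path, "wb") as f:
    f.write(body)
```

Note that decode("utf-8") assumes the page really is UTF-8. response.text leaves the choice of encoding to Scrapy, and binary mode avoids decoding altogether, so either may be the safer option when the encoding is unknown.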

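Votes: 12

Stack Overflow user

Posted 2018-07-05 06:05:28

Taking the answer above into account, and using the with statement wherever possible, the example should be rewritten as follows: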

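import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ['google.com']
    start_urls = ['http://google.com/']

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider finish its own setup before adding attributes.
        super().__init__(*args, **kwargs)
        # html_path and header_path are assumed to be defined elsewhere, as in the question.
        self.path_to_html = html_path + 'index.html'
        self.path_to_header = header_path + 'index.html'

    def parse(self, response):
        # response.text is already a str, so a file opened in text mode ('w') accepts it.
        with open(self.path_to_html, 'w') as html_file:
            html_file.write(response.text)
        yield {
            'url': response.url
        }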

Note, however, that html_file is then only accessible from within the parse method.
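
One possible way around that limitation, sketched here under assumptions: move the write into a small helper that any callback can call. `save_html` and the `SimpleNamespace` stand-in for a Scrapy response are hypothetical names introduced for illustration; they are not part of the original answer or of Scrapy.

```python
import os
import tempfile
from types import SimpleNamespace


def save_html(path, response):
    """Write response.text to path; the with block closes the file even on error."""
    with open(path, "w", encoding="utf-8") as html_file:
        html_file.write(response.text)
    return path


# Stand-in for a Scrapy response, so the sketch runs without starting a crawl.
fake_response = SimpleNamespace(
    url="http://google.com/",
    text="<html><body>ok</body></html>",
)

out_path = os.path.join(tempfile.mkdtemp(), "index.html")
save_html(out_path, fake_response)

# Read the file back to confirm what was written.
with open(out_path, encoding="utf-8") as f:
    saved = f.read()
```

Inside the spider, parse could then call save_html(self.path_to_html, response) before yielding its item, and any other callback could reuse the same helper.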

Votes: 4

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/46067258
