前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >使用scrapy发送post请求的坑

使用scrapy发送post请求的坑

作者头像
小歪
发布2018-08-31 17:58:22
5.6K0
发布2018-08-31 17:58:22
举报

使用requests发送post请求

先来看看使用requests来发送post请求是多少好用,发送请求

Requests 简便的 API 意味着所有 HTTP 请求类型都是显而易见的。例如,你可以这样发送一个 HTTP POST 请求:

代码语言:javascript
复制
>>> r = requests.post('http://httpbin.org/post', data = {'key':'value'})

使用data可以传递字典作为参数,同时也可以传递元祖

代码语言:javascript
复制
>>> payload = (('key1', 'value1'), ('key1', 'value2'))
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key1": [
      "value1",
      "value2"
    ]
  },
  ...
}

传递json是这样

代码语言:javascript
复制
>>> import json

>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

>>> r = requests.post(url, data=json.dumps(payload))

2.4.2 版的新加功能:

代码语言:javascript
复制
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

>>> r = requests.post(url, json=payload)

也就是说,你不需要对参数做什么变化,只需要关注使用data=还是json=,其余的requests都已经帮你做好了。

使用scrapy发送post请求

官方推荐的 Using FormRequest to send data via HTTP POST

代码语言:javascript
复制
return [FormRequest(url="http://www.example.com/post/action",
                    formdata={'name': 'John Doe', 'age': '27'},
                    callback=self.after_post)]

这里使用的是FormRequest,并使用formdata传递参数,看到这里也是一个字典。

但是,超级坑的一点来了,今天折腾了一下午,使用这种方法发送请求,怎么发都会出问题,返回的数据一直都不是我想要的

代码语言:javascript
复制
return scrapy.FormRequest(url, formdata=(payload))

在网上找了很久,最终找到一种方法,使用scrapy.Request发送请求,就可以正常的获取数据。

代码语言:javascript
复制
return scrapy.Request(url, body=json.dumps(payload), method='POST', headers={'Content-Type': 'application/json'},)

参考:Send Post Request in Scrapy

代码语言:javascript
复制
my_data = {'field1': 'value1', 'field2': 'value2'}
request = scrapy.Request( url, method='POST', 
                          body=json.dumps(my_data), 
                          headers={'Content-Type':'application/json'} )

FormRequest 与 Request 区别

在文档中,几乎看不到差别,

The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here. Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.

说FormRequest新增加了一个参数formdata,接受包含表单数据的字典或者可迭代的元组,并将其转化为请求的body。并且FormRequest是继承Request的

代码语言:javascript
复制
class FormRequest(Request):

    def __init__(self, *args, **kwargs):
        formdata = kwargs.pop('formdata', None)
        if formdata and kwargs.get('method') is None:
            kwargs['method'] = 'POST'

        super(FormRequest, self).__init__(*args, **kwargs)

        if formdata:
            items = formdata.items() if isinstance(formdata, dict) else formdata
            querystr = _urlencode(items, self.encoding)
            if self.method == 'POST':
                self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
                self._set_body(querystr)
            else:
                self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)
            ###


def _urlencode(seq, enc):
    values = [(to_bytes(k, enc), to_bytes(v, enc))
              for k, vs in seq
              for v in (vs if is_listlike(vs) else [vs])]
    return urlencode(values, doseq=1)

最终我们传递的{'key': 'value', 'k': 'v'}会被转化为'key=value&k=v' 并且默认的method是POST,再来看看Request

代码语言:javascript
复制
class Request(object_ref):

    def __init__(self, url, callback=None, method='GET', headers=None, body=None,
                 cookies=None, meta=None, encoding='utf-8', priority=0,
                 dont_filter=False, errback=None, flags=None):

        self._encoding = encoding  # this one has to be set first
        self.method = str(method).upper()

默认的方法是GET,其实并不影响。仍然可以发送post请求。这让我想起来requests中的request用法,这是定义请求的基础方法。

代码语言:javascript
复制
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2018-08-23,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 Python爬虫与算法进阶 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 使用requests发送post请求
  • 使用scrapy发送post请求
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档