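The requests library is a widely used HTTP library for Python. It offers richer features and a friendlier interface than the built-in urllib, but it is not always as flexible. Here is a case in point.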
Take an image-processing request to Tencent Cloud's Cloud Infinite (数据万象, CI) service as an example:
http://examples-1251000004.picsh.myqcloud.com/sample.jpeg?imageMogr2/sharpen/55|imageView2/1/w/200/h/300/q/85
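This request sharpens the specified image and then compresses it. Requested directly, it works fine, but when the image is downloaded with requests, the processing does not take effect. Comparing packet captures showed that requests had URL-encoded the request URL, turning it into: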
http://examples-1251000004.picsh.myqcloud.com/sample.jpeg?imageMogr2/sharpen/55%7CimageView2/1/w/200/h/300/q/85
As you can see, the pipe operator | has become %7C, so the chained (pipeline) operations no longer work.
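What?! Is there a way to turn this off? A library as polished as requests surely leaves a switch for it, I thought, so I opened the requests source to check: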
def request(self, method, url,
        params=None, data=None, headers=None, cookies=None, files=None,
        auth=None, timeout=None, allow_redirects=True, proxies=None,
        hooks=None, stream=None, verify=None, cert=None, json=None):
    """Constructs a :class:`Request <Request>`, prepares it and sends it.
    Returns :class:`Response <Response>` object.
    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query
        string for the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) json to send in the body of the
        :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the
        :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the
        :class:`Request`.
    :param files: (optional) Dictionary of ``'filename': file-like-objects``
        for multipart encoding upload.
    :param auth: (optional) Auth tuple or callable to enable
        Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send
        data before giving up, as a float, or a :ref:`(connect timeout,
        read timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Set to True by default.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol or protocol and
        hostname to the URL of the proxy.
    :param stream: (optional) whether to immediately download the response
        content. Defaults to ``False``.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
        the server's TLS certificate, or a string, in which case it must be a path
        to a CA bundle to use. Defaults to ``True``.
    :param cert: (optional) if String, path to ssl client cert file (.pem).
        If Tuple, ('cert', 'key') pair.
    :rtype: requests.Response
    """
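No such luck: there is no such switch. Worse, requests re-quotes the entire URI, so it interferes whether you pass the query parameters through params or build the query string into the URL yourself: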
def prepare_url(self, url, params):
    """Prepares the given HTTP URL."""
    #: Accept objects that have string representations.
    #: We're unable to blindly call unicode/str functions
    #: as this will include the bytestring indicator (b'')
    #: on python 3.x.
    #: https://github.com/requests/requests/pull/2238
    # ... (a large part omitted) ...
    enc_params = self._encode_params(params)
    if enc_params:
        if query:
            query = '%s&%s' % (query, enc_params)
        else:
            query = enc_params
    url = requote_uri(urlunparse([scheme, netloc, path, None, query, fragment]))
    self.url = url
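The query-merging step of prepare_url can be approximated with the standard library to show that parameters passed via params and a query written into the URL end up in one string before requote_uri sees it. This is a simplified sketch, not requests' code: it assumes params is a plain dict, uses urllib.parse.urlencode in place of requests' private _encode_params, and merge_query is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlparse, urlunparse


def merge_query(url, params):
    """Hypothetical helper mimicking how prepare_url appends encoded params
    to any query string already present in the URL (dict params only)."""
    scheme, netloc, path, _params, query, fragment = urlparse(url)
    enc_params = urlencode(params)  # stand-in for requests' _encode_params
    if enc_params:
        if query:
            query = '%s&%s' % (query, enc_params)
        else:
            query = enc_params
    # requests would now pass this whole URL through requote_uri
    return urlunparse([scheme, netloc, path, None, query, fragment])


print(merge_query('http://example.com/a.jpeg?imageMogr2/sharpen/55', {'w': 200}))
# http://example.com/a.jpeg?imageMogr2/sharpen/55&w=200
```

Because both sources of query text are merged first, the subsequent re-quoting applies to all of it, which is why building the URL by hand does not avoid the encoding.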
def requote_uri(uri):
    """Re-quote the given URI.
    This function passes the given URI through an unquote/quote cycle to
    ensure that it is fully and consistently quoted.
    :rtype: str
    """
    safe_with_percent = "!#$%&'()*+,/:;=?@[]~"
    safe_without_percent = "!#$&'()*+,/:;=?@[]~"
    try:
        # Unquote only the unreserved characters
        # Then quote only illegal characters (do not quote reserved,
        # unreserved, or '%')
        return quote(unquote_unreserved(uri), safe=safe_with_percent)
    except InvalidURL:
        # We couldn't unquote the given URI, so let's try quoting it, but
        # there may be unquoted '%'s in the URI. We need to make sure they're
        # properly quoted so they do not cause issues elsewhere.
        return quote(uri, safe=safe_without_percent)
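To see why the pipe does not survive, the final quote call of requote_uri can be reproduced with the standard library's urllib.parse.quote and the same safe set copied from the excerpt. This sketch skips unquote_unreserved, assuming it is a no-op here because the example URL contains no percent escapes.

```python
from urllib.parse import quote

# Same "safe" set as requote_uri uses in the excerpt above
SAFE_WITH_PERCENT = "!#$%&'()*+,/:;=?@[]~"

url = ("http://examples-1251000004.picsh.myqcloud.com/sample.jpeg"
       "?imageMogr2/sharpen/55|imageView2/1/w/200/h/300/q/85")

# ':', '/' and '?' are in the safe set and are left alone;
# '|' is not, so quote() percent-encodes it
requoted = quote(url, safe=SAFE_WITH_PERCENT)
print(requoted)
# http://examples-1251000004.picsh.myqcloud.com/sample.jpeg?imageMogr2/sharpen/55%7CimageView2/1/w/200/h/300/q/85
```

Since '|' is absent from both safe sets, neither branch of requote_uri can leave it untouched.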
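With requests backing me into a corner, I had to find another way. How could the problem be solved generally without modifying the requests source? This is probably a niche problem: after Google and Baidu came up empty, I started reading the source. Since no parameter controls the behavior, could the URL be changed after the fact? As shown below, the url you pass in is stored on the Request object (req.url), and it is URL-encoded when prepare_request prepares that request (prepare_request calls PreparedRequest.prepare, which runs prepare_url):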
def request(self, method, url,
        params=None, data=None, headers=None, cookies=None, files=None,
        auth=None, timeout=None, allow_redirects=True, proxies=None,
        hooks=None, stream=None, verify=None, cert=None, json=None):
    """Constructs a :class:`Request <Request>`, prepares it and sends it.
    Returns :class:`Response <Response>` object.
    :rtype: requests.Response
    """
    # Create the Request.
    req = Request(
        method=method.upper(),
        url=url,
        headers=headers,
        files=files,
        data=data or {},
        json=json,
        params=params or {},
        auth=auth,
        cookies=cookies,
        hooks=hooks,
    )
    prep = self.prepare_request(req)

    proxies = proxies or {}

    settings = self.merge_environment_settings(
        prep.url, proxies, stream, verify, cert
    )

    # Send the request.
    send_kwargs = {
        'timeout': timeout,
        'allow_redirects': allow_redirects,
    }
    send_kwargs.update(settings)
    resp = self.send(prep, **send_kwargs)

    return resp
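A quick offline check of this, assuming requests is installed: preparing a request with Session.prepare_request sends nothing over the network but already yields the %7C form, and because PreparedRequest.url is an ordinary attribute it can be replaced afterwards.

```python
import requests

url = ("http://examples-1251000004.picsh.myqcloud.com/sample.jpeg"
       "?imageMogr2/sharpen/55|imageView2/1/w/200/h/300/q/85")

# prepare_request runs prepare_url -> requote_uri; nothing is sent
prep = requests.Session().prepare_request(requests.Request('GET', url))
encoded_url = prep.url
print(encoded_url)  # the '|' now appears as '%7C'

# PreparedRequest.url is a plain attribute, so it can be overwritten
# after preparation and before the request is handed to send()
prep.url = url
print(prep.url)  # the raw '|' form again
```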
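Any change to the URL therefore has to happen after this point, and the only functions that get hold of the prepared request (prep) are request itself and send. request carries too much logic to override comfortably, so why not take over send instead? Let's do it.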
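import requests

class TrickUrlSession(requests.Session):
    def __init__(self):
        super().__init__()
        self._trickUrl = None  # no override until setUrl() is called

    def setUrl(self, url):
        self._trickUrl = url

    def send(self, request, **kwargs):
        # request.url has already been re-quoted; swap in the raw URL
        if self._trickUrl:
            request.url = self._trickUrl
        return super().send(request, **kwargs)

# Usage
url = 'http://examples-1251000004.picsh.myqcloud.com/sample.jpeg?imageMogr2/sharpen/55|imageView2/1/w/200/h/300/q/85'
session = TrickUrlSession()
session.setUrl(url)
session.get(url)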
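One way to verify the override without any network traffic, assuming requests is installed, is to mount a transport adapter that only records the URL it receives. RecordingAdapter is a hypothetical test helper built on requests.adapters.BaseAdapter, and the session class is repeated so this block runs on its own. Note that it checks what reaches the adapter layer; how the real HTTPAdapter and urllib3 then serialize that URL is outside what this sketch shows.

```python
import requests
from requests.adapters import BaseAdapter


class TrickUrlSession(requests.Session):
    # Same class as above, repeated so this block is self-contained
    def __init__(self):
        super().__init__()
        self._trickUrl = None

    def setUrl(self, url):
        self._trickUrl = url

    def send(self, request, **kwargs):
        if self._trickUrl:
            request.url = self._trickUrl
        return super().send(request, **kwargs)


class RecordingAdapter(BaseAdapter):
    """Hypothetical test adapter: records each URL it is asked to send and
    returns an empty 200 response without touching the network."""

    def __init__(self):
        super().__init__()
        self.seen_urls = []

    def send(self, request, **kwargs):
        self.seen_urls.append(request.url)
        resp = requests.Response()
        resp.status_code = 200
        resp.url = request.url
        resp.request = request
        resp._content = b''  # private attribute, set only to fake an empty body
        return resp

    def close(self):
        pass


url = ("http://examples-1251000004.picsh.myqcloud.com/sample.jpeg"
       "?imageMogr2/sharpen/55|imageView2/1/w/200/h/300/q/85")

adapter = RecordingAdapter()
session = TrickUrlSession()
session.mount('http://', adapter)  # route every http:// request to the recorder
session.setUrl(url)
session.get(url)
print(adapter.seen_urls[0] == url)  # True: the raw '|' reached the adapter
```

A caveat on the design: Session.resolve_redirects also calls self.send for each redirect hop, so while an override URL is set, a redirected request would be sent to the overridden URL instead of the Location target.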
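This achieves the goal at minimal cost and is easy to use. In a multi-threaded program, however, each thread would need its own session, which defeats the purpose of sharing a connection pool. With a small change, all threads can share one session while each thread keeps its own trickUrl: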
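import threading

import requests

localData = threading.local()  # per-thread storage for the override URL

class TrickUrlSession(requests.Session):
    def send(self, request, **kwargs):
        # Each thread only sees the trickUrl it set itself
        trickUrl = getattr(localData, 'trickUrl', None)
        if trickUrl:
            request.url = trickUrl
        return super().send(request, **kwargs)

# Usage: create one session and share it across threads
url = 'http://examples-1251000004.picsh.myqcloud.com/sample.jpeg?imageMogr2/sharpen/55|imageView2/1/w/200/h/300/q/85'
session = TrickUrlSession()
# Inside each worker thread:
localData.trickUrl = url
session.get(url)
localData.trickUrl = None  # clear it so later requests in this thread are unaffected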
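The isolation that threading.local provides can be checked with the standard library alone. In this sketch SharedSender is a hypothetical stand-in for the shared TrickUrlSession: one instance is used by every thread, but the override is looked up in thread-local storage the same way send does above, so a thread never picks up another thread's URL.

```python
import threading

localData = threading.local()


class SharedSender:
    """Hypothetical stand-in for the shared session object."""

    def effective_url(self, url):
        # Same lookup as TrickUrlSession.send: use this thread's override if set
        trick = getattr(localData, 'trickUrl', None)
        return trick or url


shared = SharedSender()
results = {}  # each thread writes only its own key


def worker(name, trick):
    if trick is not None:
        localData.trickUrl = trick  # visible only inside this thread
    results[name] = shared.effective_url('http://example.com/encoded%7Cquery')


threads = [
    threading.Thread(target=worker, args=('a', 'http://example.com/raw|a')),
    threading.Thread(target=worker, args=('b', 'http://example.com/raw|b')),
    threading.Thread(target=worker, args=('c', None)),  # never sets trickUrl
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))
# [('a', 'http://example.com/raw|a'), ('b', 'http://example.com/raw|b'), ('c', 'http://example.com/encoded%7Cquery')]
```

Thread c falls back to the URL it was given because a fresh thread starts with an empty threading.local namespace.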
Problem solved: threads can now share one connection pool while downloading images from Cloud Infinite.
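Original content notice: this article is published on the Tencent Cloud Developer Community with the author's authorization. Reproduction without permission is prohibited.
In case of infringement, please contact cloudcommunity@tencent.com for removal.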