Background: web crawling has become the primary way to collect data from the internet automatically. Requests is a third-party Python library that covers everyday HTTP needs and is simple to use, so this article introduces the basics of working with it.
For web crawling, the two most commonly used methods are get() and head().
requests.request(method, url, **kwargs)
requests.get(url, params=None, **kwargs)
requests.head(url, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.delete(url, **kwargs)
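All of these convenience methods are thin wrappers around requests.request(). As a minimal sketch that runs without touching the network, a Request object can be built and prepared by hand to see exactly what get() would send; the URL and query parameter below are made-up values for illustration:

```python
import requests

# Build a Request by hand and prepare it (no network traffic happens here).
# This mirrors what requests.get("https://example.com/search", params={"q": "python"})
# constructs internally before sending.
req = requests.Request("GET", "https://example.com/search", params={"q": "python"})
prepared = req.prepare()

print(prepared.method)  # GET
print(prepared.url)     # https://example.com/search?q=python
```

Note how the params dict is URL-encoded into the query string during preparation.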
Whenever a call is made to requests.get() and friends, you are doing two major things. First, you are constructing a Request object which will be sent off to a server to request or query some resource. Second, a Response object is generated once Requests gets a response back from the server. The Response object contains all of the information returned by the server and also contains the Request object you created originally.
r = requests.get(url)
This returns a Response object containing the resource the server sent back.
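The most useful attributes of a Response are its status code, headers, and body. The sketch below constructs a Response by hand so it runs without a network connection; the status code, header, and body are made-up values standing in for what a real requests.get(url) call would fill in:

```python
import requests

# Offline sketch: populate a Response manually to show its key attributes.
# A real call like r = requests.get(url) returns one of these fully filled in.
r = requests.models.Response()
r.status_code = 200                                     # hypothetical status
r.headers["Content-Type"] = "text/html; charset=utf-8"  # hypothetical header
r._content = b"<html>hello</html>"                      # hypothetical body bytes
r.encoding = "utf-8"

print(r.status_code)  # 200
print(r.ok)           # True (status code below 400)
print(r.text)         # body decoded using r.encoding
```

In real crawling code, checking r.status_code (or r.ok) before using r.text is the usual first step.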
If we want to access the headers the server sent back to us, we do this:
r.headers
However, if we want to get the headers we sent the server, we simply access the request, and then the request’s headers:
r.request.headers
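The distinction between the two header sets can be sketched offline by attaching a prepared Request to a Response by hand; every header value below is a hypothetical stand-in for what a real exchange would contain:

```python
import requests

# The request WE would send, with a hypothetical User-Agent header.
req = requests.Request(
    "GET", "https://example.com/",
    headers={"User-Agent": "my-crawler/1.0"},
).prepare()

# A hand-built Response standing in for what the server would return.
r = requests.models.Response()
r.request = req                  # requests attaches the sent request here
r.headers["Server"] = "nginx"    # hypothetical header the server sent back

print(r.headers["Server"])              # headers the server sent to us
print(r.request.headers["User-Agent"])  # headers we sent to the server
```

Both header collections are case-insensitive dicts, so r.headers["server"] and r.headers["Server"] are equivalent.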