cURL: A Handy Tool for Crawler Development

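cURL is a command-line file transfer tool that works with URL syntax, first released in 1997. It supports both uploading and downloading, so it is really a general-purpose transfer tool, although by convention it is usually called a download tool. cURL also includes libcurl, a library for use in program development.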

Some commands

[root@localhost ~]# curl http://httpbin.org/ip
{
  "origin": "218.189.127.78"
}
[root@localhost ~]# curl http://httpbin.org/user-agent
{
  "user-agent": "curl/7.29.0"
}
[root@localhost ~]# curl https://httpbin.org/get?show_env=1
{
  "args": {
    "show_env": "1"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Connect-Time": "0", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "Total-Route-Time": "0", 
    "User-Agent": "curl/7.29.0", 
    "Via": "1.1 vegur", 
    "X-Forwarded-For": "218.189.127.78", 
    "X-Forwarded-Port": "443", 
    "X-Forwarded-Proto": "https", 
    "X-Request-Id": "392e0fda-5f1b-4cc8-8131-77967bfee9db", 
    "X-Request-Start": "1499761771703"
  }, 
  "origin": "218.189.127.78", 
  "url": "https://httpbin.org/get?show_env=1"
}

Copy as cURL

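Chrome can copy a request as a cURL command directly: in DevTools, open the Network panel, right-click the request, and choose Copy > Copy as cURL.

The copied command looks like this: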

curl 'https://github.com/' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7' --compressed

It can be run directly in a terminal, and the output is the page's HTML.

cURL to Python

Website: https://curl.trillworks.com/

It converts a cURL command directly into Python requests code, so headers and the like do not have to be pasted in by hand.
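To make the idea of the conversion concrete, here is a minimal sketch of how a copied cURL command maps onto the arguments of a requests call. It is not the website's actual implementation. The function name `curl_to_request_kwargs` is hypothetical, and the sketch only understands a URL, `-H`/`--header` options, and flags that take no value (such as `--compressed`).

```python
import shlex


def curl_to_request_kwargs(command):
    """Parse a simple curl command into requests-style keyword arguments.

    Hypothetical helper for illustration only: it handles the URL,
    -H/--header options, and value-less flags. Flags that take a value
    (for example -X or --data) are NOT supported by this sketch.
    """
    tokens = shlex.split(command)  # split like a POSIX shell, honoring quotes
    if not tokens or tokens[0] != "curl":
        raise ValueError("expected a command starting with 'curl'")

    url = None
    headers = {}
    i = 1
    while i < len(tokens):
        token = tokens[i]
        if token in ("-H", "--header"):
            if i + 1 >= len(tokens):
                raise ValueError("header flag without a value")
            # Split on the first colon only, so values such as URLs survive.
            name, _, value = tokens[i + 1].partition(":")
            headers[name.strip()] = value.strip()
            i += 2
        elif token.startswith("-"):
            i += 1  # ignore other flags, e.g. --compressed
        else:
            url = token
            i += 1

    if url is None:
        raise ValueError("no URL found in the command")
    return {"url": url, "headers": headers}


kwargs = curl_to_request_kwargs(
    "curl 'https://github.com/' -H 'Connection: keep-alive' "
    "-H 'Upgrade-Insecure-Requests: 1' --compressed"
)
print(kwargs)

# With the third-party requests package installed, the request could be sent as:
#   import requests
#   response = requests.get(**kwargs)
```

The actual network call is left in a comment so that the sketch runs with the standard library alone.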

Postman

Postman can import a cURL command directly: click Import, choose Paste Raw Text, paste the cURL command, and you can start debugging right away.

From there I can freely modify any field in the headers or form data to debug the request.

One problem

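If you capture traffic with Charles, the cURL command it copies cannot be imported into Postman directly. The format copied from Charles looks like this: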

curl -H 'Host: httpbin.org' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' -H 'Upgrade-Insecure-Requests: 1' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3' -H 'Referer: https://www.google.com/' -H 'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7' --compressed 'https://httpbin.org/'

Compare this with the cURL command copied from the browser:

curl 'https://httpbin.org/' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7' --compressed

The URL sits in a different position: it has to be moved manually to the front of the command before the command can be imported.
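The manual reordering can also be scripted. The following is a rough sketch, assuming the Charles output only uses `-H` headers and value-less flags as in the example above. The function name `move_url_first` is hypothetical, and `shlex.join` requires Python 3.8 or later.

```python
import shlex


def move_url_first(command):
    """Return the curl command with its URL moved right after 'curl'.

    Hypothetical helper for illustration: it assumes the only options
    that take a value are -H/--header.
    """
    tokens = shlex.split(command)  # split like a POSIX shell, honoring quotes
    if not tokens or tokens[0] != "curl":
        raise ValueError("expected a command starting with 'curl'")

    args = tokens[1:]
    url_index = None
    i = 0
    while i < len(args):
        if args[i] in ("-H", "--header"):
            i += 2  # skip the flag and its header value
            continue
        if args[i].startswith(("http://", "https://")):
            url_index = i
            break
        i += 1

    if url_index is None:
        raise ValueError("no URL found in the command")

    url = args.pop(url_index)
    # Re-quote each argument so the result can be pasted back into a shell.
    return shlex.join(["curl", url] + args)


charles_command = (
    "curl -H 'Host: httpbin.org' -H 'Pragma: no-cache' "
    "--compressed 'https://httpbin.org/'"
)
print(move_url_first(charles_command))
```

The printed command has the URL first, followed by the headers and flags in their original order.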

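Finally, Postman can do far more than what is covered here; someone has written a dedicated document on it, "API开发利器:Postman" (a handy tool for API development), which is linked from the original article.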

Originally published on the WeChat official account Python爬虫与算法进阶 (zhangslob)

Original publication date: 2019-05-16
