Getting Started with Web Crawlers: All in One Place

Hands-on example: downloading a cat picture from placekitten, hahaha.

```python
# -*- coding:UTF-8 -*-
import urllib.request  # import the module

# What is urlopen? It can be understood simply as "open"; the argument can be a URL string
response = urllib.request.urlopen("http://placekitten.com/g/200/300")
meinv_img = response.read()  # read the response body as bytes

with open("meinvtu.jpg", 'wb') as f:
    f.write(meinv_img)  # write the image bytes to a local file
```
The response object also exposes some useful metadata:

```
>>> response.geturl()
'http://placekitten.com/g/200/300'
>>> response.info()
<http.client.HTTPMessage object at 0x04763E70>
>>> print(response.info())
Date: Wed, 19 Jun 2019 16:06:49 GMT
Content-Length: 6327
Connection: close
Set-Cookie: __cfduid=d1ea0fa0d89b88bce166015e474dfc91c1560960409; expires=Thu, 18-Jun-20 16:06:49 GMT; path=/; domain=.placekitten.com; HttpOnly
Access-Control-Allow-Origin: *
Cache-Control: public, max-age=86400
Expires: Thu, 20 Jun 2019 16:06:49 GMT
CF-Cache-Status: HIT
Accept-Ranges: bytes
Vary: Accept-Encoding
Server: cloudflare
CF-RAY: 4e96c09e7b469971-LAX
```

Example 2

Doing online translation

The HTTP request basics behind this were covered in detail in the earlier beginner-friendly article 游览器内容渗透基础之浅谈HTTP请求.

First, open http://fanyi.youdao.com in your browser.

Press F12 to open the developer tools (you can also right-click and choose "Inspect"). Switch to the Network tab, type 渗透云笔记 into the translation box, then click the request that appears to view its details.

Copy out the Form Data; these fields are exactly what our crawler has to reproduce. They look like this:

```
Request URL: http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule

i: 渗透云笔记
from: AUTO
to: AUTO
smartresult: dict
client: fanyideskweb
salt: 15609628090667
sign: 6c2f918076d0c0a5426e1b7bcbf4b33a
ts: 1560962809066
bv: 8eb5748fd9d9cf1da538ed0cc7b0c0e5
doctype: json
version: 2.1
keyfrom: fanyi.web
action: FY_BY_CLICKBUTTION
```
The corresponding code is as follows:

```python
# -*- coding:UTF-8 -*-
import urllib.request
import urllib.parse

# The URL copied straight from the developer tools will cause an error;
# the "_o" in "translate_o" must be removed for the request to work
#url = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule"
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'

data = {}
data['i'] = '渗透云笔记'
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['salt'] = '15609628090667'
data['sign'] = '6c2f918076d0c0a5426e1b7bcbf4b33a'
data['ts'] = '1560962809066'
data['bv'] = '8eb5748fd9d9cf1da538ed0cc7b0c0e5'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_CLICKBUTTION'

# Use urllib.parse.urlencode() to convert data into the required form,
# encoded as UTF-8 bytes
data = urllib.parse.urlencode(data).encode('utf-8')

response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')
print(html)
```
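The urlencode step in that script can be tried on its own. A minimal sketch, using only a couple of the form fields as an illustration:

```python
from urllib.parse import urlencode

# a small subset of the form fields, just to show the encoding
params = {'i': '渗透云笔记', 'doctype': 'json'}

# percent-encode the dict into "key=value&key=value" form, then get UTF-8 bytes
body = urlencode(params).encode('utf-8')
print(body)  # b'i=%E6%B8%97%E9%80%8F%E4%BA%91%E7%AC%94%E8%AE%B0&doctype=json'
```

Passing this bytes object as the second argument to urllib.request.urlopen(url, data) is what makes the request a POST instead of a GET.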

The output is:

```
{"type":"ZH_CN2EN","errorCode":0,"elapsedTime":1,"translateResult":[[{"src":"渗透云笔记","tgt":"Penetrate cloud notes"}]]}
```

Much better.

At first glance you might think: wait, isn't this a dict?

```
>>> type(html)
<class 'str'>
```

But what is returned here is actually a string, because the response is in JSON format. Simply put, JSON wraps data structures like Python's dicts and lists inside a string.
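To see this, feed the exact string the translate API returned to the standard json module:

```python
import json

# the raw JSON string the translate API returned
html = '{"type":"ZH_CN2EN","errorCode":0,"elapsedTime":1,"translateResult":[[{"src":"渗透云笔记","tgt":"Penetrate cloud notes"}]]}'

target = json.loads(html)  # parse the JSON string into Python objects
print(type(target))        # <class 'dict'>
print(target["translateResult"][0][0]["tgt"])  # Penetrate cloud notes
```

json.loads turns the string back into nested dicts and lists, which is why the translated text can be pulled out by indexing.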

Now let's polish the program a bit.

```python
# -*- coding:UTF-8 -*-
import urllib.request
import urllib.parse
import json

content = input("请输入翻译的内容:")

# The URL copied straight from the developer tools will cause an error;
# the "_o" in "translate_o" must be removed for the request to work
#url = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule"
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'

data = {}
data['i'] = content
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['salt'] = '15609628090667'
data['sign'] = '6c2f918076d0c0a5426e1b7bcbf4b33a'
data['ts'] = '1560962809066'
data['bv'] = '8eb5748fd9d9cf1da538ed0cc7b0c0e5'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_CLICKBUTTION'

# Use urllib.parse.urlencode() to convert data into the required form,
# encoded as UTF-8 bytes
data = urllib.parse.urlencode(data).encode('utf-8')

response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')
target = json.loads(html)
print("翻译结果%s" % (target["translateResult"][0][0]['tgt']))
```

Sample run:

```
请输入翻译的内容:卧槽
翻译结果Oh my god
```

Installing Beautiful Soup

The install command, run as before from cmd in C:\Python34\Scripts>, is either of:

```
pip install beautifulsoup4
easy_install beautifulsoup4
```
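Beautiful Soup itself isn't used in this article yet, but as a quick taste of the next step, here is a minimal sketch of parsing an HTML string with it; the HTML fragment below is made up purely for illustration:

```python
from bs4 import BeautifulSoup

# a made-up HTML fragment, just for illustration
html = "<html><body><p class='title'>渗透云笔记</p></body></html>"

soup = BeautifulSoup(html, "html.parser")  # parse with the stdlib HTML parser
print(soup.p.string)  # prints the text inside the first <p> tag: 渗透云笔记
```

Instead of indexing into JSON by hand, Beautiful Soup lets you navigate HTML by tag name and attribute, which is what most scraping beyond simple APIs requires.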

Originally published on the WeChat official account 渗透云笔记 (shentouyun).

Original publication date: 2019-06-20.

This article is part of the Tencent Cloud self-media sharing program.
