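When it comes to beautiful women, the first thing that comes to mind is a dating site, where they gather in large numbers. So today we pick a dating site as our source material and scrape photos from it.
First you need an account on a dating site; I chose “我主良缘” here. Just register and log in:
After logging in, the page looks roughly like the one above. Fill in some filter criteria and click 搜缘分 (search), and the results we want appear. What we actually want, though, is to scrape the photos in those results, so right-click -> Inspect -> Network, then click 搜缘分 again. A new request like the one below shows up:
Click on it and look at what the headers contain:
We can ignore everything else and look directly at the request URL. It is simply an API, and it is exactly the photo API we are after. The API is as follows: http://www.7799520.com/api/user/pc/list/search?startage=21&endage=30&gender=2&startheight=151&endheight=160&marry=1&salary=2&page=1
The parameters of this API can be read off the URL. The main ones are startage and endage (age range), gender, startheight and endheight (height range), marry (marital status), salary, and page (page number).
Which of these parameters are required and which are optional, and what values each one accepts, you can work out by experimenting.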
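To make the parameter list concrete, here is a minimal sketch that splits the URL above into its query parameters and rebuilds it for another page. It uses only the standard library; the variable names (API_URL, params, next_url) are my own and not part of the API.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# The search URL captured from the Network tab
API_URL = ('http://www.7799520.com/api/user/pc/list/search'
           '?startage=21&endage=30&gender=2&startheight=151'
           '&endheight=160&marry=1&salary=2&page=1')

parts = urlsplit(API_URL)
# parse_qs returns a list of values per key; each key appears once here,
# so keep only the first value
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

for name, value in params.items():
    print(f'{name} = {value}')

# Rebuild the URL for another page by changing a single parameter
params['page'] = '2'
next_url = f'{parts.scheme}://{parts.netloc}{parts.path}?{urlencode(params)}'
print(next_url)
```

Keeping the parameters in a dict makes it easy to drop or change one at a time when testing which of them the server really needs.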
After testing, I found that the API above returns JSON data. The response looks like this:
/*
* { "data": { "list": [ { "avatar": "http://img.7799520.com/2019-11-27-1574867191-MXUdY0Fc.png", "birthdayyear": "1994", "city": "上海", "education": "初中", "gender": "2", "height": "159", "marry": "未婚", "monolog": "愿得一人心,白首不相离", "monologflag": "1", "province": "上海", "salary": "5千-1万", "userid": "3018330", "username": "单身笑山岚" }, { "avatar": "http://img.7799520.com/FhTV65n3mQ-X-PjfR3W9OpsFs5SO", "birthdayyear": "1991", "city": "北京", "education": "本科", "gender": "2", "height": "160", "marry": "未婚", "monolog": "土生土长北京人一枚,91年底小天蝎~lxt1103程序猿,高薪资,没房有车小有存款~胖胖哒还不高,唉:-(喜爱旅游,美食,旅游吃美食~想找个喜欢运动的小哥哥陪我减肥,或者不介意胖姑娘的男生哦~男孩子最好也是北京的,这样共同话题多,不能离北京太远了,赶春运也很痛苦的希望你是个逗比或者心思灵巧的蓝孩纸,在一起开心快乐聊得来就很幸福了", "monologflag": "-1", "province": "北京", "salary": "2万-5万", "userid": "3018171", "username": "桐桐桐桐桐" }, { "avatar": "http://img.7799520.com/00d0ba6e-5807-44fd-88af-eb379b325835", "birthdayyear": "1991", "city": "深圳", "education": "高中", "gender": "2", "height": "155", "marry": "未婚", "monolog": "如果真心实意可以加微信you02457本人对年龄要求30--35", "monologflag": "-1", "province": "广东", "salary": "1万-2万", "userid": "3017206", "username": "(坦诚相待)" }, { "avatar": "http://img.7799520.com/2019-11-27-1574817016-6JBhbUyU.png", "birthdayyear": "1989", "city": "西安", "education": "大专", "gender": "2", "height": "160", "marry": "未婚", "monolog": "再晚也要嫁给爱情", "monologflag": "-2", "province": "陕西", "salary": "2千-5千", "userid": "3015509", "username": "Best媛" }, { "avatar": "http://img.7799520.com/0e1ed4fa3b5ca22ed120bf08a452506b53c0da49-2019-11-27-15748275951574827595051-hSw85JrS.png", "birthdayyear": "1995", "city": "上海", "education": "硕士", "gender": "2", "height": "155", "marry": "未婚", "monolog": "这个真的不知道咋写哇......爹妈每天催婚.....算是独白吗...", "monologflag": "1", "province": "上海", "salary": "2万-5万", "userid": "3014896", "username": "。。。123" }, { "avatar": "http://img.7799520.com/f9e573e4-728a-4a05-8abd-9688c6d1c156", "birthdayyear": "1997", "city": "宁波", "education": "初中", "gender": "2", "height": "160", "marry": "未婚", "monolog": 
"愿得一人心,白首不分离,15058276626", "monologflag": "-1", "province": "浙江", "salary": "2千-5千", "userid": "3014476", "username": "季节娇气" }, { "avatar": "http://img.7799520.com/8c328b6a-f34a-4d91-a869-10f6e47627e9", "birthdayyear": "1992", "city": "深圳", "education": "初中", "gender": "2", "height": "158", "marry": "未婚", "monolog": "愿得一人心,白首不分离我微信号chen123456qing", "monologflag": "-1", "province": "广东", "salary": "5千-1万", "userid": "3013067", "username": "音响回眸勤奋" }, { "avatar": "http://img.7799520.com/9f74fb99444547a1408575c346008f22ac4bb1f7-2019-11-25-15746785901574678589876-kHZrSfnc.png", "birthdayyear": "1992", "city": "济南", "education": "大专", "gender": "2", "height": "160", "marry": "未婚", "monolog": "也许我很平凡,但是我绝不缺乏生活的热情和生命的梦想,也许我会孤单,但是我会一路找寻你的踪迹。遇见你,将是我生命中最绚烂的时刻。", "monologflag": "1", "province": "山东", "salary": "5千-1万", "userid": "3009076", "username": "骄傲的猫大王" }, { "avatar": "http://img.7799520.com/7da0c781-3115-467f-9fcc-d46d2aa1bb4a", "birthdayyear": "1994", "city": "国外", "education": "高中", "gender": "2", "height": "155", "marry": "未婚", "monolog": "我有一壶酒,足以慰风尘", "monologflag": "1", "province": "国外", "salary": "2千-5千", "userid": "3007139", "username": "墨染." 
}, { "avatar": "http://img.7799520.com/2019-11-24-1574575893-JYE0Y9nz.png", "birthdayyear": "1994", "city": "北海", "education": "大专", "gender": "2", "height": "157", "marry": "未婚", "monolog": "愿得一人心,白首不相离,非会员哦,所以很多信息都看不到呢,抱歉", "monologflag": "1", "province": "广西", "salary": "5千-1万", "userid": "3006914", "username": "蔓鲸" }, { "avatar": "http://img.7799520.com/2019-11-24-1574565615-2p6Q37YC.png", "birthdayyear": "1995", "city": "广州", "education": "本科", "gender": "2", "height": "160", "marry": "未婚", "monolog": "如果在一起是因为合适,那希望是合适一辈子。", "monologflag": "1", "province": "广东", "salary": "5千-1万", "userid": "3006237", "username": "长颈鹿向淡淡" }, { "avatar": "http://img.7799520.com/4c69af45f1f9763bc33b7322cd025c90157a93b9-2019-11-23-15745152791574515278714-5F2a7dhi.png", "birthdayyear": "1997", "city": "上海", "education": "大专", "gender": "2", "height": "158", "marry": "未婚", "monolog": "好看的皮囊千篇一律,有趣的灵魂万里挑一。。。", "monologflag": "1", "province": "上海", "salary": "1万-2万", "userid": "3004596", "username": "solely" }, { "avatar": "http://img.7799520.com/aaf297dd-af30-48de-8027-5c7e57ec2cdc", "birthdayyear": "1993", "city": "深圳", "education": "高中", "gender": "2", "height": "155", "marry": "未婚", "monolog": "在现在快节奏的社会,忙碌的工作之余,希望有个知心人陪伴,偶尔逛街,看电影吃饭,一起旅游,运动,分享彼此的喜怒哀乐,希望相互欣赏,包容,理解。我认为最好的爱情莫过于为彼此成为最好的自己,成为最默契的搭档,一起发现这个世界的美好。", "monologflag": "1", "province": "广东", "salary": "5千-1万", "userid": "3003499", "username": "一木木" }, { "avatar": "http://img.7799520.com/2019-11-22-1574436265-oOHCA0Pi.png", "birthdayyear": "1991", "city": "上海", "education": "高中", "gender": "2", "height": "153", "marry": "未婚", "monolog": "爱吃西瓜的跳舞女少年?", "monologflag": "-2", "province": "上海", "salary": "5千-1万", "userid": "3001594", "username": "西瓜西瓜瓜" }, { "avatar": "http://img.7799520.com/6351f7c2-734d-484f-95ae-7881b3b65132", "birthdayyear": "1996", "city": "南昌", "education": "中专", "gender": "2", "height": "158", "marry": "未婚", "monolog": "事事有回应,渐渐有着落", "monologflag": "1", "province": "江西", "salary": "2千-5千", "userid": 
"2999190", "username": "977" }, { "avatar": "http://img.7799520.com/bc692905b97d0deeb6df0f73356d3de82b1d6261-2019-11-23-15745076641574507664470-zV4AFL8O.png", "birthdayyear": "1990", "city": "成都", "education": "大专", "gender": "2", "height": "156", "marry": "未婚", "monolog": "在成都的东北人!照片是很多年前的了。不喜欢拍照所以没有现在的照片!我身高155体重42公斤。不喜欢:属羊的男生,最好不抽烟不喝酒!我属蛇天蝎座♏️", "monologflag": "1", "province": "四川", "salary": "2千-5千", "userid": "2998289", "username": "水壶苦恋无语" }, { "avatar": "http://img.7799520.com/7384954e-5c0d-4a5c-92c6-4493ba1be3d4", "birthdayyear": "1995", "city": "苏州", "education": "大专", "gender": "2", "height": "160", "marry": "未婚", "monolog": "嗨 你好 能带给我一份超大杯快乐嘛", "monologflag": "1", "province": "江苏", "salary": "2千-5千", "userid": "2991868", "username": "小呀么小静静" }, { "avatar": "http://img.7799520.com/FlUJTeR0REKbLhtoR5RNVeuOXRy1", "birthdayyear": "1992", "city": "苏州", "education": "大专", "gender": "2", "height": "160", "marry": "未婚", "monolog": "爱好看动漫和小说,比较宅,做事喜欢有计划,喜欢独处,自在。理想伴侣就是要有稳定的工作。。。。", "monologflag": "1", "province": "江苏", "salary": "2千-5千", "userid": "2989769", "username": "青一木" }, { "avatar": "http://img.7799520.com/edbb6516-2b07-401e-b56e-6aee6c2620ca", "birthdayyear": "1994", "city": "巴中", "education": "高中", "gender": "2", "height": "156", "marry": "未婚", "monolog": "我是找对象的,感觉我还行的,可以加JC718829", "monologflag": "-1", "province": "四川", "salary": "5千-1万", "userid": "2989629", "username": "星愿回首悲凉" }, { "avatar": "http://img.7799520.com/e648d317faacffb4f03b1ca31fdbed2b4c6ec5e4-2019-11-25-15746776401574677640473-C6Z1QX0K.png", "birthdayyear": "1990", "city": "深圳", "education": "初中", "gender": "2", "height": "159", "marry": "未婚", "monolog": "愿得一人心,白首不相离", "monologflag": "1", "province": "广东", "salary": "2千-5千", "userid": "2988102", "username": "兰玛珊蒂" } ], "num": 20, "page": 1 }, "error_code": 0}
*/
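We can analyze this structure to pull out the information we need: the user records sit in the list array under data, and each record's avatar field holds the photo URL.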
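As a rough illustration of walking that structure, the sketch below parses a trimmed copy of the response (two of the entries, a few fields each) and collects the avatar URLs. The helper name extract_avatars is hypothetical, and treating error_code 0 as "success" is an assumption based on this one sample response.

```python
import json

# A trimmed copy of the response shown above (two entries, a few fields each)
sample = '''
{"data": {"list": [
    {"avatar": "http://img.7799520.com/2019-11-27-1574867191-MXUdY0Fc.png",
     "city": "上海", "userid": "3018330", "username": "单身笑山岚"},
    {"avatar": "http://img.7799520.com/2019-11-27-1574817016-6JBhbUyU.png",
     "city": "西安", "userid": "3015509", "username": "Best媛"}
  ], "num": 20, "page": 1},
 "error_code": 0}
'''

def extract_avatars(payload):
    """Return (userid, username, avatar URL) tuples from one page of results."""
    # Assumption: error_code 0 means the request succeeded
    if payload.get('error_code') != 0:
        return []
    users = payload.get('data', {}).get('list', [])
    # Skip records that have no avatar URL
    return [(u['userid'], u['username'], u['avatar']) for u in users if u.get('avatar')]

avatars = extract_avatars(json.loads(sample))
for userid, username, avatar in avatars:
    print(userid, username, avatar)
```

Returning an empty list on an unexpected payload lets a caller loop over the result without extra checks.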
Anyone who has written a crawler will find that doing it in Python is very simple. As the title says, it takes just 10 lines of code, shown below:
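import requests  # import the requests package
save_dir = 'C:/Users/zaxwz/Desktop/xqImg/'  # folder where the images are stored (must already exist)
# API URL; the page value is left off the end so it can be appended for each page
url = 'http://www.7799520.com/api/user/pc/list/search?startage=21&endage=30&gender=2&startheight=151&endheight=160&marry=1&salary=3&page='
# loop over 40 pages of results
for page in range(1, 41):
    # the response is JSON, so parse it straight into a dict
    json_data = requests.get(url + str(page)).json()
    # json_data['data']['list'] holds the list of users
    for user in json_data['data']['list']:
        # user['avatar'] is the image URL
        img_url = user['avatar']
        # send the request for the image
        resp = requests.get(img_url)
        # create the image file and write the response bytes into it
        with open(save_dir + user['username'] + '.jpg', 'wb') as img:
            img.write(resp.content)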
That completes the scraping; with the comments removed, the script is exactly 10 lines of code. The downloaded images look like this:
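The 10-line script uses the raw username as the file name and assumes every request succeeds. As a possible hardening step, the sketch below shows one way to build safer file names and to skip avatars that fail to download. Both helpers (safe_filename, download_avatar) are hypothetical names of my own, and the download uses the standard-library urllib.request instead of requests so that it runs without extra packages; the same idea works with requests.get(..., timeout=...).

```python
import os
import re
import urllib.request

def safe_filename(userid, username):
    """Build a file name that is unique per user and avoids characters Windows rejects."""
    # Replace reserved characters and whitespace, then trim leading/trailing '_' and '.'
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', '_', username).strip('_.')
    # Keep the '.jpg' extension used by the original script
    return f'{userid}_{cleaned or "user"}.jpg'

def download_avatar(user, save_dir, timeout=10):
    """Download one avatar; return the saved path, or None if the download fails."""
    try:
        with urllib.request.urlopen(user['avatar'], timeout=timeout) as resp:
            content = resp.read()
    except (OSError, ValueError):
        # Network or HTTP errors raise OSError subclasses; a malformed URL raises ValueError
        return None
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, safe_filename(user['userid'], user['username']))
    with open(path, 'wb') as img:
        img.write(content)
    return path

print(safe_filename('3007139', '墨染.'))
print(safe_filename('3014896', '。。。123'))
```

Prefixing the userid keeps two users with the same display name from overwriting each other's file, and the timeout keeps one slow image from stalling the whole run.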