A Simple Python Crawler for Downloading Images

Source: 企鹅号 (Penguin Account) - 四海解忧日记

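Preparation

Install Python 3.6.4

Install PyCharm

In PyCharm's settings, open the dependency package list and click the plus sign to add packages

Search for requests and install it

Search for lxml and install it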
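
Once both packages are installed, a quick check run with the same interpreter that PyCharm uses can confirm that they are importable. The sketch below is only an illustration: the helper name missing_packages is hypothetical and does not come from the original article. It relies on importlib.util.find_spec from the standard library, which returns None when a top-level module cannot be found.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two third-party packages this article depends on
    missing = missing_packages(["requests", "lxml"])
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("requests and lxml are both available.")
```

If either name is reported as missing, the package was probably installed into a different interpreter than the one selected for the project.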

The goal is to crawl desktop wallpapers from https://alpha.wallhaven.cc/

Here is the code:

import math
import os
import random
import time

import requests
from lxml import etree

# Read the search keyword from the console
keyWord = input("Enter a search keyword: ")


class Spider():
    def __init__(self):
        self.headers = {
            # Request header value copied from Chrome's developer tools
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36",
        }
        self.filePath = ('E:/users/桌面壁纸/' + keyWord + '/')

    def creat_File(self):
        # Create the download directory if it does not exist yet
        filePath = self.filePath
        if not os.path.exists(filePath):
            os.makedirs(filePath)

    def get_pageNum(self):
        # Despite its name, this returns the total number of images found
        total = ""
        url = ("https://alpha.wallhaven.cc/search?q={}&search_image=").format(
            keyWord)
        html = requests.get(url)
        selector = etree.HTML(html.text)
        pageInfo = selector.xpath('//header[@class="listing-header"]/h1[1]/text()')
        string = str(pageInfo[0])
        # str.isdigit checks whether a string consists only of digits.
        # filter() keeps the elements that pass the check; in Python 3 it returns an iterator.
        # list() turns that iterator into a list.
        numlist = list(filter(str.isdigit, string))
        for item in numlist:
            total += item
        totalPagenum = int(total)
        return totalPagenum

    def main_fuction(self):
        self.creat_File()
        count = self.get_pageNum()
        print("We have found:{} images!".format(count))
        # The code assumes 24 images per result page; round up so that
        # a final, partially filled page is not skipped
        times = math.ceil(count / 24)
        j = 1
        for i in range(times):
            pic_Urls = self.getLinks(i + 1)
            for item in pic_Urls:
                self.download(item, j)
                j += 1

    def getLinks(self, number):
        url = ("https://alpha.wallhaven.cc/search?q={}&categories=111&purity=100&sorting=relevance&order=desc&page={}").format(
            keyWord, number)
        # Start with an empty list so a failed request still returns a list
        pic_Linklist = []
        try:
            html = requests.get(url)
            selector = etree.HTML(html.text)
            pic_Linklist = selector.xpath('//a[@class="jsAnchor thumb-tags-toggle tagged"]/@href')
        except Exception as e:
            print(repr(e))
        return pic_Linklist

    def download(self, url, count):
        # str.strip removes leading and trailing characters drawn from the given
        # set, not a literal prefix or suffix; the ID survives here only on the
        # assumption that it is numeric
        string = url.strip('/thumbTags').strip('https://alpha.wallhaven.cc/wallpaper/')
        html = 'http://wallpapers.wallhaven.cc/wallpapers/full/wallhaven-' + string + '.jpg'
        pic_path = (self.filePath + keyWord + str(count) + '.jpg')
        try:
            pic = requests.get(html, headers=self.headers)
            # The with statement closes the file even if the write fails
            with open(pic_path, 'wb') as f:
                f.write(pic.content)
            print("Image:{} has been downloaded!".format(count))
            # Pause for a random 0 to 2 seconds between downloads
            time.sleep(random.uniform(0, 2))
        except Exception as e:
            print(repr(e))


spider = Spider()
spider.main_fuction()
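
The string-handling steps of the spider can be tried offline, without touching the network. The following is a minimal sketch under stated assumptions: the function names parse_total, page_count and wallpaper_id are hypothetical, and the sample heading and sample link are made-up inputs shaped like what the XPath queries above are expected to return. The regular expression is a swapped-in alternative to the chained str.strip calls, not the article's own method.

```python
import math
import re

IMAGES_PER_PAGE = 24  # the page size that the spider's code divides by


def parse_total(heading_text):
    """Join every digit in the heading into one integer, as get_pageNum does."""
    digits = "".join(filter(str.isdigit, heading_text))
    return int(digits) if digits else 0


def page_count(total, per_page=IMAGES_PER_PAGE):
    """Number of result pages needed to cover `total` images (rounded up)."""
    return math.ceil(total / per_page)


def wallpaper_id(link):
    """Pull the ID out of a link shaped like .../wallpaper/<id>/thumbTags."""
    match = re.search(r"/wallpaper/([^/]+)/thumbTags$", link)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample_heading = "1,234 Wallpapers found for dog"  # hypothetical heading text
    sample_link = "https://alpha.wallhaven.cc/wallpaper/12345/thumbTags"  # hypothetical link
    total = parse_total(sample_heading)
    print(total, page_count(total), wallpaper_id(sample_link))
```

Joining every digit is simple, but it would miscount if the heading also contained digits from another source, for example a keyword such as 4k echoed back in the text. Matching the ID with a pattern also avoids depending on which characters happen to appear at the ends of the link.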

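The code uses two URLs.

The first: open the home page and search for a word such as dog to reach the results page, then scroll down to the second page and copy the URL. Replace the concrete values after the q and page parameters with {}.

The second: open the home page and search for a word such as dog, then on the results page change the three drop-down filters and search again to get the second URL.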
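
As an alternative to pasting {} placeholders into a copied URL, the query string can be assembled from a dictionary. This is a sketch only: search_url and BASE_URL are hypothetical names, and the parameter values are simply the ones that appear in the code above. It uses urllib.parse.urlencode from the standard library, which also escapes characters such as spaces in the keyword.

```python
from urllib.parse import urlencode

BASE_URL = "https://alpha.wallhaven.cc/search"


def search_url(keyword, page):
    """Build a paged search URL with the same parameters the spider uses."""
    params = {
        "q": keyword,
        "categories": "111",
        "purity": "100",
        "sorting": "relevance",
        "order": "desc",
        "page": page,
    }
    # urlencode joins the pairs with & and percent-encodes unsafe characters
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(search_url("dog", 2))
    print(search_url("golden retriever", 1))
```

With str.format, a keyword containing a space or an ampersand is inserted into the URL unchanged; building the query from a dictionary keeps each parameter separate and readable.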


Published: 2018-01-26 17:10:53
Original link: http://kuaibao.qq.com/s/20180126A0RBZQ00?refer=cp_1026
