Python Web Scraper Example

# Overview

This example uses web scraping to crawl the template listings on 模板之家 and download every template on the site.

# Environment

- Python 3.5.4

- selenium: a simple, convenient API for driving Selenium WebDrivers such as Firefox, IE, Chrome, and Remote

- BeautifulSoup: a Python library for extracting data from HTML or XML files
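
As a rough illustration of how BeautifulSoup pulls data out of a page with CSS selectors, here is a minimal sketch. The HTML snippet, its URLs, and the `extract_links` helper are made up for this example; only the `.thumbItem > li > a` selector comes from the source code in this article. It uses Python's built-in `html.parser` so that `lxml` does not need to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that mimics a listing page; not taken from the real site.
SAMPLE_HTML = """
<ul class="thumbItem">
  <li><a href="/moban/1.html">Template 1</a></li>
  <li><a href="/moban/2.html">Template 2</a></li>
  <li><span>no link here</span></li>
</ul>
"""


def extract_links(html):
    """Return the href of every <a> that is a direct child of an <li> in .thumbItem."""
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.select('.thumbItem > li > a')
    # Skip anchors that have no href attribute.
    return [a.get('href') for a in items if a.get('href')]


links = extract_links(SAMPLE_HTML)
print(links)
```

The `>` combinator matches direct children only, so an `<a>` nested deeper inside an `<li>` would not be selected.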

# Source Code

```

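from bs4 import BeautifulSoup
from selenium import webdriver

# The listing-page URL pattern is not defined in the original script.
# Set it to the category URL, with %d in place of the page number.
base_url = ''

chrome_options = webdriver.ChromeOptions()
prefs = {
    # Uncomment to stop Chrome from loading images.
    # 'profile.managed_default_content_settings.images': 2,
    # Directory where Chrome saves downloaded files.
    'download.default_directory': 'F:\\soueces'
}
chrome_options.add_experimental_option("prefs", prefs)
# Selenium 3 API, matching the environment above; Selenium 4 takes options= instead.
browser = webdriver.Chrome(chrome_options=chrome_options)


def get_download_links(url):
    """Collect the detail-page links from one listing page."""
    browser.get(url)
    html = browser.page_source
    soup = BeautifulSoup(html, 'lxml')  # requires the lxml package
    items = soup.select('.thumbItem > li > a')
    links = []
    for item in items:
        # The line that defines `link` is missing from the original;
        # reading the anchor's href (assumed to be an absolute URL) is an assumption.
        link = item.get('href')
        links.append(link)
    return links


def download(links):
    """Open each detail page and click its download button."""
    for link in links:
        browser.get(link)
        # Selenium 3 API; Selenium 4 uses find_element(By.CLASS_NAME, 'btn-down').
        ele = browser.find_element_by_class_name('btn-down')
        ele.click()


def main():
    # Load page 1 to read how many pages the current category has.
    browser.get(base_url % 1)
    html = browser.page_source
    soup = BeautifulSoup(html, 'lxml')
    ele = soup.select('.tagsPage > form > font')
    pages = int(ele[1].get_text())  # total number of pages in the current category
    i = 2  # as in the original, downloading starts from page 2
    # The loop condition is truncated in the original;
    # stopping after the last page is an assumption.
    while i <= pages:
        url = base_url % i
        links = get_download_links(url)
        download(links)
        i = i + 1


main()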

```
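
The paging logic in `main()` can be checked on its own, without a browser. The sketch below shows one way to do that. `parse_page_count`, `page_urls`, and the example.com URL pattern are hypothetical names introduced here, and the idea that the page count appears as a number inside the element text is an assumption that the source does not confirm.

```python
import re


def parse_page_count(text):
    """Return the first integer found in text such as '12' or '12 pages'."""
    match = re.search(r'\d+', text)
    if match is None:
        raise ValueError('no page count found in %r' % text)
    return int(match.group())


def page_urls(url_pattern, total_pages, start=1):
    """Build listing-page URLs for pages start..total_pages, inclusive."""
    return [url_pattern % page for page in range(start, total_pages + 1)]


# Hypothetical URL pattern with a %d placeholder for the page number.
pattern = 'https://example.com/list_%d.html'
total = parse_page_count(' 3 ')
for url in page_urls(pattern, total):
    print(url)
```

Building the full list of URLs up front makes the page bounds explicit, and an empty list is returned when `start` is greater than `total_pages`, so a single-page category is simply skipped.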

  • Original link: https://kuaibao.qq.com/s/20180704G0LBSD00?refer=cp_1026