Web Scraping Techniques, Part 5: 天天基金网 (Tiantian Fund) (1), Hands-On

python编程从入门到实践

Published 2019-10-22 16:42:06

This article is included in the column: python编程军火库

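Hello everyone. Today we will use the techniques introduced in earlier posts to fetch a small amount of information from 天天基金网 (Tiantian Fund), as a modest starting point for your own exploration.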

For today, we will simply fetch some information from the site's home page.

First, let's see what the 天天基金网 website looks like:

Now for the code:

#!/usr/bin/python
# -*- encoding: utf-8 -*-
"""
@File    : day_day_scrapy_day1.py
@Time    : 2019/9/8 14:54
@Author  : haishiniu
@Software: PyCharm
"""
import requests
import logging
from pyquery import PyQuery as pq

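def main():
    """
    Entry point: fetch the 天天基金网 home page and print two of its elements.
    :return: None
    """
    try:
        # URL of the home page
        main_url = 'http://fund.eastmoney.com/'
        # Send a GET request to the home page.
        # The timeout argument keeps the call from blocking indefinitely when
        # the server never responds. Do not leave it out.
        # The content attribute holds the raw response body as bytes.
        main_response = requests.get(url=main_url, timeout=60).content
        # pq() parses the response body into a PyQuery object, so that
        # elements can then be selected with CSS selectors.
        response_info = pq(main_response)
        print(response_info('title').text())  # 天天基金网(1234567.com.cn) --首批独立基金销售机构-- 东方财富网旗下基金平台!
        print(response_info('#setHome').text())  # 设为首页 ("Set as homepage")
    except Exception as ex:
        logging.exception(str(ex))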


if __name__ == '__main__':
    main()

Now let's look at the HTML held by response_info = pq(main_response). Since space is limited, we show only an excerpt: