Two Tiny Function Examples

1846122963

Published 2018-03-09 15:35:41


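Hello everyone! Remember yesterday's article on function parameters? How much of it stuck? In day-to-day work things are rarely that complicated. A general-purpose function definition looks like this: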

def func_name(*args, **kwargs):
    pass

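A function like this accepts any mix of positional and keyword arguments, so it covers nearly every calling pattern. Yesterday's article went into more detail simply so that you would know what else is available.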
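To make the catch-all signature concrete, here is a minimal sketch. The name describe_call is made up for illustration; the function only returns what it received, so you can see how *args collects positional arguments into a tuple and **kwargs collects keyword arguments into a dict.

```python
def describe_call(*args, **kwargs):
    """Return the positional and keyword arguments exactly as received."""
    return args, kwargs


positional, keyword = describe_call(1, 'two', sep='-', end='')
print(positional)  # (1, 'two')
print(keyword)     # {'sep': '-', 'end': ''}
```

Calling describe_call() with no arguments returns an empty tuple and an empty dict, which is why a signature like this never rejects a call.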

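Today's main topic is the basic use of one library: the standard library urllib. Python 2.x split this functionality across urllib, urllib2, and urlparse; Python 3.x merged them into a single urllib package divided into submodules: urllib.request, urllib.parse, urllib.error, and urllib.robotparser.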
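As a quick orientation to the submodules, the sketch below uses urllib.parse to take a URL apart and build a query string, and checks how the two main exception classes in urllib.error relate. It makes no network request. The query parameters q and page are made-up examples.

```python
from urllib.parse import urlsplit, urlencode
from urllib.error import HTTPError, URLError

# Split a URL into its components
parts = urlsplit("http://pythonscraping.com/pages/page1.html")
print(parts.scheme)  # http
print(parts.netloc)  # pythonscraping.com
print(parts.path)    # /pages/page1.html

# Build a query string from a dict (spaces become '+')
query = urlencode({"q": "python urllib", "page": 2})
print(query)  # q=python+urllib&page=2

# HTTPError is a subclass of URLError, so catching URLError covers both
print(issubclass(HTTPError, URLError))  # True
```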

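urllib is part of the Python standard library, so nothing extra needs to be installed. It provides functions for requesting data, handling cookies, and even changing request metadata such as headers or the user agent.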
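Changing the user agent is one of the more common reasons to touch request metadata. The sketch below builds a urllib.request.Request with a custom header and inspects it without sending anything. The helper name build_request and the user agent string example-client/0.1 are placeholders chosen for this example.

```python
from urllib.request import Request


def build_request(url, user_agent="example-client/0.1"):
    """Create a Request object that carries a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("http://pythonscraping.com/pages/page1.html")
# Request stores header names capitalized, hence "User-agent" here
print(req.get_header("User-agent"))  # example-client/0.1
print(req.get_method())              # GET
```

Passing req to urlopen instead of a plain URL string would send the request with that header.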

urlopen opens a remote object over the network and reads it. It can read HTML files, image files, or any other file stream.

Basic urllib usage:

In[1]: from urllib.request import urlopen

In[2]: html = urlopen("http://pythonscraping.com/pages/page1.html")

In[3]: print(html.read()) 
b'<html>\n<head>\n<title>A Useful Page</title>\n</head>\n
<body>\n<h1>An Interesting Title</h1>\n
<div>\nLorem ipsum dolor sit amet, consectetur adipisicing elit, 
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. 
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris 
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in 
reprehenderit in voluptate velit esse cillum dolore eu fugiat 
nulla pariatur. Excepteur sint occaecat cupidatat non proident, 
sunt in culpa qui officia deserunt mollit anim id est laborum.\n</div>\n
</body>\n</html>\n'
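The b'...' prefix in the output above shows that read() returns bytes, not str. A minimal sketch of turning the response into text follows. The helper names decode_body and fetch_text are invented for this example, and falling back to UTF-8 when the server declares no charset is an assumption rather than a rule.

```python
from urllib.request import urlopen


def decode_body(raw, charset=None):
    """Decode response bytes to str, falling back to UTF-8."""
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_text(url, timeout=10):
    """Fetch a URL and return its body as text."""
    # The with-block closes the connection once the body has been read
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset()
        return decode_body(resp.read(), charset)


print(decode_body(b"<h1>An Interesting Title</h1>"))
```

Calling fetch_text("http://pythonscraping.com/pages/page1.html") would return the same page as a regular string, provided the site is reachable.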

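Next, let's write a simple function to review what we covered yesterday. The script below fetches a web page and extracts its title (here, the first h1 heading inside body). It is almost absurdly simple. The code: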

# -*- coding: utf-8 -*-

from urllib.request import urlopen
from urllib.error import URLError
from bs4 import BeautifulSoup
# If bs4 is not installed, install it first:
# pip install beautifulsoup4


def get_title(url):
    try:
        html = urlopen(url)
    except URLError as e:
        # URLError also covers HTTPError, which is its subclass
        print(e)
        return None

    try:
        bs_obj = BeautifulSoup(html.read(), 'html.parser')
        title = bs_obj.body.h1.text
    except AttributeError:
        # body or h1 is missing, so there is no title to return
        return None

    return title

url = 'http://lavenliu.cn/post/test01.html'
title = get_title(url)
if title is None:
    print('Title could not be found')
else:
    print(title)

This script uses two modules:

urllib (standard library)
bs4 (third-party, must be installed)
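If installing bs4 is not an option, the extraction step of get_title can be approximated with the standard library alone. This is a rough sketch using html.parser.HTMLParser, not a replacement for BeautifulSoup. The names FirstH1Parser and extract_h1 are invented here, and unlike the script above it takes the first h1 anywhere in the document rather than only inside body.

```python
from html.parser import HTMLParser


class FirstH1Parser(HTMLParser):
    """Collect the text of the first <h1> element in a document."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._done:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._done = True

    def handle_data(self, data):
        if self._in_h1:
            self._parts.append(data)

    @property
    def title(self):
        # None means no complete <h1>...</h1> was seen
        return "".join(self._parts).strip() if self._done else None


def extract_h1(html_text):
    """Return the text of the first h1, or None if there is none."""
    parser = FirstH1Parser()
    parser.feed(html_text)
    parser.close()
    return parser.title


page = "<html><body><h1>An Interesting Title</h1><div>text</div></body></html>"
print(extract_h1(page))               # An Interesting Title
print(extract_h1("<p>no heading</p>"))  # None
```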

The next example looks up the geographic location of an IP address. The code:

# -*- coding: utf-8 -*-

import argparse
import json
from urllib.request import urlopen

parser = argparse.ArgumentParser()
parser.add_argument('--ip-list-file', '-f',
                    action='store',
                    dest='ip_list_file',
                    help='file containing the IPs to query, one per line')
parser.add_argument('--ip', '-i',
                    action='store',
                    dest='ip',
                    help='single ip to query')
results = parser.parse_args()

# Exit with a usage message unless an IP or an IP list file was given
if not (results.ip or results.ip_list_file):
    parser.error('pass --ip aaa.bbb.ccc.ddd or -f iplist')


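def get_country(ip_address):
    # freegeoip.net is the service this script was written against; it may
    # no longer be available, in which case substitute another JSON endpoint
    url = 'http://freegeoip.net/json/'
    resp_json = urlopen(url+ip_address).read().decode('utf-8')
    resp_dict = json.loads(resp_json)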
    
    if resp_dict['region_name'] and resp_dict['city']:
        return "[{}]: {}-{}-{}".format(
            resp_dict['ip'],
            resp_dict['country_name'],
            resp_dict['region_name'],
            resp_dict['city'])
    else:
        return "[{}]: {}".format(resp_dict['ip'], resp_dict['country_name'])


if results.ip_list_file:
    try:
        with open(results.ip_list_file) as f:
            # Strip whitespace and skip blank lines
            ip_addresses = [line.strip() for line in f if line.strip()]
        for ip_address in ip_addresses:
            print(get_country(ip_address))
    except FileNotFoundError:
        print('No such file: {}'.format(results.ip_list_file))
    except PermissionError:
        print('Permission denied: {}'.format(results.ip_list_file))
else:
    print(get_country(results.ip))

The script takes one argument: either a single IP address or a file containing many IP addresses. Running it on a file produces:

$ python3 getip.py --ip-list-file iplist
[101.81.26.144]: China-Shanghai-Shanghai
[110.110.53.112]: China-Beijing-Beijing
[111.10.118.221]: China-Chongqing-Chongqin
[111.128.107.62]: China-Beijing-Beijing
[111.128.111.60]: China-Beijing-Beijing
[111.13.44.158]: China
...
[111.20.163.186]: China-Shaanxi-Xi'an
[111.41.44.23]: China-Heilongjiang-Jixi
[111.47.8.170]: China-Hubei-Chengzhong

The IP list file contains:

$ cat iplist
101.81.26.144
110.110.53.112
111.10.118.221
111.11.227.76
111.12.251.10
111.12.251.11
111.128.107.62
111.128.111.60
111.13.44.158
111.14.199.105
111.14.237.193
111.143.204.96
111.14.40.137
111.14.50.80
111.145.1.177
111.145.199.6
111.19.59.123
111.20.129.238
111.20.163.186
111.22.5.206
111.26.219.65
111.27.142.188
111.30.115.35
111.35.58.2
111.37.9.133
111.37.9.150
111.37.9.168
111.37.9.189
111.40.10.19
111.40.10.4
111.40.64.229
111.40.67.139
111.41.44.23
111.43.217.76
111.47.8.170
111.63.44.51
111.7.130.133
111.7.130.176
111.7.130.201
111.7.131.57
111.7.131.84
111.7.131.89
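A file like this is easy to get wrong by hand (a stray blank line, a typo in an address), so it can help to validate each line before querying. The sketch below uses the standard ipaddress module for that. The helper name parse_ip_lines is invented for this example and is not part of the script above.

```python
import ipaddress


def parse_ip_lines(lines):
    """Split raw lines into (valid_ips, invalid_entries), skipping blanks."""
    valid, invalid = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        try:
            # Raises ValueError for anything that is not an IPv4/IPv6 address
            ipaddress.ip_address(text)
        except ValueError:
            invalid.append(text)
        else:
            valid.append(text)
    return valid, invalid


valid, invalid = parse_ip_lines(["101.81.26.144\n", "\n", "not-an-ip\n", "111.7.131.89"])
print(valid)    # ['101.81.26.144', '111.7.131.89']
print(invalid)  # ['not-an-ip']
```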

And for a single IP address? Like this:

$ python3 getip.py --ip 58.246.245.18
[58.246.245.18]: China-Shanghai-Shanghai
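Since the script expects either --ip or --ip-list-file, one way to let argparse enforce that is a mutually exclusive group marked as required. This is an alternative sketch, not how the script above is written. The function name build_parser is invented here, and parse_args is given an explicit list so the example runs without real command-line arguments.

```python
import argparse


def build_parser():
    """Build a parser that requires exactly one of --ip / --ip-list-file."""
    parser = argparse.ArgumentParser(description='Query IP locations')
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--ip-list-file', '-f', dest='ip_list_file',
                       help='file containing the IPs to query, one per line')
    group.add_argument('--ip', '-i', dest='ip',
                       help='single ip to query')
    return parser


# Passing a list to parse_args simulates the command line
args = build_parser().parse_args(['--ip', '58.246.245.18'])
print(args.ip)            # 58.246.245.18
print(args.ip_list_file)  # None
```

With this setup, running the script with no options, or with both at once, makes argparse print a usage message and exit with status 2.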

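These scripts use several modules whose usage we won't cover in detail here. You can follow the examples as templates or consult the documentation; they are easy to learn on your own.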

Originally published 2017-11-01 on the WeChat official account 小白的技术客栈 and syndicated to the Tencent Cloud developer community.
