前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >baidumap

baidumap

作者头像
天道Vax的时间宝藏
发布2021-08-11 15:29:41
2170
发布2021-08-11 15:29:41
举报
代码语言:javascript
复制
import requests
from lxml import etree
import csv
import json

# address = '上海'
# par = {'address': address, 'key': 'cb649a25c1f81c1451adbeca73623251'}
# base = 'http://restapi.amap.com/v3/geocode/geo'
# response = requests.get(base, par)
# print(response.text)

fp = open('C://Users/LP/Desktop/map.csv','wt',newline='',encoding='utf-8')
writer = csv.writer(fp)
writer.writerow(('address','longitude','latitude'))

headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'
}

def get_user_url(url):
    url_part = 'http://www.qiushibaike.com'
    res = requests.get(url,headers=headers)
    selector = etree.HTML(res.text)
    url_infos = selector.xpath('//div[@class="article block untagged mb15"]')
    for url_info in url_infos:
        user_part_urls = url_info.xpath('div[1]/a[1]/@href')
        if len(user_part_urls) == 1:
            user_part_url = user_part_urls[0]
            get_user_address(url_part + user_part_url)
        else:
            pass

def get_user_address(url):
    res = requests.get(url, headers=headers)
    selector = etree.HTML(res.text)
    if selector.xpath('//div[2]/div[3]/div[2]/ul/li[4]/text()'):
        address = selector.xpath('//div[2]/div[3]/div[2]/ul/li[4]/text()')
        get_geo(address[0].split(' · ')[0])
    else:
        pass

def get_geo(address):
    par = {'address': address, 'key': 'cb649a25c1f81c1451adbeca73623251'}
    api = 'http://restapi.amap.com/v3/geocode/geo'
    res = requests.get(api, par)
    json_data = json.loads(res.text)
    try:
        geo = json_data['geocodes'][0]['location']
        longitude = geo.split(',')[0]
        latitude = geo.split(',')[1]
        writer.writerow((address,longitude,latitude))
    except IndexError:
        pass

if __name__ == '__main__':
    urls = ['http://www.qiushibaike.com/text/page/{}/'.format(str(i)) for i in range(1, 36)]
    for url in urls:
        get_user_url(url)
本文参与 腾讯云自媒体同步曝光计划,分享自作者个人站点/博客。
原始发表:2021-02-28 ,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档