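The earlier article 《中国式放假与调休——如何计算平均发货时间?》 (Chinese-style holidays and make-up working days: how do you calculate average shipping time?) noted that China's practice of shifting working days around public holidays invalidates many week-based analyses, so we can scrape the adjustment schedule and use it to correct them.
The data source is easy to find: search Baidu for "万年历" (perpetual calendar). The result contains the details, including which weekdays become days off and which weekend days become working days:
The source code is as follows: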
import requests
import json
import pandas as pd
import datetime
# Request headers copied from the browser's network panel
headers = {
    "Host": "sp0.baidu.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0",
    "Accept": "*/*",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://www.baidu.com/baidu?tn=monline_7_dg&ie=utf-8&wd=%E4%B8%87%E5%B9%B4%E5%8E%86",
    "Connection": "keep-alive",
    "Cookie": "BAIDUID=987F272A156AC57C3F1EB4732E658EB6FG=1; BIDUPSID=987F272A156AC57C4CFA2D801DEF74A3; PSTM=1578566083; BDRCVFR[Hp1ap0hMjsC]=mk3SLVN4HKm; delPer=0; PSINO=2; H_PS_PSSID=1466_31326_21103; BDORZ=FFFB88E999055A3F8A630C64834BD6D0",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache"
}
def get_local_date_str(origin_date_str):
    # Take the date part of the returned timestamp and add one day
    # to obtain the local calendar date.
    utc_date = datetime.datetime.strptime(origin_date_str[0:10], "%Y-%m-%d")
    local_date = utc_date + datetime.timedelta(days=1)
    local_date_str = datetime.datetime.strftime(local_date, '%Y-%m-%d')
    return local_date_str
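The one-day shift above presumably exists because `oDate` arrives as a UTC timestamp that falls on the previous calendar day in China (UTC+8). As a hedged alternative, here is a minimal sketch of an explicit timezone conversion. It assumes the value looks like `2020-04-30T16:00:00.000Z`; that sample string and the function name `to_china_date` are illustrative assumptions, not taken from the API response.

```python
from datetime import datetime, timedelta, timezone

CHINA_TZ = timezone(timedelta(hours=8))  # China Standard Time, UTC+8


def to_china_date(utc_timestamp):
    """Convert a UTC timestamp string to the UTC+8 calendar date (YYYY-MM-DD).

    Assumes the input starts with 'YYYY-MM-DDTHH:MM:SS' (hypothetical format).
    """
    # Keep only the 'YYYY-MM-DDTHH:MM:SS' part and mark it as UTC.
    parsed = datetime.strptime(utc_timestamp[:19], "%Y-%m-%dT%H:%M:%S")
    parsed = parsed.replace(tzinfo=timezone.utc)
    # Shift to UTC+8 and keep only the date.
    return parsed.astimezone(CHINA_TZ).strftime("%Y-%m-%d")


print(to_china_date("2020-04-30T16:00:00.000Z"))  # 2020-05-01
```

Unlike a fixed one-day offset, this version also returns the right date if the timestamp is not at 16:00 UTC.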
def get_info(year, month):
    # Query the calendar API for one month, e.g. "2020年5月"
    query = str(year) + '年' + str(month) + '月'
    url = 'https://sp0.baidu.com/8aQDcjqpAAV3otqbppnN2DJv/api.php?query=' + query + '&resource_id=39043&oe=gbk&tn=wisetpl'
    response = requests.get(url, headers=headers)
    jdata = json.loads(response.text)
    # The per-day records are stored under data[0]['almanac']
    data_list = jdata['data'][0]['almanac']
    for tip in data_list:
        # Replace the raw oDate with the local date string
        origin_date_str = tip['oDate']
        tip['oDate'] = get_local_date_str(origin_date_str)
    return data_list
count_num = 0
data_list_all = []
for year in range(2011, 2021):
    print(str(year) + "……")
    for month in range(1, 13):
        data_list = get_info(year, month)
        data_list_all += data_list
    # Build a table from everything fetched so far, drop repeated dates,
    # and keep only the rows that belong to the current year
    df = pd.DataFrame(data_list_all)
    df = df.drop_duplicates(subset=['oDate'])
    df = df[df.year.isin([str(year)])]
    if count_num == 0:
        df_concat = df
    else:
        df_concat = pd.concat([df_concat, df], axis=0, sort=False, ignore_index=True)
    count_num += 1
    # Overwrite the output workbook with all years collected so far
    df_concat.to_excel(r"C:\Users\学谦\Desktop\新建文件夹\year2011-2020.xlsx")
    print(str(year) + "ok")
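Once the table is saved, it can be used to correct a naive Monday-to-Friday calendar. The sketch below shows one possible way to do that. It assumes each scraped row has been reduced to a date string plus a flag, where `"rest"` marks a public holiday and `"work"` marks a make-up working day on a weekend. The flag names, the `overrides` dict and both helper functions are hypothetical; which column of the exported file actually carries this information should be checked against the data itself.

```python
from datetime import date, timedelta

# Hypothetical overrides derived from the scraped table
# (2020 Labour Day arrangement used as sample data).
overrides = {
    "2020-04-26": "work",  # Sunday turned into a working day
    "2020-05-01": "rest",
    "2020-05-04": "rest",
    "2020-05-05": "rest",
    "2020-05-09": "work",  # Saturday turned into a working day
}


def is_working_day(day, overrides):
    """Return True if `day` is a working day after applying the overrides."""
    flag = overrides.get(day.isoformat())
    if flag == "work":
        return True
    if flag == "rest":
        return False
    # No override: fall back to the usual Monday-to-Friday rule.
    return day.weekday() < 5


def count_working_days(start, end, overrides):
    """Count working days from `start` to `end`, both inclusive."""
    total = 0
    current = start
    while current <= end:
        if is_working_day(current, overrides):
            total += 1
        current += timedelta(days=1)
    return total


print(count_working_days(date(2020, 4, 26), date(2020, 5, 9), overrides))  # 9
```

A plain weekday count over the same two weeks gives 10, so an average computed per working day would be off without the correction.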
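The approach is actually simple: open the browser's "Inspect" tool, switch to the Network tab, and find the corresponding request URL and headers.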
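After the request has been located in the Network tab, its query string can be rebuilt from named parameters instead of string concatenation. This is a minimal sketch using only the standard library. It reuses the endpoint and parameter values from the code above; the helper name `build_calendar_url` is hypothetical, and the endpoint may have changed since the article was written.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the article's code; it may no longer be valid.
BASE_URL = "https://sp0.baidu.com/8aQDcjqpAAV3otqbppnN2DJv/api.php"


def build_calendar_url(year, month):
    """Build the request URL for one month (hypothetical helper)."""
    params = {
        "query": f"{year}年{month}月",
        "resource_id": "39043",
        "oe": "gbk",
        "tn": "wisetpl",
    }
    # urlencode percent-encodes the Chinese characters as UTF-8 by default.
    return BASE_URL + "?" + urlencode(params)


print(build_calendar_url(2020, 5))
```

The resulting string can be passed to `requests.get(url, headers=headers)` in place of the hand-assembled URL.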
This article is shared from the WeChat official account PowerBI生命管理大师学谦.