我正在尝试按类别创建过去12个月(不包括当前月份)的数据摘要。我已经用以下代码总结了前3个月,但这样做12个月似乎很麻烦。我想知道是否有更高效和有效的方法来动态切片过去12个月的数据。df1是我使用SQL查询从DB连接加载的完整数据集。我使用.drop()来切出不需要的数据列,只留下计数。
import pandas as pd
import datetime
df1.Start_Date = pd.DatetimeIndex(df1.Start_Date)
today = datetime.date.today()
currentfirst = today.replace(day=1)
thirdMonth = currentfirst - pd.offsets.MonthBegin(3)
secondMonth = currentfirst - pd.offsets.MonthBegin(2)
firstMonth = currentfirst - pd.offsets.MonthBegin(1)
fst_label = firstMonth.strftime('%B')
snd_label = secondMonth.strftime('%B')
thd_label = thirdMonth.strftime('%B')
def monthly_vol(df, label, start_date, end_date):
"""Slices df1 into previous months and sums the volume of each change class."""
if start_date is not None:
df = df1[df1.Start_Date >= start_date]
if end_date is not None:
df = df[df.Start_Date < end_date]
df_count = df.groupby('Change Class').count().drop(['Start_Date', 'Risk Level', 'Change Coordinator', 'Change Coordinator Group'], axis=1)
return df_count
fst_month = monthly_vol(df1, fst_label, firstMonth, currentfirst)
snd_month = monthly_vol(df1, snd_label, secondMonth, firstMonth)
thd_month = monthly_vol(df1, thd_label, thirdMonth, secondMonth)
def month_merge(df1, df2, df3):
"""Merges monthly dataframes together."""
new_df = pd.merge(df1, df2, left_index=True, right_index=True).merge(df3, left_index=True, right_index=True)
new_df.columns = [fst_label, snd_label, thd_label]
print(new_df)
return new_df
monthly_vol = month_merge(fst_month, snd_month, thd_month)
这将给出输出:
May April March
Change Class
Emergency 36 36 32
Expedited 17 24 35
Normal 182 146 134
Standard 256 210 267
奖励问题:如果能得到同一数据帧中每个类别的总成交量的平均值,那就太好了。有点像这样:
May MayAVG April AprilAVG March MarchAVG
Change Class
Emergency 36 7.33 36 8.65 32 6.84
Expedited 17 3.46 24 5.77 35 7.48
Normal 182 37.07 146 35.10 134 28.63
Standard 256 52.14 10 50.48 267 57.05
任何帮助都将不胜感激!
发布于 2018-06-07 03:59:11
你为什么不试着用字典呢?字典是数据的键值对。例如:{"3":“三月”,"4":“四月”}。因此,无论您在何处维护一个对,您都可以使用字典。在循环中填充这些字典。见下文。
month_dict = {"3": "March", "2": "April", "1": "May"}
thirdMonth = currentfirst - pd.offsets.MonthBegin(3)
secondMonth = currentfirst - pd.offsets.MonthBegin(2)
firstMonth = currentfirst - pd.offsets.MonthBegin(1)
label_dict = {}
fst_label = firstMonth.strftime('%B')
snd_label = secondMonth.strftime('%B')
thd_label = thirdMonth.strftime('%B')
vol_month = {}
fst_month = monthly_vol(df1, fst_label, firstMonth, currentfirst)
snd_month = monthly_vol(df1, snd_label, secondMonth, firstMonth)
thd_month = monthly_vol(df1, thd_label, thirdMonth, secondMonth)
https://stackoverflow.com/questions/50728255
复制相似问题