使用python按类别汇总最近12个月的数据框中的数据

内容来源于 Stack Overflow,并遵循CC BY-SA 3.0许可协议进行翻译与使用

  • 回答 (1)
  • 关注 (0)
  • 查看 (237)

我正在尝试创建过去12个月(不包括当前月份)的分类数据摘要。我用以下代码总结了前3个月的情况,但在12个月内这样做似乎很麻烦。我想知道过去12个月是否有更高效和更有效的动态分片数据的方法。df1是我使用SQL查询从数据库连接加载的完整数据集。我使用.drop()来分割不需要的数据列,只留下计数。

import pandas as pd
import datetime

df1.Start_Date = pd.DatetimeIndex(df1.Start_Date)

today = datetime.date.today()
currentfirst = today.replace(day=1)
thirdMonth = currentfirst - pd.offsets.MonthBegin(3)
secondMonth = currentfirst - pd.offsets.MonthBegin(2)
firstMonth = currentfirst - pd.offsets.MonthBegin(1)

fst_label = firstMonth.strftime('%B')
snd_label = secondMonth.strftime('%B')
thd_label = thirdMonth.strftime('%B')

def monthly_vol(df, label, start_date, end_date):
    """Slices df1 into previous months and sums the volume of each change class."""
    if start_date is not None:
        df = df1[df1.Start_Date >= start_date]
    if end_date is not None:
        df = df[df.Start_Date < end_date]
    df_count = df.groupby('Change Class').count().drop(['Start_Date', 'Risk Level', 'Change Coordinator', 'Change Coordinator Group'], axis=1)
    return df_count

fst_month = monthly_vol(df1, fst_label, firstMonth, currentfirst)
snd_month = monthly_vol(df1, snd_label, secondMonth, firstMonth)
thd_month = monthly_vol(df1, thd_label, thirdMonth, secondMonth)

def month_merge(df1, df2, df3):
    """Merges monthly dataframes together."""
    new_df = pd.merge(df1, df2, left_index=True, right_index=True).merge(df3, left_index=True, right_index=True)
    new_df.columns = [fst_label, snd_label, thd_label]
    print(new_df)
    return new_df

monthly_vol = month_merge(fst_month, snd_month, thd_month)

这将给出输出:

              May  April  March
Change Class                   
Emergency      36     36     32
Expedited      17     24     35
Normal        182    146    134
Standard      256    210    267

奖金问题:在相同的数据框中获得每个类别的总体积的平均值是很好的。有点像这样:

              May  MayAVG  April  AprilAVG   March  MarchAVG
Change Class                   
Emergency      36   7.33   36     8.65       32     6.84
Expedited      17   3.46   24     5.77       35     7.48
Normal        182   37.07  146    35.10      134    28.63
Standard      256   52.14  10     50.48      267    57.05
提问于
用户回答回答于
month_dict = {"3": "March", "2": "April", "1": "May"} 

thirdMonth = currentfirst - pd.offsets.MonthBegin(3)
secondMonth = currentfirst - pd.offsets.MonthBegin(2)
firstMonth = currentfirst - pd.offsets.MonthBegin(1)


label_dict = {}

fst_label = firstMonth.strftime('%B')
snd_label = secondMonth.strftime('%B')
thd_label = thirdMonth.strftime('%B')

vol_month = {}

fst_month = monthly_vol(df1, fst_label, firstMonth, currentfirst)
snd_month = monthly_vol(df1, snd_label, secondMonth, firstMonth)
thd_month = monthly_vol(df1, thd_label, thirdMonth, secondMonth)

扫码关注云+社区

领取腾讯云代金券