首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >熊猫数据框架计算3年来相同月份的平均分等,并附加在同一数据框架中。

熊猫数据框架计算3年来相同月份的平均分等,并附加在同一数据框架中。
EN

Stack Overflow用户
提问于 2020-03-05 09:14:53
回答 2查看 64关注 0票数 0

我有一个这样的熊猫数据框架:

代码语言:javascript
运行
复制
Account Id  Gross Sum   Invoice Type Name      Net Sum Company     Security         Supplier Date Completed YearMonth   Category
710830      282.81      Invoice                282.81              asd5a            Abc      1/1/2018       2018-1      Postal
445800      4868.71     Invoice                3926.4              adc6ac           Def      1/1/2018       2018-1      R&D
710350      282.81      Invoice                282.81              fgn6             Ghi      2/9/2018       2018-2      Other
710510      282.81      Invoice                282.81              dg               jkl      2/9/2018       2018-2      Electricity
710630      841.59      Invoice                707.07              dfvbfbf          mno      3/2/2018       2018-3      Repairs
710610      841.59      Invoice                707.07              rrcv             pqr      3/2/2018       2018-3      Leasing
710810      12.14       Invoice                10.12               btbfd            stu      1/1/2019       2019-1      Telephone
704300      81517.6     Invoice                65740               dfbtt            vwx      1/1/2019       2019-1      Statutory
710510      2105.64     Invoice                1776.53             dfdftb5          dfb      2/9/2019       2019-2      Electricity
710510      2105.64     Invoice                1776.53             ebdfb5b          bcd      2/9/2019       2019-2      Electricity
710920      66.96       Invoice                54                  dfrrt65          efg      3/2/2019       2019-3      Data
700330      239.47      Invoice                239.47              aae3a11          hij      3/2/2019       2019-3      Coffee

我想要的是在数据帧的底部添加行,该数据框架计算了连续3年的同月平均值。

例如:对于year month 2020-1,计算应该是for 2020-1 = sum(Net Sum Company) In 2019-1 + sum(Net Sum Company) in 2018-1 + sum(Net Sum Company) In 2017-1 divided by the number of months considered i.3,所以只需要考虑最后三年。这样,我将得到平均值,并在底部添加与新行相同的内容,其中只有年、月和公司净收入栏的平均值。最终目标是获得这样的数据框架:

代码语言:javascript
运行
复制
    Account Id  Gross Sum   Invoice Type Name      Net Sum Company     Security         Supplier Date Completed YearMonth   Category
710830      282.81      Invoice                282.81              asd5a            Abc      1/1/2018       2018-1      Postal
445800      4868.71     Invoice                3926.4              adc6ac           Def      1/1/2018       2018-1      R&D
710350      282.81      Invoice                282.81              fgn6             Ghi      2/9/2018       2018-2      Other
710510      282.81      Invoice                282.81              dg               jkl      2/9/2018       2018-2      Electricity
710630      841.59      Invoice                707.07              dfvbfbf          mno      3/2/2018       2018-3      Repairs
710610      841.59      Invoice                707.07              rrcv             pqr      3/2/2018       2018-3      Leasing
710810      12.14       Invoice                10.12               btbfd            stu      1/1/2019       2019-1      Telephone
704300      81517.6     Invoice                65740               dfbtt            vwx      1/1/2019       2019-1      Statutory
710510      2105.64     Invoice                1776.53             dfdftb5          dfb      2/9/2019       2019-2      Electricity
710510      2105.64     Invoice                1776.53             ebdfb5b          bcd      2/9/2019       2019-2      Electricity
710920      66.96       Invoice                54                  dfrrt65          efg      3/2/2019       2019-3      Data
700330      239.47      Invoice                239.47              aae3a11          hij      3/2/2019       2019-3      Coffee
-              -           -                   34979.66            -                -        -              2020-1      -
-              -           -                   2059.34             -                -        -              2020-2      -
-              -           -                   853.805             -                -        -              2020-3      -

我是新来的熊猫,所以任何指导都是非常感谢的。这必须严格使用熊猫。

EN

回答 2

Stack Overflow用户

回答已采纳

发布于 2020-03-05 10:57:56

IIUC,你想:

  • 查找每月数据中的数据
  • 和前3年的Net Sum Company
  • 除以每个月的月数(示例中的2个月),以获得一个月平均
  • ,并将这些平均值添加到数据中,包括新年和YearMonthH 210f 211

代码可以是:

代码语言:javascript
运行
复制
# extract Year and Month Series from the dataframe
year = df['YearMonth'].str.slice(stop=4).astype(int)
month = df['YearMonth'].str.slice(start=5)

# compute the new year per month as max(year) + 1
newyear_month = year.groupby(month).max() + 1

# build a Series aligned with the dataframe from that new year
newyear = pd.DataFrame(month).merge(
    pd.DataFrame(newyear_month),
    left_on='YearMonth', right_index=True, suffixes=('_x', '')
    )['YearMonth'].sort_index()

# compute the sum of relevant years per month
tmp = df.loc[(newyear-3 <= year) & (year <= newyear-1), 'Net Sum Company'
             ].groupby(month).sum()

# divide by the number of distinct month per sum
tmp /= df.groupby(month)['YearMonth'].nunique()

# compute a YearMonth column for that new dataframe
tmp = pd.concat([newyear_month.astype(str), tmp], axis=1)
tmp['YearMonth'] = tmp['YearMonth'] + '-' + tmp.index  # tmp is indexed by month

# force the type of Account Id to object to allow it to contain null values
df['Account Id'] = df['Account Id'].astype(object)

# concat the new rows to the dataframe and reset the index
new_df = df.append(tmp, sort=False).reset_index(drop=True)

在您的示例中,new_df给出:

代码语言:javascript
运行
复制
   Account Id  Gross Sum Invoice Type Name  Net Sum Company Security Supplier Date Completed YearMonth     Category
0      710830     282.81           Invoice          282.810    asd5a      Abc       1/1/2018    2018-1       Postal
1      445800    4868.71           Invoice         3926.400   adc6ac      Def       1/1/2018    2018-1          R&D
2      710350     282.81           Invoice          282.810     fgn6      Ghi       2/9/2018    2018-2        Other
3      710510     282.81           Invoice          282.810       dg      jkl       2/9/2018    2018-2  Electricity
4      710630     841.59           Invoice          707.070  dfvbfbf      mno       3/2/2018    2018-3      Repairs
5      710610     841.59           Invoice          707.070     rrcv      pqr       3/2/2018    2018-3      Leasing
6      710810      12.14           Invoice           10.120    btbfd      stu       1/1/2019    2019-1    Telephone
7      704300   81517.60           Invoice        65740.000    dfbtt      vwx       1/1/2019    2019-1    Statutory
8      710510    2105.64           Invoice         1776.530  dfdftb5      dfb       2/9/2019    2019-2  Electricity
9      710510    2105.64           Invoice         1776.530  ebdfb5b      bcd       2/9/2019    2019-2  Electricity
10     710920      66.96           Invoice           54.000  dfrrt65      efg       3/2/2019    2019-3         Data
11     700330     239.47           Invoice          239.470  aae3a11      hij       3/2/2019    2019-3       Coffee
12        NaN        NaN               NaN        34979.665      NaN      NaN            NaN    2020-1          NaN
13        NaN        NaN               NaN         2059.340      NaN      NaN            NaN    2020-2          NaN
14        NaN        NaN               NaN          853.805      NaN      NaN            NaN    2020-3          NaN

备注:

每月查找新年允许在滚动年份(例如2017年7月至2019年6月)使用代码,可以用new_df = new_df.fillna('')替换为空字符串(或其他什么)的

票数 1
EN

Stack Overflow用户

发布于 2020-03-05 09:46:25

对于一个简单的3y滚动平均值,可以这样做:

代码语言:javascript
运行
复制
df1['Date Completed'] = pd.to_datetime(df1['Date Completed'])
df1['roll_3y_avg'] = df1.rolling(window='1096D', on='Date Completed', closed='right')['Net Sum Company'].mean()
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/60541755

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档