I have a list of customers, dates, and scores:
import datetime as dt
import pandas as pd

# Build the frame from a plain list of rows so pandas infers proper dtypes
# (wrapping mixed-type rows in np.array makes every column object dtype).
data = pd.DataFrame(
    [
        ["A", dt.datetime(2017, 12, 10), 10.0],
        ["A", dt.datetime(2018, 1, 10), 10.0],
        ["A", dt.datetime(2018, 1, 15), 11.0],
        ["A", dt.datetime(2018, 1, 16), 12.0],
        ["A", dt.datetime(2018, 1, 16), 13.0],
        ["B", dt.datetime(2018, 1, 16), 10.0],
        ["A", dt.datetime(2018, 3, 1), 10.0],
    ],
    # Only three columns exist in the input; Result is the column to be computed.
    columns=["Customer", "Date", "Score"],
)
Customer Date Score
0 A 2017-12-10 00:00:00 10
1 A 2018-01-10 00:00:00 10
2 A 2018-01-15 00:00:00 11
3 A 2018-01-16 00:00:00 12
4 A 2018-01-16 00:00:00 13
5 B 2018-01-16 00:00:00 10
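6 A 2018-03-01 00:00:00 10
For each customer, I want to compute the average score over the past 14 days (including the current day). The result should look like this: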
Customer Date Score Result
0 A 2017-12-10 00:00:00 10 10
1 A 2018-01-10 00:00:00 10 10
2 A 2018-01-15 00:00:00 11 10.5
3 A 2018-01-16 00:00:00 12 11.5
4 A 2018-01-16 00:00:00 13 11.5
5 B 2018-01-16 00:00:00 10 10
6 A 2018-03-01 00:00:00 10 10
Thanks!!
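Posted on 2020-08-31 21:59:29
Use DataFrame.groupby on Customer and compute the rolling mean of Score with a window size of 14 days. Rows that share the same customer and date each get their own rolling value, so keep only the last one per (Customer, Date) pair, which covers all scores up to and including that day. Then use DataFrame.merge to merge this rolling average back into the dataframe data: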
avg = data.set_index('Date').groupby('Customer').rolling('14d')['Score'].mean()
avg = avg[~avg.index.duplicated(keep='last')]
df = data.merge(avg.rename('Result'), left_on=['Customer', 'Date'], right_index=True)
Result:
print(df)
Customer Date Score Result
0 A 2017-12-10 10 10.0
1 A 2018-01-10 10 10.0
2 A 2018-01-15 11 10.5
3 A 2018-01-16 12 11.5
4 A 2018-01-16 13 11.5
5 B 2018-01-16 10 10.0
6 A 2018-03-01 10 10.0
https://stackoverflow.com/questions/63672106
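As a cross-check on the rolling approach above, the following sketch computes the same trailing mean directly from its definition: for each row, average every Score of the same customer whose Date falls in the window (Date - 14 days, Date]. The helper name `trailing_mean` and its `days` parameter are hypothetical, introduced only for illustration, and the window bounds are an assumption chosen to match the default right-closed behaviour of a time-based rolling window.

```python
import datetime as dt
import pandas as pd

data = pd.DataFrame(
    [
        ["A", dt.datetime(2017, 12, 10), 10.0],
        ["A", dt.datetime(2018, 1, 10), 10.0],
        ["A", dt.datetime(2018, 1, 15), 11.0],
        ["A", dt.datetime(2018, 1, 16), 12.0],
        ["A", dt.datetime(2018, 1, 16), 13.0],
        ["B", dt.datetime(2018, 1, 16), 10.0],
        ["A", dt.datetime(2018, 3, 1), 10.0],
    ],
    columns=["Customer", "Date", "Score"],
)


def trailing_mean(row, frame, days=14):
    """Hypothetical helper: mean Score of row's customer over (Date - days, Date]."""
    same_customer = frame["Customer"] == row["Customer"]
    window_start = row["Date"] - pd.Timedelta(days=days)
    # Left-open, right-closed window, so every row on the current day is included.
    in_window = (frame["Date"] > window_start) & (frame["Date"] <= row["Date"])
    return frame.loc[same_customer & in_window, "Score"].mean()


# Evaluate the definition row by row; extra keyword arguments are passed to the helper.
data["Result"] = data.apply(trailing_mean, axis=1, frame=data)
print(data)
```

This version scans the whole frame once per row, so it is quadratic in the number of rows and only suitable for small inputs or for validating the groupby/rolling result. Unlike the rolling approach, it needs no de-duplication step, because both rows dated 2018-01-16 for customer A see the same window.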