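We have data representing users' penalty counts over time. The counts only ever increase, and some values are NaN. Here is a subset of the data: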
import pandas as pd
import numpy as np
d = {'day':['Monday','Monday','Monday','Tuesday','Tuesday','Tuesday','Wednesday','Thursday','Thursday','Friday'],
'user_id': [1, 4,2,4,4,2,2,1,2,1], 'penalties_count': [1, 3,2,np.nan,4,2,np.nan,2,3,3]}
df = pd.DataFrame(data=d)
display(df)
day user_id penalties_count
0 Monday 1 1.0
1 Monday 4 3.0
2 Monday 2 2.0
3 Tuesday 4 NaN
4 Tuesday 4 4.0
5 Tuesday 2 2.0
6 Wednesday 2 NaN
7 Thursday 1 2.0
8 Thursday 2 3.0
9 Friday 1 3.0
The goal is to fill each NaN cell with the previous value, but only the previous value for the same user_id. The result should be:
day user_id penalties_count
0 Monday 1 1.0
1 Monday 4 3.0
2 Monday 2 2.0
3 Tuesday 4 3.0
4 Tuesday 4 4.0
5 Tuesday 2 2.0
6 Wednesday 2 2.0
7 Thursday 1 2.0
8 Thursday 2 3.0
9 Friday 1 3.0
But when I use
df.fillna(method='bfill')
the result in the fourth row (index 3), for user_id=4, is incorrect (we should see 3 here, not 4):
day user_id penalties_count
0 Monday 1 1.0
1 Monday 4 3.0
2 Monday 2 2.0
3 Tuesday 4 4.0
4 Tuesday 4 4.0
5 Tuesday 2 2.0
6 Wednesday 2 2.0
7 Thursday 1 2.0
8 Thursday 2 3.0
9 Friday 1 3.0
How can I fix this?
Posted on 2022-06-05 09:51:54
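If you want to fill NA values per group, you need to call groupby before filling. Also, it looks like you need ffill rather than bfill: the value should come from the user's previous row, not the next one. Something like:
df.groupby("user_id")["penalties_count"].ffill()
This returns a Series aligned with the original index, so assign it back to the penalties_count column to update the DataFrame.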
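To make the suggestion concrete, here is a minimal sketch that rebuilds the sample DataFrame from the question and applies a grouped forward fill. It assumes the rows are already in chronological order, since ffill relies on row order within each group; if they were not, the frame would need to be sorted first.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "day": ["Monday", "Monday", "Monday", "Tuesday", "Tuesday",
            "Tuesday", "Wednesday", "Thursday", "Thursday", "Friday"],
    "user_id": [1, 4, 2, 4, 4, 2, 2, 1, 2, 1],
    "penalties_count": [1, 3, 2, np.nan, 4, 2, np.nan, 2, 3, 3],
})

# Forward-fill within each user_id: every NaN takes the most recent
# non-NaN value from an earlier row of the same user.
# The result keeps the original index, so it can be assigned straight back.
df["penalties_count"] = df.groupby("user_id")["penalties_count"].ffill()

print(df)
```

With this data, row 3 (user_id=4) is filled with 3.0 from row 1, and row 6 (user_id=2) is filled with 2.0 from row 5, which matches the expected output in the question. A NaN in a user's first row would stay NaN, because there is no earlier value to carry forward.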
https://stackoverflow.com/questions/72506214