I have a DataFrame like this:
import numpy as np
import pandas as pd

index = [0, 1, 2, 3, 4, 5]
s = pd.Series([1, 1, 1, 2, 2, 2], index=index)
t = pd.Series([2007, 2008, 2011, 2006, 2007, 2009], index=index)
f = pd.Series([2, 4, 6, 8, 10, 12], index=index)
pp = pd.DataFrame(np.c_[s, t, f], columns=["group", "year", "amount"])
pp
   group  year  amount
0      1  2007       2
1      1  2008       4
2      1  2011       6
3      2  2006       8
4      2  2007      10
5      2  2009      12
I want to add rows for the years that are missing within each group. The DataFrame I want looks like this:
   group  year  amount
0    1.0  2007     2.0
1    1.0  2008     4.0
2    1.0  2009     NaN
3    1.0  2010     NaN
4    1.0  2011     6.0
5    2.0  2006     8.0
6    2.0  2007    10.0
7    2.0  2008     NaN
8    2.0  2009    12.0
Is there a way to do this that works for a large DataFrame?
Posted on 2018-07-24 04:04:13
First, convert year to datetime (df here is the question's pp):
df.year = pd.to_datetime(df.year, format='%Y')
Then use set_index with groupby and resample (on pandas 2.2 and later, the 'Y' alias is deprecated in favor of 'YE'):
df.set_index('year').groupby('group').amount.resample('Y').mean().reset_index()
   group       year  amount
0      1 2007-12-31     2.0
1      1 2008-12-31     4.0
2      1 2009-12-31     NaN
3      1 2010-12-31     NaN
4      1 2011-12-31     6.0
5      2 2006-12-31     8.0
6      2 2007-12-31    10.0
7      2 2008-12-31     NaN
8      2 2009-12-31    12.0
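If plain integer years are preferable to the year-end timestamps that resample produces, one possible alternative is to build every (group, year) pair between each group's first and last year and left-merge the original rows onto it. This is only a sketch and not part of the answer above: the names bounds, full, and filled are made up for illustration, and the sample frame is rebuilt from a dict instead of np.c_.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, rebuilt from a dict (an assumption for brevity).
pp = pd.DataFrame(
    {
        "group": [1, 1, 1, 2, 2, 2],
        "year": [2007, 2008, 2011, 2006, 2007, 2009],
        "amount": [2, 4, 6, 8, 10, 12],
    }
)

# First and last year observed in each group.
bounds = pp.groupby("group")["year"].agg(["min", "max"])

# Number of years each group should span, inclusive of both ends.
counts = (bounds["max"] - bounds["min"] + 1).to_numpy()

# Repeat each group label once per year, and list the years themselves.
groups = np.repeat(bounds.index.to_numpy(), counts)
years = np.concatenate(
    [np.arange(lo, hi + 1) for lo, hi in zip(bounds["min"], bounds["max"])]
)

# Scaffold holding every (group, year) pair that should exist.
full = pd.DataFrame({"group": groups, "year": years})

# A left merge keeps every scaffold row; years absent from pp get NaN amounts.
filled = full.merge(pp, on=["group", "year"], how="left")
print(filled)
```

The merge leaves amount as float because of the NaN values, while group and year stay integers. The list comprehension loops once per group rather than once per row, so the per-row work is left to NumPy and the merge.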
https://stackoverflow.com/questions/51486235