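I'm hoping someone can help me find the best way to arrange my DataFrame for a scatter plot. The scatter plot should have the year on the x-axis and the percentage of foreign players on the y-axis. The DataFrame has about 400 rows; for convenience I'll only show the head, with fewer values.
I started with a DataFrame taken from a larger DataFrame: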
df1 = df.head(5).loc[:, ['Year', 'Nationality', 'Foreign Player']]
   Year    Nationality  Foreign Player
0  2016  United States           False
1  2016  United States           False
2  2016  United States           False
3  2016  United States           False
4  2016  United States           False
I then grouped by Year and Foreign Player, which gives a MultiIndex DataFrame:
df2 = df.groupby(['Year','Foreign Player']).count()[['Player']].head(6)
                     Player
Year Foreign Player
2000 False               26
     True                 2
2001 False               21
     True                 5
2002 False               20
     True                 5
I reset the index to get a DataFrame with a single index:
df3 = df2.reset_index(level = [0,1]).head(6)
   Year  Foreign Player  Player
0  2000           False      26
1  2000            True       2
2  2001           False      21
3  2001            True       5
4  2002           False      20
As you can see, True and False alternate, with the corresponding counts in a separate column. I would like to do something like this:
df3['Percent Foreign'] = df3[['Foreign Player']= False] / (df3[['Foreign Player']= False ] + df3[['Foreign Player']= True)
Obviously this doesn't work. My goal is a new DataFrame:
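   Year  Percent Foreign
0  2000               15
1  2001               12
2  2002                5
3  2003               22
4  2004               17
For example, I could then plot it with Matplotlib using x = Year and y = Percent Foreign. In any case, if there is a simpler way to get to this plot from an earlier step, that would be even better.
Thanks again!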
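For reference, the final plotting step described above could be sketched roughly as follows. This assumes the target DataFrame already exists with the two columns shown in the table; the name df_target and the choice of the headless Agg backend are illustrative assumptions, not part of the original post.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Values copied from the target table above; df_target is a hypothetical name.
df_target = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003, 2004],
    "Percent Foreign": [15, 12, 5, 22, 17],
})

fig, ax = plt.subplots()
# One point per year: x = Year, y = Percent Foreign.
ax.scatter(df_target["Year"], df_target["Percent Foreign"])
ax.set_xlabel("Year")
ax.set_ylabel("Percent Foreign")
# Put ticks on whole years so the axis does not show fractional values.
ax.set_xticks(df_target["Year"].tolist())
# Call fig.savefig(...) or plt.show() afterwards, depending on the environment.
```

The remaining work is only to build a frame with that shape from the grouped counts.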
Answered 2021-03-01 13:58:06
To select the False values, invert the mask with ~, convert Year to the index, and divide by the per-year sum:
print (df3)
   Year  Foreign Player  Player
0  2000           False      26
1  2000            True       2
2  2001           False      21
3  2001            True       5
4  2002           False      20
4  2002            True      10
df4 = (df3[~df3['Foreign Player']].set_index('Year')['Player'] /
       df3.groupby('Year')['Player'].sum()).mul(100).reset_index(name='Percent Foreign')
print (df4)
   Year  Percent Foreign
0  2000        92.857143
1  2001        80.769231
2  2002        66.666667
Another idea is to reshape the grouped counts with unstack instead of reset_index:
df22 = df.groupby(['Year','Foreign Player'])['Player'].count().unstack()
print (df22)
Foreign Player  False  True
Year
2000               26     2
2001               21     5
2002               20    10
Then divide the False column by the sum of both columns:
df4 = (df22[False] / df22.sum(axis=1)).mul(100).reset_index(name='Percent Foreign')
print (df4)
   Year  Percent Foreign
0  2000        92.857143
1  2001        80.769231
2  2002        66.666667
For the percentage of True values (the foreign players):
df5 = (df22[True] / df22.sum(axis=1)).mul(100).reset_index(name='Percent Foreign')
Answered 2021-03-01 14:16:06
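To get the proportion of foreign players, one idea is to aggregate two new columns, one counting all players and one counting foreign players, and then add a third column that divides one by the other.
Example with a simplified DataFrame: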
import pandas as pd
df = pd.DataFrame(
    {'Year': [2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011],
     'Foreign Player': [True, True, False, True, False, False, True, True]}
)
   Year  Foreign Player
0  2010            True
1  2010            True
2  2010           False
3  2010            True
4  2011           False
5  2011           False
6  2011            True
7  2011            True
Count the rows and the number of foreign players per year:
df_agg = df.groupby('Year')['Foreign Player'].agg(['count', 'sum'])
Compute the ratio:
df_agg['ratio'] = df_agg['sum']/df_agg['count']
df_agg
      count  sum  ratio
Year
2010      4    3   0.75
2011      4    2   0.50
https://stackoverflow.com/questions/66417281
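As a possible shortcut for the count / sum / ratio steps in the last answer: the mean of a boolean column is the fraction of True values, so the ratio can be computed in one aggregation. The sketch below assumes the same simplified frame as in that answer; the variable name pct and the multiplication by 100 (to match the percentage scale the question asks for) are additions for illustration, not something taken from the answers.

```python
import pandas as pd

# Same simplified frame as in the example above.
df = pd.DataFrame(
    {'Year': [2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011],
     'Foreign Player': [True, True, False, True, False, False, True, True]}
)

# Mean of a boolean column per group = (number of True) / (number of rows),
# which is the same value as sum / count in the aggregated frame.
pct = (
    df.groupby('Year')['Foreign Player']
      .mean()
      .mul(100)                               # fraction -> percent
      .reset_index(name='Percent Foreign')    # back to a flat two-column frame
)
print(pct)
```

The resulting frame has the Year and Percent Foreign columns the question describes, so it could be passed to a scatter plot with x = Year and y = Percent Foreign.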