I have a taxi dataset whose columns look like this:
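Neighborhood    Borough        Time
Midtown         Manhattan      X
Melrose         Bronx          Y
Grant City      Staten Island  Z
Midtown         Manhattan      A
Lincoln Square  Manhattan      B

Basically, each row represents one taxi pickup in that neighborhood of that borough. Now I want to find the top 5 neighborhoods in each borough with the most pickups. I tried this: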
df['Neighborhood'].groupby(df['Borough']).value_counts()

This gives me something like this:
borough
Bronx          High Bridge          3424
               Mott Haven           2515
               Concourse Village    1443
               Port Morris          1153
               Melrose               492
               North Riverdale       463
               Eastchester           434
               Concourse             395
               Fordham               252
               Wakefield             214
               Kingsbridge           212
               Mount Hope            200
               Parkchester           191
......
Staten Island  Castleton Corners       4
               Dongan Hills            4
               Eltingville             4
               Graniteville            4
               Great Kills             4
               Castleton               3
               Woodrow                 1

How can I filter this so that I get only the top 5 from each borough? I know there are a few questions with similar titles, but they did not help in my case.
Posted on 2016-02-12 14:18:06
I think you can use nlargest. The example below uses 1; change it to 5 for your data.
s = df['Neighborhood'].groupby(df['Borough']).value_counts()
print(s)
Borough
Bronx          Melrose            7
Manhattan      Midtown           12
               Lincoln Square     2
Staten Island  Grant City        11
dtype: int64
print(s.groupby(level=0).nlargest(1))
Bronx          Bronx          Melrose        7
Manhattan      Manhattan      Midtown       12
Staten Island  Staten Island  Grant City    11
dtype: int64

Note that an additional index level is created, carrying the group (level) information.
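
For anyone who wants to try the per-borough nlargest idea end to end, here is a minimal sketch. The sample data and counts are made up for illustration, and passing group_keys=False is my own addition (not part of the answer above) so that the borough is not repeated as an extra index level.

```python
import pandas as pd

# Hypothetical sample data: one row per pickup (counts are invented).
pickups = {
    ("Manhattan", "Midtown"): 3,
    ("Manhattan", "Chelsea"): 2,
    ("Manhattan", "Harlem"): 1,
    ("Bronx", "Melrose"): 3,
    ("Bronx", "Fordham"): 2,
    ("Bronx", "Concourse"): 1,
}
rows = [(b, n) for (b, n), k in pickups.items() for _ in range(k)]
df = pd.DataFrame(rows, columns=["Borough", "Neighborhood"])

# Pickups per (Borough, Neighborhood), as in the question.
counts = df.groupby("Borough")["Neighborhood"].value_counts()

# Top 2 neighborhoods within each borough. group_keys=False keeps the
# original two-level index instead of prepending Borough a second time.
top2 = counts.groupby(level=0, group_keys=False).nlargest(2)
print(top2)
```

Replace 2 with 5 for the real data.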
Posted on 2016-02-12 16:56:27
You can do this in one line by slightly extending your original groupby:
>>> df.groupby(['Borough', 'Neighborhood']).Neighborhood.value_counts().nlargest(5)
Borough        Neighborhood    Neighborhood
Bronx          Melrose         Melrose           1
Manhattan      Midtown         Midtown           1
Manhatten      Lincoln Square  Lincoln Square    1
               Midtown         Midtown           1
Staten Island  Grant City      Grant City        1
dtype: int64

Posted on 2021-05-27 08:39:17
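Solution: get the top n from each group.
df.groupby(['Borough']).Neighborhood.value_counts().groupby(level=0, group_keys=False).head(5)

The .value_counts().nlargest(5) in the other answers only gives you the top 5 of a single group, which doesn't make sense to me either.
group_keys=False is passed to avoid a duplicated index level.
value_counts() has already sorted each group, so head(5) is all you need.

https://stackoverflow.com/questions/35364601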
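
To make the difference between the two approaches concrete, here is a small sketch under made-up data. The helper name top_n_per_group is hypothetical (it is not a pandas function), and the counts are invented so that one borough dominates the overall ranking.

```python
import pandas as pd


def top_n_per_group(df, group_col, item_col, n):
    """Hypothetical helper: count items per group, keep the n largest in each."""
    counts = df.groupby(group_col)[item_col].value_counts()
    # value_counts() sorts each group in descending order by default,
    # so the first n rows of each group are its n largest counts.
    return counts.groupby(level=0).head(n)


# Invented sample data: one row per pickup.
pickups = {
    ("Manhattan", "Midtown"): 5,
    ("Manhattan", "Chelsea"): 4,
    ("Manhattan", "Harlem"): 1,
    ("Bronx", "Melrose"): 3,
    ("Bronx", "Fordham"): 2,
    ("Bronx", "Concourse"): 1,
}
rows = [(b, n) for (b, n), k in pickups.items() for _ in range(k)]
df = pd.DataFrame(rows, columns=["Borough", "Neighborhood"])

# Per-group top 2: every borough contributes its own two busiest neighborhoods.
per_group = top_n_per_group(df, "Borough", "Neighborhood", 2)

# Global nlargest(2): only the two largest counts overall, regardless of borough.
overall = df.groupby("Borough")["Neighborhood"].value_counts().nlargest(2)

print(per_group)
print(overall)
```

With this data the global version returns only Manhattan rows, while the per-group version keeps two rows for each borough.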