
I think you can use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.SeriesGroupBy.nlargest.html" rel="noreferrer"><code>nlargest</code></a> - you can change <code>1</code> to <code>5</code>:

<pre><code>s = df['Neighborhood'].groupby(df['Borough']).value_counts()
print(s)
Borough
Bronx          Melrose            7
Manhattan      Midtown           12
               Lincoln Square     2
Staten Island  Grant City        11
dtype: int64

# group on the Borough level only, so nlargest picks the top rows per borough
print(s.groupby(level=0).nlargest(1))
Bronx          Bronx          Melrose        7
Manhattan      Manhattan      Midtown       12
Staten Island  Staten Island  Grant City    11
dtype: int64
</code></pre>

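Note that the grouped result repeats the <code>Borough</code> level in the index: the group key is added as an extra index column in front of the existing levels.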
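A minimal sketch of the same idea, assuming the counts shown above are rebuilt by hand as a Series (the data and the names <code>s</code> and <code>top1</code> are illustrative, not from the asker's dataset). Passing <code>group_keys=False</code> should keep the group key from being prepended, so the <code>Borough</code> level is not repeated:

```python
import pandas as pd

# Hand-built stand-in for the value_counts() result shown above (illustrative data).
s = pd.Series(
    [7, 12, 2, 11],
    index=pd.MultiIndex.from_tuples(
        [
            ("Bronx", "Melrose"),
            ("Manhattan", "Midtown"),
            ("Manhattan", "Lincoln Square"),
            ("Staten Island", "Grant City"),
        ],
        names=["Borough", "Neighborhood"],
    ),
)

# Largest count per borough; group_keys=False avoids repeating the Borough level.
top1 = s.groupby(level="Borough", group_keys=False).nlargest(1)
print(top1)
```

Changing <code>nlargest(1)</code> to <code>nlargest(5)</code> keeps up to five neighborhoods per borough.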


You can do this in a single line by slightly extending your original groupby with 'nlargest':

<pre><code>&gt;&gt;&gt; df.groupby(['Borough', 'Neighborhood']).Neighborhood.value_counts().nlargest(5)
Borough        Neighborhood    Neighborhood
Bronx          Melrose         Melrose           1
Manhattan      Midtown         Midtown           1
Manhatten      Lincoln Square  Lincoln Square    1
               Midtown         Midtown           1
Staten Island  Grant City      Grant City        1
dtype: int64
</code></pre>


<pre><code>df['Neighborhood'].groupby(df['Borough']).value_counts().head(5)
</code></pre>
<code>head(5)</code> returns the first 5 rows of the result as a whole, not the first 5 rows of each borough.
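To see the difference between taking the first rows overall and the first rows per borough, here is a small sketch using the sample rows from the question (the variable names <code>first_rows</code> and <code>per_borough</code> are illustrative):

```python
import pandas as pd

# Sample rows from the question (the Time column is omitted).
df = pd.DataFrame({
    "Neighborhood": ["Midtown", "Melrose", "Grant City", "Midtown", "Lincoln Square"],
    "Borough": ["Manhattan", "Bronx", "Staten Island", "Manhattan", "Manhattan"],
})

counts = df["Neighborhood"].groupby(df["Borough"]).value_counts()

# First 2 rows of the whole result: boroughs later in the index are cut off.
first_rows = counts.head(2)

# First 2 rows within each borough: every borough is represented.
per_borough = counts.groupby(level=0).head(2)

print(first_rows)
print(per_borough)
```

With this data, <code>first_rows</code> never reaches Staten Island, while <code>per_borough</code> keeps rows for all three boroughs.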


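You can also try the code below to get only the top 10 values. Here <code>'country_code'</code> and <code>'raised_amount_usd'</code> are column names.
<pre><code>groupby_country_code = master_frame.groupby('country_code')
# sum per country, then sort in descending order so the largest totals come first
arr = groupby_country_code['raised_amount_usd'].sum().sort_values(ascending=False)[0:10]
print(arr)
</code></pre>
<code>[0:10]</code> slices positions 0 through 9, i.e. the first 10 entries. Adjust the slice to keep as many as you need.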
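As a hedged alternative, the per-group totals can be ranked with <code>Series.nlargest</code>, which returns the largest values in descending order without a separate sort. The frame below is made-up illustrative data; only the column names come from the answer:

```python
import pandas as pd

# Made-up funding data; only the column names come from the answer above.
master_frame = pd.DataFrame({
    "country_code": ["USA", "USA", "GBR", "IND", "IND", "CAN"],
    "raised_amount_usd": [100.0, 50.0, 70.0, 20.0, 30.0, 10.0],
})

# Total raised per country.
totals = master_frame.groupby("country_code")["raised_amount_usd"].sum()

# Keep the 2 largest totals (use 10 for a top 10).
top = totals.nlargest(2)
print(top)
```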


Try this one (just change the number in head() to your choice):
<pre><code># top 3 : total counts of 'Neighborhood' in each Borough
Z = df.groupby('Borough')['Neighborhood'].value_counts().groupby(level=0).head(3).sort_values(ascending=False).to_frame('counts').reset_index()

Z
</code></pre>


<h2>Solution: get the top n from every group</h2>
<pre><code>df.groupby(['Borough']).Neighborhood.value_counts().groupby(level=0, group_keys=False).head(5)
</code></pre>
<ol>
<li><code>.value_counts().nlargest(5)</code> in the other answers only gives you a single top 5 overall, not one per group, which doesn't make sense to me either.</li>
<li><code>group_keys=False</code> avoids a duplicated index level.</li>
<li>Because <code>value_counts()</code> already sorts each group in descending order, <code>head(5)</code> is all you need.</li>
</ol>
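The steps above can be wrapped in a small helper that returns a flat table. This is only a sketch: the function name <code>top_n_per_group</code>, the <code>pickups</code> column name, and the sample rows are assumptions for illustration, not part of the original answer.

```python
import pandas as pd


def top_n_per_group(df, group_col, value_col, n, count_name="pickups"):
    """Return the n most frequent value_col entries within each group_col."""
    # value_counts() sorts each group in descending order of frequency.
    counts = df.groupby(group_col)[value_col].value_counts()
    # head(n) per group keeps the first n rows of each already-sorted group.
    top = counts.groupby(level=0, group_keys=False).head(n)
    # Rename the counts and flatten the MultiIndex into ordinary columns.
    return top.rename(count_name).reset_index()


# Illustrative pickups: 6 in Manhattan, 3 in the Bronx.
df = pd.DataFrame({
    "Neighborhood": ["Midtown"] * 3 + ["Lincoln Square"] * 2 + ["Chelsea"]
                    + ["Melrose"] * 2 + ["Fordham"],
    "Borough": ["Manhattan"] * 6 + ["Bronx"] * 3,
})

result = top_n_per_group(df, "Borough", "Neighborhood", n=2)
print(result)
```

Here Chelsea is dropped because it ranks third in Manhattan, while both Bronx neighborhoods are kept.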

I have a dataframe of taxi data with two columns that looks like this:

<pre><code>Neighborhood    Borough        Time
Midtown         Manhattan      X
Melrose         Bronx          Y
Grant City      Staten Island  Z
Midtown         Manhattan      A
Lincoln Square  Manhattan      B
</code></pre>

Basically, each row represents a taxi pickup in that neighborhood in that borough. Now, I want to find the 5 neighborhoods in each borough with the most pickups. I tried this:

<pre><code>df['Neighborhood'].groupby(df['Borough']).value_counts()
</code></pre>

Which gives me something like this:

<pre><code>borough
Bronx          High Bridge          3424
               Mott Haven           2515
               Concourse Village    1443
               Port Morris          1153
               Melrose               492
               North Riverdale       463
               Eastchester           434
               Concourse             395
               Fordham               252
               Wakefield             214
               Kingsbridge           212
               Mount Hope            200
               Parkchester           191
......

Staten Island  Castleton Corners       4
               Dongan Hills            4
               Eltingville             4
               Graniteville            4
               Great Kills             4
               Castleton               3
               Woodrow                 1
</code></pre>

How do I filter it so that I get only the top 5 from each? I know there are a few questions with a similar title but they weren't helpful to my case.

Group by and find top n value_counts pandas
