Using Pandas Functions: nlargest and nsmallest

Author: 皮大大
Published 2023-08-25 11:41:32
Included in the column: Machine Learning / Data Visualization

Using nsmallest and nlargest

This article shows how to use two pandas DataFrame methods: nsmallest and nlargest.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nsmallest.html

DataFrame.nsmallest(
    n,  # int: number of rows to return
    columns,  # column label, or list of labels, to order by
    keep='first'  # how to handle ties: {'first', 'last', 'all'}, default 'first'
)

Sample data

import pandas as pd
import numpy as np
df = pd.DataFrame({"name":["xiaosun","zhoujuan","xiaozhang","wangfeng","xiaoming","zhangjun"],
                   "score":[100,128,100,150,100,145],
                   "age":[21,25,23,21,25,25],
                   "height":[1.75,1.8,1.77,1.8,1.9,1.71]
                  })
df

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
3   wangfeng    150   21    1.80
4   xiaoming    100   25    1.90
5   zhangjun    145   25    1.71

nsmallest

Default behavior

df.nsmallest(2, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77

df.nsmallest(4, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90
1   zhoujuan    128   25    1.80

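As shown, with the default setting tied values each count toward n: all three rows with score 100 are returned before the next-smallest score, 128.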
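One way to read the default behavior: for a single column, `nsmallest(n, col)` selects the same rows as sorting the whole frame by that column and taking the first n. The sketch below checks this on the sample data. It assumes a stable sort (`kind="stable"`) so that tied rows keep their original order, and the variable names `via_nsmallest` and `via_sort` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# nsmallest with the default keep="first"
via_nsmallest = df.nsmallest(4, "score")

# Full sort followed by head(n); a stable sort keeps ties in original order
via_sort = df.sort_values("score", kind="stable").head(4)

print(via_nsmallest.index.tolist())
print(via_sort.index.tolist())
```

On large frames where n is small, nsmallest is usually the better choice, since it does not need to sort every row.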

The keep parameter

# Same result as above; keep defaults to "first"

df.nsmallest(4, "score", keep="first")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90
1   zhoujuan    128   25    1.80

df.nsmallest(4, "score", keep="last")

        name  score  age  height
4   xiaoming    100   25    1.90
2  xiaozhang    100   23    1.77
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80

The order of the tied rows changes: with keep="last" the result starts from the tied row with the largest index, 4.
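The keep parameter matters most when n is smaller than the number of tied rows, because it then decides which of the tied rows are selected at all. A minimal sketch on the same sample data, comparing only which row labels are selected (the names `first_two` and `last_two` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# Three rows share the minimum score of 100, but only two are requested
first_two = df.nsmallest(2, "score", keep="first")  # earliest tied rows
last_two = df.nsmallest(2, "score", keep="last")    # latest tied rows

print(sorted(first_two.index))
print(sorted(last_two.index))
```

Both results contain only rows with score 100; they differ in which of the three tied rows are included.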

Understanding keep="all"

For comparison, the default call returns exactly 2 rows:

df.nsmallest(2, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77

With keep="all", every row tied with the last selected value is also returned, so the result can contain more than n rows:

df.nsmallest(2, "score", keep="all")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90
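Because keep="all" can return more than n rows, code that relies on getting exactly n rows may want to check the length of the result. A small sketch, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

default_rows = df.nsmallest(2, "score")           # exactly n rows
all_ties = df.nsmallest(2, "score", keep="all")   # n rows plus remaining ties

print(len(default_rows), len(all_ties))
```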

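Selecting by multiple columns

When a list of columns is passed, rows are ordered by the first column, and later columns break ties:
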
df.nsmallest(4,["age","height"])

        name  score  age  height
0    xiaosun    100   21    1.75
3   wangfeng    150   21    1.80
2  xiaozhang    100   23    1.77
5   zhangjun    145   25    1.71
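The multi-column result can be cross-checked against a plain multi-key sort: order by age first, then by height within equal ages, and take the first 4 rows. This is only a sketch for verification on the sample data, where no two rows tie on both columns; `via_nsmallest` and `via_sort` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# age is the primary key; height only breaks ties between equal ages
via_nsmallest = df.nsmallest(4, ["age", "height"])
via_sort = df.sort_values(["age", "height"]).head(4)

print(via_nsmallest.index.tolist())
print(via_sort.index.tolist())
```

Among the three rows with age 25, only zhangjun (height 1.71) is selected, because the fourth slot goes to the smallest height within that age.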

nlargest

nlargest works the same way but returns the n largest rows, in descending order.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nlargest.html#pandas.DataFrame.nlargest

DataFrame.nlargest(
    n,
    columns,
    keep='first'  # {'first', 'last', 'all'}, default 'first'
)

df.nlargest(3,"score")

       name  score  age  height
3  wangfeng    150   21    1.80
5  zhangjun    145   25    1.71
1  zhoujuan    128   25    1.80
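As a rough mental model, `nlargest(n, col)` picks the same rows as sorting by that column in descending order and keeping the first n. The sketch below checks this for the three highest scores, which have no ties in the sample data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

top3 = df.nlargest(3, "score")

# Descending sort followed by head(n) selects the same rows here
top3_sorted = df.sort_values("score", ascending=False).head(3)

print(top3["name"].tolist())
print(top3_sorted["name"].tolist())
```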

df.nlargest(3, "age")

       name  score  age  height
1  zhoujuan    128   25    1.80
4  xiaoming    100   25    1.90
5  zhangjun    145   25    1.71

df.nlargest(2, "age", keep="first")

       name  score  age  height
1  zhoujuan    128   25     1.8
4  xiaoming    100   25     1.9

df.nlargest(2, "age", keep="last")

       name  score  age  height
5  zhangjun    145   25    1.71
4  xiaoming    100   25    1.90

df.nlargest(2, "age", keep="all")

       name  score  age  height
1  zhoujuan    128   25    1.80
4  xiaoming    100   25    1.90
5  zhangjun    145   25    1.71

nlargest + drop_duplicates

Goal: find the rows with the 2 largest distinct ages; when several rows share the same age, keep only one of them.

df

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
3   wangfeng    150   21    1.80
4   xiaoming    100   25    1.90
5   zhangjun    145   25    1.71

df["age"].value_counts()
25    3
21    2
23    1
Name: age, dtype: int64

The largest age is 25, and 3 rows share it. First drop duplicates based on age:

df1 = df.drop_duplicates(subset=["age"], keep="first")
df1

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77

df1.nlargest(2, "age")

        name  score  age  height
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
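The two steps above can also be chained into a single expression. The sketch below wraps them in a small helper; `top_n_distinct` is a hypothetical name introduced here for illustration, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})


def top_n_distinct(frame, column, n):
    """Hypothetical helper: keep one row per distinct value, then take the n largest."""
    deduplicated = frame.drop_duplicates(subset=[column], keep="first")
    return deduplicated.nlargest(n, column)


result = top_n_distinct(df, "age", 2)
print(result)
```

Which row represents each age is decided by the keep argument of drop_duplicates; with keep="first" it is the earliest row for that age in the original order.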

Originally published 2022-8-31 on the author's personal site/blog.
