Pandas-6. Descriptive functions

A quick reference for the descriptive statistics functions commonly used in Pandas. We start with a DataFrame:

import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

# Create a DataFrame
df = pd.DataFrame(d)
'''
      Name  Age  Rating
0      Tom   25    4.23
1    James   26    3.24
2    Ricky   25    3.98
3      Vin   23    2.56
4    Steve   30    3.20
5    Minsu   29    4.60
6     Jack   23    3.80
7      Lee   34    3.78
8    David   40    2.98
9   Gasper   30    4.80
10  Betina   51    4.10
11  Andres   46    3.65
'''
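  • describe() shows a statistical summary. By default only numeric columns are summarized; pass include (e.g. include='all') or exclude to choose which columns appear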
df.describe()
'''
             Age     Rating
count  12.000000  12.000000
mean   31.833333   3.743333
std     9.232682   0.661628
min    23.000000   2.560000
25%    25.000000   3.230000
50%    29.500000   3.790000
75%    35.500000   4.132500
max    51.000000   4.800000
'''
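Here is a minimal sketch of the include/exclude parameters. It rebuilds the same DataFrame so that it runs on its own. With include='all', every column is summarized: the string column gets count/unique/top/freq rows and NaN in the numeric rows. With exclude=[np.number], only the non-numeric columns are kept.

```python
import numpy as np
import pandas as pd

# Same data as the DataFrame above
df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack',
             'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# Summarize all columns, numeric and non-numeric
summary_all = df.describe(include='all')
print(summary_all)

# Summarize only the non-numeric columns (here just Name)
summary_text = df.describe(exclude=[np.number])
print(summary_text)
```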
  • count() number of non-null observations in each column
df.count()
'''
Name      12
Age       12
Rating    12
dtype: int64
'''
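The sample DataFrame has no missing values, so every count above is 12. The sketch below uses a small hypothetical DataFrame with NaN/None entries to show that count() skips them. Passing axis=1 counts per row instead of per column.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with missing values (np.nan and None)
df = pd.DataFrame({
    'A': [1.0, np.nan, 3.0],
    'B': ['x', None, None],
})

per_column = df.count()      # non-null values in each column
per_row = df.count(axis=1)   # non-null values in each row
print(per_column)
print(per_row)
```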
  • sum() sum of all values; note that the strings in Name are concatenated
df.sum()
'''
Name      TomJamesRickyVinSteveMinsuJackLeeDavidGasperBe...
Age                                                     382
Rating                                                44.92
dtype: object
'''
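If you don't want the string concatenation, numeric_only=True restricts sum() to the numeric columns. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack',
             'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# Only Age and Rating are summed; Name is left out instead of concatenated
totals = df.sum(numeric_only=True)
print(totals)
```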
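  • mean() mean of the values. Strings cannot be averaged: older pandas versions silently skipped the Name column, but pandas 2.0 and later raise a TypeError unless numeric_only=True is passed
df.mean(numeric_only=True)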
'''
Age       31.833333
Rating     3.743333
dtype: float64
'''
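This minimal sketch shows two mean() parameters. numeric_only=True is needed in recent pandas when the frame contains strings. skipna controls whether NaN values are ignored (the default) or propagated. The Series containing a NaN is a made-up example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack',
             'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# Restrict the mean to numeric columns
means = df.mean(numeric_only=True)
print(means)

# A made-up Series with one missing value
s = pd.Series([1.0, np.nan, 3.0])
print(s.mean())              # 2.0, NaN is skipped by default
print(s.mean(skipna=False))  # nan, NaN is propagated
```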
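  • median() median of each numeric column; in pandas 2.0+ pass numeric_only=True so the string column Name is skipped instead of raising a TypeError
df.median(numeric_only=True)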
'''
Age       29.50
Rating     3.79
dtype: float64
'''
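Here is where 29.50 comes from. Age has 12 values, an even count, so after sorting the median is the average of the two middle values (29 and 30). The sketch below uses a hypothetical helper, simple_median, written only for illustration, and checks it against pandas.

```python
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46], name='Age')
ratings = pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
                    name='Rating')

def simple_median(values):
    """Hypothetical helper: median by sorting; averages the two middle values for even lengths."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

print(simple_median(ages), ages.median())        # 29.5 29.5
print(simple_median(ratings), ratings.median())
```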
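  • mode() the most frequent value(s). A column can have several modes, so the result is a DataFrame; columns with fewer modes are padded with NaN, which is also why Age is shown as float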
df.mode()
'''
      Name   Age  Rating
0   Andres  23.0    2.56
1   Betina  25.0    2.98
2    David  30.0    3.20
3   Gasper   NaN    3.24
4     Jack   NaN    3.65
5    James   NaN    3.78
6      Lee   NaN    3.80
7    Minsu   NaN    3.98
8    Ricky   NaN    4.10
9    Steve   NaN    4.23
10     Tom   NaN    4.60
11     Vin   NaN    4.80
'''
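Working on one column at a time avoids the NaN padding. In this minimal sketch on the Age values, Series.mode() returns all tied values, and value_counts() confirms that 23, 25 and 30 each appear twice.

```python
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46], name='Age')

# Series.mode() returns every value tied for the highest count, sorted
age_modes = ages.mode()
print(age_modes.tolist())   # [23, 25, 30]

# value_counts() shows why: these three ages each occur twice
counts = ages.value_counts()
print(counts[counts == counts.max()])
```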
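  • std() standard deviation of each numeric column (the sample standard deviation, ddof=1 by default); in pandas 2.0+ pass numeric_only=True to skip Name
df.std(numeric_only=True)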
'''
Age       9.232682
Rating    0.661628
dtype: float64
'''
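pandas divides by n - 1 (ddof=1), while NumPy's np.std divides by n (ddof=0) by default. This minimal sketch computes both by hand for Age and compares them with the library results.

```python
import math

import numpy as np
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46], name='Age')

mean = ages.mean()
squared_dev = ((ages - mean) ** 2).sum()

# Sample standard deviation: divide by n - 1 (pandas default, ddof=1)
sample_std = math.sqrt(squared_dev / (len(ages) - 1))
# Population standard deviation: divide by n (np.std default, ddof=0)
population_std = math.sqrt(squared_dev / len(ages))

print(sample_std, ages.std())
print(population_std, np.std(ages.to_numpy()))
```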
  • min() minimum of each column (strings are compared alphabetically, so Name gives Andres)
df.min()
'''
Name      Andres
Age           23
Rating      2.56
dtype: object
'''
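min() reports the value but not the row it came from. This minimal sketch uses idxmin() to get the index label of the first row holding the minimum, then looks up the name in that row.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack',
             'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# idxmin() returns the label of the first occurrence of the minimum
youngest_label = df['Age'].idxmin()
lowest_rated = df.loc[df['Rating'].idxmin(), 'Name']

print(youngest_label)   # 3 (Vin; Jack at label 6 is also 23 but comes later)
print(lowest_rated)     # Vin
```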
  • max() maximum of each column
df.max()
'''
Name      Vin
Age        51
Rating    4.8
dtype: object
'''
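To see which rows hold the largest values, nlargest(n, column) returns the top n rows, and idxmax() is the counterpart of idxmin(). A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack',
             'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# The three oldest people, largest Age first
top3 = df.nlargest(3, 'Age')
print(top3)

# Name in the row holding the maximum Age
oldest = df.loc[df['Age'].idxmax(), 'Name']
print(oldest)   # Betina
```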
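  • abs() absolute value of each element; it is applied to the numeric Age column here because the string column Name has no absolute value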
df.Age.abs()
'''
0     25
1     26
2     25
3     23
4     30
5     29
6     23
7     34
8     40
9     30
10    51
11    46
Name: Age, dtype: int64
'''
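The ages are all positive, so abs() changes nothing above. In this minimal sketch the differences from the mean age include negative values, so abs() actually has an effect. The mean absolute deviation is computed by hand here as an illustration. Finally, select_dtypes keeps only the numeric columns so abs() can run on a whole frame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack',
             'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# Distance of each age from the mean age; abs() turns negative differences positive
deviation = (df['Age'] - df['Age'].mean()).abs()
mean_abs_deviation = deviation.mean()
print(deviation.round(2).tolist())
print(mean_abs_deviation)

# abs() on a whole frame works once only numeric columns are selected
numeric_abs = df.select_dtypes(include='number').abs()
print(numeric_abs.columns.tolist())   # ['Age', 'Rating']
```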
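  • prod() product of the values in each numeric column; in pandas 2.0+ pass numeric_only=True to skip Name
df.prod(numeric_only=True)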
'''
Age       7.158408e+17
Rating    6.320128e+06
dtype: float64
'''
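Products grow quickly. The Age product is already about 7.2e17, not far below the int64 limit of roughly 9.2e18, and NumPy integer arithmetic can wrap around silently beyond that limit. This minimal sketch checks the Age product against Python's arbitrary-precision math.prod (Python 3.8+). It also shows the min_count parameter on an empty Series.

```python
import math

import numpy as np
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46], name='Age')

# On an int64 column the product stays an integer; here it is still exact
age_product = ages.prod()
print(age_product == math.prod(ages.tolist()))   # True

# An empty Series has product 1.0 by default; min_count=1 gives NaN instead
empty = pd.Series([], dtype=float)
print(empty.prod())              # 1.0
print(empty.prod(min_count=1))   # nan
```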
  • cumsum() cumulative sum; iloc[:, 1:] selects the numeric columns Age and Rating, leaving out Name
df.iloc[:,1:].cumsum()
'''
      Age  Rating
0    25.0    4.23
1    51.0    7.47
2    76.0   11.45
3    99.0   14.01
4   129.0   17.21
5   158.0   21.81
6   181.0   25.61
7   215.0   29.39
8   255.0   32.37
9   285.0   37.17
10  336.0   41.27
11  382.0   44.92
'''
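cumsum() is a running total: element i is the sum of elements 0 through i, so the last row equals sum(). This minimal sketch checks that against Python's itertools.accumulate. It also shows that a NaN is skipped by default while its own position stays NaN.

```python
from itertools import accumulate

import numpy as np
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46], name='Age')

running = ages.cumsum()
manual = list(accumulate(ages.tolist()))
print(running.tolist() == manual)   # True
print(running.iloc[-1])             # 382, same as ages.sum()

# NaN is skipped in the running total but stays NaN at its own position
with_nan = pd.Series([1.0, np.nan, 2.0]).cumsum().tolist()
print(with_nan)                     # [1.0, nan, 3.0]
```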
  • cumprod() cumulative product
df.iloc[:, 1:].cumprod()
'''
             Age        Rating
0   2.500000e+01  4.230000e+00
1   6.500000e+02  1.370520e+01
2   1.625000e+04  5.454670e+01
3   3.737500e+05  1.396395e+02
4   1.121250e+07  4.468465e+02
5   3.251625e+08  2.055494e+03
6   7.478738e+09  7.810877e+03
7   2.542771e+11  2.952512e+04
8   1.017108e+13  8.798485e+04
9   3.051325e+14  4.223273e+05
10  1.556176e+16  1.731542e+06
11  7.158408e+17  6.320128e+06
'''
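cumprod() works like cumsum() with multiplication: element i is the product of elements 0 through i. This minimal sketch reproduces the Age column with itertools.accumulate and operator.mul. The last value matches the prod() result shown earlier.

```python
from itertools import accumulate
from operator import mul

import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46], name='Age')

running_product = ages.cumprod()
manual = list(accumulate(ages.tolist(), mul))

print(running_product.tolist() == manual)   # True
print(manual[-1])                           # 715840821540000000, same as ages.prod()
```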

This article is part of the Tencent Cloud self-media sharing program.
