Notes on the descriptive-statistics functions most commonly used in pandas. We start with a sample DataFrame:
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack',
'Lee','David','Gasper','Betina','Andres']),
'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
#Create a DataFrame
df = pd.DataFrame(d)
'''
Name Age Rating
0 Tom 25 4.23
1 James 26 3.24
2 Ricky 25 3.98
3 Vin 23 2.56
4 Steve 30 3.20
5 Minsu 29 4.60
6 Jack 23 3.80
7 Lee 34 3.78
8 David 40 2.98
9 Gasper 30 4.80
10 Betina 51 4.10
11 Andres 46 3.65
'''
describe()
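Shows a statistical summary of the data. By default only numeric columns are summarized; parameters such as include select which columns are shown.
df.describe()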
'''
Age Rating
count 12.000000 12.000000
mean 31.833333 3.743333
std 9.232682 0.661628
min 23.000000 2.560000
25% 25.000000 3.230000
50% 29.500000 3.790000
75% 35.500000 4.132500
max 51.000000 4.800000
'''
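To illustrate how include and exclude change what describe() reports, here is a minimal sketch. The small `sample` frame is hypothetical data made up for this example, not the article's df.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame used only for illustration
sample = pd.DataFrame({
    "Name": ["Tom", "James", "Tom"],
    "Age": [25, 26, 23],
})

# Default: only numeric columns are summarized
numeric_summary = sample.describe()

# Excluding numeric columns summarizes the string column instead
# (rows: count, unique, top, freq)
text_summary = sample.describe(exclude=[np.number])

# include="all" covers every column; statistics that do not apply are NaN
full_summary = sample.describe(include="all")

print(text_summary)
```

For string columns the summary reports the number of distinct values and the most frequent one rather than mean and quantiles.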
count()
Number of non-null observations in each column.
df.count()
'''
Name 12
Age 12
Rating 12
dtype: int64
'''
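Since the sample DataFrame has no missing values, every count above is 12. The sketch below uses a hypothetical frame with a few NaN entries to show that count() really counts non-null values only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values, for illustration only
sample = pd.DataFrame({
    "Age": [25, np.nan, 23],
    "Rating": [4.23, 3.24, np.nan],
})

# Non-null values per column
per_column = sample.count()

# Non-null values per row
per_row = sample.count(axis=1)

print(per_column)
print(per_row)
```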
sum()
Sum of all values in each column. Note that the strings in the Name column are concatenated.
df.sum()
'''
Name TomJamesRickyVinSteveMinsuJackLeeDavidGasperBe...
Age 382
Rating 44.92
dtype: object
'''
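If the concatenated strings are not wanted, one option is to restrict the sum to numeric columns. A minimal sketch on a hypothetical two-row frame (the first two rows of the data above):

```python
import pandas as pd

# Hypothetical frame: first two rows of the article's data
sample = pd.DataFrame({
    "Name": ["Tom", "James"],
    "Age": [25, 26],
    "Rating": [4.23, 3.24],
})

# Column totals, skipping the string column
column_totals = sample.sum(numeric_only=True)

# Row totals across the numeric columns
row_totals = sample.sum(axis=1, numeric_only=True)

print(column_totals)
print(row_totals)
```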
mean()
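Mean of each column. Strings cannot be averaged, so the Name column is skipped. In pandas 2.0 and later this must be requested explicitly with numeric_only=True; a bare df.mean() raises a TypeError on this DataFrame.
df.mean(numeric_only=True)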
'''
Age 31.833333
Rating 3.743333
dtype: float64
'''
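Two common ways to average only the numeric columns are sketched below, assuming a recent pandas version. The `sample` frame is a hypothetical three-row subset of the data above.

```python
import pandas as pd

# Hypothetical frame: first three rows of the article's data
sample = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky"],
    "Age": [25, 26, 25],
    "Rating": [4.23, 3.24, 3.98],
})

# Option 1: let the reduction skip non-numeric columns
means_flag = sample.mean(numeric_only=True)

# Option 2: select the numeric columns first, then reduce
means_select = sample.select_dtypes(include="number").mean()

print(means_flag)
```

Both approaches give the same result; selecting columns first can be handy when several reductions are applied to the same numeric subset.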
median()
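Median of each numeric column. As with mean(), recent pandas versions need numeric_only=True to skip the Name column.
df.median(numeric_only=True)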
'''
Age 29.50
Rating 3.79
dtype: float64
'''
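mode()
Most frequent value(s) in each column. A column can have several modes, so the result is a DataFrame; columns with fewer modes are padded with NaN.
df.mode()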
'''
Name Age Rating
0 Andres 23.0 2.56
1 Betina 25.0 2.98
2 David 30.0 3.20
3 Gasper NaN 3.24
4 Jack NaN 3.65
5 James NaN 3.78
6 Lee NaN 3.80
7 Minsu NaN 3.98
8 Ricky NaN 4.10
9 Steve NaN 4.23
10 Tom NaN 4.60
11 Vin NaN 4.80
'''
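The NaN padding in the output above can be reproduced on a smaller scale. This sketch uses the Age values from the data above plus a hypothetical two-column frame whose columns have different numbers of modes.

```python
import pandas as pd

# Age values from the article's data: 23, 25 and 30 each occur twice
ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])
age_modes = ages.mode()  # modes are returned in sorted order

# Hypothetical frame: column "a" has two modes, column "b" has one
frame = pd.DataFrame({"a": [1, 1, 2, 2], "b": [5, 5, 6, 7]})
frame_modes = frame.mode()  # "b" is padded with NaN in the second row

print(age_modes.tolist())
print(frame_modes)
```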
std()
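Sample standard deviation of each numeric column (numeric_only=True skips the Name column in recent pandas versions).
df.std(numeric_only=True)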
'''
Age 9.232682
Rating 0.661628
dtype: float64
'''
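pandas computes the sample standard deviation (ddof=1) by default, whereas NumPy's np.std defaults to the population version (ddof=0). A short sketch of the difference, using the Age values from the data above:

```python
import numpy as np
import pandas as pd

# Age values from the article's data
ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

# pandas default: divide by n - 1
sample_std = ages.std()

# Population standard deviation: divide by n
population_std = ages.std(ddof=0)

# NumPy's default on a plain array matches the population version
numpy_std = np.std(ages.to_numpy())

print(sample_std, population_std, numpy_std)
```

This is worth keeping in mind when comparing pandas results with values computed directly on NumPy arrays.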
min()
Minimum of each column. For the string column this is the lexicographically smallest value.
df.min()
'''
Name Andres
Age 23
Rating 2.56
dtype: object
'''
max()
Maximum of each column. For the string column this is the lexicographically largest value.
df.max()
'''
Name Vin
Age 51
Rating 4.8
dtype: object
'''
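min() and max() return the extreme values per column but not the rows they come from. When the matching row is needed, idxmin() and idxmax() return its index label. A minimal sketch on a hypothetical four-row subset of the data above:

```python
import pandas as pd

# Hypothetical frame: first four rows of the article's data
sample = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky", "Vin"],
    "Rating": [4.23, 3.24, 3.98, 2.56],
})

# Index labels of the highest and lowest rating
best_row = sample["Rating"].idxmax()
worst_row = sample["Rating"].idxmin()

# Look up the names on those rows
best_name = sample.loc[best_row, "Name"]
worst_name = sample.loc[worst_row, "Name"]

print(best_name, worst_name)
```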
abs()
Absolute value of each element. All ages are positive here, so the output equals the input.
df.Age.abs()
'''
0 25
1 26
2 25
3 23
4 30
5 29
6 23
7 34
8 40
9 30
10 51
11 46
Name: Age, dtype: int64
'''
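Because the Age column contains no negative numbers, the effect of abs() is not visible above. The sketch below applies it to deviations from the mean, which do include negative values; the four ages are the first four from the data above.

```python
import pandas as pd

# First four ages from the article's data; their mean is 24.75
ages = pd.Series([25, 26, 25, 23])

# Signed deviation from the mean (the last one is negative)
deviation = ages - ages.mean()

# Absolute deviation drops the sign
abs_deviation = deviation.abs()

print(deviation.tolist())
print(abs_deviation.tolist())
```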
prod()
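Product of all values in each numeric column (numeric_only=True skips the Name column in recent pandas versions).
df.prod(numeric_only=True)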
'''
Age 7.158408e+17
Rating 6.320128e+06
dtype: float64
'''
cumsum()
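Cumulative sum down each column. The Name column is dropped with iloc so that only numeric columns are accumulated.
df.iloc[:, 1:].cumsum()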
'''
Age Rating
0 25.0 4.23
1 51.0 7.47
2 76.0 11.45
3 99.0 14.01
4 129.0 17.21
5 158.0 21.81
6 181.0 25.61
7 215.0 29.39
8 255.0 32.37
9 285.0 37.17
10 336.0 41.27
11 382.0 44.92
'''
cumprod()
Cumulative product down each column, again on the numeric columns only.
df.iloc[:, 1:].cumprod()
'''
Age Rating
0 2.500000e+01 4.230000e+00
1 6.500000e+02 1.370520e+01
2 1.625000e+04 5.454670e+01
3 3.737500e+05 1.396395e+02
4 1.121250e+07 4.468465e+02
5 3.251625e+08 2.055494e+03
6 7.478738e+09 7.810877e+03
7 2.542771e+11 2.952512e+04
8 1.017108e+13 8.798485e+04
9 3.051325e+14 4.223273e+05
10 1.556176e+16 1.731542e+06
11 7.158408e+17 6.320128e+06
'''
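The last row of a cumulative sum or product equals the plain sum() or prod() of the column, which is a quick way to sanity-check the outputs above. A small sketch with hypothetical values chosen so the arithmetic is easy to follow:

```python
import pandas as pd

# Hypothetical numeric frame, for illustration only
sample = pd.DataFrame({
    "Age": [25, 26, 25],
    "Rating": [4.0, 3.0, 2.0],
})

running_sum = sample.cumsum()    # Age: 25, 51, 76
running_prod = sample.cumprod()  # Age: 25, 650, 16250

# The final running value matches the overall reduction
last_sum_matches = running_sum["Age"].iloc[-1] == sample["Age"].sum()
last_prod_matches = running_prod["Age"].iloc[-1] == sample["Age"].prod()

print(running_sum)
print(running_prod)
```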