Some notes on the descriptive-statistics functions commonly used in pandas, using the following DataFrame as the example:
import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

# Create a DataFrame
df = pd.DataFrame(d)
'''
      Name  Age  Rating
0      Tom   25    4.23
1    James   26    3.24
2    Ricky   25    3.98
3      Vin   23    2.56
4    Steve   30    3.20
5    Minsu   29    4.60
6     Jack   23    3.80
7      Lee   34    3.78
8    David   40    2.98
9   Gasper   30    4.80
10  Betina   51    4.10
11  Andres   46    3.65
'''
describe()
Shows a summary of the data. By default only the numeric columns are summarized; parameters such as include select which columns are shown.
df.describe()
'''
             Age     Rating
count  12.000000  12.000000
mean   31.833333   3.743333
std     9.232682   0.661628
min    23.000000   2.560000
25%    25.000000   3.230000
50%    29.500000   3.790000
75%    35.500000   4.132500
max    51.000000   4.800000
'''
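To illustrate the include parameter mentioned above, here is a minimal sketch that rebuilds the example frame from plain lists and calls describe() in two ways. The variable names desc_all and desc_num are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack',
             'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# include='all' summarizes every column: the string column gets
# count/unique/top/freq, and the numeric-only rows are NaN for it.
desc_all = df.describe(include='all')

# include=[np.number] restricts the summary to numeric columns.
desc_num = df.describe(include=[np.number])

print(desc_all)
print(desc_num)
```

With include='all' the result has extra rows (unique, top, freq) that only apply to the Name column, which is why the default output is usually easier to read for mostly numeric data.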
count()
Number of non-null observations in each column.
df.count()
'''
Name      12
Age       12
Rating    12
dtype: int64
'''
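The example frame has no missing values, so every count is 12. The sketch below uses a small made-up frame (df_missing is hypothetical, not part of the original data) to show that count() leaves out None and NaN entries.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df_missing = pd.DataFrame({
    'Name': ['Tom', 'James', None, 'Vin'],
    'Age': [25, np.nan, 25, 23],
})

# Non-null values per column
counts = df_missing.count()

# Non-null values per row
row_counts = df_missing.count(axis=1)

print(counts)
print(row_counts)
```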
sum()
Sum of all values in each column. Note that the strings in the Name column are concatenated.
df.sum()
'''
Name      TomJamesRickyVinSteveMinsuJackLeeDavidGasperBe...
Age                                                     382
Rating                                                44.92
dtype: object
'''
mean()
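Mean of all values in each column. Strings cannot be averaged, so the Name column is skipped. This is the behaviour of the pandas version used for these notes; from pandas 2.0 onward, calling mean() on a frame that contains a string column raises a TypeError unless numeric_only=True is passed. The same applies to median(), std() and prod() below.
df.mean()
'''
Age       31.833333
Rating     3.743333
dtype: float64
'''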
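A minimal sketch of restricting a reduction to the numeric columns explicitly, which behaves the same way across pandas versions. The frame is rebuilt from plain lists, and the names means and totals are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack',
             'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# numeric_only=True drops the string column before reducing
means = df.mean(numeric_only=True)

# The same flag keeps sum() from concatenating the names
totals = df.sum(numeric_only=True)

print(means)
print(totals)
```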
median()
Median of all values in each column.
df.median()
'''
Age       29.50
Rating     3.79
dtype: float64
'''
mode()
Mode (most frequent value) of each column. A column can have more than one mode, so the result is a DataFrame; columns with fewer modes are padded with NaN.
df.mode()
'''
      Name   Age  Rating
0   Andres  23.0    2.56
1   Betina  25.0    2.98
2    David  30.0    3.20
3   Gasper   NaN    3.24
4     Jack   NaN    3.65
5    James   NaN    3.78
6      Lee   NaN    3.80
7    Minsu   NaN    3.98
8    Ricky   NaN    4.10
9    Steve   NaN    4.23
10     Tom   NaN    4.60
11     Vin   NaN    4.80
'''
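When only one column is of interest, calling mode() on that column avoids the NaN padding. The sketch below shows this for Age, where three values are tied for most frequent; age_modes and name_modes are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack',
             'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
})

# Series.mode() returns a sorted Series holding every tied mode
age_modes = df['Age'].mode()

# Every name occurs exactly once, so all 12 names are modes
name_modes = df['Name'].mode()

print(age_modes.tolist())
print(len(name_modes))
```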
std()
Standard deviation of the values in each column.
df.std()
'''
Age       9.232682
Rating    0.661628
dtype: float64
'''
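Note that pandas computes the sample standard deviation (ddof=1) by default, while NumPy's np.std defaults to the population version (ddof=0). A short sketch of the difference on the Age column; sample_std and population_std are illustrative names.

```python
import numpy as np
import pandas as pd

age = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46], name='Age')

# pandas default: divide by n - 1 (sample standard deviation)
sample_std = age.std()

# ddof=0: divide by n, matching NumPy's default
population_std = age.std(ddof=0)
numpy_std = np.std(age.to_numpy())

print(sample_std, population_std, numpy_std)
```

The ddof=1 result is the value shown in the describe() and std() outputs of these notes.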
min()
Minimum of each column. For the string column the comparison is lexicographic.
df.min()
'''
Name      Andres
Age           23
Rating      2.56
dtype: object
'''
max()
Maximum of each column.
df.max()
'''
Name      Vin
Age        51
Rating    4.8
dtype: object
'''
abs()
Absolute value of each element.
df.Age.abs()
'''
0     25
1     26
2     25
3     23
4     30
5     29
6     23
7     34
8     40
9     30
10    51
11    46
Name: Age, dtype: int64
'''
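Since every age is already positive, the output above is identical to the input. A small sketch with made-up values (the deltas Series is hypothetical) makes the effect of abs() visible.

```python
import pandas as pd

# Hypothetical values containing negatives
deltas = pd.Series([-3.0, 0.0, 2.5, -0.5])

# abs() is applied element by element and keeps the index
magnitudes = deltas.abs()

print(magnitudes.tolist())
```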
prod()
Product of the values in each column.
df.prod()
'''
Age       7.158408e+17
Rating    6.320128e+06
dtype: float64
'''
cumsum()
Cumulative sum, computed here on the numeric columns only.
df.iloc[:,1:].cumsum()
'''
      Age  Rating
0    25.0    4.23
1    51.0    7.47
2    76.0   11.45
3    99.0   14.01
4   129.0   17.21
5   158.0   21.81
6   181.0   25.61
7   215.0   29.39
8   255.0   32.37
9   285.0   37.17
10  336.0   41.27
11  382.0   44.92
'''
cumprod()
Cumulative product, computed here on the numeric columns only.
df.iloc[:,1:].cumprod()
'''
             Age        Rating
0   2.500000e+01  4.230000e+00
1   6.500000e+02  1.370520e+01
2   1.625000e+04  5.454670e+01
3   3.737500e+05  1.396395e+02
4   1.121250e+07  4.468465e+02
5   3.251625e+08  2.055494e+03
6   7.478738e+09  7.810877e+03
7   2.542771e+11  2.952512e+04
8   1.017108e+13  8.798485e+04
9   3.051325e+14  4.223273e+05
10  1.556176e+16  1.731542e+06
11  7.158408e+17  6.320128e+06
'''
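The last row of a cumulative result is the same as the corresponding full reduction, which is a handy sanity check. A minimal sketch on the two numeric columns; running_sum and running_prod are illustrative names.

```python
import numpy as np
import pandas as pd

num = pd.DataFrame({
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

running_sum = num.cumsum()
running_prod = num.cumprod()

# The final running values match sum() and prod() (up to float rounding)
print(running_sum.iloc[-1])
print(num.sum())
print(np.isclose(running_prod['Rating'].iloc[-1], num['Rating'].prod()))
```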