
Python Learning Diary for iOS Developers, Part 13

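Foreword

Besides Python's basic data structures (list, tuple, and dictionary) and the ndarray covered in yesterday's notes, remember that we used the data frame from the pandas package so that Python could also work with a data frame data structure? We should get to know its common attributes and methods as well.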

Common attributes and methods of pandas and data frames

Creating a data frame

Use the DataFrame() constructor of the pandas package to convert a dictionary into a data frame.

import pandas as pd
groups = ["Modern Web", "DevOps", "Cloud", "Big Data", "Security", "自我挑戰組"]
ironmen = [59, 9, 19, 14, 6, 77]
ironmen_dict = {
    "groups": groups,
    "ironmen": ironmen
}
ironmen_df = pd.DataFrame(ironmen_dict)
ironmen_df

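The sharp-eyed reader will notice that we did not specify an index when creating the data frame, yet the resulting data frame automatically gets a default integer index starting at 0. What a thoughtful design!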
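To make the default index concrete, here is a minimal sketch that compares a data frame created without an index against one created with explicit row labels. The variable names default_df and labeled_df are only for illustration, and the data is shortened to three groups.

```python
import pandas as pd

groups = ["Modern Web", "DevOps", "Cloud"]
ironmen = [59, 9, 19]

# No index given: pandas assigns the default integer index 0, 1, 2
default_df = pd.DataFrame({"groups": groups, "ironmen": ironmen})
print(list(default_df.index))  # [0, 1, 2]

# Explicit index: the group names become the row labels
labeled_df = pd.DataFrame({"ironmen": ironmen}, index=groups)
print(list(labeled_df.index))  # ['Modern Web', 'DevOps', 'Cloud']
```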

Getting an overview of a data frame

ndim attribute

shape attribute

dtypes attribute

import pandas as pd
groups = ["Modern Web", "DevOps", "Cloud", "Big Data", "Security", "自我挑戰組"]
ironmen = [59, 9, 19, 14, 6, 77]
ironmen_dict = {
    "groups": groups,
    "ironmen": ironmen
}
# Create the data frame
ironmen_df = pd.DataFrame(ironmen_dict)
# Use the attributes
print(ironmen_df.ndim) # number of dimensions
print("---") # separator
print(ironmen_df.shape) # (number of rows, number of columns)
print("---") # separator
print(ironmen_df.dtypes) # data type of each column
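As a small sketch of how these attributes can be used in practice, shape is a plain tuple, so it can be unpacked into row and column counts. The names n_rows and n_cols are illustrative, and the dtype check uses pd.api.types.is_integer_dtype rather than assuming a specific integer width.

```python
import pandas as pd

ironmen_df = pd.DataFrame({
    "groups": ["Modern Web", "DevOps", "Cloud", "Big Data", "Security", "自我挑戰組"],
    "ironmen": [59, 9, 19, 14, 6, 77],
})

# A data frame is always two-dimensional
print(ironmen_df.ndim)  # 2

# shape is a (rows, columns) tuple and can be unpacked
n_rows, n_cols = ironmen_df.shape
print(n_rows, n_cols)  # 6 2

# dtypes is a Series indexed by column name
print(pd.api.types.is_integer_dtype(ironmen_df.dtypes["ironmen"]))  # True
```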

Deleting observations or columns

A data frame's drop() method deletes observations or columns: axis = 0 deletes observations (rows), and axis = 1 deletes columns.

import pandas as pd
groups = ["Modern Web", "DevOps", "Cloud", "Big Data", "Security", "自我挑戰組"]
ironmen = [59, 9, 19, 14, 6, 77]
ironmen_dict = {
    "groups": groups,
    "ironmen": ironmen
}
# Create the data frame
ironmen_df = pd.DataFrame(ironmen_dict)
# Delete an observation (the row labeled 0)
ironmen_df_no_mw = ironmen_df.drop(0, axis = 0)
print(ironmen_df_no_mw)
print("---") # separator
# Delete a column
ironmen_df_no_groups = ironmen_df.drop("groups", axis = 1)
print(ironmen_df_no_groups)
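One detail worth seeing in code is that drop() returns a new data frame and leaves the original untouched. The sketch below also uses the index= and columns= keywords, which are an alternative to passing axis; the variable names and the shortened data are illustrative.

```python
import pandas as pd

ironmen_df = pd.DataFrame({
    "groups": ["Modern Web", "DevOps", "Cloud"],
    "ironmen": [59, 9, 19],
})

# index= is equivalent to axis=0, columns= is equivalent to axis=1
dropped_row = ironmen_df.drop(index=0)
dropped_col = ironmen_df.drop(columns="groups")

# The original data frame keeps all rows and columns
print(ironmen_df.shape)   # (3, 2)
print(dropped_row.shape)  # (2, 2)
print(dropped_col.shape)  # (3, 1)
```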

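Filtering a data frame with the loc indexer

We can filter a data frame by index label with the loc indexer. (The ix indexer used in older pandas code was removed in pandas 1.0; loc selects by label and iloc selects by integer position.)

import pandas as pd
groups = ["Modern Web", "DevOps", "Cloud", "Big Data", "Security", "自我挑戰組"]
ironmen = [59, 9, 19, 14, 6, 77]
ironmen_dict = {
    "groups": groups,
    "ironmen": ironmen
}
# Create the data frame
ironmen_df = pd.DataFrame(ironmen_dict)
# Select a column
print(ironmen_df.loc[:, "groups"])
print("---") # separator
# Select an observation
print(ironmen_df.loc[0])
print("---") # separator
# Select an observation and a column at the same time
print(ironmen_df.loc[0, "groups"])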
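Label-based and position-based selection are easy to confuse when the index is the default 0, 1, 2, so the following sketch uses the group names as the index and contrasts loc with iloc. The variable names and the shortened data are illustrative only.

```python
import pandas as pd

ironmen_df = pd.DataFrame(
    {"ironmen": [59, 9, 19]},
    index=["Modern Web", "DevOps", "Cloud"],
)

# loc selects by label, iloc selects by integer position
by_label = ironmen_df.loc["DevOps", "ironmen"]
by_position = ironmen_df.iloc[1, 0]
print(by_label, by_position)  # 9 9

# Label slices include the end label; position slices exclude the end position
label_slice = ironmen_df.loc["Modern Web":"DevOps"]
position_slice = ironmen_df.iloc[0:1]
print(len(label_slice), len(position_slice))  # 2 1
```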

Filtering a data frame with Boolean values

import pandas as pd
groups = ["Modern Web", "DevOps", "Cloud", "Big Data", "Security", "自我挑戰組"]
ironmen = [59, 9, 19, 14, 6, 77]
ironmen_dict = {
    "groups": groups,
    "ironmen": ironmen
}
# Create the data frame
ironmen_df = pd.DataFrame(ironmen_dict)
# Named mask rather than filter, to avoid shadowing the built-in filter()
mask = ironmen_df["ironmen"] > 10 # more than 10 participants
ironmen_df[mask] # filter the data frame
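Boolean masks can also be combined. The sketch below, using the same data, keeps the groups with more than 10 but fewer than 60 participants; the thresholds are arbitrary. Each condition needs its own parentheses, and the element-wise operators & and | are used instead of and and or.

```python
import pandas as pd

ironmen_df = pd.DataFrame({
    "groups": ["Modern Web", "DevOps", "Cloud", "Big Data", "Security", "自我挑戰組"],
    "ironmen": [59, 9, 19, 14, 6, 77],
})

# Combine two conditions element-wise with &
mask = (ironmen_df["ironmen"] > 10) & (ironmen_df["ironmen"] < 60)
selected = ironmen_df[mask]
print(selected["groups"].tolist())  # ['Modern Web', 'Cloud', 'Big Data']
```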

Sorting

sort_index() method

sort_values() method

A data frame's sort_index() method sorts by index.

import pandas as pd
groups = ["Modern Web", "DevOps", "Cloud", "Big Data", "Security", "自我挑戰組"]
ironmen = [59, 9, 19, 14, 6, 77]
# Create the data frame
ironmen_df = pd.DataFrame(ironmen, columns = ["ironmen"], index = groups)
# Sort by index
ironmen_df.sort_index()

A data frame's sort_values() method sorts by the values of a specified column.

import pandas as pd
groups = ["Modern Web", "DevOps", "Cloud", "Big Data", "Security", "自我挑戰組"]
ironmen = [59, 9, 19, 14, 6, 77]
# Create the data frame
ironmen_df = pd.DataFrame(ironmen, columns = ["ironmen"], index = groups)
# Sort by value
ironmen_df.sort_values(by = "ironmen")
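Both methods sort in ascending order by default and return a new data frame. As a brief sketch, passing ascending=False reverses the order; the variable names are illustrative, and the data is limited to the five groups with English names.

```python
import pandas as pd

groups = ["Modern Web", "DevOps", "Cloud", "Big Data", "Security"]
ironmen = [59, 9, 19, 14, 6]
ironmen_df = pd.DataFrame(ironmen, columns=["ironmen"], index=groups)

# Largest groups first
by_size_desc = ironmen_df.sort_values(by="ironmen", ascending=False)
print(by_size_desc.index.tolist())
# ['Modern Web', 'Cloud', 'Big Data', 'DevOps', 'Security']

# Index labels in reverse alphabetical order
by_name_desc = ironmen_df.sort_index(ascending=False)
print(by_name_desc.index.tolist())
# ['Security', 'Modern Web', 'DevOps', 'Cloud', 'Big Data']

# The original data frame keeps its order
print(ironmen_df.index.tolist() == groups)  # True
```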

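Descriptive statistics

A data frame offers statistical methods such as sum(), mean(), median(), and describe(). Because the groups column holds text, we pass numeric_only = True so that only the numeric columns are aggregated.

import pandas as pd
groups = ["Modern Web", "DevOps", "Cloud", "Big Data", "Security", "自我挑戰組"]
ironmen = [59, 9, 19, 14, 6, 77]
ironmen_dict = {
    "groups": groups,
    "ironmen": ironmen
}
# Create the data frame
ironmen_df = pd.DataFrame(ironmen_dict)
print(ironmen_df.sum(numeric_only = True)) # total number of participants
print("---") # separator
print(ironmen_df.mean(numeric_only = True)) # mean number of participants
print("---") # separator
print(ironmen_df.median(numeric_only = True)) # median
print("---") # separator
print(ironmen_df.describe()) # descriptive statistics (numeric columns by default)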
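When only one column is of interest, the same methods can be called on that column, which returns a single number instead of a Series. A minimal sketch with the same data (the variable names are illustrative):

```python
import pandas as pd

ironmen_df = pd.DataFrame({
    "groups": ["Modern Web", "DevOps", "Cloud", "Big Data", "Security", "自我挑戰組"],
    "ironmen": [59, 9, 19, 14, 6, 77],
})

# Statistics on a single column return scalars
total = ironmen_df["ironmen"].sum()
mean = ironmen_df["ironmen"].mean()
median = ironmen_df["ironmen"].median()

print(total)           # 184
print(round(mean, 2))  # 30.67
print(median)          # 16.5
```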

Counting distinct values

The value_counts() method of a column (a pandas Series) counts how many times each distinct value occurs.

import pandas as pd
gender = ["Male", "Male", "Female", "Male", "Male", "Male", "Female", "Male", "Male"]
name = ["蒙其·D·魯夫", "羅羅亞·索隆", "娜美", "騙人布", "文斯莫克·香吉士", "多尼多尼·喬巴", "妮可·羅賓", "佛朗基", "布魯克"]
# Create the data frame
ironmen_df = pd.DataFrame(gender, columns = ["gender"], index = name)
# Count how many observations are male and how many are female
ironmen_df["gender"].value_counts()
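As a short sketch of what the counts look like, the example below reads individual counts from the result and uses normalize=True to get proportions instead. The variable names counts and shares are illustrative.

```python
import pandas as pd

gender = ["Male", "Male", "Female", "Male", "Male", "Male", "Female", "Male", "Male"]
gender_series = pd.Series(gender, name="gender")

# Counts per distinct value, most frequent first
counts = gender_series.value_counts()
print(counts["Male"], counts["Female"])  # 7 2

# normalize=True returns proportions that sum to 1
shares = gender_series.value_counts(normalize=True)
print(round(shares["Male"], 3))  # 0.778
```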

Missing values

Detecting missing values

isnull() method

notnull() method

import numpy as np
import pandas as pd
groups = ["Modern Web", "DevOps", np.nan, "Big Data", "Security", "自我挑戰組"]
ironmen = [59, 9, 19, 14, 6, np.nan]
ironmen_dict = {
    "groups": groups,
    "ironmen": ironmen
}
# Create the data frame
ironmen_df = pd.DataFrame(ironmen_dict)
print(ironmen_df.loc[:, "groups"].isnull()) # which group names are missing
print("---") # separator
print(ironmen_df.loc[:, "ironmen"].notnull()) # which participant counts are not missing
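Since isnull() returns True and False values, they can be summed or combined to summarize the missing data. The following sketch, with illustrative variable names, counts the missing values per column and lists the rows that contain at least one missing value.

```python
import numpy as np
import pandas as pd

ironmen_df = pd.DataFrame({
    "groups": ["Modern Web", "DevOps", np.nan, "Big Data", "Security", "自我挑戰組"],
    "ironmen": [59, 9, 19, 14, 6, np.nan],
})

# True counts as 1, so summing gives the number of missing values per column
missing_per_column = ironmen_df.isnull().sum()
print(missing_per_column["groups"])   # 1
print(missing_per_column["ironmen"])  # 1

# Rows in which any column is missing
rows_with_missing = ironmen_df[ironmen_df.isnull().any(axis=1)]
print(rows_with_missing.index.tolist())  # [2, 5]
```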

Handling missing values

dropna() method

fillna() method

import numpy as np
import pandas as pd
groups = ["Modern Web", "DevOps", np.nan, "Big Data", "Security", "自我挑戰組"]
ironmen = [59, 9, 19, 14, 6, np.nan]
ironmen_dict = {
    "groups": groups,
    "ironmen": ironmen
}
# Create the data frame
ironmen_df = pd.DataFrame(ironmen_dict)
ironmen_df_na_dropped = ironmen_df.dropna() # drop every observation that has a missing value
print(ironmen_df_na_dropped)
print("---") # separator
ironmen_df_na_filled = ironmen_df.fillna(0) # fill every missing value with 0
print(ironmen_df_na_filled)
print("---") # separator
ironmen_df_na_filled = ironmen_df.fillna({"groups": "Cloud", "ironmen": 71}) # fill missing values column by column
print(ironmen_df_na_filled)
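Two common variations are sketched below: dropping rows only when a particular column is missing, and filling a numeric column with its own mean. Filling with the mean is just one possible choice made for this illustration, not something the data itself dictates, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ironmen_df = pd.DataFrame({
    "groups": ["Modern Web", "DevOps", np.nan, "Big Data", "Security", "自我挑戰組"],
    "ironmen": [59, 9, 19, 14, 6, np.nan],
})

# Drop a row only if its ironmen value is missing
kept = ironmen_df.dropna(subset=["ironmen"])
print(len(kept))  # 5

# mean() skips missing values by default: (59 + 9 + 19 + 14 + 6) / 5
mean_ironmen = ironmen_df["ironmen"].mean()
filled = ironmen_df["ironmen"].fillna(mean_ironmen)
print(round(filled.iloc[5], 1))  # 21.4
```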

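Summary

We covered attributes and methods of the pandas package and of data frames, including creating, filtering, and sorting. Some of them belong to the pandas package itself, while others belong to the objects created from the data frame data structure, which makes this good practice for getting familiar with object-oriented concepts.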

  • Original link: http://kuaibao.qq.com/s/20171221G02T1B00?refer=cp_1026