【数据分析可视化】谈一谈NaN

瑞新

发布于 2020-07-07 20:01:04

5540

发布于 2020-07-07 20:01:04

NaN-means Not a Number

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# 创建NaN
n = np.nan

# 类型
type(n)

float

# 任何数字和nan做计算永远是nan
m = 1
m + n

nan

NaN in Series

# 创建含nan情况
s1 = Series([1,2,np.nan,3,4],index=['A','B','C','D','E'])
s1

A    1.0
B    2.0
C    NaN
D    3.0
E    4.0
dtype: float64

# 判断是否nan
s1.isnull()

A    False
B    False
C     True
D    False
E    False
dtype: bool

s1.notnull()

A     True
B     True
C    False
D     True
E     True
dtype: bool

# nan删除掉nan
s1.dropna()

A    1.0
B    2.0
D    3.0
E    4.0
dtype: float64

NaN in DataFrame

# 创建含有nan情况
df1 = DataFrame(np.random.rand(25).reshape(5,5))
df1.ix[2,4] = np.nan
df1.ix[1,3] = np.nan
df1

/Users/bennyrhys/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: FutureWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  This is separate from the ipykernel package so we can avoid doing imports until
/Users/bennyrhys/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:4: FutureWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  after removing the cwd from sys.path.

	0	1	2	3	4
0	0.912220	0.932765	0.827517	0.031858	0.749619
1	0.957043	0.857664	0.616395	NaN	0.562609
2	0.686575	0.016802	0.030477	0.609545	NaN
3	0.543484	0.555226	0.138279	0.979043	0.460136
4	0.870316	0.141909	0.567168	0.116696	0.204007

# 判断nan
df1.isnull()

	0	1	2	3	4
0	False	False	False	False	False
1	False	False	False	True	False
2	False	False	False	False	True
3	False	False	False	False	False
4	False	False	False	False	False

df1.notnull()

	0	1	2	3	4
0	True	True	True	True	True
1	True	True	True	False	True
2	True	True	True	True	False
3	True	True	True	True	True
4	True	True	True	True	True

# 删除的使用（df二维的，因此略有不同）
# axis=0所有带nan的行全部删除
df2 = df1.dropna(axis=0)
df2

	0	1	2	3	4
0	0.912220	0.932765	0.827517	0.031858	0.749619
3	0.543484	0.555226	0.138279	0.979043	0.460136
4	0.870316	0.141909	0.567168	0.116696	0.204007

# axis=1所有带nan的列全部删除
df2 = df1.dropna(axis=1)
df2

	0	1	2
0	0.912220	0.932765	0.827517
1	0.957043	0.857664	0.616395
2	0.686575	0.016802	0.030477
3	0.543484	0.555226	0.138279
4	0.870316	0.141909	0.567168

# 如何删除now，参数now
# any 只要有一个为nan就删掉 当前行或列
df2 = df1.dropna(axis=0,how='any')
df2

	0	1	2	3	4
0	0.912220	0.932765	0.827517	0.031858	0.749619
3	0.543484	0.555226	0.138279	0.979043	0.460136
4	0.870316	0.141909	0.567168	0.116696	0.204007

# 如何删除now，参数now
# all 只有全部为nan就删掉 当前行或列
df2 = df1.dropna(axis=0,how='all')
df2

	0	1	2	3	4
0	0.912220	0.932765	0.827517	0.031858	0.749619
1	0.957043	0.857664	0.616395	NaN	0.562609
2	0.686575	0.016802	0.030477	0.609545	NaN
3	0.543484	0.555226	0.138279	0.979043	0.460136
4	0.870316	0.141909	0.567168	0.116696	0.204007

# 为测试thresh参数新建数据
df2 = DataFrame(np.random.rand(25).reshape(5,5))
df2.ix[2,:] = np.nan
df2.ix[1,3] = np.nan
df2.ix[3,3] = np.nan
df2.ix[3,4] = np.nan
df2

/Users/bennyrhys/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: FutureWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  This is separate from the ipykernel package so we can avoid doing imports until
/Users/bennyrhys/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:4: FutureWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  after removing the cwd from sys.path.
/Users/bennyrhys/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:5: FutureWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  """
/Users/bennyrhys/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:6: FutureWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated

	0	1	2	3	4
0	0.371901	0.140453	0.576335	0.895684	0.233522
1	0.896337	0.719907	0.647172	NaN	0.698708
2	NaN	NaN	NaN	NaN	NaN
3	0.415230	0.601340	0.694270	NaN	NaN
4	0.926047	0.913255	0.586473	0.442759	0.238776

# thresh参数是一个删除界限（当前行或列的nan>2，则删除）
df3 = df2.dropna(thresh=2)
df3

	0	1	2	3	4
0	0.371901	0.140453	0.576335	0.895684	0.233522
1	0.896337	0.719907	0.647172	NaN	0.698708
3	0.415230	0.601340	0.694270	NaN	NaN
4	0.926047	0.913255	0.586473	0.442759	0.238776

# nan填充值（可以具体指定行列nan填充值）
df2.fillna(value=1)

	0	1	2	3	4
0	0.371901	0.140453	0.576335	0.895684	0.233522
1	0.896337	0.719907	0.647172	1.000000	0.698708
2	1.000000	1.000000	1.000000	1.000000	1.000000
3	0.415230	0.601340	0.694270	1.000000	1.000000
4	0.926047	0.913255	0.586473	0.442759	0.238776

# 可以具体指定行列nan填充值）
df2.fillna(value={0:0,1:1,2:2,3:3,4:4})

	0	1	2	3	4
0	0.371901	0.140453	0.576335	0.895684	0.233522
1	0.896337	0.719907	0.647172	3.000000	0.698708
2	0.000000	1.000000	2.000000	3.000000	4.000000
3	0.415230	0.601340	0.694270	3.000000	4.000000
4	0.926047	0.913255	0.586473	0.442759	0.238776

fillna 和 dropna 原始值不会变，需要保存新值

本文参与腾讯云自媒体同步曝光计划，分享自作者个人站点/博客。

原始发表：2020/04/18 ，如有侵权请联系 cloudcommunity@tencent.com 删除

nan

本文分享自作者个人站点/博客前往查看

如有侵权，请联系 cloudcommunity@tencent.com 删除。

本文参与腾讯云自媒体同步曝光计划，欢迎热爱写作的你一起参与！

nan

登录后参与评论

0 条评论

热度

【数据分析可视化】谈一谈NaN

【数据分析可视化】谈一谈NaN

NaN-means Not a Number

NaN in Series

NaN in DataFrame

社区

活动

资源

关于

腾讯云开发者

热门产品

热门推荐

更多推荐