I want to filter out the rows where the Label column is NaN and keep the rest.
df
Timestamp Label
157505 2010-09-21 23:13:21.090 1
321498 2010-09-22 00:44:14.890 1
332687 2010-09-22 00:44:15.890 1
330028 2010-09-22 00:44:17.890 NaN
293410 2010-09-22 00:44:18.440 2
23093 2010-09-22 00:44:19.890 2
282054 2010-09-22 00:44:23.440 2
158381 2010-09-22 01:04:33.440 NaN
317397 2010-09-22 01:27:01.790 NaN
170770 2010-09-22 02:18:52.850 NaN
Reproducible example:
import numpy as np
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'Timestamp': {157505: Timestamp('2010-09-21 23:13:21.090000'),
321498: Timestamp('2010-09-22 00:44:14.890000'),
332687: Timestamp('2010-09-22 00:44:15.890000'),
330028: Timestamp('2010-09-22 00:44:17.890000'),
293410: Timestamp('2010-09-22 00:44:18.440000'),
23093: Timestamp('2010-09-22 00:44:19.890000'),
282054: Timestamp('2010-09-22 00:44:23.440000'),
158381: Timestamp('2010-09-22 01:04:33.440000'),
317397: Timestamp('2010-09-22 01:27:01.790000'),
170770: Timestamp('2010-09-22 02:18:52.850000')},
'Label': {157505: 1,
321498: 1,
332687: 1,
330028: 'NaN',
293410: 2,
23093: 2,
282054: 2,
158381: 'NaN',
317397: 'NaN',
170770: 'NaN'}})
df
I tried:
df[df.Label.notnull()]
but got exactly the same table back:
Timestamp Label
157505 2010-09-21 23:13:21.090 1
321498 2010-09-22 00:44:14.890 1
332687 2010-09-22 00:44:15.890 1
330028 2010-09-22 00:44:17.890 NaN
293410 2010-09-22 00:44:18.440 2
23093 2010-09-22 00:44:19.890 2
282054 2010-09-22 00:44:23.440 2
158381 2010-09-22 01:04:33.440 NaN
317397 2010-09-22 01:27:01.790 NaN
170770 2010-09-22 02:18:52.850 NaN
What is going wrong, and what is the best way to do this?

Answered 2020-11-03 06:08:14
You can do this:
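# Turn the string 'NaN' into a real missing value, then drop those rows
df['Label'] = df['Label'].replace('NaN', np.nan)
df.dropna(inplace=True)
print(df)
or, after the same replace step, filter with notna() instead of calling dropna():
df = df[df['Label'].notna()]
print(df)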
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0

Answered 2020-11-03 06:06:28
Convert Label from dtype object to float, then use notna() or isna():
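As a hedged aside, the same idea can be sketched with pd.to_numeric, a different call from the astype(float) used in this answer. With errors="coerce" it turns any value that cannot be parsed as a number into NaN, whereas astype(float) would raise on text that is not a valid float literal. The frame df_demo, its index values, and the extra "n/a" entry are invented for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical stand-in for the Label column: two numbers, the text "NaN",
# and one more piece of non-numeric text that float() could not parse.
df_demo = pd.DataFrame({"Label": [1, "NaN", 2, "n/a"]}, index=[10, 20, 30, 40])

# errors="coerce" replaces every unparseable value with a real NaN
numeric = pd.to_numeric(df_demo["Label"], errors="coerce")

# Keep only the rows whose label converted to an actual number
kept = df_demo[numeric.notna()]
print(kept)
```

Only the mask is built from the converted values here, so the Label column in kept still holds the original objects; assign numeric back to the column if a float dtype is wanted.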
# astype(float) parses the string 'NaN' as a real NaN, which notna() can then detect
df = df[df.Label.astype(float).notna()]
print(df)
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
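282054 2010-09-22 00:44:23.440 2.0

Answered 2020-11-03 06:16:54
I understand you are trying to filter out NaN values. However, notnull() does not treat the string 'NaN' as missing, so those rows pass the filter. Replacing the string with np.nan gives the result you expect. Alternatively, you can drop those rows with dropna().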
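To see why the filter kept every row, a small sketch may help. The series labels below is a made-up stand-in for the Label column (not the question's data), holding one text 'NaN' and one real np.nan for contrast.

```python
import numpy as np
import pandas as pd

# Hypothetical labels: the second entry is text, the last is a real missing value
labels = pd.Series([1, "NaN", 2, np.nan], dtype=object)

# The string "NaN" is an ordinary Python str, so notna() counts it as present
mask_before = labels.notna().tolist()

# Checking the Python type of each value exposes the stray string
is_text = labels.map(lambda v: isinstance(v, str)).tolist()

# Once the text is swapped for np.nan, notna() flags that position as missing
cleaned = labels.replace("NaN", np.nan)
mask_after = cleaned.notna().tolist()

print(mask_before, is_text, mask_after)
```

A type check like is_text is one way to confirm whether a column that prints NaN actually contains missing values or just text that looks like them.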
import numpy as np
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'Timestamp': {157505: Timestamp('2010-09-21 23:13:21.090000'),
321498: Timestamp('2010-09-22 00:44:14.890000'),
332687: Timestamp('2010-09-22 00:44:15.890000'),
330028: Timestamp('2010-09-22 00:44:17.890000'),
293410: Timestamp('2010-09-22 00:44:18.440000'),
23093: Timestamp('2010-09-22 00:44:19.890000'),
282054: Timestamp('2010-09-22 00:44:23.440000'),
158381: Timestamp('2010-09-22 01:04:33.440000'),
317397: Timestamp('2010-09-22 01:27:01.790000'),
170770: Timestamp('2010-09-22 02:18:52.850000')},
'Label': {157505: 1,
321498: 1,
332687: 1,
330028: np.nan,
293410: 2,
23093: 2,
282054: 2,
158381: np.nan,
317397: np.nan,
170770: np.nan}})
df[df.Label.notnull()]
gives:
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
or
df.dropna()
which produces the same result:
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0

https://stackoverflow.com/questions/64653971