Why am I getting NaT values instead of NaN when adding a partial row to this DataFrame?

Stack Overflow user

Asked on 2022-05-29 12:44:53

3 answers · 89 views · 0 followers · score 1

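I have a script that reads a .csv file into a DataFrame and then lets the user extend it with additional data. It takes the last value in the date column and then prompts the user for a value for each following day.

If the user doesn't enter anything at the prompt, the value is converted to math.nan. However, when I append the row to the DataFrame, what should be NaN gets converted to NaT.

I've recreated a reproducible example below.

How can I make sure my NaNs aren't converted to NaTs?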

#!/usr/bin/env python

import pandas as pd
import datetime as dt
import math

df = pd.DataFrame({
    'date': pd.to_datetime(['2022-05-01', '2022-05-02', '2022-05-03']),
    'weight': [250., 249, 247],
})

last_recorded_date = df['date'].iloc[-1]
next_date = last_recorded_date + dt.timedelta(days=1)
df.loc[len(df.index)] = [next_date, math.nan]

print(df)
#         date weight
# 0 2022-05-01  250.0
# 1 2022-05-02  249.0
# 2 2022-05-03  247.0
# 3 2022-05-04    NaT

Tags: dataframe, python, pandas


3 Answers

Stack Overflow user

Accepted answer

Posted on 2022-05-29 14:20:39

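When you set a row from a list, pandas first converts the list to a Series. A Series has a single dtype. Because the first value is a datetime, every value in the resulting Series is converted to datetime, and in particular math.nan becomes NaT. Pandas does not use the existing column types to guide this conversion. Instead, it adjusts the column types as needed, so the weight column is widened from float to object.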
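You can check the inference step described above in isolation. The following is a minimal sketch that assumes only that pandas builds a Series from the row's values; the names `row`, `inferred` and `as_object` are illustrative. A list holding a Timestamp and math.nan is inferred as datetime, so the NaN becomes NaT. Forcing `dtype=object` skips that inference and keeps the float NaN.

```python
import math

import pandas as pd

# The values of the new row, as in the question.
row = [pd.Timestamp("2022-05-04"), math.nan]

# Let pandas infer the dtype: the Timestamp makes this a datetime Series,
# so math.nan is stored as the datetime missing value NaT.
inferred = pd.Series(row, index=["date", "weight"])
print(inferred.dtype)       # a datetime64 dtype, such as datetime64[ns]
print(inferred["weight"])   # NaT

# With dtype=object, no conversion happens and the float NaN survives.
as_object = pd.Series(row, index=["date", "weight"], dtype=object)
print(type(as_object["weight"]))  # <class 'float'>
```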

In my testing, using a tuple instead seems to solve the problem:

df.loc[len(df.index)] = (next_date, math.nan)

Votes: 1

Stack Overflow user

Posted on 2022-05-29 13:35:26

This is very strange, but some experimentation turns up a few clues:

df = pd.DataFrame({
    'date': pd.to_datetime(['2022-05-01', '2022-05-02', '2022-05-03']),
    'weight': [250., 249, 247],
    })

# Try this
df.loc[4] = None

This raises:

FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.

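This doesn't fully explain why NaT ends up in the second column. It does suggest, though, that you need to specify the dtypes when appending to an existing DataFrame.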
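One way to act on this is to build the new row as a one-row DataFrame, cast it to the existing column dtypes, and only then concatenate. The sketch below is minimal: `new_row` is an illustrative name, and `astype(df.dtypes.to_dict())` is just one way to reuse the existing dtypes.

```python
import math

import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2022-05-01", "2022-05-02", "2022-05-03"]),
    "weight": [250.0, 249, 247],
})

next_date = df["date"].iloc[-1] + pd.Timedelta(days=1)

# Build the new row column by column, so each column's dtype is inferred
# on its own, then force it to match the dtypes of df.
new_row = pd.DataFrame({"date": [next_date], "weight": [math.nan]})
new_row = new_row.astype(df.dtypes.to_dict())

df = pd.concat([df, new_row], ignore_index=True)
print(df.dtypes)
print(df)
```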

As explained here, one solution is the following:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2022-05-01', '2022-05-02', '2022-05-03']),
    'weight': [250., 249, 247],
    })

next_date = pd.Timestamp('2022-05-04')
# Note: DataFrame.append is deprecated (and was removed in pandas 2.0).
df = df.append(pd.DataFrame([{'date': next_date, 'weight': np.nan}]), ignore_index=True)
assert (df.dtypes.values == ('<M8[ns]', 'float64')).all()

However, this raises:

FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  df = df.append(pd.DataFrame([{'date': next_date, 'weight': np.nan}]), ignore_index=True)

So I think the correct solution is:

new_row = pd.DataFrame([{'date': next_date, 'weight': np.nan}])
df = pd.concat([df, new_row]).reset_index(drop=True)
assert (df.dtypes.values == ('<M8[ns]', 'float64')).all()

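But I have to ask: why are you appending to a DataFrame this way? Growing a DataFrame one row at a time is quite inefficient and should be avoided if possible.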
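A common way to avoid growing a DataFrame one row at a time is to collect the new rows in a plain Python list and build a DataFrame once at the end. The sketch below is minimal and rests on an assumption: the hard-coded `answers` list is a hypothetical stand-in for the interactive `input()` loop.

```python
import math

import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2022-05-01", "2022-05-02", "2022-05-03"]),
    "weight": [250.0, 249, 247],
})

# Hypothetical stand-in for what the user typed at each prompt ("" = no value).
answers = ["246.5", "", "245"]

rows = []
next_date = df["date"].iloc[-1]
for text in answers:
    next_date += pd.Timedelta(days=1)
    weight = float(text) if text else math.nan
    rows.append({"date": next_date, "weight": weight})

# Build one DataFrame for all new rows. Each column's dtype is inferred
# from that column alone, so the empty entry stays a float NaN.
new_rows = pd.DataFrame(rows)
df = pd.concat([df, new_rows], ignore_index=True)
print(df)
```

Building the frame once also means pandas infers each dtype from a whole column rather than from one mixed row, which is the root of the NaT problem.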

Votes: 1

Stack Overflow user

Posted on 2022-05-29 13:30:31

import pandas as pd
import datetime as dt
import math

df = pd.DataFrame({
    'date': pd.to_datetime(['2022-05-01', '2022-05-02', '2022-05-03']),
    'weight': [250., 249, 247],
    })

last_recorded_date = df['date'].iloc[-1]

while True:
    next_date = last_recorded_date + dt.timedelta(days=1)
    weight = input(f"{next_date}: ")
    if weight == 'q':
        break
    elif weight == '':
        weight = math.nan
    else:
        weight = float(weight)

    df.loc[len(df.index)] = [next_date, weight]
    last_recorded_date = next_date

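# Assign back to the column (not to df) so the date column is kept,
# then cast the repaired object column back to float.
df['weight'] = df['weight'].replace(pd.NaT, math.nan).astype(float)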

print(df)
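If NaT values have already ended up in the weight column, you can also repair it after the loop by turning every missing entry into NaN and casting the column back to float. This minimal sketch uses `Series.where` and `astype`, a swapped-in alternative to the `replace` call above. The object column is built by hand here to mimic the damaged column.

```python
import math

import pandas as pd

# Mimic the damaged column: an object column that mixes floats and NaT.
weight = pd.Series([250.0, 249.0, pd.NaT], dtype=object)

# where() keeps the non-missing values and fills every missing one
# (NaT included) with NaN; astype(float) then restores a float64 column.
repaired = weight.where(weight.notna()).astype(float)
print(repaired)
```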