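I am trying to calculate the average of the two subjects, Physics and Math, and put it in a separate column, Mean. I only want to calculate it for rows where both subjects have a grade, and the Flag column is used as the filter for this. Everything works except the mean calculation. For the rows that are missing a mean it works, but somehow the values that were already there are set to NaN, which is strange. The data looks like this: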
Date        School  Math  Physics  Mean  Flag
01.01.2020  ABC     3     4              1
01.03.2020  ABC     2     3              1
01.05.2020  ABC     2     1        1.5   2
01.07.2020  ABC     2     1              1
01.08.2020  ABC     2     1        1.5   2
01.04.2020  ABC     2                    3
01.06.2020  ABC           1              3

My code looks like this:
import pandas as pd
path = 'School_grades.xlsx'
df = pd.read_excel(path)
df_copy = df.copy(deep=True)
df_copy['Date'] = pd.to_datetime(df_copy.Date)
df_copy = df_copy[(df_copy["Flag"] != 3)]
df_copy['Mean'] = ((df_copy['Math'] + df_copy['Physics'])/2).where(df_copy['Flag'] == 1)
print(df_copy)

My code gives the following. The Mean values I already had (the Flag 2 rows) have been turned into NaN:
Date School Math Physics Mean Flag
0 2020-01-01 ABC 3.0 4.0 3.5 1
1 2020-01-03 ABC 2.0 3.0 2.5 1
2 2020-01-05 ABC 2.0 1.0 NaN 2
3 2020-01-07 ABC 2.0 1.0 1.5 1
4 2020-01-08 ABC 2.0 1.0 NaN 2

But I would expect something like this instead:
Date School Math Physics Mean Flag
0 2020-01-01 ABC 3.0 4.0 3.5 1
1 2020-01-03 ABC 2.0 3.0 2.5 1
2 2020-01-05 ABC 2.0 1.0 1.5 2
3 2020-01-07 ABC 2.0 1.0 1.5 1
4 2020-01-08 ABC 2.0 1.0 1.5 2

Posted on 2020-10-13 17:33:53
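You forgot to pass the other argument to where(). Without it, Series.where() replaces every value whose condition is False with NaN, so every row whose Flag is not 1 loses its existing Mean. Passing the current Mean column as other keeps those values: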
>> df_copy['Mean'] = ((df_copy['Math'] + df_copy['Physics'])/2).where(df_copy['Flag'] == 1,df_copy['Mean'])
>> print(df_copy)
Date School Math Physics Mean Flag
0 01.01.2020 ABC 3.0 4.0 3.5 1
1 01.03.2020 ABC 2.0 3.0 2.5 1
2 01.05.2020 ABC 2.0 1.0 1.5 2
3 01.07.2020 ABC 2.0 1.0 1.5 1
4 01.08.2020 ABC 2.0 1.0 1.5 2
5 01.04.2020 ABC 2.0 NaN NaN 3
6 01.06.2020 ABC NaN 1.0 NaN 3

To compute the mean with pandas.DataFrame.mean instead:
df_copy['Mean'] = df_copy[['Math','Physics']].mean(axis=1).where(df_copy.Flag == 1,df_copy['Mean'])

You can also use numpy.where:
import numpy as np
df_copy['Mean'] = np.where(df_copy.Flag == 1,df_copy[['Math','Physics']].mean(axis=1),df_copy['Mean'])

https://stackoverflow.com/questions/64339635
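The variants above can be checked side by side without School_grades.xlsx. The sketch below rebuilds the sample rows by hand. It assumes that empty Excel cells come back as NaN, which is what pd.read_excel does for blank cells. It then compares the original where() call against the fixed ones. The variable names (buggy, fixed_where, fixed_mean, fixed_np, fixed_loc) are illustrative only. The .loc variant is an alternative not taken from the answer.

```python
import numpy as np
import pandas as pd

# Sample rows rebuilt from the question. Assumption: empty cells are NaN,
# matching what pd.read_excel returns for blank cells in School_grades.xlsx.
df = pd.DataFrame({
    "Date": ["01.01.2020", "01.03.2020", "01.05.2020", "01.07.2020",
             "01.08.2020", "01.04.2020", "01.06.2020"],
    "School": ["ABC"] * 7,
    "Math": [3, 2, 2, 2, 2, 2, np.nan],
    "Physics": [4, 3, 1, 1, 1, np.nan, 1],
    "Mean": [np.nan, np.nan, 1.5, np.nan, 1.5, np.nan, np.nan],
    "Flag": [1, 1, 2, 1, 2, 3, 3],
})

both_present = df["Flag"] == 1
pair_mean = (df["Math"] + df["Physics"]) / 2

# Original call: with no "other", every row where the condition is False
# becomes NaN, so the precomputed Flag-2 means are lost.
buggy = pair_mean.where(both_present)

# Fix from the answer: fall back to the existing Mean column.
fixed_where = pair_mean.where(both_present, df["Mean"])

# DataFrame.mean variant from the answer (row-wise mean, skipna=True by default).
fixed_mean = df[["Math", "Physics"]].mean(axis=1).where(both_present, df["Mean"])

# numpy.where variant from the answer; wrapped in a Series to keep the index.
fixed_np = pd.Series(
    np.where(both_present, df[["Math", "Physics"]].mean(axis=1), df["Mean"]),
    index=df.index,
)

# Alternative (not from the answer): write only the Flag-1 rows with .loc,
# leaving every other row's Mean untouched.
fixed_loc = df.copy()
fixed_loc.loc[both_present, "Mean"] = pair_mean[both_present]

print(buggy.tolist())        # [3.5, 2.5, nan, 1.5, nan, nan, nan]
print(fixed_where.tolist())  # [3.5, 2.5, 1.5, 1.5, 1.5, nan, nan]
```

DataFrame.mean skips NaN by default (skipna=True). On its own it would therefore return 2.0 for the row that only has a Math grade. In both the where() and np.where() variants, the Flag == 1 mask is what keeps the Flag 3 rows at NaN, not the averaging itself.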