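I have a DataFrame of student Ids and the number of tests each student took in each subject. I need to group it by Id and collect the subjects and test counts into a map.
What I have: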
Id Subject Number_of_Tests
101 Maths 6
101 Science 8
101 History 10
102 History 5
102 Maths 4
102 Science 7
What I want:
Id Tests Grade
101 {Maths:6, Science:8, History:10} A
102 {History:5, Maths:4, Science:7} B
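Also, after grouping like this, I want to add another column named "Grade", computed from the newly created "Tests" map field. For example, if the number of Maths tests is greater than 5, and the number of Science tests is greater than 5, and the number of History tests is greater than 5, the grade is "A"; otherwise it is "B".
Can anyone help me with this?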
Posted on 2020-09-14 06:15:06
I think you need:
# collect both columns into lists, one list per Id
new_df = df.groupby('Id', as_index=False).aggregate({"Subject": lambda x: x.to_list(), "Number_of_Tests": lambda x: x.to_list()})
# create a new column `Grade` based on the condition
new_df["Grade"] = ["A" if all(j > 5 for j in i) else "B" for i in new_df["Number_of_Tests"]]
# create the `Tests` column from the other two columns
new_df["Tests"] = [{k: v for k, v in zip(i, j)} for i, j in zip(new_df["Subject"], new_df["Number_of_Tests"])]
# drop the helper columns
new_df.drop(["Subject", "Number_of_Tests"], axis=1, inplace=True)
Output:
Id Grade Tests
0 101 A {'Maths': 6, 'Science': 8, 'History': 10}
1 102 B {'History': 5, 'Maths': 4, 'Science': 7}
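For readers who want something they can run end to end, here is a minimal sketch of the same idea, assuming the column names from the question (`Id`, `Subject`, `Number_of_Tests`). It is a variation rather than the code above: it builds the `Tests` dict first and then derives `Grade` from that dict, which is what the question asked for. The names `tests_by_id`, `student_id`, and `group` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [101, 101, 101, 102, 102, 102],
    "Subject": ["Maths", "Science", "History", "History", "Maths", "Science"],
    "Number_of_Tests": [6, 8, 10, 5, 4, 7],
})

# One {subject: count} dict per Id; iterating a groupby yields (key, sub-frame) pairs.
tests_by_id = {
    int(student_id): dict(zip(group["Subject"], group["Number_of_Tests"]))
    for student_id, group in df.groupby("Id")
}

new_df = pd.DataFrame({"Id": list(tests_by_id), "Tests": list(tests_by_id.values())})

# Grade is derived from the Tests map itself: "A" only if every count exceeds 5.
new_df["Grade"] = [
    "A" if all(count > 5 for count in tests.values()) else "B"
    for tests in new_df["Tests"]
]
```

Deriving `Grade` from the dict keeps the two columns consistent by construction, at the cost of a Python-level loop, which is fine for small frames.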
Edit
import numpy as np

# a row passes when its subject clears that subject's own threshold
mask1 = (df["Subject"] == "Maths") & (df["Number_of_Tests"] > 3)
mask2 = (df["Subject"] == "Science") & (df["Number_of_Tests"] > 5)
mask3 = (df["Subject"] == "History") & (df["Number_of_Tests"] > 7)
df["Grades"] = np.select([mask1, mask2, mask3], ["A", "A", "A"], "B")

# an Id gets "B" as soon as any of its rows is "B"
def func(x):
    if "B" in x.values:
        return "B"
    return "A"

new_df = df.groupby('Id', as_index=False).aggregate({"Subject": lambda x: x.to_list(),
                                                     "Number_of_Tests": lambda x: x.to_list(),
                                                     "Grades": func})
new_df["Tests"] = [{k: v for k, v in zip(i, j)} for i, j in zip(new_df["Subject"], new_df["Number_of_Tests"])]
new_df.drop(["Subject", "Number_of_Tests"], axis=1, inplace=True)
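If the per-subject thresholds are likely to change, one possible variation is to keep them in a dict and look them up with `Series.map` instead of writing one mask per subject. This is only a sketch under the assumption that the thresholds are the ones used in the masks above (Maths 3, Science 5, History 7); the names `thresholds`, `passed`, and `grades` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [101, 101, 101, 102, 102, 102],
    "Subject": ["Maths", "Science", "History", "History", "Maths", "Science"],
    "Number_of_Tests": [6, 8, 10, 5, 4, 7],
})

# Thresholds taken from the masks above; the dict name is illustrative.
thresholds = {"Maths": 3, "Science": 5, "History": 7}

# Row-wise pass/fail: compare each count with the threshold for its subject.
# A subject missing from `thresholds` maps to NaN, so its comparison is False.
passed = df["Number_of_Tests"] > df["Subject"].map(thresholds)

# An Id gets "A" only if all of its rows passed.
grades = passed.groupby(df["Id"]).all().map({True: "A", False: "B"})
```

Like the `np.select` version, a subject without a threshold ends up counting against the student, so both approaches grade such an Id as "B".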
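Posted on 2020-09-14 06:03:12
Your original question asked for the `Tests` output to be a list, and that is what this code produces. I will try to create another version for a dict later.
Creating the `Tests` column is straightforward. First, concatenate the relevant columns into a string. Then, in the `.groupby()`, aggregate those values into a list. For the `Grade` column, take the per-Id `.min()` and return `A` or `B`. Finally, merge that back into the new grouped DataFrame.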
df['Tests'] = df['Subject'] + ': ' + df['Number_of_Tests'].astype(str)
df['Grade'] = (df.groupby(['Id'])['Number_of_Tests'].transform('min') > 5).replace([True,False], ['A','B'])
df = pd.merge(df.groupby(['Id'])['Tests'].agg(list).reset_index(),
              df[['Grade', 'Id']], on='Id').drop_duplicates(subset='Id')
df
Out[1]:
Id Tests Grade
0 101 [Maths: 6, Science: 8, History: 10] A
3 102 [History: 5, Maths: 4, Science: 7] B
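The merge and `drop_duplicates` step can probably be avoided by computing both columns in a single `groupby(...).agg(...)` call with named aggregation. The following is a sketch of that alternative, not the code above; it assumes the same column names as the question, and `out` is an illustrative name.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [101, 101, 101, 102, 102, 102],
    "Subject": ["Maths", "Science", "History", "History", "Maths", "Science"],
    "Number_of_Tests": [6, 8, 10, 5, 4, 7],
})

# Same string concatenation as above: "Maths: 6", "Science: 8", ...
df["Tests"] = df["Subject"] + ": " + df["Number_of_Tests"].astype(str)

# Named aggregation: each keyword is an output column,
# given as (input column, aggregation function).
out = (
    df.groupby("Id")
      .agg(
          Tests=("Tests", list),
          Grade=("Number_of_Tests", lambda s: "A" if s.min() > 5 else "B"),
      )
      .reset_index()
)
```

Because the result already has one row per Id, the index is a clean 0, 1 rather than the 0, 3 left behind by `drop_duplicates`.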
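Posted on 2020-09-14 06:15:56
Reshape the DataFrame into a structure that is easier to work with, rather than a format such as a dict, which is awkward to operate on.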
df = df.set_index(['Id', 'Subject']).unstack()
        Number_of_Tests
Subject         History Maths Science
Id
101                  10     6       8
102                   5     4       7
Now simply add a column based on the required condition.
df['Grade'] = np.where((df['Number_of_Tests'] > 5).all(axis=1), 'A', 'B')
Output
        Number_of_Tests               Grade
Subject         History Maths Science
Id
101                  10     6       8     A
102                   5     4       7     B
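Or, if you want a dict, build it from the original long-format DataFrame (before the `unstack`); the result is a Series indexed by Id:
tests = df.groupby('Id').apply(lambda x: dict(zip(x.Subject, x.Number_of_Tests)))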
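To tie the reshaping idea together, here is a self-contained sketch that uses `DataFrame.pivot` in place of `set_index(...).unstack()`, so the columns stay flat instead of forming a MultiIndex. It is an assumption-laden variation on the approach above, and the names `wide` and `tests` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [101, 101, 101, 102, 102, 102],
    "Subject": ["Maths", "Science", "History", "History", "Maths", "Science"],
    "Number_of_Tests": [6, 8, 10, 5, 4, 7],
})

# One row per Id, one column per Subject (flat columns, no MultiIndex).
wide = df.pivot(index="Id", columns="Subject", values="Number_of_Tests")

# Build the dicts before adding other columns so they hold only subject counts.
tests = wide.to_dict(orient="index")

# "A" only if every subject count in the row exceeds 5.
wide["Grade"] = np.where((wide > 5).all(axis=1), "A", "B")

# Attach the per-Id dicts, looked up in index order.
wide["Tests"] = [tests[student_id] for student_id in wide.index]
```

If some Id has no row for a subject, `pivot` leaves a NaN there, the `> 5` comparison is False, and that Id is graded "B"; whether that is the desired behaviour depends on the data.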
https://stackoverflow.com/questions/63878581