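I have a DataFrame with three columns: two are categorical and one is float16. When I run a groupby and pass agg a lambda that handles each column differently depending on its dtype, the categorical column is dropped.
If I do it this way, it works: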
import numpy as np
import pandas as pd

i = pd.DataFrame({"A": ["a", "a", "a", "b", "c", "c"], "B": [1, 2, 3, 4, 5, 6], "C": ["NaN", "b", "NaN", "b", "c", "c"]})
i['A'] = i['A'].astype('category')
i['B'] = i['B'].astype('float16')
i.groupby("A", as_index=False)[["B","C"]].agg(lambda x: x.mean() if np.dtype(x)=='float16' else x.value_counts().index[0])
The output, which is what I want, is:
A B C
0 a 2.0 NaN
1 b 4.0 b
2 c 5.5 c
However, whenever I declare column C as categorical, column C is automatically dropped from the result:
i=pd.DataFrame({"A":["a","a","a","b","c","c"],"B":[1,2,3,4,5,6],"C":[ "NaN" ,"b","NaN","b","c","c"]})
i['A'] = i['A'].astype('category')
i['B'] = i['B'].astype('float16')
i['C'] = i['C'].astype('category')
i.groupby("A", as_index=False)[["B","C"]].agg(lambda x: x.mean() if np.dtype(x)=='float16' else x.value_counts().index[0])
The result is:
['C'] did not aggregate successfully. If any error is raised this will raise in a future version of pandas. Drop these columns/ops to avoid this warning.
A B
0 a 2.0
1 b 4.0
2 c 5.5
Does anyone know whether agg in groupby is unable to handle categorical columns?
Answered on 2022-06-08 23:47:58
Note that you are not retrieving the dtype correctly:
i.groupby("A", as_index=False)[["B","C"]].agg(lambda x: print(np.dtype(x)))
gives "None", whereas x.dtype=='float16' works, because x is a pd.Series. You can check this with .agg(lambda x: print(type(x))). With the corrected check:
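To illustrate the difference outside of groupby, here is a minimal sketch on two standalone Series. The variable names (`floats`, `cats`, and the flags) are made up for this example, and the expectation that `np.dtype` rejects a categorical Series is based on `CategoricalDtype` being a pandas extension dtype rather than a NumPy dtype. Treat it as a way to reproduce the hidden error, not as a statement about pandas internals.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for one group of columns B and C.
floats = pd.Series([1, 2, 3], dtype="float16")
cats = pd.Series(["NaN", "b", "NaN"], dtype="category")

# Series.dtype can be compared to a string for NumPy and extension dtypes alike.
float_is_f16 = floats.dtype == "float16"
cat_is_f16 = cats.dtype == "float16"  # simply False, no exception

# np.dtype(...) only understands NumPy dtypes, so converting a
# categorical Series is expected to raise a TypeError.
try:
    np.dtype(cats)
    np_dtype_failed = False
except TypeError as exc:
    np_dtype_failed = True
    print(f"np.dtype failed: {exc}")

# pandas also provides dtype checks that avoid string comparison.
print(pd.api.types.is_float_dtype(floats))
print(isinstance(cats.dtype, pd.CategoricalDtype))
```

If the lambda raises like this inside agg, that would explain the "did not aggregate successfully" warning and the missing column: the pandas version in the question dropped the failing column instead of propagating the error.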
i['A'] = i['A'].astype('category')
i['B'] = i['B'].astype('float16')
i['C'] = i['C'].astype('category')
i.groupby("A", as_index=False)[["B","C"]].agg(lambda x: x.mean() if x.dtype=='float16' else x.value_counts().index[0])
which gives:
A B C
0 a 2.0 NaN
1 b 4.0 b
2 c 5.5 c
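As an alternative that avoids checking dtypes inside the lambda at all, each column can be given its own aggregation. The sketch below uses named aggregation. The helper `most_frequent`, the variable name `df`, and the choice of `observed=True` are assumptions for this example and are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "a", "a", "b", "c", "c"],
    "B": [1, 2, 3, 4, 5, 6],
    "C": ["NaN", "b", "NaN", "b", "c", "c"],  # "NaN" is a plain string here
})
df["A"] = df["A"].astype("category")
df["B"] = df["B"].astype("float16")
df["C"] = df["C"].astype("category")


def most_frequent(s):
    """Return the most frequent value in a Series (hypothetical helper)."""
    return s.value_counts().index[0]


# One aggregation per column, so no dtype inspection is needed.
result = df.groupby("A", as_index=False, observed=True).agg(
    B=("B", lambda s: s.mean()),
    C=("C", most_frequent),
)
print(result)
```

This states the intent per column explicitly, at the cost of listing every column by name.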
https://stackoverflow.com/questions/72551544