Suppose I have data like this:
df = pd.DataFrame({'index':['10a','10a','10a','20b','20b','20b','30c','30c','30c']
,'var_vals': ['aaa','aaa','abb','bbb','bba','bbb','ccc','ccc','cab']
,'var2_vals':['aga','aga','add','bgb','bbd','bgb','cdd','cdd','cda']})
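display(df)
which looks like this:
  index var_vals var2_vals
0   10a      aaa       aga
1   10a      aaa       aga
2   10a      abb       add
3   20b      bbb       bgb
4   20b      bba       bbd
5   20b      bbb       bgb
6   30c      ccc       cdd
7   30c      ccc       cdd
8   30c      cab       cda
How can I collapse this to one row per index, with the differing values moved into new columns, like this: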
  index var_vals var_vals_0 var2_vals var2_vals_0
0   10a      aaa        abb       aga         add
1   20b      bbb        bba       bgb         bbd
2   30c      ccc        cab       cdd         cda
I have tried groupby, pivot/pivot_table, and stack/unstack, but I end up either with huge dimensions or with lost data.
Answered 2018-10-19 18:04:16
Here is another one:
newdf = pd.DataFrame(df.groupby('index')['var_vals'].unique().tolist()).fillna('')
Updated code:
# one wide frame per value column (every column except 'index')
dfs = (pd.DataFrame(df.groupby('index')[i].unique().tolist()).fillna('').add_prefix(i+'_')
       for i in df.drop(columns='index'))
df = pd.concat(dfs, axis=1)
Full example:
import pandas as pd
df = pd.DataFrame({'index':['10a','10a','10a','20b','20b','20b','30c','30c','30c']
,'var_vals': ['aaa','aaa','abb','bbb','bba','bbb','ccc','ccc','cab']
,'var2_vals':['aga','aga','add','bgb','bbd','bgb','cdd','cdd','cda']})
df = pd.concat(
    (pd.DataFrame(df.groupby('index')[i].unique().tolist()).fillna('').add_prefix(i+'_')
     for i in df.drop(columns='index')), axis=1)
print(df)
Returns:
  var2_vals_0 var2_vals_1 var_vals_0 var_vals_1
0         aga         add        aaa        abb
1         bgb         bbd        bbb        bba
2         cdd         cda        ccc        cab
Answered 2018-10-19 17:58:17
One way, using groupby.apply:
df.groupby('index')['var_vals'].apply(lambda x: pd.Series(x.unique())).unstack()
         0    1
index
10a    aaa  abb
20b    bbb  bba
30c    ccc  cab
Answered 2018-10-19 18:02:44
Combining drop_duplicates with pivot:
df.drop_duplicates().assign(key=lambda x : x.groupby('index').cumcount()).pivot(index='index', columns='key', values='var_vals')
Out[910]:
key      0    1
index
10a    aaa  abb
20b    bbb  bba
30c    ccc  cab
https://stackoverflow.com/questions/52897666
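For the exact layout asked for in the question (the index kept as a column, and both value columns spread out), a minimal sketch along the lines of the groupby/unique answer might look like the following. The helper name spread_unique and its parameters are made up for illustration and are not part of pandas. It assumes that "differences" means the distinct values per group in order of first appearance, and it pads shorter groups with empty strings.

```python
import pandas as pd

df = pd.DataFrame({
    'index': ['10a', '10a', '10a', '20b', '20b', '20b', '30c', '30c', '30c'],
    'var_vals': ['aaa', 'aaa', 'abb', 'bbb', 'bba', 'bbb', 'ccc', 'ccc', 'cab'],
    'var2_vals': ['aga', 'aga', 'add', 'bgb', 'bbd', 'bgb', 'cdd', 'cdd', 'cda'],
})


def spread_unique(frame, key):
    """Hypothetical helper: one row per `key`, one column per distinct value."""
    pieces = []
    for col in frame.columns.drop(key):
        # Distinct values of `col` within each group, in order of first appearance.
        uniques = frame.groupby(key)[col].unique()
        # One row per group; groups with fewer distinct values are padded with ''.
        wide = pd.DataFrame([list(u) for u in uniques], index=uniques.index).fillna('')
        # Column names as in the question: col, col_0, col_1, ...
        wide.columns = [col] + [f'{col}_{n}' for n in range(wide.shape[1] - 1)]
        pieces.append(wide)
    # Align the per-column pieces on the group key, then turn the key back into a column.
    return pd.concat(pieces, axis=1).reset_index()


result = spread_unique(df, 'index')
print(result)
```

With the sample data this yields the columns index, var_vals, var_vals_0, var2_vals, var2_vals_0 and one row each for 10a, 20b and 30c. Building each piece with the group key as its index is what keeps the row labels, which the plain concat of unlabelled frames does not.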