This thread suggests using reduce to merge multiple DataFrames at once.
import numpy as np
import pandas as pd
from functools import reduce

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})
df3 = pd.DataFrame({'key': ['A', 'C', 'E', 'F'], 'value': np.random.randn(4)})
df4 = pd.DataFrame({'key': ['A', 'B', 'C', 'F'], 'value': np.random.randn(4)})
df_list = [df1, df2, df3, df4]
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['key'], how='outer'), df_list)
This performs the merge, but with a warning:
<stdin>:1: FutureWarning: Passing 'suffixes' which cause duplicate columns {'value_x'} in the result is deprecated and will raise a MergeError in a future version.
The columns of df_merged have exactly duplicated names. How can I force a distinct name for each column?
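For context, the duplicate names can be traced by looking at the column names after each step of the reduce. The sketch below is only an illustration: it uses small fixed values instead of np.random.randn (an assumption made for reproducibility) and stops before the third merge, since the warning states that a future pandas version will raise a MergeError at that point.

```python
import pandas as pd

# Fixed values stand in for np.random.randn(4) so the run is reproducible.
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1.0, 2.0, 3.0, 4.0]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5.0, 6.0, 7.0, 8.0]})
df3 = pd.DataFrame({'key': ['A', 'C', 'E', 'F'], 'value': [9.0, 10.0, 11.0, 12.0]})

# Step 1: both frames have 'value', so the default suffixes ('_x', '_y') apply.
step1 = pd.merge(df1, df2, on=['key'], how='outer')
print(list(step1.columns))  # ['key', 'value_x', 'value_y']

# Step 2: the left side no longer has a plain 'value' column, so no suffix is added.
step2 = pd.merge(step1, df3, on=['key'], how='outer')
print(list(step2.columns))  # ['key', 'value_x', 'value_y', 'value']

# Step 3 (not run here) would merge step2 with a fourth frame: 'value' overlaps
# again, is suffixed to 'value_x' / 'value_y', and collides with the columns
# that already carry those names.
```

The collision therefore only appears from the fourth frame onward, which is why merging two or three frames this way does not trigger the warning.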
Answered 2021-12-29 21:24:56
You can do this with concat:
s = pd.concat([x.set_index('key') for x in df_list],axis = 1,keys=range(len(df_list)))
s.columns = s.columns.map('{0[1]}_{0[0]}'.format)
s = s.reset_index()
s
Out[236]:
  key   value_0   value_1   value_2   value_3
0   A -1.957968       NaN -0.852135 -0.976960
1   B  1.545932 -0.276838       NaN  0.197615
2   C -2.149727       NaN -0.364382  0.349993
3   D  0.524990 -0.476655       NaN       NaN
4   E       NaN -2.135870  0.798782       NaN
5   F       NaN  1.456544 -0.255705  0.447279
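If you would rather keep the reduce pattern from the question, one possible variant (not part of the answer above) is to rename each frame's value column before merging, so that pd.merge never has to add suffixes. This is a minimal sketch with fixed values in place of the random data; the `renamed` list and the `value_{i}` naming scheme are illustrative choices.

```python
from functools import reduce

import pandas as pd

# Fixed values stand in for np.random.randn(4) so the run is reproducible.
df_list = [
    pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1.0, 2.0, 3.0, 4.0]}),
    pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5.0, 6.0, 7.0, 8.0]}),
    pd.DataFrame({'key': ['A', 'C', 'E', 'F'], 'value': [9.0, 10.0, 11.0, 12.0]}),
    pd.DataFrame({'key': ['A', 'B', 'C', 'F'], 'value': [13.0, 14.0, 15.0, 16.0]}),
]

# Give each frame's 'value' column a unique name (value_0, value_1, ...)
# so that only 'key' is shared between the frames being merged.
renamed = [df.rename(columns={'value': f'value_{i}'}) for i, df in enumerate(df_list)]

df_merged = reduce(lambda left, right: pd.merge(left, right, on='key', how='outer'), renamed)
print(df_merged.columns.tolist())  # ['key', 'value_0', 'value_1', 'value_2', 'value_3']
```

The column names match those produced by the concat approach; row order may differ depending on how the installed pandas version orders outer-join keys.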
Answered 2021-12-29 21:20:38
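In this scenario the outer join is equivalent to appending along axis 0, so it is better to use both columns of the DataFrames as the join keys. The code is as follows: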
from functools import reduce
df_merged = reduce(lambda left,right: pd.merge(left,right,on=['key', 'value'], how='outer'), df_list )
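To see what this produces, the sketch below compares the two-key merge with a plain pd.concat along axis 0. It assumes fixed, distinct values in place of the random data, so no (key, value) pair repeats across frames. Under that assumption the result is a long table with one row per original row and a single value column, rather than one value column per frame.

```python
from functools import reduce

import pandas as pd

# Fixed, distinct values stand in for np.random.randn(4).
df_list = [
    pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1.0, 2.0, 3.0, 4.0]}),
    pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5.0, 6.0, 7.0, 8.0]}),
    pd.DataFrame({'key': ['A', 'C', 'E', 'F'], 'value': [9.0, 10.0, 11.0, 12.0]}),
    pd.DataFrame({'key': ['A', 'B', 'C', 'F'], 'value': [13.0, 14.0, 15.0, 16.0]}),
]

# Outer merge on both columns: no column is left over to receive a suffix.
merged = reduce(
    lambda left, right: pd.merge(left, right, on=['key', 'value'], how='outer'),
    df_list,
)

# Plain row-wise append for comparison.
stacked = pd.concat(df_list, ignore_index=True)

print(merged.shape)             # (16, 2)
print(merged.columns.tolist())  # ['key', 'value']

# Same set of rows in both results (row order may differ).
same_rows = sorted(zip(merged['key'], merged['value'])) == sorted(
    zip(stacked['key'], stacked['value'])
)
print(same_rows)  # True
```

This avoids the warning, but it does not give each frame its own column, so it answers a slightly different question from the one asked.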
https://stackoverflow.com/questions/70525053