I have two DataFrames, df1 and df2:
ID Range(US) Count(US) Mean(US)
0 690 1-3 266 4.0
1 4-7 277 NaN
2 354 1-3 233 2.0
3 4-7 85 NaN
4 947 1-3 156 4.0
and
ID Range(UK) Count(UK) Mean(UK)
0 690 1-3 186 4.0
1 4-7 25 NaN
2 354 1-3 44 1.0
3 947 1-3 213 3.0
4 4-7 33 NaN
I merged them with the following code:
df = df1.merge(df2, left_on='deviceid', right_on='deviceid', how='left')
df
ID Range(US) Count(US) Mean(US) Range(UK) Count(UK) Mean(UK)
0 690 1-3 266 4.0 1-3 186 4.0
1 4-7 277 NaN 4-7 25 NaN
2 4-7 277 NaN 4-7 33 NaN
3 354 1-3 233 2.0 1-3 44 1.0
4 4-7 85 NaN 4-7 25 NaN
5 4-7 85 NaN 4-7 33 NaN
6 947 1-3 156 4.0 1-3 213 3.0
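As shown above, rows that have no matching counterpart are not left empty; their values are repeated against other rows instead.
But the expected output is: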
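The repetition appears to come from the blank ID cells: blanks compare equal to each other, so merging on the ID alone pairs every blank-ID row on the left with every blank-ID row on the right. A minimal sketch, assuming the blanks are empty strings and the IDs are stored as strings (the frames `left` and `right` are illustrative and carry only one value column each):

```python
import pandas as pd

# Assumption: IDs are strings and blank IDs are empty strings
left = pd.DataFrame({
    "ID": ["690", "", "354", "", "947"],
    "Count(US)": [266, 277, 233, 85, 156],
})
right = pd.DataFrame({
    "ID": ["690", "", "354", "947", ""],
    "Count(UK)": [186, 25, 44, 213, 33],
})

# Merging on ID alone: each blank on the left matches each blank on the right
merged = left.merge(right, on="ID", how="left")

# 2 blank rows on the left x 2 blank rows on the right = 4 combinations
blank_rows = merged[merged["ID"] == ""]
print(len(merged))  # 7
print(blank_rows)
```

The three real IDs each match once, and the four blank combinations account for the extra rows in the merged result.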
ID Range(US) Count(US) Mean(US) Range(UK) Count(UK) Mean(UK)
0 690 1-3 266 4.0 1-3 186 4.0
1 4-7 277 NaN 4-7 25 NaN
2 354 1-3 233 2.0 1-3 44 1.0
3 4-7 85 NaN NaN NaN NaN
4 947 1-3 156 4.0 1-3 213 3.0
5 4-7 NaN NaN 4-7 33 NaN
Posted on 2018-06-05 09:17:19
First, remove the step that replaces duplicated ID values with empty strings in both DataFrames, so that every row keeps its ID:
#df1['ID'] = df1['ID'].mask(df1['ID'].duplicated(), '')
#df2['ID'] = df2['ID'].mask(df2['ID'].duplicated(), '')
print (df1)
ID Range(US) Count(US) Mean(US)
0 690 1-3 266 4.0
1 690 4-7 277 NaN
2 354 1-3 233 2.0
3 354 4-7 85 NaN
4 947 1-3 156 4.0
print (df2)
ID Range(UK) Count(UK) Mean(UK)
0 690 1-3 186 4.0
1 690 4-7 25 NaN
2 354 1-3 44 1.0
3 947 1-3 213 3.0
4 947 4-7 33 NaN
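If only the blanked version of the data is available, one possible way to restore the IDs is to forward-fill them. This is a sketch under assumptions not stated in the question: the blanks are empty strings, each blank row belongs to the nearest ID above it, and `blanked` is a hypothetical stand-in for df1 with a reduced set of columns.

```python
import pandas as pd

# Hypothetical stand-in for df1 after the IDs were blanked out
blanked = pd.DataFrame({
    "ID": ["690", "", "354", "", "947"],
    "Range(US)": ["1-3", "4-7", "1-3", "4-7", "1-3"],
})

# Empty strings cannot be parsed as numbers, so errors="coerce" turns them into NaN
ids = pd.to_numeric(blanked["ID"], errors="coerce")

# Copy the last seen ID downward, then convert back to integers
blanked["ID"] = ids.ffill().astype(int)
print(blanked)
```

This only works when the rows of one ID are contiguous; if the frame has been reordered, the forward fill would assign the wrong ID.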
Then merge on both columns with an outer join:
df = df1.merge(df2, left_on=['ID', 'Range(US)'], right_on=['ID', 'Range(UK)'], how='outer')
print (df)
ID Range(US) Count(US) Mean(US) Range(UK) Count(UK) Mean(UK)
0 690 1-3 266.0 4.0 1-3 186.0 4.0
1 690 4-7 277.0 NaN 4-7 25.0 NaN
2 354 1-3 233.0 2.0 1-3 44.0 1.0
3 354 4-7 85.0 NaN NaN NaN NaN
4 947 1-3 156.0 4.0 1-3 213.0 3.0
5 947 NaN NaN NaN 4-7 33.0 NaN
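To check which rows matched on both sides, `merge` accepts `indicator=True`, which adds a `_merge` column. The following self-contained sketch rebuilds the two frames from the printed data, leaving out the Mean columns for brevity; the variable name `out` is only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [690, 690, 354, 354, 947],
    "Range(US)": ["1-3", "4-7", "1-3", "4-7", "1-3"],
    "Count(US)": [266, 277, 233, 85, 156],
})
df2 = pd.DataFrame({
    "ID": [690, 690, 354, 947, 947],
    "Range(UK)": ["1-3", "4-7", "1-3", "1-3", "4-7"],
    "Count(UK)": [186, 25, 44, 213, 33],
})

out = df1.merge(
    df2,
    left_on=["ID", "Range(US)"],
    right_on=["ID", "Range(UK)"],
    how="outer",
    indicator=True,  # adds _merge: "both", "left_only" or "right_only"
)
print(out["_merge"].value_counts())
```

Rows marked `left_only` or `right_only` are the ones that get NaN on the other side, here (354, 4-7) and (947, 4-7). The row order of an outer merge may differ between pandas versions.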
https://stackoverflow.com/questions/50696386