我有一个父表df0
,它有两级代码Code1
、Code2
和Message
。多个子表,如存储较低级别Code2
和Messages
的df1
。如您所见,"df1"
在表df0
中表示为索引。
import pandas as pd
s1 = pd.Series([1, 0, 'A'])
s2 = pd.Series([2, 'df1', 'B'])
s3 = pd.Series([3, 0, 'C'])
df0 = pd.DataFrame([list(s1), list(s2), list(s3)], columns = ["Code1", "Code2", "Message"])
df0
Code1 Code2 Message
0 1 0 A
1 2 df1 B
2 3 0 C
s4 = pd.Series([0, 'B1'])
s5 = pd.Series([1, 'B2'])
s6 = pd.Series([2, 'B3'])
df1 = pd.DataFrame([list(s4), list(s5), list(s6)], columns = ["Code2", "Message"])
df1
Code2 Message
0 0 B1
1 1 B2
2 2 B3
我想要一个更大的表,将df1
放回df0
,并创建一个平面合并表。谢谢你的帮助。
Code1 Code2 Message
0 1 0 A
1 2 B
2 2 0 B1
3 2 1 B2
4 2 2 B3
5 3 0 C
发布于 2019-04-20 01:32:19
使用merge
。下面提供了一个中间结果,可以进一步处理该结果以获得所需的输出。
import pandas as pd
import numpy as np
print(pd.__version__)
s1 = pd.Series([1, 0, 'A'])
s2 = pd.Series([2, 'df1', 'B'])
s3 = pd.Series([3, 0, 'C'])
df0 = pd.DataFrame([list(s1), list(s2), list(s3)], columns = ["Code1", "Code2", "Message"])
s4 = pd.Series([0, 'B1'])
s5 = pd.Series([1, 'B2'])
s6 = pd.Series([2, 'B3'])
df1 = pd.DataFrame([list(s4), list(s5), list(s6)], columns = ["Code2", "Message"])
join_key = 'df1'
df1[join_key] = join_key
df2 = pd.concat([df0[df0['Code2'] == join_key], df1], sort=True)
result = df0.merge(df2, left_on=['Code2'], right_on=['df1'], how='outer', suffixes=('_l', '_r'))
print (result)
结果
0.24.2
Code1_l Code2_l Message_l Code1_r Code2_r Message_r df1
0 1.0 0 A NaN NaN NaN NaN
1 3.0 0 C NaN NaN NaN NaN
2 2.0 df1 B NaN 0 B1 df1
3 2.0 df1 B NaN 1 B2 df1
4 2.0 df1 B NaN 2 B3 df1
5 NaN NaN NaN 2.0 df1 B NaN
https://stackoverflow.com/questions/55765444
复制相似问题