首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >我需要一种使用pandas.merge_asof()进行多对一合并的方法

我需要一种使用pandas.merge_asof()进行多对一合并的方法
EN

Stack Overflow用户
提问于 2018-12-14 08:05:31
回答 1查看 801关注 0票数 0

我有一个类似于以下链接中列出的帖子的问题:pandas merging based on a timestamp which do not match exactly

但是,在使用pandas.merge_asof()的功能时,我需要进行多对一匹配。

我有两个数据帧,df1和df2。

代码语言:javascript
运行
复制
import pandas as pd
import numpy as np
from io import StringIO

dtc = [['CALL_DATE']]
df1 = pd.read_csv(StringIO(u'''
CALL_DATE,customer,status
2017-01-03 14:12:58,70892,P
2017-01-06 20:00:25,70892,P
2017-01-07 09:42:58,70892,X
2017-01-03 13:56:41,70928,N
2017-01-07 15:16:26,70928,C
2017-01-03 15:39:11,71075,U
2017-01-03 15:46:29,71075,N
'''))

df2 = pd.read_csv(StringIO(u'''
CALL_DATE,customer,Note
2017-01-03 14:09:00,70892,Call to return
2017-01-06 19:59:00,70892,Wrong Item shipped
2017-01-07 09:36:00,70892,Survey denied
2017-01-03 13:56:00,70928,TGGT
2017-01-03 13:53:00,70928,Open issue
2017-01-03 13:56:00,70928,No Record of listings
2017-01-07 15:15:00,70928,Need Translator
2017-01-07 15:16:00,70928,rescheduled appointment 
2017-01-03 15:39:11,71075,New Contact
2017-01-03 15:46:29,71075,open membership
2017-01-03 15:46:29,71075,recurring delivery scheduled 
'''))

df1['CALL_DATE'] = pd.to_datetime(df1['CALL_DATE'], format = '%Y-%m-%d %H:%M:%S')
df2['CALL_DATE'] = pd.to_datetime(df2['CALL_DATE'], format = '%Y-%m-%d %H:%M:%S')  

这两个数据帧需要合并,最终结果类似于以下内容:

代码语言:javascript
运行
复制
df3 = pd.read_csv(StringIO(u'''
2017-01-03 14:12:58,70892,P,2017-01-03 14:09:00,Call to return
2017-01-06 20:00:25,70892,P,2017-01-06 19:59:00,Wrong Item shipped
2017-01-07 09:42:58,70892,P,2017-01-07 09:36:00,Survey denied
2017-01-03 13:56:41,70928,N,2017-01-03 13:56:00,TGGT 
2017-01-03 13:56:41,70928,N,2017-01-03 13:53:00,Open issue
2017-01-03 13:56:41,70928,N,2017-01-03 13:56:00,70928,No Record of listings
2017-01-07 15:16:26,70928,C,2017-01-07 15:15:00,Need Translator
2017-01-07 15:16:26,70928,C,2017-01-07 15:16:00,rescheduled appointment
2017-01-03 15:39:11,71075,U,2017-01-03 15:39:11,New Contact
2017-01-03 15:46:29,71075,N,2017-01-03 15:46:29,open membership
2017-01-03 15:46:29,71075,N,2017-01-03 15:46:29,recurring delivery schedule
'''))         

在提供的样本数据中,时间差确实很小,但在许多情况下,时间差可能是几个小时,几乎一整天。我正在尝试将备注与该客户的最接近的客户条目进行匹配。此外,df2条目可以在df1条目之前或之后(按时间顺序)。

当我执行pandas.merge_asof()时,它只是进行一对一的合并,并且我丢失了应该与客户的文件一起使用的注释。

EN

回答 1

Stack Overflow用户

发布于 2018-12-15 01:24:14

也许您所要做的就是在merge_asof调用中交换数据帧的顺序?因为这对我很有效:

代码语言:javascript
运行
复制
df1.sort_values(by='CALL_DATE', inplace=True)
df2.sort_values(by='CALL_DATE', inplace=True)

df1['STATUS_DATE'] = df1.CALL_DATE  # preserves times from df1

df3 = pd.merge_asof(df2, df1, on='CALL_DATE', by='customer', direction='nearest')

调用print(df3)输出(在我的机器上):

代码语言:javascript
运行
复制
             CALL_DATE  customer                           Note status  \
0  2017-01-03 13:53:00     70928                     Open issue      N   
1  2017-01-03 13:56:00     70928                           TGGT      N   
2  2017-01-03 13:56:00     70928          No Record of listings      N   
3  2017-01-03 14:09:00     70892                 Call to return      P   
4  2017-01-03 15:39:11     71075                    New Contact      U   
5  2017-01-03 15:46:29     71075                open membership      N   
6  2017-01-03 15:46:29     71075  recurring delivery scheduled       N   
7  2017-01-06 19:59:00     70892             Wrong Item shipped      P   
8  2017-01-07 09:36:00     70892                  Survey denied      X   
9  2017-01-07 15:15:00     70928                Need Translator      C   
10 2017-01-07 15:16:00     70928       rescheduled appointment       C   

           STATUS_DATE  
0  2017-01-03 13:56:41  
1  2017-01-03 13:56:41  
2  2017-01-03 13:56:41  
3  2017-01-03 14:12:58  
4  2017-01-03 15:39:11  
5  2017-01-03 15:46:29  
6  2017-01-03 15:46:29  
7  2017-01-06 20:00:25  
8  2017-01-07 09:42:58  
9  2017-01-07 15:16:26  
10 2017-01-07 15:16:26  

如果列的顺序让你感到困扰,你可以随时使用reorder the columns

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/53771820

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档