I have raw data that was exported from Salesforce and converted, as shown below:
import pandas as pd

df = pd.DataFrame(columns=['contact_start', 'name', 'aht'],
                  data=[['2021-09-27 09:58:00', 'Venus', '180'],
                        ['2021-09-27 10:00:00', 'Venus', '240'],
                        ['2021-09-27 11:05:00', 'Venus', '60'],
                        ['2021-09-27 10:55:00', 'Mars', '30'],
                        ['2021-09-27 10:56:00', 'Mars', '30']])
Using the code below:
df["contact_start"] = pd.to_datetime(df["contact_start"], format="%Y-%m-%d %H:%M:%S", errors='coerce')
df["date"] = df["contact_start"].dt.strftime('%Y-%m-%d')
# aht holds the handle time in seconds as a string; cast it to a number before formatting it as HH:MM:SS
df['aht'] = pd.to_datetime(pd.to_numeric(df["aht"]), unit='s').dt.strftime("%H:%M:%S")
df['contact_finish'] = pd.to_timedelta(df['aht']) + pd.to_datetime(df['contact_start'])
df['contact_finish'] = df['contact_finish'].astype('datetime64[s]')
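This adds the date and contact_finish columns.
But my ultimate goal is to handle overlapping contacts, and I have not found a way to do that.
The result should look like this: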
df = pd.DataFrame(columns=['date','name', 'total_duration_sec'],
data=[['2021-09-27','Venus','420'],
['2021-09-27','Mars','60']])
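I guess this looks simple, but it really is not. Any help would be appreciated.
Edit: I don't know how to put more meaningful data into Python, so I uploaded a sample data file (3 KB CSV).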
Answered on 2021-10-19 14:33:58
You can make your existing code work by adding the following lines to it:
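# Seconds by which each contact overlaps the previous contact of the same name
# (relies on the rows being ordered by contact_start within each name)
overlapped = pd.Series(df.groupby(['name']).apply(lambda x: (x['contact_finish'] - x['contact_start'].shift(-1)).dt.total_seconds().shift()).droplevel(0), name='overlapped')
# A negative value (a gap) or NaN (no previous contact) means no overlap, so count it as 0
overlapped = overlapped.mask(overlapped<0, 0).fillna(0)
df['date'] = df['contact_start'].dt.date
# Sum the contact durations per date and name, minus the overlapped seconds
df = df.groupby(['date', 'name']).apply(lambda x: (((x['contact_finish'] - x['contact_start']).dt.seconds) - overlapped).sum()).reset_index(name='total_duration_sec')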
Output:
date name total_duration_sec
0 2021-09-27 Mars 60.0
1 2021-09-27 Venus 420.0
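Note that the lines above compare each contact only with the one directly before it, so the total can come out wrong when a single long contact overlaps several later ones. As an alternative (not the method used above), here is a minimal sketch that tracks the latest finish time seen so far within each date and name and counts only the part of each contact that extends beyond it. The function name `total_duration_sec` and the helper columns `start_s`, `finish_s` and `covered_s` are made up for illustration, and the sketch assumes each contact is attributed to the date on which it starts.

```python
import pandas as pd

df = pd.DataFrame(
    columns=["contact_start", "name", "aht"],
    data=[
        ["2021-09-27 09:58:00", "Venus", "180"],
        ["2021-09-27 10:00:00", "Venus", "240"],
        ["2021-09-27 11:05:00", "Venus", "60"],
        ["2021-09-27 10:55:00", "Mars", "30"],
        ["2021-09-27 10:56:00", "Mars", "30"],
    ],
)


def total_duration_sec(raw: pd.DataFrame) -> pd.DataFrame:
    """Sum contact time per date and name, counting overlapping time once."""
    out = raw.copy()
    out["contact_start"] = pd.to_datetime(out["contact_start"])
    out["date"] = out["contact_start"].dt.strftime("%Y-%m-%d")

    # Work in plain seconds measured from the earliest contact in the data.
    origin = out["contact_start"].min()
    out["start_s"] = (out["contact_start"] - origin).dt.total_seconds()
    out["finish_s"] = out["start_s"] + pd.to_numeric(out["aht"])

    # Order the contacts chronologically within each date and name.
    out = out.sort_values(["date", "name", "start_s"])

    # Latest finish among all earlier contacts in the same group
    # (NaN for the first contact of each group).
    prev_finish = out.groupby(["date", "name"])["finish_s"].transform(
        lambda s: s.cummax().shift()
    )

    # If an earlier contact is still running, start counting where it ends.
    effective_start = out["start_s"].where(
        ~(prev_finish > out["start_s"]), prev_finish
    )

    # A contact fully inside an earlier one contributes nothing.
    out["covered_s"] = (out["finish_s"] - effective_start).clip(lower=0)

    return (
        out.groupby(["date", "name"])["covered_s"]
        .sum()
        .reset_index(name="total_duration_sec")
    )


result = total_duration_sec(df)
print(result)
```

For the sample data this gives 60.0 for Mars and 420.0 for Venus, matching the output above. Working in seconds relative to the earliest contact keeps the running maximum a plain numeric operation.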
https://stackoverflow.com/questions/69631586