I have a dataset that shows who booked which room and when. It looks like this:
email         room   Start Date           End Date
abc@corp.com  L11M2  2021-02-01 08:00:00  2021-02-01 11:00:00
xyz@corp.com  L12M4  2021-02-01 08:00:00  2021-02-01 10:00:00
I want to split each booking into hourly slots so that every row covers exactly one hour. This is the DataFrame I want:
email         room   Start Date           End Date
abc@corp.com  L11M2  2021-02-01 08:00:00  2021-02-01 09:00:00
abc@corp.com  L11M2  2021-02-01 09:00:00  2021-02-01 10:00:00
abc@corp.com  L11M2  2021-02-01 10:00:00  2021-02-01 11:00:00
xyz@corp.com  L12M4  2021-02-01 08:00:00  2021-02-01 09:00:00
xyz@corp.com  L12M4  2021-02-01 09:00:00  2021-02-01 10:00:00
Is there any way to do this in Python?
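A minimal sketch for reproducing the example: it rebuilds the two sample bookings above as a pandas DataFrame and parses both timestamp columns as datetime64, since date ranges and Timedelta arithmetic need real datetimes rather than strings. The `hours` column is a hypothetical helper added only to show how many one-hour rows each booking should expand into; it is not part of the question's data.

```python
import pandas as pd

# The two bookings from the question's table, timestamps still as strings.
df = pd.DataFrame({
    "email": ["abc@corp.com", "xyz@corp.com"],
    "room": ["L11M2", "L12M4"],
    "Start Date": ["2021-02-01 08:00:00", "2021-02-01 08:00:00"],
    "End Date": ["2021-02-01 11:00:00", "2021-02-01 10:00:00"],
})

# Parse both columns so date arithmetic works; string columns do not
# support subtraction or adding a Timedelta.
for col in ["Start Date", "End Date"]:
    df[col] = pd.to_datetime(df[col])

# Hypothetical helper column: number of one-hour rows each booking should
# become (floor division of a timedelta Series by a Timedelta gives ints).
df["hours"] = (df["End Date"] - df["Start Date"]) // pd.Timedelta(hours=1)
print(df)
```

The counts (3 for the first booking, 2 for the second) add up to the five rows in the desired output.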
Posted on 2021-08-27 09:37:27
Combining pandas `melt` with pyjanitor's `complete` can reshape the data:
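# pip install pyjanitor
import pandas as pd
import janitor  # importing janitor registers .complete() as a DataFrame method

# assumes df's 'Start Date' and 'End Date' columns are already datetime64
(df.melt(['email', 'room'], value_name='Start_Date')  # one row per timestamp, label in 'variable'
   .reindex([3, 1, 2, 0])
   # complete is a wrapper around pandas functions
   # to expose missing values ... in this case it exposes the
   # missing dates for each group in by
   .complete([{'Start_Date': lambda df: pd.date_range(df.min(), df.max(), freq='H')}],
             by=['email', 'room'])
   .assign(End_Date=lambda df: df.Start_Date.add(pd.Timedelta('1 hour')))
   # drop the original End Date rows; rows added by complete have NaN in
   # 'variable', which compares unequal to "End Date", so they are kept
   .query('variable != "End Date"')
   .drop(columns='variable'))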
   email         room   Start_Date           End_Date
0  abc@corp.com  L11M2  2021-02-01 08:00:00  2021-02-01 09:00:00
1  abc@corp.com  L11M2  2021-02-01 09:00:00  2021-02-01 10:00:00
2  abc@corp.com  L11M2  2021-02-01 10:00:00  2021-02-01 11:00:00
4  xyz@corp.com  L12M4  2021-02-01 08:00:00  2021-02-01 09:00:00
5  xyz@corp.com  L12M4  2021-02-01 09:00:00  2021-02-01 10:00:00
https://stackoverflow.com/questions/68950904
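For comparison, a plain-pandas sketch without pyjanitor (a swapped-in technique, not the answer's method): build each booking's hourly start times with `pd.date_range`, then use `DataFrame.explode` to give each start time its own row. It assumes every booking starts and ends on the hour with End Date later than Start Date, treats End Date as exclusive, and needs pandas 1.1 or later for `explode(..., ignore_index=True)`. `split_into_hours` is a hypothetical helper name.

```python
import pandas as pd

# Sample bookings from the question, with the timestamps parsed as datetime64.
df = pd.DataFrame({
    "email": ["abc@corp.com", "xyz@corp.com"],
    "room": ["L11M2", "L12M4"],
    "Start Date": pd.to_datetime(["2021-02-01 08:00:00", "2021-02-01 08:00:00"]),
    "End Date": pd.to_datetime(["2021-02-01 11:00:00", "2021-02-01 10:00:00"]),
})

one_hour = pd.Timedelta(hours=1)


def split_into_hours(bookings: pd.DataFrame) -> pd.DataFrame:
    """Return one row per hour slot (assumes on-the-hour start and end times)."""
    out = bookings.copy()
    # For each booking, list the start of every hour slot. End Date itself is
    # excluded because it only closes the last slot.
    out["Start Date"] = [
        list(pd.date_range(start, end - one_hour, freq=one_hour))
        for start, end in zip(bookings["Start Date"], bookings["End Date"])
    ]
    # explode() puts each list element on its own row, repeating email/room.
    out = out.explode("Start Date", ignore_index=True)
    # The exploded column is object dtype; convert it back to datetime64.
    out["Start Date"] = pd.to_datetime(out["Start Date"])
    out["End Date"] = out["Start Date"] + one_hour
    return out


hourly = split_into_hours(df)
print(hourly)
```

Subtracting one hour from End Date before calling `date_range` keeps the closing boundary out of the start times without relying on version-specific `date_range` options.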