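I have a list of contracts, each with a start date and an end date.
How can I count, for each contract, the number of other contracts that overlap its validity period?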
import pandas as pd

df = pd.DataFrame({
    'contract': pd.Series(['A1', 'A2', 'A3', 'A4']),
    'start': pd.Series(['01/01/2015', '03/02/2015', '15/01/2015', '10/01/2015']),
    'end': pd.Series(['16/01/2015', '10/02/2015', '18/01/2015', '12/01/2015'])
})

This gives:
  contract         end       start
0       A1  16/01/2015  01/01/2015
1       A2  10/02/2015  03/02/2015
2       A3  18/01/2015  15/01/2015
3       A4  12/01/2015  10/01/2015

A1 overlaps with A3 and A4, so overlap = 2. A2 overlaps with no contract, so overlap = 0. A3 overlaps with A1, so overlap = 1. A4 overlaps with A1, so overlap = 1.
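I could compare every time span (start to end) against every other, but that is O(n**2). Is there a better idea?
I have a feeling that an improvement is possible by sorting and then looping through the sorted ranges.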
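For reference, the O(n**2) pairwise comparison described above could be sketched roughly as follows. It assumes the contract periods are closed intervals (both the start and end day count as covered), which matches the expected counts listed earlier. The names `overlaps` and `pairwise_counts` are illustrative only, and the explicit `format='%d/%m/%Y'` is an assumption about the date strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'contract': ['A1', 'A2', 'A3', 'A4'],
    'start': ['01/01/2015', '03/02/2015', '15/01/2015', '10/01/2015'],
    'end': ['16/01/2015', '10/02/2015', '18/01/2015', '12/01/2015'],
})

# Assumption: dates are day/month/year strings.
start = pd.to_datetime(df['start'], format='%d/%m/%Y').to_numpy()
end = pd.to_datetime(df['end'], format='%d/%m/%Y').to_numpy()

# Closed intervals i and j overlap when start_i <= end_j and start_j <= end_i.
# Broadcasting builds the full n x n comparison, so this is O(n**2) in time and memory.
overlaps = (start[:, None] <= end[None, :]) & (start[None, :] <= end[:, None])

# Every contract overlaps itself, so subtract 1 from each row sum.
pairwise_counts = overlaps.sum(axis=1) - 1
print(pairwise_counts.tolist())
```

This is only a baseline to check a faster method against; the n x n boolean matrix becomes expensive for large contract lists.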
Posted on 2015-05-05 04:01:51
Here is one way to do it:
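import pandas as pd

df = pd.DataFrame({
    'contract': pd.Series(['A1', 'A2', 'A3', 'A4']),
    'start': pd.Series(['01/01/2015', '03/02/2015', '15/01/2015', '10/01/2015']),
    'end': pd.Series(['16/01/2015', '10/02/2015', '18/01/2015', '12/01/2015'])
})
# Parse the day-first date strings into timestamps.
df['start'] = pd.to_datetime(df.start, dayfirst=True)
df['end'] = pd.to_datetime(df.end, dayfirst=True)
# Expand each contract into the daily dates it covers (wrapped in a 1-tuple so it stays one cell).
periods = df[['start', 'end']].apply(lambda x: (pd.date_range(x['start'], x['end']),), axis=1)
# Boolean matrix: True where two contracts share at least one day.
overlap = periods.apply(lambda col: periods.apply(lambda col_: col[0].isin(col_[0]).any()))
# Count the True entries per row, minus 1 for the contract's match with itself.
df['overlap_count'] = overlap[overlap].apply(lambda x: x.count() - 1, axis=1)
print(df)

This produces: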
  contract        end      start  overlap_count
0       A1 2015-01-16 2015-01-01              2
1       A2 2015-02-10 2015-02-03              0
2       A3 2015-01-18 2015-01-15              1
3       A4 2015-01-12 2015-01-10              1

I have updated the code to output the count of overlapping contracts rather than the overlap in days.
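The approach above still compares every pair of contracts. As a possible alternative along the lines of the sorting idea raised in the question, here is a minimal sketch that runs in O(n log n). It is not part of the original answer. It assumes closed intervals and that every contract has start <= end, and it relies on the observation that a contract fails to overlap another only if it ended strictly before the other started or started strictly after the other ended. The names `ended_before` and `started_after` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'contract': ['A1', 'A2', 'A3', 'A4'],
    'start': ['01/01/2015', '03/02/2015', '15/01/2015', '10/01/2015'],
    'end': ['16/01/2015', '10/02/2015', '18/01/2015', '12/01/2015'],
})

# Assumption: dates are day/month/year strings.
start = pd.to_datetime(df['start'], format='%d/%m/%Y').to_numpy()
end = pd.to_datetime(df['end'], format='%d/%m/%Y').to_numpy()

# Sort starts and ends independently; only counts are needed, not pairings.
sorted_start = np.sort(start)
sorted_end = np.sort(end)

# Number of contracts whose end is strictly before this contract's start.
ended_before = np.searchsorted(sorted_end, start, side='left')
# Number of contracts whose start is strictly after this contract's end.
started_after = len(df) - np.searchsorted(sorted_start, end, side='right')

# The two groups are disjoint, so everything else overlaps;
# subtract 1 more to exclude the contract itself.
df['overlap_count'] = len(df) - ended_before - started_after - 1
print(df[['contract', 'overlap_count']])
```

On the sample data this gives the same counts as the code above (2, 0, 1, 1). It only yields counts; if the identities of the overlapping contracts are needed, a pairwise or interval-tree approach would still be required.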
https://stackoverflow.com/questions/30032723