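I have a datetime column. I need to derive a column giving the total number of minutes elapsed from the first value to the last value within each hour (grouped by hour). However, when an event spans an hour boundary, the time should be split between the two hours. In addition, if more than 30 minutes elapse between two consecutive records, that interval must be ignored.
Below I illustrate this in three stages: the original data, an intermediate stage (computing the running total), and the final output.
I also plan to receive incremental data hourly on the same platform, so how to merge it correctly with the old data is a further question.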
Sample data:
Moves_TS
1/4/2020 10:00
1/4/2020 10:25
1/4/2020 10:42
1/4/2020 10:56
1/4/2020 10:59
1/4/2020 11:02
1/4/2020 11:24
1/4/2020 11:43
1/4/2020 11:55
1/4/2020 12:26
1/4/2020 12:29
Intermediate stage:
Moves_TS          Hour   Running Total (minutes)
1/4/2020 10:00    10     0
1/4/2020 10:25    10     25
1/4/2020 10:42    10     42
1/4/2020 10:56    10     56
1/4/2020 10:59    10     60
1/4/2020 11:02    11     2
1/4/2020 11:24    11     24
1/4/2020 11:43    11     43
1/4/2020 11:55    11     55
1/4/2020 12:26    12     0
1/4/2020 12:29    12     3
Final output:
Hour   Work done/Hour (minutes)
10     60
11     55
12     3
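Posted on 2020-04-01 19:29:34
This is a gaps-and-islands problem with a twist. First, I would summarize the data into "islands", where a gap of more than 30 minutes between consecutive records starts a new island: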
select min(moves_ts) as start_ts, max(moves_ts) as end_ts
from (select o.*,
             -- running count of gaps longer than 30 minutes; rows sharing a count form one island
             count(prev_moves_ts) filter (where moves_ts > prev_moves_ts + interval '30 minute') over (order by moves_ts) as grp
      from (select o.*, lag(moves_ts) over (order by moves_ts) as prev_moves_ts
            from original o
           ) o
     ) o
group by grp;
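To sanity-check the island logic outside the database, here is a minimal Python sketch of the same grouping rule applied to the sample rows from the question. The helper name `find_islands` is hypothetical, and reading `1/4/2020` as day/month/year is an assumption (the result does not depend on it).

```python
from datetime import datetime, timedelta

# Sample rows from the question. The day/month/year reading of "1/4/2020"
# is an assumption; the island logic does not depend on it.
SAMPLE = [
    "1/4/2020 10:00", "1/4/2020 10:25", "1/4/2020 10:42", "1/4/2020 10:56",
    "1/4/2020 10:59", "1/4/2020 11:02", "1/4/2020 11:24", "1/4/2020 11:43",
    "1/4/2020 11:55", "1/4/2020 12:26", "1/4/2020 12:29",
]


def find_islands(timestamps, max_gap=timedelta(minutes=30)):
    """Group timestamps into (start, end) islands.

    A gap strictly longer than max_gap starts a new island, mirroring the
    SQL condition moves_ts > prev_moves_ts + interval '30 minute'.
    """
    islands = []
    for ts in sorted(timestamps):
        if islands and ts - islands[-1][1] <= max_gap:
            islands[-1][1] = ts          # extend the current island
        else:
            islands.append([ts, ts])     # first row, or gap too long
    return [(start, end) for start, end in islands]


moves = [datetime.strptime(s, "%d/%m/%Y %H:%M") for s in SAMPLE]
islands = find_islands(moves)
for start, end in islands:
    print(start.strftime("%H:%M"), "->", end.strftime("%H:%M"))
# 10:00 -> 11:55
# 12:26 -> 12:29
```

The only gap over 30 minutes in the sample is 11:55 to 12:26 (31 minutes), so the data splits into two islands.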
You can then combine this with generate_series() to expand the data into one row per hour and calculate each island's overlap with that hour:
with islands as (
      select min(moves_ts) as start_ts, max(moves_ts) as end_ts
      from (select o.*,
                   count(prev_moves_ts) filter (where moves_ts > prev_moves_ts + interval '30 minute') over (order by moves_ts) as grp
            from (select o.*, lag(moves_ts) over (order by moves_ts) as prev_moves_ts
                  from original o
                 ) o
           ) o
      group by grp
     )
select hh.hh,
       -- portion of each island that falls inside the hour starting at hh
       sum( least(hh.hh + interval '1 hour', i.end_ts) -
            greatest(hh.hh, i.start_ts)
          ) as duration
from (select generate_series(date_trunc('hour', min(moves_ts)),
                             date_trunc('hour', max(moves_ts)),
                             interval '1 hour'
                            ) hh  -- one row per clock hour between the first and last record
      from original o
     ) hh left join
     islands i
     on i.start_ts < hh.hh + interval '1 hour' and
        i.end_ts >= hh.hh
group by hh.hh
order by hh.hh;
Here is a db<>fiddle.
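The per-hour overlap arithmetic can also be checked by hand. Below is a minimal Python sketch, assuming the two islands that the 30-minute rule yields on the sample data (10:00 to 11:55 and 12:26 to 12:29) are hard-coded as input. The helper name `minutes_per_hour` is hypothetical and the date is assumed to be 1 April 2020.

```python
from datetime import datetime, timedelta

# Islands derived by hand from the sample data (assumption: the only gap
# longer than 30 minutes is 11:55 -> 12:26).
ISLANDS = [
    (datetime(2020, 4, 1, 10, 0), datetime(2020, 4, 1, 11, 55)),
    (datetime(2020, 4, 1, 12, 26), datetime(2020, 4, 1, 12, 29)),
]


def minutes_per_hour(islands):
    """Split each (start, end) island across the clock hours it overlaps."""
    one_hour = timedelta(hours=1)
    totals = {}
    for start, end in islands:
        hh = start.replace(minute=0, second=0, microsecond=0)
        while hh <= end:
            # Same expression as the SQL: least(hh + 1 hour, end) - greatest(hh, start)
            overlap = min(hh + one_hour, end) - max(hh, start)
            totals[hh] = totals.get(hh, 0.0) + overlap.total_seconds() / 60
            hh += one_hour
    return totals


totals = minutes_per_hour(ISLANDS)
print({h.strftime("%H:%M"): m for h, m in totals.items()})
# {'10:00': 60.0, '11:00': 55.0, '12:00': 3.0}
```

This matches the 60 / 55 / 3 expected in the question. Unlike the query, which emits a row for every hour between the first and last record, the sketch only returns hours that an island actually touches.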
Posted on 2020-04-01 19:34:17
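-- Minutes elapsed since the first record in the same calendar hour.
select
    MOVES_TS,
    Hour,
    DATEHR,
    MIN_MOVES_TS,
    -- (days * 24 + hours) * 60 + minutes
    (
        (DATE_PART('day', MOVES_TS - MIN_MOVES_TS) * 24 +
         DATE_PART('hour', MOVES_TS - MIN_MOVES_TS)) * 60 +
        DATE_PART('minute', MOVES_TS - MIN_MOVES_TS)
    ) as RunningTotal
from (
    select
        MOVES_TS,
        DATE_PART('hour', MOVES_TS) as Hour,
        -- HH24 keeps 01:00 and 13:00 in separate groups (plain HH is the 12-hour clock)
        TO_CHAR(MOVES_TS, 'YYYYMMDDHH24') as DATEHR,
        MIN(MOVES_TS) over (partition by TO_CHAR(MOVES_TS, 'YYYYMMDDHH24')) as MIN_MOVES_TS
    from dataset
) d;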
https://stackoverflow.com/questions/60969665