我在Hive中有下表mytable
:
id radar_id car_id datetime
1 A21 123 2017-03-08 17:31:19.0
2 A21 555 2017-03-08 17:32:00.0
3 A21 777 2017-03-08 17:33:00.0
4 B15 123 2017-03-08 17:35:22.0
5 B15 555 2017-03-08 17:34:05.0
5 B15 777 2017-03-08 20:50:12.0
6 A21 123 2017-03-09 11:00:00.0
7 C11 123 2017-03-09 11:10:00.0
8 A21 123 2017-03-09 11:12:00.0
9 A21 555 2017-03-09 11:12:10.0
10 B15 123 2017-03-09 11:14:00.0
11 C11 555 2017-03-09 11:20:00.0
我想让汽车通过雷达,A21
和B15
的路线在同一次旅行中。例如,如果同一car_id
的日期不同,那么它就不是相同的行程。基本上,我想考虑的是,相同车辆的雷达A21
和B15
之间的最大时间差应该是30分钟。如果它更大,那么旅行就不一样了,比如car_id
777
。
我的最终目标是计算每天的平均出行次数(非唯一的,所以如果同一辆车经过同一条路线2次,那么就应该计算2次)。
预期结果如下:
radar_start radar_end avg_tripscount_per_day
A21 B15 1.5
在2017-03-08
的日期,雷达A21
和B15
之间有2次行程(由于30分钟的限制,不考虑汽车777
),而在2017-03-09
的日期只有1次。平均每天是2+1=1.5旅行。
我怎样才能得到这个结果?基本上,我不知道如何在查询中引入30分钟的限制,以及如何按radar_start
和radar_end
对骑乘进行分组。
谢谢。
更新:
A21
在2017-03-08 23:55
和雷达B15
在2017-03-09 00:15
触发的,那么它应该被认为是为日期2017-03-08
注册的同一次旅行。ids
6和8的情况下,同一辆汽车123
经过A21
两次,然后转向B15
(id
10)。最后一次乘坐id
8应该考虑。那么,8-10
.因此,最近之前的B15
。其解释是,一辆汽车经过A21
两次,第二次被转向B15
。发布于 2017-08-03 12:11:31
select count(*) / count(distinct to_date(datetime)) as trips_per_day
from (select radar_id
,datetime
,lead(radar_id) over w as next_radar_id
,lead(datetime) over w as next_datetime
from mytable
where radar_id in ('A21','B15')
window w as
(
partition by car_id
order by datetime
)
) t
where radar_id = 'A21'
and next_radar_id = 'B15'
and datetime + interval '30' minutes >= next_datetime
;
+----------------+
| trips_per_day |
+----------------+
| 1.5 |
+----------------+
附注:
如果版本不支持间隔,则可以将最后的代码记录替换为-
and to_unix_timestamp(datetime) + 30*60 > to_unix_timestamp(next_datetime)
发布于 2017-08-03 11:22:37
我错过了您正在使用Hive
,所以开始为SQL-Server
编写查询,但也许它会对您有所帮助。试着做这样的事情:
查询
select radar_start,
radar_end,
convert(decimal(6,3), count(*)) / convert(decimal(6,3), count(distinct dt)) as avg_tripscount_per_day
from (
select
t1.radar_id as radar_start,
t2.radar_id as radar_end,
convert(date, t1.[datetime]) dt,
row_number() over (partition by t1.radar_id, t1.car_id, convert(date, t1.[datetime]) order by t1.[datetime] desc) rn1,
row_number() over (partition by t2.radar_id, t2.car_id, convert(date, t2.[datetime]) order by t2.[datetime] desc) rn2
from trips as t1
join trips as t2 on t1.car_id = t2.car_id
and datediff(minute,t1.[datetime], t2.[datetime]) between 0 and 30
and t1.radar_id = 'A21'
and t2.radar_id = 'B15'
)x
where rn1 = 1 and rn2 = 1
group by radar_start, radar_end
OUPUT
radar_start radar_end avg_tripscount_per_day
A21 B15 1.5000000000
样本数据
create table trips
(
id int,
radar_id char(3),
car_id int,
[datetime] datetime
)
insert into trips values
(1,'A21',123,'2017-03-08 17:31:19.0'),
(2,'A21',555,'2017-03-08 17:32:00.0'),
(3,'A21',777,'2017-03-08 17:33:00.0'),
(4,'B15',123,'2017-03-08 17:35:22.0'),
(5,'B15',555,'2017-03-08 17:34:05.0'),
(5,'B15',777,'2017-03-08 20:50:12.0'),
(6,'A21',123,'2017-03-09 11:00:00.0'),
(7,'C11',123,'2017-03-09 11:10:00.0'),
(8,'A21',123,'2017-03-09 11:12:00.0'),
(9,'A21',555,'2017-03-09 11:12:10.0'),
(8,'B15',123,'2017-03-09 11:14:00.0'),
(9,'C11',555,'2017-03-09 11:20:00.0')
https://stackoverflow.com/questions/45481288
复制相似问题