我有一个包含数十万行的数据库,其模式如下:
+----+----------+---------+
| id | duration | type |
+----+----------+---------+
| 1 | 41 | cycling |
+----+----------+---------+
| 2 | 15 | walking |
+----+----------+---------+
| 3 | 6 | walking |
+----+----------+---------+
| 4 | 26 | running |
+----+----------+---------+
| 5 | 30 | cycling |
+----+----------+---------+
| 6 | 13 | running |
+----+----------+---------+
| 7 | 10 | running |
+----+----------+---------+我之前使用MongoDB聚合来完成此操作,并获得按类型和总计数的活动分布:
{
$bucket: {
groupBy: '$duration',
boundaries: [0, 16, 31, 61, 91, 121],
default: 121,
output: {
total: { $sum: 1 },
walking: {
$sum: { $cond: [{ $eq: ['$type', 'walking'] }, 1, 0] },
},
running: {
$sum: { $cond: [{ $eq: ['$type', 'running'] }, 1, 0] },
},
cycling: {
$sum: { $cond: [{ $eq: ['$type', 'cycling'] }, 1, 0] },
},
},
},
}我刚过渡到使用Postgres,不知道如何在那里做条件和。要获得这样的结果表,查询应该是什么?
+---------------+---------+---------+---------+-------+
| duration_band | walking | running | cycling | total |
+---------------+---------+---------+---------+-------+
| 0-15 | 41 | 21 | 12 | 74 |
+---------------+---------+---------+---------+-------+
| 15-30 | 15 | 1 | 44 | 60 |
+---------------+---------+---------+---------+-------+
| 30-60 | 6 | 56 | 7 | 69 |
+---------------+---------+---------+---------+-------+
| 60-90 | 26 | 89 | 32 | 150 |
+---------------+---------+---------+---------+-------+
| 90-120 | 30 | 0 | 6 | 36 |
+---------------+---------+---------+---------+-------+
| 120+ | 13 | 90 | 0 | 103 |
+---------------+---------+---------+---------+-------+
| Total | 131 | 257 | 101 | 492 |
+---------------+---------+---------+---------+-------+发布于 2021-03-15 03:35:50
SQL非常擅长检索和计算数据,并提供数据,因此获取您想要的值是一项简单的任务。它不擅长格式化结果,这就是为什么这项任务通常留给表示层。然而,这并不意味着它不能完成-它可以在single query中完成。难点在于pivot过程-将行转换为列。但首先要做一些设置。您应该将持续时间数据放在自己的表中(如果还没有)。添加了一个标识符,该标识符随后允许多个标准集(稍后将详细介绍)。我将继续这样做。
create table bands( name text, period int4range, title text );
insert into bands(name, period, title)
values ('Standard', '[ 0, 15)'::int4range , '0 - 15')
, ('Standard', '[ 15, 30)'::int4range , '15 - 30')
, ('Standard', '[ 30, 60)'::int4range , '30 - 60')
, ('Standard', '[ 60, 90)'::int4range , '60 - 00')
, ('Standard', '[ 90,120)'::int4range , '90 - 120')
, ('Standard', '[120,)'::int4range , '120+'); 这将设置您当前的标准。name列是前面提到的标识符,其中title列成为输出上的持续时间段。有趣的列是句点;定义为integer range。在这种情况下,[闭合的,开放的]范围包括第一个数字,但不包括第二年,括号有意义。该定义成为结果查询的核心。查询按如下方式构建:
上面给出了活动间隔,它成为旋转时生成的矩阵。
上面的内容现在包含矩阵中每个单元格的间隔和总活动。然而,它仍然存在于行方向上。
将计算出的结果从行方向旋转到列orientation.
结果是:
with buckets ( period , title, ord) as
( select period , title, row_number() over (order by lower(b.period)) ord ---- 1
from bands b
where name = 'Standard'
union all
select '[0,)','Total',count(*) + 1
from bands b
where name = 'Standard'
)
, activities (activity) as ( values ('running'),('walking'),('cycling'), ('Total')) ---- 2
, activity_buckets (period, title, ord, activity) as
(select * from buckets cross join activities) ---- 3
select s2.title "Duration Band" ---- 6
, max(cycling) "Cycling"
, max(running) "Running"
, max(walking) "Walking"
, max(Total) "Total "
from ( select s1.title, s1.ord
, case when s1.activity = 'cycling' then duration else null end cycling ---- 5
, case when s1.activity = 'running' then duration else null end running
, case when s1.activity = 'walking' then duration else null end walking
, case when s1.activity = 'Total' then duration else null end total
from ( select ab.ord, ab.title, ab.activity
, sum(coalesce(t.duration,0)) duration ---- 4
from activity_buckets ab
left join test t
on ( (t.type = ab.activity or ab.activity = 'Total')
and t.duration <@ ab.period --** determines which time interval(s) the value belongs
)
group by ab.ord, ab.title, ab.activity
) s1
) s2
group by s2.ord,s2.title
order by s2.ord;参见demo。它包含了整个过程中的每个主要步骤。此外,它还展示了如何使用为间隔创建表。因为我不喜欢长查询,所以我通常将它们隐藏在SQL函数后面,然后直接使用该函数。Demo也包含这个。
https://stackoverflow.com/questions/66614951
复制相似问题