
Start aggregating at a certain threshold in BigQuery

Stack Overflow user
Asked 2019-04-26 09:23:51
Answers: 1, Views: 413, Followers: 0, Votes: 0

The energy usage of each device is recorded every hour:

+--------------+-----------+-----------------------+
| energy_usage | device_id |  timestamp            |
+--------------+-----------+-----------------------+
| 10           | 1         |  2019-02-12T01:00:00  |
| 16           | 2         |  2019-02-12T01:00:00  |
| 26           | 1         |  2019-03-12T02:00:00  |
| 24           | 2         |  2019-03-12T02:00:00  |
+--------------+-----------+-----------------------+

My goals are:

  1. Create two columns, one for energy_usage_day (8 AM to 8 PM) and one for energy_usage_night (8 PM to 8 AM).
  2. Create a monthly aggregate that groups by device_id and sums up the energy usage.

The result could look like this:

+--------------+------------------+--------------------+-----------+---------+------+
| energy_usage | energy_usage_day | energy_usage_night | device_id |  month  | year |
+--------------+------------------+--------------------+-----------+---------+------+
| 80           | 30               | 50                 | 1         | 2       | 2019 |
| 130          | 60               | 70                 | 2         | 3       | 2019 |
+--------------+------------------+--------------------+-----------+---------+------+

The following query produces such a result:

SELECT SUM(energy_usage) energy_usage
  , SUM(IF(EXTRACT(HOUR FROM timestamp) BETWEEN 8 AND 19, energy_usage, 0)) energy_usage_day
  , SUM(IF(EXTRACT(HOUR FROM timestamp) NOT BETWEEN 8 AND 19, energy_usage, 0)) energy_usage_night
  , device_id
  , EXTRACT(MONTH FROM timestamp) month, EXTRACT(YEAR FROM timestamp) year
FROM `data`
GROUP BY device_id, month, year
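To sanity-check the query's grouping offline, here is a plain-Python sketch of the same monthly day/night aggregation. It is not part of the original question: the tuple layout (energy_usage, device_id, timestamp) and the helper name monthly_day_night are assumptions, and the sample rows are the four from the table above (all recorded at 1 or 2 AM, so they all land in the night column).

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question's table: (energy_usage, device_id, timestamp)
ROWS = [
    (10, 1, datetime(2019, 2, 12, 1)),
    (16, 2, datetime(2019, 2, 12, 1)),
    (26, 1, datetime(2019, 3, 12, 2)),
    (24, 2, datetime(2019, 3, 12, 2)),
]


def monthly_day_night(rows, day_start=8, day_end=19):
    """Hypothetical helper: sum usage per (device_id, month, year).

    "Day" means day_start <= hour <= day_end, i.e. 08:00-19:59 with the
    defaults, mirroring EXTRACT(HOUR ...) BETWEEN 8 AND 19 in the query.
    Returns {(device_id, month, year): (total, day, night)}.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # [total, day, night]
    for usage, device_id, ts in rows:
        bucket = totals[(device_id, ts.month, ts.year)]
        bucket[0] += usage
        if day_start <= ts.hour <= day_end:
            bucket[1] += usage
        else:
            bucket[2] += usage
    return {key: tuple(vals) for key, vals in totals.items()}


if __name__ == "__main__":
    for key, (total, day, night) in sorted(monthly_day_night(ROWS).items()):
        print(key, total, day, night)
```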

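Suppose I am only interested in the total energy usage above a certain threshold, for example 50. I want to start counting once the total energy consumption reaches 50. The result should then look like this: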

+--------------+------------------+--------------------+-----------+---------+------+
| energy_usage | energy_usage_day | energy_usage_night | device_id |  month  | year |
+--------------+------------------+--------------------+-----------+---------+------+
| 30           | 10               | 20                 | 1         | 2       | 2019 |
| 80           | 50               | 30                 | 2         | 3       | 2019 |
+--------------+------------------+--------------------+-----------+---------+------+

In other words: the query should only start summing energy_usage, energy_usage_day and energy_usage_night once the total energy_usage (per device and month) has reached the threshold of 50.
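One way to state this requirement precisely, as a sketch that assumes readings are non-negative and ordered by timestamp within each device and month: let $u_i$ be the usage of the $i$-th reading, $S_i$ the running total, and $T$ the threshold (50 here). These symbols are introduced only for this formalization.

```latex
S_i = \sum_{j=1}^{i} u_j, \qquad S_0 = 0, \qquad T = 50

c_i = \min\bigl(u_i,\ \max(0,\ S_i - T)\bigr)
    = \max(0,\ S_i - T) - \max(0,\ S_{i-1} - T)

\sum_{i=1}^{n} c_i = \max(0,\ S_n - T)
```

Readings before the crossing contribute 0, the crossing reading contributes only its excess (for $S_{i-1} = 49$ and $u_i = 2$, $c_i = \min(2, 1) = 1$), and later readings contribute in full. Because the sum telescopes, monthly totals of 80 and 130 become 30 and 80, the energy_usage values in the desired result above; attributing each $c_i$ to the day or night bucket of reading $i$ is one way to fill the other two columns. In BigQuery the per-row value could presumably be written with LEAST and GREATEST over a running SUM window, though that variant is not from the original thread.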

Is this possible in BigQuery?


1 answer

Stack Overflow user

Accepted answer

Posted 2019-04-26 18:32:15

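Below is for BigQuery Standard SQL. The logic is to start aggregating usage only from the reading at which the running total (per device per month) exceeds 50; that reading itself is counted in full.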

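#standardSQL
WITH temp AS (
  SELECT *,
    -- TRUE for rows where the running total for this device and month exceeds 50
    SUM(energy_usage) OVER(win) > 50 qualified,
    -- day = 08:00-19:59 (8 AM to 8 PM), the same split as the question's query
    EXTRACT(HOUR FROM `timestamp`) BETWEEN 8 AND 19 day_hour,
    EXTRACT(MONTH FROM `timestamp`) month,
    EXTRACT(YEAR FROM `timestamp`) year
  FROM `project.dataset.table`
  -- one running total per device per calendar month, in time order
  WINDOW win AS (PARTITION BY device_id, TIMESTAMP_TRUNC(`timestamp`, MONTH) ORDER BY `timestamp`)
)
SELECT SUM(energy_usage) energy_usage,
  SUM(IF(day_hour, energy_usage, 0)) energy_usage_day,
  SUM(IF(NOT day_hour, energy_usage, 0)) energy_usage_night,
  device_id,
  month,
  year
FROM temp
-- keep rows from the threshold crossing onward; the crossing row counts in full
WHERE qualified
GROUP BY device_id, month, year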
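The effect of the `WHERE qualified` filter on one device-month can be reproduced in a few lines of Python. This is only a sketch for checking arithmetic: qualified_usage is a hypothetical helper, and it covers the total column, not the day/night split.

```python
from itertools import accumulate


def qualified_usage(usages, threshold=50):
    """Hypothetical helper mirroring the answer's `WHERE qualified` filter.

    `usages` holds one device's readings for one month in timestamp order.
    A reading is kept when the running total up to and including it is
    strictly greater than `threshold`; kept readings are summed in full.
    """
    return sum(
        usage
        for usage, running in zip(usages, accumulate(usages))
        if running > threshold
    )


if __name__ == "__main__":
    # Running totals are 49, 51, 56: the reading of 2 crosses 50 and is
    # counted whole, so this prints 7 rather than 6.
    print(qualified_usage([49, 2, 5]))
```

The example shows the limitation of this first version: the reading that crosses the threshold is included entirely, not just the part above 50.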

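Suppose the current usage sum is 49 and the next usage entry is 2, which brings the sum to 51. With the query above, the whole 2 is added to the sum. Instead, only half of it, i.e. 1 (the part above 50), should be added. Can this be handled in BigQuery SQL?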

#standardSQL
WITH temp AS (
  SELECT *,
    -- TRUE for rows where the running total for this device and month exceeds 50
    SUM(energy_usage) OVER(win) > 50 qualified,
    -- amount by which the running total exceeds 50 (negative before the threshold)
    SUM(energy_usage) OVER(win) - 50 rolling_sum,
    -- day = 08:00-19:59 (8 AM to 8 PM), the same split as the question's query
    EXTRACT(HOUR FROM `timestamp`) BETWEEN 8 AND 19 day_hour,
    EXTRACT(MONTH FROM `timestamp`) month,
    EXTRACT(YEAR FROM `timestamp`) year
  FROM `project.dataset.table`
  WINDOW win AS (PARTITION BY device_id, TIMESTAMP_TRUNC(`timestamp`, MONTH) ORDER BY `timestamp`)
), temp_with_adjustments AS (
  SELECT *,
    -- ROW_NUMBER runs after WHERE, so row 1 is the reading that crosses 50:
    -- count only its excess (rolling_sum); later readings count in full
    IF(
      ROW_NUMBER() OVER(PARTITION BY device_id, month, year ORDER BY `timestamp`) = 1,
      rolling_sum,
      energy_usage
    ) AS adjusted_energy_usage
  FROM temp
  WHERE qualified
)
SELECT SUM(adjusted_energy_usage) energy_usage,
  SUM(IF(day_hour, adjusted_energy_usage, 0)) energy_usage_day,
  SUM(IF(NOT day_hour, adjusted_energy_usage, 0)) energy_usage_night,
  device_id,
  month,
  year
FROM temp_with_adjustments
GROUP BY device_id, month, year

As you can see, I only added the logic in temp_with_adjustments (plus rolling_sum in temp to support it); the rest is the same.
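To check the adjusted query's arithmetic, here is a plain-Python mirror of temp_with_adjustments for one device-month, compared with the simple closed form max(0, monthly total - 50). Both helper names are hypothetical, the readings are assumed non-negative and in timestamp order, and only the total column is covered.

```python
from itertools import accumulate


def threshold_adjusted_usage(usages, threshold=50):
    """Hypothetical helper mirroring temp_with_adjustments.

    Readings whose running total exceeds `threshold` are kept (WHERE qualified).
    The first kept reading contributes only running_total - threshold (the
    `rolling_sum` column); every later kept reading contributes in full.
    """
    total = 0
    seen_first = False
    for usage, running in zip(usages, accumulate(usages)):
        if running > threshold:
            total += usage if seen_first else running - threshold
            seen_first = True
    return total


def clipped_usage(usages, threshold=50):
    """Closed form for non-negative readings: only the amount above the threshold."""
    return max(0, sum(usages) - threshold)


if __name__ == "__main__":
    readings = [49, 2, 5]  # running totals 49, 51, 56
    print(threshold_adjusted_usage(readings), clipped_usage(readings))  # 6 6
```

Because the filter is strict (> 50), a running total of exactly 50 contributes nothing and the next reading is counted in full, which still agrees with the closed form.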

Votes: 1
Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/55864541
