Sum aggregation over two tables with a threshold in BigQuery

Stack Overflow user
Asked 2019-06-20 16:16:21

The following table shows the hourly energy usage of devices:

+--------------+-----------+-----------------------+
| energy_usage | device_id |  timestamp            |
+--------------+-----------+-----------------------+
| 10           | 1         |  2019-02-12T01:00:00  |
| 16           | 2         |  2019-02-12T01:00:00  |
| 26           | 1         |  2019-03-12T02:00:00  |
| 24           | 2         |  2019-03-12T02:00:00  |
+--------------+-----------+-----------------------+

I aggregate this data so that I get the daytime and nighttime energy usage per device and date:

+--------------+------------------+--------------------+-----------+------------+
| energy_usage | energy_usage_day | energy_usage_night | device_id |    date    |
+--------------+------------------+--------------------+-----------+------------+
| 80           | 30               | 50                 | 1         | 2019-06-02 |
| 130          | 60               | 70                 | 2         | 2019-06-03 |
+--------------+------------------+--------------------+-----------+------------+

I am only interested in energy usage above a certain threshold. The following query works for me:

WITH temp AS (
  SELECT *, SUM(usage) OVER(win) > 50 qualified,
    SUM(usage) OVER(win) - 50 rolling_sum,
    EXTRACT(HOUR FROM timestamp) BETWEEN 8 AND 19 day_hour,
    EXTRACT(MONTH FROM timestamp) month,
    FORMAT_TIMESTAMP("%Y-%m-%d", timestamp) date
  FROM `project.dataset.table`
  WINDOW win AS (PARTITION BY device_id, TIMESTAMP_TRUNC(timestamp, MONTH) ORDER BY timestamp)
), temp_with_adjustments AS (
  SELECT *, 
    IF(
      ROW_NUMBER() OVER(PARTITION BY device_id, MONTH ORDER BY timestamp) = 1, 
      rolling_sum, 
      usage
    ) AS adjusted_energy_usage
  FROM temp 
  WHERE qualified
)
SELECT ROUND(SUM(adjusted_energy_usage), 4) energy_usage,
  ROUND(SUM(IF(day_hour, adjusted_energy_usage, 0)), 4) energy_usage_day,
  ROUND(SUM(IF(NOT day_hour, adjusted_energy_usage, 0)), 4) energy_usage_night,
  device_id,
  date
FROM temp_with_adjustments
GROUP BY device_id, date
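
To make the thresholding step in the query above easier to follow, here is a minimal Python sketch of what the two CTEs appear to compute: a running sum of usage per device and calendar month, where rows only count once the running sum exceeds 50, and the first such row contributes only the excess. The function name `adjusted_usage` and the tuple layout are hypothetical, and the sketch assumes non-negative usage values and one row per device and timestamp. It also keys the running sum on year and month, whereas the query's `ROW_NUMBER()` partitions on the month number only.

```python
from collections import defaultdict
from datetime import datetime

THRESHOLD = 50  # same fixed threshold as in the query


def adjusted_usage(rows, threshold=THRESHOLD):
    """rows: iterable of (device_id, iso_timestamp, usage) tuples.

    Returns (device_id, iso_timestamp, adjusted_usage) for every row whose
    running monthly sum exceeds the threshold ("qualified" in the query).
    """
    running = defaultdict(float)  # running sum per (device, year, month)
    result = []
    # ISO timestamps in a uniform format sort correctly as strings.
    for device_id, ts, usage in sorted(rows, key=lambda r: (r[0], r[1])):
        t = datetime.fromisoformat(ts)
        key = (device_id, t.year, t.month)
        before = running[key]
        running[key] += usage
        if running[key] > threshold:
            if before <= threshold:
                # First qualified row: only the part above the threshold counts.
                adjusted = running[key] - threshold
            else:
                # Later qualified rows count in full.
                adjusted = usage
            result.append((device_id, ts, adjusted))
    return result


if __name__ == "__main__":
    sample = [
        (1, "2019-02-12T01:00:00", 20),
        (1, "2019-02-12T02:00:00", 40),
        (1, "2019-02-12T03:00:00", 30),
    ]
    for row in adjusted_usage(sample):
        print(row)
```

With the sample rows, the running sums are 20, 60 and 90, so the 01:00 row is dropped, the 02:00 row contributes 10 (the excess over 50) and the 03:00 row contributes its full 30.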

While the first table shows energy usage, I have another table that shows the corresponding usage charges:

+--------------+-----------+-----------------------+
| usage_charge | device_id |  timestamp            |
+--------------+-----------+-----------------------+
| 0.2          | 1         |  2019-02-12T01:00:00  |
| 0.6          | 2         |  2019-02-12T01:00:00  |
| 0.1          | 1         |  2019-03-12T02:00:00  |
| 1.2          | 2         |  2019-03-12T02:00:00  |
+--------------+-----------+-----------------------+

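I would like to gain insight into the daytime and nighttime usage charges for devices whose energy usage is > 50. The result could look like this: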

+--------------+------------------+--------------------+--------------+------------------+--------------------+-----------+------------+
| energy_usage | energy_usage_day | energy_usage_night | usage_charge | usage_charge_day | usage_charge_night | device_id |    date    |
+--------------+------------------+--------------------+--------------+------------------+--------------------+-----------+------------+
| 80           | 30               | 50                 | 1.2          | 0.4              | 0.8                | 1         | 2019-06-02 |
| 130          | 60               | 70                 | 2.5          | 1                | 1.5                | 2         | 2019-06-03 |
+--------------+------------------+--------------------+--------------+------------------+--------------------+-----------+------------+

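So my first idea was to use exactly the same query for the usage charge as for the energy usage. However, while the threshold of 50 works for energy usage, I cannot specify a fixed threshold for the usage charge, because the charge calculation differs per device. So I first have to find the energy usage > 50 and then use the timestamps to aggregate the usage charges. Do you know how I can do this in BigQuery? Is it even possible?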
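
The idea described here (qualify hours by energy usage, then use their timestamps to pick up charges) can be sketched in Python as follows. This is only an illustration under assumptions, not a BigQuery solution: the function name `charges_for_qualified_hours` and the tuple layouts are hypothetical, every usage row is assumed to have a matching charge row for the same device and timestamp, and the charge of the hour that crosses the threshold is counted in full rather than prorated.

```python
from collections import defaultdict
from datetime import datetime


def charges_for_qualified_hours(usage_rows, charge_rows, threshold=50):
    """usage_rows: (device_id, iso_timestamp, usage) tuples.
    charge_rows: (device_id, iso_timestamp, usage_charge) tuples.

    Returns {(device_id, date): {...}} with total, day and night charges,
    counting only hours whose running monthly usage exceeds the threshold.
    """
    # Look up charges by the same key the SQL join would use.
    charge_by_key = {(d, ts): charge for d, ts, charge in charge_rows}
    running = defaultdict(float)  # running usage per (device, year, month)
    totals = {}
    for device_id, ts, usage in sorted(usage_rows, key=lambda r: (r[0], r[1])):
        t = datetime.fromisoformat(ts)
        month_key = (device_id, t.year, t.month)
        running[month_key] += usage
        if running[month_key] <= threshold:
            continue  # hour is not qualified yet
        charge = charge_by_key.get((device_id, ts))
        if charge is None:
            continue  # no matching charge row for this hour
        bucket = totals.setdefault(
            (device_id, t.date().isoformat()),
            {"usage_charge": 0.0, "usage_charge_day": 0.0, "usage_charge_night": 0.0},
        )
        bucket["usage_charge"] += charge
        # Same day/night split as the query: hours 8 through 19 are "day".
        if 8 <= t.hour <= 19:
            bucket["usage_charge_day"] += charge
        else:
            bucket["usage_charge_night"] += charge
    return totals
```

The threshold is applied only to the usage values, so no per-device charge threshold is needed; the charges are simply looked up by device and timestamp for the hours that qualified.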

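1 Answer

Stack Overflow user

Accepted answer

Answered 2019-06-21 03:29:17

Below is for BigQuery Standard SQL. It is based only on the pattern I see applied in your initial query, so it is hard for me to be 100% sure it is exactly what you need. Either way, it should be a good start.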

#standardSQL
WITH temp AS (
  SELECT *, SUM(IF(qualified, usage_charge, 0)) OVER(win) rolling_charge
  FROM (
    SELECT *, SUM(usage) OVER(win) > 50 qualified,
      SUM(usage) OVER(win) - 50 rolling_sum,
      EXTRACT(HOUR FROM timestamp) BETWEEN 8 AND 19 day_hour,
      EXTRACT(MONTH FROM timestamp) month,
      FORMAT_TIMESTAMP("%Y-%m-%d", timestamp) date
    FROM `project.dataset.usage`
    JOIN `project.dataset.charges` USING(device_id, timestamp)
    WINDOW win AS (PARTITION BY device_id, TIMESTAMP_TRUNC(timestamp, MONTH) ORDER BY timestamp)
  )
  WINDOW win AS (PARTITION BY device_id, TIMESTAMP_TRUNC(timestamp, MONTH) ORDER BY timestamp)
), temp_with_adjustments AS (
  SELECT *, 
    IF(
      ROW_NUMBER() OVER(PARTITION BY device_id, MONTH ORDER BY timestamp) = 1, 
      rolling_sum, 
      usage
    ) AS adjusted_energy_usage
  FROM temp 
  WHERE qualified
)
SELECT ROUND(SUM(adjusted_energy_usage), 4) energy_usage,
  ROUND(SUM(IF(day_hour, adjusted_energy_usage, 0)), 4) energy_usage_day,
  ROUND(SUM(IF(NOT day_hour, adjusted_energy_usage, 0)), 4) energy_usage_night,
  ROUND(SUM(rolling_charge), 4) usage_charge,
  ROUND(SUM(IF(day_hour, rolling_charge, 0)), 4) usage_charge_day,
  ROUND(SUM(IF(NOT day_hour, rolling_charge, 0)), 4) usage_charge_night,
  device_id,
  date
FROM temp_with_adjustments
GROUP BY device_id, date
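
One detail in the query above may be worth checking against the expected result. `rolling_charge` is a running total within the window, so `SUM(rolling_charge)` in the final `SELECT` adds up cumulative values rather than individual hourly charges. The small Python sketch below, using made-up charge values, shows how the two sums differ. Whether the running total is what is wanted depends on the intended result, and this reading is an assumption rather than something the answer states.

```python
from itertools import accumulate

# Hypothetical charges for three qualified hours of one device on one day.
qualified_charges = [0.5, 0.25, 0.25]

# What a running SUM(...) OVER (... ORDER BY timestamp) yields per row.
rolling_charge = list(accumulate(qualified_charges))

# Summing the running totals counts earlier hours more than once.
sum_of_rolling = sum(rolling_charge)

# Summing the per-row charges gives the plain total for the qualified hours.
sum_of_rows = sum(qualified_charges)

print(rolling_charge, sum_of_rolling, sum_of_rows)
```

If the plain total is the goal, aggregating `usage_charge` directly over the qualified rows might be closer to the desired output than aggregating `rolling_charge`.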
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/56681800
