首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >使用分区的大型查询标准SQL与ARRAY_AGG()函数

使用分区的大型查询标准SQL与ARRAY_AGG()函数
EN

Stack Overflow用户
提问于 2019-02-13 09:53:33
回答 2查看 8.3K关注 0票数 1

我试图使用PARTITION BY子句和ARRAY_AGG()函数将列折叠到数组中。

大查询中的标准SQL如下:

代码语言:javascript
复制
        WITH initial_30days
           AS (
          SELECT 
            date,
            fullvisitorId AS user_id,
            visitNumber, 
            CONCAT(fullvisitorid, CAST(VisitId AS STRING)) AS session_id
          FROM
            `my-data.XXXXXXX.ga_sessions_*`
            WHERE _TABLE_SUFFIX BETWEEN '20181004' AND  '20181103'
            GROUP BY 1,2,3,4
            )

          SELECT
            date,
            ARRAY_AGG(sessions) OVER (PARTITION BY date ROWS BETWEEN 5 PRECEDING 
            AND CURRENT ROW) AS agg_array
          FROM(

          SELECT
            date,
            user_id,
            COUNT(DISTINCT( session_id))  AS sessions
            FROM initial_30days
            GROUP BY date,user_id) 
            GROUP BY date,sessions

我的预期输出是

代码语言:javascript
复制
+----------+--------------------------+
|   date   |        agg_array         |
+----------+--------------------------+
| 20181004 | [34,21,34,21,6,7,4,43]   |
| 20181005 | [1,5,56,76,23,1,3,54,45] |
| 20181006 | [22,67,43,1,2,67,3,24]   |
| 20181007 | [34,21,34,21,6,7,4,43]   |
+----------+--------------------------+

以一个日期值为例,我的当前输出如下所示:

代码语言:javascript
复制
+----------+------------------------+
|   date   |       agg_array        |
+----------+------------------------+
| 20181004 | [34]                   |
| 20181004 | [34,21]                |
| 20181004 | [34,21,34]             |
| 20181004 | [34,21,34,21]          |
| 20181004 | [34,21,34,21,6]        |
| 20181004 | [34,21,34,21,6,7]      |
| 20181004 | [34,21,34,21,6,7,4]    |
| 20181004 | [34,21,34,21,6,7,4,43] |
+----------+------------------------+

您可以看到按日期分区的数组为该数组的每个值创建了增量行。

应用ARRAY_AGG()函数的数据集如下所示:

代码语言:javascript
复制
+----------+------------------+----------+
|   date   |     user_id      | sessions |
+----------+------------------+----------+
| 20181004 | 2526262363754747 |       34 |
| 20181004 | 2525626325173256 |       21 |
| 20181004 | 7436783255747736 |       34 |
| 20181004 | 6526241526363536 |       21 |
| 20181004 | 4252636353637423 |        6 |
| 20181004 | 3636325636673563 |        7 |
+----------+------------------+----------+

我有一种感觉,那是因为我按上面的sessions分组,但这是因为我得到了一个验证错误,如果我不这样做的话:

代码语言:javascript
复制
    SELECT list expression references column sessions which is 
neither grouped nor aggregated at 
EN

回答 2

Stack Overflow用户

回答已采纳

发布于 2019-02-13 21:17:50

下面是用于BigQuery标准SQL的

只需在原始查询前后添加以下内容

代码语言:javascript
复制
SELECT date, 
  ARRAY_AGG(STRUCT(agg_array) ORDER BY ARRAY_LENGTH(agg_array) DESC LIMIT 1)[OFFSET(0)].*
FROM (
  ...   
  ...   
)
GROUP BY date   

因此,整个内容将如下所示(并将产生所需的结果-同时保留使用窗口函数的想法)

代码语言:javascript
复制
#standardSQL
WITH initial_30days AS (
  SELECT 
    date,
    fullvisitorId AS user_id,
    visitNumber, 
    CONCAT(fullvisitorid, CAST(VisitId AS STRING)) AS session_id
  FROM `my-data.XXXXXXX.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20181004' AND  '20181103'
  GROUP BY 1,2,3,4
)
SELECT date, 
  ARRAY_AGG(STRUCT(agg_array) ORDER BY ARRAY_LENGTH(agg_array) DESC LIMIT 1)[OFFSET(0)].*
FROM (
  SELECT
    date, 
    ARRAY_AGG(sessions) OVER(PARTITION BY date ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS agg_array
  FROM(
    SELECT
      date,
      user_id,
      COUNT(DISTINCT( session_id))  AS sessions
    FROM initial_30days
    GROUP BY date,user_id
  )
  GROUP BY date,sessions
)
GROUP BY date   
票数 1
EN

Stack Overflow用户

发布于 2019-02-13 12:13:38

如果您希望每次约会一行,则需要一个GROUP BY date

代码语言:javascript
复制
SELECT date,
       ARRAY_AGG(sessions) AS agg_array
FROM (SELECT date, user_id,
             COUNT(DISTINCT( session_id))  AS sessions
      FROM initial_30days
      GROUP BY date, user_id
     )  du
GROUP BY date;

如果只需要一定数量的值,那么将LIMIT添加到ARRAY_AGG()中。例如,如果您希望为具有最小ids的用户设置5个会话,则可以:

代码语言:javascript
复制
  ARRAY_AGG(sessions ORDER BY user_id LIMIT 5) AS agg_array
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/54667183

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档