我试图使用PARTITION BY子句和ARRAY_AGG()函数将列折叠到数组中。
大查询中的标准SQL如下:
WITH initial_30days
AS (
SELECT
date,
fullvisitorId AS user_id,
visitNumber,
CONCAT(fullvisitorid, CAST(VisitId AS STRING)) AS session_id
FROM
`my-data.XXXXXXX.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20181004' AND '20181103'
GROUP BY 1,2,3,4
)
SELECT
date,
ARRAY_AGG(sessions) OVER (PARTITION BY date ROWS BETWEEN 5 PRECEDING
AND CURRENT ROW) AS agg_array
FROM(
SELECT
date,
user_id,
COUNT(DISTINCT( session_id)) AS sessions
FROM initial_30days
GROUP BY date,user_id)
GROUP BY date,sessions我的预期输出是
+----------+--------------------------+
| date | agg_array |
+----------+--------------------------+
| 20181004 | [34,21,34,21,6,7,4,43] |
| 20181005 | [1,5,56,76,23,1,3,54,45] |
| 20181006 | [22,67,43,1,2,67,3,24] |
| 20181007 | [34,21,34,21,6,7,4,43] |
+----------+--------------------------+以一个日期值为例,我的当前输出如下所示:
+----------+------------------------+
| date | agg_array |
+----------+------------------------+
| 20181004 | [34] |
| 20181004 | [34,21] |
| 20181004 | [34,21,34] |
| 20181004 | [34,21,34,21] |
| 20181004 | [34,21,34,21,6] |
| 20181004 | [34,21,34,21,6,7] |
| 20181004 | [34,21,34,21,6,7,4] |
| 20181004 | [34,21,34,21,6,7,4,43] |
+----------+------------------------+您可以看到按日期分区的数组为该数组的每个值创建了增量行。
应用ARRAY_AGG()函数的数据集如下所示:
+----------+------------------+----------+
| date | user_id | sessions |
+----------+------------------+----------+
| 20181004 | 2526262363754747 | 34 |
| 20181004 | 2525626325173256 | 21 |
| 20181004 | 7436783255747736 | 34 |
| 20181004 | 6526241526363536 | 21 |
| 20181004 | 4252636353637423 | 6 |
| 20181004 | 3636325636673563 | 7 |
+----------+------------------+----------+我有一种感觉,那是因为我按上面的sessions分组,但这是因为我得到了一个验证错误,如果我不这样做的话:
SELECT list expression references column sessions which is
neither grouped nor aggregated at 发布于 2019-02-13 21:17:50
下面是用于BigQuery标准SQL的
只需在原始查询前后添加以下内容
SELECT date,
ARRAY_AGG(STRUCT(agg_array) ORDER BY ARRAY_LENGTH(agg_array) DESC LIMIT 1)[OFFSET(0)].*
FROM (
...
...
)
GROUP BY date 因此,整个内容将如下所示(并将产生所需的结果-同时保留使用窗口函数的想法)
#standardSQL
WITH initial_30days AS (
SELECT
date,
fullvisitorId AS user_id,
visitNumber,
CONCAT(fullvisitorid, CAST(VisitId AS STRING)) AS session_id
FROM `my-data.XXXXXXX.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20181004' AND '20181103'
GROUP BY 1,2,3,4
)
SELECT date,
ARRAY_AGG(STRUCT(agg_array) ORDER BY ARRAY_LENGTH(agg_array) DESC LIMIT 1)[OFFSET(0)].*
FROM (
SELECT
date,
ARRAY_AGG(sessions) OVER(PARTITION BY date ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS agg_array
FROM(
SELECT
date,
user_id,
COUNT(DISTINCT( session_id)) AS sessions
FROM initial_30days
GROUP BY date,user_id
)
GROUP BY date,sessions
)
GROUP BY date 发布于 2019-02-13 12:13:38
如果您希望每次约会一行,则需要一个GROUP BY date。
SELECT date,
ARRAY_AGG(sessions) AS agg_array
FROM (SELECT date, user_id,
COUNT(DISTINCT( session_id)) AS sessions
FROM initial_30days
GROUP BY date, user_id
) du
GROUP BY date;如果只需要一定数量的值,那么将LIMIT添加到ARRAY_AGG()中。例如,如果您希望为具有最小ids的用户设置5个会话,则可以:
ARRAY_AGG(sessions ORDER BY user_id LIMIT 5) AS agg_arrayhttps://stackoverflow.com/questions/54667183
复制相似问题