前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >hivesql 实现collect_list内排序

hivesql 实现collect_list内排序

作者头像
用户1217611
发布2023-04-23 16:25:30
7220
发布2023-04-23 16:25:30
举报
文章被收录于专栏:文渊之博文渊之博

假设我有一个看起来像这样的蜂巢表:

代码语言:javascript
复制
ID    event    order_num
------------------------
A      red         2
A      blue        1
A      yellow      3
B      yellow      2
B      green       1
...

我正在尝试使用 collect_list 为每个 ID 生成事件列表。所以类似于以下内容:

代码语言:javascript
复制
SELECT ID, 
collect_list(event) as events_list,
FROM table
GROUP BY ID;

但是,在我分组所依据的每个 ID 中,我需要按 order_num 进行排序。这样我的结果表将如下所示:

代码语言:javascript
复制
ID    events_list
------------------------
A      ["blue","red","yellow"]
B      ["green","red"]

我无法在 collect_list() 查询之前按 ID 和 order_num 进行全局排序,因为该表很大。有没有办法在 collect_list 中按 order_num 排序?

诀窍是使用带有 DISTRIBUTE BY 和 SORT BY 语句的子查询。见下文:

代码语言:javascript
复制
WITH table1 AS (
    SELECT 'A' AS ID, 'red' AS event, 2 AS order_num UNION ALL
    SELECT 'A' AS ID, 'blue' AS event, 1 AS order_num UNION ALL
    SELECT 'A' AS ID, 'yellow' AS event, 3 AS order_num UNION ALL
    SELECT 'B' AS ID, 'yellow' AS event, 2 AS order_num UNION ALL
    SELECT 'B' AS ID, 'green' AS event, 1 AS order_num
)

-- Collect it
SELECT subquery.ID, 
collect_list(subquery.event) as events_list
FROM (
SELECT
        table1.ID,
        table1.event,
        table1.order_num
    FROM table1
    DISTRIBUTE BY
        table1.ID
    SORT BY
        table1.ID,
        table1.order_num
) subquery
GROUP BY subquery.ID;
本文参与 腾讯云自媒体分享计划,分享自作者个人站点/博客。
原始发表:2023-04-14,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档