I have a query that uses the array_agg() function to build arrays of strings:
SELECT
array_agg(message) as sequence
from mytable
group by id
This produces a table like the following:
sequence
1 foo foo bar baz bar baz
2 foo bar bar bar baz
3 foo foo foo bar bar baz
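But my goal is to condense each array of strings so that no string appears more than once in a row. For example, the desired output would look like this: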
sequence
1 foo bar baz bar baz
2 foo bar baz
3 foo bar baz
How can I do this with Presto SQL?
Answered 2019-05-29 05:10:08
You can do this in one of two ways.
Use the array_distinct function to remove duplicates from the resulting array:
WITH mytable(id, message) AS (VALUES
(1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'bar'), (1, 'baz'), (1, 'baz'),
(2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'),
(3, 'foo'), (3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
)
SELECT array_distinct(array_agg(message)) AS sequence
FROM mytable
GROUP BY id
Use the DISTINCT qualifier in the aggregation to remove duplicate values before they are passed to array_agg:
WITH mytable(id, message) AS (VALUES
(1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'bar'), (1, 'baz'), (1, 'baz'),
(2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'), (3, 'foo'),
(3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
)
SELECT array_agg(DISTINCT message) AS sequence
FROM mytable
GROUP BY id
Both alternatives produce the same result:
sequence
-----------------
[foo, bar, baz]
[foo, bar, baz]
[foo, bar, baz]
(3 rows)
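Note that both alternatives drop every duplicate, not only adjacent ones. For the question's original first row (foo foo bar baz bar baz) they would return [foo, bar, baz] rather than the requested foo bar baz bar baz. One possible way to collapse only adjacent repeats is to filter the aggregated array by position with lambda functions. This is a rough sketch, not a tested recipe: the CTE name grouped and the column name arr are made up for illustration, and because array_agg has no ORDER BY here, the element order (and so the result) depends on the order in which rows happen to be aggregated.

```sql
WITH mytable(id, message) AS (VALUES
    (1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'baz'), (1, 'bar'), (1, 'baz'),
    (2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'),
    (3, 'foo'), (3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
),
grouped AS (
    -- "grouped" and "arr" are illustrative names, not from the original post
    SELECT id, array_agg(message) AS arr
    FROM mytable
    GROUP BY id
)
SELECT
    id,
    transform(
        -- keep position 1, plus every position whose value differs from the one before it;
        -- greatest(i - 1, 1) keeps the subscript valid when i = 1
        filter(
            sequence(1, cardinality(arr)),
            i -> i = 1 OR arr[i] <> arr[greatest(i - 1, 1)]
        ),
        -- map the surviving positions back to their values
        i -> arr[i]
    ) AS sequence
FROM grouped
ORDER BY id
```

The sketch assumes message is never NULL; a comparison against NULL is not true, so a NULL element and the element after it would be dropped by the filter.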
Update: you can remove repeated runs of the same element with the recently introduced MATCH_RECOGNIZE feature:
WITH mytable(id, message) AS (VALUES
(1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'baz'), (1, 'bar'), (1, 'baz'),
(2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'),
(3, 'foo'), (3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
)
SELECT array_agg(value) AS sequence
FROM mytable
MATCH_RECOGNIZE(
PARTITION BY id
MEASURES A.message AS value
PATTERN (A B*)
DEFINE B AS message = PREV(message)
)
GROUP BY id
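The query above has no ORDER BY, so "previous row" depends on whatever order the engine reads the rows in. If the table has a column that defines the order, the same adjacent-duplicate removal can probably also be written with the lag() window function. The sketch below assumes a hypothetical ordering column named ts, which is not in the original table, and an illustrative CTE named flagged:

```sql
WITH mytable(id, ts, message) AS (VALUES
    -- ts is a hypothetical ordering column added for this sketch
    (1, 1, 'foo'), (1, 2, 'foo'), (1, 3, 'bar'), (1, 4, 'baz'), (1, 5, 'bar'), (1, 6, 'baz'),
    (2, 1, 'foo'), (2, 2, 'bar'), (2, 3, 'bar'), (2, 4, 'bar'), (2, 5, 'baz'),
    (3, 1, 'foo'), (3, 2, 'foo'), (3, 3, 'foo'), (3, 4, 'bar'), (3, 5, 'bar'), (3, 6, 'baz')
),
flagged AS (
    -- attach the previous message within each id, in ts order
    SELECT id, ts, message,
           lag(message) OVER (PARTITION BY id ORDER BY ts) AS prev_message
    FROM mytable
)
SELECT id, array_agg(message ORDER BY ts) AS sequence
FROM flagged
-- keep the first row of each id and every row that differs from its predecessor
WHERE prev_message IS NULL OR message <> prev_message
GROUP BY id
ORDER BY id
```

With this sample data the rows kept for id 1 are foo, bar, baz, bar, baz, and for ids 2 and 3 they are foo, bar, baz, which matches the output requested in the question. The sketch assumes message is never NULL.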
https://stackoverflow.com/questions/56349907