Description
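The CLUSTER BY clause first repartitions the data by the input expressions and then sorts the rows within each partition. It is semantically equivalent to performing DISTRIBUTE BY followed by SORT BY. The clause only guarantees that result rows are sorted within each partition; it does not guarantee a global order for the output.
Syntax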
CLUSTER BY { expression [ , ... ] }
Parameters
Clause/Keyword | Description |
expression | A combination of one or more values, operators, and SQL functions |
Examples
-- Create a test table
CREATE EXTERNAL TABLE cb_person (name STRING, age INT)
USING PARQUET LOCATION 'cosn://<your_cos_bucket>/test_cluster_by/cb_person';
-- Insert data
INSERT INTO cb_person VALUES ('Zen',25),('Anil',18),('Shone',16),('Mike',25),('John',18),('Jack',16);
-- CLUSTER BY age
SELECT age, name FROM cb_person CLUSTER BY age;
-- CLUSTER BY (2 partitions)
SET spark.sql.shuffle.partitions = 2;
SELECT age, name FROM cb_person CLUSTER BY age;
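The description states that CLUSTER BY is semantically equivalent to DISTRIBUTE BY followed by SORT BY. A minimal sketch of that equivalence, assuming the cb_person table from the example exists, is shown here. The third query uses ORDER BY, which is not part of this clause, as an assumed alternative when a global order is required.

```sql
-- Shorthand form: repartition by age, then sort by age within each partition
SELECT age, name FROM cb_person CLUSTER BY age;

-- Expanded form with the same semantics, per the description
SELECT age, name FROM cb_person DISTRIBUTE BY age SORT BY age;

-- Assumption: use ORDER BY instead when the output must be globally ordered
SELECT age, name FROM cb_person ORDER BY age;
```

With either of the first two forms, rows sharing the same age land in the same partition and appear sorted inside it, but the order in which partitions are returned is not guaranteed.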