第一层是 Partition,即分区。用户可以指定某一维度列作为分区列,并指定每个分区的取值范围,分区支持 Range 和 List 的划分方式。
第二层是 Bucket分桶(Tablet),仅支持 Hash 的划分方式,用户可以指定一个或多个维度列以及桶数对数据进行 HASH 分布或者不指定分桶列设置成 Random Distribution对数据进行随机分布。
创建Doris表时也可以仅使用一层分区,使用一层分区时,只支持Bucket分桶划分,这种表叫做单分区表;如果一张表既有分区又有分桶,这张表叫做复合分区表。
分区用于将数据划分成不同区间, 逻辑上可以理解为将原始表划分成了多个部分。可以方便的按分区对数据进行管理,例如,删除数据时,更加迅速。Partition支持Range和List的划分方式。
使用分区时注意点如下:
业务上,多数用户会选择采用按时间进行partition。Range分区列通常为时间列,以方便管理新旧数据。
Partition支持通过"VALUES [...)"指定下界,生成一个左闭右开的区间。也支持通过" VALUES LESS THAN (...)"仅指定上界,系统会将前一个分区的上界作为该分区的下界,生成一个左闭右开的区。从Doris1.2.0版本后也支持通过"FROM(...) TO (...) INTERVAL ..."来批量创建分区。下面分别进行演示。
通过"VALUES [...)"创建Range分区表example_db.example_range_tbl1:
CREATE TABLE IF NOT EXISTS example_db.example_range_tbl1
(
`user_id` LARGEINT NOT NULL COMMENT "用户id",
`date` DATE NOT NULL COMMENT "数据灌入日期时间",
`timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳",
`city` VARCHAR(20) COMMENT "用户所在城市",
`age` SMALLINT COMMENT "用户年龄",
`sex` TINYINT COMMENT "用户性别",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间",
`cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间"
)
ENGINE=OLAP
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
PARTITION BY RANGE(`date`)
(
PARTITION `p201701` VALUES [("2017-01-01"),("2017-02-01")),
PARTITION `p201702` VALUES [("2017-02-01"),("2017-03-01")),
PARTITION `p201703` VALUES [("2017-03-01"),("2017-04-01"))
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
(
"replication_num" = "3"
);
查看表example_db.example_range_tbl1分区信息:
mysql> SHOW PARTITIONS FROM example_db.example_range_tbl1\G;
*************************** 1. row ***************************
PartitionId: 13898
PartitionName: p201701
VisibleVersion: 1
VisibleVersionTime: 2023-02-08 16:36:24
State: NORMAL
PartitionKey: date
Range: [types: [DATE]; keys: [2017-01-01]; ..types: [DATE]; keys: [2017-02-01]; )
DistributionKey: user_id
Buckets: 16
ReplicationNum: 3
StorageMedium: HDD
CooldownTime: 9999-12-31 23:59:59
RemoteStoragePolicy:
LastConsistencyCheckTime: NULL
DataSize: 0.000
IsInMemory: false
ReplicaAllocation: tag.location.default: 3
*************************** 2. row ***************************
PartitionId: 13899
PartitionName: p201702
VisibleVersion: 1
VisibleVersionTime: 2023-02-08 16:36:24
State: NORMAL
PartitionKey: date
Range: [types: [DATE]; keys: [2017-02-01]; ..types: [DATE]; keys: [2017-03-01]; )
DistributionKey: user_id
Buckets: 16
ReplicationNum: 3
StorageMedium: HDD
CooldownTime: 9999-12-31 23:59:59
RemoteStoragePolicy:
LastConsistencyCheckTime: NULL
DataSize: 0.000
IsInMemory: false
ReplicaAllocation: tag.location.default: 3
*************************** 3. row ***************************
PartitionId: 13900
PartitionName: p201703
VisibleVersion: 1
VisibleVersionTime: 2023-02-08 16:36:24
State: NORMAL
PartitionKey: date
Range: [types: [DATE]; keys: [2017-03-01]; ..types: [DATE]; keys: [2017-04-01]; )
DistributionKey: user_id
Buckets: 16
ReplicationNum: 3
StorageMedium: HDD
CooldownTime: 9999-12-31 23:59:59
RemoteStoragePolicy:
LastConsistencyCheckTime: NULL
DataSize: 0.000
IsInMemory: false
ReplicaAllocation: tag.location.default: 3
3 rows in set (0.01 sec)
通过"VALUES LESS THAN(...)"创建Range分区表example_db.example_range_tbl2:
CREATE TABLE IF NOT EXISTS example_db.example_range_tbl2
(
`user_id` LARGEINT NOT NULL COMMENT "用户id",
`date` DATE NOT NULL COMMENT "数据灌入日期时间",
`timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳",
`city` VARCHAR(20) COMMENT "用户所在城市",
`age` SMALLINT COMMENT "用户年龄",
`sex` TINYINT COMMENT "用户性别",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间",
`cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间"
)
ENGINE=OLAP
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
PARTITION BY RANGE(`date`)
(
PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
PARTITION `p201703` VALUES LESS THAN ("2017-04-01")
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
(
"replication_num" = "3"
);
注意:通过" VALUES LESS THAN (...)"创建分区仅指定上界,系统会将前一个分区的上界作为该分区的下界,生成一个左闭右开的区。最开始分区的下界为该分区字段的MIN_VALUE,DATE类型默认就是0000-01-01。
查看表 example_db.example_range_tbl2分区信息:
mysql> show partitions from example_db.example_range_tbl2\G;
*************************** 1. row ***************************
PartitionId: 14095
PartitionName: p201701
VisibleVersion: 1
VisibleVersionTime: 2023-02-08 16:42:20
State: NORMAL
PartitionKey: date
Range: [types: [DATE]; keys: [0000-01-01]; ..types: [DATE]; keys: [2017-02-01]; )
DistributionKey: user_id
Buckets: 16
ReplicationNum: 3
StorageMedium: HDD
CooldownTime: 9999-12-31 23:59:59
RemoteStoragePolicy:
LastConsistencyCheckTime: NULL
DataSize: 0.000
IsInMemory: false
ReplicaAllocation: tag.location.default: 3
*************************** 2. row ***************************
PartitionId: 14096
PartitionName: p201702
VisibleVersion: 1
VisibleVersionTime: 2023-02-08 16:42:20
State: NORMAL
PartitionKey: date
Range: [types: [DATE]; keys: [2017-02-01]; ..types: [DATE]; keys: [2017-03-01]; )
DistributionKey: user_id
Buckets: 16
ReplicationNum: 3
StorageMedium: HDD
CooldownTime: 9999-12-31 23:59:59
RemoteStoragePolicy:
LastConsistencyCheckTime: NULL
DataSize: 0.000
IsInMemory: false
ReplicaAllocation: tag.location.default: 3
*************************** 3. row ***************************
PartitionId: 14097
PartitionName: p201703
VisibleVersion: 1
VisibleVersionTime: 2023-02-08 16:42:20
State: NORMAL
PartitionKey: date
Range: [types: [DATE]; keys: [2017-03-01]; ..types: [DATE]; keys: [2017-04-01]; )
DistributionKey: user_id
Buckets: 16
ReplicationNum: 3
StorageMedium: HDD
CooldownTime: 9999-12-31 23:59:59
RemoteStoragePolicy:
LastConsistencyCheckTime: NULL
DataSize: 0.000
IsInMemory: false
ReplicaAllocation: tag.location.default: 3
3 rows in set (0.01 sec)
通过"FROM(...) TO (...) INTERVAL ..."创建Range分区表example_db.example_range_tbl2:
CREATE TABLE IF NOT EXISTS example_db.example_range_tbl3
(
`user_id` LARGEINT NOT NULL COMMENT "用户id",
`date` DATE NOT NULL COMMENT "数据灌入日期时间",
`timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳",
`city` VARCHAR(20) COMMENT "用户所在城市",
`age` SMALLINT COMMENT "用户年龄",
`sex` TINYINT COMMENT "用户性别",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间",
`cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间"
)
ENGINE=OLAP
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
PARTITION BY RANGE(`date`)
(
FROM ("2017-01-03") TO ("2017-01-06") INTERVAL 1 DAY
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
(
"replication_num" = "3"
);
注意,以上"FROM(...) TO (...) INTERVAL ..."这种批量创建分区后面指定的INTERVAL还可以指定成YEAR、MONTH、WEEK、DAY、HOUR。
查看表 example_db.example_range_tbl3分区信息:
mysql> show partitions from example_db.example_range_tbl3\G;
*************************** 1. row ***************************
PartitionId: 14489
PartitionName: p_20170103
VisibleVersion: 1
VisibleVersionTime: 2023-02-08 16:54:18
State: NORMAL
PartitionKey: date
Range: [types: [DATE]; keys: [2017-01-03]; ..types: [DATE]; keys: [2017-01-04]; )
DistributionKey: user_id
Buckets: 16
ReplicationNum: 3
StorageMedium: HDD
CooldownTime: 9999-12-31 23:59:59
RemoteStoragePolicy:
LastConsistencyCheckTime: NULL
DataSize: 0.000
IsInMemory: false
ReplicaAllocation: tag.location.default: 3
*************************** 2. row ***************************
PartitionId: 14490
PartitionName: p_20170104
VisibleVersion: 1
VisibleVersionTime: 2023-02-08 16:54:18
State: NORMAL
PartitionKey: date
Range: [types: [DATE]; keys: [2017-01-04]; ..types: [DATE]; keys: [2017-01-05]; )
DistributionKey: user_id
Buckets: 16
ReplicationNum: 3
StorageMedium: HDD
CooldownTime: 9999-12-31 23:59:59
RemoteStoragePolicy:
LastConsistencyCheckTime: NULL
DataSize: 0.000
IsInMemory: false
ReplicaAllocation: tag.location.default: 3
*************************** 3. row ***************************
PartitionId: 14491
PartitionName: p_20170105
VisibleVersion: 1
VisibleVersionTime: 2023-02-08 16:54:18
State: NORMAL
PartitionKey: date
Range: [types: [DATE]; keys: [2017-01-05]; ..types: [DATE]; keys: [2017-01-06]; )
DistributionKey: user_id
Buckets: 16
ReplicationNum: 3
StorageMedium: HDD
CooldownTime: 9999-12-31 23:59:59
RemoteStoragePolicy:
LastConsistencyCheckTime: NULL
DataSize: 0.000
IsInMemory: false
ReplicaAllocation: tag.location.default: 3
3 rows in set (0.01 sec)
以上是三种方式来创建Range分区,下面对表example_db.example_range_tbl2进行分区增删操作,演示分区范围的变化情况。
目前表example_db.example_range_tbl2 中的分区情况如下:
p201701: [MIN_VALUE, 2017-02-01)
p201702: [2017-02-01, 2017-03-01)
p201703: [2017-03-01, 2017-04-01)
通过以下SQL命令来对表example_db.example_range_tbl2 增加一个分区:
mysql> ALTER TABLE example_db.example_range_tbl2 ADD PARTITION p201705 VALUES LESS THAN ("2017-06-01");
Query OK, 0 rows affected (0.05 sec)
注意:关于操作分区注意项参考官网:ALTER-TABLE-PARTITION - Apache Doris
增加分区后,表example_db.example_range_tbl2 中的分区情况如下:
p201701: [MIN_VALUE, 2017-02-01)
p201702: [2017-02-01, 2017-03-01)
p201703: [2017-03-01, 2017-04-01)
p201705: [2017-04-01, 2017-06-01)
此时,我们删除分区p201703,SQL命令如下:
mysql> ALTER TABLE example_db.example_range_tbl2 DROP PARTITION p201703;
Query OK, 0 rows affected (0.01 sec)
删除分区p201703后,分区结果如下:
p201701: [MIN_VALUE, 2017-02-01)
p201702: [2017-02-01, 2017-03-01)
p201705: [2017-04-01, 2017-06-01)
以上删除分区后,注意到 p201702 和 p201705 的分区范围并没有发生变化,而这两个分区之间,出现了一个空洞:[2017-03-01, 2017-04-01),即如果导入的数据范围在这个空洞范围内,是无法导入的。
继续删除分区p201702,空洞范围变为[2017-02-01, 2017-04-01),操作如下:
#删除分区p201702
mysql> ALTER TABLE example_db.example_range_tbl2 DROP PARTITION p201702;
Query OK, 0 rows affected (0.01 sec)
#删除后分区如下
p201701: [MIN_VALUE, 2017-02-01)
p201705: [2017-04-01, 2017-06-01)
现在对表example_db.example_range_tbl2 再次增加一个分区,分区结果如下:
#增加一个分区 p201702new VALUES LESS THAN ("2017-03-01")
mysql> ALTER TABLE example_db.example_range_tbl2 ADD PARTITION p201702new VALUES LESS THAN ("2017-03-01");
Query OK, 0 rows affected (0.05 sec)
#表分区结果
p201701: [MIN_VALUE, 2017-02-01)
p201702new: [2017-02-01, 2017-03-01)
p201705: [2017-04-01, 2017-06-01)
可以看到空洞范围缩小为:[2017-03-01, 2017-04-01)。
现在删除分区p201701,并添加分区p201612 VALUES LESS THAN ("2017-01-01"),SQL操作及分区结果如下:
#删除分区p201701
mysql> ALTER TABLE example_db.example_range_tbl2 DROP PARTITION p201701;
Query OK, 0 rows affected (0.01 sec)
#添加分区 p201612 VALUES LESS THAN ("2017-01-01")
mysql> ALTER TABLE example_db.example_range_tbl2 ADD PARTITION p201612 VALUES LESS THAN ("2017-01-01");
Query OK, 0 rows affected (0.05 sec)
#表分区结果
p201612: [MIN_VALUE, 2017-01-01)
p201702new: [2017-02-01, 2017-03-01)
p201705: [2017-04-01, 2017-06-01)
综上,分区的删除不会改变已存在分区的范围。删除分区可能出现空洞。通过 VALUES LESS THAN 语句增加分区时,分区的下界紧接上一个分区的上界。
Range分区除了上述我们看到的单列分区,也支持多列分区。创建表example_range_tbl4,该表为多列分区,建表语句如下:
CREATE TABLE IF NOT EXISTS example_db.example_range_tbl4
(
`date` DATE NOT NULL COMMENT "数据灌入日期时间",
`id` INT NOT NULL COMMENT "用户id",
`age` SMALLINT COMMENT "用户年龄",
`cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费"
)
ENGINE=OLAP
AGGREGATE KEY(`date`,`id`,`age`)
PARTITION BY RANGE(`date`,`id`)
(
PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01")
)
DISTRIBUTED BY HASH(`id`) BUCKETS 16
PROPERTIES
(
"replication_num" = "3"
);
创建以上表分区是按照date和id两个列来进行分区,表创建完成后,分区如下:
p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") )
p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") )
p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE))
可以看到最后一个分区用户缺省只指定了 date 列的分区值,所以 id 列的分区值会默认填充 MIN_VALUE。当用户插入数据时,分区列值会按照顺序依次比较,最终得到对应的分区。向表中依次插入以下几条数据:
2#插入以下7条属于不同分区的数据
insert into example_db.example_range_tbl4 values
("2017-01-01",200,18,10),
("2017-01-01",2000,19,11),
("2017-02-01",100,20,12),
("2017-02-01",2000,21,13),
("2017-02-15",5000,22,14),
("2017-03-01",2000,23,15),
("2017-03-10",1,24,16);
#插入以下两条不属于任何分区的数据,会报错
insert into example_db.example_range_tbl4 values
("2017-04-01",1000,25,17),
("2017-05-01",1000,26,18);
可以通过以下命令来查看表 example_db.example_range_tbl4 对应分区数据:
#select col1,col2... from db.table PARTITION partition_name;
mysql> select * from example_range_tbl4 partition p201701_1000;
+------------+------+------+------+
| date | id | age | cost |
+------------+------+------+------+
| 2017-01-01 | 2000 | 19 | 11 |
| 2017-01-01 | 200 | 18 | 10 |
| 2017-02-01 | 100 | 20 | 12 |
+------------+------+------+------+
mysql> select * from example_range_tbl4 partition p201702_2000;
+------------+------+------+------+
| date | id | age | cost |
+------------+------+------+------+
| 2017-02-15 | 5000 | 22 | 14 |
| 2017-02-01 | 2000 | 21 | 13 |
+------------+------+------+------+
2 rows in set (0.07 sec)
mysql> select * from example_range_tbl4 partition p201703_all;
+------------+------+------+------+
| date | id | age | cost |
+------------+------+------+------+
| 2017-03-01 | 2000 | 23 | 15 |
| 2017-03-10 | 1 | 24 | 16 |
+------------+------+------+------+
2 rows in set (0.08 sec)
通过以上查询我们发现,数据对应分区情况如下:
数据 --> 分区
2017-01-01, 200 --> p201701_1000
2017-01-01, 2000 --> p201701_1000
2017-02-01, 100 --> p201701_1000
2017-02-01, 2000 --> p201702_2000
2017-02-15, 5000 --> p201702_2000
2017-03-01, 2000 --> p201703_all
2017-03-10, 1 --> p201703_all
2017-04-01, 1000 --> 无法导入
2017-05-01, 1000 --> 无法导入
注意:以上数据对应到哪个分区是一个个分区进行匹配,首先看第一个列是否在第一个分区中,不在再判断第二个列是否在第一个分区中,如果都不在那么就以此类推判断数据是否在第二个分区,直到进入合适的数据分区。
业务上,用户可以选择城市或者其他枚举值进行partition,对于这种枚举类型数据列进行分区就可以使用List分区。List分区列支持 BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME, CHAR, VARCHAR 数据类型, 分区值为枚举值。只有当数据为目标分区枚举值其中之一时,才可以命中分区。
Partition 支持通过 VALUES IN (...) 来指定每个分区包含的枚举值。举例如下,创建List分区表example_db.example_list_tbl1如下:
CREATE TABLE IF NOT EXISTS example_db.example_list_tbl1
(
`user_id` LARGEINT NOT NULL COMMENT "用户id",
`date` DATE NOT NULL COMMENT "数据灌入日期时间",
`timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳",
`city` VARCHAR(20) NOT NULL COMMENT "用户所在城市",
`age` SMALLINT COMMENT "用户年龄",
`sex` TINYINT COMMENT "用户性别",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间",
`cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间"
)
ENGINE=olap
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
PARTITION BY LIST(`city`)
(
PARTITION `p_cn` VALUES IN ("Beijing", "Shanghai", "Hong Kong"),
PARTITION `p_usa` VALUES IN ("New York", "San Francisco"),
PARTITION `p_jp` VALUES IN ("Tokyo")
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
(
"replication_num" = "3"
);
创建完成表example_db.example_list_tbl1之后,会自动生成如下3个分区:
p_cn: ("Beijing", "Shanghai", "Hong Kong")
p_usa: ("New York", "San Francisco")
p_jp: ("Tokyo")
执行如下命令对表example_db.example_list_tbl1 增加分区:
#增加分区 p_uk VALUES IN ("London")
mysql> ALTER TABLE example_db.example_list_tbl1 ADD PARTITION p_uk VALUES IN ("London");
Query OK, 0 rows affected (0.04 sec)
#分区结果如下:
p_cn: ("Beijing", "Shanghai", "Hong Kong")
p_usa: ("New York", "San Francisco")
p_jp: ("Tokyo")
p_uk: ("London")
执行如下命令对表example_db.example_list_tbl1删除分区:
#删除分区 p_jp
mysql> ALTER TABLE example_db.example_list_tbl1 DROP PARTITION p_jp;
Query OK, 0 rows affected (0.01 sec)
#分区结果如下:
p_cn: ("Beijing", "Shanghai", "Hong Kong")
p_usa: ("New York", "San Francisco")
p_uk: ("London")
向表example_db.example_list_tbl1中插入如下数据,观察数据所属分区情况:
#向表中插入如下数据,数据对应的city都能匹配对应分区
insert into example_db.example_list_tbl1 values
(10000,"2017-10-01","2017-10-01 08:00:05","Beijing",20,0,"2017-10-01 06:00:00",20,10,10),
(10000,"2017-10-01","2017-10-01 09:00:05","Shanghai",20,0,"2017-10-01 07:00:00",15,2,2),
(10001,"2017-10-01","2017-10-01 18:12:10","Hong Kong",30,1,"2017-10-01 17:05:45",2,22,22),
(10002,"2017-10-02","2017-10-02 13:10:00","New York",20,1,"2017-10-02 12:59:12",200,5,5),
(10003,"2017-10-02","2017-10-02 13:15:00","San Francisco",32,0,"2017-10-02 11:20:00",30,11,11),
(10004,"2017-10-01","2017-10-01 12:12:48","London",35,0,"2017-10-01 10:00:15",100,3,3);
#查询 p_cn 分区数据,查询其他分区数据一样语法
mysql> select * from example_db.example_list_tbl1 partition p_cn;
+---------+------------+---------------------+-----------+
| user_id | date | timestamp | city |...
+---------+------------+---------------------+-----------+
| 10001 | 2017-10-01 | 2017-10-01 18:12:10 | Hong Kong |...
| 10000 | 2017-10-01 | 2017-10-01 08:00:05 | Beijing |...
| 10000 | 2017-10-01 | 2017-10-01 09:00:05 | Shanghai |...
+---------+------------+---------------------+-----------+
#向表中插入如下数据,不属于表中任何分区会报错
insert into example_db.example_list_tbl1 values
(10004,"2017-10-03","2017-10-03 12:38:20","Tokyo",35,0,"2017-10-03 10:20:22",11,6,6);
List分区也支持多列分区。创建多列分区表example_db.example_list_tbl2如下:
CREATE TABLE IF NOT EXISTS example_db.example_list_tbl2
(
`id` LARGEINT NOT NULL COMMENT "用户id",
`date` DATE NOT NULL COMMENT "数据灌入日期时间",
`city` VARCHAR(20) NOT NULL COMMENT "用户所在城市",
`age` SMALLINT COMMENT "用户年龄",
`cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费"
)
ENGINE=olap
AGGREGATE KEY(`id`, `date`, `city`, `age`)
PARTITION BY LIST(`id`, `city`)
(
PARTITION `p1_city` VALUES IN (("1", "Beijing"), ("1", "Shanghai")),
PARTITION `p2_city` VALUES IN (("2", "Beijing"), ("2", "Shanghai")),
PARTITION `p3_city` VALUES IN (("3", "Beijing"), ("3", "Shanghai"))
)
DISTRIBUTED BY HASH(`id`) BUCKETS 16
PROPERTIES
(
"replication_num" = "3"
);
以上表是以id、city列创建的多列分区,分区信息如下:
p1_city: [("1", "Beijing"), ("1", "Shanghai")]
p2_city: [("2", "Beijing"), ("2", "Shanghai")]
p3_city: [("3", "Beijing"), ("3", "Shanghai")]
当数据插入到表中匹配时也是按照每列顺序进行匹配,向表中插入如下数据:
#向表中插入如下数据,每条数据可以对应到已有分区中
insert into example_db.example_list_tbl2 values
(1,"2017-10-01","Beijing",18,100),
(1,"2017-10-02","Shanghai",18,101),
(2,"2017-10-03","Shanghai",20,102),
(3,"2017-10-04","Beijing",21,103);
#向表中插入如下数据,每条数据都不能匹配已有分区,报错。
insert into example_db.example_list_tbl2 values
(1,"2017-10-05","Tianjin",22,104),
(4,"2017-10-06","Beijing",23,105);
以上几条数据匹配分区情况如下:
数据 ---> 分区
1, Beijing ---> p1_city
1, Shanghai ---> p1_city
2, Shanghai ---> p2_city
3, Beijing ---> p3_city
1, Tianjin ---> 无法导入
4, Beijing ---> 无法导入