pg14场景下测试
后面慢慢补充。
5.11.6. Best Practices for Declarative Partitioning
The choice of how to partition a table should be made carefully, as the performance of query planning and execution can be negatively affected by poor design.
One of the most critical design decisions will be the column or columns by which you partition your data. Often the best choice will be to partition by the column or set of columns which most commonly appear in WHERE clauses of queries being executed on the partitioned table. WHERE clauses that are compatible with the partition bound constraints can be used to prune unneeded partitions. However, you may be forced into making other decisions by requirements for the PRIMARY KEY or a UNIQUE constraint. Removal of unwanted data is also a factor to consider when planning your partitioning strategy. An entire partition can be detached fairly quickly, so it may be beneficial to design the partition strategy in such a way that all data to be removed at once is located in a single partition.
Choosing the target number of partitions that the table should be divided into is also a critical decision to make. Not having enough partitions may mean that indexes remain too large and that data locality remains poor which could result in low cache hit ratios. However, dividing the table into too many partitions can also cause issues. Too many partitions can mean longer query planning times and higher memory consumption during both query planning and execution, as further described below. When choosing how to partition your table, it’s also important to consider what changes may occur in the future. For example, if you choose to have one partition per customer and you currently have a small number of large customers, consider the implications if in several years you instead find yourself with a large number of small customers. In this case, it may be better to choose to partition by HASH and choose a reasonable number of partitions rather than trying to partition by LIST and hoping that the number of customers does not increase beyond what it is practical to partition the data by.
Sub-partitioning can be useful to further divide partitions that are expected to become larger than other partitions. Another option is to use range partitioning with multiple columns in the partition key. Either of these can easily lead to excessive numbers of partitions, so restraint is advisable.
It is important to consider the overhead of partitioning during query planning and execution. The query planner is generally able to handle partition hierarchies with up to a few thousand partitions fairly well, provided that typical queries allow the query planner to prune all but a small number of partitions. Planning times become longer and memory consumption becomes higher when more partitions remain after the planner performs partition pruning. Another reason to be concerned about having a large number of partitions is that the server’s memory consumption may grow significantly over time, especially if many sessions touch large numbers of partitions. That’s because each partition requires its metadata to be loaded into the local memory of each session that touches it.
With data warehouse type workloads, it can make sense to use a larger number of partitions than with an OLTP type workload. Generally, in data warehouses, query planning time is less of a concern as the majority of processing time is spent during query execution. With either of these two types of workload, it is important to make the right decisions early, as re-partitioning large quantities of data can be painfully slow. Simulations of the intended workload are often beneficial for optimizing the partitioning strategy.
Never just assume that more partitions are better than fewer partitions, nor vice-versa. 永远不要假设更多的分区比更少的分区更好,反之亦然。
分区键离散,可以使用PARTITION BY LIST。按字符串匹配决定落入哪个分区。
drop table customers;
CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY LIST(status);
CREATE TABLE cust_active PARTITION OF customers FOR VALUES IN ('ACTIVE');
CREATE TABLE cust_archived PARTITION OF customers FOR VALUES IN ('EXPIRED1', 'EXPIRED2');
CREATE TABLE cust_others PARTITION OF customers DEFAULT;
INSERT INTO customers VALUES (1,'ACTIVE',100), (2,'RECURRING',20), (3,'EXPIRED1',38), (4,'REACTIVATED',144);
-- 父表
postgres=# \d+ customers
Partitioned table "public.customers"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition key: LIST (status)
Partitions: cust_active FOR VALUES IN ('ACTIVE'),
cust_archived FOR VALUES IN ('EXPIRED'),
cust_others DEFAULT
-- 子表
postgres=# \d+ cust_active
Table "public.cust_active"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition of: customers FOR VALUES IN ('ACTIVE')
Partition constraint: ((status IS NOT NULL) AND (status = 'ACTIVE'::text))
Access method: heap
带分区键查询:直接在指定表上查询。
postgres=# explain select * from customers where status = 'EXPIRED1' and arr > 30;
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on cust_archived customers (cost=0.00..22.75 rows=1 width=68)
Filter: ((arr > '30'::numeric) AND (status = 'EXPIRED1'::text))
不带分区键查询:不知道数据在哪个子表,全部扫一遍。
postgres=# explain select * from customers where arr > 30;
QUERY PLAN
-----------------------------------------------------------------------------------
Append (cost=0.00..66.12 rows=849 width=68)
-> Seq Scan on cust_active customers_1 (cost=0.00..20.62 rows=283 width=68)
Filter: (arr > '30'::numeric)
-> Seq Scan on cust_archived customers_2 (cost=0.00..20.62 rows=283 width=68)
Filter: (arr > '30'::numeric)
-> Seq Scan on cust_others customers_3 (cost=0.00..20.62 rows=283 width=68)
Filter: (arr > '30'::numeric)
分区键值连续,可以考虑使用PARTITION BY RANGE。
时间类型CASE:
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2006m02 PARTITION OF measurement FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
CREATE TABLE measurement_y2006m03 PARTITION OF measurement FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
...
CREATE TABLE measurement_y2007m11 PARTITION OF measurement FOR VALUES FROM ('2007-11-01') TO ('2007-12-01');
CREATE TABLE measurement_y2007m12 PARTITION OF measurement FOR VALUES FROM ('2007-12-01') TO ('2008-01-01') TABLESPACE fasttablespace;
CREATE TABLE measurement_y2008m01 PARTITION OF measurement FOR VALUES FROM ('2008-01-01') TO ('2008-02-01') WITH (parallel_workers = 4)
TABLESPACE fasttablespace;
整形CASE:
drop table customers;
CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY RANGE(arr);
CREATE TABLE cust_arr_small PARTITION OF customers FOR VALUES FROM (MINVALUE) TO (25);
CREATE TABLE cust_arr_medium PARTITION OF customers FOR VALUES FROM (25) TO (75);
CREATE TABLE cust_arr_large PARTITION OF customers FOR VALUES FROM (75) TO (MAXVALUE);
INSERT INTO customers VALUES (1,'ACTIVE',100), (2,'RECURRING',20), (3,'EXPIRED',38), (4,'REACTIVATED',144);
-- 父表
postgres=# \d+ customers
Partitioned table "public.customers"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition key: RANGE (arr)
Partitions: cust_arr_large FOR VALUES FROM ('75') TO (MAXVALUE),
cust_arr_medium FOR VALUES FROM ('25') TO ('75'),
cust_arr_small FOR VALUES FROM (MINVALUE) TO ('25')
-- 子表
postgres=# \d+ cust_arr_small
Table "public.cust_arr_small"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition of: customers FOR VALUES FROM (MINVALUE) TO ('25')
Partition constraint: ((arr IS NOT NULL) AND (arr < '25'::numeric))
Access method: heap
带分区键查询:
postgres=# SELECT tableoid::regclass,* FROM customers;
tableoid | id | status | arr
-----------------+----+-------------+-----
cust_arr_small | 2 | RECURRING | 20
cust_arr_medium | 3 | EXPIRED | 38
cust_arr_large | 1 | ACTIVE | 100
cust_arr_large | 4 | REACTIVATED | 144
postgres=# explain select * from customers where status = 'EXPIRED1' and arr > 130;
QUERY PLAN
--------------------------------------------------------------------------
Seq Scan on cust_arr_large customers (cost=0.00..22.75 rows=1 width=68)
Filter: ((arr > '130'::numeric) AND (status = 'EXPIRED1'::text))
不带分区键查询:
postgres=# explain select * from customers where status = 'EXPIRED1';
QUERY PLAN
-----------------------------------------------------------------------------------
Append (cost=0.00..61.94 rows=12 width=68)
-> Seq Scan on cust_arr_small customers_1 (cost=0.00..20.62 rows=4 width=68)
Filter: (status = 'EXPIRED1'::text)
-> Seq Scan on cust_arr_medium customers_2 (cost=0.00..20.62 rows=4 width=68)
Filter: (status = 'EXPIRED1'::text)
-> Seq Scan on cust_arr_large customers_3 (cost=0.00..20.62 rows=4 width=68)
Filter: (status = 'EXPIRED1'::text)
分区键值本身没有规律进行平均差分,可以用Hash取模离散。
drop table customers;
CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY HASH(id);
CREATE TABLE cust_part1 PARTITION OF customers FOR VALUES WITH (modulus 3, remainder 0);
CREATE TABLE cust_part2 PARTITION OF customers FOR VALUES WITH (modulus 3, remainder 1);
CREATE TABLE cust_part3 PARTITION OF customers FOR VALUES WITH (modulus 3, remainder 2);
INSERT INTO customers VALUES (1,'ACTIVE',100), (2,'RECURRING',20), (3,'EXPIRED',38), (4,'REACTIVATED',144);
-- 父表
postgres=# \d+ customers
Partitioned table "public.customers"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition key: HASH (id)
Partitions: cust_part1 FOR VALUES WITH (modulus 3, remainder 0),
cust_part2 FOR VALUES WITH (modulus 3, remainder 1),
cust_part3 FOR VALUES WITH (modulus 3, remainder 2)
-- 子表
postgres=# \d+ cust_part1
Table "public.cust_part1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition of: customers FOR VALUES WITH (modulus 3, remainder 0)
Partition constraint: satisfies_hash_partition('24667'::oid, 3, 0, id)
Access method: heap
drop table customers;
CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY LIST(status);
-- 子表1
CREATE TABLE cust_active PARTITION OF customers FOR VALUES IN ('ACTIVE','RECURRING','REACTIVATED') PARTITION BY RANGE(arr);
CREATE TABLE cust_arr_small PARTITION OF cust_active FOR VALUES FROM (MINVALUE) TO (101) PARTITION BY HASH(id);
CREATE TABLE cust_part11 PARTITION OF cust_arr_small FOR VALUES WITH (modulus 2, remainder 0);
CREATE TABLE cust_part12 PARTITION OF cust_arr_small FOR VALUES WITH (modulus 2, remainder 1);
-- 子表2
CREATE TABLE cust_other PARTITION OF customers DEFAULT PARTITION BY RANGE(arr);
CREATE TABLE cust_arr_large PARTITION OF cust_other FOR VALUES FROM (101) TO (MAXVALUE) PARTITION BY HASH(id);
CREATE TABLE cust_part21 PARTITION OF cust_arr_large FOR VALUES WITH (modulus 2, remainder 0);
CREATE TABLE cust_part22 PARTITION OF cust_arr_large FOR VALUES WITH (modulus 2, remainder 1);
数据分布实例:
INSERT INTO customers VALUES (1,'ACTIVE',100), (2,'RECURRING',20), (3,'REACTIVATED',38), (4,'EXPIRED',144), (5,'EXPIRED',150), (6,'EXPIRED',160), (7,'EXPIRED',161);
postgres=# SELECT tableoid::regclass,* FROM customers;
tableoid | id | status | arr
-------------+----+-------------+-----
cust_part11 | 1 | ACTIVE | 100
cust_part11 | 2 | RECURRING | 20
cust_part12 | 3 | REACTIVATED | 38
cust_part22 | 4 | EXPIRED | 144
cust_part22 | 5 | EXPIRED | 150
cust_part22 | 6 | EXPIRED | 160
cust_part22 | 7 | EXPIRED | 161
drop table customers;
CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY LIST(status);
CREATE TABLE cust_active PARTITION OF customers FOR VALUES IN ('ACTIVE');
CREATE TABLE cust_archived PARTITION OF customers FOR VALUES IN ('EXPIRED1', 'EXPIRED2');
CREATE TABLE cust_others PARTITION OF customers DEFAULT;
postgres=# \d+ customers
Partitioned table "public.customers"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition key: LIST (status)
Partitions: cust_active FOR VALUES IN ('ACTIVE'),
cust_archived FOR VALUES IN ('EXPIRED1', 'EXPIRED2'),
cust_others DEFAULT
ALTER TABLE customers DETACH PARTITION cust_others;
postgres=# \d+ customers
Partitioned table "public.customers"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition key: LIST (status)
Partitions: cust_active FOR VALUES IN ('ACTIVE'),
cust_archived FOR VALUES IN ('EXPIRED1', 'EXPIRED2')
分区键上的索引只有父表需要,只用于父表找到子表,所以无需再子表上创建。
drop table customers;
CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY LIST(status);
CREATE TABLE cust_active PARTITION OF customers FOR VALUES IN ('ACTIVE');
CREATE TABLE cust_archived PARTITION OF customers FOR VALUES IN ('EXPIRED1', 'EXPIRED2');
CREATE TABLE cust_others PARTITION OF customers DEFAULT;
CREATE INDEX customers_status_idx ON customers (status);
postgres=# \d+ customers
Partitioned table "public.customers"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition key: LIST (status)
Indexes:
"customers_status_idx" btree (status)
Partitions: cust_active FOR VALUES IN ('ACTIVE'),
cust_archived FOR VALUES IN ('EXPIRED1', 'EXPIRED2'),
cust_others DEFAULT
postgres=# \d+ cust_archived
Table "public.cust_archived"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition of: customers FOR VALUES IN ('EXPIRED1', 'EXPIRED2')
Partition constraint: ((status IS NOT NULL) AND (status = ANY (ARRAY['EXPIRED1'::text, 'EXPIRED2'::text])))
Indexes:
"cust_archived_status_idx" btree (status)
Access method: heap
非分区键上的索引会传播的子表上,自动创建。
drop table customers;
CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY LIST(status);
CREATE TABLE cust_active PARTITION OF customers FOR VALUES IN ('ACTIVE');
CREATE TABLE cust_archived PARTITION OF customers FOR VALUES IN ('EXPIRED1', 'EXPIRED2');
CREATE TABLE cust_others PARTITION OF customers DEFAULT;
CREATE INDEX customers_arr_idx ON customers (arr);
postgres=# \d+ customers
Partitioned table "public.customers"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition key: LIST (status)
Indexes:
"customers_arr_idx" btree (arr)
Partitions: cust_active FOR VALUES IN ('ACTIVE'),
cust_archived FOR VALUES IN ('EXPIRED1', 'EXPIRED2'),
cust_others DEFAULT
postgres=# \d+ cust_archived
Table "public.cust_archived"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition of: customers FOR VALUES IN ('EXPIRED1', 'EXPIRED2')
Partition constraint: ((status IS NOT NULL) AND (status = ANY (ARRAY['EXPIRED1'::text, 'EXPIRED2'::text])))
Indexes:
"cust_archived_arr_idx" btree (arr)
Access method: heap
增加ONLY关键字,只给父表创建索引;在使用alter index给某些子表建索引:
drop table customers;
CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY LIST(status);
CREATE TABLE cust_active PARTITION OF customers FOR VALUES IN ('ACTIVE');
CREATE TABLE cust_archived PARTITION OF customers FOR VALUES IN ('EXPIRED1', 'EXPIRED2');
CREATE TABLE cust_others PARTITION OF customers DEFAULT;
CREATE INDEX customers_arr_idx ON ONLY customers (arr);
postgres=# \d+ customers
Partitioned table "public.customers"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition key: LIST (status)
Indexes:
"customers_arr_idx" btree (arr) INVALID
Partitions: cust_active FOR VALUES IN ('ACTIVE'),
cust_archived FOR VALUES IN ('EXPIRED1', 'EXPIRED2'),
cust_others DEFAULT
子表无索引
postgres=# \d+ cust_archived
Table "public.cust_archived"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition of: customers FOR VALUES IN ('EXPIRED1', 'EXPIRED2')
Partition constraint: ((status IS NOT NULL) AND (status = ANY (ARRAY['EXPIRED1'::text, 'EXPIRED2'::text])))
Access method: heap
子表手动创建索引
CREATE INDEX cust_archived_arr_idx ON cust_archived (arr);
ALTER INDEX customers_arr_idx ATTACH PARTITION cust_archived_arr_idx;
postgres=# \d+ cust_archived
Table "public.cust_archived"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition of: customers FOR VALUES IN ('EXPIRED1', 'EXPIRED2')
Partition constraint: ((status IS NOT NULL) AND (status = ANY (ARRAY['EXPIRED1'::text, 'EXPIRED2'::text])))
Indexes:
"cust_archived_arr_idx" btree (arr)
Access method: heap
非分区键上的索引会传播的子表上,自动创建。
drop table customers;
CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY LIST(status);
CREATE TABLE cust_active PARTITION OF customers FOR VALUES IN ('ACTIVE');
CREATE INDEX customers_arr_idx ON customers (arr);
CREATE TABLE cust_archived PARTITION OF customers FOR VALUES IN ('EXPIRED1', 'EXPIRED2');
CREATE TABLE cust_others PARTITION OF customers DEFAULT;
postgres=# \d+ cust_active
Table "public.cust_active"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition of: customers FOR VALUES IN ('ACTIVE')
Partition constraint: ((status IS NOT NULL) AND (status = 'ACTIVE'::text))
Indexes:
"cust_active_arr_idx" btree (arr)
Access method: heap
postgres=# \d+ cust_archived
Table "public.cust_archived"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition of: customers FOR VALUES IN ('EXPIRED1', 'EXPIRED2')
Partition constraint: ((status IS NOT NULL) AND (status = ANY (ARRAY['EXPIRED1'::text, 'EXPIRED2'::text])))
Indexes:
"cust_archived_arr_idx" btree (arr)
Access method: heap
postgres=# \d+ cust_others
Table "public.cust_others"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
status | text | | | | extended | | |
arr | numeric | | | | main | | |
Partition of: customers DEFAULT
Partition constraint: (NOT ((status IS NOT NULL) AND (status = ANY (ARRAY['ACTIVE'::text, 'EXPIRED1'::text, 'EXPIRED2'::text]))))
Indexes:
"cust_others_arr_idx" btree (arr)
Access method: heap