文章/答案/技术大牛

发布

社区首页 >问答首页 >PostgreSQL中的SQL JOIN - WHERE子句中的执行计划与ON子句中的执行计划不同

问PostgreSQL中的SQL JOIN - WHERE子句中的执行计划与ON子句中的执行计划不同
EN

Stack Overflow用户

提问于 2021-01-12 04:11:10

回答 1查看 325关注 0票数 3

我们在PostgreSQL 11.9/11.10或12.5中有一个简单的语句，我们可以使用where子句或ON子句来编写连接。含义是完全相同的，因此返回的行数也是一样的--但是我们收到了不同的解释计划。随着表中的数据越来越多，一个执行计划变得越来越糟糕，我们想知道为什么PostgreSQL会为这种情况选择不同的解释计划。有什么想法吗？

让我们创建一些示例数据：

CREATE TABLE t1 (
    t1_nr int8 NOT NULL,
    name varchar(60),
    CONSTRAINT t1_pk PRIMARY KEY (t1_nr)
);

INSERT INTO t1 (t1_nr, name) SELECT s, left(md5(random()::text), 10) FROM generate_series(1, 1000000) s; -- 1 million records

CREATE TABLE t2 (
    t2_nr int8 NOT NULL,
    CONSTRAINT t2_pk PRIMARY KEY (t2_nr)
);

INSERT INTO t2 (t2_nr) SELECT s FROM generate_series(1, 10000000) s; -- 10 million records

CREATE TABLE t3 (
    t1_nr int8 NOT NULL,
    t2_nr int8 NOT NULL,
    CONSTRAINT t3_pk PRIMARY KEY (t2_nr, t1_nr)
);

INSERT INTO t3 (t1_nr, t2_nr) SELECT (s-1)/10+1, s FROM generate_series(1, 10000000) s; -- 10 t2 records per t1 records --> 10 million records

我们的声明包含了充分分析的统计数据：

EXPLAIN (BUFFERS, ANALYZE)
SELECT t1.*
FROM t1 t1
WHERE EXISTS (
    SELECT 1
    FROM t3 t3
    JOIN t2 t2 ON t2.t2_nr = t3.t2_nr
    --AND t3.t1_nr = t1.t1_nr /* GOOD (using ON-CLAUSE) */
    WHERE t3.t1_nr = t1.t1_nr /* BAD (using WHERE-CLAUSE) */
)
LIMIT 1000

带有"GOOD“行(ON-子句)的解释计划：

QUERY PLAN                                                                                                                            |
--------------------------------------------------------------------------------------------------------------------------------------|
Limit  (cost=0.00..22896.86 rows=1000 width=19) (actual time=0.028..4.801 rows=1000 loops=1)                                          |
  Buffers: shared hit=8015                                                                                                            |
  ->  Seq Scan on t1  (cost=0.00..11448428.92 rows=500000 width=19) (actual time=0.027..4.725 rows=1000 loops=1)                      |
        Filter: (SubPlan 1)                                                                                                           |
        Buffers: shared hit=8015                                                                                                      |
        SubPlan 1                                                                                                                     |
          ->  Nested Loop  (cost=0.87..180.43 rows=17 width=0) (actual time=0.004..0.004 rows=1 loops=1000)                           |
                Buffers: shared hit=8008                                                                                              |
                ->  Index Only Scan using t3_pk on t3  (cost=0.43..36.73 rows=17 width=8) (actual time=0.002..0.002 rows=1 loops=1000)|
                      Index Cond: (t1_nr = t1.t1_nr)                                                                                  |
                      Heap Fetches: 1000                                                                                              |
                      Buffers: shared hit=4003                                                                                        |
                ->  Index Only Scan using t2_pk on t2  (cost=0.43..8.45 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1000)  |
                      Index Cond: (t2_nr = t3.t2_nr)                                                                                  |
                      Heap Fetches: 1000                                                                                              |
                      Buffers: shared hit=4005                                                                                        |
Planning Time: 0.267 ms                                                                                                               |
Execution Time: 4.880 ms                                                                                                              |

带有“错误”行(WHERE-子句)的解释计划：

QUERY PLAN                                                                                                                                                   |
-------------------------------------------------------------------------------------------------------------------------------------------------------------|
Limit  (cost=1166.26..7343.42 rows=1000 width=19) (actual time=16.888..75.809 rows=1000 loops=1)                                                             |
  Buffers: shared hit=51883 read=11 dirtied=2                                                                                                                |
  ->  Merge Semi Join  (cost=1166.26..3690609.61 rows=597272 width=19) (actual time=16.887..75.703 rows=1000 loops=1)                                        |
        Merge Cond: (t1.t1_nr = t3.t1_nr)                                                                                                                    |
        Buffers: shared hit=51883 read=11 dirtied=2                                                                                                          |
        ->  Index Scan using t1_pk on t1  (cost=0.42..32353.42 rows=1000000 width=19) (actual time=0.010..0.271 rows=1000 loops=1)                           |
              Buffers: shared hit=12                                                                                                                         |
        ->  Gather Merge  (cost=1000.89..3530760.13 rows=9999860 width=8) (actual time=16.873..74.064 rows=9991 loops=1)                                     |
              Workers Planned: 2                                                                                                                             |
              Workers Launched: 2                                                                                                                            |
              Buffers: shared hit=51871 read=11 dirtied=2                                                                                                    |
              ->  Nested Loop  (cost=0.87..2375528.14 rows=4166608 width=8) (actual time=0.054..14.275 rows=4309 loops=3)                                    |
                    Buffers: shared hit=51871 read=11 dirtied=2                                                                                              |
                    ->  Parallel Index Only Scan using t3_pk on t3  (cost=0.43..370689.69 rows=4166608 width=16) (actual time=0.028..1.495 rows=4309 loops=3)|
                          Heap Fetches: 12927                                                                                                                |
                          Buffers: shared hit=131 read=6                                                                                                     |
                    ->  Index Only Scan using t2_pk on t2  (cost=0.43..0.48 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=12927)                    |
                          Index Cond: (t2_nr = t3.t2_nr)                                                                                                     |
                          Heap Fetches: 12927                                                                                                                |
                          Buffers: shared hit=51740 read=5 dirtied=2                                                                                         |
Planning Time: 0.475 ms                                                                                                                                      |
Execution Time: 75.947 ms                                                                                                                                    |

postgresql

join

where-clause

on-clause

sql

Stack Overflow用户

发布于 2021-03-31 21:54:52

感谢您的想法，如果我们添加一个索引，如

CREATE INDEX t3_t1_nr ON t3(t1_nr);

“不好的”-Statement会有所改善。

但我们的最终解决方案是增加为此表收集的统计数据：

ALTER TABLE t1 ALTER COLUMN t1_nr SET STATISTICS 10000;
ALTER TABLE t2 ALTER COLUMN t2_nr SET STATISTICS 10000;
ALTER TABLE t3 ALTER COLUMN t1_nr SET STATISTICS 10000;

ANALYZE t1;
ANALYZE t2;
ANALYZE t3;

在此更改之后，两个选择都有更多关于相同的执行时间。更多信息可在此处找到：https://www.postgresql.org/docs/12/planner-stats.html

票数 1

查看全部 1 条回答

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/65673841

复制

相似问题

问PostgreSQL中的SQL JOIN - WHERE子句中的执行计划与ON子句中的执行计划不同
EN

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问PostgreSQL中的SQL JOIN - WHERE子句中的执行计划与ON子句中的执行计划不同EN

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问PostgreSQL中的SQL JOIN - WHERE子句中的执行计划与ON子句中的执行计划不同
EN