文章/答案/技术大牛

发布

社区首页 >问答首页 >当记录数量少于“限制”时，Postgres奇怪的查询计划

问当记录数量少于“限制”时，Postgres奇怪的查询计划
EN

Stack Overflow用户

提问于 2020-12-03 02:00:04

回答 1查看 310关注 0票数 1

对于大日期过滤器，我有一个非常快速的查询

EXPLAIN ANALYZE
SELECT "advertisings"."id",
       "advertisings"."page_id",
       "advertisings"."page_name",
       "advertisings"."created_at",
       "posts"."image_url",
       "posts"."thumbnail_url",
       "posts"."post_content",
       "posts"."like_count"
FROM "advertisings"
  INNER JOIN "posts" ON "advertisings"."post_id" = "posts"."id"
WHERE "advertisings"."created_at" >= '2020-01-01T00:00:00Z'
AND   "advertisings"."created_at" < '2020-12-02T23:59:59Z'
ORDER BY "like_count" DESC LIMIT 20

查询计划是：

Limit  (cost=0.85..20.13 rows=20 width=552) (actual time=0.026..0.173 rows=20 loops=1)
  ->  Nested Loop  (cost=0.85..951662.55 rows=987279 width=552) (actual time=0.025..0.169 rows=20 loops=1)
        ->  Index Scan using posts_like_count_idx on posts  (cost=0.43..378991.65 rows=1053015 width=504) (actual time=0.013..0.039 rows=20 loops=1)
        ->  Index Scan using advertisings_post_id_index on advertisings  (cost=0.43..0.53 rows=1 width=52) (actual time=0.005..0.006 rows=1 loops=20)
              Index Cond: (post_id = posts.id)
              Filter: ((created_at >= '2020-01-01 00:00:00'::timestamp without time zone) AND (created_at < '2020-12-02 23:59:59'::timestamp without time zone))
Planning Time: 0.365 ms
Execution Time: 0.199 ms

但是，当我缩小筛选器(更改"created_at“>= '2020-11-25T00:00:00Z')返回9个记录(低于限制20)时，查询非常缓慢。

EXPLAIN ANALYZE
SELECT "advertisings"."id",
       "advertisings"."page_id",
       "advertisings"."page_name",
       "advertisings"."created_at",
       "posts"."image_url",
       "posts"."thumbnail_url",
       "posts"."post_content",
       "posts"."like_count"
FROM "advertisings"
  INNER JOIN "posts" ON "advertisings"."post_id" = "posts"."id"
WHERE "advertisings"."created_at" >= '2020-11-25T00:00:00Z'
AND   "advertisings"."created_at" < '2020-12-02T23:59:59Z'
ORDER BY "like_count" DESC LIMIT 20

查询计划：

Limit  (cost=1000.88..8051.73 rows=20 width=552) (actual time=218.485..4155.336 rows=9 loops=1)
  ->  Gather Merge  (cost=1000.88..612662.09 rows=1735 width=552) (actual time=218.483..4155.328 rows=9 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Nested Loop  (cost=0.85..611461.80 rows=723 width=552) (actual time=118.170..3786.176 rows=3 loops=3)
              ->  Parallel Index Scan using posts_like_count_idx on posts  (cost=0.43..372849.07 rows=438756 width=504) (actual time=0.024..1542.094 rows=351005 loops=3)
              ->  Index Scan using advertisings_post_id_index on advertisings  (cost=0.43..0.53 rows=1 width=52) (actual time=0.006..0.006 rows=0 loops=1053015)
                    Index Cond: (post_id = posts.id)
                    Filter: ((created_at >= '2020-11-25 00:00:00'::timestamp without time zone) AND (created_at < '2020-12-02 23:59:59'::timestamp without time zone))
                    Rows Removed by Filter: 1
Planning Time: 0.394 ms
Execution Time: 4155.379 ms

我花了几个小时在谷歌上搜索，但找不到正确的解决方案。我们会非常感谢你的帮助。

更新的

当我继续缩小过滤器

WHERE "advertisings"."created_at" >= '2020-11-27T00:00:00Z'
AND   "advertisings"."created_at" < '2020-12-02T23:59:59Z'

它还将9条记录作为上述查询的慢速返回。然而，这一次，查询的速度确实很快。

Limit  (cost=8082.99..8083.04 rows=20 width=552) (actual time=0.062..0.065 rows=9 loops=1)
  ->  Sort  (cost=8082.99..8085.40 rows=962 width=552) (actual time=0.061..0.062 rows=9 loops=1)
        Sort Key: posts.like_count DESC
        Sort Method: quicksort  Memory: 32kB
        ->  Nested Loop  (cost=0.85..8057.39 rows=962 width=552) (actual time=0.019..0.047 rows=9 loops=1)
              ->  Index Scan using advertisings_created_at_index on advertisings  (cost=0.43..501.30 rows=962 width=52) (actual time=0.008..0.012 rows=9 loops=1)
                    Index Cond: ((created_at >= '2020-11-27 00:00:00'::timestamp without time zone) AND (created_at < '2020-12-02 23:59:59'::timestamp without time zone))
              ->  Index Scan using posts_pkey on posts  (cost=0.43..7.85 rows=1 width=504) (actual time=0.003..0.003 rows=1 loops=9)
                    Index Cond: (id = advertisings.post_id)
Planning Time: 0.540 ms
Execution Time: 0.096 ms

我不知道会发生什么

postgresql

performance

sql-execution-plan

postgresql-12

回答 1

Stack Overflow用户

发布于 2020-12-03 03:41:16

在前两个查询和最后一个查询中，PostgreSQL遵循两种不同的策略：

如果有许多匹配的advertisings行，则使用嵌套的循环联接来按照ORDER BY子句的顺序提取行，并丢弃在找到20.

之前与条件不匹配的行。

如果匹配的advertisings行很少，它会获取这几行，然后是posts中的匹配行，然后排序并接受前20行。

第二次执行很慢，因为PostgreSQL高估了与条件匹配的advertisings中的行。看看它是如何估计962而不是第三个查询中的9的吗？

解决方案是改进PostgreSQL的估计：

如果运行

分析广告；

这足以使缓慢的查询速度更快，让PostgreSQL更经常地收集统计数据：

更改表广告集(autovacuum_analyze_scale_factor =0.0 5)；

如果这还不够，请尝试收集更详细的统计数据：

设定default_statistics_target =1000，分析广告；

你可以用高达10000的值进行实验。找到起作用的值后，将其持久化：

更改表广告更改created_at集统计信息1000;

票数 3

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/65118809

复制

相似问题

问当记录数量少于“限制”时，Postgres奇怪的查询计划
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问当记录数量少于“限制”时，Postgres奇怪的查询计划EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问当记录数量少于“限制”时，Postgres奇怪的查询计划
EN