首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >当记录数量少于“限制”时,Postgres奇怪的查询计划

当记录数量少于“限制”时,Postgres奇怪的查询计划
EN

Stack Overflow用户
提问于 2020-12-03 02:00:04
回答 1查看 310关注 0票数 1

对于大日期过滤器,我有一个非常快速的查询

代码语言:javascript
运行
复制
EXPLAIN ANALYZE
SELECT "advertisings"."id",
       "advertisings"."page_id",
       "advertisings"."page_name",
       "advertisings"."created_at",
       "posts"."image_url",
       "posts"."thumbnail_url",
       "posts"."post_content",
       "posts"."like_count"
FROM "advertisings"
  INNER JOIN "posts" ON "advertisings"."post_id" = "posts"."id"
WHERE "advertisings"."created_at" >= '2020-01-01T00:00:00Z'
AND   "advertisings"."created_at" < '2020-12-02T23:59:59Z'
ORDER BY "like_count" DESC LIMIT 20

查询计划是:

代码语言:javascript
运行
复制
Limit  (cost=0.85..20.13 rows=20 width=552) (actual time=0.026..0.173 rows=20 loops=1)
  ->  Nested Loop  (cost=0.85..951662.55 rows=987279 width=552) (actual time=0.025..0.169 rows=20 loops=1)
        ->  Index Scan using posts_like_count_idx on posts  (cost=0.43..378991.65 rows=1053015 width=504) (actual time=0.013..0.039 rows=20 loops=1)
        ->  Index Scan using advertisings_post_id_index on advertisings  (cost=0.43..0.53 rows=1 width=52) (actual time=0.005..0.006 rows=1 loops=20)
              Index Cond: (post_id = posts.id)
              Filter: ((created_at >= '2020-01-01 00:00:00'::timestamp without time zone) AND (created_at < '2020-12-02 23:59:59'::timestamp without time zone))
Planning Time: 0.365 ms
Execution Time: 0.199 ms

但是,当我缩小筛选器(更改"created_at“>= '2020-11-25T00:00:00Z')返回9个记录(低于限制20)时,查询非常缓慢。

代码语言:javascript
运行
复制
EXPLAIN ANALYZE
SELECT "advertisings"."id",
       "advertisings"."page_id",
       "advertisings"."page_name",
       "advertisings"."created_at",
       "posts"."image_url",
       "posts"."thumbnail_url",
       "posts"."post_content",
       "posts"."like_count"
FROM "advertisings"
  INNER JOIN "posts" ON "advertisings"."post_id" = "posts"."id"
WHERE "advertisings"."created_at" >= '2020-11-25T00:00:00Z'
AND   "advertisings"."created_at" < '2020-12-02T23:59:59Z'
ORDER BY "like_count" DESC LIMIT 20

查询计划:

代码语言:javascript
运行
复制
Limit  (cost=1000.88..8051.73 rows=20 width=552) (actual time=218.485..4155.336 rows=9 loops=1)
  ->  Gather Merge  (cost=1000.88..612662.09 rows=1735 width=552) (actual time=218.483..4155.328 rows=9 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Nested Loop  (cost=0.85..611461.80 rows=723 width=552) (actual time=118.170..3786.176 rows=3 loops=3)
              ->  Parallel Index Scan using posts_like_count_idx on posts  (cost=0.43..372849.07 rows=438756 width=504) (actual time=0.024..1542.094 rows=351005 loops=3)
              ->  Index Scan using advertisings_post_id_index on advertisings  (cost=0.43..0.53 rows=1 width=52) (actual time=0.006..0.006 rows=0 loops=1053015)
                    Index Cond: (post_id = posts.id)
                    Filter: ((created_at >= '2020-11-25 00:00:00'::timestamp without time zone) AND (created_at < '2020-12-02 23:59:59'::timestamp without time zone))
                    Rows Removed by Filter: 1
Planning Time: 0.394 ms
Execution Time: 4155.379 ms

我花了几个小时在谷歌上搜索,但找不到正确的解决方案。我们会非常感谢你的帮助。

更新的

当我继续缩小过滤器

代码语言:javascript
运行
复制
WHERE "advertisings"."created_at" >= '2020-11-27T00:00:00Z'
AND   "advertisings"."created_at" < '2020-12-02T23:59:59Z'

它还将9条记录作为上述查询的慢速返回。然而,这一次,查询的速度确实很快。

代码语言:javascript
运行
复制
Limit  (cost=8082.99..8083.04 rows=20 width=552) (actual time=0.062..0.065 rows=9 loops=1)
  ->  Sort  (cost=8082.99..8085.40 rows=962 width=552) (actual time=0.061..0.062 rows=9 loops=1)
        Sort Key: posts.like_count DESC
        Sort Method: quicksort  Memory: 32kB
        ->  Nested Loop  (cost=0.85..8057.39 rows=962 width=552) (actual time=0.019..0.047 rows=9 loops=1)
              ->  Index Scan using advertisings_created_at_index on advertisings  (cost=0.43..501.30 rows=962 width=52) (actual time=0.008..0.012 rows=9 loops=1)
                    Index Cond: ((created_at >= '2020-11-27 00:00:00'::timestamp without time zone) AND (created_at < '2020-12-02 23:59:59'::timestamp without time zone))
              ->  Index Scan using posts_pkey on posts  (cost=0.43..7.85 rows=1 width=504) (actual time=0.003..0.003 rows=1 loops=9)
                    Index Cond: (id = advertisings.post_id)
Planning Time: 0.540 ms
Execution Time: 0.096 ms

我不知道会发生什么

EN

回答 1

Stack Overflow用户

发布于 2020-12-03 03:41:16

在前两个查询和最后一个查询中,PostgreSQL遵循两种不同的策略:

  • 如果有许多匹配的advertisings行,则使用嵌套的循环联接来按照ORDER BY子句的顺序提取行,并丢弃在找到20.

之前与条件不匹配的行。

  • 如果匹配的advertisings行很少,它会获取这几行,然后是posts中的匹配行,然后排序并接受前20行。

第二次执行很慢,因为PostgreSQL高估了与条件匹配的advertisings中的行。看看它是如何估计962而不是第三个查询中的9的吗?

解决方案是改进PostgreSQL的估计:

  • 如果运行

分析广告;

这足以使缓慢的查询速度更快,让PostgreSQL更经常地收集统计数据:

更改表广告集(autovacuum_analyze_scale_factor =0.0 5);

如果这还不够,请尝试收集更详细的统计数据:

设定default_statistics_target =1000,分析广告;

你可以用高达10000的值进行实验。找到起作用的值后,将其持久化:

更改表广告更改created_at集统计信息1000;

票数 3
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/65118809

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档