前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >PG基于dexter的自动化索引推荐

PG基于dexter的自动化索引推荐

原创
作者头像
保持热爱奔赴山海
修改2024-01-07 18:39:09
1720
修改2024-01-07 18:39:09
举报
文章被收录于专栏:饮水机管理员饮水机管理员

github地址: https://github.com/ankane/dexter

安装高版本ruby

代码语言:shell
复制
安装scl源:
yum install  scl-utils centos-release-scl scl-utils-build


查看从 SCL 中安装的包的列表:
scl --list

列出scl源有哪些包可以用:
yum list all --enablerepo='centos-sclo-rh' | grep ruby

	
查看从 SCL 中安装的包的列表:
scl --list

安装ruby3.0版本
yum install rh-ruby30 rh-ruby30-ruby-devel

切到ruby3.0环境下
scl enable rh-ruby30 bash

# ruby --version
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]

安装pgdexter组件

代码语言:shell
复制
$ scl enable rh-ruby30 bash
# export PATH=/usr/pgsql-15/bin/:/usr/pgsql-15/lib/:$PATH
# export https_proxy=http://192.168.31.1:7890 && export http_proxy=http://192.168.31.1:7890
# gem install pgdexter
Successfully installed google-protobuf-3.25.1-x86_64-linux
Building native extensions. This could take a while...
Successfully installed pg-1.5.4
Fetching pgdexter-0.5.1.gem
Successfully installed pgdexter-0.5.1
Parsing documentation for google-protobuf-3.25.1-x86_64-linux
Installing ri documentation for google-protobuf-3.25.1-x86_64-linux
Parsing documentation for pg-1.5.4
Installing ri documentation for pg-1.5.4
Parsing documentation for pgdexter-0.5.1
Installing ri documentation for pgdexter-0.5.1
Done installing documentation for google-protobuf, pg, pgdexter after 13 seconds
3 gems installed

对pg启用2个组件

插件不是本文重点,安装步骤忽略

代码语言:sql
复制
CREATE EXTENSION hypopg;
CREATE EXTENSION pg_stat_statements

造点测试数据,并进行压测

代码语言:shell
复制
$ pgbench  postgres  --initialize   --fillfactor=10 --scale=10
dropping old tables...
creating tables...
generating data (client-side)...
1000000 of 1000000 tuples (100%) done (elapsed 14.24 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 23.42 s (drop tables 0.01 s, create tables 0.01 s, client-side generate 21.56 s, vacuum 0.44 s, primary keys 1.41 s).


postgres=# \dt+
                                                List of relations
 Schema |          Name           |   Type   |  Owner   | Persistence | Access method |    Size    | Description
--------+-------------------------+----------+----------+-------------+---------------+------------+-------------
 public | pgbench_accounts        | table    | postgres | permanent   | heap          | 1302 MB    |
 public | pgbench_branches        | table    | postgres | permanent   | heap          | 40 kB      |
 public | pgbench_history         | table    | postgres | permanent   | heap          | 0 bytes    |
 public | pgbench_tellers         | table    | postgres | permanent   | heap          | 80 kB      |


然后我们把原表的主键索引删掉,制造出全表扫描的场景
postgres=# alter table pgbench_accounts drop constraint pgbench_accounts_pkey ;
ALTER TABLE
Time: 9.403 ms


另外开一个窗口,执行压测
$ pgbench -c 10 -j 10
pgbench (16.0 - Percona Distribution, server 15.4 - Percona Distribution)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 10
number of transactions actually processed: 100/100
number of failed transactions: 0 (0.000%)
latency average = 2140.750 ms
initial connection time = 74.566 ms
tps = 4.671260 (without initial connection time)

将dexer连接到pg并读取pg-stat-statements信息进行分析【推荐】

代码语言:shell
复制
# dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements --enable-hypopg
返回结果:
Processing 269 new query fingerprints
Index found: public.pgbench_accounts (aid)

如果没有发现需要优化的sql,则返回类似如下:
Processing 192 new query fingerprints
No new indexes found


另外,如果需要直接创建索引,可以加参数 --create 【生产上不推荐这种】
# dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements  --enable-hypopg --create
Processing 269 new query fingerprints
Index found: public.pgbench_accounts (aid)
Creating index: CREATE INDEX CONCURRENTLY ON "public"."pgbench_accounts" ("aid")
Index created: 1967 ms

另外,也可以使用docker方式

代码语言:shell
复制
docker pull ankane/dexter

docker run --rm ankane/dexter dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements --enable-hypopg
Processing 44 new query fingerprints
Index found: public.pgbench_accounts (aid)

dexter的帮助

代码语言:shell
复制
# dexter --help
Usage:
    dexter [options]

Input options:
    --input-format          input format
    --pg-stat-activity      use pg_stat_activity
    --stdin                 use stdin
    -s, --statement         process a single statement

Connection options:
    -d, --dbname            database name
    -h, --host              database host
    -p, --port              database port
    -U, --username          database user

Processing options:
    --interval              time to wait between processing queries, in seconds
    --min-calls             only process queries that have been called a certain number of times
    --min-time              only process queries that have consumed a certain amount of DB time, in minutes

Indexing options:
    --analyze               analyze tables that haven't been analyzed in the past hour
    --create                create indexes
    --enable-hypopg         enable the HypoPG extension
    --exclude               prevent specific tables from being indexed
    --include               only include specific tables
    --tablespace            tablespace to create indexes

Logging options:
    --log-level             log level
    --log-sql               log sql

Other options:
    -v, --version           print the version
    --help                  prints help

其它

直接从pg当前会话进行分析【推荐】

代码语言:shell
复制
# dexter -d postgres -Udts -h192.168.31.181  --pg-stat-activity   --enable-hypopg
Started
Processing 2 new query fingerprints
Index found: public.pgbench_accounts (aid)

从pg日志文件进行分析【不推荐】

代码语言:shell
复制
先调低pg慢日志时间
log_min_duration_statement = 10 # ms

然后使用
# dexter -d postgres -Udts -h192.168.31.181 postgresql-Wed.log
Processing 276 new query fingerprints
Index found: public.pgbench_accounts (aid)


支持stderr、csvlog、 和jsonlog格式。
对于实时索引,请通过管道传输日志文件:
tail -F -n +1 postgresql.csv | dexter -d postgres -Udts -h192.168.31.181 --enable-hypopg --stdin

从sql文件进行分析【不推荐】

代码语言:shell
复制
dexter -d postgres -Udts -h192.168.31.181 --enable-hypopg queries.sql
dexter -d postgres -Udts -h192.168.31.181 --enable-hypopg -s "SELECT * FROM xxx where xx=x"

收集选项

代码语言:shell
复制
可以忽略某些极少量的一次性的查询被索引,只分析运行了至少100次的sql
dexter --min-calls 100

只分析运行的总时长超过10分钟的sql
dexter --min-time 10 # minutes

流式传输日志时,指定处理查询之间等待的时间
dexter --interval 60 # seconds

分析

为了获得最佳结果,请确保您的表最近已被分析过,以便统计数据是最新的。您可以要求 Dexter 分析它遇到的在过去一小时内尚未分析过的表:

代码语言:shell
复制
# dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements  --analyze
Processing 276 new query fingerprints
Index found: public.pgbench_accounts (aid)

表格的过滤

代码语言:shell
复制
您可以使用以下方法从索引中排除大型表或写入量大的表:
dexter --exclude table1,table2

或者,您可以指定要索引的表:
dexter --include table3,table4

例:
# dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements --enable-hypopg --exclude=sm1.t1 --include=public.pgbench_accounts
Processing 44 new query fingerprints
Index found: public.pgbench_accounts (aid)

注意: --exclude=sm1.t1 和 --include 其中的表名不支持通配符的写法

调试

查看 Dexter 如何处理查询:

代码语言:shell
复制
dexter --log-sql --log-level debug2
dexter --log-sql --log-level debug3

发现的bug

update: 2024-01-07 ,作者已修复该问题并合并到master分支,详见 https://github.com/ankane/dexter/issues/48

代码语言:shell
复制
1、如果pg启用了auto_explain插件,且设置 auto_explain.log_min_duration = '0'



则执行dexter --pg-stat-statements 的时候会报错,如下是加了参数 --log-level debug2 打出的日志情况

# dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements --enable-hypopg --log-level debug2
Processing 5 new query fingerprints
Last analyze: public.pgbench_accounts : 2024-01-03T10:49:16+08:00
/opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:310:in `block in determine_indexes': undefined method `<' for nil:NilClass (NoMethodError)
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:306:in `each'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:306:in `determine_indexes'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:102:in `process_queries'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:32:in `process_stat_statements'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/client.rb:27:in `perform'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/client.rb:9:in `start'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/exe/dexter:7:in `<top (required)>'
        from /opt/rh/rh-ruby30/root/usr/local/bin/dexter:23:in `load'
        from /opt/rh/rh-ruby30/root/usr/local/bin/dexter:23:in `<main>'






根据报错日志,加些log输出
vim /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb +310  , 加一些 puts 输出变量到控制台

      savings_ratio = (1 - @min_cost_savings_pct / 100.0)
      puts "savings_ratio ---> ",savings_ratio

      queries.each do |query|
        if query.explainable? && query.high_cost?
          puts "query.costs -->", query.costs
          puts "query.costs.length -->", query.costs.length
          new_cost, new_cost2 = query.costs[1..2]
          puts "new_cost -->", new_cost
          puts "new_cost2 -->", new_cost2
          puts  "query.initial_cost -->", query.initial_cost
          puts  "savings_ratio -->",savings_ratio


再次执行上面的dexter命令,可以看到输出变量情况如下
# dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements --enable-hypopg --log-level debug2
Processing 25 new query fingerprints
Last analyze: public.pgbench_accounts : 2024-01-03T10:49:16+08:00
savings_ratio --->
0.5
query.costs -->
172875.43
query.costs.length -->   注意这里数组长度为1,造成 query.costs[1..2] 解包失败,导致 new_cost和new_cost2都为nil
1
new_cost -->

new_cost2 -->

query.initial_cost -->
172875.43
savings_ratio -->
0.5
/opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:318:in `block in determine_indexes': undefined method `<' for nil:NilClass (NoMethodError)
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:308:in `each'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:308:in `determine_indexes'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:102:in `process_queries'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:32:in `process_stat_statements'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/client.rb:27:in `perform'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/client.rb:9:in `start'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/exe/dexter:7:in `<top (required)>'
        from /opt/rh/rh-ruby30/root/usr/local/bin/dexter:23:in `load'
        from /opt/rh/rh-ruby30/root/usr/local/bin/dexter:23:in `<main>'




2、如果pg启用了auto_explain插件,但是设置 auto_explain.log_min_duration不为0,例如设置为 100 ,则不会有报错

postgres=# select pg_stat_statements_reset() ;
 pg_stat_statements_reset
--------------------------

(1 row)

Time: 2.176 ms
postgres=# select * from pgbench_accounts where aid=1000;
 aid  | bid | abalance |                                        filler
------+-----+----------+--------------------------------------------------------------------------------------
 1000 |   1 |        0 |
(1 row)


# dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements --enable-hypopg --log-level debug2
Processing 5 new query fingerprints
Last analyze: public.pgbench_accounts : 2024-01-03T10:49:16+08:00
savings_ratio --->
0.5
query.costs -->
172875.43
8.07
8.07
query.costs.length -->
3
new_cost -->
8.07
new_cost2 -->
8.07
query.initial_cost -->
172875.43
savings_ratio -->
0.5
Index found: public.pgbench_accounts (aid)
--------------------------------------------------------------------------------
Query 1edaea087b983cab
Start: 172875.43
Pass1: 8.07 : public.pgbench_accounts (aid)
Pass2: 8.07 : public.pgbench_accounts (aid)
Final: 8.07 : public.pgbench_accounts (aid)

select * from pgbench_accounts where aid=$1

--------------------------------------------------------------------------------
Query 300ac36fac258016
No tables

select pg_stat_statements_reset()

--------------------------------------------------------------------------------
Query 8d6f8ae553bbb2a4
No candidate tables for indexes

SELECT installed_version FROM pg_available_extensions WHERE name = $1 /*dexter*/

--------------------------------------------------------------------------------
Query c43bbb6237d6cbc6
No tables

SET lock_timeout = '5s' /*dexter*/

--------------------------------------------------------------------------------
Query ce4b1350f91bbc5f
No tables

SHOW server_version_num /*dexter*/



大致看了下indexer.rb的代码,正常情况下 calculate_plan 会执行3次,计算3次的costs存到数组里面,流程就不会抛错。 具体是哪个逻辑里面出问题的,ruby代码不想去深入了。

简单看了下dexter的逻辑(不到800行的代码),发现和我们目前生产在用的MySQL的sql自动化索引推荐服务有点类似,但PG有hypopg加持,MySQL我只能抽样数据做成沙箱环境。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 安装高版本ruby
  • 安装pgdexter组件
  • 对pg启用2个组件
  • 造点测试数据,并进行压测
  • 将dexer连接到pg并读取pg-stat-statements信息进行分析【推荐】
  • 另外,也可以使用docker方式
  • dexter的帮助
  • 其它
    • 直接从pg当前会话进行分析【推荐】
      • 从pg日志文件进行分析【不推荐】
        • 从sql文件进行分析【不推荐】
        • 收集选项
        • 分析
        • 表格的过滤
        • 调试
        • 发现的bug
        相关产品与服务
        容器服务
        腾讯云容器服务(Tencent Kubernetes Engine, TKE)基于原生 kubernetes 提供以容器为核心的、高度可扩展的高性能容器管理服务,覆盖 Serverless、边缘计算、分布式云等多种业务部署场景,业内首创单个集群兼容多种计算节点的容器资源管理模式。同时产品作为云原生 Finops 领先布道者,主导开源项目Crane,全面助力客户实现资源优化、成本控制。
        领券
        问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档