前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >MySQL 案例:摸不准的查询优化器与索引

MySQL 案例:摸不准的查询优化器与索引

原创
作者头像
王文安@DBA
发布2022-02-28 16:14:15
1K0
发布2022-02-28 16:14:15
举报

背景

近期有用户在咨询查询的问题,发现一个比较典型的案例,SQL 语句无法选择正确的索引,导致查询效率偏低,正好借这个案例来学习一下查询优化器的知识。所使用的的 SQL 语句、数据以及表结构均已脱敏,

问题描述

使用 MariaDB 10.3,在一张业务表中,包含了 col1 和 col2 的唯一索引,使用如下语句查询时,只用了唯一索引的第一列,而没有用到第二列。

select col2 from t1 where col1 = 1 and col2 >= 4000000 order by col2 limit 50;

从常理来看,这个查询用上 col1 和 col2 的联合索引应该是最优解。而且衍生的相关查询均无法利用联合索引的两个列。

代码语言:txt
复制
select col2 from t1 where col1 = 1 and col2 >= 4000000 order by col2;

select col2 from t1 where col1 = 1 and col2 >= 4000000 limit 50;

select col2 from t1 where col1 = 1 and col2 >= 4000000;

模拟验证

部署 MariaDB 10.3.34 官方版本,构造测试环境用的语句如下:

代码语言:txt
复制
CREATE TABLE `t1` (

  `id` bigint(20) unsigned NOT NULL AUTO\_INCREMENT,

  `col1` int(11) NOT NULL,

  `col2` bigint(20) NOT NULL,

  `col3` varchar(20) NOT NULL DEFAULT '',

  `col4` varchar(20) DEFAULT NULL,

  `col5` varchar(100) DEFAULT NULL,

  `col6` varchar(20) DEFAULT NULL,

  `col7` varchar(100) DEFAULT NULL,

  `col8` varchar(100) DEFAULT NULL,

  `col9` varchar(100) DEFAULT NULL,

  `col10` tinyint(1) NOT NULL DEFAULT 0,

  `col11` timestamp NOT NULL DEFAULT current_timestamp(),

  `col12` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),

  PRIMARY KEY (`id`,`col1`),

  UNIQUE KEY `unq_col1_col2` (`col1`,`col2`),

  KEY `idx_1` (`col2`),

  KEY `idx_2` (`col5`),

  KEY `idx_3` (`col1`,`col4`),

  KEY `idx_4` (`col1`,`col11`),

  KEY `idx_5` (`col11`),

  KEY `idx_6` (`col12`)

) ENGINE=InnoDB;

insert into t1(col1,col2) values(1,1);

insert into t1 (col2, col1) select col2 + 1, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 2, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 4, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 8, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 16, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 32, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 64, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 128, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 256, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 512, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 1024, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 2048, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 4096, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 8192, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 16384, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 32768, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 65536, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 131072, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 262144, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 524288, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 1048576, col1 from t1 where col1 = 1;

insert into t1 (col2, col1) select col2 + 2097152, col1 from t1 where col1 = 1;

insert into t1(col1,col2) values(10,10),(10,11),(10,12);

那么先看看执行计划:

代码语言:txt
复制
MariaDB [test]> explain select col2 from t1 where col1 = 1 and col2 >= 4000000 order by col2 limit 50;
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+--------+--------------------------+
| id   | select_type | table | type | possible_keys                   | key           | key_len | ref   | rows   | Extra                    |
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+--------+--------------------------+
|    1 | SIMPLE      | t1    | ref  | unq_col1_col2,idx_1,idx_3,idx_4 | unq_col1_col2 | 4       | const | 388450 | Using where; Using index |
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+--------+--------------------------+
1 row in set (0.000 sec)

MariaDB [test]> explain select col2 from t1 where col1 = 1 and col2 >= 4000000 order by col2;
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+--------+--------------------------+
| id   | select_type | table | type | possible_keys                   | key           | key_len | ref   | rows   | Extra                    |
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+--------+--------------------------+
|    1 | SIMPLE      | t1    | ref  | unq_col1_col2,idx_1,idx_3,idx_4 | unq_col1_col2 | 4       | const | 388450 | Using where; Using index |
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+--------+--------------------------+
1 row in set (0.000 sec)

MariaDB [test]> explain select col2 from t1 where col1 = 1 and col2 >= 4000000 limit 50;
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+--------+--------------------------+
| id   | select_type | table | type | possible_keys                   | key           | key_len | ref   | rows   | Extra                    |
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+--------+--------------------------+
|    1 | SIMPLE      | t1    | ref  | unq_col1_col2,idx_1,idx_3,idx_4 | unq_col1_col2 | 4       | const | 388450 | Using where; Using index |
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+--------+--------------------------+
1 row in set (0.000 sec)

MariaDB [test]> explain select col2 from t1 where col1 = 1 and col2 >= 4000000;
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+--------+--------------------------+
| id   | select_type | table | type | possible_keys                   | key           | key_len | ref   | rows   | Extra                    |
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+--------+--------------------------+
|    1 | SIMPLE      | t1    | ref  | unq_col1_col2,idx_1,idx_3,idx_4 | unq_col1_col2 | 4       | const | 388450 | Using where; Using index |
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+--------+--------------------------+
1 row in set (0.000 sec)

MariaDB [test]> explain select col2 from t1 where col1 = 1 and col2 >= 1 order by col2 limit 50;
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+---------+--------------------------+
| id   | select_type | table | type | possible_keys                   | key           | key_len | ref   | rows    | Extra                    |
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+---------+--------------------------+
|    1 | SIMPLE      | t1    | ref  | unq_col1_col2,idx_1,idx_3,idx_4 | unq_col1_col2 | 4       | const | 2090440 | Using where; Using index |
+------+-------------+-------+------+---------------------------------+---------------+---------+-------+---------+--------------------------+
1 row in set (0.001 sec)

当变更查询条件,直到实际上结果集为空的时候,查询计划才会符合预期:

代码语言:txt
复制
MariaDB [test]> explain select col2 from t1 where col1 = 1 and col2 >= 5000000 order by col2 limit 50;
+------+-------------+-------+-------+---------------------------------+---------------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys                   | key           | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------------------------+---------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | t1    | range | unq_col1_col2,idx_1,idx_3,idx_4 | unq_col1_col2 | 12      | NULL |    1 | Using where; Using index |
+------+-------------+-------+-------+---------------------------------+---------------+---------+------+------+--------------------------+
1 row in set (0.000 sec)

PS:随机生成的数据中,col2 的最大值为 4194304。

原因简析

由于 MariaDB 10.3 并没有 optimizer_trace,因此很难去准确判断查询优化器因为什么原因没有选择联合索引,那么采用通常的人为干预手段,去试试看联合索引的效果,看看是否会有较好的查询效率:

代码语言:txt
复制
MariaDB [test]> explain select col2 from t1 force index(unq_col1_col2) where col1 = 1 and col2 >= 4000000 order by col2 limit 50;
+------+-------------+-------+-------+---------------+---------------+---------+------+--------+--------------------------+
| id   | select_type | table | type  | possible_keys | key           | key_len | ref  | rows   | Extra                    |
+------+-------------+-------+-------+---------------+---------------+---------+------+--------+--------------------------+
|    1 | SIMPLE      | t1    | range | unq_col1_col2 | unq_col1_col2 | 12      | NULL | 388450 | Using where; Using index |
+------+-------------+-------+-------+---------------+---------------+---------+------+--------+--------------------------+
1 row in set (0.000 sec)

MariaDB [test]> select col2 from t1 force index(unq_col1_col2) where col1 = 1 and col2 >= 4000000 order by col2 limit 50;
+---------+
| col2    |
+---------+
| 4000000 |
| 4000001 |
......
......
| 4000049 |
+---------+
50 rows in set (0.000 sec)

MariaDB [test]> select col2 from t1 where col1 = 1 and col2 >= 4000000 order by col2 limit 50;
+---------+
| col2    |
+---------+
| 4000000 |
| 4000001 |
......
......
| 4000047 |
| 4000048 |
| 4000049 |
+---------+
50 rows in set (0.796 sec)

可以发现查询时间上有非常明显的差距,确实是查询优化器选错了索引。

换到 MySQL 8.0 之后(官方版本和腾讯云数据库),查询计划选择了正确的索引,可以faxian 执行计划完全没有问题,且随着查询条件的变化,选择的索引都是合理且效率很高的。

代码语言:txt
复制
mysql> explain select col2 from t1 where col1 = 1 and col2 >= 4000000 order by col2 limit 50;
+----+-------------+-------+------------+-------+---------------------------------+-------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys                   | key   | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------------------------+-------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | t1    | NULL       | range | unq_col1_col2,idx_1,idx_3,idx_4 | idx_1 | 8       | NULL | 388450 |    50.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------------------------+-------+---------+------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select col2 from t1 where col1 = 1 and col2 >= 1000000 order by col2 limit 50;
+----+-------------+-------+------------+-------+---------------------------------+---------------+---------+------+---------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys                   | key           | key_len | ref  | rows    | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------------------------+---------------+---------+------+---------+----------+--------------------------+
|  1 | SIMPLE      | t1    | NULL       | range | unq_col1_col2,idx_1,idx_3,idx_4 | unq_col1_col2 | 12      | NULL | 2090440 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------------------------+---------------+---------+------+---------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

如上示例中可以看到随着条件的变化,执行计划是发生了变化的,恰好 8.0 的版本有 optimizer_trace,那也正好借此机会看一下查询计划的变更与索引,查询条件的关系。

截取 trace 的部分信息如下当查询条件为 4000000 < mid 的时候:

代码语言:txt
复制
......
                      {
                        "index": "unq_col1_col2",
                        "ranges": [
                          "col1 = 1 AND 4000000 <= col2"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": true,
                        "in_memory": 0.999848,
                        "rows": 388450,
                        "cost": 39129.3,  // 判断查询计划好坏的最终指标
                        "chosen": true
                      },
                      {
                        "index": "idx_1",
                        "ranges": [
                          "4000000 <= col2"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": true,
                        "in_memory": 1,
                        "rows": 388450,
                        "cost": 39082.1,  // 判断查询计划好坏的最终指标
                        "chosen": true
                      },
......

可以看到 mid 单列索引以微弱的 cost 优势胜过了唯一索引。而 1000000 > mid 的时候,再看一下 trace 的信息:

代码语言:txt
复制
......
                      {
                        "index": "unq_col1_col2",
                        "ranges": [
                          "col1 = 1 AND 1000000 <= col2"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": true,
                        "in_memory": 0.999848,
                        "rows": 2090440,
                        "cost": 210573,
                        "chosen": true
                      },
                      {
                        "index": "idx_1",
                        "ranges": [
                          "1000000 <= col2"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": true,
                        "in_memory": 1,
                        "rows": 2090440,
                        "cost": 210319,
                        "chosen": true
                      },
......
                    {
                      "rows_to_scan": 2090440,
                      "filtering_effect": [
                      ],
                      "final_filtering_effect": 0.5,
                      "access_type": "range",
                      "range_details": {
                        "used_index": "idx_1"
                      },
                      "resulting_rows": 1.04522e+06,
                      "cost": 419363, //execution 阶段估算出来的 idx_1 最终 cost
                      "chosen": false
                    }
......
                        "reconsidering_access_paths_for_index_ordering":{
                            "clause":"ORDER BY",
                            "steps":[
......
                        "chosen_range_access_summary": {
                          "range_access_plan": {
                            "type": "range_scan",
                            "index": "unq_col1_col2",
                            "rows": 2.09044e+06,
                            "ranges": [
                              "col1 = 1 AND 1000000 <= col2"
                            ]
                          },
                          "rows_for_plan": 2.09044e+06,
                          "cost_for_plan": 210573,
                          "chosen": true
......
                            ],
                            "index_order_summary":{
                                "table":"`t1`",
                                "index_provides_order":true,
                                "order_direction":"asc",
                                "index":"unq_col1_col2",
                                "plan_changed":true,
                                "access_type":"range"
                            }
                        }
......

可以看到,在执行计划的选择中,其实还是用 mid 单列索引比较好,但是执行优化器在考虑到排序的时候,变更了一次索引,最终选择了唯一索引 unq_col1_col2。

总结

在大多数时候,MySQL 选择索引的唯一依据其实还是 cost,即通过统计信息来估算不同索引的执行代价,因此同一个语句出现执行计划变更的时候,不一定是遇到了 bug 或者是问题,可能只是在当前版本的代价计算模型中,数据库选择了一个“它认为更好的索引”。

诚然算法不是万能的,总会有一些处理不好的 case,相对而言高版本的代价计算总是相对准确的,可以考虑尽量使用大版本较新的数据库来支撑业务。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 背景
  • 问题描述
  • 模拟验证
  • 原因简析
  • 总结
相关产品与服务
云数据库 SQL Server
腾讯云数据库 SQL Server (TencentDB for SQL Server)是业界最常用的商用数据库之一,对基于 Windows 架构的应用程序具有完美的支持。TencentDB for SQL Server 拥有微软正版授权,可持续为用户提供最新的功能,避免未授权使用软件的风险。具有即开即用、稳定可靠、安全运行、弹性扩缩等特点。
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档