前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >这条SQL的索引,你会如何创建?

这条SQL的索引,你会如何创建?

作者头像
bisal
发布2019-08-29 11:18:19
1.1K0
发布2019-08-29 11:18:19
举报
文章被收录于专栏:bisal的个人杂货铺

在微信群中,老虎刘老师提了一个有趣的问题,这个SQL,object_id列的可选择性非常高,owner列的可选择性比较差,你认为创建什么索引最佳?

代码语言:javascript
复制
select max(object_id) from t where owner='SYS';

但从这条SQL看,一共就用了两个字段,object_id和owner,如果使用穷举法,排列组合,能创建的B*Tree索引的类型就这几种,

1. object_id单键值索引。

2. owner单键值索引。

3. (object_id, owner)复合索引。

4. (owner, object_id)复合索引。

我们从实际的成本消耗,看下这几个方案,孰劣孰优?

首先创建测试表,可以看到,object_id选择率很高,owner选择率很低,

代码语言:javascript
复制
SQL> create table t as select object_id, owner from all_objects;
Table created.

SQL> select count(*) from t;
  COUNT(*)
----------
      5757

SQL> select count(distinct object_id) from t;
COUNT(DISTINCTOBJECT_ID)
------------------------
            5757

SQL> select owner, count(*) from t group by owner;
OWNER                 COUNT(*)
-------------------- ----------
BISAL                      6
PUBLIC                  3340
SYSTEM                     4
OE                         2
SYS                     2405

为了对比,我们看下无任何索引的消耗,全表扫描,consistent gets是69,

代码语言:javascript
复制
SQL> select max(object_id) from t where owner='SYS';
...	
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 1     |    30 | 6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      | 1     |    30 |        |          |
|*  2 |   TABLE ACCESS FULL| T    | 2405  | 72150 | 6   (0)| 00:00:01 |
---------------------------------------------------------------------------
...
Statistics
----------------------------------------------------------
     45  recursive calls
      0  db block gets
     69  consistent gets
      0  physical reads
      0  redo size
    534  bytes sent via SQL*Net to client
    519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      5  sorts (memory)
      0  sorts (disk)
      1  rows processed

方案1,object_id单键值索引,由于条件字段owner,无索引,采用了全表扫描,consistent gets是32,

代码语言:javascript
复制
SQL> create index idx_t_01 on t(object_id);      
Index created.

SQL> select max(object_id) from t where owner='SYS';
...
---------------------------------------------------------------------------
| Id  | Operation       | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        | 1     |    30 | 6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |        | 1     |    30 |        |          |
|*  2 |   TABLE ACCESS FULL| T      |  2405 | 72150 | 6   (0)| 00:00:01 |
---------------------------------------------------------------------------
...
Statistics
----------------------------------------------------------
      5  recursive calls
      0  db block gets
     32  consistent gets
      0  physical reads
      0  redo size
    534  bytes sent via SQL*Net to client
    519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

方案2,owner单键值索引,Oracle的CBO会根据各种执行计划的成本,选择出成本值最低的一个,虽然owner有索引,但是owner='SYS'的记录会返回接近一半的数据,相比索引单块读,全表扫描多块读,效率会更高一些,此时consistents gets是52,

代码语言:javascript
复制
SQL> create index idx_t_01 on t(owner);      
Index created.

SQL> select max(object_id) from t where owner='SYS';
...
---------------------------------------------------------------------------
| Id  | Operation       | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1    |    30 | 6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |  1    |    30 |        |          |
|*  2 |   TABLE ACCESS FULL| T    |  2405 | 72150 | 6   (0)| 00:00:01 |
---------------------------------------------------------------------------
...
Statistics
----------------------------------------------------------
     17  recursive calls
      0  db block gets
     52  consistent gets
      6  physical reads
      0  redo size
    534  bytes sent via SQL*Net to client
    519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

方案3,(object_id, owner)复合索引,因为条件只有owner,而索引前导列object_id的区分度很高,所以不会选择索引跳跃扫描,consistent gets是45,

代码语言:javascript
复制
SQL> create index idx_t_01 on t(object_id, owner);      
Index created.

SQL> select max(object_id) from t where owner='SYS';
...
---------------------------------------------------------------------------
| Id  | Operation       | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 1 |    30 | 6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      | 1 |    30 |        |          |
|*  2 |   TABLE ACCESS FULL| T    | 1 |    30 | 6   (0)| 00:00:01 |
---------------------------------------------------------------------------
...
Statistics
----------------------------------------------------------
     13  recursive calls
      0  db block gets
     45  consistent gets
      0  physical reads
      0  redo size
    534  bytes sent via SQL*Net to client
    519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

如果使用HINT强制采用了这个索引,可以看到用的是索引全扫描,consistent get是32,这个效率和object_id单键值索引相差无几,

代码语言:javascript
复制
SQL> create index idx_t_01 on t(object_id, owner);      
Index created.

SQL> select max(object_id) from t where owner='SYS';
...
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    30 |    23   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE             |          |     1 |    30 |            |          |
|   2 |   FIRST ROW                 |          |     1 |    30 |    23   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| IDX_T_01 |     1 |    30 |    23   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
...
Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("OWNER"='SYS')
...
Statistics
----------------------------------------------------------
     13  recursive calls
      0  db block gets
     32  consistent gets
      1  physical reads
      0  redo size
    534  bytes sent via SQL*Net to client
    519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

如果我们增加测试数据量,

代码语言:javascript
复制
SQL> select count(*), count(distinct object_id), count(distinct owner) from t;
  COUNT(*) COUNT(DISTINCTOBJECT_ID) COUNT(DISTINCTOWNER)
---------- ------------------------ --------------------
   5898240               5760            5

SQL> select owner, count(*) from t group by owner;
OWNER                 COUNT(*)
-------------------- ----------
SYSTEM                   4096
OE                       2048
PUBLIC                3420160
BISAL                    6144
SYS                   2465792

此时,即使不用HINT了,Oracle仍会选择索引全扫描,因为索引叶子结点是有序排列,max/min的值,不是最左边,就是最右边,当数据量很小的情况下,可以通过owner='SYS'先从数据块中找到符合条件的记录(毕竟全表扫描是多块读,数据量小的时候,即使读所有数据,可能不会读几次,如果是索引读,则要一个索引块一个索引块地读),然后再统计max/min的值。但是当数据量非常大的情况下,通过owner='SYS'扫描数据块的开销,就会比之前增加几个数量级,而通过索引全扫描的方式,多块读索引块,用owner当作filter过滤条件,开销就会小很多,如下所示,数据量增加1000倍,consistent gets只增加了一倍,

代码语言:javascript
复制
SQL> select max(object_id) from t where owner = 'SYS';

----------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    30 |    59   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE             |          |     1 |    30 |            |          |
|   2 |   FIRST ROW                 |          |     1 |    30 |    59   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| IDX_T_01 |     1 |    30 |    59   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
...
Statistics
----------------------------------------------------------
      0  recursive calls
      0  db block gets
     66  consistent gets
      0  physical reads
      0  redo size
    534  bytes sent via SQL*Net to client
    519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

方案4,(owner, object_id)复合索引,因为检索条件中owner是索引的前导列,所以能使用索引范围扫描,consistent gets是40,但是不如object_id单键值索引,和object_id作为前导列的复合索引,

代码语言:javascript
复制
SQL> create index idx_t_01 on t(owner, object_id);      
Index created.

SQL> select max(object_id) from t where owner='SYS';
...
------------------------------------------------------------------	
	
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time    |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |     1 |    30 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |          |     1 |    30 |            |
|   2 |   FIRST ROW                  |          |     1 |    30 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| IDX_T_01 |     1 |    30 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------	
...
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("OWNER"='SYS')

Statistics
----------------------------------------------------------
     17  recursive calls
      0  db block gets
     40  consistent gets
      7  physical reads
      0  redo size
    534  bytes sent via SQL*Net to client
    519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

从实验来看,object_id单键值索引和object_id作为前导列的复合索引,效率最高,当然这的需求是max/min,虽然where条件不带object_id,但基于B*Tree索引的特点,能用上索引,否则条件中没有object_id,就可能无法用上这个索引,要考虑其他方案。

此外,object_id单键值索引和object_id作为前导列的复合索引,对这两种索引的选择,还取决于业务的需求,如果有同时用object_id和owner这两个字段做检索条件的,可以选择使用复合索引,如果没这种需求,就可以创建object_id单键值索引,这样一来,使用object_id和其他字段的复合检索,都可能用上object_id的单键值索引,一举多得。

索引的选择,其实还是非常讲究的,无论是索引类型,还是索引字段的顺序,针对不同的业务场景,都会有不同方案,最根本的,还是对索引的原理通晓,加上经验的积累,才可能慢慢掌握。

本文参与 腾讯云自媒体同步曝光计划,分享自作者个人站点/博客。
原始发表:2019年08月22日,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档