这条SQL的索引，你会如何创建？

bisal

发布于 2019-08-29 11:18:19

1.1K0

发布于 2019-08-29 11:18:19

在微信群中，老虎刘老师提了一个有趣的问题，这个SQL，object_id列的可选择性非常高，owner列的可选择性比较差，你认为创建什么索引最佳？

select max(object_id) from t where owner='SYS';

但从这条SQL看，一共就用了两个字段，object_id和owner，如果使用穷举法，排列组合，能创建的B*Tree索引的类型就这几种，

1. object_id单键值索引。

2. owner单键值索引。

3. (object_id, owner)复合索引。

4. (owner, object_id)复合索引。

我们从实际的成本消耗，看下这几个方案，孰劣孰优？

首先创建测试表，可以看到，object_id选择率很高，owner选择率很低，

SQL&gt; create table t as select object_id, owner from all_objects;
Table created.

SQL&gt; select count(*) from t;
  COUNT(*)
----------
      5757

SQL&gt; select count(distinct object_id) from t;
COUNT(DISTINCTOBJECT_ID)
------------------------
            5757

SQL&gt; select owner, count(*) from t group by owner;
OWNER                 COUNT(*)
-------------------- ----------
BISAL                      6
PUBLIC                  3340
SYSTEM                     4
OE                         2
SYS                     2405

为了对比，我们看下无任何索引的消耗，全表扫描，consistent gets是69，

SQL&gt; select max(object_id) from t where owner='SYS';
...	
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 1     |    30 | 6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      | 1     |    30 |        |          |
|*  2 |   TABLE ACCESS FULL| T    | 2405  | 72150 | 6   (0)| 00:00:01 |
---------------------------------------------------------------------------
...
Statistics
----------------------------------------------------------
     45  recursive calls
      0  db block gets
     69  consistent gets
      0  physical reads
      0  redo size
    534  bytes sent via SQL*Net to client
    519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      5  sorts (memory)
      0  sorts (disk)
      1  rows processed

方案1，object_id单键值索引，由于条件字段owner，无索引，采用了全表扫描，consistent gets是32，

SQL&gt; create index idx_t_01 on t(object_id);      
Index created.

SQL&gt; select max(object_id) from t where owner='SYS';
...
---------------------------------------------------------------------------
| Id  | Operation       | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        | 1     |    30 | 6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |        | 1     |    30 |        |          |
|*  2 |   TABLE ACCESS FULL| T      |  2405 | 72150 | 6   (0)| 00:00:01 |
---------------------------------------------------------------------------
...
Statistics
----------------------------------------------------------
      5  recursive calls
      0  db block gets
     32  consistent gets
      0  physical reads
      0  redo size
    534  bytes sent via SQL*Net to client
    519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

方案2，owner单键值索引，Oracle的CBO会根据各种执行计划的成本，选择出成本值最低的一个，虽然owner有索引，但是owner='SYS'的记录会返回接近一半的数据，相比索引单块读，全表扫描多块读，效率会更高一些，此时consistents gets是52，

SQL&gt; create index idx_t_01 on t(owner);      
Index created.

SQL&gt; select max(object_id) from t where owner='SYS';
...
---------------------------------------------------------------------------
| Id  | Operation       | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1    |    30 | 6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |  1    |    30 |        |          |
|*  2 |   TABLE ACCESS FULL| T    |  2405 | 72150 | 6   (0)| 00:00:01 |
---------------------------------------------------------------------------
...
Statistics
----------------------------------------------------------
     17  recursive calls
      0  db block gets
     52  consistent gets
      6  physical reads
      0  redo size
    534  bytes sent via SQL*Net to client
    519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

方案3，(object_id, owner)复合索引，因为条件只有owner，而索引前导列object_id的区分度很高，所以不会选择索引跳跃扫描，consistent gets是45，

SQL&gt; create index idx_t_01 on t(object_id, owner);      
Index created.

SQL&gt; select max(object_id) from t where owner='SYS';
...
---------------------------------------------------------------------------
| Id  | Operation       | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 1 |    30 | 6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      | 1 |    30 |        |          |
|*  2 |   TABLE ACCESS FULL| T    | 1 |    30 | 6   (0)| 00:00:01 |
---------------------------------------------------------------------------
...
Statistics
----------------------------------------------------------
     13  recursive calls
      0  db block gets
     45  consistent gets
      0  physical reads
      0  redo size
    534  bytes sent via SQL*Net to client
    519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

如果使用HINT强制采用了这个索引，可以看到用的是索引全扫描，consistent get是32，这个效率和object_id单键值索引相差无几，

SQL&gt; create index idx_t_01 on t(object_id, owner);      
Index created.

SQL&gt; select max(object_id) from t where owner='SYS';
...
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    30 |    23   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE             |          |     1 |    30 |            |          |
|   2 |   FIRST ROW                 |          |     1 |    30 |    23   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| IDX_T_01 |     1 |    30 |    23   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
...
Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("OWNER"='SYS')
...
Statistics
----------------------------------------------------------
     13  recursive calls
      0  db block gets
     32  consistent gets
      1  physical reads
      0  redo size
    534  bytes sent via SQL*Net to client
    519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

如果我们增加测试数据量，

SQL&gt; select count(*), count(distinct object_id), count(distinct owner) from t;
  COUNT(*) COUNT(DISTINCTOBJECT_ID) COUNT(DISTINCTOWNER)
---------- ------------------------ --------------------
   5898240               5760            5

SQL&gt; select owner, count(*) from t group by owner;
OWNER                 COUNT(*)
-------------------- ----------
SYSTEM                   4096
OE                       2048
PUBLIC                3420160
BISAL                    6144
SYS                   2465792

此时，即使不用HINT了，Oracle仍会选择索引全扫描，因为索引叶子结点是有序排列，max/min的值，不是最左边，就是最右边，当数据量很小的情况下，可以通过owner='SYS'先从数据块中找到符合条件的记录(毕竟全表扫描是多块读，数据量小的时候，即使读所有数据，可能不会读几次，如果是索引读，则要一个索引块一个索引块地读)，然后再统计max/min的值。但是当数据量非常大的情况下，通过owner='SYS'扫描数据块的开销，就会比之前增加几个数量级，而通过索引全扫描的方式，多块读索引块，用owner当作filter过滤条件，开销就会小很多，如下所示，数据量增加1000倍，consistent gets只增加了一倍，

SQL&gt; select max(object_id) from t where owner = 'SYS';

----------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    30 |    59   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE             |          |     1 |    30 |            |          |
|   2 |   FIRST ROW                 |          |     1 |    30 |    59   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| IDX_T_01 |     1 |    30 |    59   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
...
Statistics
----------------------------------------------------------
      0  recursive calls
      0  db block gets
     66  consistent gets
      0  physical reads
      0  redo size
    534  bytes sent via SQL*Net to client
    519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

方案4，(owner, object_id)复合索引，因为检索条件中owner是索引的前导列，所以能使用索引范围扫描，consistent gets是40，但是不如object_id单键值索引，和object_id作为前导列的复合索引，

SQL&gt; create index idx_t_01 on t(owner, object_id);      
Index created.

SQL&gt; select max(object_id) from t where owner='SYS';
...
------------------------------------------------------------------	
	
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time    |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |     1 |    30 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |          |     1 |    30 |            |
|   2 |   FIRST ROW                  |          |     1 |    30 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| IDX_T_01 |     1 |    30 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------	
...
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("OWNER"='SYS')

Statistics
----------------------------------------------------------
     17  recursive calls
      0  db block gets
     40  consistent gets
      7  physical reads
      0  redo size
    534  bytes sent via SQL*Net to client
    519  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

从实验来看，object_id单键值索引和object_id作为前导列的复合索引，效率最高，当然这的需求是max/min，虽然where条件不带object_id，但基于B*Tree索引的特点，能用上索引，否则条件中没有object_id，就可能无法用上这个索引，要考虑其他方案。

此外，object_id单键值索引和object_id作为前导列的复合索引，对这两种索引的选择，还取决于业务的需求，如果有同时用object_id和owner这两个字段做检索条件的，可以选择使用复合索引，如果没这种需求，就可以创建object_id单键值索引，这样一来，使用object_id和其他字段的复合检索，都可能用上object_id的单键值索引，一举多得。

索引的选择，其实还是非常讲究的，无论是索引类型，还是索引字段的顺序，针对不同的业务场景，都会有不同方案，最根本的，还是对索引的原理通晓，加上经验的积累，才可能慢慢掌握。

本文参与腾讯云自媒体同步曝光计划，分享自作者个人站点/博客。

原始发表：2019年08月22日，如有侵权请联系 cloudcommunity@tencent.com 删除

oracle