前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >灌入大量数据后手工采集统计信息的重要性

灌入大量数据后手工采集统计信息的重要性

作者头像
bisal
发布2019-01-29 14:32:58
3070
发布2019-01-29 14:32:58
举报

1. 创建测试表TBL_STAT,及索引,但不插入记录

SQL> create table TBL_STAT as select * from dba_objects where 1<>1; Table created. SQL> create index idx_tbl_stat on tbl_stat (object_id); Index created. SQL> select count(*) from tbl_stat;   COUNT(*) ----------          0

2. 检索TBL_STAT的执行计划

SQL> explain plan for select object_name from tbl_stat where object_id = 1; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 2448091186 ------------------------------------------------------------------------------ | Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     | ------------------------------------------------------------------------------ |   0 | SELECT STATEMENT  |          |     1 |    79 |     2   (0)| 00:00:01 | |*  1 |  TABLE ACCESS FULL| TBL_STAT |     1 |    79 |     2   (0)| 00:00:01 | ------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------    1 - filter("OBJECT_ID"=1) Note -----    - dynamic sampling used for this statement 17 rows selected.

发现按照索引字段查询使用的是全表扫描

3. 手工收集TBL_STAT表的统计信息

SQL> exec dbms_stats.gather_table_stats(ownname=>'DCSOPEN', tabname=>'TBL_STAT',  estimate_percent=>100); PL/SQL procedure successfully completed. 4. 再次检索TBL_STAT表 SQL> explain plan for select object_name from tbl_stat where object_id = 1; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 3529113932 -------------------------------------------------------------------------------------------- | Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |

-------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- |   0 | SELECT STATEMENT            |              |     1 |    79 |     1   (0)| 00:00:01 | |   1 |  TABLE ACCESS BY INDEX ROWID| TBL_STAT     |     1 |    79 |     1   (0)| 00:00:01 | |*  2 |   INDEX RANGE SCAN          | IDX_TBL_STAT |     1 |       |     1   (0)| 00:00:01 | -------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Predicate Information (identified by operation id): ---------------------------------------------------    2 - access("OBJECT_ID"=1) 14 rows selected.

发现这次用到了索引范围扫描,说明收集统计信息让Oracle可以选择正确的执行计划路径

5. 插入100万的测试记录

SQL> begin   2    for i in 1 .. 10 loop   3      insert into tbl_stat select * from dba_objects;   4      commit;   5    end loop;   6  end;   7  / PL/SQL procedure successfully completed.

SQL> select count(*) from tbl_stat;   COUNT(*) ----------    1190725

6. 查看检索TBL_STAT表的执行计划

SQL> explain plan for select object_name from tbl_stat where object_id = 1; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 3529113932 -------------------------------------------------------------------------------------------- | Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     | -------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |              |     1 |    79 |     1   (0)| 00:00:01 | |   1 |  TABLE ACCESS BY INDEX ROWID| TBL_STAT     |     1 |    79 |     1   (0)| 00:00:01 | |*  2 |   INDEX RANGE SCAN          | IDX_TBL_STAT |     1 |       |     1   (0)| 00:00:01 | -------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Predicate Information (identified by operation id): ---------------------------------------------------    2 - access("OBJECT_ID"=1) 14 rows selected. 插入100万记录后,发现仍是索引范围扫描。

7. 创建第二个测试表TBL_STAT_2,以及索引

SQL> create table tbl_stat_2 as select * from tbl_stat; Table created. SQL> create index idx_tbl_stat_2 on tbl_stat_2 (object_id); Index created. SQL> select count(*) from tbl_stat_2;   COUNT(*) ----------    1190725

8. 检索TBL_STAT和TBL_STAT_2关联查询的执行计划

SQL> explain plan for select a.object_name, b.object_name from tbl_stat a, tbl_stat_2 b where a.object_Id = b.object_id; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 752230886 ---------------------------------------------------------------------------------------------- | Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     | ---------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- |   0 | SELECT STATEMENT            |                |     1 |   158 |    27   (0)| 00:00:01 | |   1 |  TABLE ACCESS BY INDEX ROWID| TBL_STAT_2     |    25 |  1975 |    25   (0)| 00:00:01 | |   2 |   NESTED LOOPS              |                |     1 |   158 |    27   (0)| 00:00:01 | |   3 |    TABLE ACCESS FULL        | TBL_STAT       |     1 |    79 |     2   (0)| 00:00:01 | PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- |*  4 |    INDEX RANGE SCAN         | IDX_TBL_STAT_2 |    25 |       |     2   (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------    4 - access("A"."OBJECT_ID"="B"."OBJECT_ID") Note -----    - dynamic sampling used for this statement 20 rows selected.

可以看到这里对TBl_STAT使用的是全表扫描,对TBL_STAT_2使用的是索引扫描,表之间是嵌套循环连接

SQL> explain plan for select a.object_name, b.object_name from tbl_stat_2 a, tbl_stat b where a.object_Id = b.object_id; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 752230886 ---------------------------------------------------------------------------------------------- | Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     | ---------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                |     1 |   158 |    27   (0)| 00:00:01 | |   1 |  TABLE ACCESS BY INDEX ROWID| TBL_STAT_2     |    25 |  1975 |    25   (0)| 00:00:01 | |   2 |   NESTED LOOPS              |                |     1 |   158 |    27   (0)| 00:00:01 | |   3 |    TABLE ACCESS FULL        | TBL_STAT       |     1 |    79 |     2   (0)| 00:00:01 | PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------

|*  4 |    INDEX RANGE SCAN         | IDX_TBL_STAT_2 |    25 |       |     2   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------    4 - access("A"."OBJECT_ID"="B"."OBJECT_ID") Note -----    - dynamic sampling used for this statement 20 rows selected.

即使置换两个表的连接顺序,依旧选择TBL_STAT表是全表扫描,TBL_STAT_2是索引范围扫描,但由于插入记录后未采集过统计信息,两张表的预估记录数现在都是和实际相差较多

9. 手工采集TBL_STAT的统计信息

SQL> exec dbms_stats.gather_table_stats(ownname=>'DCSOPEN', tabname=>'TBL_STAT',  estimate_percent=>100); PL/SQL procedure successfully completed. SQL> explain plan for select a.object_name, b.object_name from tbl_stat_2 a, tbl_stat b where a.object_Id = b.object_id; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------

Plan hash value: 1789047457

----------------------------------------------------------------------------------------- | Id  | Operation          | Name       | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | ----------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- |   0 | SELECT STATEMENT   |            |    29M|  3038M|       | 15552   (2)| 00:03:07 | |*  1 |  HASH JOIN         |            |    29M|  3038M|    47M| 15552   (2)| 00:03:07 | |   2 |   TABLE ACCESS FULL| TBL_STAT   |  1190K|    34M|       |  3790   (1)| 00:00:46 | |   3 |   TABLE ACCESS FULL| TBL_STAT_2 |  1299K|    97M|       |  3645   (1)| 00:00:44 | PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ---------------------------------------------------    1 - access("A"."OBJECT_ID"="B"."OBJECT_ID") Note PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------    - dynamic sampling used for this statement 19 rows selected.

发现此时TBL_STAT和TBL_STAT_2的预估行数已经不是1了,而且表之间采用的是全表扫描的哈希连接

10. 手工采集TBL_STAT_2表的统计信息

SQL> exec dbms_stats.gather_table_stats(ownname=>'DCSOPEN', tabname=>'TBL_STAT_2',  estimate_percent=>100); PL/SQL procedure successfully completed. SQL> explain plan for select a.object_name, b.object_name from tbl_stat_2 a, tbl_stat b where a.object_Id = b.object_id; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------

Plan hash value: 2620555949 ----------------------------------------------------------------------------------------- | Id  | Operation          | Name       | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | ----------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- |   0 | SELECT STATEMENT   |            |    29M|  1703M|       | 12327   (2)| 00:02:28 | |*  1 |  HASH JOIN         |            |    29M|  1703M|    47M| 12327   (2)| 00:02:28 | |   2 |   TABLE ACCESS FULL| TBL_STAT_2 |  1190K|    34M|       |  3644   (1)| 00:00:44 | |   3 |   TABLE ACCESS FULL| TBL_STAT   |  1190K|    34M|       |  3790   (1)| 00:00:46 | PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ---------------------------------------------------    1 - access("A"."OBJECT_ID"="B"."OBJECT_ID") 15 rows selected.

此时TBL_STAT_2表的记录也趋于和实际一致,两表的连接仍是哈希连接

总结

1. 表的统计信息收集还是比较重要的一项工作,除了Oracle 10g以后会有自动收集的作业外,也可以手工进行统计信息的收集。

2. 本例中,由于TBL_STAT表灌入100万数据后,未收集统计信息,和TBL_STAT_2表连接采用的是嵌套循环连接,这种连接适用于大表和小表的关联场景,但实际这的两张表数据量相当,且都超过了100万,这样相当于100万*100万次关联,当收集统计信息后,两表连接改为了哈希连接,说明此时Oracle已经知道了表的实际数据量,执行计划也是依据表的实际数据量来做的判断,因此当表灌入大量数据后,建议手工采集统计信息,否则在系统自动采集统计信息之前,可能得到的执行计划就是错的。

本文参与 腾讯云自媒体分享计划,分享自作者个人站点/博客。
原始发表:2015年01月03日,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档