灌入大量数据后手工采集统计信息的重要性

1. 创建测试表TBL_STAT,及索引,但不插入记录

SQL> create table TBL_STAT as select * from dba_objects where 1<>1; Table created. SQL> create index idx_tbl_stat on tbl_stat (object_id); Index created. SQL> select count(*) from tbl_stat;   COUNT(*) ----------          0

2. 检索TBL_STAT的执行计划

SQL> explain plan for select object_name from tbl_stat where object_id = 1; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 2448091186 ------------------------------------------------------------------------------ | Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     | ------------------------------------------------------------------------------ |   0 | SELECT STATEMENT  |          |     1 |    79 |     2   (0)| 00:00:01 | |*  1 |  TABLE ACCESS FULL| TBL_STAT |     1 |    79 |     2   (0)| 00:00:01 | ------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------    1 - filter("OBJECT_ID"=1) Note -----    - dynamic sampling used for this statement 17 rows selected.

发现按照索引字段查询使用的是全表扫描

3. 手工收集TBL_STAT表的统计信息

SQL> exec dbms_stats.gather_table_stats(ownname=>'DCSOPEN', tabname=>'TBL_STAT',  estimate_percent=>100); PL/SQL procedure successfully completed. 4. 再次检索TBL_STAT表 SQL> explain plan for select object_name from tbl_stat where object_id = 1; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 3529113932 -------------------------------------------------------------------------------------------- | Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |

-------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- |   0 | SELECT STATEMENT            |              |     1 |    79 |     1   (0)| 00:00:01 | |   1 |  TABLE ACCESS BY INDEX ROWID| TBL_STAT     |     1 |    79 |     1   (0)| 00:00:01 | |*  2 |   INDEX RANGE SCAN          | IDX_TBL_STAT |     1 |       |     1   (0)| 00:00:01 | -------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Predicate Information (identified by operation id): ---------------------------------------------------    2 - access("OBJECT_ID"=1) 14 rows selected.

发现这次用到了索引范围扫描,说明收集统计信息让Oracle可以选择正确的执行计划路径

5. 插入100万的测试记录

SQL> begin   2    for i in 1 .. 10 loop   3      insert into tbl_stat select * from dba_objects;   4      commit;   5    end loop;   6  end;   7  / PL/SQL procedure successfully completed.

SQL> select count(*) from tbl_stat;   COUNT(*) ----------    1190725

6. 查看检索TBL_STAT表的执行计划

SQL> explain plan for select object_name from tbl_stat where object_id = 1; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 3529113932 -------------------------------------------------------------------------------------------- | Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     | -------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |              |     1 |    79 |     1   (0)| 00:00:01 | |   1 |  TABLE ACCESS BY INDEX ROWID| TBL_STAT     |     1 |    79 |     1   (0)| 00:00:01 | |*  2 |   INDEX RANGE SCAN          | IDX_TBL_STAT |     1 |       |     1   (0)| 00:00:01 | -------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Predicate Information (identified by operation id): ---------------------------------------------------    2 - access("OBJECT_ID"=1) 14 rows selected. 插入100万记录后,发现仍是索引范围扫描。

7. 创建第二个测试表TBL_STAT_2,以及索引

SQL> create table tbl_stat_2 as select * from tbl_stat; Table created. SQL> create index idx_tbl_stat_2 on tbl_stat_2 (object_id); Index created. SQL> select count(*) from tbl_stat_2;   COUNT(*) ----------    1190725

8. 检索TBL_STAT和TBL_STAT_2关联查询的执行计划

SQL> explain plan for select a.object_name, b.object_name from tbl_stat a, tbl_stat_2 b where a.object_Id = b.object_id; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 752230886 ---------------------------------------------------------------------------------------------- | Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     | ---------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- |   0 | SELECT STATEMENT            |                |     1 |   158 |    27   (0)| 00:00:01 | |   1 |  TABLE ACCESS BY INDEX ROWID| TBL_STAT_2     |    25 |  1975 |    25   (0)| 00:00:01 | |   2 |   NESTED LOOPS              |                |     1 |   158 |    27   (0)| 00:00:01 | |   3 |    TABLE ACCESS FULL        | TBL_STAT       |     1 |    79 |     2   (0)| 00:00:01 | PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- |*  4 |    INDEX RANGE SCAN         | IDX_TBL_STAT_2 |    25 |       |     2   (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------    4 - access("A"."OBJECT_ID"="B"."OBJECT_ID") Note -----    - dynamic sampling used for this statement 20 rows selected.

可以看到这里对TBl_STAT使用的是全表扫描,对TBL_STAT_2使用的是索引扫描,表之间是嵌套循环连接

SQL> explain plan for select a.object_name, b.object_name from tbl_stat_2 a, tbl_stat b where a.object_Id = b.object_id; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 752230886 ---------------------------------------------------------------------------------------------- | Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     | ---------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                |     1 |   158 |    27   (0)| 00:00:01 | |   1 |  TABLE ACCESS BY INDEX ROWID| TBL_STAT_2     |    25 |  1975 |    25   (0)| 00:00:01 | |   2 |   NESTED LOOPS              |                |     1 |   158 |    27   (0)| 00:00:01 | |   3 |    TABLE ACCESS FULL        | TBL_STAT       |     1 |    79 |     2   (0)| 00:00:01 | PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------

|*  4 |    INDEX RANGE SCAN         | IDX_TBL_STAT_2 |    25 |       |     2   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------    4 - access("A"."OBJECT_ID"="B"."OBJECT_ID") Note -----    - dynamic sampling used for this statement 20 rows selected.

即使置换两个表的连接顺序,依旧选择TBL_STAT表是全表扫描,TBL_STAT_2是索引范围扫描,但由于插入记录后未采集过统计信息,两张表的预估记录数现在都是和实际相差较多

9. 手工采集TBL_STAT的统计信息

SQL> exec dbms_stats.gather_table_stats(ownname=>'DCSOPEN', tabname=>'TBL_STAT',  estimate_percent=>100); PL/SQL procedure successfully completed. SQL> explain plan for select a.object_name, b.object_name from tbl_stat_2 a, tbl_stat b where a.object_Id = b.object_id; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------

Plan hash value: 1789047457

----------------------------------------------------------------------------------------- | Id  | Operation          | Name       | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | ----------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- |   0 | SELECT STATEMENT   |            |    29M|  3038M|       | 15552   (2)| 00:03:07 | |*  1 |  HASH JOIN         |            |    29M|  3038M|    47M| 15552   (2)| 00:03:07 | |   2 |   TABLE ACCESS FULL| TBL_STAT   |  1190K|    34M|       |  3790   (1)| 00:00:46 | |   3 |   TABLE ACCESS FULL| TBL_STAT_2 |  1299K|    97M|       |  3645   (1)| 00:00:44 | PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ---------------------------------------------------    1 - access("A"."OBJECT_ID"="B"."OBJECT_ID") Note PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------    - dynamic sampling used for this statement 19 rows selected.

发现此时TBL_STAT和TBL_STAT_2的预估行数已经不是1了,而且表之间采用的是全表扫描的哈希连接

10. 手工采集TBL_STAT_2表的统计信息

SQL> exec dbms_stats.gather_table_stats(ownname=>'DCSOPEN', tabname=>'TBL_STAT_2',  estimate_percent=>100); PL/SQL procedure successfully completed. SQL> explain plan for select a.object_name, b.object_name from tbl_stat_2 a, tbl_stat b where a.object_Id = b.object_id; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------

Plan hash value: 2620555949 ----------------------------------------------------------------------------------------- | Id  | Operation          | Name       | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | ----------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- |   0 | SELECT STATEMENT   |            |    29M|  1703M|       | 12327   (2)| 00:02:28 | |*  1 |  HASH JOIN         |            |    29M|  1703M|    47M| 12327   (2)| 00:02:28 | |   2 |   TABLE ACCESS FULL| TBL_STAT_2 |  1190K|    34M|       |  3644   (1)| 00:00:44 | |   3 |   TABLE ACCESS FULL| TBL_STAT   |  1190K|    34M|       |  3790   (1)| 00:00:46 | PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ---------------------------------------------------    1 - access("A"."OBJECT_ID"="B"."OBJECT_ID") 15 rows selected.

此时TBL_STAT_2表的记录也趋于和实际一致,两表的连接仍是哈希连接

总结

1. 表的统计信息收集还是比较重要的一项工作,除了Oracle 10g以后会有自动收集的作业外,也可以手工进行统计信息的收集。

2. 本例中,由于TBL_STAT表灌入100万数据后,未收集统计信息,和TBL_STAT_2表连接采用的是嵌套循环连接,这种连接适用于大表和小表的关联场景,但实际这的两张表数据量相当,且都超过了100万,这样相当于100万*100万次关联,当收集统计信息后,两表连接改为了哈希连接,说明此时Oracle已经知道了表的实际数据量,执行计划也是依据表的实际数据量来做的判断,因此当表灌入大量数据后,建议手工采集统计信息,否则在系统自动采集统计信息之前,可能得到的执行计划就是错的。

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

发表于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏安恒网络空间安全讲武堂

Sqlite3的二次盲注

在扫描目录,以及爆破 admin用户密码均无果后,发现flag用户的图片外链是 github仓库,想到会存在源码泄露,于是查看作者的仓库:

23130
来自专栏ios 技术积累

Mybatis中@Param的用法和作用详解

用注解来简化xml配置的时候,@Param注解的作用是给参数命名,参数命名后就能根据名字得到参数值,正确的将参数传入sql语句中

64010
来自专栏Jed的技术阶梯

SQL 语句的执行顺序

id 代表客服人员的 id,client 代表与该客服人员通话的客户的 id,也是说,每有一条记录,就代表一个客服与一位客户进行了通话,相同的记录,例如第一行和...

66030
来自专栏Spring相关

Android的HttpUrlConnection

24130
来自专栏祝威廉

MLSQL v1.1.7 Release roadmap

MLSQL v1.1.7 plans to release in Mid Jan 2019, this version will take almost thr...

11320
来自专栏安恒网络空间安全讲武堂

sqlmap tamper脚本编写

上次HCTF中Li4n0师傅出了一道Kzone,非预期解可以利用Unicode编码关键字bypass掉WAF,发现如果手动编写sqlmap中的tamper脚本能...

1.9K30
来自专栏IT大咖说

非常全的Web开发学习总结(4张高清大图)

内容来源:原作者——amranahmedse,原文——https://github.com/kamranahmedse/developer-roadmap;译者...

11440
来自专栏Spark生态圈

基于SparkSQL实现的一套即席查询服务

支持的数据源:hdfs、hive、hbase、kafka、mysql、es、mongo

42510
来自专栏淡定的博客

sql注入入门学习(数字型)(连载中)

在MySQL中,表名存放在information_schema数据库下tables表table_name字段中、查表名我们主要用到的是TABLES表

10320
来自专栏祝威廉

MLSQL v1.1.6 新特性:用MLSQL脚本查看API/Configuration

MLSQL有自己的一套doc系统,可以通过标准的MLSQL语句进行获取,可以参看MLSQL-Doc-Slide。

10230

扫码关注云+社区

领取腾讯云代金券

年度创作总结 领取年终奖励