前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >案例分析:倾斜值传入导致 SQL 资源消耗升高

案例分析:倾斜值传入导致 SQL 资源消耗升高

作者头像
数据和云
发布2018-07-27 15:42:08
4380
发布2018-07-27 15:42:08
举报
文章被收录于专栏:数据和云数据和云

作者 | 邓秋爽:云和恩墨技术工程师,有超过七年超大型数据库专业服务经验,擅长 Oracle 数据库优化、SQL 优化和 Troubleshooting。

局方监控系统反馈 2014-12-31 19:30:00-20:00:00 这段时间 db time 上升较大,sql_id 88wdzpr9mv2wy 消耗了12%以上的 db time。

SQL 在 shared pool 的执行计划:

-------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1239 | 245K| 4582 (1)| 00:00:55 | |* 1 | VIEW | | 1239 | 245K| 4582 (1)| 00:00:55 | |* 2 | COUNT STOPKEY | | | | | | | 3 | VIEW | | 1239 | 229K| 4582 (1)| 00:00:55 | |* 4 | SORT ORDER BY STOPKEY | | 1239 | 168K| 4582 (1)| 00:00:55 | |* 5 | FILTER | | | | | | |* 6 | TABLE ACCESS BY INDEX ROWID | CVS_REC_BANKTASK | 1 | 77 | 3 (0)| 00:00:01 | | 7 | NESTED LOOPS | | 1239 | 168K| 4581 (1)| 00:00:55 | |* 8 | TABLE ACCESS BY INDEX ROWID| CVS_RECEPTION | 1239 | 76818 | 862 (1)| 00:00:11 | |* 9 | INDEX SKIP SCAN | IDX_CVS_RECEPTION | 1239 | | 101 (0)| 00:00:02 | |* 10 | INDEX RANGE SCAN | IDX_REC_BANKTASK | 1 | | 2 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RN">TO_NUMBER(:UP)*TO_NUMBER(:DOWN)-TO_NUMBER(:DOWN)) 2 - filter(ROWNUM<=TO_NUMBER(:UP)*TO_NUMBER(:DOWN)) 4 - filter(ROWNUM<=TO_NUMBER(:UP)*TO_NUMBER(:DOWN)) 5 - filter(TO_DATE(:ENDTDATA,'yyyymmdd')+1>=TO_DATE(:STARTDATA,'yyyymmdd')) 6 - filter("T"."DEAL_TIME"<=TO_DATE(:ENDTDATA,'yyyymmdd')+1 AND "T"."DEAL_TIME">=TO_DATE(:STARTDATA,'yyyymmdd') AND "T"."ACCOUNT_TYPE"=TO_NUMBER(:ACCOUNTTYPE) AND "T"."REGION"=TO_NUMBER(:REGION)) 8 - filter("M"."REGION"=TO_NUMBER(:REGION)) 9 - access("M"."ORG_ID"=:SITEID) filter("M"."ORG_ID"=:SITEID) 10 - access("T"."RECEPTION_ID"="M"."RECEPTION_ID")

SQL 的历史执行计划:

-------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 3453 (100)| | | 1 | VIEW | | 913 | 180K| 3453 (1)| 00:00:42 | | 2 | COUNT STOPKEY | | | | | | | 3 | VIEW | | 913 | 169K| 3453 (1)| 00:00:42 | | 4 | SORT ORDER BY STOPKEY | | 913 | 123K| 3453 (1)| 00:00:42 | | 5 | FILTER | | | | | | | 6 | TABLE ACCESS BY INDEX ROWID | CVS_REC_BANKTASK | 1 | 77 | 3 (0)| 00:00:01 | | 7 | NESTED LOOPS | | 913 | 123K| 3452 (1)| 00:00:42 | | 8 | TABLE ACCESS BY INDEX ROWID| CVS_RECEPTION | 913 | 56606 | 711 (0)| 00:00:09 | | 9 | INDEX SKIP SCAN | IDX_CVS_RECEPTION | 913 | | 74 (0)| 00:00:01 | | 10 | INDEX RANGE SCAN | IDX_REC_BANKTASK | 1 | | 2 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------------- 36 rows selected.

SQL 历史中只出现过一种执行计划,这个表示该 SQL 在 AWR 中没有出现多种执行计划而导致性能出现差异。

相关表的统计信息:

CVS_RECEPTION 表的统计信息: Table Number Empty Chain Average Global Sample Date Name of Rows Blocks Blocks Count Row Len Stats Size MM-DD-YYYY ------------------------------ -------------- --------------- ------------ -------- ------- ------ -------------- ---------- CVS_RECEPTION 2,257,580 305,08 0 0 90 YES 112,879 01-03-2015 Column Distinct Number Number Sample Date Name Values Density Buckets Nulls Size MM-DD-YYYY ------------------------------ ------------ ----------- ------- ------------ -------------- ---------- REGION 1 1.00000000 1 0 112,879 01-03-2015 ORG_ID 1,822 .00054885 1 0 112,879 01-03-2015 REC_TYPE 9 .11111111 1 0 112,879 01-03-2015 ENTITY_ID 1,228,762 .00000081 1 5,820 112,588 01-03-2015 Index Leaf Distinct Number AV Av Cluster Date Name BLV Blks Keys of Rows LEA Data Factor MM-DD-YYYY ------------------------------ --- ------------ -------------- -------------- ------- ------- ------------ ---------- PK_CVS_RECEPTION 2 14,508 2,175,180 2,175,180 1 1 2,086,155 01-03-2015 IDX_CVS_REGION_REC_DATE 2 13,965 1,671,330 2,279,948 1 1 2,181,132 01-03-2015 IDX_CVS_RECEPTION 2 18,852 1,822 2,296,873 23 1,694 1,384,453 01-03-2015 IDX_CVS_RECEPTION_ENTITY_ID 2 12,855 1,228,762 2,324,396 1 1 2,318,265 01-03-2015 IDX_CVS_RECEPTION_FORMNUM 2 6,760 1 2,225,924 6,760 35,358 35,358 01-03-2015 Index Column Col Column Name Name Pos Details ------------------------------ ------------------------------ ---- ------------------------ IDX_CVS_RECEPTION REC_TYPE 1 VARCHAR2(10) NOT NULL ORG_ID 2 VARCHAR2(8) NOT NULL CVS_REC_BANKTASK表的统计信息: Table Number Empty Chain Average Global Sample Date Name of Rows Blocks Blocks Count Row Len Stats Size MM-DD-YYYY ------------------------------ -------------- --------------- ------------ -------- ------- ------ -------------- ---------- CVS_REC_BANKTASK 3,899,140 452,98 0 0 77 YES 194,957 01-04-2015 Column Distinct Number Number Sample Date Name Values Density Buckets Nulls Size MM-DD-YYYY ------------------------------ ------------ ----------- ------- ------------ -------------- ---------- REGION 1 1.00000000 1 0 194,957 01-04-2015 RECEPTION_ID 3,899,140 .00000026 1 0 194,957 01-04-2015 ACCOUNT_TYPE 1 1.00000000 1 0 194,957 01-04-2015 DEAL_TIME 2,644,869 .00000038 1 0 194,957 01-04-2015 Index Leaf Distinct Number AV Av Cluster Date Name BLV Blks Keys of Rows LEA Data Factor MM-DD-YYYY ------------------------------ --- ------------ -------------- -------------- ------- ------- ------------ ---------- IDX_REC_BANKTASK 2 25,300 3,721,460 3,721,460 1 1 3,503,660 01-04-2015 Index Column Col Column Name Name Pos Details ------------------------------ ------------------------------ ---- ------------------------ IDX_REC_BANKTASK RECEPTION_ID 1 VARCHAR2(32) NOT NULL

SQL 的历史执行信息:

sys@CRMDB4>@sqlhis_add.sql Enter value for sql_id: 88wdzpr9mv2wy old 27: and a.sql_id = '&sql_id' new 27: and a.sql_id = '88wdzpr9mv2wy' BEGIN_TIME INSTANCE_NUMBER MODULE PLAN_HASH_VALUE EXEC PER_GET PER_ROWS TIME_S PER_READ ------------------- --------------- ------------------------------ ---------------- ---------- ---------- ---------- ---------- ---------- 2015-01-05 08:00:30 1 tpengine@winftux1 (TNS V1-V3) 511419205 70 93172 8.9 14.91 719.01 2015-01-04 20:30:29 1 tpengine@winftux1 (TNS V1-V3) 511419205 119 98962 9.1 4.61 473.61 2015-01-04 19:00:34 1 tpengine@winftux1 (TNS V1-V3) 511419205 727 32261 28.2 .58 47.44 2015-01-04 17:30:13 1 tpengine@winftux1 (TNS V1-V3) 511419205 180 78200 21.1 4.24 425.64 2015-01-04 16:30:54 1 tpengine@winftux1 (TNS V1-V3) 511419205 70 277336 9.8 11.37 584.23 2015-01-04 09:30:18 1 tpengine@winftux1 (TNS V1-V3) 511419205 74 181718 14.5 15.93 988.64 2015-01-04 07:30:17 1 tpengine@winftux1 (TNS V1-V3) 511419205 12 17221 9.2 18.77 2430.75 2015-01-03 22:00:19 1 tpengine@winftux1 (TNS V1-V3) 511419205 47 0 9.8 15.35 867.7 2015-01-03 18:00:39 1 tpengine@winftux1 (TNS V1-V3) 511419205 93 9879 8.9 6 344.46 2015-01-03 17:30:32 1 tpengine@winftux1 (TNS V1-V3) 511419205 143 94887 9.2 3.04 248.31 2015-01-03 15:30:04 1 tpengine@winftux1 (TNS V1-V3) 511419205 40 467928 9.6 6.06 272.18 2015-01-03 15:00:40 1 tpengine@winftux1 (TNS V1-V3) 511419205 88 38890 7.7 8.39 669.23 2015-01-02 18:00:28 1 tpengine@winftux1 (TNS V1-V3) 511419205 436 63315 26.1 1.02 82.67 2015-01-02 10:30:13 1 tpengine@winftux1 (TNS V1-V3) 511419205 740 707283 27 6.15 9.71 2015-01-02 10:00:06 1 tpengine@winftux1 (TNS V1-V3) 511419205 256 531298 22.4 5.26 138.29 2015-01-02 07:30:31 1 tpengine@winftux1 (TNS V1-V3) 511419205 29 12594 7.2 11.02 1600.21 2015-01-01 19:00:04 1 tpengine@winftux1 (TNS V1-V3) 511419205 143 24895 12.3 6.37 234.63 2015-01-01 17:30:13 1 tpengine@winftux1 (TNS V1-V3) 511419205 447 46359 19.2 1.05 70.47 2015-01-01 16:00:53 1 tpengine@winftux1 (TNS V1-V3) 511419205 447 45576 19.5 .68 48.81 2015-01-01 11:30:11 1 tpengine@winftux1 (TNS V1-V3) 511419205 376 46110 21.5 1.63 116.57 2015-01-01 10:30:36 1 tpengine@winftux1 (TNS V1-V3) 511419205 416 47588 21.4 1.21 79.76 2014-12-31 20:00:08 1 tpengine@winftux1 (TNS V1-V3) 511419205 1071 726326 28 6.75 21.55 2014-12-31 19:30:01 1 tpengine@winftux1 (TNS V1-V3) 511419205 3057 760690 28.9 7.35 18.57 2014-12-31 14:30:51 1 tpengine@winftux1 (TNS V1-V3) 511419205 150 514830 21.1 12.26 438.05 2014-12-30 20:30:03 1 tpengine@winftux2 (TNS V1-V3) 511419205 74 11074 8.8 9.14 631.47 2014-12-30 19:30:26 1 tpengine@winftux2 (TNS V1-V3) 511419205 94 12433 8.4 8.28 545.99 2014-12-30 18:30:12 1 tpengine@winftux2 (TNS V1-V3) 511419205 12578 135489 30 1.59 1.25 2014-12-30 18:00:05 1 tpengine@winftux2 (TNS V1-V3) 511419205 7251 132103 29.8 1.57 7.55 2014-12-30 14:30:17 1 tpengine@winftux2 (TNS V1-V3) 511419205 70 17396 8.3 11.17 814.97 29 rows selected.

通过对比每半个小时的平均逻辑读部分时间段有较大的波动,在 2014-12-31 19:30:01 到 2014-12-31 20:00:01 时间段这个 SQL 执行次数达到了3057次,每次平均逻辑读达到了76万以上,而有些时间段的这个 SQL 的平均逻辑读只有几万,这个表示通过绑定变量传递过来的值会有倾斜值。

对比执行计划造成逻辑读在不同时间段存在差异的只可能是 tbcs.cvs_reception 表,而这个表是作为 nested loop 循环的驱动表,对应的执行计划和谓词部分如下:

|* 8 | TABLE ACCESS BY INDEX ROWID| CVS_RECEPTION | 1239 | 76818 | 862 (1)| 00:00:11 | |* 9 | INDEX SKIP SCAN | IDX_CVS_RECEPTION | 1239 | | 101 (0)| 00:00:02 | 8 - filter("M"."REGION"=TO_NUMBER(:REGION)) 9 - access("M"."ORG_ID"=:SITEID) filter("M"."ORG_ID"=:SITEID)

看来造成逻辑读存在差异的只可能是 org_id 和 region 两列,而 region 这列根据表的统计信息只有一组 distinct value,那么只可能是 org_id 这个对应的绑定变量:SITEID 存在有倾斜值,造成了平均逻辑读在这个时间段特别大,然而 SQL 在这个时间段执行频率又特别高,进而导致消耗了较多的 db time。

来看看表 tbcs.cvs_reception的org_id 倾斜值:

SQL> select * from (select org_id,count(*) from tbcs.cvs_reception group by org_id order by count(*) desc) where rownum<20;

优化器评估 INDEX SKIP SCAN IDX_CVS_RECEPTION 部分返回的 rows 是1239(在没有直方图的情况下,优化器计算等值谓词的选择 selectivy 公式是1/distinct*((num_rows-null_rows)/num_rows))

Oracle 抓取绑定变量的规律有两种:

  1. 硬解析的 SQL 被执行时,Oracle 会抓取该 SQL 的绑定变量;
  2. 软解析/软软解析的 SQL 重复执行时,Oracle 也会抓取绑定变量,不过这里 Oracle 只会每隔15分钟抓取一次绑定变量,这里抓取的值不一定具有代表性。

SQL> select value_string, last_captured 2 from dba_hist_sqlbind 3 where sql_id = '88wdzpr9mv2wy' 4 and name = ':SITEID' 5 order by last_captured desc 6 ; VALUE_STRING LAST_CAPTURED -------------------------------------------------------------------------------- ------------------------------ 11876365 2015/1/4 20:58:23 11791996 2015/1/4 19:25:19 11863035 2015/1/4 17:51:34 11001259 2015/1/4 16:49:52 11972820 2015/1/4 9:54:02 11167400 2015/1/4 7:50:47 11877553 2015/1/1 1:01:37 11877553 2015/1/1 1:01:37 11877553 2015/1/1 1:01:37 11877553 2015/1/1 1:01:37 11877553 2015/1/1 1:01:37 11877553 2015/1/1 1:01:37 11877553 2015/1/1 1:01:37 11877553 2015/1/1 1:01:37 11877553 2015/1/1 1:01:37 11877553 2015/1/1 1:01:37 11877553 2015/1/1 1:01:37 11877553 2015/1/1 1:01:37 11877553 2015/1/1 1:01:37 11877553 2015/1/1 1:01:37 11257437 2014/12/31 20:30:32 11001259 2014/12/31 20:00:15 11001259 2014/12/31 15:00:11 11289153 2014/12/30 20:49:37 11587654 2014/12/30 19:47:02 11001012 2014/12/30 18:53:50 11001585 2014/12/30 18:23:50 11001262 2014/12/30 14:54:52 11001454 2014/12/30 12:44:47 11001418 2014/12/30 7:53:54 11872880 2014/12/27 23:29:56 11872880 2014/12/27 23:29:56 11872880 2014/12/27 23:29:56 11872880 2014/12/27 23:29:56 11872880 2014/12/27 23:29:56 11872880 2014/12/27 23:29:56 36 rows selected

这里运气可能比较好,Oracle 这里在 2014/12/31 20:00:15 抓取的 bind value 刚好是倾斜值11001259(如果刚好故障时间段抓到是一个没有倾斜性的值,大家也不要判定觉得这个 SQL 在这个时间段是没有传入倾斜值的),这个值实际通过 index skip scan 部分要返回 310378 条数据,而这里又要走 nested loop 的方式,相当于底层的被驱动表 CVS_REC_BANKTASK 要走 310378 次 index range scan,正是循环次数的增多导致这个 SQL 会消耗较多的 IO 资源。

SQL> select name, last_captured, value_string, datatype_string 2 from dba_hist_sqlbind 3 where sql_id = '88wdzpr9mv2wy' 4 and last_captured = 5 to_date('2014/12/31 20:00:15', 'yyyy-mm-dd hh24:mi:ss') 6 ; NAME LAST_CAPTURED VALUE_STRING DATATYPE_STRING ------------------------------ ----------------------------------- ------------------------------ --------------- :DOWN 2014/12/31 20:00:15 30 VARCHAR2(32) :DOWN 2014/12/31 20:00:15 30 VARCHAR2(32) :UP 2014/12/31 20:00:15 101 VARCHAR2(32) :DOWN 2014/12/31 20:00:15 30 VARCHAR2(32) :UP 2014/12/31 20:00:15 101 VARCHAR2(32) :REGION 2014/12/31 20:00:15 11 VARCHAR2(32) :SITEID 2014/12/31 20:00:15 11001259 VARCHAR2(32) :ACCOUNTTYPE 2014/12/31 20:00:15 30 VARCHAR2(32) :ENDTDATA 2014/12/31 20:00:15 20141231 VARCHAR2(32) :STARTDATA 2014/12/31 20:00:15 20141231 VARCHAR2(32) 10 rows selected

带入具体的 bind value 值,来验证 SQL 的资源消耗。

Execution Plan ---------------------------------------------------------- Plan hash value: 511419205 -------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1239 | 245K| 4582 (1)| 00:00:55 | |* 1 | VIEW | | 1239 | 245K| 4582 (1)| 00:00:55 | |* 2 | COUNT STOPKEY | | | | | | | 3 | VIEW | | 1239 | 229K| 4582 (1)| 00:00:55 | |* 4 | SORT ORDER BY STOPKEY | | 1239 | 168K| 4582 (1)| 00:00:55 | |* 5 | FILTER | | | | | | |* 6 | TABLE ACCESS BY INDEX ROWID | CVS_REC_BANKTASK | 1 | 77 | 3 (0)| 00:00:01 | | 7 | NESTED LOOPS | | 1239 | 168K| 4581 (1)| 00:00:55 | |* 8 | TABLE ACCESS BY INDEX ROWID| CVS_RECEPTION | 1239 | 76818 | 862 (1)| 00:00:11 | |* 9 | INDEX SKIP SCAN | IDX_CVS_RECEPTION | 1239 | | 101 (0)| 00:00:02 | |* 10 | INDEX RANGE SCAN | IDX_REC_BANKTASK | 1 | | 2 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RN">TO_NUMBER(:UP)*TO_NUMBER(:DOWN)-TO_NUMBER(:DOWN)) 2 - filter(ROWNUM<=TO_NUMBER(:UP)*TO_NUMBER(:DOWN)) 4 - filter(ROWNUM<=TO_NUMBER(:UP)*TO_NUMBER(:DOWN)) 5 - filter(TO_DATE(:ENDTDATA,'yyyymmdd')+1>=TO_DATE(:STARTDATA,'yyyymmdd')) 6 - filter("T"."DEAL_TIME"<=TO_DATE(:ENDTDATA,'yyyymmdd')+1 AND "T"."DEAL_TIME">=TO_DATE(:STARTDATA,'yyyymmdd') AND "T"."ACCOUNT_TYPE"=TO_NUMBER(:ACCOUNTTYPE) AND "T"."REGION"=TO_NUMBER(:REGION)) 8 - filter("M"."REGION"=TO_NUMBER(:REGION)) 9 - access("M"."ORG_ID"=:SITEID) filter("M"."ORG_ID"=:SITEID) 10 - access("T"."RECEPTION_ID"="M"."RECEPTION_ID") Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 874889 consistent gets 36711 physical reads 2816 redo size 3171 bytes sent via SQL*Net to client 520 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 30 rows processed

看出这里带入倾斜值消耗了 874889 逻辑读,这个也充分表示了故障时间段 db time 升高是因为传入了较多的倾斜值和执行次数的增加导致。

这里展示的是一种分析问题的思路,当然这个 case 也带入了很多的知识点,比如绑定变量 Oracle 是如何捕捉值,捕捉的规律,然后部分谓词计算 selectivy 的方式,还有一些常用的脚本整理,比如最常规的有朋友问到在不做 AWR 的情况下,如何知道 db time,如何知道某些 SQL 消耗的 db time,如何找到 TOP SQL,这些常用 SQL 的整理对于我们做 SQL 审核和评估是非常重要的,只有提高自身技术水平才能更加容易和轻松的工作。

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2018-04-19,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 数据和云 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档