生产系统调优之_敢于质疑(90天)

接着昨天的那个问题来说。有个sql语句在做了统计信息收集之后,速度有了一定的提升,从5秒的响应降低到了2秒。但是和预期还是有一定 的差距,按照80条查询请求在短时间内响应。2秒*80000次/60/60=44.4小时,本来感觉可以接受的一下子就成了大问题。

Elapsed Time (s)

Executions

Elapsed Time per Exec (s)

%Total

%CPU

%IO

SQL Id

SQL Module

SQL Text

31stMay

305.44

145

2.11

25.56

99.69

0.26

2fjzq67jbztwv

gext1GenericEx@ccbdbpr1 (TNS V1-V3)

/* */ SELECT DISTINCT 'K', AR....

和那个同事又确认了下,他说在其他项目也用的这个sql语句数据量还要大的多,就是没有问题。
有了昨天的一些数据,我自己也基本心里有数了,我表示怀疑,在此基础上能做的就是仔细看看这个sql语句了,看看到底实现细节有没有问题。
可以看到加入了一些hint,嵌入了子查询。
SELECT DISTINCT 'K',
                AR.RESOURCE_VALUE,
                AR.RESOURCE_TYPE,
                GREATEST(TO_CHAR(AR.EFFECTIVE_DATE, 'YYYYMMDDHH24MISS'),
                         TO_CHAR(SB.EFFECTIVE_DATE, 'YYYYMMDDHH24MISS')),
                LEAST(NVL(TO_CHAR(AR.EXPIRATION_DATE, 'YYYYMMDDHH24MISS'),
                          '47001231000000'),
                      NVL(TO_CHAR(SB.EXPIRATION_DATE, 'YYYYMMDDHH24MISS'),
                          '47001231000000')),
                AR.AGR_NO,
                SB.MEDIUM_CUS_ID,
                SB.SUB_STATUS,
                SB.BUSINESS_ENTITY_ID,
                SB.LANGUAGE,
                SB.ROUTING_POLICY_ID,
                SB.L9_PORT_IND,
                SB.L9_SPLIT_PERIOD
  FROM HUGE_RESOURCE AR,  --这个表有大约2000多万条数据,做了分区
       MEDIUM_SUB SB,   --这个表有50万左右的数据
       (select /*+ RESULT_CACHE */
        DISTINCT PARAM_NAME as PARAM_NAME 
          from SMALL_PARAM    --这个表比较小,有不到2万条记录,如果加过滤条件,能过滤掉一半多的数据,因为那个字段不在索引字段里,所以加了result_cache
         where GUIDING_IND = 'Y') OP,
       MEDIUM_CUS CS
WHERE AR.AGR_NO = 1056851
   AND AR.AGREEMENT_KEY = MOD(1056851, 100)
   AND (AR.RESOURCE_STATE != 'F' OR AR.RESOURCE_STATE IS NULL)
   AND AR.RANGE_IND = 'N'
   AND SB.MEDIUM_SUB_NO = AR.AGR_NO
   AND EXISTS
(select /*+ INDEX(OP SMALL_PARAM_1IX) */
         1
          from SMALL_PARAM OP
         where OP.PARAM_NAME in (AR.RESOURCE_PRM_CD, AR.BASE_PARAM_NAME) )
   AND SB.MEDIUM_CUS_ID = CS.MEDIUM_CUS_ID
   AND (SB.EXPIRATION_DATE IS NULL OR SB.EXPIRATION_DATE >= to_date('19000101000000','YYYYMMDDHH24MISS'))
   AND (AR.EXPIRATION_DATE IS NULL OR AR.EXPIRATION_DATE >= to_date('19000101000000','YYYYMMDDHH24MISS'))
   AND (AR.EFFECTIVE_DATE < SB.EXPIRATION_DATE OR
       SB.EXPIRATION_DATE IS NULL)
   AND (AR.EXPIRATION_DATE IS NULL OR
       AR.EXPIRATION_DATE > SB.EFFECTIVE_DATE)
   AND SB.SUB_STATUS != 'T'
UNION ALL
SELECT DISTINCT 'K',
                AR.RESOURCE_VALUE,
                AR.RESOURCE_TYPE,
                GREATEST(TO_CHAR(AR.EFFECTIVE_DATE, 'YYYYMMDDHH24MISS'),
                         TO_CHAR(SH.EFFECTIVE_DATE, 'YYYYMMDDHH24MISS')),
                LEAST(NVL(TO_CHAR(AR.EXPIRATION_DATE, 'YYYYMMDDHH24MISS'),
                          '47001231000000'),
                      NVL(TO_CHAR(SH.EXPIRATION_DATE, 'YYYYMMDDHH24MISS'),
                          '47001231000000')),
                AR.AGR_NO,
                SH.MEDIUM_CUS_ID,
                SH.SUB_STATUS,
                SH.BUSINESS_ENTITY_ID,
                SH.LANGUAGE,
                SH.ROUTING_POLICY_ID,
                SH.L9_PORT_IND,
                SH.L9_SPLIT_PERIOD
  FROM HUGE_RESOURCE AR,
       MEDIUM_SUB_HISTORY SH,
       (select /*+ RESULT_CACHE */
        DISTINCT PARAM_NAME as PARAM_NAME
          from SMALL_PARAM
         where GUIDING_IND = 'Y') OP,
       MEDIUM_CUS CS
WHERE AR.AGR_NO = 1056851
   AND AR.AGREEMENT_KEY = MOD(1056851, 100)
   AND (AR.RESOURCE_STATE != 'F' OR AR.RESOURCE_STATE IS NULL)
   AND AR.RANGE_IND = 'N'
   AND SH.MEDIUM_SUB_NO = AR.AGR_NO
   AND EXISTS
(select /*+ INDEX(OP SMALL_PARAM_1IX) */
         1
          from SMALL_PARAM OP
         where OP.PARAM_NAME in (AR.RESOURCE_PRM_CD, AR.BASE_PARAM_NAME))
   AND SH.MEDIUM_CUS_ID = CS.MEDIUM_CUS_ID
   AND (SH.EXPIRATION_DATE IS NULL OR SH.EXPIRATION_DATE >= to_date('19000101000000','YYYYMMDDHH24MISS'))
   AND (AR.EXPIRATION_DATE IS NULL OR AR.EXPIRATION_DATE >= to_date('19000101000000','YYYYMMDDHH24MISS'))
   AND (AR.EFFECTIVE_DATE < SH.EXPIRATION_DATE OR
       SH.EXPIRATION_DATE IS NULL)
   AND (AR.EXPIRATION_DATE IS NULL OR
       AR.EXPIRATION_DATE > SH.EFFECTIVE_DATE)
   AND SH.SUB_STATUS NOT IN ('C', 'T')

试着在生产上抓了一个执行计划,又看了一下统计信息。

Statistics
----------------------------------------------------------
        113  recursive calls
          8  db block gets
     31581  consistent gets
          0  physical reads
          0  redo size
       1997  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         20  sorts (memory)
          0  sorts (disk)
          6  rows processed
 Elapsed: 00:02:44.14


---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |    13 |  2905 |   519  (59)| 00:00:07 |       |       |

在备份库上查看,结果同样的查询一下子没有了反应。执行了将近3分钟还是没有反应。备份库的资源要比生产差一些。好了感觉有问题了。开始看看这个sql语句的第一部分。

SELECT DISTINCT 'K',
                AR.RESOURCE_VALUE,
                AR.RESOURCE_TYPE,
                GREATEST(TO_CHAR(AR.EFFECTIVE_DATE, 'YYYYMMDDHH24MISS'),
                         TO_CHAR(SB.EFFECTIVE_DATE, 'YYYYMMDDHH24MISS')),
                LEAST(NVL(TO_CHAR(AR.EXPIRATION_DATE, 'YYYYMMDDHH24MISS'),
                          '47001231000000'),
                      NVL(TO_CHAR(SB.EXPIRATION_DATE, 'YYYYMMDDHH24MISS'),
                          '47001231000000')),
                AR.AGR_NO,
                SB.MEDIUM_CUS_ID,
                SB.SUB_STATUS,
                SB.BUSINESS_ENTITY_ID,
                SB.LANGUAGE,
                SB.ROUTING_POLICY_ID,
                SB.L9_PORT_IND,
                SB.L9_SPLIT_PERIOD
  FROM HUGE_RESOURCE AR,  --这个表有大约2000多万条数据,做了分区
       MEDIUM_SUB SB,   --这个表有50万左右的数据
  (select /*+ RESULT_CACHE */
        DISTINCT PARAM_NAME as PARAM_NAME 
          from SMALL_PARAM    --这个表比较小,有不到2万条记录,如果加过滤条件,能过滤掉一半多的数据,因为那个字段不在索引字段里,所以加了result_cache
         where GUIDING_IND = 'Y') OP,
       MEDIUM_CUS CS
WHERE AR.AGR_NO = 1056851
   AND AR.AGREEMENT_KEY = MOD(1056851, 100)
   AND (AR.RESOURCE_STATE != 'F' OR AR.RESOURCE_STATE IS NULL)
   AND AR.RANGE_IND = 'N'
   AND SB.MEDIUM_SUB_NO = AR.AGR_NO
   AND EXISTS
(select /*+ INDEX(OP SMALL_PARAM_1IX) */
         1
  from SMALL_PARAM OP
         where OP.PARAM_NAME in (AR.RESOURCE_PRM_CD, AR.BASE_PARAM_NAME) )
   AND SB.MEDIUM_CUS_ID = CS.MEDIUM_CUS_ID
   AND (SB.EXPIRATION_DATE IS NULL OR SB.EXPIRATION_DATE >= to_date('19000101000000','YYYYMMDDHH24MISS'))
   AND (AR.EXPIRATION_DATE IS NULL OR AR.EXPIRATION_DATE >= to_date('19000101000000','YYYYMMDDHH24MISS'))
   AND (AR.EFFECTIVE_DATE < SB.EXPIRATION_DATE OR
       SB.EXPIRATION_DATE IS NULL)
   AND (AR.EXPIRATION_DATE IS NULL OR
       AR.EXPIRATION_DATE > SB.EFFECTIVE_DATE)
   AND SB.SUB_STATUS != 'T'

因为结果集的输出中没有op这个表的列,而且在where子句中存在exists语句,在exists里面也没有做关联,那个同事坚持说想在做关联的时候把op的数据先做了result cache,在子查询中就能做关联了,避免重复的表扫描。听起来好像有道理,我觉得语句有问题,尽管说是产品部分提供的方案。op在from 后,但是和后面的流程都没有关联,但也没有做笛卡尔积。在他的强烈反对中我把以下的部分从from中删除。

  (select /*+ RESULT_CACHE */
        DISTINCT PARAM_NAME as PARAM_NAME 
          from SMALL_PARAM    --这个表比较小,有不到2万条记录,如果加过滤条件,能过滤掉一半多的数据,因为那个字段不在索引字段里,所以加了result_cache
         where GUIDING_IND = 'Y') OP,

然后在备份库上重新跑一次,没想到一下子就有了反应。存在一定的物理读,第二次运行就没有了。逻辑度有了近6倍的提升,执行时间在0.02-0.07之间。

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
 5854  consistent gets
        605  physical reads
         72  redo size
       1900  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          4  rows processed

按照这个速度,0.07秒*80000次/60/60=1.55小时,剩下的事情就是和他们确认一些具体的细节了。

原文发布于微信公众号 - 杨建荣的学习笔记(jianrong-notes)

原文发表时间:2014-06-01

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

发表于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏hbbliyong

WPF Trigger for IsSelected in a DataTemplate for ListBox items

<DataTemplate DataType="{x:Type vm:HeaderSlugViewModel}"> <vw:HeaderSlug...

4064
来自专栏转载gongluck的CSDN博客

cocos2dx 打灰机

#include "GamePlane.h" #include "PlaneSprite.h" #include "BulletNode.h" #include...

5576
来自专栏pangguoming

Spring Boot集成JasperReports生成PDF文档

由于工作需要,要实现后端根据模板动态填充数据生成PDF文档,通过技术选型,使用Ireport5.6来设计模板,结合JasperReports5.6工具库来调用渲...

1.2K7
来自专栏杨龙飞前端

scrollto 到指定位置

2534
来自专栏飞扬的花生

jsencrypt参数前端加密c#解密

      写程序时一般是通过form表单或者ajax方式将参数提交到服务器进行验证,如何防止提交的请求不被抓包后串改,虽然无法说绝对安全却给非法提交提高了难度...

3879
来自专栏芋道源码1024

熔断器 Hystrix 源码解析 —— 断路器 HystrixCircuitBreaker

本文主要基于 Hystrix 1.5.X 版本 1. 概述 2. HystrixCircuitBreaker 3. HystrixCircuitBreaker....

5367
来自专栏张善友的专栏

Miguel de Icaza 细说 Mix 07大会上的Silverlight和DLR

Mono之父Miguel de Icaza 详细报道微软Mix 07大会上的Silverlight和DLR ,上面还谈到了Mono and Silverligh...

2717
来自专栏一个会写诗的程序员的博客

Spring Reactor 项目核心库Reactor Core

Non-Blocking Reactive Streams Foundation for the JVM both implementing a Reactiv...

2192
来自专栏闻道于事

js登录滑动验证,不滑动无法登陆

js的判断这里是根据滑块的位置进行判断,应该是用一个flag判断 <%@ page language="java" contentType="text/html...

6988
来自专栏跟着阿笨一起玩NET

c#实现打印功能

2812

扫码关注云+社区