专栏首页数据和云如何巧妙处理enq: TX - index contention问题?

如何巧妙处理enq: TX - index contention问题?

问题描述

某客户生产系统核心库在2019-08-21 23:50:00左右出现了业务告警,应用无法连接的情况。

查看故障期间数据库的告警日志,发现在23:49:56,数据库进行了数据库文件扩容操作。

Wed Aug 21 23:49:56 2019
ALTER DATABASE DATAFILE xxxxxxx RESIZE 100G
Wed Aug 21 23:51:38 2019
Completed: ALTER DATABASE DATAFILE xxxxxxx RESIZE 100G
Wed Aug 21 23:52:31 2019
ALTER DATABASE DATAFILE xxxxxxx RESIZE 950G
Thu Aug 22 00:05:23 2019
Completed: ALTER DATABASE DATAFILE xxxxxxx RESIZE 950G
Thu Aug 22 00:15:18 2019

查看故障期间数据库的监听日志,发现从23:58:44到00:05:23这段时间内,数据库没有新的JDBC连接进来。

专家解答

查看故障期间的ash信息,发现在应用异常的时间段内,SQL_ID为g64z3fw1nffvj的insert操作产生了大量的enq: TX - index contention等待,并且造成这个等待事件的原因就是alter database操作和它本身的insert操作,在resize的执行过程中enq: TX - index contention等待事件一直升高,resize操作成功后(即2019-08-22 00:05:00左右)等待事件恢复正常。

查看相关的trace文件信息:

*** 2019-08-21 23:57:14.907
===============================================================================
HANG ANALYSIS:
  instances (db_name.oracle_sid): xxxxx
  
===============================================================================
 
Chains most likely to have caused the hang<<<<<<<<<<<<<<<<<<<<<<查看HANG住三个可能因素
 [a] Chain 1 Signature: <not in a wait><='buffer busy waits'<='enq: TX - index contention'
     Chain 1 Signature Hash: 0xbeca5ef7
 [b] Chain 2 Signature: <not in a wait><='buffer busy waits'<='enq: TX - index contention'
     Chain 2 Signature Hash: 0xbeca5ef7
 [c] Chain 3 Signature: <not in a wait><='row cache lock'
     Chain 3 Signature Hash: 0xccefbfc0
 
===============================================================================
Non-intersecting chains:
 
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 2 (xxxxx)
                   os id: 15343
              process id: 67
              session id: 2882
        session serial #: 39285
    }
    is waiting for 'enq: TX - index contention' with wait info:
    {
                      p1: 'name|mode'=0x54580004
                      p2: 'usn<<16 | slot'=0x470000
                      p3: 'sequence'=0xa32a2d
  ……
    and is blocked by
 => Oracle session identified by:
    {
                instance: 2 (xxxxxxx)
                   os id: 105251
              process id: 79
              session id: 3400       <<<<<<<<<<<<<<<<<<<<<<chain 1被session_id=3400会话阻塞了
        session serial #: 22853
    }
    which is not in a wait:
    {
               last wait: 36.074155 sec ago
                blocking: 4 sessions
            wait history:
              1.       event: 'ASM file metadata operation'
                 time waited: 0.000018 sec (last interval)
                 time waited: 0.001953 sec (total)
                     wait id: 94              p1: 'msgop'=0x21
                                              p2: 'locn'=0x0
              * time between wait #1 and #2: 0.000000 sec
              2.       event: 'KSV master wait'
                 time waited: 0.001856 sec
                     wait id: 96              
              * time between wait #2 and #3: 0.000000 sec
              3.       event: 'ASM file metadata operation'
                 time waited: 0.000003 sec (last interval)
                 time waited: 0.000079 sec (total)
                     wait id: 94              p1: 'msgop'=0x21
                                              p2: 'locn'=0x0
    }
Chain 1 Signature: <not in a wait><='buffer busy waits'<='enq: TX - index contention'
Chain 1 Signature Hash: 0xbeca5ef7
-------------------------------------------------------------------------------
 
===============================================================================
Intersecting chains:
 
-------------------------------------------------------------------------------
Chain 2:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 2 (xxxxxxx)
                   os id: 70061
              process id: 69
              session id: 2969
        session serial #: 46127
    }
    …….
    and is blocked by 'instance: 2, os id: 83344, session id: 2840',
    which is a member of 'Chain 1'. <<<<<<<<<<<<<<<<<<<<<<被chain1阻塞了 
 
Chain 2 Signature: <not in a wait><='buffer busy waits'<='enq: TX - index contention'
Chain 2 Signature Hash: 0xbeca5ef7
-------------------------------------------------------------------------------
 
-------------------------------------------------------------------------------
Chain 3:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 2 (xxxxxx)
                   os id: 102165
              process id: 75
              session id: 3227
        session serial #: 3457
    }
    is waiting for 'row cache lock' with wait info:
    {
                      p1: 'cache id'=0x6
                      p2: 'mode'=0x0
                      p3: 'request'=0x3
            time in wait: 2.020260 sec
      heur. time in wait: 1 min 59 sec
           timeout after: 0.979740 sec
                 wait id: 360
                blocking: 0 sessions
            wait history:
              * time between current wait and wait #1: 0.000015 sec
              1.       event: 'row cache lock'
                 time waited: 3.001099 sec
                     wait id: 359             p1: 'cache id'=0x6
                                              p2: 'mode'=0x0
                                              p3: 'request'=0x3
              * time between wait #1 and #2: 0.000015 sec
              2.       event: 'row cache lock'
                 time waited: 3.000983 sec
                     wait id: 358             p1: 'cache id'=0x6
                                              p2: 'mode'=0x0
                                              p3: 'request'=0x3
              * time between wait #2 and #3: 0.000013 sec
              3.       event: 'row cache lock'
                 time waited: 3.000978 sec
                     wait id: 357             p1: 'cache id'=0x6
                                              p2: 'mode'=0x0
                                              p3: 'request'=0x3
    }
    and is blocked by 'instance: 2, os id: 105251, session id: 3400'。
    which is a member of 'Chain 1'. <<<<<<<<<<<<<<<<<<<<<<被session_id 3400阻塞了
 
Chain 3 Signature: <not in a wait><='row cache lock'
Chain 3 Signature Hash: 0xccefbfc0
-------------------------------------------------------------------------------
 
===============================================================================
Extra information that will be dumped at higher levels:
[level  4] :   1 node dumps -- [LEAF] [LEAF_NW] 
[level  5] :   4 node dumps -- [NO_WAIT] [INVOL_WT] [SINGLE_NODE] [NLEAF] [SINGLE_NODE_NW] 
State of ALL nodes
 [2839]/2/2840/17657/0x13d1a33000/83344/NLEAF/[3399]
[2881]/2/2882/39285/0x13c1a48b38/15343/NLEAF/[2839]
[2968]/2/2969/46127/0x13b1a69c08/70061/NLEAF/[2839]
[3226]/2/3227/3457/0x13c1ade618/102165/NLEAF/[3399]
[3399]/2/3400/22853/0x1391b4cad8/105251/LEAF_NW/
*** 2019-08-21 23:57:14.908
===============================================================================
END OF HANG ANALYSIS
===============================================================================
*** 2019-08-21 23:58:53.765
Suspected Hangs in the System
                     Root       Chain Total               Hang               
  Hang Hang          Inst Root  #hung #hung  Hang   Hang  Resolution         
    ID Type Status   Num  Sess   Sess  Sess  Conf   Span  Action             
 ----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------
      1 HANG    VALID    2  3400     3   286   HIGH  LOCAL Terminate Process 
<<<<<<<<<<<<<<<<<<< 系统阻塞的可能原因是session id为3400的本地进程引起的
  inst# SessId  Ser#     OSPID Event
  ----- ------ ----- --------- -----
      2      2 19927    111623 enq: TX - index contention
      2   2840 17657     83344 buffer busy waits
      2   3400 22853    105251 not in wait
 
*** 2019-08-21 23:58:53.766
HM: Short Stack of immediate waiter session ID 2840, OSPID 83344 of hang ID 1
Short stack dump:
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+178<-ksliwat()+2022<-kslwaitctx()+163<-kcbzwb()+1548<-kcbgcur()+8885<-ktfbhget()+254<-ktfbfsearch()+300<-ktfbtgex()+1240<-ktsxs_add()+2094<-ktspnr_next()+1115<-ktrsexec()+529<-ktspbmphwm()+1236<-ktspmvhwm()+53<-ktsp_bump_hwm()+212<-ktspgsp_main()+1941<-kdisnew()+279<-kdisnewle()+115<-kdisle()+4332<-kdiins0()+34743<-kdiinsp()+91<-kauxsin()+1784<-qesltcLoadIndexList()+922<-qesltcLoadIndexes()+55<-qerltcNoKdtBufferedInsRowCBK()+374<-qerltcSingleRowLoad()+279<-qerltcFetch()+379<-insexe()+691<-opiexe()+5632<-kpoal8()+2380<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253

HM: Short Stack of root session ID 3400, OSPID 105251 of hang ID 1
Short stack dump:
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-pwrite(

从上述trace文件可以看出系统HANG住的可能原因是session_id为 3400的会话引起的。根据查看dba_hist_active_sess_history视图可以看出session_id为 3400的会话就是alter database datafile resize的操作。

根据该故障现象,查询mos发现:Alter Tablespace Resize on Bigfile Tablespace Causes Sessions to Hang with ‘Enq: TX - contention’ and ‘Buffer busy waits’ (Doc ID 2089689.1) ,说明ORACLE 11.2.0.4及后续版本中存在该BUG,调整表空间大小时将阻塞表空间中的其他DDL操作。

问题解决

  • 建议在数据库负载较低的时候进行数据文件扩容
  • 建议每次表空间扩容的大小不要过大

本文分享自微信公众号 - 数据和云(OraNews),作者:杨柳

原文出处及转载信息见文内详细说明,如有侵权,请联系 yunjia_community@tencent.com 删除。

原始发表时间:2019-10-22

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

我来说两句

0 条评论
登录 后参与评论

相关文章

  • Oracle RAC环境下如何定位并杀掉最终阻塞的会话

    导读:Oracle RAC环境下定位并杀掉最终阻塞的会话,本文通过一个测试demo来具体介绍。

    数据和云
  • SQL执行效率提升几万倍的操作详解!

    发现没有用到索引,type全是ALL,那么首先想到的就是建立一个索引,建立索引的字段当然是在where条件的字段。

    数据和云
  • Oracle 20c 新特性:自动的区域图 - Automatic Zone Maps

    导读:Zone Maps 是一个独立的访问结构,可以为表独立建立。在表和索引扫描期间,区块图可以根据表列上的谓词来修剪表的磁盘块和分区表的潜在完整分区。区块映射...

    数据和云
  • 滴滴出行二面笔试题

    在喝奶茶的时候突然收到一封邮件,猝不及防的开始了滴滴出行国际部二面笔试。感觉题目还是比较基础的,我以为会有什么留存率一些比较困难的业务题,慌的我喝了几杯水,没想...

    开心鸭
  • 关于ADAS系统小记

    ADAS(Advanced Driving Assistant System)高级驾驶辅助系统是利用安装在车上的各式各样传感器(毫米波雷达、激光雷达、单\双目摄...

    李小白是一只喵
  • 第十二节:Activiti6.0——四种边界事件:定时器、错误、信号、补偿

    边界事件:边界事件属于一种特殊的中间事件。区别是: 中间事件 可以单独作为流程元素存在于流程中,而 边界事件 必须附属于某个流程元素(如任务、子流程等)。边界事...

    凡人飞
  • 每秒执行6000的简单SQL优化(二) (r10笔记第65天)

    继续前几天的一次性能调优,这次调优难度不小,而且空间很小,看起来简直就是绝处逢生的感觉。下面的两条SQL语句执行频率极高,每秒达到6000次,希望能够优化。 s...

    jeanron100
  • 解决laravel id非自增 模型取回为0 的问题

    laravel5.2 中 如果一个模型的id 为string等非自增类型时候 使用模型的find方法 会返会0

    砸漏
  • vuejs基础-v-for用法

    在组件中,使用v-for循环的时候,或者在一些特殊情况中,如果v-for有问题,必须使用v-for的同时,指定唯一的字符串/数字类型:key值

    eadela
  • JavaScript 数组及操作方法

    可以看出join()方法并不会改变aList数组本身的值,只会生成一个字符串而已。

    Devops海洋的渔夫

扫码关注云+社区

领取腾讯云代金券