Oracle有时会出现hang住的现象,原因可能很多,例如资源争用,如果需要知道为什么出现hang,可以分析等待链,而使用hanganalyze是其中一种方式,今天尝试一下。
首先,我们模拟场景,会话1,更新test表的第一行记录,但是不提交,
SQL> select sid from v$mystat where rownum=1;
SID
----------
42
SQL> update test set object_id=1 where rownum=1;
1 row updated.
会话2,更新相同行,此时出现了hang,
SQL> update test set object_id=1 where rownum=1;
会话3,更新相同行,还是出现了hang,
SQL> update test set object_id=1 where rownum=1;
可能当Oracle在hang状态下正常的sqlplus已经不能连上,从10g开始,提供了另外一种登陆方式,就是带着prelim,
[oracle@bisal ~]$ sqlplus -prelim / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Fri Apr 23 09:03:52 2021
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle. All rights reserved.
P.S. prelim的意思是preliminary connection
可以指定3级的hanganalyze,
SQL> oradebug hanganalyze 3
Statement processed.
不同的LEVEL打印的内容是不同的,当然级别越高,打印越多,消耗就会高,
1-2:Only HANGANALYZE output, no process dump at all(只有hanganalyeze输出,不dump任何进程) 3:Level 2 + Dump only processes thought to be in a hang (IN_HANG state)(DUMP IN_HANGE状态的进程) 4:Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF,LEAF_NW,IGN_DMP state) 5:Level 4 + Dump all processes involved in wait chains (NLEAF state) 10:Dump all processes (IGN state)
如果是CDB环境,要确认是CDB级别的hang还是PDB级别的hang。如果是PDB级别的hang,只需要收集PDB的信息。如果无法确认,建议收集一下CDB的信息。
从11.2.0.2起,使用"sqlplus -prelim"登陆后,hanganalyze不再有内容输出,因为需要一个进程状态对象和一个会话状态对象。所以执行hanganalyze分析后,只是显示执行成功,trace中只会记录,
HANG ANALYSIS:
ERROR: Can not perform hang analysis dump without a process
state object and a session state object.
( process=(nil), sess=(nil) )
正常连接执行hanganalyze,会显示打印的trace文件路径和名称,
[oracle@bisal ~]$ sys
SQL*Plus: Release 19.0.0.0.0 - Production on Fri Apr 23 09:05:19 2021
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle. All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL> oradebug hanganalyze 3
Hang Analysis in /opt/oracle/diag/rdbms/bisalcdb/BISALCDB/trace/BISALCDB_ora_47241.trc
从文件中,可以看到当前系统中存在两条等待链,等待事件都是enq: TX - row lock contention,Chain 1的会话是被sid=42的初始会话阻塞,Chain 2的会话同样被sid=42阻塞(which is a member of 'Chain 1'),
===============================================================================
Chains most likely to have caused the hang:
[a] Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
Chain 1 Signature Hash: 0x38c48850
[b] Chain 2 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
Chain 2 Signature Hash: 0x38c48850
===============================================================================
Non-intersecting chains:
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (bisalcdb.bisalcdb)
os id: 41963
process id: 51, oracle@bisal
session id: 79
session serial #: 18025
module name: 5 (SQL*Plusbisal (TNS V1-V3))
pdb id: 5 (BISALPDB2)
}
is waiting for 'enq: TX - row lock contention' with wait info:
{
p1: 'name|mode'=0x54580006
p2: 'usn<<16 | slot'=0x5000a
p3: 'sequence'=0x465
time in wait: 1 min 42 sec
timeout after: never
wait id: 25
blocking: 0 sessions
current sql_id: 2786213102
current sql: update test set object_id=1 where rownum = 1
short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+187<-ksliwat()+2218<-kslwaitctx()+188<-ksqcmi()+21656<-ksqgtlctx()+6620<-ksqgelctx()+838<-ktuGetTxForXid()+230<-ktcwit1()+367<-kdddgb()+5478<-kdusru()+458<-updrowFastPath()+1209<-qerupUpdRow()+1548<-qerupFetch()+471<-updaul()+1416<-updThreePhaseExe()+340<-updexe()+405<-opiexe()+11573<-kpoal8()+2387<-opiodr()+1202<-ttcpip()+1222<-opitsk()+1895<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165<-opimai_real(
wait history:
* time between current wait and wait #1: 0.044253 sec
1. event: 'SQL*Net message from client'
time waited: 13.489353 sec
wait id: 24 p1: 'driver id'=0x54435000
p2: '#bytes'=0x1
* time between wait #1 and #2: 0.000000 sec
2. event: 'SQL*Net message to client'
time waited: 0.000001 sec
wait id: 23 p1: 'driver id'=0x54435000
p2: '#bytes'=0x1
* time between wait #2 and #3: 0.000011 sec
3. event: 'SQL*Net message from client'
time waited: 0.000027 sec
wait id: 22 p1: 'driver id'=0x54435000
p2: '#bytes'=0x1
}
and is blocked by
=> Oracle session identified by:
{
instance: 1 (bisalcdb.bisalcdb)
os id: 18256
process id: 46, oracle@bisal
session id: 42
session serial #: 18536
module name: 5 (SQL*Plusbisal (TNS V1-V3))
pdb id: 5 (BISALPDB2)
}
which is waiting for 'SQL*Net message from client' with wait info:
{
p1: 'driver id'=0x54435000
p2: '#bytes'=0x1
time in wait: 2 min 6 sec
timeout after: never
wait id: 82
blocking: 1 session
current sql_id: 0
current sql:
short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-read()+14<-snttread()+16<-nttfprd()+354<-nsbasic_brc()+399<-nioqrc()+438<-opikndf2()+999<-opitsk()+905<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165<-opimai_real()+422<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
wait history:
* time between current wait and wait #1: 0.000007 sec
1. event: 'SQL*Net message to client'
time waited: 0.000001 sec
wait id: 81 p1: 'driver id'=0x54435000
p2: '#bytes'=0x1
* time between wait #1 and #2: 0.001595 sec
2. event: 'db file scattered read'
time waited: 0.005296 sec
wait id: 80 p1: 'file#'=0x13
p2: 'block#'=0x763
p3: 'blocks'=0x5
* time between wait #2 and #3: 0.000287 sec
3. event: 'db file sequential read'
time waited: 0.019587 sec
wait id: 79 p1: 'file#'=0x13
p2: 'block#'=0x762
p3: 'blocks'=0x1
}
Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
Chain 1 Signature Hash: 0x38c48850
-------------------------------------------------------------------------------
===============================================================================
Intersecting chains:
-------------------------------------------------------------------------------
Chain 2:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (bisalcdb.bisalcdb)
os id: 41963
process id: 51, oracle@bisal
session id: 79
session serial #: 18025
module name: 5 (SQL*Plusbisal (TNS V1-V3))
pdb id: 5 (BISALPDB2)
}
is waiting for 'enq: TX - row lock contention' with wait info:
{
p1: 'name|mode'=0x54580006
p2: 'usn<<16 | slot'=0x5000a
p3: 'sequence'=0x465
time in wait: 4 min 40 sec
timeout after: never
wait id: 25
blocking: 0 sessions
current sql_id: 2786213102
current sql: update test set object_id=1 where rownum = 1
short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+187<-ksliwat()+2218<-kslwaitctx()+188<-ksqcmi()+21656<-ksqgtlctx()+6620<-ksqgelctx()+838<-ktuGetTxForXid()+230<-ktcwit1()+367<-kdddgb()+5478<-kdusru()+458<-updrowFastPath()+1209<-qerupUpdRow()+1548<-qerupFetch()+471<-updaul()+1416<-updThreePhaseExe()+340<-updexe()+405<-opiexe()+11573<-kpoal8()+2387<-opiodr()+1202<-ttcpip()+1222<-opitsk()+1895<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165<-opimai_real(
wait history:
* time between current wait and wait #1: 0.044253 sec
1. event: 'SQL*Net message from client'
time waited: 13.489353 sec
wait id: 24 p1: 'driver id'=0x54435000
p2: '#bytes'=0x1
* time between wait #1 and #2: 0.000000 sec
2. event: 'SQL*Net message to client'
time waited: 0.000001 sec
wait id: 23 p1: 'driver id'=0x54435000
p2: '#bytes'=0x1
* time between wait #2 and #3: 0.000011 sec
3. event: 'SQL*Net message from client'
time waited: 0.000027 sec
wait id: 22 p1: 'driver id'=0x54435000
p2: '#bytes'=0x1
}
and is blocked by 'instance: 1, os id: 18256, session id: 42',
which is a member of 'Chain 1'.
Chain 2 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
Chain 2 Signature Hash: 0x38c48850
-------------------------------------------------------------------------------
===============================================================================
因此,从这个trace文件中,就可以很清晰地看到等待链的逻辑关系,进而有针对性的管理会话。