Keyword:
AWR snapshot Generation MMON Suspension
AWR是ORACLE数据库重要的诊断工具,但是有时可能遇到AWR快照无法获取的问题,影响性能监测。本文介绍如何诊断和解决这类问题。
■收集诊断信息
1.获取AWR Snapshot的设定信息和过去取得信息
set mark html on
spool Info.html
alter session set NLS_DATE_FORMAT = 'YYYY/MM/DD HH24:MI:SS';
alter session set nls_timestamp_format='YYYY-MM-DD HH24:MI:SS.FF3'
---数据库状态
select open_mode from v$database;
---AWR相关环境设定信息
show parameter control_management_pack_access
show parameter statistics
---AWR设定信息
select * from dba_hist_wr_control;
---快照的取得信息
select * from dba_hist_snapshot order by snap_id;
---快照的最新取得时间信息
col systimestamp form a35
col most_recent_snap_time form a25
col snap_interval form a17
select systimestamp,
most_recent_snap_time,
snap_interval
from wrm$_wr_control
where dbid = (select dbid from v$database);
---快照的错误信息
col instance_number form 999 head INST
col begin_interval_time form a25
col flush_elapsed form a17
col status form 999
col error_count form 999 head ERR
col snap_flag form 999 head SNAP
select * from
(select snap_id,instance_number,STARTUP_TIME,begin_interval_time,
flush_elapsed,status,error_count,snap_flag
from wrm$_snapshot
where dbid = (select dbid from v$database)
order by snap_id desc)
where rownum <= 10
order by snap_id
;
---快照的错误原因
select * from wrm$_snap_error
where dbid = (select dbid from v$database)
order by snap_id;
---版本11.2.0.2以后,通过WRM$_SNAPSHOT_DETAILS视图获得各AWR对象表的处理时间
set pagesize 999
column name format a28
column time format a29
variable dbid number
exec select dbid into :dbid from v$database;
variable snap_id number
exec select max(snap_id) into :snap_id from wrm$_snapshot where dbid=:dbid;
select table_name_kewrtb name, end_time-begin_time time
from wrm$_snapshot_details, x$kewrtb
where snap_id = :snap_id and dbid = :dbid
and table_id = table_id_kewrtb
order by table_id;
---获取AQ相关信息,看是否AQ的信息太多
SELECT q_name, a, COUNT (*) HOW_MANY
FROM (SELECT q_name, TRUNC(enq_time, 'DD') AS a
FROM SYS.alert_qt
ORDER BY enq_time)
GROUP BY a, q_name
ORDER BY a;
select systimestamp, count(*), QUEUE, MSG_STATE
from aq$ALERT_QT
group by msg_state, queue;
spool off;
set mark html off;
参考: Troubleshooting: Missing Automatic Workload Repository (AWR) Snapshots and Other Collection Issues (Doc ID 1301503.1)
2.查看Alert log和相关的一些Trace信息(MMON,M00x等进程),看是否有有用的相关信息
例如:
Suspending MMON slave action kewrmafsa_ for 82800 seconds
3.确认数据库的Path状况:
export ORACLE_HOME=<OracleHome>
cd $ORACLE_HOME\OPatch
./opatch lsinventory
4.确认MMON 是否挂起(Suspension)
查看本节点的MMON状态:SQL> oradebug setmypid
SQL> oradebug unit_test kebm_dmp_slv_attrs kewrmafsa_
输出例:
SQL> oradebug setmypid
Statement processed.
SQL> oradebug unit_test kebm_dmp_slv_attrs kewrmafsa_
Status: 2
Flags: 0
Runtime limit: 1800
CPU time limit: 600
Violations: 0
Suspended until: 0 ★如果这个值大于0的话,则代表MMON发生了挂起
查看远程节点的MMON状态(RAC):SQL> oradebug setmypid
SQL> oradebug unit_test kebm_dmp_slv_attrs kewrmrfsa_
如果发生了挂起,使用如下命令进行恢复。SQL> oradebug unit_test kebm_set_slv_attrs kewrmafsa_ retain retain retain retain 0 0
5.确认通过手动收集是否能够正常
例:
SQL> exec dbms_workload_repository.create_snapshot();
6.确认用于保存AWR的SYSAUX表空间是否有空闲空间和使用状况。
SQL> connect / as sysdba
SQL> @?/rdbms/admin/awrinfo.sql
7.设置MMON 的动作Trace和snapshot flush Trace (RAC情况下每个Instance都需要设置)
有效:
---MMON action trace
SQL> alter session set "_swrf_test_action" = 28;
--snapshot flush Trace
SQL> alter session set "_swrf_test_action" = 10;
无效:(设置后等待1小时以上,包含AWR取得时间段)
SQL> alter session set "_swrf_test_action" = 29;
SQL> alter session set "_swrf_test_action" = 11;
8.设置针对MMON 的SQL trace
例:
有效:
begin
dbms_monitor.serv_mod_act_trace_enable(service_name=>'SYS$BACKGROUND',
module_name=>'MMON_SLAVE',
action_name=>'Auto-Flush Slave Action');
dbms_monitor.serv_mod_act_trace_enable(service_name=>'SYS$BACKGROUND',
module_name=>'MMON_SLAVE',
action_name=>'Remote-Flush Slave Action');
end;
/
无效:
begin
dbms_monitor.serv_mod_act_trace_disable(service_name=>'SYS$BACKGROUND',
module_name=>'MMON_SLAVE',
action_name=>'Auto-Flush Slave Action');
dbms_monitor.serv_mod_act_trace_disable(service_name=>'SYS$BACKGROUND',
module_name=>'MMON_SLAVE',
action_name=>'Remote-Flush Slave Action');
end;
/
9.也可以通过设置AWR test 事件来获取MMON 的SQL trace
有效:
(trace AWR SQL)
alter session set events 'immediate trace name awr_test level 3';
另外,也可以通过以下的设置获得其他的一些trace。
trace AWR snapshots:
alter session set events 'immediate trace name awr_test level 1';
trace AWR purging:
alter session set events 'immediate trace name awr_test level 2';
无效:(关闭所有awr test trace)
alter session set events 'immediate trace name awr_test level 4';
10.如果在AWR快照时发生挂起(Hang)的话,通过如下方法获取hanganalyze和 systemstate,确认进程状态和errorstack信息。
SQL> connect /as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug setinst all
SQL> oradebug -g all hanganalyze 4
SQL> oradebug -g all dump systemstate 266
SQL> quit
<等待...>
SQL> connect /as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug setinst all
SQL> oradebug -g all hanganalyze 4
SQL> oradebug -g all dump systemstate 266
SQL> quit
<等待...>
SQL> connect /as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug setinst all
SQL> oradebug -g all hanganalyze 4
SQL> oradebug -g all dump systemstate 266
SQL> quit
或者,通过一下方法获取errostack
1.确认MMON的PID
SELECT s.program, p.pid
FROM v$session s, v$process p
WHERE s.paddr=p.addr
AND s.program like '%(MMON)';
2.获取errorstack
oradebug setorapid <MMON_PID_from_previous_query>
oradebug unlimit
oradebug short_stack ---> Repeat this command one or more times.
oradebug tracefile_name
通过如下方法尝试重新启动AWR 快照取得的动作
1.通过更改“restricted session”的mode重新唤醒MMON进程
例:
alter system enable restricted session;
alter system disable restricted session;
2.通过OS命令将MMON进程Kill
例:
kill -9 <MMON Process ID>
MMON被kill或,会自动启动MMON和MMON Slave进程(m00x)
3.重启数据库
通过重启数据库会重新启动MMON,从而解决相关问题
4.删除AQ相关信息
--
alter system set "_alert_message_purge"=0;
declare
po dbms_aqadm.aq$_purge_options_t;
begin
dbms_aqadm.purge_queue_table(queue_table => 'ALERT_QT',
purge_condition => NULL,
purge_options => po);
end;
/
alter system set "_alert_message_purge"=1;
5.重建AWR功能
参考:
How to Recreate the Automatic Workload Repository (AWR)? (Doc ID 782974.1)
6.字典表统计信息不准确导致性能慢
例:
SQL> execute dbms_stats.gather_table_stats
7.如果发现在处理某个AWR对象时耗时间而导致无法完成的话,可以使用"_awr_disabled_flush_tables" 参数跳过对该对象的处理,这样会导致AWR缺少某些数据,但是整体可以完成。
例:
alter system set "_awr_disabled_flush_tables" = 'wrh$_sql_bind_metadata';
用碎片化的时间,一点一滴地学习一套系统化的知识。