生产系统的一个库(负责容错处理的),目前遇到了严重的性能问题,数据量也大的出奇,一个分区表一百多个分区,blob字段达到了800多G.查看 AWR 系统负载倒不重,但是根据反馈响应速度很慢。
Snap Id | Snap Time | Sessions | Cursors/Session | |
---|---|---|---|---|
Begin Snap: | 4512 | 17-Jul-13 10:00:04 | 24 | 2.0 |
End Snap: | 4513 | 17-Jul-13 11:00:06 | 41 | 19.5 |
Elapsed: | 60.03 (mins) | |||
DB Time: | 70.09 (mins) |
Load Profile
Per Second | Per Transaction | |
---|---|---|
Redo size: | 2,061,966.68 | 840,400.17 |
Logical reads: | 28,242.96 | 11,511.05 |
Block changes: | 4,749.98 | 1,935.96 |
Physical reads: | 1,990.22 | 811.16 |
Physical writes: | 244.15 | 99.51 |
User calls: | 303.93 | 123.87 |
Parses: | 80.30 | 32.73 |
Hard parses: | 0.02 | 0.01 |
Sorts: | 91.59 | 37.33 |
Logons: | 0.04 | 0.02 |
Executes: | 132.32 | 53.93 |
Transactions: | 2.45 |
Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: | 99.67 | Redo NoWait %: | 99.95 |
---|---|---|---|
Buffer Hit %: | 96.06 | In-memory Sort %: | 100.00 |
Library Hit %: | 99.98 | Soft Parse %: | 99.97 |
Execute to Parse %: | 39.31 | Latch Hit %: | 97.11 |
Parse CPU to Parse Elapsd %: | 2.80 | % Non-Parse CPU: | 99.69 |
Parse CPU to Parse Elapsd指标很低,而且memory的使用情况也只有40%,50%.但是每秒的redo数达到了近2M. 想看看日志切换频率,但是似乎从报告中找不到明显的地方,可以用以下sql来。填上自己需要关注的时间段。
select sequence#,first_time,nexttime,round(((first_time-nexttime)*24)*60,2) diff from (select sequence#,first_time,lag(first_time) over(order by sequence#) nexttime from v$log_history where thread#=1 and to_char(first_time,'yyyy-mm-dd')='2013-07-17') order by sequence# desc; SEQ# FIRST_TIME NEXTTIME DURATION(mins) 167868 2013-07-17 10:15:37 2013-07-17 10:15:10 .45 167867 2013-07-17 10:15:10 2013-07-17 10:14:59 .18 167866 2013-07-17 10:14:59 2013-07-17 10:14:46 .22 167865 2013-07-17 10:14:46 2013-07-17 10:14:32 .23 167864 2013-07-17 10:14:32 2013-07-17 10:14:22 .17 167863 2013-07-17 10:14:22 2013-07-17 10:14:15 .12 167862 2013-07-17 10:14:15 2013-07-17 10:14:09 .1 167861 2013-07-17 10:14:09 2013-07-17 07:09:19 184.83 167860 2013-07-17 07:09:19 2013-07-17 03:01:32 247.78
可以看到在负载高的时候,日志切换极快。查看当前的redo情况,是四组日志,每组100M,看来明显不够。但是需要设置为多少合适呢,发现了问题,给出对应的指标值也是很重要的。根据oracle的建议,20分钟内切换算是一个指标。跑了一个addm报告,也轻松的得到了结果,addm建议调为2048M.oracle还估算了调优这个会节省多少时间,IMPACT: 26% impact (1099 seconds)
此外还惊喜的得到了调整cursor的建议,这个问题最近得到开发的反馈说会碰到:
ORA-01000 max open cursors exceed...
看来addm确实是个好东西,但是在报告里面没有给出明显的理由。在awr报告里找cursor相关的指标
soft parsing of SQL statements was consuming significant database time.
Statistic | Begin Value | End Value |
---|---|---|
session pga memory max | 15,053,354,632 | 15,249,597,664 |
session cursor cache count | 383,435 | 386,192 |
session uga memory | 4.4E+13 | 4.5E+13 |
opened cursors current | 48 | 800 |
logons current | 24 | 41 |
session uga memory max | 44,030,880,616 | 44,314,080,728 |
session pga memory | 13,478,131,336 | 13,655,531,424 |
给出的建议如下:
ACTION: Consider increase parameter open_cursors more then 300.
increase parameter session_cached_cursors to 150
now parameter details: open_cursors=300 session_cached_cursors=100
IMPACT: 3.9% impact (163 seconds)
此外还发现了一些top sql,但是没有明显的调优方向,我没有马上动手,我觉得这些需要以上的系统级配置生效以后需要做持续性的调整。
lob的调优还需要多补补。在后续章节分享。