Statspack Part 14: The "log file sync" Wait Event

Original source:

http://www.eygle.com/statspack/statspack14-LogFileSync.htm

When a user commits or rolls back, the session's redo information must be written out to the redo logfile. The user process posts LGWR to perform the write, and LGWR notifies the user process once the write is complete. This wait event is the user process waiting for that write-completion notification from LGWR.

For a rollback, this event records the time from the moment the user issues the rollback command until the rollback completes.

If this wait occurs excessively, it may indicate that LGWR writes are inefficient, or that the system commits too frequently. To investigate, pay attention to: the log file parallel write wait event, and the user commits and user rollbacks statistics, which show how many commits and rollbacks are occurring.
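As an illustration of reading those statistics, the overall transaction-end rate (commits plus rollbacks) over a Statspack interval can be computed with a few lines of Python. The helper function is ours; the numbers are taken from the report analyzed later in this article (a 24.47-minute interval):

```python
def commit_rate(user_commits, user_rollbacks, elapsed_seconds):
    """Transactions ended per second across a Statspack snapshot interval."""
    return (user_commits + user_rollbacks) / elapsed_seconds

# 'user commits' = 14,136 and 'user rollbacks' = 512 over 24.47 minutes
rate = commit_rate(14_136, 512, 24.47 * 60)
print(round(rate, 1))  # → 10.0 transactions ending per second
```

A rate like this is not high in absolute terms; what matters in this case study is how small each resulting redo write is, which the analysis below computes.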

Remedies:
1. Improve LGWR performance: use fast disks, and do not place redo log files on RAID 5 storage.
2. Batch commits rather than committing row by row.
3. Use the NOLOGGING/UNRECOVERABLE options where appropriate.
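Remedy 2 is about paying the log file sync cost once per batch instead of once per row. A minimal sketch of the idea, with Python's standard-library sqlite3 standing in for Oracle (the table name and row counts are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

rows = [(i, i * 1.5) for i in range(1000)]

# Row-by-row commit: every iteration waits for a log flush -- in Oracle
# terms, one 'log file sync' wait per row:
#
#   for row in rows:
#       conn.execute("INSERT INTO orders VALUES (?, ?)", row)
#       conn.commit()

# Batch commit: one log flush covers the whole batch.
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # → 1000
```

The transactional semantics differ, of course: with a batch commit, a failure rolls back the whole batch, so the batch size is a trade-off between commit overhead and redo exposure.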

The average redo write size can be computed with the following formula:

avg. redo write size = (redo blocks written / redo writes) * 512 bytes

If the system generates a large amount of redo but each write is small, it generally means LGWR is being activated too frequently. This can cause excessive contention on the redo-related latches, and Oracle may be unable to make effective use of its piggyback mechanism (grouping several sessions' commits into a single write).

Let us extract some data from a Statspack report to study this problem.

1. Key information

DB Name         DB Id    Instance     Inst Num Release     OPS Host
------------ ----------- ------------ -------- ----------- --- ------------
DB            1222010599 oracle              1 8.1.7.4.5   NO  sun

                Snap Id     Snap Time      Sessions
                ------- ------------------ --------
 Begin Snap:       3473 13-Oct-04 13:43:00      540
   End Snap:       3475 13-Oct-04 14:07:28      540
    Elapsed:                  24.47 (mins)

Cache Sizes
~~~~~~~~~~~
           db_block_buffers:     102400          log_buffer: 20971520
              db_block_size:       8192    shared_pool_size:     600M

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:             28,458.11              2,852.03
......

2. Wait events

Event                               Waits   Timeouts   Time (cs)   (ms)   /txn
---------------------------- ------------ ---------- ----------- ------ ------
log file sync                      14,466          2       4,150      3    1.0
db file sequential read            17,202          0       2,869      2    1.2
latch free                         24,841     13,489       2,072      1    1.7
direct path write                     121          0       1,455    120    0.0
db file parallel write              1,314          0       1,383     11    0.1
log file sequential read            1,540          0          63      0    0.1
....
log file switch completion              1          0           3     30    0.0
refresh controlfile command            23          0           1      0    0.0
LGWR wait for redo copy                46          0           0      0    0.0
....
log file single write                   4          0           0      0    0.0

Note that the log file sync and db file parallel write waits appear here together. Clearly, log file sync is waiting behind the completion of db file parallel write.

There is certainly a disk I/O bottleneck here: the user's redo log files and data files were in fact placed on the same RAID volume, which is a performance problem and needs to be corrected.

3. Statistics

Statistic                                      Total   per Second    per Trans
--------------------------------- ---------------- ------------ ------------
....
redo blocks written                           93,853         63.9          6.4
redo buffer allocation retries                     1          0.0          0.0
redo entries                                 135,837         92.5          9.3
redo log space requests                            1          0.0          0.0
redo log space wait time                           3          0.0          0.0
redo ordering marks                                0          0.0          0.0
redo size                                 41,776,508     28,458.1      2,852.0
redo synch time                                4,174          2.8          0.3
redo synch writes                             14,198          9.7          1.0
redo wastage                               4,769,200      3,248.8        325.6
redo write time                                3,698          2.5          0.3
redo writer latching time                          0          0.0          0.0
redo writes                                   14,572          9.9          1.0
....
sorts (disk)                                       4          0.0          0.0
sorts (memory)                               179,856        122.5         12.3
sorts (rows)                               2,750,980      1,874.0        187.8
....
transaction rollbacks                             36          0.0          0.0
transaction tables consistent rea                  0          0.0          0.0
transaction tables consistent rea                  0          0.0          0.0
user calls                                 1,390,718        947.4         94.9
user commits                                  14,136          9.6          1.0
user rollbacks                                   512          0.4          0.0
write clones created in backgroun                  0          0.0          0.0
write clones created in foregroun                 11          0.0          0.0
-------------------------------------------------------------

avg. redo write size = (redo blocks written / redo writes) * 512 bytes
		     = ( 93,853 / 14,572 ) * 512
		     ≈ 3 KB

This average is far too small, which indicates that the system commits too frequently.
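The arithmetic can be double-checked with a couple of lines of Python, using the figures from the statistics section above:

```python
redo_blocks_written = 93_853   # 'redo blocks written' statistic
redo_writes = 14_572           # 'redo writes' statistic

# Each redo block is 512 bytes on this platform.
avg_write_bytes = redo_blocks_written / redo_writes * 512
print(round(avg_write_bytes))             # → 3298 bytes per write
print(round(avg_write_bytes / 1024, 1))   # → 3.2 KB
```

So each LGWR write averages only about 3 KB, against roughly 10 writes per second, i.e. many tiny synchronous writes rather than fewer larger ones.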

Latch Sleep breakdown for DB: DPSHDB  Instance: dpshdb  Snaps: 3473 -3475
-> ordered by misses desc

                                      Get                            Spin &
Latch Name                       Requests      Misses      Sleeps Sleeps 1->4
-------------------------- -------------- ----------- ----------- ------------
row cache objects              12,257,850     113,299          64 113235/64/0/0/0
shared pool                     3,690,715      60,279      15,857 52484/588/6546/661/0
library cache                   4,912,465      29,454       8,876 23823/2682/2733/216/0
cache buffers chains           10,314,526       2,856          33 2823/33/0/0/0
redo writing                       76,550         937           1 936/1/0/0/0
session idle bit                2,871,949         225           1 224/1/0/0/0
messages                          107,950         159           2 157/2/0/0/0
session allocation                184,386          44           6 38/6/0/0/0
checkpoint queue latch             96,583           1           1 0/1/0/0/0
-------------------------------------------------------------

Because of the overly frequent commits, LGWR is activated far too often, and we can see that contention on the redo writing latch has appeared here.

You can find a detailed description of redo writing latch contention on Steve Adams' site: http://www.ixora.com.au/notes/lgwr_latching.htm

Quoted here:

When LGWR wakes up, it first takes the redo writing latch to update the SGA variable that shows whether it is active. This prevents other Oracle processes from posting LGWR needlessly. LGWR then takes the redo allocation latch to determine how much redo might be available to write (subject to the release of the redo copy latches). If none, it takes the redo writing latch again to record that it is no longer active, before starting another rdbms ipc message wait.

If there is redo to write, LGWR then inspects the latch recovery areas for the redo copy latches (without taking the latches) to determine whether there are any incomplete copies into the log buffer. For incomplete copies above the sync RBA, LGWR just defers the writing of that block and subsequent log buffer blocks. For incomplete copies below the sync RBA, LGWR sleeps on a LGWR wait for redo copy wait event, and is posted when the required copy latches have been released. The time taken by LGWR to take the redo writing and redo allocation latches and to wait for the redo copy latches is accumulated in the redo writer latching time statistic.

(Prior to release 8i, foreground processes held the redo copy latches more briefly because they did not retain them for the application of the change vectors. Therefore, LGWR would instead attempt to assure itself that there were no ongoing copies into the log buffer by taking all the redo copy latches.)

After each redo write has completed, LGWR takes the redo allocation latch again in order to update the SGA variable containing the base disk block for the log buffer. This effectively frees the log buffer blocks that have just been written, so that they may be reused.
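The wakeup sequence Steve describes can be sketched in heavily simplified Python. This is purely illustrative: the latches are modeled as plain locks, the log buffer state as a couple of fields, all names are ours rather than Oracle's, and the "defer blocks above the sync RBA" case is omitted:

```python
import threading

# Illustrative stand-ins for the real latches (names are ours, not Oracle's).
redo_writing_latch = threading.Lock()
redo_allocation_latch = threading.Lock()

class LogBuffer:
    def __init__(self):
        self.pending_redo_blocks = 0           # redo copied in, not yet written
        self.incomplete_below_sync_rba = False # an unfinished copy LGWR must wait for

def lgwr_wakeup(buf):
    """One LGWR wakeup cycle; returns the number of blocks written."""
    with redo_writing_latch:        # 1. mark LGWR active in the SGA
        pass
    with redo_allocation_latch:     # 2. see how much redo might be writable
        to_write = buf.pending_redo_blocks
    if to_write == 0:
        with redo_writing_latch:    # 3a. nothing to do: mark inactive again,
            pass                    #     then back to the rdbms ipc message wait
        return 0
    if buf.incomplete_below_sync_rba:
        # 3b. the real LGWR sleeps on 'LGWR wait for redo copy' here
        return 0
    written = to_write              # 4. perform the redo write
    with redo_allocation_latch:     # 5. advance the base disk block, freeing
        buf.pending_redo_blocks -= written  # the log buffer blocks just written
    return written
```

Even in this toy form, the sequence shows why frequent tiny commits translate into extra traffic on the redo writing and redo allocation latches: every wakeup takes them at least once.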

About the author: eygle, an Oracle technology enthusiast, comes from itpub, China's largest Oracle technology forum. www.eygle.com is his personal site. You can reach him at Guoqiang.Gai@gmail.com. Technical discussion and link exchanges are welcome.


