自适应log file sync影响案例

bisal

发布于 2019-01-30 09:38:49

9040

发布于 2019-01-30 09:38:49

文章被收录于专栏：bisal的个人杂货铺

Oracle最吸引人的地方，就是有些答案，隐藏在种种现象之中，扑朔迷离，朦朦胧胧，就像侦探办案，首先要有思路，其次要有证据，再者就是扎实的基础知识，另外就是些运气。

例如最近碰见了一个案例，一套3节点11.2.0.4 RAC，某应用只用节点1(FAILOVER other nodes)，如下是系统负载，某段时间，TPS是554.5，

AWR经常出现log file sync等待事件处于top5/10首位，单次平均等待的时间，11毫秒，

Oracle中有些等待事件，是非常常见的，例如log file sync，当我们从AWR看见他的时候，并不意味着系统有问题，但也不意味着系统没问题，但从上面信息看，log file sync一直处于首位，未免让人生疑。

我们先看下log file sync等待事件的解释，

当用户执行commit/rollback指令，redo信息需要从log buffer刷新到本地redo log file，进行持久化操作，这就是所谓的“日志优先写”，保证数据不丢的前提。日志刷新磁盘的操作，就是使用LGWR进程完成的。用户会话会等待log file sync这个事件，此时，等待LGWR进程返回确认所有redo改变向量已经写入本地磁盘文件中。

log file sync等待事件，可以分解为以下步骤，

1. Wakeup LGWR if idle 2. LGWR gathers the redo to be written and issue the I/O 3. Time for the log write I/O to complete 4. LGWR I/O post processing 5. LGWR posting the foreground/user session that the write has completed 6. Foreground/user session wakeup

如果log file sync等待事件时间久，就可以从上面分解步骤，寻找原因。MOS针对这些步骤，提出了一些优化的建议，2和3的数据收集于AWR的redo write time。3对应的就是log file parallel write等待事件。5和6随着系统负载的增加，可能会比较明显，这是因为，即使通知了前台进程，对于操作系统，也需要时间进行调度。

话说回来，通常来讲对于log file sync，常见的原因要么写日志慢，要么commit操作太频繁。

1. 从上面信息看，TPS是554.5，并不是非常高，并不能说明commit操作太频繁。

2. 再来看日志写入，log file sync对应前台进程的等待事件，相应的后台进程等待事件，就是log file parallel write，如果日志写入慢，会体现在log file parallel write单次响应时间慢，但从AWR来看，单次平均等待时间是2毫秒，因此不是因为日志写入慢，导致log file sync等待事件，

数据库版本是11.2.0.4，11g有些新特性值得关注，比如Adaptive Log File Sync，即自适应Log File Sync，有可能是导致log file sync的原因，根据MOS以及一些牛人案例，相关文章的描述，同时也是周五小y了解现象，第一时间的反应，什么是自适应Log File Sync？

首先我们看下，Log File Sync有什么可以适应。

为了确认commit操作完成与否，LGWR和前台进程需要通信，两种方法，

1. Post/wait 这是之前Oracle版本中传统的方法。LGWR写完log buffer的改变向量，就会立刻通知等待commit的进程。优点就是log file sync等待时间很短，相对来说，12c之前由于只有1个LGWR进程，因此LGWR进程负载可能比较重。 2. Polling 使用这种方法，LGWR完成写入，会进入休眠环节，timeout之后才会查看log buffer是否写入。换句话说，LGWR不单独通知每个等待commit的进程，一定意义上解放了LGWR高负载，随之带来的是log file sync时间变久。

Oracle会根据一些内部的算法，评估post/wait方法效率高，还是polling方法效率高，在高系统负载下，polling可能效率更高，因为post/wait模式下每次刷新磁盘，均需要和前台进程通信。如果系统负载低，post/wait方式效率会高些，相比polling会有更高的响应时间。Oracle依赖于内部统计数据，决定究竟使用何种方法。因为每次post/wait和polling之间的切换，会有些负载产生，因此需要机制，保证这种切换，不要过于频繁。

按照官方的解释，log file sync适应的就是post/wait和polling，这两种日志写入通知的机制。

但实际上，对于OLTP应用，polling机制有时是瓶颈，因为他会导致log file sync时间变久，对应的系统commit操作时间变久，影响系统吞吐量。这种现象和本案例相近，再看一些依据。

对于是否采用polling的机制，从v$sysstat视图可以看出统计数据，若结果为0，则表示未采用，若不为零，则表示采用了。

SQL> select name,value from v$sysstat where name in ('redo synch poll writes','redo synch polls');

AWR中也有这些统计数据，可以看出采样时间段，日志写入通知，采用了polling的机制，从数据看，非常频繁，

对于wait/post和polling每次切换，在LGWR的trace文件中均会记录，而且带有时间戳，如下就是切换为polling，

*** 2018-01-05 08:21:00.000 kcrfw_update_adaptive_sync_mode: post->poll long#=316 sync#=1719 sync=247657 poll=12382 rw=6191 rw+=13890 ack=0 min_sleep=10023 *** 2018-01-05 08:21:00.001 Log file sync switching to polling Current scheduling delay is 234 usec

如下则是切换为post/wait，

*** 2018-01-05 08:22:00.000 kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=234 current_sync_count_delta=2275 switch_sync_count_delta=1719 *** 2018-01-05 08:22:00.000 Log file sync switching to post/wait Current approximate redo synch write rate is 758 per sec

对于本文案例，从trace中进行统计，发现log file sync出现明显的时间段，使用的就是polling机制，即使出现post/wait的记录，经过1分钟左右，就切换为了polling机制。

如果需要禁用log file sync自适应，可以设置_use_adaptive_log_file_sync这个隐藏参数为false，这个参数从11gR2版本引入，控制log file sync自适应切换功能，11.2.0.1和11.2.0.2版本中默认值则是false，从11.2.0.3开始默认值改为了true。这是一个动态的参数，可以在系统级进行变更，

ALTER SYSTEM SET "_use_adaptive_log_file_sync"= <FALSE/TRUE> scope=<both/spfile/memory>;

对于本案例来说，首先可以调整的，就是关闭log file sync自适应，而且按照小y所说，这个参数，对于比较繁忙的OLTP系统来说，也是11g下建议默认关闭。

根据MOS介绍，这种log file sync自适应按说会提高日志文件同步的整体性能。但同时这一个新特性，会有一些bug产生，MOS建议要么升级至11.2.0.4/12.1，要么patch，就是没说针对OLTP，建议默认关闭，毕竟作为11g新特性，官方就建议关闭，岂不是自己打脸？如下是自适应log file sync，已知和解决的bug，

Document 13707904.8 Bug 13707904 - LGWR sometimes uses polling, sometimes post/wait Document 13074706.8 Bug 13074706 - Long "log file sync" waits in RAC not correlated with slow writes Document 25178179.8 Bug 25178179 - Several sessions wait on 'log file sync' in a RAC environment

参考文献：

Adaptive Log File Sync Optimization (1541136.1)

Troubleshooting: 'Log file sync' Waits (1376916.1)

WAITEVENT: "log file sync" Reference Note (34592.1)

High Waits for 'Log File Sync': Known Issue Checklist for 11.2 (1548261.1)

Adaptive Switching Between Log Write Methods can Cause 'log file sync' Waits (1462942.1)

Bug 13074706 - Long "log file sync" waits in RAC not correlated with slow writes (13074706.8)

Bug 25178179 - Several sessions wait on 'log file sync' in a RAC environment (25178179.8)

"log file parallel write" Reference Note(34583.1)

本文参与腾讯云自媒体同步曝光计划，分享自作者个人站点/博客。

原始发表：2018年12月10日，如有侵权请联系 cloudcommunity@tencent.com 删除

http

oracle

本文分享自作者个人站点/博客前往查看

如有侵权，请联系 cloudcommunity@tencent.com 删除。

本文参与腾讯云自媒体同步曝光计划，欢迎热爱写作的你一起参与！

http

oracle

登录后参与评论

0 条评论

热度

自适应log file sync影响案例

自适应log file sync影响案例

社区

活动

资源

关于

腾讯云开发者

热门产品

热门推荐

更多推荐