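Symptom: the system hangs. It still answers ping, but ssh does not respond.
While importing data, the server suddenly stopped responding and could no longer be reached, so it was effectively dead. After rebooting the server, the logs showed the following: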
Mar 26 08:13:01 localhost kernel: INFO: task flush-8:0:26079 blocked for more than 120 seconds.
Mar 26 08:13:01 localhost kernel: Tainted: P --------------- 2.6.32-431.el6.x86_64 #1
Mar 26 08:13:01 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 26 08:13:01 localhost kernel: flush-8:0 D 0000000000000001 0 26079 2 0x00000080
Mar 26 08:13:01 localhost kernel: ffff8804359118c0 0000000000000046 0000000000000000 ffff880436a5cb18
Mar 26 08:13:01 localhost kernel: ffff880435911860 ffffffff81068a53 ffffffffa0109300 ffff880436a5cb18
Mar 26 08:13:01 localhost kernel: ffff880436a5d098 ffff880435911fd8 000000000000fbc8 ffff880436a5d098
Mar 26 08:13:01 localhost kernel: Call Trace:
Mar 26 08:13:01 localhost kernel: [<ffffffff81068a53>] ? dequeue_entity+0x113/0x2e0
Mar 26 08:13:01 localhost kernel: [<ffffffffa0109300>] ? noalloc_get_block_write+0x0/0x60 [ext4]
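Linux allows up to 40% of available memory to be used as system cache for data waiting to be written to disk (the vm.dirty_ratio setting; the exact default varies by kernel version). When that data is flushed, writing back this much cached data cannot keep pace with the disk I/O, so the flush task stays blocked past the hung-task timeout (120 s). The fix is therefore to lower the limit from 40% to 10% to avoid the timeout.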
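The reasoning above can be checked with rough arithmetic. The sketch below is a deliberately simplified model: it assumes the whole dirty limit is flushed at one sustained write rate. The memory size and disk throughput are hypothetical figures chosen for illustration and are not measurements from this server.

```python
def dirty_limit_bytes(available_bytes: int, ratio_percent: int) -> int:
    """Upper bound on dirty cache for a given ratio (integer percent)."""
    return available_bytes * ratio_percent // 100


def flush_seconds(dirty_bytes: int, write_bytes_per_sec: int) -> float:
    """Time to write back dirty_bytes at a sustained write rate."""
    return dirty_bytes / write_bytes_per_sec


# Hypothetical figures, for illustration only (not taken from the incident).
AVAILABLE = 16 * 1024**3      # assume 16 GiB of available memory
WRITE_RATE = 50 * 1024**2     # assume 50 MiB/s sustained disk writes
HUNG_TASK_TIMEOUT = 120       # seconds, as reported in the kernel log

for ratio in (40, 10):
    secs = flush_seconds(dirty_limit_bytes(AVAILABLE, ratio), WRITE_RATE)
    verdict = "exceeds" if secs > HUNG_TASK_TIMEOUT else "stays under"
    print(f"ratio={ratio}%: ~{secs:.0f}s to flush, "
          f"{verdict} the {HUNG_TASK_TIMEOUT}s timeout")
```

Under these assumed numbers, a 40% limit takes longer than 120 seconds to drain while a 10% limit does not, which matches the direction of the fix. Real writeback is more gradual, so treat this only as an order-of-magnitude estimate.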
Modify the kernel parameters:
# vim /etc/sysctl.conf
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
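Edits to /etc/sysctl.conf take effect only once they are loaded, typically with `sysctl -p` or at the next boot. As a minimal sketch of how to confirm that the running kernel picked up the new values, the helper below reads them back from /proc/sys/vm. The function names `read_vm_setting` and `check_settings` are hypothetical and introduced only for this example.

```python
from pathlib import Path


def read_vm_setting(name: str, base: str = "/proc/sys/vm") -> int:
    """Return the integer value of a vm.* sysctl, e.g. name='dirty_ratio'."""
    return int(Path(base, name).read_text().strip())


def check_settings(expected: dict, base: str = "/proc/sys/vm") -> dict:
    """Map each setting name to (current_value, matches_expected)."""
    result = {}
    for name, want in expected.items():
        current = read_vm_setting(name, base)
        result[name] = (current, current == want)
    return result


# Only run the live check where /proc/sys/vm exists (i.e. on Linux).
if __name__ == "__main__" and Path("/proc/sys/vm").is_dir():
    # Values taken from the sysctl.conf change described in this post.
    expected = {"dirty_background_ratio": 5, "dirty_ratio": 10}
    for name, (current, ok) in check_settings(expected).items():
        status = "ok" if ok else "differs from config"
        print(f"vm.{name} = {current} ({status})")
```

The `base` parameter exists so the check can be pointed at a different directory for testing. On a real host the default path is the one that matters.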