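After one of a customer's workloads went live, the database began crashing periodically, and the customer asked us to help find the cause. The system log /var/log/messages and dmesg show that the kernel's OOM killer killed the mysqld process.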
Aug 31 17:12:07 i-costdb01 kernel: zabbix_agentd invoked oom-killer: gfp_mask=0x2084d0, order=0, oom_score_adj=0
Aug 31 17:12:07 i-costdb01 kernel: zabbix_agentd cpuset=/ mems_allowed=0-3
Aug 31 17:12:07 i-costdb01 kernel: CPU: 13 PID: 19109 Comm: zabbix_agentd Not tainted 3.10.0-229.el7.x86_64 #1
Aug 31 17:12:07 i-costdb01 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
Aug 31 17:12:07 i-costdb01 kernel: ffff8803236b71c0 0000000095cabcbd ffff88081f523a28 ffffffff81604b0a
Aug 31 17:12:07 i-costdb01 kernel: ffff88081f523ab8 ffffffff815ffaaf ffff88081f523ac0 ffff880d98c284c8
Aug 31 17:12:07 i-costdb01 kernel: 0000000000000001 0000000000003e08 0000000000000010 ffffffff8193a320
Aug 31 17:12:07 i-costdb01 kernel: Call Trace:
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff81604b0a>] dump_stack+0x19/0x1b
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff815ffaaf>] dump_header+0x8e/0x214
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff8115a44e>] oom_kill_process+0x24e/0x3b0
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff81159fb6>] ? find_lock_task_mm+0x56/0xc0
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff8107bdde>] ? has_capability_noaudit+0x1e/0x30
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff8115ac76>] out_of_memory+0x4b6/0x4f0
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff81160e55>] __alloc_pages_nodemask+0xa95/0xb90
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff8119f4f9>] alloc_pages_current+0xa9/0x170
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff8105e4a7>] pte_alloc_one+0x17/0x40
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff8117f3c3>] __pte_alloc+0x23/0x170
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff81183202>] handle_mm_fault+0xc42/0xd70
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff8160fe06>] __do_page_fault+0x156/0x540
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff81112c08>] ? __call_rcu_nocb_enqueue+0xa8/0xc0
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff811e68ae>] ? mntput_no_expire+0x3e/0x120
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff811e69b4>] ? mntput+0x24/0x40
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff811c88e3>] ? __fput+0x183/0x270
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff8161020a>] do_page_fault+0x1a/0x70
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff8160cf65>] ? do_device_not_available+0x35/0x60
Aug 31 17:12:07 i-costdb01 kernel: [<ffffffff8160c408>] page_fault+0x28/0x30
Aug 31 17:12:07 i-costdb01 kernel: Mem-Info:
Aug 31 17:12:07 i-costdb01 kernel: Node 0 DMA per-cpu:
Aug 31 17:12:07 i-costdb01 kernel: CPU 0: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 1: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 2: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 3: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 4: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 5: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 6: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 7: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 8: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 9: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 10: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 11: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 12: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 13: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 14: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 15: hi: 0, btch: 1 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: Node 0 DMA32 per-cpu:
Aug 31 17:12:07 i-costdb01 kernel: CPU 0: hi: 186, btch: 31 usd: 1
Aug 31 17:12:07 i-costdb01 kernel: CPU 1: hi: 186, btch: 31 usd: 79
Aug 31 17:12:07 i-costdb01 kernel: CPU 2: hi: 186, btch: 31 usd: 183
Aug 31 17:12:07 i-costdb01 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 5: hi: 186, btch: 31 usd: 30
Aug 31 17:12:07 i-costdb01 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 7: hi: 186, btch: 31 usd: 18
Aug 31 17:12:07 i-costdb01 kernel: CPU 8: hi: 186, btch: 31 usd: 6
Aug 31 17:12:07 i-costdb01 kernel: CPU 9: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 10: hi: 186, btch: 31 usd: 30
Aug 31 17:12:07 i-costdb01 kernel: CPU 11: hi: 186, btch: 31 usd: 6
Aug 31 17:12:07 i-costdb01 kernel: CPU 12: hi: 186, btch: 31 usd: 112
Aug 31 17:12:07 i-costdb01 kernel: CPU 13: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 14: hi: 186, btch: 31 usd: 29
Aug 31 17:12:07 i-costdb01 kernel: CPU 15: hi: 186, btch: 31 usd: 9
Aug 31 17:12:07 i-costdb01 kernel: Node 0 Normal per-cpu:
Aug 31 17:12:07 i-costdb01 kernel: CPU 0: hi: 186, btch: 31 usd: 179
Aug 31 17:12:07 i-costdb01 kernel: CPU 1: hi: 186, btch: 31 usd: 184
Aug 31 17:12:07 i-costdb01 kernel: CPU 2: hi: 186, btch: 31 usd: 170
Aug 31 17:12:07 i-costdb01 kernel: CPU 3: hi: 186, btch: 31 usd: 52
Aug 31 17:12:07 i-costdb01 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 7: hi: 186, btch: 31 usd: 54
Aug 31 17:12:07 i-costdb01 kernel: CPU 8: hi: 186, btch: 31 usd: 182
Aug 31 17:12:07 i-costdb01 kernel: CPU 9: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 10: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 11: hi: 186, btch: 31 usd: 77
Aug 31 17:12:07 i-costdb01 kernel: CPU 12: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 13: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 14: hi: 186, btch: 31 usd: 169
Aug 31 17:12:07 i-costdb01 kernel: CPU 15: hi: 186, btch: 31 usd: 70
Aug 31 17:12:07 i-costdb01 kernel: Node 1 Normal per-cpu:
Aug 31 17:12:07 i-costdb01 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 4: hi: 186, btch: 31 usd: 167
Aug 31 17:12:07 i-costdb01 kernel: CPU 5: hi: 186, btch: 31 usd: 7
Aug 31 17:12:07 i-costdb01 kernel: CPU 6: hi: 186, btch: 31 usd: 27
Aug 31 17:12:07 i-costdb01 kernel: CPU 7: hi: 186, btch: 31 usd: 155
Aug 31 17:12:07 i-costdb01 kernel: CPU 8: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 9: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 10: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 11: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 12: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 13: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 14: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 15: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: Node 2 Normal per-cpu:
Aug 31 17:12:07 i-costdb01 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 1: hi: 186, btch: 31 usd: 122
Aug 31 17:12:07 i-costdb01 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 5: hi: 186, btch: 31 usd: 15
Aug 31 17:12:07 i-costdb01 kernel: CPU 6: hi: 186, btch: 31 usd: 6
Aug 31 17:12:07 i-costdb01 kernel: CPU 7: hi: 186, btch: 31 usd: 170
Aug 31 17:12:07 i-costdb01 kernel: CPU 8: hi: 186, btch: 31 usd: 39
Aug 31 17:12:07 i-costdb01 kernel: CPU 9: hi: 186, btch: 31 usd: 161
Aug 31 17:12:07 i-costdb01 kernel: CPU 10: hi: 186, btch: 31 usd: 30
Aug 31 17:12:07 i-costdb01 kernel: CPU 11: hi: 186, btch: 31 usd: 170
Aug 31 17:12:07 i-costdb01 kernel: CPU 12: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 13: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 14: hi: 186, btch: 31 usd: 9
Aug 31 17:12:07 i-costdb01 kernel: CPU 15: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: Node 3 Normal per-cpu:
Aug 31 17:12:07 i-costdb01 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 8: hi: 186, btch: 31 usd: 20
Aug 31 17:12:07 i-costdb01 kernel: CPU 9: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 10: hi: 186, btch: 31 usd: 133
Aug 31 17:12:07 i-costdb01 kernel: CPU 11: hi: 186, btch: 31 usd: 34
Aug 31 17:12:07 i-costdb01 kernel: CPU 12: hi: 186, btch: 31 usd: 177
Aug 31 17:12:07 i-costdb01 kernel: CPU 13: hi: 186, btch: 31 usd: 0
Aug 31 17:12:07 i-costdb01 kernel: CPU 14: hi: 186, btch: 31 usd: 80
Aug 31 17:12:07 i-costdb01 kernel: CPU 15: hi: 186, btch: 31 usd: 156
Aug 31 17:12:07 i-costdb01 kernel: active_anon:4665536 inactive_anon:849442 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:34
unevictable:0 dirty:0 writeback:0 unstable:0
free:52771 slab_reclaimable:27583 slab_unreclaimable:16950
mapped:2465 shmem:4129 pagetables:16210 bounce:0
free_cma:0
Aug 31 17:12:07 i-costdb01 kernel: Node 0 DMA free:15860kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Aug 31 17:12:07 i-costdb01 kernel: lowmem_reserve[]: 0 2813 15868 15868
Aug 31 17:12:07 i-costdb01 kernel: Node 0 DMA32 free:59536kB min:5916kB low:7392kB high:8872kB active_anon:2199436kB inactive_anon:596172kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129280kB managed:2882128kB mlocked:0kB dirty:0kB writeback:0kB mapped:3180kB shmem:4704kB slab_reclaimable:8340kB slab_unreclaimable:2740kB kernel_stack:720kB pagetables:5640kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:256 all_unreclaimable? yes
Aug 31 17:12:07 i-costdb01 kernel: lowmem_reserve[]: 0 0 13055 13055
Aug 31 17:12:07 i-costdb01 kernel: Node 0 Normal free:30480kB min:27456kB low:34320kB high:41184kB active_anon:7853792kB inactive_anon:996100kB active_file:0kB inactive_file:2132kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:13631488kB managed:13368432kB mlocked:0kB dirty:0kB writeback:0kB mapped:4440kB shmem:6644kB slab_reclaimable:31124kB slab_unreclaimable:19244kB kernel_stack:4736kB pagetables:19048kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:3421 all_unreclaimable? yes
Aug 31 17:12:07 i-costdb01 kernel: lowmem_reserve[]: 0 0 0 0
Aug 31 17:12:07 i-costdb01 kernel: Node 1 Normal free:36060kB min:33916kB low:42392kB high:50872kB active_anon:1572808kB inactive_anon:524416kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16777216kB managed:16514364kB mlocked:0kB dirty:0kB writeback:0kB mapped:300kB shmem:416kB slab_reclaimable:22744kB slab_unreclaimable:14516kB kernel_stack:2608kB pagetables:14620kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Aug 31 17:12:07 i-costdb01 kernel: lowmem_reserve[]: 0 0 0 0
Aug 31 17:12:07 i-costdb01 kernel: Node 2 Normal free:33736kB min:33916kB low:42392kB high:50872kB active_anon:3791368kB inactive_anon:631984kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):8kB present:16777216kB managed:16514364kB mlocked:0kB dirty:0kB writeback:0kB mapped:876kB shmem:1892kB slab_reclaimable:24560kB slab_unreclaimable:16512kB kernel_stack:3072kB pagetables:13572kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:32 all_unreclaimable? yes
Aug 31 17:12:07 i-costdb01 kernel: lowmem_reserve[]: 0 0 0 0
Aug 31 17:12:07 i-costdb01 kernel: Node 3 Normal free:35412kB min:33916kB low:42392kB high:50872kB active_anon:3244740kB inactive_anon:649096kB active_file:0kB inactive_file:8kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16777216kB managed:16513684kB mlocked:0kB dirty:0kB writeback:0kB mapped:1064kB shmem:2860kB slab_reclaimable:23564kB slab_unreclaimable:14772kB kernel_stack:2592kB pagetables:11960kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:118 all_unreclaimable? yes
Aug 31 17:12:07 i-costdb01 kernel: lowmem_reserve[]: 0 0 0 0
Aug 31 17:12:07 i-costdb01 kernel: Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15860kB
Aug 31 17:12:07 i-costdb01 kernel: Node 0 DMA32: 270*4kB (UEM) 167*8kB (UEM) 121*16kB (UEM) 96*32kB (UEM) 61*64kB (UEM) 20*128kB (UEM) 6*256kB (UE) 7*512kB (UE) 38*1024kB (UEM) 1*2048kB (M) 0*4096kB = 59968kB
Aug 31 17:12:07 i-costdb01 kernel: Node 0 Normal: 1074*4kB (UEM) 664*8kB (UEM) 194*16kB (UEM) 162*32kB (UEM) 76*64kB (UEM) 25*128kB (UEM) 20*256kB (UEM) 2*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 32104kB
Aug 31 17:12:07 i-costdb01 kernel: Node 1 Normal: 357*4kB (UEM) 410*8kB (UEM) 364*16kB (UEM) 187*32kB (UEM) 139*64kB (UEM) 45*128kB (UEM) 13*256kB (EM) 2*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 35524kB
Aug 31 17:12:07 i-costdb01 kernel: Node 2 Normal: 1505*4kB (UEM) 625*8kB (UEM) 684*16kB (UEM) 313*32kB (UEM) 48*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 35052kB
Aug 31 17:12:07 i-costdb01 kernel: Node 3 Normal: 8748*4kB (UEM) 18*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 35136kB
Aug 31 17:12:07 i-costdb01 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Aug 31 17:12:07 i-costdb01 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Aug 31 17:12:07 i-costdb01 kernel: Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Aug 31 17:12:07 i-costdb01 kernel: Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Aug 31 17:12:07 i-costdb01 kernel: 17313 total pagecache pages
Aug 31 17:12:07 i-costdb01 kernel: 10892 pages in swap cache
Aug 31 17:12:07 i-costdb01 kernel: Swap cache stats: add 19565051, delete 19554159, find 186056121/186530708
Aug 31 17:12:07 i-costdb01 kernel: Free swap = 0kB
Aug 31 17:12:07 i-costdb01 kernel: Total swap = 2097148kB
Aug 31 17:12:07 i-costdb01 kernel: 16777102 pages RAM
Aug 31 17:12:07 i-costdb01 kernel: 0 pages HighMem/MovableOnly
Aug 31 17:12:07 i-costdb01 kernel: 324882 pages reserved
Aug 31 17:12:07 i-costdb01 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Aug 31 17:12:07 i-costdb01 kernel: [ 823] 0 823 29186 39 26 84 -1000 auditd
Aug 31 17:12:07 i-costdb01 kernel: [ 847] 0 847 108299 212 58 236 0 NetworkManager
Aug 31 17:12:07 i-costdb01 kernel: [ 848] 0 848 4825 38 13 44 0 irqbalance
Aug 31 17:12:07 i-costdb01 kernel: [ 852] 0 852 137549 129 86 2508 0 tuned
Aug 31 17:12:07 i-costdb01 kernel: [ 856] 81 856 6686 91 19 55 -900 dbus-daemon
Aug 31 17:12:07 i-costdb01 kernel: [ 858] 0 858 31595 24 17 139 0 crond
Aug 31 17:12:07 i-costdb01 kernel: [ 862] 38 862 6296 36 16 110 0 ntpd
Aug 31 17:12:07 i-costdb01 kernel: [ 876] 999 876 128550 126 46 1796 0 polkitd
Aug 31 17:12:07 i-costdb01 kernel: [ 1033] 0 1033 47966 312 46 540 0 vmtoolsd
Aug 31 17:12:07 i-costdb01 kernel: [ 1059] 0 1059 15203 0 33 742 0 VGAuthService
Aug 31 17:12:07 i-costdb01 kernel: [ 1260] 0 1260 20631 23 42 190 -1000 sshd
Aug 31 17:12:07 i-costdb01 kernel: [ 1767] 0 1767 22775 18 43 238 0 master
Aug 31 17:12:07 i-costdb01 kernel: [ 1791] 89 1791 22845 20 45 243 0 qmgr
Aug 31 17:12:07 i-costdb01 kernel: [ 2741] 0 2741 28318 38 12 50 0 mysqld_safe
Aug 31 17:12:07 i-costdb01 kernel: [ 6604] 0 6604 27512 1 11 32 0 agetty
Aug 31 17:12:07 i-costdb01 kernel: [11776] 998 11776 20821 56 38 144 0 zabbix_agentd
Aug 31 17:12:07 i-costdb01 kernel: [11777] 998 11777 20821 307 38 129 0 zabbix_agentd
Aug 31 17:12:07 i-costdb01 kernel: [11778] 998 11778 20851 88 40 148 0 zabbix_agentd
Aug 31 17:12:07 i-costdb01 kernel: [11779] 998 11779 20851 91 40 145 0 zabbix_agentd
Aug 31 17:12:07 i-costdb01 kernel: [11781] 998 11781 20851 108 40 148 0 zabbix_agentd
Aug 31 17:12:07 i-costdb01 kernel: [11782] 998 11782 20853 99 40 151 0 zabbix_agentd
Aug 31 17:12:07 i-costdb01 kernel: [11127] 0 11127 96442 399 111 146 0 rsyslogd
Aug 31 17:12:07 i-costdb01 kernel: [11708] 0 11708 10607 0 25 70 0 lvmetad
Aug 31 17:12:07 i-costdb01 kernel: [11900] 0 11900 113303 48 168 633 0 smbd
Aug 31 17:12:07 i-costdb01 kernel: [11902] 0 11902 112311 10 160 645 0 smbd-notifyd
Aug 31 17:12:07 i-costdb01 kernel: [11903] 0 11903 112426 14 159 641 0 cleanupd
Aug 31 17:12:07 i-costdb01 kernel: [11904] 0 11904 113299 39 159 641 0 lpqd
Aug 31 17:12:07 i-costdb01 kernel: [16045] 0 16045 35954 51 72 268 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [18609] 0 18609 35954 54 72 265 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [30206] 0 30206 35954 55 71 267 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [11295] 0 11295 35954 53 72 266 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [14206] 0 14206 35954 51 72 268 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [ 6508] 0 6508 35954 109 71 208 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [ 6541] 1002 6541 35989 128 67 198 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [ 6542] 1002 6542 28849 1 13 95 0 bash
Aug 31 17:12:07 i-costdb01 kernel: [ 6579] 1002 6579 37084 202 27 56 0 top
Aug 31 17:12:07 i-costdb01 kernel: [ 7776] 0 7776 35991 138 73 203 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [ 7782] 0 7782 28882 82 14 43 0 bash
Aug 31 17:12:07 i-costdb01 kernel: [ 7945] 0 7945 26987 1 9 25 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [ 7946] 0 7946 28165 18 11 28 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 8012] 0 8012 26987 0 10 27 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [ 8013] 0 8013 28164 0 12 36 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 8022] 0 8022 26987 20 10 3 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [ 8023] 0 8023 28164 25 11 8 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 8036] 0 8036 26987 0 10 28 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [ 8037] 0 8037 28165 0 12 47 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 8195] 0 8195 26987 22 10 2 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [ 8196] 0 8196 28164 17 12 17 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 8223] 0 8223 26987 0 10 27 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [ 8224] 0 8224 28165 0 11 47 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 8254] 0 8254 26987 2 10 25 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [ 8255] 0 8255 28165 34 12 18 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 8472] 0 8472 26987 1 10 25 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [ 8473] 0 8473 28165 1 11 51 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 8533] 1001 8533 14681928 5060592 11924 476957 0 mysqld
Aug 31 17:12:07 i-costdb01 kernel: [ 8553] 0 8553 26987 0 10 28 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [ 8554] 0 8554 28165 0 12 51 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 8774] 0 8774 26987 24 10 0 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [ 8775] 0 8775 28165 32 11 2 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 8776] 0 8776 28165 0 11 34 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 8902] 0 8902 26987 23 10 0 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [ 8903] 0 8903 28165 0 12 34 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 8904] 0 8904 28165 32 12 0 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 9124] 0 9124 26987 24 10 0 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [ 9125] 0 9125 28165 34 12 0 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 9126] 0 9126 28165 28 12 4 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 9153] 0 9153 26987 18 10 6 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [ 9154] 0 9154 28165 27 11 7 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 9155] 0 9155 28165 28 12 4 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [ 9856] 0 9856 26987 21 10 3 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [10327] 0 10327 26987 23 10 0 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [10328] 0 10328 28165 0 11 35 0 grep
Aug 31 17:12:07 i-costdb01 kernel: [13669] 0 13669 57879 3284 67 2224 0 perl
Aug 31 17:12:07 i-costdb01 kernel: [16415] 0 16415 35954 305 72 21 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [16417] 0 16417 28882 123 13 0 0 bash
Aug 31 17:12:07 i-costdb01 kernel: [30333] 0 30333 480292 428471 894 559 0 perl
Aug 31 17:12:07 i-costdb01 kernel: [31622] 0 31622 35954 324 71 0 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [31625] 0 31625 28882 134 12 0 0 bash
Aug 31 17:12:07 i-costdb01 kernel: [10772] 0 10772 35954 318 72 0 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [12473] 89 12473 22940 249 45 3 0 pickup
Aug 31 17:12:07 i-costdb01 kernel: [12570] 0 12570 35954 318 69 0 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [15040] 0 15040 35954 315 71 0 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [15041] 0 15041 35954 319 74 0 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [15792] 0 15792 26987 23 9 0 0 tail
Aug 31 17:12:07 i-costdb01 kernel: [18044] 0 18044 35954 317 71 0 0 sshd
Aug 31 17:12:07 i-costdb01 kernel: [19058] 0 19058 26976 35 9 0 0 tailf
Aug 31 17:12:07 i-costdb01 kernel: [19106] 998 19106 20851 139 41 107 0 zabbix_agentd
Aug 31 17:12:07 i-costdb01 kernel: [19107] 998 19107 20851 128 41 108 0 zabbix_agentd
Aug 31 17:12:07 i-costdb01 kernel: [19109] 998 19109 20851 141 40 107 0 zabbix_agentd
Aug 31 17:12:07 i-costdb01 kernel: Out of memory: Kill process 8533 (mysqld) score 326 or sacrifice child
Aug 31 17:12:07 i-costdb01 kernel: Killed process 8533 (mysqld) total-vm:58727712kB, anon-rss:20242368kB, file-rss:0kB
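The numbers in this dump can be cross-checked. The process table reports total_vm, rss and swapents in pages, so with 4 kB pages the mysqld rss of 5060592 pages is 20242368 kB, which matches the anon-rss value in the final "Killed process" line. The dump also shows Free swap = 0kB and active_file:0 inactive_file:0, so neither swap nor page cache was left to reclaim. Below is a minimal sketch that lists the largest memory consumers in such a dump. The function name oom_top_rss is hypothetical, a 4 kB page size is assumed, and process names containing spaces are skipped.

```bash
# Print "rss_kB swap_kB name" for the N largest processes in an OOM dump.
# Reads the kernel log on stdin; N defaults to 5.
oom_top_rss() {
    awk '
        # Process-table rows contain "[ <pid>] " followed by eight fields:
        # uid tgid total_vm rss nr_ptes swapents oom_score_adj name
        match($0, /\[ *[0-9]+\] /) {
            n = split(substr($0, RSTART + RLENGTH), f, " ")
            if (n == 8 && f[4] ~ /^[0-9]+$/)
                printf "%d %d %s\n", f[4] * 4, f[6] * 4, f[8]   # pages -> kB
        }
    ' | sort -rn | head -n "${1:-5}"
}

# Usage (path is an example): oom_top_rss 3 < /var/log/messages
```

Run against the dump above, the first row should be mysqld, far ahead of the two perl processes.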
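Once the operating system detects memory pressure, it reclaims memory in three ways: reclaiming cache, swapping out rarely accessed anonymous pages, and, as a last resort, killing processes through the OOM (Out of Memory) mechanism.
The first two, cache reclaim and swap reclaim, are both based on the LRU algorithm, which reclaims the least recently accessed memory first. The LRU implementation maintains two doubly linked lists, active and inactive: the active list records active pages and the inactive list records inactive pages.
The closer a page is to the tail of its list, the less often it is accessed. When reclaiming memory, the system can therefore pick the least active pages first.
Active and inactive pages are further divided by type into file pages and anonymous pages, which correspond to cache reclaim and swap reclaim respectively.
You can query their sizes from /proc/meminfo, for example:
# grep keeps only the metrics that contain "active" (case-insensitive)
# sort orders the lines alphabetically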
$ cat /proc/meminfo | grep -i active | sort
Active: 1065612 kB
Active(anon): 976800 kB
Active(file): 88812 kB
Inactive: 302580 kB
Inactive(anon): 216340 kB
Inactive(file): 86240 kB
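The split into file pages and anonymous pages can be summed directly from these counters. In the sample above, Active(anon) + Inactive(anon) is 1193140 kB and Active(file) + Inactive(file) is 175052 kB. A small sketch of that arithmetic follows; the function name lru_summary is hypothetical, and it reads meminfo-formatted text on stdin.

```bash
# Sum the anonymous and file-backed LRU lists from /proc/meminfo-style input.
lru_summary() {
    awk '
        $1 == "Active(anon):" || $1 == "Inactive(anon):" { anon += $2 }
        $1 == "Active(file):" || $1 == "Inactive(file):" { file += $2 }
        END { printf "anon=%d kB file=%d kB\n", anon, file }
    '
}

# Usage: lru_summary < /proc/meminfo
```

A small file total next to a large anon total means cache reclaim has little left to give, so further pressure falls on swap and eventually on the OOM killer.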
The third way, the OOM mechanism, ranks processes by oom_score. The higher a process's oom_score, the more likely the system is to kill it.
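The "score 326" in the kernel log earlier can be approximately reproduced from the dump itself. On kernels of this generation the score is roughly (rss + swapents + nr_ptes) × 1000 divided by usable RAM pages plus swap pages. The sketch below applies that formula; it is an approximation under stated assumptions (4 kB pages, oom_score_adj of 0, no discount for privileged processes), not the exact heuristic on every kernel, and the function name oom_score_estimate is hypothetical.

```bash
# Estimate the OOM score printed in the kernel log.
# Args: rss_pages swapents nr_ptes ram_pages reserved_pages total_swap_kB
oom_score_estimate() {
    awk -v rss="$1" -v swp="$2" -v pte="$3" -v ram="$4" -v rsv="$5" -v swapkb="$6" 'BEGIN {
        total = (ram - rsv) + swapkb / 4          # usable RAM pages + swap pages
        printf "%d\n", int((rss + swp + pte) * 1000 / total)
    }'
}

# Values for mysqld (PID 8533) and the Mem-Info section of the dump above:
oom_score_estimate 5060592 476957 11924 16777102 324882 2097148
```

With the values from the log this prints 326, the same score the kernel reported for mysqld.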
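When the system cannot satisfy a new memory allocation request, it attempts direct memory reclaim. Memory reclaim means releasing memory that can be reclaimed; cache and buffers, for example, are reclaimable and are usually called file-backed pages in memory management. If enough memory is available after file pages and anonymous pages have been reclaimed, all is well and the reclaimed memory is handed to the process. If memory is still insufficient, the OOM killer steps in.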
When an OOM occurs, dmesg shows an "Out of memory" message that tells you which process was killed. For example, the following command queries the OOM log:
[root@mysql ~]# dmesg | grep -i "Out of memory"
[185239.423934] Out of memory: Kill process 11237 (mysqld) score 434 or sacrifice child
[700202.981796] Out of memory: Kill process 946 (mysqld) score 387 or sacrifice child
[797118.005761] Out of memory: Kill process 28762 (mysqld) score 345 or sacrifice child
[1163188.900843] Out of memory: Kill process 15553 (mysqld) score 306 or sacrifice child
[1488860.015830] Out of memory: Kill process 3997 (mysqld) score 505 or sacrifice child
[1489375.579757] Out of memory: Kill process 28573 (mysqld) score 498 or sacrifice child
[1489713.775145] Out of memory: Kill process 3164 (mysqld) score 496 or sacrifice child
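The same ranking can be inspected before anything is killed, because the kernel exposes each process's current score in /proc/<pid>/oom_score. A minimal sketch that lists the most likely victims follows. The function name oom_rank is hypothetical, and the optional first argument exists only so the function can be pointed at a directory other than /proc.

```bash
# List "oom_score pid name" for the N processes with the highest oom_score.
# $1: proc root (default /proc), $2: N (default 10)
oom_rank() {
    root=${1:-/proc}
    for d in "$root"/[0-9]*; do
        [ -r "$d/oom_score" ] || continue
        # A process may exit between the glob and the read, so skip failures.
        score=$(cat "$d/oom_score" 2>/dev/null) || continue
        name=$(cat "$d/comm" 2>/dev/null) || continue
        printf '%s %s %s\n' "$score" "${d##*/}" "$name"
    done | sort -rn | head -n "${2:-10}"
}

# Usage: oom_rank /proc 5
```

On a dedicated database host, mysqld will usually be at the top of this list simply because it holds most of the memory.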
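If you do not want mysqld to be killed by the OOM killer, you can adjust the process's oom_score_adj to lower its OOM score and therefore the probability of it being killed. Alternatively, you can enable memory overcommit, which lets processes request more virtual memory than there is physical memory (this assumes that processes will not actually use up all the virtual memory they request).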
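Adjusting oom_score_adj means writing an integer between -1000 and 1000 to /proc/<pid>/oom_score_adj. In the dump earlier, auditd and sshd run with -1000, which exempts them from OOM killing. Below is a hedged sketch of a helper that validates the value before writing it. The name set_oom_adj is hypothetical, the third argument exists only for testing, and lowering the value on a real process requires root.

```bash
# Write an oom_score_adj value for a process after checking its range.
# $1: pid, $2: adjustment (-1000..1000), $3: proc root (default /proc)
set_oom_adj() {
    pid=$1
    adj=$2
    root=${3:-/proc}
    # Non-integer input makes the numeric tests fail, so it is rejected too.
    if ! { [ "$adj" -ge -1000 ] && [ "$adj" -le 1000 ]; } 2>/dev/null; then
        echo "oom_score_adj must be an integer within -1000..1000" >&2
        return 1
    fi
    printf '%s\n' "$adj" > "$root/$pid/oom_score_adj"
}

# Usage (as root): set_oom_adj "$(pidof mysqld)" -800
```

The setting is lost when the process restarts, so it has to be reapplied after every mysqld start. Protecting mysqld only shifts the kill to another process; it does not remove the memory pressure.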
[root@VM-0-3-centos ~]# vmstat 5 222222222
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
5 0 0 87352 11024 204368 11 14 536 157 1 24 89 11 0 0 0
2 0 0 85504 11032 204524 0 0 30 6 2627 7073 93 7 0 0 0
9 0 0 85512 11032 204524 0 0 0 1 2855 7433 94 6 0 0 0
7 0 0 114896 11040 204648 0 0 0 4 2081 6097 95 5 0 0 0
3 0 0 111880 11040 207204 0 0 510 52 1586 5083 94 6 0 0 0
4 0 0 81884 11048 207156 0 0 0 43 1643 4974 93 7 0 0 0
2 0 0 81884 11056 207156 0 0 0 8 1705 5221 92 8 0 0 0
38 3 0 72544 5984 219064 0 0 24218 4314 2960 10408 77 23 0 0 0
39 3 0 76824 5800 173512 0 0 26400 5916 4376 16988 66 34 0 0 0
33 3 0 78048 7004 161204 0 0 16531 6535 4357 19180 66 34 0 0 0
36 5 0 63724 4360 172656 0 0 28257 14726 4425 18518 61 39 0 0 0
21 4 0 63732 7900 202984 0 0 26574 14319 4293 17321 62 38 0 0 0
37 4 0 79132 7856 164192 0 0 10734 15306 4569 16690 65 35 0 0 0
35 5 0 69996 6008 152200 0 0 17387 13890 4566 17896 65 35 0 0 0
37 3 0 75020 8268 149384 0 0 5831 11859 4129 16746 63 37 0 0 0
63 4 0 74836 3036 135776 0 0 11779 9560 3915 18354 67 33 0 0 0
15 3 0 82808 6092 150444 0 0 6086 8590 3335 16407 72 28 0 0 0
67 2 0 73816 5664 149204 0 0 5384 9675 3333 15355 72 28 0 0 0
20 2 0 73568 3548 122160 0 0 6340 10139 3579 16906 70 30 0 0 0
68 5 0 70604 4796 117328 0 0 2534 7565 3627 18239 68 32 0 0 0
36 1 0 73348 3692 110336 0 0 10324 7895 3577 17643 69 31 0 0 0
31 2 0 72972 7168 138604 0 0 5726 8675 3019 14137 73 27 0 0 0
12 3 0 68408 2596 136864 0 0 16776 8878 3132 14300 72 28 0 0 0
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
67 6 0 67252 1708 117944 0 0 16225 9900 3557 16447 69 31 0 0 0
64 2 0 74944 4364 103108 0 0 3207 8523 3588 17707 72 28 0 0 0
68 2 0 70112 4612 101772 0 0 3063 8500 3353 16530 70 30 0 0 0
70 2 0 81144 728 75528 0 0 18962 8661 3472 17181 67 33 0 0 0
33 1 0 72956 3728 113452 0 0 17182 10196 3199 15874 69 31 0 0 0
23 2 0 75236 1368 75688 0 0 4174 10606 3332 16063 70 30 0 0 0
62 3 0 69448 2556 68232 0 0 8689 13682 3629 17687 68 32 0 0 0
69 1 0 80308 1160 65140 0 0 2158 13900 3834 19480 67 33 0 0 0
71 3 0 93228 3468 85180 0 0 3430 7704 2928 13719 68 32 0 0 0
67 3 0 67200 2896 109156 0 0 17289 8297 3195 13607 73 27 0 0 0
59 4 0 74608 4008 88748 0 0 2502 9206 3513 16283 72 28 0 0 0
7 2 0 66460 432 85196 0 0 18922 10685 3451 15834 70 30 0 0 0
18 8 0 63484 208 65656 0 0 39850 9111 3507 17503 69 31 0 0 0
16 7 0 60900 544 68712 0 0 42138 8307 3217 15919 70 30 0 0 0
10 4 0 52352 564 73196 0 0 103578 9233 3111 14667 59 41 0 0 0
36 72 0 52404 140 65592 0 0 102027 8888 3317 15019 63 37 0 0 0
3 0 0 651888 7440 426096 0 0 73366 321 2472 6545 72 28 0 0 0
4 0 0 644608 7572 434952 0 0 1793 0 1921 5710 90 10 0 0 0
6 0 0 644636 7580 434956 0 0 0 10 1950 5628 92 8 0 0 0
82 0 0 643200 7588 434956 0 0 2 0 1881 5621 92 8 0 0 0
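In the vmstat samples above, the run queue (r) jumps from single digits to between 30 and 70, bi rises sharply, and buff and cache shrink steadily until free suddenly jumps from 52404 to 651888. That pattern is consistent with memory being exhausted and a large process then going away, although vmstat alone cannot prove which process it was. A small sketch that condenses each sample to the columns that matter here follows; the function name vmstat_summary is hypothetical, and the default 17-column vmstat layout is assumed.

```bash
# Condense vmstat output to: sample number, r, b, free+buff+cache, bi.
vmstat_summary() {
    awk '
        # Data rows start with a number; header rows start with "procs" or "r".
        $1 ~ /^[0-9]+$/ && NF >= 17 {
            n++
            printf "%d r=%d b=%d free+buff+cache=%d kB bi=%d\n", n, $1, $2, $4 + $5 + $6, $9
        }
    '
}

# Usage: vmstat 5 | vmstat_summary
```

Watching the combined free+buff+cache figure fall toward zero while r and bi climb gives an early warning before the kernel has to kill anything.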
Use the following script to monitor mysqld memory usage:
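#!/bin/bash
# Sample the Pss (proportional set size) of mysqld every $step seconds for one minute.
step=5    # sampling interval in seconds; must not exceed 60
for (( i = 0; i < 60; i = (i + step) )); do
    # Sum the Pss of every mapping of PID 12085 (mysqld) and append the total to the log.
    # '^Pss:' avoids double counting other fields whose names start with Pss_.
    grep '^Pss:' /proc/12085/smaps | awk '{total+=$2}; END {printf "%d kB\n", total }' >> mysqld_mem.log
    sleep $step
done
exit 0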
The script produces the following output:
835712 kB
859214 kB
870558 kB
879699 kB
883517 kB
898319 kB
914754 kB
925866 kB
938948 kB
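The growth rate can be read straight from this log. The nine samples rise from 835712 kB to 938948 kB, that is 103236 kB over eight 5-second intervals, or about 2580 kB/s. A minimal sketch of that calculation follows. The function name mem_growth is hypothetical, and it assumes the samples are evenly spaced at the script's step, ignoring the small overhead of each sampling run.

```bash
# Compute total growth and average growth rate from "<number> kB" samples on stdin.
# $1: sampling interval in seconds (default 5)
mem_growth() {
    awk -v step="${1:-5}" '
        $1 ~ /^[0-9]+$/ {
            if (n == 0) first = $1
            last = $1
            n++
        }
        END {
            if (n < 2) { print "need at least two samples"; exit 1 }
            delta = last - first
            secs = (n - 1) * step
            printf "samples=%d delta=%d kB rate=%d kB/s\n", n, delta, int(delta / secs)
        }
    '
}

# Usage: mem_growth 5 < mysqld_mem.log
```

A rate that stays positive across many minutes, with no plateau, is the signal worth investigating further.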
Plot the trend (even better if you already have a monitoring system).
Figure 2 below shows only the free, buffer, and cache trends.
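From the above we can see that the memory used by the mysqld process grew steadily until the OOM occurred. In real scenarios, however, rising memory usage is not always caused by a growing workload. Another possibility is a MySQL bug that keeps requesting memory from the operating system without releasing it, which is a memory leak. So how do we detect whether a memory leak is occurring?
I offer two approaches here.
memleak is a tool in the bcc package. You can look up how to use it and test it separately; it is not covered in detail here.
# -a shows the size and address of each memory allocation request
# -p specifies the PID of the sample application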
$ /usr/share/bcc/tools/memleak -a -p $(pidof app)
WARNING: Couldn't find .text section in /app
WARNING: BCC can't handle sym look ups for /app
addr = 7f8f704732b0 size = 8192
addr = 7f8f704772d0 size = 8192
addr = 7f8f704712a0 size = 8192
addr = 7f8f704752c0 size = 8192
32768 bytes in 4 allocations from stack
[unknown] [app]
[unknown] [app]
start_thread+0xdb [libpthread-2.27.so]
Check the actual size of each MySQL memory component (applies to MySQL 8.0):
select event_name,
sum(cast(replace(current_alloc,'GiB','') as decimal(10,2)))*1024 as allocated_MB
from sys.memory_global_by_current_bytes as a
where current_alloc like '%GiB%' group by event_name
union all
select event_name,
sum(cast(replace(current_alloc,'MiB','') as decimal(10,2))) as allocated_MB
from sys.memory_global_by_current_bytes as b
where current_alloc like '%MiB%' group by event_name
union all
select event_name,
sum(cast(replace(current_alloc,'KiB','') as decimal(10,2)))/ 1024 as allocated_MB
from sys.memory_global_by_current_bytes as c
where current_alloc like '%KiB%' group by event_name
union all
select event_name,
sum(cast(replace(current_alloc,'bytes','') as decimal(10,2)))/ 1024/1024 as allocated_MB
from sys.memory_global_by_current_bytes as d
where current_alloc like '%bytes%' group by event_name;
Have you picked up these OOM troubleshooting ideas?
Author: 姚崇 (Yao Chong), certified Oracle OCM, MySQL OCP, Oceanbase OBCA, and PingCAP PCTA; specializes in Oracle and MySQL performance tuning and in a range of relational and NoSQL databases.