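If a business process stops or restarts unexpectedly, check /var/log/messages to determine whether an OOM event occurred and, if so, which process consumed enough memory to trigger the OOM Killer.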
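A minimal Python sketch of that check, equivalent to grepping the file for "Out of memory". It assumes the `Out of memory: Kill process <pid> (<name>) score <n>` wording that appears in this note's log excerpt; newer kernels phrase the line differently, so treat the pattern as an assumption. The helper name `find_oom_kills` is illustrative only.

```python
import re

# Matches the kernel's OOM decision line in the format seen in this note, e.g.
#   "Nov 22 11:09:03 host kernel: Out of memory: Kill process 19272 (nmap) score 873 ..."
# Assumption: other kernel versions word this line differently; adjust the pattern.
OOM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8}) .*Out of memory: Kill process "
    r"(?P<pid>\d+) \((?P<name>[^)]+)\) score (?P<score>\d+)"
)


def find_oom_kills(lines):
    """Return (timestamp, pid, name, score) for every OOM kill decision in lines."""
    events = []
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            events.append((m["ts"], int(m["pid"]), m["name"], int(m["score"])))
    return events


sample = [
    "Nov 22 11:09:03 VM_0_14_centos kernel: [ 328] 0 328 26240 94 56 0 0 systemd-journal",
    "Nov 22 11:09:03 VM_0_14_centos kernel: Out of memory: Kill process 19272 (nmap) score 873 or sacrifice child",
]
print(find_oom_kills(sample))  # [('Nov 22 11:09:03', 19272, 'nmap', 873)]
```

On a live host the same function can be given an open file, e.g. `with open("/var/log/messages", errors="replace") as f: find_oom_kills(f)`; reading that file normally requires root.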
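In this case, nmap was terminated after running for about an hour. The log showed that the system had run out of memory, the OOM mechanism was triggered, and the nmap process was killed. However, the historical memory usage graphs did not show memory usage staying at its peak.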
Filtering the log shows several Out of memory entries in messages, each of which made the kernel kill a process:
Nov 22 11:09:03 VM_0_14_centos kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Nov 22 11:09:03 VM_0_14_centos kernel: [ 328] 0 328 26240 94 56 0 0 systemd-journal
Nov 22 11:09:03 VM_0_14_centos kernel: [ 346] 0 346 29149 76 25 0 0 lvmetad
Nov 22 11:09:03 VM_0_14_centos kernel: [ 349] 0 349 11121 186 24 0 -1000 systemd-udevd
Nov 22 11:09:03 VM_0_14_centos kernel: [ 469] 0 469 13877 113 27 0 -1000 auditd
Nov 22 11:09:03 VM_0_14_centos kernel: [ 505] 0 505 6594 75 18 0 0 systemd-logind
Nov 22 11:09:03 VM_0_14_centos kernel: [ 506] 81 506 15044 143 34 0 -900 dbus-daemon
Nov 22 11:09:03 VM_0_14_centos kernel: [ 511] 999 511 135131 1909 61 0 0 polkitd
Nov 22 11:09:03 VM_0_14_centos kernel: [ 512] 998 512 2144 37 10 0 0 lsmd
Nov 22 11:09:03 VM_0_14_centos kernel: [ 516] 0 516 1096 34 8 0 0 acpid
Nov 22 11:09:03 VM_0_14_centos kernel: [ 518] 38 518 12330 180 29 0 0 ntpd
Nov 22 11:09:03 VM_0_14_centos kernel: [ 729] 0 729 31570 155 19 0 0 crond
Nov 22 11:09:03 VM_0_14_centos kernel: [ 730] 0 730 6476 52 18 0 0 atd
Nov 22 11:09:03 VM_0_14_centos kernel: [ 803] 0 803 26845 498 50 0 0 dhclient
Nov 22 11:09:03 VM_0_14_centos kernel: [ 881] 0 881 143975 2757 96 0 0 tuned
Nov 22 11:09:03 VM_0_14_centos kernel: [ 883] 0 883 80229 487 79 0 0 rsyslogd
Nov 22 11:09:03 VM_0_14_centos kernel: [ 967] 0 967 27522 33 12 0 0 agetty
Nov 22 11:09:03 VM_0_14_centos kernel: [ 968] 0 968 27522 33 10 0 0 agetty
Nov 22 11:09:03 VM_0_14_centos kernel: [ 1324] 0 1324 28199 256 60 0 -1000 sshd
Nov 22 11:09:03 VM_0_14_centos kernel: [ 1345] 0 1345 24365 115 17 0 0 sgagent
Nov 22 11:09:03 VM_0_14_centos kernel: [ 1393] 0 1393 38819 1626 30 0 0 barad_agent
Nov 22 11:09:03 VM_0_14_centos kernel: [ 1402] 0 1402 39547 1830 31 0 0 barad_agent
Nov 22 11:09:03 VM_0_14_centos kernel: [ 1403] 0 1403 169459 2863 51 0 0 barad_agent
Nov 22 11:09:03 VM_0_14_centos kernel: [ 1480] 0 1480 25249 98 21 0 0 YDLive
Nov 22 11:09:03 VM_0_14_centos kernel: [ 3343] 0 3343 32007 198 19 0 0 screen
Nov 22 11:09:03 VM_0_14_centos kernel: [ 3344] 0 3344 29023 272 14 0 0 bash
Nov 22 11:09:03 VM_0_14_centos kernel: [13683] 0 13683 142840 1703 36 0 0 YDService
Nov 22 11:09:03 VM_0_14_centos kernel: [19272] 0 19272 678212 422656 1329 0 0 nmap
Nov 22 11:09:03 VM_0_14_centos kernel: Out of memory: Kill process 19272 (nmap) score 873 or sacrifice child
Nov 22 11:09:03 VM_0_14_centos kernel: Killed process 19272 (nmap) total-vm:2712848kB, anon-rss:1690624kB, file-rss:0kB, shmem-rss:0kB
The anon-rss value in the Killed process line confirms how much physical memory the process actually occupied before it was killed:
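total-vm is the process's total virtual memory size, i.e. how much memory it would need if it were entirely resident in RAM. anon-rss is the anonymous memory the process actually had resident in physical RAM; file-backed and shared-memory pages are reported separately as file-rss and shmem-rss.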
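To pull these fields out of the log programmatically, a small parser for the Killed process line could look like the sketch below. It assumes the exact `name:<n>kB` layout shown in the excerpt above; other kernel versions may add or rename fields. `parse_killed` is a hypothetical helper name.

```python
import re

# Layout taken from the "Killed process" line in this note's log (an assumption
# for other kernels): "Killed process <pid> (<name>) total-vm:<n>kB, anon-rss:<n>kB, ..."
KILLED_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)(?P<fields>.*)")
FIELD_RE = re.compile(r"([\w-]+):(\d+)kB")


def parse_killed(line):
    """Return pid, name and every '<field>:<n>kB' value (in kB), or None if no match."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    sizes = {key: int(val) for key, val in FIELD_RE.findall(m["fields"])}
    return {"pid": int(m["pid"]), "name": m["name"], **sizes}


line = ("Nov 22 11:09:03 VM_0_14_centos kernel: Killed process 19272 (nmap) "
        "total-vm:2712848kB, anon-rss:1690624kB, file-rss:0kB, shmem-rss:0kB")
info = parse_killed(line)
# Prints: virtual: 2649 MB, resident anonymous: 1651 MB
print(f"virtual: {info['total-vm'] / 1024:.0f} MB, "
      f"resident anonymous: {info['anon-rss'] / 1024:.0f} MB")
```

Collecting every field into a dict, rather than hard-coding four names, keeps the sketch usable if a kernel prints extra `*-rss` fields.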
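Note: if several processes on the system use a lot of memory, the memory used by the killed process alone may differ considerably from the system's total memory. In that case, add up the memory of the largest processes listed in the log and compare the total against the system memory to confirm which processes consumed most of it.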
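A minimal sketch of that summation over the task dump the kernel prints before the kill. It assumes the column order of the header in this log (`pid uid tgid total_vm rss nr_ptes swapents oom_score_adj name`) and 4 kB pages; other kernels print different columns, and `rank_by_rss` is an illustrative name, not an existing tool.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages (x86_64 default); confirm on the source host

# Matches task-dump rows printed before the OOM kill, e.g.
#   "... kernel: [19272] 0 19272 678212 422656 1329 0 0 nmap"
# Columns: pid uid tgid total_vm rss nr_ptes swapents oom_score_adj name,
# as in the header of this log; other kernel versions use different columns.
ROW_RE = re.compile(
    r"kernel: \[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S+)\s*$"
)


def rank_by_rss(lines, top_n=5):
    """Return (total rss in kB, top_n rows as (name, pid, rss_kb)) from a task dump."""
    rows = []
    for line in lines:
        m = ROW_RE.search(line)
        if m:  # the header row has no numeric pid, so it is skipped
            pid, rss_pages, name = int(m[1]), int(m[5]), m[9]
            rows.append((name, pid, rss_pages * PAGE_KB))
    rows.sort(key=lambda row: row[2], reverse=True)
    total_kb = sum(row[2] for row in rows)
    return total_kb, rows[:top_n]


dump = [
    "Nov 22 11:09:03 VM_0_14_centos kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name",
    "Nov 22 11:09:03 VM_0_14_centos kernel: [ 881] 0 881 143975 2757 96 0 0 tuned",
    "Nov 22 11:09:03 VM_0_14_centos kernel: [ 1403] 0 1403 169459 2863 51 0 0 barad_agent",
    "Nov 22 11:09:03 VM_0_14_centos kernel: [19272] 0 19272 678212 422656 1329 0 0 nmap",
]
total_kb, top = rank_by_rss(dump, top_n=2)
print(f"total {total_kb} kB ({total_kb / 1024:.0f} MB)")
for name, pid, rss_kb in top:
    print(f"{name:<12} pid={pid:<6} {rss_kb} kB")
```

Comparing `total_kb` with the host's physical memory shows whether the killed process alone explains the shortage or whether several processes contributed.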
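Calculating actual memory usage: RSS is counted in physical memory pages, and each page is 4 kB (the default page size on x86_64). The rss column printed in the messages log is the number of physical pages the process occupies. For example, nmap has the highest value here, 422656 pages: 422656 × 4 kB = 1690624 kB, and 1690624 kB ÷ 1024 = 1651 MB.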
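The same arithmetic as a short sketch, with a cross-check against the Killed process line. It assumes 4 kB pages; on the affected host the page size can be confirmed with `getconf PAGE_SIZE`, which reports it in bytes (4096 for 4 kB). The helper names are illustrative.

```python
PAGE_KB = 4  # assumption: 4 kB pages, the x86_64 default


def pages_to_kb(pages, page_kb=PAGE_KB):
    """Convert a page count from the rss or total_vm column to kB."""
    return pages * page_kb


def kb_to_mb(kb):
    """Convert kB to MB (1 MB = 1024 kB)."""
    return kb / 1024


nmap_rss_pages = 422656  # rss column of the nmap row in the task dump
nmap_rss_kb = pages_to_kb(nmap_rss_pages)
print(nmap_rss_kb, kb_to_mb(nmap_rss_kb))  # 1690624 1651.0

# Cross-check with the Killed process line: rss should equal
# anon-rss + file-rss + shmem-rss, and total_vm pages should equal total-vm.
print(nmap_rss_kb == 1690624 + 0 + 0)  # True
print(pages_to_kb(678212) == 2712848)  # True
```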