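I run an sshd container service locally, but the container is restarted frequently, which breaks SSH connections. Inspecting the pod with kubectl describe pod shows that the container terminated with exit code 137 (128 + 9, meaning the process was killed by SIGKILL). This usually points to a problem in the system environment rather than in the application, most commonly insufficient memory. See: Container exits with non-zero exit code 137
Relevant part of the kubectl describe pod output: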
Started: Tue, 20 Nov 2018 12:14:42 +0800
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Mon, 19 Nov 2018 14:18:22 +0800
Finished: Tue, 20 Nov 2018 12:14:16 +0800
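The exit code above can be decoded with a few lines of C. This is only a minimal sketch of the common "128 + signal number" convention used by shells and container runtimes; the names exit_code_to_signal, describe_exit_code and SIGKILL_NUMBER are made up for illustration, and the signal range 1..64 is an assumption that holds on Linux.

```c
#include <stdio.h>

/* Assumption: SIGKILL is signal 9, as on Linux. */
#define SIGKILL_NUMBER 9

/* Return the signal number encoded in an exit code that follows the
 * "128 + N" convention, or 0 if the code is outside that range. */
int exit_code_to_signal(int exit_code) {
    if (exit_code > 128 && exit_code <= 128 + 64) {
        return exit_code - 128;
    }
    return 0;
}

/* Print a one-line interpretation of an exit code. */
void describe_exit_code(int exit_code) {
    int sig = exit_code_to_signal(exit_code);
    if (sig == SIGKILL_NUMBER) {
        printf("exit code %d = 128 + %d (SIGKILL)\n", exit_code, sig);
    } else if (sig != 0) {
        printf("exit code %d = 128 + %d (signal %d)\n", exit_code, sig, sig);
    } else {
        printf("exit code %d: not a signal exit\n", exit_code);
    }
}
```

Calling describe_exit_code(137) prints "exit code 137 = 128 + 9 (SIGKILL)". SIGKILL alone does not prove an out-of-memory kill, since other components can send it too, so the kernel log still has to be checked.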
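The kernel log on the node shows sshd stalling for more than 20 seconds on a page allocation, followed by a dump of the memory state:
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91374.975004] sshd: page allocation stalls for 20388ms, order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE)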
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91374.984454] CPU: 3 PID: 1257 Comm: sshd Not tainted 4.9.0-7-amd64 #1 Debian 4.9.110-3+deb9u2
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91374.988477] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91374.995081] 0000000000000000 ffffffff90d30694 ffffffff91401218 ffffb76e46b5fb60
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.004170] ffffffff90b89d0a 024200ca00000006 ffffffff91401218 ffffb76e46b5fb00
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.004170] ffff8b5300000010 ffffb76e46b5fb70 ffffb76e46b5fb20 286e078452d92816
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.004170] Call Trace:
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.004170] [<ffffffff90d30694>] ? dump_stack+0x5c/0x78
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<ffffffff90b89d0a>] ? warn_alloc+0x13a/0x160
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<ffffffff90b8a735>] ? __alloc_pages_slowpath+0x995/0xbf0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<ffffffff9100fee1>] ? __schedule+0x241/0x6f0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<ffffffff90a1bc81>] ? xen_clocksource_get_cycles+0x11/0x20
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<ffffffff90aef21e>] ? ktime_get+0x3e/0xb0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<ffffffff90b8ab91>] ? __alloc_pages_nodemask+0x201/0x260
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<ffffffff90bdd39e>] ? alloc_pages_vma+0xae/0x260
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<ffffffff90bb43c9>] ? wp_page_copy+0x89/0x700
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<ffffffff90bb56c1>] ? do_wp_page+0x161/0x7e0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<ffffffff90bc5261>] ? page_add_file_rmap+0x11/0x110
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<ffffffff90bb7812>] ? alloc_set_pte+0x3c2/0x550
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<ffffffff90bb8422>] ? handle_mm_fault+0x832/0x1280
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<ffffffff90a61015>] ? __do_page_fault+0x255/0x4f0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<ffffffff91016018>] ? page_fault+0x28/0x30
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.110910] Mem-Info:
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.113701] active_anon:3868989 inactive_anon:1176 isolated_anon:0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.113701] active_file:23607 inactive_file:21078 isolated_file:720
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.113701] unevictable:0 dirty:8 writeback:0 unstable:0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.113701] slab_reclaimable:16746 slab_unreclaimable:57137
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.113701] mapped:38107 shmem:3568 pagetables:20708 bounce:0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.113701] free:33020 free_pcp:370 free_cma:0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.143413] Node 0 active_anon:15475956kB inactive_anon:4704kB active_file:94428kB inactive_file:84312kB unevictable:0kB isolated(anon):0kB isolated(file):2880kB mapped:152428kB dirty:32kB writeback:0kB shmem:14272kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 135168kB writeback_tmp:0kB unstable:0kB pages_scanned:50 all_unreclaimable? no
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.169765] Node 0 DMA free:15904kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.195526] lowmem_reserve[]: 0 3741 16011 16011 16011
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.201354] Node 0 DMA32 free:64640kB min:15776kB low:19720kB high:23664kB active_anon:3624584kB inactive_anon:144kB active_file:17908kB inactive_file:15952kB unevictable:0kB writepending:0kB present:3915776kB managed:3850208kB mlocked:0kB slab_reclaimable:11200kB slab_unreclaimable:45456kB kernel_stack:12508kB pagetables:16536kB bounce:0kB free_pcp:508kB local_pcp:0kB free_cma:0kB
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.230119] lowmem_reserve[]: 0 0 12270 12270 12270
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.235036] Node 0 Normal free:51536kB min:51740kB low:64672kB high:77604kB active_anon:11851568kB inactive_anon:4560kB active_file:76636kB inactive_file:68596kB unevictable:0kB writepending:32kB present:12845056kB managed:12569296kB mlocked:0kB slab_reclaimable:55784kB slab_unreclaimable:183092kB kernel_stack:47380kB pagetables:66296kB bounce:0kB free_pcp:1056kB local_pcp:0kB free_cma:0kB
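Two things can be read off the Mem-Info dump above. The bare counters are in pages, so active_anon:3868989 is 3868989 × 4 kB = 15475956 kB, which matches the "Node 0 active_anon:15475956kB" line and is about 94% of the 16435408 kB managed across the three zones. The Normal zone is also below its min watermark (free:51536kB against min:51740kB), which is consistent with the allocation ending up in __alloc_pages_slowpath. The sketch below redoes both checks in C; it assumes 4 kB pages and the line layout shown in this log, and the names zone_info, pages_to_kb, parse_zone_line and below_min are hypothetical helpers, not part of any kernel tool.

```c
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE_KB 4L /* assumption: 4 kB pages (x86-64 default) */

/* Watermarks of one memory zone, in kB, as printed by the kernel. */
struct zone_info {
    char name[16];
    long free_kb, min_kb, low_kb, high_kb;
};

/* Convert a page count from the Mem-Info block to kB. */
long pages_to_kb(long pages) {
    return pages * PAGE_SIZE_KB;
}

/* Parse a "Node N <zone> free:..kB min:..kB low:..kB high:..kB" line.
 * Any syslog prefix before "Node " is skipped.
 * Returns 1 on success, 0 if the line is not a per-zone line. */
int parse_zone_line(const char *line, struct zone_info *z) {
    const char *p = strstr(line, "Node ");
    if (p == NULL) {
        return 0;
    }
    int n = sscanf(p, "Node %*d %15s free:%ldkB min:%ldkB low:%ldkB high:%ldkB",
                   z->name, &z->free_kb, &z->min_kb, &z->low_kb, &z->high_kb);
    return n == 5;
}

/* A zone whose free memory is below the min watermark is under severe pressure. */
int below_min(const struct zone_info *z) {
    return z->free_kb < z->min_kb;
}
```

Fed the three per-zone lines from this log, below_min is true only for the Normal zone. DMA32 (free:64640kB, min:15776kB) and DMA are still above their min watermarks.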
Appendix: analyzing CPU usage with perf
Test program (built as the binary test):
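#include <stdio.h>
#include <stdlib.h>

/* AA spins forever so that it dominates the CPU profile. */
void AA(void) {
    /* volatile keeps the increment from being optimized away;
       unsigned makes the eventual wrap-around well defined. */
    volatile unsigned int i = 0;
    while (1) {
        i++;
    }
}

/* BB returns immediately and should barely show up in the profile. */
void BB(void) {
    printf("BB\n");
}

int main(void) {
    BB();
    AA();       /* never returns */
    return 0;
}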
top output while test is running:
%Cpu(s): 50.0 us, 8.3 sy, 0.0 ni, 41.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
perf top output; nearly all samples fall in AA:
Samples: 699K of event 'cpu-clock', Event count (approx.): 8217702565
Overhead Shared Object Symbol
99.68% test [.] AA
0.12% [kernel] [k] _raw_spin_unlock_irqrestore
0.06% [kernel] [k] __do_softirq
0.02% [kernel] [k] e1000_xmit_frame
0.01% libc-2.17.so [.] _int_malloc
0.00% [kernel] [k] clear_page
0.00% libvmtools.so.0.0.0 [.] Backdoor_InOut
0.00% [kernel] [k] kstat_irqs
Record call graphs on all CPUs for 10 seconds, then print the report:
# perf record -a -e cycles -o cycle.perf -g sleep 10
# perf report -i cycle.perf | more
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 22K of event 'cpu-clock'
# Event count (approx.): 5736750000
#
# Children Self Command Shared Object Symbol
# ........ ........ ............... ................... ..........................................................................
#
88.97% 0.00% test libc-2.17.so [.] __libc_start_main
|
---__libc_start_main
main
AA
88.97% 0.00% test test [.] main
|
---main
AA
88.97% 88.88% test test [.] AA
|
--88.88%--__libc_start_main
main
AA
TIPS:
See also:
https://utcc.utoronto.ca/~cks/space/blog/linux/DecodingPageAllocFailures
https://www.cnblogs.com/004x/p/6651600.htm
http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html
https://utcc.utoronto.ca/~cks/space/blog/linux/KernelMemoryZones
https://blog.csdn.net/lickylin/article/details/50726847
http://www.10tiao.com/html/497/201606/2456160252/1.html
https://www.kernel.org/doc/Documentation/filesystems/proc.txt