Ops: k8s pod error exit code 137

This problem occurs when the application triggers an OOM (out-of-memory) condition; the resulting kill signal then terminates the pod.

Issue

If a container is no longer running, use the following command to find the status of the container:

docker container ls -a

This article explains possible reasons for the following exit code:

"task: non-zero exit (137)"

With exit code 137, you might also notice a status of Shutdown or the following failed message:

Failed 42 hours ago
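
As a quick aid for reading such codes, here is a minimal sketch of the usual convention that a process ended by signal N is reported as exit code 128 + N. The function name decode_exit_code is made up for this illustration and is not part of Docker or Kubernetes.

```shell
# Sketch: map an exit code back to the signal that ended the process.
# Convention: a process terminated by signal N is reported as 128 + N.
decode_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # `kill -l N` prints the name of signal N without the SIG prefix.
    echo "terminated by signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exited with status $code (not a signal)"
  fi
}

decode_exit_code 137   # prints: terminated by signal 9 (SIGKILL)
decode_exit_code 143   # prints: terminated by signal 15 (SIGTERM)
```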

Resolution

The "task: non-zero exit (137)" message means the process was terminated by SIGKILL, the equivalent of kill -9: exit codes above 128 encode 128 plus the signal number, and 137 = 128 + 9. There are two common causes (seen most often with Java applications):

  1. The container received a docker stop, and the application didn't gracefully handle SIGTERM (kill -15). When docker stop is issued, the Docker daemon sends SIGTERM, waits 10 seconds by default, then issues a SIGKILL (kill -9) to guarantee the shutdown. To test whether your containerized application handles SIGTERM correctly, issue a docker stop against the container ID and check whether you get "task: non-zero exit (137)". Do not test this in a production environment, as you can expect at least a brief interruption of service; use a development or test Docker environment instead.
  2. The application hit an OOM (out of memory) condition. To confirm this, review the kernel logs of the node the failed container was running on; if you don't know which node that was, check all nodes. Run something like this on your node(s) to identify whether a container hit an OOM condition:

     journalctl -k | grep -i -e memory -e oom

     Another option is to inspect the (failed) container:

     docker inspect <container ID>

     Review the application's memory requirements and ensure that the container it's running in has sufficient memory. Conversely, set a limit on the container's memory so that, wherever it runs, it does not consume memory to the detriment of the node. If the application is Java-based, you may want to review its maximum memory configuration settings.
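
For the first case, the following is a rough sketch of what handling SIGTERM can look like in a shell entrypoint. The serve function and the sleep worker are hypothetical stand-ins for a real application; a Java service would use its own shutdown mechanism instead.

```shell
# Sketch of a SIGTERM-aware service. `serve` and the `sleep` worker are
# hypothetical stand-ins for a real application process.
serve() {
  # On SIGTERM: log, stop the worker, and exit 0 before any SIGKILL arrives.
  trap 'echo "caught SIGTERM, shutting down"; kill "$worker" 2>/dev/null; exit 0' TERM
  sleep 30 &          # stand-in for the real workload
  worker=$!
  wait "$worker"      # wait is interruptible, so the trap runs promptly
}

# Demo: run the service in the background, signal it the way docker stop
# would, and record how it exited.
( serve ) &
svc=$!
sleep 1               # give the subshell time to install its trap
kill -TERM "$svc"
wait "$svc"
svc_status=$?
echo "service exit status: $svc_status"
```

A status of 0 here means the handler ran. Without the trap, the same signal would end the process with 143 (128 + 15), and a process that ignores SIGTERM would be sent SIGKILL after the grace period and report 137.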

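For the second case, docker inspect reports State.ExitCode and State.OOMKilled for a stopped container, which can help tell the two causes apart. The helper below is a hypothetical sketch (explain_exit is not a Docker command) that interprets those two values; it is demonstrated with sample inputs rather than a live container.

```shell
# Hypothetical helper: interpret two fields reported by `docker inspect`
# for a stopped container, State.ExitCode and State.OOMKilled.
explain_exit() {
  exit_code=$1
  oom_killed=$2
  if [ "$oom_killed" = "true" ]; then
    echo "OOM: the container was killed for running out of memory"
  elif [ "$exit_code" -eq 137 ]; then
    echo "SIGKILL, OOMKilled is false: possibly a docker stop timeout; check the kernel log to rule out OOM"
  else
    echo "exit code $exit_code: not a SIGKILL"
  fi
}

# Usage against a real container (requires Docker; <container ID> is a placeholder):
#   explain_exit $(docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' <container ID>)

# Offline demonstration with sample values:
explain_exit 137 true
explain_exit 137 false
```
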
References

  • docker run command line options
  • Specify hard limits on memory available to containers (-m, --memory)

Originally published on the WeChat official account 云计算与大数据 (heidcloud)

Original publication date: 2018-10-23
