Running into "mpi worker exited on signal 9"

runzhliu
Published 2020-08-06 10:46:17
Collected in the column: 容器计算 (Container Computing)

While running an mpi-operator demo (one that I submitted myself, no less…), I hit the following error.

An MPI communication peer process has unexpectedly disconnected.  This
usually indicates a failure in the peer process (e.g., a crash or
otherwise exiting without calling MPI_FINALIZE first).

Although this local MPI process will likely now behave unpredictably
(it may even hang or crash), the root cause of this problem is the
failure of the peer -- that is what you need to investigate.  For
example, there may be a core file that you can examine.  More
generally: such peer hangups are frequently caused by application bugs
or other external events.

  Local host: mpi-sleep-worker-0
  Local PID:  99
  Peer host:  mpi-sleep-worker-1
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 58 on node mpi-sleep-worker-1 exited on signal 9 (Killed).
--------------------------------------------------------------------------
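When scanning launcher logs for this kind of failure, the last line is the one that identifies which rank died and how. Below is a minimal sketch in Python that pulls the rank, PID, node, and signal name out of such a line. The helper name `parse_mpirun_exit` and the regular expression are my own assumptions, modelled only on the single line shown above; other Open MPI versions may word the message differently.

```python
import re
import signal

# Pattern modelled on the one mpirun line in the log above (an assumption,
# not an official Open MPI log format).
EXIT_RE = re.compile(
    r"process rank (?P<rank>\d+) with PID (?P<pid>\d+) "
    r"on node (?P<node>\S+) exited on signal (?P<signo>\d+)"
)

def parse_mpirun_exit(line):
    """Return rank, pid, node and signal name from an mpirun exit line, or None."""
    m = EXIT_RE.search(line)
    if m is None:
        return None
    signo = int(m.group("signo"))
    return {
        "rank": int(m.group("rank")),
        "pid": int(m.group("pid")),
        "node": m.group("node"),
        # Map the number to its symbolic name on this platform (Linux assumed).
        "signal": signal.Signals(signo).name,
    }

line = ("mpirun noticed that process rank 1 with PID 58 on node "
        "mpi-sleep-worker-1 exited on signal 9 (Killed).")
print(parse_mpirun_exit(line))
```

On Linux, signal number 9 maps to `SIGKILL`, which a process cannot catch or handle.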

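After digging for quite a while, I found that the Worker had been configured with too little memory (only 1Gi before). Signal 9 is SIGKILL, which is consistent with the container exceeding its memory limit and being killed from outside rather than crashing on its own. If you want to run this demo, raise the Worker memory to 2Gi.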
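As a rough illustration of the change, here is a sketch in Python of a container `resources` stanza for the Worker, written as a plain dict. The `requests`/`limits` layout is the standard Kubernetes container resource format; where it sits inside the MPIJob manifest depends on the mpi-operator version, so that placement is deliberately left out. The helper names `to_bytes` and `worker_resources` are hypothetical, and setting the request equal to the limit is my assumption, not something the demo prescribes.

```python
# Binary suffixes used by Kubernetes memory quantities.
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}

def to_bytes(quantity):
    """Convert a quantity such as '2Gi' to bytes (binary suffixes only)."""
    number, suffix = quantity[:-2], quantity[-2:]
    return int(number) * UNITS[suffix]

def worker_resources(memory):
    """Build a container `resources` stanza with equal request and limit."""
    return {"requests": {"memory": memory}, "limits": {"memory": memory}}

old, new = "1Gi", "2Gi"
# Sanity check that the new setting really is larger than the old one.
assert to_bytes(new) > to_bytes(old)
print(worker_resources(new))
```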

Originally published: 2020-06-09
