【NotReady】Troubleshooting a newly added node in a Kubernetes cluster

Author: 宝耶需努力
Published 2023-08-26 15:42:03
Column: Cloud-DIY

1. Check the node overview 🔎

The newly added node is in the NotReady state.

[root@master01 ~]# kubectl get nodes
NAME       STATUS     ROLES           AGE    VERSION
master01   Ready      control-plane   24h    v1.28.0
node01     Ready      <none>          24h    v1.28.0
node02     NotReady   <none>          122m   v1.28.0
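
On a larger cluster it can help to filter that table instead of reading it by eye. The following is a minimal sketch, not part of the original write-up: `not_ready_nodes` is a hypothetical helper name, and it assumes the default column layout of `kubectl get nodes`, where the second column is STATUS.

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# Assumption: column 1 is NAME and column 2 is STATUS (the default layout).
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}
```

A typical call would be `kubectl get nodes --no-headers | not_ready_nodes`. Note that a cordoned node shows up as `Ready,SchedulingDisabled` and would also be listed by this filter.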

Look at the details of this node:

[root@master01 ~]# kubectl describe node node02
Name:               node02
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node02
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 24 Aug 2023 09:59:44 +0800
Taints:             node.kubernetes.io/not-ready:NoExecute
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  node02
  AcquireTime:     <unset>
  RenewTime:       Thu, 24 Aug 2023 12:02:30 +0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 24 Aug 2023 12:00:06 +0800   Thu, 24 Aug 2023 10:07:57 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 24 Aug 2023 12:00:06 +0800   Thu, 24 Aug 2023 10:07:57 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 24 Aug 2023 12:00:06 +0800   Thu, 24 Aug 2023 10:07:57 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Thu, 24 Aug 2023 12:00:06 +0800   Thu, 24 Aug 2023 10:07:57 +0800   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Addresses:
  InternalIP:  192.168.20.30
  Hostname:    node02
Capacity:
  cpu:                4
  ephemeral-storage:  27245572Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8107004Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  25109519114
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8004604Ki
  pods:               110
System Info:
  Machine ID:                 8f112fe303914f1e8e27c6b68d205117
  System UUID:                cccb4d56-2724-7bd9-9a5d-25df2e878d03
  Boot ID:                    ee9e1155-e71e-41a2-b07c-d621654a7429
  Kernel Version:             5.14.0-284.25.1.el9_2.x86_64
  OS Image:                   Rocky Linux 9.2 (Blue Onyx)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.23
  Kubelet Version:            v1.28.0
  Kube-Proxy Version:         v1.28.0
PodCIDR:                      10.10.2.0/24
PodCIDRs:                     10.10.2.0/24
Non-terminated Pods:          (2 in total)
  Namespace                   Name                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                     ------------  ----------  ---------------  -------------  ---
  kube-flannel                kube-flannel-ds-skdz2    100m (2%)     0 (0%)      50Mi (0%)        0 (0%)         122m
  kube-system                 kube-proxy-zj662         0 (0%)        0 (0%)      0 (0%)           0 (0%)         122m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (2%)  0 (0%)
  memory             50Mi (0%)  0 (0%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-1Gi      0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
Events:              <none>
[root@master01 ~]#

The message on the Ready condition is the relevant part:

container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

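The node details point to a network problem: the CNI plugin has not been initialized. The next step is to check how the network-related Pods are running.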
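
The same message can be pulled out without scanning the full `describe` output. This is a small sketch under the assumption that `kubectl` is configured for the cluster; `node_ready_message` is a hypothetical helper name, while the JSONPath filter expression is standard `kubectl` output syntax.

```shell
# Print only the message of a node's Ready condition.
# Usage: node_ready_message <node-name>
node_ready_message() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
}
```

For the node in this article the call would be `node_ready_message node02`.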

2. Check the Pods 🔎

kubectl get pods --all-namespaces
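
Listing every Pod in the cluster can be noisy when only one node is affected. As a hedged sketch, the list can be narrowed to a single node with a field selector; `pods_on_node` is a hypothetical helper name, and `spec.nodeName` is the Pod field that records where a Pod is scheduled.

```shell
# List the Pods scheduled on one node, across all namespaces.
# Usage: pods_on_node <node-name>
pods_on_node() {
  kubectl get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=$1"
}
```

Here that would be `pods_on_node node02`, which should include the flannel and kube-proxy Pods shown in the node description.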

Describe the corresponding Pod:

kubectl describe pods/kube-flannel-ds-skdz2 -n kube-flannel

The Pod's events contain the following warning:

Warning  FailedCreatePodSandBox  3s (x219 over 48m)    kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/7be4fcbe7145f777339cd0a3e43223c9861058af77e2e528b58138aebcbce56d/log.json: no such file or directory): exec: "runc": executable file not found in $PATH: unknown

🔴 This reveals the problem: the runc executable cannot be found in $PATH on the node.
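
Before reinstalling anything, the finding can be confirmed on the node itself. The sketch below is an illustration, not the author's procedure: `check_on_path` is a hypothetical helper that reports whether a name resolves through $PATH and, if it does not, whether a directory with that name exists in one of the PATH entries.

```shell
# Report whether an executable called "$1" resolves through $PATH.
# If it does not, also point out any PATH entry where that name exists
# as a directory instead of a binary.
check_on_path() {
  local name=$1 resolved dir
  if resolved=$(command -v "$name" 2>/dev/null); then
    echo "found: $resolved"
    return 0
  fi
  local old_ifs=$IFS
  IFS=:
  for dir in $PATH; do
    if [ -d "${dir:-.}/$name" ]; then
      echo "directory, not a binary: ${dir:-.}/$name"
    fi
  done
  IFS=$old_ifs
  echo "not found in PATH: $name"
  return 1
}
```

On node02 the call would be `check_on_path runc`. One caveat: containerd is started by systemd and may see a different PATH than an interactive root shell, so a hit here is a strong hint rather than proof.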

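3. Root cause and fix ✅

🟢 Tracing the problem back showed that runc had been installed to the wrong path on the node. The installation steps are listed below; re-check the path that runc ends up in.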

2️⃣ Step 2: Installing runc

# Download the matching release from https://github.com/opencontainers/runc/releases

$ wget https://github.com/opencontainers/runc/releases/download/v1.1.9/runc.amd64

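# Note: do not run `mkdir -p /usr/local/sbin/runc` before this step.
# If a directory with that name exists, install copies the file to
# /usr/local/sbin/runc/runc.amd64 and runc is not found in $PATH.
# /usr/local/sbin/runc must be the binary itself.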

$ install -m 755 runc.amd64 /usr/local/sbin/runc

[root@node02 ~]# ll /usr/local/sbin/
total 10436
-rwxr-xr-x 1 root root 10684992 Aug 24 14:53 runc
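
`install` behaves differently depending on whether the target path already exists as a directory, which is one way runc can end up outside $PATH. The following sketch reproduces both cases in a scratch directory so nothing under /usr/local is touched; the file `runc.amd64` here is a stand-in script, not the real binary, and the directory names `wrong` and `right` are made up for the demonstration.

```shell
# Work in a throwaway directory.
scratch=$(mktemp -d)
printf '#!/bin/sh\necho fake-runc\n' > "$scratch/runc.amd64"   # stand-in for the download

# Case 1: the target name already exists as a directory.
mkdir -p "$scratch/wrong/runc"
install -m 755 "$scratch/runc.amd64" "$scratch/wrong/runc"
# The file lands inside the directory under its original name.
ls "$scratch/wrong/runc"          # prints: runc.amd64

# Case 2: only the parent directory exists.
mkdir -p "$scratch/right"
install -m 755 "$scratch/runc.amd64" "$scratch/right/runc"
# The target is now a regular executable file named runc.
ls -l "$scratch/right"
```

In case 1 a lookup for `runc` fails even though the parent directory is on $PATH, because `runc` is a directory and the executable inside it is called `runc.amd64`. Case 2 matches the listing of /usr/local/sbin/ shown above.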

With runc in place, the network-related Pods are back to normal:

[root@master01 ~]# kubectl get pods -n kube-flannel
NAME                    READY   STATUS    RESTARTS      AGE
kube-flannel-ds-jmsr9   1/1     Running   0             95m
kube-flannel-ds-jpc9k   1/1     Running   2 (45m ago)   95m
kube-flannel-ds-nlr95   1/1     Running   2 (44m ago)   95m

Check the details of node02 again:

[root@master01 ~]# kubectl describe node node02
Name:               node02
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node02
                    kubernetes.io/os=linux
Annotations:        csi.volume.kubernetes.io/nodeid: {"rook-ceph.cephfs.csi.ceph.com":"node02","rook-ceph.rbd.csi.ceph.com":"node02"}
                    flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"12:2a:cd:4a:6a:7c"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.20.30
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 24 Aug 2023 09:59:44 +0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  node02
  AcquireTime:     <unset>
  RenewTime:       Thu, 24 Aug 2023 15:45:00 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Thu, 24 Aug 2023 14:59:09 +0800   Thu, 24 Aug 2023 14:59:09 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Thu, 24 Aug 2023 15:40:31 +0800   Thu, 24 Aug 2023 14:06:28 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Thu, 24 Aug 2023 15:40:31 +0800   Thu, 24 Aug 2023 14:06:28 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Thu, 24 Aug 2023 15:40:31 +0800   Thu, 24 Aug 2023 14:06:28 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Thu, 24 Aug 2023 15:40:31 +0800   Thu, 24 Aug 2023 14:59:09 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.20.30
  Hostname:    node02
Capacity:
  cpu:                4
  ephemeral-storage:  27245572Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8107012Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  25109519114
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8004612Ki
  pods:               110
System Info:
  Machine ID:                 8f112fe303914f1e8e27c6b68d205117
  System UUID:                cccb4d56-2724-7bd9-9a5d-25df2e878d03
  Boot ID:                    2f59eb8b-d2cc-41c5-874a-a2d31a2c0da6
  Kernel Version:             5.14.0-284.25.1.el9_2.x86_64
  OS Image:                   Rocky Linux 9.2 (Blue Onyx)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.23
  Kubelet Version:            v1.28.0
  Kube-Proxy Version:         v1.28.0
PodCIDR:                      10.10.2.0/24
PodCIDRs:                     10.10.2.0/24
Non-terminated Pods:          (8 in total)
  Namespace                   Name                                                CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                                ------------  ----------  ---------------  -------------  ---
  kube-flannel                kube-flannel-ds-jmsr9                               100m (2%)     0 (0%)      50Mi (0%)        0 (0%)         97m
  kube-system                 kube-proxy-zj662                                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         5h45m
  rook-ceph                   csi-cephfsplugin-dbfgd                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         45m
  rook-ceph                   csi-rbdplugin-xvccs                                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         45m
  rook-ceph                   rook-ceph-crashcollector-node02-796978746f-7zfm9    0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  rook-ceph                   rook-ceph-mgr-a-54bf4765f-lskgr                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         39m
  rook-ceph                   rook-ceph-mon-c-b467f78dd-7bwz4                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         45m
  rook-ceph                   rook-ceph-osd-0-7575dcff-bpglm                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (2%)  0 (0%)
  memory             50Mi (0%)  0 (0%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-1Gi      0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
Events:
  Type     Reason                   Age                From             Message
  ----     ------                   ----               ----             -------
  Normal   Starting                 48m                kube-proxy
  Normal   Starting                 98m                kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity      98m                kubelet          invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced  98m                kubelet          Updated Node Allocatable limit across pods
  Normal   NodeHasNoDiskPressure    98m (x2 over 98m)  kubelet          Node node02 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     98m (x2 over 98m)  kubelet          Node node02 status is now: NodeHasSufficientPID
  Warning  Rebooted                 98m                kubelet          Node node02 has been rebooted, boot id: 1a7c4fda-ca1d-4db9-8af0-186ec828da5b
  Normal   NodeNotReady             98m                kubelet          Node node02 status is now: NodeNotReady
  Normal   NodeHasSufficientMemory  98m (x2 over 98m)  kubelet          Node node02 status is now: NodeHasSufficientMemory
  Normal   RegisteredNode           48m                node-controller  Node node02 event: Registered Node node02 in Controller
  Warning  InvalidDiskCapacity      48m                kubelet          invalid capacity 0 on image filesystem
  Normal   Starting                 48m                kubelet          Starting kubelet.
  Normal   NodeHasSufficientMemory  48m                kubelet          Node node02 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    48m                kubelet          Node node02 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     48m                kubelet          Node node02 status is now: NodeHasSufficientPID
  Warning  Rebooted                 48m (x2 over 48m)  kubelet          Node node02 has been rebooted, boot id: 2f59eb8b-d2cc-41c5-874a-a2d31a2c0da6
  Normal   NodeAllocatableEnforced  48m                kubelet          Updated Node Allocatable limit across pods
  Normal   NodeReady                45m                kubelet          Node node02 status is now: NodeReady
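
Instead of re-running `describe` until the Ready condition flips, the check can be made to block until the node is Ready. A minimal sketch, assuming `kubectl` access from the control-plane host; `wait_node_ready` is a hypothetical wrapper and the 120s default timeout is an arbitrary choice.

```shell
# Block until the given node reports condition Ready, or time out.
# Usage: wait_node_ready <node-name> [timeout]
wait_node_ready() {
  local node=$1
  local timeout=${2:-120s}
  kubectl wait --for=condition=Ready "node/$node" --timeout="$timeout"
}
```

For example, `wait_node_ready node02 5m` returns a non-zero exit status if the node is still NotReady after five minutes, which makes it usable in scripts that add nodes.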

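4. Summary 🎇

This exercise gave me a working approach for this kind of failure: start from the node status, follow the condition message to the affected Pod, read the Pod's events, and trace the error back to its root cause, which here was runc missing from $PATH. Going straight to the source of the error is what makes the fix quick. I still need to keep building my troubleshooting skills.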

Originally published 2023-08-25 on the author's personal site/blog.
