
K8s Study Notes: Cleaning Up the Environment After kubeadm reset

Author: Jetpropelledsnake21 · Published 2022-09-27

0x00 Overview

This article records the problems encountered after running kubeadm reset, while reinstalling the cluster and joining and managing nodes.

0x01 Cleanup After kubeadm reset

iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X   # flush iptables rules and delete custom chains
ipvsadm --clear              # clear IPVS rules (if using the ipvs proxy mode)
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/*        # CNI state
rm -rf /var/lib/kubelet/*    # kubelet data (pods, volumes, certificates)
rm -rf /etc/cni/*            # CNI network configuration
rm -rf $HOME/.kube/config    # stale kubeconfig from the old cluster
systemctl start docker

Flush iptables or IPVS according to the proxy mode you use. If a later attempt to join the cluster complains that some other directory still contains files, you can simply rm -rf the directory it names, as in the example below.
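For example, kubeadm's preflight checks refuse to proceed while old data directories are non-empty. A hypothetical error and fix (the exact directory depends on what your own init/join output reports):

# Example only: kubeadm init/join preflight may report something like
#   [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
# Remove the directory it names, then retry the join:
rm -rf /var/lib/etcd
kubeadm join <control-plane-endpoint> --token <token> --discovery-token-ca-cert-hash sha256:<hash>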

0x02 Problems When Reinstalling the Cluster After kubeadm reset

2.1 Calico errors

After running kubeadm reset and starting to reinstall the cluster, you will see many Calico-related error logs, including but not limited to the following.

The logs below appeared after running kubectl apply -f calico.yaml.

Calico BGP errors

Warning Unhealthy pod/calico-node-k6tz5 Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory

Warning Unhealthy pod/calico-node-k6tz5 Liveness probe failed: calico/node is not ready: bird/confd is not live: exit status 1

Warning BackOff pod/calico-node-k6tz5 Back-off restarting failed container

One workaround is to pin Calico to a specific network interface (see the link referenced in the original post, and the sketch below), but the errors persisted even after changing the interface; follow section 2.2 instead (the root cause is kube-proxy).
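For reference, pinning the interface is usually done through Calico's IP autodetection setting. A minimal sketch, assuming the calico-node DaemonSet from calico.yaml and that the host's uplink matches eth.* (adjust the pattern to your NIC naming):

# In calico.yaml, under the calico-node container's env section:
#   - name: IP_AUTODETECTION_METHOD
#     value: "interface=eth.*"
# Or patch a running DaemonSet directly:
kubectl -n kube-system set env daemonset/calico-node IP_AUTODETECTION_METHOD=interface=eth.*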

dial tcp 10.96.0.1:443: connect: connection refused error

Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: connect: connection refused

Calico liveness and readiness probe errors

  Warning  Unhealthy       69m (x2936 over 12d)   kubelet  Readiness probe errored: rpc error: code = Unknown desc = operation timeout: context deadline exceeded
  Warning  Unhealthy       57m (x2938 over 12d)   kubelet  Liveness probe errored: rpc error: code = Unknown desc = operation timeout: context deadline exceeded
  Warning  Unhealthy       12m (x6 over 13m)      kubelet  Liveness probe failed: container is not running
  Normal   SandboxChanged  11m (x2 over 13m)      kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Killing         11m (x2 over 13m)      kubelet  Stopping container calico-node
  Warning  Unhealthy       8m3s (x32 over 13m)    kubelet  Readiness probe failed: container is not running
  Warning  Unhealthy       4m45s (x6 over 5m35s)  kubelet  Liveness probe failed: container is not running
  Normal   SandboxChanged  3m42s (x2 over 5m42s)  kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Killing         3m42s (x2 over 5m42s)  kubelet  Stopping container calico-node
  Warning  Unhealthy       42s (x31 over 5m42s)   kubelet  Readiness probe failed: container is not running

2.2 kube-proxy errors

The kube-proxy errors include Failed to retrieve node info: Unauthorized and Failed to list *v1.Endpoints: Unauthorized, as shown below:

W0430 12:33:28.887260       1 server_others.go:267] Flag proxy-mode="" unknown, assuming iptables proxy
W0430 12:33:28.913671       1 node.go:113] Failed to retrieve node info: Unauthorized
I0430 12:33:28.915780       1 server_others.go:147] Using iptables Proxier.
W0430 12:33:28.916065       1 proxier.go:314] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0430 12:33:28.916089       1 proxier.go:319] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0430 12:33:28.917555       1 server.go:555] Version: v1.14.1
I0430 12:33:28.959345       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0430 12:33:28.960392       1 config.go:202] Starting service config controller
I0430 12:33:28.960444       1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I0430 12:33:28.960572       1 config.go:102] Starting endpoints config controller
I0430 12:33:28.960609       1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
E0430 12:33:28.970720       1 event.go:191] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"fh-ubuntu01.159a40901fa85264", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"fh-ubuntu01", UID:"fh-ubuntu01", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kube-proxy.", Source:v1.EventSource{Component:"kube-proxy", Host:"fh-ubuntu01"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbf2a2e0639406264, ext:334442672, loc:(*time.Location)(0x2703080)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbf2a2e0639406264, ext:334442672, loc:(*time.Location)(0x2703080)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Unauthorized' (will not retry!)
E0430 12:33:28.970939       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Endpoints: Unauthorized
E0430 12:33:28.971106       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Service: Unauthorized
E0430 12:33:29.977038       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Endpoints: Unauthorized
E0430 12:33:29.979890       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Service: Unauthorized
E0430 12:33:30.980098       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Endpoints: Unauthorized

Clear kube-proxy's secret (see the link referenced in the original post).

After kubeadm reset, the kube-proxy pods are still using the secret generated by the previous cluster, while re-running kubeadm creates new certificates that no longer match the credentials stored in that old secret. Delete the stale secret (a new one is generated automatically), then delete the affected kube-proxy pods so they restart with the fresh credentials.
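A minimal sketch of that procedure, assuming the default kube-system namespace and the standard k8s-app=kube-proxy label; the secret name below is a placeholder, so check the actual name with the first command (the token suffix is random):

kubectl -n kube-system get secrets | grep kube-proxy            # find the stale token secret
kubectl -n kube-system delete secret kube-proxy-token-xxxxx     # delete it; a fresh one is created automatically
kubectl -n kube-system delete pod -l k8s-app=kube-proxy         # recreate kube-proxy pods with the new secret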

Once the kube-proxy problem is fixed, you will find that the Calico problems from section 2.1 resolve themselves.
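To confirm, you can watch the calico-node pods return to Ready once kube-proxy is healthy again, for example:

kubectl -n kube-system get pods -o wide | grep calico-node   # should show Running and READY 1/1 after recovery
kubectl -n kube-system get pods -l k8s-app=kube-proxy        # kube-proxy pods should also be Running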

0x03 kube-apiserver Errors During kubeadm join

When joining the cluster, kube-apiserver reports an x509 certificate problem:

kube-apiserver[16692]: E0211 14:34:11.507411 16692 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")"

Delete $HOME/.kube (see the link referenced in the original post), as sketched below.
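A minimal sketch of that cleanup, assuming you are on the node whose kubeconfig is stale; on a control-plane node, access is then restored by re-copying /etc/kubernetes/admin.conf, as in the standard kubeadm post-init steps:

rm -rf $HOME/.kube                                         # drop the kubeconfig left over from the previous cluster
# On a control-plane node, restore access with the new cluster's admin config:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config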

0x04 Summary

The problems in 2.1 are essentially caused by 2.2. Deal with the expired secret in the cluster (2.2) first, and the Calico problems in 2.1 will resolve themselves.

These fixes are only one angle on the underlying problems; the notes above are for reference only.
