Still fairly new to kubectl. I have a Rancher test environment (deployed via Terraform) that I'm using to learn on. I got a timeout error when trying to deploy a new k8s cluster to my environment. I looked at the pods and found four helm pods, all with errors:
% kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
cattle-logging rancher-logging-fluentd-linux-6x8vr 2/2 Running 0 20h
cattle-logging rancher-logging-fluentd-linux-9llsf 2/2 Running 0 20h
cattle-logging rancher-logging-fluentd-linux-hhwtb 2/2 Running 0 20h
cattle-logging rancher-logging-fluentd-linux-rzbc8 2/2 Running 0 20h
cattle-logging rancher-logging-log-aggregator-linux-9q6w8 1/1 Running 0 20h
cattle-logging rancher-logging-log-aggregator-linux-b27c4 1/1 Running 0 20h
cattle-logging rancher-logging-log-aggregator-linux-h8q75 1/1 Running 0 20h
cattle-logging rancher-logging-log-aggregator-linux-hhbk7 1/1 Running 0 20h
cattle-system helm-operation-2ztsk 1/2 Error 0 41m
cattle-system helm-operation-7jlwf 1/2 Error 0 12m
cattle-system helm-operation-fv5hq 1/2 Error 0 55m
cattle-system helm-operation-zbdnd 1/2 Error 0 27m
cattle-system rancher-6f77f5cbb4-cs4sp 2/2 Running 0 42m
cattle-system rancher-6f77f5cbb4-gvkv7 2/2 Running 0 42m
cattle-system rancher-6f77f5cbb4-jflnb 2/2 Running 0 42m
cert-manager cert-manager-cainjector-596464bfbd-zj2wg 1/1 Running 0 6h39m
cert-manager cert-manager-df467b89d-c5kdw 1/1 Running 0 6h39m
cert-manager cert-manager-df467b89d-kbvgm 1/1 Running 0 6h39m
cert-manager cert-manager-df467b89d-lndnp 1/1 Running 0 6h40m
cert-manager cert-manager-webhook-55f8bd4b8c-m58n2 1/1 Running 0 6h39m
fleet-system fleet-agent-6688b99df5-n26zf 1/1 Running 0 6h40m
fleet-system fleet-controller-6dc545d5db-f6f2t 1/1 Running 0 6h40m
fleet-system gitjob-84bd8cf9c4-4q95g 1/1 Running 0 6h40m
ingress-nginx nginx-nginx-ingress-controller-58689b79d9-44q95 1/1 Running 0 6h40m
ingress-nginx nginx-nginx-ingress-controller-58689b79d9-blgpf 1/1 Running 0 6h39m
ingress-nginx nginx-nginx-ingress-controller-58689b79d9-wkdg9 1/1 Running 0 6h40m
ingress-nginx nginx-nginx-ingress-default-backend-65d7b58ccc-tbwlk 1/1 Running 0 6h39m
kube-system coredns-799dffd9c4-nmplh 1/1 Running 0 6h39m
kube-system coredns-799dffd9c4-stjhl 1/1 Running 0 6h40m
kube-system coredns-autoscaler-7868844956-qr67l 1/1 Running 0 6h41m
kube-system kube-flannel-5wzd7 2/2 Running 0 20h
kube-system kube-flannel-hm7tc 2/2 Running 0 20h
kube-system kube-flannel-hptdm 2/2 Running 0 20h
kube-system kube-flannel-jjbpq 2/2 Running 0 20h
kube-system kube-flannel-pqfkh 2/2 Running 0 20h
kube-system metrics-server-59c6fd6767-ngrzg 1/1 Running 0 6h40m
kube-system rke-coredns-addon-deploy-job-l7n2b 0/1 Completed 0 20h
kube-system rke-metrics-addon-deploy-job-bkpf2 0/1 Completed 0 20h
kube-system rke-network-plugin-deploy-job-vht9d 0/1 Completed 0 20h
metallb-system controller-7686dfc96b-fn7hw 1/1 Running 0 6h39m
metallb-system speaker-9l8fp 1/1 Running 0 20h
metallb-system speaker-9mxp2 1/1 Running 0 20h
metallb-system speaker-b2ltt 1/1 Running 0 20h
rancher-operator-system rancher-operator-576f654978-5c4kb 1/1 Running 0 6h39m

I wanted to see if restarting the pods would bring them back, but I don't know how to do that. Helm doesn't show up under kubectl get deployments --all-namespaces, so I can't scale the pods or do a kubectl rollout restart.
How do I restart these pods?
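To illustrate the checks described above, a minimal sketch (the pod name helm-operation-2ztsk is taken from the listing; the grep filter and the ownerReferences query are only illustrative):

% kubectl get deployments --all-namespaces | grep -i helm   # nothing matching the helm-operation pods
% kubectl get pod helm-operation-2ztsk -n cattle-system -o jsonpath='{.metadata.ownerReferences}'   # shows which controller, if any, created the pod

Without an owning Deployment there is no object for kubectl rollout restart to act on.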
Posted on 2020-11-03 14:27:30
You can try the following command to get more information about a specific pod, which should help with troubleshooting: kubectl describe pod
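For example, using one of the failing pods from the listing above (the -n flag is needed because the pod lives in the cattle-system namespace):

% kubectl describe pod helm-operation-2ztsk -n cattle-system

The Events section at the end of the output usually states why the pod or one of its containers failed.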
Posted on 2020-11-03 18:56:24
To troubleshoot this, you can work through the following steps (in this order):

Debug the pods by running kubectl describe pods ${POD_NAME} and checking the reason behind the failure.
Examine the pod logs with kubectl logs ${POD_NAME} ${CONTAINER_NAME} or kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}.
Debug further by running commands inside a specific container with kubectl exec.
When kubectl exec is not enough because a container has crashed or the container image does not include debugging utilities (for example, distroless images), kubectl as of v1.18 has an alpha command that can create ephemeral containers for debugging.

These steps should be enough to get to the heart of the problem and then focus on resolving it; example commands for each step are sketched below.
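A minimal sketch of the commands for each step, using one of the failing pods from the question. The container name helm is only an assumption here; substitute the real container names reported by kubectl describe.

# Step 1: inspect the pod and check the Events section for the failure reason
% kubectl describe pods helm-operation-2ztsk -n cattle-system

# Step 2: examine the current and previous logs of the suspect container
# (the container name "helm" is an assumption)
% kubectl logs helm-operation-2ztsk -n cattle-system -c helm
% kubectl logs --previous helm-operation-2ztsk -n cattle-system -c helm

# Step 3: run commands inside a specific container
% kubectl exec -it helm-operation-2ztsk -n cattle-system -c helm -- sh

# Step 4: on kubectl v1.18+, attach an ephemeral debug container (alpha feature,
# so the command name and flags may differ between versions)
% kubectl alpha debug -it helm-operation-2ztsk -n cattle-system --image=busybox --target=helm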
https://stackoverflow.com/questions/64611510