
CoreDNS in CrashLoopBackOff (Kubernetes 1.11)

Stack Overflow user
Asked 2018-11-14 22:27:46
1 answer · 2.2K views · 0 following · 2 votes

I am trying to install Kubernetes on an Ubuntu 16.04 VM, following the instructions at https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/, using Weave as my pod network add-on.

I have seen the similar question "coredns pods have CrashLoopBackOff or Error state", but I did not see a solution there, and the versions I am using are different:

kubeadm         1.11.4-00
kubectl         1.11.4-00
kubelet         1.11.4-00
kubernetes-cni  0.6.0-00
Docker version 1.13.1-cs8, build 91ca5f2
weave script 2.5.0
weave 2.5.0

I am running behind a corporate firewall, so I set my proxy variables and then ran kubeadm init as follows:

# echo $http_proxy
http://135.28.13.11:8080
# echo $https_proxy
http://135.28.13.11:8080
# echo $no_proxy
127.0.0.1,135.21.27.139,135.0.0.0/8,10.96.0.0/12,10.32.0.0/12
# kubeadm init --pod-network-cidr=10.32.0.0/12 
# kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')" 
# kubectl taint nodes --all node-role.kubernetes.io/master-
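
One detail worth double-checking in a proxied environment (general practice, not something from the original post): shell proxy variables are not inherited by the Docker daemon, which runs as a systemd service, so pulls through the proxy usually need a systemd drop-in. A minimal sketch, mirroring the proxy and no_proxy values shown above and assuming the standard drop-in path:

# illustrative drop-in; mirrors the proxy values shown above
mkdir -p /etc/systemd/system/docker.service.d
cat <<'EOF' >/etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://135.28.13.11:8080"
Environment="HTTPS_PROXY=http://135.28.13.11:8080"
Environment="NO_PROXY=127.0.0.1,135.21.27.139,10.96.0.0/12,10.32.0.0/12"
EOF
systemctl daemon-reload
systemctl restart docker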

Both coredns pods stay in CrashLoopBackOff:

# kubectl get pods  --all-namespaces -o wide
NAMESPACE     NAME                                     READY     STATUS             RESTARTS   AGE       IP              NODE             NOMINATED NODE
default       hostnames-674b556c4-2b5h2                1/1       Running            0          5h        10.32.0.6       mtpnjvzonap001   <none>
default       hostnames-674b556c4-4bzdj                1/1       Running            0          5h        10.32.0.5       mtpnjvzonap001   <none>
default       hostnames-674b556c4-64gx5                1/1       Running            0          5h        10.32.0.4       mtpnjvzonap001   <none>
kube-system   coredns-78fcdf6894-s7rvx                 0/1       CrashLoopBackOff   18         1h        10.32.0.7       mtpnjvzonap001   <none>
kube-system   coredns-78fcdf6894-vxwgv                 0/1       CrashLoopBackOff   80         6h        10.32.0.2       mtpnjvzonap001   <none>
kube-system   etcd-mtpnjvzonap001                      1/1       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>
kube-system   kube-apiserver-mtpnjvzonap001            1/1       Running            0          1h        135.21.27.139   mtpnjvzonap001   <none>
kube-system   kube-controller-manager-mtpnjvzonap001   1/1       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>
kube-system   kube-proxy-2c4tx                         1/1       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>
kube-system   kube-scheduler-mtpnjvzonap001            1/1       Running            0          1h        135.21.27.139   mtpnjvzonap001   <none>
kube-system   weave-net-bpx22                          2/2       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>

The coredns pods have this entry in their logs:

github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
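
For reference, that entry can be pulled straight from the crashing pods with standard kubectl commands (pod names are from the listing above; -p shows the previous, crashed container's log):

# logs of the current container
kubectl logs -n kube-system coredns-78fcdf6894-s7rvx
# logs of the previous (crashed) container, often more informative here
kubectl logs -n kube-system -p coredns-78fcdf6894-s7rvx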

To me this suggests that coredns cannot reach the apiserver pod via its cluster IP:

# kubectl describe svc/kubernetes
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                10.96.0.1
Port:              https  443/TCP
TargetPort:        6443/TCP
Endpoints:         135.21.27.139:6443
Session Affinity:  None
Events:            <none>
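
A related sanity check (standard kubectl, not from the original post) is to confirm the endpoints object behind the service matches the apiserver's real address:

# should show 135.21.27.139:6443, matching the Endpoints line above
kubectl get endpoints kubernetes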

I also worked through the troubleshooting steps at https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/:

  • I created a busybox pod for testing (see the sketch after this list).
  • I successfully created the hostnames deployment.
  • I successfully exposed the hostnames deployment.
  • From busybox, I successfully reached the hostnames service via its cluster IP.
  • From the node, I successfully reached the hostnames service via its cluster IP.
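
A minimal sketch of that busybox test (the image tag and flags are illustrative; the hostnames cluster IP is left as a placeholder since it is not shown in the post):

# launch a throwaway test pod with a shell
kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- sh
# then, inside the pod, hit the service by cluster IP
# (DNS names cannot resolve while coredns is down):
wget -qO- http://<hostnames-cluster-ip>:80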

So, simply put: I created the hostnames service, it got a cluster IP in the 10.96.0.0/12 range (as expected), and it works; yet for some reason pods cannot reach the apiserver's cluster IP 10.96.0.1, even though from the node I can reach 10.96.0.1:

# wget --no-check-certificate https://10.96.0.1/hello
--2018-11-14 21:44:25--  https://10.96.0.1/hello
Connecting to 10.96.0.1:443... connected.
WARNING: cannot verify 10.96.0.1's certificate, issued by ‘CN=kubernetes’:
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 403 Forbidden
2018-11-14 21:44:25 ERROR 403: Forbidden.
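
For contrast, the same endpoint tried from inside a pod times out instead of returning 403, which matches the coredns log. A sketch, assuming the busybox test pod from above is still running (plain HTTP to port 443 is enough to exercise the TCP path, and busybox wget takes -T for a timeout):

kubectl exec -it busybox -- wget -T 5 -qO- http://10.96.0.1:443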

Following suggestions from others who reported similar issues, I checked a few other things:

# sysctl net.ipv4.conf.all.forwarding
net.ipv4.conf.all.forwarding = 1
# sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 1
# iptables-save | egrep ':INPUT|:OUTPUT|:POSTROUTING|:FORWARD'
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [11:692]
:POSTROUTING ACCEPT [11:692]
:INPUT ACCEPT [1697:364811]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1652:363693]
# ls -l /usr/sbin/conntrack
-rwxr-xr-x 1 root root 65632 Jan 24  2016 /usr/sbin/conntrack
# systemctl status firewalld
● firewalld.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)
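
One further check that would help localize this (standard iptables, not from the original post): confirm that kube-proxy has actually programmed NAT rules for the kubernetes service, since every pod's path to 10.96.0.1 depends on them:

# kube-proxy in iptables mode tags its rules with the service name
iptables-save -t nat | grep 'default/kubernetes'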

I checked the kube-proxy logs and found no errors. I also tried deleting the coredns pods and the apiserver pod; they were recreated (as expected), but the problem remained.
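
For completeness, the kube-proxy log check looks like this (pod name from the listing above):

kubectl logs -n kube-system kube-proxy-2c4tx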

Here is a copy of the logs from the weave container:

# kubectl logs -n kube-system weave-net-bpx22 weave
DEBU: 2018/11/14 15:56:10.909921 [kube-peers] Checking peer "aa:53:be:75:71:f7" against list &{[]}
Peer not in list; removing persisted data
INFO: 2018/11/14 15:56:11.041807 Command line options: map[name:aa:53:be:75:71:f7 nickname:mtpnjvzonap001 ipalloc-init:consensus=1 ipalloc-range:10.32.0.0/12 db-prefix:/weavedb/weave-net docker-api: expect-npc:true host-root:/host http-addr:127.0.0.1:6784 metrics-addr:0.0.0.0:6782 conn-limit:100 datapath:datapath no-dns:true port:6783]
INFO: 2018/11/14 15:56:11.042230 weave  2.5.0
INFO: 2018/11/14 15:56:11.198348 Bridge type is bridged_fastdp
INFO: 2018/11/14 15:56:11.198372 Communication between peers is unencrypted.
INFO: 2018/11/14 15:56:11.203206 Our name is aa:53:be:75:71:f7(mtpnjvzonap001)
INFO: 2018/11/14 15:56:11.203249 Launch detected - using supplied peer list: [135.21.27.139]
INFO: 2018/11/14 15:56:11.216398 Checking for pre-existing addresses on weave bridge
INFO: 2018/11/14 15:56:11.229313 [allocator aa:53:be:75:71:f7] No valid persisted data
INFO: 2018/11/14 15:56:11.233391 [allocator aa:53:be:75:71:f7] Initialising via deferred consensus
INFO: 2018/11/14 15:56:11.233443 Sniffing traffic on datapath (via ODP)
INFO: 2018/11/14 15:56:11.234120 ->[135.21.27.139:6783] attempting connection
INFO: 2018/11/14 15:56:11.234302 ->[135.21.27.139:49182] connection accepted
INFO: 2018/11/14 15:56:11.234818 ->[135.21.27.139:6783|aa:53:be:75:71:f7(mtpnjvzonap001)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/11/14 15:56:11.234843 ->[135.21.27.139:49182|aa:53:be:75:71:f7(mtpnjvzonap001)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/11/14 15:56:11.236010 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2018/11/14 15:56:11.236424 Listening for metrics requests on 0.0.0.0:6782
INFO: 2018/11/14 15:56:11.990529 [kube-peers] Added myself to peer list &{[{aa:53:be:75:71:f7 mtpnjvzonap001}]}
DEBU: 2018/11/14 15:56:11.995901 [kube-peers] Nodes that have disappeared: map[]
10.32.0.1
135.21.27.139
DEBU: 2018/11/14 15:56:12.075738 registering for updates for node delete events
INFO: 2018/11/14 15:56:41.279799 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout
INFO: 2018/11/14 20:52:47.025412 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout
INFO: 2018/11/15 01:46:32.842792 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout
INFO: 2018/11/15 09:06:03.624359 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 172.217.9.147:443: i/o timeout
INFO: 2018/11/15 14:34:02.070893 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 172.217.9.147:443: i/o timeout

Here are the events for the two coredns pods:

# kubectl get events -n kube-system --field-selector involvedObject.name=coredns-78fcdf6894-6f9q6
LAST SEEN   FIRST SEEN   COUNT     NAME                                        KIND      SUBOBJECT                  TYPE      REASON      SOURCE                    MESSAGE
41m         20h          245       coredns-78fcdf6894-6f9q6.1568eab25f0acb02   Pod       spec.containers{coredns}   Normal    Killing     kubelet, mtpnjvzonap001   Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
26m         20h          248       coredns-78fcdf6894-6f9q6.1568ea920f72ddd4   Pod       spec.containers{coredns}   Normal    Pulled      kubelet, mtpnjvzonap001   Container image "k8s.gcr.io/coredns:1.1.3" already present on machine
5m          20h          1256      coredns-78fcdf6894-6f9q6.1568eaa1fd9216d2   Pod       spec.containers{coredns}   Warning   Unhealthy   kubelet, mtpnjvzonap001   Liveness probe failed: HTTP probe failed with statuscode: 503
1m          19h          2963      coredns-78fcdf6894-6f9q6.1568eb75f2b1af3e   Pod       spec.containers{coredns}   Warning   BackOff     kubelet, mtpnjvzonap001   Back-off restarting failed container
# kubectl get events -n kube-system --field-selector involvedObject.name=coredns-78fcdf6894-skjwz
LAST SEEN   FIRST SEEN   COUNT     NAME                                        KIND      SUBOBJECT                  TYPE      REASON      SOURCE                    MESSAGE
6m          20h          1259      coredns-78fcdf6894-skjwz.1568eaa181fbeffe   Pod       spec.containers{coredns}   Warning   Unhealthy   kubelet, mtpnjvzonap001   Liveness probe failed: HTTP probe failed with statuscode: 503
1m          19h          2969      coredns-78fcdf6894-skjwz.1568eb7578188f24   Pod       spec.containers{coredns}   Warning   BackOff     kubelet, mtpnjvzonap001   Back-off restarting failed container
#
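
The Unhealthy events come from CoreDNS's liveness probe. To see exactly what the kubelet is probing, describe one of the pods (pod name from the events above; in a stock kubeadm deployment the probe is an HTTP GET against CoreDNS's health endpoint):

kubectl describe pod -n kube-system coredns-78fcdf6894-6f9q6 | grep -i liveness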

Any help or further troubleshooting steps would be welcome.


1 Answer

Stack Overflow user

Answered 2019-08-14 18:42:09

I had the same problem and needed to allow several ports through the firewall: 22, 53, 6443, 6783, 6784, 8285.

I copied the rules from an existing healthy cluster. For this particular error, only 6443 (shown above as the target port of the kubernetes service) is probably required; the others are for other services I run in my cluster.

On Ubuntu, this is ufw, the Uncomplicated Firewall:

ufw allow 22/tcp # allowed for ssh, included in case you had firewall disabled altogether
ufw allow 6443
ufw allow 53
ufw allow 8285
ufw allow 6783
ufw allow 6784
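
After opening the ports, it may help to verify the firewall state and recreate the coredns pods so they retry immediately. A sketch, assuming the k8s-app=kube-dns label that kubeadm applies to coredns by default:

ufw status verbose
# recreate the coredns pods so they retry right away
kubectl -n kube-system delete pod -l k8s-app=kube-dns
kubectl -n kube-system get pods -w   # watch them come back up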
0 votes
Original content from Stack Overflow: https://stackoverflow.com/questions/53309671