
Tungsten Fabric in Practice: Pitfalls of a K8s-Based Deployment

Tungsten Fabric
Last modified: 2020-06-09 14:37:53

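Tungsten Fabric (formerly OpenContrail) provides a controller that works together with an orchestrator (OpenStack/K8s/vCenter), plus a vRouter that is deployed on the compute nodes and managed by the controller, replacing the original linux-bridge/OVS for communication.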

Author: Liu Jingyi (刘敬一)

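Preface

The best way to study an open-source controller is to deploy one first, in whatever way is most convenient.

I started from TF's GitHub, but run.sh hung in both tf-devstack and tf-dev-env.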

[setup contrail git sources]
INFO: source env from /root/contrail/.env/tf-developer-sandbox.env
INFO: current folder is
100  2584  100  2584    0     0    934      0  0:00:02  0:00:02 --:--:--   933
INFO: Download repo tool
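  1. Found the TF Chinese community, added its WeChat contact, and was invited into the TF discussion group
  2. With guidance from Wu sir and Yang sir, two experts in the group, started deploying by following the article below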

Hands-on record

Initial preparation

  • Create three CentOS 7.7 virtual machines
deployer 192.168.122.160
master01 192.168.122.96  <--- at least 8 GB of memory
node01 192.168.122.250

# cat /etc/redhat-release 
CentOS Linux release 7.7.1908 (Core)

pip acceleration via the Aliyun mirror

  • Configure the pip mirror on every node
mkdir -p ~/.pip && tee ~/.pip/pip.conf <<-'EOF'
[global]
trusted-host =  mirrors.aliyun.com
index-url = https://mirrors.aliyun.com/pypi/simple
EOF

Docker image acceleration via Aliyun

  • There are plenty of tutorials online; the mirror address below is masked with asterisks
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://********.mirror.aliyuncs.com"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker

Some source files

  • Many of the required installation files are hosted on the server http://35.220.208.0/; adjust the commands to the actual links
mkdir pkg_python/
cd pkg_python/
wget http://35.220.208.0/packages_python/pip-19.3.1.tar.gz
easy_install pip-19.3.1.tar.gz
easy_install --upgrade --dry-run pip
wget http://35.220.208.0/packages_python/docker_compose-1.24.1-py2.py3-none-any.whl
pip2 install docker_compose-1.24.1-py2.py3-none-any.whl

mkdir /root/pkg_k8s
cd /root/pkg_k8s
wget http://35.220.208.0/k8s_v1.12.9/packages/auto_download.sh
chmod +x auto_download.sh
./auto_download.sh
  • The following warnings showed up, but they seem to have no impact
[root@localhost pkg_python]# easy_install --upgrade --dry-run pip
Searching for pip
Reading https://pypi.python.org/simple/pip/
Best match: pip 20.0.2
Downloading https://files.pythonhosted.org/packages/8e/76/66066b7bc71817238924c7e4b448abdb17eb0c92d645769c223f9ace478f/pip-20.0.2.tar.gz#sha256=7db0c8ea4c7ea51c8049640e8e6e7fde949de672bfa4949920675563a5a6967f
Processing pip-20.0.2.tar.gz
Writing /tmp/easy_install-bm8Ztx/pip-20.0.2/setup.cfg
Running pip-20.0.2/setup.py -n -q bdist_egg --dist-dir /tmp/easy_install-bm8Ztx/pip-20.0.2/egg-dist-tmp-32s9sn
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'project_urls'
  warnings.warn(msg)
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
  warnings.warn(msg)
warning: no files found matching 'docs/docutils.conf'
warning: no previously-included files found matching '.coveragerc'
warning: no previously-included files found matching '.mailmap'
warning: no previously-included files found matching '.appveyor.yml'
warning: no previously-included files found matching '.travis.yml'
warning: no previously-included files found matching '.readthedocs.yml'
warning: no previously-included files found matching '.pre-commit-config.yaml'
warning: no previously-included files found matching 'tox.ini'
warning: no previously-included files found matching 'noxfile.py'
warning: no files found matching 'Makefile' under directory 'docs'
warning: no files found matching '*.bat' under directory 'docs'
warning: no previously-included files found matching 'src/pip/_vendor/six'
warning: no previously-included files found matching 'src/pip/_vendor/six/moves'
warning: no previously-included files matching '*.pyi' found under directory 'src/pip/_vendor'
no previously-included directories found matching '.github'
no previously-included directories found matching '.azure-pipelines'
no previously-included directories found matching 'docs/build'
no previously-included directories found matching 'news'
no previously-included directories found matching 'tasks'
no previously-included directories found matching 'tests'
no previously-included directories found matching 'tools'
warning: install_lib: 'build/lib' does not exist -- no Python modules to install

[root@localhost pkg_python]# 

Local registry

  • Run a registry container locally, mapping host port 80 to container port 5000
[root@deployer ~]# docker run -d -p 80:5000 --restart=always --name registry registry:2
0c17a03ebdffe3cea98d7cec42c268c1117241f236f9f2443bbb1b77d34b0082
[root@deployer ~]# 
[root@deployer ~]# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                  NAMES
0c17a03ebdff        registry:2          "/entrypoint.sh /etc…"   About an hour ago   Up About an hour    0.0.0.0:80->5000/tcp   registry
[root@deployer ~]# 
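
Before pushing anything, it can help to confirm that the registry actually answers. The Docker Registry HTTP API v2 exposes a base endpoint at /v2/ that returns HTTP 200 on an open registry. The following is only a sketch, assuming the registry above listens on port 80; the helper names `registry_base_url` and `registry_alive` are made up for illustration.

```shell
#!/bin/bash
# Sketch only: both helper names are hypothetical, not part of any tool.

# Build the v2 base URL for a registry given as host[:port].
registry_base_url() {
  printf 'http://%s/v2/\n' "$1"
}

# Return 0 if the registry answers its v2 base endpoint with HTTP 200.
registry_alive() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' "$(registry_base_url "$1")") || return 1
  [ "$code" = "200" ]
}

# Usage (on the deployer):
#   registry_alive localhost && echo "registry is up"
```

The check uses plain HTTP on purpose, since this registry is started without TLS.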

Configure the YAML file

  • After getting contrail-ansible-deployer, enter its directory and edit instances.yaml
[root@deployer inventory]# vim  ../config/instances.yaml

provider_config:
  bms:
   ssh_pwd: Password
   ssh_user: root
   ssh_public_key: /root/.ssh/id_rsa.pub
   ssh_private_key: /root/.ssh/id_rsa
   domainsuffix: local
instances:
  bms1:
    provider: bms
    roles:
      config_database:
      config:
      control:
      analytics_database:
      analytics:
      webui:
      k8s_master:
      kubemanager:
    ip: 192.168.122.96
  bms2:
    provider: bms
    roles:
      vrouter:
      k8s_node:
    ip: 192.168.122.250
global_configuration:
  CONTAINER_REGISTRY: hub.juniper.net
contrail_configuration:
  CONTRAIL_VERSION: 1912-latest
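  • Replace CONTAINER_REGISTRY with the local registry, and set the Contrail version to 1912-latest so that it matches the tag used when the images are pulled and retagged later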
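
The replacement described in the bullet above can also be scripted instead of done in vim. This is a rough sketch, assuming the local registry is reached as 192.168.122.160 (the deployer, port 80) and that instances.yaml keeps each of the two keys on its own line as in the listing. `set_registry_and_tag` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: set_registry_and_tag is a made-up helper.
# Usage: set_registry_and_tag <instances.yaml> <registry> <tag>
set_registry_and_tag() {
  local file=$1 registry=$2 tag=$3 tmp
  tmp=$(mktemp) || return 1
  # Rewrite only the value part; indentation and key name are kept as-is.
  sed -e "s|^\([[:space:]]*CONTAINER_REGISTRY:\).*|\1 ${registry}|" \
      -e "s|^\([[:space:]]*CONTRAIL_VERSION:\).*|\1 ${tag}|" \
      "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy the content back so the original file permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (from the contrail-ansible-deployer directory):
#   set_registry_and_tag config/instances.yaml 192.168.122.160 1912-latest
```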

Set up passwordless login

  • The deployer must be able to log in to itself, master01, and node01 without entering a password
# ssh-keygen -t rsa

# ssh-copy-id -i ~/.ssh/id_rsa.pub root@master01
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node01
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node02
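
After copying the keys, it is worth checking that every target really accepts key-based login, because ansible will otherwise fail partway through. A small sketch, assuming the host names resolve from the deployer; `hosts_needing_key` is a hypothetical helper. BatchMode=yes makes ssh fail instead of prompting for a password.

```shell
#!/bin/bash
# Sketch only: hosts_needing_key is a made-up helper.
# Print every host (given as arguments) that still refuses key-based login.
hosts_needing_key() {
  local host
  for host in "$@"; do
    if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "root@${host}" true 2>/dev/null; then
      echo "$host"
    fi
  done
}

# Usage (on the deployer); no output means all hosts are ready:
#   hosts_needing_key master01 node01
```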

ansible

  • Running ansible on the deployer produces this warning
/usr/lib/python2.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.24.3) or chardet (2.2.1) doesn't match a supported version!
  RequestsDependencyWarning)

The fix is:
pip uninstall urllib3    
pip uninstall chardet
pip install requests 

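Pull images

  • The k8s images are fine, since the Aliyun mirror speeds them up
  • The Contrail source hub.juniper.net requires a Juniper account, so replace it with opencontrailnightly
  • Yang sir provided scripts that pull the images and push them to the local registry; master/node can then pull directly from the deployer's registry
  • With the latest contrail-ansible-deployer code, one more image is needed: contrail-provisioner
  • Before running the scripts, first configure the local IP as an insecure registry, so that transfers go over HTTP instead of HTTPS
  • One way to do this is to edit /etc/docker/daemon.json (create it if it does not exist)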
[root@node01 ~]# cat /etc/docker/daemon.json 
{
  "insecure-registries": [ "hub.juniper.net","k8s.gcr.io" ]
}
[root@node01 ~]# 

Then:
[root@deployer ~]# systemctl daemon-reload
[root@deployer ~]# systemctl restart docker
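
The bullets above ask for the local registry IP to be marked insecure, while the daemon.json shown lists hub.juniper.net and k8s.gcr.io. For the deployer-IP variant, a daemon.json could be generated roughly as follows. This is a sketch under assumptions: `write_daemon_json` is a hypothetical helper, and the optional third argument keeps a registry mirror (such as the masked Aliyun address) in the same file so it is not lost when the file is rewritten.

```shell
#!/bin/bash
# Sketch only: write_daemon_json is a made-up helper.
# Usage: write_daemon_json <file> <insecure-registry> [mirror-url]
write_daemon_json() {
  local file=$1 registry=$2 mirror=${3:-}
  {
    echo '{'
    if [ -n "$mirror" ]; then
      printf '  "registry-mirrors": ["%s"],\n' "$mirror"
    fi
    printf '  "insecure-registries": ["%s"]\n' "$registry"
    echo '}'
  } > "$file"
}

# Usage (as root, followed by the daemon-reload and docker restart above):
#   write_daemon_json /etc/docker/daemon.json 192.168.122.160
```
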
  • The scripts are below, already changed to use the deployer's IP
# Prepare the Kubernetes offline images by running the following script
#!/bin/bash
# Author: Alex Yang <alex890714@gmail.com>

set -e

REPOSITORIE="gcr.azk8s.cn/google_containers"
LOCAL_REPO="192.168.122.160"
IMAGES="kube-proxy:v1.12.9 kube-controller-manager:v1.12.9 kube-scheduler:v1.12.9 kube-apiserver:v1.12.9 coredns:1.2.2 coredns:1.2.6 pause:3.1 etcd:3.2.24 kubernetes-dashboard-amd64:v1.8.3"

for img in $IMAGES
do
  echo "===Pulling image: "$img
  docker pull $REPOSITORIE/$img
  echo "===Retag image ["$img"]"
  docker tag $REPOSITORIE/$img $LOCAL_REPO/$img
  echo "===Pushing image: "$LOCAL_REPO/$img
  docker push $LOCAL_REPO/$img
  docker rmi $REPOSITORIE/$img
done

# Prepare the Tungsten Fabric offline images by running the following script

#!/bin/bash
# Author: Alex Yang <alex890714@gmail.com>

set -e

REGISTRY_URL=opencontrailnightly
LOCAL_REGISTRY_URL=192.168.122.160
IMAGE_TAG=1912-latest
COMMON_IMAGES="contrail-node-init contrail-status contrail-nodemgr contrail-external-cassandra contrail-external-zookeeper contrail-external-kafka contrail-external-redis contrail-external-rabbitmq contrail-external-rsyslogd"
ANALYTICS_IMAGES="contrail-analytics-query-engine contrail-analytics-api contrail-analytics-collector contrail-analytics-snmp-collector contrail-analytics-snmp-topology contrail-analytics-alarm-gen"
CONTROL_IMAGES="contrail-controller-control-control contrail-controller-control-dns contrail-controller-control-named contrail-controller-config-api contrail-controller-config-devicemgr contrail-controller-config-schema contrail-controller-config-svcmonitor contrail-controller-config-stats contrail-controller-config-dnsmasq"
WEBUI_IMAGES="contrail-controller-webui-job contrail-controller-webui-web"
K8S_IMAGES="contrail-kubernetes-kube-manager contrail-kubernetes-cni-init"
VROUTER_IMAGES="contrail-vrouter-kernel-init contrail-vrouter-agent"

IMAGES=$COMMON_IMAGES" "$ANALYTICS_IMAGES" "$CONTROL_IMAGES" "$WEBUI_IMAGES" "$K8S_IMAGES" "$VROUTER_IMAGES

for image in $IMAGES
do
  echo "===Pulling image: "$image
  docker pull $REGISTRY_URL/$image:$IMAGE_TAG
  echo "===Retag image ["$image"]"
  docker tag $REGISTRY_URL/$image:$IMAGE_TAG $LOCAL_REGISTRY_URL/$image:$IMAGE_TAG
  echo "===Pushing image: "$LOCAL_REGISTRY_URL/$image:$IMAGE_TAG
  docker push $LOCAL_REGISTRY_URL/$image:$IMAGE_TAG
  docker rmi $REGISTRY_URL/$image:$IMAGE_TAG
done
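
Both scripts above repeat the same pull, tag, push, and remove sequence, so it can be factored into one function. The sketch below is not Yang sir's script, only a possible refactoring. `run` and `mirror_image` are hypothetical names, and the DRY_RUN switch is an addition that prints the docker commands instead of executing them.

```shell
#!/bin/bash
# Sketch only: run and mirror_image are made-up helpers.

# With DRY_RUN=1 the command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: mirror_image <source-repo> <local-repo> <image[:tag]>
# Stops at the first failing step, like the original scripts with set -e.
mirror_image() {
  local src=$1 dst=$2 img=$3
  run docker pull "${src}/${img}" &&
    run docker tag "${src}/${img}" "${dst}/${img}" &&
    run docker push "${dst}/${img}" &&
    run docker rmi "${src}/${img}"
}

# Usage:
#   DRY_RUN=1 mirror_image opencontrailnightly 192.168.122.160 contrail-status:1912-latest
```

Running once with DRY_RUN=1 is a cheap way to review the image names before several gigabytes are downloaded.
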
  • Check the image list
[root@deployer ~]# docker image list
REPOSITORY                                              TAG                 IMAGE ID            CREATED             SIZE
ubuntu                                                  latest              72300a873c2c        3 weeks ago         64.2MB
registry                                                2                   708bc6af7e5e        7 weeks ago         25.8MB
registry                                                latest              708bc6af7e5e        7 weeks ago         25.8MB
192.168.122.160/contrail-vrouter-kernel-init            1912-latest         92e9cce315a5        3 months ago        581MB
192.168.122.160/contrail-vrouter-agent                  1912-latest         e8d9457d740e        3 months ago        729MB
192.168.122.160/contrail-status                         1912-latest         d2264c6741a5        3 months ago        513MB
192.168.122.160/contrail-nodemgr                        1912-latest         c3428aa7e9b7        3 months ago        523MB
192.168.122.160/contrail-node-init                      1912-latest         c846ff071cc8        3 months ago        506MB
192.168.122.160/contrail-kubernetes-kube-manager        1912-latest         983a6307731b        3 months ago        517MB
192.168.122.160/contrail-kubernetes-cni-init            1912-latest         45c88538c834        3 months ago        525MB
192.168.122.160/contrail-external-zookeeper             1912-latest         6937c72b866c        3 months ago        290MB
192.168.122.160/contrail-external-rsyslogd              1912-latest         812ba27a4e08        3 months ago        304MB
192.168.122.160/contrail-external-redis                 1912-latest         3dc79f0b6eb9        3 months ago        129MB
192.168.122.160/contrail-external-rabbitmq              1912-latest         a98ac91667b2        3 months ago        256MB
192.168.122.160/contrail-external-kafka                 1912-latest         7b5a2ce6a656        3 months ago        665MB
192.168.122.160/contrail-external-cassandra             1912-latest         20109c39696c        3 months ago        545MB
192.168.122.160/contrail-controller-webui-web           1912-latest         44054aa131c5        3 months ago        552MB
192.168.122.160/contrail-controller-webui-job           1912-latest         946e2bbd7451        3 months ago        552MB
192.168.122.160/contrail-controller-control-named       1912-latest         81ef8223a519        3 months ago        575MB
192.168.122.160/contrail-controller-control-dns         1912-latest         15c1ce0cf26e        3 months ago        575MB
192.168.122.160/contrail-controller-control-control     1912-latest         ec195cc75705        3 months ago        594MB
192.168.122.160/contrail-controller-config-svcmonitor   1912-latest         3d53781422be        3 months ago        673MB
192.168.122.160/contrail-controller-config-stats        1912-latest         46bc77cf1c87        3 months ago        506MB
192.168.122.160/contrail-controller-config-schema       1912-latest         75acb8ed961f        3 months ago        673MB
192.168.122.160/contrail-controller-config-dnsmasq      1912-latest         dc2980441d51        3 months ago        506MB
192.168.122.160/contrail-controller-config-devicemgr    1912-latest         c08868a27a0a        3 months ago        772MB
192.168.122.160/contrail-controller-config-api          1912-latest         f39ca251b475        3 months ago        706MB
192.168.122.160/contrail-analytics-snmp-topology        1912-latest         5ee37cbbd034        3 months ago        588MB
192.168.122.160/contrail-analytics-snmp-collector       1912-latest         29ae502fb74f        3 months ago        588MB
192.168.122.160/contrail-analytics-query-engine         1912-latest         b5f937d6b6e3        3 months ago        588MB
192.168.122.160/contrail-analytics-collector            1912-latest         ee1bdbcc460a        3 months ago        588MB
192.168.122.160/contrail-analytics-api                  1912-latest         ac5c8f7cef89        3 months ago        588MB
192.168.122.160/contrail-analytics-alarm-gen            1912-latest         e155b24a0735        3 months ago        588MB
192.168.10.10/kube-proxy                                v1.12.9             295526df163c        9 months ago        95.7MB
192.168.122.160/kube-proxy                              v1.12.9             295526df163c        9 months ago        95.7MB
192.168.122.160/kube-controller-manager                 v1.12.9             f473e8452c8e        9 months ago        164MB
192.168.122.160/kube-apiserver                          v1.12.9             8ea704c2d4a7        9 months ago        194MB
192.168.122.160/kube-scheduler                          v1.12.9             c79506ccc1bc        9 months ago        58.4MB
192.168.122.160/coredns                                 1.2.6               f59dcacceff4        16 months ago       40MB
192.168.122.160/etcd                                    3.2.24              3cab8e1b9802        18 months ago       220MB
192.168.122.160/coredns                                 1.2.2               367cdc8433a4        18 months ago       39.2MB
192.168.122.160/kubernetes-dashboard-amd64              v1.8.3              0c60bcf89900        2 years ago         102MB
192.168.122.160/pause                                   3.1                 da86e6ba6ca1        2 years ago         742kB
[root@deployer ~]# 
  • Check the images in the local registry
[root@deployer ~]# curl -X GET http://localhost/v2/_catalog | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1080  100  1080    0     0  18298      0 --:--:-- --:--:-- --:--:-- 18620
{
    "repositories": [
        "contrail-analytics-alarm-gen",
        "contrail-analytics-api",
        "contrail-analytics-collector",
        "contrail-analytics-query-engine",
        "contrail-analytics-snmp-collector",
        "contrail-analytics-snmp-topology",
        "contrail-controller-config-api",
        "contrail-controller-config-devicemgr",
        "contrail-controller-config-dnsmasq",
        "contrail-controller-config-schema",
        "contrail-controller-config-stats",
        "contrail-controller-config-svcmonitor",
        "contrail-controller-control-control",
        "contrail-controller-control-dns",
        "contrail-controller-control-named",
        "contrail-controller-webui-job",
        "contrail-controller-webui-web",
        "contrail-external-cassandra",
        "contrail-external-kafka",
        "contrail-external-rabbitmq",
        "contrail-external-redis",
        "contrail-external-rsyslogd",
        "contrail-external-zookeeper",
        "contrail-kubernetes-cni-init",
        "contrail-kubernetes-kube-manager",
        "contrail-node-init",
        "contrail-nodemgr",
        "contrail-status",
        "contrail-vrouter-agent",
        "contrail-vrouter-kernel-init",
        "coredns",
        "etcd",
        "kube-apiserver",
        "kube-controller-manager",
        "kube-proxy",
        "kube-scheduler",
        "kubernetes-dashboard-amd64",
        "pause"
    ]
}
[root@deployer ~]# 
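
With this many images it is easy to miss one (contrail-provisioner, for example, is not in the catalog above). A sketch of an automated comparison between the catalog and a list of expected names follows. `catalog_names` and `missing_from_catalog` are hypothetical helpers, and the JSON handling is a plain-text shortcut that assumes repository names contain no double quotes.

```shell
#!/bin/bash
# Sketch only: both helpers are made up for illustration.

# Read a /v2/_catalog JSON document on stdin, print one repository per line.
catalog_names() {
  grep -o '"[^"]*"' | tr -d '"' | grep -v '^repositories$'
}

# Usage: missing_from_catalog <name>...
# Prints every expected name that is absent from the catalog read on stdin.
missing_from_catalog() {
  local have name
  have=$(catalog_names)
  for name in "$@"; do
    printf '%s\n' "$have" | grep -qxF -- "$name" || echo "$name"
  done
}

# Usage (on the deployer):
#   curl -s http://localhost/v2/_catalog | missing_from_catalog pause contrail-provisioner
```
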
  • master01 and node01 can now pull the k8s/Contrail images directly from the deployer, and it is really fast! (Don't forget --insecure-registry=192.168.122.160)
# Prepare the Kubernetes offline images by running the following script
#!/bin/bash
# Author: Alex Yang <alex890714@gmail.com>

set -e

REPOSITORIE="k8s.gcr.io"
LOCAL_REPO="192.168.122.160"
IMAGES="kube-proxy:v1.12.9 kube-controller-manager:v1.12.9 kube-scheduler:v1.12.9 kube-apiserver:v1.12.9 coredns:1.2.2 coredns:1.2.6 pause:3.1 etcd:3.2.24 kubernetes-dashboard-amd64:v1.8.3"

for img in $IMAGES
do
  echo "===Pulling image: "$img
  docker pull $LOCAL_REPO/$img
  echo "===Retag image ["$img"]"
  docker tag $LOCAL_REPO/$img $REPOSITORIE/$img
  docker rmi $LOCAL_REPO/$img
done

# Prepare the Tungsten Fabric offline images by running the following script

#!/bin/bash
# Author: Alex Yang <alex890714@gmail.com>

set -e

REPOSITORIE=hub.juniper.net
LOCAL_REPO="192.168.122.160"
IMAGE_TAG=1912-latest
COMMON_IMAGES="contrail-node-init contrail-status contrail-nodemgr contrail-external-cassandra contrail-external-zookeeper contrail-external-kafka contrail-external-redis contrail-external-rabbitmq contrail-external-rsyslogd"
ANALYTICS_IMAGES="contrail-analytics-query-engine contrail-analytics-api contrail-analytics-collector contrail-analytics-snmp-collector contrail-analytics-snmp-topology contrail-analytics-alarm-gen"
CONTROL_IMAGES="contrail-controller-control-control contrail-controller-control-dns contrail-controller-control-named contrail-controller-config-api contrail-controller-config-devicemgr contrail-controller-config-schema contrail-controller-config-svcmonitor contrail-controller-config-stats contrail-controller-config-dnsmasq"
WEBUI_IMAGES="contrail-controller-webui-job contrail-controller-webui-web"
K8S_IMAGES="contrail-kubernetes-kube-manager contrail-kubernetes-cni-init"
VROUTER_IMAGES="contrail-vrouter-kernel-init contrail-vrouter-agent"

IMAGES=$COMMON_IMAGES" "$ANALYTICS_IMAGES" "$CONTROL_IMAGES" "$WEBUI_IMAGES" "$K8S_IMAGES" "$VROUTER_IMAGES

for img in $IMAGES
do
  echo "===Pulling image: "$img
  docker pull $LOCAL_REPO/$img:$IMAGE_TAG
  echo "===Retag image ["$img"]"
  docker tag $LOCAL_REPO/$img:$IMAGE_TAG $REPOSITORIE/$img:$IMAGE_TAG
  docker rmi $LOCAL_REPO/$img:$IMAGE_TAG
done

Open the web UI

  • The following has been run on the deployer
ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/install_k8s.yml
ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/install_contrail.yml
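  • Browse to port 8143 on master01; the monitor page opens by default
  • Username/password: admin/contrail123; the domain field can be left empty. The WebUI finally shows up

  • You can switch to the config page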

k8s status

  • nodes
[root@master01 ~]# kubectl get nodes
NAME       STATUS   ROLES    AGE    VERSION
master01   Ready    master   6h4m   v1.12.9
node01     Ready    <none>   6h3m   v1.12.9
[root@master01 ~]# 
[root@master01 ~]# kubectl get namespaces
NAME          STATUS   AGE
contrail      Active   80m
default       Active   6h20m
kube-public   Active   6h20m
kube-system   Active   6h20m
[root@master01 ~]# 
  • pods
[root@master01 ~]# kubectl get pods -n kube-system 
NAME                                    READY   STATUS             RESTARTS   AGE
coredns-85c98899b4-4dzzx                0/1     ImagePullBackOff   0          6h2m
coredns-85c98899b4-w4bcs                0/1     ImagePullBackOff   0          6h2m
etcd-master01                           1/1     Running            5          28m
kube-apiserver-master01                 1/1     Running            4          28m
kube-controller-manager-master01        1/1     Running            5          28m
kube-proxy-dmmlh                        1/1     Running            5          6h2m
kube-proxy-ph9gx                        1/1     Running            1          6h2m
kube-scheduler-master01                 1/1     Running            5          28m
kubernetes-dashboard-76456c6d4b-x5lz4   0/1     ImagePullBackOff   0          6h2m
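
In a longer listing the failing pods are easy to overlook, so a small filter helps. The sketch below assumes the column layout shown above (NAME, READY, STATUS, ...) and input produced with kubectl's --no-headers flag; `not_running` is a hypothetical helper name. Pods in other non-error states, such as Completed, are listed as well.

```shell
#!/bin/bash
# Sketch only: not_running is a made-up helper.
# Read `kubectl get pods --no-headers` output on stdin and print the name
# and status of every pod whose STATUS column is not "Running".
not_running() {
  awk '$3 != "Running" { print $1, $3 }'
}

# Usage (on master01):
#   kubectl get pods -n kube-system --no-headers | not_running
```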

Further troubleshooting

kubectl cannot be used on node01

  • The problem
[root@node01 ~]# kubectl get pods -n kube-system -o wide
The connection to the server localhost:8080 was refused - did you specify the right host or port?
  • For the fix, see https://blog.csdn.net/qq_24046745/article/details/94405188?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task
[root@node01 ~]# scp root@192.168.122.96:/etc/kubernetes/admin.conf /etc/kubernetes/admin.conf
[root@node01 ~]# echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
[root@node01 ~]# source ~/.bash_profile
[root@node01 ~]# kubectl get pods -n kube-system -o wide
NAME                                    READY   STATUS             RESTARTS   AGE     IP                NODE       NOMINATED NODE
coredns-85c98899b4-4dzzx                0/1     ImagePullBackOff   0          5h45m   10.47.255.252     node01     <none>
coredns-85c98899b4-w4bcs                0/1     ImagePullBackOff   0          5h45m   10.47.255.251     node01     <none>
etcd-master01                           1/1     Running            3          11m     192.168.122.96    master01   <none>
kube-apiserver-master01                 1/1     Running            3          11m     192.168.122.96    master01   <none>
kube-controller-manager-master01        1/1     Running            3          11m     192.168.122.96    master01   <none>
kube-proxy-dmmlh                        1/1     Running            3          5h45m   192.168.122.96    master01   <none>
kube-proxy-ph9gx                        1/1     Running            1          5h44m   192.168.122.250   node01     <none>
kube-scheduler-master01                 1/1     Running            3          11m     192.168.122.96    master01   <none>
kubernetes-dashboard-76456c6d4b-x5lz4   0/1     ImagePullBackOff   0          5h44m   192.168.122.250   node01     <none>
[root@node01 ~]# 

The ImagePullBackOff problem

  • First look at the description of the coredns pod
[root@master01 ~]# kubectl describe pod coredns-85c98899b4-4dzzx -n kube-system
Name:               coredns-85c98899b4-4dzzx
Namespace:          kube-system
...
Events:
  Type     Reason                  Age                    From               Message
  ----     ------                  ----                   ----               -------
  Warning  FailedScheduling        75m (x281 over 4h40m)  default-scheduler  0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
  Warning  FailedCreatePodSandBox  71m                    kubelet, node01    Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1af3fb24d906d5f82ad3bdcf6d65be328302d3c596e63fc79ed0c134390b4753" network for pod "coredns-85c98899b4-4dzzx": NetworkPlugin cni failed to set up pod "coredns-85c98899b4-4dzzx_kube-system" network: Failed in Poll VM-CFG. Error : Failed in PollVM. Error : Failed HTTP Get operation. Return code 404
  Normal   SandboxChanged          70m (x3 over 71m)      kubelet, node01    Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling                 70m (x3 over 70m)      kubelet, node01    pulling image "k8s.gcr.io/coredns:1.2.6"
  Warning  Failed                  70m (x3 over 70m)      kubelet, node01    Failed to pull image "k8s.gcr.io/coredns:1.2.6": rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 192.168.122.160:443: getsockopt: no route to host
  Warning  Failed                  70m (x3 over 70m)      kubelet, node01    Error: ErrImagePull
  Warning  Failed                  6m52s (x282 over 70m)  kubelet, node01    Error: ImagePullBackOff
  Normal   BackOff                 103s (x305 over 70m)   kubelet, node01    Back-off pulling image "k8s.gcr.io/coredns:1.2.6"
[root@master01 ~]# 
  • It looks like insecure-registry had not been set yet when the pod started, so force the pod to be recreated
[root@master01 ~]# kubectl get pod coredns-85c98899b4-4dzzx -n kube-system -o yaml | kubectl replace --force -f -
pod "coredns-85c98899b4-4dzzx" deleted
pod/coredns-85c98899b4-4dzzx replaced
[root@master01 ~]# 
  • It is still not up, so keep looking
[root@master01 ~]# kubectl describe pod coredns-85c98899b4-4dzzx -n kube-system
Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Normal   Scheduled               6m29s                 default-scheduler  Successfully assigned kube-system/coredns-85c98899b4-fnpd7 to master01
  Warning  FailedCreatePodSandBox  6m26s                 kubelet, master01  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "3074c719934789cef519eeae16d2eca4e272fb6bda1b157cee1dbdf2f597a59f" network for pod "coredns-85c98899b4-fnpd7": NetworkPlugin cni failed to set up pod "coredns-85c98899b4-fnpd7_kube-system" network: failed to find plugin "contrail-k8s-cni" in path [/opt/cni/bin], failed to clean up sandbox container "3074c719934789cef519eeae16d2eca4e272fb6bda1b157cee1dbdf2f597a59f" network for pod "coredns-85c98899b4-fnpd7": NetworkPlugin cni failed to teardown pod "coredns-85c98899b4-fnpd7_kube-system" network: failed to find plugin "contrail-k8s-cni" in path [/opt/cni/bin]]
  Normal   SandboxChanged          76s (x25 over 6m25s)  kubelet, master01  Pod sandbox changed, it will be killed and re-created.
  • contrail-k8s-cni is missing; copy one over from node01
[root@master01 ~]# scp root@192.168.122.250:/opt/cni/bin/contrail-k8s-cni /opt/cni/bin/
  • Recreate it again
[root@master01 ~]# kubectl get pod coredns-85c98899b4-fnpd7 -n kube-system -o yaml | kubectl replace --force -f -
pod "coredns-85c98899b4-fnpd7" deleted
pod/coredns-85c98899b4-fnpd7 replaced
[root@master01 ~]# 
  • Unfortunately, there is still an error after the restart
Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Normal   Scheduled               18m                  default-scheduler  Successfully assigned kube-system/coredns-85c98899b4-8zq9h to master01
  Warning  FailedCreatePodSandBox  17m                  kubelet, master01  Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ffe9745c42750850e44035ee6413bf573148759738fc6131ce970537e03a5d13" network for pod "coredns-85c98899b4-8zq9h": NetworkPlugin cni failed to set up pod "coredns-85c98899b4-8zq9h_kube-system" network: Failed in Poll VM-CFG. Error : Failed in PollVM. Error : Get http://127.0.0.1:9091/vm-cfg/9bf51269-675b-11ea-ac43-525400c1ec4f: dial tcp 127.0.0.1:9091: connect: connection refused
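
The last event shows the CNI plugin polling http://127.0.0.1:9091 and getting connection refused, so nothing is listening on that port on master01. A plausible reading, and only an assumption here, is that this endpoint belongs to the local vRouter agent, which instances.yaml places on node01 only. A quick way to test whether the port is open on a given node is sketched below; `port_open` is a hypothetical helper that relies on bash's /dev/tcp redirection and the coreutils timeout command.

```shell
#!/bin/bash
# Sketch only: port_open is a made-up helper.
# Usage: port_open <host> <port>
# Returns 0 if a TCP connection can be opened within 3 seconds.
port_open() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage (on the node where the pod was scheduled):
#   port_open 127.0.0.1 9091 && echo "9091 is listening" || echo "9091 is closed"
```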

The next day, no kubectl command worked

  • Neither on master01 nor on node01
[root@master01 ~]# kubectl get nodes
The connection to the server 192.168.122.96:6443 was refused - did you specify the right host or port?
[root@master01 ~]# 
  • Restarting kubelet several times did not help; it runs, but with errors
[root@master01 ~]# journalctl -xe -u kubelet
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.336303   28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.425393   28722 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Get https://192.168.122.96:6443/api/v1/nodes?fieldSelector=metadata.name%3Dma
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.426388   28722 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Get https://192.168.122.96:6443/api/v1/services?limit=500&resourceVersion=
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.436468   28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.536632   28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.636848   28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.636961   28722 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.122.96:6443/api/v1/pods?fieldSelector=spec.nodeNam
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.737070   28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.837781   28722 kubelet.go:2236] node "master01" not found
  • A search shows that many people have hit this problem
  • Reportedly it may be caused by kube-apiserver not being started, but kube-apiserver cannot be started in the current environment
[root@master01 ~]# systemctl start kube-apiserver
Failed to start kube-apiserver.service: Unit not found.
[root@master01 ~]# 
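
The "Unit not found" message is expected if kube-apiserver is not a systemd service at all. The earlier pod listing shows kube-apiserver-master01 running as a pod in kube-system, which suggests a kubeadm-style static pod that kubelet starts from a manifest file. Under that assumption, the place to look is the manifest directory and the container itself. A sketch follows; `has_apiserver_manifest` is a hypothetical helper, and /etc/kubernetes/manifests is the kubeadm default path, not something confirmed for this environment.

```shell
#!/bin/bash
# Sketch only: has_apiserver_manifest is a made-up helper.
# Usage: has_apiserver_manifest [manifest-dir]
# Returns 0 if a kube-apiserver static pod manifest exists in the directory.
has_apiserver_manifest() {
  local dir=${1:-/etc/kubernetes/manifests}
  [ -f "${dir}/kube-apiserver.yaml" ]
}

# Usage (on master01):
#   has_apiserver_manifest || echo "no static pod manifest for kube-apiserver"
#   docker ps -a --filter name=kube-apiserver   # is the container there, and did it exit?
#   docker logs <container-id>                  # why it exited, if it did
```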

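Calling the northbound API

  • For the reference documentation, see http://www.opencontrail.org/documentation/api/r5.0/#
  • For example, the simplest case of fetching the virtual-networks list (using the simplest username/password authentication)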
[root@master01 ~]# curl -X GET -u "admin:contrail123" -H "Content-Type: application/json; charset=UTF-8" http://192.168.122.96:8082/virtual-networks
{"virtual-networks": [{"href": "http://192.168.122.96:8082/virtual-network/99c4144d-a7b7-4fb1-833e-887f21144320", "fq_name": ["default-domain", "default-project", "default-virtual-network"], "uuid": "99c4144d-a7b7-4fb1-833e-887f21144320"}, {"href": "http://192.168.122.96:8082/virtual-network/6e90abe8-91b6-48ad-99d2-fba6c9e29de4", "fq_name": ["default-domain", "k8s-default", "k8s-default-service-network"], "uuid": "6e90abe8-91b6-48ad-99d2-fba6c9e29de4"}, {"href": "http://192.168.122.96:8082/virtual-network/ab12e6dc-be52-407d-8f1d-37e6d29df0b1", "fq_name": ["default-domain", "default-project", "ip-fabric"], "uuid": "ab12e6dc-be52-407d-8f1d-37e6d29df0b1"}, {"href": "http://192.168.122.96:8082/virtual-network/915156f1-cec3-44eb-b15e-742452084d67", "fq_name": ["default-domain", "k8s-default", "k8s-default-pod-network"], "uuid": "915156f1-cec3-44eb-b15e-742452084d67"}, {"href": "http://192.168.122.96:8082/virtual-network/64a648ee-3ba6-4348-a543-07de6f225486", "fq_name": ["default-domain", "default-project", "dci-network"], "uuid": "64a648ee-3ba6-4348-a543-07de6f225486"}, {"href": "http://192.168.122.96:8082/virtual-network/82890bf9-a8e5-4c85-a32c-e307d9447a0a", "fq_name": ["default-domain", "default-project", "__link_local__"], "uuid": "82890bf9-a8e5-4c85-a32c-e307d9447a0a"}]}[root@master01 ~]# 
[root@master01 ~]# 
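
Each entry in the response carries an href of the form /virtual-network/<uuid>, so the list can be used to fetch individual networks. A sketch of pulling the UUIDs out of the response with plain text tools follows. `vn_uuids` is a hypothetical helper, and the pattern assumes the "uuid": "..." layout seen in the output above.

```shell
#!/bin/bash
# Sketch only: vn_uuids is a made-up helper.
# Read a /virtual-networks response on stdin and print one UUID per line.
vn_uuids() {
  grep -o '"uuid": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage (on master01): fetch the details of every virtual network.
#   curl -s -u "admin:contrail123" http://192.168.122.96:8082/virtual-networks | vn_uuids |
#   while read -r id; do
#     curl -s -u "admin:contrail123" "http://192.168.122.96:8082/virtual-network/${id}"
#     echo
#   done
```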

Redeployment

  • I made up my mind to redeploy a 1-master/2-node k8s setup, still using the same deployer
  • Logs
[root@deployer contrail-ansible-deployer]# cat install_k8s_3node.log 
...
PLAY RECAP **********************************************************************************************************************************************************************************************************************************
192.168.122.116            : ok=31   changed=15   unreachable=0    failed=0   
192.168.122.146            : ok=23   changed=8    unreachable=0    failed=0   
192.168.122.204            : ok=23   changed=8    unreachable=0    failed=0   
localhost                  : ok=62   changed=4    unreachable=0    failed=0  

[root@deployer contrail-ansible-deployer]# cat install_contrail_3node.log
...
PLAY RECAP **********************************************************************************************************************************************************************************************************************************
192.168.122.116            : ok=76   changed=45   unreachable=0    failed=0   
192.168.122.146            : ok=37   changed=17   unreachable=0    failed=0   
192.168.122.204            : ok=37   changed=17   unreachable=0    failed=0   
localhost                  : ok=66   changed=4    unreachable=0    failed=0   
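
Since the playbook output is saved to log files, the PLAY RECAP can be checked mechanically rather than by eye. This sketch assumes the recap line format shown above (host, colon, then key=value counters); `recap_problems` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: recap_problems is a made-up helper.
# Read ansible-playbook output on stdin and print every host whose recap line
# has a non-zero unreachable or failed counter. Exit status is 1 if any host
# is printed, 0 otherwise.
recap_problems() {
  awk '
    /unreachable=[0-9]+/ && /failed=[0-9]+/ {
      if ($0 !~ /unreachable=0([^0-9]|$)/ || $0 !~ /failed=0([^0-9]|$)/) {
        print $1
        bad = 1
      }
    }
    END { exit bad }
  '
}

# Usage (on the deployer):
#   recap_problems < install_contrail_3node.log && echo "all hosts clean"
```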

The new master turned out to be NotReady, so check its status:
[root@master02 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since 三 2020-03-18 16:04:35 +08; 32min ago
     Docs: https://kubernetes.io/docs/
 Main PID: 18801 (kubelet)
    Tasks: 20
   Memory: 60.3M
   CGroup: /system.slice/kubelet.service
           └─18801 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni

3月 18 16:36:51 master02 kubelet[18801]: W0318 16:36:51.929447   18801 cni.go:188] Unable to update cni config: No networks found in /etc/cni/net.d
3月 18 16:36:51 master02 kubelet[18801]: E0318 16:36:51.929572   18801 kubelet.go:2167] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready...fig uninitialized
3月 18 16:36:56 master02 kubelet[18801]: W0318 16:36:56.930736   18801 cni.go:188] Unable to update cni config: No networks found in /etc/cni/net.d

The master indeed has no /etc/cni/net.d directory, so copy the file over from node02:
[root@master02 ~]# mkdir -p /etc/cni/net.d/
[root@master02 ~]# scp root@192.168.122.146:/etc/cni/net.d/10-contrail.conf /etc/cni/net.d/10-contrail.conf

[root@master02 ~]# systemctl restart kubelet

Problem solved:
[root@master02 ~]# kubectl get node
NAME                    STATUS   ROLES    AGE   VERSION
localhost.localdomain   Ready    <none>   35m   v1.12.9
master02                Ready    master   35m   v1.12.9
node03                  Ready    <none>   35m   v1.12.9
[root@master02 ~]# 
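
Both CNI failures in this article came down to a missing file on the master: the contrail-k8s-cni binary under /opt/cni/bin and the config under /etc/cni/net.d. A preflight check for the two could look like the sketch below. `cni_missing` is a hypothetical helper, and it only looks for *.conf files because that is the file type copied above; other config extensions are not considered.

```shell
#!/bin/bash
# Sketch only: cni_missing is a made-up helper.
# Usage: cni_missing [root]
# Prints what is missing under the given root directory (default: /) for a
# kubelet that runs with --network-plugin=cni and the Contrail plugin.
cni_missing() {
  local root=${1:-}
  local conf_dir="${root}/etc/cni/net.d"
  local bin="${root}/opt/cni/bin/contrail-k8s-cni"
  if ! ls "${conf_dir}"/*.conf >/dev/null 2>&1; then
    echo "no CNI config in ${conf_dir}"
  fi
  if [ ! -x "$bin" ]; then
    echo "missing plugin ${bin}"
  fi
}

# Usage (on any k8s node); no output means both files are in place:
#   cni_missing
```
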
  • If one deployer is used to deploy two environments, the browser shows a warning when the web UI is opened

  • For the fix, see https://support.mozilla.org/en-US/kb/Certificate-contains-the-same-serial-number-as-another-certificate

  • The pod status is now normal
[root@master02 ~]# kubectl get pods -n kube-system -o wide
NAME                                    READY   STATUS    RESTARTS   AGE   IP                NODE                    NOMINATED NODE
coredns-85c98899b4-4vgk4                1/1     Running   0          69m   10.47.255.252     node03                  <none>
coredns-85c98899b4-thpz6                1/1     Running   0          69m   10.47.255.251     localhost.localdomain   <none>
etcd-master02                           1/1     Running   0          55m   192.168.122.116   master02                <none>
kube-apiserver-master02                 1/1     Running   0          55m   192.168.122.116   master02                <none>
kube-controller-manager-master02        1/1     Running   0          55m   192.168.122.116   master02                <none>
kube-proxy-6sp2n                        1/1     Running   0          69m   192.168.122.116   master02                <none>
kube-proxy-8gpgd                        1/1     Running   0          69m   192.168.122.204   node03                  <none>
kube-proxy-wtvhd                        1/1     Running   0          69m   192.168.122.146   localhost.localdomain   <none>
kube-scheduler-master02                 1/1     Running   0          55m   192.168.122.116   master02                <none>
kubernetes-dashboard-76456c6d4b-9s6vc   1/1     Running   0          69m   192.168.122.204   node03                  <none>
[root@master02 ~]# 

This article is a repost.
