RabbitMQ pod on Kubernetes is stuck in pod initializing state

Stack Overflow user
Asked on 2021-07-04 12:49:09
Answers: 1  Views: 764  Followers: 0  Votes: 1

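I am running a three-node RabbitMQ cluster on Kubernetes. The Kubernetes cluster runs on AWS spot instances, and one of the Kubernetes nodes, which had one of the RabbitMQ pods running on it, was terminated unexpectedly. The pod then got scheduled on another node, and since then my RabbitMQ pod has been stuck in the pod initializing state.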

The Kubernetes events show "FailedPostStartHook".

Logs:

9m46s       Warning   FailedPostStartHook      pod/rabbitmq-0   Exec lifecycle hook ([/bin/sh -c until rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} node_health_check; do sleep 1; done; rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} set_policy ha-all "" '{"ha-mode":"all", "ha-sync-mode": "automatic"}'
]) for Container "rabbitmq" in Pod "rabbitmq-0_devops(c96c1a6e-bf9a-450d-828d-ed0e8a0ad949)" failed - error: command '/bin/sh -c until rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} node_health_check; do sleep 1; done; rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} set_policy ha-all "" '{"ha-mode":"all", "ha-sync-mode": "automatic"}'
' exited with 137: Error: unable to perform an operation on node 'rabbit@rabbitmq-0.rabbitmq-service.devops.svc.cluster.local'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
 * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
 * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
 * Target node is not running
In addition to the diagnostics info below:
 * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
 * Consult server logs on node rabbit@rabbitmq-0.rabbitmq-service.devops.svc.cluster.local
 * If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: ['rabbit@rabbitmq-0.rabbitmq-service.devops.svc.cluster.local']
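
The exit status 137 in the event above is worth decoding. By the common shell convention, a status above 128 means the process was terminated by a signal whose number is the status minus 128, so 137 corresponds to signal 9 (SIGKILL). That suggests the hook was killed from outside rather than failing on its own, although the event alone does not say what killed it. A minimal sketch of the arithmetic; `describe_exit_status` is a hypothetical helper name, not part of any tool mentioned here:

```shell
#!/bin/sh
# Hypothetical helper: explain a shell exit status.
describe_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        signum=$((status - 128))
        # "kill -l <number>" prints the signal name without the SIG prefix.
        echo "killed by signal $signum (SIG$(kill -l "$signum"))"
    else
        echo "exited on its own with status $status"
    fi
}

describe_exit_status 137
```
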

Kubernetes StatefulSet manifest:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: devops
spec:
  podManagementPolicy: OrderedReady
  replicas: 3
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: rabbitmq
  serviceName: rabbitmq-service
  template:
    metadata:
      annotations:
      labels:
        app: rabbitmq
      name: rabbitmq
    spec:
      containers:
      - env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: RABBITMQ_USE_LONGNAME
          value: "true"
        - name: RABBITMQ_BASIC_AUTH
          valueFrom:
            secretKeyRef:
              key: password
              name: rabbitmq
        - name: RABBITMQ_NODENAME
          value: rabbit@$(HOSTNAME).rabbitmq-service.$(NAMESPACE).svc.cluster.local
        - name: K8S_SERVICE_NAME
          value: rabbitmq-service
        - name: RABBITMQ_DEFAULT_USER
          value: admin
        - name: RABBITMQ_DEFAULT_PASS
          valueFrom:
            secretKeyRef:
              key: password
              name: rabbitmq
        - name: RABBITMQ_ERLANG_COOKIE
          value: some-cookie
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: rabbitmq:3.8.1-management-alpine
        imagePullPolicy: IfNotPresent
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                until rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} node_health_check; do sleep 1; done; rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} set_policy ha-all "" '{"ha-mode":"all", "ha-sync-mode": "automatic"}'
        livenessProbe:
          exec:
            command:
            - rabbitmqctl
            - status
          failureThreshold: 3
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        name: rabbitmq
        ports:
        - containerPort: 4369
          protocol: TCP
        - containerPort: 5672
          protocol: TCP
        - containerPort: 5671
          protocol: TCP
        - containerPort: 25672
          protocol: TCP
        - containerPort: 15672
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - rabbitmqctl
            - status
          failureThreshold: 3
          initialDelaySeconds: 20
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        resources:
          limits:
            cpu: "2"
            memory: 3Gi
          requests:
            cpu: "1"
            memory: 2Gi
        volumeMounts:
        - mountPath: /var/lib/rabbitmq/
          name: rabbitmq-data
        - mountPath: /etc/rabbitmq
          name: config
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - /bin/bash
        - -euc
        - |
          rm -f /var/lib/rabbitmq/.erlang.cookie
          cp /rabbitmqconfig/rabbitmq.conf /etc/rabbitmq/rabbitmq.conf
          cp /rabbitmqconfig/enabled_plugins /etc/rabbitmq/enabled_plugins
        image: rabbitmq:3.8.1-management-alpine
        imagePullPolicy: Always
        name: copy-rabbitmq-config
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /rabbitmqconfig
          name: rabbitmq-configmap
        - mountPath: /etc/rabbitmq
          name: config
        - mountPath: /var/lib/rabbitmq
          name: rabbitmq-data
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: rabbitmq
      serviceAccountName: rabbitmq
      terminationGracePeriodSeconds: 10
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: rabbitmq.conf
            path: rabbitmq.conf
          - key: enabled_plugins
            path: enabled_plugins
          name: rabbitmq-configmap
        name: rabbitmq-configmap
      - emptyDir: {}
        name: config
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: rabbitmq-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
      storageClassName: gp2
      volumeMode: Filesystem
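
One detail of this manifest stands out: the `postStart` hook retries `node_health_check` in an unbounded `until` loop, and Kubernetes does not mark a container as running until its `postStart` handler returns. A node that never becomes healthy therefore keeps the hook looping until something kills it. As a sketch only (not a tested fix for this cluster), the wait can be given an upper bound so the hook fails with a clear message; `retry_until` and the 120-attempt limit in the usage comment are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, trying at most
# max_attempts times and sleeping delay seconds between attempts.
retry_until() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            break
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "gave up after $max_attempts attempts: $*" >&2
    return 1
}

# Possible use inside the hook (commented out: it needs a running RabbitMQ node):
# retry_until 120 1 rabbitmqctl --erlang-cookie "${RABBITMQ_ERLANG_COOKIE}" node_health_check \
#   && rabbitmqctl --erlang-cookie "${RABBITMQ_ERLANG_COOKIE}" set_policy ha-all "" \
#        '{"ha-mode":"all", "ha-sync-mode": "automatic"}'
```

Keep in mind that when a `postStart` hook fails, Kubernetes kills the container and restarts it according to the pod's restart policy, so a bounded loop changes how the failure surfaces rather than removing its cause.
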

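Things I have tried:

  1. Logged in to the affected pod and executed the following (the command gave no response):

rabbitmqctl stop_app

  2. Tried to force-delete the pod, but no luck.

  3. Logged in to the affected pod and executed:

rabbitmqctl reset

  4. Logged in to the affected pod and executed:

rabbitmqctl force_boot

  5. Logged in to the affected pod and executed:

rm /var/log/rabbitmq/*

None of the above helped.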

Note that the other two RabbitMQ nodes are running fine, serving traffic, and report the failed node as up:

rabbitmq-2 rabbitmq 2021-07-04 12:19:07.233 [info] <0.490.0> node 'rabbit@rabbitmq-0.rabbitmq-service.devops.svc.cluster.local' up
rabbitmq-1 rabbitmq 2021-07-04 12:19:07.208 [info] <0.494.0> node 'rabbit@rabbitmq-0.rabbitmq-service.devops.svc.cluster.local' up 

1 Answer

Stack Overflow user

Answered on 2021-07-05 06:32:09

Running a rollout restart of the StatefulSet worked for me.

kubectl rollout restart statefulset rabbitmq -n devops

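After this command, the RabbitMQ cluster came up and all three nodes joined the cluster without any issues.

Once that is done, the applications connected to this RabbitMQ cluster need to be restarted.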
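
To confirm that the restart has finished before restarting the client applications, the rollout can be watched and the cluster membership checked. A sketch using the names from the question (the `rabbitmq` StatefulSet, the `devops` namespace, the `rabbitmq-0` pod); the `KUBECTL` variable, the function name, and the 10-minute timeout are assumptions added here, the first so that the commands can be dry-run:

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) to print the
# commands instead of running them.
KUBECTL=${KUBECTL:-kubectl}

# Hypothetical helper: restart the StatefulSet, wait for the rollout,
# then ask one node which peers it currently sees.
restart_and_verify() {
    namespace=$1
    statefulset=$2
    pod=$3

    $KUBECTL rollout restart statefulset "$statefulset" -n "$namespace" || return 1

    # Blocks until the rollout completes or the timeout expires.
    $KUBECTL rollout status statefulset "$statefulset" -n "$namespace" --timeout=10m || return 1

    # Lists the cluster members and which of them are running.
    $KUBECTL exec "$pod" -n "$namespace" -- rabbitmqctl cluster_status
}

# Example: restart_and_verify devops rabbitmq rabbitmq-0
```

All three nodes should appear as running in the `cluster_status` output before the applications are reconnected.
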

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/68244876
