I installed RabbitMQ on a Kubernetes cluster using a Helm chart. The RabbitMQ pod keeps restarting. When I check the pod logs, I get the following error:
2020-02-26 04:42:31.582 [warning] <0.314.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-02-26 04:42:31.582 [info] <0.314.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
When I run kubectl describe pod on it, I get this output:
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-rabbitmq-0
    ReadOnly:   false
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-config
    Optional:  false
  healthchecks:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-healthchecks
    Optional:  false
  rabbitmq-token-w74kb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rabbitmq-token-w74kb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/arch=amd64
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                      From                                               Message
  ----     ------     ----                     ----                                               -------
  Warning  Unhealthy  3m27s (x878 over 7h21m)  kubelet, gke-analytics-default-pool-918f5943-w0t0  Readiness probe failed: Timeout: 70 seconds ...
           Checking health of node rabbit@rabbitmq-0.rabbitmq-headless.default.svc.cluster.local ...
           Status of node rabbit@rabbitmq-0.rabbitmq-headless.default.svc.cluster.local ...
           Error:
           {:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"$1", :_, :_}, [], [:"$1"]}]]}}
           Error:
           {:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"$1", :_, :_}, [], [:"$1"]}]]}}
I provisioned the above on a Kubernetes cluster on Google Cloud. I don't know under what specific circumstances it started failing; I had to restart the pod at one point, and since then it has kept failing.
What is the problem here?
Posted on 2020-07-19 11:25:04
Try this deployment:
kind: Service
apiVersion: v1
metadata:
  namespace: rabbitmq-namespace
  name: rabbitmq-lb
  labels:
    app: rabbitmq
    type: LoadBalancer
spec:
  type: NodePort
  ports:
    - name: http
      protocol: TCP
      port: 15672
      targetPort: 15672
      nodePort: 31672
    - name: amqp
      protocol: TCP
      port: 5672
      targetPort: 5672
      nodePort: 30672
    - name: stomp
      protocol: TCP
      port: 61613
      targetPort: 61613
  selector:
    app: rabbitmq
---
kind: Service
apiVersion: v1
metadata:
  namespace: rabbitmq-namespace
  # Headless service that gives the StatefulSet pods a DNS name known in the cluster
  # (hostname-#.serviceName.namespace.svc.cluster.local), in our case:
  # rabbitmq-#.rabbitmq.rabbitmq-namespace.svc.cluster.local
  # The StatefulSet's serviceName and the hostname_suffix in rabbitmq.conf both refer to this name.
  name: rabbitmq
  labels:
    app: rabbitmq
spec:
  clusterIP: None
  ports:
    - name: http
      protocol: TCP
      port: 15672
      targetPort: 15672
    - name: amqp
      protocol: TCP
      port: 5672
      targetPort: 5672
    - name: stomp
      port: 61613
  selector:
    app: rabbitmq
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
  namespace: rabbitmq-namespace
data:
  enabled_plugins: |
    [rabbitmq_management,rabbitmq_peer_discovery_k8s,rabbitmq_stomp].
  rabbitmq.conf: |
    ## Cluster formation. See http://www.rabbitmq.com/cluster-formation.html to learn more.
    cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    ## Should RabbitMQ node name be computed from the pod's hostname or IP address?
    ## IP addresses are not stable, so using [stable] hostnames is recommended when possible.
    ## Set to "hostname" to use pod hostnames.
    ## When this value is changed, so should the variable used to set the RABBITMQ_NODENAME
    ## environment variable.
    cluster_formation.k8s.address_type = hostname
    ## Important - this is the suffix of the hostname. Each node is named "rabbitmq-#",
    ## so the suffix tells each new node how to contact its peer nodes and join the
    ## cluster (when using hostname address_type).
    cluster_formation.k8s.hostname_suffix = .rabbitmq.rabbitmq-namespace.svc.cluster.local
    ## How often should node cleanup checks run?
    cluster_formation.node_cleanup.interval = 30
    ## Set to false if automatic removal of unknown/absent nodes
    ## is desired. This can be dangerous, see
    ##  * http://www.rabbitmq.com/cluster-formation.html#node-health-checks-and-cleanup
    ##  * https://groups.google.com/forum/#!msg/rabbitmq-users/wuOfzEywHXo/k8z_HWIkBgAJ
    cluster_formation.node_cleanup.only_log_warning = true
    cluster_partition_handling = autoheal
    ## See http://www.rabbitmq.com/ha.html#master-migration-data-locality
    queue_master_locator = min-masters
    ## See http://www.rabbitmq.com/access-control.html#loopback-users
    loopback_users.guest = false
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: rabbitmq-namespace
spec:
  serviceName: rabbitmq
  replicas: 3
  selector:
    matchLabels:
      name: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
        name: rabbitmq
        state: rabbitmq
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      serviceAccountName: rabbitmq
      terminationGracePeriodSeconds: 10
      containers:
        - name: rabbitmq-k8s
          image: rabbitmq:3.8.3
          volumeMounts:
            - name: config-volume
              mountPath: /etc/rabbitmq
            - name: data
              mountPath: /var/lib/rabbitmq/mnesia
          ports:
            - name: http
              protocol: TCP
              containerPort: 15672
            - name: amqp
              protocol: TCP
              containerPort: 5672
          livenessProbe:
            exec:
              command: ["rabbitmqctl", "status"]
            initialDelaySeconds: 60
            periodSeconds: 60
            timeoutSeconds: 10
          readinessProbe:
            exec:
              command: ["rabbitmqctl", "status"]
            initialDelaySeconds: 20
            periodSeconds: 60
            timeoutSeconds: 10
          resources:
            requests:
              memory: "0"
              cpu: "0"
            limits:
              memory: "2048Mi"
              cpu: "1000m"
          imagePullPolicy: Always
          env:
            - name: MY_POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: RABBITMQ_USE_LONGNAME
              value: "true"
            # See the note on cluster_formation.k8s.address_type in the config file section
            - name: RABBITMQ_NODENAME
              value: "rabbit@$(HOSTNAME).rabbitmq.$(NAMESPACE).svc.cluster.local"
            - name: K8S_SERVICE_NAME
              value: "rabbitmq"
            - name: RABBITMQ_ERLANG_COOKIE
              value: "mycookie"
      volumes:
        - name: config-volume
          configMap:
            name: rabbitmq-config
            items:
              - key: rabbitmq.conf
                path: rabbitmq.conf
              - key: enabled_plugins
                path: enabled_plugins
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - "ReadWriteOnce"
        storageClassName: "default"
        resources:
          requests:
            storage: 3Gi
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rabbitmq
  namespace: rabbitmq-namespace
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: endpoint-reader
  namespace: rabbitmq-namespace
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: endpoint-reader
  namespace: rabbitmq-namespace
subjects:
  - kind: ServiceAccount
    name: rabbitmq
    namespace: rabbitmq-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: endpoint-reader
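To try this out, you could save all of the manifests above into a single file, apply it, and then confirm that the three nodes formed one cluster. A quick sketch (the file name is an assumption):

kubectl apply -f rabbitmq.yaml
kubectl -n rabbitmq-namespace get pods -w
# once all three pods are Running, verify the nodes see each other:
kubectl -n rabbitmq-namespace exec rabbitmq-0 -- rabbitmqctl cluster_status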
Posted on 2021-03-10 14:59:39
TLDR
helm upgrade rabbitmq --set clustering.forceBoot=true
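Note that helm upgrade also needs the chart reference, and you will usually want to keep your existing values. With the Bitnami chart the full command would look roughly like this (the repo alias and release name are assumptions):

helm upgrade rabbitmq bitnami/rabbitmq --set clustering.forceBoot=true --reuse-values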
Problem
This happens because of a mismatch between how RabbitMQ and StatefulSets work. RabbitMQ says: "if everything went down, just start everything back up at the same time; one node will manage to boot, and once it is up, the others can rejoin the cluster." Kubernetes StatefulSets say: "starting everything at once is not possible; we start with pod 0 and only continue once it is ready." So pod 0 waits for peers that the StatefulSet will never start until pod 0 is ready, which is exactly the Mnesia table timeout you see in the logs.
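As an aside, this one-pod-at-a-time behaviour is also why RabbitMQ deployments often start all StatefulSet replicas in parallel, which sidesteps the deadlock on future cold starts. A minimal sketch of the relevant field (assuming you control the manifest, as in the deployment above):

spec:
  podManagementPolicy: Parallel  # start/terminate all replicas at once instead of one by one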
Solution
To solve this, rabbitmqctl has a force_boot command that essentially tells an instance to start standalone if it does not find any peers. How you use this from Kubernetes depends on the Helm chart and container you are using. With the Bitnami chart, which uses the Bitnami Docker image, there is a value clustering.forceBoot=true, which translates to the environment variable RABBITMQ_FORCE_BOOT=yes inside the container, which in turn issues the above command for you.
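If you are not on the Bitnami chart (for example, with the plain StatefulSet above), a rough equivalent is to run force_boot yourself: it simply drops a force_load marker file into the node's Mnesia directory, which is consumed on the next boot. A sketch, assuming the mount path and node name from the manifest above; the path will differ per chart:

# while the rabbit app is down, create the marker that force_boot would create:
kubectl -n rabbitmq-namespace exec rabbitmq-0 -- \
  touch /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0.rabbitmq.rabbitmq-namespace.svc.cluster.local/force_load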
But looking at the problem, you can also see why deleting the PVCs works (the other answer): the pods simply "forget" that they were part of an RMQ cluster the last time around, and happily start fresh. I prefer the solution above, though, because no data is lost.
Posted on 2020-02-26 12:56:30
I just deleted the existing persistent volume claims and reinstalled RabbitMQ, and it started working.
So, every time after installing RabbitMQ on a Kubernetes cluster, if I scale the pods down to 0 and later scale them back up, I get the same error. I also tried deleting the persistent volume claims without uninstalling the RabbitMQ Helm chart, but that still gave the same error.
So every time I scale the cluster down to 0, I need to uninstall the RabbitMQ Helm chart, delete the corresponding persistent volume claims, and install the RabbitMQ Helm chart again to make it work.
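For reference, the full cycle described above looks roughly like this in Helm 3 syntax (the release name and the label used to select the PVCs are assumptions that depend on your chart; the chart reference is left as a placeholder):

helm uninstall rabbitmq
# deletes the stale cluster state, along with all queue data:
kubectl delete pvc -l app.kubernetes.io/name=rabbitmq
helm install rabbitmq <chart>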
https://stackoverflow.com/questions/60407082