I installed RabbitMQ on a Kubernetes cluster using a Helm chart. The RabbitMQ pod keeps restarting. When I check the pod logs, I get the following error:
2020-02-26 04:42:31.582 [warning] <0.314.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-02-26 04:42:31.582 [info] <0.314.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
When I run kubectl describe pod on it, I get this output:
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-rabbitmq-0
    ReadOnly:   false
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-config
    Optional:  false
  healthchecks:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-healthchecks
    Optional:  false
  rabbitmq-token-w74kb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rabbitmq-token-w74kb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/arch=amd64
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                      From                                               Message
  ----     ------     ----                     ----                                               -------
  Warning  Unhealthy  3m27s (x878 over 7h21m)  kubelet, gke-analytics-default-pool-918f5943-w0t0  Readiness probe failed: Timeout: 70 seconds ...
           Checking health of node rabbit@rabbitmq-0.rabbitmq-headless.default.svc.cluster.local ...
           Status of node rabbit@rabbitmq-0.rabbitmq-headless.default.svc.cluster.local ...
           Error:
           {:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"$1", :_, :_}, [], [:"$1"]}]]}}
           Error:
           {:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"$1", :_, :_}, [], [:"$1"]}]]}}
I provisioned the above on a Kubernetes cluster on Google Cloud. I don't know under what specific circumstances it started failing; I had to restart the pod at one point, and since then it has kept failing.
What is the problem here?
Posted on 2020-07-19 11:25:04
Try this deployment:
kind: Service
apiVersion: v1
metadata:
  namespace: rabbitmq-namespace
  name: rabbitmq-lb
  labels:
    app: rabbitmq
    type: LoadBalancer
spec:
  type: NodePort
  ports:
    - name: http
      protocol: TCP
      port: 15672
      targetPort: 15672
      nodePort: 31672
    - name: amqp
      protocol: TCP
      port: 5672
      targetPort: 5672
      nodePort: 30672
    - name: stomp
      protocol: TCP
      port: 61613
      targetPort: 61613
  selector:
    app: rabbitmq
---
kind: Service
apiVersion: v1
metadata:
  namespace: rabbitmq-namespace
  # Headless service that gives the StatefulSet pods a DNS name known in the cluster
  # (hostname-#.serviceName.namespace.svc.cluster.local), in our case:
  # rabbitmq-#.rabbitmq.rabbitmq-namespace.svc.cluster.local
  # The StatefulSet's serviceName and the hostname_suffix in rabbitmq.conf both refer to this name.
  name: rabbitmq
  labels:
    app: rabbitmq
spec:
  clusterIP: None
  ports:
    - name: http
      protocol: TCP
      port: 15672
      targetPort: 15672
    - name: amqp
      protocol: TCP
      port: 5672
      targetPort: 5672
    - name: stomp
      port: 61613
  selector:
    app: rabbitmq
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
  namespace: rabbitmq-namespace
data:
  enabled_plugins: |
    [rabbitmq_management,rabbitmq_peer_discovery_k8s,rabbitmq_stomp].
  rabbitmq.conf: |
    ## Cluster formation. See http://www.rabbitmq.com/cluster-formation.html to learn more.
    cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    ## Should RabbitMQ node name be computed from the pod's hostname or IP address?
    ## IP addresses are not stable, so using [stable] hostnames is recommended when possible.
    ## Set to "hostname" to use pod hostnames.
    ## When this value is changed, so should the variable used to set the RABBITMQ_NODENAME
    ## environment variable.
    cluster_formation.k8s.address_type = hostname
    ## Important - this is the suffix of the hostname. Each node is named "rabbitmq-#",
    ## so the suffix tells each new node how to contact its peer nodes and join the
    ## cluster (when using hostname address_type).
    cluster_formation.k8s.hostname_suffix = .rabbitmq.rabbitmq-namespace.svc.cluster.local
    ## How often should node cleanup checks run?
    cluster_formation.node_cleanup.interval = 30
    ## Set to false if automatic removal of unknown/absent nodes
    ## is desired. This can be dangerous, see
    ##  * http://www.rabbitmq.com/cluster-formation.html#node-health-checks-and-cleanup
    ##  * https://groups.google.com/forum/#!msg/rabbitmq-users/wuOfzEywHXo/k8z_HWIkBgAJ
    cluster_formation.node_cleanup.only_log_warning = true
    cluster_partition_handling = autoheal
    ## See http://www.rabbitmq.com/ha.html#master-migration-data-locality
    queue_master_locator = min-masters
    ## See http://www.rabbitmq.com/access-control.html#loopback-users
    loopback_users.guest = false
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: rabbitmq-namespace
spec:
  serviceName: rabbitmq
  replicas: 3
  selector:
    matchLabels:
      name: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
        name: rabbitmq
        state: rabbitmq
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      serviceAccountName: rabbitmq
      terminationGracePeriodSeconds: 10
      containers:
        - name: rabbitmq-k8s
          image: rabbitmq:3.8.3
          volumeMounts:
            - name: config-volume
              mountPath: /etc/rabbitmq
            - name: data
              mountPath: /var/lib/rabbitmq/mnesia
          ports:
            - name: http
              protocol: TCP
              containerPort: 15672
            - name: amqp
              protocol: TCP
              containerPort: 5672
          livenessProbe:
            exec:
              command: ["rabbitmqctl", "status"]
            initialDelaySeconds: 60
            periodSeconds: 60
            timeoutSeconds: 10
          readinessProbe:
            exec:
              command: ["rabbitmqctl", "status"]
            initialDelaySeconds: 20
            periodSeconds: 60
            timeoutSeconds: 10
          resources:
            requests:
              memory: "0"
              cpu: "0"
            limits:
              memory: "2048Mi"
              cpu: "1000m"
          imagePullPolicy: Always
          env:
            - name: MY_POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: RABBITMQ_USE_LONGNAME
              value: "true"
            # See the note on cluster_formation.k8s.address_type in the config file section
            - name: RABBITMQ_NODENAME
              value: "rabbit@$(HOSTNAME).rabbitmq.$(NAMESPACE).svc.cluster.local"
            - name: K8S_SERVICE_NAME
              value: "rabbitmq"
            - name: RABBITMQ_ERLANG_COOKIE
              value: "mycookie"
      volumes:
        - name: config-volume
          configMap:
            name: rabbitmq-config
            items:
              - key: rabbitmq.conf
                path: rabbitmq.conf
              - key: enabled_plugins
                path: enabled_plugins
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - "ReadWriteOnce"
        storageClassName: "default"
        resources:
          requests:
            storage: 3Gi
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rabbitmq
  namespace: rabbitmq-namespace
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: endpoint-reader
  namespace: rabbitmq-namespace
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: endpoint-reader
  namespace: rabbitmq-namespace
subjects:
  - kind: ServiceAccount
    name: rabbitmq
    namespace: rabbitmq-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: endpoint-reader
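To try this out, you could save all of the manifests above into a single file, apply it, and then confirm that the three nodes formed one cluster. A quick sketch (the file name is an assumption):

kubectl apply -f rabbitmq.yaml
kubectl -n rabbitmq-namespace get pods -w
# once all three pods are Running, verify the nodes see each other:
kubectl -n rabbitmq-namespace exec rabbitmq-0 -- rabbitmqctl cluster_status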
Posted on 2021-03-10 14:59:39
TLDR
helm upgrade rabbitmq --set clustering.forceBoot=true
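Note that helm upgrade also needs the chart reference, and you will usually want to keep your existing values. With the Bitnami chart the full command would look roughly like this (the repo alias and release name are assumptions):

helm upgrade rabbitmq bitnami/rabbitmq --set clustering.forceBoot=true --reuse-values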
Problem
This happens because of a mismatch between how RabbitMQ and StatefulSets work. RabbitMQ says: "if everything went down, just start everything back up at the same time; one node will manage to boot, and once it is up, the others can rejoin the cluster." Kubernetes StatefulSets say: "starting everything at once is not possible; we start with pod 0 and only continue once it is ready." So pod 0 waits for peers that the StatefulSet will never start until pod 0 is ready, which is exactly the Mnesia table timeout you see in the logs.
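As an aside, this one-pod-at-a-time behaviour is also why RabbitMQ deployments often start all StatefulSet replicas in parallel, which sidesteps the deadlock on future cold starts. A minimal sketch of the relevant field (assuming you control the manifest, as in the deployment above):

spec:
  podManagementPolicy: Parallel  # start/terminate all replicas at once instead of one by one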
Solution
To solve this, rabbitmqctl has a force_boot command that essentially tells an instance to start standalone if it does not find any peers. How you use this from Kubernetes depends on the Helm chart and container you are using. With the Bitnami chart, which uses the Bitnami Docker image, there is a value clustering.forceBoot=true, which translates to the environment variable RABBITMQ_FORCE_BOOT=yes inside the container, which in turn issues the above command for you.
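If you are not on the Bitnami chart (for example, with the plain StatefulSet above), a rough equivalent is to run force_boot yourself: it simply drops a force_load marker file into the node's Mnesia directory, which is consumed on the next boot. A sketch, assuming the mount path and node name from the manifest above; the path will differ per chart:

# while the rabbit app is down, create the marker that force_boot would create:
kubectl -n rabbitmq-namespace exec rabbitmq-0 -- \
  touch /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0.rabbitmq.rabbitmq-namespace.svc.cluster.local/force_load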
But looking at the problem, you can also see why deleting the PVCs works (the other answer): the pods simply "forget" that they were part of an RMQ cluster the last time around, and happily start fresh. I prefer the solution above, though, because no data is lost.
Posted on 2020-02-26 12:56:30
I just deleted the existing persistent volume claims and reinstalled RabbitMQ, and it started working.
So, every time after installing RabbitMQ on a Kubernetes cluster, if I scale the pods down to 0 and later scale them back up, I get the same error. I also tried deleting the persistent volume claims without uninstalling the RabbitMQ Helm chart, but that still gave the same error.
So every time I scale the cluster down to 0, I need to uninstall the RabbitMQ Helm chart, delete the corresponding persistent volume claims, and install the RabbitMQ Helm chart again to make it work.
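For reference, the full cycle described above looks roughly like this in Helm 3 syntax (the release name and the label used to select the PVCs are assumptions that depend on your chart; the chart reference is left as a placeholder):

helm uninstall rabbitmq
# deletes the stale cluster state, along with all queue data:
kubectl delete pvc -l app.kubernetes.io/name=rabbitmq
helm install rabbitmq <chart>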
https://stackoverflow.com/questions/60407082