When does the cluster status turn RED or YELLOW:
When the cluster has unassigned primary shards, the status is RED. This affects index reads and writes and requires close attention.
When all primary shards are assigned but some replica shards are unassigned, the status is YELLOW. This does not affect index reads and writes, and the cluster generally recovers automatically.
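The two rules above can be sketched as a small function. This is a simplified illustration that assumes the status depends only on the counts of unassigned primary and replica shards, as described here; the function name and parameters are hypothetical and not part of any Elasticsearch client.

```python
def cluster_status(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Return the status implied by the rules above (simplified sketch)."""
    if unassigned_primaries > 0:
        # At least one primary shard is unassigned: reads and writes are affected.
        return "red"
    if unassigned_replicas > 0:
        # All primaries are assigned, but some replicas are not.
        return "yellow"
    return "green"


print(cluster_status(unassigned_primaries=2, unassigned_replicas=7))
print(cluster_status(unassigned_primaries=0, unassigned_replicas=3))
print(cluster_status(unassigned_primaries=0, unassigned_replicas=0))
```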
Viewing Cluster Status
Use Kibana Dev Tools to view the cluster status:
GET /_cluster/health
In this example, the response shows that the cluster status is red, with 9 unassigned shards.
Official explanation of the fields returned by the ES health API:
Metrics
Meaning
cluster_name
Name of the Cluster
status
The cluster health is based on the state of its primary and replica shards. The status is:
– green: All shards are assigned
– yellow: All primary shards are assigned, but one or more replica shards are not assigned. If a node in the cluster fails, some data may be unavailable until that node is repaired
– red: One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup while primary shards are being assigned
timed_out
If false, the response was returned within the period specified by the timeout parameter (30s by default)
number_of_nodes
Number of nodes in the cluster
number_of_data_nodes
Number of nodes serving as dedicated data nodes
active_primary_shards
Number of active primary shards
active_shards
Total number of active primary and replica shards
relocating_shards
Number of shards being relocated
initializing_shards
Number of shards being initialized
unassigned_shards
Number of unallocated shards
delayed_unassigned_shards
The number of shards whose allocation has been delayed by the timeout settings
number_of_pending_tasks
The number of cluster-level changes not yet executed
number_of_in_flight_fetch
The number of unfinished fetches
task_max_waiting_in_queue_millis
Time, in milliseconds, that the earliest-initiated task has been waiting to be executed
active_shards_percent_as_number
The ratio of active shards in the cluster, expressed as a percentage
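As a rough illustration of how these fields can be read together, the sketch below condenses a health response into one line. The sample response is made up for this example (only the red status and the 9 unassigned shards come from the scenario above), and `summarize_health` is a hypothetical helper, not part of any client library.

```python
import json

# Illustrative response body; all values except "status" and
# "unassigned_shards" are assumptions made for this example.
SAMPLE_HEALTH = json.loads("""
{
  "cluster_name": "es-example",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 3,
  "active_primary_shards": 11,
  "active_shards": 21,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 9,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 70.0
}
""")


def summarize_health(health: dict) -> str:
    """Build a one-line summary from a GET /_cluster/health response."""
    line = (
        f"{health['cluster_name']}: status={health['status']}, "
        f"unassigned_shards={health['unassigned_shards']}, "
        f"active={health['active_shards_percent_as_number']:.1f}%"
    )
    if health["timed_out"]:
        # timed_out is true only when the request exceeded its timeout.
        line += " (request timed out)"
    return line


print(summarize_health(SAMPLE_HEALTH))
```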
Analysis
When the cluster status is abnormal, pay special attention to unassigned_shards, which counts the shards that have not been assigned. The following walks through one such scenario.
Find the abnormal index
List the indices and find those with an abnormal status in the response.
GET /_cat/indices
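If the list is long, it can help to filter it programmatically. The sketch below assumes the rows were obtained with `GET /_cat/indices?format=json`, which returns one object per index with `index` and `health` keys; the index names in the sample and the `abnormal_indices` helper are hypothetical.

```python
from __future__ import annotations


def abnormal_indices(rows: list[dict]) -> list[tuple[str, str]]:
    """Return (index, health) pairs for every index that is not green, red first."""
    severity = {"red": 0, "yellow": 1}
    bad = []
    for row in rows:
        # Treat a missing or empty health value as "unknown" rather than green.
        health = row.get("health") or "unknown"
        if health != "green":
            bad.append((row["index"], health))
    return sorted(bad, key=lambda item: (severity.get(item[1], 2), item[0]))


# Hypothetical rows shaped like the JSON output of the cat indices API.
sample_rows = [
    {"index": "logs-2024", "health": "yellow", "status": "open"},
    {"index": "orders", "health": "red", "status": "open"},
    {"index": "users", "health": "green", "status": "open"},
]

for name, health in abnormal_indices(sample_rows):
    print(f"{health:<7} {name}")
```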
View detailed exception information
GET /_cluster/allocation/explain
The exception information here indicates:
1. The primary shard is currently unassigned (current_state) because the node it was allocated to has left the cluster (unassigned_info.reason).
2. The shard cannot be allocated automatically because no valid copy of the shard is available in the cluster (can_allocate).
3. More detailed information is also provided (allocate_explanation).
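The fields called out in the list above can be pulled from the response with a few lines of code. This is a minimal sketch: the sample response is illustrative rather than captured from a real cluster, the index name is hypothetical, and `explain_summary` is a helper invented for this example.

```python
def explain_summary(explain: dict) -> dict:
    """Extract the fields discussed above from an allocation explain response."""
    info = explain.get("unassigned_info", {})
    return {
        "shard": f"{explain.get('index')}[{explain.get('shard')}]",
        "primary": explain.get("primary"),
        "current_state": explain.get("current_state"),
        "reason": info.get("reason"),
        "can_allocate": explain.get("can_allocate"),
        "allocate_explanation": explain.get("allocate_explanation"),
    }


# Illustrative response for a primary shard whose node left the cluster.
sample_explain = {
    "index": "orders",
    "shard": 0,
    "primary": True,
    "current_state": "unassigned",
    "unassigned_info": {"reason": "NODE_LEFT"},
    "can_allocate": "no_valid_shard_copy",
    "allocate_explanation": "(explanation text returned by the cluster)",
}

for key, value in explain_summary(sample_explain).items():
    print(f"{key}: {value}")
```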
This situation arises because a node has gone offline, leaving the primary shard with no available shard data in the cluster. The only option at this point is to wait for the node to recover and rejoin the cluster.
Note
In some extreme scenarios, such as when a shard of a single-replica index is corrupted, or a file system failure permanently removes a node, you have to accept the data loss and use the reroute command to allocate an empty primary shard. To avoid such scenarios, design index sharding carefully and avoid single-replica indices. Here, single replica means that the index has only primary shards and no replica shards (0 replicas). A sound shard design keeps the total number of shards in the cluster at a healthy level and makes full use of the cluster's distributed nature, improving overall performance while ensuring high availability.
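For reference, the reroute request mentioned in this note takes an `allocate_empty_primary` command and is sent with `POST /_cluster/reroute`. The sketch below only builds the request body; the index name, shard number, and node name are placeholders, and the helper function is hypothetical. The shard's previous data is lost once the command is executed, which is why the API requires `accept_data_loss` to be set to true explicitly.

```python
import json


def empty_primary_reroute_body(index: str, shard: int, node: str) -> str:
    """Build the JSON body for POST /_cluster/reroute (allocate_empty_primary)."""
    body = {
        "commands": [
            {
                "allocate_empty_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    # Required acknowledgement that existing shard data is discarded.
                    "accept_data_loss": True,
                }
            }
        ]
    }
    return json.dumps(body, indent=2)


# Placeholder values; replace them with the index, shard, and node reported
# by the allocation explain API for your cluster.
print(empty_primary_reroute_body("orders", 0, "node-1"))
```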
All possible reasons for shard unassignment (unassigned_info.reason)
You can use the analysis method described here to make a preliminary determination of why the cluster has unassigned shards. In most cases, the allocation explain API provides the answer.
Note:
If the cluster status does not recover automatically for a long time, or you cannot resolve the issue, contact Tencent Cloud Technical Support through After-sales Support.
reason
Description
INDEX_CREATED
Unassigned as a result of an index being created through the API
CLUSTER_RECOVERED
Unassigned as a result of a full cluster recovery