Overview
Instance Self-Check allows you to monitor the performance, cost, network, and disk status of your CVM instances, helping you understand their operational status. This feature enables you to promptly identify and resolve any issues related to your instances.
Use Cases
We recommend using Instance Self-Check in the following two scenarios:
Troubleshooting: if a failure or other problem occurs while an instance is running, you can use Instance Self-Check to locate the problem and handle it according to the suggestions provided.
All-around instance check: during daily Ops, Instance Self-Check helps you keep up to date with the overall running status of your instances and discover and solve problems promptly, keeping your business running normally.
Check Item Description
The self-check items are described below:
Local Network Detection
Category | Detection Instructions | Risk level | Solution |
Network latency | Local network latency is the PING value between your computer and the Tencent Cloud server, that is, the time it takes for your computer to send data to the server and receive a response. It indicates the transmission delay between the two networks. The check sends HTTP requests to the instance and applies the following criteria: If the latency is above 600 ms, the network quality is considered poor. If no response is received within 5 s, the request is considered timed out. If all requests time out, the network is considered disconnected. | Abnormal | We recommend that you check your local network, address specific issues accordingly, or switch to another network. |
| Network jitter | The network jitter value is the average of the differences in latency between adjacent requests. If the jitter value divided by the network latency value is less than or equal to 0.15, the network is considered stable. If the ratio is greater than 0.15, the network is considered to be fluctuating. | - |
| Upstream bandwidth | Data packets are uploaded to the instance to calculate its upstream bandwidth. | - |
| Downstream bandwidth | Data packets are downloaded from the instance to calculate its downstream bandwidth. | - |
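The latency and jitter criteria above can be sketched in a few lines of Python. This is an illustration only, not the console's implementation: the function name `assess_network`, the use of the mean of answered requests as the latency value, the use of absolute differences, and the convention of passing `None` for a request that timed out are all assumptions.

```python
from statistics import mean
from typing import Optional, Sequence

POOR_LATENCY_MS = 600.0    # above this, network quality is considered poor
JITTER_RATIO_LIMIT = 0.15  # jitter / latency at or below this means stable


def assess_network(samples_ms: Sequence[Optional[float]]) -> dict:
    """Classify latency samples in ms; None marks a request that timed out."""
    answered = [s for s in samples_ms if s is not None]
    if not answered:
        # Every request timed out: the network is considered disconnected.
        return {"status": "disconnected", "latency_ms": None,
                "jitter_ms": None, "stable": None}

    latency = mean(answered)  # assumption: mean of the answered requests
    # Jitter: average absolute latency difference between adjacent requests.
    diffs = [abs(b - a) for a, b in zip(answered, answered[1:])]
    jitter = mean(diffs) if diffs else 0.0
    stable = latency == 0 or jitter / latency <= JITTER_RATIO_LIMIT
    status = "poor" if latency > POOR_LATENCY_MS else "ok"
    return {"status": status, "latency_ms": latency,
            "jitter_ms": jitter, "stable": stable}


print(assess_network([100, 110, 100, 110]))  # steady samples
print(assess_network([700, 900, None]))      # slow, fluctuating samples
```

In the second call the one timed-out request is ignored when averaging, which is a simplification. A real probe might treat partial timeouts as a separate signal.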
Security Group Rule Detection
Category | Detection Instructions | Risk level | Solution |
Common ports | Checks whether inbound TCP traffic to common ports such as 22 and 3389 is blocked by the security group. | Warnings | The inbound (ingress) rules of the instance security group block TCP port 22, which may prevent normal SSH login. You can open the required ports. For more information, please see Security Group Use Cases. |
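To confirm from your own machine that a common port is reachable after adjusting the rules, a simple TCP connection test may help. The sketch below is a client-side probe, which is a different technique from the rule inspection that this check performs. The function name `tcp_port_reachable` is hypothetical, and a failed connection can also be caused by a host firewall or a stopped service, so it does not prove that the security group is responsible.

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, calling `tcp_port_reachable(ip, 22)` with the public IP of a Linux instance, or with port 3389 for a Windows instance, indicates whether the login port accepts connections from your current network.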
Account Cost Detection
Category | Detection Instructions | Risk level | Solution |
Cloud disk status | Checks whether cloud disks associated with the instance have expired and whether they can be read/written. | Abnormal | The cloud disk of this instance has expired. Please visit the Cloud Disk Console to renew it as soon as possible. |
| For pay-as-you-go instances and non-auto-renewal monthly subscription cloud disks, check if the cloud disk has expired and become unusable. | Warnings | The cloud disk of this instance is not set to auto-renew, which may result in its expiration and unavailability. It is recommended to visit the Cloud Disk Console to enable auto-renewal for the cloud disk. |
| | For monthly-subscription instances with auto-renewal enabled and monthly-subscription cloud disks without auto-renewal, the cloud disk may become unusable due to expiration. | Warnings |
| For monthly-subscription instances and monthly-subscription cloud disks that both have auto-renewal disabled, the cloud disk may expire and become unusable if the two expiration dates differ. | Warnings | The expiration times of the instance and its attached cloud disk are not consistent, which may result in the cloud disk becoming unavailable due to expiration. It is recommended to visit the Cloud Disk Console to set up auto-renewal for the cloud disk. |
Instance Storage Check
Category | Detection Instructions | Risk level | Solution |
High cloud disk latency | Checks whether the I/O performance metric svctm is abnormal. | Warnings | A cloud disk associated with the instance has a high latency. We recommend you pay attention to the cloud disk usage. |
Cloud disk I/O hang | Checks whether a cloud disk I/O hang has occurred. | Warnings | A cloud disk associated with the instance has an I/O hang. We recommend you pay attention to the cloud disk usage. |
System disk inode utilization | Checks whether the inode utilization of the cloud disk has reached 100%. | Warnings | Please pay attention to the usage of cloud disks. For troubleshooting, refer to Kernel and I/O Related Issues. |
System disk read-only | Checks whether the cloud disk is read-only. | Abnormal | |
System disk space utilization | Checks whether the utilization of the cloud disk has reached 100%. | Warnings | |
Partition I/O utilization | Checks whether the io_util of the cloud disk has reached 100%. | Warnings | |
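If you want to look at the same kind of figures from inside a Linux instance, space utilization, inode utilization, and the read-only state of a mounted file system can be read with `os.statvfs`. This is a minimal sketch assuming a Linux guest. The function name `filesystem_usage` is hypothetical, and the way the self-check itself collects these metrics is not described in this document.

```python
import os


def filesystem_usage(path: str = "/") -> dict:
    """Report space and inode utilization (percent) and the read-only flag."""
    st = os.statvfs(path)

    # Space: used blocks relative to blocks usable by unprivileged users.
    used_blocks = st.f_blocks - st.f_bfree
    usable_blocks = used_blocks + st.f_bavail
    space_pct = 100.0 * used_blocks / usable_blocks if usable_blocks else 0.0

    # Inodes: some file systems report 0 total inodes, so guard the division.
    used_inodes = st.f_files - st.f_ffree
    inode_pct = 100.0 * used_inodes / st.f_files if st.f_files else 0.0

    read_only = bool(st.f_flag & os.ST_RDONLY)
    return {"space_pct": space_pct, "inode_pct": inode_pct,
            "read_only": read_only}


print(filesystem_usage("/"))
```

A value of 100 for `space_pct` or `inode_pct`, or `read_only` being true, corresponds to the conditions listed in the table above.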
Instance Network Detection
Category | Detection Instructions | Risk level | Solution |
Public EIP connection | Checks whether the EIP is isolated due to overdue payments. | Abnormal | Public IP connectivity may be disrupted due to overdue payments. We recommend you visit the Billing Center to top up and renew your account as soon as possible. |
Existence of EIP | Checks whether the instance has an EIP. | Warnings | This instance does not have a public IP. If you need a public IP for external network access, please go to the EIP Console to bind an EIP. |
EIP blocked | Checks whether the EIP is blocked. | Abnormal | The public IP of this instance has been blocked due to a DDoS attack. Please refer to the Unblocking Protected IP documentation for further guidance. |
Public network bandwidth utilization | Checks whether the instance has experienced a high public network inbound bandwidth utilization in the last 12 hours. | Warnings | To keep network bandwidth from becoming a bottleneck for your business, we recommend that you monitor network usage. For troubleshooting, please refer to High Bandwidth Usage Preventing Login. |
| | Checks whether the instance has experienced a high public network outbound bandwidth utilization in the last 12 hours. | Warnings |
| Private network bandwidth utilization | Checks whether the instance has experienced a high private network inbound bandwidth utilization in the last 12 hours. | Warnings |
| | Checks whether the instance has experienced a high private network outbound bandwidth utilization in the last 12 hours. | Warnings |
Packet loss | Checks whether the instance has experienced TCP packet loss due to triggering of traffic throttling in the last 12 hours. | Warnings | To prevent bottlenecks, it is recommended to check the health of your services. For more information, please refer to Cloud Server Network Packet Loss. |
| | Checks whether the instance has experienced UDP packet loss due to triggering of traffic throttling in the last 12 hours. | Warnings |
| | Checks whether the instance has experienced packet loss due to a soft interrupt in the last 12 hours. | Warnings |
| Kernel network conditions | Checks whether the instance has experienced a full UDP send buffer in the last 12 hours. | Warnings |
| | Checks whether the instance has experienced a full UDP receive buffer in the last 12 hours. | Warnings |
| | Checks whether the instance has experienced a full TCP complete connection queue in the last 12 hours. | Warnings |
| | Checks whether the instance has experienced TCP request overflow in the last 12 hours. | Warnings |
| Connection utilization | Checks whether the number of connections of the instance has reached the upper limit in the last 12 hours. | Warnings |
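On a Linux instance, the kernel network conditions above have close counterparts in the cumulative counters exposed by `/proc/net/snmp` and `/proc/net/netstat`. The sketch below parses that header-line/value-line format and compares two snapshots. Mapping `SndbufErrors`, `RcvbufErrors`, and `ListenOverflows` to the check items is an assumption, the sample text is abbreviated (real files contain many more fields), and the function names are hypothetical.

```python
from typing import Dict, List

Counters = Dict[str, Dict[str, int]]

# Assumed mapping from kernel counters to the conditions in the table.
WATCHED = {
    ("Udp", "SndbufErrors"): "UDP send buffer full",
    ("Udp", "RcvbufErrors"): "UDP receive buffer full",
    ("TcpExt", "ListenOverflows"): "TCP accept queue full",
}


def parse_proc_net_counters(text: str) -> Counters:
    """Parse paired 'Proto: names' / 'Proto: values' lines into nested dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    result: Counters = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        value_proto, numbers = values.split(":", 1)
        if proto != value_proto:
            continue  # skip pairs that do not line up
        result[proto] = dict(zip(names.split(), map(int, numbers.split())))
    return result


def new_findings(before: Counters, after: Counters) -> List[str]:
    """List the watched conditions whose counters grew between two snapshots."""
    found = []
    for (proto, name), label in WATCHED.items():
        delta = after.get(proto, {}).get(name, 0) - before.get(proto, {}).get(name, 0)
        if delta > 0:
            found.append(label)
    return found


BEFORE = """Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 1200 3 7 1100 7 0
TcpExt: ListenOverflows ListenDrops
TcpExt: 4 4
"""
AFTER = BEFORE.replace("Udp: 1200 3 7 1100 7 0", "Udp: 1500 3 9 1300 9 0")

print(new_findings(parse_proc_net_counters(BEFORE), parse_proc_net_counters(AFTER)))
```

Because the counters only ever increase after boot, a check over a time window such as 12 hours needs two snapshots taken at the start and end of that window, as the `before` and `after` arguments suggest.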
Internal Detection for Linux Hosts
Category | Detection Instructions | Risk level | Solution | |
SSH Login Related | Is password-based login disabled for SSHD? | In the /etc/ssh/sshd_config file, is ssh_password_authentication set to yes? | Warnings | The sshd configuration has disabled password login. If you need to enable password-based login, please refer to the Procedure. |
| Whether SSHD prohibits root user login | In the /etc/ssh/sshd_config file, check if ssh_permit_root_login is set to "yes". | Warnings | |
| SSH Private Key File Permissions | Is the file permission for /etc/ssh/ssh_host_rsa_key incorrect? | Warnings | The ssh_host_rsa_key permission configuration is incorrect. Please refer to Troubleshooting Steps for resolution. |
| /var/empty/sshd Permissions | Is the permission for /var/empty/sshd incorrect? | Critical | The /var/empty/sshd permission configuration is incorrect, causing login issues. Please refer to the Troubleshooting Steps for resolution. |
| hosts_deny Configuration | Whether the /etc/hosts.deny configuration file contains special login rules | Warnings | The /etc/hosts.deny configuration file contains special restriction rules that may prevent login. Please refer to Troubleshooting Steps for resolution. |
| Root User Shell Configuration | Whether the shell configuration in the /etc/passwd file is incorrect | Critical | The root user's bash configuration in the /etc/passwd file is incorrect, causing login failure. Please refer to Steps for repair. |
| wtmp or btmp Files | Checking if /var/log/wtmp or /var/log/btmp files are oversized | Warnings | Oversized /var/log/wtmp or /var/log/btmp files may cause slow login. You need to clear the corresponding files. Please refer to Unresponsive VNC Login After Entering Correct Password for the repair process. |
| Dynamic libraries required by the SSHD process | Does the dynamic library required by the SSHD process exist? | Critical | The dynamic library required by the SSHD process is missing, causing login issues. Please refer to troubleshooting steps for resolution. |
| /etc/profile invoking itself | Whether /etc/profile calls itself, creating an infinite loop | Critical | An infinite loop caused by /etc/profile invoking itself prevents login. Please refer to Fixing the /etc/profile Dead Loop Invocation Issue for a solution. |
| SSHD Process | SSH process existence | Critical | The sshd process is missing; you need to start the corresponding sshd service. Please refer to the Troubleshooting Steps for resolution. |
Network Configuration | Number of ENI Queues | Whether all ENI queues are enabled | Warnings | If not all ENI queues are enabled, the instance may not achieve its maximum network performance. Please refer to Incorrect ENI Multi-Queue Configuration for a solution. |
| NAT Environment Kernel Parameters | Could nonstandard kernel network parameters cause packet loss in a NAT environment? | Warnings | If tcp_tw_recycle is configured, packet loss may occur in a NAT environment. Please refer to Common Kernel Parameters for Linux Instances to remove the setting temporarily. |
OS Environment | System Limits Configuration | Is the /etc/security/limits.conf configuration abnormal? | Warnings | An abnormal configuration in /etc/security/limits.conf may cause login failures. Please refer to Steps for repair instructions. |
| System OOM | Has the system recently experienced an OOM event? | Critical | If the system has experienced OOM, it is recommended to evaluate whether the memory usage is reasonable or to upgrade the instance configuration. Please refer to High Memory Utilization for troubleshooting and resolution. |
| Is SELinux enabled? | Check if SELinux is enabled on the instance. | Warnings | Enabling SELinux may cause login issues. We recommend referring to Handling Steps to disable SELinux first. |
| Has the PID been exhausted? | Instance PID exhaustion check | Critical | The system is close to running out of PIDs, which may lead to system anomalies. We recommend evaluating whether the number of threads started on the system is reasonable, or increasing the system's pid_max. Please refer to Troubleshooting Steps for resolution. |
| Cloud-init Environment | Checking if the Cloud-Init environment is functioning properly | Critical | Cloud-init environment anomalies may cause issues such as inability to reset passwords or modify hostnames. We recommend reinstalling cloud-init by referring to Installing cloud-init on Linux systems. |
| Basic information of a file system | /etc/fstab Configuration and File System Check | Warnings | /etc/fstab contains a non-existent partition, which may cause the instance to fail to start upon reboot. Please refer to Cloud Disk Not Auto-attaching upon Linux CVM Restart for troubleshooting and resolution. |
| Entering Emergency Mode | Checking for abnormalities in /etc/fstab configurations | Critical | Abnormal configurations in /etc/fstab may cause the system to enter emergency mode. Please refer to Unable to Log In Due to /etc/fstab Configuration Errors for troubleshooting and resolution. |
| Firewall Detection | Checking for abnormal firewall rules | Warnings | If the iptables policy is set to drop, it may cause network connectivity issues. Please refer to iptables policy settings for troubleshooting and resolution. |
Launch Configuration-related | bin and lib soft links | Checking for missing soft links in bin and lib directories | Critical | Missing bin or lib soft links may cause system anomalies. Please refer to System bin or lib Soft Link Missing to recreate the corresponding soft links. |
| Configuring Huge Pages Memory | Has huge pages memory been enabled? | Warnings | The system has enabled huge pages memory, which may cause system anomalies. Please refer to Configuring Huge Pages in sysctl.conf to check if it is configured by your application. If not, the instance may have been compromised. |
| Dynamic Library Hijacking Configuration | Has dynamic library hijacking been configured? | Warnings | The system is configured with dynamic library hijacking, which may cause system abnormalities. Please refer to ld.so.preload for adding dynamic library hijacking to check if it is configured by the business program. If not, the instance may have been compromised. |
System Resource Usage | Is the CPU utilization too high? | Are there any processes with CPU utilization exceeding 80%? | Warnings | High CPU utilization: assess whether it's reasonable or consider upgrading the instance configuration. Please refer to Troubleshooting and resolving login issues due to high CPU or memory usage for guidance. |
| Is the memory utilization too high? | Are there any processes with memory utilization exceeding 80%? | Warnings | High memory usage: assess whether it's reasonable or consider upgrading the instance configuration. Please refer to Troubleshooting and Resolving Login Failures Due to High CPU or Memory Usage for more information. |
| Is the file system inode usage too high? | Has the file system inode usage exceeded 95%? | Warnings | High disk inode usage may lead to system anomalies and prevent data writing. It is recommended to evaluate whether you can delete some files or expand the disk size. Please refer to Resolving File System Inode Full Issues for guidance on fixing the problem. |
| Is the file system space usage too high? | Has the disk space usage exceeded 95%? | Warnings | Excessive disk space usage may lead to system anomalies and prevent data writing. It is recommended to assess whether you can delete some files or expand the disk size. Please refer to Solving the Problem of Full Disk Space for resolution. |
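The two SSHD login items at the top of this table can be approximated by reading `/etc/ssh/sshd_config` yourself. The sketch below assumes that the check names `ssh_password_authentication` and `ssh_permit_root_login` correspond to the OpenSSH directives `PasswordAuthentication` and `PermitRootLogin`. It is a simplified reader: it keeps the first value found for each keyword, stops at the first `Match` block, and ignores `Include` files. The function names and the sample text are hypothetical.

```python
from typing import Dict, List


def first_sshd_settings(config_text: str) -> Dict[str, str]:
    """Map lowercased keyword -> first value found before any Match block."""
    settings: Dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # keywords are case-insensitive
        if keyword == "match":
            break  # conditional blocks are out of scope for this sketch
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.setdefault(keyword, value)  # the first occurrence is kept
    return settings


def login_warnings(config_text: str) -> List[str]:
    """Flag settings that explicitly disable password or root login."""
    settings = first_sshd_settings(config_text)
    warnings = []
    if settings.get("passwordauthentication", "").lower() == "no":
        warnings.append("password login disabled")
    if settings.get("permitrootlogin", "").lower() == "no":
        warnings.append("root login prohibited")
    return warnings


SAMPLE = """# illustrative sshd_config fragment
PermitRootLogin no
PasswordAuthentication yes
PasswordAuthentication no
"""

print(login_warnings(SAMPLE))
```

A directive that is absent from the file is not flagged here, because the effective default depends on the OpenSSH version installed on the instance.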
Internal Detection for Windows Hosts
Category | Detection Instructions | Risk level | Solution | |
OS Environment | Windows Operating System Version | Is the Windows operating system Windows Server 2008 R2 or an earlier version? | Warnings | Windows Server 2008 R2 and earlier versions have inferior security, stability, and compatibility, and are no longer maintained by Microsoft or Tencent Cloud. It is recommended to follow the processing steps to back up your data and reinstall Windows Server 2016 or a higher version. |
| Memory limit | Has a memory limit been set? | Warnings | The installed system memory is not being fully utilized. To remove the memory limit, please refer to the resolution steps. |
| CPU Limit | Has a CPU limit been set? | Warnings | If the allocated CPU is not being fully utilized, refer to Resolution Steps to remove the CPU limit. |
| Handle leakage | Is the number of file handles normal? | Warnings | There may be a process handle leak; please refer to the troubleshooting steps to investigate and resolve the issue. |
| System brute-force attacks and security breaches | Determining if the system is experiencing a significant number of brute-force attacks and other types of attacks. | Warnings | The system may experience lag or crashes due to brute force attacks or other malicious activities, affecting normal operations and even posing a risk of data loss. Please refer to Security Group Overview and configure security group policies appropriately through the console, allowing only necessary IPs and ports. |
| System Environment Variables | Are the system environment variables functioning properly? | Warnings | If system environment variables are missing or abnormal, please refer to Resolution Steps for repair. |
| System Activation | Determining if the system has been activated | Warnings | |
| System time | Is the system time accurate? | Warnings | |
| System Route Table | Is the system default route missing? | Warnings | |
| System Internet Explorer Proxy | Is the Internet Explorer proxy configured? | Warnings | |
| CD-ROM Status | Checking if the system CD-ROM device is functioning properly | Warnings | CD-ROM Anomaly: The console requires a CD-ROM to reset the password. Please refer to the Troubleshooting Steps for repair. |
System Resource Usage | Is the memory utilization too high? | Are there any processes with memory utilization exceeding 80%? | Warnings | If the system memory usage is too high, please refer to the Troubleshooting Steps for investigation. |
| Is the virtual memory utilization too high? | Are virtual memory resources insufficient? | Critical | If the system's virtual memory usage is too high, please follow the resolution steps to fix the issue. |
| High total CPU usage | Are there any processes with total CPU usage exceeding 80%? | Warnings | |
| High usage of a single CPU | Are there any processes with single CPU utilization exceeding 80%? | Warnings | If the utilization of a single logical CPU is too high, please refer to the troubleshooting steps for investigation. |
| Insufficient Available Disk Space | Whether the disk usage exceeds 95% or the available disk space is less than 5GB. | Warnings | |
| NTFS System Metadata Files | Is the NTFS metadata disk usage proportion high? | Warnings | |
Remote Connection | Remote Desktop Service Status | Is the remote desktop service status abnormal? | Warnings | If the remote desktop service status is abnormal, please refer to the troubleshooting steps for further investigation. |
| Remote Desktop Service Port | Whether the remote desktop service port is listening on the default port 3389. | Warnings | The Remote Desktop Service port is not listening. Please refer to Troubleshooting Steps for resolution. |
| RDP Listener | Is the RDP Listener enabled? | Critical | The RDP listener is not enabled, preventing remote login. Please refer to the resolution steps for fixing this issue. |
| Allow Remote Desktop Connection | Whether to allow Remote Desktop Connection | Critical | |
| RDP Self-Signed Certificate Expiration Date | Has the RDP self-signed certificate expired? | Critical | If the RDP self-signed certificate has expired, you may not be able to remotely log in. Please refer to the resolution steps for repair. |
| Remote Desktop Services Role Installation and Authorization | Determining if the Remote Desktop Services role is installed and licenses are imported. | Warnings | Multi-user login is enabled, but the License has not been imported. Please refer to Repair Steps for resolution. |
| Network Access Account | Verify if the network access sharing and security model for local accounts is set to "forceguest". | Critical | The network access account is set as a guest and cannot be used for remote login. Please refer to Troubleshooting Steps for resolution. |
| Allow Remote Desktop Service Port through Firewall | Check if the firewall allows Remote Desktop Services. | Warnings | The Windows firewall does not allow the Remote Desktop Service port, preventing remote login. Please refer to the troubleshooting steps for resolution. |
Network Configuration | Port Exhaustion | Whether the number of TCP and UDP ports has been exhausted | Critical | |
| Timewait/Closewait Connection Count | Is the number of Timewait/Closewait connections normal? | Warnings | An abnormal number of Timewait/Closewait connections may prevent remote login. Please refer to the troubleshooting steps for resolution. |
| Gateway Status | Is the gateway status normal? | Warnings | |
| MAC Addresses | Is it the default system MAC address? | Critical | |
| Private Network Domain Name Resolution | Can Tencent Cloud's private network domain names be resolved normally? | Warnings | |
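The rule for insufficient available disk space (usage above 95% or less than 5 GB available) is simple enough to express directly. The sketch below uses `shutil.disk_usage`, which works on both Windows and Linux. Interpreting "5GB" as 5 GiB and the function names are assumptions.

```python
import shutil

GIB = 1024 ** 3


def disk_space_low(total_bytes: int, free_bytes: int,
                   max_used_pct: float = 95.0,
                   min_free_bytes: int = 5 * GIB) -> bool:
    """True if usage exceeds the percentage limit or free space is below the floor."""
    if total_bytes <= 0:
        return True  # treat an unreadable or empty volume as a problem
    used_pct = 100.0 * (total_bytes - free_bytes) / total_bytes
    return used_pct > max_used_pct or free_bytes < min_free_bytes


def path_space_low(path: str) -> bool:
    """Apply the rule to the volume that contains the given path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return disk_space_low(usage.total, usage.free)


print(path_space_low("."))
```

On a Windows instance you would typically pass a drive root such as `C:\\` to `path_space_low`.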
Instance Status Detection
Category | Detection Instructions | Risk level | Solution |
Instance shutdown | Checks whether the instance is shut down. | Warnings | |
Instance restart history | Checks whether the instance has been restarted in the last 12 hours. | Warnings | The instance has been restarted in the last 12 hours. Pay attention to the instance running status. |
Instance kernel crash | Checks whether a hung task has occurred in the instance in the last 12 hours. | Abnormal | The instance has experienced hung tasks, panic, or soft lockups within the last 12 hours. Please pay attention to the instance's operational status. For troubleshooting, refer to Kernel and IO Related Issues. |
| | Checks whether a panic has occurred in the instance in the last 12 hours. | Abnormal |
| | Checks whether a soft lockup has occurred in the instance in the last 12 hours. | Abnormal |
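To look for the same three kernel events yourself on a Linux instance, you can scan the kernel log for their typical messages. The sketch below counts matches in a list of log lines. The patterns reflect common Linux kernel message wording, which can vary between kernel versions, and the sample lines and function name are illustrative. Filtering to the last 12 hours is left out.

```python
import re
from typing import Dict, Iterable

# Typical kernel message fragments; exact wording may differ by kernel version.
PATTERNS = {
    "hung_task": re.compile(r"INFO: task .+ blocked for more than \d+ seconds"),
    "panic": re.compile(r"Kernel panic - not syncing"),
    "soft_lockup": re.compile(r"BUG: soft lockup - CPU#\d+ stuck for \d+s"),
}


def count_kernel_events(lines: Iterable[str]) -> Dict[str, int]:
    """Count log lines that match each kernel event pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


SAMPLE_LOG = [
    "[ 1234.5] INFO: task kworker/0:1:1234 blocked for more than 120 seconds.",
    "[ 1300.0] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [stress:4321]",
    "[ 1400.0] Kernel panic - not syncing: Fatal exception",
    "[ 1500.0] eth0: link up",
]

print(count_kernel_events(SAMPLE_LOG))
```

The input lines could come from `dmesg` output or a saved kernel log file. A panic usually ends the running kernel, so its message is more likely to be found in a crash dump or serial console log than in a live `dmesg` buffer.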
Instance Performance Detection
Category | Detection Instructions | Risk level | Solution |
CPU utilization | Checks whether the instance has experienced a high CPU utilization in the last 12 hours. | Warnings | To keep the CPU from becoming a bottleneck for your business, we recommend that you monitor CPU usage and adjust configurations accordingly. For troubleshooting, please refer to the documentation for the instance's operating system. For Windows, refer to Windows Instance: Unable to log in due to high CPU or memory usage. For Linux, refer to High CPU or Memory Usage Causing Login Failures.
| Memory utilization | Has the instance experienced high memory load in the last 12 hours? | Warnings |
| Basic CPU utilization | Checks whether the instance has experienced a high basic CPU utilization in the last 12 hours. | Warnings |
Related Actions
You can refer to Using Instance Self-Check to generate an instance detection report or view historical detection reports.