Monitoring and alarming are essential to ensuring the high reliability, availability, and performance of cloud servers. When you create a cloud server, Tencent Cloud Observability Platform is activated by default at no charge. Through the platform, you can obtain host monitoring metrics, analyze them, and receive real-time alerts.
This document provides an overview of the monitoring and alarming features available for cloud servers. For more detailed information, please refer to the Tencent Cloud Observability Platform Product Documentation.
Overview
Cloud server monitoring and alarming is a management tool for monitoring cloud servers in real time. It collects comprehensive, detailed monitoring data, extracts key metrics from your cloud servers, and displays them as monitoring charts. This gives you a thorough view of the resource utilization, performance, and operational status of your cloud servers. You can also set custom alarm thresholds and receive notifications according to the rules you define.
Basic Features
You can access the following CVM monitoring and alarm features in the Cloud Monitor console:
Feature | Description |
Tencent Cloud Observability Platform overview | Provides an overview of the overall status, an alarm summary, and comprehensive monitoring information. |
Custom alarm thresholds | Alarm settings are currently supported for cloud servers. When a cloud server metric becomes abnormal, you are notified promptly so that you can address the issue. |
Cloud product monitoring dashboard | Provides a monitoring view of your current cloud servers. |
Preset and custom monitoring dashboards | Offers flexible, personalized charts for cloud server monitoring scenarios, including cross-instance aggregated data, real-time and historical data display, comparison of similar metrics, and linked charts. |
Traffic monitoring | Shows your overall bandwidth information. |
Prometheus monitoring (open-source compatible) | Lets you monitor the internal status of applications or services, such as the number of requests processed or orders placed. You can also monitor the processing time of core logic, such as the time spent calling external services. For more information, refer to Custom Integration in Cloud Server Scenarios. |
Grafana visualization (open source) | Provides a preconfigured cloud server monitoring dashboard that covers commonly used metrics. |
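The Prometheus row mentions monitoring an application's internal state, such as requests processed and time spent calling external services. How such metrics reach the platform is covered in Custom Integration in Cloud Server Scenarios. The sketch below only illustrates the application side and assumes the metrics are exposed over HTTP at `/metrics` in the Prometheus text exposition format. The metric names (`app_requests_total`, `app_external_call_seconds`), the `Metrics` class, the `serve` helper, and port 8000 are all hypothetical. It uses only the Python standard library. In practice, the official Prometheus client library for your language is the more usual choice.

```python
"""Minimal, stdlib-only sketch: expose hypothetical application metrics
in the Prometheus text exposition format at /metrics."""
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Metrics:
    """Thread-safe store for a request counter and an external-call timing summary."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.requests_total: dict[str, int] = {}  # count per (hypothetical) status label
        self.call_seconds_sum = 0.0               # total time spent in external calls
        self.call_seconds_count = 0               # number of external calls observed

    def inc_request(self, status: str) -> None:
        with self._lock:
            self.requests_total[status] = self.requests_total.get(status, 0) + 1

    def observe_external_call(self, seconds: float) -> None:
        with self._lock:
            self.call_seconds_sum += seconds
            self.call_seconds_count += 1

    def render(self) -> str:
        """Render current values as Prometheus text exposition format."""
        with self._lock:
            lines = [
                "# HELP app_requests_total Requests processed by the application.",
                "# TYPE app_requests_total counter",
            ]
            for status, count in sorted(self.requests_total.items()):
                lines.append(f'app_requests_total{{status="{status}"}} {count}')
            lines += [
                "# HELP app_external_call_seconds Time spent calling external services.",
                "# TYPE app_external_call_seconds summary",
                f"app_external_call_seconds_sum {self.call_seconds_sum}",
                f"app_external_call_seconds_count {self.call_seconds_count}",
            ]
        return "\n".join(lines) + "\n"


METRICS = Metrics()


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves METRICS.render() on GET /metrics; every other path returns 404."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever serving /metrics (port 8000 is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts a blocking HTTP server. The application would call `METRICS.inc_request(...)` after handling each request and `METRICS.observe_external_call(...)` with the measured duration of each external call.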
Use Cases
Daily management: Log in to the Tencent Cloud Observability Platform console to view the operational status of your Tencent Cloud products.
Prompt handling of anomalies: When monitoring data reaches an alarm threshold, an alarm notification is sent so that you can respond promptly and investigate the cause of the anomaly.
Timely scaling: By setting alarm rules for metrics such as bandwidth, connection count, and disk usage, you can easily track the current status of your cloud services and receive timely notifications to scale out as business volume grows.
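Alarm rules like these are configured in the console. To illustrate the idea, the sketch below models a threshold rule that fires only when the most recent N data points all breach it, which is a common way to ignore one-off spikes. The `AlarmRule` type, its fields, the `should_alarm` helper, and the threshold values are hypothetical. They do not reflect the platform's actual alarm policy schema.

```python
"""Sketch of evaluating a threshold alarm rule over recent data points.
AlarmRule and its fields are hypothetical, not the platform's policy schema."""
import operator
from dataclasses import dataclass

# Supported comparison operators for a rule.
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass(frozen=True)
class AlarmRule:
    metric: str        # metric name, e.g. "disk_usage"
    op: str            # one of ">", ">=", "<", "<="
    threshold: float
    periods: int = 1   # consecutive breaching data points required to fire

    def __post_init__(self) -> None:
        if self.op not in _OPS:
            raise ValueError(f"unsupported operator: {self.op!r}")
        if self.periods < 1:
            raise ValueError("periods must be at least 1")


def should_alarm(rule: AlarmRule, samples: list[float]) -> bool:
    """True if each of the last `rule.periods` samples breaches the threshold."""
    if len(samples) < rule.periods:
        return False  # not enough data yet to satisfy the rule
    compare = _OPS[rule.op]
    return all(compare(v, rule.threshold) for v in samples[-rule.periods:])


if __name__ == "__main__":
    # Hypothetical rules and data; thresholds and units are illustrative only.
    rules = [
        AlarmRule("disk_usage", ">", 80.0, periods=3),
        AlarmRule("wan_outtraffic", ">", 90.0, periods=2),
    ]
    recent = {
        "disk_usage": [78.0, 82.5, 85.0, 88.1],
        "wan_outtraffic": [95.0, 70.0],
    }
    for rule in rules:
        if should_alarm(rule, recent[rule.metric]):
            print(f"ALARM: {rule.metric} {rule.op} {rule.threshold} for {rule.periods} periods")
```

With the sample data, only the `disk_usage` rule fires, because its last three points are all above 80. The `wan_outtraffic` rule does not fire because its latest point has dropped below the threshold.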
Monitoring Items
To track an instance's performance baseline, we recommend monitoring at least the following metrics. You can view this monitoring information on the instance details page of the Cloud Server Console.
Metric | Metric name | Description |
CPU utilization | cpu_usage | CPU usage as a percentage. The data is collected and reported by the monitoring component running inside the server, which makes it more accurate. |
Memory utilization | mem_usage | Ratio of the memory actually used to total memory, excluding memory occupied by buffers and the system cache. |
Private network bandwidth out | lan_outtraffic | Average outbound traffic per second on the private network ENI. |
Private network bandwidth in | lan_intraffic | Average inbound traffic per second on the private network ENI. |
Public network bandwidth out | wan_outtraffic | Average outbound traffic per second over the public network. Bandwidth is calculated at a minimum granularity of 10 seconds: the total traffic in a 10-second window divided by 10 seconds. |
Public network bandwidth in | wan_intraffic | Average inbound traffic per second over the public network. |
Disk utilization | disk_usage | Disk usage. |
Disk I/O wait time | disk_io_await | Average wait time per disk I/O operation. |
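Two of the descriptions above imply simple calculations. The sketch below shows both under stated assumptions. Memory utilization is computed from Linux `/proc/meminfo` fields as (MemTotal - MemFree - Buffers - Cached) / MemTotal. This is one common reading of "excluding buffer and system cache", not necessarily the exact formula the monitoring component uses. Bandwidth is computed as total bytes in a 10-second window, multiplied by 8 and divided by 10, and reported in Mbit/s (the unit is an assumption). The function names are hypothetical.

```python
"""Sketch of two calculations implied by the metric descriptions.
Formulas, units, and function names are assumptions for illustration."""


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text ("Name:   value kB") into {name: value}."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def mem_usage_percent(info: dict[str, int]) -> float:
    """Used memory as a percentage of total, excluding buffers and page cache.

    Assumed formula: (MemTotal - MemFree - Buffers - Cached) / MemTotal.
    """
    total = info["MemTotal"]
    used = total - info["MemFree"] - info["Buffers"] - info["Cached"]
    return 100.0 * used / total


def avg_bandwidth_mbps(total_bytes: int, window_seconds: float = 10.0) -> float:
    """Average bandwidth: bytes in the window * 8 bits / window length, in Mbit/s."""
    return total_bytes * 8 / window_seconds / 1_000_000


if __name__ == "__main__":
    sample = (
        "MemTotal:        8000000 kB\n"
        "MemFree:         2000000 kB\n"
        "Buffers:          500000 kB\n"
        "Cached:          1500000 kB\n"
    )
    print(f"mem_usage: {mem_usage_percent(parse_meminfo(sample)):.1f}%")  # mem_usage: 50.0%
    # 12,500,000 bytes transferred in a 10-second window
    print(f"wan_outtraffic: {avg_bandwidth_mbps(12_500_000):.1f} Mbps")  # wan_outtraffic: 10.0 Mbps
```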
Monitoring Data
Monitoring Interval: Tencent Cloud Observability Platform currently aggregates monitoring data at several granularities: 10 seconds, 1 minute, 5 minutes, 1 hour, and 1 day. Cloud servers support 1-minute monitoring granularity, meaning data is aggregated every minute. The default granularity is 5 minutes.
Data Storage: Monitoring data at second-level granularity is stored for 1 day; data at 1-minute and 5-minute granularities is stored for 31 days; data at 1-hour granularity is stored for 93 days; and data at 1-day granularity is stored for half a year.
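The granularity and retention rules above can be made concrete with a small sketch. `aggregate` averages raw (timestamp, value) samples into buckets of a given period, and `finest_available` returns the finest granularity still retained for data of a given age. Several points are assumptions: averaging as the aggregation function, reading "second-level" as the 10-second granularity, and approximating half a year as 183 days. The helper names are hypothetical.

```python
"""Sketch of granularity aggregation and retention lookup.
Averaging, the 10-second reading of "second-level", and 183 days for
"half a year" are assumptions; helper names are hypothetical."""
from collections import defaultdict
from datetime import timedelta
from typing import Optional

# Granularity -> how long data at that granularity is retained (per the text).
RETENTION = {
    timedelta(seconds=10): timedelta(days=1),
    timedelta(minutes=1): timedelta(days=31),
    timedelta(minutes=5): timedelta(days=31),
    timedelta(hours=1): timedelta(days=93),
    timedelta(days=1): timedelta(days=183),
}


def aggregate(samples: list[tuple[float, float]], period_s: int) -> dict[int, float]:
    """Average (unix_ts, value) samples into buckets aligned to multiples of period_s."""
    sums: defaultdict[int, float] = defaultdict(float)
    counts: defaultdict[int, int] = defaultdict(int)
    for ts, value in samples:
        bucket = int(ts // period_s) * period_s  # start of the bucket containing ts
        sums[bucket] += value
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


def finest_available(age: timedelta) -> Optional[timedelta]:
    """Finest granularity whose retention still covers data of the given age."""
    for granularity in sorted(RETENTION):
        if age <= RETENTION[granularity]:
            return granularity
    return None  # older than every retention window


if __name__ == "__main__":
    raw = [(0, 10.0), (10, 20.0), (20, 30.0), (30, 40.0), (40, 50.0), (50, 60.0), (60, 100.0)]
    print(aggregate(raw, 60))                       # {0: 35.0, 60: 100.0}
    print(finest_available(timedelta(days=10)))     # 0:01:00
```

Under these assumptions, data from 10 days ago can still be queried at 1-minute granularity, while data from 60 days ago is only available at 1-hour granularity or coarser.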
Data Display: Monitoring data is presented in easy-to-read charts, and the console brings together monitoring data from all products, giving you a comprehensive view of system performance.
Alarm Settings: You can set thresholds on monitoring metrics. When a threshold condition is met, alarm notifications are sent promptly to the relevant recipients. For more information, refer to Creating Alarm Policies.
Dashboard Configuration: You can build dashboards of monitoring metrics to help you analyze the causes of abnormal metrics, observe metric changes in real time, and scale resources promptly as needed. For more information, refer to Create a Dashboard.