
Observability Platform - Technical Selection Analysis

Original article
Author: 行者深蓝
Published: 2023-12-06 18:07:53
Column: 云原生应用工坊

Observability

Image reference: https://mp.weixin.qq.com/s/nAF3lv-qZprLWvOdvSbYXg

Observability refers to the extent to which a system's internal states can be inferred from its external outputs. The term comes from control theory, where observability and controllability are dual concepts.

In modern software systems and cloud computing, observability plays an increasingly important role in ensuring the reliability, performance, and security of applications and infrastructure. As software systems become more complex, with widespread adoption of microservices and increasing reliance on distributed architectures, the importance of observability becomes more pronounced.

Observability mainly includes the following aspects:

  • Logs: Logs are records of system events collected during system operation, including errors, warnings, and information. Logs can provide detailed information about the internal state of the system, such as system startup and shutdown, resource usage, errors, and exceptions.
  • Metrics: Metrics are statistical data about system performance collected during operation, such as CPU usage, memory usage, and network traffic. Metrics provide an overview of the system's operational status, such as overall health and performance bottlenecks.
  • Tracing: Tracing is the full-path tracking of requests and responses in the system, which can help analyze performance bottlenecks, errors, and other issues.

Observability tools help system administrators and developers collect and analyze the above data, thus improving understanding and control of the system.
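
The three signal types above can be illustrated with a minimal sketch that uses only the Python standard library. It is not how any particular observability tool works internally; the names `log`, `span`, `handle_request`, and the `requests_total` counter are hypothetical and chosen only for illustration.

```python
import json
import time
import uuid
from collections import Counter
from contextlib import contextmanager

metrics = Counter()   # metric name -> running count
spans = []            # finished spans, in completion order
log_lines = []        # structured log records, one JSON string each


def log(level, message, **fields):
    """Append one structured log record as a JSON line."""
    record = {"ts": time.time(), "level": level, "msg": message, **fields}
    log_lines.append(json.dumps(record))


@contextmanager
def span(name, trace_id, parent_id=None):
    """Time a unit of work and record it as a span of the given trace."""
    span_id = uuid.uuid4().hex[:16]
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        spans.append({
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_id": parent_id,
            "name": name,
            "duration_s": time.perf_counter() - start,
        })


def handle_request(order_id):
    """Hypothetical request handler that emits all three signal types."""
    trace_id = uuid.uuid4().hex
    metrics["requests_total"] += 1                      # metric
    with span("handle_request", trace_id) as root_id:   # trace: root span
        log("INFO", "request received",
            trace_id=trace_id, order_id=order_id)       # log
        with span("load_order", trace_id, parent_id=root_id):
            pass  # stand-in for a database call
    return trace_id


trace_id = handle_request(order_id=42)
```

Putting the same `trace_id` in the log record and in the spans is what lets a platform jump from a log line to the trace it belongs to.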

Applications of observability include:

  1. Troubleshooting: Observability tools can help quickly locate and resolve system faults.
  2. Performance Optimization: Observability tools can help identify performance bottlenecks for optimization.
  3. Security Monitoring: Observability tools can help monitor the security status of the system to prevent security incidents.

Evolution of Monitoring

From the era of monolithic applications to the era of microservices, the dimensions of monitoring data (metrics, logs, traces) have evolved as follows:

Monolithic Applications

In the era of monolithic applications, an application was typically deployed as a single unit on a server. Monitoring data therefore came from a single source and consisted mainly of server-level metrics such as CPU, memory, and network usage.

SOA Applications

In the SOA era, applications were split into multiple independent services, each of which could be developed, deployed, and managed independently. Monitoring data therefore came from more sources, and the resource usage and performance metrics of each service had to be tracked.

Distributed Applications

In the era of microservices, applications are split into even finer-grained services, each typically responsible for a specific business function. Monitoring data therefore covers far more sources and must include the resource usage, performance metrics, and request traces of each microservice.

The dimensions of metrics, logs, and tracing in different eras are summarized as follows:

| Era | Metrics | Logs | Tracing |
|---|---|---|---|
| Monolithic | Server resource usage, etc. | Application logs | None |
| SOA | Service resource usage, etc. | Service logs | Service call links |
| Microservices | Microservice resource usage, etc. | Microservice logs | Microservice call links |

As application architectures evolve, the dimensions of monitoring data have grown increasingly extensive, posing higher demands on the design and implementation of monitoring systems. Monitoring systems must be capable of collecting, storing, and analyzing monitoring data from various sources and dimensions, providing comprehensive support for application maintenance.
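
To make "call links" concrete, the sketch below rebuilds the call tree of one request from a flat list of spans. The span fields (`span_id`, `parent_id`, `service`, `name`) and the sample services are assumptions for illustration, not the schema of any specific tracing system.

```python
from collections import defaultdict


def build_call_tree(spans):
    """Group spans by parent and return (roots, children) for one trace."""
    children = defaultdict(list)
    roots = []
    for s in spans:
        if s["parent_id"] is None:
            roots.append(s)
        else:
            children[s["parent_id"]].append(s)
    return roots, children


def render(span, children, depth=0):
    """Return indented lines, one per span, in depth-first order."""
    lines = ["  " * depth + f'{span["service"]}: {span["name"]}']
    for child in children[span["span_id"]]:
        lines.extend(render(child, children, depth + 1))
    return lines


# Hypothetical spans from a single checkout request.
spans = [
    {"span_id": "a", "parent_id": None, "service": "gateway", "name": "GET /checkout"},
    {"span_id": "b", "parent_id": "a", "service": "orders", "name": "create_order"},
    {"span_id": "c", "parent_id": "b", "service": "inventory", "name": "reserve_stock"},
    {"span_id": "d", "parent_id": "a", "service": "payments", "name": "charge"},
]

roots, children = build_call_tree(spans)
tree_lines = render(roots[0], children)
print("\n".join(tree_lines))
```

In a monolith this tree has a single service, which is why tracing was not needed in that era; with microservices, each level of the tree may be a different team's service.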

Resource Monitoring vs. Application Observability

Traditional resource-focused monitoring primarily addresses the operational status of systems, including overall health and performance bottlenecks. Traditional resource monitoring typically uses metrics to measure system status, such as CPU usage, memory usage, network traffic, etc.

Application observability, on the other hand, focuses not only on system status but also on application business logic and data. Application observability typically uses logs, tracing, and other technologies to collect and analyze data produced during application runtime.

Differences

| Aspect | Traditional Resource Monitoring | Application Observability |
|---|---|---|
| Focus | System operational status | Application status, business logic, data |
| Data Sources | Metrics | Logs, tracing |

Relationship

Traditional resource monitoring is a part of application observability. Application observability needs to collect and analyze system status metrics, often provided by traditional resource monitoring.

Scope

Traditional resource monitoring is typically limited to the system level, such as servers, containers, databases, etc. Application observability can extend to the application level, including business logic, data, etc.

In summary, resource monitoring and application observability are related but distinct concepts. Traditional resource monitoring is a part of application observability, providing a foundation for it. Application observability can extend to the application level, supporting analysis of business logic and data.

System Monitoring vs. Application Observability

System monitoring primarily focuses on the operational status of systems, including overall health and performance bottlenecks. System monitoring typically uses metrics to measure system status, such as CPU usage, memory usage, network traffic, etc.

Application observability, in contrast, focuses not only on system status but also on application business logic and data. Application observability typically uses logs, tracing, and other technologies to collect and analyze data produced during application runtime.

The differences between system monitoring and application observability can be summarized as follows:

| Aspect | System Monitoring | Application Observability |
|---|---|---|
| Analysis Purpose | Fault localization, performance optimization | Fault localization, performance optimization, business logic analysis, data understanding |
| Monitoring Metrics | CPU usage, memory usage, load | SLOs, SLIs, time measurements, event measurements, availability |

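For example, a service level objective (SLO) is a reliability or performance target for an application, and a service level indicator (SLI) is the measured value that an SLO is evaluated against. Time and event measurements help analyze business logic, and availability shows how much of the time the service is usable.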
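
The relationship between an SLI and an SLO can be sketched with a little arithmetic. The example below assumes an availability SLI defined as the ratio of good events to total events, which is one common definition; the function names and the request counts are hypothetical.

```python
def availability_sli(good_events, total_events):
    """SLI: fraction of events that were served successfully."""
    if total_events == 0:
        return 1.0  # no traffic, so nothing failed
    return good_events / total_events


def error_budget_remaining(sli, slo_target):
    """Fraction of the error budget still unused (negative if overspent)."""
    if not 0.0 <= slo_target < 1.0:
        raise ValueError("slo_target must be in [0, 1)")
    allowed_failure = 1.0 - slo_target   # what the SLO tolerates
    actual_failure = 1.0 - sli           # what was measured
    return 1.0 - actual_failure / allowed_failure


# Hypothetical month: 100,000 requests, 50 of them failed.
sli = availability_sli(good_events=99_950, total_events=100_000)
remaining = error_budget_remaining(sli, slo_target=0.999)
slo_met = sli >= 0.999
```

With these numbers the measured availability is 99.95% against a 99.9% target, so the SLO is met and about half of the error budget is left.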

Suggestions for addressing the evolution of monitoring data include:

  • Adopting distributed monitoring systems to handle the growth of monitoring data.
  • Using data analysis techniques to extract valuable information from monitoring data, enhancing efficiency and effectiveness.
  • Employing automation tools to reduce manual intervention and improve monitoring automation.

Evolution of Monitoring Data Storage Methods

As application architectures evolve, the methods for storing monitoring data have also changed. In the era of monolithic applications, file storage was sufficient for monitoring data. In the SOA and microservices eras, distributed stores such as time-series databases (TSDB) and NoSQL databases are required. In the future, as monitoring data volumes and analytical demands grow, emerging database technologies such as graph databases will play an increasingly important role in monitoring data storage.

Storage Comparison

| Storage Method | Data Model | Storage Efficiency | Query Efficiency | Suitable Data Types | Applicable Scenarios | Limitations |
|---|---|---|---|---|---|---|
| File Storage | Unstructured | Low | Low | All | Simple Data Storage | Complex Data Management, Poor Scalability |
| SQLDB | Relational | High | High | Structured | Data Analysis | Poor at Storing Unstructured Data, Limited Horizontal Scaling |
| TSDB | Time-Series | High | High | Time-Series Data | Monitoring Metrics | Poor at Storing Unstructured Data, Limited Data Types Supported |
| NoSQL | Non-Relational | High | Low to High | All | Diverse Data Storage | Flexible Data Model, Less Efficient Queries than Relational Databases |
| Row Database | Row | High | High | Structured | Log Data | Flexible Data Model |
| Column Database | Column | High | High | Unstructured | Link Tracing Data | Flexible Data Model |
| Graph Database | Graph | High | High | Relational Data | Application Topology | Flexible Data Model |
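
As a rough illustration of the time-series data model in the comparison above, the sketch below keeps one append-only series per metric name and label set and answers range queries with a binary search. It is a toy in-memory model under those assumptions, not the storage engine of any real TSDB; `TinyTSDB` and its methods are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class TinyTSDB:
    """Toy time-series store: one sorted series per (name, labels) pair."""

    def __init__(self):
        # series key -> (timestamps, values), kept in time order
        self._series = defaultdict(lambda: ([], []))

    @staticmethod
    def _key(name, labels):
        # Sort labels so that label order does not create distinct series.
        return (name, tuple(sorted(labels.items())))

    def append(self, name, labels, ts, value):
        timestamps, values = self._series[self._key(name, labels)]
        if timestamps and ts < timestamps[-1]:
            raise ValueError("samples must be appended in time order")
        timestamps.append(ts)
        values.append(value)

    def query_range(self, name, labels, start, end):
        """Return [(ts, value), ...] with start <= ts <= end."""
        timestamps, values = self._series.get(self._key(name, labels), ([], []))
        lo = bisect_left(timestamps, start)
        hi = bisect_right(timestamps, end)
        return list(zip(timestamps[lo:hi], values[lo:hi]))


db = TinyTSDB()
for ts, value in [(10, 0.5), (20, 0.7), (30, 0.9)]:
    db.append("cpu_usage", {"host": "a"}, ts, value)

recent = db.query_range("cpu_usage", {"host": "a"}, start=15, end=30)
```

Because samples arrive in time order and are only appended, range queries stay cheap; the same property explains why this model fits metrics well and arbitrary unstructured data poorly.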

Monitoring System Technology Selection

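| Monitoring System | Metric Data | Log Data | Link Tracing Data |
|---|---|---|---|
| Nagios | File Storage | File Storage | Not Supported |
| Zabbix | SQLDB | SQLDB | Not Supported |
| Prometheus | TSDB | Not Supported (Prometheus stores metrics only; logs need a companion system such as Loki) | Not Supported |
| Observability Platform | TSDB | NoSQL | NoSQL/Graph Database |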

Selection Recommendations

  • Metric Data: TSDB is the best choice for storing metric data due to its high performance, reliability, and scalability.
  • Log Data: NoSQL databases are best for storing log data, offering flexible storage structures and high scalability.
  • Link Tracing Data: NoSQL and graph databases are ideal for storing and analyzing complex relational data.
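
To show why trace data maps naturally onto a graph model, the sketch below derives service-to-service dependency edges from spans. The span fields and service names are assumptions for illustration; in a graph database each resulting pair would typically become an edge between two service nodes.

```python
from collections import Counter


def service_edges(spans):
    """Count caller -> callee edges between different services."""
    by_id = {s["span_id"]: s for s in spans}
    edges = Counter()
    for s in spans:
        parent = by_id.get(s["parent_id"])  # None for root spans
        if parent is not None and parent["service"] != s["service"]:
            edges[(parent["service"], s["service"])] += 1
    return edges


# Hypothetical spans from two requests.
spans = [
    {"span_id": "a1", "parent_id": None, "service": "gateway"},
    {"span_id": "a2", "parent_id": "a1", "service": "orders"},
    {"span_id": "a3", "parent_id": "a2", "service": "inventory"},
    {"span_id": "b1", "parent_id": None, "service": "gateway"},
    {"span_id": "b2", "parent_id": "b1", "service": "orders"},
    {"span_id": "b3", "parent_id": "b2", "service": "orders"},  # internal call
]

edges = service_edges(spans)
```

Calls that stay inside one service are skipped, so the result describes the application topology rather than every function call.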

Advantages of Column and Graph Databases

Column and graph databases have become mainstream choices for observability storage because of their storage efficiency and scalability. For AI-assisted monitoring (AIGC), vector databases play a crucial role.

Building an Open Source Observability Platform

Combine different software components to build an observability platform tailored to specific needs.

Open Source Observability Platform Software Combinations

  • Data Storage: Columnar, graph, and vector databases such as ClickHouse, Neo4j, and VectorDB.
  • Metric Data Collection: Tools like OpenTelemetry, Prometheus.
  • Visualization: Tools like Grafana.
  • Alerting: Tools like AlertManager.
  • Fault Diagnosis: Tools like DeepFlow.

Components

  • ClickHouse: Columnar database for storing metric, log, and link tracing data.
  • Neo4j: Graph database for storing complex link topologies and dependencies.
  • VectorDB: Vector database for AI engine analysis.
  • PromQL and LogQL: Query languages for Prometheus and Loki, respectively.
  • OpenTelemetry: Vendor-neutral standard and tooling for generating, collecting, and exporting telemetry (traces, metrics, and logs); storage is left to a backend.
  • Grafana: Visualization tool.
  • AlertManager: Alerting system.
  • DeepFlow: Fault diagnosis tool.
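
As an illustration of how PromQL and LogQL are used, the sketch below builds request URLs for the Prometheus instant-query endpoint (`/api/v1/query`) and the Loki range-query endpoint (`/loki/api/v1/query_range`). The host names, the `http_requests_total` metric, and the `app="checkout"` label are assumptions; no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical addresses; replace with the real endpoints of a deployment.
PROMETHEUS_URL = "http://prometheus.example:9090"
LOKI_URL = "http://loki.example:3100"


def prometheus_instant_query_url(promql, base=PROMETHEUS_URL):
    """Build the URL for a Prometheus instant query."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def loki_range_query_url(logql, start_ns, end_ns, limit=100, base=LOKI_URL):
    """Build the URL for a Loki range query (times in Unix nanoseconds)."""
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base}/loki/api/v1/query_range?{urlencode(params)}"


# PromQL: per-job request rate over the last five minutes.
promql = "sum by (job) (rate(http_requests_total[5m]))"
# LogQL: log lines from one app that contain the word "error".
logql = '{app="checkout"} |= "error"'

metrics_url = prometheus_instant_query_url(promql)
logs_url = loki_range_query_url(
    logql,
    start_ns=1_700_000_000_000_000_000,
    end_ns=1_700_000_600_000_000_000,
)
```

Grafana issues queries of this kind on the user's behalf when Prometheus and Loki are configured as data sources.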
