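This article gives a preliminary list of papers on anomaly detection and root cause localization in microservice architectures, grouped into three categories: log-based, trace-based, and metric-based approaches.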
Anomaly Detection Using Program Control Flow Graph Mining From Execution Logs.
An Approach for Anomaly Diagnosis Based on Hybrid Graph Model with Logs for Distributed Services.
LogSed: Anomaly Diagnosis through Mining Time-Weighted Control Flow Graph in Logs.
Localization of Operational Faults in Cloud Applications by Mining Causal Dependencies in Logs using Golden Signals.
3.1.1 Unsupervised Detection
Unsupervised Detection of Microservice Trace Anomalies through Service-Level Deep Bayesian Networks.
Anomaly Detection from System Tracing Data Using Multi-modal Deep Learning.
An Anomaly Detection Algorithm for Microservice Architecture Based on Robust Principal Component Analysis.
3.1.2 Supervised Detection
Anomaly Detection and Classification using Distributed Tracing and Deep Learning.
Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices.
Self-Supervised Anomaly Detection from Distributed Traces.
Latent Error Prediction and Fault Localization for Microservice Applications by Learning from System Trace Logs.
3.1.3 Trace Comparison
Workflow-Aware Automatic Fault Diagnosis for Microservice-Based Applications With Statistics.
Detecting Anomalies in Microservices with Execution Trace Comparison.
A Framework of Virtual War Room and Matrix Sketch-Based Streaming Anomaly Detection for Microservice Systems.
3.2.1 Visualization-Based Analysis
Graph-Based Trace Analysis for Microservice Architecture Understanding and Problem Diagnosis.
Fault Analysis and Debugging of Microservice Systems: Industrial Survey, Benchmark System, and Empirical Study.
3.2.2 Direct Analysis
Toward Fine-Grained, Unsupervised, Scalable Performance Diagnosis for Production Cloud Computing Systems.
Unsupervised Detection of Microservice Trace Anomalies through Service-Level Deep Bayesian Networks.
3.2.3 Topology-Graph-Based Analysis
MicroHECL: High-Efficient Root Cause Localization in Large-Scale Microservice Systems.
Root Cause Detection in a Service-Oriented Architecture.
4.1.1 Unsupervised Detection
Detecting Anomalous Behavior of Black-Box Services Modeled with Distance-Based Online Clustering.
Localizing Faults in Cloud Systems.
DLA: Detecting and Localizing Anomalies in Containerized Microservice Architectures Using Markov Models.
Performance Diagnosis in Cloud Microservices using Deep Learning.
MicroRCA: Root Cause Localization of Performance Issues in Microservices.
4.1.2 Supervised Detection
Predicting Failures in Multi-Tier Distributed Systems.
Anomaly Detection and Diagnosis for Container-Based Microservices with Performance Monitoring.
4.1.3 SLO (Service Level Objective) Check
CauseInfer: Automated End-to-End Performance Diagnosis with Hierarchical Causality Graph in Cloud Environment.
CauseInfer: Automatic and Distributed Performance Diagnosis with Hierarchical Causality Graph in Large Distributed Systems.
On Anomaly Detection and Root Cause Analysis of Microservice Systems.
Microscope: Pinpoint Performance Issues with Causal Graphs in Micro-service Environments.
4.2.1 Direct Analysis
ε-Diagnosis: Unsupervised and Real-Time Diagnosis of Small-Window Long-Tail Latency in Large-Scale Microservice Platforms.
Root-Cause Metric Location for Microservice Systems via Log Anomaly Detection.
PAL: Propagation-Aware Anomaly Localization for Cloud Hosted Distributed Applications.
FChain: Toward Black-Box Online Fault Localization for Cloud Systems.
4.2.2 Topology-Graph-Based Analysis
Graph-Based Root Cause Analysis for Service-Oriented and Microservice Architectures.
Sieve: Actionable Insights from Monitored Metrics in Distributed Systems.
Performance Diagnosis in Cloud Microservices using Deep Learning.
MicroRCA: Root Cause Localization of Performance Issues in Microservices.
DLA: Detecting and Localizing Anomalies in Containerized Microservice Architectures Using Markov Models.
4.2.3 Causal-Graph-Based Analysis
CauseInfer: Automated End-to-End Performance Diagnosis with Hierarchical Causality Graph in Cloud Environment.
CauseInfer: Automatic and Distributed Performance Diagnosis with Hierarchical Causality Graph in Large Distributed Systems.
On Anomaly Detection and Root Cause Analysis of Microservice Systems.
Microscope: Pinpoint Performance Issues with Causal Graphs in Micro-service Environments.
FacGraph: Frequent Anomaly Correlation Graph Mining for Root Cause Diagnose in Micro-Service Architecture.
MS-Rank: Multi-Metric and Self-Adaptive Root Cause Diagnosis for Microservice Applications.
Self-Adaptive Root Cause Diagnosis for Large-Scale Microservice Architecture.
AutoMAP: Diagnose Your Microservice-Based Web Applications Automatically.
CloudRanger: Root Cause Identification for Cloud Native Systems.
Localizing Failure Root Causes in a Microservice through Causality Inference.
Localizing Faults in Cloud Systems.