[Learning Tech at the Lagou Training Camp] Microservice Monitoring: Distributed Tracing

程序员爱酸奶

Published 2021-01-18 09:54:42

Included in the column: 程序员爱酸奶

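Preface

Source of this article: the Lagou Education Java High-Salary Training Camp.

Spring Cloud is a one-stop microservice solution, and many companies use its components. To learn the Spring Cloud microservice architecture, we need to learn those components: service registry, load balancing, circuit breaking, remote calls, gateway, configuration center, message bus, call tracing, metrics monitoring, and so on.

This article introduces distributed tracing for microservices. As the number of microservices grows, the calls between them become more complex, and when one service fails it is hard to find the cause. Distributed tracing records the call chain of each request, so we can see where the problem occurred and locate it quickly.

In this article you will learn about:

Use cases for distributed tracing
Core ideas
Spring Cloud Sleuth
Zipkin
Zipkin data persistence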

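Use cases for distributed tracing

To support an ever-growing volume of business, we design our systems with a microservice architecture, so that they can absorb traffic spikes through clustered deployment and also scale flexibly with the business.

In a microservice architecture, a single request may be completed with three or four service calls, or it may cross dozens or even hundreds of service nodes. This raises several questions:

1. How do we dynamically display the call chain of a service? (For example, which other services does service A call, that is, its dependencies.)

2. How do we find the bottleneck node in a call chain and tune it? (For example, in A -> B -> C, service C takes a particularly long time.)

3. How do we quickly detect failures in a call chain?

Answering these questions is the purpose of distributed tracing. If every node along the path of a request records logs, and those logs are finally collected and visualized in one place, then we can monitor metrics of the call chain: which service instance did the request reach, what was the processing status, and how long did it take?

The monitoring technique built on this idea in a distributed environment is distributed tracing (also called full-link tracing).

Distributed tracing is a mature technology, and there are many products available both in China and abroad, such as Spring Cloud Sleuth + Twitter Zipkin, Alibaba's "鹰眼" (EagleEye), Dianping's "CAT", Meituan's "Mtrace", JD's "Hydra", Sina's "Watchman", and Apache SkyWalking, which has been mentioned a lot recently.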

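Core ideas

In essence, tracing is logging. As a complete technology, distributed tracing also has its own theory and concepts.

In a microservice architecture, the call chain that handles a request can be drawn as a tree, as illustrated below.

image-20200820111746438

The figure above describes a common call scenario: a request is routed by the gateway to downstream microservice-1; microservice-1 calls microservice-2, and after getting its result calls microservice-3; finally the results of microservice-2 and microservice-3 are combined and returned to the user through the gateway.

To trace the whole call chain we need to record logs. Logging is the foundation, and on top of it there are some theoretical concepts. The ideas behind today's mainstream distributed tracing systems come from the Google paper "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure". Let's look at its core concepts, using the service calls above as the example.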

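One request's path through the system is one call chain, that is, a Trace. Every service on that chain carries the same trace ID. Each service call on the chain is a Span.

Trace: the unit of tracing. It is the process that starts when a client's request arrives at the boundary of the traced system and ends when the traced system returns the response to the client.

Trace ID: to track a request, the tracing framework creates a unique identifier, the Trace ID, when the request reaches the entry point of the distributed system. While the request flows through the system, the framework always keeps this identifier, until the response is returned to the caller. A Trace consists of one or more Spans. Each Span has a SpanId and records the TraceId, and it also has a ParentId that points to the SpanId of another Span. This expresses a parent-child relationship, which in essence is a dependency.

Span ID: to measure the latency of each processing unit, when a request reaches a service component, a unique Span ID marks its start, its processing, and its end. Every Span must have a start and an end; by recording the timestamps of both, we can compute the Span's latency. Besides timestamps, a Span can carry other metadata, such as event names and request information.

Every Span has a unique Span ID, and a number of ordered Spans make up a Trace.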
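To make these terms concrete, here is a minimal sketch of how a trace ID, span IDs, and parent IDs relate in the gateway example above. The names TraceModelSketch, Span, root, child, and newId are made up for illustration and are not the Sleuth or Zipkin API; IDs are simply random 64-bit values printed as 16 hex characters.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical, simplified trace model; not Sleuth's or Zipkin's real classes. */
final class TraceModelSketch {

    /** One unit of work inside a trace. */
    static final class Span {
        final String traceId;  // shared by every span of the same request
        final String spanId;   // unique for this span
        final String parentId; // spanId of the caller, null for the root span
        final String name;

        Span(String traceId, String spanId, String parentId, String name) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
            this.name = name;
        }

        /** Starts a new trace: a fresh trace ID and no parent. */
        static Span root(String name) {
            return new Span(newId(), newId(), null, name);
        }

        /** Creates a child span: same trace ID, new span ID, parent is this span. */
        Span child(String name) {
            return new Span(traceId, newId(), spanId, name);
        }
    }

    /** Random 64-bit ID rendered as 16 lowercase hex characters. */
    static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    public static void main(String[] args) {
        Span gateway = Span.root("gateway");
        Span service1 = gateway.child("microservice-1");
        Span service2 = service1.child("microservice-2");
        Span service3 = service1.child("microservice-3");
        for (Span s : new Span[] {gateway, service1, service2, service3}) {
            System.out.println(s.name + " trace=" + s.traceId
                    + " span=" + s.spanId + " parent=" + s.parentId);
        }
    }
}
```

Running it prints four spans that share one trace ID, with microservice-2 and microservice-3 both pointing at microservice-1 as their parent, which is the tree shape described above.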

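A Span can be thought of as a log data structure that records information at certain moments, such as timestamps, spanId, traceId, and parentId. A Span also defines another concept called events. The core events are:

CS: client send/start. The client (consumer) sends a request; this marks the start of a span.
SR: server received/start. The server (producer) receives the request. SR - CS is the network latency of sending the request.
SS: server send/finish. The server (producer) sends the response. SS - SR is the time spent on the server.
CR: client received/finished. The client (consumer) receives the response. CR - SS is the time the reply takes (the network latency of the response).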
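The arithmetic behind these four events can be sketched as follows. SpanTimingSketch and Events are hypothetical names, and the timestamps are made-up microsecond values. In a real system the client and server clocks are not perfectly synchronized, so the two network figures are only approximate.

```java
/** Hypothetical helper that derives latencies from the CS, SR, SS, and CR events. */
final class SpanTimingSketch {

    /** Timestamps (microseconds) of the four core events of one client/server call. */
    static final class Events {
        final long cs; // client send
        final long sr; // server received
        final long ss; // server send
        final long cr; // client received

        Events(long cs, long sr, long ss, long cr) {
            this.cs = cs;
            this.sr = sr;
            this.ss = ss;
            this.cr = cr;
        }

        long requestNetworkLatency() { return sr - cs; }  // SR - CS
        long serverProcessingTime() { return ss - sr; }   // SS - SR
        long responseNetworkLatency() { return cr - ss; } // CR - SS
        long totalClientTime() { return cr - cs; }        // CR - CS
    }

    public static void main(String[] args) {
        Events e = new Events(1_000, 1_200, 4_200, 4_500);
        System.out.println("request network latency:  " + e.requestNetworkLatency());  // 200
        System.out.println("server processing time:   " + e.serverProcessingTime());   // 3000
        System.out.println("response network latency: " + e.responseNetworkLatency()); // 300
        System.out.println("total time seen by client: " + e.totalClientTime());       // 3500
    }
}
```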

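Spring Cloud Sleuth (a tracing framework) traces calls between services. Sleuth records which services a request passes through and how long each one takes, which lets us sort out the call relationships between microservices and analyze problems.

Latency analysis: use Sleuth to see how long sampled requests take and analyze performance problems (which service calls are slow).
Call-chain optimization: find frequently called services and optimize them specifically.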
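As a rough illustration of these two analyses, the sketch below sums span durations per service to find the slowest one and counts spans per service to find the most frequently called one. LatencyAnalysisSketch and SpanRecord are hypothetical names and the durations are made up. In practice Zipkin does this kind of aggregation for us.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical aggregation over collected spans; not part of Sleuth or Zipkin. */
final class LatencyAnalysisSketch {

    /** Flattened span: which service produced it and how long it took. */
    static final class SpanRecord {
        final String service;
        final long durationMicros;

        SpanRecord(String service, long durationMicros) {
            this.service = service;
            this.durationMicros = durationMicros;
        }
    }

    /** Sums span durations per service. */
    static Map<String, Long> totalDurationByService(List<SpanRecord> spans) {
        Map<String, Long> totals = new HashMap<>();
        for (SpanRecord span : spans) {
            totals.merge(span.service, span.durationMicros, Long::sum);
        }
        return totals;
    }

    /** Counts how many spans each service produced. */
    static Map<String, Integer> callCountByService(List<SpanRecord> spans) {
        Map<String, Integer> counts = new HashMap<>();
        for (SpanRecord span : spans) {
            counts.merge(span.service, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the service with the largest total duration, or null if there are no spans. */
    static String slowestService(List<SpanRecord> spans) {
        String slowest = null;
        long max = Long.MIN_VALUE;
        for (Map.Entry<String, Long> entry : totalDurationByService(spans).entrySet()) {
            if (entry.getValue() > max) {
                max = entry.getValue();
                slowest = entry.getKey();
            }
        }
        return slowest;
    }

    public static void main(String[] args) {
        List<SpanRecord> spans = Arrays.asList(
                new SpanRecord("service-a", 100),
                new SpanRecord("service-b", 250),
                new SpanRecord("service-c", 900),
                new SpanRecord("service-b", 300));
        System.out.println("slowest service: " + slowestService(spans));
        System.out.println("call counts:     " + callCountByService(spans));
    }
}
```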

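Sleuth records trace data by writing logs.

We usually use Spring Cloud Sleuth together with Zipkin: Sleuth sends its data to Zipkin for aggregation, and Zipkin stores and displays it.

Spring Cloud Sleuth collects the trace data.

image-20200820112539597

Zipkin analyzes, aggregates, and displays the traces, and persists them.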

image-20200820112610329

Spring Cloud Sleuth

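Next we use distributed tracing in a project. How? First we add Spring Cloud Sleuth to collect logs in the project, building on the microservice demo we set up earlier.

Dependencies

Add the Spring Cloud Sleuth dependency to the services below, or add it once in the parent POM.

image-20200820131801309

<!-- distributed tracing -->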
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>

Configuration

In each microservice, edit application.yml and set the log levels.

# distributed tracing
logging:
  level:
    org.springframework.web.servlet.DispatcherServlet: debug
    org.springframework.cloud.sleuth: debug

That completes the collection of trace logs.
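With Sleuth 2.x on the classpath, each log line carries a prefix of the form [application,traceId,spanId,exportable]. As a hedged sketch of why that is useful, the code below groups log lines by trace ID so that all lines belonging to one request can be read together. The class name, the regular expression, and the sample log lines are assumptions made up for illustration; they are not output captured from the demo project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log post-processing; assumes the [app,traceId,spanId,exportable] prefix. */
final class SleuthLogGroupingSketch {

    // Matches a prefix such as [service-a,4bf92f3577b34da6,00f067aa0ba902b7,true]
    private static final Pattern PREFIX = Pattern.compile(
            "\\[([^,\\]]*),([0-9a-f]{16,32}),([0-9a-f]{16}),(true|false)\\]");

    /** Groups lines by trace ID, keeping first-seen order; lines without a prefix are skipped. */
    static Map<String, List<String>> groupByTraceId(List<String> lines) {
        Map<String, List<String>> byTrace = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = PREFIX.matcher(line);
            if (m.find()) {
                byTrace.computeIfAbsent(m.group(2), k -> new ArrayList<>()).add(line);
            }
        }
        return byTrace;
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> lines = Arrays.asList(
                "INFO [service-a,4bf92f3577b34da6,4bf92f3577b34da6,true] handling request",
                "INFO [service-b,4bf92f3577b34da6,00f067aa0ba902b7,true] handling downstream call",
                "INFO [service-a,a3ce929d0e0e4736,a3ce929d0e0e4736,false] handling request",
                "INFO a line without trace context");
        for (Map.Entry<String, List<String>> e : groupByTraceId(lines).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " line(s)");
        }
    }
}
```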

Zipkin

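With Spring Cloud Sleuth in place, we now integrate Zipkin so that traces are easy to view and analyze.

Zipkin has two parts, the Zipkin server and the Zipkin client. The Zipkin server is a standalone service, and the Zipkin clients are the individual microservices.

Zipkin server

We start with the server, which is where we view and analyze traces.

Dependencies

As before, we add the Zipkin server dependencies.

<!-- zipkin-server dependency -->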
<dependency>
    <groupId>io.zipkin.java</groupId>
    <artifactId>zipkin-server</artifactId>
    <version>2.12.3</version>
    <exclusions>
        <!-- exclude the transitive log4j2 dependency to avoid conflicts with Spring Boot's logging -->
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-log4j2</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<!-- zipkin-server UI dependency -->
<dependency>
    <groupId>io.zipkin.java</groupId>
    <artifactId>zipkin-autoconfigure-ui</artifactId>
    <version>2.12.3</version>
</dependency>

Configuration

In the configuration file we set the server port and turn off automatic request timing.

server:
  port: 8771

management:
  metrics:
    web:
      server:
        request:
          autotime:
            enabled: false # turn off automatic request timing

Startup class

In the startup class we add the @EnableZipkinServer annotation to enable the server.

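import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import zipkin2.server.internal.EnableZipkinServer; // package used by zipkin-server 2.12.x

@SpringBootApplication
@EnableZipkinServer // enable the Zipkin server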
public class ZipkinServerHw8771Application {

    public static void main(String[] args) {
        SpringApplication.run(ZipkinServerHw8771Application.class, args);
    }

}

The server is now configured.

Zipkin client

The clients are our existing services.

Dependencies

Add the dependency:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>

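Configuration

We also need configuration that points to our Zipkin server and says how data is sent to it.

spring:
  zipkin:
    base-url: http://127.0.0.1:8771 # address of the Zipkin server
    sender:
      # web: the client sends trace data to the server over HTTP
      # kafka/rabbit: the client sends trace data to a message queue, which relays it
      type: web
  sleuth:
    sampler:
      # Sampling rate: 1 means 100% of requests are traced; the default 0.1 means 10%
      # In production the request volume is large, and tracing every request puts pressure on the network and the server, so sample a fraction of requests instead
      probability: 1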

That completes the client as well.
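The effect of the sampling probability can be sketched as below. This is a simplified illustration and not Sleuth's own sampler implementation: ProbabilitySamplerSketch and isSampled are hypothetical names, and the decision is modeled as one random draw per new trace.

```java
import java.util.Random;

/** Hypothetical probabilistic sampler; illustrates the idea, not Sleuth's code. */
final class ProbabilitySamplerSketch {
    private final double probability;
    private final Random random;

    ProbabilitySamplerSketch(double probability, Random random) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0 and 1");
        }
        this.probability = probability;
        this.random = random;
    }

    /** Decides whether a new trace is reported; nextDouble() is in [0, 1). */
    boolean isSampled() {
        return random.nextDouble() < probability;
    }

    public static void main(String[] args) {
        ProbabilitySamplerSketch sampler = new ProbabilitySamplerSketch(0.1, new Random());
        int total = 100_000;
        int sampled = 0;
        for (int i = 0; i < total; i++) {
            if (sampler.isSampled()) {
                sampled++;
            }
        }
        // With probability 0.1, roughly one request in ten is kept.
        System.out.println(sampled + " of " + total + " requests sampled");
    }
}
```

With probability 1 every request is kept, which suits a demo like ours; a lower value trades completeness for less load on the network and on the Zipkin server.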

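Testing

Now we start the microservices to test: first the configuration center, then the others.

During startup, my Zipkin server kept failing. I spent a day on it without finding the cause, and finally fixed it by lowering the Spring Boot version, so I suspect a version conflict. My final Zipkin pom file is: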

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.1.6.RELEASE</version>
    </parent>
    <groupId>cn.quellanan</groupId>
    <artifactId>zipkin-server-hw-8771</artifactId>
    <version>1.0.0</version>
    <name>zipkin-server-hw-8771</name>
    <description>Distributed tracing: Zipkin server</description>

    <dependencies>
        <!-- web dependency -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- Actuator helps monitor and manage the Spring Boot application -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>

        <dependency>
            <groupId>io.zipkin.java</groupId>
            <artifactId>zipkin-autoconfigure-ui</artifactId>
            <version>2.12.3</version>
        </dependency>
        <dependency>
            <groupId>io.zipkin.java</groupId>
            <artifactId>zipkin-server</artifactId>
            <version>2.12.3</version>
            <exclusions>
                <exclusion>
                    <groupId>org.springframework.boot</groupId>
                    <artifactId>spring-boot-starter-log4j2</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- compiler plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>8</source>
                    <target>8</target>
                    <encoding>utf-8</encoding>
                </configuration>
            </plugin>
            <!-- packaging plugin -->
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

</project>

The result after everything has started:

image-20200820202620968

Open:

http://localhost:8771/

image-20200820202728532

Let's call a few endpoints.

http://localhost/api/code/create/1186154608@qq.com

Then look at the UI again.

image-20200820203215027

image-20200820203313274

We can also view the details.

image-20200820203547189

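Zipkin data persistence

There is a problem with the setup above: every time Zipkin restarts, the earlier data is lost, so we need persistence. Zipkin supports several storage backends; here we persist to MySQL.

Official repository:

https://github.com/openzipkin/zipkin

Find the script that creates the tables: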

https://github.com/openzipkin/zipkin/tree/master/zipkin-storage/mysql-v1/src/main/resources

image-20200820200522506

--
-- Copyright 2015-2019 The OpenZipkin Authors
--
-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
-- in compliance with the License. You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software distributed under the License
-- is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
-- or implied. See the License for the specific language governing permissions and limitations under
-- the License.
--

CREATE TABLE IF NOT EXISTS zipkin_spans (
  `trace_id_high` BIGINT NOT NULL DEFAULT 0 COMMENT 'If non zero, this means the trace uses 128 bit traceIds instead of 64 bit',
  `trace_id` BIGINT NOT NULL,
  `id` BIGINT NOT NULL,
  `name` VARCHAR(255) NOT NULL,
  `remote_service_name` VARCHAR(255),
  `parent_id` BIGINT,
  `debug` BIT(1),
  `start_ts` BIGINT COMMENT 'Span.timestamp(): epoch micros used for endTs query and to implement TTL',
  `duration` BIGINT COMMENT 'Span.duration(): micros used for minDuration and maxDuration query',
  PRIMARY KEY (`trace_id_high`, `trace_id`, `id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;

ALTER TABLE zipkin_spans ADD INDEX(`trace_id_high`, `trace_id`) COMMENT 'for getTracesByIds';
ALTER TABLE zipkin_spans ADD INDEX(`name`) COMMENT 'for getTraces and getSpanNames';
ALTER TABLE zipkin_spans ADD INDEX(`remote_service_name`) COMMENT 'for getTraces and getRemoteServiceNames';
ALTER TABLE zipkin_spans ADD INDEX(`start_ts`) COMMENT 'for getTraces ordering and range';

CREATE TABLE IF NOT EXISTS zipkin_annotations (
  `trace_id_high` BIGINT NOT NULL DEFAULT 0 COMMENT 'If non zero, this means the trace uses 128 bit traceIds instead of 64 bit',
  `trace_id` BIGINT NOT NULL COMMENT 'coincides with zipkin_spans.trace_id',
  `span_id` BIGINT NOT NULL COMMENT 'coincides with zipkin_spans.id',
  `a_key` VARCHAR(255) NOT NULL COMMENT 'BinaryAnnotation.key or Annotation.value if type == -1',
  `a_value` BLOB COMMENT 'BinaryAnnotation.value(), which must be smaller than 64KB',
  `a_type` INT NOT NULL COMMENT 'BinaryAnnotation.type() or -1 if Annotation',
  `a_timestamp` BIGINT COMMENT 'Used to implement TTL; Annotation.timestamp or zipkin_spans.timestamp',
  `endpoint_ipv4` INT COMMENT 'Null when Binary/Annotation.endpoint is null',
  `endpoint_ipv6` BINARY(16) COMMENT 'Null when Binary/Annotation.endpoint is null, or no IPv6 address',
  `endpoint_port` SMALLINT COMMENT 'Null when Binary/Annotation.endpoint is null',
  `endpoint_service_name` VARCHAR(255) COMMENT 'Null when Binary/Annotation.endpoint is null'
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;

ALTER TABLE zipkin_annotations ADD UNIQUE KEY(`trace_id_high`, `trace_id`, `span_id`, `a_key`, `a_timestamp`) COMMENT 'Ignore insert on duplicate';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id_high`, `trace_id`, `span_id`) COMMENT 'for joining with zipkin_spans';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id_high`, `trace_id`) COMMENT 'for getTraces/ByIds';
ALTER TABLE zipkin_annotations ADD INDEX(`endpoint_service_name`) COMMENT 'for getTraces and getServiceNames';
ALTER TABLE zipkin_annotations ADD INDEX(`a_type`) COMMENT 'for getTraces and autocomplete values';
ALTER TABLE zipkin_annotations ADD INDEX(`a_key`) COMMENT 'for getTraces and autocomplete values';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id`, `span_id`, `a_key`) COMMENT 'for dependencies job';

CREATE TABLE IF NOT EXISTS zipkin_dependencies (
  `day` DATE NOT NULL,
  `parent` VARCHAR(255) NOT NULL,
  `child` VARCHAR(255) NOT NULL,
  `call_count` BIGINT,
  `error_count` BIGINT,
  PRIMARY KEY (`day`, `parent`, `child`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;

The tables after creation:

image-20200820200616841
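In the schema above, trace IDs are stored in two BIGINT columns: trace_id holds the low 64 bits, and trace_id_high is non-zero only for 128-bit trace IDs. To look up a trace by the hex ID shown in the UI, the hex string has to be split into those two signed values. The sketch below shows one way to do that conversion; TraceIdColumnsSketch and toColumns are hypothetical helper names, not part of Zipkin.

```java
/** Hypothetical helper that maps a hex trace ID to the trace_id_high / trace_id columns. */
final class TraceIdColumnsSketch {

    /**
     * Returns {trace_id_high, trace_id} for a 16- or 32-character hex trace ID.
     * Values above Long.MAX_VALUE wrap to negative numbers, matching a signed BIGINT.
     */
    static long[] toColumns(String hexTraceId) {
        int len = hexTraceId.length();
        if (len != 16 && len != 32) {
            throw new IllegalArgumentException("expected 16 or 32 hex characters");
        }
        long high = len == 32
                ? Long.parseUnsignedLong(hexTraceId.substring(0, 16), 16)
                : 0L;
        long low = Long.parseUnsignedLong(hexTraceId.substring(len - 16), 16);
        return new long[] {high, low};
    }

    public static void main(String[] args) {
        long[] shortId = toColumns("000000000000000a");
        System.out.println(shortId[0] + ", " + shortId[1]); // 0, 10

        long[] longId = toColumns("0000000000000001ffffffffffffffff");
        System.out.println(longId[0] + ", " + longId[1]); // 1, -1
    }
}
```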

Add the dependencies

<!-- Zipkin MySQL storage dependency -->
<dependency>
    <groupId>io.zipkin.java</groupId>
    <artifactId>zipkin-autoconfigure-storage-mysql</artifactId>
    <version>2.12.3</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid-spring-boot-starter</artifactId>
    <version>1.1.10</version>
</dependency>
<!-- transaction management is needed for database access -->
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-tx</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-jdbc</artifactId>
</dependency>

Add configuration

Add the database connection settings.

spring:
  application:
    name: zipkin-server-8771
  sleuth:
    enabled: false
  datasource:
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://localhost:3306/zipkin?serverTimezone=UTC&useUnicode=true&characterEncoding=utf-8&useSSL=false&allowMultiQueries=true
    username: root
    password: 123456
    druid:
      initialSize: 10
      minIdle: 10
      maxActive: 30
      maxWait: 50000
# use MySQL as Zipkin's storage backend
zipkin:
  storage:
    type: mysql