采样
Learn about sampling, and the different sampling options available in OpenTelemetry. 了解采样以及 OpenTelemetry 中提供的不同采样选项。
With distributed tracing, you observe requests as they move from one service to another in a distributed system. It’s superbly practical for a number of reasons, such as understanding your service connections and diagnosing latency issues, among many other benefits. 使用分布式跟踪,您可以观察请求在分布式系统中从一个服务到另一个服务的传递情况。出于多种原因,这对于理解服务连接和诊断延迟问题非常实用。
However, if the majority of all your requests are successful 200s and finish without unacceptable latency or errors, do you really need all that data? Here’s the thing—you don’t always need a ton of data to find the right insights. You just need the right sampling of data. 但是,如果您的所有请求中的大多数都在 200 秒内成功完成并且没有出现不可接受的延迟或错误,那么您真的需要所有这些数据吗?事情是这样的——你并不总是需要大量数据才能找到正确的洞察力。您只需要正确的数据采样。
The idea behind sampling is to control the spans you send to your observability backend, resulting in lower ingest costs. Different organizations will have their own reasons for not just why they want to sample, but also what they want to sample. You might want to customize your sampling strategy to: 采样的核心思想是控制您发送到可观测后端的Span,从而降低摄取成本。不同的组织不仅有自己的采样原因,而且也有自己想要采样的对象。您可能需要自定义采样策略以实现以下目标:
术语
It’s important to use consistent terminology when discussing sampling. A trace or span is considered “sampled” or “not sampled”: 在讨论采样时使用一致的术语非常重要。Trace或Span被视为“已采样”或“未采样”:
Sometimes, the definitions of these terms get mixed up. You may find someone state that they are “sampling out data” or that data not processed or exported is considered “sampled”. These are incorrect statements. 有时,这些术语的定义会混淆。您可能会发现有人声称他们正在“采样数据”,比如未处理或导出的数据被视为“采样”。这些都是不正确的说法。
头部采样
Head sampling is a sampling technique used to make a sampling decision as early as possible. A decision to sample or drop a span or trace is not made by inspecting the trace as a whole. 头部采样是一种用于尽早做出采样决定的采样技术。对Span或Trace进行采样或删除的决定不是通过检查整个Trace来做出的。
For example, the most common form of head sampling is Consistent Probability Sampling. It may also be referred to as Deterministic Sampling. In this case, a sampling decision is made based on the trace ID and a desired percentage of traces to sample. This ensures that whole traces are sampled - no missing spans - at a consistent rate, such as 5% of all traces. 例如,最常见的头部采样形式是 一致概率采样。它也可以称为确定性采样。在这种情况下,采样决策是根据Trace ID和所需的采样Trace百分比做出的。这可确保对整个Trace保持一致的速率(例如所有Trace的 5%)进行采样,且不会丢失Span。
The upsides to head sampling are: 头部采样的优点是:
The primary downside to head sampling is that it is not possible make a sampling decision based on data in the entire trace. This means that head sampling is effective as a blunt instrument, but is wholly insufficient for sampling strategies that must take whole-system information into account. For example, it is not possible to use head sampling to ensure that all traces with an error within them are sampled. For this, you need Tail Sampling. 头部采样的主要缺点是无法根据整个Tace中的数据做出采样决策。这意味着首部采样在某种程度上会有效,但对于必须考虑整个系统信息的采样策略来说完全不够。例如,不可能使用头部采样来确保对其中有错误的所有Trace进行采样。为此,您需要尾部采样。
尾部取样
Tail sampling is where the decision to sample a trace takes place by considering all or most of the spans within the trace. Tail Sampling gives you the option to sample your traces based on specific criteria derived from different parts of a trace, which isn’t an option with Head Sampling. 尾部采样是通过考虑Trace内的全部或大部分Span来决定对Trace哪些地方的Span进行采样的方法。尾部采样允许您根据Trace的不同部分派生出的特定条件对Trace进行采样,而头部采样则无法提供此选项。
Some examples of how you can use Tail Sampling include: 使用尾部采样的一些示例包括:
As you can see, tail sampling allows for a much higher degree of sophistication. For larger systems that must sample telemetry, it is almost always necessary to use Tail Sampling to balance data volume with usefulness of that data. 正如您所看到的,尾部采样可以实现更高程度的复杂性。对于必须对Telemetry数据进行采样的大型系统,几乎总是需要使用尾部采样来平衡数据量和数据的有用性。
There are three primary downsides to tail sampling today: 目前尾部采样存在三个主要缺点:
Finally, for some systems, tail sampling may be used in conjunction with Head Sampling. For example, a set of services that produce an extremely high volume of trace data may first use head sampling to only sample a small percentage of traces, and then later in the telemetry pipeline use tail sampling to make more sophisticated sampling decisions before exporting to a backend. This is often done in the interest of protecting the telemetry pipeline from being overloaded. 最后,对于某些系统,尾部采样可以与头部采样结合使用。例如,生成大量跟踪数据的一组服务可能首先使用头部采样仅对一小部分跟踪进行采样,然后在Telemetry管道中使用尾部采样做出更复杂的采样决策,然后再导出到后端。这样做通常是为了防止Telemetry管道过载。
The OpenTelemetry Collector includes the following sampling processors: OpenTelemetry Collector 包括以下采样处理器:
For the individual language specific implementations of the OpenTelemetry API & SDK you will find support for sampling at the respective documentation pages: 对于 OpenTelemetry API 和 SDK 的各个语言特定实现,您可以在相应的文档页面找到采样支持: