Flink Forward 2019--实战相关(17)--Yelp分享实时访问规模预测

阿泽

发布于 2019-08-23 16:30:01

8430

Using Flink to inspect live data as it flows through a data pipeline -- Matthew Dailey(Splunk)

One of the hardest challenges with authoring a data pipeline in Flink is understanding what your data looks like at each stage of the pipeline. Pipeline authors would love to answer questions like ""why is no data coming through my filter?"" Or ""why did my regex not extract any fields?"" Or ""is my pipeline even reading anything from Kafka?"" Unit and integration testing pipeline logic goes a long way, and metrics are another great tool to understand what a pipeline is doing, but sometimes you need the data itself to answer why a pipeline is behaving the way it is.

在Flink中编写数据管道最困难的挑战之一是了解管道的每个阶段的数据外观。管道作者希望回答这样的问题：“为什么没有数据通过我的过滤器？”或者“为什么我的regex没有提取任何字段？”或者“我的管道甚至在读卡夫卡的文章吗？”单元和集成测试管道逻辑有很长的路要走，度量是理解管道正在做什么的另一个很好的工具，但有时您需要数据本身来回答管道行为方式的原因。

To answer these questions for ourselves and our customers, at Splunk we created a simple yet robust architecture for extracting data as it moves through a pipeline. You'll also learn about our implementation of this architecture, including the lessons learned while creating it, and how you can apply this architecture yourself. You'll hear about how to rewrite your Flink job graph at job submission time, how to retrieve data from all the nodes in the job graph, and how to expose this information to a user interface through a REST API.

为了为我们自己和我们的客户解答这些问题，在Splunk，我们创建了一个简单而健壮的体系结构，用于在数据通过管道时提取数据。您还将了解这个体系结构的实现，包括在创建它时学到的经验教训，以及如何自己应用这个体系结构。您将听到如何在作业提交时重写Flink作业图，如何从作业图中的所有节点检索数据，以及如何通过RESTAPI将这些信息公开给用户界面。