Building production Flink jobs with AirStream at Airbnb
A recording of the talk has been uploaded to Bilibili: https://www.bilibili.com/video/av53226934/
AirStream is a real-time stream computation framework that supports Flink as one of its processing engines. It lets engineers and data scientists at Airbnb use Flink to build real-time data pipelines and feedback loops, and multiple mission-critical applications have been built on top of it. In this talk, we will start with an overview of AirStream and describe how we designed it to leverage Flink's SQL support so that users can easily build real-time data pipelines. We will go over a few production use cases, such as building a user activity profiler and building user identity mapping in real time. We will also cover how we have integrated AirStream into the data infrastructure ecosystem at Airbnb through easily configurable connectors, such as Kafka and Hive, that let users incorporate these components into their pipelines.