Flink Forward 2019 Article Series -- In Practice (3) -- Netflix

阿泽

Published 2019-06-21 16:07:35

Massive Scale Data Processing at Netflix using Flink--Snehal Nagmote & Pallavi Phadnis

Introduction

Over 137 million members worldwide enjoy TV series and feature films across a wide variety of genres and languages on Netflix. This generates petabyte-scale user behavior data. At Netflix, our client logging platform collects and processes this data to power recommendations, personalization, and many other services that enhance the user experience. Built with Apache Flink, this platform processes hundreds of billions of events and a petabyte of data per day, at 2.5 million events/sec with sub-millisecond latency. The processing involves a series of data transformations, such as decryption and enrichment with customer, geo, and device information using microservice-based lookups.
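
The decrypt-then-enrich flow described above can be sketched in plain Python. This is only an illustration of the shape of the pipeline, not Netflix's implementation and not Flink API code. Everything named here is an assumption: the three lookup tables stand in for remote microservices, the field names (`customer_id`, `ip`, `device_id`) are invented, and base64 decoding is used purely as a placeholder for real decryption.

```python
import base64
import json

# Hypothetical in-memory stand-ins for the microservice lookups.
# A real job would call remote services instead.
CUSTOMER_SERVICE = {"c1": {"plan": "premium"}}
GEO_SERVICE = {"203.0.113.7": {"country": "US"}}
DEVICE_SERVICE = {"d9": {"type": "smart_tv"}}


def decrypt(payload: bytes) -> dict:
    """Placeholder for decryption: base64-decode, then parse JSON.

    base64 is NOT encryption; it only marks where a real decryption
    step would sit in the pipeline.
    """
    return json.loads(base64.b64decode(payload))


def enrich(event: dict) -> dict:
    """Return a copy of the event with customer, geo and device info attached.

    Unknown keys enrich to an empty dict rather than failing.
    """
    return {
        **event,
        "customer": CUSTOMER_SERVICE.get(event.get("customer_id"), {}),
        "geo": GEO_SERVICE.get(event.get("ip"), {}),
        "device": DEVICE_SERVICE.get(event.get("device_id"), {}),
    }


def process(payloads):
    """Apply the transformations to each raw payload, one event at a time."""
    for payload in payloads:
        yield enrich(decrypt(payload))


def encode_for_demo(event: dict) -> bytes:
    """Build a raw payload in the placeholder 'encrypted' format."""
    return base64.b64encode(json.dumps(event).encode("utf-8"))


if __name__ == "__main__":
    raw = encode_for_demo(
        {"customer_id": "c1", "ip": "203.0.113.7", "device_id": "d9", "action": "play"}
    )
    for enriched in process([raw]):
        print(enriched)
```

Each stage is a pure function of the event, which keeps the stages easy to test separately. How the real platform keeps remote lookups fast enough for its latency target is not stated in the abstract.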

The transformed and enriched data is then used by multiple data consumers for a variety of applications, such as improving the user experience with A/B tests, tracking application performance metrics, and tuning algorithms. As a result, multiple batch jobs read the same dataset redundantly, which incurs heavy processing costs. To avoid this, we have developed a config-driven, centralized, managed platform on top of Apache Flink that reads this data once and routes it to multiple streams based on dynamic configuration. This has improved computation efficiency, reduced costs, and reduced operational overhead.
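
The "read once, route to many streams" idea can be sketched as follows. This is a minimal plain-Python model, not Flink code and not the platform's actual design. The `Router` class, the rule format (stream name mapped to required field values), and the stream names are all hypothetical. In a Flink job, one common way to deliver rule updates to running operators is the broadcast state pattern, though the abstract does not say which mechanism Netflix uses.

```python
from collections import defaultdict


class Router:
    """Routes each event to every output stream whose rule it matches.

    Rules are hypothetical: {stream_name: {field: required_value, ...}}.
    An event matches a rule when all listed fields have the required values.
    """

    def __init__(self, rules):
        self.rules = dict(rules)
        self.streams = defaultdict(list)  # stand-in for real output streams

    def update_rules(self, rules):
        """Swap in a new configuration without restarting the router."""
        self.rules = dict(rules)

    def route(self, event):
        """Append the event to every matching stream; return the matched names."""
        matched = [
            name
            for name, conditions in self.rules.items()
            if all(event.get(field) == value for field, value in conditions.items())
        ]
        for name in matched:
            self.streams[name].append(event)
        return matched


if __name__ == "__main__":
    router = Router({
        "ab_tests": {"type": "impression"},
        "perf_metrics": {"type": "perf"},
    })
    router.route({"type": "impression", "country": "US"})
    router.route({"type": "perf", "country": "BR"})

    # A new consumer is added through configuration only.
    router.update_rules({**router.rules, "us_events": {"country": "US"}})
    router.route({"type": "impression", "country": "US"})

    for name, events in router.streams.items():
        print(name, len(events))
```

The input is consumed a single time, and adding a consumer changes only the rule set, not the reading job. That is the property the paragraph credits for the reduced cost.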

Stream processing at scale, while keeping production systems scalable and cost-efficient, brings interesting challenges. In this talk, we will share how we leverage Apache Flink to achieve this, the challenges we faced, and what we learned while running one of the largest Flink applications at Netflix.
