Adventures in Scaling from Zero to 5 Billion Data Points per Day -- Dave Torok (Comcast)
At Flink Forward San Francisco 2018, our team at Comcast presented our operationalized streaming ML framework, which had just gone into production. This year, in just a few short months, we scaled a Customer Experience use case from an initial trickle of volume to processing over 5 billion data points per day. This use case helps diagnose potential issues with High Speed Data service and provides recommendations for resolving those issues as quickly and cost-effectively as possible.
As with any solution that grows quickly, our platform faced challenges, bottlenecks, and technology limits, forcing us to rapidly adapt and evolve our approach to handle 50,000+ data points per second.
We will introduce the problems, approaches, solutions, and lessons learned along the way, including: The Trigger and Diagnosis Problem, The REST Problem, The "Feature Store" Problem, The "Customer State" Problem, The Savepoint Problem, The HA Problem, The Volume Problem, and of course The Really High Volume Feature Store Problem #2.