Netflix's infrastructure is just too big for human eyes to check everything. That's why they built this incredible operationally intelligent system that monitors itself and basically heals itself.
"Netflix strives to provide an amazing experience to each member. To accomplish this, Netflix needs to maintain very high availability across our systems. However, at a certain scale, humans can no longer scale their ability to monitor the status of all systems, making it critical for Netflix to build tools and platforms that can automatically monitor their production environments and make intelligent real-time operational decisions to remedy the problems they identify. In this session, we discuss how Netflix uses data mining and machine learning techniques to automate decisions in real-time with the goal of supporting operational availability, reliability, and consistency. We review how we got to the current states, the lessons we learned, and the future of real-time analytics at Netflix. While Netflix's scale is larger than most other companies, we believe the approaches and technologies we discuss are highly relevant to other production environments, and audience members should come away with actionable ideas that are implementable in, and benefit, most other environments."