DevOps at Massive Scale
Frequent spurts of backlash resulting from these changes became known to some Yahoo! insiders as “the bad years.” How did Yahoo! use DevOps to recover?
When you have a billion users, people notice. That’s where our story about DevOps and Yahoo! starts. For Kishore Jalleda and Gopal Mor, both engineers at Yahoo!, when something goes wrong on a Yahoo! page, people will notice.
Correction: a lot of people will notice.
Of course, Yahoo!, like all services on the Internet, constantly improves its products. In fact, it has 100+ iterations and experiments running at any given time. Some changes bring innovation to the forefront, and others alter the user experience.
When iterations and experiments are served to loyal users who have become comfortable with a specific user experience, those users sometimes react with a natural resistance to the change. When a change causes, or appears to cause, breaks in service, the backlash can be crippling. Frequent spurts of backlash resulting from these changes became known to some Yahoo! insiders as “the bad years.”
At the recent All Day DevOps conference, Kishore and Gopal shared how Yahoo! turned to DevOps practices to recover from “the bad years.” In their presentation, Launching Products at Massive Scale: The DevOps Way, Kishore said:
“DevOps is about eliminating technical, process, and cultural barriers between idea and execution — using software.”
Specific to each of these, Kishore recommends the following:
- Culture of ownership and excellence: Own lifecycles within Development, fix root causes, and have pride in the product.
- Processes: Design or engineer processes to be fast and Agile. Work should be iterative, support learning, and provide fast feedback cycles. Let the machines do the heavy lifting.
- Tools: Solve operations problems with software. Use open-source tools that are self-service, reusable, friendly, and easy to use.
At Yahoo!, the DevOps practice is built on three functional pillars:
- Deliver products to market quickly.
- Prevent defects from reaching customers.
- Repair production issues quickly.
Speaking of repairing production issues quickly, Gopal discussed resiliency. With a user base of that size, uptime is critical to Yahoo!, and the challenges are many:
- Distributed multilayer architecture.
- Hundreds of subsystems.
- Complex request flow.
- Change is the only constant.
While it may be counterintuitive, Gopal demonstrated that the combined system is weaker than its weakest subsystem: when a request has to pass through many subsystems, their individual failure rates compound.
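A quick way to see this, under two simplifying assumptions that are ours rather than the talk's (every request passes through all subsystems in series, and subsystems fail independently): the combined availability is the product of the individual availabilities, so it can never exceed the lowest one. The figures below are hypothetical and only illustrate the arithmetic.

```python
import math


def combined_availability(availabilities):
    """Availability of a chain in which every subsystem must succeed.

    Assumes independent failures and a strictly serial request path.
    """
    return math.prod(availabilities)


# Hypothetical figures: 100 subsystems, each available 99.9% of the time.
subsystems = [0.999] * 100
total = combined_availability(subsystems)

print(f"weakest subsystem: {min(subsystems):.4f}")
print(f"combined system:   {total:.4f}")
```

With these made-up numbers, a hundred subsystems at 99.9% each yield a combined availability of roughly 90.5%, well below any single component.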
To ensure optimum uptime, Gopal tells us we need to:
- Analyze the entire range of failure types.
- Understand their rate and impact level.
- Plan to cover all failure types.
- Conduct fire drills — test, test, and test.
Specifically, how does Yahoo! ensure high availability? They maintain four layers of resiliency in the serving stack:
- Speculative retry. Issue the request again once a predefined latency threshold is exceeded. This addresses long-tail latency and intermittent failures.
- Per-module fallback. Cache non-personalized modules on the front-end servers and serve the cached content for failed modules. You need to refresh the cache often, validate the cached data strictly, and check for backward compatibility if the TTL is high.
- Full page failsafe. Cache the entire page without personalized data or ads and with minimal interaction. This is used when the entire page cannot be served. Yahoo! uses auto-scaling AWS servers to serve these pages so that its own servers are not negatively affected.
- Failwhale page. The “we will be right back” page that lets users know that, yes, their Internet is working, but the service is temporarily down and they should try again. Obviously, this is a last resort, but it’s a necessary one.
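The four layers can be read as a fallback chain. The sketch below shows one possible arrangement in Python; it is not Yahoo!’s implementation. The names `speculative_call`, `render_page`, `module_cache`, and `page_cache` are hypothetical, the caches are plain dictionaries assumed to be filled elsewhere, and the sketch assumes that a module with neither a live response nor a cached copy means the page cannot be served.

```python
import concurrent.futures as cf

FAILWHALE = "We will be right back."


def speculative_call(fetch, timeout_s, pool):
    """Layer 1: start fetch(); if it is still running after timeout_s,
    start a second attempt and return whichever attempt succeeds first."""
    first = pool.submit(fetch)
    try:
        return first.result(timeout=timeout_s)
    except cf.TimeoutError:
        attempts = [first, pool.submit(fetch)]  # slow: race a second attempt
    except Exception:
        attempts = [pool.submit(fetch)]  # fast failure: plain retry
    for done in cf.as_completed(attempts):
        if done.exception() is None:
            return done.result()
    raise RuntimeError("all attempts failed")


def render_page(modules, module_cache, page_cache, timeout_s=0.2):
    """modules maps a module name to a zero-argument callable returning HTML."""
    parts = []
    # A real server would keep a long-lived pool; leaving this `with` block
    # waits for any straggling attempts to finish.
    with cf.ThreadPoolExecutor(max_workers=8) as pool:
        for name, fetch in modules.items():
            try:
                html = speculative_call(fetch, timeout_s, pool)
            except Exception:
                # Layer 2: cached, non-personalized copy of this module.
                html = module_cache.get(name)
            if html is None:
                # Layer 3: cached full page; layer 4: failwhale.
                return page_cache.get("page", FAILWHALE)
            parts.append(html)
    return "\n".join(parts)
```

Each layer is tried only when the one before it has nothing to offer, so the failwhale page appears only after the live fetch, the module cache, and the page cache have all come up empty.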
There is a lot here, and you don’t need to run a service at Yahoo!’s scale to benefit from the experience. You can dive into Kishore and Gopal’s full All Day DevOps conference session (just 30 minutes) to hear more of their lessons from the DevOps front lines. The other 56 presentations from the All Day DevOps conference are also available online, free of charge.
Published at DZone with permission of Derek Weeks, DZone MVB. See the original article here.