One of the big draws of the O'Reilly Software Architecture Conference was Adrian Cockcroft's talk, "Deliver Faster and Spend Less with Cloud Native Microservices." Cockcroft is an experienced speaker on the conference circuit, and he's well known as the architect who led Netflix into its new era of unprecedented scale and agility. He now works for Battery Ventures, but he still draws primarily on his experiences at Netflix for his talks. He and his team were the ones behind the greatest success story for the latest trend in software architecture: microservices.
It all started with a Netflix datacenter full of 'snowflakes' in 2008. After a major outage that year, they concluded that they weren't very good at running datacenters, so they decided to hand that job to someone who was better at it. This 'someone' would eventually end up being Amazon Web Services.
A Quick Story About Netflix
While everyone who's contemplating a move to the cloud or to microservices probably has a unique situation, Cockcroft encouraged new adopters to remember the theme of Netflix's own migration:
Start with the simplest possible thing you can transition, make sure it's not customer-facing, and test it in the new system. You should establish your risk boundaries, and move them forward as you have more successes. Use the smallest instance that still allows you to learn all that you want.
Things moved over to AWS fairly quickly for Netflix. By 2010 they knew they had to move most of their infrastructure over, because otherwise they would be out of capacity by Christmas. What they used initially was a hybrid cloud, but it meant managing a lot of things in many different places. Cockcroft and his team compared it to a rider standing on two horses, with one leg on each.
Today, Cockcroft is much more confident in the maturity of the public cloud, and says that the hybrid cloud step could probably be skipped in most organizations.
Netflix got a lot of their engineering talent because of their major open source effort. There were several advantages to this:
- Engineers who worked on a lot of open source projects had high levels of creativity
- Developers felt more ownership over their work, and pride in it
- Open source developers work well together because of their similar ways of thinking
- Peer pressure from GitHub (having their name on a project) was a big motivator for engineers to work harder and not let the community of users down
- If engineers leave the company, they're likely to keep working on the project, so you're still getting value for free!
Why was it a good idea for Netflix to open source so much of their architecture and associated tools?
It was actually a very smart move. They knew they were ahead of the game technologically, and they didn't want to get so far ahead of the rest of the industry that they'd later have to re-synchronize their architecture and technology with whatever trend or open direction the larger industry eventually settled on. They wanted to become that standard and future trend. And they didn't want to be the only "Unicorn," because they truly believe that they're not the only ones who can do this. (Today, Gilt, Twitter, SoundCloud, and others have proven this.)
Business managers have been in love with the OODA loop for years. It was defined by John Boyd, a US Air Force colonel, and it stands for "Observe, Orient, Decide, and Act." It's especially relevant to the Lean Startup mindset, which is probably why Netflix really liked the concept. Cockcroft shared his own OODA diagram with related concepts in software development:
- Observe = Research (gather data) & Innovation
- Orient = Big Data analytics
- Decide = Culture (JFDI: Just F***ing Do It)
- Act = Cloud-speed provisioning
Continuous Delivery is at the core of the cycle
The Site Reliability Team
With large software teams, it becomes impossible to pinpoint the developer who caused a bug without focused teams working on modular areas of the codebase. Many organizations have a reliability or monitoring team that fixes the bugs themselves or has to notify a bunch of people when something breaks.
The site reliability team at Netflix didn't do any of those things. Their only job was to identify which microservice caused the bug, and then notify only the developer who was tagged as working on the malfunctioning piece of code (or that developer's former team leader, if the developer was no longer working on that code). They would also organize a meeting of multiple developers if necessary, but the developers (not monitoring or the Ops/platform team) were on the hook for supporting their microservices.
The site reliability team is solely there to notify and organize the relevant developers (which they discover through monitoring each microservice) when something breaks.
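The routing rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Netflix's actual tooling: the `Ownership` record, the `OWNERS` registry, the `contact_for` function, and every service and person name in it are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Ownership:
    owner: Optional[str]     # developer currently tagged on the service, if any
    former_team_lead: str    # fallback contact when the owner has moved on


# Hypothetical registry: service and people names are invented for illustration.
OWNERS: Dict[str, Ownership] = {
    "recommendations": Ownership(owner="alice", former_team_lead="carol"),
    "billing": Ownership(owner=None, former_team_lead="dave"),
}


def contact_for(service: str, registry: Dict[str, Ownership] = OWNERS) -> str:
    """Return the one person to notify when `service` malfunctions."""
    entry = registry[service]  # raises KeyError for an unregistered service
    # Notify the tagged developer; fall back to the former team lead otherwise.
    return entry.owner if entry.owner is not None else entry.former_team_lead
```

The point of the design is that an alert reaches exactly one accountable person per microservice instead of a broad distribution list.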
The number of people inconvenienced by bugs and outages should be your metric. --Adrian Cockcroft
3 Kinds of Releases
Lots of developers have trouble with having daily or weekly releases. Most of their issues come from a misunderstanding of what 'release' means in Continuous Delivery. There are three definitions:
- Putting some code in production (this is the release CD is talking about—fast micro-releases)
- Customers start seeing the code (a small feature release to customers)
- A major marketing release (an aggregation of micro-releases). It's all about marketing, and it's calendar-driven
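One common way to keep these three meanings apart is a percentage-based feature flag, sketched below. The talk recap doesn't say which mechanism Netflix used, so treat this as an assumption; `bucket` and `is_visible` are hypothetical helper names.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_visible(rollout_percent: int, user_id: str) -> bool:
    """True if this user should see a feature rolled out to `rollout_percent`%."""
    return bucket(user_id) < rollout_percent


# Release 1: code is in production, but the flag is at 0%, so nobody sees it.
# Release 2: the flag moves to a small percentage and some customers see it.
# Release 3: on the marketing date, the flag goes to 100% for everyone.
```

Because the bucket is derived from a hash of the user ID, a given user stays in the same group as the percentage grows, so the audience only ever expands.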
Give the developers the pain, and they'll automate everything out of the way.
The Boundaries of Microservices
- Check out Martin Fowler's post on Bounded Context
- Read the DDD book by Eric Evans
- A microservice's immediate connections and context should be something one developer can fit in their head
- One developer should be able to independently produce it
- One "verb" (single concern/function, not GET/PUT/DELETE) per microservice
- Should be possible to deploy in a container
Hopefully, some of these key takeaways from Adrian Cockcroft's talk clarified some of your knowledge about microservices or agile organizational methodologies. There is even more content from this talk to discuss, and I'll include those other facets in their own focused posts. Check out this blog post for further examination of Adrian's key themes from his microservice talks.