After attending a full-day track with multiple sessions and open spaces on microservices at QCon New York, it is clear that this technique has taken flight since I first heard about it at the same conference in 2014. A lot of companies have made the jump to solve their technical and organizational scaling issues, including Uber, Amazon, and Netflix, and many of them used QCon to share their experiences. This is a random list of thoughts, observations, and ideas from that day.
- A lot of my concerns from that time seem to have been resolved by the industry. It's incredible how many (open-source) libraries, products, and tools have emerged since then. Kafka seems to be a big one here.
- Container technology seems to have become a crucial element in a successful migration to microservices. Some would even call this a disruptive technology that changes the way we build software. Deployment times went from months, to days, to minutes. But apparently the next big thing is Serverless Architecture. AWS Lambda and Azure Functions are examples of that.
- Domain Driven Design is becoming the de facto technique for building microservices. You can see that in this nice definition: "A loosely coupled service-oriented architecture defined by bounded contexts". Even the DDD statement that if you need to know too much about surrounding services, you probably have your bounded context wrong, applies to microservices. Unfortunately, according to an open space discussion, defining those boundaries is the most complex task.
- One of the next challenges the community is expected to resolve is the complexity of authorization, security groups, network partitions and such.
- Something that Daniel Bryant noted (and something I observed myself while talking to the people at QCon) is that microservices are becoming the new solution-to-all-problems. This is dangerous and leads to cargo culting.
- Microservices is not just about technology. It also has a significant effect on the organization. In fact, as the workshop on the last day showed (about which a blog post will follow), these two go hand in hand.
- Failure testing by injecting catastrophic events using Chaos Monkey (part of the Simian Army), Failure Injection Testing and Gremlin seems to be becoming a commodity.
- For obvious reasons, Continuous Delivery is not a luxury anymore; it has become a prerequisite.
- Version-aware routing and discovery of services through an API gateway is used by all those that moved to microservices and seems to be a prerequisite. Such a gateway also provides a service registry to find services by name and version and get an IP address and port. Smart Pipes are supposed to make that even more transparent. Examples of gateways that were used included Kong, Apigee, AWS API Gateway and Mulesoft.
- With respect to communication protocols, the consensus is to use JSON over HTTP for external/public interfaces (which makes them easy to consume from all platforms) and more bandwidth-optimized protocols like Thrift, Protobuf, Avro or SBE internally. XML has been ruled out by all parties. Swagger (through Swashbuckle for .NET) and DataWire Quark were mentioned for documenting the interfaces. In addition, developers proposed having the owning team build a reference driver for every service, which consumers can use to understand the semantics of the service.
- Even a deployment strategy has formed. I've heard multiple speakers state that you should deploy existing code with either a new/updated platform stack or new dependencies, but not both. Consequently, deploying new code should happen without changing the dependencies or platform. This should shorten the feedback loop for diagnosing deployment problems.
- A common recurring problem is overloading the network, also called retry storming: because each service has its own timeout and retry logic, retries multiply exponentially along the call chain. A proposed solution is introducing a cascading timeout budget. Using event publishing rather than RPC can surely prevent this problem altogether.
- If you have many services, you might eventually run into connection timeout issues. Using shared containers allows connections to be reused. Don't use connection pooling either. Next to that, if services have been replicated for scale-out purposes, you should only retry on a different connection to a different instance.
- Regressions don't surface immediately, apparently, so all of the speakers agreed that canary testing is the only reliable way to find them.
- A big-bang replacement is the worst thing you can do, and most of the successful companies used a form of the Strangler pattern to replace a part of the functionality.
- A nice pattern that was mentioned a couple of times is that every service exposes a /test endpoint that can be used to verify versions and dependencies during canary testing.
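To make the cascading timeout budget from the list above a bit more concrete, here is a minimal Python sketch of how I understand the idea: a request gets one overall deadline, and every retry (and every downstream call) can only spend what is left of it. None of the speakers showed this code; the names `TimeoutBudget`, `BudgetExceeded`, `per_call_cap` and `max_attempts` are my own assumptions for illustration.

```python
import time


class BudgetExceeded(Exception):
    """Raised when the overall budget or the allowed attempts run out."""


class TimeoutBudget:
    """One shared deadline for a whole request (hypothetical helper)."""

    def __init__(self, total_seconds, clock=time.monotonic):
        # The clock is injectable so the behavior can be tested without sleeping.
        self._clock = clock
        self._deadline = clock() + total_seconds

    def remaining(self):
        """Seconds left in the budget, never negative."""
        return max(0.0, self._deadline - self._clock())

    def call(self, func, per_call_cap, max_attempts=3):
        """Call func(timeout) and retry on TimeoutError within the budget.

        Each attempt gets at most per_call_cap seconds, and never more
        than what is left of the overall budget.
        """
        last_error = None
        for _ in range(max_attempts):
            remaining = self.remaining()
            if remaining <= 0:
                break  # budget spent: stop retrying instead of adding load
            try:
                return func(min(per_call_cap, remaining))
            except TimeoutError as exc:
                last_error = exc
        raise BudgetExceeded("gave up: budget or attempts exhausted") from last_error
```

A service in the middle of a call chain would presumably forward `budget.remaining()` to the services it calls (for example in a request header, which is again an assumption), so that a downstream service never keeps retrying after its caller has already given up.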
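The /test endpoint pattern from the list above could be sketched with nothing but the Python standard library. This is only an illustration of the idea, not what any of the speakers run in production: the service version, the dependency names and the `find_mismatches` helper are all made up for the example.

```python
import json
from http.server import BaseHTTPRequestHandler

# Hypothetical values; a real service would read these from its build metadata.
SERVICE_VERSION = "1.4.2"
DEPENDENCIES = {"orders-db": "9.6", "payment-service": "2.1.0"}


def build_test_report(version, dependencies):
    """Build the payload that the /test endpoint returns."""
    return {"version": version, "dependencies": dict(dependencies)}


def find_mismatches(report, expected_dependencies):
    """Return {name: (expected, actual)} for every dependency that differs."""
    actual = report["dependencies"]
    return {
        name: (expected, actual.get(name))
        for name, expected in expected_dependencies.items()
        if actual.get(name) != expected
    }


class SelfTestHandler(BaseHTTPRequestHandler):
    """Serves GET /test with the version report as JSON."""

    def do_GET(self):
        if self.path != "/test":
            self.send_error(404)
            return
        report = build_test_report(SERVICE_VERSION, DEPENDENCIES)
        body = json.dumps(report).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Running `HTTPServer(("127.0.0.1", 8080), SelfTestHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) would expose the endpoint locally. A canary check could then fetch /test from the new instance and pass the parsed JSON to `find_mismatches` before shifting more traffic to it.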
So what do you think? Are you already trying or operating microservices?