My goal today is to try and make an argument for reducing the amount of shared code and limiting or eliminating the coupling of systems with binary dependencies. The punchlines of this text are all taken from the fantastic talk given by Ben Christensen, titled “Don’t Build a Distributed Monolith,” which can greatly help in putting some developers’ personal observations into context.
I'll be talking about shared libraries and network clients. We define them for the purpose of this post as follows:
Shared libraries: libraries that are required or libraries of the transitive variety (buy one and get 20 free type of thing!). They’re often called “the core” or “platform core” or similar names signifying their importance and omnipresence.
Network clients: the clients without which a service is inaccessible or extremely hard to access due to technical limitations or lack of clarity of protocols and contracts.
The “Distributed Monolith” is born out of a phenomenon within which a monolithic code base is broken up and spread across a network. This is where we lose a few of the benefits and keep almost all the troubles. There is a very thin line between a distributed monolith and microservices. There is no Great Wall of China with armed sentries posted every 20 feet between them. I would like to argue that with too much sharing we create a field of gravity where all you need is a little push to be thrown into a bigger problem than the one you wanted to avoid.
We can identify a distributed monolith:
When it sometimes take months to upgrade a library across a company, or days on a smaller scale. I think every developer has felt that kind of pain one way or another, in or out of a monolith, here or there. In this scenario, when an upgrade has to happen, a team has to get assigned to up the versions across all the repos and, depending on the numbers of repos and their complexity, this can take from days to months to get that diff safely. This is often followed by a sweaty weekend where the entire company has to be upgraded.
When introducing a new language or major foundational tech stack takes too long relative to the size and complexity of the code base. When we need to reinvent a lot of wheels and reverse engineer how the base platform works just to get going with another language.
When it is the client binary that decides all the details, such as opening sockets, using threads or thread pools, allocating memory, doing caching, or any of those nice things. Those decisions are now all made by some other intrusive binary which comes into the system and, in our runtime, is making decisions that we cannot control. Extend this to however many client libraries and you get yourself a decision nightmare. A change sometimes means having to make decisions with all the consumers of those libraries. The operational complexity is now everyone’s concern instead of the service owner isolating their service implementation.
If these symptoms are occurring, they can be signs of losing some of the coolest benefits of microservice architecture. In many ways, if a system has ended up with any of the above symptoms, it probably should have stayed closer to the monolith anyways, because it is paying all the costs of the distributed systems without some of its clear benefits. Some of those voided benefits are:
Embracing a Polyglot Universe
We don’t mean insanity here. It doesn’t mean every service written in a different language. And it definitely doesn’t mean PHP. That is a whole other extreme that is very unlikely to be of any benefit to anybody. It does make sense that there are core technologies that a broad portion of the company is familiar with. Even if it’s just for debugging at 3am on the eve of Saint Jean Baptiste! We have been there, we have done that. But, and the big “BUT,” is that over time, we might find that there are different services that are served better by different languages or technology stacks, or there are different skill sets that we can hire or acquire using acquisitions that will mean money if they can just integrate with the system following a bunch of protocols and contracts rather than having to learn Spring from scratch.
Organizational and Technical Decoupling
One of the most talked about benefits of microservices is that it enables an organization to grow such that the individual teams and organizations within can evolve technically without coupled collaboration between them. So an individual team adopts a new technology or platform without convincing a central authority. Let’s say we had some use cases that would be better served by the Actor model, but this coupling in a distributed monolith ends up in rewriting the entire platform.
Within a company, the pressures of deliverable dates and organizational pressure will inevitably make you move forward rapidly. It's never an easy thing to push back and say, “I don’t agree with what you’re pushing upon us.” And then we get into cross-organizational disputes over timelines and priorities where everyone is right in their own way. Imagine we are a consumer that depends on ten different services with ten client libraries to use. It’s 3am and everything is crashing and all the complexity of the code and the responsibility of dealing the bugs are now our problem.
Over time, the reasons we make certain technical choices will change, or versions of libraries increase and we want to adopt the newer tech. It’s a good test to see if this can be done without upgrading the entire company at once. Sometimes, because of the temporal coupling, our service cannot use the newest version of awesome library X because some core central platform is transitively holding you to three versions ago.
The Overrated DRYness
I was a firm believer that code duplication is a deadly sin. This is what I had been taught since, hmmm, forever. A good friend once argued that it’s not always actually the right or the best thing to do and referred me to this the talk which points to a chapter in Sam Newman’s book titled “DRY and the Perils of Code Reuse in a Microservice World.” The thing is, DRY is not necessarily the right principle to prioritize in distributed systems and that abstractions are good but those that do not require binary coupling. The evils of too much coupling between services are far worse than that caused by code duplication.
Shared code, in and of itself, is not problematic when you’re using it inside for your implementation. But as soon as it starts to leak across your network boundaries and across your service boundaries, that’s when it starts to become a problem. If some service wants to have a Spring stack that teams can adopt, that’s fine. As long as it’s not preventing me from using also Go or Node.js or whatever in the different places, and that a large framework like Spring is an implementation inside, as with Akka or any other powerful technology. But then I should expose APIs and not expect my entire company to all be Akka actors, because it’s not the right solution everywhere. One of the reasons we bother with microservice architecture is so that we can avoid coupling the producer and consumers, so that when one changes, the others don’t all have to change. And if we favor the DRY principle, we can start to break this very benefit of microservices.
True technology heterogeneity gets lost with binary coupling. Let’s say Go is the new hotness and I want to adopt Go or I want to bring this dude who is really, really awesome and can do good stuff on the Go. In that case, if a team feels that it is beneficial enough for us to embrace Go, we can actually reimplement the necessary libraries against the protocols and contract to do so. For example, for AWS APIs, there are separate teams and the community also who actually generates common libraries that most people end up using. Some of these common tech stacks resemble the platform in a distribute monolith that we mentioned earlier. However these are not formalized as the only way to do something, and that they’re built against protocols and contracts. Here, different stacks can exist. You get the benefits of reusing other people’s work and collaborating together, but you actually still allow the decoupling over time.
As Ben Christensen says in “Don’t Build a Distributed Monolith,” “This happens to be why the internet has been so successful. It’s been able to evolve completely independently. There’s an interesting, though, that happens within a company. Even though we base ourselves off of the internet technologies, because we don’t have that hard decoupling of organizational boundaries and different priorities, we end up often making decisions that end up breaking the very things that made the internet so successful.”
The first thing that happens when things are getting too DRY is a client library, written by the service team, becomes the only official way to access the service and no other way will work. The consuming team is now at the mercy of the service owner. Whatever the service owner chooses to do, whatever code they choose to put in their client, whatever their deployment cycle is, or whenever they need to fix a bug, the consuming team basically has no choice except to accept what they give. Also when we have a single official client it’s then very easy for service logic to start to drift into the client because it’s now my only formal way of talking. Despite being useful in narrow context, it is a limiting factor for polyglot adoption as usually the team which own the service is an expert in one (at best, two) languages.
It always happens that at some point in time we end up reasoning that it is just easier to make a conditional check in the client and here is when all hell breaks loose. All of a sudden, we are starting to actually run a bunch of service code in the client. Worst case it may lead some dev to forget about the distributed nature of the system and start using the library as if it was local methods. The entire system is tightly coupled through these formal client libraries that have already made the decision of what the architecture and language and tech stack has to be. If we ever adopt anything new, we actually have to figure out what to do with the dozens of clients that are all based upon a decision years ago.
Sometimes there can be so much business logic or information about how something should work that there is no way but to use the exact same technology stack just to be able to use the shared library. More important than that is the reduction in the ability to make changes in isolation. So if it happens that some business logic is sitting in a shared library, and we have a bug or we need to put new behavior into it, and that requires someone getting 10 other teams to set a change and then deploy. At this point, it is definitely a distributed monolith we are talking about. Synchronize the deployments of everybody to get change out can get nasty quick.
Apache Commons and Gurava and many other libraries are extremely useful. Trouble starts when they are too intrusive and require us to use a specific framework to run. There are also cases when the shared library might be the only option, for example, if it contains some complex algorithm used across many services. Those cases, however, are rare exceptions rather than rules.
My goal was to present an argument that looking beyond the short-term ease and avoid binary coupling by looking at leveraging contracts, protocols, and automated tooling around our systems which will enable us to enjoy benefits of microservices and service-oriented architectures. We know best how to use shared libraries and send them around so we keep doing it. But just like programming languages which have interfaces and APIs, services can hide all their implementation details and expose data contracts and network protocols. Consumers can change and evolve independently over time as they wish and they have no dependency on the service implementation.
Sharing is always well intentioned and consistency at binary level is really, really tempting. It is often presented that health is implicit in consistency and that it is inherently a good thing like a rule written on a stone. I always thought about this. Maybe because I always felt I'm too inconsistent. I could never really manage to sleep at the same time every night. It always depended on what is interesting enough to keep me awake. I barely ever managed to wear socks that match. So it was always a question for me, people say consistency is good, but why consistency as a word implicitly brings good as a characteristic? Consistency at what level? So let’s say I’m running a restaurant. Do I consistently audit my food and my customers' opinion to produce good food or I have chefs that wear same clothes and have the same fake smile but produce crap food consistently? There is an important distinction. Living together of unlike organisms or performing symbiosis is one of the reasons why we survived as a species besides our intelligence and our viscousness.
Alvin Toffler, in his book called Revolutionary Wealth, writes: “This criterion is based on the assumption that if a fact fits with other facts regarded as true, it too must be true. Detectives, lawyers and courts lean heavily on consistency as the primary test of a witness’s truthfulness. In the world-famous trial of Michael Jackson for child-molestation, millions of TV viewers around the globe were mesmerized for months as each side, prosecution and defense, highlighted discrepancies in the evidence presented by the other. Every bit of evidence was fine-tooth-combed for internal contradictions as though noncontradiction proved truthfulness. In business, too, consistency wins points, even though it is quite possible to be consistently false. When a SWAT team of auditors descends on a firm to perform what is known as “due diligence” in preparation for a merger or acquisition, the first thing it looks for are inconsistencies. Do accounts receivable, reported in the “control ledger,” line up precisely with what the underlying subledgers show? Inconsistencies all raise suspicion that the truth is being massaged. Since the accounting scandals at Enron, WorldCom, Adelphia, Tyco and a host of other high-flying firms, the consistency criterion has been applied with greater consistency.”
There are many things like standardized logging, fault injection, distributed tracing, discovery, routing, etc. that are very hard problems that do need solutions. We can’t just say, “Well screw it. We’re not going to have any standardization whatsoever.”. However binary coupling isn’t necessarily needed to achieve this and can cause great harm in the long run. Standardization can be achieved via protocols and contracts, perhaps with a little bit more effort at the start. There’s a lot of things that we can address. Not all of them, but a lot of them we can address by declaring them in the protocol and in the data contracts of our services and then allowing independent libraries to evolve that handle all the implementation details of that.
We can use auditing rather than binary coupling to ensure standardization. We need something that audits a service before it comes in. It is effectively an integration test for a new service. So if I want to bring a new service in, if I’m going to use some common tech stack that most services use, then it’s probably pretty easy to get through that auditing process because everyone’s done it. But if I’m a team who is trying to do something new and it’s worth taking us a little bit longer to bring on a new tech stack, then there’s an auditing process to actually assert that yes you actually show up properly with distributed tracing, logging, security or anything else necessary.
Delaying of the cost of the decoupling is actually very, very high. When you push it off down the road, to basically untangle this later, it’s incredibly hard and very expensive. Just talking about how to detangle the mess is years, let alone actually making it happen. We should try to only share a contract and avoid binary. With GraphQL we can generate clients from schemas. GRPC might be the way to go if type really matters. Using Swagger is also perfectly fine if we really want rest.