Microservices architecture is overhyped nowadays. There are plenty of decent blog posts, article series, and presentations that explain what it is, weigh its pros and cons, or focus on implementation details. Some engineers advocate for the transition to microservices, while others advise you to stay with your monolith and turn it into a majestic monolith.
This blog post is not about that. Here, I would like to share practical decisions that we would probably have made differently if we had had another chance. There will be a lot of Captain Obvious here, but I hope that at least someone will take our experience into consideration and make their own journey to microservices smoother and more fun.
Here are the lessons that we learned when migrating to microservices.
Start With Just One Service, Not With a Few in Parallel
Imagine that you have decided to start a project from scratch, and somehow a decision was made that it should follow a microservices architecture. So, what will the engineers do? They will create at least a few repositories, organize several different backlogs, and start coding at the same time. That might lead to consequences such as the following:
- These services will have a distinct module structure.
- These services will depend on various JAR versions, use various utility libraries for the same purpose, etc.
- These services will follow different coding standards.
- These services will have different approaches for testing.
- And more.
At the end of the first or maybe second Sprint, those teams will realize that they don't have a collective agreement in place and will start arguing about all the items listed above. First PRs might block the whole process for a while.
Instead of starting with multiple services at a time, spend the first couple of Sprints developing functionality for multiple domains in the same service, in the same repository, but in different packages. All team members will be involved in code review and will align on a common vision of package structure, coding standards, testing, libraries used, and approaches for standard functionality (data mapping, validation, CRUD operations, etc.). Then, a split into two services will look like a natural step that divides two bounded contexts into separate services. And at that stage, the separate teams will share the same vision of what they are building and how.
Start With Services That Bring Business Value
That might sound obvious, but it is not. People might get carried away with infrastructure and start thinking about OAuth, Gateway, Service Discovery, Config Server, asynchronous communication between services, etc., from day one. There is nothing wrong with having that in mind, but starting the implementation of a microservices architecture with infrastructural services might be a step in the wrong direction.
Let me provide specific examples. Imagine you have begun with OAuth. To make it work, you at least need a user- and role-management service. If you don't have one in place, you will start creating shortcuts and workarounds, or you will be blocked by another team that is developing it at a different pace. Similarly, if you don't have at least two services, there is no sense in creating a unified Gateway, implementing Service Discovery, etc. Use common sense and start with the services that help you achieve your MVP with less effort. Focus on business first, not on fancy technical stuff.
Think About Orchestration and Eventual Data Consistency
Life is easier when you are inside of one monolithic application — for example, no distributed transactions, no two-phase commits, etc. Life may get more interesting in the microservices world when one logical business transaction from the UI perspective has to result in multiple calls to different services that should somehow be orchestrated and should store data in an eventually consistent state.
A typical scenario: when a payment is complete, the corresponding order should be marked as confirmed; otherwise, it should be marked as not verified. In this simple example, we have two different services — a payment service and an order service — that operate on two separate storages. So, after a payment is created in the payment service's DB, the corresponding order in the order service's DB should be updated, too. This can be achieved in various ways, but let me describe one of the simplest in detail:
- The order service makes an async HTTP call to the payment service to start the payment.
- The order is stored with a not-paid status.
- The payment service completes the payment and notifies the order service by making an async HTTP call back; the order service updates the order status with the result.
- The UI might reload the order from the database to show the correct status to the end user.
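The four steps above can be sketched in plain Java. This is only an in-memory illustration: the class names and statuses are hypothetical, and a direct callback stands in for the async HTTP calls:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch of the order/payment flow described above. Service and status
// names are made up; an in-memory callback replaces the real async HTTP.
public class OrderPaymentFlow {
    enum OrderStatus { NOT_PAID, CONFIRMED, NOT_VERIFIED }

    static class OrderService {
        final Map<String, OrderStatus> orders = new HashMap<>();

        void placeOrder(String orderId, PaymentService payments) {
            orders.put(orderId, OrderStatus.NOT_PAID);              // step 2: order starts as not-paid
            payments.startPayment(orderId, this::onPaymentResult);  // step 1: call to the payment service
        }

        // Step 3 (receiving side): the payment service calls back with the result.
        void onPaymentResult(String orderId, boolean success) {
            orders.put(orderId, success ? OrderStatus.CONFIRMED : OrderStatus.NOT_VERIFIED);
        }
    }

    static class PaymentService {
        void startPayment(String orderId, BiConsumer<String, Boolean> callback) {
            boolean paymentSucceeded = true;             // pretend the payment completed
            callback.accept(orderId, paymentSucceeded);  // step 3: notify the order service
        }
    }

    public static OrderStatus runFlow(String orderId) {
        OrderService orders = new OrderService();
        orders.placeOrder(orderId, new PaymentService());
        return orders.orders.get(orderId);               // step 4: UI reloads the current status
    }
}
```

In a real system, both calls would be network hops that can fail independently, which is exactly where the questions below come from.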
Even for this very simple scenario, the logic becomes quite complicated:
- Should the order service or the payment service initiate the communication?
- Should this communication be synchronous or asynchronous?
- What if the payment service is down when the request is made?
- What if the order service is down when the payment service sends the status back?
In reality, such flows involve calls between multiple services.
Eventual Data Consistency
As microservices architecture follows the AP scenario from the CAP theorem, data in the system will be eventually consistent. Thus, it would be great to store data in a way that will:
- Guarantee recovery to a correct state if one or more of the services go down.
- Guarantee recovery to a proper state if the event bus between services goes down.
- Provide visible auditing and a history of changes.
For instance, event sourcing is a great fit for microservices architecture. Have a look at a presentation by Chris Richardson and Kenny Bastani.
The strategy might be the following:
- Persist events, not the current state.
- Identify state-changing domain events.
- Replay events to recreate state if needed.
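This strategy can be sketched in a few lines. The event names and the derived status are made up for the example, and an in-memory list stands in for a real event store:

```java
import java.util.ArrayList;
import java.util.List;

// Event-sourcing sketch: persist events, not the current state, and replay
// them to recreate the state on demand. Event and status names are illustrative.
public class OrderEventStore {
    enum EventType { ORDER_CREATED, PAYMENT_COMPLETED, PAYMENT_FAILED }

    private final List<EventType> events = new ArrayList<>();

    // Persist the event itself; we never update a "status" column in place.
    public void append(EventType event) {
        events.add(event);
    }

    // Replay all events to recreate the current status.
    public String replayStatus() {
        String status = "UNKNOWN";
        for (EventType e : events) {
            switch (e) {
                case ORDER_CREATED:     status = "NOT_PAID";     break;
                case PAYMENT_COMPLETED: status = "CONFIRMED";    break;
                case PAYMENT_FAILED:    status = "NOT_VERIFIED"; break;
            }
        }
        return status;
    }
}
```

Because the event log is the source of truth, the same replay also gives you the auditing and history-of-changes benefits mentioned above for free.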
The problem is that the decision whether to use event sourcing has to be made in advance. A late decision to move to an event-driven architecture might require a service rewrite and an additional data migration.
Think About Data Migration
Are you migrating an existing monolithic solution to microservices?
If so, then there are only a few data migration options that you should be considering.
Let's review each of them in more detail.
The first option: a new solution starts with new domain models, new storages, and no data. To be honest, this is a relatively rare case in reality. If you have one, you are pretty lucky.
The second option: a new solution starts with new domain models and storages, and all data are migrated from the monolithic solution before release. Afterward, no data will be inserted via the old system, and the new one will be the single source of truth. This is the preferable scenario, but usually still not the most common.
If you are following this path, then you are building a big-bang migration tool that can execute during a downtime window. At first glance, it might sound trivial, so engineers tend to neglect its importance and postpone it to a later stage when all services and infrastructure are in place. But in reality, you need to migrate all data from one big monolithic storage (usually an RDBMS) to a set of independent storages (SQL, NoSQL, or even binary files) that are managed separately.
That might lead to technical complexities such as:
- The domain models might differ significantly, which means it is no longer simple field-to-field mapping. As a result, the migration tool has its own business logic that should be specified, verified, and tested.
- Validation rules might vary significantly, which leads to specific corner cases and shortcuts that should be discussed one by one with product managers.
- Statements that used to form a single transaction in the old system are now fully distributed, so you need to think about how to maintain data consistency between storages.
- Execution time matters. As the downtime window is usually short, you will spend a lot of time getting from the initial version of the migration tool (which satisfies correctness) to a version that is also performant.
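As an illustration of the first complexity above, here is a hypothetical sketch of one legacy row being split between two new storages, with a legacy status code translated by a business rule. All column, status, and class names are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Why big-bang migration is rarely field-to-field: a single legacy row feeds
// two new storages, and a legacy code must be translated by a business rule.
// Column names and status codes are hypothetical.
public class LegacyOrderMigration {
    // Business rule: translate the monolith's numeric state code
    // into the new order service's status.
    public static String mapStatus(String legacyStateCode) {
        if ("1".equals(legacyStateCode)) return "NOT_PAID";
        if ("2".equals(legacyStateCode)) return "CONFIRMED";
        // Corner case to discuss with product managers: unknown legacy codes.
        return "NOT_VERIFIED";
    }

    // The part of the legacy row destined for the order service's storage...
    public static Map<String, String> toOrderRecord(Map<String, String> legacyRow) {
        Map<String, String> order = new HashMap<>();
        order.put("orderId", legacyRow.get("ORDER_ID"));
        order.put("status", mapStatus(legacyRow.get("STATE")));
        return order;
    }

    // ...and the part destined for the payment service's storage.
    public static Map<String, String> toPaymentRecord(Map<String, String> legacyRow) {
        Map<String, String> payment = new HashMap<>();
        payment.put("orderId", legacyRow.get("ORDER_ID"));
        payment.put("amount", legacyRow.get("TOTAL_AMOUNT"));
        return payment;
    }
}
```

Each such rule is exactly the kind of migration-tool business logic that needs to be specified, verified, and tested.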
The third option: the two systems co-exist. Two scenarios are possible here:
- The two systems keep their data fully separate, so new functionality (no longer supported in the old solution) uses only the new domain model.
- The two systems keep their data separate but synchronized to some extent (which is obviously more complex). This scenario is quite common since the old solution may have a dependent process (such as reporting) that works with the old model only, so data should eventually reside in the old data storage somehow.
The first scenario is just a smaller version of a big bang migration tool. So, all statements mentioned above apply here, too.
In the case of the second scenario, things can get even more complex. Imagine that all changes made via a new service (and data inserted using it) should be migrated in near real time to the old database used by the monolithic application. That leads to even more complex implications:
- Some service, let's call it a data loader, has to be implemented as a mediator between the two solutions. The data loader might interact with the monolithic database directly or use an API exposed by the monolithic application (which is, of course, preferable, as we can reuse validation rules, transaction support, etc.).
- Communication between new services and the data loader should be asynchronous — for instance, via events — which leads us to infrastructure changes, too, as we need to add a persistent pub/sub mechanism or other alternatives like Apache Kafka.
- The data loader should handle concurrent execution to guarantee that changes made in a particular order by microservices are propagated to the old system in the same order.
- One transaction in a microservice might be transformed into multiple calls from the data loader to an old system. The question about eventual data consistency is being raised one more time in this case.
- The data loader should deal with all corner cases and alert if anything is wrong while performing data migration.
- The data loader should be fault tolerant, i.e., able to resume processing events after a crash.
- Comprehensive logging, auditing, and monitoring should be implemented to guarantee data consistency and no data loss between the two systems.
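The ordering and fault-tolerance points above can be sketched as a checkpointing loop. This is only an in-memory illustration with hypothetical names; a real data loader would persist the offset and the calls to the old system:

```java
import java.util.ArrayList;
import java.util.List;

// Fault-tolerance sketch for the data loader: record the offset of the last
// event applied to the old system, so that after a crash (or a redelivery)
// events are neither replayed nor skipped. All names are hypothetical.
public class DataLoader {
    private long lastAppliedOffset = -1;                // persisted in a real implementation
    private final List<String> appliedToLegacy = new ArrayList<>();

    // Apply events strictly in offset order, exactly once.
    public void process(long offset, String event) {
        if (offset <= lastAppliedOffset) {
            return;                                     // already applied before the crash/redelivery
        }
        appliedToLegacy.add(event);                     // stand-in for the call to the old system
        lastAppliedOffset = offset;                     // checkpoint only after a successful apply
    }

    public List<String> applied() { return appliedToLegacy; }

    public long checkpoint() { return lastAppliedOffset; }
}
```

Note that checkpointing after the apply makes the loader at-least-once by itself; the offset check is what turns redeliveries into no-ops.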
As you can see, data migration is quite a complicated part of your system. Thus, it would be great to agree at least on the option that you are choosing as soon as possible (knowing all technical implications) and to start implementing a data migration tool or a data loader service when you have a few business services already in place, but not when you are close to a release.
Invest in DevOps
This item is critical in any case. But in a microservices world, it becomes even more so. It is crucial for the team to be able to easily verify new changes to a particular microservice along with the others. So, the CI pipeline might consist of the following steps:
- Automatic build after each commit.
- Test execution.
- Automatic setup of an environment (latest version of your microservice against a specific set of dependent services).
- Execution of automated tests.
- Automatic environment shutdown.
Invest in infrastructure-as-code approaches, using, for instance, Terraform to fully automate environment setup/shutdown, and provisioning tools like Ansible. Make sure that each engineer has a sufficient ecosystem (for instance, via Jenkins) to deliver new, robust functionality without tedious manual work.
Communication is hard. Cross-team or cross-location communication is even harder.
Build microservices around the teams that know how to collaborate. If you have concerns about any possible technological issues, please make sure that you don't have any communication issues in advance. Talk to people, listen to them, lead by example, and invest in effective communication and collaboration as much as you can. It is priceless.
At the end of this blog post, I'd like to highlight some other smaller lessons that I hope might be useful, too.
Make Things Simple
Microservices architecture is hard by default. It adds extra complexity to everything: integration, transactions, testing, deployment, debugging, tracing, etc.
So, please try to make things simple whenever possible. Don't overcomplicate.
If there is an option to choose an easier path for data migration, choose it.
If, for any reason, synchronous communication between services is acceptable for you, then use it.
If event sourcing is not worth the investment, then go with a traditional, error-prone, update-based data model. There is nothing wrong with that decision if it is made deliberately. You might add audit logging via separate storage later.
If you don't need HATEOAS and can live with level-2 services, then that is okay, too.
If authentication/authorization is checked at the Gateway, there is no need to secure calls between intranet services. Use a JWT to pass minimal user identity between services.
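As a sketch of that idea, here is a minimal HS256 JWT built with the JDK alone. In practice you would use a proper JWT library; the claim names and the shared secret here are purely illustrative:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of passing minimal user identity between intranet services as a
// signed JWT (HS256), using only the JDK. Claim names and the shared secret
// are illustrative; use a real JWT library in production.
public class ServiceJwt {
    public static String create(String userId, String role, String secret) {
        try {
            Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
            String header = enc.encodeToString(
                "{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
            // Minimum identity: who the user is and what they may do.
            String payload = enc.encodeToString(
                ("{\"sub\":\"" + userId + "\",\"role\":\"" + role + "\"}")
                    .getBytes(StandardCharsets.UTF_8));
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            String signature = enc.encodeToString(
                mac.doFinal((header + "." + payload).getBytes(StandardCharsets.UTF_8)));
            return header + "." + payload + "." + signature;
        } catch (Exception e) {
            throw new RuntimeException("Failed to sign JWT", e);
        }
    }
}
```

Downstream services only need the shared secret to verify the signature and trust the identity, without calling back to the Gateway.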
There is a great presentation by Aviran Mordo about the Wix journey to microservices. I highly recommend reviewing it. The main recommendation? Don't forget about the YAGNI principle.
But Don't Make Shortcuts
Microservices architecture has its standards. At first, like everything in life, they look complex. But they seem complicated only because they are not yet familiar to us.
Learn from pain, inconvenience, and the experience of others, but don't make shortcuts.
If you need data from another service's database, create an appropriate API and call it rather than accessing the DB directly.
If a service is responsible for more than one context, create another one — don't violate the single responsibility principle.
If integration between services becomes a bottleneck, then follow the extreme programming maxim: if it hurts, do it more often.
I wish you all a great and interesting journey from monolith to microservices architecture.