Demystifying the Transition to Microservices
We cover the initial analysis and preparation for a microservices transition, along with tips and best practices for splitting monolithic applications into microservices.
A lot has been written about microservices: when to use them, what advantages they provide, and how fantastic life is after adopting them. You may want to consider this architectural approach when your backend is going beyond trivial and you expect the features you provide to grow, change quickly, and become mission-critical if they are not already.
However, there are still high chances that you end up working in an organization or project where monolithic architecture is still dominant, for a variety of reasons:
- This is how most backends are built, and inertia keeps them there.
- Transition to a different architecture is seen as a major change with high risk and cost, and with uncertain results.
- Lack of skills in the team to undertake a change of this magnitude.
- Fear of development slow-down or even paralysis during the process.
- Inability to design and implement a convincing change process.
We will go through the initial analysis and preparation for a microservices transition, provide tips and best practices related to the process of splitting your monolithic applications into microservices, and comment about the adoption of some technologies and processes that will help future-proof your solution in a world of constant changes.
The learnings and opinions written here are inspired by practical experience in large CMS-based web applications/mobile app backends developed and maintained by DMI within the Mobile App Development practice. Most of the ideas could also be applicable to other kinds of applications such as e-commerce solutions, finance, etc.
The Initial Picture
Typically, the initial architecture you encounter is a monolithic one, or perhaps a small collection of different monolithic applications in a service-oriented architecture. Perhaps these monoliths already communicate with each other using APIs.
Assuming you already made the decision to transition to a microservices architecture and you know why (otherwise there is plenty of literature about the topic), you can probably envision the final picture where you would want to go: A large collection of loosely coupled small pieces (microservices), running independently, communicating among them with APIs or messages, encapsulating some domain logic and data, etc. There is no trace of the original monolith(s) since all the functionality has been implemented in a smarter way in small microservices.
I am sorry to tell you that this picture is wrong. It is the picture you would draw if you started the implementation from scratch, and although it can be helpful as guidance, you should not get obsessed with it, since reaching it is probably not cost-effective. If you are dealing with a large existing monolithic application developed over the years, the monolith will typically contain lots and lots of functionality which, although necessary for the business, is probably not worth the effort of replicating in the form of microservices.
If you could stop the business for one or two years and reimplement the whole thing using microservices, then it would make sense to get rid of the monolith completely. But in real life the business cannot stop, and the business is normally about implementing new features on top of the existing ones. Only some of the monolith's functionality will be worth extracting into microservices, generally:
- Core components that are needed by other parts of the application (such as authentication, authorization, etc.).
- Components that are in active development and evolution (depending on your business logic).
So, you need to embrace the idea that the monolith is probably not going to disappear any time soon. Your application will be running under a hybrid architecture of monolith + microservices for some (potentially long) time, and you will be deciding what needs to be extracted to a microservice and what does not, depending on where the business is going over the next months or years.
Your focus should not be on the ideal final picture, but on the transition process itself. This is a challenging enough endeavor: evolving the architecture without affecting the business, continuing to release new features regularly, and minimizing or completely avoiding any kind of downtime.
Where to Start
Embrace Containers First
The very first step you should take is to embrace container technology. The biggest difference between a service-oriented architecture and a microservice-oriented architecture is that in the latter, the deployment is so complex, there are so many pieces with independent lifecycles, and each piece needs so much custom configuration, that it can no longer be managed manually.
In a service-oriented architecture, with a handful of monolithic applications, the infrastructure team can still treat each of them as a separate application and manage them individually in terms of the release process, monitoring, health check, configuration, etc.
With microservices, this is not possible at a reasonable cost. There will eventually be hundreds of different 'applications,' each with its own release cycle, health check, configuration, etc., so their lifecycle has to be managed automatically. There may be other technologies to achieve this, but microservices have become almost a synonym of containers. Not just manually started Docker containers: you will also need an orchestrator. Kubernetes and Docker Swarm are the most popular ones.
My advice is to adopt containers and choose an orchestrator as soon as possible, before you touch a single line of code related to microservices implementation. Your first milestone should be to release each of your monolithic applications inside a container and run all these containers in a cluster, instead of having dedicated virtual machines for every application, which is the typical way monolithic applications are run.
Notice this is applicable to monolithic applications for which your organization owns the source code and there is active development going on, but it can also be applied to third-party applications that are part of your application stack, as long as you have the ability to configure them.
How to Embrace Containers
Some of the ideas I will express next are based mostly on my experience with Docker Swarm and Kubernetes, but they could be applicable to other orchestrators.
In order to embrace containers, before you even think about microservices, you need to do two things:
- Wrap your existing monolithic applications inside containers.
- Create a description of your stack of services in the language of your orchestrator.
Let’s see these steps in more detail.
Wrapping Your Monolithic Application Inside Containers
This will require you to change your build process so that the output is a container, for example, a Docker container. Before doing that, you might have been packaging your application in the form of a WAR file (for example, for Java web applications), or in a ZIP file that contains a Node.js application inside, or whatever other format. After the build, these output artifacts are published and stored in some repository (for example, Nexus) and they are deployed to the servers where the application runs using whatever deployment tools you are using. For example, a WAR file would be copied to some server with Tomcat or Jetty installed in order to complete the release process, or the Node.js application in a ZIP file would be expanded into some folder in a server that already has the scripts and the necessary software to run Node.js applications.
You need to replace all these manual steps after your build with automated Docker container creation and publishing. By adding a Dockerfile to your application repository and changing the build scripts, you should be able to publish a container that runs your application inside. If your application is a WAR that runs in Tomcat, you will need to install Tomcat in your container (using the Dockerfile). If it requires Node.js in a specific version plus some dependencies, your Dockerfile must perform the steps to install whatever is needed. These automated steps substitute the previous deployment activities.
Fortunately, there is a myriad of ready-to-use base images you can use for your containers. In my experience, I have mostly chosen Alpine Linux-based images, for example, an OpenJDK on Alpine for running Java applications, or Node on Alpine for Node.js. Should you not find the base image that suits your needs, it is generally easy to take a blank Linux base image and, using your Dockerfile, install whatever packages and add whatever files you need on top.
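As a sketch of what this looks like for a Java monolith packaged as a WAR (the base image tag and the `target/monolith.war` path below are assumptions, not a prescription):

```dockerfile
# Run the monolith WAR inside an official Tomcat base image
# (pick the exact tag that matches your JDK needs).
FROM tomcat:9.0-jdk11

# Remove the default webapps and deploy our application as ROOT
RUN rm -rf /usr/local/tomcat/webapps/*
COPY target/monolith.war /usr/local/tomcat/webapps/ROOT.war

EXPOSE 8080
CMD ["catalina.sh", "run"]
```

The build then ends with `docker build` and `docker push` to your registry, instead of copying the WAR to a server.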
Notice that the artifact you will publish (the Docker image) should have a unique version number, and your build process should take care of incrementing it automatically. Alternatively, you can use a commit id or any other unique tag. A traditional numeric version has the advantage that you can easily compare and see which version is greater than another. In my experience, I have used Nexus to host published artifacts, as it supports Docker image repositories as well as many other popular formats for other kinds of artifacts (such as libraries).
Describing Your Deployment in The Language of Your Orchestrator
This is the next logical step. In Docker Swarm, you define your stack of containers in a docker-compose.yml file. In Kubernetes, you also use YAML files defining pods, services, HPAs, etc. You will need to create those, listing your services and their configurations:
- Services, pods: Your published container images have a unique name and version, you need to use these.
- Configurations: We are talking here mostly about:
- Environment variables. Rely on these heavily to configure your containers. For example, when your architecture was monolithic, you might have had an application that was a CMS offering an API and another application that was a mobile backend consuming that API. Each one's URL was known and configured into the other. Now, if both the CMS and the mobile backend are going to run inside containers and they still need to call each other, you may need to pass an environment variable with the name of the service inside the Docker cluster, so that they reach each other over the internal Docker network. Docker Swarm’s DNS resolves these service names to the right IP address.
- Mounted volumes. Your applications may need access to the filesystem, which was not a problem when they were running on their dedicated hardware. With containers, the filesystem is volatile and disappears when the container is stopped. If your application only needs temporary files this is not a problem, but if it generates files that need to be kept, you need something like mounted volumes, which map to the host’s filesystem and can in turn map to cloud storage for distributed, highly available filesystem support.
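Continuing the CMS / mobile backend example, a minimal docker-compose.yml sketch; the image names, versions, variable names, and paths are hypothetical:

```yaml
version: "3.7"
services:
  cms:
    image: registry.example.com/cms:1.4.2
    environment:
      # The mobile backend is reachable by its service name on the internal network
      MOBILE_BACKEND_URL: http://mobile-backend:8080
    volumes:
      - cms-assets:/var/www/assets   # files that must survive container restarts
  mobile-backend:
    image: registry.example.com/mobile-backend:2.0.1
    environment:
      CMS_API_URL: http://cms:8080/api
volumes:
  cms-assets:
```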
Put Your Deployment Definition in A Version Control System
Your deployment definition needs to change:
- Every time you adjust environment variables that your containers read.
- Every time you release new versions of your containers.
- Every time you adjust other configurations, such as mount volumes, deployment parameters, etc.
You cannot afford to maintain all this manually. A VCS provides convenient storage for this kind of information, along with auditing, history logs, rollback, etc.
In my particular experience, a dedicated repository just for the deployment definition files is highly advisable, and I call it a 'deployments repository.' I explicitly say files in the plural, because if you are using different environments (such as test, staging, and production), every environment may have different deployment files. Differences come from:
- The difference in the lifecycle of the different containers. At any point in time, the versions in the test environments are usually higher than the versions in the production environment. If there is active development, the team is always working on future new versions.
- Different configurations. Some options like memory sizes, number of replicas per service, etc. may be different depending on the environment.
- Different environment variables.
Although I would recommend keeping the differences between environments to a minimum, these are often inevitable.
Together with the deployment files, you may need scripts and configuration files to automate several processes, such as:
- Promoting versions of containers between environments.
- Adjusting environment-specific options.
This methodology is also known as GitOps: using Git to store the desired state of your environments, and using scripts and tools to automate the reconciliation between the defined state and the runtime state. This strategy gives a big advantage to your development and operations teams, both in normal day-to-day release activities and in extraordinary circumstances (for example, if you need to recover from a catastrophic situation and regenerate your whole environment).
Do Continuous Delivery
Once you have a VCS repository containing the descriptor files for your test/staging/production environments, doing continuous delivery is my recommended next step.
The idea is that if this repository contains descriptor files of what is running in every environment of your application, any change to any of these files that is pushed to the repository should trigger the release process in that environment, so that the running state in the servers matches the definition.
There are many ways to do that, depending on your technologies of choice. For example, in DMI we have successfully implemented the following:
- A Jenkins pipeline watches for changes in the deployments Git repository.
- The deployments Git repository contains not only the YML definition of every environment but also a history folder with a copy of all the YMLs that have ever been deployed per environment. Alternatively, you can use Git history to compare different deployments, whatever makes more sense in your context.
- If any new commit is made, the Jenkins pipeline is triggered and compares, for every environment, the defined YML with the last release made (from the history folder).
- If there is any difference, it triggers the release process for this environment:
- Transferring the YML to the environment (by SSH/SCP) and executing Docker Swarm / Kubernetes commands to update the stack definition to the new YML(s).
- Copying these YMLs to the history folder and adding them to the Git repository, so this release now becomes the latest.
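The compare-and-trigger step above can be sketched in a few lines of shell. This is a self-contained demonstration with a throwaway repository layout; the folder names, file names, and the release command are assumptions, not DMI's exact pipeline:

```shell
#!/bin/sh
# Compare each environment's desired stack file with the last released copy
# (kept in a history folder) and trigger a release only when they differ.
set -eu

# Simulate a deployments repository with one environment
repo=$(mktemp -d)
mkdir -p "$repo/environments/staging" "$repo/history/staging"
echo "image: backend:2.0.1" > "$repo/environments/staging/stack.yml"
echo "image: backend:2.0.0" > "$repo/history/staging/stack.yml"

release() {
  # Placeholder for the real steps: transfer the YML to the cluster,
  # run 'docker stack deploy' or 'kubectl apply', then archive the YML.
  echo "releasing $1"
  cp "$repo/environments/$1/stack.yml" "$repo/history/$1/stack.yml"
}

for env in staging; do
  if diff -q "$repo/environments/$env/stack.yml" \
             "$repo/history/$env/stack.yml" >/dev/null 2>&1; then
    echo "$env is up to date"
  else
    release "$env"
  fi
done
```

After the run, the history copy matches the desired state, so a second run would report the environment as up to date.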
Integrate or Build Tools to Visualize and Manage the Release Process
Your infrastructure team can supervise the status of your releases and deployments, by simply observing the contents of the deployments repository. The files stored there include vital information such as:
- When was the last release to an environment done?
- What was the content of the release (difference between this release and the latest) in terms of versions of components and their configuration?
- What is the difference between the two environments? That is, for example, what would be the content of a release to the production environment if it was made now with the version that is running in the staging environment?
In addition to whatever can be done by someone skilled with file comparison tools, visual tools can be integrated or developed to automate these tasks and make them visible to other members of your team. In particular, I have found it very useful to implement a simple web application offering project managers, product owners, QA teams, developers, etc., some of the following:
- Release process tools. Promote changes (versions of containers) between environments, compare environments, etc.
- Automated release notes generation. These are generated by comparing the versions of the released components and aggregating the release notes of each component.
- Automatic history of releases and forecast of release contents.
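The input to automated release notes, the list of components whose versions differ between two environments, can be derived with a simple script. This is a sketch, not the actual tooling; the "name version" file format and the component names are assumptions:

```shell
#!/bin/sh
# List the components that would change in a release from one
# environment to another, with their old and new versions.
set -eu

# Simulate the version manifests of two environments
dir=$(mktemp -d)
printf 'auth-service 2.1.0\ncontent-service 1.4.0\n' > "$dir/staging"
printf 'auth-service 2.0.3\ncontent-service 1.4.0\n' > "$dir/production"

release_contents() {  # $1 = target env manifest, $2 = source env manifest
  while read -r name ver; do
    old=$(awk -v n="$name" '$1==n {print $2}' "$1")
    # Print only components whose version differs (or is new) in the source
    [ "$ver" = "$old" ] || echo "$name: ${old:-new} -> $ver"
  done < "$2"
}

release_contents "$dir/production" "$dir/staging"
# prints: auth-service: 2.0.3 -> 2.1.0
```

The release notes of each changed component can then be aggregated from its repository or ticket tracker.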
This internal application can run in the same stack as the main application does, and be available in some back-office URL. It can integrate with your ticket tracking system (in DMI we use JIRA) and even display ticket information in release notes and release management screens.
Of course, this application gives access to some internal infrastructure capabilities and has to be strongly secured. For example, in addition to requiring credentials for access, the URL does not need to be exposed to the public internet. Also, since the public-facing web would be running in a different container than this application, there is some intrinsic security due to the architectural choice. An attacker who hypothetically compromised the web application would not necessarily gain access to this deployment app, which might otherwise allow them to deploy malicious code or interact with the release process.
If this application is hosted in your Docker stack together with your main application, it can also integrate with the Docker/Kubernetes API of the running instance and use it to provide information about the running stack. This is vital information for operations/infrastructure teams. For example, if there is a difference in the running stack compared with what is defined in the deployments repository, either the release is still in progress (it takes time to apply changes in the stack), or something failed (for example, some version of some component failed to start and the previous version is running). The team needs to know that in order to do something about it.
Create Your Microservices
The previous steps, 'do continuous delivery' and 'integrate or build tools to visualize and manage the release process,' probably make a lot more sense once your deployment starts getting more complex.
Until now, we have just been wrapping our existing monolithic applications in containers in preparation for a gradual transition to a microservices architecture, and if you have only one or two of these applications, you probably do not see the point of investing in such sophisticated release automation.
These necessities become more obvious once you start developing microservices. You will start with the first microservice but you need to be prepared to accommodate potentially hundreds.
There are several best practices that have helped me in this process and that I would advise.
Design Your Transition Process
You will probably be extracting pieces of logic from your monolithic application(s) into microservices, but also creating new functionality in microservices directly. You need to define a process for both scenarios and anticipate your needs before you jump into execution.
For example, for extracting functionality from the monolith into a microservice, you may need steps like:
- Isolate the domain. For example, you may want to extract the authentication system and define the boundaries of what operations and data belong to this domain.
- Identify the data model and isolate it.
- Implement a new microservice offering the desired logic.
- Based on the monolith code initially.
- Could be refactored/improved later.
- Add the microservice to the stack and deploy it.
- Replace the implementation in the monolith with calls to the microservice APIs.
- Migrate the data (if any) from the legacy system into the microservice database. For obvious reasons, this step must be released at the same time as the previous one.
Design Your Microservices
Even though the literature sells microservices architecture on the possibility of combining, in the same application, pieces written in totally different technologies, for your own convenience you may want to reduce the variety of technologies as much as possible. That means you should choose your reference technologies upfront, based mostly on what your team is capable of working with.
Choose Your Technologies
You definitely cannot do everything in one technology alone, but also you do not want to bring in an arbitrarily large variety of technologies just because with containers it is possible.
For example, you may need to choose a combination of technologies like:
- Data storage:
- Relational. MySQL.
- NoSQL. MongoDB.
- Unstructured. Azure storage.
- Business logic implementation (microservices):
- Java with Spring Boot framework.
- HAProxy for basic balancing and HTTP rules.
- Node.js for more complex transformations.
Of course, these technologies should be the ones you need for your application, but the point is that there is no reason to develop the business logic (microservices) in Node.js, Java, PHP, Scala, and Groovy just for the sake of doing it and exploring the technology. You will later need to maintain them all, apply security policies to them all, monitor them all, etc. The fewer technologies you have, the easier it will be to do all this.
Embrace a Contract-first Approach
In a microservices architecture, API contracts are first-class citizens. Microservices communicate with each other exclusively through their public APIs, so you have to allocate the processes and technologies to make the maintenance and evolution of API contracts a natural part of your development process.
A series of best practices I would recommend:
- Treat your API contracts as deliverable artifacts. For example, if you are using Java in the backend, you can design your REST API contracts as Java interfaces with Feign and Spring annotations and package them as a JAR artifact with a proper versioning and release process. Consumers of the APIs can use this JAR as a client library, and the service implementation (for example, in Spring Boot) can expose the API by implementing the same interface. If you change something in the contract, this will affect both the server and the clients, and you will probably be alerted at compile time. Of course, you still need to consider the HTTP compatibility of your contract changes, but this practice facilitates it.
- Document your contracts from the beginning, using Swagger or similar.
- Deploy your service implementation of the contract ASAP in the test environment(s), even if the implementation is not done yet. The contract, which is a first-class citizen, needs to be scrutinized by the involved parties (frontend developers, backend developers, QA engineers, etc.).
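For the documentation step, a minimal OpenAPI (Swagger) sketch of such a contract; the path, parameters, and response shape are purely illustrative:

```yaml
openapi: 3.0.3
info:
  title: Auth Service API
  version: 1.2.0
paths:
  /sessions/{token}/permissions:
    get:
      summary: List the permissions associated with a session token
      parameters:
        - name: token
          in: path
          required: true
          schema:
            type: string
      responses:
        "200":
          description: Permissions for the session
          content:
            application/json:
              schema:
                type: array
                items:
                  type: string
```

Publishing this document early lets frontend developers, backend developers, and QA engineers review the contract before the implementation exists.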
Choose Your Communication Stack
Generally, you will need two types of communication techniques in your microservices application:
- Synchronous. I would recommend REST (HTTP) as your synchronous communication protocol, especially in CMS-based or mobile backends, which adapt very naturally to HTTP verbs and greatly benefit from features like caching and authorization that are part of the protocol.
- Asynchronous. Choose a messaging system for your asynchronous communication and use it extensively whenever the business logic permits it.
Support Local Development
Developers need to run the software they are maintaining on their local machines. They need to be able to run it, debug it, profile it, modify it, and repeat it all over again until it works as expected.
Unlike traditional monolithic software, a microservices backend most likely cannot be run on developer machines. A single microservice can be started locally, but the whole stack may end up having hundreds or thousands of containers, and it is not practical or even feasible to start it on a developer computer due to hardware limitations (insufficient memory, for example).
Your solution must be prepared for this eventuality, since developers need to run the software on their local computers before making it available to anyone else in a shared test environment. You need to solve all these problems, and perhaps more, depending on your technology choices:
- Starting microservices outside containers needs to be supported since processes like debugging or profiling become more complex inside a container.
- But running them inside a Docker container may also be needed for a more realistic scenario. Some bugs may only happen when the software runs under the exact conditions that exist inside the container in the production environment.
- The URLs of other services that a microservice needs should be configurable to point to a reference environment (for example, the test environment). This means that when the service is run locally by a developer (inside a container or not) and it needs to call APIs of other services, the URLs point to a public URL of a shared environment. However, when this service runs in the shared environment as part of the stack, these URLs are configured to point to the internal names of the services inside the Docker / Kubernetes network. You must find a solution to this problem for your technology of choice that does not involve lots of tedious manual work every time a new microservice is implemented. For example, Spring Cloud Config can be used in Java + Spring-based applications to customize the URLs of the different endpoints.
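One low-tech way to achieve this, regardless of framework, is to read each URL from an environment variable whose default points to the shared test environment. The service, variable, and host names below are hypothetical:

```yaml
# In the shared stack, the deployment overrides the variable with the
# internal service name on the Docker / Kubernetes network:
services:
  content-service:
    image: registry.example.com/content-service:3.1.0
    environment:
      AUTH_API_URL: http://auth-service:8080

# When a developer runs the service locally and the variable is unset,
# the application falls back to the public test environment, e.g.:
#   AUTH_API_URL=https://test.example.com/auth
```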
Create a Hierarchy of Containers
A consequence of choosing specific technologies for each purpose in your microservices stack is that you will end up with families of containers with similar purposes. For example, if all your business-logic microservices are implemented in Java with Spring boot and your gateways use HAProxy or Node.js, you clearly have these two or three families of containers which should share a common base image.
Create your own hierarchy of containers in order to facilitate the creation and maintenance of new containers. Docker supports this naturally via the 'FROM' directive in the Dockerfile. For example, if you choose Alpine Linux-based containers, you may need a parent image with the version of Alpine of your choice, a child image with the version of Java you need for running your Spring Boot microservices, etc.
Your hierarchy may end up looking like this: the base Alpine image from the Docker Hub at the root, and a series of images that extend it and add the specific scripts, applications, and configuration files needed by each family, for example, a base image for all your Spring Boot services (including startup and stop scripts, sophisticated health check scripts, etc.), a similar base image for your proxies and gateways, and so on.
Eventually, you may also want to support a transition of technologies. For example, you may start implementing your microservices using Java 8 and then gradually migrate to Java 11, so you may want two different parent images, each with a different version of the JDK, using the newest one for new developments while keeping the old one for microservices that have not yet been transitioned.
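A family base image for Spring Boot services might be sketched like this; the Alpine tag, package name, and script names are assumptions. Each microservice's Dockerfile then simply starts FROM this image and adds its own JAR:

```dockerfile
# Parent image for all Spring Boot microservices in the stack
FROM alpine:3.12
RUN apk add --no-cache openjdk11-jre

# Shared tooling for the whole family: startup, stop, and health check scripts
COPY scripts/ /opt/scripts/
HEALTHCHECK --interval=30s CMD ["/opt/scripts/healthcheck.sh"]

# Children are expected to copy their application to /opt/app/app.jar
CMD ["/opt/scripts/start.sh"]
```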
All these images should be defined in your VCS and stored in your artifacts repository (such as Nexus).
Future-proof Your Solution
Monitor Your Application
Monitoring how your application performs is a crucial part of any microservices architecture. I would recommend Prometheus and Grafana to capture and visualize data respectively.
The type of data you may want to capture includes, at least:
- Resource usage per container. CPU, memory, network usage. If you use Kubernetes, these container metrics are exposed and collected automatically by the prometheus-operator. You can just turn it on and then use standard Grafana community dashboards. Alternatively, you can incorporate cAdvisor, for example, into your stack.
- Application performance metrics such as:
- Rate and duration of API invocations.
- Error ratios.
- Cache hit ratios.
- Others, depending on your application.
If you have accurate reports of these kinds of metrics, you can easily identify bottlenecks in your application, measure the impact of software changes on your infrastructure, and continuously improve your solution.
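As an illustration of the capture side: if your Spring Boot services expose Micrometer metrics through the Actuator endpoint, a minimal Prometheus scrape configuration might look like this (the service names are hypothetical; on Kubernetes, the prometheus-operator discovers targets for you instead):

```yaml
scrape_configs:
  - job_name: "microservices"
    metrics_path: /actuator/prometheus
    static_configs:
      - targets:
          - "auth-service:8080"
          - "content-service:8080"
```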
Collect Your Application Logs
Your log collection and visualization architecture could be relatively independent of the exact technology choices made for the rest of the stack. A typical logging architecture involves:
- Log collection. A container running on each node of the cluster captures, processes, and forwards all the logs recorded. Normally, every line that the entry point of every container writes to the standard output or error stream is considered a log entry and is captured by the underlying container engine. There is a variety of choices; I have successfully used:
- Logspout in Docker Swarm. Connects to the local Docker socket and reads the log streams using the Docker API.
- Fluentd in Kubernetes using containerd. Allows very sophisticated processing of logs, applying tags, parsing the content, and extracting fields (like log severity, container name, etc.). All these capabilities enable better display later on.
- Log storage. The log collectors forward the logs to some centralized storage. I have personally used Elasticsearch running on the same cluster to provide persistent storage, which seamlessly integrates with Fluentd (Fluentd provides a connector with Elasticsearch).
- Log visualization and search. Kibana, although intimidating at first, easily connects to Elasticsearch and provides all you need for searching, filtering, and displaying your log files. Papertrail is also a powerful alternative both for storage and visualization.
Make HTTP the Default Language of Your Application
If you have adopted REST APIs as your default synchronous communication protocol (which I strongly recommend), you will enjoy the vast benefits of a widely known and supported technology:
- Browser and command-line support for visualizing and interacting with any of the components of your backend.
- Ability to leverage out-of-the-box open-source HTTP solutions such as:
- Caching. Varnish Cache can be plugged into your stack, providing a cache implementation that is compatible with the standard HTTP directives. This provides a huge benefit, especially in content-intensive backends. The cache can boost not only the client-backend communication (for example, between your mobile app and your backend), but also between microservices that call each other to read contents of any kind (for example, the permissions associated with a given session token).
- Load balancing and HTTP routing. Components like Nginx or HAProxy can be used to route, balance, modify and log your application traffic, both between your frontends/apps and your backend, but also between your backend components.
- Out-of-the-box support for:
- Authentication and authorization.
- Error handling.
Conclusion
The transition to a microservices architecture from a big monolithic application can provide innumerable benefits in terms of scalability, resilience, speed of development, etc., but it can also be a headache if the execution is not done carefully.
In this article, I have tried to cover a series of aspects and best practices that I have successfully applied in these kinds of projects:
- Early adoption of container technology.
- Solid CI/CD and Operations processes, such as GitOps.
- Internal tooling for the control and visualization of your whole deployment cycle and release process.
- Technology choices for the design and implementation of your microservices.
- Tools, processes, and best practices to make your system future-proof.