Design Considerations for a Microservice Architecture With Docker Swarm
When building a microservice that uses Docker Swarm, make sure you keep these architectural concerns in mind.
When designing a microservice architecture, there are several design considerations to take care of, especially around scalability, high availability, resilience, and loose coupling. We recently went live with an application that is based on a microservices architecture and hosted on Docker Swarm. Here are some of the key lessons and design considerations to take into account when architecting a Docker Swarm infrastructure:
- Loosely coupled microservices
- Manager nodes availability and their location
- Stateless vs stateful
- Machine configurations
- Number of manager/workers
- Restricting the container's memory
- Avoiding downtime
Loosely Coupled Microservices
This is one of the most important design principles; it promotes resilience and a cleaner implementation. When you have a front-end application that talks to various backends or performs compute-intensive tasks, it's better to move the business logic into separate microservices. The front end should act only as a view layer, and all the business logic should live in the business layer. Deciding when to create a new microservice is important and depends on the functionality and business purpose it serves. This also determines how many microservices you end up with. For example, imagine a web application where customers check whether they are eligible for currency conversion. The application can be broken into three microservices:
- View, which holds the application's view layer
- Currency conversion
- Eligibility check
The advantages of this separation are as follows:
- If, say, 90% of customers use the application only to check eligibility, you can scale that service alone across multiple machines based on usage and keep the number of currency conversion instances low.
- If currency conversion is down for some reason, the customer can still check their eligibility and other bits.
Stateless vs Stateful
Docker containers are, by design, supposed to be stateless. However, the application often needs to be stateful, e.g. for login functionality, where the application needs to know which user is logged in. By default, Docker Swarm routes traffic using a round-robin algorithm, which means successive requests are sent to different containers, and session information held in one container is lost. Session persistence might come to Docker Swarm as a feature in the future, but it is not available as of now. We had to put Traefik in front as a load balancer to maintain sticky sessions. How to implement session persistence in Docker Swarm using Traefik is covered in a separate article.
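The effect of round-robin routing on in-memory sessions can be illustrated with a small simulation. This is only a toy model: the Replica and StickyRouter classes and the names web.1, web.2, and alice-session are invented for illustration, and it does not reproduce how Swarm or Traefik work internally.

```python
from itertools import cycle


class Replica:
    """One container instance that keeps sessions only in its own memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()  # session IDs known to this replica only

    def login(self, session_id):
        self.sessions.add(session_id)

    def is_logged_in(self, session_id):
        return session_id in self.sessions


class StickyRouter:
    """Toy sticky routing: pin each session to the first replica it reached."""

    def __init__(self, replicas):
        self._rotation = cycle(replicas)
        self._pinned = {}

    def route(self, session_id):
        if session_id not in self._pinned:
            self._pinned[session_id] = next(self._rotation)
        return self._pinned[session_id]


replicas = [Replica("web.1"), Replica("web.2"), Replica("web.3")]

# Plain round-robin: every request goes to the next replica in the rotation.
rotation = cycle(replicas)
first = next(rotation)
first.login("alice-session")   # request 1: login is stored on one replica
second = next(rotation)        # request 2: lands on a different replica

print(first.name, first.is_logged_in("alice-session"))
print(second.name, second.is_logged_in("alice-session"))

# Sticky routing: the same session always reaches the same replica.
sticky = StickyRouter(replicas)
pinned = sticky.route("alice-session")
pinned.login("alice-session")
print(sticky.route("alice-session").is_logged_in("alice-session"))
```

The second request reaches a replica that has never seen the session, which is exactly the failure a sticky load balancer is there to prevent. Keeping session state in an external store would be another way to sidestep the problem.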
Manager Node Availability and Location
The managers in Docker Swarm need to maintain a quorum, which, in simple terms, means that a majority of the manager nodes must always be available: at least (n+1)/2 for an odd number n of manager nodes. So if you have 3 manager nodes, 2 should always be up, and if you have 5 manager nodes, 3 should be up. If the swarm loses the quorum of managers, it cannot perform management tasks. This means you cannot add new nodes or run swarm commands until the quorum is restored.
Another important attribute is the location of the manager nodes. It is advisable to place manager nodes in different geographic regions, so that an outage in one region won't affect the quorum of managers. For example, if you have 3 manager nodes, you can choose Asia, Europe, and America as their geographic locations with any cloud provider. On the positive side, even if the quorum is lost, say because 2 out of 3 managers are down, the Docker containers/services will keep working and serving traffic. Once the machines are available again, the quorum is restored automatically.
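The quorum arithmetic is easy to check for different manager counts. The sketch below assumes the usual majority rule (more than half of the managers must be up); the function names quorum and tolerated_failures are made up for this example.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a majority."""
    if managers < 1:
        raise ValueError("a swarm needs at least one manager")
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can be lost while a majority remains."""
    return managers - quorum(managers)


for n in (1, 2, 3, 4, 5, 7):
    print(f"managers={n} quorum={quorum(n)} can_lose={tolerated_failures(n)}")
```

For 3 managers this gives a quorum of 2, and for 5 managers a quorum of 3, matching the figures above. It also shows why odd manager counts are the usual choice: going from 3 to 4 managers raises the quorum to 3 without allowing any additional failure.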
Machine Configurations
The rosy picture painted by containerization is that it is easy to scale using cheap machines. The problem with cheap machines is that they are often poorly specified. If a machine has only 1 CPU and the microservice happens to be CPU-intensive, running multiple containers on that machine might even make things worse, as the containers would be fighting for CPU time. Similarly, if the microservices are memory-intensive, make sure the machine has enough RAM.
Number of Managers/Workers
Autoscaling is not available in Docker Swarm as of version 17.06, so to add new machines to a swarm, you have to use docker swarm join with a join token (which docker swarm join-token prints on a manager) to add more managers and workers. Also, adding new nodes doesn't mean that the swarm will be automatically rebalanced. For example, if you have 3 machines each running 2 containers and you add 3 more machines, ideally only 1 container should run on each machine, but the existing containers stay where they are. Until you run docker stack deploy again, the swarm won't be rebalanced. Another trick that works well, and that I tend to use, is docker service scale to scale a service down and back up; that way the swarm rebalances itself.
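The rebalancing behavior described above can be pictured with a toy placement model. It assumes a simple "least-loaded node first" rule as a stand-in for the scheduler, which is a simplification and not Swarm's actual implementation. The node names and the place function are hypothetical.

```python
def place(task_count, nodes):
    """Assign tasks one at a time to the node that currently runs the fewest."""
    load = {node: 0 for node in nodes}
    for _ in range(task_count):
        target = min(nodes, key=load.get)  # ties go to the earliest node
        load[target] += 1
    return load


old_nodes = ["node1", "node2", "node3"]
new_nodes = ["node4", "node5", "node6"]

# 6 containers placed on the original 3 machines: 2 on each.
before = place(6, old_nodes)

# Joining new machines does not move containers that are already running.
after_join = {**before, **{node: 0 for node in new_nodes}}

# Only when the tasks are scheduled again (a redeploy, or scaling down and
# back up) does the placement take the new machines into account.
after_reschedule = place(6, old_nodes + new_nodes)

print("before:          ", before)
print("after join:      ", after_join)
print("after reschedule:", after_reschedule)
```

In this model the three new machines sit idle after joining, and the load only evens out to 1 container per machine once the tasks are placed again.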
At some point, services will fail or show defects, and you will need logs to debug them. Having multiple services means multiple log files, and even docker service logs may not be much help if the service has multiple containers running. The best way to log in a multiservice environment is to use a log aggregator like Fluentd, so that logs from containers scattered across the swarm are written to one place. Fluentd works well with Elasticsearch and Kibana, where you can search, filter, and query the logs. More can be found here:
- How to Collect logs from multiple containers and write to a single file
- Configuring Kibana and ElasticSearch for Log Analysis with Fluentd on Docker Swarm
Avoiding Downtime
To avoid downtime, there are a couple of things you can do. The first is to run multiple instances of each container: any service should have at least 2 container instances running. Also, make effective use of the update_config attribute in Docker Compose, where you can specify the delay between updating one container and the next. For example, the docker-compose snippet below creates 3 replicas, and whenever you update the service, the containers restart with a gap of 90 seconds between them.
deploy:
  mode: replicated
  replicas: 3
  update_config:
    delay: 90s
Optimizing the Container Limits
To make sure that one Docker container/microservice doesn't end up fighting with other containers for resources like CPU, RAM, and I/O, you can limit how much RAM a container can be allocated and how much CPU it can use. For example, the lines below in Docker Compose limit the container to 2GB of RAM, even if the machine has 8GB or 16GB.
resources:
  limits:
    memory: 2048M
Docker Cloud seems to be capable of creating a new swarm on Azure/AWS and can potentially implement a continuous integration pipeline, but on the downside, it creates too many resources, on Azure at least. We found that it was easy enough to create a swarm ourselves within a matter of minutes once Docker was installed. A join token from docker swarm join-token can be used to easily bring new machines into the swarm. Automated deployment is also easy enough through Jenkins. We use the fabric8.io plugin to create Docker images and push them to Docker Hub. Jenkins then does the deployment by running commands on the manager node using a remote SSH plugin.
- How to Automate Docker Swarm Service deployment using Jenkins
- How To Push Docker Images To Docker Hub Repository Using Docker Maven plugin
Docker Swarm fits well in a microservice architecture. Here are some of the features which have really caught our eyes:
- Docker Swarm is very easy to create and can be set up in a matter of minutes. The ease of scaling up is immense; any new machine needs just a token to become a worker/manager node.
- Scaling services is very easy: docker service scale <service>=10 will create 10 container instances of a service in no time.
- It's open source, and the Community Edition also works well in production, saving a lot of money for small enterprises.
Some features, if added, could be a good improvement:
- Session persistence in Docker Swarm could be added as a feature in new releases.
- Autoscaling could be added as a feature, too. It would be good if Swarm could, on demand, add new machines from a pool and run more containers of the services that are heavily used or under stress.
- Rebalancing the services when new machines are added to Swarm would be a great addition, too.