This post is part of DZone's Java Ecosystem Series related to our first-ever Guide to the Java Ecosystem!
In practice, it turns out that the “write once, run anywhere” promise is only half true. You can certainly run any Java application on the Java Virtual Machine, but the JVM installation is where incompatibilities might arise. An application might require a specific version of the JVM, need specific ports to be available, or require environment variables to be set. Many applications also require other, non-Java components like databases. All these things make it difficult to run applications consistently on different machines and take applications to production, even more so when we’re talking about large clusters and deployment automation. This is where Docker comes in.
Docker makes it easy to package any kind of component or application in a container and work with each container in the exact same way. It doesn’t matter what technology is used inside. This gives developers and operations a single interface to work with different components. The container hides and abstracts away all the details of setting up the environment to run each component. During different steps of the application lifecycle you can leverage these features in different ways.
Docker in Production
Docker’s biggest benefits come into play on the production side of deployments, so we’ll look at that first. When deploying to production, it’s even more important to have a reproducible environment. Automation becomes a lot easier if containers are used, because all you have to do is start and stop containers. There’s no need to write scripts to set environment variables or install a bunch of dependencies.
Also, production deployments often require a cluster of multiple servers to deal with production load and failover. These deployments require a lot of orchestration, both at deployment time and at runtime. For example, deploying new versions of an application without downtime, in the form of rolling upgrades or blue-green deployments, is very common but far from trivial to implement. When there’s a server failure in a cluster, who restarts the containers on another available machine? These kinds of requirements would make production deployments very challenging if we had to handle all of them manually. By hiding the environment details of the components inside its containers, Docker makes consistent deployments and deployment automation a lot easier.
This is great for running components on a single server, but Docker by itself doesn’t help much with running clusters. Docker is focused on running containers on a single machine. Starting and monitoring containers on a cluster of machines requires additional tools. Luckily, the Docker ecosystem already has several new tools for this. Kubernetes, Mesos, and Docker Swarm are the best-known tools for running containers on clusters. Kubernetes is one of the most active open source projects in this area, backed and actively developed by Google, Red Hat, and many others.
Kubernetes lets you deploy containers replicated over multiple machines, taking care of all the necessary orchestration of starting containers on a number of machines as well as monitoring containers for failure. Kubernetes has a command line tool to start deployments and get information about the cluster, but more importantly it provides a REST API that can be used to integrate with build servers, load balancers, and other pieces of the production environment puzzle.
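As a rough illustration of what integrating with that REST API can look like from Java, the sketch below builds a request that lists the pods in a namespace. It assumes Java 11 or later for java.net.http. The API server address and the token are placeholders, and the request is only built, not sent, because a real call also needs the cluster's TLS and authentication setup.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class PodListRequest {

    // Builds (but does not send) a GET request for the pods in a namespace.
    // apiServer and token are supplied by the caller; the values used in
    // main() are placeholders, not real cluster details.
    static HttpRequest build(String apiServer, String namespace, String token) {
        return HttpRequest.newBuilder()
                .uri(URI.create(apiServer + "/api/v1/namespaces/" + namespace + "/pods"))
                .header("Authorization", "Bearer " + token)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request =
                build("https://kubernetes.example.com:6443", "default", "example-token");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

A build server could send a request like this with java.net.http.HttpClient and parse the JSON response to check whether a deployment has finished rolling out.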
Tools like Kubernetes require an investment to learn, but they are definitely worth the effort. Docker itself is helpful, but its utility is limited in a clustered server environment. Tools like Kubernetes make container technology really shine, taking deployments to a completely new level.
There is a lot to say about whether microservices are a good idea or not. This discussion is out of scope for this article, but you should know that Docker- and Kubernetes-based deployments make the operations side of the microservices story a lot easier. Different services in a microservices architecture may be developed using different technologies: different Java frameworks, or even different languages entirely. At the infrastructure level, all these services are deployed in exactly the same way; all the infrastructure cares about is containers. Even if your application doesn’t follow a microservices approach (most applications don’t), containers give you a lot of freedom, because your deployment environment no longer limits your technology choices.
When working with microservices it should also be possible to scale services individually. This brings another challenge: How do you know the location (IP address and port) of a service if it may move to a different machine? Service discovery is an important aspect of a microservices architecture. Kubernetes makes this problem a lot easier to solve because it already has a concept of services. A service in Kubernetes is a proxy on top of containers replicated over multiple machines. Consumers of the service only need to know the IP address of the service, while the underlying containers use dynamic addresses that may change after deployments, failures, or scaling.
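One way a Java consumer can find that stable service address is through the environment variables Kubernetes sets in each container for the services that exist when the container starts (the service name upper-cased, dashes replaced by underscores, followed by _SERVICE_HOST and _SERVICE_PORT). The sketch below is a minimal lookup helper. The service name order-db and the address values are made up for illustration.

```java
import java.util.Locale;
import java.util.Map;

final class ServiceLocator {

    // Resolves "host:port" for a Kubernetes service from environment variables.
    // The environment is passed in as a map so the lookup is easy to test;
    // inside a container you would pass System.getenv().
    static String addressOf(String serviceName, Map<String, String> env) {
        String prefix = serviceName.toUpperCase(Locale.ROOT).replace('-', '_');
        String host = env.get(prefix + "_SERVICE_HOST");
        String port = env.get(prefix + "_SERVICE_PORT");
        if (host == null || port == null) {
            throw new IllegalStateException("No service variables found for " + serviceName);
        }
        return host + ":" + port;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for what the cluster would provide.
        Map<String, String> env = Map.of(
                "ORDER_DB_SERVICE_HOST", "10.0.0.11",
                "ORDER_DB_SERVICE_PORT", "5432");
        System.out.println(addressOf("order-db", env)); // prints 10.0.0.11:5432
    }
}
```

The consumer never sees the addresses of the individual containers behind the service, so those can change freely.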
Docker in Test and Development Environments
When using containerized production deployments we get another huge benefit for free. It becomes trivial to run test/acceptance environments configured exactly the same as their production counterparts. Production issues caused by small differences between a production deployment and a test deployment are very hard to debug. By using exactly the same containers in test and production, we avoid this problem altogether.
This extends to developer environments as well. Predictably, the development environment is where most developers start using Docker. Using Docker on your developer machine, you can test an application exactly the way it will be deployed to production. It can make running other components required by the application, like databases, a lot easier by running those in containers.
Containerized deployment does impose some requirements on your application architecture. Tools like Kubernetes are designed to deal with dynamic environments where applications run on multiple machines, and scaling and failover happen automatically. This kind of dynamic behavior requires a stateless architecture. Basically this means that the application shouldn’t rely on the existence of a specific server, which has a lot to do with avoiding session state. There are lots of tools and patterns available to help with this. Stateless architecture isn’t something new, but it has become even more relevant today.
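To make the idea concrete, here is a small sketch of one such pattern: the application instance keeps no per-user state in its own fields and instead reads and writes it through a shared store. All names here (StatelessCart, SessionStore, Instance) are invented for this example, and the in-memory store is only a stand-in for an external database or cache so that the code runs on its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class StatelessCart {

    /** Shared store that lives outside any single application instance. */
    interface SessionStore {
        int get(String sessionId);
        void put(String sessionId, int itemCount);
    }

    /** In-memory stand-in so the sketch runs without external services. */
    static final class InMemoryStore implements SessionStore {
        private final Map<String, Integer> data = new ConcurrentHashMap<>();
        public int get(String sessionId) { return data.getOrDefault(sessionId, 0); }
        public void put(String sessionId, int itemCount) { data.put(sessionId, itemCount); }
    }

    /** One application instance; it holds no per-user state itself. */
    static final class Instance {
        private final SessionStore store;
        Instance(SessionStore store) { this.store = store; }

        // Read-modify-write kept deliberately simple; a real store would
        // need an atomic increment to be safe under concurrent requests.
        int addItem(String sessionId) {
            int count = store.get(sessionId) + 1;
            store.put(sessionId, count);
            return count;
        }
    }

    public static void main(String[] args) {
        SessionStore shared = new InMemoryStore();
        Instance a = new Instance(shared);
        Instance b = new Instance(shared);
        a.addItem("session-1");                     // first request hits instance a
        System.out.println(b.addItem("session-1")); // second hits instance b, prints 2
    }
}
```

Because either instance can serve either request, the cluster is free to stop, move, or add instances without users losing their session.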
Building Docker Containers for Java
What do we need in order to run Java in a Docker container? Ideally we have a bare-bones image that contains just the JVM and your application. Image size is important for a fast build and deployment workflow, so smaller images are better. There are usable base images available that already contain the JVM. The application can run any way you want, but a very convenient approach is to package the application as a self-contained executable JAR file. This is supported by many tools and frameworks today including Spring Boot, OSGi, and others. This makes the Dockerfile trivial to write and easy to maintain.
More traditional setups, where the application is deployed in an application server, work as well. The big difference from traditional deployments is that the application server should only contain one component/application instead of deploying multiple applications in the same app server. This gives you much more flexibility during deployment and better isolation. As a side effect, this drastically reduces the need for an application server, moving more towards an approach where you just have the components that your application needs.
A common question while working with Docker containers is how to deal with configuration like database logins. Typically this kind of configuration will be different for each environment (production, test, acceptance, etc.) that you deploy to. To make the same image useful in different environments we need a way to pass parameters to the containers started from it. There are roughly two approaches. The simplest approach is to pass configuration as environment variables. Environment variables can be set when starting a Docker container, and most Java frameworks have a way to read configuration from environment variables. The more complex but potentially more dynamic approach is to use a key-value store like etcd or ZooKeeper to store configuration.
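The environment variable approach needs no framework at all. The following sketch reads database settings from the environment with plain Java. The variable names DB_URL, DB_USER, and DB_PASSWORD and the default values are assumptions chosen for this example, not a convention that Docker or any framework prescribes.

```java
import java.util.Map;

final class DatabaseConfig {
    final String url;
    final String user;
    final String password;

    private DatabaseConfig(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    // Builds the configuration from an environment map. Inside a container
    // you would pass System.getenv(); a plain map keeps the sketch testable.
    static DatabaseConfig fromEnvironment(Map<String, String> env) {
        return new DatabaseConfig(
                env.getOrDefault("DB_URL", "jdbc:postgresql://localhost:5432/app"),
                env.getOrDefault("DB_USER", "app"),
                require(env, "DB_PASSWORD"));
    }

    // Fails fast at startup when a mandatory setting is missing.
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing environment variable " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for a test environment.
        Map<String, String> env = Map.of(
                "DB_URL", "jdbc:postgresql://db.test.example.com:5432/app",
                "DB_PASSWORD", "secret");
        DatabaseConfig config = fromEnvironment(env);
        System.out.println(config.url + " as " + config.user);
    }
}
```

When starting the container, each value can then be supplied with the -e option of docker run, for example -e DB_PASSWORD=..., so the same image runs unchanged in test and production.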
Docker has great benefits for Java developers. However, it’s important to understand that Docker itself is only the start. Tools like Kubernetes are the really interesting parts that take distributed architectures and deployment automation to the next level.