In this article, I show how Docker can help you set up a runtime environment and a database with a predefined dataset, tie them together, and run them isolated from everything else on your machine.
Let’s start with the goals:
- I want to have isolated Java SDK, Scala SDK, and SBT (build tool).
- I still want to be able to edit and rebuild my project code easily from my IDE.
- I need a MongoDB instance running locally.
- Last but not least, I want to have some minimal data set in my MongoDB out of the box.
- Ah, right, all of the above must be downloaded and configured in a single command.
All these goals can be achieved by running and tying together just three Docker containers. Here is a high-level overview of these containers:
It’s impressive that such a simple setup brings all listed benefits, isn’t it? Let’s dive in.
Step I: Development Container Configuration
The first two goals are covered by the Dev Container, so let’s start with that one.
Our minimal project structure should look like this:
/myfancyproject
    /project
    /module1
    /module2
    .dockerignore
    build.sbt
    Dockerfile
The structure will become more sophisticated as the article progresses, but for now it’s sufficient. Dockerfile is the place where Dev Container’s image is described (if you aren’t yet familiar with images and containers, you should read this first). Let’s look inside:
# ❶
FROM java:openjdk-8u72-jdk

# ❷ install SBT
RUN apt-get update \
 && apt-get install -y apt-transport-https \
 && echo "deb https://dl.bintray.com/sbt/debian /" | tee -a /etc/apt/sources.list.d/sbt.list \
 && apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 642AC823 \
 && apt-get update \
 && apt-get -y install sbt
ENV SBT_OPTS="-Xmx1200M -Xss512K -XX:MaxMetaspaceSize=512M -XX:MetaspaceSize=300M"

# ❸ make SBT dependencies an image layer: no need to update them on every container rebuild
COPY project/build.properties /setup/project/build.properties
RUN cd /setup \
 && sbt test \
 && cd / \
 && rm -r /setup

# ❹
EXPOSE 8080

# ❺
VOLUME /app
WORKDIR /app

# ❻
CMD ["sbt"]
❶ The official OpenJDK 1.8 image is specified as the base image in the first line of the Dockerfile. Note: official repositories are available for many other popular platforms. They are easily recognizable by their naming convention: the name never contains slashes, i.e., it is always just a repository name without an author name.
❷ Install SBT and set the SBT_OPTS environment variable.
❸ An SBT-specific trick (safe to skip if you don’t use SBT) to speed up container start times. As you may know, nothing is eternal in a container: it can be (and normally is) destroyed every now and then. By making the required dependencies part of the image, we download them just once, when the image is built, and in this way significantly speed up container startup.
❹ Declare port 8080 as the one the application inside the container listens on. We’ll refer to it later to access the application.
❺ Declare a new volume under the /app folder and run the subsequent commands from there. We will use it in a moment to make all project files accessible from two worlds: the host and the container.
❻ Set the default command to run SBT in interactive mode on container startup. For build tools without an interactive mode (like Maven), this can be
Now, we can already test the image:
docker build -t myfancyimage .
docker run --rm -it -v $(pwd):/app -p 127.0.0.1:9090:8080 myfancyimage
The first command builds an image from our Dockerfile and names it myfancyimage. The second command creates and starts a container from that image. It binds the current folder to the container’s volume ($(pwd):/app) and binds host port 9090 to the container’s exposed port 8080.
Step II: MongoDB Container Configuration
Ok, now it’s time to bring in some data. We start by adding the Mongo Engine container, and later supply it with a sample data snapshot. As we’re about to run multiple containers linked together, it’s convenient to describe how to run them in a docker-compose configuration file. Let’s add docker-compose.yml to the project’s root with the following content:
version: '2'
services:
  dev-container: # ❶
    build:
      context: .
      dockerfile: Dockerfile
    image: myfancyimage
    ports:
      - "127.0.0.1:9090:8080"
    volumes:
      - .:/app
    links: # ❸
      - mongo-engine
  mongo-engine: # ❷
    image: mongo:3.2
    command: --directoryperdb
❶ The commands for building and running myfancyimage are translated into the dev-container service definition.
❷ The mongo-engine service runs MongoDB 3.2 from the official DockerHub repository.
❸ mongo-engine is linked to dev-container: mongo-engine will start prior to dev-container, and they will share a network. MongoDB is available to dev-container at the URL "mongodb://mongo-engine/".
Let’s try it:
docker-compose run --service-ports dev-container
It’s important to add the --service-ports flag to enable the configured port mappings.
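One caveat worth keeping in mind: links only guarantees that mongo-engine is started before dev-container, not that MongoDB is already accepting connections. Below is a minimal sketch of a generic retry helper that could be used inside dev-container to wait for the database. The function name retry, the use of nc, and the attempt counts are assumptions made for illustration and are not part of the original setup; 27017 is MongoDB’s default port.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times (at least once), sleeping DELAY_SECONDS
# between attempts. Returns 0 on the first success, 1 if every attempt failed.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Hypothetical usage inside dev-container (assumes nc is installed in the image):
#   retry 30 1 nc -z mongo-engine 27017 && sbt
```

The probe command is passed in as arguments, so the same helper works with whatever connectivity check happens to be available in the image.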
Step III: Data Container Configuration
All right, here comes the hardest part: sample data distribution. Unfortunately, there’s no suitable mechanism for distributing Docker data volumes. A few Docker volume managers exist (e.g., Flocker and the Azure Volume driver), but these tools serve other goals.
Note: An alternative solution would be to restore data from DB dump programmatically or even generate it randomly. But this approach is not generic, i.e., it involves specific tools and scripts for each DB, and in general is more complicated.
The data distribution mechanism we’re seeking must support two operations:
- Replicate a fairly small dataset from a remote shared repository to the local environment.
- Publish a new or modified dataset to the remote repository.
One obvious approach is to distribute data via docker images. In this case, a remote repository is the same place we store our Docker images. It can be either DockerHub or a private Docker Registry instance. The solution described below can work with both.
Meeting the first requirement is easy: we need to run a container from a data image, mark the data folder as a volume, and link that volume (via the --volumes-from argument) to the Mongo Engine container.
The second requirement is more complicated. After making some changes inside the volume, we cannot simply commit them back to a Docker image: a volume is technically not part of the modifiable top layer of a container. In simpler words, the Docker daemon just doesn’t see any changes to commit.
Here’s the trick: if we can read the changed data but cannot commit it from the volume, then we first need to copy it somewhere outside of all volumes so that the daemon detects the changes. Applying the trick has a not-so-obvious consequence: we cannot create a volume directly from the data folder but have to use another path for it, and then copy all data to the volume when the container starts. Otherwise, we would have to alternate the volume path depending on where the data is stored at the moment, which is hard to automate.
The whole process of cloning and saving a dataset is displayed on the diagrams below:
Making data snapshot available to other containers as a volume on startup.
Committing changes to new image and pushing it to storage.
We’ll dig into scripts for taking and applying data snapshots a bit later. For now, let’s assume they are present in the data snapshot container’s /usr folder. Here is how the docker-compose.yml is updated with the data container definition:
version: '2'
services:
  dev-container:
    build:
      context: .
      dockerfile: Dockerfile
    image: myfancyimage
    ports:
      - "127.0.0.1:9090:8080"
    volumes:
      - .:/app
    links:
      - mongo-engine
  mongo-engine:
    image: mongo:3.2
    command: --directoryperdb
    volumes_from: # ❶
      - data-snapshot
    depends_on:
      - data-snapshot
  data-snapshot:
    image: rgorodischer/data-snapshot:scratch
    volumes:
      - /data/active # ❷
    command: /usr/apply_snapshot.sh # ❸
❶ Link volumes from the data-snapshot container defined below.
❷ Make a volume from the /data/active folder in the data-snapshot container.
❸ Run the /usr/apply_snapshot.sh script on data-snapshot container startup.
Now, let’s see what the scripts are doing.
apply_snapshot.sh simply copies the contents of the /data/snapshot folder to the volume folder /data/active (see the second diagram). Here’s its full listing:
#!/bin/ash
set -e
DEST=/data/active
mkdir -p "$DEST"
# Copy the snapshot only into an empty volume, so existing data is never overwritten.
if [ -z "$(ls -A "$DEST")" ]; then
  cp -a /data/snapshot/. "$DEST"
else
  echo "$DEST is not empty."
fi
take_snapshot.sh does the opposite: it replaces the contents of /data/snapshot with the contents of the /data/active folder. It also removes files with the .lock extension, which is the only MongoDB-specific action here (and more a precaution than a necessity). A listing of take_snapshot.sh is shown below:
#!/bin/ash
set -e
rm -rf /data/snapshot
mkdir -p /data/snapshot
cp -a /data/active/. /data/snapshot
# Quote the pattern so the shell passes it to find instead of expanding it.
find /data/snapshot -type f -name '*.lock' -delete
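To try the copy logic outside a container, the two scripts can be sketched as functions that take the folders as arguments. This is only an illustration, assuming a shell where cp -a and find -delete are available; the function names are hypothetical, and the real scripts keep using the fixed /data/snapshot and /data/active paths.

```shell
#!/bin/sh
set -e

# apply_snapshot SNAPSHOT_DIR ACTIVE_DIR
# Copies the snapshot into the active (volume) folder, but only if that folder is empty.
apply_snapshot() {
  local snapshot="$1" active="$2"
  mkdir -p "$active"
  if [ -z "$(ls -A "$active")" ]; then
    cp -a "$snapshot/." "$active"
  else
    echo "$active is not empty."
  fi
}

# take_snapshot ACTIVE_DIR SNAPSHOT_DIR
# Replaces the snapshot with the current contents of the active folder,
# then drops *.lock files from the copy (the active folder is left untouched).
take_snapshot() {
  local active="$1" snapshot="$2"
  rm -rf "$snapshot"
  mkdir -p "$snapshot"
  cp -a "$active/." "$snapshot"
  find "$snapshot" -type f -name '*.lock' -delete
}
```

Calling apply_snapshot, changing a few files in the active folder, and then calling take_snapshot on two scratch folders reproduces the round trip that the containers perform between the image layer and the volume.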
Taking a snapshot is driven externally by the following script:
#!/bin/bash
set -e

REGISTRY=your.registry
REPOSITORY=your-repository-name
SNAPSHOT_TAKER=data-snapshot-taker
DATA_CONTAINER=data-snapshot

# Look up the ID of the data container among all containers, running or stopped.
data_container_id=$(docker ps -a | grep $DATA_CONTAINER | cut -d' ' -f 1)
if [[ -z $data_container_id ]]; then
  echo "Data container is not found."
  exit 1
fi

docker login $REGISTRY

echo "Taking the snapshot.."
# ❶
docker run --name $SNAPSHOT_TAKER --volumes-from $data_container_id rgorodischer/data-snapshot:scratch /usr/take_snapshot.sh

echo -en "Snapshot description:\n"
read msg
echo -en "Author:\n"
read author
echo -en "Snapshot tag (alphanumeric string without spaces):\n"
read tag

# ❷
docker commit --author="$author" --message="$msg" $SNAPSHOT_TAKER $REGISTRY/$REPOSITORY:$tag
# ❸
docker push $REGISTRY/$REPOSITORY:$tag
# ❹
docker rm -f $SNAPSHOT_TAKER &> /dev/null
❶ Run a temporary container from the data-snapshot:scratch image with the data-snapshot container’s volume linked, and execute the /usr/take_snapshot.sh script on startup. The container stops automatically afterwards because no other processes run in it. I run the container from my image on DockerHub, but most likely you will want to use your own copy.
❷ Commit the changes to a new local image tagged with $tag.
❸ Push the new data snapshot image to your repository.
❹ Remove the temporary container.
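The snapshot script prompts for a tag but does not check it, so a typo only surfaces when docker commit fails. A minimal sketch of a validation helper follows. It assumes Docker’s tag rules (letters, digits, underscores, periods, and hyphens; no leading period or hyphen; at most 128 characters), and the function name is_valid_tag is hypothetical.

```shell
#!/bin/sh
# is_valid_tag TAG
# Returns 0 if TAG looks acceptable as a Docker image tag, 1 otherwise.
is_valid_tag() {
  local tag="$1"
  [ -n "$tag" ] || return 1              # must not be empty
  [ "${#tag}" -le 128 ] || return 1      # at most 128 characters
  case "$tag" in
    [.-]*) return 1 ;;                   # must not start with '.' or '-'
    *[!A-Za-z0-9_.-]*) return 1 ;;       # letters, digits, '_', '.', '-' only
  esac
  return 0
}

# Hypothetical use in the snapshot script, right after `read tag`:
#   is_valid_tag "$tag" || { echo "Invalid tag: $tag"; exit 1; }
```

Checking the tag before docker commit keeps a bad input from leaving the temporary container behind after a failed run.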
Now imagine you’ve just published a new shiny data snapshot tagged essential-data-set.
Then you simply update the data-snapshot definition in docker-compose.yml with the new tag and push the change to Git. Your teammate pulls those changes and can re-create the whole dev environment, including your new dataset, just by running a single command:
docker-compose run --service-ports dev-container
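The tag update mentioned above can also be scripted rather than edited by hand. The sketch below assumes that docker-compose.yml already references the data image as "image: REPOSITORY:TAG" on a single line; the function name set_image_tag is hypothetical, and the repository name in the usage comment is the placeholder from the snapshot script.

```shell
#!/bin/sh
set -e

# set_image_tag FILE IMAGE TAG
# Rewrites "image: IMAGE:<old-tag>" in FILE so that it points at TAG.
# IMAGE is used inside a sed pattern as-is, so a '.' in it matches any character,
# and neither IMAGE nor TAG may contain '|' or '&'.
set_image_tag() {
  local file="$1" image="$2" tag="$3" tmp
  tmp=$(mktemp)
  sed "s|\(image:[[:space:]]*${image}\):[^[:space:]]*|\1:${tag}|" "$file" > "$tmp"
  cat "$tmp" > "$file"   # overwrite in place, keeping the file's permissions
  rm -f "$tmp"
}

# Hypothetical usage:
#   set_image_tag docker-compose.yml your.registry/your-repository-name essential-data-set
```

Lines that reference other images, such as mongo:3.2, are left untouched because the pattern matches only the given repository name.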
As a final step, you can add some scripting that removes existing containers and volumes before updating the environment, so that docker-compose runs smoothly every time.
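One possible shape for such a script is sketched below. It assumes a docker-compose version that supports the down command with the --volumes flag; the function names run and refresh_env and the DRY_RUN switch are hypothetical helpers added so the sequence can be previewed without touching any containers.

```shell
#!/bin/sh
set -e

# run COMMAND [ARGS...]
# Executes the command, or only prints it when DRY_RUN is set to a non-empty value.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# refresh_env
# Removes the project's containers and their volumes, fetches the data image
# referenced in docker-compose.yml, and starts the dev container again.
refresh_env() {
  run docker-compose down --volumes
  run docker-compose pull data-snapshot
  run docker-compose run --service-ports dev-container
}

# Hypothetical usage from the project root:
#   DRY_RUN=1; refresh_env   # preview the commands
#   refresh_env              # actually rebuild the environment
```

Removing the old volume matters here because apply_snapshot.sh copies the snapshot only into an empty /data/active folder.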