Ephemeral storage is a major selling point of containers. “Start a container from an image. Make whatever changes you want. Then you stop it and start a new one. Look, a whole new file system that resets back to the content of the image!”
In Docker terms, that might look like this:
```
# docker run -it centos
[root@d42876f95c6a /]# echo "Hello world" > /hello-file
[root@d42876f95c6a /]# exit
exit
# docker run -it centos
[root@a0a93816fcfe /]# cat /hello-file
cat: /hello-file: No such file or directory
```
When we build applications around containers, this ephemeral storage is incredibly useful:

- It makes it easy to scale horizontally: we just create multiple instances of containers from the same image, and each one gets its own isolated file system.
- It makes it easy to upgrade: we just create a new version of the image, and we don’t have to worry about upgrade-in-place or capturing anything from existing container instances.
- It makes it easy to move from a single system to a cluster, or from on-premises to cloud: we only need to make sure the cluster or cloud can access our image in a registry.
- And it makes it easy to recover: no matter what our application might have done to its file system on its way to a horrible crash, we just start a new, fresh container instance from the image and it’s like the failure never happened.
So, we don’t want our container engine to stop providing ephemeral, temporary storage. But we do have a problem when we transition from tutorial examples to real applications. Real applications must keep state somewhere. Often, we push our state back into some data store (SQL-based or NoSQL-based). But that just raises the question of where to put the data store application. Is it also in a container? Ideally, the answer is “yes,” so we can take advantage of the same rolling upgrades, redundancy, and failover that we use for our application layer. To run our data store in a container, however, we can no longer be satisfied with just ephemeral, temporary storage. Our container instances need to be able to access persistent storage.
For simple cases where we just run our Docker containers directly, this is easy. We have two main choices: we can identify a directory on the host file system, or we can have Docker manage the storage for us. Here’s how it looks when Docker manages the storage:
```
# docker volume create data
data
# docker run -it -v data:/data centos
[root@5238393087ae /]# echo "Hello world" > /data/hello-file
[root@5238393087ae /]# exit
exit
# docker run -it -v data:/data centos
[root@e62608823cd0 /]# cat /data/hello-file
Hello world
```
Docker does not keep the root file system from the first container, but it does keep the “data” volume, and that same volume is mounted in the second container as well, so the storage is persistent.
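The other option mentioned above, a directory on the host file system, looks much the same from inside the container; the host path `/srv/data` here is just an illustrative example:

```
# mkdir -p /srv/data
# docker run -it -v /srv/data:/data centos
[root@... /]# echo "Hello world" > /data/hello-file
[root@... /]# exit
exit
# cat /srv/data/hello-file
Hello world
```

With a bind mount like this, the files are visible directly on the host, whereas a named volume leaves the storage location and lifecycle up to Docker.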
This works on a single system, but access to persistent storage gets more complicated in a clustered container environment like Kubernetes or Docker Swarm. If our data store container might be started on any one of hundreds of nodes and might migrate from one node to another at any time, we can’t rely on a single server’s file system to hold the data. We need a storage solution that is aware of containers and distributed processing and integrates seamlessly with both.
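In Kubernetes, for example, a workload declares its need for persistent storage as a claim rather than naming a host path, and the cluster matches that claim to available storage wherever the container is scheduled. A minimal sketch of such a claim (the name and size here are illustrative, not prescriptive):

```yaml
# A PersistentVolumeClaim asking the cluster for 1GiB of storage
# that a single node can mount read-write at a time.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

A Pod then references the claim by name in its `volumes` section, and the cluster's storage layer takes care of attaching the underlying storage to whichever node runs the container.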
This Refcard will describe the solution to this need for container-aware storage and will show how getting the storage solution right is a key element of building reliable containerized applications that excel in production.