Managing Data in Volumes
Learn some of the basics around Docker volume concepts and get some details about how new volume support works in Kontena v. 1.2.0 onward.
Kontena 1.2.0 introduced support for managing data in named volumes. This blog post will cover some of the basics around Docker volume concepts and give some details about how new volume support works in Kontena v. 1.2.0 onward.
What Is a Volume?
Managing persistent data in containers has been something of a problem since the start of the container era. Since the early days, Docker has provided support for managing data outside of containers by using volumes. A volume is essentially a directory (or a file) that lives outside of the container's union file system. Because volumes live outside the union file system, they are just normal files and directories on the host. Volumes can be defined either per container, using the -v option, or in images, using the VOLUME declaration in the Dockerfile.
Data Volume Containers
Before Docker came up with the concept of named volumes, the only sensible way to decouple the life cycle of a container from its persistent data was a pattern called the data volume container. In this pattern, a dedicated container exists only to hold the data in volumes, which can then be reused by multiple containers. The biggest advantage is that the life cycle of application containers is decoupled from the data, so you can upgrade your apps without losing the data they've stored.
Docker 1.8 introduced support for named volumes. A named volume behaves much like a data volume container, only it's not a container. It is a reusable holder for data that can be used by many containers, and it naturally also decouples the life cycles of containers and data. The cool thing with named volumes is that Docker supports an extensible model in which different drivers can be used for volumes. Through drivers, data persistence can be integrated with external storage systems such as AWS S3. This means that data persisted in named volumes is no longer tied to a single host and can potentially be used across many hosts.
In certain cases, it's essential that different containers do not see the same data and that the data is replicated at the application level. In other cases, multiple containers on different hosts may need to see exactly the same data. To cater to these different needs, Docker supports a pluggable model for volume drivers, and a plethora of volume driver plugins is available. Each driver integrates with a different external storage system and thus provides different options. Which driver fulfills the requirements therefore depends on the specific use case.
Volumes With Kontena
From 1.2.0, the Kontena Platform comes with the capability to manage named volumes. It's currently labeled as an experimental feature. In practice, experimental means that some details of the APIs and commands might still change a bit. However, we wanted to make the feature available already so that we can get feedback from the community and find potential issues and rough edges.
Managing Volumes With CLI
Kontena's CLI comes with subcommands to manage volumes. There are the usual commands to create, show details of, and remove a volume. Currently, volumes are managed outside of Kontena Stacks, but the externally managed volumes can then be used in different stacks.
Creating a Volume
A volume can be created with the following command:

    kontena volume create --scope instance --driver rexray my-volume

This creates the needed volume configuration on the Kontena Master, which can then be used in services. It does not actually create any Docker named volumes; they are created on the fly when services using the volume are scheduled to a node.
The driver option needs to be specified. It defines the volume driver to be used when actually creating the Docker named volume.
The scope option defines how the volume is created when multiple services and/or service instances use the same volume. The different scope options are described in detail in the documentation.
Using a Volume
After a volume has been created with the kontena volume create command, it can be used in services. When a stack wants to use a volume, it needs to declare it in the volumes section of the stack file:

    stack: jussi/redis
    description: Just a simple Redis stack with volume
    version: 0.0.1
    services:
      redis:
        image: redis:3.2-alpine
        command: redis-server --appendonly yes
        volumes:
          - redis-data:/data
    volumes:
      redis-data:
        external: true

This takes the redis-data named volume into use within the stack.
Alternatively, the volume name within the stack file can be mapped to a different external name using:

    volumes:
      redis-data:
        external:
          name: other-named-vol

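The two ways of declaring an external volume can be illustrated with a small Python sketch. It only models how a stack-local volume name resolves to an external one; it is not Kontena's implementation, and the helper name resolve_external_name is hypothetical.

```python
def resolve_external_name(local_name, volume_config):
    """Return the external volume name that a stack-local name refers to.

    Hypothetical helper mirroring the two stack-file forms:
    `external: true` and `external: {name: ...}`.
    """
    external = volume_config.get("external")
    if external is True:
        # `external: true`: the local name is also the external name
        return local_name
    if isinstance(external, dict) and "name" in external:
        # `external: {name: other}`: the local name maps to another volume
        return external["name"]
    raise ValueError(f"volume {local_name!r} is not declared as external")


# The two `volumes` sections from the stack files, written as plain dicts
same_name = {"redis-data": {"external": True}}
mapped = {"redis-data": {"external": {"name": "other-named-vol"}}}

print(resolve_external_name("redis-data", same_name["redis-data"]))
print(resolve_external_name("redis-data", mapped["redis-data"]))
```

In both cases the service keeps mounting redis-data:/data; only the volume that backs it changes.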
Show Volume Details
The kontena volume show my-volume command shows volume details, such as where the volume is used:

    $ kontena volume show redis-data
    [EXPERIMENTAL] The `kontena volume` commands are still experimental in Kontena 1.2, and may change in future releases
    redis-data:
      id: test/redis-data
      created: 2017-04-20T12:31:25.199Z
      scope: instance
      driver: local
      driver_opts:
      instances:
        - name: redis.redis-data-1
          node: polished-frost-64
        - name: redis.redis-data-2
          node: proud-breeze-32
        - name: redis.redis-data-3
          node: wandering-rain-51
      services:
        - test/null/redis

Remove a Volume
A volume can be removed only after all the services that are using it are removed. This is to prevent accidental data loss.
A volume can be removed with kontena volume rm my-volume.
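The removal rule can be sketched in Python as a simple guard. This is an illustration under assumptions, not Kontena's code: the in-memory data structures, the remove_volume function and the VolumeInUseError exception are all hypothetical.

```python
class VolumeInUseError(Exception):
    """Raised when a volume is still referenced by at least one service."""


def remove_volume(volumes, services, name):
    """Remove `name` from `volumes` unless some service still uses it.

    volumes:  set of volume names (stand-in for the master's database)
    services: dict mapping a service name to the volume names it mounts
    """
    users = sorted(svc for svc, used in services.items() if name in used)
    if users:
        # Refuse removal to prevent accidental data loss
        raise VolumeInUseError(f"{name} is still used by: {', '.join(users)}")
    volumes.remove(name)


volumes = {"redis-data", "my-volume"}
services = {"test/null/redis": ["redis-data"]}

remove_volume(volumes, services, "my-volume")       # unused, so it is removed
try:
    remove_volume(volumes, services, "redis-data")  # still in use, so refused
except VolumeInUseError as err:
    print(err)
```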
Scheduling of Volumes
When a volume is created with the kontena volume create command, not much actually happens. The volume information is stored in the master's database and that's about it. The actual Docker named volumes are created only when a service using a given volume is deployed to some of the nodes. Naturally, not all the nodes in the grid have all the possible volume drivers installed. When a service using a volume is scheduled, the list of possible nodes is filtered based on the needed volume drivers.
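The driver-based filtering can be pictured with a minimal Python sketch. The node names are taken from the example output in this post, but the drivers listed for each node and the nodes_with_drivers function are assumptions made for illustration; the real scheduler is more involved.

```python
def nodes_with_drivers(nodes, required_drivers):
    """Return the nodes that have every required volume driver installed.

    nodes: dict mapping a node name to the volume drivers installed on it
    """
    required = set(required_drivers)
    return [name for name, drivers in nodes.items() if required <= set(drivers)]


# Assumed driver inventory per node (illustrative only)
nodes = {
    "polished-frost-64": ["local", "rexray"],
    "proud-breeze-32": ["local"],
    "wandering-rain-51": ["local", "rexray"],
}

# A service whose volume uses the rexray driver can only land on two nodes
print(nodes_with_drivers(nodes, ["rexray"]))
```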
One aspect that also affects volume and service scheduling is the scope of the volume. The scope defines how many instances (actual Docker named volumes) will be created out of a single volume definition. When the instance scope is used, each service instance gets its own volume created from the same definition. This affects scheduling: a service instance is always scheduled on the node where its volume instance lives. This behaves quite similarly to stateful services but also gives the option to use a specific volume driver for the data.
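A rough Python sketch of the instance scope follows. The volume naming is inferred from the kontena volume show output earlier in this post (redis.redis-data-1 and so on) and should be treated as an assumption, as should both function names; the sketch only shows one volume per service instance plus sticky placement.

```python
def instance_volume_names(service, volume, instances):
    """Return one volume name per service instance (naming is assumed)."""
    return [f"{service}.{volume}-{i}" for i in range(1, instances + 1)]


def place_instance(volume_name, placements, candidate_nodes):
    """Pick a node for a service instance, sticking to its volume's node.

    placements: dict remembering on which node each volume instance lives
    """
    if volume_name in placements:
        # The volume already exists somewhere: always go back to that node
        return placements[volume_name]
    node = candidate_nodes[0]       # first deployment: any candidate will do
    placements[volume_name] = node  # remember where the volume was created
    return node


names = instance_volume_names("redis", "redis-data", 3)
print(names)

placements = {}
first = place_instance(names[0], placements, ["polished-frost-64", "proud-breeze-32"])
# On a later deploy the candidate order differs, but the placement does not
again = place_instance(names[0], placements, ["proud-breeze-32", "polished-frost-64"])
print(first == again)
```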
As the experimental flag suggests, this is only the first step towards managed volumes. With the experience and feedback we get from the volume support, we'll make the necessary adjustments. For example, the Docker volume naming probably needs some fine-tuning, as there seem to be severe restrictions on the naming of volumes in certain environments and with certain drivers.
One other thing that we have been thinking of is providing some kind of option to define how a volume can be rescheduled across nodes, or whether it can be rescheduled at all. Some of the drivers support moving a volume from one node to another, but usually with constraints. For example, with the RexRay driver, AWS EBS volumes can move only within the AZ where they were created, so the control options need to be general and flexible enough to take such constraints into account while scheduling services and volumes.
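To make the AZ constraint concrete, here is a small Python sketch of what such a rescheduling filter could look like. It describes no existing Kontena feature: the reschedule_candidates function, the node-to-AZ mapping and the AZ names are all hypothetical.

```python
def reschedule_candidates(volume_az, node_azs):
    """Return the nodes a volume could move to, given its availability zone.

    node_azs: dict mapping a node name to the AZ that node runs in
    """
    return [node for node, az in node_azs.items() if az == volume_az]


# Assumed placement of nodes across availability zones (illustrative only)
node_azs = {
    "polished-frost-64": "eu-west-1a",
    "proud-breeze-32": "eu-west-1b",
    "wandering-rain-51": "eu-west-1a",
}

# An EBS-backed volume created in eu-west-1a can only follow nodes in that AZ
print(reschedule_candidates("eu-west-1a", node_azs))
```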
Published at DZone with permission of Jussi Nummelin, DZone MVB.