Quick disclaimer: I do not claim to be a Mesos expert. This post is the culmination of research we’ve done, along with experience using and developing solutions with these technologies in the current ecosystem and community. We welcome additions and corrections. Feel free to add a comment at the bottom of the post!
What Is Apache Mesos?
Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.
Apache Mesos runs tasks by letting frameworks run on top of it. Each framework receives offers of CPU, memory, network, and storage on which it can schedule a task. If you are interested in learning more about how Apache Mesos works, start by looking at the architecture diagram in its documentation. The important point for this post is that tasks can be scheduled to run in containers, and these tasks sometimes need persistent storage. Examples of tasks that need persistent storage include databases such as MySQL or MongoDB, a web cache such as the Nginx cache, logging directories, or a data directory that blog software uses to store data. In any case, the framework relies on Mesos to provide the physical or virtual resources it needs to run. In this post, we will focus mainly on the different options available for storage in Mesos and how Flocker can be used to help automate persistent tasks.
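To make the offer model more concrete, here is a minimal sketch in Python. It is a toy model only: the `Offer` and `Task` classes and the `pick_offer` function are hypothetical names invented for illustration and are not the real Mesos messages or scheduler API. It simply shows a framework looking through the offers it receives and accepting the first one large enough for its task.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    """Toy resource offer from one node (not the real Mesos Offer message)."""
    node: str
    cpus: float
    mem_mb: int
    disk_mb: int


@dataclass
class Task:
    """Toy task with the resources it needs to run."""
    name: str
    cpus: float
    mem_mb: int
    disk_mb: int


def pick_offer(task: Task, offers: List[Offer]) -> Optional[Offer]:
    """Return the first offer big enough for the task, or None to decline all."""
    for offer in offers:
        if (offer.cpus >= task.cpus
                and offer.mem_mb >= task.mem_mb
                and offer.disk_mb >= task.disk_mb):
            return offer
    return None


# Hypothetical offers and task, chosen only for the example.
offers = [
    Offer("node-1", cpus=0.5, mem_mb=512, disk_mb=1024),
    Offer("node-2", cpus=4.0, mem_mb=8192, disk_mb=20480),
]
mysql = Task("mysql", cpus=1.0, mem_mb=2048, disk_mb=10240)

chosen = pick_offer(mysql, offers)
print(chosen.node if chosen else "no suitable offer")  # node-2
```

A real framework would also decline the offers it does not use and handle offers arriving over time, which this sketch leaves out.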
Using Persistent Storage
Apache Mesos does its job of managing resources and giving them to tasks. Typically these tasks are decoupled from the nodes themselves, which improves fail-over, flexibility, and scalability. However, services that require persistent storage can get tricky: if a task uses the node’s local storage instead of a more flexible solution, the task becomes dependent on the node on which the service runs. Applications that depend on accessible storage outside of the container can include databases (NoSQL/SQL), web caches, application logs, secret storage, and more. We will represent a task and its data with the following diagram.
There are a number of different solutions and architectures you can use to help maintain data availability, security, and consistency for a task. We will walk through some of the known ways to handle storage in Apache Mesos today and discuss what each helps with and what pitfalls it may have.
Plain Local Filesystem
This approach is the easiest to implement, but it has the most drawbacks. In this case, the task references the local filesystem on the host node, and data is stored on the disk of the node running the task. As a result, the task has no flexibility at all: it cannot fail over or move around the cluster. With this method, Mesos can only try to reschedule the task on this one node, “pinning” the task to the node where it knows the data lives. There are ways to manage failover scenarios with periodic backups to other storage, but the configuration can be overly complicated and clunky, which defeats some of the main advantages of using Mesos in the first place.
Using Replication or Distributed Filesystems
One way to overcome this drawback is to use a shared or distributed filesystem such as NFS or GlusterFS. Both options give the nodes POSIX-compliant access to storage and can be managed centrally or on each node. The diagram below shows high-level scenarios of both options.
Some pitfalls of this approach: the management overhead may not be worth it; the reliance on the network could cause corruption or outages; and not all tasks perform as well on external, network-backed POSIX file storage as they would on local, iSCSI, or Fibre Channel-based systems.
The upside is that tasks can be rescheduled on other Mesos nodes, and NFS is usually straightforward to install and configure compared to other systems such as distributed filesystems.
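The difference between the plain local filesystem and a shared filesystem can be sketched with a toy placement check. This is an illustration only, not Mesos code: `eligible_nodes` and the node names are hypothetical. It shows that when data lives on a single node, a failure of that node leaves nowhere to reschedule the task, while shared storage keeps every healthy node eligible.

```python
from typing import List, Set


def eligible_nodes(healthy_nodes: List[str], nodes_with_data: Set[str]) -> List[str]:
    """Return the healthy nodes on which the task could start and still see its data."""
    return [node for node in healthy_nodes if node in nodes_with_data]


cluster = ["node-1", "node-2", "node-3"]

# Plain local filesystem: the data lives only on node-1.
local_data = {"node-1"}

# Shared filesystem (for example an NFS export mounted on every node).
shared_data = set(cluster)

# Simulate node-1 failing, leaving node-2 and node-3 healthy.
healthy = [node for node in cluster if node != "node-1"]

print(eligible_nodes(healthy, local_data))   # []
print(eligible_nodes(healthy, shared_data))  # ['node-2', 'node-3']
```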
Persistent Volumes in Mesos specifically refer to the mechanism by which disk (storage) resources are created for tasks through Mesos.
From the perspective of Mesos, these disks are created from storage resources that already exist on the Mesos slave node and will remain on the node after the task dies. When the task exits, the storage resource can be offered back to the rest of the system so that other tasks can consume any data that was persisted there previously, meaning the data won’t be garbage collected for these tasks. Mesos makes sure that your reserved disk resources are available before a task is scheduled.
This method of using persistent volumes in Mesos is much like the local filesystem approach, except that it is the Mesos-specific way of producing such storage resources for tasks. There is one caveat: if an operator creates a Path or Mount disk from local SSD, attached storage, or distributed storage, there are benefits, but managing these systems is not automated. Disks must be attached, formatted with a filesystem, and mounted before they can be offered to Mesos. Also note that Path disks are shared, in the sense that multiple disks are created from one root path of the backing disk, whereas Mount disks have the entire disk dedicated to a task.
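As a rough illustration of how Path and Mount disks differ when they are described to Mesos, the sketch below builds the kind of JSON disk resource entries an operator might pass to a node. The field layout follows my understanding of the Mesos multiple-disk documentation and should be checked against the Mesos version you run. The helper `disk_resource`, the mount points `/mnt/data` and `/mnt/ssd1`, and the sizes are all assumptions made up for the example.

```python
import json


def disk_resource(kind: str, root: str, size_mb: int) -> dict:
    """Build one disk resource entry; kind must be "PATH" or "MOUNT"."""
    if kind not in ("PATH", "MOUNT"):
        raise ValueError("kind must be PATH or MOUNT")
    return {
        "name": "disk",
        "type": "SCALAR",
        "scalar": {"value": size_mb},
        "role": "*",
        # The source block names the disk type and the root it is backed by.
        "disk": {"source": {"type": kind, kind.lower(): {"root": root}}},
    }


resources = [
    # Path disk: several volumes can be carved out under this root.
    disk_resource("PATH", "/mnt/data", 2048),
    # Mount disk: the whole mounted device is dedicated to one task.
    disk_resource("MOUNT", "/mnt/ssd1", 10240),
]

print(json.dumps(resources, indent=2))
```

Note that the operator still has to attach, format, and mount `/mnt/ssd1` before the node starts; the JSON only describes a disk that already exists.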
Dynamic Reservations are a step above and beyond static reservations: they make it possible for frameworks to reserve storage resources after the Mesos slave process has already started.
This gives Mesos the ability to reserve specific amounts of resources for a framework and later offer them back to the same framework. This can be combined with features such as limiting frameworks to specific nodes in the cluster that have access to the data stores you would like to use for persistent services. As an example, a framework designed for deploying MySQL could use dynamically reserved storage resources and keep those storage offers for MySQL persistence services only.
Again, this approach does not have a big impact on the flexibility of the raw storage resources and is more of a Mesos-specific feature. It still has most of the pros and cons discussed in the local filesystem and persistent volumes sections, apart from the storage being dynamically reserved for framework roles.
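The idea of reserving storage for a role at runtime can be sketched with a small toy model. This is not the Mesos reservation API: the `Node` class, its methods, and the role names `mysql` and `web` are hypothetical and exist only to show how a reservation changes what each role can be offered.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Toy node tracking unreserved disk and per-role reserved disk, in MB."""
    unreserved_mb: int
    reserved_mb: Dict[str, int] = field(default_factory=dict)

    def reserve(self, role: str, size_mb: int) -> None:
        """Move disk from the unreserved pool to a role while the node is running."""
        if size_mb > self.unreserved_mb:
            raise ValueError("not enough unreserved disk")
        self.unreserved_mb -= size_mb
        self.reserved_mb[role] = self.reserved_mb.get(role, 0) + size_mb

    def offer_for(self, role: str) -> int:
        """Disk that can be offered to a role: its reservation plus the unreserved pool."""
        return self.reserved_mb.get(role, 0) + self.unreserved_mb


node = Node(unreserved_mb=10240)
node.reserve("mysql", 8192)

print(node.offer_for("mysql"))  # 10240
print(node.offer_for("web"))    # 2048
```

After the reservation, only frameworks in the `mysql` role see the reserved 8192 MB, which is the behavior the MySQL example above relies on.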
Docker Containerizer and Docker Volume Plugins
One way to gain flexibility for persistent services is to use Docker volume plugins which enable not only local filesystem-like use cases but also shared storage integrations where block devices are attached and mounted on Mesos slaves before tasks are started.
Using this approach you can get native support for drivers through Docker while being able to take advantage of existing systems such as SAN or Software Defined Storage (SDS) platforms.
Some SDS platforms may be deployed in a hyper-converged fashion where storage is pooled on the same nodes as the tasks and is replicated across the SDS solution for redundancy. This allows storage to incur less network traversal than traditional centralized SAN. There are still trade-offs for each approach and this depends on the use cases being considered.
The image below shows an SDS layer as part of multiple nodes where volumes seem “local” and are replicated across the participating Mesos slaves using the network. This also shows a SAN-based approach where volumes are connected from the external SAN box to the Mesos slave over the network.
Some positives include flexibility, because volumes are created, mounted, and made available with a filesystem dynamically when a task is launched, and fail-over and HA become possible with such architectures. Some negatives may include network overhead and some complexity in configuration.
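As a hedged sketch of what using a volume plugin looks like at the Docker level, the snippet below builds a `docker run` command line that mounts a named volume through a volume driver. It only assembles and prints the command; it does not execute Docker. The image `mysql:5.7`, the volume name `mysql-data`, and the helper `docker_run_args` are assumptions for illustration, and the driver name `flocker` assumes the Flocker plugin is installed on the node.

```python
import shlex
from typing import List


def docker_run_args(image: str, driver: str, volume: str, mount_path: str) -> List[str]:
    """Build a `docker run` argument list that mounts a named volume via a volume driver."""
    return [
        "docker", "run", "-d",
        "--volume-driver", driver,        # which plugin creates and attaches the volume
        "-v", f"{volume}:{mount_path}",   # named volume mapped to a path in the container
        image,
    ]


args = docker_run_args("mysql:5.7", "flocker", "mysql-data", "/var/lib/mysql")

# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in args))
```

Because the volume is referenced by name rather than by a host path, the plugin can attach the same volume on whichever node the task lands on.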
Docker plugins include:
- … and more.
Frameworks, Modules, and Isolators
Like the SAN and SDS use cases mentioned earlier, there are Mesos modules, frameworks, and isolators that allow you to work with volume plugins without using the Docker containerizer. This gives tasks all the benefits described above, including networked block storage, while using native Mesos without Docker.
There are a few modules and isolators that are targeted at shared storage.
The Mesos Flocker project adds cloud storage to Mesos. It enables you to start your stateful applications with the confidence of using a redundant data store.
The framework transparently handles the communication with the cloud provider using Flocker. Mesos-Flocker will ensure that your data is always available to your container, even on fail-over.
DVDI Mesos Module is a tool that contains the Docker Volume Driver Isolator Module for Mesos. Its purpose is to provide a module that lives on the Mesos slaves and enables external storage to be created, mounted, and unmounted with each task that is assigned to a slave.
It’s unique in that it allows you to target different storage platforms provided by Docker volume drivers while letting users configure volume options for tasks.
Data Storage Frameworks
There are a number of data storage frameworks that are aimed at managing services or platforms that can provide data storage for applications in Mesos. This differs from some of the above examples which actually provide the raw storage resources to applications that need them but it’s worth mentioning them in the realm of storage.
The above link includes frameworks for specific data services such as:
The list excludes Apache Cotton, previously known as Mysos, which was aimed at running MySQL instances as a framework on Mesos. It looks like it has since been retired, according to the incubator site.
Generally, Mesos has a lot to offer, and that’s a good thing. This is to be expected, given that frameworks and architectures are changing every day. When looking for the right solution, remember that the choice usually depends heavily on the needs of your application and your use cases.