This article is featured in the DZone Guide to Big Data, Business Intelligence, and Analytics – 2015 Edition. Get your free copy for more insightful articles, industry statistics, and more.
Workload management on distributed systems tends to be an afterthought in implementation. Anyone who has used a compute cluster for a job dependent function quickly discovers that placement of work and its prioritization are paramount not only in daily operations but also individual success. Furthermore, many organizations quickly find out that even with robust job placement policies to allow for dynamic resource sharing, multiple clusters become a requirement for individual lines of business based upon their requirements. As these clusters grow, so does so-called data siloing. Over time, the amount of company computer power acquired could eclipse any reasonable argument for individual systems if used in a programmatically shareable way.
A properly shared technology allows for greater utilization across all business units, allowing for some organizations to consume beyond their reasonable share during global lulls in workloads across the organization. The unique requirements of people, business units and business as a whole make the development of such global sharing difficult—if not impossible. This brings the need for workload and resource management into sharp relief. This article will describe the latest advances in Big Data based workload and resource management.
Much has been written about YARN, and it is well described in many places. For context, I offer a high level overview: YARN is essentially a container system and scheduler designed primarily for use with a Hadoop based cluster. The containers in YARN are capable of running many types of tasks including MPI, web servers, and virtually anything else one would like. YARN containers might be seen as difficult to write, giving rise to other projects like what was once called HOYA (Hbase on YARN, which was eventually coined Apache Slider) as an attempt at providing a generic implementation of easy-to-use YARN containers. This allowed for a much easier entry point for those wishing to use YARN over a distributed file system. This has been used mainly for longer running services like HBase and Accumulo, but will likely support other services as it moves out of incubator status.
Figure 1: YARN Architecture—Basic Structure of the YARN Job Submission Process
YARN, then, was expected to be a cornerstone of Hadoop as an operating system for data. This would include HDFS as the storage layer along with YARN scheduling processes for a wider variety of applications well beyond MapReduce. The implementation, while novel and liberating, leaves room for improvement in several ways. In order to build a novel distributed computer one needs more than just a CPU/scheduler and data storage. This process in many ways mirrors the development of an operating system. In Linux it requires security, distributed configuration management, and workload management, all done with the intention of harnessing the power of multiple hardware systems into a single virtual pool of resources. As luck would have it, the Open Source community offers the benefit of multiple solutions.
Enter Apache Mesos. While some have described Mesos as a “meta scheduler,” or a scheduler of schedulers, its creators have more aptly called Mesos a “distributed systems kernel.” Mesos essentially uses a container architecture but is abstracted enough to allow seamless execution of multiple, sometimes identical, distributed systems on the same architecture, minus the resource overhead of virtualization systems. This includes appropriate resource isolation while still allowing for data locality needed for frameworks like MapReduce.
As one might expect, Mesos uses a very familiar nomenclature and includes container-based architecture. It should be noted that in function, Mesos assigns jobs very differently. Mesos is considered a two level scheduler and is described by its creators as a distributed kernel used to abstract system resources. Mesos itself is really an additional layer of scheduling on top of application frameworks that each bring their own brand of scheduling. Application schedulers interface with a Mesos master setup in a familiar Zookeeper-coordinated active-passive architecture, which passes jobs down to compute slaves to run the application of choice.
Figure 2: Mesos Architecture—High-level Representation of Mesos Daemons Running Both Hadoop- and MPI-based Jobs
Mesos is written in C, not Java, and includes support for Docker along with other frameworks. Mesos, then, is the core of the Mesos Data Center Operating System, or DCOS, as it was coined by Mesosphere.
Figure 3: Mesosphere DCOS Architecture—Representation of the DCOS From Mesosphere
This Operating System includes other handy components such as Marathon and Chronos. Marathon provides cluster-wide “init” capabilities for application in containers like Docker or cgroups. This allows one to programmatically automate the launching of large cluster-based applications. Chronos acts as a Mesos API for longer-running batch type jobs while the core Mesos SDK provides an entry point for other applications like Hadoop and Spark.
Figure 4: Application Definition for Running Docker Containers via Marathon
In many of these architectures there still can exist hard partitioning of resources even though scheduling may be centralized. The true goal is a full shared, generic and reusable on demand distributed architecture. Announced during the authoring of this article is a new offering from Mesosphere using DCOS called Infinity, which is used to package and integrate the deployment of clusters in just such a way. Out of the box it will include Cassandra, Kafka, Spark, and Akka. This is current available as an early access project.
In order to directly integrate YARN with Mesos, the Apache Myriad project was formed. The initial goals include making the execution of YARN work on Mesos scheduled systems transparent, multi-tenant, and smoothly managed. The high-level architecture promises to allow Mesos to centrally schedule YARN work via a Mesos based framework, including a REST API for scaling up or down. The system also includes a Mesos executor for launching the node manager as shown in the following diagram.
Figure 5: Mesos/YARN Interaction—Yarn Container Launching via Mesos Management
From a high level, this type of architectural abstraction might seem like common sense. It should be noted, however, that it has taken some time to evolve and mature each of these systems, which in themselves are fields of study worthy of extensive analysis. This new Myriad (Mesos-based) architecture allows for multiple benefits over YARN-based resource management. It makes resource management generic and, therefore, the use of the overall system more flexible. This includes running multiple versions of Hadoop and other applications using the same hardware as well as operational flexibility for sizing. It also allows for use cases such as the same hardware for development, testing, and production. Ultimately, this type of flexibility has the greatest promise to fulfill the ever-changing needs of the modern data driven architecture.
Figure 6: Myriad Architecture—Multi-tenant Mesos Architecture Including YARN-based Hadoop Workloads
A generic, multi-use architecture, encompassing a distributed persistent layer along with the ability to engage in use cases from batch to streaming (real time/near real time) scheduled dynamically yet discretely over a commodity compute infrastructure seems to be the holy grail of many a project today. Mesos and Myriad seem to be two projects well on their way to fulfilling that dream. There is still a great deal of work to be done, including use cases of jobs that span geography along with associated challenges such as cross geography replication and high availability.
The discussion of high-level architectures really starts to bring into play the concept’s overall solution architectures. Candidates include the Lambda and its more modern successor the Zeta Architecture. Generic components like a distributed persistence layer, containerization, and the handling of both solution and enterprise architecture are hallmarks of these advanced architectures. The question of how to best use resources in terms of first principal componentry is being formed and reformed by a daily onslaught of new technology. What is the best storage layer tech? What is the best tech for streaming applications? All of which are commonly asked and hotly debated. This author would argue that “best” in this case is the tool or technology that fits the selection criteria specific to your use case. Sometimes there is such a thing as “good enough” when engineering a solution for any problem. Not all the technologies described above are needed by every organization, but building upon a solid foundational framework that is dynamic and pluggable in its higher layers will be the solution that eventually wins the day.
For more insights on workload and resource management, real-time reporting, and data analytics, get your free copy of the Guide to Big Data, Business Intelligence, and Analytics – 2015 Edition!