Introduction to Apache Mesos
If Mesos were an airport dispatcher, runways would be compute nodes, airplanes would be compute tasks, and frameworks would be airline companies.
To understand what Mesos is in layman’s terms, imagine a busy airport. Airplanes are constantly taking off and landing. There are multiple runways, and an airport dispatcher is assigning timeslots to airplanes to land or take off.
Mesos is the airport dispatcher, runways are compute nodes, airplanes are compute tasks, and frameworks like Hadoop, Spark, and Kubernetes are the airline companies.
In technical terms, Apache Mesos is an open-source cluster manager that handles workloads efficiently in a distributed environment through dynamic resource sharing and isolation. This means that you can run any distributed application that requires clustered resources, e.g., Spark or Hadoop.
It sits between the application layer and the operating system and makes it easier to deploy and manage applications in large-scale clustered environments.
Mesos allows multiple services to scale and utilize a shared pool of servers more efficiently. The key idea behind Mesos is to turn your data center into one very large computer.
In a sense, Apache Mesos is the opposite of virtualization: in virtualization, one physical resource is divided into multiple virtual resources, while in Mesos, multiple physical resources are pooled into a single virtual resource.
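To make the "one very large computer" idea concrete, here is a minimal Python sketch of pooling per-node capacity into a single cluster-wide view. The `Node` class, the `pool` function, and the node sizes are hypothetical names and values chosen for illustration; they are not part of Mesos.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A physical machine in the cluster (hypothetical model, not a Mesos type)."""
    name: str
    cpus: int
    mem_gb: int


def pool(nodes):
    """Aggregate per-node capacity into one cluster-wide total."""
    return {
        "cpus": sum(n.cpus for n in nodes),
        "mem_gb": sum(n.mem_gb for n in nodes),
    }


# Example node sizes are made up for this sketch.
nodes = [Node("node-1", 4, 4), Node("node-2", 8, 16), Node("node-3", 2, 8)]
cluster = pool(nodes)
print(cluster)  # {'cpus': 14, 'mem_gb': 28}
```

The pooled total is a simplification: an individual task still has to fit on a single node, which is why Mesos hands out resources per agent rather than as one undivided sum.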
Who Uses It?
Prominent users of Mesos include Twitter, Airbnb, MediaCrossing, Xogito, and Categorize. Airbnb uses Mesos to manage their big data infrastructure.
Mesos leverages features of modern kernels for resource isolation, prioritization, limiting, and accounting. This is normally done by control groups (cgroups) in Linux or zones in Solaris. Mesos provides resource isolation for CPU, memory, I/O, file system, etc. It is also possible to use Linux containers, but current isolation support for Linux containers in Mesos is limited to CPU and memory.
The Architecture of Mesos
The Mesos master is the heart of the cluster. It guarantees that the cluster will be highly available. It hosts the primary user interface, which provides information about the resources available in the cluster. The master is the central source of information about all running tasks; it stores all task-related data in memory. For completed tasks, it keeps only a fixed amount of history in memory, which allows the master to serve the user interface and task data with minimal latency.
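The bounded memory for completed tasks can be pictured with a small Python sketch. It assumes a simple "keep the most recent N" policy; the `TaskHistory` class and the `max_completed` parameter are hypothetical and do not mirror the master's actual data structures.

```python
from collections import deque


class TaskHistory:
    """In-memory task store with a bounded history of completed tasks (illustrative only)."""

    def __init__(self, max_completed=3):
        self.running = {}
        # A deque with maxlen silently drops the oldest entry when it is full.
        self.completed = deque(maxlen=max_completed)

    def start(self, task_id, info):
        self.running[task_id] = info

    def finish(self, task_id, state):
        info = self.running.pop(task_id)
        self.completed.append((task_id, state, info))


history = TaskHistory(max_completed=3)
for i in range(5):
    history.start(f"task-{i}", {"cpus": 1})
    history.finish(f"task-{i}", "TASK_FINISHED")

completed_ids = [task_id for task_id, _, _ in history.completed]
print(completed_ids)  # ['task-2', 'task-3', 'task-4']
```

Capping the history keeps memory use predictable no matter how long the cluster has been running, at the cost of forgetting the oldest completed tasks.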
The Mesos agent holds and manages the container that hosts the executor (everything in Mesos runs inside a container). It manages communication between the local executor and the Mesos master, acting as an intermediary between them. The agent publishes information about the host it is running on, including data about running tasks and executors, the available resources of the host, and other metadata. It guarantees the delivery of task status updates to the schedulers.
A Mesos framework has two parts: the scheduler and the executor. The scheduler registers itself with the Mesos master and in turn receives a unique framework ID. It is the responsibility of the scheduler to launch tasks when their resource requirements and constraints match a resource offer received from the Mesos master. It is also responsible for handling task failures and errors. The executor runs the tasks launched by the scheduler and reports the status of each task back.
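The scheduler's side of this contract can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Mesos scheduler API: `Offer`, `TaskSpec`, `SketchScheduler`, and the method names are hypothetical, and task placement uses a plain first-fit rule.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_gb: float


@dataclass
class TaskSpec:
    name: str
    cpus: float
    mem_gb: float


class SketchScheduler:
    """Hypothetical scheduler: first-fit task placement plus retry on failure."""

    def __init__(self, pending):
        self.pending = list(pending)
        self.launched = {}
        self.framework_id = None

    def registered(self, framework_id):
        # The master assigns the framework ID at registration time.
        self.framework_id = framework_id

    def resource_offers(self, offer):
        """Return the pending tasks that fit into the offer, in order."""
        cpus, mem = offer.cpus, offer.mem_gb
        accepted, waiting = [], []
        for task in self.pending:
            if task.cpus <= cpus and task.mem_gb <= mem:
                cpus -= task.cpus
                mem -= task.mem_gb
                accepted.append(task)
                self.launched[task.name] = task
            else:
                waiting.append(task)
        self.pending = waiting
        return accepted

    def status_update(self, task_name, state):
        """Re-queue a task that failed so a later offer can run it again."""
        task = self.launched.pop(task_name, None)
        if task is not None and state in ("TASK_FAILED", "TASK_LOST"):
            self.pending.append(task)


scheduler = SketchScheduler([
    TaskSpec("t1", 2, 1),
    TaskSpec("t2", 1, 2),
    TaskSpec("t3", 2, 2),
])
scheduler.registered("framework-1")
accepted = scheduler.resource_offers(Offer("agent-1", cpus=4, mem_gb=4))
print([t.name for t in accepted])           # ['t1', 't2']
scheduler.status_update("t1", "TASK_FAILED")
print([t.name for t in scheduler.pending])  # ['t3', 't1']
```

Here `t3` does not fit into what is left of the offer, so it stays pending, and the failed `t1` is queued again for the next offer.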
How Mesos Works
- Agent 1 informs the master about its availability (it has 4 CPUs and 4 GB of memory available). The master then invokes the allocation policy module.
- The master sends a resource offer describing what is available on Agent 1 to Framework 1.
- The framework’s scheduler replies to the master with information about two tasks to run on the agent using <2 CPUs, 1 GB RAM> for the first task and <1 CPU, 2 GB RAM> for the second task.
- Finally, the master sends the tasks to the agent, which allocates the appropriate resources to the framework’s executor. Because 1 CPU and 1 GB of RAM on the agent are still unallocated, the master can offer them to another framework.
Features of Mesos
- It provides a web UI to monitor cluster state.
- Multi-resource scheduling.
- Fault tolerance and high availability.
- Ability to share resources across many frameworks.
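The resource offer cycle from "How Mesos Works", and the way leftover resources can be shared with a second framework, can be traced with a small Python simulation. It is a sketch only: the `Master` class and its methods are hypothetical stand-ins, and the numbers are the ones used in the walkthrough.

```python
class Master:
    """Toy master that tracks free resources per agent (not the real Mesos master)."""

    def __init__(self):
        self.available = {}

    def agent_reports(self, agent, cpus, mem_gb):
        # Step 1: the agent tells the master what it has free.
        self.available[agent] = {"cpus": cpus, "mem_gb": mem_gb}

    def make_offer(self, agent):
        # Step 2: the master describes the free resources to a framework.
        return dict(self.available[agent])

    def launch(self, agent, tasks):
        # Steps 3 and 4: the framework replies with tasks; the master
        # checks that they fit and deducts them from the agent's free resources.
        free = self.available[agent]
        need_cpus = sum(t["cpus"] for t in tasks)
        need_mem = sum(t["mem_gb"] for t in tasks)
        if need_cpus > free["cpus"] or need_mem > free["mem_gb"]:
            raise ValueError("tasks exceed the offered resources")
        free["cpus"] -= need_cpus
        free["mem_gb"] -= need_mem
        return dict(free)


master = Master()
master.agent_reports("agent-1", cpus=4, mem_gb=4)

offer_1 = master.make_offer("agent-1")   # offered to Framework 1
framework_1_tasks = [
    {"cpus": 2, "mem_gb": 1},
    {"cpus": 1, "mem_gb": 2},
]
leftover = master.launch("agent-1", framework_1_tasks)

offer_2 = master.make_offer("agent-1")   # the remainder can go to Framework 2
print(offer_1)  # {'cpus': 4, 'mem_gb': 4}
print(offer_2)  # {'cpus': 1, 'mem_gb': 1}
```

Framework 1 uses 3 CPUs and 3 GB in total, so 1 CPU and 1 GB remain on the agent and can be offered to another framework.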
Mesos vs. YARN
Both systems have the same goal: to allow you to share a large cluster of machines between different frameworks.
- Mesos schedules both memory and CPU, whereas YARN by default schedules on memory only (e.g., you request x containers of y MB each).
- Mesos uses Linux control groups (cgroups) and YARN uses simple Unix processes.
- The Mesos authentication module uses the Cyrus SASL library. SASL is a flexible framework that allows two endpoints to authenticate with each other using a variety of methods. By default, Mesos uses CRAM-MD5 authentication and YARN uses Kerberos as its authentication and authorization mechanism. Security features of Hadoop consist of authentication, service-level authorization, authentication for web consoles, and data confidentiality.
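To show what a CRAM-MD5 exchange involves, here is a minimal Python sketch of the challenge-response mechanism using only the standard library. It illustrates the mechanism itself, not how Mesos or Cyrus SASL implement it; the function names, the principal `framework-1`, the secret, and the hostname are all made up for this example.

```python
import hashlib
import hmac
import os
import time


def make_challenge(hostname="master.example"):
    """Server side: build a unique, one-time challenge string."""
    nonce = os.urandom(8).hex()
    return f"<{nonce}.{int(time.time())}@{hostname}>".encode()


def client_response(principal, secret, challenge):
    """Client side: HMAC-MD5 over the challenge, keyed with the shared secret."""
    digest = hmac.new(secret, challenge, hashlib.md5).hexdigest()
    return f"{principal} {digest}"


def server_verify(response, secrets, challenge):
    """Server side: recompute the digest and compare in constant time."""
    principal, _, digest = response.rpartition(" ")
    secret = secrets.get(principal)
    if secret is None:
        return False
    expected = hmac.new(secret, challenge, hashlib.md5).hexdigest()
    return hmac.compare_digest(digest, expected)


# Hypothetical credentials known to the server.
secrets = {"framework-1": b"s3cret"}
challenge = make_challenge()

good = server_verify(client_response("framework-1", b"s3cret", challenge), secrets, challenge)
bad = server_verify(client_response("framework-1", b"wrong", challenge), secrets, challenge)
print(good, bad)  # True False
```

The shared secret never crosses the wire; only the keyed digest of a one-time challenge does, which is the main property this mechanism provides.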
The aim of this article was to introduce you to Mesos: what it is and how it differs from YARN. In the next article, we will explore Mesos further. So, stay tuned! Please feel free to make suggestions or comment!
Published at DZone with permission of Mahesh Chand, DZone MVB. See the original article here.