There is a long list of items that can be tuned in Hadoop, but understanding how each daemon uses memory is fundamental to effective tuning. Daemons launch JVMs (Java Virtual Machines) that consume memory and, in turn, launch further JVMs that consume memory of their own. With all the moving parts Hadoop presents, however, it's not always easy to profile memory use across the cluster.
It’s important to leave memory for the OS so the node is not constantly paging to swap space (pagefile). The method for changing these values varies by distribution even though the parameter names may be identical; each distribution typically exposes them through its own management user interface.
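One common OS-level adjustment on Linux worker nodes is lowering the kernel's swappiness so the node prefers reclaiming page cache over swapping out daemon heaps. A minimal sketch; the value 10 is illustrative, not a recommendation for every workload:

```
# /etc/sysctl.conf -- illustrative value; tune for your workload
vm.swappiness = 10
```

Apply it with `sysctl -p` (or your distribution's equivalent) so running daemons see the change without a reboot.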
The job execution system in Hadoop is called YARN. It is a container-based system that makes launching work on a Hadoop cluster a generic scheduling process: YARN orchestrates the flow of jobs by placing containers, generic units of work, on nodes for execution. This was the major change from Hadoop 1 to Hadoop 2, and it opened the world of Hadoop to many more programming paradigms than just MapReduce.
There are three basic areas to consider when tuning:
- Tune the YARN system (or Hadoop itself).
- Tune the containers launched by YARN.
- Tune the applications that run within those containers.
Each of these areas controls not only the speed with which any single job runs but, more importantly, the overall cluster utilization.
Tuning Daemons (Hosts)
Leaving master nodes aside, the tuning of data or execution nodes is a pretty straightforward business: you are essentially dividing the physical and virtual memory of the node among the processes running on it. For the most part, this is simple math. The node manager consumes a configurable amount of RAM as part of the YARN system; most distros set this quite low by default. At the extreme end of the spectrum, it's instructive to look at the settings used in a sandbox, which are typically configured to produce a functioning Hadoop cluster without being highly parallel.

On any single execution node, a Node Manager brokers resources with the master service, known as the Resource Manager, for the placement of containers: units of work placed on execution nodes. The first container allocated, called the application master (aka container zero), is responsible for administering the containers that execute the actual work. You can tune both the memory used by the YARN daemons that operate the system and the memory of the containers they launch. Container memory is the setting administrators most often adjust, because it can directly determine the success or failure of a job.
Node managers are controlled by YARN properties beginning with “yarn.nodemanager” inside yarn-default.xml and yarn-site.xml. These include a mix of settings such as log locations, timeouts, and the webapp address for the Node Manager itself. This group also sets the total amount of memory and number of CPUs the node manager can allocate on a single node. These totals are the pool of resources available prior to any container allocation; as work proceeds, each container subtracts its own usage from this amount. The next thing to consider is what size of containers will be allowed on each node.
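As a sketch, the node-level totals might look like this in yarn-site.xml. The figures (57344 MB and 14 vcores, imagining a 64 GB, 16-core node with headroom reserved for the OS and daemons) are illustrative assumptions, not recommendations:

```xml
<!-- yarn-site.xml: node-level resource totals (example values) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>57344</value> <!-- total RAM the node manager may hand out to containers -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>14</value> <!-- total vcores available for containers on this node -->
</property>
```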
Tuning Containers (YARN)
There is a mix of YARN properties specific to the resources allocated per container under the control of the node manager. Most of these also start with “yarn.nodemanager”, though the per-container minimum and maximum allocations live under “yarn.scheduler”. The properties important to container control are summarized below.
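A sketch of the most commonly adjusted container-sizing properties in yarn-site.xml; the values shown are illustrative examples, not recommendations:

```xml
<!-- yarn-site.xml: per-container limits (example values) -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value> <!-- smallest container the Resource Manager will grant -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value> <!-- largest single container request allowed -->
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value> <!-- virtual memory allowed per unit of physical memory -->
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>true</value> <!-- kill containers that exceed their physical memory limit -->
</property>
```

Requests between the minimum and maximum are rounded up to a multiple of the minimum allocation, which is why the minimum effectively sets the granularity of container sizing.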
Tuning Applications (MapReduce or Spark)
The application running inside the YARN container also exposes memory settings of its own. The Resource Manager (RM) launches an application master (container 0), which in turn launches the working containers (containers 1-N). It should be noted there is also the concept of an “uber job,” in which the work runs inside container 0 itself and never launches separate containers; this is used for small jobs by the MapReduce Application Master. The scheduling system inside the RM is called a “pure scheduler” because it is not responsible for monitoring or administering jobs after placement. While it isn't the intention of this article to go deep into application tuning, application settings are a critical part of the overall tuning picture: MapReduce and Spark each have their own set of properties for the application AND the application master used to launch jobs within YARN.
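As a sketch, the MapReduce side of this might look like the following in mapred-site.xml. The values are illustrative; the usual practice is to keep the JVM heap in the `*.java.opts` settings comfortably below the container size so there is headroom for off-heap use:

```xml
<!-- mapred-site.xml: application and application master memory (example values) -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2048</value> <!-- container size for the MR application master -->
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value> <!-- container size requested per map task -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value> <!-- map JVM heap, roughly 80% of the container -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value> <!-- container size requested per reduce task -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276m</value> <!-- reduce JVM heap, roughly 80% of the container -->
</property>
```

Spark on YARN has an analogous split (e.g. `spark.executor.memory` plus a memory overhead allowance), with its own property names.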
- Hortonworks has a script and manual calculation examples for additional help.
- If you use Ambari to manage your settings, stick with it; don’t mix command-line changes with the Ambari UI.
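The Hortonworks-style manual calculation can be sketched in Python. The formula (number of containers bounded by 2× cores, 1.8× disks, and available RAM over the minimum container size) follows their published guidance; the example node specs below are assumptions for illustration, not recommendations:

```python
def yarn_memory_settings(cores, disks, total_ram_mb, reserved_mb, min_container_mb):
    """Sketch of the Hortonworks-style manual calculation for YARN memory."""
    available_mb = total_ram_mb - reserved_mb  # RAM left after OS/daemon reservation
    # Container count is bounded by CPU, spindle count, and available RAM
    containers = int(min(2 * cores, 1.8 * disks, available_mb / min_container_mb))
    ram_per_container_mb = max(min_container_mb, available_mb // containers)
    return {
        "yarn.nodemanager.resource.memory-mb": containers * ram_per_container_mb,
        "yarn.scheduler.minimum-allocation-mb": ram_per_container_mb,
        "yarn.scheduler.maximum-allocation-mb": containers * ram_per_container_mb,
    }

# Hypothetical node: 16 cores, 8 disks, 64 GB RAM, 8 GB reserved for the OS
print(yarn_memory_settings(16, 8, 65536, 8192, 2048))
```

Running this for the hypothetical node yields 14 containers of 4096 MB each; adjust the inputs to match your own hardware.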