At Alpine, we continue to deliver new enterprise analytic features within Chorus. With Chorus 6.1, we launched the ability to deliver sophisticated auto-tuning for Spark jobs. Chorus automatically determines the settings for a Spark application including the size and number of the workers and the size of the driver by using information on the size of the data being analyzed, the analytical operations being used in the flow, and the size of the cluster.
In order to do this, we are using Spark with Apache YARN as the cluster manager, which also supports several different versions of Hadoop. To design a program that could configure a Spark job on an unknown cluster, I first needed to query the YARN API to determine the resources available and the size of the cluster before launching each job.
Unexpectedly, this was just as difficult as developing an algorithm to determine the Spark settings. The following are some of the lessons I learned in building the application and how to get the relevant information from the YARN API.
In this three-part post, I will discuss how to find each of the pieces of information about the cluster that our algorithm depended on with focus on:
- The size of a YARN container (memory).
- The number of cores available to each container.
- The total resources available to the job, which requires knowing:
- Which queue the job will be submitted in.
- Which type of scheduler is being used.
- How many resources are available in either the capacity or fair scheduler situations.
What Is the Maximum Size of a YARN Container?
A YARN container corresponds to the size of one YARN request. The Spark JVM is launched as one such request, so no single Spark JVM can be bigger than the size of a YARN container. Spark will fail to submit a job if the value for the Spark memory requested (spark.executor.memory conf value) plus overhead or the driver memory request (spark.driver.memory) plus overhead is larger than the YARN container. I will discuss calculating memory overhead in the final section of this post.
But how do we determine the size of a container if our application is running in an unknown environment? The error message that Spark throws when more executor or driver memory is requested then fits (with overhead) in the YARN container suggests checking the value of yarn.scheduler.maximum-allocation-mb and/or yarn.nodemanager.resource.memory-mb in the Spark configuration (see the origin of this messaging here, in Line 331).
yarn.scheduler.maximum-allocation-mb is a value in the YARN configuration that represents the largest size of a single request (i.e., a container). It is dictated by the system administrator and not by an inherent limitation of the cluster. The yarn.nodemanager.resource.memory-mb value is the resources allocated for each node. On a correctly configured cluster, the yarn.scheduler.maximum-allocation-mb value should be less than or equal to the node manager value because containers should be no larger than a single node. According to my understanding, the best practice is that the values should be equal since there is rarely enough reason to arbitrarily limit requests to anything smaller than the size of a single node. In some instances, however, I have run into clusters that were configured with a larger container than node size. In this case, we can’t submit jobs that are larger than the node size.
Therefore, I assumed that the size of a container should be the minimum of these two values, as the Spark message suggests. However, this is not always the case. I noticed that when I was submitting a job from within a queue, this check was insufficient to prevent Spark from showing this error. Puzzlingly, the size of the container reported by Spark did not correspond to either of the two configuration values for that cluster. After reading the Spark source code, I realized that Spark actually determines the size of the yarn container by querying the New Application API (i.e., at the URL /ws/v1/cluster/apps/new-application). This API determines what resource would be available for a new application. If this is queried by a user in a queue, the value will reflect size of the container for that user, and this value may be less than the size of a container for someone running in the root queue. The New Application API includes a maximum-resource-capability object that has fields for vCores and memory”, which represent the number of virtual cores that may be requested by a single container and the amount of memory (in MB) available respectively.
To summarize, the actual size of a YARN container for an application submitted by a given user can be determined by a call to the New Application API. The JSON object is keyed by maximum-resource-capability which contains a field for memory and vcores.
The size of the YARN container in memory and cores API call: http://<rm http address:port>/ws/v1/newApplication Path to JSON maximum-resource-capability
Note:: The YARN documentation lists this object as being keyed with “maximum-resource-capabilities.” Empirically I saw the object returned with the key “maximum-resource-capability.”
How Many vCores Does Each Container Have?
Unlike the executor memory, Spark does not explicitly check that the executors requested do not ask for too many vCores. Thus, if you request a job that sets the value of spark.executor.cores” higher than the vCores available in the maximum-resource-capability object, the submission will either fail at the YARN level, with a message about not having enough cores or in some instances, causing the job to hang in an unassigned state without being submitted. It is, therefore, best to use the above API call to determine the number of cores to use before submitting.
In conclusion, we’ve established the maximum size of a YARN container and how many vCores each container can have. Next week, I will walk through how many total resources are available for a job in addition to how to find your queue. Stay tuned for more information!