DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

SBOMs are essential to circumventing software supply chain attacks, and they provide visibility into various software components.

Related

  • Anypoint Runtime Fabric Basic vs. Crucial Points for Planning Resources
  • Why Mocking Sucks
  • How to Enhance the Performance of .NET Core Applications for Large Responses
  • Automatic 1111: Adding Custom APIs

Trending

  • Decoding the Secret Language of LLM Tokenizers
  • Secret Recipe of the Template Method: Po Learns the Art of Structured Cooking
  • Docker Model Runner: A Game Changer in Local AI Development (C# Developer Perspective)
  • When Incentives Sabotage Product Strategy
  1. DZone
  2. Software Design and Architecture
  3. Cloud Architecture
  4. How to Use the YARN API to Determine Resources Available for Spark Application Submission (Part 3)

How to Use the YARN API to Determine Resources Available for Spark Application Submission (Part 3)

In this final installment of the series, check out several other features of the YARN API, such as the capacity scheduler and calculating overhead.

By 
Rachel Warren user avatar
Rachel Warren
·
Nov. 11, 16 · Tutorial
Likes (1)
Comment
Save
Tweet
Share
10.3K Views

Join the DZone community and get the full member experience.

Join For Free

Welcome to the final section of this introduction to the YARN API.

Capacity Scheduler

The capacity scheduler is pretty straight forward. It assigns each user a percentage of the parent queue that they are allowed to use. The queue object corresponding to your user, will have an “absoluteCapacity” field containing a double. This is the percentage of the entire cluster (both cpu and memory) that is available for jobs submitted by your user.

Thus you need to multiply this percentage, by the total resource available on the cluster. These metrics can be retrieved from the “clusterMetrics” api which contains both an “availableVirtualCores” and an “availableMB” field.

Finding total memory and total vCores available on the cluster
API Call: http://<rm http address:port>/ws/v1/cluster/metrics
Path to Json clusterMetrics→ availableVirtualCores
→ availableMB

(total_vcores,total_memory) =http://<rm http address:port>/ws/v1/cluster/metrics -> “clusterMetrics” -> {“availableVirtualCores”, “availableMB”}

Finding percent of resources available to your user if the queue type is capacity scheduler
API Call: http://<rm http address:port>/ws/v1/cluster/scheduler
Path to Json scheduler→ schedulerInfo
→ rootQueue
→ queues
….find your queue
→ myUserName
→ absoluteCapacity

Then we can calculate the cores and memory available  for our job with the following  equations

Available vcores for Spark Application  = absoluteCapacity x availableVirtualCores

Available memory (mb) for Spark Application  = absoluteCapacity x availableMB

Fair Scheduler

The fair scheduler is a little more complicated. Essentially resources are divided equally among other active users on the cluster. If the scheduler type is the fairScheduler, than to get the resource available to our queue we have to query the queue object for the “maxResources” object, which contains a “vCores” and “memory” field. These values represent the resource available on each node. Thus, to get the total resource available in our queue we need to multiply each of these numbers by the number of active nodes on the cluster (which can be retrieved from the cluster metrics API.

Finding the active nodes on the cluster
API Call: http://<rm http address:port>/ws/v1/cluster/metrics
Path to Json clusterMetrics→ activeNodes
Finding percent of resources available to your user if the queue type is fair scheduler
API Call: http://<rm http address:port>/ws/v1/cluster/scheduler
Path to Json scheduler→ schedulerInfo
→ rootQueue
→ queues
….find your queue
→ myUserName
→ maxResources
→ vCores
→ memory

Then we can calculate the cores and memory available to our Spark Application with the following equations

Available v-cores = activeNodes  x  v cores per node (see above)

Available memory = activeNodes x  memory per node (see above)

How To Calculate Memory Overhead

This is different depending on whether you are running with yarn client or yarn cluster mode. In yarn cluster mode, both the executor memory overhead and driver memory overhead can be set manually in the conf with the “spark.yarn.executor.memoryOverhead” and “spark.yarn.driver.memoryOverhead” values respectively. If these values are unset the memory can be calculated with the following equation:

memory overhead  =  Max(MEMORY_OVERHEAD_FACTOR *requested memory, MEMORY_OVERHEAD_MINIMUM).

Where MEMORY_OVERHEAD_FACTOR = 0.10 and

MEMORY_OVERHEAD_MINIMUM = 384

That does it for our three-part series on using the YARN API within Chorus. We’re constantly working to deliver new enterprise functionality so check back with our blog to stay up to date and informed on the latest features.

API application Memory (storage engine) cluster

Published at DZone with permission of Rachel Warren, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Anypoint Runtime Fabric Basic vs. Crucial Points for Planning Resources
  • Why Mocking Sucks
  • How to Enhance the Performance of .NET Core Applications for Large Responses
  • Automatic 1111: Adding Custom APIs

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • [email protected]

Let's be friends: