How to Use the YARN API to Determine Resources Available for Spark Application Submission (Part 3)

In this final installment of the series, check out several other features of the YARN API, such as the capacity scheduler and calculating overhead.

By Rachel Warren · Nov. 11, 16 · Tutorial

Welcome to the final section of this introduction to the YARN API.

Capacity Scheduler

The capacity scheduler is fairly straightforward: it assigns each user a percentage of the parent queue that they are allowed to use. The queue object corresponding to your user will have an "absoluteCapacity" field containing a double. This is the percentage of the entire cluster (both CPU and memory) that is available for jobs submitted by your user.

Thus, you need to multiply this percentage by the total resources available on the cluster. These metrics can be retrieved from the "clusterMetrics" API, which contains both an "availableVirtualCores" and an "availableMB" field.

Finding total memory and total vCores available on the cluster

API Call: http://<rm http address:port>/ws/v1/cluster/metrics
Path to JSON: clusterMetrics → availableVirtualCores
              clusterMetrics → availableMB

(total_vcores, total_memory) = http://<rm http address:port>/ws/v1/cluster/metrics → "clusterMetrics" → {"availableVirtualCores", "availableMB"}
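To make this concrete, here is a minimal sketch in Python that pulls both values in one call. It assumes the requests library is available and uses the same placeholder ResourceManager address as above; neither is part of the original article.

import requests

RM = "http://<rm http address:port>"  # placeholder: your ResourceManager's HTTP address

def cluster_metrics(rm=RM):
    # Returns (availableVirtualCores, availableMB) from the cluster metrics API.
    metrics = requests.get(rm + "/ws/v1/cluster/metrics").json()["clusterMetrics"]
    return metrics["availableVirtualCores"], metrics["availableMB"]

total_vcores, total_memory_mb = cluster_metrics()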

Finding the percent of resources available to your user if the queue type is capacity scheduler

API Call: http://<rm http address:port>/ws/v1/cluster/scheduler
Path to JSON: scheduler → schedulerInfo → rootQueue → queues → (find your queue) → myUserName → absoluteCapacity

Then we can calculate the cores and memory available for our job with the following equations:

Available vCores for Spark application = absoluteCapacity x availableVirtualCores

Available memory (MB) for Spark application = absoluteCapacity x availableMB
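As a rough illustration, the lookup and the two equations above might be strung together like this (continuing the Python sketch; the recursive queue search and the "myUserName" queue name are assumptions rather than something the article specifies, and absoluteCapacity may be reported on a 0-100 scale, in which case divide it by 100 first):

def find_queue(node, queue_name):
    # Recursively search the scheduler's queue tree for a queue with the given name.
    if node.get("queueName") == queue_name:
        return node
    for child in node.get("queues", {}).get("queue", []):
        found = find_queue(child, queue_name)
        if found is not None:
            return found
    return None

scheduler_info = requests.get(RM + "/ws/v1/cluster/scheduler").json()["scheduler"]["schedulerInfo"]
my_queue = find_queue(scheduler_info, "myUserName")  # hypothetical queue name
capacity = my_queue["absoluteCapacity"]              # divide by 100 if reported on a 0-100 scale

available_vcores = capacity * total_vcores
available_memory_mb = capacity * total_memory_mb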

Fair Scheduler

The fair scheduler is a little more complicated. Essentially, resources are divided equally among the active users on the cluster. If the scheduler type is the fairScheduler, then to get the resources available to our queue we have to query the queue object for the "maxResources" object, which contains a "vCores" and a "memory" field. These values represent the resources available on each node. Thus, to get the total resources available in our queue, we need to multiply each of these numbers by the number of active nodes on the cluster (which can be retrieved from the cluster metrics API).

Finding the active nodes on the cluster

API Call: http://<rm http address:port>/ws/v1/cluster/metrics
Path to JSON: clusterMetrics → activeNodes

Finding the resources available to your user if the queue type is fair scheduler

API Call: http://<rm http address:port>/ws/v1/cluster/scheduler
Path to JSON: scheduler → schedulerInfo → rootQueue → queues → (find your queue) → myUserName → maxResources → {vCores, memory}

Then we can calculate the cores and memory available to our Spark application with the following equations:

Available vCores = activeNodes x vCores per node (see above)

Available memory (MB) = activeNodes x memory per node (see above)
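The fair scheduler case can be sketched the same way (again assuming the requests library and the RM placeholder from above; the exact shape of the fair scheduler queue tree varies between YARN versions, so treat the walk below as an assumption to adapt, not as the article's code):

active_nodes = requests.get(RM + "/ws/v1/cluster/metrics").json()["clusterMetrics"]["activeNodes"]
root_queue = requests.get(RM + "/ws/v1/cluster/scheduler").json()["scheduler"]["schedulerInfo"]["rootQueue"]

def find_fair_queue(node, queue_name):
    # Walk the fair scheduler queue tree looking for our queue.
    if node.get("queueName", "").endswith(queue_name):
        return node
    children = node.get("childQueues", [])
    if isinstance(children, dict):  # some versions wrap the list as {"queue": [...]}
        children = children.get("queue", [])
    for child in children:
        found = find_fair_queue(child, queue_name)
        if found is not None:
            return found
    return None

my_queue = find_fair_queue(root_queue, "myUserName")  # hypothetical queue name
vcores_per_node = my_queue["maxResources"]["vCores"]
memory_per_node = my_queue["maxResources"]["memory"]

available_vcores = active_nodes * vcores_per_node
available_memory_mb = active_nodes * memory_per_node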

How To Calculate Memory Overhead

This is different depending on whether you are running in yarn-client or yarn-cluster mode. In yarn-cluster mode, both the executor memory overhead and the driver memory overhead can be set manually in the conf with the "spark.yarn.executor.memoryOverhead" and "spark.yarn.driver.memoryOverhead" values, respectively. If these values are unset, the memory overhead can be calculated with the following equation:

memory overhead = max(MEMORY_OVERHEAD_FACTOR x requested memory, MEMORY_OVERHEAD_MINIMUM)

where MEMORY_OVERHEAD_FACTOR = 0.10 and MEMORY_OVERHEAD_MINIMUM = 384 (MB).
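In code, the overhead calculation is a one-liner (values in MB, using the constants quoted above):

MEMORY_OVERHEAD_FACTOR = 0.10
MEMORY_OVERHEAD_MINIMUM = 384  # MB

def memory_overhead_mb(requested_memory_mb):
    # Overhead YARN adds on top of the requested executor/driver memory.
    return max(MEMORY_OVERHEAD_FACTOR * requested_memory_mb, MEMORY_OVERHEAD_MINIMUM)

# For example, a 4 GB (4096 MB) executor request carries max(409.6, 384) = 409.6 MB of overhead.
print(memory_overhead_mb(4096))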

That does it for our three-part series on using the YARN API within Chorus. We're constantly working to deliver new enterprise functionality, so check back with our blog to stay up to date and informed on the latest features.


Published at DZone with permission of Rachel Warren, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
