Apache Spark Internals: As Easy as Baking a Pizza!
This quick (and hopefully easy) read is a fun take on how you can draw parallels between baking a pizza and running an Apache Spark application!
Even if you aren’t into pizzas, I am sure you have a vague idea of what goes into making them.
Okay! So here we go. Let me introduce Dug. Making “DAGs” is his specialty; he is good with schedules and with coming up with a good overall plan for any “job” given to him. He is in the fresh pizza-making business with two other partners: Tessy, who is quite the Task Master, and Buster, who is a good “Cluster Manager”.
DAG Scheduler (Planning)
When a pizza job order comes in, Dug, as the planner in charge, lays out a set of predefined stages and the operations that go into each stage of making the pizza. For a very simple pizza, he will come up with a DAG (Directed Acyclic Graph; just read that aloud again, it is quite an intuitive name and graph, isn’t it?)
So Dug has worked out an optimized set of stages for making the pizza, which covers:
- What operations should go into a pipeline (Stages 1 and 2 each have a set of sequenced, pipelined operations)?
- What can go in parallel (Stage 0, Stage 1, Stage 2)?
- What sequencing is involved; for example, Stages 3 and 4 cannot commence until the previous stages are done.
But hold on, remember he is the planner in charge, so he has the additional responsibility of creating the specific tasks associated with this job. For example:
- If the job order was for a small pizza, he would perhaps determine that this job can be done by 5 people; so he comes up with 5 tasks (1 person each for Stages 0 through 4).
- If it is a large pizza, he probably comes up with 8 tasks (2 people for Stage 0 + 2 people for Stage 1 + 2 people for Stage 2 + 1 person each for Stages 3 and 4).
So essentially, Dug is considering each unit of work as a task; the bigger the pizza, the larger the number of tasks to be done! And so Dug comes up with a set of tasks that can then be handed over to Tessy, the Task Scheduler.
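Dug’s planning can be sketched in plain Python (no Spark APIs here, just a toy model): represent the stage dependencies as a graph and let a topological sort produce a valid execution order, the same kind of ordering the DAG scheduler works out at a much grander scale.

```python
# A toy model of what the DAG scheduler does: given stage dependencies,
# compute an order in which stages can legally run. The stage names and
# dependencies mirror the pizza example above; none of this is Spark API.
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on:
# Stage 3 needs Stages 0-2 done, Stage 4 needs Stage 3 done.
pizza_dag = {
    "Stage 0 (base)": set(),
    "Stage 1 (sauce)": set(),
    "Stage 2 (toppings)": set(),
    "Stage 3 (assemble)": {"Stage 0 (base)", "Stage 1 (sauce)", "Stage 2 (toppings)"},
    "Stage 4 (bake)": {"Stage 3 (assemble)"},
}

order = list(TopologicalSorter(pizza_dag).static_order())
print(order)  # Stages 0-2 may appear in any order, but Stage 3 precedes Stage 4
```

Stages 0, 1, and 2 have no mutual dependencies, so any relative order (or running them in parallel) is fine; the sort only guarantees that assembling waits for all three and baking waits for assembly.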
Task Scheduler and Cluster Manager (Execution)
Now that the exact plan is laid out, the actual execution of these tasks comes into the picture. Picture the worker rooms of the pizza shop, with executors coming in when summoned.
In the pizza shop, assume you have a pool of worker rooms; each room can hold a limited number of pizza shop chefs, depending on the room’s capacity (let us call the chefs executors, as they execute the tasks given to them). Only when a task needs to be executed is a chef called into the room to execute it.
- Executor 1 will probably make the pizza base.
- Executor 2 will make the sauce.
- Executor 3 will make the toppings.
- Executor 4 will come in at the right time to layer the pizza, and finally,
- Executor 5 will be called in to put the pizza in the oven.
In short, when it finally comes to execution, Tessy the Task Scheduler takes the set of tasks. She takes the help of Buster the cluster manager and they work together. Tessy decides what task gets scheduled when. And Buster decides when an “executor” should be summoned in which “worker” room. To reiterate, Tessy only knows when to hand out the tasks in the right order and Buster only knows which worker rooms are free to hold the executors and how to call upon them.
How Many Tasks Can an Executor Execute?
One employee with one pair of hands (1 vCPU core) can execute one task at a time. But let’s say we can outfit the employee with 4 pairs of hands (4 vCPU cores); they can then execute 4 tasks at the same time!
Stretch this imagery a bit more. Let’s say we have the feature of attaching hands of different sizes to the executor. Now they are capable of handling more dough (data) per task!
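The arithmetic behind this imagery is simple (this is back-of-the-envelope math, not a Spark API call): each executor runs one task per core, so the cluster’s parallelism is executors times cores per executor.

```python
# Back-of-the-envelope parallelism: each executor (chef) runs one task
# per core (pair of hands), so the number of tasks in flight at once is
# executors * cores_per_executor. Plain arithmetic, not Spark API.
def max_concurrent_tasks(num_executors: int, cores_per_executor: int) -> int:
    return num_executors * cores_per_executor

# One chef, one pair of hands: one task at a time.
print(max_concurrent_tasks(1, 1))  # 1

# Five chefs, each with 4 pairs of hands attached:
print(max_concurrent_tasks(5, 4))  # 20 tasks at the same time
```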
Repartition and Coalesce
If, during one stage, one or two executors get to knead 10 kg of dough each for the pizza base while the rest of the executors get to knead 100 g each, is that efficient? Obviously not; the whole pizza base preparation stage is stalled by those one or two executors who have a lot more to do. Just do a repartition(), which does a proper redistribution, and we are good.
Consider the opposite case: 100 executors kneading 1 g of dough each. Again, not efficient! It takes the executor longer to walk into the room than to knead the dough. So what should we do? Call the coalesce() cop on the dough/data, and we get the right number of portions (partitions) and therefore the right number of executors.
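The skew problem is easy to see with a toy simulation (plain Python, illustrative numbers only): a stage finishes when its slowest task finishes, so one oversized portion dominates the whole stage, while evenly sized portions, which is what repartition() gives you for data, finish together.

```python
# A toy simulation of partition skew: a stage's duration is the duration
# of its slowest task, since all portions are kneaded in parallel.
# Numbers are illustrative, not Spark internals.
def stage_time_minutes(portions_g, knead_rate_g_per_min=1000):
    # Each portion goes to one executor; the stage ends when the
    # largest portion is done.
    return max(portions_g) / knead_rate_g_per_min

skewed = [10_000, 10_000] + [100] * 8      # two overloaded chefs, eight idle-ish
balanced = [sum(skewed) // 10] * 10        # same total dough, evenly split

print(stage_time_minutes(skewed))    # 10.0 minutes, gated by the two big portions
print(stage_time_minutes(balanced))  # about 2.1 minutes for the same total dough
```

Same dough, same chefs, roughly a 5x faster stage just from balancing the portions.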
Shuffle, Spill, Shuffle Write and Read
- Shuffle Spill (To Disk From Memory)
Imagine the executor’s hands are not large enough to knead the dough all at once. They need to keep a bit aside in a bowl, knead as much as they can, and then pull in the part that was kept aside. So it is better here to have “bigger hands” (more memory) or smaller portions to avoid the back and forth.
- Shuffle Write
When the “make toppings” task or the “make pizza base” task is completed, the executor puts it on a nearby plate, ready to be transferred.
- Shuffle Read
Next, the executor in Stage 2 takes the contents from that plate and proceeds to the “assemble pizza” task.
Side Note: It is always in this order — Shuffle Write, followed by a Shuffle Read.
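The write-then-read handoff can be modeled in a few lines of plain Python (a toy stand-in for Spark’s real shuffle machinery): upstream executors bucket their output by key onto numbered “plates” (shuffle write), and each downstream executor pulls its plate from every upstream output (shuffle read).

```python
# A toy model of the shuffle write -> shuffle read handoff.
# Hash partitioning by key stands in for Spark's shuffle; it guarantees
# that all records with the same key land on the same plate (bucket).
NUM_PLATES = 2

def shuffle_write(records, num_plates=NUM_PLATES):
    # Upstream executor: bucket each (key, value) record onto a plate.
    plates = [[] for _ in range(num_plates)]
    for key, value in records:
        plates[hash(key) % num_plates].append((key, value))
    return plates

def shuffle_read(all_outputs, plate_id):
    # Downstream executor: collect its plate from every upstream output.
    return [rec for plates in all_outputs for rec in plates[plate_id]]

out1 = shuffle_write([("base", 1), ("sauce", 1)])
out2 = shuffle_write([("base", 2), ("toppings", 1)])
for pid in range(NUM_PLATES):
    print(pid, shuffle_read([out1, out2], pid))
```

The write always happens first: a plate must be filled before anyone downstream can read from it, which is exactly the ordering the side note above describes.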
Client, Spark Driver and Spark Session
In the pizza world, you can consider the client (spark-submit/pyspark/spark-shell) as the end user who has submitted an application/program (two or three pizzas) to the Swiggy delivery boy (the Driver).
The Spark Session/Context is the front-desk service guy who takes the orders and coordinates with Dug, Tessy, Buster, and also with the Swiggy guy in the pizza shop.
Dynamic Allocation vs Static Allocation
Let’s assume the pizza shop had a new pricing model, based on resources consumed (hands and heads attached to the executor).
```
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=<N>
```
If you have a resource/price-conscious customer who doesn’t mind waiting a bit longer for the pizza to arrive, they would let the scheduler decide how many executors should be dynamically called in to execute a job. It might take a little longer because it takes some time for an executor to walk in only when needed.
On the other hand, if the customer doesn’t care about the resources/price and has an upfront idea of how many executors they need, they would specify the exact number of executors for the pizza job. They may have some executors lying idle during the execution, but so what? They got the job done fast!
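For that second customer, the static route is to pin the numbers up front when submitting the application, for example on YARN (the application file name here is made up; the values are illustrative):

```
spark-submit \
  --num-executors 8 \
  --executor-cores 4 \
  --executor-memory 8g \
  my_pizza_app.py
```

Eight chefs with four pairs of hands each, reserved for the whole meal, whether or not every hand is kneading at every moment.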
Spark Master + Cluster Manager
The role of the Cluster Manager (the one in charge of handing out the cluster resources (executors)) can be taken on by different personas, depending on where the worker rooms are located (where the Spark cluster is hosted).
a) Spark Standalone Cluster Manager
This is what we have seen so far. Buster is the owner and in charge of the worker rooms; he is the main in-charge when Tessy needs to run her tasks.
b) Mesos, Kubernetes, YARN Cluster Manager
Here the role of cluster manager is taken on by another major player in the system: the one who owns the rooms. It is like you have rented out/outsourced the rooms. Under those circumstances, Buster takes on the role of “Spark Master”, who negotiates with the actual Cluster Manager for the resources required for the pizza job. You can imagine why: the actual cluster manager is now responsible for handing out resources to many other processes in the system. They may be using the worker rooms and executors to make cupcakes and other things!
If Spark is running in Standalone mode, the Master process performs the Cluster Manager functions.
Putting It All Together - Who Is the Master Chef?
Let us now step back and list down the actors in the pizza shop:
- The customer (client)
- The Swiggy Delivery Boy (Driver/Application)
- The front desk guy (Spark Session)
- Dug the scheduler guy (DAG Scheduler)
- Tessy the taskmaster (Task Scheduler)
- Buster the Master / Cluster Manager (depending on what the Resource Manager is)
- Executor (Chefs who actually do the work)
So who do you think is the grand orchestrator, the Master Chef?
It is the Driver! It is actually a “Make Your Own Pizza” shop!
It is the Driver’s responsibility to create the SparkSession object (the entry point for any Spark application) and to plan the application by creating a DAG consisting of stages and tasks. (When we submit a Spark application, the first process to come up is the Application Master.) The Driver communicates with a Master, which in turn communicates with a Cluster Manager to allocate the application’s runtime resources (containers) on which the Executors will run. Additionally, the Driver coordinates the running of the stages and tasks defined in the DAG. The key functions the Driver performs for task scheduling and running include tracking the resources available to run tasks and scheduling tasks to run “close” to the data wherever possible (the concept of data locality).
- One pizza application can have many pizza orders/jobs.
- One pizza application will use one exclusive SparkSession/SparkContext/DAG Scheduler/Task Scheduler/Spark Master.
- One pizza application gets its own set of dedicated executors; the pizza orders/jobs within it share those chefs.
- The Cluster Manager and Worker rooms are for all — They are used by all the pizza applications!
- The Master Chef (Driver) talks to all the other allotted chefs directly, and they exchange stuff (from the location of the raw ingredients to the final pizza).
Cluster Mode vs Client Mode (Driver)
This can be compared to the Driver working from home and giving instructions over the phone, versus the Driver sitting inside one of the worker rooms of the pizza shop and coordinating the show. In client mode, if the line gets cut, the pizza order is canceled. If the Driver is in the shop itself, the process becomes more efficient for the Driver. (No more of: “Hello, are you there? Can you hear me? Your voice is breaking…” No network instabilities!)
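You pick between the two with the deploy mode flag at submission time (the application file name here is made up):

```
# Driver "working from home" (running on your own machine / edge node):
spark-submit --deploy-mode client my_pizza_app.py

# Driver sitting inside one of the cluster's worker rooms:
spark-submit --deploy-mode cluster my_pizza_app.py
```

Client mode is handy for interactive work (pyspark, spark-shell); cluster mode is the usual choice for production jobs, since the Driver survives your laptop closing its lid.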
The Pizza Order Is Ready!
Apache Spark is a vast topic, and there are several knobs out there to tune your large applications to make them run smoothly. If you are a newbie to Spark, you can easily get thrown off by all the jargon. Understanding the internals and the basics will help you take the first step on that journey.
After reading this, go and read that Spark Staged Execution chapter once again, and hopefully it becomes an easier read. If this has been a good bite, let’s have a show of hands :-)!
“You do not really understand something unless you can explain it to your grandmother.”
― Albert Einstein