Data Lake 3.0 Part 3 - Distributed TensorFlow Assembly on Apache Hadoop Yarn
In this post on Data Lake 3.0, we'll take a look under the hood and demonstrate a Deep Learning framework (TensorFlow) assembly on Apache Hadoop YARN.
Join the DZone community and get the full member experience.Join For Free
Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we introduced what a Data Lake 3.0 is and in part 2 of the series, we talked about how a multi-colored YARN will play a critical role in building a successful Data Lake 3.0. In this post, we will take a look under the hood and demonstrate a Deep Learning framework (TensorFlow) assembly on Apache Hadoop YARN.
TensorFlow™ is an open source software library for numerical computation using data flow graphs. It is one of the most popular platforms for creating machine learning and deep learning applications.
It’s best to run TensorFlow on machines equipped with GPUs, since TensorFlow can leverage CUDA and cuDNN to speed-up a lot by leveraging GPU power!
We’ve talked about YARN and docker container runtime in Hadoop Summit 2016.
The recently published Nvidia Docker wrapper makes it possible to use GPU capabilities in a Docker container.
Part 0. Why TensorFlow Assembly On Yarn?
YARN has been used successfully to run all sorts of data applications. These applications can all coexist on a shared infrastructure managed through YARN’s centralized scheduling.
With TensorFlow, one can get started with deep learning without much knowledge about advanced math models and optimization algorithms.
If you have GPU-equipped hardware, and you want to run TensorFlow, going through the process of setting up hardware, installing the bits, and optionally also dealing with faults, scaling the app up and down etc. becomes cumbersome really fast. Instead, integrating TensorFlow to YARN allows us to seamlessly manage resources across machine learning / deep learning workloads and other YARN workloads like MapReduce, Spark, Hive, etc.
If you have a few GPUs and you want them to be shared by multiple tenants and applications to run TensorFlow and other YARN applications all on the same cluster, this blogpost will help you.
Part 1. Background
This blog uses a couple of features of YARN, under ongoing active development:
- YARN-3611, Support Docker Containers In LinuxContainerExecutor: Better support of Docker container execution in YARN
- YARN-4793, Simplified API layer for services and beyond: The new simple-services API layer backed by REST interfaces. This API can be used to create and manage the lifecycle of YARN services in a simple manner. Services here can range from simple single-component apps to complex multi-component assemblies needing orchestration.
Both of the features will most likely be available in one of the Apache Hadoop 3.x releases.
Part 2. Running TensorFlow Services Assembly On Yarn
A common workflow of TensorFlow (And this is common for any supervised machine learning platform) is like this:
- Training cluster reads from input dataset, uses algorithms to build a data model. For example, see Google’s cat recognition learning.
- Serving cluster serves the data model, so that any client can connect to the serving cluster to predict new samples, for example, ask “Is this a cat?”
- Training cluster can periodically update data model when new data available, so clients can get prediction results from the up-to-date model.
Also, TensorFlow added support for distributed training since 0.8 release, which can significantly shorten training time in lots of cases. With YARN, we can start a distributed training/serving cluster combo in no time!
Assembly Description File
To orchestrate multi-component assembly on YARN, we need to prepare a description file so YARN can know how to launch it (please refer to the video and slides for an introduction on assemblies).
Below is an example of assembly description file for TensorFlow assembly.
In our example, we have 3 components:
- Training cluster components
- Trainers: Training the model from training.
- Parameter servers: Manage shared model parameters for trainers.
- Serving cluster components: Uses tensorflow-serving to serve exported data model from training cluster.
So we have 3 components in the assembly description file.
Let’s look at the description of each of the components:
First, let’s look at the definition of trainer:
- Docker image is specified to be “hortonworks/tensorflow-yarn:0.2”
- The launch command to use is example-distributed-trainer.py
- It specified hostname/ports of parameter servers and workers (trainers). We configured DNS to make launched container has a unique hostname to: <component_name>.<assembly-name>.<user-name>.<domain>
Description of parameter server is very much similar to trainer, the only difference is to specify job_name to “ps” instead of “worker”
Following is a description of a serving server, it changed to use different Docker images and launch commands to make it serve inception v3 model to do image classification. I will cover more details in the demo video below.
The following demo video shows the launch of TensorFlow assembly on YARN, running TensorFlow training and serving the inception-v3 model to do image classification.
Part 3. Accessing GPU in Yarn Assembly
Nvidia-docker is a wrapper of /usr/bin/docker, which is required to make processes inside the Docker container use the GPU capacity. So first of all, you need to Follow the Nvidia-Docker installation guide to properly install dependencies such as Docker/Nvidia driver, etc. on all the nodes.
After Nvidia Docker is installed, test it by executing the following command on each of the nodes:
You should see this output:
Then, you need to set YARN configuration to use nvidia-docker binary to launch container, just update container-executor.cfg under $HADOOP_CONF_DIR, set docker.binary to
Once all above steps are done, you can launch Docker containers through YARN as same as part 1. And all these containers can access GPU.
The training process gains a lot of speed when it runs with GPU support!
Below are screen recordings of running the same training process without and with a GPU.
Published at DZone with permission of Vinod Kumar Vavilapalli, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.