Using Docker Containers in the Deep Learning Platform FfDL
Let's take a quick look at how to use Docker containers in the Deep Learning platform FfDL.
To train the object detection model for my sample, TensorFlow Object Detection for Anki Overdrive Cars, I used a custom Docker container. I've extended the description of my repo to also document how Fabric for Deep Learning (FfDL) can be used to run this container.
Fabric for Deep Learning (FfDL) is an open-source extension for Kubernetes to run deep learning workloads. It can run in the cloud or on-premises. It supports frameworks like TensorFlow, Caffe, and PyTorch, it supports GPUs, and it can be used to run distributed training jobs. Watson Deep Learning as a Service uses FfDL internally.
Fortunately, it turned out that you can also bring your own containers and use them for training on FfDL. Until recently, this was only possible when compiling FfDL before deploying it, but now this functionality is also available with the standard installation.
name: Niklas OD
description: Niklas OD
version: "1.0"
gpus: 0
cpus: 4
learners: 1
memory: 16Gb
data_stores:
  - id: test-datastore
    type: mount_cos
    training_data:
      container: nh-od-input
    training_results:
      container: nh-od-output
    connection:
      auth_url: http://s3-api.dal-us-geo.objectstorage.softlayer.net
      user_name: xxx
      password: xxx
framework:
  name: custom
  version: "nheidloff/train-od"
  command: cd /tensorflow/models/research/volume && python model_main.py --model_dir=$RESULT_DIR/training --pipeline_config_path=ssd_mobilenet_v2_coco.config --num_train_steps=15000 --alsologtostderr
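The entries that matter for a bring-your-own container can be checked before a job is submitted. The following is a minimal Python sketch and not part of FfDL: the function name check_custom_manifest and its rules are assumptions for illustration, derived only from the manifest above (framework name custom, an image name in version, a start command, and a container for both training data and training results).

```python
from typing import Any, Dict, List


def check_custom_manifest(manifest: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means all checks passed.

    Hypothetical helper: these rules mirror the sample manifest only,
    they are not an official FfDL schema.
    """
    problems: List[str] = []
    framework = manifest.get("framework") or {}
    if framework.get("name") != "custom":
        problems.append("framework.name should be 'custom' for your own image")
    if not str(framework.get("version", "")).strip():
        problems.append("framework.version should hold the Docker image name")
    if not str(framework.get("command", "")).strip():
        problems.append("framework.command should hold the training command")
    for store in manifest.get("data_stores") or []:
        for key in ("training_data", "training_results"):
            if not (store.get(key) or {}).get("container"):
                problems.append(
                    f"data store {store.get('id')!r} is missing {key}.container"
                )
    return problems


# The manifest from above as a plain dict (credentials elided as in the article).
manifest = {
    "name": "Niklas OD",
    "version": "1.0",
    "gpus": 0,
    "cpus": 4,
    "learners": 1,
    "memory": "16Gb",
    "data_stores": [
        {
            "id": "test-datastore",
            "type": "mount_cos",
            "training_data": {"container": "nh-od-input"},
            "training_results": {"container": "nh-od-output"},
            "connection": {
                "auth_url": "http://s3-api.dal-us-geo.objectstorage.softlayer.net",
                "user_name": "xxx",
                "password": "xxx",
            },
        }
    ],
    "framework": {
        "name": "custom",
        "version": "nheidloff/train-od",
        "command": "cd /tensorflow/models/research/volume && python model_main.py",
    },
}

print(check_custom_manifest(manifest))  # prints []
```

A check like this only catches obvious omissions; whether the image can be pulled and the buckets exist still has to be verified against the cluster.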
At the bottom of the manifest.yml, the framework section defines the Docker image (in the version field) as well as the command to start the training. To initiate the training, a CLI can be used or a web application as shown in the screenshot.
Thanks a lot to my colleagues Tommy Chaoping and Animesh Singh for their help in getting this sample to work.
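Because the start command is one long shell line, it can be easier to assemble it from its parts than to edit it by hand. The sketch below is an illustration under assumptions, not FfDL tooling: build_training_command is a hypothetical helper, and it assumes that the result directory is exposed to the container as $RESULT_DIR and that the last flag of the command is --alsologtostderr.

```python
import shlex
from typing import Dict, Optional


def build_training_command(
    workdir: str, script: str, flags: Dict[str, Optional[object]]
) -> str:
    """Build 'cd <workdir> && python <script> --flag=value ...' as one line.

    Flag values are deliberately not shell-quoted so that variables such as
    $RESULT_DIR are still expanded by the shell inside the container.
    A value of None produces a bare switch like --alsologtostderr.
    """
    rendered = [
        f"--{name}" if value is None else f"--{name}={value}"
        for name, value in flags.items()
    ]
    python_call = " ".join(["python", shlex.quote(script)] + rendered)
    return f"cd {shlex.quote(workdir)} && {python_call}"


command = build_training_command(
    workdir="/tensorflow/models/research/volume",
    script="model_main.py",
    flags={
        "model_dir": "$RESULT_DIR/training",  # assumed environment variable
        "pipeline_config_path": "ssd_mobilenet_v2_coco.config",
        "num_train_steps": 15000,
        "alsologtostderr": None,
    },
)
print(command)
```

The printed line can be pasted into the command field of the manifest; changing the number of training steps then becomes a one-value edit.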
Published at DZone with permission of Niklas Heidloff, DZone MVB. See the original article here.