
Training AI Models on Kubernetes

IBM has introduced an open source version of its Deep Learning service, and it runs on Kubernetes! Let's find out more!

Niklas Heidloff
Aug. 06, 18 · AI Zone · News

Early this year, IBM announced Deep Learning as a Service within Watson Studio. The core of this service is available as open source and can be run on Kubernetes clusters. This allows developers and data scientists to train models with confidential data on-premises, for example on the Kubernetes-based IBM Cloud Private.

The open source version of IBM's deep learning service is called Fabric for Deep Learning. Fabric for Deep Learning supports framework-independent training of deep learning models on distributed hardware. Training can run on CPUs as well as GPUs. Check out the documentation for a list of DL frameworks, versions, and processing units.

I had open sourced samples that show how to train visual recognition models with Watson Studio that can be deployed to edge devices via TensorFlow Lite and TensorFlow.js. I extended one of these samples slightly to show how to train the model with Fabric for Deep Learning instead. To do this, I only had to change a manifest file slightly since the format expected by Watson Studio is different.

This is the manifest file that describes how to invoke and run the training. For more details on how to run trainings on Kubernetes, check out the README of my project and the Fabric for Deep Learning documentation.

name: retrain
description: retrain
version: "1.0"
gpus: 2
cpus: 8
memory: 4Gb
learners: 1

data_stores:
  - id: sl-internal-os
    type: mount_cos
    training_data:
      container: nh-input
    training_results:
      container: nh-output
    connection:
      auth_url: http://169.62.129.231:32551
      user_name: test
      password: test

framework:
  name: tensorflow
  version: "1.5.0-gpu-py3"
  command: python3 retrain.py --bottleneck_dir ${RESULT_DIR}/bottlenecks --image_dir ${DATA_DIR}/images --how_many_training_steps=1000 --architecture mobilenet_0.25_224 --output_labels ${RESULT_DIR}/labels.txt --output_graph ${RESULT_DIR}/graph.pb --model_dir ${DATA_DIR} --learning_rate 0.01 --summaries_dir ${RESULT_DIR}/retrain_logs

evaluation_metrics:
  type: tensorboard
  in: "$JOB_STATE_DIR/logs/tb"
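
The training command in the manifest refers to its input and output locations through placeholders rather than fixed paths. As a rough illustration of how such a command resolves once the two directories are known, here is a minimal Python sketch. It assumes the placeholders are spelled `${DATA_DIR}` and `${RESULT_DIR}` in upper case, as environment variables usually are. The function name `expand_command` and the mount paths are hypothetical, and the command is shortened. This is not how Fabric for Deep Learning performs the substitution internally.

```python
import shlex
from string import Template
from typing import List

# Shortened copy of the manifest's framework.command value.
# Assumption: the placeholders use upper-case environment-variable names.
COMMAND_TEMPLATE = (
    "python3 retrain.py "
    "--bottleneck_dir ${RESULT_DIR}/bottlenecks "
    "--image_dir ${DATA_DIR}/images "
    "--output_graph ${RESULT_DIR}/graph.pb"
)


def expand_command(template: str, data_dir: str, result_dir: str) -> List[str]:
    """Fill in both directory placeholders and split the result into argv tokens."""
    expanded = Template(template).substitute(
        DATA_DIR=data_dir, RESULT_DIR=result_dir
    )
    return shlex.split(expanded)


if __name__ == "__main__":
    # Hypothetical mount points, chosen only for illustration.
    for token in expand_command(
        COMMAND_TEMPLATE, "/mnt/data/nh-input", "/mnt/results/nh-output"
    ):
        print(token)
```

Running the substitution locally like this is a quick way to check that every path in the command points at either the training data container or the results container before submitting a job.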

To initiate the training, this manifest file and the training Python code need to be uploaded. To do this, you can use either the web user experience or a CLI.

In my example, I've stored the data on my Kubernetes cluster. Fabric for Deep Learning comes with S3-based object storage, which means that you can use the AWS CLI to upload and download data. Alternatively, you could also use object storage in the cloud, for example IBM's Cloud Object Storage.
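
Since the object storage speaks the S3 protocol, an upload of training images with the AWS CLI could look roughly like the sketch below, which only assembles the command and the credential environment and prints them. It assumes the `auth_url` from the manifest is also the S3 endpoint and that `user_name` and `password` map to the access key and secret key. Both are assumptions rather than something the Fabric for Deep Learning documentation is quoted on here. The helper names and the local `./images` directory are hypothetical.

```python
import os
from typing import Dict, List


def build_upload_command(endpoint_url: str, local_dir: str, bucket: str) -> List[str]:
    """Assemble an `aws s3 cp --recursive` call against a custom S3 endpoint."""
    return [
        "aws",
        "--endpoint-url", endpoint_url,
        "s3", "cp",
        local_dir,
        f"s3://{bucket}/",
        "--recursive",
    ]


def build_credentials_env(access_key: str, secret_key: str) -> Dict[str, str]:
    """Copy the current environment and add the credentials the AWS CLI reads."""
    env = dict(os.environ)
    env["AWS_ACCESS_KEY_ID"] = access_key
    env["AWS_SECRET_ACCESS_KEY"] = secret_key
    return env


if __name__ == "__main__":
    # Values taken from the manifest above; the local directory is hypothetical.
    command = build_upload_command(
        "http://169.62.129.231:32551", "./images", "nh-input"
    )
    env = build_credentials_env("test", "test")
    print(" ".join(command))
    # To actually run the upload, pass both to subprocess:
    #   subprocess.run(command, env=env, check=True)
```

Swapping the source and destination arguments gives the corresponding download, for example to fetch the trained graph from the `nh-output` container.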

To find out more about Fabric for Deep Learning, check out these resources.

Kubernetes AI Deep learning

Published at DZone with permission of Niklas Heidloff, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
