
Deploying Machine Learning Workflows On LKE with Kubeflow


If you're working with machine learning workloads, take a look at how you can use Linode Kubernetes Engine (LKE) to help scale your workflows.


Introduction

Teams that work with Machine Learning (ML) workloads in production know that added complexity can bring projects to a grinding halt. While deploying simple ML workloads might seem like an easy task, the process becomes far more involved when you begin to scale and distribute these loads and implement tools like Kubernetes. Although Kubernetes allows teams to rapidly scale their organization's infrastructure, it also adds a layer of complexity that can become a major burden without the right tools.

Today I'm going to introduce you to an open-source project called Kubeflow that helps engineering teams deploy ML workloads into production on Kubernetes. The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable.

What is Kubeflow?

Kubeflow is the machine learning toolkit for Kubernetes. Learn about Kubeflow use cases here.

To use Kubeflow, the basic workflow is:

  • Download and run the Kubeflow deployment binary.
  • Customize the resulting configuration files.
  • Run the specified script to deploy your containers to your specific environment.

You can adapt the configuration to choose the platforms and services that you want to use for each stage of the ML workflow: data preparation, model training, prediction serving, and service management.

You can choose to deploy your Kubernetes workloads locally, on-premises, or to a cloud environment.
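As a sketch of that basic workflow, the standalone kfctl flow typically looks like the following. It is wrapped in a function so you can run it when your kubeconfig is ready; the release tarball name and config URI are examples and may differ for your environment.

```shell
# Hypothetical sketch of the generic kfctl workflow described above.
# The release tarball and config URI below are examples -- check the kfctl
# releases page and the kubeflow/manifests repo for current versions.
deploy_kubeflow_with_kfctl() {
  # 1. Download and unpack the kfctl deployment binary (example release shown).
  wget https://github.com/kubeflow/kfctl/releases/download/v1.2.0/kfctl_v1.2.0-0-gbc038f9_linux.tar.gz
  tar -xvf kfctl_v1.2.0-0-gbc038f9_linux.tar.gz && export PATH=$PATH:$PWD

  # 2. Pick a kfdef configuration; you can customize the resulting files
  #    before applying them.
  export KF_NAME=my-kubeflow
  export KF_DIR=${HOME}/${KF_NAME}
  export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_k8s_istio.v1.2.0.yaml"

  # 3. Deploy the containers to whatever cluster your kubeconfig points at.
  mkdir -p "${KF_DIR}" && cd "${KF_DIR}"
  kfctl apply -V -f "${CONFIG_URI}"
}
```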

Deploying Kubeflow to Linode Kubernetes Engine

This guide describes how to use the kfctl CLI to deploy Kubeflow on Linode Kubernetes Engine (LKE).

Prerequisites

  • Install kubectl
  • Create an LKE cluster
  • Modify your .kube config file to point to the LKE cluster
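If you prefer scripting the prerequisites, one way to create the cluster and wire up kubectl is with the Linode CLI. This is a sketch: the label, region, node type, Kubernetes version, and `<cluster-id>` placeholder are all assumptions to adjust for your account.

```shell
# Hypothetical prerequisite setup using the Linode CLI (pip install linode-cli).
# Region, node plan, and k8s_version are example values -- adjust as needed.
create_lke_cluster() {
  linode-cli lke cluster-create \
    --label kubeflow-cluster \
    --region us-east \
    --k8s_version 1.17 \
    --node_pools.type g6-standard-4 \
    --node_pools.count 3

  # Fetch the cluster's kubeconfig (returned base64-encoded) and point
  # kubectl at it; substitute the cluster ID from the call above.
  linode-cli lke kubeconfig-view <cluster-id> --text | tail -1 | base64 -d > ~/lke-kubeconfig.yaml
  export KUBECONFIG=~/lke-kubeconfig.yaml
  kubectl get nodes
}
```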

We are going to use the Kubeflow Operator to help deploy, monitor, and manage the lifecycle of Kubeflow. It is built using the Operator Framework, an open source toolkit for building, testing, and packaging operators and managing their lifecycle.

The Kubeflow Operator is currently in the incubation phase and is based on this design doc. It is built on top of the kfdef custom resource (CR) and uses kfctl as the nucleus for the controller.
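For context, a kfdef custom resource is a YAML manifest along these lines. This is a truncated, illustrative example only; real kfdef files from kubeflow/manifests define many more applications.

```yaml
# Illustrative KfDef custom resource (truncated); shows the shape only.
apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  name: kubeflow
  namespace: kubeflow
spec:
  applications:
    - name: istio-crds
      kustomizeConfig:
        repoRef:
          name: manifests
          path: istio/istio-crds
  repos:
    - name: manifests
      uri: https://github.com/kubeflow/manifests/archive/master.tar.gz
  version: master
```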

Deployment Instructions

  1. Clone this repository and deploy the CRD and controller
```shell
git clone https://github.com/kubeflow/kfctl.git && cd kfctl
OPERATOR_NAMESPACE=operators
kubectl create ns ${OPERATOR_NAMESPACE}
kubectl create -f deploy/crds/kfdef.apps.kubeflow.org_kfdefs_crd.yaml
kubectl create -f deploy/service_account.yaml -n ${OPERATOR_NAMESPACE}
kubectl create clusterrolebinding kubeflow-operator --clusterrole cluster-admin --serviceaccount=${OPERATOR_NAMESPACE}:kubeflow-operator
kubectl create -f deploy/operator.yaml -n ${OPERATOR_NAMESPACE}
```

2. Deploy kfdef. If your Kubernetes version is 1.15+, you can optionally apply a ResourceQuota, which constrains the kubeflow namespace to a single kfdef instance, that is, one deployment of Kubeflow on this cluster, following the singleton model.

```shell
KUBEFLOW_NAMESPACE=kubeflow
kubectl create ns ${KUBEFLOW_NAMESPACE}
# Only deploy this if the cluster is on Kubernetes 1.15+ with resource quota support:
# kubectl create -f deploy/crds/kfdef_quota.yaml -n ${KUBEFLOW_NAMESPACE}
kubectl create -f <kfdef> -n ${KUBEFLOW_NAMESPACE}
```

The <kfdef> argument above can point to a remote URL or to a local kfdef file. For example:

```shell
kubectl create -f https://raw.githubusercontent.com/kubeflow/manifests/master/kfdef/kfctl_ibm.yaml -n ${KUBEFLOW_NAMESPACE}
```

Since we are using Linode, you will want to replace the IBM Cloud kfdef with one suited to your Linode cluster!
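For reference, the optional kfdef_quota.yaml from step 2 is an object-count quota; it likely resembles the following (illustrative, so check the file in the kfctl repo for the exact contents):

```yaml
# Illustrative object-count ResourceQuota limiting the kubeflow namespace
# to a single kfdef instance (the singleton model described above).
apiVersion: v1
kind: ResourceQuota
metadata:
  name: kf-resource-quota
spec:
  hard:
    count/kfdefs.kfdef.apps.kubeflow.org: "1"
```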

Testing Watcher and Reconciler

One of the major benefits of using kfctl as an Operator is the ability to watch and reconcile your Kubeflow deployments. The Operator watches all resources with the kfctl label; if one of those resources is deleted, the reconciler is triggered and re-applies the kfdef to the Kubernetes cluster.

  1. Check that the tf-job-operator deployment is running.
```shell
kubectl get deploy -n ${KUBEFLOW_NAMESPACE} tf-job-operator
# NAME              READY   UP-TO-DATE   AVAILABLE   AGE
# tf-job-operator   1/1     1            1           7m15s
```

2. Delete the tf-job-operator deployment
```shell
kubectl delete deploy -n ${KUBEFLOW_NAMESPACE} tf-job-operator
# deployment.extensions "tf-job-operator" deleted
```

3. Wait for 10 to 15 seconds, then check the tf-job-operator deployment again. You will be able to see that the deployment is being recreated by the Operator's reconciliation logic.

```shell
kubectl get deploy -n ${KUBEFLOW_NAMESPACE} tf-job-operator
# NAME              READY   UP-TO-DATE   AVAILABLE   AGE
# tf-job-operator   0/1     0            0           10s
```
Delete Kubeflow

Delete the Kubeflow deployment

```shell
kubectl delete kfdef -n ${KUBEFLOW_NAMESPACE} --all
```

Delete the Kubeflow Operator

```shell
kubectl delete -f deploy/operator.yaml -n ${OPERATOR_NAMESPACE}
kubectl delete clusterrolebinding kubeflow-operator
kubectl delete -f deploy/service_account.yaml -n ${OPERATOR_NAMESPACE}
kubectl delete -f deploy/crds/kfdef.apps.kubeflow.org_kfdefs_crd.yaml
kubectl delete ns ${OPERATOR_NAMESPACE}
```

Deploying a Basic Kubeflow Pipeline

Now that you have Kubeflow running, let's port-forward to the Istio ingress gateway so that we can access the central UI. Learn more about Istio and its capabilities here.

Access the UI

Use the following command to set up port forwarding to the Istio gateway.

```shell
export NAMESPACE=istio-system
kubectl port-forward -n ${NAMESPACE} svc/istio-ingressgateway 8080:80
```

Access the central navigation dashboard at http://localhost:8080/.

Depending on how you’ve configured Kubeflow, not all UIs work behind port-forwarding to the reverse proxy. For some web applications, you need to configure the base URL on which the app is serving.
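With the port-forward running in another terminal, a quick way to confirm the dashboard is reachable is a small helper like this (a sketch; it only checks the HTTP status code):

```shell
# Prints the HTTP status code the central dashboard returns; expect 200
# (or a redirect) once the port-forward above is active.
dashboard_status() {
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/
}
```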

Open the Pipelines UI

When Kubeflow is running, access the Kubeflow UI at http://localhost:8080. The Kubeflow UI looks like this:

Kubeflow UI


Click Pipelines to access the pipelines UI. The pipelines UI looks like this:

Pipelines UI


Run a Basic Pipeline

The pipelines UI offers a few samples that you can use to try out pipelines quickly. The steps below show you how to run a basic sample that includes some Python operations, but doesn’t include a machine learning (ML) workload:

1. Click the name of the sample, [Sample] Basic - Parallel Execution, on the pipelines UI:

Parallel Execution


2. Click Create an experiment:

Create an experiment


3. Follow the prompts to create an experiment and then create a run. The sample supplies default values for all the parameters you need. The following screenshot assumes you’ve already created an experiment named My experiment and are now creating a run named My first run:

Starting a new run


4. Click Start to create the run.

5. Click the name of the run on the Experiments dashboard:

Run name


6. Explore the graph and other aspects of your run by clicking on the components of the graph and other UI elements.

Finishing up

And that's about it! Now you should be ready to start taking the complexity out of running your ML workloads in your own Kubernetes clusters with Kubeflow. I hope you liked this post! More to come soon.

Until then, here are some more Kubernetes and Docker best practices for managing and deploying containers. 


This article was originally published on appfleet and has been permitted to be shared on DZone with appropriate credits.  

Topics: ai, cloud, kubeflow, kubernetes, linode, lke, machine learning, scaling

Published at DZone with permission of Sudip Sengupta. See the original article here.

Opinions expressed by DZone contributors are their own.
