Deploying Machine Learning Workflows On LKE with Kubeflow
If you're working with machine learning workloads, take a look at how you can use the Linode Kubernetes Engine (LKE) to help scale your workflows.
Teams that work with Machine Learning (ML) workloads in production know that added complexity can bring projects to a grinding halt. While deploying simple ML workloads might seem like an easy task, the process becomes a lot more involved when you begin to scale and distribute these loads and implement tools like Kubernetes. Although Kubernetes allows teams to rapidly scale their organization's infrastructure, it also adds a layer of complexity that can become a major burden without the right tools.
Today I'm going to introduce you to an OSS project known as Kubeflow that seeks to assist engineering teams with deploying ML workloads into production in Kubernetes. The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.
What is Kubeflow?
Kubeflow is the machine learning toolkit for Kubernetes. Learn about Kubeflow use cases here.
To use Kubeflow, the basic workflow is:
- Download and run the Kubeflow deployment binary.
- Customize the resulting configuration files.
- Run the specified script to deploy your containers to your specific environment.
You can adapt the configuration to choose the platforms and services that you want to use for each stage of the ML workflow: data preparation, model training, prediction serving, and service management.
You can choose to deploy your Kubernetes workloads locally, on-premises, or to a cloud environment.
Deploying Kubeflow to Linode Kubernetes Engine
This guide describes how to use the kfctl CLI to deploy Kubeflow on Linode Kubernetes Engine (LKE). Before you begin:
- Install kubectl.
- Create an LKE cluster.
- Configure your kubeconfig file to point to the LKE cluster.
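As a rough sketch of the last prerequisite, the helper below checks that kubectl is installed and that it can reach the cluster. The function name check_cluster and the kubeconfig path you pass in are placeholders for illustration and do not come from the original guide; `kubectl version --client` and the `--kubeconfig` flag are standard kubectl.

```shell
# Verify the kubectl client and confirm the cluster answers.
# $1 is the path to the kubeconfig file downloaded for your cluster
# (hypothetical path; use wherever you saved yours).
check_cluster() {
  kubeconfig="${1:?usage: check_cluster <path-to-kubeconfig>}"
  # Print the client version, then list the cluster nodes.
  kubectl version --client &&
    kubectl --kubeconfig "$kubeconfig" get nodes
}
```

If the second command lists your nodes, kubectl is pointed at the right cluster. Exporting KUBECONFIG with the same path would have the same effect for later commands.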
We are going to use the Kubeflow Operator to help deploy, monitor, and manage the lifecycle of Kubeflow. It is built using the Operator Framework, an open source toolkit for building, testing, and packaging operators and managing their lifecycle.
The Kubeflow Operator is currently in the incubation phase and is based on this design doc. It is built on top of the kfdef CR and uses kfctl as the nucleus for the Controller.
1. Clone this repository and deploy the CRD and controller.
2. Deploy kfdef. You can optionally apply a ResourceQuota if your Kubernetes version is 1.15+. It allows only one kfdef instance, that is, one deployment of Kubeflow, on this cluster, which follows the singleton model. The ResourceQuota enforces the constraint that only one instance of kfdef is allowed within the Kubeflow namespace.
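To make the singleton constraint concrete, here is a minimal sketch of what such a ResourceQuota could look like. The quota name kf-objects, the helper function names, and the default namespace kubeflow are assumptions for illustration. The resource string assumes the kfdef CRD is registered as kfdefs in the kfdef.apps.kubeflow.org group, so confirm it with `kubectl get crd` on your cluster before relying on it.

```shell
# Print a ResourceQuota that caps kfdef objects at one per namespace.
# Assumption: the CRD is kfdefs.kfdef.apps.kubeflow.org; "kf-objects" is
# an illustrative name.
kfdef_quota_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: kf-objects
spec:
  hard:
    count/kfdefs.kfdef.apps.kubeflow.org: "1"
EOF
}

# Apply the quota to the namespace that will hold the kfdef
# (defaults to "kubeflow", which is an assumption).
apply_kfdef_quota() {
  kfdef_quota_manifest | kubectl apply -n "${1:-kubeflow}" -f -
}
```

The count/ prefix is the Kubernetes object count quota syntax. With the limit set to 1, a second kfdef created in the same namespace should be rejected by the API server.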
The kfdef you deploy can come from a remote URL or from a local kfdef file. The upstream example command targets IBM Cloud; since we are using Linode, replace the IBM Cloud values with the ones for your Linode cluster.
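The deploy step could look roughly like the sketch below. The function name deploy_kfdef and the default namespace kubeflow are assumptions, and no particular kfdef URL is implied: pass whichever URL or local file matches your setup. The only kubectl behavior relied on is that `-f` accepts both a file path and a URL.

```shell
# Create a kfdef from a remote URL or a local file.
# $1: URL or path to the kfdef YAML (you supply this)
# $2: target namespace (assumed default: kubeflow)
deploy_kfdef() {
  source_ref="${1:?usage: deploy_kfdef <url-or-file> [namespace]}"
  namespace="${2:-kubeflow}"
  # Create the namespace only if it does not exist yet.
  kubectl get namespace "$namespace" >/dev/null 2>&1 ||
    kubectl create namespace "$namespace"
  # The Operator picks up the new kfdef and deploys Kubeflow from it.
  kubectl create -f "$source_ref" -n "$namespace"
}
```

For example, `deploy_kfdef ./my-kfdef.yaml` would create the kfdef from a local file named my-kfdef.yaml, a hypothetical file name.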
Testing Watcher and Reconciler
One of the major benefits of using kfctl as an Operator is the ability to watch and reconcile your Kubeflow deployments. The Operator watches all the resources with the kfctl label. If one of the resources is deleted, the reconciler is triggered and re-applies the kfdef to the Kubernetes cluster.
1. Check that the tf-job-operator deployment is running.
2. Delete the tf-job-operator deployment manually.
3. Wait for 10 to 15 seconds, then check the tf-job-operator deployment again. You will be able to see that the deployment is being recreated by the Operator's reconciliation logic.
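The check, delete, and re-check sequence can be sketched as a small helper. The function name check_reconcile is made up for illustration, and the namespace kubeflow is an assumption based on the Kubeflow namespace mentioned earlier. The kubectl subcommands themselves are standard.

```shell
# Delete the tf-job-operator deployment and watch the Operator bring it back.
# $1: namespace (assumed default: kubeflow)
# $2: seconds to wait before re-checking (default: 15)
check_reconcile() {
  namespace="${1:-kubeflow}"
  wait_seconds="${2:-15}"
  # Confirm the deployment exists, delete it, wait, then look again.
  kubectl get deployment -n "$namespace" tf-job-operator &&
    kubectl delete deployment -n "$namespace" tf-job-operator &&
    sleep "$wait_seconds" &&
    kubectl get deployment -n "$namespace" tf-job-operator
}
```

If reconciliation is working, the final `kubectl get` should show the deployment again with a very recent age.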
Delete Kubeflow deployment
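One plausible way to remove the deployment is to delete the kfdef object you created and let the Operator clean up what it manages. This is a sketch under assumptions, not the original article's command: the function name delete_kfdef and the default namespace kubeflow are illustrative, and you pass the same URL or file you used when creating the kfdef.

```shell
# Delete the kfdef that was created from the given URL or file.
# $1: URL or path to the kfdef YAML used at deploy time (you supply this)
# $2: namespace (assumed default: kubeflow)
delete_kfdef() {
  source_ref="${1:?usage: delete_kfdef <url-or-file> [namespace]}"
  namespace="${2:-kubeflow}"
  kubectl delete -f "$source_ref" -n "$namespace"
}
```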
Delete Kubeflow Operator
Deploying a Basic Kubeflow Pipeline
Now that you have Kubeflow running, let's port-forward to the Istio Gateway so that we can access the central UI. Learn more about Istio and its capabilities here.
Access the UI
Use the following command to set up port forwarding to the Istio gateway.
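A minimal sketch of the port-forward step follows. It assumes the gateway is exposed as the service istio-ingressgateway in the istio-system namespace and listens on port 80, which is common for Kubeflow installs but worth confirming with `kubectl get svc -n istio-system`. The function name forward_gateway is made up for illustration.

```shell
# Forward a local port to the Istio ingress gateway.
# $1: local port (default: 8080)
# Assumptions: service istio-ingressgateway, namespace istio-system, port 80.
forward_gateway() {
  local_port="${1:-8080}"
  kubectl port-forward -n istio-system svc/istio-ingressgateway "${local_port}:80"
}
```

The command keeps running in the foreground while the forward is active, so leave it open in its own terminal. Under the port 80 assumption the gateway serves plain HTTP, so the local address would use http rather than https unless your gateway terminates TLS.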
Then access the central navigation dashboard at the forwarded local address, which in this guide is port 8080 on localhost.
Depending on how you’ve configured Kubeflow, not all UIs work behind port-forwarding to the reverse proxy. For some web applications, you need to configure the base URL on which the app is serving.
Open the Pipelines UI
When Kubeflow is running, access the Kubeflow UI at https://localhost:8080.
Click Pipelines to access the pipelines UI.
Run a Basic Pipeline
The pipelines UI offers a few samples that you can use to try out pipelines quickly. The steps below show you how to run a basic sample that includes some Python operations but doesn't include a machine learning (ML) workload:
1. Click the name of the sample, [Sample] Basic - Parallel Execution, on the pipelines UI.
2. Click Create an experiment.
3. Follow the prompts to create an experiment and then create a run. The sample supplies default values for all the parameters you need. For example, you might create an experiment named My experiment and then a run named My first run.
4. Click Start to create the run.
5. Click the name of the run on the Experiments dashboard.
6. Explore the graph and other aspects of your run by clicking on the components of the graph and other UI elements.
And that's about it! Now you should be ready to start taking the complexity out of running your ML workloads in your own Kubernetes clusters with Kubeflow. I hope you liked this post! More to come soon.
Until then, here are some more Kubernetes and Docker best practices for managing and deploying containers.
This article was originally published on appfleet and has been permitted to be shared on DZone with appropriate credits.
Published at DZone with permission of Sudip Sengupta. See the original article here.