DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

SBOMs are essential to circumventing software supply chain attacks, and they provide visibility into various software components.

Related

  • Containerization and AI: Streamlining the Deployment of Machine Learning Models
  • Docker Model Runner: A Game Changer in Local AI Development (C# Developer Perspective)
  • Docker Model Runner: Running AI Models Locally Made Simple
  • Serverless vs Containers: Choosing the Right Architecture for Your Application

Trending

  • How to Submit a Post to DZone
  • Testing Java Applications With WireMock and Spring Boot
  • 12 Principles for Better Software Engineering
  • Understanding k-NN Search in Elasticsearch
  1. DZone
  2. Software Design and Architecture
  3. Cloud Architecture
  4. Containerizing AI: Hands-On Guide to Deploying ML Models With Docker and Kubernetes

Containerizing AI: Hands-On Guide to Deploying ML Models With Docker and Kubernetes

Containerize your ML model with Docker and deploy it on AWS EKS using Kubernetes in this hands-on guide. Learn to build, serve, and scale your models with ease.

By 
Bhanu Sekhar Guttikonda user avatar
Bhanu Sekhar Guttikonda
·
Jun. 27, 25 · Tutorial
Likes (3)
Comment
Save
Tweet
Share
2.0K Views

Join the DZone community and get the full member experience.

Join For Free

Containerization packages applications into lightweight, portable units. For machine learning, this ensures reproducible environments and easy deployments.  For example, containers bundle the ML model code with its exact dependencies, so results stay consistent across machines They can then be run on any Docker host or cloud, improving portability. Orchestration platforms like Kubernetes add scalability, automatically spinning up or down containers as needed.  Containers also isolate the ML environment from other applications, preventing dependency conflicts. In short, packaging your ML model in a Docker container makes it much easier to move, run, and scale reliably in production.

  • Reproducibility: Container images bundle the model, libraries and runtime (e.g. Python, scikit-learn), so the ML service behaves the same on any system.
  • Portability: The same container runs on a developer’s laptop, CI pipeline, or cloud VM without changes.
  • Scalability: Container platforms (Docker + Kubernetes) can replicate instances under load. Kubernetes can auto-scale pods running your ML service to meet demand.
  • Isolation: Each container is sandboxed from others and the host OS, avoiding version conflicts or “works on my machine” problems.

With these benefits, let’s walk through a concrete example: training a simple model in Python, serving it via a Flask API, and then containerizing and deploying it on an AWS EKS Kubernetes cluster.

Building and Serving a Sample ML Model

First, create a simple Scikit-Learn model. We use the Iris dataset and train a decision tree, then save it with joblib.  In code:

Python
 
# train_model.py
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import joblib

iris = load_iris()
X, y = iris.data, iris.target
model = DecisionTreeClassifier()
model.fit(X, y)
joblib.dump(model, 'model.pkl')


This produces model.pkl.  Next, write a REST API to serve predictions. For example, use Flask to load the model and predict based on JSON input:

Python
 
# app.py
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)
model = joblib.load('model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    features = data.get('features')
    prediction = model.predict([features])
    return jsonify({'prediction': int(prediction[0])})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)


Here the client sends a JSON like {"features": [5.1, 3.5, 1.4, 0.2]}, and the server returns the predicted class.

Dockerizing the ML Service

To containerize, we write a Dockerfile. Docker uses a client-server architecture: the Docker CLI interacts with the Docker daemon to build images, fetch layers from a registry, and run containers. The diagram below illustrates this architecture:

Docker client-server architecture.

Docker uses a client-server model where the docker CLI talks to the Docker daemon, which manages images and containers. Each Docker image is a layered file system that includes your application code and dependencies. Here we will package our Flask API and model into an image.

Create a Dockerfile in the project directory:

Dockerfile
 
# Dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY model.pkl app.py ./
EXPOSE 5000
CMD ["python", "app.py"]


Also include a requirements.txt listing our Python dependencies:

Plain Text
 
flask
scikit-learn
joblib


Build the Docker image locally:(bash)

docker build -t my-ml-app:latest 

This creates an image my-ml-app:latest containing our model server. You can verify by running: (bash)

curl -X POST -H "Content-Type: application/json" \    

-d '{"features": [5.1, 3.5, 1.4, 0.2]}' \      http://localhost:5000/predict

You should get a JSON response like:

JSON
 
{"prediction":0}


With this, our model is containerized and can run anywhere Docker is available.

Kubernetes 101: Pods, Deployments, and Services

A Kubernetes cluster is made up of a control plane and multiple worker nodes. The control plane, sometimes called the master, manages essential components like etcd (used for storing state), the API server, the scheduler, and various controllers. Worker nodes run your containers inside Pods. The architecture looks like this:


Kubernetes architecture


Kubernetes cluster architecture with control plane and worker nodes. Kubernetes clusters follow a master-worker model. The control plane (left) holds cluster state (etcd, API server, scheduler, controller-manager). Worker nodes (right) run kubelet and proxy agents, and host Pods with your containers.

Key concepts:

  • Pod: The smallest deployable unit. A Pod wraps one or more containers that share network/storage. Pods run on nodes and are treated as a single unit.
  • Deployment: Deployment: A controller responsible for overseeing and maintaining a group of Pods, ensuring the desired number are running and up-to-date. You declare a Deployment specifying how many replicas you want, and Kubernetes makes sure that many Pods are running.
  • Service: An abstraction that groups a set of Pods and establishes a consistent policy for accessing them, regardless of their individual IP addresses or lifecycle. A Service provides a stable network endpoint (ClusterIP or LoadBalancer) for Pods, enabling load-balancing and discovery.

In practice, we’ll create a Deployment to keep, say, two copies of our model server running, and a Service to expose them.

Deploying to AWS EKS

Now we push the Docker image to a registry and deploy to Kubernetes on AWS EKS (Elastic Kubernetes Service). First, tag and push your image (using Docker Hub or ECR). For example, with Docker Hub: (bash)

docker tag my-ml-app:latest your_dockerhub_user/my-ml-app:latest docker push your_dockerhub_user/my-ml-app:latest

Replace your_dockerhub_user with your Docker Hub username.

Next, set up an EKS cluster (you need eksctl and AWS CLI configured). If you don’t have a cluster yet, AWS provides guides to create one. For example: (bash)

eksctl create cluster --name ml-model-cluster --region us-west-2 --nodes 2

This creates a basic EKS cluster with two worker nodes. Ensure your kubectl context is pointing to the new cluster (AWS docs explain how to connect).

Create a Kubernetes Deployment manifest (deployment.yaml) that uses your container image:

YAML
 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
      - name: ml-model
        image: your_dockerhub_user/my-ml-app:latest
        ports:
        - containerPort: 9000


And a Service (service.yaml) to expose it externally (using type LoadBalancer on EKS):

YAML
 
apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  type: LoadBalancer
  selector:
    app: ml-model
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9000


Apply these to the cluster:(bash)

kubectl apply -f deployment.yaml

kubectl apply -f service.yaml 

Check the status:

kubectl get deployments

kubectl get pods

kubectl get svc ml-model-service

The Service will get an external IP (or AWS DNS) when the LoadBalancer is provisioned. Once ready, you can send a request to that address on port 80, and it will forward to your pods on port 9000.

Conclusion

You’ve now containerized a scikit-learn model, served it with Flask, and deployed it on Kubernetes. For production readiness, consider the following best practices:

  • Scaling: Use kubectl scale or Kubernetes autoscaling to adjust replicas based on CPU/memory or request rate.
  • Monitoring: Deploy monitoring to track pod health and model performance. Collect logs (e.g. with Fluentd/Elasticsearch) for troubleshooting.
  • CI/CD: Automate the workflow with pipelines (e.g. GitHub Actions, Jenkins, or AWS CodePipeline) that rebuild images and update Deployments on new model versions.
  • Security: Use Kubernetes RBAC and network policies to secure access. Consider scanning images for vulnerabilities and using private registries (AWS ECR) with IAM integration.
  • Advanced ML Ops: Explore tools like Kubeflow or Seldon for specialized model serving, and MLflow or Neptune for model tracking. Use GPUs or multi-arch images if your model needs them.

By containerizing your model and leveraging Kubernetes, you gain portability, scalability, and consistency. You can now iterate on your ML service, confidently deploying updates across cloud environments. With further automation and monitoring in place, your containerized ML service will be ready for production workloads and growth.

AI Kubernetes Docker (software)

Opinions expressed by DZone contributors are their own.

Related

  • Containerization and AI: Streamlining the Deployment of Machine Learning Models
  • Docker Model Runner: A Game Changer in Local AI Development (C# Developer Perspective)
  • Docker Model Runner: Running AI Models Locally Made Simple
  • Serverless vs Containers: Choosing the Right Architecture for Your Application

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • [email protected]

Let's be friends: