DeepSeek on Kubernetes: AI-Powered Reasoning at Scale

Deploy DeepSeek-R1 on Kubernetes using Ollama for inference and Open WebUI for seamless interaction. The setup works on local clusters such as KIND as well as in the cloud.

By Rajesh Gheware · Mar. 14, 2025 · Analysis

As artificial intelligence continues to evolve, deploying AI-powered applications efficiently and at scale has become critical. Kubernetes, the de facto orchestration platform, plays a crucial role in managing containerized AI workloads, ensuring scalability, resilience, and ease of management. 

In this article, we explore DeepSeek on Kubernetes, a deployment that integrates DeepSeek-R1, a powerful reasoning AI model, with Open WebUI for seamless interaction. 

Why Kubernetes for DeepSeek?

DeepSeek is an advanced reasoning model that naturally benefits from containerization and orchestration provided by Kubernetes. Kubernetes stands out from alternatives like Docker Swarm and Apache Mesos due to its mature ecosystem and extensive features tailored specifically for complex AI workloads. Here's why Kubernetes is ideal for deploying DeepSeek:

Scalability

Kubernetes simplifies scaling AI workloads with tools like Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler. Imagine a scenario where DeepSeek faces a sudden surge in inference requests — Kubernetes seamlessly scales the pods and nodes automatically, ensuring consistent performance without manual intervention.
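
To make this concrete, here is a sketch of a HorizontalPodAutoscaler targeting the Ollama deployment defined later in this article. It assumes metrics-server is installed in the cluster (not covered here), and the replica bounds and CPU threshold are illustrative:

YAML

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ollama-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ollama
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70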

Resilience

Kubernetes ensures high resilience through automated pod rescheduling and self-healing capabilities. If a DeepSeek pod encounters issues such as resource constraints or node failures, Kubernetes quickly detects and redeploys the affected pod to a healthy node, minimizing downtime and maintaining continuous availability.
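
Self-healing works best when Kubernetes can tell whether a pod is actually serving traffic. As a sketch, the Ollama container shown later in this article could declare HTTP probes against Ollama's root endpoint on port 11434; the delays and periods here are illustrative assumptions:

YAML

        livenessProbe:
          httpGet:
            path: /
            port: 11434
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 11434
          initialDelaySeconds: 10
          periodSeconds: 5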

Service Discovery

Kubernetes provides built-in DNS-based service discovery and seamless management of microservices. DeepSeek’s inference services can effortlessly discover and connect to supporting microservices, like preprocessing modules or logging services, without the need for complex manual configuration, enhancing maintainability and flexibility.

Persistent Storage

Kubernetes PersistentVolumeClaims (PVCs) effectively handle AI model storage, training datasets, and checkpoints. This ensures critical data remains consistent and available even during updates, pod restarts, or node failures. For example, updating DeepSeek models or scaling inference pods becomes seamless and non-disruptive.

Load Balancing

Kubernetes offers intrinsic load-balancing capabilities, distributing workloads efficiently across multiple replicas. This capability is critical for DeepSeek to evenly distribute inference requests among multiple instances, optimizing resource utilization and significantly reducing response latency.

While alternatives like Docker Swarm offer simplicity, Kubernetes uniquely delivers comprehensive features essential for managing sophisticated AI models like DeepSeek, ensuring scalability, robustness, and operational ease.

Deploying DeepSeek on Kubernetes

1. Kubernetes Cluster Setup

In our setup, we have a three-node Kubernetes cluster with the following nodes:

Plain Text
 
$ kubectl get nodes
NAME                       STATUS   ROLES           AGE    VERSION
deepseek-control-plane     Ready    control-plane   6d5h   v1.32.0
deepseek-worker            Ready    <none>          6d5h   v1.32.0
deepseek-worker2           Ready    <none>          6d5h   v1.32.0


Even if the Kubernetes nodes have no GPUs, DeepSeek-R1 will still run, although responses will be slower. GPU acceleration is recommended for optimal performance, especially for complex reasoning tasks.
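
If GPU nodes are available, the Ollama container can request one. This fragment is a sketch that assumes the NVIDIA device plugin is installed on the cluster (not covered in this article):

YAML

        resources:
          limits:
            nvidia.com/gpu: 1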

Kubernetes clusters can be set up locally using tools like:

  • KIND (Kubernetes IN Docker)
  • Minikube
  • MicroK8s
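
As an example, the three-node cluster shown above can be created with KIND. KIND names nodes using the <cluster-name>-control-plane / -worker convention, so a cluster named deepseek yields the node names listed earlier:

YAML

# kind-config.yaml: one control plane and two workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker

Create the cluster with: kind create cluster --name deepseek --config kind-config.yaml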

If deployed on a cloud provider, the setup can be made securely accessible using an Ingress object to expose services through a web interface with proper authentication and TLS security.

2. Deploying DeepSeek-R1 With Ollama

DeepSeek-R1 is deployed within Kubernetes using Ollama, which handles AI model inference. Below is the Kubernetes manifest for the Ollama deployment:

YAML
 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  labels:
    app: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        volumeMounts:
        - mountPath: /root/.ollama
          name: ollama-storage
        env:
        - name: OLLAMA_MODEL
          value: deepseek-r1:1.5b
        - name: OLLAMA_KEEP_ALIVE
          value: "-1"  
        - name: OLLAMA_NO_THINKING
          value: "true"
        - name: OLLAMA_SYSTEM_PROMPT
          value: "You are DeepSeek-R1, a reasoning model. Provide direct answers without detailed reasoning steps or <think> tags."
      volumes:
      - name: ollama-storage
        emptyDir: {}


3. Exposing Ollama as a Service

To allow other services to communicate with Ollama, we define a NodePort service:

YAML
 
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
spec:
  selector:
    app: ollama
  ports:
    - protocol: TCP
      port: 11434
      targetPort: 11434
  type: NodePort


4. Deploying Open WebUI

For an interactive experience, we integrate Open WebUI, which connects to Ollama and provides a user-friendly interface. The deployment is as follows:

YAML
 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openweb-ui
  labels:
    app: openweb-ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openweb-ui
  template:
    metadata:
      labels:
        app: openweb-ui
    spec:
      containers:
      - name: openweb-ui
        image: ghcr.io/open-webui/open-webui:main
        env:
        - name: WEBUI_NAME
          value: "DeepSeek India - Hardware Software Gheware"        
        - name: OLLAMA_BASE_URL
          value: "http://ollama-service:11434"  
        - name: OLLAMA_DEFAULT_MODEL
          value: "deepseek-r1:1.5b"             
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: openweb-data
          mountPath: /app/backend/data
      volumes:
      - name: openweb-data
        persistentVolumeClaim:
          claimName: openweb-ui-pvc
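
The deployment above references a PersistentVolumeClaim named openweb-ui-pvc, which is not defined elsewhere in this article. A minimal sketch follows; the 2Gi size is an assumption, and the cluster's default StorageClass is used:

YAML

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openweb-ui-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi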


5. Running Inference on DeepSeek-R1

To test the deployment, open a shell in the Ollama container and start the model:

Shell
 
kubectl exec -it deploy/ollama -- bash
ollama run deepseek-r1:1.5b


This command starts an interactive session with the AI model, allowing direct input queries.

Accessing Open WebUI

After deployment, Open WebUI can be exposed by creating an Ingress object that maps a URL to the service, for example:

Plain Text
 
http://deepseek.gheware.com/auth


This interface allows users to interact with DeepSeek-R1 through a chat-based environment.
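
As a sketch, the Ingress behind that URL might look like the following. It assumes an NGINX ingress controller and introduces a hypothetical ClusterIP Service named openweb-ui-service in front of the Open WebUI deployment (the article defines no Service for Open WebUI); for production, a tls block referencing a certificate Secret should be added:

YAML

apiVersion: v1
kind: Service
metadata:
  name: openweb-ui-service
spec:
  selector:
    app: openweb-ui
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openweb-ui-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: deepseek.gheware.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: openweb-ui-service
            port:
              number: 8080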

Conclusion

By deploying DeepSeek on Kubernetes, we achieve a scalable, resilient, and production-ready AI reasoning system. Kubernetes efficiently orchestrates DeepSeek-R1, ensuring smooth model execution and user interaction through Open WebUI. This architecture can be further extended by adding GPU acceleration, auto-scaling, and monitoring with Prometheus and Grafana.

For AI practitioners, Kubernetes offers an excellent foundation for deploying and managing reasoning models like DeepSeek-R1.


Published at DZone with permission of Rajesh Gheware.

Opinions expressed by DZone contributors are their own.