Docker Model Runner: Running AI Models Locally Made Simple

Docker Model Runner lets you run AI models locally with zero setup: pull models from Docker Hub and chat with them via the CLI or an OpenAI-compatible API. Currently in beta.

By Suleiman Dibirov · Jul. 01, 25 · Tutorial


Docker has released an exciting new beta feature that's set to revolutionize how developers work with generative AI. Docker Model Runner enables you to download, run, and manage AI models directly on your local machine without the complexity of setting up elaborate infrastructure.

What Is Docker Model Runner and Why Should You Care?

Docker Model Runner is a feature that dramatically simplifies AI model management for local development. Currently in beta testing, it's available in Docker Desktop version 4.40 and above across multiple platforms:

  • macOS: Full support on Apple Silicon processors
  • Windows: Supported with NVIDIA GPU acceleration available
  • Linux: Available as a separate package for Docker Engine installations

Key Capabilities

The plugin empowers developers to:

  • Download models directly from Docker Hub (specifically from the ai namespace, which hosts all available models)
  • Execute AI models straight from the command line
  • Manage local models with full CRUD operations (add, view, remove)
  • Interact with models through single prompts or interactive chat sessions

Smart Resource Management

One of Docker Model Runner's standout features is its intelligent resource optimization. Models are pulled from Docker Hub only on first use and cached locally thereafter. They're loaded into memory exclusively during query execution and automatically unloaded when idle. While initial downloads can be time-consuming due to modern AI models' substantial size, subsequent access is lightning-fast thanks to local caching.
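
To see the caching in action, you can time a cold run against a warm one. This is just a minimal illustration; the exact timings depend on your connection and hardware:

Shell
 
# The first invocation pulls ai/smollm2 from Docker Hub; later ones reuse the local cache
$ time docker model run ai/smollm2 "Say hi"   # cold: includes the download
$ time docker model run ai/smollm2 "Say hi"   # warm: served from the local cache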

Another compelling advantage is the built-in OpenAI-compatible API support, which makes integration with existing applications seamless; the integration examples later in this article show it in action.

Essential Commands and Usage

Checking Service Status

Shell
 
$ docker model status
Docker Model Runner is running


Viewing Available Commands

Shell
 
$ docker model help
Usage: docker model COMMAND

Docker Model Runner

Commands:
  inspect   Display detailed information on one model
  list      List the available models that can be run with the Docker Model Runner
  pull      Download a model
  rm        Remove a model downloaded from Docker Hub
  run       Run a model with the Docker Model Runner
  status    Check if the Docker Model Runner is running
  version   Show the Docker Model Runner version

Run 'docker model COMMAND --help' for more information on a command.


Model Management Operations

Downloading a Model

Shell
 
$ docker model pull ai/smollm2
Downloaded: 0.00 MB
Model ai/smollm2 pulled successfully


Note: The download progress display currently shows 0.00 MB after completion — this is a known beta issue.

Listing Local Models

Shell
 
$ docker model list
MODEL       PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED      SIZE
ai/smollm2  361.82 M    IQ2_XXS/Q4_K_M  llama         354bf30d0aa3  2 weeks ago  256.35 MiB


Running Models

Single prompt execution:

Shell
 
$ docker model run ai/smollm2 "Hello, how are you?"
Hello! I'm doing well, thank you for asking.


Interactive chat mode:

Shell
 
$ docker model run ai/smollm2
Interactive chat mode started. Type '/bye' to exit.
> Hello there!
Hello! How can I help you today?
> /bye
Chat session ended.


Removing Models

Shell
 
$ docker model rm ai/smollm2
Model ai/smollm2 removed successfully


Viewing Logs for Troubleshooting

Shell
 
$ docker model logs
# Or through Docker Desktop GUI: Models → Logs tab


Application Integration Through OpenAI-Compatible APIs

Docker Model Runner exposes OpenAI-compatible API endpoints with multiple access methods. Here are examples for different scenarios:

From within a container:

Shell
 
#!/bin/sh
curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Write a brief explanation of containerization benefits."
      }
    ]
  }'


From the host via TCP (requires enabling TCP support):

Shell
 
# First enable TCP support
$ docker desktop enable model-runner --tcp 12434

# Then make API calls
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain microservices architecture in simple terms."
      }
    ]
  }'


From the host via Unix socket:

Shell
 
#!/bin/sh
curl --unix-socket $HOME/.docker/run/docker.sock \
  localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain microservices architecture in simple terms."
      }
    ]
  }'


Quick Start With Sample GenAI Application

To try Docker Model Runner right away with a complete generative AI application:

  1. Clone the sample repository: $ git clone https://github.com/docker/hello-genai.git
  2. Navigate to the project directory: $ cd hello-genai
  3. Execute the run script to download the selected model and launch the application: $ ./run.sh
  4. Open the application in your browser using the URL specified in the repository README.


Available API Endpoints

Once Docker Model Runner is active, the following APIs become available:

Container-Internal Endpoints

Plain Text
 
http://model-runner.docker.internal/

# Model Management
POST /models/create
GET /models
GET /models/{namespace}/{name}
DELETE /models/{namespace}/{name}

# OpenAI-Compatible Endpoints
GET /engines/llama.cpp/v1/models
GET /engines/llama.cpp/v1/models/{namespace}/{name}
POST /engines/llama.cpp/v1/chat/completions
POST /engines/llama.cpp/v1/completions
POST /engines/llama.cpp/v1/embeddings
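
The chat/completions endpoint is covered by the examples above; the model-management and embeddings endpoints follow the same pattern. Here is a minimal sketch, run from within a container (whether ai/smollm2 produces useful embeddings is model-dependent):

Shell
 
#!/bin/sh
# List the locally available models via the management API
curl http://model-runner.docker.internal/models

# Request an embedding from the OpenAI-compatible embeddings endpoint
curl http://model-runner.docker.internal/engines/llama.cpp/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "input": "Containers package an application with its dependencies."
  }'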


Host Access Options

TCP Access (when enabled):

Plain Text
 
http://localhost:12434/


Host Gateway Access from containers:

Plain Text
 
http://172.17.0.1:12434/


Unix socket access: The same endpoints are available on /var/run/docker.sock with the beta prefix /exp/vDD4.40

Note: The llama.cpp path component can be omitted from these endpoints. For example: POST /engines/v1/chat/completions

Docker Compose Integration

For Docker Compose projects, you may need to add an extra_hosts directive to access the model runner:

YAML
 
extra_hosts:
  - "model-runner.docker.internal:host-gateway"


Known Issues and Troubleshooting

Command Recognition Problems

If you encounter:

Shell
 
docker: 'model' is not a docker command


This indicates Docker cannot locate the plugin. On macOS, where Docker Desktop ships the plugin inside Docker.app, create a symlink so the CLI can find it:

Shell
 
$ mkdir -p ~/.docker/cli-plugins
$ ln -s /Applications/Docker.app/Contents/Resources/cli-plugins/docker-model ~/.docker/cli-plugins/docker-model


Resource Management Limitations

Docker Model Runner currently lacks safeguards against loading models that exceed available system resources. Attempting to run oversized models may cause significant system slowdowns or temporary unresponsiveness. This is particularly problematic when running LLMs without sufficient GPU memory or system RAM.
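
Until such safeguards arrive, it's worth checking a model's size and parameter count against your available RAM and GPU memory before running it:

Shell
 
$ docker model list                 # shows parameters and on-disk size
$ docker model inspect ai/smollm2   # detailed information on one model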

Docker Model CLI Digest Support

The Docker Model CLI currently lacks consistent support for specifying models by image digest. As a workaround, refer to models by name instead of digest.

Download Failure Handling

When model downloads fail, docker model run may still initiate the chat interface despite the model being unavailable. In such cases, manually retry the docker model pull command.
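
Assuming the CLI exits non-zero on a failed download, chaining the pull and the run ensures the chat only starts once the model is actually available:

Shell
 
#!/bin/sh
# Only start the chat if the pull completed successfully
docker model pull ai/smollm2 && docker model run ai/smollm2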

Configuration Management

Docker Desktop Setup

Docker Model Runner is enabled by default in Docker Desktop. To modify this setting:

  1. Open Docker Desktop settings.
  2. Navigate to Beta under Features in development (for versions 4.42+) or Experimental features (for versions 4.41 and earlier).
  3. Toggle the Enable Docker Model Runner checkbox.
  4. On Windows with a supported NVIDIA GPU, also enable Enable GPU-backed inference.
  5. Click Apply & restart.

Docker Engine Installation (Linux)

For Docker Engine on Linux, install the Docker Model Runner package:

Ubuntu/Debian

Shell
 
$ sudo apt-get update
$ sudo apt-get install docker-model-plugin


RHEL/Fedora

Shell
 
$ sudo dnf update
$ sudo dnf install docker-model-plugin


Test the installation:

Shell
 
$ docker model version
$ docker model run ai/smollm2


Framework Integration

Docker Model Runner now supports integration with popular development frameworks:

  • Testcontainers (Java and Go)
  • Docker Compose (with proper host configuration)

Conclusion

Docker Model Runner makes running AI models locally much easier than before. You no longer need to set up complex infrastructure or pay for cloud APIs; everything runs on your own computer.

Since it's still in beta, there are some bugs and missing features. But Docker is working to fix these issues and wants to hear what users think. You can share feedback using the "Give feedback" link next to the Docker Model Runner settings.

If you're building AI apps or just want to try out AI models without the hassle, Docker Model Runner is a great tool to have. As Docker keeps improving it, this will likely become a must-have tool for anyone building AI applications.

Have you tried Docker Model Runner in your projects? Share your experiences and use cases in the comments.

AI · Command-line interface · Docker (software)

Opinions expressed by DZone contributors are their own.

Related

  • An Introduction to BentoML: A Unified AI Application Framework
  • Docker Model Runner: A Game Changer in Local AI Development (C# Developer Perspective)
  • Containerizing AI: Hands-On Guide to Deploying ML Models With Docker and Kubernetes
  • Docker Model Runner: Streamlining AI Deployment for Developers
