Docker Model Runner: Running AI Models Locally Made Simple

Docker Model Runner: run AI models locally with zero setup. Pull from Docker Hub, chat via CLI or API. OpenAI-compatible. Beta.

By Suleiman Dibirov · Jul. 01, 25 · Tutorial

Docker has released a new beta feature for developers working with generative AI. Docker Model Runner lets you download, run, and manage AI models directly on your local machine without setting up separate inference infrastructure.

What Is Docker Model Runner and Why Should You Care?

Docker Model Runner is a Docker feature for managing and running AI models during local development. It is currently in beta and available in Docker Desktop 4.40 and later, with the following platform support:

  • macOS: Full support on Apple Silicon processors
  • Windows: Supported, with optional NVIDIA GPU acceleration
  • Linux: Available as a separate package for Docker Engine installations

Key Capabilities

With the plugin, you can:

  • Download models directly from Docker Hub (from the ai namespace, which hosts the available models)
  • Run AI models from the command line
  • Manage local models (pull, list, inspect, remove)
  • Interact with models through single prompts or interactive chat sessions

Smart Resource Management

Docker Model Runner manages resources on demand. Models are pulled from Docker Hub only on first use and cached locally afterward. They are loaded into memory only when a request arrives and are unloaded automatically when idle. The initial download can take a while because model files are large, but later runs start from the local cache and skip the download.
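The load-on-request, unload-when-idle lifecycle described above can be pictured with a small toy model. This is only an illustrative sketch, not how Docker Model Runner is implemented: the class name, the loader callable, and the idle timeout are all hypothetical, and the article does not state an actual timeout value.

```python
import time


class LazyModelSlot:
    """Toy model of 'load on request, unload when idle'. Not Docker's implementation."""

    def __init__(self, loader, idle_seconds, clock=time.monotonic):
        self._loader = loader              # hypothetical callable that loads the model
        self._idle_seconds = idle_seconds  # hypothetical idle timeout
        self._clock = clock
        self._model = None
        self._last_used = None

    @property
    def loaded(self):
        return self._model is not None

    def query(self, prompt):
        # First request (or first request after an unload): load the model now.
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model(prompt)

    def unload_if_idle(self):
        """Drop the model from memory once it has been unused for idle_seconds."""
        if self._model is not None and self._clock() - self._last_used >= self._idle_seconds:
            self._model = None
            return True
        return False
```

The injectable clock is there only so the idle behavior can be exercised without waiting in real time.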

Docker Model Runner also exposes a built-in OpenAI-compatible API, so applications that already speak the OpenAI API format can usually integrate with it with little change.

Essential Commands and Usage

Checking Service Status

Shell
 
$ docker model status
Docker Model Runner is running


Viewing Available Commands

Shell
 
$ docker model help
Usage: docker model COMMAND

Docker Model Runner

Commands:
  inspect   Display detailed information on one model
  list      List the available models that can be run with the Docker Model Runner
  pull      Download a model
  rm        Remove a model downloaded from Docker Hub
  run       Run a model with the Docker Model Runner
  status    Check if the Docker Model Runner is running
  version   Show the Docker Model Runner version

Run 'docker model COMMAND --help' for more information on a command.


Model Management Operations

Downloading a Model

Shell
 
$ docker model pull ai/smollm2
Downloaded: 0.00 MB
Model ai/smollm2 pulled successfully


Note: The progress display may show 0.00 MB after the download completes. This is a known beta issue.

Listing Local Models

Shell
 
$ docker model list
MODEL       PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED      SIZE
ai/smollm2  361.82 M    IQ2_XXS/Q4_K_M  llama         354bf30d0aa3  2 weeks ago  256.35 MiB


Running Models

Single prompt execution:

Shell
 
$ docker model run ai/smollm2 "Hello, how are you?"
Hello! I'm doing well, thank you for asking.


Interactive chat mode:

Shell
 
$ docker model run ai/smollm2
Interactive chat mode started. Type '/bye' to exit.
> Hello there!
Hello! How can I help you today?
> /bye
Chat session ended.


Removing Models

Shell
 
$ docker model rm ai/smollm2
Model ai/smollm2 removed successfully
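If you want to script the pull, list, run, and rm operations above, one option is a thin wrapper around the CLI. The following is a minimal sketch under that assumption; the function names are made up for illustration, and it only covers the four subcommands shown in this section.

```python
import subprocess

# Subcommands demonstrated in this section of the article.
ALLOWED_ACTIONS = {"pull", "list", "run", "rm"}


def model_command(action, *args):
    """Build the argv for a `docker model` subcommand (nothing is executed here)."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["docker", "model", action, *args]


def run_model_command(action, *args):
    """Run the command and return its stdout.

    Raises subprocess.CalledProcessError if the CLI exits with a non-zero status.
    """
    result = subprocess.run(
        model_command(action, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_model_command("run", "ai/smollm2", "Hello, how are you?")` would execute the single-prompt command shown earlier, provided Docker Model Runner is installed and running.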


Viewing Logs for Troubleshooting

Shell
 
$ docker model logs
# Or through Docker Desktop GUI: Models → Logs tab


Application Integration Through OpenAI-Compatible APIs

Docker Model Runner exposes OpenAI-compatible API endpoints with multiple access methods. Here are examples for different scenarios:

From within a container:

Shell
 
#!/bin/sh
curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Write a brief explanation of containerization benefits."
      }
    ]
  }'
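The same request can be made from application code. Below is a minimal Python sketch using only the standard library. It reuses the URL and payload from the curl example; the helper names are hypothetical, and the response parsing assumes the usual OpenAI chat completions shape (`choices[0].message.content`), which the article does not show explicitly.

```python
import json
import urllib.request

# Base URL from the curl example above; resolvable only from inside a container.
BASE_URL = "http://model-runner.docker.internal/engines/llama.cpp/v1"


def build_chat_request(model, user_prompt, system_prompt="You are a helpful assistant."):
    """Build a POST request for the chat completions endpoint (nothing is sent yet)."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model, user_prompt):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body: choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(model, user_prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `chat("ai/smollm2", "Write a brief explanation of containerization benefits.")` from a container mirrors the curl call above.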


From the host via TCP (requires enabling TCP support):

Shell
 
# First enable TCP support
$ docker desktop enable model-runner --tcp 12434

# Then make API calls
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain microservices architecture in simple terms."
      }
    ]
  }'


From the host via Unix socket:

Shell
 
#!/bin/sh
curl --unix-socket "$HOME/.docker/run/docker.sock" \
  localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain microservices architecture in simple terms."
      }
    ]
  }'
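Python's standard HTTP client does not speak to Unix sockets out of the box, so the equivalent of the curl call above needs a small connection subclass. This is a sketch, assuming the same socket path and `/exp/vDD4.40` prefix as the curl example; the class and function names are hypothetical, and it applies to hosts where Unix domain sockets are available.

```python
import http.client
import json
import os
import socket

# Socket path and beta prefix are taken from the curl example above.
SOCKET_PATH = os.path.expanduser("~/.docker/run/docker.sock")
API_PREFIX = "/exp/vDD4.40/engines/llama.cpp/v1"


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=60):
        # "localhost" is only used for the Host header, as in the curl example.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def chat_over_socket(model, user_prompt, socket_path=SOCKET_PATH):
    """POST a chat completion request over the socket and return the parsed JSON body."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": user_prompt}]}
    ).encode("utf-8")
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request(
            "POST",
            API_PREFIX + "/chat/completions",
            body=body,
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```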


Quick Start With Sample GenAI Application

To immediately experience Docker Model Runner with a complete generative AI application:

  1. Clone the sample repository: $ git clone https://github.com/docker/hello-genai.git
  2. Navigate to the project directory: $ cd hello-genai
  3. Execute the run script to download the selected model and launch the application: $ ./run.sh
  4. Open the application in your browser using the URL specified in the repository README.


Available API Endpoints

Once Docker Model Runner is active, the following APIs become available:

Container-Internal Endpoints

Plain Text
 
http://model-runner.docker.internal/

# Model Management
POST /models/create
GET /models
GET /models/{namespace}/{name}
DELETE /models/{namespace}/{name}

# OpenAI-Compatible Endpoints
GET /engines/llama.cpp/v1/models
GET /engines/llama.cpp/v1/models/{namespace}/{name}
POST /engines/llama.cpp/v1/chat/completions
POST /engines/llama.cpp/v1/completions
POST /engines/llama.cpp/v1/embeddings


Host Access Options

TCP Access (when enabled):

Plain Text
 
http://localhost:12434/


Host Gateway Access from containers:

Plain Text
 
http://172.17.0.1:12434/


Unix socket access: The same endpoints are available on /var/run/docker.sock with the beta prefix /exp/vDD4.40.

Note: The llama.cpp component can be omitted. For example: POST /engines/v1/chat/completions
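To keep the different base addresses and the optional engine component straight in client code, a small URL helper can be handy. The sketch below uses the addresses listed in this section; the dictionary keys and the function name are labels invented for this example.

```python
# Base addresses as listed in this section; the keys are labels chosen for this sketch.
BASE_URLS = {
    "container": "http://model-runner.docker.internal",
    "host-tcp": "http://localhost:12434",
    "host-gateway": "http://172.17.0.1:12434",
}


def endpoint_url(access, path, engine="llama.cpp"):
    """Return the full URL for an OpenAI-compatible endpoint such as 'chat/completions'.

    Pass engine=None to omit the engine component, which the note above says is allowed.
    """
    parts = [BASE_URLS[access], "engines"]
    if engine:
        parts.append(engine)
    parts += ["v1", path.strip("/")]
    return "/".join(parts)
```

For instance, `endpoint_url("host-tcp", "chat/completions")` yields the URL used in the TCP curl example, and passing `engine=None` produces the shortened `/engines/v1/...` form.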

Docker Compose Integration

For Docker Compose projects, you may need to add an extra_hosts directive to access the model runner:

YAML
 
extra_hosts:
  - "model-runner.docker.internal:host-gateway"


Known Issues and Troubleshooting

Command Recognition Problems

If you encounter:

Shell
 
docker: 'model' is not a docker command


This indicates Docker cannot locate the plugin. On macOS, link the plugin binary into your CLI plugins directory:

Shell
 
$ ln -s /Applications/Docker.app/Contents/Resources/cli-plugins/docker-model ~/.docker/cli-plugins/docker-model


Resource Management Limitations

Docker Model Runner currently lacks safeguards against loading models that exceed available system resources. Attempting to run oversized models may cause significant system slowdowns or temporary unresponsiveness. This is particularly problematic when running LLMs without sufficient GPU memory or system RAM.

Docker Model CLI Digest Support

The Docker Model CLI currently lacks consistent support for specifying models by image digest. As a workaround, refer to models by name instead of digest.

Download Failure Handling

When model downloads fail, docker model run may still initiate the chat interface despite the model being unavailable. In such cases, manually retry the docker model pull command.
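One way to guard against this in scripts is to confirm that the model appears in the local model listing before sending prompts. The sketch below queries the `GET /engines/llama.cpp/v1/models` endpoint over host TCP (which must be enabled). It assumes the response follows the OpenAI list shape `{"data": [{"id": ...}]}` and that an id may carry an explicit `:latest` tag; both are assumptions rather than documented behavior, and the function names are hypothetical.

```python
import json
import urllib.request

# Host TCP address from this article; requires TCP support to be enabled.
MODELS_URL = "http://localhost:12434/engines/llama.cpp/v1/models"


def model_is_listed(listing, name):
    """Return True if `name` appears in a model listing.

    Assumes the OpenAI-style shape {"data": [{"id": "..."}]}. An id with an
    explicit ":latest" tag is treated as matching the bare name (assumption).
    """
    wanted = {name, name + ":latest"}
    return any(entry.get("id") in wanted for entry in listing.get("data", []))


def fetch_listing(url=MODELS_URL):
    """Fetch and parse the model listing from the running Model Runner."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

A script could call `model_is_listed(fetch_listing(), "ai/smollm2")` and rerun `docker model pull` when it returns False.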

Configuration Management

Docker Desktop Setup

Docker Model Runner is enabled by default in Docker Desktop. To modify this setting:

  1. Open Docker Desktop settings.
  2. Navigate to Beta under Features in development (for versions 4.42+) or Experimental features (for versions 4.41 and earlier).
  3. Toggle the Enable Docker Model Runner checkbox.
  4. On Windows with a supported NVIDIA GPU, also select the Enable GPU-backed inference checkbox.
  5. Click Apply & restart.

Docker Engine Installation (Linux)

For Docker Engine on Linux, install the Docker Model Runner package:

Ubuntu/Debian

Shell
 
$ sudo apt-get update
$ sudo apt-get install docker-model-plugin


RHEL/Fedora

Shell
 
$ sudo dnf update
$ sudo dnf install docker-model-plugin


Test the installation:

Shell
 
$ docker model version
$ docker model run ai/smollm2


Framework Integration

Docker Model Runner now supports integration with popular development frameworks:

  • Testcontainers (Java and Go)
  • Docker Compose (with proper host configuration)

Conclusion

Docker Model Runner makes running AI models locally much easier than before. You don't need to set up complex systems or pay for cloud APIs; everything runs on your own computer.

Since it's still in beta, there are some bugs and missing features. But Docker is working to fix these issues and wants to hear what users think. You can share feedback using the "Give feedback" link next to the Docker Model Runner settings.

If you're building AI apps or just want to try out AI models without the hassle, Docker Model Runner is a great tool to have. As Docker keeps improving it, this will likely become a must-have tool for anyone building AI applications.

Have you tried Docker Model Runner in your projects? Share your experiences and use cases in the comments.
