DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Smart Deployment Strategies for Modern Applications
  • Solving the Mystery: Why Java RSS Grows in Docker on M1 Macs
  • How We Diagnosed a Hidden Scheduler Failure in a Docker Swarm Cluster Serving 2 Million Users
  • Java Backend Development in the Era of Kubernetes and Docker

Trending

  • Product-Led Software Delivery: Intelligent Platforms for DevOps at Scale
  • AWS Managed Database Observability: Monitoring DynamoDB, ElastiCache, and Redshift Beyond CloudWatch
  • DevOps and Platform Engineering Readiness Checklist: Everything Needed for a Scalable, Secure, High-Velocity Delivery Platform
  • Liquibase: Database Change Management and Automated Deployments
  1. DZone
  2. Software Design and Architecture
  3. Containers
  4. NeMo Agent Toolkit With Docker Model Runner

NeMo Agent Toolkit With Docker Model Runner

Agent observability is often missing in the rush to build AI agents. NeMo adds observability to AI agents, helping trace, evaluate, and debug multi-agent workflows.

By 
Siri Varma Vegiraju user avatar
Siri Varma Vegiraju
DZone Core CORE ·
Apr. 15, 26 · Tutorial
Likes (0)
Comment
Save
Tweet
Share
2.5K Views

Join the DZone community and get the full member experience.

Join For Free

The year 2025 has been widely recognized as the year of AI agents. With the launch of frameworks like Docker Cagent, Microsoft Agent Framework (MAF), and Google’s Agent Development Kit (ADK), organizations rapidly embraced agentic systems.

However, one critical area received far less attention: agent observability.

While teams moved quickly to build and deploy agent-based solutions, a fundamental question remained largely unanswered. How do we know these agents are actually working as intended?

  • Are multiple agents coordinating effectively?
  • Are their outputs reliable and of high quality?
  • Can we diagnose failures or unexpected behaviors in complex, multi-agent workflows?

These challenges sit at the core of agent observability.

This is where Nvidia’s open-source toolkit, NeMo, comes into the picture. NeMo brings much-needed, enterprise-grade observability to LLM-powered systems, enabling teams to monitor, evaluate, and trust their agent infrastructure at scale.

At the same time, Docker Model Runner is emerging as the de facto standard for local inference from the desktop. It provides a unified, “single pane of glass” experience for experimenting with a wide range of open-source models available through the Docker Models Hub.

As part of this tutorial, we will look at how we can add observability to your AI agents when inferencing through Docker Model Runner.

Docker Model Runner Setup

First, let’s set up Docker Model Runner using a small language model. In this tutorial, we will use ai/smollm2.

The setup instructions for Docker Model Runner are available in the official documentation. Follow those steps to get your environment ready.

Make sure to enable TCP access in Docker Desktop. This step is essential; without it, your prototype will not be able to communicate with the model runner over localhost.

Docker Model Runner

Command to pull the small language model we will use for inferencing.

Plain Text
 
docker model run ai/smollm2


NeMo Agentic Toolkit Setup

The first step begins with installing the Nvidia NAT package from Python. I recommend installing uv and installing all the nat dependencies through uv because going down the plain “pip” route causes timeouts.

Plain Text
 
uv pip install nvidia-nat


NeMo's agentic setup is done through YAML. So, declare a YAML configuration for eg: agent-run.yaml

YAML
 
functions:
  # Add a tool to search wikipedia
  wikipedia_search:
    _type: wiki_search
    max_results: 2

llms:
  # Tell NeMo Agent Toolkit which LLM to use for the agent
  openai_llm:
    _type: openai
    model_name: ai/smollm2
    base_url: http://localhost:12434/engines/v1  # Docker model runner endpoint
    api_key: "empty" // because we are using local inference this can be empty.
    temperature: 0.7
    max_tokens: 1000
    timeout: 30

general:
  telemetry:
    tracing:
      otelcollector:
        _type: otelcollector
        # The endpoint where you have deployed the otel collector
        endpoint: http://0.0.0.0:5216/v1/traces
        project: nemo_project
        
workflow:
  # Use an agent that 'reasons' and 'acts'
  _type: react_agent
  # Give it access to our wikipedia search tool
  tool_names: [wikipedia_search]
  # Tell it which LLM to use (now using OpenAI with Docker endpoint)
  llm_name: openai_llm
  # Make it verbose
  verbose: true
  # Retry up to 3 times
  parse_agent_response_max_retries: 3


There are four important sections in the YAML file:
  1. Functions: These are simple components that perform a specific operation. In this case, built-in Wikipedia search, for example. You can define your own functions too.
  2. LLMs: The large language model provider we plan to use. Currently, OpenAI, Anthropic, Azure OpenAI, Bedrock, and Hugging Face are the supported providers. Since Docker Model Runner supports both OpenAI and Anthropic API formats, we can leverage it for both the LLM providers.
  3. Telemetry: This is where Observability comes into the picture. In this example, we have added OTel-based tracing. As a result, we will be logging spans to the OpenTelemetry configured destination.
  4. Workflow: This is the final piece in the puzzle, where we will end up configuring all the functions, LLMS, and tools to create a workflow.

For the current workflow, we are configuring a reasoning and act agent along with the Wikipedia search tool and Docker Model Runner inference endpoint.

Before we run the workflow, we will configure the OpenTelemetry exporter to publish spans to the otellogs/span folder.

Create a file named otel_config.yml.

YAML
 
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:5216


processors:
  batch:
    send_batch_size: 100
    timeout: 10s

exporters:
  file:
    path: /otellogs/spans.json
    format: json

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
exporters: [file]


Run the following command in the terminal.

Plain Text
 
mkdir otel_logs

chmod 777 otel_logs

docker run -v $(pwd)/otelcollectorconfig.yaml:/etc/otelcol-contrib/config.yaml \
-p 5216:5216 \
-v $(pwd)/otel_logs:/otel_logs/ \
otel/opentelemetry-collector-contrib:0.128.0


Finally, run the NeMo workflow using the following command.

Plain Text
 
nat run --config_file ./agent-run.yml --input "What is the capital of Washington"


Output:

Plain Text
 
[AGENT]
Agent input: What is the capital of Washington
Agent's thoughts:
WikiSearch: {'annotation': 'Washington State', 'required': False}

Thought: You should always think about what to do.
Action: Wikipedia Search: {'annotation': 'Washington State', 'required': False}


------------------------------
2026-03-22 21:55:18 - INFO - nat.plugins.langchain.agent.react_agent.agent:357 - [AGENT] Retrying ReAct Agent, including output parsing Observation
2026-03-22 21:55:18 - INFO - httpx:1740 - HTTP Request: POST http://localhost:12434/engines/v1/chat/completions "HTTP/1.1 200 OK"
2026-03-22 21:55:18 - INFO - nat.plugins.langchain.agent.react_agent.agent:270 -
------------------------------
[AGENT]
Agent input: What is the capital of Washington State
Agent's thoughts:
The capital of Washington State is Olympia.


After running the above command, you will see a spans.json file under the otel_logs section, which contains the entire span, along with inputs and outputs.

In addition to what we discussed, it is also possible to set up logging and evaluations on model response that check for coherence, relevance, and groundedness.

References

  • Docker Model Runner: https://docs.docker.com/ai/model-runner/
  • Nvidia NeMo Agent Toolkit: https://docs.nvidia.com/nemo/agent-toolkit/latest/get-started/installation.html
Docker (software)

Opinions expressed by DZone contributors are their own.

Related

  • Smart Deployment Strategies for Modern Applications
  • Solving the Mystery: Why Java RSS Grows in Docker on M1 Macs
  • How We Diagnosed a Hidden Scheduler Failure in a Docker Swarm Cluster Serving 2 Million Users
  • Java Backend Development in the Era of Kubernetes and Docker

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook