DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Dynatrace Perform: Day Two
  • Compliance Automated Standard Solution (COMPASS), Part 11: Compliance as Code, the OSCAL MCP Server Way
  • Self-Hosted Inference Doesn’t Have to Be a Nightmare: How to Use GPUStack
  • The Agent Protocol Stack: MCP vs. A2A vs. AG-UI

Trending

  • A Hands-On ABAP RESTful Programming Model Guide
  • Building a Production-Ready AI Agent in 2026: Beyond the Hello World Demo
  • Top JavaScript/TypeScript Gen AI Frameworks for 2026
  • Ujorm3: A New Lightweight ORM for JavaBeans and Records
  1. DZone
  2. Software Design and Architecture
  3. Cloud Architecture
  4. How to Build and Deploy an AI Agent on Kubernetes With AWS Bedrock, FastAPI and Helm

How to Build and Deploy an AI Agent on Kubernetes With AWS Bedrock, FastAPI and Helm

Learn how to build, containerize, and deploy a lightweight, cloud-native AI agent on Amazon EKS using FastAPI, AWS Bedrock, Docker, and Helm.

By 
Chandrasekhar Rao Katru user avatar
Chandrasekhar Rao Katru
·
Jan. 09, 26 · Tutorial
Likes (6)
Comment
Save
Tweet
Share
3.4K Views

Join the DZone community and get the full member experience.

Join For Free

The capabilities offered by AI are no longer limited to large, centralized platforms. Today, engineering teams are increasingly embracing lightweight, specialized AI agents that can be managed, scaled, and deployed just like microservices in a cloud-native environment — whether for summarizing large documents, translation, classification, or other analytical tasks. In this tutorial, you will create, deploy, and run an AI model that provides REST APIs for summarization and translation using AWS Bedrock, FastAPI, Docker, and deployment on Amazon EKS via Helm.

This provides a reusable process for integrating AI into operations: one agent, one task, clear boundaries, and full Kubernetes-native visibility and control.

Why AI Agents Fit the Microservices Model

Organizations implementing “platform thinking” are seeking AI components that function like other services in their architecture:

  • Independently deployable
  • Scalable according to demand
  • CI/CD handled by standard pipelines
  • Observable and secure
  • Easy to integrate via REST

AI capabilities are transformed into microservices, enabling cloud-agnostic AI building blocks rather than monolithic AI platforms.

This article assumes your Amazon EKS cluster and Amazon ECR repository are already provisioned, so the focus remains on application architecture and deployment patterns rather than infrastructure setup.

Real-World Use Cases

Scenario Outcome
Customer Support  Summarize long customer tickets
Engineering Operations Translate incident reports
Risk and Compliance  Condense audit or regulatory documents
Product and Marketing  Translates release notes across regions


Step 1: Project Setup

A clean directory layout keeps application logic, containerization, and deployment assets separate:

Markdown
 
```text
ai-agent/
├── app/
│   ├── main.py
│   ├── providers.py
│   ├── models.py
│   └── config.py
├── Dockerfile
└── charts/


This layout separates application logic, container configurations, and Kubernetes deployment assets.

Step 2: Build FastAPI AI Agent

 Configuration:

Python
 
# app/config.py
from pydantic import BaseSettings

class Settings(BaseSettings):
    aws_region: str = "us-east-1"
    model_summarize: str = "anthropic.claude-v2"
    model_translate: str = "amazon.titan-text-lite-v1"

settings = Settings()


Request Models:

Python
 
# app/models.py
from pydantic import BaseModel

class SummarizeRequest(BaseModel):
    text: str

class TranslateRequest(BaseModel):
    text: str
    target_language: str


Bedrock Provider:

Python
 
# app/providers.py
import boto3
import json
import logging
from app.config import settings

logger = logging.getLogger("ai-agent")
logger.setLevel(logging.INFO)

bedrock_client = boto3.client(
    "bedrock-runtime",
    region_name=settings.aws_region
)

def call_bedrock(model_id: str, prompt: str) -> str:
    try:
        payload = {
            "prompt": prompt,
            "max_tokens_to_sample": 200
        }
        response = bedrock_client.invoke_model(
            modelId=model_id,
            body=json.dumps(payload)
        )
        output = json.loads(response["body"].read())
        return output.get("completion", "")
    except Exception as e:
        logger.error(f"Bedrock error: {e}")
        return "Unable to process request."


FastAPI Application:

Python
 
# app/main.py
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from app.models import SummarizeRequest, TranslateRequest
from app.providers import call_bedrock
from app.config import settings

app = FastAPI(title="AI Summarizer and Translator")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"]
)

@app.get("/healthz")
async def health():
    return {"status": "ok"}

@app.post("/summarize")
async def summarize(req: SummarizeRequest):
    prompt = f"Summarize this text in two concise sentences:\n{req.text}"
    return {"summary": call_bedrock(settings.model_summarize, prompt)}

@app.post("/translate")
async def translate(req: TranslateRequest):
    prompt = f"Translate this into {req.target_language}:\n{req.text}"
    return {"translation": call_bedrock(settings.model_translate, prompt)}


Step 3: Containerize the Application

Dockerfile:

Dockerfile
 
FROM python:3.11-slim AS builder
WORKDIR /app
COPY app/requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

FROM python:3.11-slim
ENV PYTHONUNBUFFERED=1
ENV PATH=/root/.local/bin:$PATH
WORKDIR /app

COPY --from=builder /root/.local /root/.local
COPY app ./app

RUN adduser --disabled-password --gecos '' appuser
USER appuser

EXPOSE 8000

CMD ["uvicorn","app.main:app","--host","0.0.0.0","--port","8000","--workers","2"]


Step 4: Push the Image to Amazon ECR

Markdown
 
```bash
aws ecr get-login-password --region us-east-1 \
 | docker login --username AWS --password-stdin <ECR_URL>

docker tag ai-agent:latest <ECR_URL>/ai-agent:latest
docker push <ECR_URL>/ai-agent:latest


Step 5: Create Kubernetes Secretes

Markdown
 
```bash
kubectl create secret generic bedrock-secret \
  --from-literal=AWS_ACCESS_KEY_ID=XXX \
  --from-literal=AWS_SECRET_ACCESS_KEY=YYY


Step 6: Helm Configurations

values.yaml

YAML
 
replicaCount: 2

image:
  repository: <ECR_URL>/ai-agent
  tag: latest
  pullPolicy: Always

service:
  type: LoadBalancer
  port: 80

env:
  AWS_REGION: us-east-1

secretRef: bedrock-secret

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi


Deployment Template:

YAML
 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-agent
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: ai-agent
  template:
    metadata:
      labels:
        app: ai-agent
    spec:
      containers:
        - name: ai-agent
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - containerPort: 8000
          env:
            - name: AWS_REGION
              value: {{ .Values.env.AWS_REGION | quote }}
          envFrom:
            - secretRef:
                name: {{ .Values.secretRef }}
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8000
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8000
          resources:
{{ toYaml .Values.resources | indent 12 }}


Step 7: Deploy to Kubernetes

Markdown
 
```bash
helm install ai-agent ./charts -f values.yaml
kubectl get svc


Step 8: Test the API

Markdown
 
```bash
curl -X POST http://<EXTERNAL-IP>/summarize \
 -H "Content-Type: application/json" \
 -d '{"text":"Customer reported API latency during peak hours."}'


Step 9: Autoscaling and Monitoring

Markdown
 
```bash
kubectl autoscale deployment ai-agent \
  --min=2 --max=6 --cpu-percent=70


Step 10: CI/CD Automation (GitHub Actions or Harness CD)

Once the container image and Helm chart are set up, automation can be implemented through a standard CI/CD pipeline. The process involves building the container image, storing it in Amazon ECR, and deploying/upgrading a Helm release to EKS.

  • GitHub Actions: Ideal for repository-based CI/CD with simple deployment pipelines.
  • Harness CD: Suitable for environments requiring approval gates, RBAC, traceability, and multi-team orchestration.

Regardless of the tool, the deployment lifecycle remains consistent: container versioning, Kubernetes releases via Helm, and rollouts with standard health checks.

Closing Thoughts

Kubernetes offers a solid platform for deploying AI agents as modifiable services, whereas AWS Bedrock makes large language models easily accessible with simplicity that does not accrue any operational complexity in addition to that. Paired together with FastAPI, Docker and Helm, a straight and clear approach towards making AI services easily available via standard APIs becomes possible.

The ability to separate the application logic layer from the deployment aspect makes it easier to implement the approach in a way that promotes reuse, scalability and consistency in operations with the application. With the increasing trend among businesses to consume multiple clouds, the need for the above qualities cannot be overemphasized in order to control the deployment processes without getting

In the succeeding installments of this series, the same machine learning model will be utilized on Azure AKS with Azure OpenAI and on GCP GKE with Vertex AI. This is the power of Kubernetes — the ability to provide an equivalent layer for machine learning tasks on any cloud platform.

AI AWS Kubernetes Build (game engine)

Opinions expressed by DZone contributors are their own.

Related

  • Dynatrace Perform: Day Two
  • Compliance Automated Standard Solution (COMPASS), Part 11: Compliance as Code, the OSCAL MCP Server Way
  • Self-Hosted Inference Doesn’t Have to Be a Nightmare: How to Use GPUStack
  • The Agent Protocol Stack: MCP vs. A2A vs. AG-UI

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook