How to Build and Deploy an AI Agent on Kubernetes With AWS Bedrock, FastAPI and Helm

Learn how to build, containerize, and deploy a lightweight, cloud-native AI agent on Amazon EKS using FastAPI, AWS Bedrock, Docker, and Helm.

Jan. 09, 26 · Tutorial

AI capabilities are no longer limited to large, centralized platforms. Engineering teams increasingly rely on lightweight, specialized AI agents that can be managed, scaled, and deployed like any other microservice in a cloud-native environment, whether for summarizing large documents, translation, classification, or other analytical tasks. In this tutorial, you will build, containerize, and deploy an AI agent that exposes REST APIs for summarization and translation, using AWS Bedrock, FastAPI, and Docker, and run it on Amazon EKS via Helm.

This provides a reusable process for integrating AI into operations: one agent, one task, clear boundaries, and full Kubernetes-native visibility and control.

Why AI Agents Fit the Microservices Model

Organizations implementing “platform thinking” are seeking AI components that function like other services in their architecture:

Independently deployable
Scalable according to demand
Delivered through standard CI/CD pipelines
Observable and secure
Easy to integrate via REST

Packaging AI capabilities as microservices turns them into cloud-agnostic building blocks rather than parts of a monolithic AI platform.

This article assumes your Amazon EKS cluster and Amazon ECR repository are already provisioned, so the focus remains on application architecture and deployment patterns rather than infrastructure setup.

Real-World Use Cases

Scenario	Outcome
Customer Support	Summarize long customer tickets
Engineering Operations	Translate incident reports
Risk and Compliance	Condense audit or regulatory documents
Product and Marketing	Translate release notes across regions

Step 1: Project Setup

A clean directory layout keeps application logic, containerization, and deployment assets separate:

```text
ai-agent/
├── app/
│   ├── main.py
│   ├── providers.py
│   ├── models.py
│   ├── config.py
│   └── requirements.txt
├── Dockerfile
└── charts/
```

The Dockerfile in Step 3 installs dependencies from app/requirements.txt, which should list at least FastAPI, Uvicorn, boto3, and Pydantic.

Step 2: Build the FastAPI AI Agent

Configuration:

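```python
# app/config.py
# Pydantic v2 ships BaseSettings in the separate pydantic-settings package;
# on Pydantic v1, use `from pydantic import BaseSettings` instead.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Field names starting with "model_" can trigger a protected-namespace
    # warning on Pydantic v2; narrowing the namespaces avoids it.
    model_config = SettingsConfigDict(protected_namespaces=("settings_",))

    # Each field can be overridden by an environment variable, e.g. AWS_REGION.
    aws_region: str = "us-east-1"
    model_summarize: str = "anthropic.claude-v2"
    model_translate: str = "amazon.titan-text-lite-v1"


settings = Settings()
```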
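
To illustrate how the AWS_REGION value set by Helm in Step 6 reaches this configuration, here is a rough standard-library sketch of the override behavior. It is not how Pydantic implements it; SketchSettings and load_settings are hypothetical names used only for this example.

```python
import os
from dataclasses import dataclass, fields


@dataclass
class SketchSettings:
    """Hypothetical stand-in for Settings, using only the standard library."""

    aws_region: str = "us-east-1"
    model_summarize: str = "anthropic.claude-v2"
    model_translate: str = "amazon.titan-text-lite-v1"


def load_settings(environ=os.environ) -> SketchSettings:
    # Look up each field under its upper-cased name, e.g. aws_region -> AWS_REGION.
    overrides = {
        f.name: environ[f.name.upper()]
        for f in fields(SketchSettings)
        if f.name.upper() in environ
    }
    return SketchSettings(**overrides)


defaults = load_settings({})
overridden = load_settings({"AWS_REGION": "eu-west-1"})
print(defaults.aws_region, overridden.aws_region)
```

Fields without a matching environment variable keep their defaults, so only the region changes in the second call.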

Request Models:

```python
# app/models.py
from pydantic import BaseModel


class SummarizeRequest(BaseModel):
    text: str


class TranslateRequest(BaseModel):
    text: str
    target_language: str
```

Bedrock Provider:

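```python
# app/providers.py
import json
import logging

import boto3

from app.config import settings

logger = logging.getLogger("ai-agent")
logger.setLevel(logging.INFO)

bedrock_client = boto3.client(
    "bedrock-runtime",
    region_name=settings.aws_region,
)


def build_payload(model_id: str, prompt: str) -> dict:
    # Request schemas differ per model family on Bedrock.
    if model_id.startswith("anthropic."):
        # Claude text-completion models expect Human/Assistant turn markers.
        return {
            "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
            "max_tokens_to_sample": 200,
        }
    # Amazon Titan Text models take inputText plus a generation config.
    return {
        "inputText": prompt,
        "textGenerationConfig": {"maxTokenCount": 200},
    }


def extract_text(model_id: str, output: dict) -> str:
    # Response schemas differ per model family as well.
    if model_id.startswith("anthropic."):
        return output.get("completion", "")
    results = output.get("results", [])
    return results[0].get("outputText", "") if results else ""


def call_bedrock(model_id: str, prompt: str) -> str:
    try:
        response = bedrock_client.invoke_model(
            modelId=model_id,
            contentType="application/json",
            accept="application/json",
            body=json.dumps(build_payload(model_id, prompt)),
        )
        output = json.loads(response["body"].read())
        return extract_text(model_id, output)
    except Exception:
        # Log the full traceback; the caller still receives a fallback string.
        logger.exception("Bedrock error")
        return "Unable to process request."
```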
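
The provider logic can be exercised without AWS credentials by injecting a stand-in client. The following is a minimal sketch under that assumption: FakeBedrockClient and call_model are hypothetical names, and call_model is a simplified copy of the request flow rather than the module's actual function.

```python
import io
import json


class FakeBedrockClient:
    """Hypothetical stand-in for the boto3 bedrock-runtime client."""

    def __init__(self, completion: str):
        self.completion = completion
        self.calls = []

    def invoke_model(self, modelId: str, body: str, **kwargs) -> dict:
        # Record the request so a test can inspect what would have been sent.
        self.calls.append({"modelId": modelId, "body": json.loads(body)})
        payload = json.dumps({"completion": self.completion}).encode("utf-8")
        # The real client returns a streaming body; BytesIO offers the same read().
        return {"body": io.BytesIO(payload)}


def call_model(client, model_id: str, prompt: str) -> str:
    """Simplified request/response flow with the client passed in."""
    body = json.dumps({"prompt": prompt, "max_tokens_to_sample": 200})
    response = client.invoke_model(modelId=model_id, body=body)
    return json.loads(response["body"].read()).get("completion", "")


fake = FakeBedrockClient(completion="Latency rose at peak load.")
result = call_model(fake, "anthropic.claude-v2", "Summarize: ...")
print(result)
print(fake.calls[0]["body"])
```

Passing the client as an argument keeps the sketch free of network access, which is one way to unit-test prompt and parsing logic in a CI pipeline.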
   

FastAPI Application:

```python
# app/main.py
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.config import settings
from app.models import SummarizeRequest, TranslateRequest
from app.providers import call_bedrock

app = FastAPI(title="AI Summarizer and Translator")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get("/healthz")
async def health():
    return {"status": "ok"}


# Plain `def` handlers run in FastAPI's thread pool, so the blocking boto3
# call does not stall the event loop.
@app.post("/summarize")
def summarize(req: SummarizeRequest):
    prompt = f"Summarize this text in two concise sentences:\n{req.text}"
    return {"summary": call_bedrock(settings.model_summarize, prompt)}


@app.post("/translate")
def translate(req: TranslateRequest):
    prompt = f"Translate this into {req.target_language}:\n{req.text}"
    return {"translation": call_bedrock(settings.model_translate, prompt)}
```
   

Step 3: Containerize the Application

Dockerfile:

```dockerfile
FROM python:3.11-slim AS builder
WORKDIR /app
COPY app/requirements.txt .
# Install into a standalone prefix so the packages can be copied to a path
# that the non-root runtime user is able to read (/root is not).
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.11-slim
ENV PYTHONUNBUFFERED=1
WORKDIR /app

COPY --from=builder /install /usr/local
COPY app ./app

RUN adduser --disabled-password --gecos '' appuser
USER appuser

EXPOSE 8000

CMD ["uvicorn","app.main:app","--host","0.0.0.0","--port","8000","--workers","2"]
```
   

Step 4: Push the Image to Amazon ECR

```bash
# Build the image locally before tagging it for ECR.
docker build -t ai-agent:latest .

aws ecr get-login-password --region us-east-1 \
 | docker login --username AWS --password-stdin <ECR_URL>

docker tag ai-agent:latest <ECR_URL>/ai-agent:latest
docker push <ECR_URL>/ai-agent:latest
```

Step 5: Create Kubernetes Secrets

```bash
kubectl create secret generic bedrock-secret \
  --from-literal=AWS_ACCESS_KEY_ID=XXX \
  --from-literal=AWS_SECRET_ACCESS_KEY=YYY
```
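
Keep in mind that Kubernetes stores Secret values base64-encoded, which is an encoding rather than encryption. The sketch below uses only the Python standard library to show roughly what the data section of bedrock-secret would hold for the placeholder values above; the helper name secret_data is hypothetical.

```python
import base64


def secret_data(literals: dict) -> dict:
    """Encode literal values the way they appear under a Secret's data key."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in literals.items()
    }


encoded = secret_data({"AWS_ACCESS_KEY_ID": "XXX", "AWS_SECRET_ACCESS_KEY": "YYY"})
print(encoded)

# Anyone who can read the Secret can reverse the encoding.
decoded = base64.b64decode(encoded["AWS_ACCESS_KEY_ID"]).decode("utf-8")
print(decoded)
```

Because the values are trivially reversible, access to the Secret object should be restricted with the same care as the credentials themselves.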

Step 6: Helm Configurations

values.yaml

```yaml
replicaCount: 2

image:
  repository: <ECR_URL>/ai-agent
  tag: latest
  pullPolicy: Always

service:
  type: LoadBalancer
  port: 80

env:
  AWS_REGION: us-east-1

secretRef: bedrock-secret

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```
   

Deployment Template:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-agent
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: ai-agent
  template:
    metadata:
      labels:
        app: ai-agent
    spec:
      containers:
        - name: ai-agent
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - containerPort: 8000
          env:
            - name: AWS_REGION
              value: {{ .Values.env.AWS_REGION | quote }}
          envFrom:
            - secretRef:
                name: {{ .Values.secretRef }}
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8000
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8000
          resources:
{{ toYaml .Values.resources | indent 12 }}
```
   

Step 7: Deploy to Kubernetes

```bash
helm install ai-agent ./charts -f values.yaml
kubectl get svc
```
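
A LoadBalancer address can take a while to become reachable after the install. As a rough sketch, the /healthz endpoint from Step 2 can be polled until it responds. The function names is_healthy and wait_until_healthy, and the attempt and delay defaults, are assumptions made for this example.

```python
import time
import urllib.request


def is_healthy(base_url: str, timeout: float = 3.0) -> bool:
    """Return True when GET <base_url>/healthz answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(check, attempts: int = 30, delay: float = 10.0) -> bool:
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Simulated check that succeeds on the third call, standing in for is_healthy.
responses = iter([False, False, True])
ready = wait_until_healthy(lambda: next(responses), attempts=5, delay=0)
print(ready)
```

Against a real cluster, the check would be a lambda that calls is_healthy with the EXTERNAL-IP reported by kubectl get svc.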

Step 8: Test the API

```bash
curl -X POST http://<EXTERNAL-IP>/summarize \
 -H "Content-Type: application/json" \
 -d '{"text":"Customer reported API latency during peak hours."}'
```
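
The same kind of request can be sent from Python. The sketch below uses only the standard library and targets the /translate endpoint; the host name is a placeholder for the external IP, and build_request and post_json are hypothetical helper names.

```python
import json
import urllib.request


def build_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the agent's endpoints."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(base_url: str, path: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(base_url, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


# Building the request does not touch the network, so it can be inspected offline.
req = build_request(
    "http://agent.example.internal",  # placeholder for http://<EXTERNAL-IP>
    "/translate",
    {"text": "Service restored at 14:05 UTC.", "target_language": "German"},
)
print(req.get_method(), req.full_url)
```

Calling post_json with the real address should return a dictionary with a translation key, matching the response shape defined in app/main.py.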

Step 9: Autoscaling and Monitoring

```bash
kubectl autoscale deployment ai-agent \
  --min=2 --max=6 --cpu-percent=70
```
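
To get a feel for what this autoscaler does, the sketch below approximates the Horizontal Pod Autoscaler's core rule: the desired replica count is the current count scaled by the ratio of observed to target utilization, rounded up and clamped to the configured bounds. Utilization is measured relative to the CPU request (100m in values.yaml). The function name desired_replicas is hypothetical, and the sketch ignores the controller's tolerance and stabilization behavior.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_percent: float,
    target_cpu_percent: float = 70,
    min_replicas: int = 2,
    max_replicas: int = 6,
) -> int:
    """Approximate the HPA rule: ceil(current * observed / target), clamped."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(2, 140))  # utilization at twice the target
print(desired_replicas(4, 210))  # result capped by --max
print(desired_replicas(2, 35))   # result floored by --min
```

With the defaults taken from the command above, two pods at 140% utilization scale to four, while the other two calls land on the maximum of six and the minimum of two.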

Step 10: CI/CD Automation (GitHub Actions or Harness CD)

Once the container image and Helm chart are set up, automation can be implemented through a standard CI/CD pipeline. The process involves building the container image, storing it in Amazon ECR, and deploying/upgrading a Helm release to EKS.

GitHub Actions: Ideal for repository-based CI/CD with simple deployment pipelines.
Harness CD: Suitable for environments requiring approval gates, RBAC, traceability, and multi-team orchestration.

Regardless of the tool, the deployment lifecycle remains consistent: container versioning, Kubernetes releases via Helm, and rollouts with standard health checks.
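
As an illustration of that lifecycle, the following sketch assembles the build, push, and deploy commands a pipeline job might run for one commit. It assumes the registry login from Step 4 has already happened; the registry host, commit SHA, and the function name pipeline_commands are placeholders made up for this example.

```python
import shlex


def pipeline_commands(registry: str, commit_sha: str, release: str = "ai-agent") -> list:
    """Return the build, push, and deploy commands for one commit."""
    tag = commit_sha[:12]  # short, commit-specific tag instead of latest
    image = f"{registry}/ai-agent:{tag}"
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "push", image],
        [
            "helm", "upgrade", "--install", release, "./charts",
            "--set", f"image.repository={registry}/ai-agent",
            "--set", f"image.tag={tag}",
            "--wait",
        ],
    ]


# Placeholder registry and commit SHA, for illustration only.
commands = pipeline_commands(
    "123456789012.dkr.ecr.us-east-1.amazonaws.com",
    "9f2c1e7a4b6d8035c1a2",
)
for command in commands:
    print(shlex.join(command))
```

Tagging each image with the commit SHA makes every Helm release traceable to a specific build, which is harder to guarantee when every deployment uses the latest tag.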

Closing Thoughts

Kubernetes offers a solid platform for deploying AI agents as modular services, while AWS Bedrock makes large language models accessible without adding operational complexity of its own. Combined with FastAPI, Docker, and Helm, this gives a straightforward way to expose AI capabilities through standard APIs.

Separating application logic from deployment makes the approach easier to reuse and scale, and keeps operations consistent. As more businesses adopt multiple clouds, these qualities become essential for keeping control of deployment processes.

In upcoming installments of this series, the same agent will be deployed on Azure AKS with Azure OpenAI and on GCP GKE with Vertex AI. This is the strength of Kubernetes: it provides an equivalent layer for AI workloads on any cloud platform.
