How to Build and Deploy an AI Agent on Kubernetes With AWS Bedrock, FastAPI and Helm
Learn how to build, containerize, and deploy a lightweight, cloud-native AI agent on Amazon EKS using FastAPI, AWS Bedrock, Docker, and Helm.
The capabilities offered by AI are no longer limited to large, centralized platforms. Engineering teams are increasingly adopting lightweight, specialized AI agents that can be managed, scaled, and deployed just like microservices in a cloud-native environment, whether for summarizing large documents, translation, classification, or other analytical tasks. In this tutorial, you will create, deploy, and run an AI agent that exposes REST APIs for summarization and translation using AWS Bedrock, FastAPI, and Docker, deployed on Amazon EKS via Helm.
This provides a reusable process for integrating AI into operations: one agent, one task, clear boundaries, and full Kubernetes-native visibility and control.
Why AI Agents Fit the Microservices Model
Organizations implementing “platform thinking” are seeking AI components that function like other services in their architecture:
- Independently deployable
- Scalable according to demand
- CI/CD handled by standard pipelines
- Observable and secure
- Easy to integrate via REST
AI capabilities are transformed into microservices, enabling cloud-agnostic AI building blocks rather than monolithic AI platforms.
This article assumes your Amazon EKS cluster and Amazon ECR repository are already provisioned, so the focus remains on application architecture and deployment patterns rather than infrastructure setup.
Real-World Use Cases
| Scenario | Outcome |
| --- | --- |
| Customer Support | Summarize long customer tickets |
| Engineering Operations | Translate incident reports |
| Risk and Compliance | Condense audit or regulatory documents |
| Product and Marketing | Translate release notes across regions |
Step 1: Project Setup
A clean directory layout keeps application logic, containerization, and deployment assets separate:
```text
ai-agent/
├── app/
│   ├── main.py
│   ├── providers.py
│   ├── models.py
│   ├── config.py
│   └── requirements.txt
├── Dockerfile
└── charts/
```
Step 2: Build the FastAPI AI Agent
Configuration:
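```python
# app/config.py
# Pydantic v2 moved BaseSettings to the pydantic-settings package;
# on Pydantic v1, import it from pydantic instead.
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    aws_region: str = "us-east-1"
    model_summarize: str = "anthropic.claude-v2"
    model_translate: str = "amazon.titan-text-lite-v1"


settings = Settings()
```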
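To illustrate how these defaults interact with the environment, the following standard-library sketch approximates the lookup that `BaseSettings` performs: each field takes its value from the upper-cased environment variable when one is set and falls back to the default otherwise. It is not Pydantic itself, and `EnvSettings` and `from_env` are hypothetical names used only for this illustration.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical stand-in that mimics env-var overrides of defaults."""

    aws_region: str = "us-east-1"
    model_summarize: str = "anthropic.claude-v2"
    model_translate: str = "amazon.titan-text-lite-v1"

    @classmethod
    def from_env(cls, environ=None) -> "EnvSettings":
        # Accept an explicit mapping so the lookup can be tried without
        # touching the real process environment.
        environ = os.environ if environ is None else environ
        overrides = {}
        for field in fields(cls):
            # AWS_REGION overrides aws_region, MODEL_SUMMARIZE overrides
            # model_summarize, and so on.
            value = environ.get(field.name.upper())
            if value is not None:
                overrides[field.name] = value
        return cls(**overrides)


defaults = EnvSettings.from_env({})
overridden = EnvSettings.from_env({"AWS_REGION": "eu-west-1"})
print(defaults.aws_region, overridden.aws_region)
```

This is the mechanism that lets a deployment change the region or model IDs through environment variables without rebuilding the image.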
Request Models:
```python
# app/models.py
from pydantic import BaseModel


class SummarizeRequest(BaseModel):
    text: str


class TranslateRequest(BaseModel):
    text: str
    target_language: str
```
Bedrock Provider:
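```python
# app/providers.py
import json
import logging

import boto3

from app.config import settings

logger = logging.getLogger("ai-agent")
logger.setLevel(logging.INFO)

bedrock_client = boto3.client(
    "bedrock-runtime",
    region_name=settings.aws_region,
)


def call_bedrock(model_id: str, prompt: str) -> str:
    """Invoke a Bedrock text model and return the generated text."""
    is_titan = model_id.startswith("amazon.titan")
    try:
        # Request and response schemas differ per model family.
        if is_titan:
            payload = {
                "inputText": prompt,
                "textGenerationConfig": {"maxTokenCount": 200},
            }
        else:
            # Claude text-completion models expect Human/Assistant turns.
            payload = {
                "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
                "max_tokens_to_sample": 200,
            }
        response = bedrock_client.invoke_model(
            modelId=model_id,
            body=json.dumps(payload),
        )
        output = json.loads(response["body"].read())
        if is_titan:
            return output["results"][0]["outputText"]
        return output.get("completion", "")
    except Exception as e:
        logger.error(f"Bedrock error: {e}")
        return "Unable to process request."
```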
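Because `call_bedrock` reaches AWS on every call, it can be handy to exercise the request and response handling without credentials. The sketch below uses only the standard library: it passes the client in as a parameter and substitutes a fake. `FakeBedrockClient` and `call_model` are hypothetical names introduced for this illustration, and the canned response assumes a Claude-style `completion` field.

```python
import io
import json


class FakeBedrockClient:
    """Hypothetical stand-in for the boto3 bedrock-runtime client."""

    def __init__(self, response_body: dict):
        self.response_body = response_body
        self.calls = []

    def invoke_model(self, modelId: str, body: str) -> dict:
        # Record the request so a test can inspect what would have been sent.
        self.calls.append({"modelId": modelId, "body": json.loads(body)})
        # boto3 returns a streaming body; BytesIO offers the same .read() method.
        encoded = json.dumps(self.response_body).encode("utf-8")
        return {"body": io.BytesIO(encoded)}


def call_model(client, model_id: str, prompt: str) -> str:
    """Invoke-and-parse flow with the client injected instead of global."""
    payload = {"prompt": prompt, "max_tokens_to_sample": 200}
    response = client.invoke_model(modelId=model_id, body=json.dumps(payload))
    output = json.loads(response["body"].read())
    return output.get("completion", "")


fake = FakeBedrockClient({"completion": "Short summary."})
result = call_model(fake, "anthropic.claude-v2", "Summarize this text: ...")
print(result)
```

Injecting the client this way is one option among several; patching the module-level client in a test would serve the same purpose.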
FastAPI Application:
```python
# app/main.py
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.config import settings
from app.models import SummarizeRequest, TranslateRequest
from app.providers import call_bedrock

app = FastAPI(title="AI Summarizer and Translator")

# Wide-open CORS keeps the demo simple; restrict origins for production use.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get("/healthz")
async def health():
    return {"status": "ok"}


# Plain `def` handlers run in FastAPI's thread pool, so the blocking boto3
# call does not stall the event loop.
@app.post("/summarize")
def summarize(req: SummarizeRequest):
    prompt = f"Summarize this text in two concise sentences:\n{req.text}"
    return {"summary": call_bedrock(settings.model_summarize, prompt)}


@app.post("/translate")
def translate(req: TranslateRequest):
    prompt = f"Translate this into {req.target_language}:\n{req.text}"
    return {"translation": call_bedrock(settings.model_translate, prompt)}
```
Step 3: Containerize the Application
Dockerfile:
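```dockerfile
FROM python:3.11-slim AS builder
WORKDIR /app
COPY app/requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

FROM python:3.11-slim
ENV PYTHONUNBUFFERED=1
# Create the non-root user first so the installed packages can be owned by it.
RUN adduser --disabled-password --gecos '' appuser
ENV PATH=/home/appuser/.local/bin:$PATH
WORKDIR /app
# /root is not readable by appuser, so copy the packages into its home instead.
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
COPY --chown=appuser:appuser app ./app
USER appuser
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
```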
Step 4: Push the Image to Amazon ECR
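```bash
# Build the image locally before tagging it for ECR.
docker build -t ai-agent:latest .
# Authenticate Docker against the registry, then tag and push.
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin <ECR_URL>
docker tag ai-agent:latest <ECR_URL>/ai-agent:latest
docker push <ECR_URL>/ai-agent:latest
```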
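The `<ECR_URL>` placeholder stands for the registry hostname. Private Amazon ECR registries generally follow the pattern `<account-id>.dkr.ecr.<region>.amazonaws.com`, so a full image reference can be composed as in this small sketch. The helper name `ecr_image_uri` and the account ID are illustrative assumptions, not values from this tutorial.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Compose a full image reference for a private Amazon ECR repository."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


# 123456789012 is a placeholder account ID, not a real one.
image = ecr_image_uri("123456789012", "us-east-1", "ai-agent", "latest")
print(image)
```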
Step 5: Create Kubernetes Secrets
```bash
# XXX and YYY are placeholders for real credentials.
kubectl create secret generic bedrock-secret \
  --from-literal=AWS_ACCESS_KEY_ID=XXX \
  --from-literal=AWS_SECRET_ACCESS_KEY=YYY
```
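For readers who want to see what this command stores, the sketch below builds roughly the Secret manifest that `kubectl create secret generic` produces from literals. The point it illustrates is that values under `data` are only base64-encoded, not encrypted. `secret_manifest` is a hypothetical helper written for this example.

```python
import base64


def secret_manifest(name: str, literals: dict) -> dict:
    """Build an Opaque Secret manifest from plain-text key/value pairs."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        # Values under `data` are base64-encoded, which is encoding, not encryption.
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in literals.items()
        },
    }


manifest = secret_manifest(
    "bedrock-secret",
    {"AWS_ACCESS_KEY_ID": "XXX", "AWS_SECRET_ACCESS_KEY": "YYY"},
)
print(manifest["data"])
```

Anyone with read access to the Secret can decode these values, which is why access to Secrets is usually restricted through RBAC.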
Step 6: Helm Configurations
values.yaml
```yaml
replicaCount: 2
image:
  repository: <ECR_URL>/ai-agent
  tag: latest
  pullPolicy: Always
service:
  type: LoadBalancer
  port: 80
env:
  AWS_REGION: us-east-1
secretRef: bedrock-secret
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```
Deployment Template:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-agent
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: ai-agent
  template:
    metadata:
      labels:
        app: ai-agent
    spec:
      containers:
        - name: ai-agent
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - containerPort: 8000
          env:
            - name: AWS_REGION
              value: {{ .Values.env.AWS_REGION | quote }}
          envFrom:
            - secretRef:
                name: {{ .Values.secretRef }}
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8000
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8000
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
```
Step 7: Deploy to Kubernetes
```bash
helm install ai-agent ./charts -f values.yaml
kubectl get svc
```
Step 8: Test the API
```bash
curl -X POST http://<EXTERNAL-IP>/summarize \
  -H "Content-Type: application/json" \
  -d '{"text":"Customer reported API latency during peak hours."}'
```
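The same kind of request can be issued from Python. The following is a minimal sketch using only the standard library. `build_request` and `call_agent` are hypothetical helper names, and `agent.example.com` is a placeholder for the load balancer address reported by `kubectl get svc`.

```python
import json
import urllib.request


def build_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Prepare a JSON POST request for one of the agent's endpoints."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_agent(base_url: str, path: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(base_url, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# agent.example.com is a placeholder; substitute the load balancer address.
translate_request = build_request(
    "http://agent.example.com",
    "/translate",
    {"text": "Service restored at 14:00 UTC.", "target_language": "German"},
)
print(translate_request.get_method(), translate_request.full_url)
```

Calling `call_agent("http://<EXTERNAL-IP>", "/summarize", {"text": "..."})` against a running deployment would return the decoded JSON response.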
Step 9: Autoscaling and Monitoring
```bash
kubectl autoscale deployment ai-agent \
  --min=2 --max=6 --cpu-percent=70
```
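The `--cpu-percent=70` target is measured against each pod's CPU request, and the autoscaler needs a metrics source such as metrics-server to be available in the cluster. As a rough sketch of how the Horizontal Pod Autoscaler arrives at a replica count, the code below applies the documented scaling formula, desired = ceil(current replicas × current utilization / target utilization), clamped to the configured bounds. It deliberately ignores the controller's tolerance band and stabilization windows, and `desired_replicas` is a hypothetical helper name.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_percent: float,
    target_cpu_percent: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Approximate the HPA replica calculation for a CPU utilization target."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    # Clamp the result to the --min and --max bounds.
    return max(min_replicas, min(max_replicas, raw))


# Two pods averaging 140% of their CPU request against a 70% target.
scaled_up = desired_replicas(2, 140, 70, min_replicas=2, max_replicas=6)
# Four pods averaging 200% would exceed the upper bound, so the max applies.
capped = desired_replicas(4, 200, 70, min_replicas=2, max_replicas=6)
# Light load never drops below the minimum.
floor = desired_replicas(2, 35, 70, min_replicas=2, max_replicas=6)
print(scaled_up, capped, floor)
```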
Step 10: CI/CD Automation (GitHub Actions or Harness CD)
Once the container image and Helm chart are set up, automation can be implemented through a standard CI/CD pipeline. The process involves building the container image, storing it in Amazon ECR, and deploying/upgrading a Helm release to EKS.
- GitHub Actions: Ideal for repository-based CI/CD with simple deployment pipelines.
- Harness CD: Suitable for environments requiring approval gates, RBAC, traceability, and multi-team orchestration.
Regardless of the tool, the deployment lifecycle remains consistent: container versioning, Kubernetes releases via Helm, and rollouts with standard health checks.
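As a tool-neutral illustration of that lifecycle, the sketch below assembles the commands a pipeline job might run for one release: build and push a versioned image, then upgrade the Helm release with the new tag. The function name `release_commands`, the registry value, and the tag are assumptions made for this example, and registry login is omitted for brevity.

```python
import shlex


def release_commands(
    image_repository: str,
    tag: str,
    release: str = "ai-agent",
    chart_path: str = "./charts",
) -> list:
    """Return the build, push, and deploy commands for one versioned release."""
    image = f"{image_repository}:{tag}"
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "push", image],
        # --install creates the release on first run; --wait blocks until
        # the rollout passes its readiness checks.
        [
            "helm", "upgrade", "--install", release, chart_path,
            "--set", f"image.tag={tag}", "--wait",
        ],
    ]


# The registry host and tag are placeholders (for example, a short commit SHA).
commands = release_commands("registry.example.com/ai-agent", "abc1234")
for command in commands:
    print(shlex.join(command))
```

A pipeline could execute each entry with `subprocess.run(command, check=True)` so that a failed step stops the release.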
Closing Thoughts
Kubernetes offers a solid platform for deploying AI agents as modular services, while AWS Bedrock provides access to large language models without adding operational overhead of its own. Combined with FastAPI, Docker, and Helm, they offer a straightforward way to expose AI capabilities through standard APIs.
Separating application logic from deployment concerns makes the approach easier to reuse, scale, and operate consistently. As more businesses adopt multiple clouds, these qualities become essential for keeping deployment processes under control.
In the next installments of this series, the same agent will be deployed on Azure AKS with Azure OpenAI and on GCP GKE with Vertex AI. This is the strength of Kubernetes: it provides an equivalent deployment layer for AI workloads on any cloud platform.
Opinions expressed by DZone contributors are their own.