AI/ML

Artificial intelligence (AI) and machine learning (ML) are two fields that work together to create computer systems capable of perception, recognition, decision-making, and translation. Separately, AI is the ability for a computer system to mimic human intelligence through math and logic, and ML builds off AI by developing methods that "learn" through experience and do not require instruction. In the AI/ML Zone, you'll find resources ranging from tutorials to use cases that will help you navigate this rapidly growing field.

Latest Premium Content
- Trend Report: Generative AI
- Refcard #394: AI Automation Essentials
- Refcard #158: Machine Learning Patterns and Anti-Patterns

DZone's Featured AI/ML Resources

Docker Security: 6 Practical Labs From Audit to AI Protection

By Shamsher Khan
Docker containers share the host kernel. A single misconfigured container can expose sensitive data, provide root access to the host, or compromise the entire infrastructure. This guide provides six practical labs that work on Linux, macOS, and Windows. The examples use open source tools and demonstrate both vulnerable and secure configurations. Each lab is hands-on and runnable. All code and detailed instructions are available on GitHub: https://github.com/opscart/docker-security-practical-guide

This article covers six essential security practices:

- Configuration auditing with Docker Bench Security
- Container hardening with capabilities and read-only filesystems
- Vulnerability scanning with Trivy and policy enforcement
- Image signing for supply chain security
- Seccomp profiles for system call filtering
- AI/ML workload security

Why Docker Bench Security for Lab 01

When building this guide, we initially considered Falco for runtime detection. However, Falco requires kernel modules and only works on Linux systems. This creates platform dependency issues for developers on macOS or Windows. Docker Bench Security emerged as the better choice:

- Official Docker tool with 100+ automated checks
- Works on all platforms where Docker runs
- Based on CIS Docker Benchmark v1.6.0 (industry standard)
- No dependencies beyond Docker itself
- Run one command, get immediate results

Falco remains valuable for production runtime monitoring on Linux servers. For learning security fundamentals, Docker Bench provides universal accessibility.

Lab 01: Security Auditing With Docker Bench

What You'll Learn

Run comprehensive security audits to identify misconfigurations before they become vulnerabilities. Docker Bench Security checks host configuration, daemon settings, file permissions, container configurations, and network settings.

Hands-On Exercise

Clone the repository and navigate to Lab 01:

Shell
git clone https://github.com/opscart/docker-security-practical-guide.git
cd docker-security-practical-guide/labs/01-docker-bench-security

Step 1: Run Initial Audit

Shell
./run-audit.sh

Docker Bench runs 105 checks across seven CIS benchmark sections. The output shows PASS (compliant), WARN (needs attention), and INFO (manual review) results. Initial runs typically show a modest score, as Docker's defaults prioritize usability over maximum security.

Step 2: Deploy Vulnerable Application

Deploy a container with intentional security issues:

Shell
docker-compose -f demo-vulnerable-app.yml up -d

This creates containers with multiple problems:

- Privileged mode enabled
- Host network namespace shared
- Docker socket mounted inside the container
- Hardcoded credentials in environment variables
- All Linux capabilities granted

Step 3: Audit Again

Shell
./run-audit.sh

The score drops significantly. New WARN findings appear for privileged containers, the exposed Docker socket, and disabled security profiles.

Understanding the Results

Output format:

Plain Text
[PASS] 5.1 - Verify AppArmor profile, if applicable
[WARN] 5.2 - Container running in privileged mode
[INFO] 5.3 - Restrict Linux Kernel Capabilities within containers

Three categories:

- PASS: Meets CIS benchmark requirements
- WARN: Security issue requiring action
- INFO: Requires manual verification

Common Issues and Fixes

1. Issue: Privileged containers

Problem:

YAML
privileged: true  # Grants full host access

Fix:

YAML
cap_drop:
  - ALL
cap_add:
  - NET_BIND_SERVICE  # Only required capabilities
security_opt:
  - no-new-privileges:true

2. Issue: Docker socket mounted

Problem:

YAML
volumes:
  - /var/run/docker.sock:/var/run/docker.sock

This grants the container full control over the Docker daemon — equivalent to root access on the host.

Fix: Remove the socket mount entirely. If Docker API access is needed, use proper authentication instead.

Step 4: Cleanup

Shell
docker-compose -f demo-vulnerable-app.yml down
docker system prune -f

Lab 02: Secure Container Configurations

Objectives

Compare insecure versus secure container configurations. Learn how Linux capabilities work and why read-only filesystems matter.

Understanding Linux Capabilities

Linux divides root privileges into distinct units called capabilities. Instead of running containers as root with full privileges, grant only the specific capabilities needed.

Common capabilities:

- CAP_NET_BIND_SERVICE – Bind to ports below 1024
- CAP_CHOWN – Change file ownership
- CAP_SETUID/SETGID – Change user/group IDs

Default Docker behavior grants a subset of capabilities. The secure approach: drop all capabilities, then add back only what the application requires.

Hands-On Exercise

Shell
cd labs/02-secure-configs

Step 1: Deploy Insecure Container

Shell
./deploy-insecure.sh

Deploys nginx with all capabilities, a read-write filesystem, running as root, and no security restrictions.

Step 2: Deploy Secure Container

Shell
./deploy-secure.sh

Deploys nginx with minimal capabilities, a read-only root filesystem, tmpfs mounts for required writes, and no-new-privileges enabled.

Step 3: Compare Security Postures

Shell
./compare-security.sh

Output shows:

Plain Text
INSECURE Container Capabilities:
CapEff: 000001ffffffffff
SECURE Container Capabilities:
CapEff: 00000000000004c1

The hexadecimal values represent enabled capabilities. The insecure container has all capabilities (ffffffffff); the secure container has only four specific capabilities (4c1). This reduction significantly limits what an attacker can do after a compromise.

Step 4: Test Security Controls

Shell
./test-security.sh

Tests package installation, filesystem writes, and privilege escalation. The secure container blocks package installation and root filesystem writes while maintaining functionality.

Configuration Details

Insecure container command:

Shell
docker run -d --name insecure-nginx --privileged --cap-add ALL \
  --security-opt apparmor=unconfined -p 8080:80 nginx:alpine

Secure container command:

Shell
docker run -d --name secure-nginx --read-only --cap-drop ALL \
  --cap-add NET_BIND_SERVICE --cap-add CHOWN --cap-add SETUID --cap-add SETGID \
  --security-opt=no-new-privileges:true \
  --tmpfs /tmp:rw,noexec,nosuid,size=64M \
  --tmpfs /var/cache/nginx:rw,noexec,nosuid,size=64M \
  --tmpfs /var/run:rw,noexec,nosuid,size=64M \
  -p 8081:8081 nginx:alpine

The read-only filesystem prevents malware installation, configuration tampering, and binary modifications. tmpfs provides memory-backed writable space where needed, with noexec and nosuid flags preventing binary execution.

Lab 03: Vulnerability Scanning and Policy Enforcement

Objectives

Scan container images for known vulnerabilities and enforce security policies before deployment.

Understanding Vulnerability Scanning

Container images contain OS packages and application dependencies. Each component can have CVE-tracked vulnerabilities. Base images often contain hundreds of vulnerabilities. Dependencies become outdated quickly. Some vulnerabilities have active exploits. Scan during development, in CI/CD pipelines, before deployment, and regularly on running images.

Tools Used

Trivy: Open source vulnerability scanner.
Scans OS packages, application dependencies, Infrastructure as Code files, and Kubernetes configurations. Fast scanning (under 1 minute) with a comprehensive vulnerability database. Open Policy Agent (OPA): Policy engine for enforcing security rules. Checks container configurations, runtime settings, and compliance requirements. Hands-On Exercise Shell cd labs/03-vulnerability-scanning Step 1: Install Trivy macOS: Shell brew install trivy Linux: wget https://github.com/aquasecurity/trivy/releases/latest/download/trivy_Linux-64bit.tar.gz tar zxvf trivy_Linux-64bit.tar.gz sudo mv trivy /usr/local/bin/ Step 2: Scan Container Image Shell ./scan-image.sh Output shows library name, CVE identifier, severity level (CRITICAL/HIGH/MEDIUM/LOW), and current versus fixed version. Step 3: Apply Security Policies Install OPA: Shell brew install opa # macOS Apply policy: Shell ./apply-policy.sh OPA checks for privileged mode, root user, missing health checks, and excessive exposed ports. The policy.rego file defines deny rules (block deployment) and warn rules (flag issues). Step 4: Test Policy Enforcement Deploy privileged container: Shell docker run -d --name test-vulnerable --privileged nginx:alpine ./apply-policy.sh Policy detects privileged mode violation. Clean up: Shell docker rm -f test-vulnerable CI/CD Integration GitLab CI example: YAML scan: stage: test script: - trivy image --exit-code 1 --severity CRITICAL,HIGH $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA allow_failure: false Pipeline fails if critical or high vulnerabilities are found. Remediation Process When vulnerabilities are found: Focus on CRITICAL and HIGH severity firstCheck if fixes are availableUpdate the base image to the latest patched versionUpdate dependencies using the package managerRebuild the image with updatesRescan to verify resolution Example: Dockerfile # Found: openssl 1.1.1t has CVE-2023-12345 (CRITICAL) # Fixed in: 1.1.1u # Update Dockerfile FROM alpine:3.18 # instead of 3.17 RUN apk add --no-cache openssl # Rebuild and verify docker build -t myapp:fixed . trivy image myapp:fixed Lab 04: Image Signing and Verification Why Image Signing Matters Container images can be tampered with during transit, replaced by attackers in registries, or modified by compromised build systems. Image signing provides cryptographic proof of origin and detects unauthorized modifications. Supply chain attacks targeting container registries have increased. Image signing is now a requirement for many compliance frameworks. Without signing, you cannot verify that the image running in production is exactly what your build system created. Understanding Image Signing Digital signatures use asymmetric cryptography: Private key signs the imagePublic key verifies the signatureAny modification breaks the signatureOnly the key holder can sign This lab uses Cosign from the Sigstore project. Cosign supports both traditional key-based signing and keyless signing with OIDC providers. Hands-On Exercise Shell cd labs/04-image-signing Step 1: Install Cosign macOS: Shell brew install cosign Linux: Shell wget https://github.com/sigstore/cosign/releases/latest/download/cosign-linux-amd64 chmod +x cosign-linux-amd64 sudo mv cosign-linux-amd64 /usr/local/bin/cosign Step 2: Set Up Signing Infrastructure Shell ./setup-signing.sh Starts a local registry on port 5001 and generates signing keys (cosign.key and cosign.pub). Step 3: Build and Sign Image Shell ./sign-image.sh The script builds a sample image, pushes it to the local registry, and signs it with Cosign. 
You'll be prompted for a password to protect the private key. Step 4: Verify Signature Shell ./verify-image.sh Verification confirms: Image came from the expected sourceNo modifications since signingSignature matches the public key Step 5: Test Unsigned Image Try to verify an unsigned image: Shell docker pull nginx:latest cosign verify --key cosign.pub nginx:latest Returns "Error: no signatures found," preventing the use of untrusted images. Policy Enforcement In production, enforce signed images using Kubernetes admission controllers or registry policies. Unsigned images should be rejected before deployment. Kubernetes example: YAML apiVersion: policy.sigstore.dev/v1beta1 kind: ClusterImagePolicy metadata: name: require-signed-images spec: images: - glob: "registry.company.com/**" authorities: - key: data: | -----BEGIN PUBLIC KEY----- [your public key] -----END PUBLIC KEY----- Lab 05: Seccomp Profiles Understanding Seccomp Secure computing mode (seccomp) is a Linux kernel feature that filters system calls. Applications make system calls to request kernel services. Limiting which system calls a container can reduce the attack surface. Linux has 300+ system calls. Most applications need fewer than 100. Blocking unnecessary system calls prevents entire classes of attacks. How It Works When a container attempts a system call: Kernel checks seccomp profileIf allowed: call executesIf blocked: operation fails or process terminates Docker's default profile blocks ~44 dangerous system calls while allowing common operations. Hands-On Exercise Shell cd labs/05-seccomp-profiles Step 1: Test Default Profile Shell ./test-default-profile.sh The default profile allows file operations, network operations, and basic process operations. It blocks reboot, mount, and system time modification. Step 2: Test Restrictive Profile Shell ./test-restrictive-profile.sh A restrictive profile only allows the minimum required system calls. It blocks socket creation, process forking, and file permission changes. This profile is suitable for very constrained environments. Step 3: Generate Application-Specific Profile Shell ./generate-profile.sh Creates a profile for nginx with appropriate system calls. The profile blocks dangerous calls like mount and reboot while allowing network and file operations that nginx needs. Profile Structure Basic seccomp profile format: JSON { "defaultAction": "SCMP_ACT_ERRNO", "architectures": ["SCMP_ARCH_X86_64"], "syscalls": [ { "names": ["read", "write", "open", "close"], "action": "SCMP_ACT_ALLOW" } ] } Key elements: defaultAction: What happens to unlisted system callsarchitectures: CPU architectures supportedsyscalls: Explicitly allowed system callsaction: SCMP_ACT_ALLOW (allow) or SCMP_ACT_ERRNO (block) Applying Profiles Docker run: Shell docker run --security-opt seccomp=profile.json nginx:alpine Docker Compose: YAML services: web: image: nginx:alpine security_opt: Attack Prevention Example: Attacker gains shell access in a container. Without seccomp: Shell attacker$ mount /dev/sda1 /mnt # Success - accesses host disk attacker$ reboot # Success - reboots host With seccomp: Shell attacker$ mount /dev/sda1 /mnt # Blocked: Operation not permitted attacker$ reboot # Blocked: Operation not permitted Even with shell access, the attacker cannot perform dangerous operations. Lab 06: AI Model Security Why AI Model Security Matters Machine learning containers have unique security requirements. Models consume significant CPU/GPU/memory resources. Inference inputs may contain sensitive data. 
Model weights represent intellectual property. APIs expose an attack surface for adversarial inputs and model extraction. AI-Specific Threats Model extraction: Attackers query the model repeatedly to recreate it without training data access.Adversarial attacks: Carefully crafted inputs cause incorrect predictions.Resource exhaustion: Large batch requests consume all available resources.Data poisoning: Malicious training data corrupts the model. Hands-On Exercise Shell cd labs/06-ai-model-security Step 1: Build ML Container Shell ./build-ml-container.sh Builds a container with Python ML framework, inference server, and sample model. Uses multi-stage build, non-root user, and minimal dependencies. Step 2: Deploy With Security Controls Shell ./deploy-secure.sh Deploys container with: Memory limit: 4GBCPU limit: 2 coresRead-only filesystemtmpfs for temporary filesno-new-privileges enabled Step 3: Test Model Inference Shell curl -X POST http://localhost:5001/predict \ -H 'Content-Type: application/json' \ -d '{"text":"sample text for prediction"}' Returns JSON with prediction, confidence, and metadata. Step 4: Stress Test Shell ./stress-test.sh Sends concurrent requests to verify that resource limits work correctly. Container handles requests within defined limits without crashing. Security Configuration Deployment command: Shell docker run -d --name ml-inference \ --read-only \ --tmpfs /tmp:rw,noexec,nosuid,size=2g \ --memory="4g" \ --cpus="2" \ --pids-limit="100" \ --security-opt=no-new-privileges:true \ --cap-drop=ALL \ -p 5001:5000 \ ml-inference:secure Resource limits prevent exhaustion attacks. Read-only filesystem prevents tampering. Process limits prevent fork bombs. Input Validation Protect against malicious inputs: Python from pydantic import BaseModel, validator class PredictionRequest(BaseModel): features: List[float] @validator('features') def validate_features(cls, v): if len(v) > 100: raise ValueError('Too many features') if any(abs(x) > 1000 for x in v): raise ValueError('Feature values too large') return v Validation prevents oversized inputs and extreme values that could cause problems. Rate Limiting Prevent model extraction and resource exhaustion: Python from slowapi import Limiter limiter = Limiter(key_func=get_remote_address) @app.post("/predict") @limiter.limit("10/minute") def predict(request: Request, data: PredictionRequest): # Process request pass Limits requests per client to prevent abuse. Implementation Roadmap Start implementing these practices in sequence: Week 1-2: Run Docker Bench Security on all systems. Address WARN findings. Add to CI/CD pipeline. Week 3-4: Implement container hardening. Drop capabilities. Enable read-only filesystems where possible. Week 5-6: Set up vulnerability scanning. Integrate Trivy into the build process. Create OPA policies. Week 7-8: Implement image signing. Set up Cosign in CI/CD. Configure verification at deployment. Week 9-10: Create and test seccomp profiles. Start with the default profile. Create custom profiles for critical services. Week 11-12: If running ML workloads, implement resource limits and input validation. Add monitoring and rate limiting. Ongoing: Regular security audits. Update dependencies. Monitor for new vulnerabilities. Refine security policies. 
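As one way to fold the weekly Docker Bench audit from the roadmap into CI, the following is a rough sketch rather than part of the guide's repository. It assumes the docker-bench-security project has been cloned next to the script and that it runs with sufficient privileges (the real tool usually needs root and Docker access); it simply counts [WARN] findings in the output and fails the job when they exceed a threshold.

Python
import subprocess
import sys

# Assumptions: docker-bench-security is cloned at this path; adjust to your setup.
BENCH_SCRIPT = "./docker-bench-security/docker-bench-security.sh"
MAX_WARNINGS = 0  # fail the pipeline on any WARN finding

def run_audit() -> int:
    """Run Docker Bench and return the number of [WARN] findings."""
    result = subprocess.run(["sh", BENCH_SCRIPT], capture_output=True, text=True)
    warn_lines = [line for line in result.stdout.splitlines() if "[WARN]" in line]
    for line in warn_lines:
        print(line)
    return len(warn_lines)

if __name__ == "__main__":
    count = run_audit()
    print(f"{count} WARN finding(s) reported by Docker Bench")
    sys.exit(1 if count > MAX_WARNINGS else 0)

A non-zero exit code is enough for most CI systems to mark the job as failed, which turns the "address all WARN findings" habit into an enforced gate.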
Getting Started Complete lab instructions and working code: https://github.com/opscart/docker-security-practical-guide The repository includes: All six lab exercises with scriptsConfiguration examples (vulnerable and secure)Troubleshooting guidesAdditional security resources Work through labs sequentially for a comprehensive understanding. Each lab builds on previous concepts. All labs work on Linux, macOS, and Windows. Key Takeaways Lab 01: Run Docker Bench weekly. Address all WARN findings. Integrate into CI/CD.Lab 02: Drop all capabilities by default. Use read-only filesystems. Add tmpfs for required writes.Lab 03: Scan images before deployment. Enforce policies with OPA. Focus on CRITICAL and HIGH vulnerabilities.Lab 04: Sign all production images. Store keys securely. Enforce verification at deployment.Lab 05: Start with Docker's default seccomp profile. Create custom profiles for sensitive workloads. Test thoroughly.Lab 06: Set resource limits for ML containers. Validate inputs. Implement rate limiting and monitoring. Conclusion Docker security requires attention across multiple layers. This guide provides six practical labs covering configuration auditing, container hardening, vulnerability scanning, image signing, seccomp profiles, and AI model security. Security is not a one-time implementation. Regular audits, updates, and monitoring are essential. The practices in this series provide a solid foundation for production Docker deployments. Start implementing these techniques today: https://github.com/opscart/docker-security-practical-guide. More
GitOps-Backed Agentic Operator for Kubernetes: Safe Auto-Remediation With LLMs and Policy Guardrails

By Sajal Nigam
Kubernetes is already the master of reconciliation: if a pod dies, the scheduler restarts it; if a node disappears, workloads reschedule. But what happens when the failure is due to misconfiguration, resource limits, or novel runtime errors? Traditional controllers keep retrying without real problem-solving. This is where Agentic AI Operators step in. Instead of blindly retrying, they analyze logs, propose a fix, run it through policies, and deliver it safely via GitOps. In this article, we’ll build a prototype GitOps-backed Agentic Operator that: Detects a failing pod.Collects logs and events.Uses an LLM (local or cloud) to generate a remediation plan.Creates a GitHub Pull Request with manifest changes.Runs policy checks (OPA/Gatekeeper) and CI validation before merging.Let's ArgoCD/Flux reconcile the fix into the cluster. This pattern combines autonomy, safety, and auditability — the missing ingredients in most “AI + Kubernetes” experiments. Architecture Here’s the high-level flow: Step 1: Minimal Agentic Operator (Python) We’ll use the Kubernetes Python client to watch pods and OpenAI for reasoning. Python from kubernetes import client, config, watch import openai, subprocess, os openai.api_key = os.getenv("OPENAI_API_KEY") def analyze_failure(logs, manifest): """Ask LLM to generate a remediation plan.""" resp = openai.ChatCompletion.create( model="gpt-4o-mini", messages=[ {"role": "system", "content": "You are a Kubernetes reliability operator."}, {"role": "user", "content": f"Pod failed.\nLogs:\n{logs}\nManifest:\n{manifest}\nPropose a fix as a YAML patch."} ] ) return resp["choices"][0]["message"]["content"] def create_git_pr(branch, patch_file, commit_msg): """Commit patch and open PR via gh CLI.""" subprocess.run(["git", "checkout", "-b", branch]) with open(patch_file, "w") as f: f.write(commit_msg) subprocess.run(["git", "add", patch_file]) subprocess.run(["git", "commit", "-m", commit_msg]) subprocess.run(["git", "push", "origin", branch]) subprocess.run(["gh", "pr", "create", "--title", commit_msg, "--body", "AI-suggested fix"]) def main(): config.load_kube_config() v1 = client.CoreV1Api() w = watch.Watch() for event in w.stream(v1.list_pod_for_all_namespaces): pod = event["object"] if pod.status.phase == "Failed": logs = v1.read_namespaced_pod_log(pod.metadata.name, pod.metadata.namespace) manifest = str(pod.metadata) # Simplified fix = analyze_failure(logs, manifest) branch = f"ai-fix-{pod.metadata.name}" create_git_pr(branch, f"fix-{pod.metadata.name}.yaml", fix) if __name__ == "__main__": main() Step 2: Policy Guardrails With OPA/Gatekeeper Before merging, we want to ensure no unsafe actions sneak in (e.g., disabling securityContext). Example Rego policy (no_privileged.rego): Shell package kubernetes.admission violation[{"msg": msg}] { input.spec.containers[_].securityContext.privileged == true msg := "Privileged containers are not allowed" } Run OPA check locally: Shell opa eval \ --input fix-myapp.yaml \ --data no_privileged.rego \ "data.kubernetes.admission" Step 3: GitHub Actions CI Pipeline CI ensures the fix compiles, passes lint, and applies cleanly in a dry run. .github/workflows/validate.yaml: YAML name: Validate Fix on: [pull_request] jobs: validate: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Lint YAML run: yamllint . - name: Kubeval check uses: instrumenta/kubeval-action@master - name: Dry run apply run: kubectl apply -f . 
--dry-run=server - name: OPA Policy Check run: opa eval --input fix.yaml --data no_privileged.rego "data.kubernetes.admission" Step 4: GitOps Deployment With ArgoCD Once PR merges, ArgoCD syncs manifests to the cluster. The agent watches the pods again. If failure persists, it retries with a new PR. Step 5: Demo Install the operator into your cluster.Trigger a failure (e.g., pod OOMKilled due to low memory). YAML resources: requests: memory: "64Mi" limits: memory: "64Mi" Operator logs: Plain Text Pod myapp-xyz failed. Logs: OOMKilled. AI suggests patch: resources: limits: memory: "128Mi" PR opens in GitHub → CI validates → OPA approves → Merge.ArgoCD applies new manifest → pod recovers. Why This Matters Most AI-in-Kubernetes articles stop at “AI can explain logs.” This pattern goes further: Safe automation: All fixes flow through GitOps + policy guardrails.Auditable: Each decision is a PR with context.Composable: Works with any GitOps tool (ArgoCD, Flux).Extensible: Add more policies (cost, compliance, SLO budgets). Security and Compliance Considerations Security is the most critical aspect of introducing LLM-backed automation into Kubernetes. While agentic operators increase autonomy, they must never bypass established security or compliance frameworks. Key practices include: Secure API Keys in Kubernetes Secrets Store OpenAI or other LLM provider tokens in Kubernetes Secrets. Mount them as environment variables with least-privilege RBAC rules so only the operator pod can access them. Rotate keys regularly. Enforce Strict OPA/Kyverno Policies All AI-generated manifests must pass through admission controls (OPA Gatekeeper or Kyverno). Example checks include blocking privileged containers, enforcing namespace isolation, and requiring resource limits. This ensures that even if the AI suggests a risky change, it is automatically rejected. Secure Supply Chain in CI/GitOps Sign and verify container images (e.g., using Cosign/Sigstore). Validate manifests with tools like Conftest in the CI pipeline before merging. GitOps reconciliation should only trust signed commits from verified contributors. Require Human Approvals for Critical Workloads For production namespaces or sensitive workloads (e.g., financial apps, healthcare), configure GitHub/GitLab branch protection rules so all AI-generated pull requests require human review. This balances automation with governance. Auditability and Logging Log every AI recommendation, the final applied manifest, and the policy evaluation outcome. Store logs centrally (e.g., in Elasticsearch or Loki) for compliance audits and incident forensics. LLM Data Privacy Controls Redact sensitive data (credentials, PII, financial info) before sending context to LLMs. If operating in regulated industries, consider self-hosted LLMs or fine-tuned models that run inside the compliance boundary. Comparison With Alternatives When designing auto-remediation strategies in Kubernetes, it’s important to understand how agentic operators differ from existing approaches: Human SREs Fixing Issues Site Reliability Engineers bring context, intuition, and creativity to novel failures. However, manual intervention is slow, error-prone, and doesn’t scale well in high-velocity, multi-cluster environments. Human review is best reserved for critical or ambiguous changes. Traditional Self-Healing Operators (e.g., Karpenter, VPA, Cluster Autoscaler) These tools excel at deterministic problems: scaling nodes, adjusting pod resources, or replacing failed infrastructure. 
But they operate within predefined rules. If the failure falls outside their logic (e.g., misconfigurations, novel runtime errors), they simply retry or escalate alerts. Agentic Operators Agentic operators bridge the gap. Powered by LLM reasoning, they interpret logs and manifests, propose concrete fixes, and validate them against policy guardrails before applying via GitOps. Unlike traditional operators, they can adapt to unseen issues. Unlike fully manual SREs, they automate the “first draft” of remediation while still allowing human-in-the-loop governance. In short: Humans = deep context, slowerTraditional operators = fast, rigidAgentic operators = adaptive, policy-driven, scalable Next Steps for Readers Extend the operator to open Jira/GitHub issues if fixes fail.Integrate a local LLM (Ollama/LocalAI) for private inference.Add a feedback loop: store successful/failed remediations in a vector DB and use it for retrieval-augmented reasoning. With this setup, you’ve built the first step toward truly autonomous Kubernetes — but with the safety net of GitOps and policy enforcement. More
Beyond Dashboards: How Autonomous AI Agents Are Redefining Enterprise Analytics
By Mohan Krishna Mannava
How Tool-Call Observability Enables You to Support Reliable and Secure AI Agents
By Gil Feig
Prompt Engineering vs Context Engineering
By Vineet Bhatkoti
The Tech Landscape of 2026: What Developers Need to Learn Now

This roadmap outlines a shift from coder to systems architect. Developers must lead AI integration, build secure and scalable infrastructure, and think sustainably. Theme 1. AI-Driven Revolution: Prompt engineering, LLM orchestration, hyperautomation, data fluencyTheme 2. Modern Infrastructure: Platform engineering, edge computing, DevSecOpsTheme 3. Developer Toolkit: Low-code integration, green software, polyglot programming Your new role: Design and secure intelligent systems, not just code them. From Coder to Systems Architect As we move toward 2026, the software world is accelerating into an AI-native, low-latency, multi-platform future. AI is coding beside us. Low-code tools are enabling non-developers to build full-stack apps. Edge computing is pushing intelligence closer to where decisions happen in the real world. If you're feeling excited or overwhelmed, you're not alone. Theme 1: Master the AI-Driven Revolution AI as the Backbone of Software Generative and agentic AI are becoming core infrastructure. According to Tenet, over 15 million developers were using GitHub Copilot in 2025, and it writes nearly half of a developer's code. The goal is not to compete with AI, but to orchestrate it. Skills to Learn Prompt engineering and LLM frameworks like OpenAI, Anthropic, and RAG pipelinesMLOps using tools like MLflow, SageMaker, and Vertex AIAI governance, compliance, and safety principles Start Now Build a side project using an LLM APIUse vector databases like Pinecone or Weaviate for RAGExplore model monitoring tools such as Evidently or WhyLogs Hyperautomation and Workflow Engineering Hyperautomation combines AI, RPA, and analytics to automate entire workflows. NashTech found that companies are reducing testing and data-processing times by up to 90 percent with automation. Skills to Learn RPA tools like UiPath and Power AutomateWorkflow orchestration with n8n or Apache AirflowBusiness process modeling with BPMN Start Now Automate a deployment or QA processContribute to an open-source RPA tool Data Fluency and ML Foundations Data is the foundation of intelligent systems. According to NashTech, organizations like Netflix and DBS Bank use it to drive personalization and operational decisions. Skills to Learn Data engineering using SQL, dbt, and SparkML fundamentals, including regression, classification, and model evaluationDashboarding tools such as Superset and Metabase Start Now Build a data pipeline using public datasetsCreate a dashboard to visualize and analyze insights Theme 2: Architect Modern and Secure Infrastructure Platform Engineering and Cloud-Native Dev Platform engineering is enabling organizations to deliver faster by abstracting complexity through internal developer platforms (IDPs), according to Datacenters.com. Skills to Learn Kubernetes, Terraform, and GitOps practicesInternal platforms using Backstage or PortContinuous integration and delivery (CI/CD) Start Now Containerize and deploy a service on KubernetesBuild a lightweight IDP for internal tooling Edge Computing and IoT Systems Edge computing is expanding rapidly, particularly where real-time decisions are critical. More than 80 percent of organizations using edge computing report improved decision-making speeds, per PatentPC. 
Skills to Learn

- Languages like Rust and Go for performance
- Event and stream processing using Flink or Kafka
- Protocols such as MQTT and CoAP for lightweight communication

Start Now

- Build and deploy an edge function using Cloudflare Workers
- Connect a microcontroller to stream real-time telemetry data

DevSecOps and Shift-Left Security

Security is a fundamental expectation. According to Datacenters.com, shift-left strategies integrate scanning, monitoring, and mitigation into every development stage.

Skills to Learn

- Secure coding and static analysis with SonarQube or Trivy
- Threat modeling using STRIDE or DFDs
- Secrets management and runtime protection

Start Now

- Implement a secure CI/CD pipeline
- Conduct a threat modeling session on a recent project

Theme 3: Diversify the Developer's Toolkit

Low-Code and No-Code Integration

NashTech analysis predicts that by 2026, 75 percent of new applications will be built with low-code platforms. Developers will focus on integration, security, and scalability.

Skills to Learn

- Tools like Retool, Mendix, and Power Apps
- Authentication, authorization, and compliance integration
- Writing custom connectors and APIs

Start Now

- Build a business app using a low-code platform
- Secure it with OAuth, RBAC, and API validation

Green Software and Sustainable Dev

Software engineers are increasingly responsible for designing energy-efficient systems. AI workloads can emit as much carbon as five vehicles, as reported by TechRadar.

Skills to Learn

- Carbon profiling and GreenOps metrics
- Efficient compute patterns and architectural decisions
- Tools like Cloud Carbon Footprint or GreenCost

Start Now

- Measure the carbon impact of your deployments
- Migrate workloads to low-carbon zones or providers

Polyglot Programming

Knowing multiple languages increases your flexibility and hiring potential. TechGig highlights these as key for 2026:

- Python – AI, automation, scripting
- Rust – Embedded, edge, security
- Go – Cloud services, APIs, concurrency
- TypeScript – Web development, full-stack apps
- Kotlin – Mobile and backend development
- Swift – iOS and Apple ecosystem
- Julia – Scientific and high-performance computing

Start Now

- Choose one unfamiliar language and build a real-world project
- Contribute to open-source projects that use your target language

Strategic Recap: The Developer's Road Ahead

The developer's role is evolving from builder to architect.

Theme 1: AI-Driven Revolution
- Learn to guide and integrate AI systems
- Build fluency in data and automation tools

Theme 2: Infrastructure and Security
- Architect cloud-native platforms
- Move closer to the edge while embedding security

Theme 3: Developer Toolkit
- Embrace low-code and green software principles
- Build fluency across languages and paradigms

The most valuable developers in 2026 will not be the ones who simply write the most code. They will be the ones who understand systems, design responsibly, and integrate, orchestrate, and secure the intelligent infrastructure that modern software depends on.

References

Datacenters.com. (2025). Top Software Development Trends to Watch in 2025. https://www.datacenters.com/news/top-software-development-trends-to-watch-in-2025
Exploding Topics. (2024). Top 11 Software Development Trends (2025 and 2026). https://explodingtopics.com/blog/software-development-trends
NashTech. (2024). 10 Technology Trends Set to Transform the Next 3 Years. https://our-thinking.nashtechglobal.com/insights/10-technology-trends-shaping-the-next-3-years
TechGig. (2025). Top Programming Languages to Master for 2026 Success. https://content.techgig.com/career-advice/top-programming-languages-to-master-for-2026-success/articleshow/123134190.cms
TechRadar. (2019). Training a single AI algorithm emits as much CO2 as five American cars. https://www.techradar.com/news/training-a-single-ai-algorithm-emits-as-much-co2-as-five-american-cars
Tenet. (2025). GitHub Copilot Usage Data Statistics (2025). https://www.wearetenet.com/blog/github-copilot-usage-data-statistics
Vertesia. (2025). The 2025 DZone Generative AI Trend Report is Here! https://vertesiahq.com/resources/dzone-generative-ai-trend-report

By Kailash Pati Dutta
Build a LangGraph Multi-Agent System in 20 Minutes With LaunchDarkly AI Configs

Overview Build a working multi-agent system with dynamic configuration in 20 minutes using LangGraph multi-agent workflows, RAG search, and LaunchDarkly AI Configs. Part 1 of 3 of the series: Chaos to Clarity: Defensible AI Systems That Deliver on Your Goals You've been there: your AI chatbot works great in testing, then production hits and GPT-4 costs spiral out of control. You switch to Claude, but now European users need different privacy rules. Every change means another deploy, more testing, and crossed fingers that nothing breaks. The teams shipping faster? They control AI behavior dynamically instead of hardcoding everything. This series shows you how to build LangGraph multi-agent workflows that get their intelligence from RAG search through your business documents. These workflows are enhanced with MCP tools for live external data and controlled through LaunchDarkly AI Configs—all without needing to deploy code changes. What This Series Covers Part 1 (this post): Build a working multi-agent system with dynamic configuration in 20 minutesPart 2: Add advanced features like segment targeting, MCP tool integration, and cost optimizationPart 3: Run production A/B experiments to prove what actually works By the end, you'll have a system that measures its own performance and adapts based on user data instead of guesswork. What You'll Build Today In the next 20 minutes, you'll have a LangGraph multi-agent system with: Supervisor Agent: Orchestrates workflow between specialized agentsSecurity Agent: Detects PII and sensitive informationSupport Agent: Answers questions using your business documentsDynamic Control: Change models, tools, and behavior through LaunchDarkly without code changes Prerequisites You'll need: Python 3.9+ with uv package manager (install uv)LaunchDarkly account (sign up for free)OpenAI API key (required for RAG architecture embeddings)Anthropic API key (required for Claude models) or OpenAI API key (for GPT models) Step 1: Clone and Configure (2 minutes) First, let's get everything running locally. We'll explain what each piece does as we build. TypeScript-JSX # Get the code git clone https://github.com/launchdarkly-labs/devrel-agents-tutorial cd devrel-agents-tutorial # Install dependencies (LangGraph, LaunchDarkly SDK, etc.) uv sync # Configure your environment cp .env.example .env First, you need to get your LaunchDarkly SDK key by creating a project: Sign up for LaunchDarkly at app.launchdarkly.com(free account). If you're a brand new user, after signing up for an account, you'll need to verify your email address. You can skip through the new user onboarding flow after that.Find projects on the side barCreate a new project called "multi-agent-chatbot" Use exact names for Part 2 compatibility: Project: multi-agent-chatbotAI Configs: supervisor-agent, security-agent, support-agentTools: search_v2, rerankingVariations: supervisor-basic, pii-detector, rag-search-enhanced Get your SDK key: Gear Icon (bottom of sidebar) → Projects → multi-agent-chatbot → Gear Icon (to the right) → Environments → Production → SDK keyThis is your LD_SDK_KEYNow edit .env with your keys: TypeScript LD_SDK_KEY=your-launchdarkly-sdk-key # From step above OPENAI_API_KEY=your-openai-key # Required for RAG embeddings ANTHROPIC_API_KEY=your-anthropic-key # Required for Claude models This sets up a LangGraph application that uses LaunchDarkly to control AI behavior. Think of it like swapping actors, directors, even props mid-performance without stopping the show. 
Do not check the .env into your source control. Keep those secrets safe! Step 2: Add Your Business Knowledge (2 minutes) The system includes a sample reinforcement learning textbook. Replace it with your own documents for your specific domain. TypeScript # Option A: Use the sample (AI/ML knowledge) # Already included: kb/SuttonBarto-IPRL-Book2ndEd.pdf # Option B: Add your documents rm kb/*.pdf # Clear sample cp /path/to/your-docs/*.pdf kb/ Document types that work well: Legal: Contracts, case law, compliance guidelinesHealthcare: Protocols, research papers, care guidelinesSaaS: API docs, user guides, troubleshooting manualsE-commerce: Product catalogs, policies, FAQs These documents will serve as the knowledge base for your RAG search, providing business-specific context to your agents. Step 3: Initialize Your Knowledge Base (2 minutes) Turn your documents into searchable RAG knowledge: # Create vector embeddings for semantic search uv run python initialize_embeddings.py --force This builds your RAG (Retrieval-Augmented Generation) foundation using OpenAI's text-embedding model and FAISS vector database. RAG converts documents into vector embeddings that capture semantic meaning rather than just keywords, making search actually understand context. Step 4: Define Your Tools (3 minutes) Define the search tools your agents will use. In the LaunchDarkly app sidebar, click Library in the AI section. On the following screen, click the Tools tab, then Create tool. Create the RAG vector search tool: Note: We will create a simple search_v1 during Part 3, when we learn about experimentation. For now, create a tool using the following configuration: Key: search_v2 Description: Semantic search using vector embeddings Schema: { "properties": { "query": { "description": "Search query for semantic matching", "type": "string" }, "top_k": { "description": "Number of results to return", "type": "number" } }, "additionalProperties": false, "required": [ "query" ] } When you're done, click Save. Create the reranking tool: Back on the Tools section, click Add tool to create a new tool. Add the following properties: Key: reranking Description: Reorders results by relevance using BM25 algorithm Schema: { "properties": { "query": { "description": "Original query for scoring", "type": "string" }, "results": { "description": "Results to rerank", "type": "array" } }, "additionalProperties": false, "required": ["query", "results"] } When you're done, click Save. The reranking tool takes search results from search_v2 and reorders them using the BM25 algorithm to improve relevance. This hybrid approach combines semantic search (vector embeddings) with lexical matching (keyword-based scoring), making it particularly useful for technical terms, product names, and error codes where exact term matching is more important than conceptual similarity. How Your RAG Architecture Works Your RAG system works in two stages: search_v2 performs a semantic similarity search using FAISS by converting queries into the same vector space as your documents (via OpenAI embeddings), while reranking reorders results for maximum relevance. This RAG approach significantly outperforms keyword search by understanding context, so asking "My app is broken" can find troubleshooting guides that mention "application errors" or "system failures." Step 5: Create Your AI Agents in LaunchDarkly (5 minutes) Now that you've created the tools your agents will use, it's time to configure the agents themselves. 
Each agent will have its own AI Config that defines its behavior, model selection, and specific instructions. Create LaunchDarkly AI Configurations to dynamically control your LangGraph multi-agent system. LangGraph is LangChain's framework for building stateful, multi-agent applications that maintain conversation state across agent interactions. Your LangGraph architecture enables sophisticated workflows where agents collaborate and pass context between each other. Create the Supervisor Agent In the LaunchDarkly dashboard sidebar, navigate to AI Configs and click Create AI ConfigSelect Agent-basedName your AI Config supervisor-agent. This will be the key you reference in your code.Configure the following fields in the AI Config form: variation: supervisor-basic Model configuration: Anthropic claude-3-7-sonnet-latest Goal or task: You are an intelligent routing supervisor for a multi-agent system. Your primary job is to assess whether user input likely contains PII (personally identifiable information) to determine the most efficient processing route. PII Assessment: Analyze the user input and provide: - likely_contains_pii: boolean assessment - confidence: confidence score (0.0 to 1.0) - reasoning: clear explanation of your decision - recommended_route: either 'security_agent' or 'support_agent' Route to SECURITY_AGENT if the text likely contains: - Email addresses, phone numbers, addresses - Names (first/last names, usernames) - Financial information (credit cards, SSNs, account numbers) - Sensitive personal data Route to SUPPORT_AGENT if the text appears to be: - General questions without personal details - Technical queries - Search requests - Educational content requests Analyze this user input and recommend the optimal route: 5. Click Review and save. Now enable your AI Config by switching to the Targeting tab and editing the default rule to serve the variation you just created. 6. Click Edit on the Default rule, change it to serve your supervisor-basic variation, and save with a note like "Enabling new agent config". Then type "Production" to confirm. The supervisor agent demonstrates LangGraph orchestration by routing requests based on content analysis rather than rigid rules. LangGraph enables this agent to maintain conversation context and make intelligent routing decisions that adapt to user needs and LaunchDarkly AI Config parameters. Create the Security Agent Similarly, create another AI Config called security-agent variation: pii-detector Model configuration: Anthropic claude-3-7-sonnet-latest Goal or task: You are a privacy agent that REMOVES PII and formats the input for another process. Analyze the input text and identify any personally identifiable information including: Email addresses, Phone numbers, Social Security Numbers, Names (first, last, full names), Physical addresses, Credit card numbers, Driver's license numbers, Any other sensitive personal data. Respond with: detected: true if any PII was found, false otherwise, types: array of PII types found (e.g., ['email', 'name', 'phone']), redacted: the input text with PII replaced by [REDACTED], keeping the text readable and natural. Examples: Input: 'My email is [email protected] and I need help', Output: detected=true, types=['email'], redacted='My email is [REDACTED] and I need help'. Input: 'I need help with my account', Output: detected=false, types=[], redacted='I need help with my account'. 
Input: 'My name is Sarah Johnson and my phone is 555-1234', Output: detected=true, types=['name', 'phone'], redacted='My name is [REDACTED] and my phone is [REDACTED]'. Be thorough in your analysis and err on the side of caution when identifying potential PII This agent detects PII and provides detailed redaction information, showing exactly what sensitive data was found and how it would be handled for compliance and transparency. Remember to switch to the Targeting tab and enable this agent the same way we did for the supervisor — edit the default rule to serve your pii-detector variation and save it. Create the Support Agent Finally, create support-agent variation: rag-search-enhanced Model configuration: Anthropic claude-3-7-sonnet-latest → Add parameters → Click Custom parameters {"max_tool_calls":5} Click Attach tools. select: ✓reranking ✓search_v2 Goal or task: You are a helpful assistant that can search documentation and research papers. When search results are available, prioritize information from those results over your general knowledge to provide the most accurate and up-to-date responses. Use available tools to search the knowledge base and external research databases to answer questions accurately and comprehensively. This agent combines LangGraph workflow management with your RAG tools. LangGraph enables the agent to chain multiple tool calls together: first using RAG for document retrieval, then semantic reranking, all while maintaining conversation state and handling error recovery gracefully. Remember to switch to the Targeting tab and enable this agent the same way — edit the default rule to serve your rag-search-enhanced variation and save it. When you are done, you should have three enabled AI Config Agents. Step 6: Launch Your System (2 minutes) Start the system: # Terminal 1: Start the backend uv run uvicorn api.main:app --reload --port 8000 # Terminal 2: Launch the UI uv run streamlit run ui/chat_interface.py --server.port 8501 Open http://localhost:8501 in your browser. You should see a clean chat interface. Note: If prompted for authentication, you can leave the email field blank and simply click "Continue" to proceed to the chat interface. Step 7: Test Your Multi-Agent System (2 minutes) Test with these queries: Basic Knowledge Test: "What is reinforcement learning?" (if using sample docs) Or ask about your specific domain: "What's our refund policy?" PII Detection Test: "My email is [email] and I need help." Workflow Details show: Which agents are activatedWhat models and tools are being usedText after redaction Watch LangGraph in action: the supervisor agent first routes to the security agent, which detects PII. It then passes control to the support agent, which uses your RAG system for document search. LangGraph maintains state across this multi-agent workflow so that context flows seamlessly between agents. 
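If you want a feel for what that orchestration looks like in code, here is a heavily simplified, illustrative sketch of a supervisor → security → support graph. It is not the tutorial's implementation: the state fields, the node logic, and the keyword-based routing heuristic are stand-ins for the LLM-driven behavior you just configured in LaunchDarkly.

Python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class ChatState(TypedDict):
    user_input: str
    redacted_input: str
    route: str
    answer: str

def supervisor(state: ChatState) -> ChatState:
    # Stand-in heuristic; the real supervisor asks an LLM for a PII assessment.
    likely_pii = "@" in state["user_input"]
    return {**state, "route": "security" if likely_pii else "support"}

def security(state: ChatState) -> ChatState:
    # Stand-in redaction; the real agent returns detected/types/redacted fields.
    return {**state, "redacted_input": "[REDACTED]"}

def support(state: ChatState) -> ChatState:
    text = state["redacted_input"] or state["user_input"]
    # The real agent would call the search_v2 and reranking tools here.
    return {**state, "answer": f"(RAG-backed answer for: {text})"}

builder = StateGraph(ChatState)
builder.add_node("supervisor", supervisor)
builder.add_node("security", security)
builder.add_node("support", support)
builder.set_entry_point("supervisor")
builder.add_conditional_edges("supervisor", lambda s: s["route"],
                              {"security": "security", "support": "support"})
builder.add_edge("security", "support")
builder.add_edge("support", END)
graph = builder.compile()

print(graph.invoke({"user_input": "My email is test@example.com",
                    "redacted_input": "", "route": "", "answer": ""}))

The point is the shape of the graph: routing decisions and agent behavior live in small, swappable functions, which is exactly what lets LaunchDarkly reconfigure them at runtime.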
Step 8: Try New Features Experience the power of dynamic configuration by making real-time changes to your agents without touching any code: Feature 1: Switch Models Instantly Navigate to AI Configs in the LaunchDarkly sidebarClick on support-agentIn the Model configuration section, change from: Current: Anthropic → claude-3-7-sonnet-latestNew: OpenAI → gpt-4-turboClick Save changesReturn to your chat interface at http://localhost:8501Ask the same question again - you'll see the response now comes from GPT-4What you'll notice: Different response style, potentially different tool usage patterns, and the model name displayed in the workflow details Feature 2: Adjust Tool Usage Limit how many times your agent can call tools in a single interaction: While still in the support-agent configFind the Custom parameters sectionUpdate the JSON from: {"max_tool_calls": 5}To: {"max_tool_calls": 2}Click Save changesIn your chat, ask a complex question that would normally trigger multiple searchesWhat you'll notice: The agent now makes, at most, two tool calls, forcing it to be more selective about its searches Feature 3: Change Agent Behavior Transform your support agent into a research specialist: In the support-agent config, locate the Goal or task fieldReplace the existing instructions with: You are a research specialist. Always search multiple times from different angles before answering. Prioritize accuracy over speed. For any question, perform at least 2 different searches with varied search terms to ensure comprehensive coverage. Cite your sources and explain your search strategy.Click Save changesTest with a question like "What are the best practices for feature flags?"What you'll notice: The agent now performs multiple searches, explains its search strategy, and provides more thorough, research-oriented responses All changes take effect immediately — no deployment, no restart, no downtime. Your users experience the updates in real-time. Understanding What You've Built Your LangGraph multi-agent system with RAG includes: LangGraph Orchestration: The supervisor agent uses LangGraph state management to route requests intelligently based on content analysis.Privacy Protection: The supervisor agent uses LangGraph state management to route requests intelligently. This separation allows you to assign a trusted model to the security and supervisor agents and consider on a less-trusted model for the more expensive support agent at a reduced risk of PII exposure.RAG Knowledge System: The support agent combines LangGraph tool chaining with your RAG system for semantic document search and reranking.Runtime Control: LaunchDarkly controls both LangGraph behavior and RAG parameters without code changes. What's Next? Your multi-agent system is running with dynamic control and is ready for optimization. In Part 2, we'll add: Geographic-based privacy rules (strict for EU, standard for other)MCP tools for external dataBusiness tier configurations (free, paid)Cost optimization strategies In Part 3, we'll run A/B experiments to prove which configurations actually work best with real data. Try This Now Experiment with: Different Instructions: Make agents more helpful, more cautious, or more thoroughTool Combinations: Add/remove tools to see impact on qualityModel Comparisons: Try different models for different agentsCost Limits: Find the sweet spot between quality and cost Every change is instant, measurable, and reversible. 
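To see why the max_tool_calls parameter from Feature 2 matters, consider how an agent loop might enforce it. The sketch below is purely illustrative and not the tutorial's code: run_model and execute_tool are stubs standing in for the LLM and the search_v2/reranking tools, and only the budget-checking logic is the point.

Python
def run_model(question, context):
    # Stub: pretend the model keeps asking for searches until it has two results.
    if len(context) < 2:
        return {"type": "tool_call", "query": question}
    return {"type": "answer", "text": f"Answer built from {len(context)} search results."}

def execute_tool(action):
    return f"search result for '{action['query']}'"  # stub search_v2/reranking call

def answer_with_budget(question, ai_config):
    max_calls = ai_config.get("max_tool_calls", 5)  # value served by the AI Config
    context, calls = [], 0
    while calls < max_calls:
        step = run_model(question, context)
        if step["type"] != "tool_call":
            return step["text"]
        context.append(execute_tool(step))
        calls += 1
    # Budget exhausted: answer with whatever has been gathered so far.
    return f"Answer built from {len(context)} search results (tool budget of {max_calls} reached)."

print(answer_with_budget("What are feature flags?", {"max_tool_calls": 2}))

Lowering the value in LaunchDarkly simply changes the dictionary served to the loop, so the same code becomes more frugal without a redeploy.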
Key Takeaways Multi-agent systems work best when each agent has a specific roleDynamic configuration handles changing requirements better than hardcodingLaunchDarkly AI Configs control and change AI behavior without requiring deploymentsStart simple and add complexity as you learn what works Ready for more? Continue to Part 2: Smart AI Agent Targeting with MCP Tools. Related Resources Explore the LaunchDarkly MCP Server - enable AI agents to access feature flag configurations, user segments, and experimentation data directly through the Model Context Protocol.

By Scarlett Attensil
AI-Assisted Software Engineering With OOPS, SOLID Principles, and Documentation

Top-down and bottom-up are two divide-and-conquer approaches to problem-solving.

What Is the Top-Down Approach?

Take a problem and keep breaking it down until each piece can be handed to a machine, OS, SDK, or software system.

What Is the Bottom-Up Approach?

You start with pluggable solutions or building blocks you already know — machines, OSes, SDKs — and, given a problem, assemble those building blocks into a complete solution.

How Did OOPS Promote the Bottom-Up Approach in Software?

With OOPS, we started building more and more software building blocks that could be reused and combined. Many component libraries emerged from OOPS languages like Java and C++.

Did OOPS and the Bottom-Up Approach Solve the Problem?

Partially. Most requirements still come from the top (i.e., from a business perspective), and we solve them top-down by stitching components together to meet those requirements. The business landscape keeps changing. We need reusable components, but we also need them plugged into our system to solve business problems. The business identifies "needs," which are converted into "user stories," which are in turn converted into "requirement specifications." As the saying goes, one need turns into 100 user stories, and 100 user stories turn into 1,000 requirements. The needs also keep changing over time, changing the user stories and requirement specifications with them. This breakdown is still top-down.

Here's a practical example:

Business Need

"Improve customer retention by enabling a personalized shopping experience."

User Stories

- As a returning customer, I want to see product recommendations based on my previous purchases so I can find relevant products faster.
- As a logged-in user, I want to see my recently viewed products on the homepage so I can resume shopping easily.
- As a customer, I want to save items to a wishlist so I can revisit them later.

Technical Specification (for Story #1)

- API: GET /api/recommendations/{userId}
- Data source: Purchase history, user behavior tracking
- Algorithm: Collaborative filtering or external ML service
- Security: GDPR-compliant data usage and opt-out capabilities
- Front-end integration: Carousel component in homepage layout

This breakdown shows how business strategy leads to concrete technical outcomes. While reusable components help implement these specifications, the requirements flow remains top-down. But what can you do to slow down this shifting landscape while still adapting to fast-changing business needs? To adapt, we rely on the SOLID principles:

- Single Responsibility Principle (SRP): Each module or class should do one thing well.
- Open/Closed Principle (OCP): Components should be open for extension but closed for modification.
- Liskov Substitution Principle (LSP): Derived types must be substitutable for their base types.
- Interface Segregation Principle (ISP): Favor many small, specific interfaces over large, general-purpose ones.
- Dependency Inversion Principle (DIP): High-level modules should not depend on low-level modules; both should depend on abstractions.

When components are wired together, the inversion of control principle goes by a more familiar name: dependency injection.
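To make that concrete, here is a minimal, illustrative sketch that borrows the recommendation scenario above; the class and method names are invented for this example. The high-level service depends only on an abstraction, and the concrete recommender is injected at composition time.

Python
from abc import ABC, abstractmethod

class RecommendationSource(ABC):           # the abstraction both sides depend on
    @abstractmethod
    def recommend(self, user_id: str) -> list[str]: ...

class PurchaseHistoryRecommender(RecommendationSource):
    def recommend(self, user_id: str) -> list[str]:
        return [f"product-for-{user_id}"]  # e.g., collaborative-filtering result

class ExternalMLRecommender(RecommendationSource):
    def recommend(self, user_id: str) -> list[str]:
        return ["ml-ranked-product"]       # e.g., a call to an external ML service

class HomepageService:                     # high-level module, closed for modification
    def __init__(self, source: RecommendationSource):   # dependency is injected
        self._source = source

    def homepage_carousel(self, user_id: str) -> list[str]:
        return self._source.recommend(user_id)

# Swapping implementations is a composition change, not a change to HomepageService.
service = HomepageService(ExternalMLRecommender())
print(service.homepage_carousel("user-42"))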
While we aspire to work at the highest maturity level, like switching dependencies at runtime, we should at least be in a position to build reusable components, each having a single responsibility, each open for extension and closed for modification, and each ready to be substituted. While every SOLID principle is helpful, start with dependency injection. It is the missing piece that can help us cope with the fast-changing landscape of needs, improve our churn or turnaround time, and introduce us to the other principles.

How to Apply AI Assistance While Doing It?

Even if there is an AI OS tomorrow, the art of building software with reusable components won't change, but AI assistance should be aware of all reusable components and libraries on the market that are maintained and secure to use. AI assistance should also know how the components present in the currently developed system work, what their responsibilities are, and how they can be extended and substituted. While addressing the business needs, we should make ourselves aware of what the AI assistance needs. To reiterate:

- A catalog of well-maintained reusable components and libraries is required.
- Needs, user stories, and requirement specification documents are required.
- Documentation of the current software's components, with their responsibilities, extensibility, and substitutability, is required.

Are the current AI assistants not already doing this? Partially. While LLMs do their best to understand the entire workspace, it is ultimately the instructions from developers that guide the AI assistant. In this post, we have simply pulled out a systematic approach to leveraging an AI assistant effectively.

Conclusion

In this article, we highlighted the bottom-up and top-down approaches and how top-down changes cascade through needs, user stories, and requirements. We looked at why we adopted OOPS principles, why we should rely on SOLID principles, especially dependency injection and depending on abstractions, to build software bottom-up, and how to extend these methods with AI assistance by preparing the relevant documents that align with developer intent.

By Narendran Solai Sridharan
Series (4/4): Toward a Shared Language Between Humans and Machines — Humans as Co-Creators: Ethics, Strategy, and the Future of a Shared Language
Series (4/4): Toward a Shared Language Between Humans and Machines — Humans as Co-Creators: Ethics, Strategy, and the Future of a Shared Language

AI inspires both fascination and fear: are machines capable of replacing us, or are they merely assistants? The real question is not substitution, but co-creation. How can we preserve the uniqueness of human intelligence while harnessing the power of models? This article explores the ethical, economic, and political challenges of a future where humans and machines will have to invent a common language together. In areas such as code translation or transcompilation, neural models can outperform traditional methods and speed up processes. But their role is not to replace human expertise; it is to extend and enhance it. In fields such as medicine, architecture, or education, AI can help simulate, plan, and generate alternatives, but in the end, it is the human who must decide, interpret, and give meaning. This transformation is reflected in the role of linguistic and cultural experts, who are no longer mere translators but have become consultants, guardians of quality and relevance. Language models can reproduce cultural biases; they lack contextual sensitivity. That is why human cultural intelligence, ethical intuition, and emotional intelligence remain irreplaceable. Humans must remain the guarantors of trust and responsibility in every high-stakes interaction. “By positioning AI as a knowledge delivery tool rather than an autonomous practitioner, we can develop systems that genuinely enhance professional practice while preserving the essential human elements of social work. Our study demonstrates these models' facility with foundational social work knowledge; the next step is leveraging this capability to create thoughtfully designed support systems that help practitioners better serve their clients.” - Zia Qi - “AI and Cultural Context: An Empirical Investigation of Large Language Models' Performance on Chinese Social Work Professional Standards" Beyond professional use, the growing integration of AI into our lives raises societal questions that touch on cognition. One of the risks is a gradual loss of human skills, just as GPS has diminished our natural ability to find our way. To avoid pitfalls, it is necessary to frame the development and use of AI systems within an ethical, contextual, and empirical reflection. Professionals must play an active role in this process, not only by supervising, but also by guiding the evolution of these tools so that they reflect the human values they are meant to serve. Thus, the future of a shared language between humans and machines does not depend solely on technology; it rests on our collective ability to preserve the uniqueness of human intelligence and to guide AI as a partner in co-creation, not as a substitute. From my point of view, the true value of AI does not lie in its autonomy but in its ability to strengthen human skills, especially in critical contexts where responsibility and ethics are non-negotiable. Economic and Strategic Implications While the question of a shared human–machine language is, at first, a scientific or philosophical challenge, it also carries major economic and strategic implications. It is essential to keep in mind the broader issues of competitiveness and innovation, regulation and governance, as well as the transformation of human capital and skills. The players capable of harnessing this shared language between humans and machines will gain a tangible competitive advantage. 
In industry, for instance, digital twins enable more precise and faster simulations; in healthcare, intelligent assistants can help personalize treatments; and in software development, automatic translation between programming languages could accelerate innovation. In light of technological innovations, it is important to remember that these improvements, even within the goal of developing a shared language, raise questions of international governance. In particular, how can we regulate the translation of human intentions into machine language to prevent bias and manipulation? A balanced form of regulation could become a factor of competitiveness. The goal would be to provide businesses and governments with a stable environment conducive to responsible innovation. Conclusion The quest for a shared language between humans and machines goes far beyond technology. It redefines what we mean by intelligence, communication, and humanity. The avenues explored, from world models to quantum experiments, allow us to envision a common space where co-construction becomes possible. But such a space can only exist with the active participation of humans. AI must remain a partner, not a substitute. Cultural expertise, ethical judgment, and emotional intelligence remain uniquely human strengths, indispensable for guiding the use of these tools. In this sense, I believe this challenge is above all cultural and political. It is up to us to write its rules, to prevent it from being seized by a few actors on the technical or geopolitical stage, and to preserve the richness of our experiences in this hybrid future. Links to the previous articles published in this series: Series: Toward a Shared Language Between Humans and MachinesSeries (1/4): Toward a Shared Language Between Humans and Machines — Why Machines Still Struggle to Understand UsSeries (2/4): Toward a Shared Language Between Humans and Machines — From Multimodality to World Models: Teaching Machines to ExperienceSeries (3/4): Toward a Shared Language Between Humans and Machines — Quantum Language and the Limits of Simulation References Abbaszade, Mina; Zomorodi, Mariam; Salari, Vahid; Kurian, Philip. "Toward Quantum Machine Translation of Syntactically Distinct Languages". [link] Brodsky, Sascha. "World models help AI learn what five-year-olds know about gravity". IBM. [link] Gubelmann, Reto. "Pragmatic Norms Are All You Need – Why The Symbol Grounding Problem Does Not Apply to LLMs". [link]Harnad, Stevan. "The Symbol Grounding Problem". [link]LEO (Linguist Education Online). "Human Intelligence in the Age of AI: How Interpreters and Translators Can Thrive in 2025". [link]Meta AI. "Yann LeCun on a vision to make AI systems learn and reason like animals and humans". [link]Opara, Chidimma. "Distinguishing AI-Generated and Human-Written Text Through Psycholinguistic Analysis". [link]Qi, Zia; Perron, Brian E.; Wang, Miao; Fang, Cao; Chen, Sitao; Victor, Bryan G. "AI and Cultural Context: An Empirical Investigation of Large Language Models' Performance on Chinese Social Work Professional Standards". [link] Roziere, Baptiste; Lachaux, Marie-Anne; Chanussot, Lowik; Lample, Guillaume. "Unsupervised Translation of Programming Languages". [link]Strickland, Eliza. "AI Godmother Fei-Fei Li Has a Vision for Computer Vision". IEEE Spectrum. [link]Trott, Sean. "Humans, LLMs, and the symbol grounding problem (pt. 1)". [link]Nature. “Chip-to-chip photonic quantum teleportation over optical fibers, 2025” [link]

By Frederic Jacquet DZone Core CORE
AI Code Generation: The Productivity Paradox in Software Development
AI Code Generation: The Productivity Paradox in Software Development

Measuring and improving developer productivity has long been a complex and contentious topic in software engineering. With the rapid rise of AI across nearly every domain, it's only natural that the impact of AI tooling on developer productivity has become a focal point of renewed debate. A widely held belief suggests that AI could either render developers obsolete or dramatically boost their productivity — depending on whom you ask. Numerous claims from organizations linking layoffs directly to AI adoption have further intensified this perception, casting AI as both a disruptor and a catalyst. In this article, we'll examine the current landscape and delve into recent studies and surveys that investigate how AI is truly influencing developer productivity. Studies Let's explore the findings from the studies below, which assess the impact of AI tooling on developer productivity. Study #1: Experienced Open-Source Developer Productivity To evaluate the impact of AI coding assistant tools on the productivity of experienced open-source developers, a randomized controlled trial (RCT) was conducted from February to June 2025 using the tools. A total of 16 developers with an average of 5 years of experience were chosen to complete a total of 246 tasks in mature projects. These tasks were randomly assigned among developers, with either AI tools being allowed or disallowed, respectively. Before starting tasks, developers forecast that task completion time would decrease by 24% with AI. After completing the task, developers estimated that with AI, the completion time had been reduced by 20%. However, on the contrary, the study found that allowing AI actually increased task completion time by 19%. Moreover, these results are in stark contradiction of experts prediction of task completion time reduction of up to ~39%. Below is the summary of the prediction and findings mismatch: Experts and study participants misjudged the speedup of AI tooling. Image courtesy of respective research. Although the study concludes that AI tooling slowed developers down, it could be due to a variety of factors, with five key factors for observed slowdown listed below: Over-optimism about AI usefulness (Direct productivity loss). Developers are free to use AI tools as they see fit, but their belief that AI boosts productivity is often overly optimistic. They estimate a 20–24% time reduction from AI, even when the actual impact may be neutral or negative, potentially leading to overuse.High developer familiarity with repositories (Raises developer performance). AI assistance tends to be less helpful, and may even slow developers down, on tasks where they have high prior experience and need fewer external resources. Developers report AI as more beneficial for unfamiliar tasks, suggesting its value lies in bridging knowledge gaps rather than enhancing expert workflows.Large and complex repositories (Limits AI performance). Developers report that LLM tools struggle in complex environments, often introducing errors during large-scale edits. This aligns with findings that AI performs worse in mature, large codebases compared to simpler, greenfield projects.Low AI reliability (Limits AI performance). Developers accept less than ~44% of AI-generated code, often spending significant time reviewing, editing, or discarding it. 
Even accepted outputs require cleanup, with ~75% reading every line and ~56% making major changes, leading to notable productivity loss.Implicit repository context (Limits AI performance, raises developer performance). AI tools often struggle to assist effectively in mature codebases due to a lack of developers' tacit, undocumented knowledge. This gap leads to less relevant suggestions, especially in nuanced cases like backward compatibility or context-specific edits. Due to these factors, the gains of auto-code generation are offset considerably, and thus the significant contrast in perceived/forecasted and actual results in developer productivity is exposed. Also, with the AI tooling, the developer is required to spend additional time on prompting, reviewing AI-generated suggestions, and integrating code outputs with complex codebases. Thus, adding to the overall completion time. See below for average time spent per activity — with and without AI tooling. Average time spent per activity. Image courtesy of respective research. Takeaway: The study reveals a perception gap where AI usage subtly hampers productivity, despite users believing otherwise. While findings show a slowdown in large, complex codebases, researchers caution against broad conclusions and emphasize the need for rigorous evaluation as AI tools and techniques continue to evolve. Thus, the study should merely be considered as a data point in evaluation and not a verdict. Study #2: GitClear The GitClear study analyzed ~211 million structured code changes from 2020 to 2024 to assess how AI-assisted coding impacts developer productivity. It categorized changes — like added, moved, copied/pasted, and churned lines — using GitClear's Diff Delta model to track short-term velocity versus long-term maintainability. Duplicate block detection was introduced to measure how often AI-generated code repeats existing logic. The methodology links rising output metrics to declining code reuse, revealing hidden costs in perceived productivity gains. Below is the trend of code operations and code churn by year as cited in the report. GitClear AI Code Quality Research — Code operations and code churn by year. Image courtesy of respective research. The following points can be inferred from the study: Increased code output: AI-assisted development led to a significant rise in the number of lines added, up 9.2% YoY in 2024. This could be perceived as an increase in developer productivity due to faster code generation and higher task (ticket) completion throughput. However, the key question remains — are the added lines of code required in the first place?Decline in refactoring (“moved” code): “Moved” lines — an indicator of refactoring — dropped nearly 40% YoY in 2024, falling below 10% for the first time. This can be attributed to the developer accepting the AI-generated code as-is and skipping the effort to refactor (to save time). Moreover, AI tools rarely suggest refactoring due to limited context windows, and thus fuel the overall drop.Surge in copy-and-pasted and duplicated code. Copy/pasted lines exceeded moved lines in 2024, with a 17.1% increase YoY. Commits with duplicated blocks (≥5 lines) rose 8x in 2024 compared to 2022. 6.66% of commits now contain such blocks. This, too, can be attributed to the developer accepting the AI-generated code as-is without much effort to keep the code DRY.Increased churn in newly added code. 
Churn — code revised within 2–4 weeks — increased 20–25% in 2024, i.e., developers are revisiting new code more frequently. This also implies that although code output surged with AI tooling, due to low quality, code is being revised sooner than it used to be (when no or limited AI tooling was utilized).

Takeaway: The rise in AI-generated code has led to a parallel increase in copy-pasted fragments, duplication, and churn — while refactoring efforts have notably declined. This trend signals a deterioration in overall code quality. Many organizations still gauge developer productivity by metrics like lines of code added or tasks completed. However, these indicators can be easily inflated by AI, often at the expense of long-term maintainability. The result is bloated codebases with higher duplication, reduced clarity, and an expanded surface area for bugs. While AI may boost short-term development velocity, the trade-off is accumulating technical debt and diminished code quality — costs that will surface over time in the form of increased maintenance burden and reduced agility.

Surveys

While studies often rely on data-driven methodologies, these approaches can sometimes be questioned for their assumptions or limitations. Surveys, on the other hand, offer direct insight into developer sentiment and can help bridge gaps that traditional studies might overlook. In the sections below, we explore findings from independent surveys that assess the impact of AI tools on developer productivity.

Survey #1: StackOverflow

In its 2025 annual developer survey, Stack Overflow received over 49k responses, covering various aspects, including AI tooling and its related impact. Do note that I, too, was one of the respondents. Among respondents, overall AI tool usage surged to ~84% from ~76% the previous year. Positive sentiment toward AI tools, however, dropped by ~10 percentage points, signaling a trust deficit among developers — more on this later.

AI tools usage and sentiment. Image courtesy of respective survey results.

Among the respondents, ~46% actively distrust the accuracy of AI tools. Moreover, ~66% cited that AI tool solutions are not up to the mark, and ~45% cited that these solutions require additional debugging time. This clearly means developers need additional effort to understand, debug, and potentially refine AI-generated code, effectively increasing overall task completion time. Trust in AI tools' ability to handle complex tasks did rise by ~6 percentage points, but this could be due to AI tool enhancements, or it could reflect developers' overall lack of trust in AI accuracy, leading them to avoid AI tools for complex tasks altogether given the quality and other risks involved. Given the significant trust deficit in the accuracy of AI tools, the decline in positive sentiment seen in the previous section could well be related.

Trust in AI tools' accuracy and ability to handle complex tasks. Image courtesy of respective survey results.

Frustrations with AI tools and humans as the ultimate arbiters of quality and correctness. Image courtesy of respective survey results.

Even though AI agent adoption isn't mainstream yet, more than half of the respondents (~52%) cited productivity gains. AI agents could be a space worth watching: they are relatively new, so plenty of enhancements could follow in the upcoming years. Moreover, given the contextual information they utilize to generate code, they seem more promising than simpler AI tools.
AI agents and impact on work productivity. Image courtesy of respective survey results.

Takeaway: The survey revealed a sharp rise in AI tool adoption accompanied by a notable drop in positive sentiment, highlighting a growing trust deficit. A majority of respondents expressed active distrust in AI tool accuracy due to subpar solutions, suggesting that AI-generated code often demands extra effort to refine and validate. This offsets the productivity gains from faster code generation. Interestingly, trust in AI tools' ability to handle complex tasks rose, reflecting cautious optimism rather than full confidence. Developers still see themselves as the ultimate judges of code quality, reinforcing the need for human oversight. Meanwhile, AI agents — though not yet widely adopted — show early promise. Their use of contextual information positions them as a potentially more reliable and efficient evolution of current AI tooling.

Survey #2: Harness

Harness surveyed 500 engineering leaders and practitioners to assess various parameters, including the impact of AI on developer productivity. Although the surveyed participants showed overall positive sentiment toward AI tooling and its adoption, 92% also highlighted the associated risks. An independent, related observation corroborates these risks.

AI Missteps and Impact Radius. Image courtesy: https://martinfowler.com/articles/exploring-gen-ai/13-role-of-developer-skills.html

Almost two-thirds of respondents mentioned that they spend more time debugging AI-generated code and/or resolving security vulnerabilities. AI tooling may also generate code that includes outdated dependencies or insecure coding patterns, requiring developers to spend time updating and patching these vulnerabilities. This significantly increases developer overhead and potentially offsets a considerable part of the productivity gains from AI tooling.

Two-thirds of respondents require more time debugging AI-generated code and/or resolving security vulnerabilities. Image courtesy of respective survey results.

About 59% of developers experience deployment problems when AI tooling is involved, and the resulting rework or additional effort offsets nearly half of the gains.

59% of developers experience deployment problems with AI tooling involved. Image courtesy of respective survey results.

Since 60% of the respondents don't evaluate the effectiveness of the tools, it's quite challenging to relate AI tooling to developer productivity at all.

60% of respondents don't evaluate the effectiveness of AI tooling. Image courtesy of respective survey results.

Takeaway: The survey reveals a nuanced picture of AI's impact on developer productivity. While most respondents expressed optimism about AI tooling, they also flagged significant risks. Notably, the majority reported spending more time debugging AI-generated code and addressing security vulnerabilities — contradicting the assumption that AI always boosts efficiency. Deployment issues further compound the overhead, with many encountering frequent rework. The lack of tool effectiveness evaluation by many respondents underscores the challenge of accurately measuring productivity gains. Overall, the findings highlight that AI adoption demands careful oversight to avoid offsetting its intended benefits.

Conclusion

The studies and surveys analyzed paint a complex picture of AI's role in software development, revealing that perceived productivity gains often mask deeper issues.
While AI tools may accelerate coding tasks, they also introduce duplication, churn, and technical debt — especially in large codebases — undermining long-term maintainability. Trust in AI-generated code remains fragile, with developers frequently needing to debug and refine outputs. This erodes efficiency, offsets the gains from faster code generation, and highlights the importance of human oversight. Crucially, coding represents only a fraction of the overall software delivery cycle. Improvements in cycle time don't necessarily translate to gains in lead time. Sustainable productivity demands more than speed — it requires thoughtful architecture, strategic reuse, and vigilant monitoring of maintainability metrics. In essence, AI can be a powerful accelerator, but without deliberate human intervention, its benefits risk being short-lived.

References and Further Reads

- Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
- GitClear Code Quality Study — 2024 | 2025
- Harness — State of Software Delivery
- SO Developer Survey 2025
- Role of Developer Skills in Agentic Coding

By Ammar Husain
Agentic AI Using Apache Kafka as Event Broker With the Agent2Agent Protocol (A2A) and MCP
Agentic AI Using Apache Kafka as Event Broker With the Agent2Agent Protocol (A2A) and MCP

Agentic AI is gaining traction as a design pattern for building more intelligent, autonomous, and collaborative systems. Unlike traditional task-based automation, agentic AI involves intelligent agents that operate independently, make contextual decisions, and collaborate with other agents or systems—across domains, departments, and even enterprises. In the enterprise world, agentic AI is more than just a technical concept; it is a transformative force. It represents a shift in how systems interact, learn, and evolve. But unlocking its full potential requires more than AI models and point-to-point APIs — it demands the right integration backbone. That’s where Apache Kafka, as an event broker for true decoupling, comes into play, together with two emerging AI standards: Google's Agent2Agent (A2A) Protocol and Anthropic's Model Context Protocol (MCP), in an enterprise architecture for Agentic AI.

Business Value of Agentic AI in the Enterprise

For enterprises, the promise of agentic AI is compelling:

- Smarter automation through self-directed, context-aware agents
- Improved customer experience with faster and more personalized responses
- Operational efficiency by connecting internal and external systems more intelligently
- Scalable B2B interactions that span suppliers, partners, and digital ecosystems

But none of this works if systems are coupled by brittle point-to-point APIs, slow batch jobs, or disconnected data pipelines. Autonomous agents need continuous, real-time access to events, shared state, and a common communication fabric that scales across use cases.

Model Context Protocol (MCP) + Agent2Agent (A2A): New Standards for Agentic AI

The Model Context Protocol (MCP), coined by Anthropic, offers a standardized, model-agnostic interface for context exchange between AI agents and external systems. Whether the interaction is streaming, batch, or API-based, MCP abstracts how agents retrieve inputs, send outputs, and trigger actions across services. This enables real-time coordination between models and tools—improving autonomy, reusability, and interoperability in distributed AI systems. Source: Anthropic

Google’s Agent2Agent (A2A) protocol complements this by defining how autonomous software agents can interact with one another in a standard way. A2A enables scalable agent-to-agent collaboration—where agents discover each other, share state, and delegate tasks without predefined integrations. It’s foundational for building open, multi-agent ecosystems that work across departments, companies, and platforms. Source: Google

Why Apache Kafka Is a Better Fit Than an API (HTTP/REST) for A2A and MCP

Most enterprises today use HTTP-based APIs to connect services — ideal for simple, synchronous request-response interactions. In contrast, Apache Kafka is a distributed event streaming platform designed for asynchronous, high-throughput, and loosely coupled communication — making it a much better fit for multi-agent (A2A) and agentic AI architectures.

- API: synchronous, blocking vs. Kafka: asynchronous, event-driven
- API: point-to-point coupling vs. Kafka: loose coupling with pub/sub topics
- API: hard to scale to many agents vs. Kafka: supports multiple consumers natively
- API: no shared memory vs. Kafka: retains and replays event history
- API: limited observability vs. Kafka: full traceability with schema registry & DLQs

Kafka serves as the decoupling layer. It becomes the place where agents publish their state, subscribe to updates, and communicate changes — independently and asynchronously. This enables multi-agent coordination, resilience, and extensibility.
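To make the decoupling concrete, here is a minimal, hypothetical Python sketch using the confluent-kafka client. The topic name (agent.events), the event fields, and the consumer group ID are illustrative assumptions, not fields defined by the A2A or MCP specifications. The key point is that the producing agent never calls the consuming agent directly; both only know the topic.

Python
import json

from confluent_kafka import Consumer, Producer

TOPIC = "agent.events"  # illustrative topic acting as the shared contract

# Producing agent: publishes a structured event and moves on
producer = Producer({"bootstrap.servers": "localhost:9092"})
event = {
    "agent": "order-agent",
    "action": "order.enriched",
    "payload": {"order_id": "order-123", "risk_score": 0.12},
}
producer.produce(TOPIC, key="order-123", value=json.dumps(event).encode("utf-8"))
producer.flush()

# Consuming agent: subscribes independently, in its own consumer group
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "fraud-detection-agent",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])

msg = consumer.poll(5.0)  # in a real agent this runs in a loop
if msg is not None and msg.error() is None:
    received = json.loads(msg.value())
    print(f"{received['agent']} -> {received['action']}")
consumer.close()

Because consumption is just configuration, adding another agent later does not require touching the producer.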
MCP + Kafka = Open, Flexible Communication As the adoption of Agentic AI accelerates, there’s a growing need for scalable communication between AI agents, services, and operational systems. The Model-Context Protocol (MCP) is emerging as a standard to structure these interactions — defining how agents access tools, send inputs, and receive results. But a protocol alone doesn’t solve the challenges of integration, scaling, or observability. This is where Apache Kafka comes in. By combining MCP with Kafka, agents can interact through a Kafka topic — fully decoupled, asynchronous, and in real time. Instead of direct, synchronous calls between agents and services, all communication happens through Kafka topics, using structured events based on the MCP format. This model supports a wide range of implementations and tech stacks. For instance: A Python-based AI agent deployed in a SaaS environmentA Spring Boot Java microservice running inside a transactional core systemA Flink application deployed at the edge performing low-latency stream processingAn API gateway translating HTTP requests into MCP-compliant Kafka events Regardless of where or how an agent is implemented, it can participate in the same event-driven system. Kafka ensures durability, replayability, and scalability. MCP provides the semantic structure for requests and responses. The result is a highly flexible, loosely coupled architecture for Agentic AI — one that supports real-time processing, cross-system coordination, and long-term observability. This combination is already being explored in early enterprise projects and will be a key building block for agent-based systems moving into production. Stream Processing as the Agent’s Companion Stream processing technologies like Apache Flink or Kafka Streams allow agents to: Filter, join, and enrich events in motionMaintain stateful context for decisions (e.g., real-time credit risk)Trigger new downstream actions based on complex event patternsApply AI directly within the stream processing logic, enabling real-time inference and contextual decision-making with embedded models or external calls to a model server, vector database, or any other AI platform Agents don’t need to manage all logic themselves. The data streaming platform can pre-process information, enforce policies, and even trigger fallback or compensating workflows — making agents simpler and more focused. Technology Flexibility for Agentic AI Design with Data Contracts One of the biggest advantages of a Kafka-based event-driven and decoupled backend for agentic systems is that agents can be implemented in any stack: Languages: Python, Java, Go, etc.Environments: Containers, serverless, JVM apps, SaaS toolsCommunication styles: Event streaming, REST APIs, scheduled jobs The Kafka topic is the stable data contract for quality and policy enforcement. Agents can evolve independently, be deployed incrementally, and interoperate without tight dependencies. Microservices, Data Products, and Reusability - Agentic AI Is Just One Piece of the Puzzle To be effective, Agentic AI must integrate seamlessly with existing operational systems and business workflows. Kafka topics enable the creation of reusable data products that serve multiple consumers — AI agents, dashboards, services, or external partners. This aligns perfectly with data mesh and microservice principles, where ownership, scalability, and interoperability are key. 
A single stream of enriched order events might be consumed via a single data product by:

- A fraud detection agent
- A real-time alerting system
- An agent triggering SAP workflow updates
- A lakehouse for reporting and batch analytics

This one-to-many model is the opposite of traditional REST designs, and is crucial for enabling agentic orchestration at scale.

Agentic AI Needs Integration With Core Enterprise Systems

Agentic AI is not a standalone trend—it’s becoming an integral part of broader enterprise AI strategies. While this post focuses on architectural foundations like Kafka, MCP, and A2A, it’s important to recognize how this infrastructure complements the evolution of major AI platforms. Leading vendors such as Databricks, Snowflake, and others are building scalable foundations for machine learning, analytics, and generative AI. These platforms often handle model training and serving. But to bring agentic capabilities into production—especially for real-time, autonomous workflows—they must connect with operational, transactional systems and other agents at runtime. For additional reading, see also my blog series Confluent + Databricks and Apache Kafka + Snowflake. This is where Kafka, as the event broker, becomes essential: it links these analytical backends with AI agents, transactional systems, and streaming pipelines across the enterprise.

At the same time, enterprise application vendors are embedding AI assistants and agents directly into their platforms:

- SAP Joule / Business AI – Embedded AI for finance, supply chain, and operations
- Salesforce Einstein / Copilot Studio – Generative AI for CRM and sales automation
- ServiceNow Now Assist – Predictive automation across IT and employee services
- Oracle Fusion AI / OCI – ML for ERP, HCM, and procurement
- Microsoft Copilot – Integrated AI across Dynamics and Power Platform
- IBM watsonx, Adobe Sensei, Infor Coleman AI – Governed, domain-specific AI agents

Each of these solutions benefits from the same architectural foundation: real-time data access, decoupled integration, and standardized agent communication. Whether deployed internally or sourced from vendors, agents need a reliable event-driven infrastructure to coordinate with each other and with backend systems. Apache Kafka provides this core integration layer — supporting a consistent, scalable, and open foundation for agentic AI across the enterprise.

Agentic AI Requires Decoupling – Apache Kafka Supports A2A and MCP as an Event Broker

To deliver on the promise of agentic AI, enterprises must move beyond point-to-point APIs and batch integrations. They need a shared, event-driven foundation that enables agents (and other enterprise software) to work independently and together—with shared context, consistent data, and scalable interactions. Apache Kafka provides exactly that. Combined with MCP and A2A for standardized Agentic AI communication, Kafka unlocks the flexibility, resilience, and openness needed for next-generation enterprise AI. It’s not about picking one agent platform. It’s about providing every agent with a consistent, reliable interface to the rest of the world. Kafka is that interface.
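As a closing illustration of the one-to-many pattern above, the sketch below (again hypothetical Python with confluent-kafka; the topic and group names are made up) shows two independent consumer groups reading the same enriched order events. Each group receives every event, so a fraud detection agent and an alerting system can be added or removed without changing the producer or each other.

Python
from confluent_kafka import Consumer

def make_consumer(group_id: str) -> Consumer:
    # Same topic, different group.id: each group gets its own copy of every event
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": group_id,
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["enriched.orders"])
    return consumer

fraud_agent = make_consumer("fraud-detection-agent")
alerting = make_consumer("real-time-alerting")

for name, consumer in [("fraud", fraud_agent), ("alerting", alerting)]:
    msg = consumer.poll(5.0)
    if msg is not None and msg.error() is None:
        print(f"{name} consumer received: {msg.value()!r}")
    consumer.close()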

By Kai Wähner DZone Core CORE
Top Takeaways From Devoxx Belgium 2025
Top Takeaways From Devoxx Belgium 2025

In October 2025, I visited Devoxx Belgium, and again it was an awesome event! I learned a lot and received quite a lot of information, which I do not want to withhold from you. In this blog, you can find my takeaways of Devoxx Belgium 2025! Introduction Devoxx Belgium is the largest Java conference in Europe. This year, it was already the 22nd edition. As always, Devoxx is being held in the fantastic theatres of Kinepolis Antwerp. Each year, there is a rush on the tickets. Tickets are released in several batches, so if you could not get a ticket during the first batch, you will get another chance. The first two days of Devoxx are Deep Dive days where you can enjoy more in-depth talks (about 2-3 hours) and hands-on workshops. Days three up and including five are the Conference Days where talks are being held in a time frame of about 30-50 minutes. You receive a lot of information! This edition was a special one for me, because I got the opportunity to speak at Devoxx myself, which has been an awesome experience! I gave a Deep Dive session on Monday, but more on that later. Enough for the introduction, the next paragraphs contain my takeaways from Devoxx. This only scratches the surface of a topic, but it should be enough to make you curious to dive a bit deeper into the topic yourself. Do check out the Devoxx YouTube channel. All the sessions are recorded and can be viewed there. If you intend to view them all, there are 250 of them. Artificial Intelligence Let's start with AI first. More and more AI-related talks are given, which makes Devoxx Belgium also the largest AI conference in the world. But there are sufficient other topics to choose from, but I cannot neglect the importance of AI during this conference. AI Agents Agents are on the rise, and the major libraries for using AI with Java have support for it, or are working on this topic. In general, they all support three flows (explanation is mainly taken from the LangChain4j documentation): Sequential workflow: A sequential workflow is the simplest possible pattern where multiple agents are invoked one after the other, with each agent's output being passed as input to the next agent. This pattern is useful when you have a series of tasks that need to be performed in a specific order.Loop workflow: In this case, you want to improve the output of an LLM in a loop until a certain condition has been met. The agent is invoked multiple times. An end condition can of course also be a maximum number of times in order to prevent the agent to get stuck in the loop.Parallel workflow: With the parallel workflow, you can start multiple agents in parallel and combine their output once they are done with their task. Next to these flows, it is also possible to create agent-to-agent workflows. A2A is an open standard that enables AI agents to communicate and collaborate across different platforms and frameworks, regardless of their underlying technologies. With this approach, you can combine several agents altogether. It is good to know about these capabilities and which support is available in the libraries: LangChain4j, Spring AI, and Agent Development Kit. And do check out the Embabel Framework created by Rod Johnson. This makes use of Goal-Oriented-Action-Planning (GOAP). 
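The three flows described above are easier to see in code. Below is a minimal, framework-agnostic Python sketch; the libraries mentioned (LangChain4j, Spring AI, the Agent Development Kit) are Java, so this only illustrates the control flow, not any specific API, and call_agent is a made-up stand-in for a real LLM-backed agent.

Python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Agent = Callable[[str], str]

def call_agent(name: str) -> Agent:
    """Stand-in for a real LLM-backed agent invocation."""
    return lambda prompt: f"[{name} output for: {prompt}]"

def sequential(agents: List[Agent], user_input: str) -> str:
    """Sequential workflow: each agent's output becomes the next agent's input."""
    result = user_input
    for agent in agents:
        result = agent(result)
    return result

def loop(agent: Agent, good_enough: Callable[[str], bool], user_input: str, max_iterations: int = 3) -> str:
    """Loop workflow: re-invoke the agent until a condition is met or the iteration cap is hit."""
    result = agent(user_input)
    for _ in range(max_iterations - 1):
        if good_enough(result):
            break
        result = agent(result)
    return result

def parallel(agents: List[Agent], user_input: str) -> List[str]:
    """Parallel workflow: run agents independently and combine their outputs afterwards."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda agent: agent(user_input), agents))

print(sequential([call_agent("researcher"), call_agent("writer")], "Summarize Devoxx Belgium 2025"))

Agent-to-agent (A2A) setups layer discovery and delegation on top of these same basic shapes.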
From LLM orchestration to autonomous agents: Agentic AI patterns with LangChain4j Discover the Agent Development Kit for Java for building AI agents Gen AI Grows Up: Enterprise JVM Agents With Embabel Model Context Protocol If you want to add agents to your AI workflow, you should know about Model Context Protocol (MCP). MCP is a standardized way of interacting with agents. Creating an MCP server is quite easy to do with the above-mentioned libraries. If you want to test your agents, use the MCP Inspector. Something which is not yet addressed sufficiently in the MCP specification is how to secure MCP servers. There is some temporary solution currently, but this probably will change in the near future. Beyond local tools: Deep dive into the Model Context Protocol (MCP) Securing MCP Servers AI Coding Assistants Of course, I have to mention my own Deep Dive. If you want to know more about how to improve the model responses during coding, or if you want to know which tasks can be executed (or not) by AI, etc. Do definitely watch the first part of my Deep Dive. If you are interested in adding MCP servers to your coding workflow so that a model can make use of your terminal, retrieve up-to-date documentation for your libraries, or write End-to-End tests for you, do watch the second part (starting at 1:17). Unlocking AI Coding Assistants: Real-World Use Cases Software Architecture I have read about Architecture Decision Records (ADR) before, and they were mentioned in some talks. But I never had a decent explanation like in the talk I visited. So if you want to get started with ADR, you should definitely take a look at the talk. Creating effective and objective architectural decision records (ADRs) And to continue the architecture paragraph, also watch Making significant Software Architecture decisions. If someone wants to make an architectural decision, you should use the 5 Whys. So, if someone is telling you to use technology A, you ask 'but why?', the person will explain, and then you ask 'but why?', etc. If you still got a decent answer after the fifth why, you are good to go. This and other tips are given in this talk. Security Spring Security I always try to visit a talk about Spring Security, just to freshen up my knowledge and to learn new things, of course. This year, I went to a Spring Security Authorization Deep Dive. You learn about Request, Method, and Object authorization, and how to design your security authorization. Authorization in Spring Security: permissions, roles and beyond Vulnerabilities Ah, vulnerabilities... often a nightmare for developers. Because we need to update our dependencies often. This talk explains CVEs, SBOMs, how to expose your SBOM by means of Spring Boot Actuator, how to use Dependency Track to manage your SBOMs, etc. And also that you should use distroless base images for your own container images in order to reduce the number of dependencies in your container. From Vulnerability to Victory: Mastering the CVE Lifecycle for Java Developers Others Java 25 Between all the AI content, we would almost forget that Java 25 has been released on the 16th of September. In order to get a complete overview, you should take a look at Java 21 to 25 - Better Language, Better APIs, Better Runtime. I was unfortunately not able to attend this Deep Dive because it was scheduled together with my Deep Dive. But that is the beauty of Devoxx Belgium: all talks are recorded and available the next day. This is definitely one of the first talks I will look at. 
If you are interested in what is coming forward, you should take a look at Weather the Storm: How Value Classes Will Enhance Java Performance. Value classes are immutable and also available for Records. You will get the same performance as with primitive types, meaning that creating value classes comes at almost no performance cost. Spring Boot 4 Another major release coming up is Spring Boot 4 and Spring Framework 7, which is scheduled for November 2025. Discover the new HTTP client, the use of JSpecify annotations, Jackson 3, API versioning, and so on. Bootiful Spring Boot IntelliJ IDEA If you are a Java developer, you probably are using IntelliJ IDEA. IntelliJ covers quite some features and also a lot of them you do not know about. Learn more about it and watch to be more productive with IntelliJ IDEA. You will definitely learn something new. If you are using Spring Boot, you should install the Spring Debugger plugin. At least the ability to see which properties files are loaded, which beans are loaded, is already so valuable that it will help you during debugging. Spring Debugger: Behind The Scenes of Spring Boot Conclusion Devoxx 2025 was great, and I am glad I was able to attend the event. As you can read in this blog, I learned a lot and I need to take a closer look at many topics. At least I do not need to search for inspiration for future blogs!

By Gunter Rotsaert DZone Core CORE
Building a Resilient Observability Stack in 2025: Practical Steps to Reduce Tool Sprawl With OpenTelemetry, Unified Platforms, and AI-Ready Monitoring
Building a Resilient Observability Stack in 2025: Practical Steps to Reduce Tool Sprawl With OpenTelemetry, Unified Platforms, and AI-Ready Monitoring

Editor’s Note: The following is an article written for and published in DZone’s 2025 Trend Report, Intelligent Observability: Building a Foundation for Reliability at Scale. Platform consolidation is an important topic in 2025 as tool sprawl and platform fragmentation are costing engineering teams time, money, and focus. Some surveys of observability practitioners show that 80% of teams are working on reducing vendor count and consolidating their observability and monitoring tools. Observability should be seen as a discipline, not just a toolchain. The surface area of observability now spans performance optimization, real user monitoring, security and compliance, and the team rituals that sustain collaboration at scale. The main goal is to align technology and people around business outcomes instead of noise. The purpose of this checklist is to provide a pragmatic, practitioner-oriented playbook to help readers build a vendor-neutral, OpenTelemetry-first stack and reduce tool sprawl. Understand the True Cost of Tool Sprawl Tool sprawl often hides behind licensing fees, duplicated infrastructure, unused integrations, and the overhead of switching between dashboards. To make an informed consolidation plan, you need to start by assessing the total cost of ownership (TCO), which can be divided into acquisition costs, operational costs, and hidden costs. After that, you need to surface the human impact of tool sprawl as tool fragmentation leads to cognitive overload, training overhead, and integration nightmares. To start assessing the TCO, follow these steps: Create an inventory of every tool: name, version, owner, telemetry pillars it covers, and licence detailsCalculate acquisition and operational costs for each toolDocument hidden costs, like mean time to resolution, duplication of features, and time spent context switching Survey engineers about their current pain points and time lost between switching toolsIdentify duplicated dashboards and redundant alerts that impact the incident resolutionQuantify training efforts needed to onboard new team members Build an OTel-First, Vendor-Neutral Foundation Embracing open standards is the antidote to vendor lock-in. OpenTelemetry is a collection of APIs, SDKs, and tools that enable you to instrument, generate, collect, and export telemetry data across metrics, traces, and logs. OpenTelemetry is on track to become the de facto standard for observability. To start building a vendor-neutral foundation, take a look at these steps: Instrument all services using the OTel SDKs for your language (e.g., Java, Python, Go)Use standard semantic conventions for spans and attributes to ease integrationExport telemetry to a back end of your choice to decouple instrumentation from analysisVerify compatibility with OTel when evaluating vendors Avoid proprietary agents that can't be replaced or extendedCentralize telemetry pipelines using open formats to simplify future migrationsAdopt an observability pipeline that ingests all telemetry types and enriches them with contextEnsure identity propagation across services so that data from different pillars can be joined Consolidate Cloud Platforms and Vendor Landscape Cloud sprawl often mirrors tool sprawl: too many vendors with overlapping capabilities and rising costs. Cloud consolidation doesn't have to mean centralizing everything under one provider; it focuses on being intentional about reducing fragmentation. 
SAP's CIO report notes that vendor consolidation is the dominant priority for CIOs in 2025 in order to reduce complexity, control costs, and maximize AI potential. Here are some actions you can take to join in this trend:

- Conduct a vendor audit to list all SaaS, cloud, and observability providers
- Align vendor contracts with strategic priorities
- Flag duplicate services or underutilized licences
- Evaluate integration complexity by measuring the time and expertise needed to connect each tool
- Account for vendor viability by considering the risk of discontinued services or price changes
- Assess security posture across all vendors
- Prioritize platforms that unify data and AI pipelines

Integrate Continuous Profiling and Real User Monitoring

Integrating continuous profiling with real user monitoring (RUM) bridges the gap between back-end and front-end performance and the end-user experience.

Continuous Profiling for Code-Level Insights

Continuous profilers help you locate exactly which parts of your application are bottlenecks so you can minimize latency and infrastructure costs. To take advantage of continuous profiling, start by implementing the items on this list:

- Enable profiling in production across critical services
- Visualize and compare profiles over time to detect regressions
- Link profiling data with traces so that you can find the exact line of code causing the issue
- Use tags (service, version, host) to filter profiles and isolate performance changes
- Retain profile data and derived metrics long enough to support analysis and trending

Real User Monitoring for Digital Experience

RUM tracks client-side performance, such as page load time, errors, and request/response duration, to better understand the user experience. RUM is critical because it helps teams understand why users abandon websites after encountering friction so that they are able to react quickly. To give users the best digital experience, here are some actionable steps you can take:

- Implement RUM instrumentation across web and mobile apps
- Capture core web vitals and other key metrics
- Segment data by device, browser, location, and user cohort to uncover patterns
- Integrate RUM with back-end tracing to correlate front-end issues with service bottlenecks
- Use session replay to see what the user saw and understand context

Outcome-Driven Monitoring and Critical User Journeys

Effective observability must connect the front end, back end, and business context. All big players in the industry emphasize critical user journeys (CUJs) as workflows that directly impact conversion, retention, and support tickets. Using this list, you can join in on the benefits of having a consolidated observability stack:

- Identify your critical user journeys
- Define "good" by setting user-centric metrics
- Deploy digital experience monitoring to validate user journeys
- Break down silos by sharing CUJ metrics across different teams in the organization
- Use full-journey correlation to follow a problem from user click to back-end service

Implement AI/LLM Monitoring and AI-Assisted Operations

As AI agents and LLMs become more embedded in production systems, we need to think about how to instrument these tools with open standards so that organizations can harness the speed of automation without compromising reliability, compliance, or trust.

Observe AI Agents and LLMs

The generative AI observability project within OpenTelemetry is defining semantic conventions for AI agents to help ensure that telemetry is represented consistently across frameworks.
Here are some steps to help you capture insights into AI models:

- Instrument AI agents using OTel's draft semantic conventions
- Capture prompt/response data, model inference time, model usage, and error rates
- Emit evaluation metrics (correctness, hallucination score) into the same observability pipeline
- Monitor external dependencies like tool APIs and connectors

Human-in-the-Loop Automation and AI-Assisted Operations

When deploying AI and automation, it's important to decide where in that loop humans belong. Effective systems require continuous collaboration between people and machines. Follow these simple steps to successfully implement the human-agent relationship:

- Define the human responsibilities in the automation loop
- Ensure AI augments users rather than replaces them by expanding their abilities
- Avoid turning humans into passive monitors
- Educate teams on AI limitations and context gaps
- Maintain a feedback loop where human input refines AI behavior

Strengthen Security Controls and Compliance

Observability doesn't only serve performance; it also underpins security and regulatory evidence. This list contains the necessary improvements you need to make to strengthen security and compliance:

- Implement audit trails on application, user, and network layers
- Choose logging tools that support structured output
- Align log retention with regulations like GDPR, HIPAA, and PCI DSS
- Classify telemetry data and apply appropriate encryption and masking
- Implement data loss prevention controls
- Use zero-trust principles
- Log AI model updates and configuration changes
- Track user interactions with AI systems for accountability
- Review compliance with emerging AI regulations and adapt instrumentation accordingly

Adopt Team Rituals and Outcome-Driven Practices

Consolidation is about tools, culture, and processes. Align different teams around business outcomes and continuous learning. Here's how you can start approaching this:

- Host cross-functional reviews of CUJ dashboards
- Define clear ownership of each telemetry pillar (metrics, logs, traces, profiles, RUM) and ensure knowledge is shared
- Continuously refine service-level objectives based on user feedback and business priorities
- Embed blameless post-mortems into team rituals
- Automate toil to free engineers for higher-value work

Conclusion

Platform consolidation is an ongoing discipline. To reduce tool sprawl and build a vendor-neutral stack, teams must:

- Expose the hidden costs of tool sprawl
- Commit to open standards by adopting OpenTelemetry
- Consolidate vendors intentionally
- Integrate performance and experience monitoring
- Implement AI observability and human-in-the-loop practices
- Embed security and compliance into observability systems
- Cultivate a shared observability culture

This is an excerpt from DZone’s 2025 Trend Report, Intelligent Observability: Building a Foundation for Reliability at Scale. Read the Free Report
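As a concrete companion to the "instrument all services using the OTel SDKs" and AI-monitoring steps above, here is a minimal Python sketch built on the opentelemetry-sdk package. The service name, span name, and the gen_ai-style attribute keys are illustrative only; the GenAI semantic conventions are still evolving, so check the current OpenTelemetry specification before relying on exact attribute names.

Python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Vendor-neutral setup: swap ConsoleSpanExporter for an OTLP exporter to change back ends
provider = TracerProvider(resource=Resource.create({"service.name": "checkout-service"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-instrumentation")

def summarize_order(order_id: str) -> str:
    # One span per unit of work; attribute keys approximate the draft GenAI conventions
    with tracer.start_as_current_span("llm.summarize_order") as span:
        span.set_attribute("app.order_id", order_id)
        span.set_attribute("gen_ai.request.model", "example-model")   # illustrative
        span.set_attribute("gen_ai.usage.input_tokens", 512)          # illustrative
        return f"Summary for {order_id}"  # placeholder for a real model call

print(summarize_order("order-123"))

Because the instrumentation only talks to the OTel API, switching observability back ends later is a pipeline change, not a code change.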

By Marija Naumovska DZone Core CORE
What Is Agent Observability? Key Lessons Learned
What Is Agent Observability? Key Lessons Learned

Agents are proliferating like wildfire, yet there is a ton of confusion surrounding foundational concepts such as agent observability. Is it the same as AI observability? What problem does it solve, and how does it work? Fear not, we'll dive into these questions and more. Along the way, we will cite specific user examples as well as our own experience in pushing a customer-facing AI agent into production. By the end of this article, you will understand:

- How the agent observability category is defined
- The benefits of agent observability
- The critical capabilities required for achieving those benefits
- Best practices from real data + AI teams

What Is an Agent?

Anthropic defines an agent as “LLMs autonomously using tools in a loop.” I’ll expand on that definition a bit. An agent is an AI equipped with a set of guiding principles and resources, capable of a multi-step decision and action chain to produce a desired outcome. These resources often consist of access to databases, communication tools, or even other sub-agents (if you are using a multi-agent architecture).

What is an agent? A visual guide to the agent lifecycle. Image courtesy of the author.

For example, a customer support agent may:

- Receive a user inquiry regarding a refund on their last purchase
- Create and escalate a ticket
- Access the relevant transaction history in the data warehouse
- Access the relevant refund policy chunk in a vector database
- Use the provided context and instructional prompt to formulate a response
- Reply to the user

And that would just be step one in the process! The user would reply, creating another unique response and series of actions.

What Is Observability?

Observability is the ability to have visibility into a system's inputs and outputs, as well as the performance of its component parts. An analogy I like to use is a factory that produces widgets. You can test the widgets to make sure they are within spec, but to understand why any deficiencies occurred, you also need to monitor the gears that make up the assembly line (and have a process for fixing broken parts).

The broken boxes represent data products, and the gears are the components in a data landscape that introduce reliability issues (data, systems, code). Image courtesy of the author.

There are multiple observability categories. The term was first introduced by platforms designed to help software engineers or site reliability engineers reduce the time their applications are offline. These solutions are categorized by Gartner in their Magic Quadrant for Observability Platforms. Barr Moses introduced the data observability category in 2019. These platforms are designed to reduce data downtime and increase adoption of reliable data and AI. Gartner has produced a Data Observability Market Guide and given the category a benefit rating of HIGH. Gartner also projects 70% of organizations will adopt data observability platforms by 2027, an increase from 50% in 2025. And amidst these categories, you also have agent observability. Let’s define it.

What Is Agent Observability?

If we combine the two definitions — what is an agent and what is observability — together, we get the following: Agent observability is the ability to have visibility into the performance of the inputs, outputs, and component parts of an LLM system that uses tools in a loop. It’s a critical, fast-growing category — Gartner projects that 90% of companies with LLMs in production will adopt these solutions.
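To see what "visibility into the component parts of an LLM system that uses tools in a loop" looks like in practice, here is a small, framework-free Python sketch based on the customer support example above. Everything in it (the tool names, the canned responses, the fields in each recorded span) is illustrative; real implementations typically emit this telemetry through an OpenTelemetry-based SDK rather than hand-rolled dictionaries.

Python
import time
import uuid

trace_log = []  # one entry per span (unit of work) in the agent session

def traced(span_name):
    """Decorator that records a 'span' for each step the agent performs."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.time()
            result = fn(*args, **kwargs)
            trace_log.append({
                "span_id": uuid.uuid4().hex[:8],
                "name": span_name,
                "input": args,
                "output": result,
                "latency_ms": round((time.time() - start) * 1000, 2),
            })
            return result
        return inner
    return wrap

@traced("tool.lookup_transactions")
def lookup_transactions(user_id):
    return [{"order": "A-17", "days_ago": 12}]  # stand-in for a warehouse query

@traced("tool.fetch_refund_policy")
def fetch_refund_policy(query):
    return "Refunds are allowed within 30 days."  # stand-in for a vector DB lookup

@traced("llm.generate_response")
def generate_response(context, question):
    return "Yes, you are within the 30-day return window."  # stand-in for an LLM call

# One agent session: each step becomes a span in the trace
question = "Can I get a refund?"
history = lookup_transactions("user-42")
policy = fetch_refund_policy(question)
answer = generate_response({"history": history, "policy": policy}, question)

for span in trace_log:
    print(span["name"], span["latency_ms"], "ms")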
Agent observability provides visibility into the agent lifecycle. Image courtesy of the author. Let’s revisit our customer success agent example to further flesh out this definition. What was previously an opaque process with a user question, “Can I get a refund?” and agent response, “Yes, you are within the 30-day return window. Would you like me to email you a return label?” now might look like this: Sample trace visualized. Image courtesy of the author. The above image is a visualized trace, or a record of each span (unit of work) the agent took as part of its session with a user. Many of these spans involve LLM calls. As you can see in the image below, agent observability provides visibility into the telemetry of each span, including the prompt (input), completion (output), and operational metrics such as token count (cost), latency, and more. As valuable as this visibility is, what is even more valuable is the ability to set proactive monitors on this telemetry. For example, getting alerted when the relevance of the agent output drops or if the amount of tokens used during a specific span starts to spike. We’ll dive into more details on common features, how it works, and best practices in subsequent sections, but first, let’s make sure we understand the benefits and goals of agent observability. A Quick Note on Synonymous Categories Terms like GenAI observability, AI observability, or LLM observability are often used interchangeably, although technically, the LLM is just one component of an agent. RAG (retrieval-augmented generation) observability refers to a similar but less narrow pattern involving AI retrieving context to inform its response. I’ve also seen teams reference LLMops, AgentOps, or evaluation platforms. The labels and technologies have evolved rapidly over a short period of time, but these categorical terms can be considered roughly synonymous. For example, Gartner has produced an “Innovation Insight: LLM Observability” report with essentially the same definition. Honestly, there is no need to sweat the semantics. Whatever you or your team decide to call it, what’s truly important is that you have the technology and processes in place to monitor and improve the quality and reliability of your agent’s outputs. Do You Need Agent Observability If You Use Guardrails? The short answer is yes. Many AI development platforms, such as AWS Bedrock, include real-time safeguards, called guardrails, to prevent toxic responses. However, guardrails aren’t designed to catch regressions in agent responses over time across dimensions such as accuracy, helpfulness, or relevance. In practice, you need both working together. Guardrails protect you from acute risks in real time, while observability protects you from chronic risks that appear gradually. It’s similar to the relationship between data testing and anomaly detection for monitoring data quality. Problem to Be Solved and Business Benefits Ultimately, the goal of any observability solution is to reduce and minimize downtime. This concept for software applications was popularized by the Google Site Reliability Engineering Handbook, which defined downtime as the portion of unsuccessful requests divided by the total number of requests. Like everything in the AI space, defining a successful request is more difficult than it seems. After all these are non-deterministic systems meaning you can provide the same input many times and get many different outputs. Is a request only unsuccessful if it technically fails? 
Problem to Be Solved and Business Benefits

Ultimately, the goal of any observability solution is to reduce and minimize downtime. This concept for software applications was popularized by the Google Site Reliability Engineering Handbook, which defined downtime as the portion of unsuccessful requests divided by the total number of requests. Like everything in the AI space, defining a successful request is more difficult than it seems. After all, these are non-deterministic systems, meaning you can provide the same input many times and get many different outputs. Is a request only unsuccessful if it technically fails? What about if it hallucinates and provides inaccurate information? What if the information is technically correct, but it's in another language or surrounded by toxic language?

Again, it's best to avoid getting lost in the semantics and pedantics. Ultimately, the goal of reducing downtime is to ensure features are adopted and provide the intended value to users. This means agent downtime should be measured based on the underlying use case. For example, clarity and tone of voice might be paramount for our customer success chatbot, but far less important for a revenue operations agent providing summarized insights from sales calls. This also means your downtime metric should correspond to user adoption. If those numbers don't track, you haven't captured the key metrics that make your agent valuable.

Most data + AI teams I talk to today are using adoption as the main proxy for agent reliability. As the space begins to mature, teams are gradually moving toward leading indicators such as downtime and the metrics that roll up to it, such as relevancy, latency, recall (F1), and more. Dropbox, for example, measures agent downtime as:

• Responses without a citation
• More than 95% of responses having a latency greater than 5 seconds
• The agent not referencing the right source at least 85% of the time (F1 > 85%)

Factual accuracy, clarity, and formatting are other dimensions, but a failure threshold isn't provided.

At Monte Carlo, our development team considers our Troubleshooting Agent as experiencing downtime based on the metrics of semantic distance, groundedness, and proper tool usage. These are evaluated on a 0-1 scale using an LLM-as-judge methodology. Downtime in staging is defined as:

• Any score under 0.5
• More than 33% of LLM-as-judge evaluations, or more than 2 total evaluations, scoring between .5 and .8, even after an automatic retry
• Groundedness tests showing the agent invents information or answers out of scope (hallucination or missing context)
• The agent misusing or failing to use the required tools

Outside of adoption, agents can be evaluated across the classic business values of reducing cost, increasing revenue, or decreasing risk. In these scenarios, the cost of downtime can be quantified easily by taking the frequency and duration of downtime and multiplying them by the ROI being driven by the agent. This formula remains mostly academic at the moment since, as we've noted previously, most teams are not as focused on measuring immediate ROI. However, I have spoken to a few. One of the clearest examples in this regard is a pharmaceutical company using an agent to enrich customer records in a master data management match-merge process. They originally built their business case on reducing cost, specifically the number of records that need to be enriched by human stewards. However, while they did increase the number of records that could be automatically enriched, they also improved a large number of poor records that would otherwise have been automatically discarded. So the human steward workload actually increased! Ultimately, this was a good result as record quality improved; however, it does underscore how fluid and unpredictable this space remains.
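Returning to the staging downtime definition above, here is a minimal sketch of how such a rule could be expressed in code. The scores are assumed to come from LLM-as-judge evaluations on a 0-1 scale after an automatic retry; the aggregation is a simplified interpretation for illustration, not Monte Carlo's production implementation.

Python
from typing import List

def is_downtime(scores: List[float]) -> bool:
    """Flag downtime for a batch of post-retry LLM-as-judge scores on a 0-1 scale."""
    if any(score < 0.5 for score in scores):
        return True  # any hard failure counts as downtime
    borderline = [score for score in scores if 0.5 <= score < 0.8]
    if len(borderline) > 2 or len(borderline) > 0.33 * len(scores):
        return True  # too many mediocre evaluations, even after the retry
    return False

print(is_downtime([0.92, 0.88, 0.95]))        # False: healthy run
print(is_downtime([0.92, 0.40, 0.95]))        # True: one score under 0.5
print(is_downtime([0.92, 0.70, 0.75, 0.90]))  # True: 2 of 4 scores are borderline (> 33%)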
How Agent Observability Works

Agent observability can be built internally by engineering teams or purchased from several vendors. We'll save the build vs. buy analysis for another time, but, as with data testing, some smaller teams will choose to start with an internal build until they reach a scale where a more systemic approach is required. Whether an internal build or vendor platform, when you boil it down to the essentials, there are really two core components to an agent observability platform: trace visualization and evaluation monitors.

Trace Visualization

Traces, or telemetry data that describes each step taken by an agent, can be captured using an open-source SDK that leverages the OpenTelemetry (OTel) framework. Teams label key steps — such as skills, workflows, or tool calls — as spans. When a session starts, the agent calls the SDK, which captures all the associated telemetry for each span, such as model version, duration, tokens, etc. A collector then sends that data to the intended destination (we think the best practice is to consolidate within your warehouse or lakehouse source of truth), where an application can then help visualize the information, making it easier to explore. One benefit to observing agent architectures is that this telemetry is relatively consolidated and easy to access via LLM orchestration frameworks, as compared to observing data architectures, where critical metadata may be spread across a half dozen systems.

Evaluation Monitors

Once you have all of this rich telemetry in place, you can monitor or evaluate it. This can be done using an agent observability platform, or sometimes the native capabilities within data + AI platforms. Teams will typically refer to the process of using AI to monitor AI (LLM-as-judge) as an evaluation. This type of monitor is well-suited to evaluate the helpfulness, validity, and accuracy of the agent. This is because the outputs are typically larger text fields and non-deterministic, making traditional SQL-based monitors less effective across these dimensions. Where SQL code-based monitors really shine, however, is in detecting issues across operational metrics (system failures, latency, cost, throughput) as well as situations in which the agent's output must conform to a very specific format or rule: for example, if the output must be in the format of a US postal address, or if it must always have a citation.

Most teams will require both types of monitors. In cases where either approach will produce a valid result, teams should favor code-based monitors as they are more deterministic, explainable, and cost-effective. However, it's important to ensure your heuristic or code-based monitor is achieving the intended result. Simple code-based monitors focused on use case-specific criteria — say, output length must be under 350 characters — are typically more effective than complex formulas designed to broadly capture semantic accuracy or validity, such as ROUGE, BLEU, cosine similarity, and others. While these traditional metrics benefit from being explainable, they struggle when the same idea is expressed in different terms. Almost every data science team starts with these familiar monitors, only to quickly abandon them after a rash of false positives.
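To show what a simple code-based monitor looks like next to an LLM-as-judge evaluation, here is a minimal sketch of the deterministic checks mentioned above: the output must carry a citation and stay under 350 characters. The citation pattern and field names are assumptions for illustration.

Python
import re

def code_based_checks(response: str) -> dict:
    """Deterministic, explainable checks that require no LLM call."""
    return {
        "has_citation": bool(re.search(r"\[\d+\]|\bSource:", response)),
        "under_350_chars": len(response) <= 350,
        "non_empty": bool(response.strip()),
    }

response = "You are within the 30-day return window. Source: refund-policy-v3"
results = code_based_checks(response)
print(results)                # each rule reported individually for explainability
print(all(results.values()))  # overall pass/fail for alerting

Checks like these are cheap enough to run on every response; fuzzier dimensions such as helpfulness or faithfulness are left to LLM-as-judge evaluations.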
What About Context Engineering and Reference Data?

This is arguably the third component of agent observability. It can be a bit tricky to draw a firm line between data observability and agent observability — it's probably best not to even try. This is because agent behavior is driven by the data it retrieves, summarizes, or reasons over. In many cases, the "inputs" that shape an agent's responses — things like vector embeddings, retrieval pipelines, and structured lookup tables — sit somewhere between the two worlds. Or perhaps it may be more accurate to say they all live in one world, and that agent observability MUST include data observability. This argument is pretty sound. After all, an agent can't get the right answer if it's fed wrong or incomplete context — and in these scenarios, agent observability evaluations will still pass with flying colors.

Challenges and Best Practices

It would be easy enough to generate a list of agent observability challenges teams could struggle with, but let's take a look at the most common problems teams are actually encountering. And remember, these are challenges specifically related to observing agents.

Challenge #1: Evaluation Cost

LLM workloads aren't cheap, and a single agent session can involve hundreds of LLM calls. Now imagine that for each of those calls you are also calling another LLM multiple times to judge different quality dimensions. It adds up quickly. One data + AI leader confessed to us that their evaluation cost was 10 times as expensive as the baseline agent workload. Monte Carlo's agent development team strives to maintain roughly a one-to-one workload-to-evaluation ratio.

Best Practices to Contain Evaluation Cost

Most teams will sample a percentage or an aggregate number of spans per trace to manage costs while still retaining the ability to detect performance degradations. Stratified sampling, or sampling a representative portion of the data, can be helpful in this regard. Conversely, it can also be helpful to filter for specific spans, such as those with a longer-than-average duration.

Challenge #2: Defining Failure and Alert Conditions

Even when teams have all the right telemetry and evaluation infrastructure in place, deciding what actually constitutes "failure" can be surprisingly difficult. To start, defining failure requires a deep understanding of the agent's use case and user expectations. A customer support bot, a sales assistant, and a research summarizer all have different standards for what counts as "good enough." What's more, the relationship between a bad response and its real-world impact on adoption isn't always linear or obvious. For example, if an evaluation model judges a response to be a .75 for clarity, is that a failure?

Best Practices for Defining Failure and Alert Conditions

Aggregate multiple evaluation dimensions. Rather than declaring a failure based on a single score, combine several key metrics — such as helpfulness, accuracy, faithfulness, and clarity — and treat them as a composite pass/fail test. This is the approach Monte Carlo takes in our agent evaluation framework for our internal agents. Most teams will also leverage anomaly detection to identify a consistent drop in scores over a period of time rather than acting on a single (possibly hallucinated) evaluation. Dropbox, for example, leverages dashboards that track their evaluation score trends over hourly, six-hour, and daily intervals. Finally, know which monitors are "soft" and which are "hard." Some monitors should immediately trigger an alert when their threshold is breached. Typically, these are more deterministic monitors evaluating an operational metric such as latency or a system failure.
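Here is a minimal sketch of that composite approach: several evaluation dimensions rolled into one pass/fail decision, plus a simple rolling check that alerts on a sustained drop rather than a single low score. The dimension names, thresholds, and window size are assumptions for illustration.

Python
from statistics import mean
from typing import List

DIMENSIONS = ("helpfulness", "accuracy", "faithfulness", "clarity")

def composite_pass(evaluation: dict, threshold: float = 0.7) -> bool:
    """Treat several LLM-as-judge dimensions as a single pass/fail test."""
    return mean(evaluation[d] for d in DIMENSIONS) >= threshold

def sustained_drop(history: List[float], window: int = 5, floor: float = 0.7) -> bool:
    """Alert only when the rolling average of recent composite scores falls below the floor."""
    return len(history) >= window and mean(history[-window:]) < floor

evaluation = {"helpfulness": 0.9, "accuracy": 0.85, "faithfulness": 0.75, "clarity": 0.75}
print(composite_pass(evaluation))  # True: a single middling clarity score is not a failure

history = [0.90, 0.88, 0.65, 0.60, 0.62, 0.58, 0.61]
print(sustained_drop(history))     # True: the last five runs average below 0.7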
Challenge #3: Flaky Evaluations

Who evaluates the evaluators? Using a system that can hallucinate to monitor a system that can hallucinate has obvious drawbacks. The other challenge for creating valid evaluations is that, as every single person who has put an agent into production has bemoaned to me, small changes to the prompt have a large impact on the outcome. This means creating customized evaluations or experimenting with evaluations can be difficult.

Best Practices for Avoiding Flaky Evaluations

Most teams avoid flaky tests or evaluations by testing extensively in staging on golden datasets with known input-output pairs. This will typically include representative queries that have proved problematic in the past. It is also a common practice to test evaluations in production on a small sample of real-world traces with a human in the loop. Of course, LLM judges will still occasionally hallucinate. Or as one data scientist put it to me, "one in every ten tests spits out absolute garbage." He will automatically rerun evaluations for low scores to confirm issues.

Challenge #4: Visibility Across the Data + AI Lifecycle

Of course, once a monitor sends an alert, the immediate next question is always: "Why did that fail?" Getting the answer isn't easy! Agents are highly complex, interdependent systems. Finding the root cause requires end-to-end visibility across the four components that introduce reliability issues into a data + AI system: data, systems, code, and model. Here are some examples:

Data
• Real-world changes and input drift. For example, if a company enters a new market and there are now more users speaking Spanish than English. This could impact the language the model was trained in.
• Unavailable context. We recently wrote about an issue where the model was working as intended but the context on the root cause (in this case, a list of recent pull requests made on table queries) was missing.

System
• Pipeline or job failures
• Any change to what tools are provided to the agent or changes in the tools themselves
• Changes to how the agents are orchestrated

Code
• Data transformation issues (changing queries, transformation models)
• Updates to prompts
• Changes impacting how the output is formatted

Model
• Platform updates its model version
• Changes to which model is used for a specific call

Best Practices for Visibility Across the Data + AI Lifecycle

It is critical to consolidate telemetry from your data + AI systems into a single source of truth, and many teams are choosing the warehouse or lakehouse as their central platform. This unified view lets teams correlate failures across domains — for example, seeing that a model's relevancy drop coincided with a schema change in an upstream dataset or an updated model.

Deep Dive: Example Architecture

The image above shows the technical architecture that Monte Carlo's Troubleshooting Agent leverages to build a scalable, secure, and decoupled system that connects its existing monolithic platform to its new AI Agent stack. On the AI side, the AI Agent Service runs on Amazon ECS Fargate, which enables containerized microservices to scale automatically without managing underlying infrastructure. Incoming traffic to the AI Agent Service is distributed through a network load balancer (NLB), providing high-performance, low-latency routing across Fargate tasks. The image below is an abstracted interpretation of the Troubleshooting Agent's workflow, which leverages several specialized sub-agents. These sub-agents investigate different signals to determine the root cause of a data quality incident and report back to the managing agent, who presents the findings to the user.
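As an abstracted sketch of that managing-agent pattern, the snippet below fans out to stubbed sub-agents, each investigating one class of signal, and consolidates their findings. The sub-agent names and canned return values are hypothetical stand-ins for real investigations against pipelines, code history, and system logs; they are not Monte Carlo's actual implementation.

Python
def check_data_freshness(incident: dict) -> str:
    # Stub: a real sub-agent would query warehouse or lakehouse metadata.
    return "Upstream table 'orders' last loaded 26 hours ago (expected: hourly)."

def check_recent_code_changes(incident: dict) -> str:
    # Stub: a real sub-agent would inspect recent query and transformation changes.
    return "Transformation model 'orders_summary' was modified 2 hours before the incident."

def check_system_health(incident: dict) -> str:
    # Stub: a real sub-agent would review pipeline and job run history.
    return "No pipeline or warehouse failures detected in the incident window."

SUB_AGENTS = {
    "data": check_data_freshness,
    "code": check_recent_code_changes,
    "system": check_system_health,
}

def managing_agent(incident: dict) -> str:
    """Fan out to sub-agents, collect their findings, and summarize for the user."""
    findings = {name: agent(incident) for name, agent in SUB_AGENTS.items()}
    summary = "\n".join(f"- {name}: {finding}" for name, finding in findings.items())
    return f"Investigation of incident {incident['id']}:\n{summary}"

print(managing_agent({"id": "INC-1042", "table": "orders_summary"}))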
Deliver Production-Ready Agents

The core takeaway I hope you walk away with is that when your agents enter production and become integral to business operations, the ability to assess their reliability becomes a necessity. Production-grade agents must be observed.

This article was co-written with Michael Segner.

By Lior Gavish
Navigating the Cyber Frontier: AI and ML's Role in Shaping Tomorrow's Threat Defense

Abstract

This article explores the transformative role of artificial intelligence (AI) and machine learning (ML) in cybersecurity. It delves into innovative strategies such as adaptive cyber deception and predictive behavioral analysis, which are reshaping defense mechanisms against cyber threats. The integration of AI in zero-trust architectures, quantum cryptography, and automation within cybersecurity frameworks highlights a shift towards more dynamic and proactive security measures. Furthermore, the challenges of the "black box" problem in AI decision-making and the potential for AI to automate routine cybersecurity tasks are discussed. The narrative underscores the importance of complementing technology with human insight for effective digital defenses.

Introduction: A Personal Encounter With Cyber Evolution

Let me rewind a few years back — a time when I was knee-deep in implementing a creditworthiness model at my previous role at Sar Tech LLC/Capital One. It was around the same time I encountered the formidable intersection of artificial intelligence (AI) and cybersecurity. While tuning machine learning (ML) algorithms to reduce loan approval risks, I witnessed firsthand how AI could pivot an organization's security posture in ways I hadn't quite imagined before. This realization didn't stem from an academic paper or industry panel — it came from the challenge of protecting sensitive data while simultaneously fine-tuning predictive models. It was an "aha" moment, one which highlighted the potential of AI and ML in a broader, more dynamic context of cybersecurity.

1. Adaptive Cyber Deception: A Strategic Shift

Deception as Defense: More Than Just Smoke and Mirrors

I vividly recall a project where we employed AI-driven deception techniques, a strategy that initially seemed straight out of a spy thriller rather than a data security meeting. The idea of deploying decoys and traps to mislead would-be attackers wasn't just innovative — it was transformative. We used platforms that could autonomously deploy traps tailored to the intelligence we gathered, constantly evolving as threats matured. This wasn't about fooling some hypothetical hacker; it was a real-world application, dynamically adjusting to threats in real time. The early challenges were not insignificant. The AI needed fine-tuning — much like a brewing pot of coffee that you keep tasting until that perfect balance is struck. Yet, when we saw reduced breach attempts and elongated threat response times, the payoff was clear. This strategy shifted our mindset from being purely defensive to engaging in active deterrence.

2. Predictive Behavioral Analysis: Reading Between The Lines

Breaking the Mold: Predicting the Unpredictable

Incorporating AI into predictive behavioral analysis feels a bit like playing chess blindfolded — challenging but rewarding. Most cybersecurity efforts focus on known threats — the easily identifiable pawns and bishops. But there's immense value in predicting the moves of hidden pieces. For instance, during a period when identifying insider threats was critical, we leveraged AI to analyze massive datasets, revealing subtle user patterns that could indicate future security risks. It was akin to predictive maintenance in manufacturing. It required a mindset shift — a move from passive analysis to active prediction, not only guarding against known threats but also casting a safety net over potential surprises.
The parallels were striking: just as in maintaining a manufacturing line, we had to anticipate system 'failures' before they happened.

3. Zero Trust and AI: A Necessary Symbiosis

Continuous Trust: The Ever-Evolving Security Blanket

When the conversation turns to zero-trust architectures, my mind immediately goes back to implementing real-time fraud detection systems while working with financial data. Here, AI played a critical role in ensuring persistent verification of user identities and devices. Our experience was that traditional models that granted one-time trust were antiquated. We needed a system that continuously validated not just once, but every step of the way. Implementing this was no easy feat, as it often required the blending of AI with agile security systems — akin to updating software in a live server environment. The automation brought by AI allowed us to evaluate risk in real time, ensuring that our trust was as fluid as the threats being faced.

4. Quantum Cryptography: The Next Frontier

AI and Quantum: The New Dynamic Duo

Exploring AI's role in enhancing quantum cryptography was perhaps the most cutting-edge venture. The convergence of AI with quantum methods wasn't just an exploration in theoretical cryptography but a practical endeavor to secure communication channels. We employed machine learning (ML) algorithms to optimize quantum key distribution, dynamically adjusting to new vulnerabilities. The challenge here was twofold: technical and conceptual. The quantum realm doesn't always adhere to classical physics — or logic, for that matter. Combining it with AI required navigating unfamiliar waters in quantum algorithms and applying ML models in an entirely new context. It was a learning curve, but the potential was too significant to ignore — a robust defense against not only current threats but the looming quantum computing advancements that could render traditional cryptography obsolete.

5. Addressing the "Black Box" Problem

Transparency in AI: Demystifying the Algorithms

A recurring pain point with AI-driven cybersecurity solutions is their opaque nature — the dreaded "black box." In my experience, transparency in decision-making processes is crucial. Security teams need to trust that AI's decisions are based on sound logic. It's not unlike cooking without a recipe; you need to know the ingredients to trust the outcome. Yet, explainable AI models can bridge this gap by offering insights into the decision-making pathways of algorithms. Initiatives during my tenure at Capital One included developing clear protocols for auditing AI-driven decisions, providing transparency, and fostering trust within our security teams. This endeavor ensured that our 'AI chefs' revealed enough of their recipe to build confidence in the solutions presented.

6. The Increasing Role of AI in Automating Cybersecurity

From Manual to Machine: Redefining Roles

The future is unmistakably veering towards automation — allowing AI to shoulder more of the operational load. This shift is redefining roles within cybersecurity teams, requiring a new focus on strategic oversight rather than routine tasks. My journey through machine learning projects taught me the value of shifting mundane tasks to AI, freeing up human resources to tackle complex, strategic challenges. However, this evolution comes with its own set of challenges, such as ensuring AI's ethical use and accountability. It's like introducing a new player into an established team; roles need to be reassessed, and new playbooks developed.
The human element will pivot to overseeing, strategizing, and innovating the broader defense strategies rather than routine operations.

Conclusion: A Future of Autonomous Defenses

Navigating this cyber frontier, one thing remains clear: AI and ML are integral to evolving threat defenses. The journey, punctuated by challenges and groundbreaking strides, is one of continuous learning — much like my career path, which has been anything but linear. The lessons learned along the way emphasize that while technology propels us forward, it remains essential to blend human insight with artificial intelligence. Just as no single technology was ever a panacea, AI and ML are tools — powerful ones — that, when wielded wisely, can redefine how we secure our digital landscapes. In essence, the future of cybersecurity is not just about the tools but the synergy they create with the people behind them. It's an exciting time to be in this field, and I, for one, am eager to see how AI and ML continue to transform the way we defend against threats. So, here's to embracing these innovations and blazing a trail into a more secure digital future.

By Geethamanikanta Jakka

Top AI/ML Experts


Tuhin Chattopadhyay

CEO at Tuhin AI Advisory and Professor of Practice,
JAGSoM

Dr. Tuhin Chattopadhyay is a celebrated technology thought leader among both the academic and corporate fraternity. Recipient of numerous prestigious awards, Tuhin is hailed as India's Top 10 Data Scientists by Analytics India Magazine. Besides driving his consultancy organization Tuhin AI Advisory, Dr. Tuhin also serves as Professor of Practice at JAGSoM, Bengaluru. His professional accomplishments can be explored from https://www.tuhin.ai/, art portfolio from https://tuhin.art/, joie de vivre from https://tuhinism.com/ and adventures with MySon from https://dogfather.rocks/.

Frederic Jacquet

Technology Evangelist,
AI[4]Human-Nexus

My goal is to deepen my research and analysis to track technological developments and understand their real impacts on businesses and individuals. I focus on untangling exaggerated perceptions and irrational fears from genuine technological advances. My approach is critical: I aim to move beyond myths and hype to identify the concrete, realistic progress we can expect from new technologies.

Suri (thammuio)

Data & AI Services and Portfolio

Seasoned Data & AI Technologist and Innovator with deep expertise in Big Data, Data Analytics, Cloud, Machine Learning, and Generative AI. He is passionate about building modern data ecosystems that drive intelligent analytics and business transformation. As a Forbes Technology Council and Entrepreneur Leadership Network member, Suri contributes thought leadership on technology strategy, AI innovation, and digital transformation. A founder of multiple startups and a lifelong learner, he combines enterprise experience with entrepreneurial agility to deliver impactful, future-ready data solutions.

Pratik Prakash

Principal Solution Architect,
Capital One

Pratik, an experienced solution architect and passionate open-source advocate, combines hands-on engineering expertise with extensive experience in multi-cloud and data science. Leading transformative initiatives across current and previous roles, he specializes in large-scale multi-cloud technology modernization. Pratik's leadership is highlighted by his proficiency in developing scalable serverless application ecosystems, implementing event-driven architecture, deploying AI-ML & NLP models, and crafting hybrid mobile apps. Notably, his strategic focus on an API-first approach drives digital transformation while embracing SaaS adoption to reshape technological landscapes.

The Latest AI/ML Topics

Top 5 Best Practices for Building Dockerized MCP Servers
Learn the best practices for building MCP Servers and use them to power your LLM-powered applications. Make sure your setup has isolation and is secure.
November 26, 2025
by Mahak Shah
· 262 Views
Building a Local RAG App With a UI, No Vector DB Required
A step-by-step guide to building a complete retrieval-augmented generation (RAG) application with FAISS, LangChain, and Streamlit that runs 100% locally.
November 26, 2025
by Nabin Debnath
· 225 Views · 1 Like
Building AI Agents With Semantic Kernel: A Practical 101 Guide
Learn how to build a simple, production-ready AI agent using Microsoft’s Semantic Kernel, covering kernels, plugins, agents, observability, and scalability.
November 26, 2025
by Sharan Babu Paramasivam Murugesan
· 177 Views
LLMOps Under the Hood: Docker Practices for Large Language Model Deployment
LLMs need GPUs, libraries, and stable setups, which makes them hard to run. Docker simplifies this by packaging everything into portable containers.
November 26, 2025
by Pragya Keshap
· 266 Views
Vector Databases in Action: Building a RAG Pipeline for Code Search and Documentation
Build a semantic code search that understands meaning, not keywords, with AST parsing, embeddings, hybrid search, and LLM-powered documentation generation.
November 25, 2025
by Dinesh Elumalai
· 470 Views
DevSecConflict: How Google Project Zero and FFmpeg Went Viral For All the Wrong Reasons
The tension between security and open source highlights the struggle over responsibility as AI uncovers vulnerabilities faster than we can respond.
November 24, 2025
by Katie Paxton-Fear
· 1,365 Views · 6 Likes
When Chatbots Go Rogue: Securing Conversational AI in Cyber Defense
Chatbots boost business but pose data risks. Strong security and AI risk management protect trust, compliance, and customer safety.
November 24, 2025
by Arun Goyal
· 714 Views · 1 Like
Building a Retrieval-Augmented Generation (RAG) System in Java With Spring AI, Vertex AI, and BigQuery
Build a Java RAG application using Spring Boot, Vertex AI embeddings, BigQuery vector search, and a web UI for interactive PDF-based question answering.
November 24, 2025
by Mohammed Fazalullah Qudrath
· 1,108 Views · 2 Likes
Creating an MCP Client With Spring AI
This blog post outlines the creation of an MCP client using Spring AI, building upon a previously established MCP server.
November 24, 2025
by Gunter Rotsaert DZone Core CORE
· 965 Views · 2 Likes
Building Multimodal Agents with Google ADK — Practical Insights from My Implementation Journey
AI agents have rapidly evolved to support non-text inputs, unlocking new use cases and enabling seamless human-AI collaboration.
November 21, 2025
by Aakash Sharma
· 1,668 Views · 1 Like
Revolutionizing Supply Chain Optimization with AI-Driven Constraint Programming
AI-driven constraint programming integrates machine learning with optimization to create adaptive, real-time, and efficient supply chain management.
November 21, 2025
by Shrinivas Jagtap
· 1,185 Views · 1 Like
Software Testing in the AI Era - Evolving Beyond the Pyramid
Rethinking the canonical three-tiered software testing pyramid and software quality assurance strategies in the decade of AI
November 21, 2025
by Surbhi Madan
· 1,197 Views · 3 Likes
Beyond Vector Databases: Integrating RAG as a First-Class Data Platform Workload
Architectural framework that integrates RAG deeply into enterprise data platforms through event-driven indexing, multi-layer hybrid retrieval, and governance by design.
November 20, 2025
by Anil kumar Kandalam
· 1,411 Views · 1 Like
Building Smarter Systems: Architecting AI Agents for Real-World Tasks
Event-driven agents use rules, not AI, to build scalable, reactive systems that automate tasks, boost resilience, and reduce complexity
November 20, 2025
by Vinod Veeramachaneni
· 1,030 Views
Creating an End-to-End ML Pipeline With Databricks and MLflow
This tutorial shows how to build a complete ML pipeline on Databricks using Delta Lake for data management and MLflow for model tracking, registration, and deployment.
November 19, 2025
by harshraj bhoite
· 1,307 Views · 1 Like
Smart AI Agent Targeting With MCP Tools
Transform your basic multi-agent system with business tiers, geographic targeting, and MCP tool integration using LaunchDarkly AI Configs.
November 19, 2025
by Scarlett Attensil
· 1,278 Views · 2 Likes
Meta Data: How Data about Your Data is Optimal for AI
Metadata enhances AI performance by providing crucial context for models. Learn key benefits, implementation strategies, and real-world examples for smarter AI systems.
November 19, 2025
by Kevin Vu
· 960 Views
From Zero to Local AI in 10 Minutes With Ollama + Python
In under ten minutes, install Ollama, pull a modern model, call it from Python or REST, and ship a repeatable Modelfile with a quick glance at the security checklist.
November 18, 2025
by Parthiban Rajasekaran
· 8,080 Views · 4 Likes
Embedding Ethics Into Multi-AI Agentic Self-Healing Data Pipelines
With this article, understand how to include ethical practices while developing a multi-agent generative AI framework for self-healing data pipelines.
November 18, 2025
by Naveen Kolli
· 1,371 Views
NVIDIA GPU Operator Explained: Simplifying GPU Workloads on Kubernetes
Learn how NVIDIA GPU Operator simplifies GPU management in Kubernetes. Explore features, setup steps, and best practices for AI/ML workloads.
November 18, 2025
by Sagar Parmar
· 1,714 Views
