In our Culture and Methodologies category, dive into Agile, career development, team management, and methodologies such as Waterfall, Lean, and Kanban. Whether you're looking for tips on how to integrate Scrum theory into your team's Agile practices or you need help prepping for your next interview, our resources can help set you up for success.
The Agile methodology is a project management approach that breaks larger projects into several phases. It is a process of planning, executing, and evaluating with stakeholders. Our resources provide information on processes and tools, documentation, customer collaboration, and adjustments to make when planning meetings.
There are several paths to starting a career in software development, including the more non-traditional routes that are now more accessible than ever. Whether you're interested in front-end, back-end, or full-stack development, we offer more than 10,000 resources that can help you grow your current career or *develop* a new one.
Agile, Waterfall, and Lean are just a few of the project-centric methodologies for software development that you'll find in this Zone. Whether your team is focused on goals like achieving greater speed, having well-defined project scopes, or using fewer resources, the approach you adopt will offer clear guidelines to help structure your team's work. In this Zone, you'll find resources on user stories, implementation examples, and more to help you decide which methodology is the best fit and apply it in your development practices.
Development team management involves a combination of technical leadership, project management, and the ability to grow and nurture a team. These skills have never been more important, especially with the rise of remote work both across industries and around the world. The ability to delegate decision-making is key to team engagement. Review our inventory of tutorials, interviews, and first-hand accounts of improving the team dynamic.
AWS has been building agentic infrastructure for some time now (Bedrock, AgentCore, Strands), mostly aimed at engineers who want to build their own agent systems from scratch. Amazon Quick is a different layer of the same bet: a ready-to-use agentic workspace that targets teams directly, without requiring custom orchestration code.

This article walks through what Quick is, how its components fit together technically, how the MCP integration model works with real code, and where it sits relative to the rest of AWS's agent stack.

What Amazon Quick Is

Amazon Quick is an AI assistant for work that connects to your existing tools (Slack, Microsoft Teams, Outlook, CRMs, databases, and local files) and gives you a unified layer for querying, automating, and acting across them. It launched in preview at AWS's "What's Next with AWS" event on April 28, 2026.

The product is aimed at teams, not just individual users. One person can build a custom agent scoped to a specific dataset or workflow, and the whole team benefits from it. Responses from Quick agents are grounded in your actual business data, not the underlying model's training distribution.

Under the hood, Quick is built on Amazon Bedrock AgentCore and uses the Model Context Protocol (MCP) as its standard for connecting to external tools. It runs on AWS IAM and VPC, which means it inherits the same security and compliance posture as the rest of your AWS workloads.

Components

Quick bundles five distinct capabilities. It helps to understand each one separately before thinking about how they compose.

| Component | What it does |
| --- | --- |
| Spaces | Collaborative workspaces where teams pool files, dashboards, and data sources. Agents in a Space are grounded in that Space's data. |
| Agents | Custom, domain-scoped agents built on your team's specific data. One person builds, everyone uses. |
| Research | Multi-source synthesis across internal data, the public web, and third-party datasets. Produces structured reports. |
| Visualize (Quick Sight) | Integrated BI layer. Conversational access to dashboards, charts, and forecasting, with no separate BI tool required. |
| Automate (Quick Flows) | Workflow automation from simple daily tasks to complex multi-step processes with cross-app action execution. |

Each component is available through the web app, mobile, and a native desktop app (currently in preview for macOS and Windows) that can read local files and calendar context without requiring browser access.

Where Quick Sits in the AWS Agent Stack

AWS is building in two directions at once. AgentCore is the infrastructure layer for engineers who want to compose their own agent systems (runtime, memory, gateway, observability) with any model and any framework. Quick is the product layer on top: opinionated, team-facing, and deployable without writing orchestration code.

The practical implication: if you're an engineer building internal tools or automation pipelines, you'll likely interact with both layers. You use AgentCore for the infrastructure wiring, and Quick as the surface where non-technical teammates interact with the agents you build.

The Integration Architecture

The core question for any engineer evaluating Quick is: how does it actually connect to external systems, and what does the request path look like?

Quick uses MCP (Model Context Protocol) as its primary integration standard. This is significant because MCP is an open protocol: Quick agents are not locked into AWS-specific connectors, and any MCP-compatible server can be registered as a tool source.

High-Level Request Flow

The sequence below shows the full lifecycle of a single agent-triggered tool call, from the moment Quick receives a prompt through to the response returning from a downstream API.

Quick acts as the MCP client. Your MCP server exposes tools via listTools and callTool. Quick discovers them at registration time and makes them available to any agent or automation in the workspace.
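To make the client/server exchange a little more concrete, here is a minimal sketch of the two JSON-RPC 2.0 requests an MCP client sends for tool discovery and tool invocation. On the wire, the MCP specification names these methods `tools/list` and `tools/call`. The helper function names, the id counter, and the example tool name and argument are illustrative assumptions, not something Quick itself exposes.

```python
import json
from itertools import count

# Hypothetical helper: a simple counter for JSON-RPC request ids.
_request_ids = count(1)


def list_tools_request() -> dict:
    """Build the JSON-RPC 2.0 request that asks a server for its tool catalogue."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": "tools/list"}


def call_tool_request(name: str, arguments: dict) -> dict:
    """Build the JSON-RPC 2.0 request that invokes one tool by name."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # The tool name and argument here are placeholders for illustration.
    print(json.dumps(list_tools_request()))
    print(json.dumps(call_tool_request("get_ticket", {"issue_key": "ENG-1234"})))
```

In practice an MCP SDK builds and sends these messages for you; the sketch only shows the shape of what travels between the client and your server.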
Authentication flows through OAuth 2.0, with support for Dynamic Client Registration (DCR) so Quick can register itself automatically without manual credential setup.

Building an MCP Server for Quick

Here is a minimal Python MCP server using the mcp SDK that exposes two tools Quick can invoke: get_ticket and list_open_tickets. This pattern works whether you host the server yourself or run it on AgentCore Runtime.

Install Dependencies

```shell
pip install mcp httpx uvicorn
```

Server Implementation

```python
# server.py
import httpx
from mcp.server import Server
from mcp.server.sse import SseServerTransport
from mcp.types import TextContent, Tool
from starlette.applications import Starlette
from starlette.responses import Response
from starlette.routing import Mount, Route

app = Server("jira-quick-integration")

JIRA_BASE_URL = "https://yourorg.atlassian.net"
JIRA_TOKEN = "Bearer <your-token>"  # in production, load from AWS Secrets Manager


@app.list_tools()
async def list_tools() -> list[Tool]:
    """Advertise the tools that Quick discovers at registration time."""
    return [
        Tool(
            name="get_ticket",
            description="Retrieve details for a single Jira ticket by issue key.",
            inputSchema={
                "type": "object",
                "properties": {
                    "issue_key": {
                        "type": "string",
                        "description": "The Jira issue key, e.g. ENG-1234",
                    }
                },
                "required": ["issue_key"],
            },
        ),
        Tool(
            name="list_open_tickets",
            description="List open Jira tickets assigned to a given user.",
            inputSchema={
                "type": "object",
                "properties": {
                    "assignee": {
                        "type": "string",
                        "description": "The Jira username or email of the assignee",
                    }
                },
                "required": ["assignee"],
            },
        ),
    ]


@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    """Dispatch a tool call to the matching Jira REST endpoint."""
    headers = {"Authorization": JIRA_TOKEN, "Content-Type": "application/json"}
    async with httpx.AsyncClient() as client:
        if name == "get_ticket":
            key = arguments["issue_key"]
            resp = await client.get(
                f"{JIRA_BASE_URL}/rest/api/3/issue/{key}", headers=headers
            )
            resp.raise_for_status()
            data = resp.json()
            summary = data["fields"]["summary"]
            status = data["fields"]["status"]["name"]
            return [TextContent(type="text", text=f"{key}: {summary} [{status}]")]

        elif name == "list_open_tickets":
            # Quote the assignee so emails and names with special characters
            # are parsed as a single JQL string; escape any embedded quotes.
            assignee = arguments["assignee"].replace('"', '\\"')
            jql = f'assignee = "{assignee}" AND status != Done ORDER BY updated DESC'
            resp = await client.get(
                f"{JIRA_BASE_URL}/rest/api/3/search",
                headers=headers,
                params={"jql": jql, "maxResults": 20},
            )
            resp.raise_for_status()
            issues = resp.json().get("issues", [])
            results = [f"{i['key']}: {i['fields']['summary']}" for i in issues]
            return [
                TextContent(
                    type="text", text="\n".join(results) or "No open tickets found."
                )
            ]

    raise ValueError(f"Unknown tool: {name}")


# Wire up SSE transport for Quick compatibility
sse = SseServerTransport("/messages/")


async def handle_sse(request):
    async with sse.connect_sse(
        request.scope, request.receive, request._send
    ) as streams:
        await app.run(streams[0], streams[1], app.create_initialization_options())
    # Return an empty response once the SSE stream closes.
    return Response()


starlette_app = Starlette(
    routes=[
        Route("/sse", endpoint=handle_sse),
        # With the SSE transport, the client POSTs its messages to this path,
        # so it has to be mounted alongside the /sse stream endpoint.
        Mount("/messages/", app=sse.handle_post_message),
    ]
)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(starlette_app, host="0.0.0.0", port=8080)
```

A few design constraints to be aware of when building for Quick:

- Each MCP tool call has a 300-second hard timeout. Operations that exceed this fail with HTTP 424. Keep individual tool calls narrow and fast.
- The tool list is treated as static after registration. If you add or remove tools on the server, the Quick admin must re-establish the connection to pick up changes.
- Quick supports both Server-Sent Events (SSE) and streamable HTTP as transports. Streamable HTTP is preferred for new implementations.

Registering the MCP Server in Quick

Once your server is running and publicly reachable over HTTPS, registration in Quick takes the following path:

```text
Quick Console → Integrations → Add Integration → MCP

Fields:
  Server URL:        https://your-mcp-server.example.com/sse
  Auth type:         OAuth 2.0 (or Service, or None)
  Client ID:         <from your identity provider>
  Authorization URL: https://auth.example.com/oauth/authorize
  Token URL:         https://auth.example.com/oauth/token
```

If your identity provider supports OAuth Dynamic Client Registration, Quick will auto-register and you skip the manual client ID step entirely. Quick sends an initial unauthenticated request to the MCP server; if it receives a 401 with a WWW-Authenticate header containing a resource_metadata URL, it fetches the metadata document and proceeds with DCR automatically.

Once registered, Quick calls listTools at startup and exposes every discovered tool to agents and automations in the workspace.

The AgentCore Gateway Option

For teams that don't want to write and operate an MCP server from scratch, Amazon Bedrock AgentCore Gateway provides a managed alternative. You point Gateway at a Lambda function or an OpenAPI spec, and it handles the MCP wrapping, auth, logging, and semantic tool discovery automatically. If you use it, Quick never calls your internal APIs directly; everything flows through Gateway's auth and routing layer, as shown in the sequence diagram above.

The semantic search capability is worth noting specifically. When an agent has access to dozens or hundreds of tools, passing the full tool list on every turn wastes context and can cause the model to pick the wrong tool.
Gateway's built-in x_amz_bedrock_agentcore_search tool lets Quick find the right tool by semantic similarity rather than scanning the entire registry each turn.

Practical Considerations

A few things worth keeping in mind before integrating:

Tool scope matters. When agents are given too many tools simultaneously, selection accuracy degrades: the model reasons over too many options per turn and picks incorrectly more often. Keeping each agent or MCP server to a focused set of 3–5 tools produces better results than exposing everything through one endpoint. This is a known pattern in multi-agent architectures and applies equally to Quick agents.

The 300-second timeout is real. Design each tool call to complete a single, bounded operation. Avoid chaining multiple downstream API calls inside a single tool invocation. If you need a multi-step workflow, model it as separate tools and let the agent orchestrate the sequence.

Local context on the desktop app. The desktop app reads local files and calendar events directly, without upload. For engineers who work primarily in terminals and local editors, this is a meaningful integration point: meeting context, local documentation, and recent file changes are all available to the assistant without any configuration.

MCP interoperability. Because Quick uses MCP as the standard, the same MCP server you build for Quick can also be consumed by Claude Code, Amazon Q Developer, and other MCP-compatible clients. The integration contract is portable.

References

- Amazon Quick: Product overview and features
- Integrate external tools with Amazon Quick Agents using MCP (AWS ML Blog, Feb 2026)
- MCP integration: Amazon Quick User Guide
- Amazon Bedrock AgentCore: Overview and documentation
- Introducing Amazon Bedrock AgentCore Gateway (AWS ML Blog)
- Top announcements of the What's Next with AWS, 2026 (AWS News Blog, Apr 2026)
Large-scale cloud platforms have reached a level of complexity (spanning multi-region Kubernetes clusters, streaming systems like Kafka, and heterogeneous data stores) that often exceeds human cognitive limits. Failures are no longer isolated events; they are emergent behaviors arising from tightly coupled systems where issues propagate across layers such as networking, orchestration, and data pipelines. Even with modern observability stacks, operators must manually correlate signals across dashboards, making incident response slow, inconsistent, and cognitively taxing.

Traditional approaches rely heavily on static runbooks and tribal knowledge. These mechanisms do not scale in modern distributed systems.

Agentic AI introduces a fundamentally different paradigm. Rather than merely detecting anomalies (as in traditional AIOps), agentic systems use Large Language Models (LLMs) to reason, plan, and act. These systems can iteratively generate hypotheses, validate them using real data, and execute multi-step remediation workflows. The result is not just faster detection, but a closed-loop system capable of autonomous diagnosis and recovery.

This article expands on how to architect a production-grade SRE agent that can safely and effectively automate cloud incident response. The system is organized into three layers: Perception (data ingestion), Cognition (multi-agent reasoning), and Action (guarded execution), all operating over a shared knowledge graph.

Establish a Cloud Knowledge Graph

At the core of any intelligent SRE agent is context. Raw telemetry alone is insufficient; the system must understand how components relate to each other. This is achieved through a domain-specific cloud knowledge graph.
The graph models:

- Nodes: Services, pods, clusters, regions, gateways, Kafka topics, and databases
- Edges: Traffic flows, deployment relationships, data lineage, ownership, and failover paths
- Attributes: SLOs, capacity limits, configuration history, and prior incidents

This structure transforms observability data into a causal reasoning substrate. Instead of treating metrics independently, the agent can traverse dependencies and infer propagation paths. For example, a spike in API latency can be traced through upstream gateways to downstream services and eventually to a throttled database.

This graph is not static; it evolves continuously with infrastructure changes and incident learnings. Over time, it becomes a living system model enriched with historical context, enabling better hypothesis generation and faster root-cause analysis.

In practice, maintaining graph freshness is critical. You should integrate it with service registries, deployment pipelines, and configuration management systems to ensure it reflects real-time topology.

Build the Perception Layer (Observability Pipeline)

The Perception Layer acts as the sensory system of the agent, continuously ingesting telemetry across the stack. This includes:

- Metrics: CPU, memory, I/O, network utilization, Kafka consumer lag
- Logs: Structured and semi-structured application and infrastructure logs
- Traces: End-to-end request paths across microservices

However, raw ingestion is only the first step. The real value lies in transforming this data into structured, actionable signals. A stream-processing pipeline should:

- Normalize data across heterogeneous sources
- Detect anomalies using statistical methods and thresholds
- Emit structured events tied to entities in the knowledge graph

These events act as triggers for the Cognition Layer. Importantly, they should already be enriched with context (e.g., “Service A in region us-east-1 exceeds latency SLO”), reducing the reasoning burden on downstream agents.
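As a rough illustration of the detect-and-enrich step described above, the sketch below flags a latency sample as anomalous with a simple z-score check and emits a structured event tied to a knowledge-graph entity. Everything in it is an assumption made for the example: the `GRAPH` dictionary stands in for a real graph store, and the entity name, SLO value, and threshold are made up.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Optional

# Hypothetical slice of the knowledge graph: entity -> attributes.
GRAPH = {
    "service-a": {
        "region": "us-east-1",
        "latency_slo_ms": 250.0,
        "depends_on": ["orders-db"],
    },
}


@dataclass
class AnomalyEvent:
    """Structured, context-enriched event handed to the Cognition Layer."""
    entity: str
    region: str
    metric: str
    value: float
    z_score: float
    message: str


def detect(entity: str, metric: str, history: List[float], value: float,
           z_threshold: float = 3.0) -> Optional[AnomalyEvent]:
    """Emit an event only if the sample is a statistical outlier AND breaches the SLO.

    `history` needs at least two samples for the standard deviation.
    """
    node = GRAPH[entity]
    sigma = stdev(history)
    z = (value - mean(history)) / sigma if sigma else 0.0
    if z < z_threshold or value <= node["latency_slo_ms"]:
        return None
    message = (
        f"{entity} in region {node['region']} exceeds latency SLO "
        f"({value:.0f} ms > {node['latency_slo_ms']:.0f} ms)"
    )
    return AnomalyEvent(entity, node["region"], metric, value, round(z, 2), message)


if __name__ == "__main__":
    baseline = [100.0, 110.0, 105.0, 95.0, 102.0, 98.0]
    print(detect("service-a", "latency_ms", baseline, 400.0))
```

Requiring both the statistical signal and the SLO breach is one way to trade sensitivity against noise; a production pipeline would more likely run this kind of logic in a stream processor than in a single function.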
A critical design consideration is balancing sensitivity and noise. Excessive alerting leads to “signal overload,” a well-known issue where operators (and agents) struggle to prioritize meaningful events. Techniques such as event deduplication, correlation, and temporal aggregation are essential to ensure high-quality inputs.

Architect a Multi-Agent Cognition Layer

Instead of using a single massive prompt, build a Cognition Layer utilizing a multi-agent LLM architecture (using GPT-5 or Claude-Opus class models) orchestrated by a control plane (e.g., a serverless orchestration layer). Assign specialized roles to different agents:

- Detector Agent: Monitors the anomaly events and groups related alerts into candidate incidents based on the knowledge graph's dependency structure.
- Hypothesis Agent: Proposes potential root causes by analyzing the graph and recent telemetry data.
- Validator Agent: Acts as the investigator by issuing targeted queries back to the observability tools and cloud APIs to confirm or reject the hypotheses based on hard evidence.
- Planner Agent: Synthesizes an actionable remediation plan. This plan should be an ordered list of operations, complete with preconditions, postconditions, and explicit rollback triggers.
- Critic (Governance) Agent: Reviews the remediation plan against organizational safety policies before execution, ensuring constraints are not violated.

Implement a Guarded Action Layer

The Action Layer is what separates an active agent from a passive AIOps recommendation engine. It executes the Planner Agent's steps via the Kubernetes API (scaling, restarting pods) and Cloud Provider APIs (toggling failovers, adjusting traffic weights). Safety is paramount.
You must wrap this layer in a strict governance framework:

- Enforce hard limits on scaling factors and failover scopes.
- Implement canary rollouts, applying changes to a single zone before expanding.
- Build auto-rollback mechanisms that trigger immediately if Service Level Objectives (SLOs) deteriorate after an action.
- Require explicit human-operator approval for high-risk operations like region-wide failovers.

Rollout and Optimization Strategies

When deploying your SRE agent, start in a "shadow" or assist mode. Allow the agent to observe incidents, propose hypotheses, and draft plans while human operators retain full control and execute the final decisions. As confidence in the system grows, gradually grant it autonomy for low-risk, routine actions.

To manage operational costs and latency:

- Optimize prompts: Externalize static system descriptions into retrieved documents.
- Caching: Cache intermediate inferences for reuse across similar recurring incidents.
- Batching: Batch non-urgent tool calls and defer low-impact infrastructure checks to background tasks.

Conclusion

Agentic AI represents a shift from reactive monitoring to proactive, autonomous operations. By combining a real-time observability pipeline, a continuously evolving knowledge graph, and a multi-agent reasoning system, you can build an SRE agent capable of end-to-end incident management.

Using this framework can significantly reduce Mean Time To Recovery, improve root-cause accuracy, and decrease reliance on human escalation, all while maintaining strict safety guarantees. More importantly, these systems create a virtuous cycle: every incident enriches the knowledge graph, improves agent reasoning, and strengthens operational resilience.

As cloud systems continue to grow in complexity, agentic SRE architectures will likely become a foundational component of modern reliability engineering.
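As a closing illustration, the governance rules listed for the guarded Action Layer can be sketched as a small policy check that runs before any planned step is executed. This is a minimal sketch under stated assumptions: the `Action` shape, the `MAX_SCALE_FACTOR` limit, and the `HIGH_RISK` set are hypothetical policy values, and a real system would load them from configuration and call the Kubernetes or cloud APIs only after an "allow" decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str                  # e.g. "scale", "restart", "failover"
    scope: str                 # "zone", "cluster", or "region"
    scale_factor: float = 1.0  # only meaningful for "scale" actions


# Hypothetical policy values; real limits would come from configuration.
MAX_SCALE_FACTOR = 2.0
HIGH_RISK = {("failover", "region")}


def review(action: Action, human_approved: bool = False) -> str:
    """Return "deny", "needs_approval", or "allow" for a planned action."""
    # Hard limit: never scale beyond the configured factor.
    if action.kind == "scale" and action.scale_factor > MAX_SCALE_FACTOR:
        return "deny"
    # High-risk operations wait for an explicit human decision.
    if (action.kind, action.scope) in HIGH_RISK and not human_approved:
        return "needs_approval"
    return "allow"


def should_rollback(error_rate_before: float, error_rate_after: float,
                    tolerance: float = 0.0) -> bool:
    """Trigger rollback if the SLO indicator got worse after the action."""
    return error_rate_after > error_rate_before + tolerance


if __name__ == "__main__":
    print(review(Action("scale", "zone", scale_factor=3.0)))
    print(review(Action("failover", "region")))
    print(should_rollback(0.01, 0.05))
```

Keeping the check as a pure function makes it easy to unit-test the policy separately from the code that actually touches infrastructure.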
AI agents have come a long way. They aren’t just answering simple questions; they’re handling order checks, summarizing support tickets, updating records, routing incidents, approving requests, and even calling internal tools.

As these agents slip deeper into real business workflows, just peeking at model logs isn’t enough. Teams need to see everything: what the agent did, why it did it, which systems it poked, and whether the end result actually helped the business.

Agent Observability

That’s where agent observability comes in. Traditional observability lets teams watch over their apps, APIs, databases, and infrastructure. Agent observability goes a step further. It shines a light on the whole AI workflow: it connects the dots from the user’s request to the agent’s decisions, the tools it touches, the systems it interacts with, and all the way to the final outcome.

Let’s look at a customer support example. Say a customer messages, “My subscription renewal failed, but I got charged twice.” A human rep checks the account, payment history, billing rules, refund policy, and ticket history before answering.

Now, an AI agent might do that job automatically. It’ll spot the billing problem, look up the customer record, call the billing system, check for duplicate payments, and either resolve the issue or escalate it if things get too messy.

On the surface, this whole thing just looks like a simple chat. However, under the hood, it’s a full-on workflow. If you want good observability, you need that behind-the-scenes view.

Why bother? Because the final response doesn’t tell you the whole story. If the customer comes back unhappy, you need to nail down whether the agent checked the right account, used the right billing tool, hit an error, misread the request, or escalated when it couldn’t help.

Don’t just watch the answer: Follow the whole journey

When you break down agent interactions, a few basic layers show the full picture. First, track the user request.
What did the user ask? Was it urgent, fuzzy, sensitive, or bound to a customer contract?

Second, watch the agent’s action. Did it answer straight away, ask a follow-up question, search a knowledge base, use a tool, or hand off to a human?

Third, note the context. What sort of information did it use? Did it pull a help article, customer details, invoice, ticket, policy, or product data?

Fourth, log tool usage. Did the agent call billing APIs, CRM systems, databases, incident tools, or an approval workflow? Did those calls work, or did they fail?

Lastly, look at the result. Did the agent fix the customer’s problem? Was the ticket reopened? Did a human have to clean up after the agent?

Without these layers, you’ll know when something was slow or incorrect, but not why. Maybe the context was off, a tool call failed, the agent lacked permissions, the prompt changed, or something further downstream broke.

Use a Single ID to Track Everything

One of the easiest fixes is to tag the whole workflow with a tracking ID. Let that ID travel with the request, from the interface through the agent, tools, APIs, and your business systems.

Now, if a support ticket gets botched, the team can retrace every step: what the customer asked, what the agent understood, which account it checked, what the billing system said back, and why the agent chose to close or escalate.

It’s not just for support. Maybe your SRE team uses an AI agent to help dig into a production alert. The agent scans logs, checks recent deployments, reviews database metrics, and suggests the likely cause. That same tracking ID means you’ll know exactly which systems the agent checked and whether it missed anything crucial.

Don’t ignore tool calls; they’re real actions

Here’s where things get serious. When an agent calls a tool, it’s taking action. Looking up customers, updating records, approving requests, creating tickets, and kicking off workflows all need to be watched closely.
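A minimal sketch of how the single tracking ID and tool-call monitoring might fit together: one `Trace` object carries the ID for the whole workflow and records every tool call against it. The class names, field names, and the stand-in tool functions are all hypothetical; in a real system you would more likely propagate the ID through an existing tracing library and export the records to your observability backend.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class ToolCallRecord:
    trace_id: str
    tool: str
    ok: bool
    duration_ms: float
    error: Optional[str]


@dataclass
class Trace:
    """Carries one tracking ID through a workflow and logs each tool call."""
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    calls: List[ToolCallRecord] = field(default_factory=list)

    def call(self, tool: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        ok, error = True, None
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            ok, error = False, str(exc)
            raise  # the caller still sees the failure
        finally:
            # Runs on success and on failure, so every call is recorded.
            duration_ms = (time.perf_counter() - start) * 1000
            self.calls.append(ToolCallRecord(self.trace_id, tool, ok, duration_ms, error))


if __name__ == "__main__":
    trace = Trace()
    # Stand-in for a real billing lookup.
    trace.call("billing_lookup", lambda account: {"account": account, "charges": 2}, "acct-42")
    for record in trace.calls:
        print(record)
```

Because every record shares the same `trace_id`, a botched ticket can be retraced by filtering on that one value.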
For each tool call, capture details like tool name, how long it took, success or failure, retries, permission results, error messages, and what actually happened.

Take a finance workflow. Say the agent reviews vendor invoices by extracting details, matching with a purchase order, checking taxes, and routing exceptions to finance. If an invoice gets approved by mistake, did the agent misread the invoice? Match it with the wrong purchase order? Miss a policy update? Or did the finance system return incomplete info?

That’s why tracking tool calls is critical. A wrong answer in chat is one thing, but a wrong move in your business system can lead to trouble such as money lost, operations disrupted, and even compliance issues.

Understand Agent Decisions, But Protect Privacy

Teams need to understand what the agent did, but you don’t want to log every single “thought” it had; it’s just unnecessary noise. Instead, record decision details in a structured way.

Example:

- Intent: billing dispute
- Confidence: medium
- Tool: billing lookup
- Reason: account verification needed
- Policy result: escalate
- Final action: handoff to human

Now you have enough to debug the workflow and for reporting, without exposing raw thought streams. You can spot how often agents escalate from low confidence, where tools fail, or if policy rules stop an action.

Connect Observability to Business Outcomes

Don’t just track the tech stuff; what really matters is whether the agent gets the job done. Watch business metrics like:

- Resolution time
- Escalation rate
- Workflow completion rate
- Tool failures
- Cost per workflow
- SLA hits or misses
- Rework
- How often humans step in

If you’ve got an e-commerce agent helping buyers pick products, check inventory, apply discounts, and guide checkout, you want to know: did the customer actually buy the item?

If checkout drops after you tweak a prompt, find out why. Did the agent push out-of-stock items? Apply discounts wrong? Use the wrong tool? Lose customers with confusing answers?
Observability at this level helps both engineering and business teams get answers, fast.

Build Dashboards for Different Audiences

Everyone’s got different needs. SREs care about latency, failed tools, retries, issues with dependencies, and cost spikes. Security teams focus on policy denials, suspicious tool actions, sensitive data flags, or prompt injection attempts. Product owners want completion rates, escalations, customer satisfaction, and abandoned workflows. Engineers need to see how agent behavior shifts after you change the model, prompt, workflow, or deployment. Business folks need throughput, SLAs, cost savings, and improvements to customer experience.

Take security operations. Say an agent checks suspicious logins, identity logs, privilege changes, and endpoint activity. Security needs to know: did the agent just review info, or did it try to lock an account? If it got blocked, you want that visible, too.

Alert on AI-Specific Failures

AI agents fail in new ways. Teams need alerts for things like sudden spikes in tool denials, fallback responses, unexpected tool usage, cost blowups, prompt injection attempts, completion drops, or escalating cases.

If an agent suddenly goes wild with refund actions, it could mean a prompt is off, a policy is weak, or something’s getting abused. If fallback responses shoot up, maybe the knowledge base is broken. Costs spike? Maybe the agent is stuck looping, retrying, or making unnecessary expensive calls.

Tie alerts to deployments, too. Agents change behavior after you update a prompt, switch models, change schema, adjust policies, or edit a workflow. Teams should compare how the agent behaved before and after.

A Simple Way to Grow Observability

Observability matures in steps.
1. Basic logs: prompts, responses, errors, timestamps
2. Tool visibility: what got used, if it worked, how long it took
3. End-to-end traces: follow the user request through the agent, tools, APIs, systems
4. Business-level result tracking: resolution, escalation, completion, rework, cost, SLA
5. Automated alerts: regressions after updates, anomalies, unusual patterns

Observability is ultimately about visibility into, and making sense of, the whole workflow. Teams need to know what users wanted, what the agent decided, which info it used, which tools it grabbed, which systems it touched, and whether business value was delivered.

As AI agents settle into production, observability has to cover more than just servers and app logs. The teams that win will be the ones who trace agent behavior end to end, spot failures early, explain what happened, and keep improving safely.
There is a pattern that repeats itself across engineering organizations regardless of team size, tech stack, or industry. A sprint ends. Features are shipped. The QA team is still writing automation for the previous sprint. The backlog of unautomated scenarios grows. Leadership asks what it would take to close the gap. The answer comes back: more engineers, more time, more tooling budget. Six months later, the gap is the same size. Sometimes larger.

This is not a resource problem. It is an architectural problem. And until the architecture changes, the gap does not close.

The Upstream Problem Nobody Measures

When engineering teams analyze their automation coverage gaps, they almost always focus on execution: test runs are slow, maintenance is high, and flaky tests waste time. These are real problems. But they are downstream of a more fundamental issue that rarely gets measured: the time between a requirement being written and automation existing for it.

In a traditional QA workflow, that gap looks like this:

1. Requirement lands in Jira
2. Developer builds the feature
3. QA engineer reads the requirement, interprets it, designs test scenarios
4. QA engineer writes test cases
5. QA engineer scripts automation in Playwright or Selenium
6. QA engineer executes, debugs, maintains

Steps 3 through 5 take days. Sometimes weeks. Every sprint adds to the backlog. Every requirement change breaks existing automation. The team runs hard and stays in the same place.

The industry has responded to this by automating step 6, making execution faster, smarter, and more parallelized. But steps 3 through 5 (requirement interpretation, test design, and scripting) remain almost entirely manual in most organizations.

This is the upstream problem. And it is where the real automation opportunity sits in 2026.

What Changes When You Start From Requirements

The architecture shift that actually closes the coverage gap starts much earlier in the pipeline than most automation teams consider.
Instead of "requirement arrives → developer builds → QA manually creates coverage," the new model is "requirement arrives → AI evaluates and enhances → AI generates test cases → AI generates scripts → AI executes → results with traceability returned."

The human does not design coverage. The human does not script automation. The human reviews requirements, approves test cases when necessary, and focuses on exploratory testing and quality strategy, the work that actually requires human judgment.

This is what requirement-driven autonomous testing means in practice. The requirement is the input. The executed test result is the output. AI owns everything in between.

The 5 Stages of a Requirement-to-Result Pipeline

Platforms like TestMax implement this model as a connected five-stage pipeline. Understanding each stage explains why the architecture works differently from traditional automation approaches.

Stage 1: Requirement Ingestion

The pipeline accepts requirements from wherever they live: Jira tickets, Azure DevOps work items, Word documents, PDFs, Excel files, or requirements authored directly in the platform. No reformatting required. The requirement enters the system as it exists.

This matters because one of the friction points in traditional QA automation is the translation step, converting a Jira ticket into a format that test tooling can work with. When ingestion is native, that step disappears.

Stage 2: Requirement Intelligence

Before any test generation begins, every requirement is evaluated by AI across five quality dimensions: clarity, completeness, consistency, testability, and correctness.

This stage is the most underestimated in the entire pipeline. Poor requirements always produce poor tests. A requirement that says "the login form should work correctly" is not testable. A requirement that specifies valid credentials, invalid passwords, empty field behavior, account lockout thresholds, and session persistence rules is.
When AI catches ambiguity at the requirement stage, it costs almost nothing to fix. When that same ambiguity surfaces after automation has been built against it, it costs days. The requirement intelligence layer moves defect detection upstream to where it is cheapest.

Requirements that fail quality review are flagged with specific improvement suggestions. AI offers rewrites. Nothing ambiguous proceeds to test generation.

Stage 3: AI Test Case Generation

Once a requirement passes quality review, the platform generates structured test cases automatically. These are not surface-level happy-path scenarios, but complete coverage across positive paths, negative paths, boundary conditions, and edge cases.

For a single requirement, like "users can reset their password via email verification," the generated coverage includes:

- Valid email address submitted: verification email received
- Invalid email format: appropriate error returned
- Email address not registered: system response without revealing account existence
- Verification link clicked: password reset flow initiated
- Verification link expired: appropriate error with re-send option
- New password does not meet policy requirements: specific validation messages
- Successful reset: session handling, redirect behavior

All of this is generated automatically from the requirement. No human designs the coverage strategy.

Stage 4: Automation Generation

Approved test cases are converted into executable Playwright scripts automatically. The output is production-ready code with appropriate waits, assertions, and selector strategies, generated without a human writing a single line.

This is the step that eliminates the scripting bottleneck. In traditional automation, scripting bandwidth is a hard ceiling on coverage growth. When the team can script 50 test cases per sprint, coverage grows at that rate regardless of how many requirements are produced. When scripts are generated automatically from approved test cases, that ceiling disappears.
Coverage can grow at the rate requirements are produced, not the rate engineers can write code.

Stage 5: Autonomous Execution and Evidence

AI agents execute the generated test suite through Playwright MCP. They manage environment setup, handle retries, capture logs, screenshots, and video per test, and return a complete traceability matrix linking every result to its source requirement.

The output is not a pass/fail count. It is a complete evidence package suitable for audit, governance, and release decision-making, generated automatically from the requirements the team was already writing.

Why This Architecture Closes the Coverage Gap

The traditional automation model has a linear constraint: coverage grows proportionally to engineering effort. More requirements always mean more backlog because the human work required per requirement is roughly constant.

The requirement-driven autonomous model removes the linear constraint. When AI handles test design, scripting, and execution per requirement, the engineering effort per requirement drops dramatically. Coverage can scale with the requirements themselves rather than with team headcount.

There are three concrete consequences:

Coverage lag is eliminated. When test generation takes minutes rather than days, new features can have automation in the same sprint they are built. The perpetual state of automation backlog, where coverage is always weeks behind the code it is supposed to validate, is a consequence of the manual model, not an inevitability.

Maintenance burden shifts. In traditional automation, 60 to 80 percent of automation engineering effort goes to maintaining existing scripts. When AI generates scripts from requirements, the maintenance responsibility belongs to the generation layer. UI changes that would previously break dozens of handwritten selectors are addressed at the generation stage.

Requirement quality improves as a side effect.
When every requirement must pass an AI quality evaluation before entering the test pipeline, the incentive to write precise, testable requirements increases. Teams that implement requirement-driven testing typically report improvement in requirement quality within two to three sprints, not because they trained their product managers differently, but because the pipeline now provides immediate, specific feedback on every requirement.

Integrating With Existing Workflows

A practical concern with any architectural change is migration cost. The requirement-driven autonomous model does not require replacing existing infrastructure.

Generated Playwright scripts integrate directly into existing CI/CD pipelines. Teams running Jira or Azure DevOps connect those systems natively, so requirements flow in without manual re-entry. For teams using ATF or other existing test frameworks, the autonomous testing layer runs alongside rather than replacing what already exists.

The practical starting point is a single sprint. Take the new requirements entering your backlog this week. Run them through a requirement-driven platform. Compare the test coverage produced (in time, in scenario depth, in maintenance overhead) against what your team would have produced manually. The experiment answers the adoption question more convincingly than any benchmark.

The Architectural Question for 2026

The relevant question for QA teams in 2026 is not whether to use AI in testing. Almost every serious testing platform has added AI capabilities in some form. The question is: where in the pipeline is AI actually doing meaningful work?

At one end of the spectrum, AI heals broken selectors and suggests which tests to run. The human still reads requirements, designs coverage, writes scripts, and manages execution. AI makes individual tasks faster.

At the other end, AI owns the pipeline from requirement evaluation through execution and evidence delivery. The human provides requirements and reviews results.
AI does everything in between. The teams that figure out where they sit on that spectrum and decide consciously which model their coverage goals require are the ones that will stop having the same conversation about automation backlogs next quarter.
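
As a rough illustration of the traceability matrix described in Stage 5, the sketch below groups executed tests by their source requirement and derives a per-requirement status. The `ExecutedTest` record, the `REQ-`/`TC-` identifier formats, and the output shape are hypothetical; they are not taken from TestMax or Playwright, and a real evidence package would also link logs, screenshots, and video.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutedTest:
    test_id: str
    requirement_id: str  # hypothetical ID format, e.g. a Jira key
    passed: bool


def traceability_matrix(results):
    """Group results by requirement and summarise pass/fail per requirement."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result.requirement_id].append(result)

    matrix = {}
    for req_id, items in grouped.items():
        failed = [r.test_id for r in items if not r.passed]
        matrix[req_id] = {
            "tests": len(items),
            "failed": failed,
            # A requirement passes only if every linked test passed.
            "status": "pass" if not failed else "fail",
        }
    return matrix


results = [
    ExecutedTest("TC-1", "REQ-101", True),
    ExecutedTest("TC-2", "REQ-101", False),
    ExecutedTest("TC-3", "REQ-102", True),
]
matrix = traceability_matrix(results)
for req_id, row in sorted(matrix.items()):
    print(req_id, row)
```

Reporting status per requirement rather than per test is the design choice that matters here: a release decision can then be made in terms of what was asked for, not in terms of script names.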
Switching from one single sign-on (SSO) vendor to another is a complex process that involves more than just changing technologies. It is a high-stakes identity operation that affects security, user experience, compliance, application access, and business continuity. It's not the same as moving a reporting tool or a collaboration platform, because SSO sits at the front door of every application in your environment. If you set it up wrong, everything stops working.

But the biggest danger of an SSO migration is not outright failure. The small things that go wrong are the most painful:

- Users being locked out of business-critical apps
- Orphaned accounts that were never deprovisioned
- MFA enrollments silently disappearing
- Helpdesk queues growing on the morning of cutover because the change was never communicated

This guide covers best practices for moving to cloud SSO and the most important things to keep in mind, from preparing the identity estate and migrating integrations to phased rollout strategies, keeping the user experience smooth, and planning the MFA migration.

Why Businesses Change SSO Providers

Companies don't usually change their SSO platforms on a whim. One of the following usually triggers it:

- Acquisition of the vendor or a product end-of-life announcement
- Cost consolidation or enterprise license optimization
- Platform standardization under a broader cloud strategy
- Compliance or regulatory requirements that the current platform can't meet
- Scalability, performance, or feature gaps in the current platform
- A merger or acquisition that introduces a second identity domain

Whatever the reason, migration carries compounding risk because SSO is foundational infrastructure, not an individual application.
3 Types of Migration Approaches and Their Differences

There are three main ways to migrate SSO, and each has its own risks and governance implications.

Federated Protocol Swap

Retain the same IdP architecture but replace the vendor platform underneath, for example, moving from PingFederate to Entra ID External Identities. The protocol (SAML, OIDC, SCIM) may remain the same, but attribute mappings, claim transformations, and session behaviors differ in ways that are often not clear until something breaks in production.

Full IdP Replacement

The old IdP is completely removed, and a new one is put in its place. Every connection with a service provider (SP) has to be set up, tested, and cut over again. This approach carries the most risk, and it is also the one that most businesses overlook.

Consolidation Migration

A single authoritative platform brings together many IdPs. This typically happens when companies merge or one acquires another. There are both technical and organizational problems, such as different business units having different app owners, SLAs, and levels of tolerance for disruption. Governance alignment needs to happen before any technical work can begin.

Migration Process: The 7 Steps

1. Audit and clean up
2. Plan and prepare
3. MFA migration
4. Communication planning
5. Phased rollout
6. Governance considerations
7. Decommission and close out

Step 1: Audit and Clean Up

Most organizations rush through or skip this step and migrate everything, including unused applications, inactive users, orphaned accounts, and integrations that have gone unused for three years. These don't break anything, but they leave a security risk behind. The following validations reduce both the inventory to migrate and the testing effort.

Create a complete, clean list of applications:

- Validate against the CMDB or application catalog.
- Validate which apps are actually being used.
- Validate access logs from the SIEM.
- Validate against IGA platforms.
- Remove redundant applications.

Create a complete, clean list of valid users:

- Include active users.
- Exclude accounts with no activity for 90 days.
- Exclude dormant accounts whose passwords were never changed.
- Validate against IGA platforms and HR systems.

Mark the unused applications for decommissioning. Note down the protocols used (SAML, OIDC, WS-Federation, or legacy), application owners, attributes and claims, MFA requirements, conditional access (CA) policies, and session timeout configurations.

Step 2: Plan and Prepare

Every application that relies on SSO consumes identity attributes passed in the SSO protocol. New IdPs rarely use the same attribute names, and case and format often differ. These mismatches cause silent authentication failures that are extremely difficult to diagnose during cutover.

Application Metadata

- Prepare the claims transformation registry.
- Confirm the case and formats.
- Validate transformation rules.

Redirect URLs

For each application, configure a transparent redirect from the legacy IdP login URL (or intranet homepage) to the new IdP's login endpoint. Users should not experience major changes. The only change they would notice is the new MFA prompt.

Rollback Process

- Identify when you should roll back.
- Decide who can make the rollback decision.

Rollbacks are generally triggered in the following cases:

- The rate of successful authentications drops below 95%.
- SSO fails for major applications.
- The help desk receives more calls than usual during the first 2 days of migration.

Migration Go-Live

- Document the new login flow end to end.
- Plan for extended staffing during the migration.
- Validate helpdesk access to the new platform.
- Identify and set up escalation contacts for issues that the helpdesk cannot resolve.

Step 3: MFA Migration

Prepare a complete inventory of existing MFA enrollments that answers the following:

- How many users have MFA enrolled vs. password only?
- What factors are in use?
  - Authenticator apps: need to re-enroll
  - SMS and email: the same phone number and email address can be reused
  - Hardware token: FIDO2/WebAuthn keys can be reused if the new vendor supports them
  - Biometrics: need to re-enroll
- How many and which users have only a single factor enrolled?

Follow these steps for re-enrollment:

- Open the self-service enrollment portal.
- Reuse phone numbers and emails (since they remain the same).
- Send advance communications at least two weeks out, explaining what will change and why.
- Track re-enrollment completion rates by department and group.
- Send follow-up emails, including deadlines.
- Set up a plan to re-enroll privileged accounts.

Step 4: Communication Plan

Communication is a major step in the migration process and should be tracked as a separate workstream with its own timeline, owners, deadlines, and success metrics. There are three different audiences involved in an SSO migration:

- End users, who simply need to know what will change and what to do
- Helpdesk and IT staff, who need operational readiness confirmations
- Stakeholders, who need status updates and risk visibility

Major email templates include:

- General updates
- MFA enrollment notices
- Cutover day notification

Step 5: Phased Rollout

Never cut over the entire organization at once. Instead, choose a phased rollout. This reduces risk, helps validate configurations in production with real users and real traffic, and provides time to identify issues before they affect most of the organization.
First phase: technology users

- Internal IT staff
- Identity administrators
- Helpdesk personnel
- Power users

Second phase: high-frequency application users, such as users of

- ERP applications
- CRM applications
- Collaboration platforms
- BI tools

Third phase: general user population

- Lower-risk departments
- Exceptions and low-activity users
  - Contractors
  - Users who log in rarely
  - Third-party users

Step 6: Governance Considerations

To ensure successful migration and validation, consider the following governance aspects.

Changes to IGA Solutions

- Joiner, mover, leaver (JML) changes
  - Provisioning accounts in the IdP with the attributes required for SSO claims
  - Disabling or deleting accounts during terminations
  - User transfers: changes to account attributes and group memberships
- Changing birthright roles
  - Update them with the new SSO groups
- Cleanup of legacy vendor applications

Audit Log Monitoring

- Onboard logs from the new vendor to the SIEM
- Set up alerts for:
  - Authentication failures
  - CA policy failures
  - Password failures
  - Token expiration

Non-Human Identities

Create a separate inventory of non-human identity (NHI) accounts and migrate their credentials to the new system. This includes accounts with no owners.

Step 7: Decommission and Close Out

The process can move forward once all the checks are done and MFA enrollments are at acceptable levels. Monitor the new system for 30 days, then plan the decommissioning of the old SSO solution.

Conclusion

SSO is the authentication layer for all the applications in the organization. Migrating without a proper plan carries risk. Most companies follow one or a combination of the approaches described above. Adhering to a proper plan, with clear communication and the right strategies, makes it far less likely that you will ever need to use your rollback plan.
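
The rollback triggers listed in Step 2 are easier to act on during cutover if they are written down as an explicit check. The sketch below is one possible way to do that, not a prescribed procedure: the `CutoverMetrics` fields are hypothetical names, the 95% threshold comes from the list above, and "more calls than usual" is interpreted (as an assumption) as exceeding a baseline count for a comparable period.

```python
from dataclasses import dataclass


@dataclass
class CutoverMetrics:
    auth_attempts: int
    auth_successes: int
    critical_app_failures: int    # SSO failures on apps tagged as major
    helpdesk_calls: int           # calls during the observation window
    baseline_helpdesk_calls: int  # typical calls for a comparable window


def rollback_reasons(m: CutoverMetrics, min_success_rate: float = 0.95) -> list:
    """Return the triggered rollback criteria; an empty list means keep going."""
    reasons = []
    if m.auth_attempts > 0:
        rate = m.auth_successes / m.auth_attempts
        if rate < min_success_rate:
            reasons.append(
                f"auth success rate {rate:.1%} is below {min_success_rate:.0%}"
            )
    if m.critical_app_failures > 0:
        reasons.append(f"{m.critical_app_failures} major application(s) failing SSO")
    if m.helpdesk_calls > m.baseline_helpdesk_calls:
        reasons.append(
            f"helpdesk calls {m.helpdesk_calls} exceed baseline "
            f"{m.baseline_helpdesk_calls}"
        )
    return reasons


healthy = CutoverMetrics(10_000, 9_900, 0, 40, 50)
degraded = CutoverMetrics(10_000, 9_300, 2, 120, 50)
print(rollback_reasons(healthy))
print(rollback_reasons(degraded))
```

The function only reports which criteria fired. The decision itself stays with whoever was named as the rollback owner during planning.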
This post walks through building and running a real-world agentic workflow with Agentican and Quarkus. Specifically, an agentic workflow to automate market research and information sharing:

- Identify the top vendors within a market category.
- Research the positioning and strengths of each vendor.
- Classify the findings as either standard or urgent.
- Draft a brief to share with others in the company.

Prerequisites

- Quarkus
- Java 25
- Maven (or Gradle)
- LLM provider API key

Step 1: Add the Dependency

Create a Quarkus app, and add the Agentican Quarkus runtime module:

XML

    <dependency>
        <groupId>ai.agentican</groupId>
        <artifactId>agentican-quarkus-runtime</artifactId>
        <version>0.1.0-alpha.3</version>
    </dependency>

Step 2: Define Agents, Skills, and the Workflow

Create an `agentican-catalog.yaml` file on the classpath. This is where you describe:

- Who does the work (agents)
- What they need to do it (skills)
- How they will do it (workflows)

YAML

    agents:
      - id: researcher
        name: researcher
        role: |
          Expert at finding accurate, sourced information about companies and markets.
          Quotes sources. Distinguishes opinion from fact.
      - id: writer
        name: writer
        role: |
          Synthesizes research into structured, concise briefs.
          Avoids hedging language. Cites concrete evidence.

    skills:
      - id: web-search
        name: web-search
        instructions: |
          When a question requires external information, call the search tool first.
          Quote sources in your answer.

Update the `agentican-catalog.yaml` file to define the workflow.

YAML

    workflows:
      - id: market-brief
        name: market-brief
        description: Research vendors in a market and produce a structured brief
        outputStep: deliver
        params:
          - name: topic
            description: Market to research
            required: true
          - name: vendor_count
            description: Number of vendors
            defaultValue: "5"
        steps:
          - name: identify
            agent: researcher
            skills: [web-search]
            instructions: |
              Identify the top {{param.vendor_count}} vendors in {{param.topic}}.
              Return a JSON array of vendor names: names only, no commentary.
          - name: deep-dive
            type: loop
            over: identify
            steps:
              - name: analyze
                agent: researcher
                skills: [web-search]
                instructions: |
                  Deep-dive vendor {{item}}: positioning, key strengths, recent news.
                  Quote sources.
          - name: classify
            agent: writer
            instructions: |
              Read the per-vendor deep-dives below. If any vendor has launched a
              competitive feature in the last 30 days, return the single word
              'urgent'. Otherwise return 'standard'.

              Deep-dives:
              {{step.deep-dive.output}}
            dependencies: [deep-dive]
          - name: deliver
            type: branch
            from: classify
            default: standard
            branches:
              - name: urgent
                steps:
                  - name: urgent-brief
                    agent: writer
                    instructions: |
                      Synthesize a vendor brief flagged URGENT for executive review.
                      Lead with the recent competitive moves.

                      Topic: {{param.topic}}
                      Deep-dives:
                      {{step.deep-dive.output}}
              - name: standard
                steps:
                  - name: standard-brief
                    agent: writer
                    instructions: |
                      Synthesize a vendor brief.

                      Topic: {{param.topic}}
                      Deep-dives:
                      {{step.deep-dive.output}}

A few things worth flagging:

- `agent: researcher` references the agent for a step; skills are referenced by name, too.
- `outputStep` designates the step whose output becomes the workflow's typed result.
- `{{param.X}}` interpolates workflow inputs into step instructions.
- `{{step.X.output}}` interpolates an upstream step's output.
- `{{item}}` is the current value inside a loop iteration.
- `type: loop` steps take an `over` reference (a step that produced a list, or a list-typed param).
- `type: loop` steps run their nested steps once per item, in parallel, and on virtual threads.
- `type: branch` steps take a `from` reference (a step whose output is used to select a branch).
- `branches` are mutually exclusive steps (or sets of steps), with `default` used for unrecognized values.

The framework loads `agentican-catalog.yaml` from the classpath, or you can define where it's loaded from:

Properties files

    agentican.catalog-config=/etc/agentican/agentican-catalog.yaml

Note: Agents, skills, and workflows can be defined via a fluent builder API as well.
Step 3: Configure the Models

Agentican reads the engine configuration from `application.properties`. The minimum is one LLM:

Properties files

    agentican.llm[0].api-key=${ANTHROPIC_API_KEY}

The provider defaults to `anthropic`, and the model defaults to `claude-sonnet-4-5`. Want OpenAI instead?

Properties files

    agentican.llm[0].provider=openai
    agentican.llm[0].api-key=${OPENAI_API_KEY}
    agentican.llm[0].model=gpt-4o-mini

Want to mix and match? Configure `name`s and reference them per-agent in the YAML catalog:

Properties files

    agentican.llm[0].name=default
    agentican.llm[0].api-key=${ANTHROPIC_API_KEY}

    agentican.llm[1].name=efficient
    agentican.llm[1].provider=openai
    agentican.llm[1].api-key=${OPENAI_API_KEY}
    agentican.llm[1].model=gpt-4o-mini

Step 4: Create a Typed Workflow Instance

Define the workflow input and output records:

Java

    import java.util.List;

    public record ResearchParams(String topic, int vendorCount) {}

    public record VendorBrief(String topic, List<Vendor> vendors) {
        public record Vendor(String name, String positioning, List<String> strengths) {}
    }

Then inject the typed workflow, and call it from a REST endpoint:

Java

    @Path("/market-brief")
    public class VendorBriefResource {

        @Inject
        @AgenticanWorkflow(name = "market-brief")
        Workflow<ResearchParams, VendorBrief> brief;

        @POST
        @Path("/{topic}")
        public VendorBrief generate(@PathParam("topic") String topic) {
            return brief.start(new ResearchParams(topic, 5)).await();
        }
    }

Now, test the endpoint:

Shell

    curl -X POST http://localhost:8080/market-brief/data%20observability%20platforms

A few things worth flagging, since they're what set this apart from a generic "call an LLM" library:

- `ResearchParams.vendorCount` becomes the workflow parameter `vendor_count` via SNAKE_CASE mapping.
- `start()` returns a `WorkflowRun<VendorBrief>`, and `await()` parses the output step's text into a `VendorBrief`.
- `@AgenticanWorkflow(name = "market-brief")` resolves the registered workflow at injection time.
Note: `WorkflowRun` itself exposes `future()` for a `CompletableFuture<R>`, and there's a `ReactiveWorkflow<P, R>` Mutiny variant for Vert.x stacks.

Step 5: Add Agent Tools

Agentican ships two integrations out of the box:

MCP (Model Context Protocol)

There is one config block per server. Tools are auto-discovered:

Properties files

    agentican.mcp[0].slug=github
    agentican.mcp[0].name=GitHub
    agentican.mcp[0].url=https://mcp.github.com/sse
    agentican.mcp[0].headers.Authorization=Bearer ${GITHUB_TOKEN}

Composio

100+ SaaS toolkits, including Slack, Notion, Linear, Salesforce, GitHub, and Google Workspace:

Properties files

    agentican.composio.api-key=${COMPOSIO_API_KEY}
    agentican.composio.user-id=user-123

Tools are referenced by name within agent steps:

YAML

    steps:
      - name: research
        agent: researcher
        tools: [github_search_repositories]
        instructions: "Profile open-source vendors in {{param.topic}}."

Structured agentic workflows for the JVM.

Where to Go Next

- Getting Started: install, configure, and run workflows
- Core Concepts: architecture, terminology, and data flow
- Workflows & Steps: CDI surface, beans, qualifiers, override patterns
- Agents: defining agents, skills, and roles
- Getting Started (Quarkus): dependency setup, config, first task
- CDI Integration: injection, qualifiers, lifecycle events, bean overrides
- REST API: endpoints, SSE streaming, WebSocket, error codes
- Observability: Micrometer metrics, OTel tracing, Prometheus queries
Teams often say they are building one app. A lot of the time, that is not true.

I saw this while reviewing a telemedicine MVP. At first, the plan sounded simple enough: video visits, messaging, scheduling, and basic records. Then the version-one list kept growing:

- Patient app
- Provider dashboard
- Admin panel
- Messaging
- Video
- Billing
- EHR connection
- Device support later

At that point, this was no longer one app. It was several systems being planned as one MVP:

- A patient-facing product
- A provider-facing product
- An admin product
- A set of outside-service connections

When a team treats all of that like one first release, things get messy before development even starts.

The Moment It Stopped Being One App

The problem was not the number of screens. The problem was the number of users, roles, and data rules hiding behind those screens.

A patient needed intake, booking, reminders, and follow-up. A provider needed schedules, patient context, notes, and quick actions during the day. An admin needed visibility, support tools, and role controls. The outside-services side added video vendors, messaging vendors, EHR work, and, later, device data.

That is not one product. That is a group of different systems with different jobs. Once that became obvious, the planning changed.

Split the Product by User First

Before estimating anything, it helps to split the product by who it is for. For this telemedicine project, the first useful split looked like this:

1. Patient Side

This part handled:

- Intake
- Booking
- Reminders
- Follow-up messaging
- Joining a visit

The patient side had to stay simple. It also had to be clear about what the patient could and could not see.

2. Provider Side

This part handled:

- Schedule view
- Patient details
- Visit notes
- Quick responses
- Role-based access

This was not just a different set of screens. It had different speed needs, different daily habits, and different data access rules.

3. Admin Side

This part handled:

- Role setup
- Support actions
- Visibility into operations
- Reporting
- Non-clinical controls

Admin work often looks small during planning. In real projects, it adds a lot of rules and a lot of testing.

4. Outside-Service Work

This part handled:

- Video vendor setup
- Messaging vendor setup
- EHR-related work
- Future device data
- Logging and audit-related movement of data

This is where many teams get surprised. Video, messaging, and EHR are not tiny add-ons. Each one brings its own work.

Start With Access Rules Before the Feature List

In multi-role products, one of the quickest ways to find hidden work is to define access rules early. Before locking the feature list, ask:

- Who can create this data?
- Who can read it?
- Who can change it?
- Who can delete it?
- Who can export it?

For the telemedicine project, this made a big difference. A few features looked simple in the scope doc. Once the team asked who could view or change the related data, the work got much larger.

A basic example: admins can help fix booking problems. That sounds harmless. But then the real questions start:

- Can admins see messages?
- Can they see visit notes?
- Can they see call history?
- Can they open uploaded files?

That one sentence can change a big part of the system. Access rules often show hidden work much faster than a feature list does.

Treat Outside Services as Separate Work

Another mistake teams make is treating outside services like small items on a checklist. On paper, it can look like this:

- Video
- Messaging
- EHR later

In practice, each one adds its own work:

- Vendor setup
- Request and response formats
- Error handling
- Retry rules
- Logging
- Replacement cost if the vendor needs to change later

That is why these items should be planned separately. For the telemedicine case, once video, messaging, and EHR work were split out from the main product list, the first release became easier to define. Some items that seemed close to launch were clearly not ready for version one.
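
To show why a single vendor line item hides real work, here is a minimal sketch of what "error handling, retry rules, logging, and replacement cost" can look like for the video vendor alone. Everything in it is hypothetical: the `VideoVendor` interface, the `FlakyVendor` stand-in, the `with_retry` helper, and the example URL are illustrations, not part of the project described above or of any real vendor SDK.

```python
import time
from typing import Callable, Protocol


class VideoVendor(Protocol):
    """What the product needs from any video vendor (hypothetical interface)."""

    def create_room(self, visit_id: str) -> str: ...


class FlakyVendor:
    """Stand-in vendor that fails a fixed number of times before succeeding."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def create_room(self, visit_id: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("vendor unavailable")
        return f"https://video.example.com/rooms/{visit_id}"


def with_retry(action: Callable[[], str], attempts: int = 3, delay_s: float = 0.0) -> str:
    """Retry rule: try up to `attempts` times, logging each failure."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ConnectionError as err:
            last_error = err
            print(f"attempt {attempt} failed: {err}")  # logging
            if attempt < attempts:
                time.sleep(delay_s)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


vendor: VideoVendor = FlakyVendor(failures=2)
room_url = with_retry(lambda: vendor.create_room("visit-42"))
print(room_url)
```

Keeping the product code dependent on the small `VideoVendor` interface, rather than on a specific SDK, is one way to limit the replacement cost mentioned in the list above. Each outside service needs its own version of these decisions, which is why they are worth estimating separately.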
Ship One Complete Path First

Once the team stopped calling everything an MVP, the first release got smaller. The version-one path that stayed in looked like this:

- Patient intake
- Appointment booking
- Secure video through the chosen vendor
- Follow-up messaging
- Basic provider access controls

That was enough to test whether the product solved a real problem for a clinic.

What moved out of the first release:

- Deeper EHR work
- More reporting
- Detailed billing flows
- Device support
- Broader admin tooling

Those things were not bad ideas. They just did not belong in the first build.

4 Simple Documents to Create Before Sprint Planning

When a team starts to suspect that one MVP is several systems, four short documents can help a lot.

1. User-to-System Map

List each part of the product and the main user for it.

2. Permission Matrix

Write down who can create, view, change, delete, and export each type of data.

3. Outside-Service List

Separate core product work from vendor work and data that moves in or out of the system.

4. First-Release Path

Write the one end-to-end path that version one has to get right.

These are short documents, but they make planning much better.

Why This Matters Outside Healthcare, Too

This lesson is not only for telemedicine. It applies to any multi-role product where the team is building for more than one type of user. That includes:

- Customer apps with admin panels
- SaaS products with back-office tools
- Platforms with provider and client sides
- Products that depend on outside vendors from day one

The moment a team has different users with different goals, the work stops being “just one app.”

Final Point

A lot of MVPs get too big because teams keep calling them one product long after that stops being true. The fix is not always better estimates. Sometimes the fix is much simpler:

- Split the product by user.
- Write down the access rules.
- Separate outside-service work.
- Ship one complete path first.

That makes the first release easier to plan, easier to build, and easier to test.
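
The permission matrix from the document list above does not need special tooling. As a sketch, it can start as a plain lookup table that both the planning discussion and, later, the code can read. The roles, data types, and specific grants below are invented for illustration and are not the rules the telemedicine team settled on.

```python
ACTIONS = ("create", "read", "change", "delete", "export")

# Hypothetical matrix: data type -> role -> allowed actions.
PERMISSIONS = {
    "booking": {
        "patient": {"create", "read", "change"},
        "provider": {"read", "change"},
        "admin": {"read", "change", "delete"},
    },
    "visit_note": {
        "provider": {"create", "read", "change"},
        # No admin entry: admins cannot see clinical notes in this sketch.
    },
    "message": {
        "patient": {"create", "read"},
        "provider": {"create", "read"},
    },
}


def can(role: str, action: str, data_type: str) -> bool:
    """Deny by default: anything not written in the matrix is not allowed."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return action in PERMISSIONS.get(data_type, {}).get(role, set())


# "Admins can help fix booking problems" turned into explicit answers.
for data_type in PERMISSIONS:
    print(f"admin read {data_type}: {can('admin', 'read', data_type)}")
```

Writing the matrix this way forces the questions from the admin example to be answered one cell at a time, and the deny-by-default lookup means a forgotten cell fails closed rather than open.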
In my 30 years of navigating the IT landscape, I’ve seen ‘Agile’ transform from a revolutionary mindset into what often feels like a series of manual project hurdles. In many large projects I’ve led, I’ve noticed we’ve traded innovation for a culture of ‘babysitting’ Jira boards and tracking Excel sheets. I wish to develop the Agentic Agile Office (AAO) not as another layer of automation, but as a fundamental shift in how I believe we must manage project velocity and governance.

The Bottlenecks I’ve Encountered

In my experience, traditional Enterprise Agile often buckles under its own weight. I’ve watched Technical Program Managers (TPMs) and Scrum Masters spend up to 60% of their time on administrative overhead. I’ve seen the "manual tax" of chasing status updates slow down the very speed Agile was designed to create. I believe it’s time to move past this.

How I Define Autonomous AI Agents

The AAO framework I’m proposing moves beyond simple chatbots. I am focusing on agentic AI: systems capable of reasoning, planning, and executing tasks autonomously. Within my framework, these agents don't just answer questions; they take action:

- The Backlog Agent: This will automatically analyze user feedback and technical debt to suggest prioritization scores for the Product Owner.
- The Dependency Agent: This agent scans multiple team boards in real time. I want it to identify and flag architectural conflicts before they cause a sprint failure.
- The Governance Agent: I see this as the ultimate safeguard, ensuring all code commits meet compliance standards without a human auditor needing to manually check every pull request.

Deep Dive: The Architecture of the AAO

While defining these agents is the first step, I believe it is critical to understand the architectural engine that drives this office. To move beyond simple automation, I have structured the AAO as a three-tier system:

1. The Intelligence Layer: Reasoning Over Data

In my three decades in the industry, the biggest issue hasn't been a lack of data, but the "data fog." I designed the AAO to use large action models (LAMs) that don't just read your tickets; they understand the intent behind them.

- Contextual memory: I want these agents to remember that a delay in a previous quarter was caused by a specific API bottleneck so they can predict similar risks today.
- Reasoning loops: Instead of a static trigger, I’ve structured these agents to use "Chain of Thought" processing to validate if a story is actually "Ready" based on historical standards.

2. The Workflow: A Day in the Life of an Agentic Sprint

I’ve reimagined the standard sprint cycle to show exactly where I believe these agents provide the most value:

- Pre-planning: Before the team meets, I have the Backlog Agent scrub requirements. If a user story lacks an acceptance criterion, the agent flags it to the Product Owner immediately, saving us 30 minutes of "discovery" time during the meeting.
- In-sprint execution: I’ve implemented the Dependency Agent to act as a "digital scout." If a developer changes a schema that another team relies on, the agent detects the conflict in the pull request and notifies both Scrum Masters before the build even fails.
- The "always-on" retrospective: I believe retrospectives shouldn't just happen every two weeks. My Insight Agent tracks velocity trends daily. If I see a team's burndown stalling, the agent provides me with a root-cause analysis before I even ask.

3. My Strategy: Agentic Over Generative AI

I want to be clear on a point of common confusion: Generative AI writes the email; agentic AI recognizes a project risk, decides an email is necessary, and drafts it for my review. In my framework, I am moving the human from being the operator of the tool to being the editor of the agent's actions. I’m shifting our workload from "doing the work" to "verifying the outcomes."
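
The pre-planning step in the sprint workflow above is the easiest part of the AAO to picture in code. The sketch below is a deliberately simple, rule-based stand-in for what the Backlog Agent would do, written to show the "agent flags, human decides" split rather than any real agent framework. The `Story` fields, the `AAO-` keys, and the extra "missing estimate" rule are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Story:
    key: str
    title: str
    acceptance_criteria: list = field(default_factory=list)
    estimate: Optional[int] = None  # story points; None means not estimated


def readiness_flags(story: Story) -> list:
    """Return the reasons a story is not ready; an empty list means ready."""
    flags = []
    if not story.acceptance_criteria:
        flags.append("missing acceptance criteria")
    if story.estimate is None:  # assumed extra rule, not from the text above
        flags.append("missing estimate")
    return flags


def pre_planning_report(backlog: list) -> dict:
    """Map each unready story key to its flags, for the Product Owner to review."""
    return {s.key: flags for s in backlog if (flags := readiness_flags(s))}


backlog = [
    Story("AAO-1", "Export sprint report", ["CSV download works"], estimate=3),
    Story("AAO-2", "Notify on schema change", estimate=5),
    Story("AAO-3", "Velocity trend chart"),
]
report = pre_planning_report(backlog)
for key, flags in report.items():
    print(key, flags)
```

The function produces a report and nothing else. Whether a flagged story is fixed, deferred, or accepted as-is remains a human decision, which is the editor role described above.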
Why I Believe This Redefines Our Roles

This technical shift leads to a natural question: if agents are handling the logistics, what happens to the people? In my view, this shift doesn't diminish our roles; it elevates them. By offloading the "babysitting" of Jira boards to autonomous agents, I want to empower leadership to focus on:

- Complex problem solving: Negotiating high-level blockers that require a human touch.
- Mentorship: Spending more time coaching teams to improve their craft.
- Strategic alignment: Ensuring technical output truly maps to business value.

My Vision for the Future

To me, the Agentic Agile Office represents the transition from Agile-by-process to Agile-by-intelligence. I am confident that by integrating these agents, enterprises can finally achieve continuous delivery without the human burnout I’ve witnessed throughout my career. I no longer ask "How do we scale Agile?" I now ask: "How quickly can I help you integrate the agents that will do the scaling for you?"
Editor’s Note: The following is an article written for and published in DZone’s 2026 Trend Report, Platform Engineering and DevOps: How Internal Platforms, Developer Experience, and Modern DevOps Practices Accelerate Software Delivery.

The role of the enterprise developer has become more complex over time as organizations adopt new technologies and tools, often without retiring their old ones. Add high staff turnover and increasing time and cost pressure, and developers are left to chart their own path through the SDLC. The purpose of internal developer platforms (IDPs) is to create a win-win scenario that benefits developers and their organizations. In this tutorial, you’ll define one golden path for a backend service that covers service setup, deployment, observability, and guardrails end to end.

Step 1: Define the Platform Product and First Golden Path

Successful IDP efforts focus on end-to-end developer workflows: building a new interface, deploying an updated microservice, running a regression suite, or standing up an environment. Ideally, the whole workflow can be supported directly from your IDP as self-service.

Once you have identified the workflow to support, you need to design the “golden path”: which parts you will standardize and which you will expose as configuration. It’s important to get that balance right. Components that have to change often, like service accounts, interfaces, and sizing, should be configurable. Creating templates and patterns helps reduce variability between outputs, making it easier to roll out necessary patching and updates.

For the first golden path, pick one high-value workflow that is common, repeatable, and easy to measure. We will use the deployment of our backend service to an integration test environment because it touches build, deployment, validation, and evidence capture in one flow.

User adoption is the key to success. To measure it, track both adoption measures, such as how often a workflow is triggered, and outcome metrics, such as the number of compliant application instances, the percentage of deployment failures, and the average deployment duration.

Step 2: Design the Golden Path (Templates and Defaults)

Next, we get to design the golden path. An important factor for the developer experience is to provide documentation with contextual guidance. This can be traditional how-to guides or more advanced features such as AI-enabled chatbots. The documentation should explain how testing, application deployments, and other lifecycle activities happen along the golden path, and provide architectural guidance on embedding any newly developed capability in the existing architecture. Standards and governance are other aspects that should be available for self-service, including naming conventions, common libraries, and reusable services.

On the technical side, the golden path should cover at least the following:

- Code repo and standard branching structure
- Skeleton code based on coding standards (e.g., environment config file, logging framework, data layer)
- CI/CD pipeline into an ephemeral cloud environment, or pointed at a standard persistent dev environment
- Skeleton quality gates in the CI/CD pipeline (e.g., unit test, functional regression, security scan)
- Access to common utilities; injection of environment values (e.g., URLs, IP addresses, access and secrets management)
- Ability to spin up the environment (if cloud based)

And lastly, the IDP needs to be designed with intuitive naming, a search function, tagging methods, and a hierarchical browsing structure so users can easily find the appropriate golden path. Supporting multiple ways of discovery provides a more resilient interface and eases the adoption of new golden path templates as they become available. For our backend service, choosing the workflow will show a representation of the steps included.
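The discovery options described above (naming, search, tagging, hierarchical browsing) can be sketched as a single lookup over a template catalog. This is only an illustration: the catalog entries, the `findTemplates` function, and the slash-separated category convention are assumptions, not features of any particular IDP product.

```javascript
// Hypothetical catalog entries; names, categories, and tags are illustrative only.
const catalog = [
  { name: 'backend-service-deploy', category: 'deployment/backend', tags: ['integration-test'] },
  { name: 'frontend-app-deploy', category: 'deployment/frontend', tags: ['ui'] },
  { name: 'regression-suite-run', category: 'testing/regression', tags: ['integration-test'] },
];

// Find templates by free text, tag, or category prefix.
// Any criterion that is omitted matches every entry.
function findTemplates(entries, { text = '', tag, categoryPrefix = '' } = {}) {
  const needle = text.toLowerCase();
  return entries.filter(
    (entry) =>
      entry.name.toLowerCase().includes(needle) &&
      (tag === undefined || entry.tags.includes(tag)) &&
      entry.category.startsWith(categoryPrefix)
  );
}

console.log(findTemplates(catalog, { text: 'backend' }).map((e) => e.name));
console.log(findTemplates(catalog, { tag: 'integration-test' }).map((e) => e.name));
console.log(findTemplates(catalog, { categoryPrefix: 'deployment/' }).map((e) => e.name));
```

Keeping all three criteria in one function means a user who starts by browsing a category can narrow the result with a tag or search term without switching tools.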
Step 3: Wire Self-Service Workflows (Without Tickets)

Besides golden path templates, IDPs should aim to be a one-stop shop for developers, so common requests should be available for self-service. Your existing ticket/ITSM systems can be a good source for creating the backlog. Identify the most common requests and start automating them in priority order.

In many cases, a ticket continues to be useful even in the self-service model for tracking and approvals, which can be integrated into the automatic workflow. Approvals should be granted automatically based on defined criteria and should require a human only when the request falls outside those parameters, such as access to restricted data, use of expensive resources, or non-standard requests. Over time, developers should be able to request new features through a transparent feature backlog and voting mechanism to engage the community.

When creating new features, keep things common wherever possible and provide ways for users to tailor their requests. For example, the standard deployment process might define a step for secrets injection, but some teams will tailor the process to skip it as necessary. This approach has two advantages: It creates a common language and process across teams, and it reduces the work to build and maintain the IDP. Spending a bit more time up front to create customizability pays off over the medium and long term. For our backend service, the first service we define is deployment to the integration test environment.

Step 4: Standardize Delivery With CI/CD + GitOps + IaC in One Flow

The principle of the golden path deployment process remains unchanged: You build a software artifact once, and you deploy it multiple times along the environment path. For our backend service, promotion should happen through a versioned change (think GitOps) to the desired environment state, so application version, infrastructure definition, and deployment evidence remain traceable together.
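As a rough illustration of "build once, deploy many," promotion can be modeled as a pure change to a desired-state record rather than a rebuild. The sketch below assumes a hypothetical state layout (`artifactDigest`, `infraVersion`, `evidence`) and a hypothetical `promote` function; in a real GitOps setup the returned object would correspond to a commit in the environment repository.

```javascript
// Hypothetical desired-state records, as they might be stored in version control.
// Field names and values are assumptions made for this example.
const desiredState = {
  dev: { artifactDigest: 'sha256:abc123', infraVersion: 'v12', evidence: ['build-log-481'] },
  'integration-test': { artifactDigest: 'sha256:9f0e11', infraVersion: 'v12', evidence: [] },
};

// Promotion copies the reference to the already-built artifact; it never rebuilds.
// It returns a new state object (the versioned change) and leaves the input untouched.
function promote(state, fromEnv, toEnv, newEvidence = []) {
  const source = state[fromEnv];
  const target = state[toEnv];
  if (!source) throw new Error(`Unknown source environment: ${fromEnv}`);
  if (!target) throw new Error(`Unknown target environment: ${toEnv}`);
  return {
    ...state,
    [toEnv]: {
      ...target, // keep the target's own infrastructure definition
      artifactDigest: source.artifactDigest,
      evidence: [...source.evidence, ...newEvidence], // evidence travels with the artifact
    },
  };
}

const nextState = promote(desiredState, 'dev', 'integration-test', ['regression-run-77']);
console.log(nextState['integration-test']);
```

Because the function returns a new object instead of mutating the old one, the before and after states can be diffed and reviewed like any other versioned change.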
In the build stage, code is prepared in any pre-compile steps, then compiled and packaged with all necessary configuration files. In the deployment process, environment variables are injected, and the package is deployed to the target environment, which is scripted as Infrastructure as Code. The validation itself is usually layered: a technical validation to confirm that the deployment was correct, a functional regression of core functionality, and tests of the new changes. This sequence is ordered by speed of feedback, which is important in an automated IDP service.

When a validation check fails, the golden path needs to have defined failure behavior with clear steps to execute. Pipeline failures like a broken build, failed test, or policy violation will block progression automatically. If the environment is materially impacted, a rollback is automatically initiated. Only in rare cases should a human evaluation be required, for example, when the level of ambiguity is too high and impacts stakeholders who are using the environment.

Some policy violations can be treated with time-bound exceptions, such as allowing a new security vulnerability in a non-production environment. This allows functional testing to continue while the team remediates the security vulnerability. Prior to going live, the exception would be removed so the security vulnerability doesn’t progress to production. These types of exceptions should be set to auto-expire to prevent them from being forgotten later.
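The auto-expiring exception described above can be sketched as a small policy check. This is a minimal example under stated assumptions: the exception record shape, the `evaluateFinding` function, the finding ID, and the dates are all hypothetical, and a real platform would typically load exceptions from version control rather than an in-memory array.

```javascript
// Hypothetical exception records; IDs, environments, and dates are made up.
const exceptions = [
  {
    findingId: 'VULN-101',
    environments: ['dev', 'integration-test'], // production is deliberately not listed
    expiresAt: '2026-03-31T00:00:00Z',
  },
];

// Decide whether a policy finding blocks the pipeline in a given environment.
// An exception only counts if it matches the finding, covers the environment,
// and has not yet expired, so forgotten exceptions stop applying on their own.
function evaluateFinding(finding, environment, exceptionList, now = new Date()) {
  const active = exceptionList.find(
    (e) =>
      e.findingId === finding.id &&
      e.environments.includes(environment) &&
      new Date(e.expiresAt) > now
  );
  return active
    ? { action: 'warn', reason: `Exception active until ${active.expiresAt}` }
    : { action: 'block', reason: 'No active exception for this finding' };
}

const finding = { id: 'VULN-101' };
console.log(evaluateFinding(finding, 'integration-test', exceptions, new Date('2026-03-01T00:00:00Z')));
console.log(evaluateFinding(finding, 'integration-test', exceptions, new Date('2026-04-01T00:00:00Z')));
console.log(evaluateFinding(finding, 'prod', exceptions, new Date('2026-03-01T00:00:00Z')));
```

Passing `now` as a parameter keeps the check deterministic in tests; in a pipeline it would default to the current time.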
Golden Path Steps and Guardrails

| Step | Self-service action | Guardrail | Evidence |
|---|---|---|---|
| Build | Trigger pipeline via check-in action in source control | Code scan and unit test results | Build log, composition scan result |
| Promote to non-prod environment | Merge to staging branch, promotion request | Technical validation, regression test | Test results |
| Promote to prod | Promotion request | Approval and compliance check | Approval and audit trail |
| Rollback | Automated trigger or manual request | Post-rollback validation and regression test | Test results |

Step 5: Bake in Operability for Observability and Day-2 Readiness

IDPs reduce cognitive load and toil as solutions to common concerns are built in. This is especially true for the operational concerns. Each workflow and self-service feature creates the log files and traces for auditability. All code and configuration are driven from version control, and the metrics recorded provide insights into the outcomes and performance of the IDP.

New operational initiatives, like introducing a software bill of materials, can be rolled out across all technologies that use the IDP. When done correctly, templates can be updated centrally, and the log files provide full auditability to identify where old versions are still in use, reducing the overall security exposure. The IDP governance model needs to define the ownership of templates and any inheritance rules. For instance, some teams will tailor the template by adding additional steps required for their technology.

Alongside the IDP instrumentation, standard dashboards and alert definitions ship with the template, pre-wired to the appropriate ownership group. Who responds to what is documented, not assumed. Runbooks and escalation paths are stored in version control alongside the service itself so they evolve with the system rather than rotting in a forgotten wiki page.
Our backend service will include the following with the golden path:

- Logs, metrics, and traces
- Alerts
- Runbook link
- Ownership metadata

The final piece is the feedback loop. Incidents, near-misses, and recurring friction points are not only resolved; each one also becomes a backlog item that feeds the continuous improvement of the platform.

Step 6: Add Guardrails and Governance Without Slowing Delivery

The IDP should leverage approved templates where possible and embed basic compliance and policy checks in the workflows. Platform developers will receive immediate feedback on any problems they need to fix. When issue resolution requires a longer time, time-bound exceptions can be allowed. Along the environment path from development to production, the quality gates should become more restrictive as the software quality improves. For our backend service, we define security scanning prior to deployments, and we don’t accept any deviations from the corporate standard for it.

We follow a simple block, warn, escalate paradigm. The goal is to address problems that teams can deal with immediately and provide enough time for more complex work. This balance allows work to flow at pace.

It is important to version templates and workflows so you can track what is in use. When significant problems are identified with a version, you can use the IDP logs to find any items in use and replace them quickly. Having the right guardrails in place might feel restrictive but in fact reduces the amount of rework over time as there are fewer incidents. Fast feedback reduces the time it takes to resolve problems.

Step 7: Measure Adoption, DevEx, and Platform ROI

One of the key success factors for IDPs is having the ability to measure adoption (covered earlier), developer experience, and platform ROI (e.g., DORA, SPACE). This allows you to break down and distinguish between adoption measures and outcome metrics. Implementing these criteria in the platform from the beginning captures data systematically.
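To show how adoption measures and outcome metrics can be kept separate while coming from the same data, here is a minimal sketch that derives both from workflow run records. The record shape, the sample values, and the `summarize` function are assumptions for illustration; a real platform would read these records from its own logs.

```javascript
// Hypothetical workflow run records, as the platform might log them.
const runs = [
  { workflow: 'deploy-backend', user: 'ana', status: 'success', durationSec: 300 },
  { workflow: 'deploy-backend', user: 'ben', status: 'failed', durationSec: 120 },
  { workflow: 'deploy-backend', user: 'ana', status: 'success', durationSec: 240 },
  { workflow: 'create-env', user: 'cy', status: 'success', durationSec: 600 },
];

// Summarize one workflow, keeping adoption measures apart from outcome metrics.
// Returns null when the workflow has no recorded runs.
function summarize(records, workflow) {
  const selected = records.filter((r) => r.workflow === workflow);
  if (selected.length === 0) return null;

  const failures = selected.filter((r) => r.status === 'failed').length;
  const totalDuration = selected.reduce((sum, r) => sum + r.durationSec, 0);

  return {
    adoption: {
      executions: selected.length,
      activeUsers: new Set(selected.map((r) => r.user)).size,
    },
    outcome: {
      failureRate: failures / selected.length,
      avgDurationSec: totalDuration / selected.length,
    },
  };
}

console.log(summarize(runs, 'deploy-backend'));
```

Logging one record per run from the first day means both views can be recomputed later without adding new instrumentation.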
Good adoption measures to start with: number of executed workflows, number and currency of templates, and number of active users. The following outcome metrics can also be used as part of the business case for IDPs: deployment failure rate, MTTR, incident volumes, number of tickets, and security vulnerabilities.

The team managing the IDP should actively use the metrics together with captured feedback from the user base (e.g., feature requests) to prioritize the backlog. Executive dashboards should be implemented to provide accountability and increase support across the organization.

A Minimal IDP You Can Scale

Bringing it together, take the following actions to kick-start your internal developer platform:

- Choose a common and not too complex workflow for your first golden path
- Create the code repository and CI/CD pipeline
- Define a self-service UI for the workflow
- Embed quality gates, metrics, and operational tooling into the workflow

Start with one workflow for one pilot team, prove the path, then extend to the next workflow or team. Don’t forget to engage with the pilot users to receive feedback and support adoption.

If you want to dive deeper, explore the CNCF Platforms for Cloud-Native Computing whitepaper and Platform Engineering Maturity Model.

This is an excerpt from DZone’s 2026 Trend Report, Platform Engineering and DevOps: How Internal Platforms, Developer Experience, and Modern DevOps Practices Accelerate Software Delivery.
Feature flags have become standard practice in enterprise applications, enabling teams to release code into production environments without exposing new features to users. As teams leverage feature flags to increase delivery velocity, technical debt accumulates. Left unchecked, this debt will slowly and silently impact application performance, maintainability, and developer productivity.

What Is Feature Flag Debt?

Feature flag debt occurs when feature flags are left in the codebase after they’ve served their purpose. The most common symptoms of feature flag debt include:

- Dead code
- Context switching for developers

Feature flag debt can go unnoticed because it typically doesn’t cause broken features. As a result, developers often skip flag cleanup in favor of building new features.

Impact on Performance

Feature flag debt can have serious consequences for application performance. In front-end applications, this is often overlooked. Once a feature flag has been introduced into a codebase, it incurs a long-term cost every time the application is loaded in the browser.

- Larger JS bundles: Each feature flag adds logic to the application. When feature flags are not cleaned up, the associated code is typically not removed from the final bundled app. This means more code for users to download and more memory used on the client.
- Reduced execution speed in client-side rendering: The browser must download, parse, and evaluate the entire bundle, even if certain code paths are never executed. This leads to slower parsing, longer load times, and slower interactions.

Impact on Developer Productivity

Feature flag debt also negatively impacts developer productivity. Imagine having to read through an if/else statement that checks a feature flag that will never be true. Developers frequently encounter this scenario when working with feature flags. New engineers, in particular, often struggle to know which feature flags are safe to ignore. Should they comment out this code? What if they need it later?

Why Aren’t Feature Flags Cleaned Up?

It should be standard practice to remove feature flags from the codebase once they’re no longer needed. However, they often become a long-term liability for the application for several reasons:

- Nobody takes responsibility for cleaning up flags.
- People are afraid to remove code.
- There are no tools to help automate the process.
- There’s always something more pressing to work on.

A defined feature flag lifecycle is often missing as well, which leads to indefinite accumulation.

Example of Feature Flag Debt

Here is how a feature typically looks when wrapped in a feature flag:

```javascript
// isFeatureEnabled is assumed to be provided by the team's feature flag client.
const isAIAgentsFeatureFlagEnabled = isFeatureEnabled('ai-agents');

if (isAIAgentsFeatureFlagEnabled) {
  // lines of code
  // Code to run when the feature flag is enabled
} else {
  // lines of code
  // Code to run when the feature flag is disabled
}
```

When first implemented, this doesn’t look too bad. When this feature is rolled out to production, there’s still the safety net of keeping the original functionality should something go wrong. However, after the feature flag is turned on for everyone and the feature reaches general availability (GA), there is no reason to keep both pathways in the application. The application still ships both pieces of code in the bundle, but only one will ever execute at runtime. The else block is now dead code: it never executes, yet it still takes up space in the bundle and adds to code complexity.

Manage and Eliminate Feature Flag Debt

Organizations need to take measures to prevent feature flag debt from slowing down their applications. Defining a feature flag lifecycle is a great place to start. By enforcing that each feature flag has a description, owner, status, and expiration date, the team can ensure flags aren’t left to become debt.
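One way to enforce such a lifecycle is a small check that runs in CI against every flag definition. The sketch below is an assumption-laden example, not a feature of any flag library: the `validateFlag` function and its messages are hypothetical, and the field names (`description`, `owner`, `status`, `expiration_date`) and the `GA` status value follow the registry format used in this article.

```javascript
// Fields every flag definition must carry under the lifecycle rule.
const REQUIRED_FIELDS = ['description', 'owner', 'status', 'expiration_date'];

// Returns a list of problems for one flag definition (an empty list means healthy).
function validateFlag(flag, now = new Date()) {
  const problems = REQUIRED_FIELDS
    .filter((field) => !flag[field])
    .map((field) => `missing ${field}`);

  if (flag.expiration_date) {
    const expires = new Date(flag.expiration_date);
    if (Number.isNaN(expires.getTime())) {
      problems.push('invalid expiration_date');
    } else if (expires < now) {
      problems.push('expired: schedule removal');
    }
  }

  // A flag whose feature has reached GA no longer guards anything.
  if (flag.status === 'GA') {
    problems.push('feature is GA: remove flag and dead code path');
  }
  return problems;
}

const today = new Date('2026-06-01T00:00:00Z');
const sampleFlag = {
  feature_flag_name: 'ai-agents',
  description: 'Lets AI agents assist users with workflows',
  owner: 'architecture crew',
  status: 'GA',
  expiration_date: '2026-12-31',
};
console.log(validateFlag(sampleFlag, today));
```

Failing the build on a non-empty result is one option; a gentler starting point is to post the list to the flag's owner.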
Treat feature flags as temporary and not part of the application's core architecture. When the feature is in GA, remove the flag and delete any code paths that are no longer needed. This results in a cleaner, more maintainable, and performant codebase.

```json
[
  {
    "feature_flag_name": "ai-agents",
    "description": "Feature flag that will allow AI agents to assist users with workflows and provide suggestions",
    "owner": "architecture crew",
    "status": "GA",
    "expiration_date": "2026-12-31"
  },
  {
    "feature_flag_name": "smart-checkout",
    "description": "Feature flag that will allow smart checkout features, including dynamic pricing, custom offers",
    "owner": "architecture crew",
    "status": "Dev",
    "expiration_date": "2026-12-31"
  },
  {
    "feature_flag_name": "ai-agents-eval",
    "description": "Feature flag to allow the evaluation framework to execute tests against AI agents to determine how accurate they are",
    "owner": "agent evaluation crew",
    "status": "QA",
    "expiration_date": "2026-10-12"
  },
  {
    "feature_flag_name": "experiment-recommendation-v2",
    "description": "Feature flag for experimenting v2 recommendation version",
    "owner": "agent evaluation crew",
    "status": "GA",
    "expiration_date": "2026-12-31"
  }
]
```

Having the feature flags stored in a format similar to the above can help identify who to contact to clean up old flags.

Performance Gains From Cleanup

Removing unused feature flags reduces bundle size and eliminates unnecessary code execution, resulting in faster load times, improved rendering performance, and a cleaner codebase.

Conclusion

For most enterprise applications, feature flags aren’t the problem; it’s forgetting to take them down. As the application grows over time, old feature flags accumulate, which will silently bloat the bundle size, degrade performance, and clutter the code.