DZone
AI/ML

Artificial intelligence (AI) and machine learning (ML) are two fields that work together to create computer systems capable of perception, recognition, decision-making, and translation. Separately, AI is the ability for a computer system to mimic human intelligence through math and logic, and ML builds off AI by developing methods that "learn" through experience and do not require instruction. In the AI/ML Zone, you'll find resources ranging from tutorials to use cases that will help you navigate this rapidly growing field.

Latest Premium Content
Trend Report
Generative AI
Refcard #403
Shipping Production-Grade AI Agents
Refcard #401
Getting Started With Agentic AI

DZone's Featured AI/ML Resources

The New Senior Developer Job Description: Half Engineer, Half AI Systems Architect


By Dinesh Elumalai, DZone Core
She had everything on the list. Eight years of experience. Strong systems design. Distributed architecture under her belt. The panel interview went well: one of the hiring managers later described it as the best technical conversation they'd had with a candidate all quarter. The team passed on her.

Two weeks later, during a casual conversation with that hiring manager, the reason came out. It wasn't her architectural skills or her communication. It was a question someone had slipped in near the end: "Walk us through how you'd set up an AI-assisted code review pipeline for a team that ships twelve microservices." She described doing it manually. The other finalist described standing up an orchestration layer with context-aware models, configuring fallback thresholds, and building observable feedback loops that trained the team's prompt library over time. Same job title. Completely different mental model of what the job now involves.

That story isn't unique. It captures something that's been happening gradually over the past eighteen months and then very suddenly in the last six: the senior developer role has quietly split into two jobs. One of them is the job we all trained for. The other is the job that a meaningful portion of your working week now actually requires. And the gap between developers who've accepted that and developers who haven't is becoming very hard to explain away in performance conversations.

The Split That Happened Without a Memo

Let's be specific about what the "AI Systems Architect" half of the role actually means, because people either over-mystify it or undersell it. It doesn't mean you become a data scientist. It doesn't mean you're fine-tuning models or writing PyTorch. Those are real jobs; they're just different jobs.
What it means is something more operational and less glamorous: you are now responsible for designing, maintaining, and improving the systems of AI assistance that your team works inside of, not just the code that the team produces.

That sounds abstract until you break it into daily decisions. Which tasks should be fully AI-generated versus AI-assisted versus AI-reviewed only? Where are your model's blind spots for your specific codebase, and how do you account for them in code review? When a junior developer on your team gets a plausible-but-wrong architectural suggestion from an AI assistant, what's the escalation path? How do you measure the quality of your team's prompting over time? These aren't rhetorical questions. They're operational ones that live teams are answering right now, often badly, because no one assigned anyone to own them.

Senior developers are getting assigned to own them. Not officially. Not with updated job descriptions. Just through the ordinary mechanism of "this problem needs solving, and you're the most experienced technical person in the room."

What "AI Systems Architect" Actually Means Day to Day

The phrase sounds bigger than the practice. What it actually breaks down to is four interconnected responsibilities that are now landing on senior developers, whether they want them or not.

First: workflow design. Someone has to decide which parts of the development cycle use AI assistance, at what level of autonomy, and with what human checkpoints. At most companies, this currently happens by accident: everyone develops their own habits, and nobody compares notes. The developers who are stepping into the architect half of the role are the ones making that deliberate, rather than emergent.

Second: model selection and configuration. Not fine-tuning, but product-level decisions: which models for which tasks, what context window strategy, how to handle codebases that exceed context limits, what fallback behavior looks like.
These are practical engineering decisions that live in the space between "developer tool choice" and "infrastructure decision." They belong to senior engineers.

Third: quality governance. AI-generated code introduces a new failure mode: plausible-looking outputs that are subtly wrong. The patterns of wrongness are specific and learnable. Senior developers who have mapped the failure modes of their AI tooling (the kinds of edge cases it consistently misses, the naming convention assumptions it gets backward, the security patterns it handles confidently and incorrectly) are providing a form of institutional knowledge that is genuinely hard to replace.

Fourth: team prompting culture. This is the one nobody talks about at conferences yet, but engineering managers across the industry have been mentioning it consistently over the past six months: the quality variance in how different team members prompt their AI tools is enormous, and it compounds. Senior developers who build and maintain shared prompt libraries, who do prompt review the way they do code review, and who can diagnose why a colleague got a bad output are operating as a force multiplier for the entire team, not just themselves.

The Job Description Before and After: A Concrete Comparison

This is worth making explicit. Analysis of actual senior engineer job postings (anonymized, from companies between 80 and 1,200 employees) shows a clear shift when comparing what the role requirements looked like in early 2023 versus what's being written now. The change is real and measurable.

The pattern across all of it: the what of the role hasn't changed so much as the how and the governance around it. Senior developers are still responsible for the same categories of work. They're now also responsible for the design of the AI-assisted systems that help a team do that work, and for the failure modes those systems introduce.
The New Core Competency Stack

Here's what the competency model looks like in practice when you lay it out. The traditional side should feel familiar. The AI architecture side probably contains a few items you haven't formally owned yet, but if you've been doing this job for more than two years and paying attention, you've been building these skills without realizing it.

The Salary Premium Is Already Real

Compensation data lags reality by about eighteen months, so take specific numbers here with appropriate skepticism. What industry reporting suggests is that a clear pattern is emerging: developers who can demonstrably operate in both halves of the new role (not just use AI tools personally, but architect AI-assisted workflows for a team) are commanding a premium that's running somewhere between 18% and 31% above their single-track counterparts at the same years-of-experience mark.

That range is wide. The premium is highest in companies that have recently invested in AI transformation initiatives and learned, the hard way, that "everyone uses Copilot" is not the same as "we have a coherent AI engineering strategy." Those companies are specifically recruiting for systems architect skills because they've already paid for the gap.

How to Build the Second Half of the Job

Nobody teaches this in a course yet. There are some good books and a growing number of blog posts, but the skills are mostly developed through deliberate practice and iteration. Based on teams that have successfully made this transition, here's what works.

The starting point is mapping your team's current AI-assisted work honestly. Not aspirationally. Honestly. Which tasks are you and your team currently doing with AI assistance? Where does the output go without sufficient review? What are the categories of error you've caught, and what categories might you be missing? This audit, done once and updated quarterly, is the foundation of a governance practice.
From there, the most leveraged thing most senior developers can do is build a shared prompt library for their most common task types. Not a personal one: a shared one, with a versioning and review practice attached. The discipline of reviewing a colleague's prompt and explaining why it produced a wrong output is one of the fastest ways to build the mental model you need for the governance half of the role.
Building Production-Safe Agentic Remediation With Docker MCP Gateway: Lessons From 43% to 100% Accuracy


By Mohammad-Ali Arabi
Our first version was wrong 57% of the time. Not because the AI model couldn't identify Docker container failure scenarios; it usually could. The failures occurred at the decision boundary: determining when an automated action was appropriate, when escalation was required, and when no action should be taken.

Over several weeks, we built and evaluated an AI-assisted remediation system on Docker MCP Gateway across four container failure scenarios, improving decision correctness from 43% to 100%. What we learned surprised us: the hard problem is not teaching the agent to act. The hard problem is defining and enforcing the boundary where the agent must stop acting.

The project reinforced a broader lesson: production-safe AI is less about model intelligence and more about engineering explicit policies, validation mechanisms, and execution controls.

This article covers what we built, what failed, and the engineering changes that improved correctness. The full code, audit logs, validation datasets, and analyzer scripts are all in the companion repository.

Why Naive Auto-Remediation Is Dangerous

The most common mistake in AI-driven operations is treating "AI can fix things" as the goal. It isn't. A remediation system that attempts to fix every incident automatically is often worse than having no automation at all. Consider the failure modes:

An automatic restart of a CrashLoopBackOff container does not fix the underlying problem; it simply generates more alerts. The container will fail again because the code or configuration issue remains unchanged. The result is additional operational noise without any meaningful remediation.

Automatically increasing memory limits for every OOM event can be equally problematic. The workload continues running, but the underlying memory leak remains hidden. Months later, teams may find themselves running multi-gigabyte containers that should have been consuming a fraction of those resources.
Automated remediation without an audit trail creates a different problem: a lack of accountability. Without structured records, it becomes impossible to determine what actions were taken, what actions were considered, and why a particular remediation path was selected. "The AI fixed it" is not a useful postmortem entry.

The safest remediation systems are not the ones that automate the most actions. They are the ones with clearly defined operational boundaries, explicit escalation rules, and auditable decision paths. The engineering challenge is not maximizing automation; it is determining where automation should stop.

According to Mohammad-Ali A'râbi, Docker Captain:

One of the most dangerous assumptions teams can make is treating a language model as if it were an experienced senior site reliability engineer. It is not. A language model may generate useful recommendations, but it has no operational accountability. It does not understand business context, service ownership, deployment history, or the downstream consequences of an action. Any system granted the ability to modify production infrastructure must therefore be treated as an untrusted component operating behind strict controls.

The container ecosystem learned this lesson years ago through the principle of least privilege. We stopped running containers as root whenever possible. We reduced Linux capabilities to the minimum required set. We learned that mounting Docker sockets into containers for convenience often created unacceptable security risks. The common theme was simple: convenience should not bypass security boundaries.

The same principle applies to operational automation. Granting unrestricted access to restart workloads, modify resource limits, or execute privileged actions without meaningful controls introduces unnecessary risk. The challenge is not improving the quality of recommendations. The challenge is ensuring that every action is constrained, observable, and reversible.
This is where Docker MCP Gateway becomes valuable. Rather than allowing direct access to infrastructure operations, the Gateway places a controlled execution layer between the decision-making component and the underlying tools. Authentication, rate limiting, audit logging, input validation, and execution isolation are applied consistently before any action is performed. In our implementation, every tool invocation passed through HMAC authentication, Redis-backed rate limiting, structured audit logging, and containerized execution. These controls were not added as enhancements; they were treated as core design requirements.

Production systems already rely on admission controllers, access controls, audit trails, and policy enforcement. Operational automation should be held to the same standard. Access to credentials should remain isolated from the decision-making layer. Direct access to host resources should be minimized. Every action should be traceable and reviewable.

The more authority a system is given, the more important it becomes to enforce clear operational boundaries. Reliable automation depends less on unrestricted capability and more on well-defined constraints.

What Docker MCP Gateway Gives You

At a high level, Docker MCP Gateway acts as a secure control plane between AI agents and MCP tools, enforcing authentication, rate limits, audit logging, and execution isolation for every tool call.

The Model Context Protocol (MCP) is an open standard introduced by Anthropic in late 2024 that gives AI applications a uniform interface for invoking external tools and services. It has since gained support from other vendors, including OpenAI, Google DeepMind, and AWS.

MCP solves the protocol problem. It doesn't solve the production problem.
Production systems require controls around tool execution, not just a standardized way to invoke tools:

  • Authenticated tool calls (not just "the agent has the API key in plaintext somewhere")
  • Rate limiting (agents can spiral fast)
  • Audit logging of every decision
  • Containerized tool isolation (so a misbehaving tool can't take down its host)
  • Centralized policy enforcement (so adding a new server doesn't require reconfiguring every client)

Docker MCP Gateway provides these operational controls. It sits between AI clients and MCP servers, routing every tool invocation through a centralized enforcement layer that handles authentication, policy enforcement, rate limiting, and execution isolation.

For our work, we built a custom MCP server inside Docker that exposes three remediation tools: check_container_logs, restart_container, and update_container_resources. Every request passes through HMAC authentication, is rate-limited using Redis, and is recorded in a structured JSON audit log before execution.

From Mohammad-Ali A'râbi, Docker Captain:

Docker's AI tooling strategy is fundamentally about building a verifiable supply chain for reasoning engines. You cannot build secure AI on top of bloated, vulnerable foundations. The strategy begins with Docker Hardened Images (DHI), providing agents and MCP servers with minimal attack-surface base images backed by cryptographically signed SLSA Level 3 provenance. The Docker Hub MCP then acts as a discovery layer, allowing agents to find and navigate trusted container artifacts through natural-language interactions. From there, these components converge into Docker AI Governance, where MicroVM-based sandboxes apply strict, deny-by-default controls over filesystem access, network connectivity, and tool execution. Together, these capabilities represent a broader architectural shift from securing application code to securing an agent's entire operational blast radius.
Recent supply-chain attacks such as Shai-Hulud 2.0 have shown that modern attackers increasingly target the automation layers that underpin software delivery. AI agents now operate inside those same environments, making blast-radius reduction a first-class architectural concern.

A Decision Framework: When to Auto-Fix vs. Escalate

Before implementing any automation, we documented the expected behavior for each failure mode. This was not a planning exercise. It became the specification the system had to satisfy and later served as the foundation for our validation framework.

Failure Type | Likely Cause | Safe Action
OOMKilled | Resource exhaustion (often legitimate) | Auto-fix: increase memory
CrashLoopBackOff | Code or configuration bug | Escalate, never auto-restart
Single Exit (code 1) | Could be transient (network, DB) or persistent | Try restart once, escalate if it persists
HealthCheckFailure | App stuck or deadlocked | Auto-fix: restart

The guiding principle was simple: transient and resource-related failures could be remediated automatically, while persistent application and configuration failures required escalation.

Transient and resource-driven failures auto-fix. Persistent and code-driven failures escalate. Every decision is logged.

This framing matters more than the implementation. It's the part you should keep even if you replace every other piece of the system. The agent's job isn't to be smart. It's to apply this rule consistently and visibly.

We chose to encode this in the agent's system prompt rather than in code branching, which turned out to be one of our most important design decisions. More on that below.

The Architecture in Practice

The system has five logical layers running across three Docker Compose containers:

Figure: Five-layer architecture. A container failure triggers the AI agent, which routes every tool call through the Docker MCP Gateway security pipeline before reaching MCP Tools and the Docker API.

The architecture separates concerns into five layers.
The AutoGen agent (GPT-3.5-turbo, cost-optimized for this decision space) handles reasoning and decision-making. The Docker MCP Gateway sits in front of the tools as a security enforcement point: every tool call passes through HMAC authentication, Redis-backed rate limiting (100 requests/hour), input validation, and structured audit logging. The MCP Tools layer exposes three remediation actions: check_container_logs, restart_container, and update_container_resources. Below that, the Docker API performs the actual container operations.

In our current implementation, the Gateway and Tools layers are colocated in a single Python service for simplicity. In a multi-tenant production setup you'd separate them into distinct services that scale independently.

Every tool call generates an audit log entry like this:

JSON

    {
      "timestamp": "2026-05-07T02:08:15.456Z",
      "incident_id": "inc-20260507-020815",
      "agent_id": "docker-ops-agent-001",
      "alert": {
        "description": "Docker container crashed with OOMKilled",
        "container_id": "nginx-oom-test",
        "status": "OOMKilled"
      },
      "decision_chain": [
        {"tool": "check_container_logs", "result": "..."},
        {"tool": "update_container_resources", "result": "Memory limit updated to 200MB"}
      ],
      "resolved": true
    }

That structured output is what makes the system auditable. It's also what makes our validation work possible.

The Engineering Reality: 43% to 100%

Across 7 development-phase incidents, our agent made the correct decision 43% of the time. Across 6 validation-phase incidents after applying our fixes, it was correct 100% of the time. Both datasets are committed in the repo's monitoring/analysis directory.

Phase | Runs | Correct | Avg Turns/Incident
Before fixes | 7 | 3/7 (43%) | 22.7
After fixes | 6 | 6/6 (100%) | 11.7

A note on sample size: this is a small dataset. It's enough to show the expected behavior is reproducible across the four scenarios, but not enough to make claims about reliability under load or at scale.
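The validation idea described above, checking each audit log entry against the expected behavior for its failure type, can be sketched in a few lines. This is a minimal illustration and not the analyzer from the companion repository: the names `EXPECTED_OUTCOME`, `FORBIDDEN_TOOLS`, `classify`, `check_entry`, and `summarize` are hypothetical, and the rule that `resolved: true` means "AutoResolved" is an assumption made for the example. The "Single Exit" scenario is left out because either outcome can be correct for it.

```python
# Expected outcome per failure type, transcribed from the decision framework.
# The labels follow the article's wording; the mapping itself is an assumption.
EXPECTED_OUTCOME = {
    "OOMKilled": "AutoResolved",
    "CrashLoopBackOff": "Escalated",
    "HealthCheckFailure": "AutoResolved",
}

# Tools that must never appear in the decision chain for a failure type.
FORBIDDEN_TOOLS = {
    "CrashLoopBackOff": {"restart_container"},
}


def classify(entry):
    """Derive the observed outcome from an audit entry (assumed rule)."""
    return "AutoResolved" if entry.get("resolved") else "Escalated"


def check_entry(entry):
    """Return a list of spec violations for one audit log entry."""
    status = entry["alert"]["status"]
    expected = EXPECTED_OUTCOME.get(status)
    if expected is None:
        return [f"no expectation defined for status {status!r}"]

    violations = []
    observed = classify(entry)
    if observed != expected:
        violations.append(f"{status}: expected {expected}, observed {observed}")

    # Flag any tool call that the spec forbids for this failure type.
    used = {step["tool"] for step in entry.get("decision_chain", [])}
    for tool in sorted(used & FORBIDDEN_TOOLS.get(status, set())):
        violations.append(f"{status}: forbidden tool call {tool}")
    return violations


def summarize(entries):
    """Return (correct, total) across a list of audit entries."""
    correct = sum(1 for entry in entries if not check_entry(entry))
    return correct, len(entries)
```

Because the check reads only structured fields (`alert.status`, `decision_chain`, `resolved`), the same sketch could be pointed at any agent that writes audit entries in this shape.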
What changed between the two phases is documented as nine challenges in the lab README. Three of them drove most of the improvement. Here they are.

Challenge A: The OOM That Couldn't Be Fixed

In the early runs, the agent correctly diagnosed an OOMKilled container, called the memory-update tool, and got back this Docker error:

Plain Text

    Memory limit should be smaller than already set memoryswap limit, update the memoryswap at the same time

Then it correctly escalated, because it had no tool for updating memoryswap. Our analyzer marked this as wrong because the OOMKilled scenario expected AutoResolved, not Escalated.

But the agent's logic was right. The bug wasn't in the agent. It was in our test container's --memory-swap configuration. Once we fixed that (set --memory-swap=-1 for unlimited swap), the agent's behavior didn't change at all. The same logic that escalated correctly before now succeeded correctly. The agent went from 0/2 to 2/2 correct.

Lesson: When the agent makes the right decision but your tests say it's wrong, check the test setup before blaming the agent. We spent a few hours debugging the agent before realizing our own container configuration was the problem.

Challenge B: The Over-Eager Restart

In the first three CrashLoopBackOff runs, the agent restarted the container 2 out of 3 times. CrashLoopBackOff is exactly the failure mode where you should never restart: the container is crashing because of a code or config bug, not a transient state. Restarting just generates more crashes.

We almost wrote a code branch for it: add a check, route CrashLoopBackOff to a different path. Before doing that, we tried tightening the system prompt instead:

Plain Text

    For CrashLoopBackOff failures: ALWAYS escalate to a human operator. NEVER attempt to restart the container. Restarting will only cause the container to crash again. Your role is to diagnose and report, not to fix.
That single change (no code, just words in the prompt) made the agent consistently escalate on every subsequent run.

Lesson: If you want the agent to follow a rule, write the rule down in the system prompt. Don't leave it to the model to figure out. We spent more time arguing about whether to add code branching than the prompt change actually took.

Challenge C: The Hallucinated Containers

After resolving real incidents, the agent started making up alerts for containers that didn't exist: memory-hungry-app, app-crash-loop, none of which were ever in our system. It was inventing failures and then "responding" to them.

Root cause: AutoGen's max_consecutive_auto_reply was set to 10. After the agent finished a real incident, the conversation framework kept giving it turns. Without a real prompt to respond to, it generated plausible-looking next incidents and walked itself through fake remediations.

Fix: drop max_consecutive_auto_reply to 3. The agent gets exactly enough turns to diagnose, act, and report. Then the conversation ends.

Lesson: AutoGen and similar frameworks default to long conversations because they're built for chat use cases. For production, you want them to stop talking once the job is done.

From Mohammad-Ali A'râbi, Docker Captain:

The progression from 43% to 100% correctness reinforced a key lesson: production AI is often less a machine-learning problem than a systems engineering challenge. The initial failures were not the fault of the LLM; they were the result of implicit, undocumented policies and permissive execution environments.

Production AI engineering requires moving past the "magic" of conversational models and returning to a rigorous, deterministic engineering discipline. It means treating the system prompt as an immutable policy file, writing explicit, boundary-defining rules that leave zero room for the model to improvise.
It means enforcing aggressive Redis-backed rate limits to prevent hallucination loops, isolating execution tools to eliminate docker.sock vulnerabilities, and relying exclusively on structured JSON audit logs rather than plain text for forensic validation.

The agent is merely a component. The surrounding infrastructure (the cryptographic constraints, the isolated execution environments, and the hardcoded fallbacks) is what actually makes the system safe. Building trust in AI demands the exact same rigor we apply to cluster security: trust nothing, verify everything, and strictly log the rest.

Production Patterns We'd Recommend

If you're building something similar with Docker MCP Gateway, here's what we'd carry over from our nine challenges:

Authenticate every tool call, even in dev. We used HMAC signing on every request from agent to MCP server. The reason to do this early isn't just production security. It surfaces auth integration bugs during development, when they're cheaper to fix.

Use structured JSON for audit logs, not text. The audit format we used (incident ID, agent ID, alert, decision chain, resolved flag) made it possible to write an analyzer that validates agent behavior automatically. Plain text logs would have made that impossible.

Set rate limits low. We used Redis with 100 requests per hour per agent. Agents can make a lot of tool calls quickly: a single bug in the system prompt triggered thousands of calls in one of our early runs before we noticed.

Default to escalation when uncertain. A false-positive escalation costs you a page that turns out to be nothing. A false-negative auto-fix can mask a real problem for weeks. The costs aren't symmetric, so the default shouldn't be either.

Validate against expected behavior. Write down what you expect each failure mode to do, then write an analyzer that checks the audit log against that spec. We open-sourced ours. It's about 250 lines of Python, no external dependencies.
You can adapt it to any agent that produces structured audit logs.

Tighten conversation turn limits. max_consecutive_auto_reply=3 is a sane starting point for production. The agent should do its job and then the conversation should end. Frameworks default to longer because they're optimized for conversational AI demos, not production ops.

What's Still Missing

This article would be marketing if we didn't include this section. Honest engineering means owning what isn't built yet.

No Docker Scout MCP server exists yet. Security-aware container discovery ("find the most secure nginx tag," "show me CVEs in this image") isn't possible through MCP today. The Docker Hub MCP server has 13 tools, but none of them surface vulnerability data. This is a real gap in the ecosystem.

No incident memory or pattern recognition. Our agent treats every incident as fresh. A production system would learn that this container OOMs every Tuesday at 4 pm and recommend a permanent memory increase rather than reactively bumping it each time. We've left this as future work.

Sample sizes are small. Our 6 post-fix incidents prove the expected behavior is reproducible across the four scenarios. They don't prove reliability under production load, traffic spikes, or adversarial conditions. We'd need 100x more data and load testing to make those claims.

MTTR is unmeasured. AutoGen records all decision-chain timestamps within microseconds of each other, so the per-incident duration data we collected isn't usable as a real mean-time-to-recovery metric. Capturing real MTTR would require external timing instrumentation around the agent.

Gateway and tools are colocated. Our MCP server bundles the security pipeline (HMAC, rate limiting, audit) with the tool execution. In a true multi-tenant production setup, you'd separate these into distinct services so they can scale independently.
Our current architecture is fine for a single team or environment; it would need refactoring before serving multiple agent populations.

What This Means for AI Infrastructure

The interesting part of building agentic infrastructure isn't getting the agent to act. It's getting it to not act when acting would make things worse. Docker MCP Gateway is one of the first production tools that takes this seriously, treating the infrastructure around the agent as the security layer, not the agent itself.

The pattern we ended up with (a Gateway in front, scoped tools, decision boundaries written into the system prompt, structured audit logs) isn't novel. It's just what worked. We expect most production AI agents will end up looking similar, because this is what makes them debuggable when something goes wrong.

The nine challenges we documented in the lab README are probably challenges you'll hit too. The analyzer script, the audit log format, and the validation patterns are all MIT-licensed in the companion repository. Use whatever's useful.

This article was originally published on OpsCart.
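As a companion to the "authenticate every tool call" pattern, here is a minimal sketch of HMAC signing and verification for a tool request using only the Python standard library. It is not the implementation from the companion repository: the function names `sign_request` and `verify_request`, the payload layout, and the 300-second replay window are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import json
import time


def sign_request(secret: bytes, tool: str, args: dict, timestamp: int) -> str:
    """Return a hex HMAC-SHA256 signature over a canonical JSON payload."""
    # sort_keys and fixed separators make the serialization deterministic,
    # so agent and server compute the signature over identical bytes.
    payload = json.dumps(
        {"tool": tool, "args": args, "ts": timestamp},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_request(secret, tool, args, timestamp, signature, now=None, max_skew=300):
    """Check freshness and signature of a tool request (hypothetical policy)."""
    now = int(time.time()) if now is None else now
    # Reject stale or future-dated requests to limit replay of captured calls.
    if abs(now - timestamp) > max_skew:
        return False
    expected = sign_request(secret, tool, args, timestamp)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    secret = b"dev-only-secret"  # placeholder; load real secrets from a secret store
    ts = int(time.time())
    sig = sign_request(secret, "restart_container", {"container_id": "nginx-oom-test"}, ts)
    print(verify_request(secret, "restart_container", {"container_id": "nginx-oom-test"}, ts, sig))
```

Signing the tool name and arguments together means a signature captured for one container cannot be reused to act on another, which fits the article's point that the controls around the agent carry the safety guarantees.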
Architecting Trustworthy AI: Engineering Patterns for High-Stakes Environments
By Sujay Puvvadi
Black Swan Bugs: Paving the Way for New Roles in Software Engineering
By Stelios Manioudakis, DZone Core
Why Requirements Are Becoming the Control Layer in AI-Assisted Development
By Andrei Lavygin
Before the AI Coding Agent Writes Code: Structuring Scattered Requirements With PARA

AI coding assistants are becoming increasingly capable at generating code, explaining systems, and accelerating development workflows. But in real engineering environments, the biggest blocker is often not the model's ability to write code. The bigger issue is whether the assistant has the right context before it starts making changes.

A developer rarely works from a single source of truth. A Jira ticket may describe the implementation task. A Google Doc may contain the detailed requirements. A slide deck may explain the business goal. A meeting summary may include key decisions, open questions, and next steps that never made it back into the ticket.

For a human developer, this creates friction. For an AI coding assistant, it creates risk. The assistant may generate code that looks correct, passes basic syntax checks, and follows existing patterns, but still implements the wrong behavior because the actual feature context was fragmented across multiple places.

This is where a PARA-style context workspace becomes useful. PARA (Projects, Areas, Resources, and Archives) is commonly used to organize knowledge by actionability. Applied to AI-assisted software development, it can become a practical architecture pattern for preparing scattered engineering knowledge before an AI coding assistant touches code.

The goal is not to dump every document into the model. The goal is to organize scattered context so the assistant can reason with the right information for the task.

The Problem: AI Coding Assistants Often See Only Part of the Work

Consider a developer asked to build a new data pipeline that calculates a generic quality score. The implementation sounds straightforward:

Build a pipeline that joins multiple input tables, applies business rules, and produces a quality score output table.
But the actual context may be spread across several sources:

| Source | What It May Contain |
| --- | --- |
| Ticket | Implementation scope, acceptance criteria, due date |
| Requirements doc | Business rules, scoring logic, data definitions |
| Slide deck | Business goal, stakeholder alignment, expected impact |
| Meeting summary | Final decisions, open questions, changed thresholds |
| Existing code | Pipeline patterns, naming conventions, dependency structure |
| Older documents | Previous decisions, deprecated approaches, known constraints |

If the AI coding assistant only sees the ticket, it may miss the deeper context needed to implement the feature correctly. This is especially risky for data pipelines and analytics features, where correctness depends not only on code structure but also on interpretation: which source tables to use, how freshness should be handled, how business rules are applied, and how downstream consumers will use the output.

What Can Go Wrong If the Agent Only Reads the Ticket?

A ticket often captures the visible work, but not the full reasoning behind the work. If the assistant only uses the ticket, it may:

  • Implement the task but miss business rules from the requirements document
  • Ignore key decisions captured in meeting summaries
  • Use a technically available source table that is not the approved source for this feature
  • Miss freshness expectations for the output table
  • Produce a score that does not match how downstream dashboards or reports will consume it
  • Follow an outdated implementation pattern because it found old but similar code
  • Generate a pull request that looks reasonable but fails product or data-quality expectations

This is the core issue:

The AI assistant may know how to write code, but it may not know which code should be written.

That distinction matters. For coding agents to become more reliable, developers need a better way to prepare context before code generation begins.
Reframing PARA for AI Coding Agents

PARA can be adapted from a personal knowledge organization method into a context classification pattern for AI-assisted development. In a PARA-style context workspace:

| PARA Category | Engineering Meaning | Agent Context Role |
| --- | --- | --- |
| Projects | Active work being delivered | Current feature scope, ticket, task goal |
| Areas | Ongoing responsibilities | Standards, ownership, governance, quality expectations |
| Resources | Reusable knowledge | Docs, runbooks, design patterns, pipeline examples |
| Archives | Completed or inactive knowledge | Historical decisions, old approaches, past incidents |

This structure helps the AI assistant understand the role of each piece of information. A current requirement should not be treated the same way as an old design decision. A meeting decision should not be buried behind a generic document search. A reusable pipeline pattern should be available to guide implementation, while archived material should be used carefully as historical context.

The value of PARA is not just organization. It gives the assistant a way to distinguish between active task context, long-running rules, reusable references, and historical information.

This flow changes how the assistant approaches implementation. Instead of asking:

“What code should I generate from this ticket?”

The assistant can reason from a richer question:

“What is the active feature goal, what rules must be followed, what reusable references apply, and what historical context should be considered before changing code?”

That shift is small, but important.

Applying PARA to a Quality Score Pipeline

Now apply this to the quality score pipeline example. The feature requires a pipeline that joins multiple input tables, applies business rules, and writes a quality score output table. The exact business logic is intentionally generic, but the pattern is common across analytics engineering, data engineering, machine learning platforms, and reporting systems.
A PARA-style workspace could organize the context like this:

Project Context

This is the active feature work. It may include:

  • The current ticket
  • Feature scope
  • Acceptance criteria
  • Current implementation status
  • Target output table
  • Expected delivery milestone
  • Known blockers or open questions

For the coding assistant, this answers: “What am I being asked to build right now?”

Area Context

This represents ongoing expectations that apply beyond this one feature. It may include:

  • Data quality standards
  • Freshness expectations
  • Ownership rules
  • Privacy or compliance constraints
  • Naming conventions
  • Release process
  • Testing expectations

For the coding assistant, this answers: “What rules and standards must this implementation follow?”

Resource Context

This is reusable technical knowledge. It may include:

  • Existing pipeline patterns
  • Similar transformation logic
  • Data model documentation
  • Dashboard dependency notes
  • Common test patterns
  • Runbooks
  • Data validation examples

For the coding assistant, this answers: “What reusable references should guide the implementation?”

Archive Context

This is historical information that may still be useful, but should not automatically drive the implementation. It may include:

  • Older design decisions
  • Deprecated scoring logic
  • Past pipeline migrations
  • Previous quality metric experiments
  • Historical meeting notes
  • Old RCA or incident learnings

For the coding assistant, this answers: “What historical context may explain why the system works this way?”

The critical point is that archived context should be used for awareness, not blindly copied into the current implementation.

Why Meeting Summaries Matter

Meeting summaries are often underestimated in AI-assisted development. In many teams, the final decision is not always reflected immediately in the ticket or requirements document.
A meeting summary may contain important details such as:

  • A threshold was changed after stakeholder discussion
  • A source table was rejected because of data freshness concerns
  • A metric definition was clarified
  • A downstream dashboard dependency was identified
  • A launch decision was postponed
  • An open question was assigned to another team
  • A temporary workaround was approved only for the first release

For a human developer, these details may be remembered from the meeting. For an AI coding assistant, they are invisible unless they are included in context.

This is one reason a PARA-style workspace can be valuable. It gives meeting summaries a place in the feature context without treating them as random notes. A meeting summary tied to an active feature belongs in the Project context. A recurring decision about data freshness may become Area context. A reusable explanation of metric calculation may become Resource context. Once the feature is complete, the same meeting summary may eventually move into the Archive context.

How the Coding Assistant Should Use Context Before Changing Code

Before generating code, the AI coding assistant should use the structured context to form an implementation understanding. For a quality score pipeline, it should first understand:

  • What the feature is trying to accomplish
  • Which input data sources are approved
  • Which business rules define the score
  • Which decisions were finalized in meetings
  • What freshness or latency expectations exist
  • Which existing pipeline patterns should be followed
  • What downstream dashboards, reports, or consumers depend on the output
  • Which historical approaches should be avoided

Only after that should it propose an implementation plan or modify code.

This changes the assistant’s role. It is no longer simply a code generator responding to a ticket. It becomes a context-aware engineering assistant that can reason across requirements, decisions, standards, and existing system patterns.
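As a rough illustration of the classification described above, the following sketch tags each piece of context with a PARA category and renders it for the assistant in a fixed order, with archived material explicitly marked as awareness-only. The `ContextItem` type, the `build_context` helper, the section labels, and the ordering are all hypothetical choices made for this example, not part of any particular tool.

```python
from dataclasses import dataclass
from enum import Enum


class Para(Enum):
    PROJECT = "project"    # active feature work
    AREA = "area"          # ongoing standards and rules
    RESOURCE = "resource"  # reusable references
    ARCHIVE = "archive"    # historical material, awareness only


@dataclass
class ContextItem:
    title: str
    category: Para
    text: str


# Presentation order is an assumption: active task first, history last.
PRIORITY = [Para.PROJECT, Para.AREA, Para.RESOURCE, Para.ARCHIVE]

# Hypothetical section labels shown to the assistant.
HEADERS = {
    Para.PROJECT: "ACTIVE TASK (what to build now)",
    Para.AREA: "STANDING RULES (must be followed)",
    Para.RESOURCE: "REUSABLE REFERENCES (patterns to follow)",
    Para.ARCHIVE: "HISTORICAL CONTEXT (awareness only, do not copy)",
}


def build_context(items: list[ContextItem]) -> str:
    """Group items by PARA category and render them in priority order."""
    sections = []
    for category in PRIORITY:
        matching = [item for item in items if item.category is category]
        if not matching:
            continue  # skip empty categories entirely
        lines = [f"[{HEADERS[category]}]"]
        lines += [f"- {item.title}: {item.text}" for item in matching]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)


# Illustrative items for the quality score pipeline example.
items = [
    ContextItem("Old scoring logic", Para.ARCHIVE, "Deprecated formula."),
    ContextItem("Ticket", Para.PROJECT, "Build the quality score output table."),
    ContextItem("Meeting summary", Para.PROJECT, "Threshold changed after review."),
    ContextItem("Freshness standard", Para.AREA, "Output must meet freshness rules."),
]
context = build_context(items)
print(context)
```

The point of the explicit labels is that the assistant never has to guess whether a document is a current requirement or an old decision; the workspace states it.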
The Bigger Shift: From Prompting to Context Preparation

Prompting is still useful, but it is not enough for complex engineering work. A good prompt cannot fully compensate for missing requirements, outdated context, or scattered decisions. For AI coding assistants, the quality of the result depends heavily on the quality of the context that comes before the prompt.

This is especially true when the task involves business logic, analytics definitions, data contracts, or cross-team decisions. In those cases, the question is not:

“How do we write a better prompt?”

The better question is:

“How do we prepare the right engineering context before asking the assistant to write code?”

For developers building with AI coding agents, this may become one of the most important habits: do not ask the agent to write code first. Prepare the context first.

Because the future of AI-assisted development will not belong only to teams with the most powerful coding models. It will belong to teams that know how to structure knowledge so those models can make better engineering decisions.

By Venkata Naga Satya Sai Vineeth Kondisetty
The New Insider Threat Isn't Human: Securing AI Agents Before They Secure Themselves

In mid-September 2025, engineers inside Anthropic's threat intelligence team noticed something that didn't fit the usual pattern of automated probing on their platform. Ten days of digging later, they had a name for it: GTG-1002, a Chinese state-sponsored group that had turned Claude Code into the operational core of a cyber-espionage campaign against roughly thirty organizations, among them banks, chemical manufacturers, tech firms, and government agencies.

When Anthropic published its account of the intrusion on November 14, the detail that made security teams sit up wasn't the target list. It was the autonomy ratio: by the company's own estimate, the AI agent executed somewhere between 80 and 90 percent of the operation (reconnaissance, vulnerability discovery, exploit development, lateral movement, exfiltration), with humans stepping in only at a handful of strategic checkpoints. Jacob Klein, who heads threat intelligence at Anthropic, called it an escalation that lowers the bar for who can run a sophisticated intrusion at all.

I've spent the better part of this year watching that bar keep dropping, one disclosure at a time. And the thing I keep coming back to is this: the security industry built thirty years of tooling around the assumption that the dangerous actor inside your network is a person, whether a careless employee, a disgruntled admin, or a phished contractor. That assumption is now wrong often enough to be a liability. The dangerous actor increasingly has no payroll record, no badge, no manager to flag erratic behavior. It's a process. And it's already inside.

Skeleton Keys for Software

Here's the uncomfortable arithmetic. CyberArk's 2025 Identity Security Landscape study found machine identities now outnumber human ones by more than 80 to 1 inside the average enterprise, with AI specifically named as the biggest driver of new privileged accounts this year.
Other measurements land in a wide band (Rubrik Zero Labs put it at 82 to 1, Entro Labs measured DevOps-heavy environments at 144 to 1), but every credible estimate points in the same direction, and the gap is widening faster than anyone's governance program.

What makes this dangerous isn't the count. It's the habit. Most teams I've talked with over the past eighteen months reached for the path of least resistance when they first wired an agent into production: they handed it a copy of a human's API key, or a service account with the same standing privileges everyone else in that pipeline already had. It's the software equivalent of cutting a spare house key and leaving it under the mat: convenient until the day someone you didn't intend to finds it.

That convenience is exactly what blew up Salesloft and its customers in August 2025. Attackers tracked as UNC6395 didn't breach Salesforce. They stole OAuth tokens belonging to Drift, a chatbot integration plugged into it, and used those long-lived, broadly scoped tokens to walk into Salesforce, Slack, AWS, and Google Workspace environments at more than 700 downstream organizations (Cloudflare and Google among them) over roughly a ten-day window. Nobody compromised the platform. They compromised the credential that the integration was trusted with, and that credential opened far more doors than the integration's actual job required. Swap "chatbot integration" for "AI agent," and you've described the exact failure mode every analyst is now warning about for 2026.
The fix that keeps surfacing in serious architecture conversations isn't exotic. It's the same zero-trust logic that's been preached at humans for a decade, finally pointed at software:

| | Skeleton-key model | Scoped-identity model |
| --- | --- | --- |
| Credential | Copied human API key or shared service account | Unique identity per agent, issued via OAuth client credentials or a workload-identity standard like SPIFFE |
| Lifetime | Static, often unrotated for months or years | Short-lived, reissued per session or task |
| Blast radius if stolen | Everything that account can touch | Only what that specific agent was scoped to do |
| Auditability | "Someone" did this | This agent, acting on this task, did this |

None of this is theoretical anymore. Gartner is telling boards that by 2028, roughly a third of enterprise applications will carry embedded agentic AI, and 15 percent of day-to-day work decisions will be made without a human in the loop. You cannot run that volume of autonomous action on credentials designed for an employee who logs in, does a job, and logs out.

When the Prompt Is the Payload

If identity is the slower-burning problem, prompt injection is the one that's already setting things on fire. OWASP's 2025 Top 10 for LLM Applications kept it at the number-one slot for a second consecutive edition, and for good reason: an LLM has no architectural separation between "instructions I should obey" and "data I should merely read." Feed it both in the same channel, and a sufficiently clever attacker can make the model treat the second as the first.

The cleanest public demonstration of how bad this gets in practice is CamoLeak, the vulnerability researcher Omer Mayraz disclosed through Legit Security in October 2025, tracked as CVE-2025-59145 with a CVSS score of 9.6.
The setup was almost playful: hide an instruction inside a pull request's invisible comment field, wait for a developer to ask GitHub Copilot Chat to review that PR, and let Copilot, operating with that developer's own repository privileges, quietly search the codebase for strings like "AWS_KEY," then exfiltrate whatever it found one character at a time. Each character got mapped to its own GitHub-hosted image URL, routed through GitHub's own trusted Camo proxy so the outbound traffic looked like nothing more than a chat window rendering a picture. Legit Security's CTO, Liav Caspi, put the core problem plainly: a vigilant network monitor might catch the unusual request pattern, but the average user or maintainer almost certainly wouldn't. GitHub closed the hole in August by disabling image rendering in Copilot Chat entirely. It was a blunt fix, but an honest acknowledgment that there was no elegant patch for the underlying design flaw.

What should worry you is that CamoLeak is GitHub-specific plumbing wrapped around a generic problem. Any agent that reads untrusted content and can also take action (summarize an inbox, browse a webpage, query a ticketing system) has the same exposed nerve. The attack surface isn't the code. It's the fact that the model can't reliably tell an instruction from a sentence describing one.

MCP Didn't Invent the Confused Deputy. It Industrialized It.

The Model Context Protocol turned eighteen months old this past spring, and in agent circles it's already being described, only half-jokingly, as the USB-C of AI tooling: a single standard that lets an agent plug into dozens of databases, SaaS platforms, and internal systems without custom integration code for each one. That convenience is precisely why it became 2025's most interesting new attack surface. CVE-2025-49596 let attackers run arbitrary commands through unauthenticated MCP Inspector instances, rated 9.4.
CVE-2025-6514, found in the widely used mcp-remote project, hit 9.6 and gave attackers OS-level command execution simply by getting an MCP client to connect to a malicious server. Researchers at Invariant Labs separately showed they could pull private repository data and WhatsApp message history out through MCP integrations that trusted server-supplied tool descriptions a little too much.

That last detail is the one practitioners now call tool poisoning, and it deserves more attention than it gets. An MCP server doesn't just expose a function; it ships a natural-language description of that function for the model to read. Bury a hidden instruction inside that description, and the agent absorbs it as context with the same credulity it would extend to legitimate documentation. Layer in what researchers call a rug pull (a tool that behaved safely last week silently swapping in malicious behavior this week, with no re-approval prompt), and you've got a supply chain risk that traditional dependency scanning has no vocabulary for.

Underneath all of it sits the same architectural sin the original insider-threat literature has been naming for years: authorization quietly divorcing from authentication. An MCP server executing a database query on an agent's behalf needs to know not just that the agent is who it claims to be, but what the human or task behind that request was actually authorized to do. Skip that check, and you've built a confused deputy that will dutifully escalate its own privileges on a stranger's behalf.

Where the Policy Engine Has to Live

The architecture pattern that's converging across the vendors and practitioners I trust most isn't subtle, and that's its strength.
You insert a policy decision point (Cerbos, Open Policy Agent, or an equivalent) directly in the path between the agent's tool calls and the systems those calls touch, so that nothing executes on trust alone:

```text
User
  |
  v
AI Agent ----(declares identity + intent)----> Policy Engine (PDP)
  ^                                                   |
  |                                            allow? | deny?
  |                                                   v
  |                                            MCP Server -----> Database / API
  |                                                   |
  +---------------------(action result)---------------+
```

The point of that middle box is to ask a boring, specific question on every single call: which agent is this, what was it actually asked to do, and does this particular action fall inside that scope? "Only SalesBot may call lookup_customer." "Any transfer above a threshold requires a human approval step before the MCP server executes it." None of that logic lives in the model's good judgment, because the model's judgment is exactly what prompt injection is designed to corrupt. The enforcement has to sit somewhere a crafted sentence can't reach it.

This is also, not coincidentally, where the Cloud Security Alliance's "toxic cloud trilogy" (a public workload, a real vulnerability, and standing high-level privilege, all present at once) actually gets defused. CSA's own telemetry shows that the combination was present in 38 percent of workloads in early 2024 and had fallen to 29 percent by mid-2025, as organizations started pulling standing privilege out of the equation. That's real progress. It's also nowhere near fast enough for the rate at which agents are being deployed.

What 2026 Actually Requires

I don't think the next twelve months are going to be defined by a single dramatic breach, although there will probably be one anyway.
I think they'll be defined by something quieter and more structural: the slow, overdue migration of agents off static, shared credentials and onto something closer to what SPIFFE and SPIRE were originally built for in the service-mesh world — short-lived, cryptographically verifiable, per-workload identity that can be issued, scoped, and revoked without anyone touching a spreadsheet of API keys. OWASP published a dedicated Non-Human Identity Top 10 in 2025 for exactly this reason; the existing application-security and human-IAM playbooks simply don't have entries for credentials that never sleep, never request access, and inherit whatever standing permission happens to be sitting there. The governance gap is still wide open. Recent industry surveys put the share of organizations with mature agent-governance programs below one in five, even as more than ninety percent of security leaders rate the problem as critical. That mismatch — high anxiety, low operational maturity — is usually the exact condition under which the expensive breach happens. My honest read, after a year of watching this space accelerate: the organizations that treat their agents as first-class, individually identified, least-privileged principals from day one will look unremarkable in hindsight. The ones that didn't will be writing the incident reports everyone else cites in 2027.
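To make the policy decision point and the scoped, short-lived identity described above a little more concrete, here is a minimal sketch in plain Python. It is not how Cerbos, Open Policy Agent, or SPIFFE work internally; the `AgentIdentity` and `ToolCall` types, the `decide` function, the `transfer` tool name, and the approval threshold are all hypothetical, chosen only to mirror the two example rules quoted earlier.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    scopes: frozenset    # tool names this specific agent may call
    expires_at: float    # epoch seconds; short-lived by design


@dataclass
class ToolCall:
    tool: str
    args: dict


TRANSFER_APPROVAL_THRESHOLD = 1_000  # hypothetical limit for illustration


def decide(identity: AgentIdentity, call: ToolCall,
           now: Optional[float] = None) -> str:
    """Return 'allow', 'deny', or 'needs_human_approval' for one tool call."""
    now = time.time() if now is None else now
    if now >= identity.expires_at:
        return "deny"  # expired credential: the agent must be reissued one
    if call.tool not in identity.scopes:
        return "deny"  # outside what this agent was scoped to do
    amount = call.args.get("amount", 0)
    if call.tool == "transfer" and amount > TRANSFER_APPROVAL_THRESHOLD:
        return "needs_human_approval"
    return "allow"


# Fixed timestamps keep the example deterministic.
sales_bot = AgentIdentity("SalesBot", frozenset({"lookup_customer"}), expires_at=1000.0)
pay_bot = AgentIdentity("PayBot", frozenset({"transfer"}), expires_at=1000.0)

lookup = decide(sales_bot, ToolCall("lookup_customer", {"id": 42}), now=500.0)
stray = decide(sales_bot, ToolCall("transfer", {"amount": 50}), now=500.0)
large = decide(pay_bot, ToolCall("transfer", {"amount": 5000}), now=500.0)
stale = decide(sales_bot, ToolCall("lookup_customer", {"id": 42}), now=1500.0)
print(lookup, stray, large, stale)
```

Note that the decision depends only on the identity and the call, never on anything the model said, which is the property that keeps a crafted prompt from talking its way past the check.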

By Igboanugo David Ugochukwu DZone Core CORE
Data Pipeline Observability: Why Your AI Model Fails in Production

The 3:00 AM Incident That Changed Everything

It was a Tuesday morning when the alerts started firing. Our recommendation engine, the one that drives 30% of our revenue, had tanked. Accuracy dropped from 94% to 58% overnight.

The data science team immediately blamed the model. They started tweaking hyperparameters, re-training on new data, and running diagnostics. Nothing worked.

I got pulled into the war room at 3:00 AM. The first thing I asked wasn't "What's wrong with the model?" It was "What changed in the data pipeline?"

Turns out, everything. A vendor had pushed a schema change upstream. A field that used to be required became optional. Null values started flowing through our pipeline. Our feature engineering code didn't handle nulls gracefully; it just propagated them downstream. By the time the data reached the model, 40% of our feature vectors were corrupted.

The model wasn't broken. The data was.

We spent six hours manually rolling back the schema change, re-running the pipeline, and restoring service. The incident report was brutal: "Lack of data validation meant a breaking change was caught too late."

That's when I realized we needed observability in our data pipeline, not just in our models.

The Problem: Data Quality is Invisible Until It Breaks

Here's the uncomfortable truth about data pipelines: they fail silently. Your ETL job completes successfully. Your Spark cluster finishes transformations. Your data warehouse loads without errors. Everything looks green in the monitoring dashboard. But the data itself? Garbage in, garbage out.

There are three categories of failures that break AI models in production:

Missing Values: A source system stops populating a field. Your pipeline doesn't validate it. The model gets NaN values it never saw during training. Predictions become random noise.

Schema Changes: An upstream team adds a new column, renames an existing one, or changes data types. Your pipeline doesn't expect these changes.
Either it crashes, or worse, it silently maps data to the wrong columns.

Distribution Shifts: The statistical properties of your data change. A field that was always between 0 and 100 suddenly has values of 50,000. Your model's scaling assumptions break. Predictions become nonsensical.

None of these show up in traditional infrastructure monitoring. Your CPU is fine. Memory is fine. Network is fine. But your data is on fire.

The Solution: Observability at Every Layer

I started building a three-layer observability framework using dbt, Great Expectations, and custom validation logic. The goal was simple: catch data quality issues before they reach the model.

Layer 1: dbt Tests (The First Line of Defense)

dbt tests are your cheapest, fastest way to catch obvious data quality issues. They run after every transformation and fail the entire pipeline if something's wrong.

Here's what we implemented:

```yaml
# models/staging/stg_user_events.yml
version: 2

models:
  - name: stg_user_events
    columns:
      - name: user_id
        tests:
          - not_null
          - unique
      - name: event_timestamp
        tests:
          - not_null
          # Column-level form: dbt_utils prefixes the column name to the expression
          - dbt_utils.expression_is_true:
              expression: "<= current_timestamp()"
      - name: event_value
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "> 0"
```

These tests are simple but powerful. They catch:

  • Missing required fields (not_null)
  • Duplicate records (unique)
  • Impossible values (event_timestamp in the future)
  • Out-of-range values (zero or negative event_value)

We run these tests on every pipeline run (dbt build, or dbt test right after dbt run). If any test fails, the pipeline stops. No data reaches the model. No silent corruption.

The beauty of dbt tests is that they're version-controlled, documented, and part of your transformation code. When a schema change happens, you update the test, commit it, and everyone knows what changed.

Layer 2: Great Expectations (The Statistical Validator)

dbt tests catch structural issues. Great Expectations catches statistical anomalies, the subtle shifts that break models.
Here's a real scenario: our user_age column had a distribution of 18-65 for two years. Then one day, we started getting ages of 200, 500, 1000. A data entry bug upstream. Our dbt tests wouldn't catch this because the values are technically valid integers. But Great Expectations would.

```python
# great_expectations/expectations/user_events_expectations.py
# Written against the legacy (pre-1.0) Great Expectations API.
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.data_context import DataContext

context = DataContext()

suite = context.create_expectation_suite(
    expectation_suite_name="user_events_suite",
    overwrite_existing=True,
)

# user_events_df is assumed to be the Spark DataFrame produced upstream.
validator = context.get_validator(
    batch_request=RuntimeBatchRequest(
        datasource_name="my_spark_datasource",
        data_connector_name="default_runtime_data_connector",
        data_asset_name="user_events",
        runtime_parameters={"batch_data": user_events_df},
        # The identifier key must match the data connector's configuration.
        batch_identifiers={"default_identifier_name": "user_events_batch"},
    ),
    expectation_suite_name="user_events_suite",
)

# Expect user_age to be between 18 and 120
validator.expect_column_values_to_be_between(
    column="user_age", min_value=18, max_value=120
)

# Expect event_value to have a mean between 50 and 200
validator.expect_column_mean_to_be_between(
    column="event_value", min_value=50, max_value=200
)

# Expect less than 5% missing values in critical columns
validator.expect_column_values_to_not_be_null(column="user_id", mostly=0.95)

# Expect the event_type distribution to match historical patterns.
# The weights are placeholders; replace them with your historical shares.
validator.expect_column_kl_divergence_to_be_less_than(
    column="event_type",
    partition_object={
        "values": ["click", "view", "purchase"],
        "weights": [0.6, 0.3, 0.1],
    },
    threshold=0.1,
)

validator.save_expectation_suite(discard_failed_expectations=False)
```

Great Expectations runs after dbt tests. It validates:

  • Value ranges (age between 18 and 120)
  • Statistical properties (mean event value between 50 and 200)
  • Null rates (less than 5% missing in critical columns)
  • Distribution shifts (event_type distribution matches historical patterns)

If Great Expectations detects an anomaly, it alerts us. We investigate before the data reaches the model.
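To give a feel for what a distribution-shift check on event_type actually measures, here is a small standard-library sketch that computes a KL divergence between a baseline category mix and a new batch. It is an illustration of the idea only, not Great Expectations' internal implementation; the sample counts, the smoothing constant, and the helper names are assumptions made for this example.

```python
import math
from collections import Counter


def category_shares(values, categories, smoothing=1e-6):
    """Share of each category, with a small floor so log() never sees zero."""
    counts = Counter(values)
    raw = [counts.get(c, 0) + smoothing for c in categories]
    total = sum(raw)
    return [r / total for r in raw]


def kl_divergence(observed, expected):
    """KL(observed || expected) in nats for two aligned probability lists."""
    return sum(p * math.log(p / q) for p, q in zip(observed, expected))


categories = ["click", "view", "purchase"]

# Made-up samples: a historical baseline, a batch with the same mix,
# and a batch where purchases suddenly dominate.
baseline = ["click"] * 60 + ["view"] * 30 + ["purchase"] * 10
same_mix = ["click"] * 6 + ["view"] * 3 + ["purchase"] * 1
shifted = ["click"] * 10 + ["view"] * 10 + ["purchase"] * 80

expected = category_shares(baseline, categories)
stable_score = kl_divergence(category_shares(same_mix, categories), expected)
shifted_score = kl_divergence(category_shares(shifted, categories), expected)

THRESHOLD = 0.1  # same value as the threshold used in the suite above
print(f"stable={stable_score:.4f} shifted={shifted_score:.4f}")
print("alert on stable batch:", stable_score > THRESHOLD)
print("alert on shifted batch:", shifted_score > THRESHOLD)
```

A batch with the same mix scores near zero, while the purchase-heavy batch scores well above the threshold, which is the kind of signal that triggers an alert and an investigation.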
Layer 3: Custom Validation (The Domain Expert)

dbt and Great Expectations are generic. Your domain is specific. We added custom validation logic that understands our business.

```python
# pipelines/validation/custom_validators.py
from datetime import datetime

import pandas as pd


def validate_feature_engineering(df: pd.DataFrame) -> dict:
    """
    Custom validation for features before they reach the model.
    Returns a dict of validation results.
    """
    results = {}

    # Validate 1: Feature completeness
    # We need at least 95% of features populated
    feature_cols = [col for col in df.columns if col.startswith('feature_')]
    total_cells = len(df) * len(feature_cols)
    # An empty frame or no feature columns counts as fully missing
    null_rate = (
        df[feature_cols].isnull().sum().sum() / total_cells if total_cells else 1.0
    )
    results['feature_completeness'] = {
        'passed': null_rate < 0.05,
        'null_rate': null_rate,
        'threshold': 0.05
    }

    # Validate 2: Feature scaling
    # After normalization, features should sit roughly between -3 and 3
    # (3 sigma); the check itself uses a looser bound of +/-10
    for col in feature_cols:
        max_val = df[col].max()
        min_val = df[col].min()
        results[f'{col}_scaling'] = {
            'passed': max_val < 10 and min_val > -10,
            'max': max_val,
            'min': min_val
        }

    # Validate 3: Temporal consistency
    # Events should be recent (within last 30 days)
    if 'event_date' in df.columns:
        # Parse into a local Series so the caller's DataFrame is not mutated
        event_dates = pd.to_datetime(df['event_date'])
        days_old = (datetime.now() - event_dates.max()).days
        results['temporal_freshness'] = {
            'passed': days_old < 30,
            'days_old': days_old,
            'threshold_days': 30
        }

    # Validate 4: Business logic
    # Revenue should never be negative
    if 'revenue' in df.columns:
        negative_revenue = (df['revenue'] < 0).sum()
        results['business_logic_revenue'] = {
            'passed': negative_revenue == 0,
            'negative_count': negative_revenue
        }

    return results


def validate_and_alert(df: pd.DataFrame, validation_results: dict) -> bool:
    """
    Check all validations and alert if any fail.
    Returns True if all pass, False otherwise.
    """
    all_passed = True
    for check_name, check_result in validation_results.items():
        if not check_result['passed']:
            all_passed = False
            print(f"ALERT: {check_name} failed")
            print(f"Details: {check_result}")
            # Send to monitoring system (Datadog, New Relic, etc.)
            # send_alert(check_name, check_result)
    return all_passed
```

This custom validation runs after Great Expectations. It checks:

  • Feature completeness (95% of features populated)
  • Feature scaling (normalized features in the expected range)
  • Temporal freshness (data is recent)
  • Business logic (revenue is not negative)

If any check fails, we block the pipeline and alert the team.

The Real-World Gotchas We Discovered

Gotcha 1: Validation Overhead

Running dbt tests, Great Expectations, and custom validation on every pipeline run adds latency. We went from 15-minute runs to 25-minute runs. The trade-off was worth it (catching one data quality issue saved us more time than we lost), but you need to plan for it.

Gotcha 2: False Positives

Great Expectations' distribution shift detection is sensitive. Legitimate business changes (a marketing campaign causing a shift in the user_age distribution) triggered false alerts. We had to tune thresholds carefully and add context to alerts.

Gotcha 3: Schema Changes Are Sneaky

A vendor added a new column to an upstream table. Our pipeline didn't break; it just ignored the new column. But the data science team expected it. We added schema validation to catch new columns and alert us.

Gotcha 4: Null Handling Varies

Python treats null as None. SQL treats it as NULL. Spark treats it as null. When data flows between systems, nulls get lost or misinterpreted. We had to standardize null handling across the entire pipeline.
The Framework: A Decision Matrix

Here's how we decide which validation layer to use:

| Issue Type | Caught By | Example | Action |
| --- | --- | --- | --- |
| Missing required field | dbt tests | user_id is null | Fail pipeline immediately |
| Duplicate records | dbt tests | Same user_id appears twice | Fail pipeline immediately |
| Impossible values | dbt tests | event_timestamp in future | Fail pipeline immediately |
| Out-of-range values | Great Expectations | user_age > 120 | Alert, investigate, fail if severe |
| Distribution shift | Great Expectations | event_value mean changes 50% | Alert, investigate, continue if acceptable |
| Business logic violation | Custom validation | revenue is negative | Alert, investigate, fail |
| Schema change | Custom validation | New column added upstream | Alert, investigate, update tests |

The Results: From Chaos to Confidence

After implementing this three-layer framework:

  • Incident reduction: We went from 2-3 data quality incidents per month to 0 in six months.
  • Time to resolution: When issues do occur, we catch them within minutes instead of hours.
  • Model stability: Model accuracy stopped fluctuating. It's now consistently 93-95%.
  • Team confidence: Data scientists trust the data. Engineers trust the pipeline.

The best part? We caught the next schema change before it became an incident. Great Expectations detected the distribution shift, we investigated, found the upstream change, and coordinated with the vendor team before any data reached production.

Getting Started: The Minimal Viable Observability

You don't need to implement everything at once. Start here:

  • Week 1: Add dbt tests for not_null and unique on critical columns.
  • Week 4: Set up alerting so you're notified when validations fail.

That's it. You now have observability in your data pipeline.

Conclusion: Observability Saves Models

Your AI model isn't failing because it's bad. It's failing because the data feeding it is bad. And you won't know the data is bad until you look.
The best models in the world can't save you from garbage data. But good observability can. dbt tests, Great Expectations, and custom validation aren't fun. They don't make it into conference talks. But they'll save your production system at 3:00 AM. Start small. Test early. Validate often.
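
The schema-change row of the decision matrix (Gotcha 3) lends itself to a very small custom check. The following is a minimal sketch, not the pipeline's actual code: the function names `detect_schema_drift` and `schema_check` and the example column names are assumptions made for illustration. The result carries a `passed` key so that it has the same shape the alerting loop in the custom validation layer reads.

```python
from typing import Dict, Iterable, List


def detect_schema_drift(expected: Iterable[str], actual: Iterable[str]) -> Dict[str, List[str]]:
    """Compare the columns the pipeline expects with the columns it received."""
    expected_set, actual_set = set(expected), set(actual)
    return {
        # Columns that appeared upstream without anyone updating the tests.
        "added": sorted(actual_set - expected_set),
        # Columns the pipeline depends on that are no longer present.
        "missing": sorted(expected_set - actual_set),
    }


def schema_check(expected: Iterable[str], actual: Iterable[str]) -> dict:
    """Wrap the drift report in a result dict with a 'passed' flag."""
    drift = detect_schema_drift(expected, actual)
    passed = not drift["added"] and not drift["missing"]
    return {"passed": passed, **drift}


if __name__ == "__main__":
    # Hypothetical column names, used only to exercise the check.
    result = schema_check(
        expected=["user_id", "event_timestamp", "event_value"],
        actual=["user_id", "event_timestamp", "event_value", "vendor_flag"],
    )
    print(result)
```

Treating an added column as a failure is a deliberate choice here: the pipeline in Gotcha 3 did not break, it silently ignored the column, so only an explicit comparison surfaces the change.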

By Abhilash Rao Mesala
Two Clocks Are Running Out at Once, and Almost Nobody Is Watching Both

Every CISO I talk to right now is juggling two deadlines that feel unrelated and aren't. One is the slow-motion arrival of quantum computers capable of breaking the public-key cryptography that underpins basically everything — TLS, SSH, JWTs, code-signing. The other is the much faster arrival of AI-assisted coding tools that are shipping security-critical code nobody has fully reviewed. I used to think of these as separate beats. I don't anymore, because the same root failure shows up in both: organizations adopting powerful new capability faster than they're building the visibility and discipline to govern it. Post-Quantum Planning: The Inventory Problem Comes First NIST finalized its first three post-quantum cryptography standards on August 13, 2024, after an eight-year, multi-round public competition: FIPS 203 (ML-KEM, the lattice-based key encapsulation mechanism formerly known as Kyber), FIPS 204 (ML-DSA, the signature scheme formerly known as Dilithium), and FIPS 205 (SLH-DSA, the hash-based fallback formerly known as SPHINCS+). In March 2025, NIST added a fourth algorithm, HQC, specifically chosen because it rests on a different mathematical hardness assumption than the lattice problems underneath ML-KEM and ML-DSA — a deliberate hedge in case lattice-based cryptography turns out to have a weakness nobody's found yet. The NSA's CNSA 2.0 guidance sets 2030 as the mandatory PQC migration deadline for national security systems, and NIST's broader timeline calls for deprecating RSA and ECDSA entirely by 2035. Gartner's framing of where most organizations actually stand is the line I keep sending to clients verbatim: many organizations are already prototyping PQC and improving crypto-agility, but visibility gaps persist. 
That's the polite analyst version of what I see in the field, which is teams that can tell you they've tested ML-KEM in a lab environment but cannot tell you how many of their production TLS endpoints, SSH host keys, or embedded device certificates are still running plain RSA-2048 with no migration path at all. Gartner's own recommendation sequence is the right one: start a cryptographic inventory, stand up a cryptographic center of excellence, push vendors for their PQC roadmaps, and prioritize migration for whatever data needs to stay confidential the longest. That last point matters more than people give it credit for — "harvest now, decrypt later" only threatens data that's still sensitive when a quantum computer capable of breaking it eventually shows up, so a database of last quarter's marketing metrics is not your priority. Decades-long medical records, government communications, and long-lived intellectual property are. The actual transition is happening faster than most security teams realize, which is encouraging, but it's happening unevenly. Cloudflare's 2025 Radar Year in Review reported that post-quantum-encrypted TLS 1.3 traffic nearly doubled across the year, from 29% in January to 52% by early December — driven heavily by browser vendors enabling hybrid post-quantum key exchange by default and by Apple's iOS 26 release in September 2025, after which the share of post-quantum-capable requests from iOS devices jumped from under 2% to 11% in four days and passed 25% by December. That's the client side. The server side is lagging noticeably: Cloudflare's own measurements put post-quantum-preferred key agreement on the origin server side at roughly 10% as of early 2026, up from under 1% a year earlier — a tenfold increase, but still a small minority. Browsers adopted PQC essentially invisibly. 
Backend infrastructure, predictably, is the harder problem, because it's full of legacy TLS terminators, hardcoded cipher suites, and vendor appliances nobody wants to touch. Quantum-Resistant Identity: Don't Wait for "Done" The identity layer is where crypto-agility gets concrete rather than theoretical. A PQC-ready JWT issuer isn't exotic engineering — it means your signing service can issue tokens using ML-DSA instead of (or alongside) RS256 or ES256, and your verification logic can check either signature type without a code change every time the algorithm preference shifts. The same logic applies to your internal certificate authority: if your CA can only issue RSA or ECDSA certs today, you don't have crypto-agility; you have a single point of future failure with a five-to-ten-year fuse on it. NIST has indicated that commercially available post-quantum certificates from public CAs likely won't be common until sometime in 2026, which means internal PKI teams building their own quantum-aware issuance now are ahead of the commercial market, not behind some imaginary deadline. It's worth being honest that the early implementations of these algorithms have already had real bugs. In late 2023, researchers disclosed "KyberSlash," a timing side-channel in several Kyber/ML-KEM implementations caused by non-constant-time arithmetic during decapsulation — an attacker with precise enough timing measurements could, in principle, recover a private key. The reference implementations were patched by December 2023, and it's a useful reminder that a mathematically sound post-quantum algorithm is not automatically a secure deployment; the implementation needs the same constant-time discipline that took classical cryptography decades to get right, except this time the industry doesn't have decades to learn the lesson slowly. 
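
The "verify either signature type without a code change" idea can be sketched as a registry that maps an algorithm name to a verifier, so that adding or retiring an algorithm is a configuration step rather than an edit to the verification path. This is an illustrative sketch under loud assumptions: the class `VerifierRegistry` and the algorithm labels are hypothetical, and the two verifiers are HMAC stand-ins from the Python standard library. HMAC is symmetric and is not a substitute for RS256 or ML-DSA; a real deployment would register vetted implementations of those algorithms instead.

```python
import hashlib
import hmac
from typing import Callable, Dict

# A verifier takes (key, message, signature) and returns True if valid.
Verifier = Callable[[bytes, bytes, bytes], bool]


def make_hmac_verifier(digest_name: str) -> Verifier:
    """Build a stand-in verifier. Placeholder only, not a real signature scheme."""
    def verify(key: bytes, message: bytes, signature: bytes) -> bool:
        expected = hmac.new(key, message, digest_name).digest()
        # compare_digest avoids leaking timing information during comparison.
        return hmac.compare_digest(expected, signature)
    return verify


class VerifierRegistry:
    """Maps algorithm names to verifiers so callers never hardcode one."""

    def __init__(self) -> None:
        self._verifiers: Dict[str, Verifier] = {}

    def register(self, alg: str, verifier: Verifier) -> None:
        self._verifiers[alg] = verifier

    def retire(self, alg: str) -> None:
        self._verifiers.pop(alg, None)

    def verify(self, alg: str, key: bytes, message: bytes, signature: bytes) -> bool:
        verifier = self._verifiers.get(alg)
        if verifier is None:
            return False  # unknown or retired algorithm: fail closed
        return verifier(key, message, signature)


if __name__ == "__main__":
    registry = VerifierRegistry()
    registry.register("classical-standin", make_hmac_verifier("sha256"))
    registry.register("pqc-standin", make_hmac_verifier("sha512"))

    key, token = b"demo-key", b"header.payload"
    signature = hmac.new(key, token, hashlib.sha512).digest()
    print(registry.verify("pqc-standin", key, token, signature))

    # Retiring the classical algorithm later touches no verification code.
    registry.retire("classical-standin")
```

The point of the design is the lookup, not the cryptography: during a transition window both entries stay registered, and retiring the classical one is a single call.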
AI/Vibe Coding Risk: The Other Deadline Andrej Karpathy coined the term "vibe coding" on February 2, 2025, to describe a development style where a programmer describes what they want in plain language, accepts the AI's output largely on faith, and iterates through follow-up prompts rather than reading the generated code line by line. Collins English Dictionary named it Word of the Year for 2025, which tells you how fast the practice spread — and the security data on what it's producing is not encouraging. Veracode's 2025 GenAI Code Security Report tested more than 100 large language models across multiple languages and found that AI-generated code failed basic secure-coding benchmarks roughly 45% of the time, containing on the order of 2.74 times more vulnerabilities than comparable human-written code, with Java the worst performer at a 72% failure rate. Georgia Tech's Systems Software and Security Lab has been tracking this concretely since launching its Vibe Security Radar project in May 2025: CVEs directly attributable to AI coding tools went from six in January 2026 to fifteen in February to thirty-five in March — more in that single month than the entire second half of 2025 combined. Hanqing Zhao, the graduate researcher leading the project, made the point that's stuck with me most: when an AI agent ships something without an authentication check, that's not a typo slipping through — it's a design flaw built in from the start, because the model was never reasoning about access control as a requirement in the first place. The concrete incident I'd point a skeptical engineering lead to is the "Rules File Backdoor," disclosed by Pillar Security on March 18, 2025. AI coding assistants like Cursor and GitHub Copilot let developers drop configuration files — .cursor/rules and similar — into a repository to steer the assistant's behavior and style. 
Pillar's researchers found that an attacker could embed hidden Unicode characters (zero-width joiners and bidirectional text-direction markers, invisible to a human skimming the file) inside those configuration files. The AI assistant parses and follows the hidden instructions anyway and silently generates backdoored code that looks completely clean in a normal code review because the part doing the steering was never visible to the reviewer in the first place. That's the vibe-coding risk model in one sentence: the attack surface isn't just "the model might write a bug." It's "the model is now a thing an attacker can prompt-inject without ever touching your repository's visible diff."

What I'd Actually Build

Plain Text

PRE-COMMIT / CI LAYER
  → Static analysis + secret scanning on every AI-assisted commit, no exceptions for "just a quick fix"
  → Configuration-file integrity checks: scan .cursor/rules, Copilot instructions, and similar files for non-printable/invisible Unicode before they're trusted by any assistant
  → Flag any AI-generated auth, crypto, or payment-handling code for mandatory human review; never auto-merge

CRYPTO-AGILITY LAYER (build-time)
  → Centralize all algorithm selection behind a crypto abstraction layer / feature flag, never hardcoded cipher suites or signature algorithms scattered through the codebase
  → CI step that fails the build if a new dependency introduces a hardcoded RSA/ECDSA-only code path with no PQC fallback registered

DEPLOY LAYER (quantum-aware)
  → TLS termination points support hybrid key exchange (e.g., X25519+ML-KEM) by default
  → Internal CA issues hybrid or PQC-capable certs for anything with a multi-year expected lifetime
  → JWT issuers support dual-algorithm signing (classical + ML-DSA) during the transition window, with verification accepting either until classical is formally retired

The pre-commit layer is aimed at the faster clock: it's the thing that would have caught the Rules File Backdoor pattern before it shipped, by
treating AI-assistant configuration as untrusted input rather than developer intent. The crypto-agility and deploy layers are aimed at the slower clock, and they're cheaper to build now than to retrofit in 2029 when public certificate lifespans are down to 47 days, and nobody can find every RSA-2048 endpoint in a hurry. Neither layer replaces human judgment. Both exist because human judgment, applied once at design time, doesn't scale to a world where code gets generated in seconds, and algorithms need to rotate on a schedule measured in weeks, not years. The End-to-End Scenario, Compressed A developer asks an AI assistant to add a new payment-confirmation endpoint. The assistant generates working code, plus a JWT validation routine that happens to hardcode RS256. CI catches the hardcoded algorithm against the crypto-agility policy and fails the build, not because RS256 is currently insecure, but because the policy says nothing security-critical ships without going through the abstraction layer. A human reviews the auth logic specifically because the pipeline flagged it as AI-generated and security-sensitive. It merges with dual-algorithm signing support intact. None of this required the developer to become a post-quantum cryptography expert or to read every line the model produced. It required the pipeline to assume, by default, that AI-generated code and classical-only cryptography are both temporary conveniences that need a forcing function to age out gracefully — because left to their own momentum, neither one ages out on its own. The teams that get hurt by both of these trends at once aren't unlucky. They're the ones that treated "we'll deal with that later" as a plan for two clocks that were never going to wait.
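
The configuration-file integrity check from the pre-commit layer can be sketched in a few lines. This is a minimal illustration, not a complete defense: the function name `find_hidden_characters` is an assumption, and the check simply flags every character in Unicode general category Cf (format), which covers zero-width joiners and bidirectional control characters.

```python
import unicodedata
from typing import List, Tuple


def find_hidden_characters(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, code point) for each Unicode format character.

    Category "Cf" includes zero-width joiners and bidirectional controls,
    the kinds of invisible characters described in the Rules File Backdoor.
    """
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                findings.append((line_no, col, f"U+{ord(ch):04X}"))
    return findings


if __name__ == "__main__":
    # Hypothetical rules-file content with two invisible characters embedded.
    rules = "Follow the style guide.\nAlways\u200d add\u202e helper functions."
    for line_no, col, codepoint in find_hidden_characters(rules):
        print(f"line {line_no}, column {col}: hidden character {codepoint}")
```

A blanket Cf check will also flag legitimate uses, such as joiners in some scripts and emoji sequences, so in practice a check like this would likely block by default and allow reviewed exceptions rather than pass silently.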

By Igboanugo David Ugochukwu DZone Core CORE
What Cloud Engineers Actually Need to Know About AI Infrastructure

When I decided to move into AI infrastructure, nobody warned me that I had to relearn how to think about compute. I started with the usual steps: spinning up VMs, configuring networking, and managing costs. Then I misconfigured the inter-node networking and watched, slightly horrified, as an eight-node GPU cluster ran a training job at just 11% GPU utilization. It was a wake-up call for me. AI workloads aren’t just different in a marketing sense. They’re different where it counts: in the architecture, in how you build and run things.

The ML engineers on that project immediately assumed the model was the problem. They decided to redesign the model and spent a couple of days tweaking the architecture, chasing a ghost. The real issue surfaced only when someone checked the network telemetry: the cluster nodes were using standard Ethernet, not InfiniBand. The model had no issues. The infrastructure configuration was incorrect. After years of working with Azure and a period on AWS before that, I wish someone had given me a cheat sheet before starting that project.

Compute: Breaking Down the Model

Many cloud engineers assume that AI infrastructure simply requires larger VMs: add more cores and more memory, and the workload will run. This approach is insufficient. While right-sizing CPUs remains relevant, it now accounts for only about 20% of considerations. The remaining 80% is driven by GPUs, which operate fundamentally differently from CPUs and significantly impact the infrastructure.

A GPU isn’t just a faster CPU; it's a collection of thousands of smaller cores working together to handle large datasets. If any part of your system (storage speed, network bandwidth, or data preprocessing) can't keep up, the GPU remains idle, incurring significant unwanted cost. On Azure, idle GPUs cost as much as active ones. Usually, the main limitation in AI infrastructure isn't the GPU itself, but the upstream systems that supply data to it.
When working with Azure, you'll mostly use two main GPU families. The NC-series gives you a single A100 per VM at about $3.60 per hour on demand, making it the go-to choice for fine-tuning and inference tasks. The ND-series has eight A100s connected through NVLink and InfiniBand, which makes it well suited to distributed training. If your cluster uses regular Ethernet instead of InfiniBand between nodes, inter-GPU bandwidth can drop by 60 to 70 percent, and Azure may not warn you about this. It’s smart to double-check that your cluster is set up with InfiniBand before starting a multi-node run and to make sure your GPU quota is ready ahead of time.

Storage: Where Training Jobs Are Exhausted

When you’re training a language model, expect to chew through the dataset over and over; think of it as laps around a track, not a sprint. If you try to pipe 500GB of text straight from regular Azure Blob Storage, you’ll quickly find yourself staring at a progress bar that barely budges. Each blob tops out at about 60 megabytes per second, but an A100 GPU can consume data at several gigabytes per second. There’s a massive mismatch. If you want to keep your GPUs busy (and not just waiting around), you’ll need something beefier. Azure Managed Lustre fits the bill, since it can deliver data to your training jobs at speeds regular storage can’t match. I’ll admit, the first time I ran into this, I wasted hours on model tweaks before realizing the bottleneck was staring me in the face the whole time.

Model checkpoints are a cost trap that is often overlooked. A single checkpoint for a 7B parameter model is around 28GB. Saving checkpoints every 30 minutes over 72 hours generates more than 4TB of data. Configure a Blob lifecycle policy before you start to avoid unexpected storage costs.

Networking: Two Problems, One Person Responsible

During training, each GPU shares gradient updates with the others in the cluster via AllReduce.
The efficiency of the cluster is directly determined by the bandwidth and latency of this communication. If this communication is disrupted, GPU utilization drops. Machine Learning teams often attribute this to model architecture issues, such as an excessive number of parameters or an incorrect batch size, but the network is usually the cause. First, assess network performance and address any issues before the job runs to avoid unnecessary model design, as ML engineers may not consider this when monitoring loss curves. The second networking problem is well known among cloud engineers. Many enterprise clients in financial services and healthcare require AI services that avoid the public internet. Azure AI services, such as Azure OpenAI, Azure ML, and Azure AI Search, all support Private Link, and the configuration process is identical to that of other PaaS services. The key consideration is to integrate private endpoint DNS zones with existing private DNS or manage them manually. ML engineers may interpret a generic “connection refused” error caused by an incorrect DNS configuration as an API issue. Both inter-GPU bandwidth and private network isolation — critical infrastructure concerns — typically fall under the same person’s responsibility. The Azure AI Services Stack: Known Infrastructure, Unknown Branding Recent Azure services such as OpenAI Service, Machine Learning, and AKS with GPU node pools might sound new, but for most infrastructure teams, the actual work remains familiar. The phrase “managed service” sometimes suggests that everything is taken care of, but in reality, only the AI model is managed. Everyday responsibilities like network security, permissions, cost tracking, and system monitoring still rest with your team, no matter how polished the portal looks. Azure OpenAI Service works much like other managed API endpoints, supporting private connections, role-based access, managed identities, and API Management for controlling usage rates. 
The main distinction is its use of Provisioned Throughput Units (PTUs) — these reserve GPU resources to guarantee performance. If you see HTTP 429 errors, it’s almost always a sign of resource bottlenecks rather than issues in your code, although the latter is a common assumption. Azure Machine Learning sits on top of other infrastructure stacks, such as Blob Storage, ACR, Key Vault, and compute, which you already manage. The failure mode is unique to Azure ML: the compute cluster lifecycle. Ensure clusters auto-scale to zero when idle. Unfortunately, this is not the default setting. When a bill arrives with huge costs due to a cluster running overnight because of an unset idle timeout, everyone looks to the cloud engineer first. While it’s tempting to go with Azure Container Apps for their apparent simplicity, most real-world inference workloads ultimately end up on AKS with GPU node pools. The reason? Container Apps are easy—that is, until you’re hit with cold start lag during actual user traffic and realize spinning up a GPU container on the fly just isn’t fast enough to meet your SLA. With AKS, you get far more say over things like keeping node pools warm, tuning autoscaling, and controlling scheduling—options that simply aren’t available with Container Apps. Costs: Higher Stakes, Faster Exposure Eight GPUs on an ND-series cluster aren’t cheap — about $27 an hour adds up quickly. A few long training runs and you’re already close to $2,000, and if you’re running a batch of experiments, $20,000 can disappear before anything launches. The price tag often slips by until accounting points it out. When models underperform, it’s easy to blame the architecture, but I’ve learned to glance at GPU usage first. If you’re seeing less than 60% during distributed runs, chances are the bottleneck is in the infrastructure, not the model itself. If you want to slash costs, spot VMs can drop your bill by as much as 90%. The catch? 
Your training jobs must be able to handle abrupt interruptions—so regular checkpointing and clean restarts are a must. If that’s not in place, spot isn’t the way to go—sort it out with your ML team before finance starts asking questions. Reserving GPU resources is a whole different equation than CPUs: GPU supply changes from region to region, and with how quickly AI hardware evolves, locking in a three-year reservation on today’s gear is a real gamble. Security: Same Toolkit, New Attack Surface For AI projects, you still need the basics like private networks, Managed Identity, strong RBAC, and encryption. But now there’s a twist: prompt injection. It’s like the old trick with SQL injection, but for language models. Someone might simply ask a chatbot to show its system prompt. If you haven’t set up protections, it could actually answer. Firewalls won’t help here. Azure Content Safety can block some of these risky requests, but most teams don’t use it until after trouble starts. If you’re in a regulated industry, logging every inference is a must. In finance or healthcare, you need to record inputs, outputs, who did what, and when, so auditors have all the details they need. Decide on your schema and retention policy before going live, because adding it later, after compliance comes calling, is always a headache. The ML engineers on these teams know the models well. But when infrastructure acts up, causing higher costs, slowdowns, or new risks, they're often the last to spot the cause. Closing that gap is the real challenge. For cloud engineers, "architecturally different" isn’t a red flag; it’s a chance to improve.
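
The checkpoint and cost figures quoted in this article are easy to sanity-check before a run is approved. The sketch below is a back-of-the-envelope calculator, not a billing tool: the function names are hypothetical, and the 4 bytes per parameter is an assumption (it is what makes a 7B-parameter checkpoint come out at the 28GB stated above). Pairing the 72-hour run length from the storage section with the roughly $27 hourly rate from the cost section is also an illustrative choice.

```python
def checkpoint_size_gb(params_billion: float, bytes_per_param: int = 4) -> float:
    """Approximate checkpoint size in GB (assumes weights only, 4 bytes each)."""
    # 1e9 parameters * bytes per parameter / 1e9 bytes per GB
    return params_billion * bytes_per_param


def checkpoint_storage_gb(run_hours: float, interval_minutes: float, size_gb: float) -> float:
    """Total storage written if every checkpoint is kept for the whole run."""
    checkpoint_count = int(run_hours * 60 // interval_minutes)
    return checkpoint_count * size_gb


def run_cost_usd(run_hours: float, hourly_rate_usd: float) -> float:
    """Compute cost of a run at a flat hourly rate."""
    return run_hours * hourly_rate_usd


if __name__ == "__main__":
    size = checkpoint_size_gb(7.0)                   # 7B-parameter model
    total = checkpoint_storage_gb(72.0, 30.0, size)  # every 30 min for 72 h
    cost = run_cost_usd(72.0, 27.0)                  # eight-GPU ND-series rate
    print(f"checkpoint: {size:.0f} GB, total: {total / 1000:.2f} TB, compute: ${cost:,.0f}")
```

With these inputs the run writes 144 checkpoints, or 4,032GB in total, and costs $1,944 in compute, which lines up with the "more than 4TB" and "close to $2,000" figures in the text.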

By Naveen Kalapala
Beyond Software Hope: The Engineering Blueprint for AI Execution Truth

Current enterprise AI governance relies on "software hope," or the belief that probabilistic models can accurately police their own authority through mutable instructions. We've spent years treating system prompts and configuration files as if they're physical vaults. They aren't. They're suggestions that can be bypassed by a single misconfigured line of code. The most dangerous failure modes in modern systems aren't human errors; they're structural. In February 2026, the MITRE ATLAS OpenClaw investigation (CVE-2026-25253) provided a definitive autopsy of our current security models. A controlled red-team exercise demonstrated how a malicious prompt could trigger an unrestricted execution tool, allowing an agent to escape its sandbox and gain broad system access in fewer than two hours. This wasn't a perimeter breach — it was a failure of the architecture's self-concept. When we treat a non-deterministic model as a trusted operator, we're building the future of the autonomous enterprise on a substrate of suggestions. If your agentic governance exists only in the software layer, you're just hosting a crash. Figure 1: Moving the decision boundary from the policy manual (intent) to the hardware substrate (iron) via the Sovereign Spine architecture. The Structural Deficit in Agentic Security The cycle where we audit the vibes of a model and hope the alignment holds has reached its technical limit. True resilience requires a transition to Hardware Truth. Software-defined governance is insufficient for autonomous agency because it can't prevent the "God-mode" vulnerability — where a perfectly valid OAuth token is used to execute an illegitimate intent. To secure an agent, we must externalize and fix its logic path. This requires a technological stack that physically governs AI, termed the sovereign spine. 
By anchoring intent in silicon, we're eliminating the translation drift common in human bureaucratic governance and moving the decision boundary from the policy manual directly into the hardware substrate. The Sovereign Spine — A Dual-Stack Substrate The sovereign spine establishes a deterministic floor where an instruction physically cannot cycle unless its legitimacy is cryptographically witnessed and hardware-verified. This framework is built on two non-substitutable layers. 1. Reasoning Truth — The Ledger Substrate An agent's intent must be treated as an untrusted execution path until it's validated. We require a substrate capable of capturing an immutable, third-party record of the reasoning that led to an agentic proposal. The industry standard is shifting toward Proof of Reasoning (PoR), where the agent's internal weights and decision logic are hashed and anchored to a distributed ledger. This ensures the reasoning path can't be retroactively altered during a forensic audit. Implementations like the Ontologic framework generate a cryptographic identity for a decision committed to the ledger at the moment of intent. This prevents data tourism by ensuring decision logic is anchored to consensus before reaching the execution layer. 2. Execution Truth — The Citadel Protocol If the reasoning substrate provides the "why," the Citadel protocol provides the "how." Execution Truth requires a physical choke-point that operates independently of the model layer. The foundation of the Citadel protocol is the use of Trusted Execution Environments (TEEs). These hardware-isolated enclaves ensure that governance logic is protected from the host operating system and the agent itself. The protocol defines an intent airlock — a pre-execution stage where an agent's payload is held in a suspended state. The airlock is a non-bypassable gate that evaluates the semantic intent of a request against a sovereign mandate. 
| Feature | Software Hope (Current) | Hardware Truth (Sovereign Spine) |
| --- | --- | --- |
| Primary Mechanism | System Prompts / Guardrails | Cryptographic Enforcement / TEEs |
| Trust Model | Trust, then Audit | Verify, then Execute |
| Failure Mode | Fail Open (Bypassable) | Fail Closed (Deterministic) |
| Forensic Audit | Log-Based (Mutable) | Ledger-Based (Immutable) |
| Authority Root | OAuth Token / Policy PDF | Hardware Root of Trust / TEE |

The Sovereign Handshake

The sovereign handshake is the protocol-level weld between the reasoning hash and the hardware gate. It enforces a suspended handoff where the execution path is physically blocked until two conditions are met.

  • Reasoning truth: The reasoning path is ledger-verified.
  • Execution truth: The execution intent is mandate-aligned.

Functional Logic of the Sovereign Handshake

The following sequence details the transition from probabilistic intent to deterministic execution within the Sovereign Spine.

Figure 2: The Sovereign Handshake: The protocol-level verification of ledger-based reasoning hashes within a Trusted Execution Environment (TEE).

1 - 3: Intent and Witnessing
The Autonomous Agent submits a reasoning intent to Hologlass. This intent is structured as rules, inputs, outputs, and meaning (RIOM) morphemes of the request. A human witness verifies the attestation within the Hologlass loop, ensuring accountability before the intent is committed to the hashgraph ledger, which returns the unique Auth_Hash.

4 - 5: Suspended Handoff
The Agent submits the payload and the RIOM-based Auth_Hash to the Citadel hardware witness. The intent airlock immediately suspends execution, holding the instruction in a non-executable state.

6 - 8: Cryptographic and Semantic Audit
The Hardware Witness performs a remote attestation check against the ledger to verify the hash’s validity. Once the cryptographic proof is received, the witness performs a semantic audit.
This is a sovereign mandate check where the hardware witness compares the ruleHash within the RIOM morpheme against the authorized sovereign mandate hosted in the ontologic rule registry. This ensures the agent is not only following a rule, but specifically the current, immutable version of the mandate. 9 - 10: Admissibility (Success Path) If both the human-witnessed hash and semantic audit succeed, the hardware witness opens the gate to the target iron, allowing the instruction to cycle. 11: Terminal Refusal (Failure Path) If the cryptographic witness fails or the intent violates the sovereign mandate, the hardware witness issues a terminal refusal, physically locking the hardware gate and preventing execution. Practical Implementation — The Intent Airlock The intent airlock is more than a simple filter; it's a semantic validator. In a practical enterprise setting, this involves pre-defined business constraints — the sovereign mandate — that are loaded into the TEE at boot time. For instance, in a high-latency financial environment, the Mandate might state: "No single agent may authorize a transfer exceeding $10,000 without a human signature." When the agent attempts a $15,000 transfer, the airlock identifies the violation at the silicon level. Because the airlock resides in the TEE, even a compromised root user on the host system cannot modify the mandate or bypass the check. Governance as Physics The industry has reached its "TCP/IP moment" for AI trust. We must stop building bespoke Python wrappers and start building a unified substrate. You cannot audit a vibe, and you can't protect the enterprise with a PDF policy. The era of software hope is over. By anchoring agentic reasoning on the ledger and enforcing execution in the silicon, we're establishing a substrate of certainty. The future of the autonomous enterprise doesn't rely on better prompts — it's forged in the sovereign spine.
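
The two handshake conditions and the $10,000 mandate example can be sketched as a fail-closed decision function. This is a plain software simulation for illustration only: it does not run in a TEE, performs no remote attestation, and stands in a Python set for the ledger. The names `TransferIntent`, `Mandate`, and `airlock_decision` are hypothetical and are not part of any framework named in the article.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class TransferIntent:
    agent_id: str
    amount_usd: float
    human_signed: bool
    auth_hash: str  # stand-in for the ledger-issued Auth_Hash


@dataclass(frozen=True)
class Mandate:
    # "No single agent may authorize a transfer exceeding $10,000
    # without a human signature."
    max_unsigned_transfer_usd: float = 10_000.0


def airlock_decision(intent: TransferIntent, ledger_hashes: Set[str], mandate: Mandate) -> str:
    """Return "ADMIT" only if both handshake conditions hold, else "REFUSE"."""
    # Condition 1 (reasoning truth): the hash must already be on the ledger.
    if intent.auth_hash not in ledger_hashes:
        return "REFUSE"
    # Condition 2 (execution truth): the intent must align with the mandate.
    over_limit = intent.amount_usd > mandate.max_unsigned_transfer_usd
    if over_limit and not intent.human_signed:
        return "REFUSE"
    return "ADMIT"


if __name__ == "__main__":
    ledger = {"hash-abc"}  # hypothetical ledger contents
    attempt = TransferIntent("agent-1", 15_000.0, human_signed=False, auth_hash="hash-abc")
    print(airlock_decision(attempt, ledger, Mandate()))
```

Every path that is not explicitly admitted returns a refusal, which mirrors the fail-closed column of the comparison table. What the sketch cannot show is the property the article cares most about: that the check itself is out of reach of the host and the agent.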

By Theo Ezell
Code and Connect: MCP + MuleSoft

I often find myself in conversations where the same words keep popping up again and again: Agents, MCP, and A2A. Everyone seems excited about them. But the funny part is that when the topic shifts to MCP (Model Context Protocol), the explanations start to vary. One day, someone confidently said, “An MCP server is basically a tool.” Another person immediately disagreed and replied, “No, no — MCP is more like a client.” Before that debate could settle, someone else joined the conversation and said, “Actually, MCP is just a protocol.” And then another perspective appeared: “Think of it as middleware that sits between an agent and APIs.” At that moment, I realized something interesting: we were all talking about the same concept, yet each of us understood it a little differently. These conversations made me curious. If experienced developers and architects describe MCP in different ways, how confusing must it be for someone who is just starting to explore this space? The more I listened, the more I noticed a pattern — people weren’t wrong, but they were often describing only one piece of the puzzle. That realization is what inspired this blog. In this article, I want to step back from the buzzwords and walk through the concepts in a simple way. What exactly is MCP? Is it a server? A tool? A client? Or something else entirely? And how does it relate to the agents that everyone keeps talking about? Is it applicable only to agents, or is it applicable to assistants also? We will also explore MuleSoft's capability in this space. By the end of this post, my goal is to bring clarity to these terms and show how they connect. Instead of hearing multiple interpretations in different conversations, you’ll be able to see the complete picture of how MCP fits into modern AI and integration architectures. Let's Understand What Anthropic Says About MCP MCP (Model Context Protocol) is an open-source standard for connecting AI applications to external systems. 
Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect electronic devices, MCP provides a standardized way to connect AI applications to external systems.

MCP at high level

Now let's break down each component and understand it in the simplest way possible.

AI Application
An AI application can be any application that consists of an LLM, orchestration, and tools (you can think of it as an assistant), or it may consist of more complex components such as agent orchestration, specialized agents, and tools (you can think of it as an agentic application). Tools can be a payment gateway, a data retrieval API, a weather API, a file system, web search, and so on.

MCP
Model Context Protocol is an open protocol that enables seamless integration between AI applications (LLM applications) and external data sources and tools. MCP provides a standardized way to connect LLMs with the context they need. MCP follows a client-server architecture. Key components of this architecture are the MCP Host, MCP Client, and MCP Server. Let's extend our previous architecture.

MCP architecture

MCP Host
The host where the AI application is running.

MCP Client
A component that establishes a connection with the MCP Server and gets the context for the MCP Host to use.

MCP Server
It consists of external services that provide context to LLMs.

Model Context Protocol consists of two layers:

  • Data layer: The data layer implements a JSON-RPC 2.0 (JRPC) based exchange protocol that defines the message structure and semantics for client-server communication.
  • Transport layer: The transport layer manages communication channels and authentication between clients and servers.
It handles connection establishment, message framing, and secure communication between MCP participants.MCP supports two transport mechanisms: Stdio transport: Uses standard input/output streams for direct process communication between local processes on the same machine, providing optimal performance with no network overhead.Streamable HTTP transport: Uses HTTP POST for client-to-server messages with optional Server-Sent Events for streaming capabilities. This transport enables remote server communication and supports standard HTTP authentication methods, including bearer tokens, API keys, and custom headers. MCP recommends using OAuth to obtain authentication tokens. Use Case We can think of "Weather Intelligence Agent," which uses the MCP server to make a call to a tool that provides weather information based on a city name. This is a simple use case just to demonstrate how an API is called as a tool using MCP. We will use Postman and Cursor to mimic as Agent/Assistant, which will call the Weather API. Let's see how we can implement this use case using MuleSoft: Step 1: MuleSoft provides the MCP Server - Tool Listener connector. We will configure the MCP Server. 
MuleSoft code

Refer to the code:

XML
<?xml version="1.0" encoding="UTF-8"?>
<mule xmlns:ee="http://www.mulesoft.org/schema/mule/ee/core"
      xmlns:http="http://www.mulesoft.org/schema/mule/http"
      xmlns:mcp="http://www.mulesoft.org/schema/mule/mcp"
      xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:doc="http://www.mulesoft.org/schema/mule/documentation"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
        http://www.mulesoft.org/schema/mule/mcp http://www.mulesoft.org/schema/mule/mcp/current/mule-mcp.xsd
        http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd
        http://www.mulesoft.org/schema/mule/ee/core http://www.mulesoft.org/schema/mule/ee/core/current/mule-ee.xsd">

  <!-- HTTP listener that the MCP server is exposed on -->
  <http:listener-config name="HTTP_Listener_config" doc:name="HTTP Listener config" doc:id="251f2d7c-e84b-4974-a1e8-96d9779bc9e9">
    <http:listener-connection host="0.0.0.0" port="8081" />
  </http:listener-config>

  <!-- MCP server using the Streamable HTTP transport -->
  <mcp:server-config name="MCP_Server" doc:name="MCP Server" doc:id="289fb886-e732-4274-990e-9876aca405a6" serverName="mule-mcp-server" serverVersion="1.0.0">
    <mcp:streamable-http-server-connection listenerConfig="HTTP_Listener_config" />
  </mcp:server-config>

  <!-- Outbound connection to the weather API -->
  <http:request-config name="HTTP_Request_config" doc:name="HTTP Request config" doc:id="b31d7d79-b45b-42ec-a970-50eb19a0a702">
    <http:request-connection protocol="HTTPS" host="api.weatherstack.com" />
  </http:request-config>

  <flow name="mcp-weahter-intelligence-apiFlow" doc:id="b1c21d3c-18f0-4eac-bb4e-3cf789608580">
    <!-- Registers the flow as an MCP tool named get_weather_information -->
    <mcp:tool-listener doc:name="MCP Server - Tool Listener" doc:id="4c42c1cb-898d-4fb9-8d0e-edc541fffb75" config-ref="MCP_Server" name="get_weather_information">
      <mcp:description><![CDATA[This tool gets weather information. Check weather details for device by providing the city name as input or paramValue. Please use the query.]]></mcp:description>
      <mcp:parameters-schema><![CDATA[{
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
          "query": {
            "type": "string",
            "description": "city for querying weather data"
          }
        },
        "required": ["query"],
        "additionalProperties": false
      }]]></mcp:parameters-schema>
      <mcp:responses>
        <mcp:text-tool-response-content text="#[payload.^raw]" priority="1">
          <mcp:audience>
            <mcp:audience-item value="ASSISTANT" />
          </mcp:audience>
        </mcp:text-tool-response-content>
      </mcp:responses>
    </mcp:tool-listener>

    <!-- Demo only: the access key is hard-coded here; in a real deployment, load it from a secure configuration property. -->
    <http:request doc:name="Request" doc:id="d10760de-5f93-4f63-aadc-9bfc491f94e0" config-ref="HTTP_Request_config" path="/current">
      <http:query-params><![CDATA[#[output application/java
---
{
  "access_key" : "96d01954d0c4e444aa781fa10b92caff",
  "query" : payload.query,
  "units" : "m"
}]]]></http:query-params>
    </http:request>
  </flow>
</mule>

Let's run this code and test it. The MCP server started successfully:

Deployment log

Step 2: Let's use Postman as the MCP client to test it and see if it is working as expected:

MCP server and available tools

Step 3: Click on Connect:

Connected to MCP Server

Step 4: Now the MCP client is connected to the MCP server. Pass the city name as the query parameter, and you will get the weather details. I am writing this blog from Goa (the beach capital of India), so I will use Goa as the city name to retrieve its weather information.

Use the tool

Step 5: Click on Run, and you will get the response as shown below:

Response

So far, I have demonstrated this with a local version of the code running in Anypoint Studio. Let's run the same test after deploying the code to Runtime Manager.

Deployed in the Anypoint platform

Test result

I have demonstrated this using Postman, where Postman worked as an MCP client to connect to the MCP server.
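To make the exchange concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client such as Postman would send when it runs the tool. The tools/call method and the name/arguments fields follow the MCP specification as I understand it; the class and helper names are hypothetical, and a real client would first complete the initialize handshake and use a proper JSON library rather than string concatenation.

```java
// Hypothetical helper, for illustration only: builds the body of an MCP "tools/call" request.
class McpToolCallSketch {

    /** Escapes backslashes and double quotes (control characters are not handled in this sketch). */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds a JSON-RPC 2.0 "tools/call" request for a tool that takes one string argument. */
    static String toolCall(int id, String tool, String argName, String argValue) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
            + ",\"method\":\"tools/call\",\"params\":{\"name\":\"" + escape(tool)
            + "\",\"arguments\":{\"" + escape(argName) + "\":\"" + escape(argValue) + "\"}}}";
    }

    public static void main(String[] args) {
        // Tool name and parameter name match the Mule configuration above.
        System.out.println(toolCall(1, "get_weather_information", "query", "Goa"));
    }
}
```

With the Streamable HTTP transport, a body like this would be sent as an HTTP POST to the MCP endpoint exposed by the Mule listener on port 8081; the exact path depends on the connector configuration.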
We can extend this further and use Cursor to mimic agentic behavior, where the agent uses the MCP tool to get the answer.

Cursor to use MCP

For this demonstration, I used MuleSoft, a no-code/low-code tool. In the next blog, I will demonstrate the same use case in Python. Watch the video for more details. Let me know if you liked it!

By Ajay Singh
The AI Definition of Done

TL;DR: The AI Definition of Done Your team has a Definition of Done for a product increment. It has none for the 20-plus AI-supported outputs that leave the team each week: status reports, stakeholder emails, release notes, and updates for the C-level. Each one carries your team’s name. “I know quality when I see it” is the standard most teams actually run by, and you cannot audit it, teach it to a new colleague, or defend it when a claim turns out to be wrong. The AI Definition of Done fixes that with one page per task class, agreed by the team, before the output ships. Your Increment Has a Standard; Does Your AI Output? A model turns the Jira board into a Friday status update, and the update tells an enterprise prospect that the security feature is in production. Unfortunately, it is not. The feature was descoped three months ago, but the old ticket title persisted because no one felt responsible. So the model reported the title instead of the reality. Nobody checked the claim against the release notes because nobody had agreed that someone should. The email was sent with the team’s name on the cover. A functioning agile team should be able to tell you what “done” means for a product increment. Few can tell you what “done” means for that status update. No agreed standard governs it, and it ships every week. The product increment passes through a standard that the team argued over and agreed on. The AI-assisted output passes through one person’s gut feeling at the moment they clicked send. One of those you can defend to a stakeholder, an auditor, or a new hire. The other you cannot. The AI Definition of Done closes that gap without adding a governance department, which is exactly why it survives in organizations where “AI governance” earns eye rolls. It takes a practice every agile practitioner already owns and points it at the work you have started handing to a model. 
It is not for everything: skip it for private brainstorming, throwaway prompts, or personal sensemaking, unless the output later informs a decision or leaves the team. The Four Questions Every AI Definition of Done Answers The Concept Verification Level Which claims get checked, by whom, against what source, and how? “Looks good” is not a method. A method names the claim, the checker, the source, and the test: every factual claim about product status gets checked against the release notes by the sender before sending, every time. Where teams get stuck: approval gets mistaken for review. Someone skims a draft, clicks send, and the team’s name now sits on a claim nobody verified. Provenance Disclosure What does the team declare about how the output was produced? Three labels cover practice: a) Human means no material AI contribution to the content, claims, or structure (a spellchecker does not count), b) AI-assisted means AI contributed to drafting, summarizing, or analysis, and a named human reviewed the output and decided, and c) AI-automated means AI produced and sent the output under predefined rules, without human review before release, audited at a set cadence. The line that matters runs through “reviewed”: clicking send on an unread draft is approval, never review. An output approved without reading is AI-automated, whatever the team tells itself. Data Hygiene What never enters a model on the way to this output? Name the exclusions concretely: personal data from team surveys, customer-identifiable information, anything your organization’s AI policy restricts. If the input rules in your A3 Handoff Canvas already cover this, point to them. Do not keep two versions of the same rule. Where teams get stuck: nobody wrote the exclusions down, so each person guesses, and the guesses differ. Sufficiency Tier and Environment Which model, plan, and data boundary are good enough for this task class, and why? 
A top-notch frontier model drafting a calendar invitation fails this test in one direction: it is more than the task needs. The cheapest model, run locally on an old Mac mini, can write a board update but likely fails in the other direction. Capability is only half of it: a board update may need an enterprise plan with a no-training guarantee or an approved connector, even when a mid-tier model is plenty. If your team has a routing policy, point to the tier and the environment it mandates. If it does not yet, name the model and the plan, and explain in one sentence why both are enough.

The AI Definition of Done Template

Four questions, plus two operating controls, one page. Here is the template a team fills in per task class:

Dimension | Your Standard for This Task Class
Task class |
Verification level: What is checked, by whom, against what, how |
Provenance label: Human (Avoid) / Assist / Automate from the A3 Delegation Framework, and where the label appears |
Data hygiene: What never enters the model |
Sufficiency tier and environment: Which model, plan, and data boundary, and why they are enough |
Sign-off: Who agreed, on what date, and the review date |
Stop rule: When the delegation is paused, downgraded, or returned to manual work |

The last two rows are operational, not definitional: Sign-off records who agreed and when, and the stop rule names the condition that pauses the delegation, because this standard should say not only when an output may ship but when the task class stops being eligible for AI at all. Without it, teams keep tuning the prompt or skill long after the delegation has proven unfit.

A Worked Example: External Status Communication

The status update failure that opened this article maps to one task class: status communication leaving the company.
Here is the team’s first AI Definition of Done for it:

Dimension | Standard
Task class | Status communication leaving the company
Verification level | Every claim about feature status is checked against the release notes by the sending manager, before sending, every time
Provenance label | AI-assisted; footer states “Drafted with AI, reviewed by [name]”; Automate is not permitted for this task class
Data hygiene | No customer names, no security-finding details, no internal financials enter the model
Sufficiency tier and environment | Mid-tier model on an enterprise plan with no model training; drafting from structured release data needs no frontier model
Sign-off | Team agreed, dated; review after the next four status updates
Stop rule | If two updates in a review cycle need a factual correction after sending, the task class returns to manual drafting until the standard is revised

The standard costs the sending manager about four minutes a week, set against an error that can put a flagship deal at risk.

Write Your AI Definition of Done in 75 Minutes

An AI Definition of Done that one person downloads and pastes into the wiki doesn’t change anything. The argument over the standard is where the standard takes hold. Run it as a workshop:

Pick three task classes (10 minutes): Choose from work the team actually shipped in the last two weeks, never hypotheticals. The best candidates are outputs that leave the team.
Draft in pairs (20 minutes): Each pair fills the template for one task class. Pairs work without comparing notes; divergence is the point.
Argue the differences (25 minutes): Compare drafts. Where pairs disagree on verification level or provenance, the team has found an unspoken assumption. Resolve each disagreement with a decision, never with “both are fine.”
Set the labels (10 minutes): Agree where provenance labels appear: email footers, document headers, report covers. Visible beats buried.
Adopt and date (10 minutes): Sign off each AI Definition of Done with a review date, and add the adoption to your AI working agreement.

Ownership stays with the team running the delegation. Compliance, security, or legal may constrain the standard, but they do not write it for the team. When someone says, “We do not need this for internal outputs,” ask what happened the last time an internal draft got forwarded outside the team. Every team has that story.

The Record You Get for Free

Each signed-off AI Definition of Done is a dated, versioned, one-page record. Stack them, and they answer the due diligence question enterprise buyers increasingly ask, “How do you control AI-generated output?” with documents instead of assurances. Nobody wrote a governance report. The records came out of normal work.

That answer is already part of procurement and due diligence conversations. Article 4 of the EU AI Act has applied since February 2, 2025, and requires providers and deployers to ensure a sufficient level of AI literacy among staff and others operating AI systems on their behalf. The EU Commission’s Q&A places supervision and enforcement under national market surveillance authorities, with the enforcement rules applying from early August 2026. The practical question underlying the regulation is simpler, and a prospect’s procurement team will ask it before any regulator does: can you show the standard that underlies the output you sent us?

Three Ways It Fails

The downloaded standard: A template adopted without the workshop. Nobody argued, so nobody owns it. An AI Definition of Done that nobody argued about is one nobody will follow.

The universal standard: One AI Definition of Done for all work. Verification that fits external communication suffocates internal brainstorming, and the team abandons the practice within a month. Write one page per task class. Unlike the classic Definition of Done, there is no one-size-fits-all in our use case.
The static standard: Written once, reviewed never. Models change, people change, task classes change. The review date is part of the artifact, and your next delegation inspection enforces it.

Conclusion: Pick One Output This Week

Pick one AI-assisted output your team ships regularly: the Friday status update, the Sprint summary, or the stakeholder email. Walk it through the four questions out loud in your next Retrospective: what gets checked and by whom, how we label it, what never enters the model, and which tier is enough. You will likely find at least one question where the honest answer is “nobody decided that.” Write the one-page response for that task class, argue it, sign it, and date it. One standard, agreed by the team, is the difference between a team that uses AI and a team that a customer can trust with it. Which of your AI-assisted outputs has a standard behind it right now, and which one is merely a habit?

Key Questions This Article Answers

What Is an AI Definition of Done?

An AI Definition of Done is a one-page, team-agreed standard that an AI-assisted output must meet before it leaves the team. Teams write one per task class, such as external status communication or data analysis summaries, never one per task. It answers four questions: what gets verified, how the output is labeled, what data never enters the model, and which model and environment are sufficient. It borrows the discipline of the Scrum Definition of Done and applies it to work a model touched.

What Is the Difference Between Approval and Review for AI Output?

Review means a named human reads the AI-generated output and checks its claims against a source before it ships. Approval means someone clicked send. Clicking send on an unread draft is approval, not review, whatever the team calls it. An output approved without reading is effectively AI-automated, and it should carry that provenance label rather than the AI-assisted label, which implies a human verified it.
How Do You Write an AI Definition of Done? Run a 75-minute team workshop, not a solo download. Pick three task classes from work shipped in the last two weeks, draft the standard in pairs, then compare and resolve every disagreement with a decision. Agree where provenance labels appear, set a stop rule that returns the task class to manual drafting when outputs repeatedly fail, sign off each standard with a review date, and add the adoption to your AI working agreement. The argument over the standard is what makes the team own it. How Do Agile Teams Prove They Govern AI Output? Each signed-off AI Definition of Done is a dated, one-page record. Together, a team’s standards answer the procurement and due diligence question “how do you control AI-generated output” with documents rather than assurances. The records are a byproduct of normal work, so no separate governance report is needed. This matters because buyers and regulators, including under the EU AI Act Article 4, increasingly require evidence of controlled AI adoption. What Are the Four Dimensions of an AI Definition of Done? Verification level (which claims get checked, by whom, against what source, and how), provenance disclosure (Human, AI-assisted, or AI-automated, and where the label appears), data hygiene (what never enters the model), and sufficiency tier and environment (which model, plan, and data boundary are good enough and why). Each dimension fits on one line of a one-page template, signed off with an adoption date and a stop rule that pauses the delegation when outputs repeatedly fail.

By Stefan Wolpers DZone Core CORE
AI, OAuth, and Other Platform APIs in the Core

This is the second follow-up to June 5's release post. It covers the platform APIs that moved into the framework core this release. There are two headline pieces (AI/LLM and the modern OAuth/OIDC stack) and two smaller pieces (WiFi/connectivity and share-sheet result callbacks). This continues the direction the previous release set when we moved NFC, biometrics, and cryptography into the framework core. The full background on that earlier set is in NFC, Crypto, Biometrics, And A New Build Cloud.

AI: A First-Class LLM Client and a ChatView Component

PR #5035 lands the com.codename1.ai package, the ChatView UI component, the speech and TTS additions, and the build-time dependency injection that wires the native pieces in. PR #5057 lands the developer-guide chapter and the agent-skill addition, so any project generated from the Initializr inherits the new APIs through its bundled AGENTS.md.

LlmClient: The Basic Chat Request

com.codename1.ai.LlmClient is the entry point. The simplest possible use:

Java
LlmClient client = LlmClient.openai(apiKey);
ChatRequest req = new ChatRequest.Builder()
    .model("gpt-4o-mini")
    .system("You are a helpful assistant.")
    .user("What is the capital of France?")
    .temperature(0.7)
    .build();
client.chat(req).onResult((resp, err) -> {
    if (err != null) {
        Log.e(err);
        return;
    }
    Log.p(resp.firstChoice().content());
});

LlmClient.openai(...), LlmClient.anthropic(...), LlmClient.gemini(...), LlmClient.ollama(...), and LlmClient.openAiCompatible(baseUrl, apiKey) are the factories. All five are fully implemented native clients. The OpenAI client also drives Ollama, vLLM, llama.cpp, and any other endpoint that speaks the OpenAI wire format, so most local-model stacks plug in through LlmClient.openAiCompatible(...) without a separate driver.

Streaming Chat (What You Actually Want for Chat UIs)

For any UI that types responses out token-by-token, the streaming entry point is the one to reach for.
The callback fires on the EDT, so you can append directly to a text component:

Java
client.chatStream(req, new ChatStreamListener() {
    @Override
    public void onDelta(ChatDelta d) {
        responseLabel.setText(responseLabel.getText() + d.contentDelta());
        responseLabel.getParent().revalidateLater();
    }

    @Override
    public void onComplete(ChatResponse fin) {
        sendButton.setEnabled(true);
    }

    @Override
    public void onError(Throwable t) {
        Log.e(t);
        sendButton.setEnabled(true);
    }
});

Under the hood this is a custom ConnectionRequest subclass that parses SSE line-by-line and dispatches each delta through Display.callSerially. AsyncResource.cancel() kills the socket, so wiring a cancel button into a chat UI is a one-line call.

Tool Calls

If you want the model to call back into your app, Tool / ToolChoice give you OpenAI-style function calling. Define the tool, hand the model your request along with the available tools, and the response surfaces structured ToolCall objects you dispatch:

Java
Tool getWeather = Tool.builder()
    .name("get_weather")
    .description("Look up the current weather for a city.")
    .parameter("city", "string", "The city name, e.g. \"Paris\".")
    .build();
ChatRequest req = new ChatRequest.Builder()
    .model("gpt-4o-mini")
    .user("Is it raining in Tel Aviv right now?")
    .tool(getWeather)
    .toolChoice(ToolChoice.AUTO)
    .build();
client.chat(req).onResult((resp, err) -> {
    if (err != null) return;
    for (ToolCall call : resp.firstChoice().toolCalls()) {
        if ("get_weather".equals(call.name())) {
            String city = call.argument("city").asString();
            String json = lookupWeather(city);
            // Loop the result back into the conversation
            client.chat(req.replyWithToolResult(call, json))
                .onResult((followUp, e) -> updateUi(followUp));
        }
    }
});

The shape mirrors the OpenAI function-calling contract one for one, so anything you have written against the OpenAI API directly maps across without rethinking.

Embeddings

LlmClient.embed(...) returns a vector for any input string.
Useful for similarity search against a local SQLite store (tomorrow's post will cover the new ORM that pairs with this):

Java
EmbeddingRequest er = new EmbeddingRequest.Builder()
    .model("text-embedding-3-small")
    .input("Codename One is a cross-platform mobile framework.")
    .build();
client.embed(er).onResult((emb, err) -> {
    if (err != null) return;
    float[] vector = emb.firstVector();
    // store, search, compare
});

Image Generation

DALL-E and a Replicate scaffold are surfaced through ImageGenerator:

Java
ImageGenerator gen = ImageGenerator.openAiDallE(apiKey);
gen.generate("A red bicycle leaning against an olive tree", "1024x1024")
    .onResult((img, err) -> {
        if (err != null) return;
        myImageComponent.setIcon(img);
    });

Working Against Ollama in the Simulator (No API Charges)

JavaSEPort pings localhost:11434 at startup. If it finds Ollama, it sets the cn1.ai.ollamaDetected property. With cn1.ai.simulatorRedirect=auto (or =ollama) every LlmClient.openai(...) call routes through the local Ollama endpoint instead of OpenAI's. Production code does not change. The iteration loop, your tests, and your offline debugging stop costing money and stop needing an internet connection.

In common/codenameone_settings.properties:

Properties files
simulator.cn1.ai.simulatorRedirect=auto

(The simulator. prefix scopes the property to the JavaSE simulator path.) Then run Ollama locally with whichever model your code expects (ollama run llama3.2 or similar) and your existing LlmClient.openai(...) calls go to localhost.

How to Handle API Keys

A direct word on credentials before any of the above sees production. LLM provider API keys (OpenAI, Anthropic, Gemini, your Auth0 / Firebase configs) are bearer tokens with a budget attached. They must never be checked into source control, embedded in your app binary, or hard-coded in code. A leaked key can be extracted from any APK or IPA in minutes and used to drain your account.
The correct shape is to fetch the key from your own backend over an authenticated request, then store it on the device using the platform's keychain / keystore. The framework provides both pieces:

com.codename1.crypto.SecureStorage (from the previous release) is the cross-platform wrapper over iOS Keychain Services and Android EncryptedSharedPreferences. Values are encrypted at rest using the platform's hardware-backed protection class where one is available.
This release adds single-argument get / set / remove(account, ...) overloads next to the existing biometric-gated methods. The new overloads store the value without a per-read Face ID / Touch ID prompt, which is what you want for an LLM API key (you read it on every network call; a biometric prompt every time is not workable). The biometric-gated methods are still there for credentials you do want to gate per use.

A reasonable shape:

Java
private static AsyncResource<String> getOpenAiKey() {
    // Serve the key from the keychain when it is already cached
    String cached = SecureStorage.get("openai_api_key");
    if (cached != null) {
        return AsyncResource.complete(cached);
    }
    // Otherwise fetch it from your own backend and cache it on success
    return Rest.get(myServer + "/v1/credentials/openai")
        .bearerToken(userSessionToken())
        .fetchAsString()
        .onResult((key, err) -> {
            if (err == null) {
                SecureStorage.set("openai_api_key", key);
            }
        });
}

Your server gates the credential request behind the user's session, your app caches the result on the keychain, and the key never sits anywhere a reverse-engineering pass could find it. If your server rotates the key, invalidate the cache and refetch. Existing biometric-gated SecureStorage calls keep working unchanged. The new overloads are additive.

ChatView: A Ready-Made Streaming Chat UI

com.codename1.components.ChatView is the matching UI component. It provides a scrollable message list, ChatBubble for the per-message bubble (theme-aware UIIDs so it picks up the iOS Modern / Material 3 native themes consistently), ChatInput for the bottom input bar, and a one-line bindToLlm(...)
that wires the input to a streaming chat request:

Java
ChatView view = new ChatView();
getOpenAiKey().onResult((key, err) -> {
    view.bindToLlm(LlmClient.openai(key), new ChatRequest.Builder()
        .model("gpt-4o-mini")
        .system("You are a friendly tutor for " + "Codename One developers.")
        .build());
});
Form f = new Form("Chat", new BorderLayout());
f.add(BorderLayout.CENTER, view);

The result is a standard mobile chat layout, picked up from whichever native theme the project uses.

If you want more control than bindToLlm(...) gives you (custom message styling, a "thinking" placeholder, hand-rolled retry, persistence to your own model class), drive the view by hand:

Java
ChatView view = new ChatView();
ConversationStore store = ConversationStore.open("tutor-thread");
view.setMessages(store.load());
LlmClient client = LlmClient.openai(apiKeyFromKeychain);
view.setInputListener(userText -> {
    ChatMessage userMsg = ChatMessage.user(userText);
    view.appendMessage(userMsg);
    store.append(userMsg);
    // Empty assistant bubble that the streamed deltas fill in
    ChatMessage assistant = ChatMessage.assistant("");
    view.appendMessage(assistant);
    ChatRequest req = new ChatRequest.Builder()
        .model("gpt-4o-mini")
        .messages(store.load())
        .build();
    client.chatStream(req, new ChatStreamListener() {
        @Override
        public void onDelta(ChatDelta d) {
            view.appendToLastMessage(d.contentDelta());
        }

        @Override
        public void onComplete(ChatResponse fin) {
            store.append(ChatMessage.assistant(view.lastMessage().content()));
            view.setInputEnabled(true);
        }

        @Override
        public void onError(Throwable t) {
            view.appendToLastMessage(" [error: " + t.getMessage() + "]");
            view.setInputEnabled(true);
        }
    });
});

appendToLastMessage(...) is the streaming entry point; it marshals through callSerially so deltas land on the EDT in order. ConversationStore persists the thread (the default backing is Storage; pluggable via a custom implementation if you would rather keep it in SQLite or push it to your server).
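The streaming path described earlier parses SSE line by line before deltas reach the UI. The following is a minimal, framework-independent sketch of that idea, not the framework's actual implementation: it assumes an OpenAI-style stream in which each event is a single "data:" line and the stream ends with a "[DONE]" sentinel. The class and method names are hypothetical, and the payloads are left as raw strings rather than parsed JSON.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: extracts "data:" payloads from a Server-Sent Events body.
class SseSketch {

    /** Collects the payload of each "data:" line, stopping at the "[DONE]" sentinel. */
    static List<String> dataPayloads(String body) {
        List<String> out = new ArrayList<>();
        for (String line : body.split("\r?\n")) {
            if (!line.startsWith("data:")) {
                continue; // skip blank separators, ":" comments, and other fields
            }
            String payload = line.substring("data:".length()).trim();
            if (payload.equals("[DONE]")) {
                break; // OpenAI-style end-of-stream marker (an assumption of this sketch)
            }
            out.add(payload);
        }
        return out;
    }

    public static void main(String[] args) {
        String body = "data: {\"delta\":\"Hel\"}\n\n: keep-alive\ndata: {\"delta\":\"lo\"}\n\ndata: [DONE]\n";
        for (String payload : dataPayloads(body)) {
            System.out.println(payload);
        }
    }
}
```

A real client would process lines incrementally as they arrive from the socket and hand each decoded delta to the UI thread, which is the role callSerially plays in the description above.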
The AI cn1libs

The core LLM stack is paired with a set of opt-in cn1libs that wrap specific on-device capabilities: Google ML Kit features, the TensorFlow Lite runtime, a local Whisper transcription engine, and an on-device Stable Diffusion model. Thirteen new cn1libs ship this release.

These cn1libs are not yet listed in the Codename One Preferences cn1lib picker, so for the moment they are added by hand. Drop the matching dependency block into your project's common/pom.xml and rebuild. The build-time scanner does the rest: the iOS pod or Swift Package, the Android Gradle dependency, the plist usage strings (NSCameraUsageDescription for the vision libraries, NSSpeechRecognitionUsageDescription for Whisper, etc.), and the Android permissions (android.permission.RECORD_AUDIO for audio capture) are all injected automatically the first time the scanner sees the matching class on the classpath.

For each cn1lib below, the dependency block is identical in shape; only the <artifactId> changes. The shared pattern is:

XML
<dependency>
    <groupId>com.codenameone</groupId>
    <artifactId><!-- cn1lib artifact id from below --></artifactId>
    <version>${cn1.version}</version>
</dependency>

cn1-ai-mlkit-text: Text Recognition (OCR)

TL;DR. Pull printed or handwritten text out of an image (a photo of a page, a sign, a receipt) entirely on-device.

Platforms. iOS bridges to GoogleMLKit/TextRecognition. Android bridges to com.google.mlkit:text-recognition. The JavaSE simulator returns an unsupported error.

Use cases. Receipt scanning, sign translation pipelines (combine with cn1-ai-mlkit-translate), accessibility tools that read printed text aloud, automated form ingestion.

Java
byte[] jpeg = capturePhotoBytes();
TextRecognizer.recognize(jpeg).onResult((text, err) -> {
    if (err == null) Log.p("OCR: " + text);
});

cn1-ai-mlkit-barcode: Barcode and QR Scanning

TL;DR. Decodes QR, EAN, UPC, Data Matrix, PDF417, and the rest of the common 1D / 2D code families from a captured image.

Platforms.
iOS bridges to MLKitBarcodeScanning. Android bridges to com.google.mlkit:barcode-scanning. The JavaSE simulator returns an unsupported error. Use cases. Inventory scanning, ticket / boarding-pass readers, QR-driven onboarding flows, retail loyalty cards. Java byte[] jpeg = capturePhotoBytes(); BarcodeScanner.scan(jpeg).onResult((codes, err) -> { if (err == null) { for (String code : codes) Log.p("Found: " + code); } }); cn1-ai-mlkit-face: Face Detection TL;DR. Returns bounding boxes for human faces detected in an image. Each face is reported as a packed int[4] (x, y, width, height). Platforms. iOS bridges to MLKitFaceDetection. Android bridges to com.google.mlkit:face-detection. Use cases. Auto-crop a contact photo, mosaic / blur bystanders in a group shot, drive a face-tracked overlay for AR-lite filters. Java FaceDetector.detect(jpeg).onResult((boxes, err) -> { if (err != null) return; for (int i = 0; i < boxes.length; i += 4) { Log.p("face at " + boxes[i] + "," + boxes[i + 1] + " " + boxes[i + 2] + "x" + boxes[i + 3]); } }); cn1-ai-mlkit-labeling: Image Labeling TL;DR. "What is in this picture." Returns a list of descriptive labels for the image content. Platforms. iOS bridges to MLKitImageLabeling. Android bridges to com.google.mlkit:image-labeling. Use cases. Auto-tagging uploaded photos, content moderation pre-filters, content-based image search. Java ImageLabeler.label(jpeg).onResult((labels, err) -> { if (err == null) Log.p("labels: " + String.join(", ", labels)); }); cn1-ai-mlkit-translate: On-Device Translation TL;DR. Translate short text between supported language pairs entirely on-device; no server round-trip, no API key, works offline. Platforms. iOS bridges to MLKitTranslate. Android bridges to com.google.mlkit:translate. Languages are identified by their ISO 639-1 codes (en, fr, es, ...). Use cases. Offline travel assistants, chat translation, accessibility readers for foreign signage (combine with cn1-ai-mlkit-text). 
Java

Translator.translate("Where is the train station?", "en", "fr")
    .onResult((fr, err) -> {
        if (err == null) Log.p(fr); // "Où est la gare ?"
    });

cn1-ai-mlkit-smartreply: Short Reply Suggestions

TL;DR. Generates short suggested replies for chat conversations, similar to Gmail's Smart Reply chips.

Platforms. iOS bridges to MLKitSmartReply. Android bridges to com.google.mlkit:smart-reply. The input is a JSON array of {role, message, timestamp, userId} objects.

Use cases. A "quick reply" row above the keyboard in your in-app chat, response suggestions in a CRM inbox.

Java

String thread = "[{\"role\":\"remote\",\"message\":\"See you at 6?\","
        + "\"timestamp\":" + System.currentTimeMillis() + ","
        + "\"userId\":\"u42\"}]";
SmartReply.suggest(thread).onResult((suggestions, err) -> {
    if (err == null) {
        for (String s : suggestions) Log.p("suggestion: " + s);
    }
});

cn1-ai-mlkit-langid: Language Identification

TL;DR. Returns the most likely ISO 639-1 code for a given text, or und (undetermined) when the input is too short or ambiguous.

Platforms. iOS bridges to MLKitLanguageID. Android bridges to com.google.mlkit:language-id.

Use cases. Auto-route a customer-support message to the right team, pick the correct TTS voice for an arbitrary string, pre-screen input before running an expensive translation.

Java

LanguageIdentifier.identify("Bonjour le monde").onResult((code, err) -> {
    if (err == null) Log.p(code); // "fr"
});

cn1-ai-mlkit-pose: Pose Detection

TL;DR. Returns 33 skeletal landmarks per detected pose as a packed float[3 * 33] (x, y, confidence triples).

Platforms. iOS bridges to MLKitPoseDetection. Android bridges to com.google.mlkit:pose-detection.

Use cases. Fitness apps with form correction, dance/yoga timing analysis, gesture-driven controls.
Java

PoseDetector.detect(jpeg).onResult((landmarks, err) -> {
    if (err != null || landmarks.length < 99) return;
    float noseX = landmarks[0], noseY = landmarks[1], noseConf = landmarks[2];
    Log.p("nose at (" + noseX + ", " + noseY + ") conf=" + noseConf);
});

cn1-ai-mlkit-segmentation: Selfie Segmentation

TL;DR. Returns a per-pixel mask separating the person in the foreground from the background as byte[width * height] (0 = background, 255 = foreground).

Platforms. iOS bridges to MLKitSegmentationSelfie. Android bridges to com.google.mlkit:segmentation-selfie.

Use cases. Background replacement for video calls, sticker / portrait-mode effects, blur-the-background privacy filters.

Java

SelfieSegmenter.segment(jpeg).onResult((mask, err) -> {
    if (err == null) applyBackgroundReplacement(mask);
});

cn1-ai-mlkit-docscan: Document Scanner

TL;DR. Detects a rectangular document in a photo, perspective-corrects it, and writes the cropped JPEG to a temporary file. Returns the file path.

Platforms. iOS uses Apple's VisionKit + Core Image rectangle detection (no extra pod). Android uses com.google.android.gms:play-services-mlkit-document-scanner.

Use cases. "Scan to PDF" flows, expense apps that capture receipts, contract signing flows, ID-document capture.

Java

DocumentScanner.scanToFile(jpeg).onResult((path, err) -> {
    if (err == null) uploadDocument(path);
});

cn1-ai-tflite: TensorFlow Lite Interpreter

TL;DR. A general-purpose on-device inference engine. Bring your own .tflite model and run it against a float32 input tensor.

Platforms. iOS uses TensorFlowLiteSwift (Pods or Swift Package). Android uses org.tensorflow:tensorflow-lite + tensorflow-lite-support.

Use cases. Any custom on-device ML model your team trains or pulls from TF Hub. Image classification, simple regression, recommendation pre-filters.
Java

byte[] modelBytes = Util.readFully(Display.getInstance().getResourceAsStream(null, "/model.tflite"));
float[] input = featureVector();
Interpreter.run(modelBytes, input).onResult((output, err) -> {
    if (err == null) Log.p("model returned " + output.length + " values");
});

cn1-ai-whisper: Speech-to-Text via whisper.cpp

TL;DR. On-device transcription of a 16 kHz mono WAV file using a ggml-format Whisper model. The cn1lib bundles libwhisper.a.

Platforms. iOS uses the Accelerate framework; Android uses a JNI build of the same whisper.cpp core. Models (e.g. ggml-base.bin) are not bundled; ship the one your app expects under the app's resources or download it on first launch.

Use cases. Voice notes, accessibility transcription, offline dictation, podcast indexing.

Java

String modelPath = SecureStorage.getFilePath("ggml-base.bin");
String audioPath = recordWavToFile();
WhisperRecognizer.transcribe(modelPath, audioPath)
    .onResult((text, err) -> {
        if (err == null) Log.p("heard: " + text);
    });

cn1-ai-stablediffusion: On-Device Image Generation

TL;DR. Generates a JPEG from a text prompt using a bundled Stable Diffusion model. Multi-gigabyte payload, local build only.

Platforms. iOS uses Core ML pipelines compiled from the bundled model. Android uses ONNX Runtime. Both configurations exceed the cloud build server's 2 GB upload limit, so this cn1lib triggers the cn1.ai.requiresBigUpload guard and the cloud build aborts with a "build this one locally" message. Add it to a project you build via mvn cn1:buildAndroid / mvn cn1:buildIosXcodeProject on the developer machine.

Use cases. Avatar generation in apps where shipping to a cloud API is undesirable (offline-first apps, regulated industries, privacy-sensitive products).
Java

StableDiffusion.generate("a teal hot-air balloon over Lisbon, watercolour",
        512, 512, /* steps */ 25)
    .onResult((jpeg, err) -> {
        if (err == null) display(Image.createImage(jpeg, 0, jpeg.length));
    });

Why These Are cn1libs and Not Part of the Core

The core gets the AI plumbing every app that adopts AI at all wants: the LLM client, streaming, the chat UI, the secure storage primitive for credentials, the simulator Ollama redirect for offline iteration.

The cn1libs above are specialized verticals. Barcode scanning, document scanning, face detection, smart reply, pose detection, on-device translation, transcription, and on-device image generation are genuinely useful, but only for some apps. They also each bring a non-trivial native dependency. The Google ML Kit Android frameworks are large; the iOS pods carry their own weight; the bundled libwhisper.a and the Stable Diffusion model are big. Pulling all of them into the core would tax every app, whether the feature is used or not.

The Stable Diffusion cn1lib in particular is large enough that the cloud build server cannot accept the upload at all (it trips the 2 GB pre-upload guard). That kind of opt-in does not belong in a dependency every app inherits.

The corresponding chapter, including the full LlmClient API table, the ChatView reference, the SecureStorage overloads, the simulator Ollama redirect, and the full cn1lib coverage, is at AI, Chat UI, and Speech in the developer guide.

OAuth and OIDC: The Modern Identity Stack

The in-app-WebView Oauth2 flow that Codename One has shipped since approximately forever was the way every cross-platform mobile framework solved "sign in with Google / Facebook / Microsoft" in the 2010s. It is also the way every one of those identity providers stopped wanting you to solve it. Google has been blocking embedded user agents for years. Apple does not want third-party apps wrapping the Apple ID flow in a WKWebView. Microsoft and Facebook joined the chorus.
The right answer is the system browser: ASWebAuthenticationSession on iOS, Custom Tabs on Android, with PKCE on the wire. That is what PR #5018 lands. PR #5039 adds a portable WebAuthn / passkey client on top.

Sign In With Google (or Any OIDC Provider)

com.codename1.io.oidc.OidcClient is the entry point. Point it at the discovery URL of an OIDC provider, hand it the client id and the redirect URI you registered with the provider, and ask for tokens:

Java

OidcConfiguration cfg = OidcConfiguration.discover("https://accounts.google.com");
OidcClient client = OidcClient.builder()
    .configuration(cfg)
    .clientId("123-abc.apps.googleusercontent.com")
    .redirectUri("com.example.myapp:/oauthredirect")
    .scopes("openid", "email", "profile")
    .build();

client.signIn().onResult((tokens, err) -> {
    if (err != null) {
        OidcException oe = (OidcException) err;
        if (oe.getCode() == OidcException.USER_CANCELLED) return;
        Log.e(oe);
        return;
    }
    String idToken = tokens.getIdToken().raw();
    String email = tokens.getIdToken().getClaim("email").asString();
    proceed(email, idToken);
});

Discovery JSON parsed and cached. PKCE S256 challenge generated and verified. State and nonce checked on the callback. ID-token claims decoded for you (we deliberately do not verify the signature client-side; the dev guide is explicit about why and points at the "re-validate on your backend" remedy). Refresh and revoke are first-class. The token store is pluggable via TokenStore; the default is Storage-backed, but a Keychain-backed or in-memory variant is a small class.

On iOS the system-browser piece routes through ASWebAuthenticationSession. On Android it routes through androidx.browser.customtabs, with a plain ACTION_VIEW fallback for the rare device with no Custom Tabs provider. AuthenticationServices.framework and androidx.browser:browser are auto-linked when the classpath scanner sees OidcClient in use.
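For readers who have not met the "PKCE S256 challenge" mentioned above, here is a minimal sketch of the derivation defined in RFC 7636: the client keeps a random verifier secret and sends only its SHA-256 hash, base64url-encoded without padding. This is plain Java SE written for illustration; the class and method names (PkceSketch, newVerifier, challengeFor) are hypothetical and are not part of the Codename One API, which performs this step internally.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

// Illustrative only: hypothetical helper, not Codename One's implementation.
final class PkceSketch {

    // Generates a fresh code verifier: 32 random bytes, base64url-encoded
    // without padding, which yields a 43-character URL-safe string.
    static String newVerifier() {
        byte[] random = new byte[32];
        new SecureRandom().nextBytes(random);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(random);
    }

    // S256 method: challenge = BASE64URL(SHA-256(ASCII(verifier))).
    static String challengeFor(String verifier) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(verifier.getBytes(StandardCharsets.US_ASCII));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on Java SE platforms.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

The challenge travels with the authorization request; the verifier is sent only on the token exchange, so an app that intercepts the redirect cannot redeem the authorization code without it.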
Provider Wrappers: Google, Apple, Microsoft, Facebook, Auth0, Firebase

If you would rather not configure OIDC by hand, the existing social classes get a signIn(...) method that drives the same stack with the provider's issuer URL pre-wired:

Java

GoogleConnect.signIn(googleClientId,
        "com.example.myapp:/oauthredirect",
        "openid", "email", "profile")
    .onResult((tokens, err) -> { /* ... */ });

MicrosoftConnect.signIn(entraClientId,
        "msauth.com.example.myapp://auth",
        "User.Read")
    .onResult((tokens, err) -> { /* ... */ });

Auth0Connect.signIn("tenant.auth0.com", clientId, redirectUri,
        "openid profile email")
    .onResult((tokens, err) -> { /* ... */ });

FacebookConnect.signIn(...) follows the same shape against the Facebook OIDC endpoint. FirebaseAuth covers the REST-based Firebase auth surface (email/password, IdP token exchange, refresh) which sits underneath any provider hand-off you might want to drive from app code.

Sign In With Apple

Sign in with Apple is required on iOS for apps that offer any other social login, and on Android it must fall through to a web flow. com.codename1.social.AppleSignIn handles both transparently:

Java

AppleSignIn.signIn()
    .onResult((result, err) -> {
        if (err != null) return;
        String idToken = result.getIdToken();
        String code = result.getAuthorizationCode();
        proceedToBackend(idToken, code);
    });

On iOS 13 and later this drops directly into the native Apple sheet via ASAuthorizationAppleIDProvider. On non-iOS platforms it falls through to the same OIDC web flow as everything else, so a single line of app code does the right thing on every port. The Maven plugin injects the com.apple.developer.applesignin entitlement on iOS when it sees AppleSignIn in use; Android does not see it because it is not there.

Migration From the Legacy Oauth2

com.codename1.io.Oauth2 is now deprecated.
Existing code still compiles, but the migration is short and almost always shorter than what it replaces:

Java

// Before
Oauth2 oauth = new Oauth2("https://accounts.google.com/o/oauth2/auth",
        clientId, redirectUri);
oauth.setClientSecret(clientSecret);
oauth.setScope("openid email profile");
oauth.setBrowserComponent(myBrowserComponent); // tied to a WKWebView
String token = oauth.authenticate(); // blocks, opens the web view

Java

// After
OidcClient.builder()
    .configuration(OidcConfiguration.discover("https://accounts.google.com"))
    .clientId(clientId)
    .redirectUri(redirectUri)
    .scopes("openid", "email", "profile")
    .build()
    .signIn()
    .onResult((tokens, err) -> proceed(tokens.getIdToken().raw()));

You stop owning the browser. The OS owns it. The cookies live in the platform's authentication session. The user gets the same login experience they have everywhere else on their device.

WebAuthn/Passkeys

PR #5039 layers a portable WebAuthn client on top:

Java

WebAuthnClient client = WebAuthnClient.getInstance();
if (!client.isAvailable()) {
    fallbackToPassword();
    return;
}
PublicKeyCredentialCreationOptions opts =
        PublicKeyCredentialCreationOptions.fromServerJson(serverJson);
client.create(opts).onResult((cred, err) -> {
    if (err == null) postToRelyingParty(cred.toJson());
});

W3C JSON wire format in both directions, so the response can be POSTed verbatim to any standard server-side WebAuthn library. iOS 16+ routes through ASAuthorizationPlatformPublicKeyCredentialProvider; Android API 28+ through androidx.credentials.CredentialManager. Provider helpers: Auth0Connect.signInWithPasskey(...) / .registerPasskey(...) and FirebaseAuth.signInWithPasskey(...) / .registerPasskey(...).

One thing worth pulling out before you reach for it: if you sign in via OIDC against Google, Apple, Microsoft, Auth0, or Firebase, you usually already get passkeys for free. The identity provider runs the WebAuthn ceremony inside the system browser; OIDC just hands you the resulting tokens.
So you do not need WebAuthnClient for that case. You need it for apps that run their own relying-party backend, and for apps driving the Auth0 or Firebase passkey grants directly.

Full chapter: Authentication and Identity.

Connectivity: WiFi, Bonjour, USB, Network-Type Listeners

PR #5021 lands four packages for apps that need to do more with the network than open an HTTP socket. The shape:

Java

WiFi wifi = WiFi.getInstance();
String ssid = wifi.getCurrentSSID();
String bssid = wifi.getBSSID();
String gateway = wifi.getGateway();
String ip = wifi.getIp();

wifi.scan(new ScanOptions().setTimeoutMillis(5000))
    .onResult((results, err) -> { /* ... */ });

wifi.connect("MyNetwork", "hunter2", Security.WPA2_PSK)
    .onResult((success, err) -> { /* ... */ });

• com.codename1.io.wifi for WiFi info, scan, and connect.
• com.codename1.io.wifi.WiFiDirect for peer-to-peer (Android only by platform reality).
• com.codename1.io.bonjour for mDNS / Zeroconf via BonjourBrowser and BonjourPublisher.
• com.codename1.io.usb for USB host (Android only).

And NetworkManager.addNetworkTypeListener(...) plus NETWORK_TYPE_* constants so an app can react to a transition between cellular, WiFi, ethernet, or "none":

Java

NetworkManager.getInstance().addNetworkTypeListener(evt -> {
    int type = evt.getNetworkType();
    if (type == NetworkManager.NETWORK_TYPE_NONE) showOfflineBanner();
    else if (type == NetworkManager.NETWORK_TYPE_CELLULAR) suppressLargeBackgroundDownloads();
    else clearOfflineBanner();
});

iOS does not expose programmatic WiFi scanning to third-party apps; scan() throws UnsupportedOperationException on iOS. iOS also does not expose WiFi Direct or general USB host. None of those are Codename One limitations; they are Apple's. The dev guide is explicit about each platform's limits.

Three new compile-time defines (CN1_INCLUDE_WIFI_INFO, CN1_INCLUDE_HOTSPOT, CN1_INCLUDE_BONJOUR) wrap the iOS native code, set only when the classpath scanner sees the matching Java API in use.
Apps that do not use these APIs do not pay for them at App Store review time. Same pattern as the NFC gating from the previous release.

Full reference: Network Connectivity.

Share-Sheet Result Callbacks

PR #5036 closes a small but persistent gap: Display.share(...) and ShareButton finally tell you what the user did with the share sheet:

Java

ShareButton btn = new ShareButton();
btn.setTextToShare("Look at this fox");
btn.setImageToShare("/fox.jpg");
btn.setShareResultListener(result -> {
    switch (result.getStatus()) {
        case SHARED_TO:
            track("share_completed", result.getTargetPackage());
            break;
        case DISMISSED:
            track("share_dismissed");
            break;
        case FAILED:
            track("share_failed", result.getError());
            break;
    }
});

iOS routes through UIActivityViewController.completionWithItemsHandler; Android through Intent.createChooser with an IntentSender callback (API 22+). The framework normalizes the platform values into SHARED_TO(packageName), DISMISSED, or FAILED.

Appearing in Other Apps' Share Menus

The other half of sharing is the inverse direction: not "let the user share from your app", but "let your app receive content other apps share". If a user is in Safari, Photos, or Mail and taps the share icon, your app should be able to appear as a target there alongside Messages, WhatsApp, and Instagram.

On iOS that requires a separate Share Extension target inside the .ipa, with its own bundle, its own Info.plist, an App Group string that links it to the host app, and a ShareViewController that handles the incoming payload. Historically the recommendation was to bootstrap that target by hand in Xcode, copy the resulting files into the Codename One project under ios/app_extensions/, and let the build server's extractor consume them. It worked, but it was a workflow most teams put off because the setup is fiddly.

The same PR ships an IOSShareExtensionBuilder Mojo that does all of that for you.
A typical setup is one Maven command and a one-time configuration block:

XML

<plugin>
    <groupId>com.codenameone</groupId>
    <artifactId>codenameone-maven-plugin</artifactId>
    <configuration>
        <iosShareExtension>
            <bundleIdentifier>com.example.myapp.share</bundleIdentifier>
            <displayName>MyApp</displayName>
            <appGroup>group.com.example.myapp</appGroup>
            <acceptedContent>
                <content>PUBLIC_URL</content>
                <content>PUBLIC_IMAGE</content>
                <content>PUBLIC_TEXT</content>
            </acceptedContent>
        </iosShareExtension>
    </configuration>
</plugin>

Run mvn cn1:generate-ios-share-extension and the Mojo writes a complete .ios.appext bundle into ios/app_extensions/: the Info.plist with the right NSExtension activation rules for the content types you declared, the App Group entitlement, a minimal ShareViewController.swift that lands the payload in the App Group's UserDefaults(suiteName:), and the matching buildSettings.properties. The result feeds straight into the existing IPhoneBuilder.extractAppExtensions pipeline, so apps that already have a hand-rolled extension keep working unchanged.

On the host-app side, you read the payload on launch:

Java

// Anywhere after Display.init has run.
// readObject returns Object, so cast to the expected payload type.
String shared = (String) Storage.getInstance()
        .readObject("ios.shareExtension.lastPayload");
if (shared != null) {
    handleSharedPayload(shared);
}

After the next cloud or local build, your app appears in the iOS share sheet for the content types you declared. No Xcode work, no hand-rolled plist, no App Group string typed in three places. The build-time tooling owns it.

Wrapping Up

Tomorrow's post covers the architectural change in this release: a build-time bytecode annotation framework, the declarative router that is its first consumer, the SQLite ORM and JSON / XML mappers and component binder built on the same SPI, and the build-time SVG / Lottie transcoder that ships in the same release for related reasons.

Back to the weekly index.

By Shai Almog DZone Core CORE
5 Warning Signs Your Data Architecture Needs a Redesign (Before It Falls Apart)

Most data architectures don't fail all of a sudden. They show clear warning signs for months, or sometimes years, before anyone takes action. By that time, the damage is already done.

I have spent 20 years building and reviewing data platforms across industries (from CPG to healthcare to consumer tech), and I have learned to identify these signals early. The good news is that you can fix them before they become a disaster. The bad news is that most organizations ignore these signs until an AI initiative gets stuck, executives lose trust in reports and dashboards, or new hires quit because the system is too complex to understand and maintain.

Here are five critical signs that your data architecture needs a redesign, along with what to do about each one.

Sign 1: Your AI Initiatives Keep Stalling at the Data Layer

You've got the right team. You've picked the best models. You've invested in the necessary infrastructure. Still, your AI projects keep hitting the same blocker: they can't move past experimentation. The problem isn't your models. It's your data.

What's Actually Happening

AI systems need three things that most legacy data architectures don't provide:

• Semantic layer: Clear definitions of what your data means
• Data lineage: Traceability of where data came from and how it was transformed
• Governed access: Controlled, policy-driven data access at scale

Without these, your AI models are working with incomplete or inconsistent information. They might produce results, but you can't trust them. And when business leaders ask, "Why did the model make this decision?" you can't answer.

The Architecture Gap

This is what an AI-ready architecture looks like. Most architectures skip the middle layers. They ingest raw data and may have built some curated/gold-layer tables, but they have nothing in between. That's why AI fails.

What to Fix?
• Add a semantic layer that defines business metrics consistently across teams
• Implement active metadata that tracks lineage automatically
• Build governed access into your architecture, not as a separate policy document

AI readiness starts in the architecture, not in the model you picked.

Sign 2: Different Teams Get Different Answers From the Same Data

The marketing department says revenue is $10M. The finance department says it's $9.2M. The CEO's dashboard shows $10.5M. Everyone's using the same source data. Yet nobody agrees.

This isn't a reporting problem; it is a semantic layer problem. When you don't have a centralized definition of what "revenue" means (or any other business metric), every team creates its own version. Marketing might count revenue when a campaign is launched. Finance counts it when payment is recognized. The executive dashboard might include projected revenue. All are "correct," but they don't match.

The Cost of Inconsistency

The Architecture You Need

When everyone uses the same semantic definitions, numbers align. Trust returns. Decisions happen faster.

What to Fix?

• Define business metrics once in a centralized semantic layer
• Enforce those definitions across all reporting tools
• Document the logic in a central place like Confluence so anyone can trace how a number was calculated

Sign 3: Your Governance Lives in a Document That Nobody Reads

You have a data governance policy. It is a .docx or .pdf file sitting in a SharePoint or Confluence site. No one has opened it for a very long time. Meanwhile, your team handles data access requests manually, someone has forgotten to tag sensitive data, and nobody knows which downstream systems are consuming PII.

Governance in 2026 is embedded in the architecture, not sitting in a document somewhere. Real governance is not about people remembering to follow rules. It's about systems that automatically enforce them.
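To make "systems that automatically enforce them" concrete, here is a minimal sketch of a policy check that runs before a query is served: columns carry sensitivity tags, roles are granted tags, and anything untagged is denied by default. Every name in it (QueryPolicySketch, Tag, the roles and columns) is hypothetical and chosen for illustration; a real platform would delegate this to its catalog or policy engine rather than in-memory maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a toy policy check, not a production policy engine.
final class QueryPolicySketch {

    // Hypothetical sensitivity tags attached to columns by metadata tooling.
    enum Tag { PUBLIC, PII }

    private final Map<String, Tag> columnTags = new HashMap<>();
    private final Map<String, Set<Tag>> roleGrants = new HashMap<>();

    void tagColumn(String column, Tag tag) {
        columnTags.put(column, tag);
    }

    void grant(String role, Tag... tags) {
        Set<Tag> granted = EnumSet.noneOf(Tag.class);
        granted.addAll(Arrays.asList(tags));
        roleGrants.put(role, granted);
    }

    // Returns the requested columns this role may read, in request order.
    // Untagged columns and unknown roles are denied by default.
    List<String> allowedColumns(String role, List<String> requested) {
        Set<Tag> granted = roleGrants.getOrDefault(role, Collections.<Tag>emptySet());
        List<String> allowed = new ArrayList<>();
        for (String column : requested) {
            Tag tag = columnTags.get(column);
            if (tag != null && granted.contains(tag)) {
                allowed.add(column);
            }
        }
        return allowed;
    }

    // Small fixture: one public column, one PII column, two roles.
    static QueryPolicySketch demo() {
        QueryPolicySketch policy = new QueryPolicySketch();
        policy.tagColumn("order_total", Tag.PUBLIC);
        policy.tagColumn("email", Tag.PII);
        policy.grant("analyst", Tag.PUBLIC);
        policy.grant("privacy_officer", Tag.PUBLIC, Tag.PII);
        return policy;
    }
}
```

With the demo fixture, an analyst asking for order_total, email, and an untagged column gets back only order_total. The deny-by-default branch is the design choice that matters: a column someone forgot to tag stays hidden instead of leaking.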
Old way (broken):

• Policy documents
• Training sessions
• Manual access reviews
• Periodic audits

Modern way (embedded):

• Automated lineage tracking
• Active metadata that tags sensitive data
• Policy enforcement at the query level
• Continuous compliance monitoring

Embedded Governance

Every query is checked against policies. Sensitive data is tagged automatically. Lineage is tracked without human input. Governance happens by design, not by reminders.

What to Fix?

• Move governance from documents to code (policy engines, access controls)
• Implement active metadata that automatically tags and classifies data
• Build lineage tracking into your pipeline tooling
• Enforce policies at the query layer, not as a post-check

Sign 4: Security Was Designed for Humans, Not for AI Agents

Your security model works great for analysts querying dashboards, data engineers running pipelines, and data scientists building models. But here is what it was not built for: AI agents that query your data autonomously, all the time, at scale, without a human in the loop (HITL).

The New Access Pattern

Old access:

• A human queries the data
• A human reviews the results
• A human decides what to do with the results

AI agent access:

• Agents query the data continuously
• Agents process thousands of rows automatically
• Agents make decisions without a human reviewing them
• Agents scale across multiple data sets

The Security Gap

If your security model assumes humans are always involved, you end up with a growing security gap.

Security for AI Agents

You need fine-grained, policy-driven security that works for both human and machine users.

What to Fix?

• Implement column-level security (not just object-level)
• Add rate limits and quotas for AI agents
• Log all access in real time with anomaly detection
• Use context-aware policies that consider the query intent, not just the user role

Sign 5: A New Engineer Needs Months to Understand the Architecture of the System

A new data engineer joined your team.
They are smart, experienced, and highly motivated. But two months in, with onboarding complete, they still can't confidently answer "Where does this metric come from?" or "What happens if I change this part of the pipeline?"

Do you think it is a hiring problem? No, certainly not. It's an architecture problem.

Great Architecture Is Maintainable

If onboarding an engineer takes longer than it should, the design is the issue, not the engineer. Here are the red flags that you should pay attention to.

Red flags:

• No clear data modelling standards
• Missing or incomplete metadata
• Poorly defined ownership (who owns this table?)
• Fragmented pipeline design (no standard, consistent pattern)
• Documentation that is missing or outdated

Maintainable Architecture Principles

When these are part of your architecture, new hires can navigate the system in weeks instead of months.

What to Fix?

• Standardize data modelling across the organization
• Generate metadata automatically (don't depend on manual documentation)
• Use a consistent pattern for pipelines (same design, same tools, same naming standards)
• Assign clear ownership for every data product and data domain
• Auto-generate documentation for newly created pipelines and code

The fundamentals never change, but the layers around them have.

After working for 20 years in this space, I have noticed that the core principles of data architecture remain the same.

What never changes:

• Data modeling
• Schema design
• Aligning with business outcomes and requirements

What has matured over time:

• Governance (embedded rather than document-driven)
• Metadata (active rather than passive)
• Semantic layer (centralized, not scattered across teams)
• Security (AI-aware, not human-only)
• AI readiness (architecture first, not model first)

If these modern layers are missing from your architecture, now is the time to add them. Not when AI initiatives stall, not when executive leaders lose trust in the data, not when your best engineers quit because the system became too complex.
Which Sign Resonates Most With You?

I have worked with companies that have faced all five of these signs. Some are dealing with one. Most are dealing with three or four. The question is not whether you have these problems. The question is: which one is costing you the most right now?

• Is it AI initiatives that can't move forward?
• Is it teams that can't agree on basic metrics?
• Is it governance that exists only in a document?
• Is it a security gap that you're discovering too late?
• Is it engineers who can't navigate your architecture?

Pick the one that's most urgent and start there. You don't need to solve everything at once. But do start. Before the warning signs become breaking points.

By Rajanikantarao Vellaturi

Top AI/ML Experts


Tuhin Chattopadhyay

AI Decision Intelligence Scholar-Practitioner | Founder, Tuhin AI Advisory | Professor & Area Chair, AI & Analytics,
JAGSoM

Dr. Tuhin Chattopadhyay is an AI Decision Intelligence scholar-practitioner, enterprise AI advisor, and Professor & Area Chair, AI & Analytics, at Jagdish Sheth School of Management, Bengaluru. He works at the intersection of artificial intelligence, analytics, governance, and executive decision-making, with a current focus on helping organizations move from fragmented AI adoption to agentic, governed, and decision-aware AI systems. Through Tuhin AI Advisory, he advises enterprises and institutions on AI strategy, analytics transformation, generative AI, agentic AI, AI governance, and executive capability building. His teaching, research, and advisory work span predictive analytics, machine learning, modern data architecture, decision intelligence, and responsible enterprise AI. Recognized among India’s Top 10 Data Scientists by Analytics India Magazine, Dr. Chattopadhyay regularly writes and speaks on AI, analytics, agentic systems, enterprise transformation, and the future of decision-making in organizations.

Frederic Jacquet

Technology Evangelist,
AI[4]Human-Nexus

My goal is to deepen my research and analysis to track technological developments and understand their real impacts on businesses and individuals. I focus on untangling exaggerated perceptions and irrational fears from genuine technological advances. My approach is critical: I aim to move beyond myths and hype to identify the concrete, realistic progress we can expect from new technologies.

Pratik Prakash

Principal Solution Architect,
Capital One

Pratik, an experienced solution architect and passionate open-source advocate, combines hands-on engineering expertise with extensive experience in multi-cloud and data science. Leading transformative initiatives across current and previous roles, he specializes in large-scale multi-cloud technology modernization. Pratik's leadership is highlighted by his proficiency in developing scalable serverless application ecosystems, implementing event-driven architecture, deploying AI-ML & NLP models, and crafting hybrid mobile apps. Notably, his strategic focus on an API-first approach drives digital transformation while embracing SaaS adoption to reshape technological landscapes.

The Latest AI/ML Topics

Why AI-Generated Code Is Making Regression Testing More Important, Not Less
AI-generated code introduces integration failures that spec-based tests cannot catch. Regression testing grounded in real production behavior is the fix.
July 1, 2026
by Sancharini Panda
· 182 Views
AI-Augmented React Development: How I Rebuilt My Workflow Without Losing Control of the Code
AI accelerates React 18 workflows but breaks down in large enterprise codebases. Here’s where it helps, where it fails, and the guardrails your team needs.
July 1, 2026
by Sathwik Nagulapati
· 229 Views
If You Can Facilitate a Retrospective, You Can Audit Your AI
Learn how the AI Delegation Audit helps Scrum teams inspect AI workflows, catch automation drift, and keep delegated AI work safe and accountable.
July 1, 2026
by Stefan Wolpers DZone Core CORE
· 216 Views
Loop Engineering: The Layer After Prompt, Context, and Harness Engineering
This article walks through all four layers side by side, with comparison tables for when to use each one and which agent architecture fits which job.
July 1, 2026
by Vidyasagar (Sarath Chandra) Machupalli FBCS DZone Core CORE
· 295 Views · 1 Like
The New Senior Developer Job Description: Half Engineer, Half AI Systems Architect
Senior developers now own two roles: traditional engineering plus AI systems architecture. This split reshapes compensation, hiring, and what 'senior' actually means.
June 30, 2026
by Dinesh Elumalai DZone Core CORE
· 835 Views · 2 Likes
Architecting Trustworthy AI: Engineering Patterns for High-Stakes Environments
This post presents three domain-agnostic engineering patterns for building AI systems that remain safe even when the model is wrong.
June 29, 2026
by Sujay Puvvadi
· 692 Views
Building Production-Safe Agentic Remediation With Docker MCP Gateway: Lessons From 43% to 100% Accuracy
We built an AI Docker remediation system on MCP Gateway. First version: 43% correct. After 9 engineering fixes: 100%. Here's what changed.
June 29, 2026
by Mohammad-Ali Arabi
· 819 Views
Black Swan Bugs: Paving the Way for New Roles in Software Engineering
By outsourcing more of our thinking to probabilistic systems, we risk weakening the very human habit black swans demand: the habit of asking the right questions.
June 29, 2026
by Stelios Manioudakis, DZone Core
· 2,714 Views · 3 Likes
Why Requirements Are Becoming the Control Layer in AI-Assisted Development
As AI generates more code and tests, requirements become the control layer that keeps delivery consistent, traceable, and aligned with the system context.
June 29, 2026
by Andrei Lavygin
· 748 Views
Before the AI Coding Agent Writes Code: Structuring Scattered Requirements With PARA
AI coding agents often fail when the required context is scattered. The fix is to prepare better context before the agent writes code.
June 29, 2026
by Venkata Naga Satya Sai Vineeth Kondisetty
· 617 Views
The New Insider Threat Isn't Human: Securing AI Agents Before They Secure Themselves
AI agents are becoming powerful insiders. Learn how identity, MCP security, least privilege, and policy enforcement reduce emerging risks.
June 26, 2026
by Igboanugo David Ugochukwu, DZone Core
· 1,671 Views · 1 Like
Data Pipeline Observability: Why Your AI Model Fails in Production
Your machine learning model had 95% accuracy in testing but crashes in production. The problem isn't the model; it's your data pipeline.
June 26, 2026
by Abhilash Rao Mesala
· 936 Views
Two Clocks Are Running Out at Once, and Almost Nobody Is Watching Both
Quantum computing and AI coding tools are changing security. Learn why crypto-agility and better governance are now critical.
June 26, 2026
by Igboanugo David Ugochukwu, DZone Core
· 1,752 Views · 2 Likes
What Cloud Engineers Actually Need to Know About AI Infrastructure
AI infrastructure isn’t just about GPUs. Most issues come from storage, networking, and data pipelines. If GPU utilization is low, check the infrastructure first, not the model.
June 26, 2026
by Naveen Kalapala
· 924 Views
Beyond Software Hope: The Engineering Blueprint for AI Execution Truth
This engineering blueprint details how to replace "software hope" with deterministic, hardware-level enforcement via TEEs and the Citadel protocol.
June 25, 2026
by Theo Ezell
· 1,163 Views · 1 Like
Code and Connect: MCP + MuleSoft
Understand MCP, AI agents, and assistants, and learn how Model Context Protocol connects AI applications to tools using MuleSoft.
June 25, 2026
by Ajay Singh
· 1,148 Views
The AI Definition of Done
The AI Definition of Done: human-in-the-loop is not a quality standard; you need a different approach for agent harnesses or operational excellence.
June 25, 2026
by Stefan Wolpers, DZone Core
· 988 Views
AI, OAuth, and Other Platform APIs in the Core
Deeper AI integration in the framework core, modern authentication via OAuth / OIDC and WebAuthn passkeys driven from the system browser, and a few smaller additions.
June 24, 2026
by Shai Almog, DZone Core
· 1,587 Views · 1 Like
5 Warning Signs Your Data Architecture Needs a Redesign (Before It Falls Apart)
Five signs your data architecture needs a redesign, and how to fix it with a semantic layer, active metadata, embedded governance, AI security, and maintainable design.
June 24, 2026
by Rajanikantarao Vellaturi
· 1,219 Views · 1 Like
AI Broke Your Definition of Done
When a machine writes most of the code, "the code shipped" stops being a finish line. The work that's left is the work your definition of done was already skipping.
June 24, 2026
by Matt Watson
· 937 Views · 2 Likes
