DZone
Software Design and Architecture

Software design and architecture focus on the development decisions made to improve a system's overall structure and behavior in order to achieve essential qualities such as modifiability, availability, and security. The Zones in this category are available to help developers stay up to date on the latest software design and architecture trends and techniques.

Functions of Software Design and Architecture

Cloud Architecture

Cloud architecture refers to how technologies and components are built in a cloud environment. A cloud environment comprises a network of servers that are located in various places globally, and each serves a specific purpose. With the growth of cloud computing and cloud-native development, modern development practices are constantly changing to adapt to this rapid evolution. This Zone offers the latest information on cloud architecture, covering topics such as builds and deployments to cloud-native environments, Kubernetes practices, cloud databases, hybrid and multi-cloud environments, cloud computing, and more!

Containers

Containers allow applications to run more quickly across many different development environments, and a single container encapsulates everything needed to run an application. Container technologies have exploded in popularity in recent years, leading to diverse use cases as well as new and unexpected challenges. This Zone offers insights into how teams can solve these challenges through its coverage of container performance, Kubernetes, testing, container orchestration, microservices usage to build and deploy containers, and more.

Integration

Integration refers to the process of combining software parts (or subsystems) into one system. An integration framework is a lightweight utility that provides libraries and standardized methods to coordinate messaging among different technologies. As software connects the world in increasingly complex ways, integration makes it all possible by facilitating app-to-app communication. Learn more about this necessity for modern software development by keeping a pulse on industry topics such as integrated development environments, API best practices, service-oriented architecture, enterprise service buses, communication architectures, integration testing, and more.

Microservices

A microservices architecture is a development method for designing applications as modular services that seamlessly adapt to a highly scalable and dynamic environment. Microservices help solve complex issues such as speed and scalability, while also supporting continuous testing and delivery. This Zone will take you through breaking down the monolith step by step and designing a microservices architecture from scratch. Stay up to date on the industry's changes with topics such as container deployment, architectural design patterns, event-driven architecture, service meshes, and more.

Performance

Performance refers to how well an application conducts itself compared to an expected level of service. Today's environments are increasingly complex and typically involve loosely coupled architectures, making it difficult to pinpoint bottlenecks in your system. Whatever your performance troubles, this Zone has you covered with everything from root cause analysis, application monitoring, and log management to anomaly detection, observability, and performance testing.

Security

The topic of security covers many different facets within the SDLC. From focusing on secure application design to designing systems to protect computers, data, and networks against potential attacks, it is clear that security should be top of mind for all developers. This Zone provides the latest information on application vulnerabilities, how to incorporate security earlier in your SDLC practices, data governance, and more.

Latest Premium Content
Trend Report: Security by Design
Trend Report: Kubernetes in the Enterprise
Refcard #291: Code Review Core Practices
Refcard #392: Software Supply Chain Security

DZone's Featured Software Design and Architecture Resources

The New Insider Threat Isn't Human: Securing AI Agents Before They Secure Themselves

By Igboanugo David Ugochukwu, DZone Core
In mid-September 2025, engineers inside Anthropic's threat intelligence team noticed something that didn't fit the usual pattern of automated probing on their platform. Ten days of digging later, they had a name for it: GTG-1002, a Chinese state-sponsored group that had turned Claude Code into the operational core of a cyber-espionage campaign against roughly thirty organizations (banks, chemical manufacturers, tech firms, government agencies). When Anthropic published its account of the intrusion on November 14, the detail that made security teams sit up wasn't the target list. It was the autonomy ratio: by the company's own estimate, the AI agent executed somewhere between 80 and 90 percent of the operation (reconnaissance, vulnerability discovery, exploit development, lateral movement, exfiltration), with humans stepping in only at a handful of strategic checkpoints. Jacob Klein, who heads threat intelligence at Anthropic, called it an escalation that lowers the bar for who can run a sophisticated intrusion at all.

I've spent the better part of this year watching that bar keep dropping, one disclosure at a time. And the thing I keep coming back to is this: the security industry built thirty years of tooling around the assumption that the dangerous actor inside your network is a person: a careless employee, a disgruntled admin, a phished contractor. That assumption is now wrong often enough to be a liability. The dangerous actor increasingly has no payroll record, no badge, no manager to flag erratic behavior. It's a process. And it's already inside.

Skeleton Keys for Software

Here's the uncomfortable arithmetic. CyberArk's 2025 Identity Security Landscape study found machine identities now outnumber human ones by more than 80 to 1 inside the average enterprise, with AI specifically named as the biggest driver of new privileged accounts this year.
Other measurements land in a wide band (Rubrik Zero Labs put it at 82 to 1, Entro Labs measured DevOps-heavy environments at 144 to 1), but every credible estimate points in the same direction, and the gap is widening faster than anyone's governance program.

What makes this dangerous isn't the count. It's the habit. Most teams I've talked with over the past eighteen months reached for the path of least resistance when they first wired an agent into production: they handed it a copy of a human's API key, or a service account with the same standing privileges everyone else in that pipeline already had. It's the software equivalent of cutting a spare house key and leaving it under the mat: convenient until the day someone you didn't intend finds it.

That convenience is exactly what blew up Salesloft and its customers in August 2025. Attackers tracked as UNC6395 didn't breach Salesforce. They stole OAuth tokens belonging to Drift, a chatbot integration plugged into it, and used those long-lived, broadly scoped tokens to walk into Salesforce, Slack, AWS, and Google Workspace environments at more than 700 downstream organizations, Cloudflare and Google among them, over roughly a ten-day window. Nobody compromised the platform. They compromised the credential that the integration was trusted with, and that credential opened far more doors than the integration's actual job required. Swap "chatbot integration" for "AI agent," and you've described the exact failure mode every analyst is now warning about for 2026.
The fix that keeps surfacing in serious architecture conversations isn't exotic. It's the same zero-trust logic that's been preached at humans for a decade, finally pointed at software:

| | Skeleton-key model | Scoped-identity model |
|---|---|---|
| Credential | Copied human API key or shared service account | Unique identity per agent, issued via OAuth client credentials or a workload-identity standard like SPIFFE |
| Lifetime | Static, often unrotated for months or years | Short-lived, reissued per session or task |
| Blast radius if stolen | Everything that account can touch | Only what that specific agent was scoped to do |
| Auditability | "Someone" did this | This agent, acting on this task, did this |

None of this is theoretical anymore. Gartner is telling boards that by 2028, roughly a third of enterprise applications will carry embedded agentic AI, and 15 percent of day-to-day work decisions will be made without a human in the loop. You cannot run that volume of autonomous action on credentials designed for an employee who logs in, does a job, and logs out.

When the Prompt Is the Payload

If identity is the slower-burning problem, prompt injection is the one that's already setting things on fire. OWASP's 2025 Top 10 for LLM Applications kept it at the number-one slot for a second consecutive edition, and for good reason: an LLM has no architectural separation between "instructions I should obey" and "data I should merely read." Feed it both in the same channel, and a sufficiently clever attacker can make the model treat the second as the first.

The cleanest public demonstration of how bad this gets in practice is CamoLeak, the vulnerability that researcher Omer Mayraz disclosed through Legit Security in October 2025, tracked as CVE-2025-59145 with a CVSS score of 9.6.
The setup was almost playful: hide an instruction inside a pull request's invisible comment field, wait for a developer to ask GitHub Copilot Chat to review that PR, and let Copilot, operating with that developer's own repository privileges, quietly search the codebase for strings like "AWS_KEY," then exfiltrate whatever it found one character at a time. Each character got mapped to its own GitHub-hosted image URL, routed through GitHub's own trusted Camo proxy so the outbound traffic looked like nothing more than a chat window rendering a picture. Legit Security's CTO, Liav Caspi, put the core problem plainly: a vigilant network monitor might catch the unusual request pattern, but the average user or maintainer almost certainly wouldn't. GitHub closed the hole in August by disabling image rendering in Copilot Chat entirely: a blunt fix, but an honest acknowledgment that there was no elegant patch for the underlying design flaw.

What should worry you is that CamoLeak is GitHub-specific plumbing wrapped around a generic problem. Any agent that reads untrusted content and can also take action (summarize an inbox, browse a webpage, query a ticketing system) has the same exposed nerve. The attack surface isn't the code. It's the fact that the model can't reliably tell an instruction from a sentence describing one.

MCP Didn't Invent the Confused Deputy. It Industrialized It.

The Model Context Protocol turned eighteen months old this past spring, and in agent circles it's already being described, only half-jokingly, as the USB-C of AI tooling: a single standard that lets an agent plug into dozens of databases, SaaS platforms, and internal systems without custom integration code for each one. That convenience is precisely why it became 2025's most interesting new attack surface. CVE-2025-49596, rated 9.4, let attackers run arbitrary commands through unauthenticated MCP Inspector instances.
CVE-2025-6514, found in the widely used mcp-remote project, hit 9.6 and gave attackers OS-level command execution simply by getting an MCP client to connect to a malicious server. Researchers at Invariant Labs separately showed they could pull private repository data and WhatsApp message history out through MCP integrations that trusted server-supplied tool descriptions a little too much.

That last detail is the one practitioners now call tool poisoning, and it deserves more attention than it gets. An MCP server doesn't just expose a function; it ships a natural-language description of that function for the model to read. Bury a hidden instruction inside that description, and the agent absorbs it as context with the same credulity it would extend to legitimate documentation. Layer in what researchers call a rug pull (a tool that behaved safely last week silently swapping in malicious behavior this week, with no re-approval prompt) and you've got a supply chain risk that traditional dependency scanning has no vocabulary for.

Underneath all of it sits the same architectural sin the original insider-threat literature has been naming for years: authorization quietly divorcing from authentication. An MCP server executing a database query on an agent's behalf needs to know not just that the agent is who it claims to be, but what the human or task behind that request was actually authorized to do. Skip that check, and you've built a confused deputy that will dutifully escalate its own privileges on a stranger's behalf.

Where the Policy Engine Has to Live

The architecture pattern that's converging across the vendors and practitioners I trust most isn't subtle, and that's its strength.
You insert a policy decision point (Cerbos, Open Policy Agent, or an equivalent) directly in the path between the agent's tool calls and the systems those calls touch, so that nothing executes on trust alone:

Plain Text

    User
      |
      v
    AI Agent ----(declares identity + intent)----> Policy Engine (PDP)
      ^                                                 |
      |                                          allow? | deny?
      |                                                 v
      |                                            MCP Server -----> Database / API
      |                                                 |
      +---------------------(action result)-------------+

The point of that middle box is to ask a boring, specific question on every single call: which agent is this, what was it actually asked to do, and does this particular action fall inside that scope? "Only SalesBot may call lookup_customer." "Any transfer above a threshold requires a human approval step before the MCP server executes it." None of that logic lives in the model's good judgment, because the model's judgment is exactly what prompt injection is designed to corrupt. The enforcement has to sit somewhere a crafted sentence can't reach it.

This is also, not coincidentally, where the Cloud Security Alliance's "toxic cloud trilogy" (a public workload, a real vulnerability, and standing high-level privilege, all present at once) actually gets defused. CSA's own telemetry shows that the combination was present in 38 percent of workloads in early 2024, down to 29 percent by mid-2025, as organizations started pulling standing privilege out of the equation. That's real progress. It's also nowhere near fast enough for the rate at which agents are being deployed.

What 2026 Actually Requires

I don't think the next twelve months are going to be defined by a single dramatic breach, although there will probably be one anyway.
I think they'll be defined by something quieter and more structural: the slow, overdue migration of agents off static, shared credentials and onto something closer to what SPIFFE and SPIRE were originally built for in the service-mesh world: short-lived, cryptographically verifiable, per-workload identity that can be issued, scoped, and revoked without anyone touching a spreadsheet of API keys. OWASP published a dedicated Non-Human Identity Top 10 in 2025 for exactly this reason; the existing application-security and human-IAM playbooks simply don't have entries for credentials that never sleep, never request access, and inherit whatever standing permission happens to be sitting there.

The governance gap is still wide open. Recent industry surveys put the share of organizations with mature agent-governance programs below one in five, even as more than ninety percent of security leaders rate the problem as critical. That mismatch (high anxiety, low operational maturity) is usually the exact condition under which the expensive breach happens.

My honest read, after a year of watching this space accelerate: the organizations that treat their agents as first-class, individually identified, least-privileged principals from day one will look unremarkable in hindsight. The ones that didn't will be writing the incident reports everyone else cites in 2027.
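To make the policy decision point described in this article more concrete, here is a minimal Python sketch of the per-call check. It is not the API of Cerbos or Open Policy Agent; the agent names other than SalesBot, the `transfer_funds` tool, the `ToolCall` shape, and the threshold value are all assumptions made up for illustration. The only ideas taken from the article are the two example rules and the principle that the decision is made outside the model, denying by default.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation, as seen by the policy engine (hypothetical shape)."""
    agent_id: str                 # verified identity of the calling agent
    tool: str                     # tool the agent wants the MCP server to run
    amount: float = 0.0           # only meaningful for transfer-style tools
    human_approved: bool = False  # set by an out-of-band approval step


# Hypothetical policy data: which agent is scoped to which tools.
ALLOWED_TOOLS: Dict[str, Set[str]] = {
    "SalesBot": {"lookup_customer"},
    "FinanceBot": {"transfer_funds"},
}
APPROVAL_THRESHOLD = 10_000.0  # assumed value, for illustration only


def decide(call: ToolCall) -> Tuple[bool, str]:
    """Return (allowed, reason). Anything not explicitly allowed is denied."""
    scoped_tools = ALLOWED_TOOLS.get(call.agent_id, set())
    if call.tool not in scoped_tools:
        return False, f"{call.agent_id} is not scoped for {call.tool}"
    needs_approval = (
        call.tool == "transfer_funds" and call.amount > APPROVAL_THRESHOLD
    )
    if needs_approval and not call.human_approved:
        return False, "transfer above threshold requires human approval"
    return True, "within scope"
```

Because `decide` only reads structured fields such as the agent identity and the tool name, text injected into the model's context has no direct way to change its outcome; a real deployment would express the same rules in the policy engine's own language rather than in application code.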
Selective Deployment in Azure Data Factory: A Practical Blueprint for Safer CI/CD

By Sauhard Bhatt
Picture this: two features are being developed in parallel.

  • One has already been tested in lower environments, but is still awaiting business approval
  • The other is fully validated and ready to go live

Naturally, you want to release the second feature to production. But you can’t, because your deployment model forces you to release everything together. If you’ve worked with Azure Data Factory (ADF), this situation probably sounds familiar.

ADF is a cloud-based data integration service from Microsoft that helps you build and orchestrate data pipelines across systems. It works extremely well for managing data workflows, but when it comes to deployments at scale, things get tricky.

As our ADF usage grew across multiple teams and environments, we started running into a recurring problem:

  • We had control over development, but very little control over what actually got deployed
  • A simple pipeline fix could unintentionally introduce unrelated changes
  • Parallel feature development became harder to manage
  • Production releases became riskier than they needed to be

That’s when we realized: the issue wasn’t ADF itself; it was the deployment model we were relying on.

This article walks through how we addressed that challenge by implementing a selective deployment pattern, allowing us to promote only intended changes without impacting everything else.

The Real Problem: Parallel Feature Releases in ADF

Before diving into the solution, let’s look at a scenario that frequently occurs in real-world teams.
What This Diagram Represents

This diagram shows two features progressing across environments:

Feature 100
  • Developed earlier, successfully deployed to Dev and Test
  • Currently in UAT (User Acceptance Testing)
  • Still awaiting business approval before production

Feature 200
  • Developed later, successfully completed across Dev → Test → UAT
  • Fully validated and ready for production

Expected Behavior

At this stage, the expectation is straightforward: “Let’s release Feature 200 to production.” Feature 100 is still under testing, so it should remain in UAT.

What Actually Happens in ADF

Azure Data Factory follows a full-state deployment model. That means when you deploy, you are not deploying a feature; you are deploying the entire factory state. So when you attempt to release Feature 200:

  • Feature 100 gets included automatically
  • You cannot isolate Feature 200
  • You lose control over what reaches production

Why This Becomes a Real Problem

This isn’t an edge case; it becomes a recurring pattern in larger environments. You’ll encounter this when:

  • Multiple teams are working in parallel
  • Features move at different speeds
  • UAT cycles vary
  • Production fixes need to be released quickly

It becomes even more complex when:

  • Existing production pipelines are modified
  • Partial updates are required
  • Dependencies overlap across features

The Core Limitation: ADF promotes state, not intent. It does not differentiate between what is ready for production and what is still under testing.

Why We Had to Rethink Deployment

This limitation introduced real risks:

  • Accidental promotion of incomplete features
  • Delayed production releases
  • Increased coordination overhead
  • Higher chances of breaking stable pipelines

We needed a way to:

  • Promote only Feature 200
  • Keep Feature 100 in UAT
  • Avoid impacting unrelated artifacts
  • Reduce production risk

Architecture Overview

To address this challenge, we introduced a selective packaging layer between build and deployment.
Flow

Feature Branch → PR → Validate → Selective Packaging → ARM Export → Incremental Deploy → Trigger Control

Key Idea: Instead of exporting ARM templates from the full ADF repository, we export from a filtered staging folder containing only the required artifacts.

Understanding Default ADF Deployment Behavior

Before implementing selective deployment, it’s important to understand how Azure Data Factory works by default. ADF follows a full-state deployment model.

How Default ADF Deployment Works

When you use ADF with Git integration:

  • Developers work in a collaboration branch (typically main)
  • Changes are committed and merged via pull requests
  • ADF provides a Publish button in the UI

When you click Publish, ADF generates ARM templates representing the entire factory state. These templates are stored in the adf_publish branch.

In modern setups, instead of clicking Publish manually, teams often use @microsoft/azure-data-factory-utilities (npm-based export). This allows pipelines to validate ADF resources and export ARM templates programmatically.
YAML

    - name: Validate ADF resources
      run: |
        set -euo pipefail
        FACTORY_ID="/subscriptions/${{ env.SUBSCRIPTION_ID }}/resourceGroups/${{ env.RESOURCE_GROUP }}/providers/Microsoft.DataFactory/factories/${{ env.SOURCE_FACTORY_NAME }}"
        npm run build validate "${{ github.workspace }}" "$FACTORY_ID"

YAML

    - name: Export ARM templates (CI publish)
      run: |
        set -euo pipefail
        FACTORY_ID="/subscriptions/${{ env.SUBSCRIPTION_ID }}/resourceGroups/${{ env.RESOURCE_GROUP }}/providers/Microsoft.DataFactory/factories/${{ env.DEV_FACTORY_NAME }}"
        npm run build export "${{ github.workspace }}" "$FACTORY_ID" "${{ env.ARM_OUTPUT_DIR }}"

Whether you click Publish manually or use npm export in CI/CD, the outcome is the same:

  • Full factory deployment
  • No control over individual features
  • All changes get bundled together

Selective Deployment Layer (Core Design)

We can address this requirement and the associated challenges by introducing a workflow driven by a manifest that defines the deployment scope, and a program that identifies all necessary ADF dependencies for each manifest file. As a developer, I can now control which release is promoted to production, without worrying about releasing any other features that are not ready.

The manifest controls which pipelines to deploy and which optional categories to include. Below is an example of a manifest file:

JSON

    {
      "pipelines": ["pl_ingest_population_selective"],
      "includeTriggers": false,
      "includeIntegrationRuntimes": false,
      "includeAllGlobalParameters": true,
      "includeLinkedServices": true,
      "validateLinkedServicesExist": true,
      "includeManagedVirtualNetwork": false,
      "includeManagedPrivateEndpoints": false
    }

Workflow Explanation

Let's understand the crux of the selective deployment workflow now. I am working in the release branch on my feature branch directly in ADF Studio. Since ADF Studio is integrated with Git, my development changes will be saved to my branch. Here are the steps I can take to promote my change to a higher environment.
1) Validation of ADF on PR validation

This is an early validation step and a guardrail: if the PR fails, it's because objects are invalid or misaligned. This is equivalent to the "Validate all" button in the ADF UI. Here is the workflow:

Trigger: Pull requests targeting the branch selective_deployment.

Purpose: Validate that the ADF JSON in the PR is valid in the context of the target factory.

Main steps:

  • Checkout
  • Set up Node.js 20
  • npm install
  • Azure login using OIDC (azure/login@v2)
  • Validate with ADF Utilities:

YAML

    FACTORY_ID="/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/${AZURE_RESOURCE_GROUP}/providers/Microsoft.DataFactory/factories/${DEV_FACTORY_NAME}"
    npm run build validate "$GITHUB_WORKSPACE" "$FACTORY_ID"

2) Release build + selective deploy to DEV

adf-release-build-selective-deploy.yml

Triggers:

  • Push to selective_deployment
  • Manual run (workflow_dispatch) with optional manifest input
  • Default: deploy/manifests/release.json

This workflow has two jobs:

Job A: adf-build (staging + export + sanitize + artifacts)

  • Checkout (full history)
  • Azure login using OIDC
  • Set up Node.js 20
  • Install build dependencies inside build/ (npm install in build)
  • Stage the selective subset: python scripts/select_adf_subset.py <manifest>. A snippet is shown below; for the complete script, refer to the GitHub repository link given.

Python

    import json
    import re
    import shutil
    import sys
    from pathlib import Path
    from typing import Dict, Set, Tuple, List
    from collections import defaultdict

    # Your repo layout has pipeline/, dataset/, linkedService/ at ROOT.
    REPO_ROOT = Path(".")
    STAGE_ROOT = Path("build/adf_subset")

    # Folder that holds each ADF resource kind in the repository.
    RESOURCE_DIRS = {
        "pipeline": REPO_ROOT / "pipeline",
        "dataset": REPO_ROOT / "dataset",
        "linkedService": REPO_ROOT / "linkedService",
        "dataflow": REPO_ROOT / "dataflow",
        "trigger": REPO_ROOT / "trigger",
        "integrationRuntime": REPO_ROOT / "integrationRuntime",
        "credential": REPO_ROOT / "credential",
        "managedVirtualNetwork": REPO_ROOT / "managedVirtualNetwork",
    }

    # Copy these if present so ADF utilities behave the same on staged subset.
    ROOT_FILES_TO_COPY = [
        "publish_config.json",
        "arm-template-parameters-definition.json",
        "arm_template_parameters-definition.json",
        "package.json",
        "package-lock.json",
    ]

  • Produces: build/adf_subset/ (staged tree) and build/adf_subset_report.json (dependency report)
  • Refer to logs below (showing output of stage selective subset and debug to view output generated after select_adf_subset.py)
  • Export ARM templates from the staged subset via ADF Utilities: npm --prefix build run build -- export "adf_subset" "$FACTORY_ID" "ArmTemplate"
  • Produces: build/ArmTemplate/ARMTemplateForFactory.json and build/ArmTemplate/ARMTemplateParametersForFactory.json
  • Strip infra-owned resources with scripts/strip_arm_resources.py to produce a safe template: build/ArmTemplate/ARMTemplateForFactory.safe.json

⚠️ Note on Infrastructure Components (Refer to the “Future Work & Next Steps” section for follow-up topics in this series)

The step above intentionally strips infrastructure-dependent components from the generated subset to avoid overwriting existing shared resources such as linked services.
This implementation focuses on developer-owned artifacts (pipelines, datasets, and triggers) and assumes that infrastructure components (such as Integration Runtimes, managed private endpoints, and linked services) are pre-provisioned and managed outside of this deployment workflow.

  • Upload artifacts: ARM templates (adf-arm), metadata (adf-release-meta), subset report (adf-subset-report)

Job B: deploy_dev (deploy safe template)

  • Download ARM artifact
  • Azure login using OIDC
  • Ensure az Data Factory extension is installed
  • Validate JSON files exist/parse
  • Deploy via azure/arm-deploy@v2 (Incremental) to DEV RG/factory:
      Template: ARMTemplateForFactory.safe.json
      Parameters: ARMTemplateParametersForFactory.json + factoryName=<DEV_FACTORY_NAME>

Lessons Learned

Setting up selective deployment in ADF was more than a technical task. It made us rethink our approach to deployments, ownership, and CI/CD design. Here are the main things we learned:

1. The Problem Is Not Tooling; It’s Deployment Granularity

At first, we thought the limitation came from the tools we used, like UI publish or npm export. However, both methods yielded the same result: full factory templates. The real problem was that we couldn’t control the scope of deployments, not how the templates were made.

2. Dependency Awareness Is Critical

Selective deployment only works when every dependency is found and included. We learned that:

  • Pipelines often reference multiple datasets and linked services.
  • Missing even one dependency results in deployment failure.
  • You must automate dependency discovery.

3. “Incremental” Is Often Misunderstood

Incremental deployment is important, but it doesn’t work like a patch. It reapplies the full configuration for all included resources. This means:

  • Your generated templates need to be complete for all the artifacts you include.
  • If you use partial definitions, deployments can fail.

4. Separation of Concerns Is Key

Not all ADF artifacts are the same.
We began to separate them into different groups:

  • Application-owned artifacts: pipelines, datasets, triggers
  • Infrastructure-owned artifacts: linked services, managed virtual networks, managed private endpoints, and integration runtimes, among others

This separation proved crucial for safe, scalable deployments.

5. Selective Deployment Adds Complexity, But It’s Worth It

It’s true that implementing this approach brings in additional scripts, manifest management, and CI/CD complexity. But in exchange, we gained precise control over releases, reduced production risk, and faster hotfix deployments.

Future Work and Next Steps

While selective deployment solved a major gap in ADF CI/CD, it also opened up new areas for improvement and standardization.

1. Defining Infrastructure vs Application Ownership

One of the biggest follow-up areas is clearly defining ownership boundaries. In our experience:

  • Application teams should own pipelines, datasets, and triggers
  • Platform or infrastructure teams should own linked services, managed virtual networks, and managed private endpoints, among other things

Future work can focus on:

  • Enforcing this separation in CI/CD
  • Preventing accidental deployment of infrastructure components
  • Integrating Terraform or platform pipelines for infrastructure provisioning

2. Governance Around Linked Services

Linked services are often shared across multiple pipelines and teams. Future improvements include:

  • Centralizing linked service management
  • Using Key Vault and Managed Identity consistently
  • Preventing direct modifications through application pipelines
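The "automate dependency discovery" lesson from this article can be illustrated with a short Python sketch. This is not the author's select_adf_subset.py; it is a simplified illustration that assumes ADF resources are stored as JSON documents whose references appear as objects with a `referenceName` and a `type` such as `DatasetReference`. The function names, the `load` callback, and the resource names in the usage note are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Callable, Iterable, Iterator, Optional, Set, Tuple

# Reference "type" values found in ADF JSON, mapped to the repo folder that
# holds the referenced resource (folder names follow the layout in the article).
REF_TYPE_TO_KIND = {
    "PipelineReference": "pipeline",
    "DatasetReference": "dataset",
    "LinkedServiceReference": "linkedService",
    "DataFlowReference": "dataflow",
}

Resource = Tuple[str, str]  # (kind, name)
Loader = Callable[[str, str], Optional[dict]]


def iter_references(node: Any) -> Iterator[Resource]:
    """Yield every (kind, name) reference found anywhere inside a JSON value."""
    if isinstance(node, dict):
        ref_type = node.get("type")
        kind = REF_TYPE_TO_KIND.get(ref_type) if isinstance(ref_type, str) else None
        name = node.get("referenceName")
        if kind and isinstance(name, str):
            yield kind, name
        for value in node.values():
            yield from iter_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_references(item)


def resolve_closure(
    root_pipelines: Iterable[str], load: Loader
) -> Tuple[Set[Resource], Set[Resource]]:
    """Return (found, missing) for everything reachable from the root pipelines."""
    found: Set[Resource] = set()
    missing: Set[Resource] = set()
    stack = [("pipeline", name) for name in root_pipelines]
    while stack:
        resource = stack.pop()
        if resource in found or resource in missing:
            continue  # already visited; also protects against reference cycles
        document = load(*resource)
        if document is None:
            missing.add(resource)
            continue
        found.add(resource)
        stack.extend(iter_references(document))
    return found, missing


def load_from_repo(kind: str, name: str) -> Optional[dict]:
    """Hypothetical loader: read <kind>/<name>.json relative to the repo root."""
    path = Path(kind) / f"{name}.json"
    return json.loads(path.read_text()) if path.exists() else None
```

Calling `resolve_closure(manifest["pipelines"], load_from_repo)` would give the set of files to stage plus a list of unresolved references, which is the kind of information a dependency report could surface before the ARM export runs. A production script would also need to handle triggers, global parameters, and the manifest's include flags, which this sketch leaves out.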
Two Clocks Are Running Out at Once, and Almost Nobody Is Watching Both
By Igboanugo David Ugochukwu, DZone Core
What Cloud Engineers Actually Need to Know About AI Infrastructure
By Naveen Kalapala
A Tool Is Not a Platform (And Your Team Knows the Difference)
By Jeleel Muibi
No VIP? No Problem: Pacemaker-Based SAP HANA High Availability Using a Load Balancer Health Check

High availability is a non-negotiable requirement for mission-critical SAP HANA deployments. When a primary database node goes down without an automated failover in place, the business impact is immediate. RHEL Pacemaker has long been the standard cluster manager for SAP HANA High Availability (HA) on Linux; it detects failures, fences misbehaving nodes, promotes secondaries, and orchestrates the full recovery sequence without manual intervention.

The standard Pacemaker playbook for SAP HANA HA, as documented in the official documentation, relies on a virtual IP address (VIP) as the single stable network endpoint for all database traffic. Pacemaker keeps that VIP tied to whichever node is currently the active primary. When a failover happens, the VIP moves. Applications reconnect to the same address and reach the new primary without configuration changes.

The problem is that this approach breaks down on many cloud platforms. Hyperscalers and private cloud environments frequently do not support traditional floating VIPs in the way bare-metal or on-premises networking does. The official RHEL Pacemaker documentation covers the VIP setup in detail and stops there. When VIPs are not available, practitioners are left to work out an alternative on their own.

This article defines a production-ready alternative for exactly this scenario. The approach replaces the floating VIP with a network load balancer (NLB) and uses a Pacemaker-managed health check listener to tell the load balancer which node is the active primary at any given time. This article explains the problem, positions it against existing cloud provider approaches, and walks through the implementation step by step.

How Cloud Providers Address This

The challenge of replacing a floating VIP with a load balancer while still routing traffic exclusively to the active HANA primary is not new. There is published guidance on how to approach it, and the core pattern is consistent across that guidance.
One such approach is to use an internal passthrough Network Load Balancer alongside a socat-based health check listener managed as a Pacemaker resource. The listener opens on a dedicated port in the private range (49152–65535), and the NLB probes that port to determine which backend is the primary. The approach uses the Open Cluster Framework(OCF) 'anything' resource agent to manage the socat process inside Pacemaker. The second approach is to use an Internal Load Balancer with a health probe on port 625XX (where XX is the HANA instance number). A listener on each HANA node responds to the probe, but only the primary has the listener active. In some configurations, HAProxy is used rather than socat as the listener. The implementation discussed in this article adds to this landscape a clean approach using a native systemd service registered directly as a Pacemaker resource instead of the OCF 'anything' agent or HAProxy, and it targets RHEL specifically. The systemd approach keeps the setup self-contained, auditable, and consistent with how most RHEL administrators already manage services. It works on any cloud provider or private cloud environment that supports network load balancers. Architecture Overview The diagram below shows the two-node SAP HANA cluster, the network load balancer, and how the health check listener connects them. The NLB's backend pool includes both HANA nodes on the standard HANA port (3XX15), but the health probe targets a separate port, 62500, that only the active primary exposes. Overall cluster architecture The NLB sees both nodes as members of its backend pool. Because only the primary node has anything listening on port 62500, the NLB marks the secondary as unhealthy for routing purposes and sends all traffic to the primary. When Pacemaker promotes the secondary during a failover, it starts the listener on the new primary as part of the same orchestration sequence. 
The NLB detects the change on its next health check cycle and shifts all traffic accordingly. Failover Sequence The diagram below shows the sequence of events from the moment the primary node fails to the moment applications reconnect through the load balancer. Failover sequence from node failure to reconnection Two timing factors govern the total recovery window. The first is Pacemaker's fencing and promotion sequence, typically 30 to 90 seconds, depending on the STONITH method and HANA replication state. The second is the NLB health check interval, which determines how quickly the load balancer detects the new primary after Pacemaker completes its promotion. For production environments, tuning both values together is worth the effort Pacemaker Resource Model The diagram below maps the Pacemaker resource hierarchy and constraints used in this setup. Understanding the resource model helps clarify why both the colocation and ordering constraints are necessary. The colocation constraint (score=INFINITY) tells Pacemaker that lb_healthcheck must always run on the same node as the promoted HANA primary. If the promoted primary moves, the health check listener moves with it. The ordering constraint ensures the listener does not start until HANA has fully completed its promotion, preventing the load balancer from routing traffic to a node that is still finishing its takeover sequence. 
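The two timing factors described above can be combined into a rough upper bound for the recovery window. The sketch below is illustrative arithmetic only: the function name and the `probes_to_flip` parameter (the number of consecutive successful probes a load balancer requires before it marks a backend healthy) are assumptions, not values taken from any provider's documentation, and the estimate ignores probe timeouts and client reconnect time.

```python
def worst_case_recovery_seconds(promotion_s: float,
                                probe_interval_s: float,
                                probes_to_flip: int = 2) -> float:
    """Rough upper bound on the time until the NLB routes to the new primary.

    promotion_s      -- Pacemaker fencing + promotion time (30 to 90 s per the article)
    probe_interval_s -- NLB health check interval
    probes_to_flip   -- ASSUMED: consecutive successful probes needed to mark
                        the new primary healthy (provider-specific setting)
    """
    if promotion_s < 0 or probe_interval_s <= 0 or probes_to_flip < 1:
        raise ValueError("timings must be positive and probes_to_flip >= 1")
    # The listener only starts after promotion, so detection time adds on top.
    return promotion_s + probe_interval_s * probes_to_flip
```

With a 5-second probe interval and two required probes, the 30 to 90 second promotion range from the article would translate into roughly 40 to 100 seconds before traffic reaches the new primary.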
Prerequisites

The following must be in place before starting the implementation:

  • Two RHEL virtual servers with access to the Red Hat High Availability Add-On repository
  • SAP HANA installed on both servers with HANA System Replication configured
  • Pacemaker installed and configured through section 5.7 of the official Red Hat SAP HANA HA guide; sections 5.8 and 5.9 (virtual IP configuration) are intentionally skipped
  • A network load balancer provisioned with both HANA nodes in the backend pool, backend port set to 3XX15 (where XX is the HANA instance number)
  • socat installed on both HANA nodes
  • Firewall rules permitting TCP traffic on port 62500 from the NLB health check source addresses

socat is available in standard RHEL repositories. Install it with:

    sudo dnf install socat -y

Step-by-Step Implementation

Step 1: Create the Systemd Health Check Service

Run the following command as root on both HANA nodes. It creates a systemd unit file that uses socat to open a TCP listener on port 62500. The listener accepts any connection and returns success immediately; that response is all the load balancer needs.

    cat <<EOF > /etc/systemd/system/lb-healthcheck.service
    [Unit]
    Description=LB healthcheck listener for active SAP HANA primary
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=simple
    ExecStart=/usr/bin/socat TCP4-LISTEN:62500,reuseaddr,fork EXEC:/bin/true
    Restart=always
    RestartSec=2

    [Install]
    WantedBy=multi-user.target
    EOF

Do not enable this service manually. Pacemaker will control its lifecycle entirely.

Step 2: Reload Systemd

After writing the unit file, reload systemd on both nodes so it registers the new service:

    systemctl daemon-reload

Step 3: Prevent the Service From Starting Automatically

Explicitly disable and stop the service. If both nodes have the listener running simultaneously, the load balancer will consider both healthy and will route traffic to either node, which defeats the entire purpose of the setup.

    systemctl disable lb-healthcheck
    systemctl stop lb-healthcheck

Step 4: Create the Pacemaker Resource

Register the systemd service as a Pacemaker-managed resource. From this point forward, Pacemaker owns the start, stop, and monitoring of the listener.

    pcs resource create lb_healthcheck \
        systemd:lb-healthcheck \
        op monitor interval=10s timeout=20s

Pacemaker will now monitor the listener every 10 seconds and automatically relocate it during failover events.

Step 5: Add the Colocation Constraint

This constraint ensures that the listener always runs on the same node as the promoted SAP HANA primary. Without it, Pacemaker might place the resource on either node.

    pcs constraint colocation add lb_healthcheck \
        with Promoted cln_SAPHanaCon_P01_HDB01 \
        score=INFINITY

Replace P01_HDB01 with the actual SID and instance number for the environment. For example, if the SID is PRD and the instance number is 00, use PRD_HDB00.

Step 6: Add the Ordering Constraint

The ordering constraint prevents the health check listener from starting until after the HANA promotion is fully complete. Without this, a race condition could cause the load balancer to route traffic to a node that is still mid-promotion.

    pcs constraint order promote cln_SAPHanaCon_P01_HDB01 \
        then start lb_healthcheck

Step 7: Validate the Pacemaker Configuration

Verify that both constraints are correctly registered in the cluster:

    pcs constraint config

The output should contain both of the following entries:

    Colocation Constraints:
      Started resource 'lb_healthcheck' with Promoted resource 'cln_SAPHanaCon_P01_HDB01'
        score=INFINITY
    Order Constraints:
      promote resource 'cln_SAPHanaCon_P01_HDB01' then start resource 'lb_healthcheck'

Step 8: Verify Listener Placement

Confirm that only the active primary node is listening on port 62500. Run this command on each node:

    ss -lntp | grep 62500

On the primary node, the output should show a LISTEN entry on 0.0.0.0:62500. On the secondary node, the command should return nothing.

    # Expected on PRIMARY node:
    LISTEN 0 5 0.0.0.0:62500 0.0.0.0:*

    # Expected on SECONDARY node:
    # (no output)

If both nodes show the listener, the colocation constraint is either missing or incorrect. If neither node shows it, check that the HANA clone resource is in the Promoted state with:

    pcs status

Comparison: VIP Approach vs. NLB Health Check Approach

The diagram below summarizes the trade-offs between the traditional VIP approach and the NLB health check approach described in this article.

[Figure: Comparison]

The VIP approach cuts over faster because there is no dependency on an external health check interval. The IP simply moves to the new primary node. However, it requires the underlying network to support IP address mobility, which cloud environments typically do not.

The NLB approach works across any cloud or private cloud environment that supports network load balancers. The trade-off is that traffic cutover depends on the NLB's health check interval in addition to Pacemaker's promotion time.

Documentation from major cloud providers acknowledges this trade-off explicitly: using an NLB with a health check listener is their recommended approach for all SAP HANA HA deployments, and they provide the same socat-based pattern using the OCF 'anything' resource agent. The approach documented here achieves the same outcome using a systemd service, which many operators find more familiar and easier to audit.

Operational Notes and Tuning

A few things are worth keeping in mind when running this setup in production.

NLB health check interval: The shorter the health check interval, the shorter the window between Pacemaker completing its promotion and the NLB redirecting traffic. A 5-second interval is common in cloud provider SAP HA documentation. Setting this too low can cause false positives during normal HANA replication lag.

STONITH configuration: This solution assumes STONITH (fencing) is configured as part of the base Pacemaker setup. Without STONITH, Pacemaker will not promote the secondary during a primary failure. STONITH ensures the failed node is definitively powered off before promotion proceeds, preventing split-brain.

Port 62500 vs. 625XX convention: Some cloud providers use the convention 625XX (where XX is the instance number) for their SAP HANA health check ports. Other cloud documentation recommends using any port in the private range 49152 to 65535. Port 62500, used in this setup, falls within that range and does not conflict with standard HANA ports. Teams following a provider's 625XX convention can substitute that port if they prefer consistency across environments.

Testing failover: After setup, the full failover sequence should be tested by killing the primary HANA process (not the OS) and verifying the NLB redirects traffic to the new primary within the expected time window. The pcs status command is the primary tool for watching the Pacemaker side of the transition.

Conclusion

The standard RHEL Pacemaker documentation for SAP HANA HA assumes a virtual IP is available. Not all hyperscalers provide one. The solution fills that gap cleanly: replace the VIP with a network load balancer hostname, and use a Pacemaker-managed socat listener to tell the load balancer which node is the primary at any given time.

The core pattern, an NLB health probe targeting a Pacemaker-owned listener, is the same pattern major cloud providers use in their own SAP HA documentation. What this implementation adds is a clean systemd service approach for RHEL, without needing the OCF 'anything' resource agent or additional proxy software.

The setup comes down to eight steps that, in essence, do four things: write a systemd service, disable it from auto-starting, register it as a Pacemaker resource, and add two constraints. The constraints, one for colocation and one for ordering, are what tie the listener's lifecycle to the HANA primary promotion sequence and make the whole thing work reliably across failovers. For teams running SAP HANA on RHEL in environments where VIPs are not an option, this is a production-ready path forward that relies entirely on standard RHEL tooling.
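The listener-placement check in Step 8 can also be run from a third machine in the same way the load balancer sees the nodes: by attempting a plain TCP connection to the health check port. The following is a minimal sketch using only the Python standard library; the function names are made up for illustration, and it assumes the NLB uses a simple TCP connect probe.

```python
import socket


def is_listening(host: str, port: int = 62500, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    This mimics a TCP health probe: a completed handshake counts as healthy,
    while a refused connection, timeout, or DNS failure counts as unhealthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def nodes_reporting_primary(hosts, port: int = 62500):
    """Return the hosts whose health check port currently accepts connections."""
    return [host for host in hosts if is_listening(host, port)]
```

Calling `nodes_reporting_primary(["hana-node-1", "hana-node-2"])` (hypothetical host names) should return exactly one entry in a healthy cluster. Two entries point to a missing or incorrect colocation constraint; an empty list suggests that no node is currently promoted.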

By Vidyasagar (Sarath Chandra) Machupalli FBCS, DZone Core
Sharing SBOMs Securely Without Giving Too Much Away

SBOMs Create Transparency, But Not Without Risk

The Software Bill of Materials, or SBOM, has changed meaning in recent years. It used to be seen as a technical tool for internal inventory management. Regulation now requires it as evidence. The European Cyber Resilience Act will require digital product manufacturers to reliably document the composition of their software. The NIS 2 Directive increases pressure on operators of essential entities to secure their supply chains in a traceable way. The United States Executive Order 14028 made the SBOM a requirement in government procurement as early as 2021. As a result, the bill of materials evolved from a voluntary artifact to a mandatory disclosure.

This rise in importance exposes a conflict of objectives that cannot be resolved, only managed. The bill of materials is designed to establish trust, enable verifiability, and allow quick response to vulnerabilities. Yet it also reveals how a software product is built. It lists third-party components, their versions, and potential vulnerability points. It lets people infer architectural choices and competitively relevant strategies. A complete bill of materials acts as both evidence and blueprint. Publishing it carelessly confuses transparency with surrender. This article argues that the way sharing is controlled, not just the act of sharing, determines whether it helps or harms.

Why Complete SBOMs Contain Sensitive Information

To see the importance of the conflict, it helps to examine what a complete bill of materials contains. It is not simply a list of libraries used. It frequently includes precise version numbers, the full transitive dependency chain, sometimes internal package names, references to private artifact sources, and metadata about the generators and build process. Each detail may seem harmless on its own. Taken together, they provide a detailed profile of a product's technical makeup.

For readers of a developer publication, this risk is very clear. Applications built with Maven or Gradle often have deep, branched transitive dependency chains. A single library can pull in dozens more. A complete bill of materials shows these chains in full detail. It allows others to see which vulnerabilities may affect a product. It also shows which internal components the manufacturer uses, which frameworks it avoids, and where it is using outdated versions. This intended security measure can become a manual for attackers. The sensitivity of a bill of materials is not a side issue but a core property.

Least Disclosure: Sharing as Controlled Disclosure

From this understanding comes the key idea: minimal disclosure, or least disclosure. This means sharing only as much as a recipient really needs for their purpose, no more, and being able to prove it.

This principle clears up a common misunderstanding. Many assume SBOM sharing means publishing everything. In reality, sharing a bill of materials does not mean making all details broadly available. It is a controlled act. Content, recipient, and context are weighed together. The key question is not whether to share, but what to share, with whom, and under what conditions. This shift separates controlled transparency from unintentional overexposure.

A minimal disclosure approach views the SBOM not as a single document to send but as a database from which to generate specific views for each need. The technical architecture discussed next builds on this idea.

Different Recipients, Different Information Needs

To share only what is needed, you first have to know who you are sharing with, because each group needs different information. You can think of four main groups, and what they need shapes the whole process.

The public typically only needs a basic view. For them, listing the component name, license, and project reference is enough. This satisfies the need for transparency, especially for open-source software, without revealing internal structure. Customers need more details. They must analyze risks and justify purchases. They rely on version levels and dependency metadata. Auditors and authorities focus on dependability, not detail. They require evidence that is verifiable and complete. Suppliers and internal teams need operational details. They work with deep data to manage and edit bills of materials together.

These differences lead to an important reality. A single, universal SBOM view is too crude in both detail and security. Trying to serve all users the same way usually fails, and frequently ends with bills of materials sent as email attachments. This practice lacks control and should be avoided.

Public Transparency vs. Private Exchange

Because the recipients differ, a strong structural separation is needed. Any proper disclosure model must separate public transparency from private exchange. Public transparency is a deliberately limited, open view of the bill of materials. Anyone can access it. Private exchange is the controlled transfer of more detailed information to authorized parties. Do not combine these two modes, whether in technology or organization. If you do, the line between public and private details blurs.

Exodos Labs' model shows this separation well and is used here as an example. It draws a clear line between a public "SBOM Trust Center" and a private "Secure Exchange." The Trust Center gives a continuously updated, defensible public view. The Secure Exchange allows controlled sharing with specific organizations. The architecture's main advantage is its clear separation. It makes overexposure harder by assigning public and private data to separate channels from the start.

Redaction: Several Secure Views From One Bill of Materials

Separating public and private sharing does not fully explain how different views can come from a single database. This is where redaction becomes vital. Redaction is not only about deleting fields. It reduces, masks, aggregates, or hides information based on the recipient. In practice, internal package sources and private registry references may be removed entirely. Transitive dependencies can be summarized rather than listed. Sensitive build and generator metadata can be hidden from certain recipients.

Several secure views emerge from the full bill of materials. A minimal public view might show only the component name, license, and project reference. An extended view for authorized customers can include version and dependency details. A contractually protected view might be released after a non-disclosure agreement is signed. The example model supports such selective redaction and can create recipient-specific views.

The key point is this: do not distribute a complete bill of materials and then cut it down. Instead, generate intentionally designed views from the full data set. Each view ought to match the needs and the degree of openness suitable for its audience.

Access Control Beyond Simple Roles

Once you define the views, you must decide how to control access. Simple role models are often not enough. Just being a "customer" or "partner" does not mean someone should see everything. Whether a specific customer can access a certain view depends on more than just their category.

Attribute-based access control is more appropriate. It combines a range of characteristics before releasing a view: the associated organization, the product-related entitlement, the contractual status and, where applicable, the status of a non-disclosure agreement, the assignment to a specific release, the regulatory context of a request, the release status, and any time limit on access. Only the interaction of these attributes decides which view a requester actually receives.

The example model relies on exactly this kind of attribute-based control, combined with the redaction described earlier. The conceptual added value lies in scalability: whereas rigid roles become unmanageable as the number of recipients and special cases grows, attribute-based rules can be enforced consistently even across large circles of recipients. This settles the question of who decides on disclosure, complementing the earlier question of what is concealed in the first place.

Demonstrability: Auditability and Release Binding

Controlled disclosure requires not only that the right amount of information reaches the right party, but also that this can be proven. Demonstrability here has two sides that belong together, because both answer the same fundamental question: what can the parties involved rely upon?

The first side concerns auditability. SBOM sharing is controllable only if it can be traced without gaps: who requested access, who granted it, which view was displayed, and which version was exported. The status of a non-disclosure agreement, its revocation, and time limits likewise belong in this audit trail. An immutable audit trail transforms sharing from a passing file transfer into a provable transaction; in the event of a dispute, it replaces assertion with evidence.

The second side concerns the binding of a bill of materials to a concrete artifact. A bill of materials is dependable only when it is unambiguously established which release, build, JAR or WAR file, container image, Git tag, artifact hash, or container digest it belongs to. In the case of a security incident in particular, this assignment determines the capacity to act: without it, it remains open whether the bill of materials at hand actually describes the delivered artifact or a long-superseded state.

Auditability thus proves who saw what and when; release binding proves what this view refers to in the first place. Together, the two establish the trust that a bill of materials is meant to instill.
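To make the two sides of demonstrability more concrete, here is a minimal sketch under stated assumptions. It binds a bill of materials to an artifact through a SHA-256 digest and records sharing events in a hash-chained log, where each entry includes the hash of its predecessor. The hash chain is one common way to make a log tamper-evident; it is an illustration chosen for this example, not a description of how the Exodos Labs model or any other product implements its audit trail. All function and field names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def artifact_digest(artifact_bytes: bytes) -> str:
    """Release binding: identify the exact artifact an SBOM describes."""
    return "sha256:" + hashlib.sha256(artifact_bytes).hexdigest()


def _hash_body(event: dict, prev_hash: str) -> str:
    # sort_keys gives a stable serialization so the hash is reproducible
    body = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_audit_entry(trail: list, event: dict) -> dict:
    """Append an event (who, what view, which artifact) to the chained log."""
    prev_hash = trail[-1]["entry_hash"] if trail else GENESIS_HASH
    entry = {"event": event, "prev_hash": prev_hash,
             "entry_hash": _hash_body(event, prev_hash)}
    trail.append(entry)
    return entry


def verify_trail(trail: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in trail:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != _hash_body(entry["event"], prev_hash):
            return False
        prev_hash = entry["entry_hash"]
    return True
```

An event might record the requester, the view served, and the `artifact_digest` of the release the view was generated from. Changing any recorded field afterwards causes `verify_trail` to return `False`, which is the property that lets the log serve as evidence rather than assertion.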
CI/CD Integration and Conclusion

However demanding the mechanisms described may appear, in practice they most often fail for a simple reason: manual maintenance. Bills of materials compiled by hand, updated after the fact, and published on static pages inevitably grow stale and lose their value. The consequence is clear: the generation, validation, versioning, and publication of a bill of materials belong in the build and release pipeline. For development teams, this means close integration with Maven, Gradle, and CI/CD processes, so that a current bill of materials is generated with every build, automatically checked against quality criteria, and made publicly available. The example model illustrates this by feeding the Trust Center continuously from the supply chain, so that public disclosure always corresponds to the actual state, and the recurring question of which bill of materials is current does not even arise.

Against this background, the typical mistakes that a well-considered approach avoids can be named:

  • Complete public publication
  • Dispatch by email
  • Mixing public and private views
  • No redaction strategy
  • Missing release processes
  • Deficient auditing
  • Absent release binding
  • Manually maintained disclosure pages
  • No provision for revocation

Each of these mistakes is ultimately a variation on the same fundamental error of equating transparency with maximal disclosure. It is precisely this equation that must be overcome. Bills of materials are necessary for trust, regulatory compliance, and the security of the software supply chain, yet maximal disclosure does not automatically lead to greater transparency. What matters is providing information that is correct, up to date, and appropriate for the target group, and doing so demonstrably.

Secure bills of materials arise not through complete publication, but through suitable views for the right recipient in the right context. Whoever takes this to heart transforms the bill of materials from a risk into an instrument.
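The idea of generating recipient-specific views from one full data set, and of choosing the view from the requester's attributes rather than from a single role, can be sketched in a few lines. Everything here is a simplified assumption made for illustration: the component record, the field names, the three view names, and the decision rules are hypothetical and are not taken from the Exodos Labs model or from any SBOM standard.

```python
from datetime import date

# Hypothetical full SBOM record; "registry" stands in for a private
# artifact source that must never leave the organization.
FULL_SBOM = [{
    "name": "example-lib",
    "version": "1.2.3",
    "license": "Apache-2.0",
    "project": "https://example.org/example-lib",
    "dependencies": ["dep-a", "dep-b"],
    "registry": "https://artifacts.internal.example/maven",
    "build_tool": "example-generator 0.1",
}]

# Allow-lists per view: fields not listed are redacted by construction.
VIEW_FIELDS = {
    "public": ("name", "license", "project"),
    "customer": ("name", "license", "project", "version", "dependencies"),
    "contract": ("name", "license", "project", "version", "dependencies",
                 "build_tool"),
}


def render_view(components, view):
    """Build a view from the full data set instead of trimming a full export."""
    allowed = VIEW_FIELDS[view]
    return [{key: comp[key] for key in allowed if key in comp}
            for comp in components]


def select_view(attrs, today):
    """Attribute-based decision: entitlement, NDA status, and time limit."""
    expires = attrs.get("access_expires")
    if expires is not None and today > expires:
        return "public"          # time-limited access has lapsed
    if not attrs.get("product_entitlement", False):
        return "public"          # no entitlement for this product
    if attrs.get("nda_signed", False):
        return "contract"        # contractually protected view
    return "customer"
```

Using allow-lists rather than deny-lists means a newly added sensitive field stays hidden until someone deliberately adds it to a view, which matches the article's point that views should be designed intentionally rather than cut down from a complete export.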

By Sven Ruppert, DZone Core
Code and Connect: MCP + MuleSoft

I often find myself in conversations where the same words keep popping up again and again: Agents, MCP, and A2A. Everyone seems excited about them. But the funny part is that when the topic shifts to MCP (Model Context Protocol), the explanations start to vary. One day, someone confidently said, “An MCP server is basically a tool.” Another person immediately disagreed and replied, “No, no — MCP is more like a client.” Before that debate could settle, someone else joined the conversation and said, “Actually, MCP is just a protocol.” And then another perspective appeared: “Think of it as middleware that sits between an agent and APIs.” At that moment, I realized something interesting: we were all talking about the same concept, yet each of us understood it a little differently. These conversations made me curious. If experienced developers and architects describe MCP in different ways, how confusing must it be for someone who is just starting to explore this space? The more I listened, the more I noticed a pattern — people weren’t wrong, but they were often describing only one piece of the puzzle. That realization is what inspired this blog. In this article, I want to step back from the buzzwords and walk through the concepts in a simple way. What exactly is MCP? Is it a server? A tool? A client? Or something else entirely? And how does it relate to the agents that everyone keeps talking about? Is it applicable only to agents, or is it applicable to assistants also? We will also explore MuleSoft's capability in this space. By the end of this post, my goal is to bring clarity to these terms and show how they connect. Instead of hearing multiple interpretations in different conversations, you’ll be able to see the complete picture of how MCP fits into modern AI and integration architectures. Let's Understand What Anthropic Says About MCP MCP (Model Context Protocol) is an open-source standard for connecting AI applications to external systems. 
Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect electronic devices, MCP provides a standardized way to connect AI applications to external systems. MCP at high level Now let's break down each component and understand it in the simplest way possible. AI Application AI application can be any application that consists of an LLM, orchestration, and tools (You can think of it as assistants), or it may consist of more complex components such as Agent Orchestration, specialized agents, and Tools(You can think of it as an agentic application). Tools can be a Payment Gateway, a Data Retrieval API, a Weather API, a File System, a WebSearch, etc. MCP Model Context Protocol is an open protocol that enables seamless integration between AI applications (LLM Applications) and external data sources and tools. MCP provides a standardized way to connect LLMs with the context they need. MCP follows a client-server architecture. Key components of this architecture are MCP Host, MCP Client, and MCP Server. Let's extend our previous architecture. MCP architecture MCP Host It is nothing but a Host where the AI application is running. MCP Client It is a component that establishes a connection with the MCP Server and gets the context for the MCP Host to use. MCP Server It consists of external services that provide context to LLMs. Model Context Protocol consists of two layers: Data layer: The data layer implements a JSON-RPC 2.0 (JRPC) based exchange protocol that defines the message structure and semantics for client-server communication.Transport layer: The transport layer manages communication channels and authentication between clients and servers. 
It handles connection establishment, message framing, and secure communication between MCP participants.MCP supports two transport mechanisms: Stdio transport: Uses standard input/output streams for direct process communication between local processes on the same machine, providing optimal performance with no network overhead.Streamable HTTP transport: Uses HTTP POST for client-to-server messages with optional Server-Sent Events for streaming capabilities. This transport enables remote server communication and supports standard HTTP authentication methods, including bearer tokens, API keys, and custom headers. MCP recommends using OAuth to obtain authentication tokens. Use Case We can think of "Weather Intelligence Agent," which uses the MCP server to make a call to a tool that provides weather information based on a city name. This is a simple use case just to demonstrate how an API is called as a tool using MCP. We will use Postman and Cursor to mimic as Agent/Assistant, which will call the Weather API. Let's see how we can implement this use case using MuleSoft: Step 1: MuleSoft provides the MCP Server - Tool Listener connector. We will configure the MCP Server. 
MuleSoft code Refer to the code: XML <?xml version="1.0" encoding="UTF-8"?> <mule xmlns:ee="http://www.mulesoft.org/schema/mule/ee/core" xmlns:http="http://www.mulesoft.org/schema/mule/http" xmlns:mcp="http://www.mulesoft.org/schema/mule/mcp" xmlns="http://www.mulesoft.org/schema/mule/core" xmlns:doc="http://www.mulesoft.org/schema/mule/documentation" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd http://www.mulesoft.org/schema/mule/mcp http://www.mulesoft.org/schema/mule/mcp/current/mule-mcp.xsd http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd http://www.mulesoft.org/schema/mule/ee/core http://www.mulesoft.org/schema/mule/ee/core/current/mule-ee.xsd"> <http:listener-config name="HTTP_Listener_config" doc:name="HTTP Listener config" doc:id="251f2d7c-e84b-4974-a1e8-96d9779bc9e9" > <http:listener-connection host="0.0.0.0" port="8081" /> </http:listener-config> <mcp:server-config name="MCP_Server" doc:name="MCP Server" doc:id="289fb886-e732-4274-990e-9876aca405a6" serverName="mule-mcp-server" serverVersion="1.0.0"> <mcp:streamable-http-server-connection listenerConfig="HTTP_Listener_config"/> </mcp:server-config> <http:request-config name="HTTP_Request_config" doc:name="HTTP Request config" doc:id="b31d7d79-b45b-42ec-a970-50eb19a0a702" > <http:request-connection protocol="HTTPS" host="api.weatherstack.com" /> </http:request-config> <flow name="mcp-weahter-intelligence-apiFlow" doc:id="b1c21d3c-18f0-4eac-bb4e-3cf789608580" > <mcp:tool-listener doc:name="MCP Server - Tool Listener" doc:id="4c42c1cb-898d-4fb9-8d0e-edc541fffb75" config-ref="MCP_Server" name="get_weather_information"> <mcp:description ><![CDATA[This tool gets weather information. Check weather details for device by providing the city name as input or paramValue. 
Please use the query.]]></mcp:description> <mcp:parameters-schema ><![CDATA[{ "$schema": "http://json-schema.org/draft-07/schema#", "type": "object", "properties": { "query": { "type": "string", "description": "city for querying weather data" } }, "required": ["query"], "additionalProperties": false }]]></mcp:parameters-schema> <mcp:responses > <mcp:text-tool-response-content text="#[payload.^raw]" priority="1"> <mcp:audience > <mcp:audience-item value="ASSISTANT" /> </mcp:audience> </mcp:text-tool-response-content> </mcp:responses> </mcp:tool-listener> <http:request doc:name="Request" doc:id="d10760de-5f93-4f63-aadc-9bfc491f94e0" config-ref="HTTP_Request_config" path="/current"> <http:query-params ><![CDATA[#[output application/java --- { "access_key" : "96d01954d0c4e444aa781fa10b92caff", "query" : payload.query, "units" : "m" }]]]></http:query-params> </http:request> </flow> </mule> Let's run this code and test it: MCP server started successfully: Deployment log Step 2: Let's use Postman as the MCP client to test it and see if it is working as expected: MCP server and available tools Step 3: Click on Connect: Connected to MCP Server Step 4: Now the MCP client is connected to the MCP server. You need to pass a query parameter as the city name, and you will get the weather details: I am writing this Blog from GOA (The Beach Capital of India). I will use GOA as the City name to retrieve weather information about GOA. Use the tool Step 5: Click on Run, and you will get the response as shown below: Response I have demonstrated it in my local version of code, which is deployed in Anypoint Studio. Let's test the same after deploying it to the runtime manager. I have deployed the code to the runtime manager. Deployed in the Anypoint platform Test result I have demonstrated this using Postman, where Postman worked as an MCP client to connect to the MCP server. 
We can extend this further and use Cursor to mimic agentic behavior, where the agent uses the MCP tool to get the answer.
Cursor to use MCP
For this demonstration I used MuleSoft, a no-code/low-code tool. In the next blog, I will demonstrate the same thing in Python. Watch the video for more details. Let me know if you liked it!
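For readers curious about what an MCP client such as Postman sends when the tool is run, here is a minimal sketch of a JSON-RPC 2.0 `tools/call` request body for the `get_weather_information` tool. The class and method names are hypothetical, the request id is arbitrary, and the sketch only builds the message; it does not perform the MCP initialization handshake or send anything over HTTP.

```java
public class McpToolCallSketch {

    // Escapes backslashes and double quotes only; control characters
    // are not handled in this sketch.
    static String jsonEscape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a JSON-RPC 2.0 "tools/call" request body for a tool that takes
    // a single string argument named "query", as in the parameters schema above.
    static String buildToolCall(int id, String toolName, String query) {
        return "{"
            + "\"jsonrpc\":\"2.0\","
            + "\"id\":" + id + ","
            + "\"method\":\"tools/call\","
            + "\"params\":{"
            + "\"name\":\"" + jsonEscape(toolName) + "\","
            + "\"arguments\":{\"query\":\"" + jsonEscape(query) + "\"}"
            + "}}";
    }

    public static void main(String[] args) {
        // Prints the request body a client would send to look up Goa
        System.out.println(buildToolCall(1, "get_weather_information", "Goa"));
    }
}
```

Because the schema sets `additionalProperties` to false and marks `query` as required, the `arguments` object in this sketch carries exactly that one field.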

By Ajay Singh
REST-Assured Configuration and Specifications: Writing Maintainable API Tests

When working on API automation projects, one of the first things that becomes repetitive is configuring the same settings for every test. The base URL, content type, request logging, and common response validations often appear in multiple test classes. As the number of tests increases, maintaining these repeated configurations becomes difficult.

REST Assured provides specifications to solve this problem. Instead of defining the same settings in every test, common configurations and validations can be created once and reused throughout the test suite. This article demonstrates a simple approach to configuring REST Assured using a base test class along with Request and Response Specifications.

What Are REST-Assured Specifications?

A specification is a reusable configuration object that contains common request or response settings. So, instead of repeatedly writing:

Java
given()
    .baseUri("https://api.example.com")
    .header("Authorization", "Bearer token")
    .contentType(ContentType.JSON)

the configuration can be defined once and reused across multiple tests. Similarly, common validations can be written as specifications.

Specifications help to:

  • Reduce code duplication
  • Improve test readability
  • Centralize API configurations
  • Simplify maintenance
  • Standardize request and response validations

Why Use Specifications?

Consider an API test that retrieves order details.

Java
@Test
public void getOrderDetails() {
    given()
        .baseUri("https://api.example.com")
    .when()
        .get("/orders/2")
    .then()
        .statusCode(200);
}

The test works correctly, but the base URI and common validations, such as the status code, will need to be repeated in every test. A better approach is to move these common settings into reusable specifications.

What Problem Does It Solve?

In many API automation projects, test cases often contain repeated configuration code. The same base URL, content type, authentication details, headers, and response validations are repeated across multiple test classes.
While this may not seem like a problem when there are only a few tests, maintaining the test suite becomes difficult as the project grows. Consider a scenario where the API base URL changes from a QA environment to a Staging environment. Without a centralized configuration, every test containing the old URL would need to be updated. Similarly, if a common header or authentication mechanism changes, modifications would be required in multiple places.

Request and Response Specifications solve this problem by moving common configurations into reusable objects. Instead of repeating the same setup in every test, the configuration is defined once and reused wherever required. This reduces code duplication, improves readability, and makes the test suite easier to maintain. As a result, test methods can focus on validating business functionality rather than configuring API requests and responses. This leads to cleaner and more maintainable automation code.

Creating a SetupSpecification Class

The most common configurations should be placed in a separate class. This allows all test classes to inherit the same setup. The following example creates a Request and Response Specification in a separate class using the @BeforeClass annotation.

Java
import static org.hamcrest.Matchers.lessThan;

import io.restassured.RestAssured;
import io.restassured.builder.RequestSpecBuilder;
import io.restassured.builder.ResponseSpecBuilder;
import io.restassured.filter.log.RequestLoggingFilter;
import io.restassured.filter.log.ResponseLoggingFilter;
import io.restassured.specification.RequestSpecification;
import io.restassured.specification.ResponseSpecification;
import org.testng.annotations.BeforeClass;

public class SetupSpecification {

    @BeforeClass
    public void setup() {
        // Request settings shared by every test that extends this class
        final RequestSpecification request = new RequestSpecBuilder()
            .addHeader("Content-Type", "application/json")
            .setBaseUri("http://localhost:3004")
            .addFilter(new RequestLoggingFilter())
            .addFilter(new ResponseLoggingFilter())
            .build();

        // Response validations shared by every test
        final ResponseSpecification response = new ResponseSpecBuilder()
            .expectResponseTime(lessThan(10000L))
            .build();

        RestAssured.requestSpecification = request;
        RestAssured.responseSpecification = response;
    }
}

This setup method runs before the test class execution. The Request Specification contains the base URI, content type, and logging configuration.
Any configuration defined in a Request Specification will be applied to every API request that uses that specification. For example, if the specification includes a common header, authentication token, content type, or query parameter, those values will automatically be sent with all requests that reference the specification. While this promotes reusability and reduces duplication, care should be taken when adding request-specific details to a shared specification. Not all APIs may require the same headers, authentication mechanisms, query parameters, or request bodies. Including such configurations in a common specification can lead to unintended behavior and make tests more difficult to maintain.

The Response Specification contains the common validations that are expected from the API response. The expectResponseTime() method validates that the API responds within the specified time limit. Additionally, validations can be added for:

  • Status code
  • Headers
  • Content-Type
  • Cookies
  • Body

However, it is important to understand that any validation defined in a Response Specification will be applied to every API test that uses that specification. For example, if the specification includes a validation for a 200 status code, all tests using that specification will automatically expect a 200 response. This may not be appropriate for APIs that are expected to return different status codes, such as 201, 204, 400, or 404. The same consideration applies to validations related to headers, content type, cookies, and response body content. Including endpoint-specific validations in a shared specification can reduce flexibility and make tests harder to maintain. A good practice is to keep only the truly common validations in a shared Response Specification and add endpoint-specific assertions within the individual test methods.

The statements below make the Request and Response Specifications available globally for the test execution.
Java
RestAssured.requestSpecification = request;
RestAssured.responseSpecification = response;

As a result, the base URI, the Content-Type header, and the response-time validation do not need to be specified in every test.

Writing a Test Using the Specifications

Once the setup is complete, test classes can extend the SetupSpecification class.

Java
import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;

import org.testng.annotations.Test;

public class TestGetRequestWithRestAssuredSpecs extends SetupSpecification {

    @Test
    public void getRequestTestWithRestAssuredConfig() {
        final int orderId = 3;
        given()
            .queryParam("id", orderId)
        .when()
            .get("/getOrder")
        .then()
            .statusCode(200)
            .and()
            .assertThat()
            .body("orders[0].id", equalTo(orderId),
                  "orders[0].product_name", equalTo("USB-C Charger"));
    }
}

The Request Specification is automatically applied because it was configured in the SetupSpecification class. This means all the common request configurations, such as the base URI, headers, content type, and logging settings, are automatically applied to the request. Similarly, the common response-time validation configured in the SetupSpecification class is reused during test execution.

The test itself focuses only on endpoint-specific details: passing the id query parameter, invoking the /getOrder endpoint, and asserting on the response. This approach keeps the test concise and improves maintainability by separating common configuration from test-specific assertions.

Adding Additional Assertions

The Response Specification can handle common validations, while endpoint-specific assertions can still be added in the test.
Java
import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;

import org.testng.annotations.Test;

public class TestGetRequestWithRestAssuredSpecs extends SetupSpecification {

    @Test
    public void getRequestTestWithRestAssuredConfig() {
        final int orderId = 3;
        given()
            .queryParam("id", orderId)
        .when()
            .get("/getOrder")
        .then()
            .statusCode(200)
            .and()
            .assertThat()
            .body("orders[0].id", equalTo(orderId),
                  "orders[0].product_name", equalTo("USB-C Charger"));
    }
}

In this example, the response body validations for order ID and product name remain inside the test because they are specific to this API endpoint.

Why This Approach Is Useful

As the test suite grows, hundreds of API tests may use the same base URL, content type, authentication, and response validations. Maintaining these configurations in every test class can quickly become difficult. Keeping the Request and Response Specifications in a separate class provides a centralized location for managing common settings. If the API URL changes or additional configurations need to be added, only a single file needs to be updated. This approach also improves readability because the test methods contain only the business validations relevant to the API being tested.

Using Request and Response Specifications Directly in the Test Class

While many automation projects prefer keeping specifications in a separate class, there are situations where creating specifications directly inside the test class makes sense. This approach is useful for smaller projects, proof-of-concept implementations, or when a test class requires its own configuration that is not shared with other tests. In this approach, the Request and Response Specifications are created using the @BeforeClass annotation and are available only within the current test class.
Java
import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;

import io.restassured.builder.RequestSpecBuilder;
import io.restassured.builder.ResponseSpecBuilder;
import io.restassured.filter.log.RequestLoggingFilter;
import io.restassured.filter.log.ResponseLoggingFilter;
import io.restassured.specification.RequestSpecification;
import io.restassured.specification.ResponseSpecification;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

public class StringRelatedAssertionTests {

    private static ResponseSpecification responseSpecification;
    private static RequestSpecification requestSpecification;

    @BeforeClass
    public void setupSpecBuilder() {
        final RequestSpecBuilder requestSpecBuilder = new RequestSpecBuilder()
            .setBaseUri("https://api.restful-api.dev/objects")
            .addQueryParam("id", 3)
            .addFilter(new RequestLoggingFilter())
            .addFilter(new ResponseLoggingFilter());

        final ResponseSpecBuilder responseSpecBuilder = new ResponseSpecBuilder()
            .expectStatusCode(200);

        responseSpecification = responseSpecBuilder.build();
        requestSpecification = requestSpecBuilder.build();
    }

    @Test
    public void testStringAssertions() {
        given()
            .spec(requestSpecification)
            .get()
        .then()
            .spec(responseSpecification)
            .assertThat()
            .body("[0].name", equalTo("Apple iPhone 12 Pro Max"));
    }
}

In this example, the Request and Response Specifications are created once in the @BeforeClass method and stored in static variables. The Request Specification contains common request details such as the base URI, query parameters, and logging filters, while the Response Specification defines the expected status code.

During test execution, the Request Specification is applied using the spec(requestSpecification) method before sending the request. After the response is received, the Response Specification is applied using spec(responseSpecification) to validate the common response expectations before performing additional assertions on the response body.

Keeping the specifications and test logic within the same class makes the example easy to follow, as both the setup and test execution are located in a single file. However, as the test suite grows and multiple test classes require the same configurations, duplicating specifications across classes can become difficult to maintain.
In such situations, moving the common Request and Response Specifications to a separate class provides better reusability and reduces code duplication. For smaller projects or learning purposes, defining the specifications directly within the test class remains a simple and effective approach.

Summary

REST Assured specifications help create cleaner and more maintainable API automation tests. A best practice is to define the Request and Response Specifications in a separate class and initialize them using the @BeforeClass annotation. The Request Specification manages settings such as the base URI, content type, and logging, while the Response Specification handles common response validations. By centralizing these configurations, test classes become shorter, easier to read, and simpler to maintain. For API automation frameworks built with REST Assured and TestNG, this pattern provides a clean foundation that scales well as the number of tests increases.
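The QA-to-Staging scenario mentioned earlier can be sketched with a small helper that picks the base URI from a system property, so the shared setup class is the only place that knows about environments. This is a sketch under assumptions: the class name, the `env` property name, and the QA and Staging URLs are hypothetical, and only the local URL comes from the example above.

```java
import java.util.Locale;
import java.util.Map;

public class BaseUriResolver {

    // Hypothetical environment names and URLs, used only for illustration.
    private static final Map<String, String> BASE_URIS = Map.of(
        "local", "http://localhost:3004",
        "qa", "https://qa.api.example.com",
        "staging", "https://staging.api.example.com");

    // Returns the base URI for the given environment name, falling back
    // to the local URI when the name is null or blank.
    static String resolve(String environment) {
        if (environment == null || environment.isBlank()) {
            return BASE_URIS.get("local");
        }
        String uri = BASE_URIS.get(environment.trim().toLowerCase(Locale.ROOT));
        if (uri == null) {
            throw new IllegalArgumentException("Unknown environment: " + environment);
        }
        return uri;
    }

    public static void main(String[] args) {
        // For example, run the suite with -Denv=staging
        String baseUri = resolve(System.getProperty("env"));
        System.out.println(baseUri);
        // In SetupSpecification, this value would be passed to
        // RequestSpecBuilder.setBaseUri(...) instead of a string literal.
    }
}
```

Failing fast on an unknown environment name is a deliberate choice here: a typo in the property then stops the run immediately instead of silently testing the wrong system.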

By Faisal Khatri DZone Core
Your Biggest Identity Problem Isn't Your Employees Anymore; It's Everything Else

I used to open identity audits by asking a CISO how many users were on their network. These days, I ask a different question first: how many non-human identities do you have, and when was the last time anyone counted? Most of the time, the answer is a long pause, followed by a number that's wrong, followed by an admission that it's wrong. That pause is the whole story of identity security in 2026. CyberArk's 2025 Identity Security Landscape report, based on a survey of 2,600 security decision-makers across 20 countries, put a hard number on what I'd been seeing anecdotally for two years: machine identities now outnumber human identities by more than 80 to 1 in the average enterprise. Service accounts, API keys, certificates, container workloads, CI/CD pipeline tokens, and now AI agents acting on behalf of users — all of it stacking up faster than anyone is governing it. Clarence Hinton, CyberArk's Chief Strategy Officer, said it plainly when the report came out: the privileged access of AI agents represents an entirely new threat vector. He's not wrong, and the part that should bother you is that "new" undersells how fast it's already arrived. Gartner's framing in its 2026 IAM predictions research is just as blunt: human and machine identities have jointly become the primary attack surface, and the firm expects nearly a third of enterprises to be running AI agents that execute workflows autonomously, at machine speed, by the end of this year. Traditional IAM — built around the assumption that a human logs in, gets a session, and logs out — was never designed for an actor that authenticates itself, chains five API calls together in under a second, and never sleeps. 
The advice Gartner keeps repeating to CISOs boils down to three things: register every machine actor as a first-class identity, automate the entire credential lifecycle instead of trusting humans to rotate things on schedule, and write authorization policy that treats "agent" as its own subject type, not an edge case bolted onto human IAM. Machine and IoT Identities: Stop Treating Them Like an Afterthought Here's the uncomfortable reframe I give every team I work with: a service account is not a lesser version of a user account. It needs its own identity lifecycle — provisioning, attestation, rotation, and deprovisioning — and it needs it whether it's a Kubernetes pod, an IoT sensor on a factory floor, or an AI agent with a standing connection to your CRM. The instinct to issue a long-lived API key once and forget about it is exactly the instinct that's been getting enterprises breached. This is where SPIFFE and SPIRE earn their keep. SPIFFE — the Secure Production Identity Framework For Everyone — graduated to the Cloud Native Computing Foundation's highest maturity tier in August 2022, and adoption since then has only accelerated; known production users include Bloomberg, ByteDance, Pinterest, Block (where the project originated), Uber, and Yahoo Japan, with HashiCorp, Google, IBM, and Intel building on top of it. The pitch is simple, and the implementation is not: every workload gets a cryptographically verifiable SPIFFE ID, short-lived X.509 SVIDs replace long-lived static credentials, and identity gets attested at the node and workload level rather than assumed from a network position. Andrew Moore, Uber's Platform Authentication Tech Lead, described SPIFFE as the "northstar foundation of securing all production interactions" when the project graduated — and having sat through enough postmortems where the root cause was a hardcoded credential in a config file, I understand exactly why he'd put it that way. 
For IoT specifically, the same principle holds with extra friction: device certificates and public-key-based provisioning at manufacture time beat shared secrets baked into firmware every time, because a shared secret leaked from one device compromises the fleet, while a compromised device certificate compromises one device. The annoying part is that retrofitting this onto an existing IoT deployment is expensive and slow. The expensive part doesn't go away by ignoring it; it just moves from a planned budget line to an incident response invoice. Zero Trust in Practice: What "Per-Call Auth" Actually Means Zero trust as a phrase has been diluted by marketing decks to the point of meaninglessness, so let me be specific about the part that matters here: every single call between services should carry its own authorization decision, independent of network location and independent of whatever broader session or token initiated the chain. A service mesh with mTLS enforced at the sidecar, a Kubernetes admission controller that rejects workloads without valid SPIFFE attestation, an API gateway that checks scope on every request rather than trusting whatever authenticated upstream — that's zero trust as an engineering practice rather than a slogan. The Salesloft Drift breach from August 2025 is the cleanest recent illustration I've seen of what happens when that discipline is missing, and it's worth walking through because almost nobody talks about it as an identity failure, even though that's exactly what it was. Between August 8 and 18, 2025, an intrusion cluster tracked as UNC6395 stole OAuth refresh tokens belonging to Salesloft's Drift chatbot integration with Salesforce. Those tokens didn't just authenticate Drift once — they granted standing, broadly scoped access that let the attackers run systematic queries against Salesforce instances at more than 700 organizations for roughly ten days before anyone shut it down. 
Cloudflare, one of the disclosed victims, found that 104 of its own API tokens had been exposed in the process, embedded in support-ticket text that the attackers specifically went hunting through. The breach later cascaded further: Google's Threat Intelligence Group linked the same stolen token set to a follow-on compromise of Gainsight-published Salesforce apps affecting another 200-plus instances. Salesforce's core platform was never touched. The failure was entirely in how a third-party integration's machine credential was scoped, monitored, and trusted by default — precisely the non-human identity gap that CyberArk's 80:1 statistic is describing in the abstract. That's the case for per-call, per-scope enforcement instead of standing trust in a token: if Drift's OAuth grant had been scoped to the specific objects it actually needed, time-boxed, and subject to anomaly detection on query volume, ten days of unmonitored SOQL queries against 700 organizations' CRM data simply doesn't happen. Decentralized Identity: Further Along Than Most Engineers Realize I'll admit I was skeptical of decentralized identity for years — it had the smell of a solution chasing a problem. That changed somewhat in 2025. On May 15, the W3C's Verifiable Credentials Working Group pushed the Verifiable Credentials Data Model 2.0 to full Recommendation status, alongside six companion specifications covering data integrity, JOSE/COSE-based securing of credentials, controlled identifiers, and revocation via bitstring status lists. Decentralized Identifiers themselves reached an updated 1.1 Recommendation the same year, building on the original DID Core spec from 2022. None of this is vaporware standards-body theater; it's the plumbing underneath the EU's Digital Identity Wallet rollout and a growing number of supply-chain credentialing pilots. Where this intersects with machine identity is the part most zero-trust articles skip: a verifiable credential doesn't have to describe a human. 
An AI agent or a service can hold a VC asserting "this workload is attested by this CI pipeline" or "this device passed this manufacturer's provisioning process," cryptographically signed and independently verifiable without phoning home to a central authority every time. It's still early — more than 150 distinct DID methods exist, and that fragmentation is a genuine interoperability headache — but the standards foundation is no longer the blocker it was three years ago. The blocker now is mostly organizational willingness to pilot something that doesn't look like OAuth. IAM Automation: The Part Nobody Can Skip Anymore Here's where the industry stops having a choice. In April 2025, the CA/Browser Forum unanimously approved Ballot SC-081v3, originally proposed by Apple, which phases public TLS certificate lifespans down from the current 398-day maximum to 200 days starting March 2026, 100 days in March 2027, and 47 days by March 2029. That's roughly an eightfold increase in renewal frequency over four years, on certificates most organizations are still managing through some combination of spreadsheets and tribal knowledge. Manual certificate management was already a liability. At a 47-day cadence, it's not viable at any meaningful scale — full stop. Practically, this means PKI and secrets automation move from "nice to have" to load-bearing infrastructure. HashiCorp Vault and its competitors for dynamic secrets issuance, SPIRE for workload-level short-lived credentials, and CI/CD-integrated certificate lifecycle tooling aren't optional add-ons to a security program anymore — they're the only way the math works once renewal events go from roughly one a year to eight. The teams I've watched handle this transition well started treating certificate and secret rotation as a property of deployment automation, a full year before the CA/Browser Forum vote even landed. 
The teams scrambling now are discovering that "we'll automate it later" was always a deferred cost, not an avoided one.

What I'd Actually Build

Plain Text
IDENTITY ISSUANCE LAYER
  → SPIFFE/SPIRE issues short-lived SVIDs to every workload, attested at startup
  → Device certs provisioned at manufacture/build time, never shared secrets
  → AI agents registered as distinct identity subjects, not borrowed user sessions

ENFORCEMENT LAYER (per call, not per session)
  → Service mesh enforces mTLS between every workload, no exceptions for "internal" traffic
  → API gateway validates scope and token freshness on every request
  → Kubernetes admission controller rejects any workload lacking valid attestation

LIFECYCLE LAYER (automated, not scheduled)
  → Vault-issued dynamic secrets with short TTLs by default
  → Cert rotation pipelines built for 47-day cycles now, not in 2029
  → OAuth grants for third-party SaaS integrations scoped narrowly and reviewed on a fixed cadence, not left standing indefinitely

The issuance layer answers, "How do we know who this is?" The enforcement layer answers, "What is this identity actually allowed to touch, right now?" The lifecycle layer is what keeps the answer to both of those questions from going stale, which, per the Salesloft Drift timeline, is exactly the gap that turns a single over-permissioned integration into a 700-company incident.

None of this is exotic engineering. It's mostly discipline, applied consistently, to a category of identity that most organizations have been quietly ignoring while they perfected human MFA. The uncomfortable truth for 2026 is that the attackers have already noticed where the gap is. The question is whether your machine identities have an owner, a lifecycle, and an expiration date, or whether they're just credentials that happen to still work, sitting in a config file, waiting for someone to go looking for them first.
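The renewal math behind the 47-day argument is easy to check. The following is a back-of-the-envelope sketch, assuming each certificate is replaced only when it reaches its maximum lifetime; real pipelines renew earlier, so actual counts would be higher. The class and method names are hypothetical.

```java
public class CertRenewalCadence {

    // Renewals needed per year if a certificate is replaced only when
    // it reaches its maximum lifetime (no early-renewal safety margin).
    static double renewalsPerYear(int maxLifetimeDays) {
        return 365.0 / maxLifetimeDays;
    }

    public static void main(String[] args) {
        // Maximum lifetimes from the SC-081v3 schedule described above
        int[] lifetimes = {398, 200, 100, 47};
        for (int days : lifetimes) {
            System.out.printf("%3d-day max lifetime: %.2f renewals per certificate per year%n",
                days, renewalsPerYear(days));
        }
    }
}
```

The first and last values come out at roughly 0.9 and 7.8 renewals per certificate per year, which is where the "one a year to eight" framing comes from. Multiply by the size of a certificate inventory and the case for automation makes itself.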

By Igboanugo David Ugochukwu DZone Core
AI, OAuth, and Other Platform APIs in the Core

This is the second follow-up to June 5's release post. It covers the platform APIs that moved into the framework core this release. There are two headline pieces (AI/LLM and the modern OAuth/OIDC stack) and two smaller pieces (WiFi/connectivity and share-sheet result callbacks). This continues the direction the previous release set when we moved NFC, biometrics, and cryptography into the framework core. The full background on that earlier set is in NFC, Crypto, Biometrics, And A New Build Cloud.

AI: A First-Class LLM Client and a ChatView Component

PR #5035 lands the com.codename1.ai package, the ChatView UI component, the speech and TTS additions, and the build-time dependency injection that wires the native pieces in. PR #5057 lands the developer-guide chapter and the agent-skill addition, so any project generated from the Initializr inherits the new APIs through its bundled AGENTS.md.

LlmClient: The Basic Chat Request

com.codename1.ai.LlmClient is the entry point. The simplest possible use:

Java
LlmClient client = LlmClient.openai(apiKey);
ChatRequest req = new ChatRequest.Builder()
    .model("gpt-4o-mini")
    .system("You are a helpful assistant.")
    .user("What is the capital of France?")
    .temperature(0.7)
    .build();
client.chat(req).onResult((resp, err) -> {
    if (err != null) {
        Log.e(err);
        return;
    }
    Log.p(resp.firstChoice().content());
});

LlmClient.openai(...), LlmClient.anthropic(...), LlmClient.gemini(...), LlmClient.ollama(...), and LlmClient.openAiCompatible(baseUrl, apiKey) are the factories. All five are fully implemented native clients. The OpenAI client also drives Ollama, vLLM, llama.cpp, and any other endpoint that speaks the OpenAI wire format, so most local-model stacks plug in through LlmClient.openAiCompatible(...) without a separate driver.

Streaming Chat (What You Actually Want for Chat UIs)

For any UI that types responses out token-by-token, the streaming entry point is the one to reach for.
The callback fires on the EDT, so you can append directly to a text component:

Java
client.chatStream(req, new ChatStreamListener() {
    @Override
    public void onDelta(ChatDelta d) {
        responseLabel.setText(responseLabel.getText() + d.contentDelta());
        responseLabel.getParent().revalidateLater();
    }

    @Override
    public void onComplete(ChatResponse fin) {
        sendButton.setEnabled(true);
    }

    @Override
    public void onError(Throwable t) {
        Log.e(t);
        sendButton.setEnabled(true);
    }
});

Under the hood this is a custom ConnectionRequest subclass that parses SSE line-by-line and dispatches each delta through Display.callSerially. AsyncResource.cancel() kills the socket, so wiring a cancel button into a chat UI is a one-line call.

Tool Calls

If you want the model to call back into your app, Tool / ToolChoice give you OpenAI-style function calling. Define the tool, hand the model your prompt and the available tools, and the response surfaces structured ToolCall objects you dispatch:

Java
Tool getWeather = Tool.builder()
    .name("get_weather")
    .description("Look up the current weather for a city.")
    .parameter("city", "string", "The city name, e.g. \"Paris\".")
    .build();

ChatRequest req = new ChatRequest.Builder()
    .model("gpt-4o-mini")
    .user("Is it raining in Tel Aviv right now?")
    .tool(getWeather)
    .toolChoice(ToolChoice.AUTO)
    .build();

client.chat(req).onResult((resp, err) -> {
    if (err != null) return;
    for (ToolCall call : resp.firstChoice().toolCalls()) {
        if ("get_weather".equals(call.name())) {
            String city = call.argument("city").asString();
            String json = lookupWeather(city);
            // Loop the result back into the conversation
            client.chat(req.replyWithToolResult(call, json))
                .onResult((followUp, e) -> updateUi(followUp));
        }
    }
});

The shape mirrors the OpenAI function-calling contract one for one, so anything you have written against the OpenAI API directly maps across without rethinking.

Embeddings

LlmClient.embed(...) returns a vector for any input string.
Useful for similarity search against a local SQLite store (tomorrow's post will cover the new ORM that pairs with this):

Java
EmbeddingRequest er = new EmbeddingRequest.Builder()
    .model("text-embedding-3-small")
    .input("Codename One is a cross-platform mobile framework.")
    .build();
client.embed(er).onResult((emb, err) -> {
    if (err != null) return;
    float[] vector = emb.firstVector();
    // store, search, compare
});

Image Generation

DALL-E and a Replicate scaffold are surfaced through ImageGenerator:

Java
ImageGenerator gen = ImageGenerator.openAiDallE(apiKey);
gen.generate("A red bicycle leaning against an olive tree", "1024x1024")
    .onResult((img, err) -> {
        if (err != null) return;
        myImageComponent.setIcon(img);
    });

Working Against Ollama in the Simulator (No API Charges)

JavaSEPort pings localhost:11434 at startup. If it finds Ollama, it sets the cn1.ai.ollamaDetected property. With cn1.ai.simulatorRedirect=auto (or =ollama) every LlmClient.openai(...) call routes through the local Ollama endpoint instead of OpenAI's. Production code does not change. The iteration loop, your tests, and your offline debugging stop costing money and stop needing an internet connection.

In common/codenameone_settings.properties:

Properties files
simulator.cn1.ai.simulatorRedirect=auto

(The simulator. prefix scopes the property to the JavaSE simulator path.) Then run Ollama locally with whichever model your code expects (ollama run llama3.2 or similar) and your existing LlmClient.openai(...) calls go to localhost.

How to Handle API Keys

A direct word on credentials before any of the above sees production. LLM provider API keys (OpenAI, Anthropic, Gemini, your Auth0 / Firebase configs) are bearer tokens with a budget attached. They must never be checked into source control, embedded in your app binary, or hard-coded in code. A leaked key can be extracted from any APK or IPA in minutes and used to drain your account.
The correct shape is to fetch the key from your own backend over an authenticated request, then store it on the device using the platform's keychain / keystore. The framework provides both pieces:

  • com.codename1.crypto.SecureStorage (from the previous release) is the cross-platform wrapper over iOS Keychain Services and Android EncryptedSharedPreferences. Values are encrypted at rest using the platform's hardware-backed protection class where one is available.
  • This release adds single-argument get / set / remove(account, ...) overloads next to the existing biometric-gated methods. The new overloads store the value without a per-read Face ID / Touch ID prompt, which is what you want for an LLM API key (you read it on every network call; a biometric prompt every time is not workable). The biometric-gated methods are still there for credentials you do want to gate per use.

A reasonable shape:

Java
private static AsyncResource<String> getOpenAiKey() {
    String cached = SecureStorage.get("openai_api_key");
    if (cached != null) {
        return AsyncResource.complete(cached);
    }
    return Rest.get(myServer + "/v1/credentials/openai")
        .bearerToken(userSessionToken())
        .fetchAsString()
        .onResult((key, err) -> {
            if (err == null) {
                SecureStorage.set("openai_api_key", key);
            }
        });
}

Your server gates the credential request behind the user's session, your app caches the result on the keychain, and the key never sits anywhere a reverse-engineering pass could find it. If your server rotates the key, invalidate the cache and refetch.

Existing biometric-gated SecureStorage calls keep working unchanged. The new overloads are additive.

ChatView: A Ready-Made Streaming Chat UI

com.codename1.components.ChatView is the matching UI component. Scrollable message list, ChatBubble for the per-message bubble (theme-aware UIIDs so it picks up the iOS Modern / Material 3 native themes consistently), ChatInput for the bottom input bar, and a one-line bindToLlm(...)
that wires the input to a streaming chat request:

Java
ChatView view = new ChatView();
getOpenAiKey().onResult((key, err) -> {
    view.bindToLlm(LlmClient.openai(key),
        new ChatRequest.Builder()
            .model("gpt-4o-mini")
            .system("You are a friendly tutor for " + "Codename One developers.")
            .build());
});
Form f = new Form("Chat", new BorderLayout());
f.add(BorderLayout.CENTER, view);

The result is a standard mobile chat layout, picked up from whichever native theme the project uses.

If you want more control than bindToLlm(...) gives you (custom message styling, a "thinking" placeholder, hand-rolled retry, persistence to your own model class), drive the view by hand:

Java
ChatView view = new ChatView();
ConversationStore store = ConversationStore.open("tutor-thread");
view.setMessages(store.load());
LlmClient client = LlmClient.openai(apiKeyFromKeychain);

view.setInputListener(userText -> {
    ChatMessage userMsg = ChatMessage.user(userText);
    view.appendMessage(userMsg);
    store.append(userMsg);

    ChatMessage assistant = ChatMessage.assistant("");
    view.appendMessage(assistant);

    ChatRequest req = new ChatRequest.Builder()
        .model("gpt-4o-mini")
        .messages(store.load())
        .build();

    client.chatStream(req, new ChatStreamListener() {
        @Override
        public void onDelta(ChatDelta d) {
            view.appendToLastMessage(d.contentDelta());
        }

        @Override
        public void onComplete(ChatResponse fin) {
            store.append(ChatMessage.assistant(view.lastMessage().content()));
            view.setInputEnabled(true);
        }

        @Override
        public void onError(Throwable t) {
            view.appendToLastMessage(" [error: " + t.getMessage() + "]");
            view.setInputEnabled(true);
        }
    });
});

appendToLastMessage(...) is the streaming entry point; it marshals through callSerially so deltas land on the EDT in order. ConversationStore persists the thread (the default backing is Storage; pluggable via a custom implementation if you would rather keep it in SQLite or push it to your server).
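To make the streaming path more concrete, here is a minimal, framework-free sketch of splitting a server-sent-events body into lines and collecting the `data:` payloads. It is not the framework's ConnectionRequest subclass: the class and method names are hypothetical, the `[DONE]` sentinel is a convention of OpenAI-style streams, and a real client would read lines as they arrive from the socket, decode the JSON in each payload, and dispatch it to the UI thread.

```java
import java.util.ArrayList;
import java.util.List;

public class SseLineParserSketch {

    // Returns the payload of each complete event in an SSE body.
    // Multiple "data:" lines in one event are joined with '\n', and a blank
    // line marks the end of an event. Comment lines (starting with ':') and
    // other fields such as "event:" or "id:" are skipped. An event that is
    // not terminated by a blank line is dropped.
    static List<String> parse(String body) {
        List<String> events = new ArrayList<>();
        StringBuilder data = new StringBuilder();
        boolean hasData = false;
        for (String line : body.split("\r?\n", -1)) {
            if (line.isEmpty()) {
                if (hasData) {
                    String payload = data.toString();
                    if ("[DONE]".equals(payload)) {
                        break; // end-of-stream sentinel
                    }
                    events.add(payload);
                }
                data.setLength(0);
                hasData = false;
            } else if (line.startsWith("data:")) {
                String value = line.substring(5);
                if (value.startsWith(" ")) {
                    value = value.substring(1); // one optional leading space
                }
                if (hasData) {
                    data.append('\n');
                }
                data.append(value);
                hasData = true;
            }
        }
        return events;
    }

    public static void main(String[] args) {
        String body = ": keep-alive\n"
            + "data: {\"delta\":\"Hel\"}\n\n"
            + "data: {\"delta\":\"lo\"}\n\n"
            + "data: [DONE]\n\n";
        for (String event : parse(body)) {
            System.out.println(event);
        }
    }
}
```

Each returned payload corresponds to one delta; in a chat UI, that is the unit that would be appended to the last message on the EDT.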
The AI cn1libs

The core LLM stack is paired with a set of opt-in cn1libs that wrap specific on-device capabilities: Google ML Kit features, the TensorFlow Lite runtime, a local Whisper transcription engine, and an on-device Stable Diffusion model. Thirteen new cn1libs ship this release.

These cn1libs are not yet listed in the Codename One Preferences cn1lib picker, so for the moment they are added by hand. Drop the matching dependency block into your project's common/pom.xml and rebuild. The build-time scanner does the rest: the iOS pod or Swift Package, the Android Gradle dependency, the plist usage strings (NSCameraUsageDescription for the vision libraries, NSSpeechRecognitionUsageDescription for Whisper, etc.), and the Android permissions (android.permission.RECORD_AUDIO for audio capture) are all injected automatically the first time the scanner sees the matching class on the classpath.

For each cn1lib below, the dependency block is identical in shape; only the <artifactId> changes. The shared pattern is:

XML
<dependency>
    <groupId>com.codenameone</groupId>
    <artifactId><!-- cn1lib artifact id from below --></artifactId>
    <version>${cn1.version}</version>
</dependency>

cn1-ai-mlkit-text: Text Recognition (OCR)

TL;DR. Pull printed or handwritten text out of an image (a photo of a page, a sign, a receipt) entirely on-device.
Platforms. iOS bridges to GoogleMLKit/TextRecognition. Android bridges to com.google.mlkit:text-recognition. The JavaSE simulator returns an unsupported error.
Use cases. Receipt scanning, sign translation pipelines (combine with cn1-ai-mlkit-translate), accessibility tools that read printed text aloud, automated form ingestion.

Java
byte[] jpeg = capturePhotoBytes();
TextRecognizer.recognize(jpeg).onResult((text, err) -> {
    if (err == null) Log.p("OCR: " + text);
});

cn1-ai-mlkit-barcode: Barcode and QR Scanning

TL;DR. Decodes QR, EAN, UPC, Data Matrix, PDF417, and the rest of the common 1D / 2D code families from a captured image.
Platforms.
iOS bridges to MLKitBarcodeScanning. Android bridges to com.google.mlkit:barcode-scanning. The JavaSE simulator returns an unsupported error. Use cases. Inventory scanning, ticket / boarding-pass readers, QR-driven onboarding flows, retail loyalty cards. Java byte[] jpeg = capturePhotoBytes(); BarcodeScanner.scan(jpeg).onResult((codes, err) -> { if (err == null) { for (String code : codes) Log.p("Found: " + code); } }); cn1-ai-mlkit-face: Face Detection TL;DR. Returns bounding boxes for human faces detected in an image. Each face is reported as a packed int[4] (x, y, width, height). Platforms. iOS bridges to MLKitFaceDetection. Android bridges to com.google.mlkit:face-detection. Use cases. Auto-crop a contact photo, mosaic / blur bystanders in a group shot, drive a face-tracked overlay for AR-lite filters. Java FaceDetector.detect(jpeg).onResult((boxes, err) -> { if (err != null) return; for (int i = 0; i < boxes.length; i += 4) { Log.p("face at " + boxes[i] + "," + boxes[i + 1] + " " + boxes[i + 2] + "x" + boxes[i + 3]); } }); cn1-ai-mlkit-labeling: Image Labeling TL;DR. "What is in this picture." Returns a list of descriptive labels for the image content. Platforms. iOS bridges to MLKitImageLabeling. Android bridges to com.google.mlkit:image-labeling. Use cases. Auto-tagging uploaded photos, content moderation pre-filters, content-based image search. Java ImageLabeler.label(jpeg).onResult((labels, err) -> { if (err == null) Log.p("labels: " + String.join(", ", labels)); }); cn1-ai-mlkit-translate: On-Device Translation TL;DR. Translate short text between supported language pairs entirely on-device; no server round-trip, no API key, works offline. Platforms. iOS bridges to MLKitTranslate. Android bridges to com.google.mlkit:translate. Languages are identified by their ISO 639-1 codes (en, fr, es, ...). Use cases. Offline travel assistants, chat translation, accessibility readers for foreign signage (combine with cn1-ai-mlkit-text). 
Java Translator.translate("Where is the train station?", "en", "fr") .onResult((fr, err) -> { if (err == null) Log.p(fr); // "Où est la gare ?" }); cn1-ai-mlkit-smartreply: Short Reply Suggestions TL;DR. Generates short suggested replies for chat conversations, similar to Gmail's Smart Reply chips. Platforms. iOS bridges to MLKitSmartReply. Android bridges to com.google.mlkit:smart-reply. The input is a JSON array of {role, message, timestamp, userId} objects. Use cases. A "quick reply" row above the keyboard in your in-app chat, response suggestions in a CRM inbox. Java String thread = "[{\"role\":\"remote\",\"message\":\"See you at 6?\"," + "\"timestamp\":" + System.currentTimeMillis() + "," + "\"userId\":\"u42\"}]"; SmartReply.suggest(thread).onResult((suggestions, err) -> { if (err == null) { for (String s : suggestions) Log.p("suggestion: " + s); } }); cn1-ai-mlkit-langid: Language Identification TL;DR. Returns the most likely ISO 639-1 code for a given text, or und (undetermined) when the input is too short or ambiguous. Platforms. iOS bridges to MLKitLanguageID. Android bridges to com.google.mlkit:language-id. Use cases. Auto-route a customer-support message to the right team, pick the correct TTS voice for an arbitrary string, pre-screen input before running an expensive translation. Java LanguageIdentifier.identify("Bonjour le monde").onResult((code, err) -> { if (err == null) Log.p(code); // "fr" }); cn1-ai-mlkit-pose: Pose Detection TL;DR. Returns 33 skeletal landmarks per detected pose as a packed float[3 * 33] (x, y, confidence triples). Platforms. iOS bridges to MLKitPoseDetection. Android bridges to com.google.mlkit:pose-detection. Use cases. Fitness apps with form correction, dance/yoga timing analysis, gesture-driven controls. 
Java PoseDetector.detect(jpeg).onResult((landmarks, err) -> { if (err != null || landmarks.length < 99) return; float noseX = landmarks[0], noseY = landmarks[1], noseConf = landmarks[2]; Log.p("nose at (" + noseX + ", " + noseY + ") conf=" + noseConf); }); cn1-ai-mlkit-segmentation: Selfie Segmentation TL;DR. Returns a per-pixel mask separating the person in the foreground from the background as byte[width * height] (0 = background, 255 = foreground). Platforms. iOS bridges to MLKitSegmentationSelfie. Android bridges to com.google.mlkit:segmentation-selfie. Use cases. Background replacement for video calls, sticker / portrait-mode effects, blur-the-background privacy filters. Java SelfieSegmenter.segment(jpeg).onResult((mask, err) -> { if (err == null) applyBackgroundReplacement(mask); }); cn1-ai-mlkit-docscan: Document Scanner TL;DR. Detects a rectangular document in a photo, perspective-corrects it, and writes the cropped JPEG to a temporary file. Returns the file path. Platforms. iOS uses Apple's VisionKit + Core Image rectangle detection (no extra pod). Android uses com.google.android.gms:play-services-mlkit-document-scanner. Use cases. "Scan to PDF" flows, expense apps that capture receipts, contract signing flows, ID-document capture. Java DocumentScanner.scanToFile(jpeg).onResult((path, err) -> { if (err == null) uploadDocument(path); }); cn1-ai-tflite: TensorFlow Lite Interpreter TL;DR. A general-purpose on-device inference engine. Bring your own .tflite model and run it against a float32 input tensor. Platforms. iOS uses TensorFlowLiteSwift (Pods or Swift Package). Android uses org.tensorflow:tensorflow-lite + tensorflow-lite-support. Use cases. Any custom on-device ML model your team trains or pulls from TF Hub. Image classification, simple regression, recommendation pre-filters. 
Java byte[] modelBytes = Util.readFully(Display.getInstance().getResourceAsStream(null, "/model.tflite")); float[] input = featureVector(); Interpreter.run(modelBytes, input).onResult((output, err) -> { if (err == null) Log.p("model returned " + output.length + " values"); }); cn1-ai-whisper: Speech-to-Text via whisper.cpp TL;DR. On-device transcription of a 16 kHz mono WAV file using a ggml-format Whisper model. The cn1lib bundles libwhisper.a. Platforms. iOS uses the Accelerate framework; Android uses a JNI build of the same whisper.cpp core. Models (e.g. ggml-base.bin) are not bundled; ship the one your app expects under the app's resources or download on first launch. Use cases. Voice notes, accessibility transcription, offline dictation, podcast indexing. Java String modelPath = SecureStorage.getFilePath("ggml-base.bin"); String audioPath = recordWavToFile(); WhisperRecognizer.transcribe(modelPath, audioPath) .onResult((text, err) -> { if (err == null) Log.p("heard: " + text); }); cn1-ai-stablediffusion: On-Device Image Generation TL;DR. Generates a JPEG from a text prompt using a bundled Stable Diffusion model. Multi-gigabyte payload, local build only. Platforms. iOS uses Core ML pipelines compiled from the bundled model. Android uses ONNX Runtime. Both configurations exceed the cloud build server's 2 GB upload limit, so this cn1lib triggers the cn1.ai.requiresBigUpload guard and the cloud build aborts with a "build this one locally" message. Add it to a project you build via mvn cn1:buildAndroid / mvn cn1:buildIosXcodeProject on the developer machine. Use cases. Avatar generation in apps where shipping to a cloud API is undesirable (offline-first apps, regulated industries, privacy-sensitive products). 
Java StableDiffusion.generate("a teal hot-air balloon over Lisbon, watercolour", 512, 512, /* steps */ 25) .onResult((jpeg, err) -> { if (err == null) display(Image.createImage(jpeg, 0, jpeg.length)); }); Why These Are cn1libs and Not Part of the Core The core gets the AI plumbing every app that adopts AI at all wants: the LLM client, streaming, the chat UI, the secure storage primitive for credentials, the simulator Ollama redirect for offline iteration. The cn1libs above are specialized verticals. Barcode scanning, document scanning, face detection, smart reply, pose detection, on-device translation, transcription, and on-device image generation are genuinely useful, but only for some apps. They also each bring a non-trivial native dependency. The Google ML Kit Android frameworks are large; the iOS pods carry their own weight; the bundled libwhisper.a and the Stable Diffusion model are big. Pulling all of them into the core would tax every app, whether the feature is used or not. The Stable Diffusion cn1lib in particular is large enough that the cloud build server cannot accept the upload at all (it trips the 2 GB pre-upload guard). That kind of opt-in does not belong in a dependency every app inherits. The corresponding chapter, including the full LlmClient API table, the ChatView reference, the SecureStorage overloads, the simulator Ollama redirect, and the full cn1lib coverage, is at AI, Chat UI, and Speech in the developer guide. OAuth and OIDC: The Modern Identity Stack The in-app-WebView Oauth2 flow that Codename One has shipped since approximately forever was the way every cross-platform mobile framework solved "sign in with Google / Facebook / Microsoft" in the 2010s. It is also the way every one of those identity providers stopped wanting you to solve it. Google has been blocking embedded user agents for years. Apple does not want third-party apps wrapping the Apple ID flow in a WKWebView. Microsoft and Facebook joined the chorus. 
The right answer is the system browser: ASWebAuthenticationSession on iOS, Custom Tabs on Android, with PKCE on the wire. That is what PR #5018 lands. PR #5039 adds a portable WebAuthn / passkey client on top.

Sign In With Google (or Any OIDC Provider)

com.codename1.io.oidc.OidcClient is the entry point. Point it at the discovery URL of an OIDC provider, hand it the client id and the redirect URI you registered with the provider, and ask for tokens:

Java
OidcConfiguration cfg = OidcConfiguration.discover("https://accounts.google.com");
OidcClient client = OidcClient.builder()
    .configuration(cfg)
    .clientId("123-abc.apps.googleusercontent.com")
    .redirectUri("com.example.myapp:/oauthredirect")
    .scopes("openid", "email", "profile")
    .build();

client.signIn().onResult((tokens, err) -> {
    if (err != null) {
        OidcException oe = (OidcException) err;
        // A user backing out of the sheet is not an error worth logging.
        if (oe.getCode() == OidcException.USER_CANCELLED) return;
        Log.e(oe);
        return;
    }
    String idToken = tokens.getIdToken().raw();
    String email = tokens.getIdToken().getClaim("email").asString();
    proceed(email, idToken);
});

Discovery JSON parsed and cached. PKCE S256 challenge generated and verified. State and nonce checked on the callback. ID-token claims decoded for you (we deliberately do not verify the signature client-side; the dev guide is explicit about why and points at the "re-validate on your backend" remedy). Refresh and revoke are first-class. The token store is pluggable via TokenStore; the default is Storage-backed, but a Keychain-backed or in-memory variant is a small class.

On iOS the system-browser piece routes through ASWebAuthenticationSession. On Android it routes through androidx.browser.customtabs, with a plain ACTION_VIEW fallback for the rare device with no Custom Tabs provider. AuthenticationServices.framework and androidx.browser:browser are auto-linked when the classpath scanner sees OidcClient in use.
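For readers who want to see what the S256 step involves, the sketch below derives a challenge from a verifier the way RFC 7636 describes it. It is a plain-Java illustration, not OidcClient's internals; the Pkce class and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper for illustration; not part of Codename One.
class Pkce {
    // PKCE uses the URL-safe Base64 alphabet with the '=' padding stripped.
    private static final Base64.Encoder URL_ENCODER = Base64.getUrlEncoder().withoutPadding();

    // 32 random bytes encode to a 43-character verifier, the minimum length RFC 7636 allows.
    static String newVerifier() {
        byte[] random = new byte[32];
        new SecureRandom().nextBytes(random);
        return URL_ENCODER.encodeToString(random);
    }

    // S256: challenge = BASE64URL(SHA-256(ASCII(verifier))).
    static String challengeFor(String verifier) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(verifier.getBytes(StandardCharsets.US_ASCII));
            return URL_ENCODER.encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }
}
```

The client sends the challenge with the authorization request and the verifier with the token request, so a party that intercepts only the redirect cannot redeem the code. With the example verifier from the RFC's appendix, dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk, challengeFor returns E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM.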
Provider Wrappers: Google, Apple, Microsoft, Facebook, Auth0, Firebase If you would rather not configure OIDC by hand, the existing social classes get a signIn(...) method that drives the same stack with the provider's issuer URL pre-wired: Java GoogleConnect.signIn(googleClientId, "com.example.myapp:/oauthredirect", "openid", "email", "profile") .onResult((tokens, err) -> { /* ... */ }); MicrosoftConnect.signIn(entraClientId, "msauth.com.example.myapp://auth", "User.Read") .onResult((tokens, err) -> { /* ... */ }); Auth0Connect.signIn("tenant.auth0.com", clientId, redirectUri, "openid profile email") .onResult((tokens, err) -> { /* ... */ }); FacebookConnect.signIn(...) follows the same shape against the Facebook OIDC endpoint. FirebaseAuth covers the REST-based Firebase auth surface (email/password, IdP token exchange, refresh) which sits underneath any provider hand-off you might want to drive from app code. Sign In With Apple Sign in with Apple is required on iOS for apps that offer any other social login, and on Android it must fall through to a web flow. com.codename1.social.AppleSignIn handles both transparently: Java AppleSignIn.signIn() .onResult((result, err) -> { if (err != null) return; String idToken = result.getIdToken(); String code = result.getAuthorizationCode(); proceedToBackend(idToken, code); }); On iOS 13 and later this drops directly into the native Apple sheet via ASAuthorizationAppleIDProvider. On non-iOS platforms it falls through to the same OIDC web flow as everything else, so a single line of app code does the right thing on every port. The Maven plugin injects the com.apple.developer.applesignin entitlement on iOS when it sees AppleSignIn in use; Android does not see it because it is not there. Migration From the Legacy Oauth2 com.codename1.io.Oauth2 is now deprecated. 
Existing code still compiles, but the migration is short and almost always shorter than what it replaces: Java // Before Oauth2 oauth = new Oauth2("https://accounts.google.com/o/oauth2/auth", clientId, redirectUri); oauth.setClientSecret(clientSecret); oauth.setScope("openid email profile"); oauth.setBrowserComponent(myBrowserComponent); // tied to a WKWebView String token = oauth.authenticate(); // blocks, opens the web view Java // After OidcClient.builder() .configuration(OidcConfiguration.discover("https://accounts.google.com")) .clientId(clientId) .redirectUri(redirectUri) .scopes("openid", "email", "profile") .build() .signIn() .onResult((tokens, err) -> proceed(tokens.getIdToken().raw())); You stop owning the browser. The OS owns it. The cookies live in the platform's authentication session. The user gets the same login experience they have everywhere else on their device. WebAuthn/Passkeys PR #5039 layers a portable WebAuthn client on top: Java WebAuthnClient client = WebAuthnClient.getInstance(); if (!client.isAvailable()) { fallbackToPassword(); return; } PublicKeyCredentialCreationOptions opts = PublicKeyCredentialCreationOptions.fromServerJson(serverJson); client.create(opts).onResult((cred, err) -> { if (err == null) postToRelyingParty(cred.toJson()); }); W3C JSON wire format in both directions, so the response can be POSTed verbatim to any standard server-side WebAuthn library. iOS 16+ routes through ASAuthorizationPlatformPublicKeyCredentialProvider; Android API 28+ through androidx.credentials.CredentialManager. Provider helpers: Auth0Connect.signInWithPasskey(...) / .registerPasskey(...) and FirebaseAuth.signInWithPasskey(...) / .registerPasskey(...). One thing worth pulling out before you reach for it: if you sign in via OIDC against Google, Apple, Microsoft, Auth0, or Firebase, you usually already get passkeys for free. The identity provider runs the WebAuthn ceremony inside the system browser; OIDC just hands you the resulting tokens. 
So you do not need WebAuthnClient for that case. You need it for apps that run their own relying-party backend, and for apps driving the Auth0 or Firebase passkey grants directly. Full chapter: Authentication and Identity. Connectivity: WiFi, Bonjour, USB, network-type listeners PR #5021 lands four packages for apps that need to do more with the network than open an HTTP socket. The shape: Java WiFi wifi = WiFi.getInstance(); String ssid = wifi.getCurrentSSID(); String bssid = wifi.getBSSID(); String gateway = wifi.getGateway(); String ip = wifi.getIp(); wifi.scan(new ScanOptions().setTimeoutMillis(5000)) .onResult((results, err) -> { /* ... */ }); wifi.connect("MyNetwork", "hunter2", Security.WPA2_PSK) .onResult((success, err) -> { /* ... */ }); com.codename1.io.wifi for WiFi info, scan, and connect. com.codename1.io.wifi.WiFiDirect for peer-to-peer (Android only by platform reality). com.codename1.io.bonjour for mDNS / Zeroconf via BonjourBrowser and BonjourPublisher. com.codename1.io.usb for USB host (Android only). And NetworkManager.addNetworkTypeListener(...) plus NETWORK_TYPE_* constants so an app can react to a transition between cellular, WiFi, ethernet, or "none": Java NetworkManager.getInstance().addNetworkTypeListener(evt -> { int type = evt.getNetworkType(); if (type == NetworkManager.NETWORK_TYPE_NONE) showOfflineBanner(); else if (type == NetworkManager.NETWORK_TYPE_CELLULAR) suppressLargeBackgroundDownloads(); else clearOfflineBanner(); }); iOS does not expose programmatic WiFi scanning to third-party apps; scan() throws UnsupportedOperationException on iOS. iOS also does not expose WiFi Direct or general USB host. None of those are Codename One limitations; they are Apple's. The dev guide is explicit about each platform's limits. Three new compile-time defines (CN1_INCLUDE_WIFI_INFO, CN1_INCLUDE_HOTSPOT, CN1_INCLUDE_BONJOUR) wrap the iOS native code, set only when the classpath scanner sees the matching Java API in use. 
Apps that do not use these APIs do not pay for them at App Store review time. Same pattern as the NFC gating from the previous release. Full reference: Network Connectivity. Share-Sheet Result Callbacks PR #5036 closes a small but persistent gap: Display.share(...) and ShareButton finally tell you what the user did with the share sheet: Java ShareButton btn = new ShareButton(); btn.setTextToShare("Look at this fox"); btn.setImageToShare("/fox.jpg"); btn.setShareResultListener(result -> { switch (result.getStatus()) { case SHARED_TO: track("share_completed", result.getTargetPackage()); break; case DISMISSED: track("share_dismissed"); break; case FAILED: track("share_failed", result.getError()); break; } }); iOS routes through UIActivityViewController.completionWithItemsHandler; Android through Intent.createChooser with an IntentSender callback (API 22+). The framework normalizes the platform values into SHARED_TO(packageName), DISMISSED, or FAILED. Appearing in Other Apps' Share Menus The other half of sharing is the inverse direction: not "let the user share from your app", but "let your app receive content other apps share". If a user is in Safari, Photos, or Mail and taps the share icon, your app should be able to appear as a target there alongside Messages, WhatsApp, and Instagram. On iOS that requires a separate Share Extension target inside the .ipa, with its own bundle, its own Info.plist, an App Group string that links it to the host app, and a ShareViewController that handles the incoming payload. Historically the recommendation was to bootstrap that target by hand in Xcode, copy the resulting files into the Codename One project under ios/app_extensions/, and let the build server's extractor consume them. It worked, but it was a workflow most teams put off because the setup is fiddly. The same PR ships an IOSShareExtensionBuilder Mojo that does all of that for you. 
A typical setup is one Maven command and a one-time configuration block:

XML
<plugin>
    <groupId>com.codenameone</groupId>
    <artifactId>codenameone-maven-plugin</artifactId>
    <configuration>
        <iosShareExtension>
            <bundleIdentifier>com.example.myapp.share</bundleIdentifier>
            <displayName>MyApp</displayName>
            <appGroup>group.com.example.myapp</appGroup>
            <acceptedContent>
                <content>PUBLIC_URL</content>
                <content>PUBLIC_IMAGE</content>
                <content>PUBLIC_TEXT</content>
            </acceptedContent>
        </iosShareExtension>
    </configuration>
</plugin>

Run mvn cn1:generate-ios-share-extension and the Mojo writes a complete .ios.appext bundle into ios/app_extensions/: the Info.plist with the right NSExtension activation rules for the content types you declared, the App Group entitlement, a minimal ShareViewController.swift that lands the payload in the App Group's UserDefaults(suiteName:), and the matching buildSettings.properties. The result feeds straight into the existing IPhoneBuilder.extractAppExtensions pipeline, so apps that already have a hand-rolled extension keep working unchanged.

On the host-app side, you read the payload on launch:

Java
// Anywhere after Display.init has run
// readObject returns Object, so the payload needs a cast.
String shared = (String) Storage.getInstance()
    .readObject("ios.shareExtension.lastPayload");
if (shared != null) {
    handleSharedPayload(shared);
}

After the next cloud or local build, your app appears in the iOS share sheet for the content types you declared. No Xcode work, no hand-rolled plist, no App Group string typed in three places. The build-time tooling owns it.

Wrapping Up

Tomorrow's post covers the architectural change in this release: a build-time bytecode annotation framework, the declarative router that is its first consumer, the SQLite ORM and JSON / XML mappers and component binder built on the same SPI, and the build-time SVG / Lottie transcoder that ships in the same release for related reasons.

Back to the weekly index.

By Shai Almog, DZone Core
Implementing Asynchronous Communication Between Microservices Using Kafka and Spring Boot

In a microservices system, that tight coupling turns a small hiccup into a cascading slowdown. Thread pools fill, retries amplify traffic, and suddenly your simple request is blocked on half the fleet.

My executive summary: asynchronous messaging with Kafka helps systems keep moving when individual components inevitably slow down or fail. It does this by decoupling producers from consumers, absorbing traffic spikes, and allowing services to evolve without tying their availability directly to one another.

Code Patterns in Spring Boot With Kafka

Spring for Apache Kafka gives me two primitives that feel pleasantly like old Spring: KafkaTemplate for sending and @KafkaListener for receiving. That template/listener model is intentionally similar to other Spring integration tech, which keeps application code focused on domain logic instead of raw client plumbing.

Below is a compact (but production-shaped) pattern: externalized config via @ConfigurationProperties, a service port for publishing, a REST command endpoint, a consumer with a real error strategy (DLT), and a REST error advice.
Java
// Types are shown in one listing for brevity; in a real project each lives in its own file.
import java.math.BigDecimal;
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
import jakarta.validation.Valid;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.NotNull;
import jakarta.validation.constraints.Positive;
import org.apache.kafka.common.TopicPartition;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.HttpStatus;
import org.springframework.http.ProblemDetail;
import org.springframework.http.ResponseEntity;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.stereotype.Component;
import org.springframework.util.backoff.FixedBackOff;
import org.springframework.validation.annotation.Validated;
import org.springframework.web.bind.annotation.*;

// === Messaging config (externalized, type-safe) ===
@ConfigurationProperties(prefix = "messaging.orders")
@Validated
record OrdersMessagingProps(@NotBlank String topic, @NotBlank String dltTopic) {}

// === DTO (event contract) ===
public record OrderCreatedEvent(UUID orderId, UUID userId, BigDecimal total, Instant createdAt) {}

// === Service port (keeps domain testable, Kafka swappable) ===
public interface OrderEventPublisher {
    void publishOrderCreated(OrderCreatedEvent event);
}

// === Adapter: Kafka producer ===
@Component
class KafkaOrderEventPublisher implements OrderEventPublisher {
    private final KafkaTemplate<String, OrderCreatedEvent> template;
    private final OrdersMessagingProps props;

    KafkaOrderEventPublisher(KafkaTemplate<String, OrderCreatedEvent> template,
                             OrdersMessagingProps props) {
        this.template = template;
        this.props = props;
    }

    @Override
    public void publishOrderCreated(OrderCreatedEvent event) {
        // Keying by orderId keeps per-order ordering and drives partitioning decisions.
        template.send(props.topic(), event.orderId().toString(), event);
    }
}

// === REST command API (synchronous edge, async core) ===
@RestController
@RequestMapping("/v1/orders")
class OrdersController {
    private final OrderService orderService; // domain port

    OrdersController(OrderService orderService) {
        this.orderService = orderService;
    }

    @PostMapping
    public ResponseEntity<Map<String, Object>> create(@Valid @RequestBody CreateOrderRequest req) {
        UUID orderId = orderService.create(req.userId(), req.total()); // persists + publishes event
        return ResponseEntity.accepted().body(Map.of("orderId", orderId, "status", "ACCEPTED"));
    }

    record CreateOrderRequest(@NotNull UUID userId, @NotNull @Positive BigDecimal total) {}
}

// === Domain service port (implementation can use outbox, transactions, etc.) ===
public interface OrderService {
    UUID create(UUID userId, BigDecimal total);
}

// === Consumer: downstream service reacts to events ===
@Component
class BillingListener {
    @KafkaListener(topics = "${messaging.orders.topic}", groupId = "${spring.kafka.consumer.group-id}")
    void onOrderCreated(OrderCreatedEvent event) {
        // Idempotency belongs here: process-by-key + store processed eventId/orderId to avoid duplicates.
        // Do work (charge card, create invoice, etc.)
    }
}

// === Kafka consumer error handling: retries + DLT ===
@Configuration
class KafkaErrorHandlingConfig {
    @Bean
    DefaultErrorHandler defaultErrorHandler(KafkaTemplate<Object, Object> template,
                                            OrdersMessagingProps props) {
        // Failed records go to the DLT on the same partition number they were read from.
        var recoverer = new DeadLetterPublishingRecoverer(template,
            (rec, ex) -> new TopicPartition(props.dltTopic(), rec.partition()));
        // Backoff and retry policy are configurable; keep it finite to avoid poison-pill loops.
        return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 3));
    }
}

// === REST error handling (ProblemDetail) ===
@RestControllerAdvice
class ApiErrors {
    @ExceptionHandler(IllegalArgumentException.class)
    @ResponseStatus(HttpStatus.BAD_REQUEST)
    ProblemDetail badRequest(IllegalArgumentException ex) {
        var pd = ProblemDetail.forStatusAndDetail(HttpStatus.BAD_REQUEST, ex.getMessage());
        pd.setTitle("Invalid request");
        return pd;
    }
}

A few been-burned-before notes on the code above. Spring Kafka’s reference docs are explicit that KafkaTemplate is the convenience wrapper for producing, and DefaultErrorHandler + DeadLetterPublishingRecoverer is a first-class way to route failed records to dead-letter topics after retries. If we want non-blocking retries, Spring Kafka also provides @RetryableTopic, which orchestrates retry topics and a DLT automatically. That is useful when transient failures are common and you want predictable retry delay semantics.
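The retry-then-dead-letter behaviour is easier to reason about with the framework stripped away. The sketch below simulates it in plain Java; it is not Spring Kafka's implementation, and RetryThenDeadLetter, handle, and the in-memory deadLetters list are hypothetical stand-ins. It assumes the usual reading of FixedBackOff(1000L, 3): one initial delivery plus up to three retries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical simulation of "finite retries, then dead-letter"; not a Spring class.
class RetryThenDeadLetter {
    final List<String> deadLetters = new ArrayList<>(); // stands in for the DLT
    int deliveries = 0;                                  // total listener invocations
    private final long backoffMillis;
    private final int maxRetries;

    RetryThenDeadLetter(long backoffMillis, int maxRetries) {
        this.backoffMillis = backoffMillis;
        this.maxRetries = maxRetries;
    }

    // Delivers the record once, retries up to maxRetries times, then parks it.
    void handle(String record, Consumer<String> listener) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            deliveries++;
            try {
                listener.accept(record);
                return; // success: nothing more to do
            } catch (RuntimeException e) {
                if (attempt < maxRetries) {
                    pause();
                }
            }
        }
        // Retries exhausted: route to the dead-letter list instead of looping forever.
        deadLetters.add(record);
    }

    private void pause() {
        try {
            Thread.sleep(backoffMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A record that always fails is delivered four times under these settings and then lands in deadLetters, which is why a finite policy keeps a poison pill from blocking the partition indefinitely.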
Containers and Local Dev With Docker Compose

When I’m chasing down event flow bugs, I like local environments that feel like the old days: one command, deterministic startup order, and no mystery dependencies. Docker Compose is still the quickest way to stand up Kafka alongside your services, and Confluent publishes straightforward Docker-based tutorials and compose examples for running Kafka locally.

For the service image itself, multi-stage builds are the modern classic: compile in a builder stage, then copy the artifact into a slimmer runtime stage. Docker documents multi-stage builds as a way to reduce the final image contents and keep build dependencies out of production.

Dockerfile
# Multi-stage Dockerfile for a Spring Boot service (orders-service)
FROM eclipse-temurin:21-jdk AS build
WORKDIR /workspace
# Resolve dependencies first so this layer is cached until pom.xml changes.
COPY mvnw pom.xml ./
COPY .mvn .mvn
RUN ./mvnw -q -DskipTests dependency:go-offline
COPY src src
RUN ./mvnw -q -DskipTests package

FROM eclipse-temurin:21-jre
WORKDIR /app
COPY --from=build /workspace/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java","-jar","/app/app.jar"]

And here’s a Compose file that wires up Kafka and Schema Registry, plus an example Spring Boot service. The exact image choices are illustrative. Your production choices should reflect your own standards and security posture.
YAML
# compose.yaml (local/dev)
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.6.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.6.0
    depends_on: [zookeeper]
    ports: ["9092:9092"]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Each listener needs its own port: 29092 inside the Compose network, 9092 from the host.
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  schema-registry:
    image: confluentinc/cp-schema-registry:7.6.0
    depends_on: [kafka]
    ports: ["8081:8081"]
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://kafka:29092
  orders:
    build: ./orders-service
    depends_on: [kafka]
    ports: ["8080:8080"]
    environment:
      SPRING_KAFKA_BOOTSTRAP_SERVERS: kafka:29092
      MESSAGING_ORDERS_TOPIC: orders.events
      MESSAGING_ORDERS_DLTTOPIC: orders.events.dlt
      SCHEMA_REGISTRY_URL: http://schema-registry:8081

Deploying on Kubernetes or AWS

On AWS, the Kafka decision is usually managed or self-managed. If you choose Amazon MSK, the cluster lives in your VPC: you pick subnets across distinct Availability Zones and connect clients using the cluster’s bootstrap brokers. That’s the networking baseline, and it’s not optional. MSK is VPC-first by design.

For authentication/authorization, MSK supports IAM access control. AWS documents the client configuration for IAM mechanisms. In EKS, I typically pair MSK IAM with IRSA so pods can obtain AWS credentials the AWS way, while ECS services would use task roles instead. Both patterns are documented by AWS; which one you use depends on where your services run.

Kubernetes service discovery is usually the easy part. Services and Pods get DNS names so workloads can call each other by name rather than IP. Kafka itself is reached via bootstrap broker endpoints or via internal Services, but either way, you want the strings in externalized config, not hardcoded.
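The environment variable names in these manifests line up with the messaging.orders prefix through Spring Boot's relaxed binding. A small sketch of the documented canonical rule for deriving an environment variable name (replace dots with underscores, remove dashes, upper-case) follows; EnvNames and its methods are hypothetical helpers written for illustration, not Spring APIs.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that mirrors the documented naming rule; not a Spring class.
class EnvNames {
    // messaging.orders.dlt-topic -> MESSAGING_ORDERS_DLTTOPIC
    static String toEnvVar(String property) {
        return property
            .replace(".", "_")   // dots become underscores
            .replace("-", "")    // dashes are dropped
            .toUpperCase(Locale.ROOT);
    }

    // Looks a property up in an environment map, falling back to a default.
    static String lookup(Map<String, String> env, String property, String fallback) {
        return env.getOrDefault(toEnvVar(property), fallback);
    }
}
```

Passing System.getenv() as the map gives a quick way to check which variable a given property would read from before a deployment surprises you.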
Here’s a minimal Kubernetes Deployment/Service for a Kafka client service. Values like region, account IDs, and MSK endpoints are placeholders.

YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders
  namespace: apps
spec:
  replicas: 2
  selector:
    matchLabels: { app: orders }
  template:
    metadata:
      labels: { app: orders }
    spec:
      serviceAccountName: orders-sa # IRSA-bound (role ARN unspecified)
      containers:
        - name: orders
          image: <UNSPECIFIED_AWS_ACCOUNT_ID>.dkr.ecr.<UNSPECIFIED_REGION>.amazonaws.com/orders:<TAG>
          ports: [{ containerPort: 8080 }]
          env:
            - name: SPRING_KAFKA_BOOTSTRAP_SERVERS
              value: "<UNSPECIFIED_MSK_BOOTSTRAP_BROKERS>"
            - name: MESSAGING_ORDERS_TOPIC
              value: "orders.events"
            - name: MESSAGING_ORDERS_DLTTOPIC
              value: "orders.events.dlt"
          readinessProbe:
            httpGet: { path: /actuator/health/readiness, port: 8080 }
            initialDelaySeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: orders
  namespace: apps
spec:
  selector: { app: orders }
  ports:
    - port: 80
      targetPort: 8080

Operationally, MSK exposes metrics into CloudWatch (AWS/Kafka), and broker logs can be delivered to CloudWatch Logs (or S3/Firehose). That combination gives you the classic visibility loop: throughput, lag, under-replicated partitions, and error logs without running your own monitoring plane.

For distributed tracing in async flows, OpenTelemetry is my default vocabulary now. Spring Boot supports OpenTelemetry export via OTLP, and OpenTelemetry defines Kafka semantic conventions so your producer/consumer spans and attributes stay consistent across tools.

CI/CD and the Hard-Earned Field Notes

For CI/CD, I keep it boring: build once, push an immutable image, deploy via a declarative mechanism. AWS Prescriptive Guidance provides a clear GitHub Actions pattern for building Docker images and pushing to Amazon ECR, which is a solid baseline; the region and account stay placeholders until you configure them.
YAML
# .github/workflows/orders.yml
name: orders
on:
  push:
    branches: ["main"]
jobs:
  build_push_deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    env:
      # Job-level so the push and deploy steps share one image reference
      IMAGE: <UNSPECIFIED_AWS_ACCOUNT_ID>.dkr.ecr.<UNSPECIFIED_REGION>.amazonaws.com/orders:${{ github.sha }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "21"
      - name: Build & test
        run: ./mvnw -q test package
      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::<UNSPECIFIED_AWS_ACCOUNT_ID>:role/<UNSPECIFIED_GHA_ROLE>
          aws-region: <UNSPECIFIED_REGION>
      - name: Login to ECR
        run: |
          aws ecr get-login-password --region <UNSPECIFIED_REGION> \
            | docker login --username AWS --password-stdin <UNSPECIFIED_AWS_ACCOUNT_ID>.dkr.ecr.<UNSPECIFIED_REGION>.amazonaws.com
      - name: Build & push image
        run: |
          docker build -t "$IMAGE" ./orders-service
          docker push "$IMAGE"
      - name: Deploy to EKS (example)
        run: |
          aws eks update-kubeconfig --name <UNSPECIFIED_EKS_CLUSTER> --region <UNSPECIFIED_REGION>
          kubectl -n apps set image deploy/orders orders="$IMAGE"

Now, the part I wish someone had handed me in 2016: Kafka gives you strong tools, but it does not remove distributed-systems truths. You still need safeguards on the consumer side: idempotent processing, disciplined schema management, and clearly defined retry and dead-letter topic behavior.

Kafka’s documentation is careful about the limits of “exactly once” guarantees. Idempotent producers and transactions can strengthen delivery semantics, but achieving true end-to-end exactly-once behavior, especially when external side effects are involved, still depends on deliberate system design.

For schema governance, Kafka itself doesn’t ship a schema registry, but acknowledges third-party registries; in practice, Confluent Schema Registry and Apicurio Registry are common choices.
Both store schemas out-of-band, so messages carry only a schema identifier, and both support evolvable contracts across Avro/JSON Schema/Protobuf depending on your ecosystem.

Conclusion and Best Practices

If you take one lesson from my legacy brain into modern event-driven systems, let it be this: asynchrony is a reliability feature, not a performance trick. Kafka’s durable log and consumer group model decouple uptime and absorb spikes, but you only get the real benefit when you treat schemas as contracts, consumers as idempotent processors, and failure handling as first-class application behavior.

On AWS, the operational baseline is non-negotiable. MSK lives in your VPC across AZ subnets, clients connect via bootstrap brokers, IAM auth is configured explicitly, and observability lives in CloudWatch. Do those fundamentals early, and Kafka stops feeling like a mysterious black box and starts feeling like the dependable workhorse it was built to be.
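The consumer-side safeguards named above (idempotent processing, bounded retries, and a dead-letter topic) can be sketched in a few lines. This is a minimal illustration, not the article's implementation: it uses no Kafka client at all, `Event`, `IdempotentProcessor`, and `event_id` are hypothetical names, and the in-memory set and list stand in for a durable dedupe store and the `orders.events.dlt` topic.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    event_id: str  # hypothetical unique ID assigned by the producer
    payload: dict


@dataclass
class IdempotentProcessor:
    handler: Callable[[Event], None]
    max_attempts: int = 3
    processed_ids: set = field(default_factory=set)    # stand-in for a durable store
    dead_letters: list = field(default_factory=list)   # stand-in for orders.events.dlt

    def process(self, event: Event) -> str:
        # A redelivered event is skipped, so the side effect runs at most once.
        if event.event_id in self.processed_ids:
            return "duplicate"
        for _ in range(self.max_attempts):
            try:
                self.handler(event)
            except Exception:
                continue  # treat the failure as transient and try again
            self.processed_ids.add(event.event_id)
            return "processed"
        # Retries exhausted: park the event instead of blocking the partition.
        self.dead_letters.append(event)
        return "dead-lettered"


if __name__ == "__main__":
    processor = IdempotentProcessor(handler=lambda e: print("charged", e.event_id))
    order = Event("o-1", {"total": 10})
    print(processor.process(order))  # first delivery
    print(processor.process(order))  # redelivery of the same event
```

In a real service, the dedupe record and the side effect would need to be committed together (for example, in one database transaction); otherwise a crash between the two reintroduces duplicates.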

By Mallikharjuna Manepalli
Who Owns the Data Stack?: How AI Is Reshaping Ownership, Architecture, and Accountability Across Teams

Editor’s Note: The following is an article written for and published in DZone’s 2026 Trend Report, Cognitive Databases, Intelligent Data: Unified Infrastructure for Vector Search, AI-Optimized Queries, and Hybrid Workloads.

For years, some of us have argued that the data stack is part of the product and should be engineered like the application layer: as code and as a service. The market matured toward it, and the data mesh has been the clearest recent expression. AI has eclipsed those debates and settled the matter.

The data stack is now product-facing, shaping what users see, what AI answers, and which automated decisions and workflows fire. That makes one question unavoidable: When an answer depends on data across many systems and teams, who is accountable for accuracy?

An AI answer is assembled at request time from corporate data. The data stack is inside the response.

AI Turns Data Infrastructure Into Product Behavior

AI makes the data stack part of product behavior, but raw infrastructure should not leak into the product. The goal is to abstract the stack behind durable, governed interfaces. An AI feature should consume meaning, relationships, permissions, and context.

Following data mesh and data contracts, the API layer has to evolve from returning data to exposing capabilities. A consumer, including an AI model, should depend on a contract that carries:

  • Metadata – origin, lineage, meaning
  • Quality – freshness, completeness, confidence
  • Relationships – how entities compose and traverse
  • Security – authorization applied consistently across operational, analytical, and vector stores

When meaning lives in the contract, infrastructure becomes interchangeable, and a misbehaving AI feature is no longer an opaque failure: it’s a question with an owner.

Where Ownership Breaks First

Ownership does not break at the edges of systems but much earlier, in how the organization is designed.
Most technology organizations still distribute teams around components and technical specialization: applications, databases, pipelines, governance, indexing, and analytics. Each team owns its layer, though no one owns the end-to-end meaning of the data. That worked when data only fed analytics. It fails in AI-native products, where data is product behavior, and the two lifecycles are inseparable. AI composes its behavior across every layer at once, inheriting each inconsistency in semantics, freshness, permissions, and relationships. So this is not a handoff problem; it is a Conway’s Law problem. Architecture mirrors the organization, and AI makes the organizational seams visible to the user. Platform teams remain essential for shared abstractions, governance primitives, and standards. But product teams need to own both their features and APIs and their data end to end: its lifecycle, meaning, quality, and governance. Splitting teams by technical layer scatters one business entity across many disconnected owners, and AI inherits that fragmentation. AI-native organizations give product teams end-to-end ownership of the data, with platform teams providing shared standards. Accountability Follows Product Behavior When data only fed dashboards, accountability could stay narrow: Did the pipeline run, and did the report match? AI moves that boundary. Once retrieval, copilots, and agents start making decisions and generating answers from data, a correct pipeline, a healthy index, and a valid access policy still don’t guarantee a correct user-facing result. Accountability can’t be pinned to technical layers. It has to follow the behavior the user experiences. The product team that owns an AI capability is responsible for the end-to-end correctness, freshness, explainability, and safety of the data behind it. Its job is to own the contract that defines what the AI may know and retrieve. 
Platform teams provide the standardized primitives that make this accountability structure possible: semantic contracts, lineage, quality signals, access enforcement, observability, and governance-aware retrieval. The question shifts from “which team owns this layer?” to “which product team owns this behavior, and which platform capabilities guarantee it?”

In AI-native systems, accountability rests with the team that owns the behavior, not the system that happened to fail.

Table: Accountability Differences Between the Layer-by-Layer and AI-Native Models

Area | Layer-by-layer | AI-native
Source of truth | Each system decides locally | The product team owns the authoritative semantic contract
Quality | The data team checks pipelines | The product team owns user-facing correctness; the platform provides quality signals
Retrieval | The platform team owns indexes as infrastructure | A governed product capability with explicit SLOs
Access | The security team owns policies separately | Enforced consistently across product, data, and AI layers
Incidents | Routed to whichever layer failed | The product team leads; the platform, data, and security teams support as capability owners

Architecture Choices Are Also Operating Model Choices

Architecture decisions also decide how an organization governs and evolves meaning. AI-native systems raise the stakes here because copilots and agents consume meaning (entities, relationships, metrics, and permissions) rather than tables. Semantic consistency becomes part of how the product behaves.

No central team can own the meaning of every domain, so meaning has to live close to the domain that owns the capability. But decentralization alone backfires: Without platform-enforced standards, the old central bottleneck just turns into semantic fragmentation, with every domain exposing its own definitions and contracts.
The fix is to split ownership cleanly:

  • Domains own the meaning
  • Platform teams own the contracts that keep it consistent

Underneath, storage and processing keep churning. What actually lasts is whether stable abstractions (e.g., “employee,” “payroll,” “entitlement”) survive above them. The principle is simple: Infrastructure should be replaceable, and meaning should not. So the real operating-model choice comes down to who owns meaning, and who keeps it consistent.

Shared Data Contracts Make Accountability Concrete

If organizational fragmentation is the root problem, contracts make ownership explicit. A classic data contract is necessary but insufficient. Schema validation catches a renamed column, but it misses semantic drift, stale meaning, or a changed business definition. Those failures don’t break a build. They break behavior.

The contract has to grow from schema into semantics, carrying meaning, lineage, quality, and authorization. Crucially, it abstracts the capability and meaning a domain exposes, not the storage format underneath, so it behaves the same whether the source is a table, a document, an event, or an embedding.

That makes the data contract both a producer-to-consumer check and a runtime semantic interface that retrieval, copilots, and agents all consume. Its real value is relocating accountability to the source so drift surfaces in the producing domain while context stays local, which accelerates interoperability rather than centralizing control.

Governance Has to Travel With the Data

Traditional governance sat beside the data in the form of periodic reviews, approvals, and access checks. AI breaks that model. Data now moves continuously through pipelines, caches, embeddings, indexes, and agents, recomposed at runtime faster than any review can observe. Governance must be part of the execution model itself.

Governance travels with meaning, not storage.
An embedding holds no raw rows yet reveals sensitive meaning, so policy must follow the semantic classification. The gap is sharpest in authorization. Identity systems stop at the API boundary, and AI doesn’t preserve security boundaries on its own, which turns every embedding, cache, and retrieval step into a new boundary to defend.

Governance therefore becomes a runtime capability that decides what AI may retrieve, infer, expose, and act on. Solving that calls for composable, declarative governance primitives embedded in the platform so auditability becomes a property of the system rather than the outcome of a project.

Accountability Gaps That Slow AI Data Work

The real cost of fragmented accountability is the constant drag on every data-powered capability. Friction is never neutral, so when teams can’t trust the platform’s freshness, semantics, or governance, they route around it and build their own, resulting in shadow pipelines, local indexes, and duplicated transformations.

Each workaround makes sense locally even as it corrodes the whole, fragmenting governance and eroding trust in the very platform it was meant to replace. And piling on more central control only hides the problem: the fragmentation just migrates into those shadow systems. So the deeper gap was missing platform contracts.

What Clear Ownership Looks Like

Instead of adding more teams on more layers, clear ownership means aligning accountability with the single product experience the user meets. What you’re really investing in is the stable semantic abstractions that outlast whatever infrastructure comes and goes. And the hardest problem is how to make the organization understandable to its own AI systems.
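To make the idea of a contract that carries metadata, quality, and security a little more concrete, here is a minimal sketch. Every name in it (`SemanticContract`, `owner_team`, `max_staleness`, the example roles and source) is hypothetical and chosen only for illustration; it is not taken from ODCS or any other standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SemanticContract:
    entity: str                # stable abstraction, e.g. "employee"
    owner_team: str            # product team accountable for the behavior
    lineage: tuple             # upstream sources the meaning is derived from
    max_staleness: timedelta   # freshness the owner commits to
    allowed_roles: frozenset   # who may retrieve, regardless of the storage layer

    def check(self, last_updated: datetime, role: str, now: datetime) -> list:
        """Return contract violations; an empty list means retrieval may proceed."""
        violations = []
        if now - last_updated > self.max_staleness:
            violations.append(f"stale data: escalate to {self.owner_team}")
        if role not in self.allowed_roles:
            violations.append(f"role '{role}' is not authorized for {self.entity}")
        return violations


if __name__ == "__main__":
    contract = SemanticContract(
        entity="employee",
        owner_team="hr-product",
        lineage=("hris.employees",),
        max_staleness=timedelta(hours=1),
        allowed_roles=frozenset({"hr_copilot"}),
    )
    now = datetime.now(timezone.utc)
    print(contract.check(now - timedelta(hours=3), "sales_agent", now))
```

The point of the sketch is that a violation names an owner: a stale or unauthorized retrieval becomes a question routed to a team rather than an opaque failure.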
Additional resources:

  • DAMA-DMBOK: Data Management Body of Knowledge, DAMA International – foundational guidance on data ownership, stewardship, and governance roles
  • Open Data Contract Standard (ODCS) – an open spec for declaring schema, semantics, quality, and service levels between data producers and consumers
  • OpenLineage – an open standard for collecting data lineage across pipelines and services, useful for tracing what AI features consume
  • NIST AI Risk Management Framework (AI RMF) – a vendor-neutral framework for accountability and governance of AI systems
  • Coral – exposes diverse data sources to agents through one declared SQL and semantic layer; an example of meaning being owned per source rather than centrally
  • Getting Started With Data Quality, DZone Refcard by Miguel García Lorenzo
  • Data Pipeline Essentials, DZone Refcard by Sudip Sengupta
  • Open-Source Data Management Practices and Patterns, DZone Refcard by Abhishek Gupta
  • Real-Time Data Architecture Patterns, DZone Refcard by Miguel García Lorenzo
  • “Building Trusted, Performant, and Scalable Databases: A Practitioner’s Checklist” by Saurabh Dashora

This is an excerpt from DZone’s 2026 Trend Report, Cognitive Databases, Intelligent Data: Unified Infrastructure for Vector Search, AI-Optimized Queries, and Hybrid Workloads.

By Miguel Garcia DZone Core
Architectural Collapse: How Extension Poisoning, Node Vulnerabilities, and Infrastructure Fog Enabled the GitHub Repository Breach

Enterprise perimeter defenses are fundamentally built on an obsolete assumption that the developer’s workstation is a secure, trusted anchor point. The massive security breach executed by the threat group TeamPCP, resulting in the exfiltration of 3,800 internal GitHub source code repositories, completely shattered this illusion.

This was not a standalone exploit. It was a multi-vector convergence where vulnerabilities in the Node/NPM ecosystem, the systemic ungoverned architecture of the Visual Studio Code Marketplace, and the tactical “fog of war” caused by a period of historic GitHub infrastructure instability came together to create the perfect attack.

Phase 1: The Root Exploitation (Node/NPM and the TanStack Supply Chain Pivot)

The kill chain did not begin at GitHub; it originated deep within the modern JavaScript developer toolchain. TeamPCP executed a localized supply chain compromise against upstream open-source utilities, specifically targeting contributors to TanStack npm packages (a widely relied-upon suite for state management and routing).

[TanStack NPM Compromise] -> [Stolen ‘gh’ CLI Tokens] -> [Nx Console Pipeline Hijack (No Multi-Admin Approval)] -> [Malicious Extension Version 18.95.0 Published]

By injecting malicious code into these highly trusted upstream dependencies, the attackers performed targeted local credential harvesting. Their primary target was not production application code, but the development environments of the maintainers themselves.

The exploit successfully extracted long-lived GitHub CLI (`gh`) authentication tokens from a legitimate core developer who maintained both TanStack and the Nx Console ecosystem. Because these developer access tokens lacked granular scoping restrictions, they provided direct administrative write access to the main release pipelines of secondary repositories.
Phase 2: VS Code Extension Poisoning (The Nx Console Triage (CVE-2026-48027))

Armed with the stolen `gh` tokens, TeamPCP bypassed standard perimeter security by pivoting to the Visual Studio Marketplace. On May 18, version 18.95.0 of Nx Console (a heavily utilized monorepo orchestration extension with over 2.2 million installs) was maliciously built and uploaded.

The deployment revealed two fatal flaws within modern developer workflows:

1. The Single-Factor Release Pipeline

The malicious version was uploaded directly to both the Visual Studio Marketplace and the open-source OpenVSX registry. Because the Nx Console publishing architecture lacked a “two-admin manual validation mandate” for automated releases, the publishing pipeline accepted the stolen developer token at face value without triggering a secondary verification gate.

2. The “Silent Killer”: Marketplace Metrics vs. Background Sync

Microsoft’s public marketplace logs initially registered a negligible 28 manual downloads before the package was identified and yanked 18 minutes later. However, Nx’s internal telemetry revealed that ~6,000 extension activations occurred in that same window.

This massive discrepancy highlights the danger of VS Code’s background auto-update synchronization. Thousands of developer environments pulled down, unzipped, and executed the malicious version automatically while the developers’ IDEs were running in the background.

JSON
// Example of the target parameters within compromised developer workspaces
{
  "extensions.autoUpdate": true, // The default vulnerability exploited by TeamPCP
  "terminal.integrated.profiles.osx": {
    "malicious-hook": {
      "path": "/bin/bash",
      "args": ["-c", "python3 ~/.local/share/kitty/cat.py &"]
    }
  }
}

The Node Execution Layer

Upon extension activation, the poisoned payload immediately dropped an obfuscated Node.js post-install hook.
Operating completely within user space to evade basic Endpoint Detection and Response (EDR) behavioral hooks, it set an environmental marker (`__DAEMONIZED=1`) and spawned a background Python process (`cat.py`).

The malware systematically scanned local paths for:

  • Infrastructure configuration: Plaintext HashiCorp Vault tokens (`~/.vault-token`), local Kubernetes kubeconfig files, and AWS/Azure IAM metadata endpoint hashes.
  • Ecosystem identity: Plaintext `.npmrc` registry tokens and active GitHub tokens (`ghp_`, `gho_`, `ghs_`).
  • Active memory subsystems: Contents of 1Password vaults via the `op` CLI by hijacking active, unlocked terminal sessions.

Phase 3: The Climax (The Internal GitHub Breach)

The payload achieved its ultimate goal when an internal GitHub software engineer, who utilized Nx Console for local workflow management, had their workstation pull down the background update. The malware executed on the engineer’s local machine, scraped their active internal enterprise session tokens, and exfiltrated them to a remote command-and-control (C2) server.

TeamPCP then used these highly privileged internal access credentials to bypass GitHub’s corporate identity perimeters entirely. Because internal repository boundaries operate on flat network access structures once an authenticated developer endpoint is cleared, the threat actors systematically cloned and exfiltrated 3,800 proprietary internal GitHub source code repositories before the endpoint could be isolated.

Phase 4: The Tactical Fog of War (GitHub’s Infrastructure Instability)

The velocity and stealth of this attack were significantly aided by an ongoing reliability crisis within GitHub’s core infrastructure. During the 12-month window surrounding the breach, GitHub recorded a massive spike in service degradation.
  • Total tracked incidents: 257 distinct technical incidents.
  • Major outages: 48 major service shutdowns, totaling 112 hours and 18 minutes of downtime.
  • Primary failure vector: GitHub Actions experienced 57 outages, three times the incidence rate of core Git storage operations.

Three incidents in that window had direct security consequences:

  • October 29: An outage in a compute dependency (Microsoft Azure) caused a 90% error rate across Codespaces and Actions. Security effect: telemetry gaps; security monitoring systems failed to track cross-border API token replication.
  • February 2: A configuration failure in user settings caching caused cascading failures in the Git HTTPS proxy. Security effect: a high volume of synchronous cache writes generated a deluge of network errors, masking anomalous Git clone calls.
  • February 12: Authorization claim changes in core networking dependencies caused a 90% Codespace provisioning failure rate. Security effect: security alerts failed to populate due to misclassified severity ratings, delaying incident detection by hours.

The Alert Fatigue Vulnerability

This constant infrastructural noise created an ideal tactical environment for the attackers. SecOps and DevSecOps teams were caught in a continuous state of alert fatigue, dealing with broken GitHub Actions pipelines, Elasticsearch cluster degradation, and database timeouts.

When TeamPCP’s automated scripts began running rapid API queries and pulling massive volumes of repository data using the compromised engineer’s token, the unusual spikes in data transfer blended into the background noise of an infrastructure already struggling with systemic capacity and networking failures.

Hard Takeaways: How Developers Must Harden Their Environments

If your environment was active or using automated tools during this period, you must shift your development machine from an implied trust zone to a completely zero-trust environment.

1. Kill IDE Auto-Updates Globally

Never allow your IDE to pull unvetted code in the background.
Explicitly configure your editor to require manual permission before updating any third-party extension. In VS Code’s global settings.json, enforce:

Plain Text
"extensions.autoUpdate": false,
"extensions.autoCheckUpdates": false

2. Isolate Development Runtimes

Stop running compilers, package managers, and complex IDE extensions directly on your bare-metal operating system. Utilize isolated, ephemeral development environments (e.g., Dev Containers or heavily scoped virtual environments) where local file-system access is completely decoupled from your master `~/.ssh/`, `~/.aws/`, or `.npmrc` configuration folders.

3. Implement Strict Token Volatility

Eliminate long-lived personal access tokens (`ghp_`). Switch entirely to fine-grained personal access tokens configured with strict, single-repository scope constraints and a maximum 7-day expiration date. Explicitly configure your local password managers and authentication tools to require biometric verification (e.g., TouchID/FaceID) or short timeout windows for every individual call executed via the terminal command line (`op signin timeout`).
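The first and third takeaways lend themselves to a quick local audit. The sketch below is illustrative only: `audit_settings` and `find_tokens` are hypothetical helper names, the settings text is assumed to be strict JSON (a real settings.json may contain comments that `json.loads` rejects), and the 36-character token suffix is an assumption about the classic token format rather than something stated above.

```python
import json
import re

# Token prefixes named in the article; the 36-character suffix length
# is an assumption about the classic GitHub token format.
TOKEN_PATTERN = re.compile(r"\b(?:ghp|gho|ghs)_[A-Za-z0-9]{36}\b")

# The two settings the article says to enforce in settings.json.
REQUIRED_SETTINGS = {
    "extensions.autoUpdate": False,
    "extensions.autoCheckUpdates": False,
}


def audit_settings(settings_text: str) -> list:
    """Return the hardening settings that are missing or not set to false."""
    settings = json.loads(settings_text)  # assumes strict JSON without comments
    return [
        key
        for key, wanted in REQUIRED_SETTINGS.items()
        if settings.get(key) is not wanted
    ]


def find_tokens(text: str) -> list:
    """Return redacted matches so the audit output never echoes a full secret."""
    return [match.group(0)[:8] + "..." for match in TOKEN_PATTERN.finditer(text)]


if __name__ == "__main__":
    print(audit_settings('{"extensions.autoUpdate": true}'))
    print(find_tokens("token=" + "ghp_" + "a" * 36))
```

Running `find_tokens` over files such as `.npmrc` or shell history would show where a plaintext token is still sitting on disk. What it cannot tell you is whether that token has already been copied elsewhere.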

By Akash Lomas
Phantom APIs Are Eating Your Attack Surface, and Most Security Teams Are Still Looking the Other Way

I've spent the better part of fifteen years staring at API traffic logs for a living, and I can tell you the job has changed twice. The first shift came with microservices, when a handful of monolithic endpoints became thousands of small, chatty interfaces, and nobody could agree on who owned the inventory. The second shift is happening right now, and it's worse because this time the endpoints aren't even being written by people who can explain why they exist. Call them phantom APIs: routes, handlers, and parameters that show up in production but never appear in a spec, a ticket, or a design review. Some get hand-built by a developer in a hurry and are forgotten. Increasingly, though, they're a byproduct of AI code generation — Copilot, Cursor, an internal fine-tuned assistant, whatever your shop has standardized on — quietly scaffolding an admin route, a debug handler, or a permissive query path because that pattern showed up often enough in training data to feel "normal." Nobody asked for it. Nobody reviewed it with fresh eyes, because by the time a human glances at the diff, the suggestion already looks plausible. That's the part that should worry you more than any single CVE: plausibility, not malice, is now the main vector. How a Phantom Gets Born Here's the mechanism, stripped of drama. An engineer asks an AI assistant to "add an endpoint that lets support staff look up account status." The model, trained on millions of internal admin panels, often reaches for the path of least resistance: broad object access, no granular scope check, maybe a debug flag left wired to a query parameter "for testing." It compiles. It passes the smoke tests because the smoke tests check that the feature works, not that it's bounded. It ships. None of that shows up in your OpenAPI document because nobody updated the spec — the AI didn't know one existed, and the human reviewing the pull request was scanning for logic bugs, not authorization boundaries. 
Your API gateway, meanwhile, is busy enforcing policy on the routes it knows about. A path it has never seen just rides along on the same TLS termination and the same network ACLs as everything else, because from the network's point of view, there's nothing unusual happening. The gateway isn't broken. It's just answering a question nobody thought to ask it. I've heard versions of this story from engineers at a logistics platform, a healthcare billing vendor, and a fintech, all in the last year, none of whom wanted their names anywhere near a public postmortem — which is its own data point. Shame keeps these incidents quiet, and quiet incidents are exactly what let the pattern repeat across the industry instead of getting fixed once. The Numbers Stopped Being Theoretical in 2025 If you've been treating "API security" as a slide in next year's budget deck rather than this quarter's incident response calendar, the data from the past twelve months should change your mind. Wallarm's 2026 API ThreatStats Report, which pulled from 67,058 published vulnerabilities and 60 disclosed API breaches across 2025, found that API-related flaws made up 17% of all published vulnerabilities and 43% of the entries CISA added to its Known Exploited Vulnerabilities catalog that year. The technical profile of those flaws is the part that should keep API owners up at night: 97% exploitable with a single request, 99% remotely reachable, and 59% requiring no authentication at all. This isn't an attack surface that rewards patience and tradecraft. It rewards speed, and speed is exactly what AI tooling hands to attackers as readily as it hands to developers. That same report tracked AI-related vulnerabilities jumping from 439 in 2024 to 2,185 in 2025 — a 398% increase — with 315 of those tied specifically to Model Context Protocol implementations, the connective tissue between AI agents and the tools they're allowed to call. MCP didn't exist as a meaningful attack surface two years ago. 
Now it's 14% of all AI-related vulnerability disclosures in a single annual report. I don't think I've watched a category go from nonexistent to material that fast since the early days of container orchestration. IBM's X-Force Threat Intelligence Index 2026 adds the macro view: exploitation of public-facing applications became the single most common initial access vector in 2025, up 44% year over year, and 56% of the roughly 40,000 vulnerabilities X-Force tracked required no authentication to exploit. CybelAngel's own 2025 API threat reporting found that 95% of API attacks that year originated from sessions that were already authenticated — meaning the front door wasn't the problem; what happened after someone walked through it was. Put those two findings side by side, and you get a fairly bleak picture: getting in is easy, and once an attacker is in, the API layer rarely stops them from going sideways. And CrowdStrike's 2026 Global Threat Report puts a number on how little time defenders now have to notice. Average eCrime breakout time — the gap between initial access and lateral movement — fell to 29 minutes in 2025, down from 48 minutes the year before and 98 minutes in 2021. The fastest breakout CrowdStrike observed clocked in at 27 seconds. AI-enabled adversary operations rose 89% year over year, and the company recorded prompt-injection or AI-tool abuse incidents at more than 90 organizations. As Adam Meyers, CrowdStrike's head of counter adversary operations, put it when the report landed, breakout time is now the clearest signal of how intrusions have changed. A phantom API sitting outside your monitoring isn't a slow-burning liability anymore. It's a 27-second one. GraphQL Made This Worse, Not Better GraphQL was supposed to reduce shadow API risk by giving clients one well-documented entry point instead of dozens of REST routes. In practice, it concentrated the risk instead of eliminating it. 
Roughly 70% of organizations now run GraphQL in some form, according to Wallarm's Q2 2025 ThreatStats data, and the same report flagged something that should sound familiar to anyone who's done incident response: zero GraphQL-specific breaches were publicly disclosed that quarter, despite the technology's deep reach into production systems. That's not a sign GraphQL is safe. It's a sign almost nobody is looking closely enough to catch what's happening inside a single, deeply nested query that can touch a dozen resolvers and a dozen authorization decisions in one round trip. A REST endpoint that's missing an authorization check is one bug. A GraphQL resolver tree with the same gap can be a dozen bugs wearing one URL. Shadow and zombie APIs compound the problem from the other direction. Salt Security's 2025 CISO report found that only 19% of CISOs globally have full visibility into their API inventory — just 27% among large enterprises, and a thin 12% among smaller organizations — despite 73% ranking API security as a high or critical priority. Two-thirds of organizations audit for shadow APIs only monthly or quarterly, which leaves a four-to-twelve-week window every single cycle during which an undocumented route can sit there, fully reachable, before anyone goes looking. Salt Labs' own Q1 2025 data found that 99% of organizations had encountered an API security issue in the prior twelve months, and BOLA and injection flaws together accounted for more than a third of everything reported. None of this is exotic. It's the same handful of failure modes, recurring at a scale that AI-assisted development is now accelerating rather than fixing. The Failure Chain, Step by Step Strip away the vendor-report statistics for a second and walk through how this actually plays out on a single team, because the abstraction is where people lose the thread. A developer asks an AI assistant for a quick internal tool: pull account status for support staff, fast, no fuss. 
The assistant generates a working route, and because "working" was the only bar anyone set, it also generates a second, undocumented path the model added on its own initiative — a debug variant that accepts a raw account ID with no scope check, left over from however the model's training data tends to structure admin tooling. The pull request gets reviewed for logic, not for the existence of a route nobody asked for, because nobody is in the habit of reading a diff looking for endpoints that shouldn't exist. It merges. The OpenAPI spec doesn't change because nothing in the toolchain forces it to. The API gateway keeps doing its job — rate limiting, TLS, routing — on every path it's configured to recognize, and the new one simply isn't on that list, so it inherits whatever the underlying framework allows by default rather than anything the security team actually decided. For months, nothing happens because nobody is sending traffic to a path nobody knows about. Then someone does. Maybe it's a script kiddie running a wordlist against common admin paths, maybe it's a scraper, maybe it's one of the AI-driven reconnaissance tools the CrowdStrike and Wallarm data above describe as increasingly common. The request lands. There's no auth check to fail, so there's no log entry resembling a failed login — the kind of signal most SOC dashboards are tuned to catch. There's just a 200 response and a payload of account data. Given that CrowdStrike clocked the fastest 2025 breakout at 27 seconds and the average at 29 minutes, the gap between "endpoint found" and "data gone" is no longer a window anyone can rely on noticing in real time. By the time it surfaces — an anomaly report, a customer complaint, a researcher's disclosure email — the honest answer to "how long has this been exposed" is usually some shrug-worthy variant of "the logs only go back so far." 
That's the chain: AI suggestion → unreviewed scope gap → silent spec drift → gateway blind spot → silent exploitation → discovery after the fact. Every link in it is mundane. None of it requires a sophisticated attacker. That's exactly why it keeps happening.

What I'd Actually Build to Catch It

Description is cheap. Here's the shape of a pipeline I'd put in front of a team that wanted to stop shipping phantom routes instead of just talking about the risk:

    CI/CD LAYER (pre-merge, blocking)
      → Generate live OpenAPI spec from the build
      → Diff against the last approved spec
      → Any new route not explicitly annotated/reviewed → FAIL build
      → Flag missing auth decorators, missing rate-limit config, wildcard scopes

    RUNTIME LAYER (continuous, post-deploy)
      → Traffic profiler sits behind the gateway, fingerprints every path actually receiving requests
      → Cross-reference live traffic against the approved spec, on a rolling window (hours, not quarters)
      → Anything serving 200s that isn't in the spec → page on-call, not a quarterly report

    GATEWAY LAYER (enforcement)
      → Default-deny for any path not present in the signed spec
      → Schema validation on request/response shape, not just route existence
      → Auth/scope check enforced at the gateway, independent of what the service itself does

The CI step is the cheapest control here, and the one most teams skip, because it requires someone to decide that an undocumented route is a build failure, not a Slack message for later. The runtime layer catches what gets past CI anyway: config drift, routes added outside the normal deploy path, anything a human forgot to annotate. The gateway layer is the backstop: even if the first two fail, a default-deny policy means an unrecognized path doesn't get served at all, rather than getting served and merely logged.

None of these three layers is sufficient alone. Together, they convert "we hope someone notices" into "the system refuses to let this happen quietly," which is the actual point.
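
To make the CI layer concrete, here is a minimal Python sketch of the spec-diff gate. It assumes both specs are OpenAPI 3.x documents already exported as JSON. The function names (`operations`, `review_findings`, `main`), the two file arguments, and the choice to treat any scope containing `*` as a wildcard are my own illustrative assumptions, not the behavior of any particular tool. It skips the rate-limit check, since as far as I know OpenAPI has no standard field for that.

```python
import json
import sys

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def operations(spec: dict) -> dict:
    """Map (METHOD, path) -> operation object for every operation in the spec."""
    ops = {}
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            # Path items also hold non-operation keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                ops[(method.upper(), path)] = op
    return ops


def review_findings(approved: dict, live: dict) -> list:
    """Return findings as strings; an empty list means the build may proceed."""
    approved_ops = operations(approved)
    live_ops = operations(live)
    global_security = live.get("security")
    findings = []
    for method, path in sorted(live_ops):
        op = live_ops[(method, path)]
        if (method, path) not in approved_ops:
            findings.append(f"NEW ROUTE not in approved spec: {method} {path}")
        # Operation-level `security` overrides the top-level value.
        # An empty list, or an empty requirement {}, allows anonymous access.
        effective = op.get("security", global_security)
        scopes = [s for req in effective or [] for names in req.values() for s in names]
        if not effective or {} in effective:
            findings.append(f"NO AUTH requirement declared: {method} {path}")
        elif any("*" in scope for scope in scopes):
            findings.append(f"WILDCARD scope: {method} {path}")
    return findings


def main(approved_path: str, live_path: str) -> int:
    """Print findings and return a process exit code (non-zero fails the build)."""
    with open(approved_path) as a, open(live_path) as b:
        findings = review_findings(json.load(a), json.load(b))
    for line in findings:
        print(line)
    return 1 if findings else 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Run as a blocking pipeline step with the approved spec and the freshly generated one as arguments, a non-zero exit stops the merge. The point of the design is that the debug route from the failure chain above shows up twice: once as a new route, once as a route with no declared auth.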
What Actually Works, and What's Mostly Marketing

The vendor response has been predictably fast and not entirely cynical. Akamai's $450 million acquisition of Noname Security, announced in May 2024 and closed that June, folded one of the better-regarded API discovery platforms directly into a CDN-and-edge company's security stack, a clear bet that API visibility belongs as close to the traffic as possible, not bolted on afterward. Salt Security's 1H 2026 report introduced what it calls Agentic Security Posture Management, aimed squarely at mapping the relationships between LLMs, MCP servers, and the APIs underneath them, specifically to catch what the industry has started calling "Shadow MCP." Whether that label sticks or fades in eighteen months, the underlying instinct is correct: you cannot secure an API layer you can't continuously enumerate, and static documentation reviewed once a quarter is no longer a serious control.

The defenses that actually move the needle, based on what I've watched hold up under real incident response, aren't glamorous:

  • Runtime discovery over documentation trust. Treat your OpenAPI spec as a claim to be verified against live traffic, not a source of truth. If traffic is hitting a path that isn't in the spec, that's an incident, not a documentation gap.
  • Spec-diffing in CI, not just in security review. A pull request that introduces a new route should fail a build if that route doesn't appear in an updated, reviewed spec. This is cheap to automate and catches the AI-generated-endpoint problem at the exact moment it's introduced.
  • Authorization checks that don't trust the session. Given that 95% of API attacks in CybelAngel's 2025 dataset started from an authenticated session, the perimeter check matters far less than the per-object, per-field authorization decision happening on every single call.
  • AI-assisted review aimed at AI-generated code specifically. Ironically, the same pattern-matching that produces phantom endpoints can be turned around to flag them: diff-aware tooling that specifically interrogates new routes for missing rate limits, missing auth decorators, or unscoped data access, rather than general-purpose linting.
  • Treat MCP and agent tool definitions as part of your API attack surface, full stop. They're not a side project. They're API endpoints with extra steps, and the ThreatStats data says they're already 14% of AI-related disclosures.

None of these are silver bullets, and I'd be lying if I said any vendor has fully solved this. What I will say, after watching this category for a year now, is that the organizations doing well are the ones that stopped treating "shadow API discovery" as a once-a-quarter audit and started treating it as a property of the deployment pipeline itself, something that gets checked on every merge, the same way a linter or a test suite does. The ones still relying on a documentation review process built for a world where humans wrote every route are going to keep finding out about their phantom APIs the way most teams still do: during an incident, not before one.

The question worth sitting with isn't whether your API inventory has gaps; every inventory does. It's whether you could currently produce, on demand, a complete list of every endpoint serving production traffic right now, including the ones nobody remembers approving. If the honest answer is no, you don't have an API security posture. You have an API security guess, and AI-generated code is making the guess bigger every sprint.
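
One way to get closer to that on-demand list is to replay recent gateway or access-log records against the approved spec. What follows is a rough Python sketch under stated assumptions: the log has already been parsed into (method, path, status) tuples, and the spec has been reduced to a set of (METHOD, path template) pairs. `path_matcher` and `unlisted_traffic` are hypothetical names, and the template matching is deliberately naive (each `{param}` matches exactly one path segment).

```python
import re
from collections import Counter


def path_matcher(template: str) -> re.Pattern:
    """Compile an OpenAPI-style template such as /accounts/{id} into a regex."""
    literal_parts = re.split(r"\{[^/{}]+\}", template)
    # Each {param} placeholder becomes "one or more non-slash characters".
    return re.compile("[^/]+".join(re.escape(part) for part in literal_parts))


def unlisted_traffic(spec_ops, requests) -> Counter:
    """Count successfully served requests whose method and path match no spec entry.

    spec_ops: iterable of (METHOD, path_template) pairs from the approved spec.
    requests: iterable of (method, path, status) tuples from parsed logs.
    """
    matchers = [(method.upper(), path_matcher(template)) for method, template in spec_ops]
    hits = Counter()
    for method, path, status in requests:
        path = path.split("?", 1)[0]  # ignore the query string
        if not 200 <= status < 300:
            continue  # only traffic that was actually served counts here
        known = any(m == method.upper() and rx.fullmatch(path) for m, rx in matchers)
        if not known:
            hits[(method.upper(), path)] += 1
    return hits


if __name__ == "__main__":
    spec_ops = {("GET", "/accounts/{id}")}
    traffic = [
        ("GET", "/accounts/42", 200),
        ("GET", "/debug/accounts/42?raw=1", 200),
        ("GET", "/admin", 404),
    ]
    for (method, path), count in unlisted_traffic(spec_ops, traffic).items():
        print(f"{method} {path} served {count} time(s) but is not in the spec")
```

Anything this returns is a candidate for paging on-call rather than filing in a quarterly report. A real deployment would run it on a rolling window and would need sturdier template matching than this.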

By Igboanugo David Ugochukwu, DZone Core

The Latest Software Design and Architecture Topics

Why Push-Based Systems Fail at Scale — and How Hybrid Fan-Out Fixes It
Push-based systems work until celebrity-scale traffic creates massive fan-out pressure. Modern platforms solve this using hybrid architectures.
July 1, 2026
by Jayapragash Dakshnamurthy
· 220 Views
One Stolen Key, One Stolen Token: Why Machine Identity Is Cloud-Native's Quietest Crisis — and the Only Fix That Actually Holds
Learn how stolen machine credentials fuel major cloud breaches and how policy-as-code and short-lived identities help stop modern attacks.
July 1, 2026
by Igboanugo David Ugochukwu DZone Core CORE
· 377 Views
Why AI-Generated Code Is Making Regression Testing More Important, Not Less
AI-generated code introduces integration failures that spec-based tests cannot catch. Regression testing grounded in real production behavior is the fix.
July 1, 2026
by Sancharini Panda
· 273 Views
Can Rust Have Zero-Cost Dependency Injection?
This article explains compile-time dependency injection in Rust without reflection, runtime containers, dyn, Arc, Rc, or runtime resolution overhead.
July 1, 2026
by Dmytro Brazhnyk
· 335 Views
An Ingredient List Doesn't Stop the Worm: What SBOMs Can and Can't Do
An SBOM alone can't stop supply chain attacks. Learn why software signing, provenance, and deployment verification are essential for secure releases.
June 30, 2026
by Igboanugo David Ugochukwu DZone Core CORE
· 480 Views
A Low-Latency Routing Pattern for Multiple Small Language Models
A low-latency multi-SLM architecture uses a lightweight router to direct requests to the most suitable language model, ensuring fast responses with minimal overhead.
June 30, 2026
by Akhil Madineni
· 386 Views
Beyond Static Thresholds: Building Self-Healing Systems via Context-Aware Control Loops
Static thresholds fail in complex distributed systems. This article introduces a context-aware control loop architecture to isolate failures and automate recovery.
June 29, 2026
by Darshan Botadra
· 730 Views
Architecting Trustworthy AI: Engineering Patterns for High-Stakes Environments
This post presents three domain-agnostic engineering patterns for building AI systems that remain safe even when the model is wrong.
June 29, 2026
by Sujay Puvvadi
· 740 Views
Building Production-Safe Agentic Remediation With Docker MCP Gateway: Lessons From 43% to 100% Accuracy
We built an AI Docker remediation system on MCP Gateway. First version: 43% correct. After 9 engineering fixes: 100%. Here's what changed.
June 29, 2026
by Mohammad-Ali Arabi
· 870 Views
High-Cardinality Threat Detection: Why MapReduce Breaks and Heuristics Win
Scalable systems that succeed don’t process more — they ignore more, using heuristics to isolate the small fraction of activity that actually matters.
June 29, 2026
by Karanpreet Singh
· 610 Views
Mac Native Builds, Live Protocols, And Open Issues Under 350
The open issue count dropped below 350 after a push through the oldest reports, and the same week brought native Mac builds, WebSockets in the core, gRPC and GraphQL inte
June 29, 2026
by Shai Almog DZone Core CORE
· 585 Views · 1 Like
The New Insider Threat Isn't Human: Securing AI Agents Before They Secure Themselves
AI agents are becoming powerful insiders. Learn how identity, MCP security, least privilege, and policy enforcement reduce emerging risks.
June 26, 2026
by Igboanugo David Ugochukwu DZone Core CORE
· 1,723 Views · 1 Like
Selective Deployment in Azure Data Factory: A Practical Blueprint for Safer CI/CD
Implement selective deployment in Azure Data Factory to safely promote individual features without deploying the entire factory state
June 26, 2026
by Sauhard Bhatt
· 1,446 Views · 1 Like
Two Clocks Are Running Out at Once, and Almost Nobody Is Watching Both
Quantum computing and AI coding tools are changing security. Learn why crypto-agility and better governance are now critical.
June 26, 2026
by Igboanugo David Ugochukwu DZone Core CORE
· 1,793 Views · 2 Likes
What Cloud Engineers Actually Need to Know About AI Infrastructure
AI infrastructure isn’t about GPUs. Most issues come from storage, networking, data pipelines. If GPU utilization is low, check the infrastructure first, not the model.
June 26, 2026
by Naveen Kalapala
· 930 Views
A Tool Is Not a Platform (And Your Team Knows the Difference)
Calling a collection of tools a platform creates expectations it cannot meet. A platform has a contract. A toolchain has documentation.
June 25, 2026
by Jeleel Muibi
· 1,120 Views
No VIP? No Problem: Pacemaker-Based SAP HANA High Availability Using a Load Balancer Health Check
Many cloud platforms do not support floating virtual IPs, which breaks the standard RHEL Pacemaker setup for SAP HANA HA. Use a network load balancer.
June 25, 2026
by Vidyasagar (Sarath Chandra) Machupalli FBCS DZone Core CORE
· 1,075 Views · 2 Likes
Sharing SBOMs Securely Without Giving Too Much Away
SBOMs improve software supply chain transparency, but sharing them carelessly creates risk. Learn how controlled disclosure balances trust and security.
June 25, 2026
by Sven Ruppert DZone Core CORE
· 1,627 Views
Code and Connect: MCP + MuleSoft
Understand MCP, AI agents, and assistants, and learn how Model Context Protocol connects AI applications to tools using MuleSoft.
June 25, 2026
by Ajay Singh
· 1,156 Views
REST-Assured Configuration and Specifications: Writing Maintainable API Tests
Learn how to use REST-Assured Configuration, Request Specifications, and Response Specifications to build maintainable API tests.
June 25, 2026
by Faisal Khatri DZone Core CORE
· 981 Views