Coding Resources

DZone's Featured Coding Resources

Will AI Keep Us Stuck in 2020 Architectures?

By Daniel Sagenschneider

Every time I sit down with an AI coding assistant, I notice the same thing: it is very good at Spring. Annotations, profiles, @Autowired, the whole call-stack-driven dance of beans wiring into beans. AI has seen twenty years of this. It guesses well, even when it has to infer how a profile-specific bean is going to be selected at runtime. This is because it has seen ten thousand examples of exactly that pattern. Which raises an uncomfortable question for anyone working on a new architecture: if AI is this fluent in 2020-era patterns, are we as an industry going to stay locked into those patterns simply because that's what the model knows? Is AI a conservative force that quietly drags software architecture backward to its training data's center of mass, no matter how good a newer idea might be? I wanted to find out, using my own project as the test case. The Bet: An Explicit Index Beats an Implicit One OfficeFloor version 4 added a feature I think is genuinely interesting for the AI era: REST endpoints can now be defined in YAML files, sitting alongside your existing Spring Boot code, with the directory structure following the URL structure. A file at greeting.POST.yml defines POST /greeting. A file at greeting/{name}.GET.yml defines GET /greeting/{name}. Inside that file, you compose the small functions that handle the request: YAML # greeting.POST.yml validate: class: ValidateGreetingLogic outputs: valid: build build: class: PostGreetingLogic next: audit audit: class: AuditGreetingLogic Each function still gets its dependencies injected by Spring exactly as it always has. OfficeFloor doesn't replace Spring's DI, persistence, security, or actuator setup. What changes is the flow. In a typical @RestController, the order in which validation, business logic, and auditing run is implicit: it lives in the call stack (in if statements and which methods call which other methods). To understand it, you read code. To change it, you read more code, because the wiring isn't written down anywhere as data; it's compiled into control flow. In the OfficeFloor YAML version, that wiring is the file. Conditional branches, sequencing, error flows: they're declared, not buried. No function in the chain knows about the others. No annotation is describing the relationship from inside a class. The YAML is a complete, readable specification of how the endpoint behaves, sitting right next to the endpoint's own URL path in the directory tree. This is essentially Function Injection, the same move Dependency Injection (DI) made decades ago, but one level up. DI took "what do I depend on" out of imperative constructor code and made it an explicit, configured, first-class concern. Function Injection takes "what happens next" out of the implicit call stack and makes that explicit and configured too. It's a continuation of the Inversion of Coupling Control idea I've been writing about for years: Dependency Injection only ever solved one slice of the coupling problem. Control flow coupling was always still there, just invisible. For a human reading the code, this might feel like a wash, maybe even a step backward. This is exactly why the industry settled on annotations next to code in the first place; developers wanted the wiring close to the implementation, not off in some separate descriptor. That preference made sense when humans were the primary readers doing the navigating. But an AI assistant isn't a human reading top to bottom. An AI assistant is trying to find the minimum context needed to make a correct, surgical change, and that's a search and navigation problem, not a stylistic one. A YAML file that names every function in an endpoint's execution path, in order, with explicit conditional branches, is a search index. The AI doesn't need to read the rest of the code base to be confident it has found everything relevant to that endpoint. It opens one small file, and the entire behavioral contract of that URL is sitting right there. That's the theory, anyway. I wanted to know if it would actually hold up against a model that has been trained almost exclusively on the other way of doing things. The Experiment: Converting Spring PetClinic REST To test this for real, rather than on a toy example, I took Spring PetClinic REST, the long-standing reference REST implementation of the PetClinic sample app that the Spring community has used for years, and worked with AI to convert its endpoints over to the OfficeFloor REST YAML approach. It did not work on the first attempt. It took about five iterations to get a clean conversion, and the bottleneck wasn't OfficeFloor's runtime, and it wasn't really the AI's coding ability either. It was documentation. Each attempt surfaced a gap in the tutorials: some assumption I'd left implicit because it was obvious to me, a place where the YAML schema's possibilities weren't spelled out, an edge case in how a Spring @RestController-style behavior should map across. I used the AI's confusion as a signal: where it guessed wrong or asked the wrong question, that was exactly where the tutorial needed another paragraph, another example, another explicit rule. Five rounds of "AI gets stuck, tutorial gets fixed, try again" later, the conversion went through cleanly. I recorded the final working conversion. You can watch it here: Spring PetClinic REST to OfficeFloor REST YAML You can also see the resulting changes in the forked repository pull request. So Which is It? Does AI Lock in 2020 Architecture or Not? Both things turned out to be true, depending on what's actually being asked of the AI. Where AI defaults to what it knows: left to its own judgment, an AI assistant will reach for Spring conventions, because Spring conventions are the statistically dominant pattern in its training data. If you ask it to "add a REST endpoint" with no further steering, you'll get an @RestController and an @Autowired field every time. That's not a flaw in the model. It's just what twenty years of public code looks like, averaged. Where AI happily adopts something new: the moment the new pattern is clearly and completely specified, the model's prior training stopped being an obstacle and became almost irrelevant. AI doesn't need to have seen ten thousand examples of a YAML-driven REST framework to use one correctly. It needs an accurate, complete description of the schema and the conventions, and then it follows that description. The five-iteration process wasn't really "teaching the AI to think differently." It was closing the gaps between what I assumed was obvious and what was actually written down anywhere the AI could read it. That reframes the original question. The risk isn't that AI is architecturally conservative by nature. The risk is that new architectures rarely come with documentation anywhere near as exhaustive as Spring's, because Spring's documentation had two decades and a vast community writing tutorials, blog posts, Stack Overflow answers, and books about it. A new approach starts that race from zero. If its docs stay thin, AI will keep defaulting to Spring patterns, not out of preference, but because Spring is simply the only option it has enough information about to be confident in. So the honest answer is: AI won't keep us in 2020 architectures by itself. But it will, by default, if nobody does the work of making the alternative legible to it. The model doesn't have an opinion about which architecture is better. It has a confidence gradient shaped entirely by how well-specified each option is in what it's been able to learn or been given. The Interesting Part for Framework and Tool Authors If this holds generally, and I'd be curious whether others doing similar work see the same thing, it changes the calculus for anyone designing a new way of building software in the AI era. It used to be that the cost of an explicit, separated configuration artifact (think XML wiring files, or graphical configuration tools) was paid almost entirely by human developers, who found it slower to read and slower to navigate than code-adjacent annotations. That cost was real, and it's a big part of why annotation-driven frameworks like Spring won the last decade. AI changes that cost calculation. An explicit, structured index, a YAML file that names every function and every transition in an endpoint, located exactly where the URL structure says it should be, costs an AI assistant almost nothing to read and a great deal less to get wrong, because there's no implicit call-stack archaeology required. The structure that used to be a tax on humans is now a gift to the thing increasingly doing a large share of the maintenance work. But that gift only arrives if someone pays a different tax: writing the documentation thoroughly enough, and unambiguously enough, that an AI assistant can pick up the new pattern from the docs alone, the way it picked up Spring from a decade of incidental exposure. Architecture innovation in the AI era may end up being gated less by "is this a good idea" and more by "is this idea legible to an AI that has never seen it before." That's a genuinely different bar than the one we used to optimize for, and it's one I think is worth more people paying attention to. If you want to look at the actual schema, the tutorials, or try the conversion yourself, the starting point is the OfficeFloor REST tutorials, and the Spring PetClinic REST source is on GitHub if you want to attempt your own conversion and see where your AI assistant gets stuck. That's usually exactly where the next improvement to the docs needs to go. More

Reducing CI Execution Time Using Impact-Based Test Selection Across Repositories

By Raakesh Rajagopalan

Modern CI/CD pipelines often execute complete regression suites for every code change, regardless of the actual impact of the modification. While this approach guarantees broad validation coverage, it also introduces unnecessary test execution, slower feedback loops, and increased infrastructure cost. This challenge becomes more visible in microservice-based systems where repositories, services, and automation suites are distributed across multiple projects. A small change in one module can unintentionally trigger an entire regression pipeline containing tests unrelated to the updated code. To explore a lightweight solution for this problem, I built a personal engineering project that performs impact-based test selection using Git diff analysis, Spring Boot, JGit, GitHub Actions, and Karate. The goal of the project was simple: instead of executing the full regression suite for every change, dynamically determine which tests are actually impacted and execute only those tests. The project demonstrates how selective test execution can help reduce unnecessary CI workload while maintaining baseline validation coverage through fallback smoke testing. The Problem With Traditional Regression Execution In many CI/CD pipelines, regression execution is static. Every push event or pull request triggers: Full API regression suitesComplete integration testingBroad validation across unrelated modules Although this guarantees high coverage, it creates several practical problems. Slow Feedback Cycles Developers may wait several minutes or even longer to receive pipeline feedback for small localized changes. For example, updating a single payments API controller might still trigger: transactions teststransfer testsauthentication regression testsunrelated smoke validations As projects scale, this delay affects engineering productivity and release speed. Unnecessary Infrastructure Usage Executing the same large regression suites repeatedly consumes unnecessary CI resources. This becomes more expensive when: Pipelines run in parallelMultiple pull requests are activeCloud runners are billed by execution time Reduced Pipeline Efficiency In many cases, only a small subset of tests is truly relevant to the code change. Running the full suite results in redundant execution and inefficient utilization of CI infrastructure. The objective of this project was to explore whether lightweight impact analysis could reduce unnecessary test execution without introducing complicated tooling or dependency management systems. Project Overview The solution uses Git diff analysis to identify changed files and map those changes to relevant Karate test tags. The architecture consists of two repositories: Plain Text Developer Repository (fintech-impact-services) ↓ Push Event GitHub Repository Dispatch ↓ Automation Repository (karate-change-impact-test) ├── Start Spring Boot Application ├── Perform Git Diff Analysis ├── Generate Impacted Test Tags ├── Execute Targeted Karate Tests └── Generate Execution Metrics The development repository contains the Spring Boot microservice and impact analysis API. The automation repository contains Karate test suites and GitHub Actions workflows responsible for selective test execution. This separation allowed the automation layer to remain reusable and independently managed. Cross-Repository Workflow The workflow begins when code is pushed to the development repository. A GitHub Repository Dispatch event triggers the automation repository pipeline. Example dispatch payload: Shell curl -X POST \ -H "Accept: application/vnd.github+json" \ -H "Authorization: Bearer <TOKEN>" \ https://api.github.com/repos/{owner}/karate-change-impact-test/dispatches \ -d '{ "event_type": "dev_push" }' The automation repository then performs the following steps: Checkout repositoriesStart the Spring Boot impact-analysis serviceWait for API readinessCall the impact-analysis endpointRetrieve impacted Karate tagsExecute only selected testsPublish execution metrics This design helped simulate a lightweight cross-repository CI orchestration model using only GitHub-native capabilities. Implementing Git Diff Analysis Using JGit The core functionality of the project relies on identifying changed files between commits. Instead of using shell-based Git commands inside CI scripts, the implementation uses JGit, a Java library that provides Git functionality directly within Java applications. The service compares the current branch with a target branch reference and extracts modified file paths. Example implementation: Java public Set<String> getImpactedTags(String targetBranch) throws Exception { FileRepositoryBuilder builder = new FileRepositoryBuilder(); Repository repository = builder.readEnvironment() .findGitDir(new File(".")) .build(); try (Git git = new Git(repository)) { AbstractTreeIterator oldTreeParser = prepareTreeParser(repository, targetBranch); List<DiffEntry> diffs = git.diff() .setOldTree(oldTreeParser) .call(); return calculateTags(diffs); } } Once the diff entries are collected, the application extracts changed paths and sends them to the mapping engine. Using JGit provided several advantages: Better portability across environmentsEasier integration with Spring BootReduced dependency on shell scriptingCleaner CI pipeline implementation One challenge encountered during implementation was ensuring full Git history availability inside GitHub Actions runners. Shallow clones occasionally caused incorrect branch comparisons, so complete fetch depth was required for accurate analysis. Rule-Based Impact Mapping After identifying changed files, the framework maps those files to relevant Karate execution tags. Example mapping rules: Changed PathKarate Tag/payments/@payments/transactions/@transactions/transfer/@transfers/auth/@regressionpom.xml@regression Example implementation: Java if (path.contains("/payments/")) { tags.add("@payments"); } if (path.contains("/transactions/")) { tags.add("@transactions"); } The project intentionally uses deterministic rule-based mapping instead of advanced dependency graph analysis or machine learning models. The primary reasons were: SimplicityPredictabilityEasier debuggingFaster implementationLower maintenance overhead Although more sophisticated impact-analysis systems exist, lightweight rule-based mapping was sufficient to demonstrate measurable CI optimization in this project. Dynamic Karate Test Execution Once impacted tags are generated, the automation repository dynamically executes only the relevant Karate tests. Example command: Shell mvn test -Dkarate.options="--tags @payments,@transactions" The Karate framework worked particularly well for this implementation because feature files were already organized by business capability. Example structure: Gherkin features/ ├── payments.feature ├── transactions.feature ├── transfers.feature └── smoke.feature This made selective execution straightforward without requiring custom runners or additional orchestration frameworks. Safe Fallback Mechanism One important concern with selective execution is the possibility of hidden dependencies. A code change may indirectly impact areas not captured by simple path-based rules. To reduce this risk, the framework includes a fallback strategy. If no impacted tags are identified, the pipeline automatically executes baseline smoke tests. Example: Shell @smoke This ensured that critical application validation still occurred even when no direct impact mapping was detected. The fallback mechanism helped balance optimization with CI reliability. GitHub Actions Integration GitHub Actions was used to orchestrate the complete workflow. Example workflow steps: YAML - name: Start Spring Boot Service run: mvn spring-boot:run & - name: Wait for API Readiness run: | curl --retry 10 \ --retry-delay 5 \ http://localhost:8080/actuator/health One practical issue encountered during implementation was synchronization between application startup and API invocation. Without readiness checks, the workflow occasionally attempted to call the API before the Spring Boot application was fully initialized. Adding retry-based health checks improved workflow stability significantly. Execution Metrics To evaluate the effectiveness of the approach, the framework generates execution metrics after each run. Example output: JSON { "impacted_tags": "@payments,@transactions", "scenarios_executed": 6, "scenarios_skipped": 12, "test_reduction_rate": "66%", "timing_metrics": { "total_workflow_seconds": 52, "isolated_test_seconds": 18, "api_overhead_seconds": 6 } } In sample execution scenarios from this project, the framework reduced executed tests by approximately 60–70% depending on the scope of code changes. Localized feature updates benefited the most, while shared dependency changes still triggered broader regression execution. Although these results came from a personal engineering project rather than a production enterprise system, the experiment demonstrated how lightweight impact-aware execution can improve CI efficiency. Limitations and Future Improvements The project also exposed several limitations. Manual Mapping Maintenance Rule-based mappings require periodic updates as repositories evolve. Hidden Dependency Risks Indirect service dependencies may not always be detected through simple path matching. Git Comparison Accuracy Accurate impact analysis depends heavily on proper branch comparison and repository history availability. Future improvements could include: dependency graph analysiscode coverage–based impact detectionhistorical test-failure analysisML-assisted impact predictionmulti-module dependency propagation These enhancements could improve precision while preserving the lightweight nature of the framework. The complete implementation for this project, including the Spring Boot impact-analysis service and GitHub Actions workflow, is available on GitHub. Source Code: Dev repository: https://github.com/raakeshdev20/fintech-impact-servicesAutomation repository:https://github.com/raakeshdev20/karate-change-impact-test Conclusion This project explored how Git diff analysis and selective test execution can help optimize CI/CD pipelines without requiring complex external tooling. By combining the following, the framework demonstrated a practical approach to reducing unnecessary regression execution for localized changes. JGit-based change detectiondeterministic impact mappingdynamic Karate executionGitHub Actions orchestration While the implementation is intentionally lightweight, the experiment highlights how impact-aware testing strategies can improve feedback cycles, reduce redundant execution, and make CI pipelines more efficient as systems continue to scale. More

Hardening MCP Gateways: Mitigating July 28 Security Risks in Java Applications

By Daniel Oh

CORE

Search Is Becoming the Control Plane for AI Agents

By sunil paidi

Fix Circular Dependencies in PostgreSQL Row-Level Security With SECURITY DEFINER Functions

By Lex Mulier

Scaling Row-Level Security With ABAC on Databricks Unity Catalog

Onboarding a new table into row-level security should be four lines of metadata. Not two new objects, a code review, and a platform-team ticket. This post describes a tag-driven attribute-based access control (ABAC) pattern built on Databricks Unity Catalog primitives that achieves the objective of one UDF per filter shape, one policy per shape, and a single control table that drives all per-group authorization logic. I work as a solutions architect with large enterprises running hundreds of tables across multiple regions, product lines, and source systems, where row-level security follows a pattern: Group A sees records from System X. Group B sees the regions China and India. Group C sees plant key 333. Group D combines two source systems. Group E sees everything except specific values. The domain (finance, healthcare, etc.) doesn't matter. The pattern remains the same. A traditional implementation looks like this: One row-filter UDF per tableOne row-filter policy per (table and group) combinationA SQL query layer that joins every table to an identity mapping table Every new table required writing two new objects (UDF + policy) and updating every existing group definition. Every new group required touching every UDF. Onboarding a new product line meant rebuilding the whole machine. Hundreds of tables × dozens of groups = a maintenance nightmare. Instead, what they needed was a pattern where: A new table joins the RLS scheme with only metadata changes — no new UDFs, no new policiesA new business group is just a data write — no DDL, no code reviewMisconfigured rules fail closed, not open This post describes the four-layer pattern we landed on. It's built entirely on Unity Catalog ABAC primitives (governed tags, row-filter policies, attribute-based binding) and a single control table that drives per-group filter logic. The Four Layers Each layer does exactly one thing and one thing only: Layer 1: Tags are declarative metadata. A table-level tag declares which shape a table is. A column-level tag tells the row filter which physical column corresponds to which logical attribute. All tags do is describe the data. They facilitate the action for the next layers.Layer 2: UDF is the decision logic. Given a row's attribute values, it returns a boolean answer of TRUE or FALSE for current_user(). It doesn't know anything about which table it's filtering; it just answers "is this row visible?"Layer 3: Policy is the binding. It says, "for tables tagged with shape X, call UDF Y with these columns." It uses tag-matching expressions so it auto-attaches to new tables as they're tagged. This is what enables us to avoid per-table DDL. Layer 4: Row-Level Security is what the customer experiences. Their SQL doesn't change; filtered rows just come back. Layer 1 + 2: Tags Describe, UDFs Decide Two governed tags do all the work. A rls_tag on the table says "this is a table_1_filter shape." An rls_attr tag on each column says "this column is the src_sys_cd attribute" or "this column is the region attribute." The column tag is the load-bearing piece — it lets you have a column literally named ws_region_cd and still have the policy treat it as the logical region attribute. Physical naming is decoupled from policy semantics. One UDF per table shape. A "shape" is a set of filterable columns. In this customer's setup, there are two shapes: Table 1 shape: (src_sys_cd, plant_key, region) — three attributes. Maybe it's best to call this shape a combination of the keys. For example, all the tables that have src_sys_cd + plant_key+region fall under this shape.Table 2 shape: (src_sys_cd, order_key) — two attributes Each shape has its own UDF (rf_table_1, rf_table_2). The UDF signature takes one parameter per filterable column. The body joins a user_group_control_table (one row per group rule) to a user_group_membership mapping (or, in production, is_account_group_member()) and applies BOOL_OR across the rules — a union grant. The key property of this UDF design: it doesn't know which table is calling it. It just answers, given attribute values, "does current_user() get this row?" That's what lets the same UDF serve many tables of the same shape. Layer 3: The Policy that ties it all together The policy is the only object in the system that knows about both tags and UDFs. Its four clauses each answer a separate question: clausequestion it answers `ON SCHEMA …` Where does this policy live? (Schema-scoped — broad reach.) `WHEN has_tag_value('rls_tag', '…')` Which tables should it attach to? Anything tagged with the right shape. `MATCH COLUMNS … has_tag_value('rls_attr', '…')` Inside each table, which physical column maps to which logical attribute? `ROW FILTER … USING COLUMNS (…)` Which UDF to call, and in what argument order? The auto-attachment behavior is what makes the pattern scale. The policy doesn't enumerate tables — it matches them by tag. Tag a new table tomorrow, and the policy applies to it on the next query. Zero policy edits. Query Time: What Actually Happens When a user runs SELECT * FROM table_1, here's what Unity Catalog does behind the scenes: The planner sees the table and looks up policies attached to its schema. It finds rls_policy_t1.The policy's `WHEN` clause checks the table tag. Does table_1 have rls_tag=table_1_filter? Yes → the policy attaches. (If no, the policy is skipped for this table.)The policy's `MATCH COLUMNS` resolves attributes. For each logical attribute name, it scans column tags to find the physical column with that role: src_sys_cd → physical column src_sys_cd; plant_key → physical column plant_key; region → physical column region.The query is rewritten to append WHERE rf_table_1(src_sys_cd, plant_key, region) = TRUE. The customer's original SQL is unchanged.The UDF runs per row. It joins the control table to membership for current_user(), evaluates each rule, and BOOL_ORs the results — TRUE if any rule grants the row, FALSE otherwise.The engine emits only the `TRUE` rows. The customer sees only their authorized subset. They never see the UDF call or the policy mechanics. The whole thing is transparent to the application — same SQL, filtered result. The Scale Payoff The reason to build the pattern this way only becomes obvious when you onboard the second, third, and hundredth table. Adding a New Table to an Existing Shape A new dimension table arrives that fits the Table 1 shape. The work to bring it under RLS: Stepeffort shape Create the table (customer's normal DDL) — Tag the table with the shape 1 line of DDL Tag the columns with their logical roles 3 lines of DDL Update the UDF? None Update the policy? None Update the control table? **None** (existing groups apply automatically through their existing rules) Four ALTER lines. That's the whole onboarding cost. Adding a New Business Group A new business group needs access to a specific slice of the data: stepeffort `INSERT` one row into `user_group_control_table` 1 INSERT Add users to the AD group (outside Databricks) — Update the UDF? None Update the policy? None Update any table tags? None One INSERT. Adding a new group is a data write, not DDL — which means operations teams can self-serve through their normal change-management process, without code review or platform-team involvement. GitHub repo: https://github.com/vbablue/databricks-abac-rls-demo/tree/main

By Sriram Vadlamani

The Agent Security Split: Tool Layer vs Sandbox Layer

When an enterprise asks, "Is your agent platform secure?", the question is almost always a bundle of two distinct architectural concerns: Tool layer: Can the agent only call the tools we approved? Are the tool inputs and outputs validated? Are credentials kept out of the LLM's context? Are calls audited?Sandbox layer: When a tool runs code, browses the web, or shells out — is that execution isolated from the host? Can it reach internal networks? Can it write outside its working directory? These look adjacent, but they fail differently. A tool layer fails when an agent calls something it shouldn't have access to — fixable by tightening the tool registry. A sandbox layer fails when an approved tool gets compromised mid-execution (e.g., a Chromium zero-day exploited via a malicious page) — fixable only by reducing what the execution environment can reach. In building helmdeck — an open-source MCP server and pack-based agent infrastructure — our thesis has been that the immediate bottleneck for production-grade agents is the tool layer. We shipped schema-validated Capability Packs, an MCP server that exposes them uniformly, and a vault that injects credentials into outbound HTTP without the agent ever seeing them. But for true enterprise hardening, the tool layer isn't enough. You need a sandbox layer that provides hardware isolation. This is why we designed a composed architecture using NVIDIA OpenShell to handle the execution environment. The Credential Split The most common concern when composing two security layers is a tug-of-war over credentials. If both the agent platform and the sandbox engine handle secrets, who owns what? After mapping the integration between helmdeck and OpenShell, the responsibilities proved entirely non-overlapping: Credential TypeOwnerMechanismInference API keys (Anthropic, OpenAI)Sandbox (OpenShell)Provider-injected environment variables at agent-sandbox startKubernetes service accounts, cloud credentialsSandbox (OpenShell)Provider-injected at sandbox provisioningSaaS PATs (GitHub, Stripe, Notion)Tool Layer (helmdeck)AES-256-GCM vault; ${vault:NAME} placeholder substitution at pack-dispatch timePack output artifact signingTool Layer (helmdeck)Existing artifact store The sandbox layer injects into the process environment. The tool layer injects into the outbound HTTP request body. The layers never collide because they intercept at different points in the request lifecycle. What Changes When You Compose Them Today, agents call helmdeck's 39 packs via MCP. The packs run in Docker containers with seccomp profiles and dropped capabilities. An egress guard rejects outbound URLs against a blocklist. That is solid for most operators. The composed architecture changes one specific thing: helmdeck's SessionRuntime interface — the seam between the pack engine and execution backends — gains a third backend. Instead of shelling out to the Docker SDK, the pack engine calls OpenShell's Gateway API, which provisions the sidecar in a MicroVM with a pack-family-specific OPA policy attached. The pack code doesn't change. The MCP surface doesn't change. The agent doesn't know. But the enterprise reviewing the architecture notices three things: Dedicated kernel isolation: A browser sidecar runs in a dedicated kernel. A zero-day exploit cannot escape to the host because the libkrun MicroVM boundary is a hardware-virtualization line, not a namespace.L7 policy per pack family: A python.run sidecar can be policy-restricted to deny any outbound HTTP — even to internal services — while a browser.screenshot_url sidecar can be allowed to reach exactly the user-supplied target.Landlock filesystem enforcement: Even if the LLM generates code attempting to read /etc/passwd, the kernel returns EACCES before the process can act. Why This Matters to You If you are designing an agentic platform for enterprise deployment, do not attempt to merge the tool layer and the sandbox layer into a single monolithic API. The abstractions will leak. A two-stack story is more honest about what each layer does. An enterprise reviewing a composed architecture can audit each layer independently: they can read the sandbox's policy YAML to verify network isolation, and read the tool layer's pack schemas to verify credential injection. That decoupling is a security property of the architecture, not just an aesthetic preference. If you are an architect reviewing agent infrastructure for production, we are actively prioritizing the next phases of this integration based on community needs. We need to know which pack family worries you most (browser, Python, vision) and what you are isolating against (Chromium zero-days, internal SSRF). You can shape the roadmap by commenting on issue #193, or help us build the deterministic tool layer by contributing SaaS API wrappers following our contribution guide. Note: NVIDIA OpenShell is currently in alpha. The composed architecture described here is our post-v1.0 roadmap for enterprise hardening, ensuring the base tool layer is stable before binding it to an alpha contract.

By Tosin Akinosho

Mitigating Cache Stampedes in Dynamic API Translation Using Java 21 Virtual Threads

The Hidden Cost of API Versioning Hell Continuous API evolution is non-negotiable in contemporary software development, yet maintaining backward compatibility remains an incredibly expensive and labor-intensive hurdle. Core schema mutations frequently force downstream enterprise clients into disruptive and unplanned refactoring cycles, stalling product velocity. The typical industry fix — maintaining multiple, hard-coded API routes (e.g., /v1, /v2) — inevitably results in severe codebase sprawl, fractured engineering focus, and massive technical debt for the API provider. To break this cycle, this article outlines raqs (Response Agnostic Query System): a novel, dynamic proxy architecture designed to eliminate client-side disruption entirely. By intercepting traffic and executing on-the-fly schema transformations, raqs allows legacy clients to request data against deprecated contracts while the core upstream backend remains free to evolve. The raqs Solution: A Bifurcated Architecture Running complex natural-language processing or machine-learning inference directly within a high-throughput network routing path is typically a recipe for catastrophic latency. To solve this, raqs splits the network and intelligence layers into two distinct operational planes: The Orchestration Plane (Java 21/Spring Boot): Acting as the primary ingress proxy, this layer intercepts requests, manages multi-tier cache retrieval, handles distributed synchronization, and executes structural JSON transformations. The Inference Plane (Python/FastAPI): Operating as a probabilistic fallback mechanism, this agent calculates semantic and structural relationships between schema keys only when a deterministic mapping rule is missing. Core Architectural Decision Matrix ComponentNaive/Standard Approachraqs ImplementationConcurrency ManagementOS Thread Pooling (Tomcat Defaults) Java 21 Virtual Threads (Project Loom) SynchronizationPolling / Thread.sleep() loop Redisson Distributed Locking (Pub/Sub) Caching TierSingle-node In-Memory Cache Multi-tier (Caffeine L1 + Redis L2) Semantic MappingPure Semantic Models (LLM/Dense Vector) Hybrid Ensemble (Vector + Lexical Distance) Scaling Imperatively With Java 21 Virtual Threads The Orchestration Plane must handle thousands of concurrent client requests while checking caches, holding locks, or awaiting responses from the Inference Plane. The traditional platform-thread pooling model introduces massive operating system overhead and memory footprint under heavy I/O saturation. By building on Java 21 virtual threads (Project Loom), raqs assigns a lightweight, user-mode virtual thread to every single request lifecycle. When a thread encounters an L1/L2 cache miss, it is gracefully unmounted from its underlying OS carrier thread. The carrier thread is freed to handle other active network traffic, while the suspended virtual thread waits to resume once the schema mapping becomes available. This allows us to write straightforward, blocking imperative code that scales out with the efficiency of complex reactive systems. Defeating Cache Stampedes: The "Hero Thread" Pattern A major architectural risk for dynamic proxies is the cache stampede (or thundering herd problem). If a rolling backend deployment instantly mutates 50 schema keys, a burst of 1,000 concurrent client requests will simultaneously experience an L1/L2 cache miss. Without intervention, this triggers a massive wave of redundant, CPU-heavy inference calls that can completely crash the system. We mitigate this by implementing the "Hero Thread" pattern utilizing Redisson distributed locks: Java // Conceptual implementation of the Hero Thread pattern in the Orchestration Plane String lockKey = "lock:schema:" + legacyVersion + ":" + upstreamVersion; RLock distributedLock = redissonClient.getLock(lockKey); // Check L1/L2 cache first MappingRule mapping = cacheManager.getMapping(legacyVersion, upstreamVersion); if (mapping == null) { // Attempt to acquire the distributed lock via Redis Pub/Sub mechanisms if (distributedLock.tryLock()) { try { // The "Hero Thread" has the lock and invokes the Inference Plane mapping = inferenceClient.fetchProbabilisticMapping(legacySchema, upstreamSchema); cacheManager.populateCaches(legacyVersion, upstreamVersion, mapping); } finally { distributedLock.unlock(); } } else { // Non-hero threads are suspended by Loom and wait for cache population mapping = waitForCacheOrRetry(legacyVersion, upstreamVersion); } } return transformJsonPayload(rawResponse, mapping); By enforcing this structure, exactly one thread (the "Hero Thread") takes the computational penalty of invoking the ML Inference Plane. The remaining 49 or 999 concurrent threads are cleanly suspended by Loom, waking up via Redis Pub/Sub to read the finalized, cached ruleset. Pragmatic AI: Why "Pure Semantic" Models Fail During initial prototyping, we found that relying solely on dense vector embeddings (like Cosine Similarity) for short JSON dictionary keys yields dangerous false-positive collisions. For instance, a dense vector model will frequently map the legacy key firstName directly to a new key named lastName because they share highly overlapping linguistic contexts within general training data. To prevent silent data corruption, raqs uses a Hybrid Ensemble Scoring Model that evaluates both semantic meaning and lexical structure: Semantic evaluation: Keys are projected into a vector space using the all-MiniLM-L6-v2 transformer model, calculating Cosine Similarity S_semantic. Lexical evaluation: To account for common developer syntax changes (such as camelCase to snake_case), we compute the normalized Levenshtein distance S_lexical. Through empirical calibration, we fixed the hyperparameters at W_semantic = 0.7 and W_lexical = 0.3. If the combined score fails to clear a strict acceptance threshold (e.g., 0.80), the mapping is rejected. Ensemble Scoring Dynamics in Action Legacy KeyNew KeySemantic ScoreLexical ScoreEnsemble ResultfirstNamefirst_name0.950.88 0.929 (Accept)userIdaccount_id0.820.40 0.694 (Reject)firstNamelastName0.880.55 0.781 (Reject)zipCodepostalCode0.890.60 0.803 (Accept) As shown above, a pure semantic evaluation would have mistakenly accepted firstName as lastName due to its high 0.88 similarity vector. The 30% lexical penalty successfully suppresses the final score below the 0.80 threshold, preserving data integrity. Performance Telemetry and Benchmarks To test the efficacy of this architecture, we subjected the raqs proxy to a load test of 1,000 requests with a concurrency cap of 50, simulating a sudden, zero-knowledge v1-to-v2 upstream schema evolution on a standard CPU-bound host machine. The cold start: Upon initialization against an empty cache, the Redisson distributed lock correctly isolated the thundering herd. Exactly one thread executed the Hybrid ML Inference, completing in 504.65 ms. The blocked threads: The remaining 49 concurrent threads were safely unmounted from OS carrier threads by Loom, waiting for lock release via Pub/Sub and completing with an average latency of 554.24 ms. The steady state: Once the rules were promoted to the Caffeine (L1) and Redis (L2) caches, the subsequent 950 requests bypassed the Inference Plane entirely. The Orchestration Plane achieved an outstanding steady-state processing latency of just 10.25 ms ($\sigma = 2.19\text{ ms}$). This performance distribution demonstrates that the computational cost of machine learning inference can be entirely isolated to cold starts, making real-time, dynamic API translation exceptionally practical for enterprise-scale traffic. The Path Forward API evolution shouldn't force a broken trade-off between breaking client applications or drowning in a versioned codebase sprawl. By pairing the non-blocking concurrency of Java 21 with a highly disciplined, multi-tier distributed proxy, we can build data layers that adapt dynamically to contract shifts. Future iterations of this paradigm will expand beyond simple key mutations to incorporate deep structural payload transformations, JSON path awareness, and automatic data type coercion. Key Takeaways Eliminate versioning sprawl: Engineers can reduce the overhead of traditional API versioning by introducing a dynamic proxy that maps evolving schemas to legacy expectations on-the-fly. Scale imperatively via Java 21: Virtual Threads (Project Loom) allow high-throughput routing middleware to scale using a readable thread-per-request model without heavy reactive frameworks. Implement the "Hero Thread" pattern: Utilizing Redisson distributed locking ensures that expensive schema inference tasks are executed exactly once during high-traffic evolution events. Deploy pragmatic hybrid scoring: Combining dense vector embeddings with normalized Levenshtein distance drastically reduces false-positive mapping collisions. Achieve sub-15ms latency: Decoupling high-latency inference from the routing path ensures that 95% of steady-state traffic experiences near-native performance.

By Aniruddha Chatterjee

AGENTS.md Makes Your Java Codebase AI-Agent Ready

The year is 2026, and the way software is built has fundamentally shifted. We are no longer just writing code for other humans to read; we are building systems that AI coding agents, such as Cursor, GitHub Copilot Agent Mode, Claude Code, and autonomous CLI tools, will navigate, debug, and extend. As Java developers, we are blessed with robust tooling. If you are using Quarkus, you already possess a superpower: Supersonic Subatomic Java with an ultra-fast developer loop, continuous testing, and built-in Dev Services. However, AI agents frequently get tripped up by enterprise Java repositories. They overcomplicate simple architectures, write blocking code where reactive code belongs, or waste tokens trying to spin up manual Docker containers when Quarkus Dev Services could do it out of the box. The fix? AGENTS.md. Let’s explore how to use this emerging open standard to make your Quarkus applications instantly digestible for AI agents. What Is AGENTS.md? The AGENTS.md specification is a tool-agnostic open standard (pioneered by the Agentic AI Foundation) designed to sit at the root of a repository. Think of your standard README.md as human onboarding documentation: it contains high-level architecture narratives, badges, and project philosophy. AGENTS.md, on the other hand, is an executable runtime instruction layer for AI. It is concise, deterministic, imperative, and explicitly structured to prevent "context window bloat" while giving autonomous agents the exact boundaries and commands they need to succeed. The Anatomy of an Agent-Ready Quarkus Codebase When an AI agent initializes inside your workspace, it reads your project structure. Because Quarkus spans both imperative and reactive paradigms, an unguided AI agent will often hallucinate or mix patterns. An effective AGENTS.md for a Quarkus ecosystem must explicitly define three pillars: Operational commands: The exact Maven/Gradle sequences for running, testing, and live-reloading.Architectural boundaries: Strict rules regarding blocking vs. non-blocking code and data access patterns.Infrastructure management: Forcing the agent to utilize Quarkus Dev Services rather than provisioning external databases. Hands-On: The Ultimate Quarkus AGENTS.md Template Drop this exact AGENTS.md file into the root of your Quarkus repository to drastically improve the quality of AI-generated code and autonomous refactoring tasks. Markdown ## Tech Stack & Ecosystem Context - **Runtime**: Java 25, Quarkus 3.x (Supersonic Subatomic Java). - **Build Tool**: Maven (`mvnw` wrapper present). - **Extensions**: REST, Hibernate ORM with Panache, Quarkus Dev Services. - **Database**: PostgreSQL (Managed entirely via Dev Services). ## Critical Operational Commands - **Launch Development Mode**: `./mvnw quarkus:dev` - **Execute All Tests**: `./mvnw test` - **Continuous Testing**: Start `./mvnw quarkus:dev` and press `r` to toggle background testing. - **Production Package**: `./mvnw package` ## Architectural Boundaries & Coding Standards ### 1. Reactive vs. Blocking Rules - Default to **REST**. Endpoints returning `Uni<T>` or `Multi<T>` must NEVER invoke blocking operations. - If a method blocks, annotate it explicitly with `@Blocking`. ### 2. Data Access (Hibernate ORM with Panache) - Use the **Panache Active Record pattern** extending `PanacheEntity`. Do NOT write custom repositories or explicit DAO layers unless complex business logic demands it. - **Transaction Management**: Annotate mutate operations with `@Transactional`. Never manage transactions manually. ```java // Correct Agent Output Example: @Entity public class Developer extends PanacheEntity { public String name; public String specialty; public static Uni<Developer> findByName(String name) { return find("name", name).firstResult(); } } ``` ## Scaffolding Lifecycle for New Microservices When scaffolding a new microservice (e.g., "Scaffold a new microservice for user billing"), the agent follows this deterministic lifecycle: ### 1. Reads the Command Layer - **Bypass manual configuration**: Do NOT generate raw `pom.xml` text by hand, which frequently leads to version mismatches or missing dependency management blocks. - **Use Quarkus tooling**: Rely on the official Quarkus Maven plugin command structure. ### 2. Executes the Tooling - **Command**: Run the explicit `mvn io.quarkus.platform:quarkus-maven-plugin:create` command directly inside your terminal workspace. - **Example**: ```bash mvn io.quarkus.platform:quarkus-maven-plugin:3.x.x:create \ -DprojectGroupId=com.example \ -DprojectArtifactId=billing-service \ -DclassName="com.example.billing.BillingResource" \ -Dpath="/billing" ``` ### 3. Applies Core Extensions - **Guarantee essential extensions** are baked in from the first second: - `hibernate-orm-panache` for data access - `quarkus-rest` for REST endpoints - **Add extensions during creation**: ```bash mvn io.quarkus.platform:quarkus-maven-plugin:create \ ... \ -Dextensions="hibernate-orm-panache,quarkus-rest,jdbc-postgresql" ``` - This prevents the agent from creating legacy or blocking code templates down the line. ### 4. Validates Context - **Transition to Testing**: Once scaffolded, immediately verify that the out-of-the-box generated test suite runs cleanly. - **Validation command**: `./mvnw test` - **Expected outcome**: All generated tests pass without modification, confirming the scaffold is valid and ready for development. ### Post-Scaffold Checklist - [ ] Project structure follows standard Maven layout (`src/main/java`, `src/test/java`) - [ ] `application.properties` contains Dev Services configuration (auto-configured for PostgreSQL) - [ ] At least one REST endpoint exists with a corresponding test - [ ] `./mvnw test` passes cleanly - [ ] `./mvnw quarkus:dev` starts without errors Testing and Local Infrastructure Never manually configure Testcontainers or hardcode local JDBC connections inside application.properties for local development.Rely 100% on Quarkus Dev Services. The PostgreSQL container is automatically spun up during ./mvnw quarkus:dev or @QuarkusTest. Verification Protocol Before declaring a task complete, you MUST: Run ./mvnw compile to ensure zero compilation or annotation processor failures.Run ./mvnw test and confirm all integration tests pass cleanly. Note: Find the solution repository: https://github.com/danieloh30/agents-md-for-java-quarkus.git Shell ### Sample Demo Walkthrough: Put it to the Test To see the power of this setup, let’s imagine a standard demo repository structured as follows: agents-md-for-java-quarkus/src/main/java/com/example/billing/ |____com | |____example | | |____billing | | | |____Invoice.java | | | |____BillingResource.java | | | |____InvoiceItem.java |____pom.xml |____README.md <-- For humans |____AGENTS.md <-- For the AI Agents The Experiment You open this repository inside an AI-native workspace and issue a vague, autonomous prompt: "Add a new REST endpoint to fetch a developer by their specialty, write a test for it, and verify that the app works." Without AGENTS.md The agent might look at pom.xml, realize it's a Java app, and write a legacy, blocking JAX-RS endpoint. It might attempt to spin up a Docker container inside the test via a manual DockerClient or throw an error because it doesn't know how to supply a PostgreSQL URL. With AGENTS.md Reads context: The agent parses AGENTS.md instantly. It recognizes that it must write a reactive Uni<Developer> endpoint using Panache’s Active Record pattern.Generates code: It appends a clean, reactive finder method directly onto the Developer entity.Executes environment: Instead of guessing how to launch your app, it executes ./mvnw quarkus:dev.Leverages dev services: It sees that Quarkus handles the database automatically. It writes a clean @QuarkusTest integration test, triggers the validation, checks the terminal logs, and corrects its own syntax if a compilation check fails. By defining the boundaries upfront, you prevent the agent from writing code that compiles but violates your team's architectural standards. Conclusion: Treat Context as Code Providing an AI agent with free rein over an enterprise Java codebase without boundaries is like letting a junior developer deploy to production on day one without code reviews. By adopting AGENTS.md alongside the rapid developer feedback loops built natively into Quarkus, you bridge the gap between human intent and machine execution. Spend 10 minutes writing an AGENTS.md file today, and unlock massive productivity gains for the agentic future of software development. Check out more from my series here.

By Daniel Oh

CORE

Seeding Postgres When Your Schema Has Foreign-Key Cycles

I have lost more afternoons than I would like to admit on this exact problem: a seed script that ran cleanly yesterday now crashes on its first INSERT, and the error message tells you something you already knew, namely that you have a chicken-and-egg dependency between two tables. SQL ERROR: insert or update on table "users" violates foreign key constraint "users_organization_id_fkey" DETAIL: Key (organization_id)=(1) is not present in table "organizations". The natural next move is to reorder the inserts, putting organizations first, except that organizations.owner_user_id is NOT NULL REFERENCES users(id), which means you cannot insert an organization without a user that does not exist yet. You are looking at a foreign-key cycle, and no ordering of plain INSERT statements can satisfy every NOT NULL REFERENCES at row-insertion time. The rest of this article walks through three working strategies for seeding a Postgres database that contains FK cycles, plus a decision table for picking the right one. Examples assume Postgres 18, which is the current stable as of mid-2026, but most of the reasoning ports cleanly to earlier versions and to other RDBMSes, with the caveats called out where they matter. Two Flavors of Foreign-Key Cycles The first thing worth understanding is that two distinct cycle shapes show up in real schemas, and the fix for each is slightly different. The friendly kind is the self-referential cycle, which is what hierarchical data tends to produce. The classic example is an employees table with a manager_id REFERENCES employees(id) column: the CEO row has a NULL manager, but every other row points at another row in the same table. Self-references are easy to seed because the root row's reference can almost always be left nullable, and once that's done you insert top-down. The harder kind is the multi-table cycle, where two or more tables point at each other through NOT NULL columns. The canonical case is bidirectional ownership between users and organizations, but real schemas often contain longer cycles that route through join tables, such as users → roles → permissions → resources → users. A four-hop cycle like that one will not yield to "just reorder the inserts" no matter how patient you are. If you want to see exactly which cycles your schema contains, the Postgres catalog will tell you. The following recursive CTE walks pg_constraint, collects every cyclic path, and canonicalizes each cycle to a single representative row so you do not get one duplicate per starting node: SQL WITH RECURSIVE fk_graph AS ( SELECT conrelid::regclass AS from_table, confrelid::regclass AS to_table FROM pg_constraint WHERE contype = 'f' ), walk AS ( SELECT from_table AS start_table, from_table, to_table, ARRAY[from_table, to_table] AS path FROM fk_graph UNION ALL SELECT w.start_table, g.from_table, g.to_table, w.path || g.to_table FROM walk w JOIN fk_graph g ON g.from_table = w.to_table WHERE g.to_table <> ALL(w.path[2:]) ) SELECT path FROM walk WHERE to_table = start_table AND start_table = (SELECT MIN(t) FROM unnest(path) AS t); The MIN predicate at the end picks the rotation that begins at the lexicographically smallest table, which is what produces one row per cycle rather than one row per node-on-cycle. Run this against any schema older than about a year, and there will almost certainly be a cycle you forgot you had introduced. I ran it last quarter against a 30-table schema I thought I knew well, and it returned six. Why Naive INSERT Order Fails The reason hand-written seed scripts feel solvable at first is that most schemas form a directed acyclic graph of foreign keys, and a DAG always admits a topological sort. Countries come before cities, cities before addresses, addresses before users, and as long as you insert in that order, everything resolves on the first pass. The moment a cycle exists, the FK graph stops being a DAG, and no per-table ordering of inserts can satisfy every NOT NULL REFERENCES at row-insertion time. At least one row on the cycle has to reference a row that does not exist yet. This isn't a bug in your seed script; it's a property of the graph, and chasing it with another permutation of the insert order will produce the same error from a different table. You have three working ways out. Strategy A: Two-Pass Insert With Nullable Columns The first strategy is to give up the NOT NULL constraint on at least one edge of the cycle, insert both sides with that edge left NULL, and then close the loop in a second statement. SQL CREATE TABLE users ( id bigserial PRIMARY KEY, email text NOT NULL UNIQUE, organization_id bigint -- nullable on purpose ); CREATE TABLE organizations ( id bigserial PRIMARY KEY, name text NOT NULL, owner_user_id bigint NOT NULL REFERENCES users(id) ); ALTER TABLE users ADD CONSTRAINT users_org_fk FOREIGN KEY (organization_id) REFERENCES organizations(id); With the nullable edge in place, the seed becomes a straightforward two-pass operation inside a transaction: SQL BEGIN; INSERT INTO users (email, organization_id) VALUES ('[email protected]', NULL) RETURNING id; -- -> 1 INSERT INTO organizations (name, owner_user_id) VALUES ('Acme', 1) RETURNING id; -- -> 1 UPDATE users SET organization_id = 1 WHERE id = 1; COMMIT; The technique works on every relational database you are likely to touch, which is its main appeal, but the cost is real and easy to underestimate: you have just modelled a NOT NULL business invariant as nullable at the schema level, and you now have to enforce it somewhere else, whether in application code, in a CHECK constraint flipped on after the seeding is done, or in a deferred trigger that watches commits. Production schemas rarely tolerate that compromise, which is why Strategy A tends to live in ad-hoc dev databases where the cycle is incidental rather than load-bearing. One Postgres 17 quality-of-life improvement worth knowing about is that MERGE ... RETURNING combined with the new merge_action() function makes the two-pass shape less verbose for bulk imports, because you can now route inserts and updates through a single MERGE and capture which path each row took. The underlying two-pass logic is unchanged, but for many real workloads the line count comes down by roughly half. Strategy B: Deferred Constraints Inside a Transaction Postgres offers a more elegant escape: you can declare a foreign key as deferrable, which postpones the constraint check from row-insertion time to transaction-commit time. SQL CREATE TABLE users ( id bigserial PRIMARY KEY, email text NOT NULL UNIQUE, organization_id bigint NOT NULL ); CREATE TABLE organizations ( id bigserial PRIMARY KEY, name text NOT NULL, owner_user_id bigint NOT NULL REFERENCES users(id) DEFERRABLE INITIALLY IMMEDIATE ); ALTER TABLE users ADD CONSTRAINT users_org_fk FOREIGN KEY (organization_id) REFERENCES organizations(id) DEFERRABLE INITIALLY IMMEDIATE; The seed then collapses to a single pass inside a transaction that opts into deferred checking: SQL BEGIN; SET CONSTRAINTS ALL DEFERRED; INSERT INTO organizations (id, name, owner_user_id) VALUES (1, 'Acme', 1); INSERT INTO users (id, email, organization_id) VALUES (1, '[email protected]', 1); COMMIT; -- both rows exist by now, both FK checks pass For day-to-day work, DEFERRABLE INITIALLY IMMEDIATE is the variant you want, because it leaves the constraint behaving exactly like a normal one in every transaction except those that explicitly call SET CONSTRAINTS ALL DEFERRED. The more aggressive INITIALLY DEFERRED defers every transaction by default, which sounds harmless until you realize that errors then surface at commit instead of at the offending statement, making real bugs much harder to chase down. There is a piece of folklore in this area worth dismantling before it bites you: a foreign key declared DEFERRABLE INITIALLY IMMEDIATE does not pay a measurable runtime cost compared to a non-deferrable one. For FK constraints specifically, the check happens at end-of-statement or end-of-transaction in both modes, so there is no fast path you are giving up by marking it deferrable. The reason this confuses people is that UNIQUE and PRIMARY KEY constraints do behave differently when made deferrable, because their underlying index can no longer enforce uniqueness eagerly, so the perf-myth that applies legitimately to UNIQUE indexes gets generalized, incorrectly, to foreign keys. If your team is resisting DEFERRABLE on production tables because of a vague performance worry, the cure is a five-minute benchmark. Two real caveats apply. The smaller one is that only declared-deferrable constraints can be deferred, so if you forgot to write DEFERRABLE in the original migration, you have to alter the constraint after the fact, and that alteration requires SHARE ROW EXCLUSIVE on both tables. It isn't the end of the world for a planned maintenance window, but it isn't a no-op either. The larger and more current caveat is a 2026 footgun that bit teams who adopted the new NOT ENFORCED toggle introduced in Postgres 18. In releases shipped before May 14, 2026, a foreign key declared DEFERRABLE INITIALLY DEFERRED would quietly start behaving as NOT DEFERRABLE after being toggled NOT ENFORCED and then back to ENFORCED. There was no error, no warning, no log line, just a constraint that no longer deferred when you expected it to, which made for some entertaining debugging sessions. The fix shipped in 18.4, 17.10, 16.14, 15.18, and 14.23, and the remediation after upgrading is to toggle any affected constraint NOT ENFORCED and back to ENFORCED one more time. Constraints declared INITIALLY IMMEDIATE, and constraints that have never been routed through NOT ENFORCED, were not affected, which means most production seeds escaped this entirely. Anyone who used NOT ENFORCED for bulk loads in late 2025 or early 2026 should re-verify, however, because the silent nature of the regression makes "we never noticed" the most likely failure mode. The MySQL side of the story is shorter: InnoDB does not support deferred foreign keys at all. Your options there are Strategy A or a scoped SET FOREIGN_KEY_CHECKS = 0 inside a transaction, which is a heavier hammer than most teams are happy to swing in CI. Strategy C: Generator-Driven Cycle Resolution Strategies A and B both work, but they oblige you to hand-write the insert plan for every cycle, and that burden compounds as the schema grows. A team with half a dozen cycles, or with a cycle topology that shifts every few migrations, will eventually find that its seed script has become its own piece of brittle infrastructure that breaks in ways indistinguishable from the original problem, just in different places. Three rough tiers of tooling cover this space, and it is worth knowing where each one sits before reaching for any of them. At the bottom of the budget curve sit column-level generators such as Faker, Mockaroo, and generatedata.com, which produce realistic per-column values like names, emails, and ZIP codes but do not model the foreign-key graph at all. They hand you a CSV or a stream of INSERT statements and leave the cycle resolution to you, which is appropriate for what they do but also means that a Faker-driven pipeline still needs Strategy A or Strategy B underneath it as soon as cycles enter the picture. A newer middle tier of schema-aware data generators treats the FK graph as a first-class input rather than asking you to script around it. Tools in this category read the schema directly from the database, work out a valid load order on their own, and emit a plan that handles the cycle for you, which means you do not have to write Strategy A or B by hand for every cycle in your schema. The category is still small and emerging — Neosync and Seedfast are two examples that take this approach in slightly different ways — and most of these tools originated in regulated-industry workflows where teams could not legally copy production data into development environments and hand-written seeds were not scaling across migrations, which is why they tend to assume that audience by default. The mechanical part of the problem — figuring out a valid load order — turns out to be the easy half. The harder and more consequential half is generating realistic values while keeping every other schema constraint satisfied at the same time, because a type-valid email such as [email protected] passes the column definition but renders a B2B SaaS test dataset useless even when the FK relationships are all intact, and the same trap applies to unique indexes, partial indexes, check constraints, generated columns, RLS policies, and any business invariants encoded in triggers. One tier further up the budget curve, the enterprise test-data-management platforms that show up most often in procurement decks — Tonic.ai, Synthesized, and Delphix — take the inverted approach: they sit in front of a production database, anonymize or otherwise transform real data on the way out, and ship the result into lower environments through pipelines that are typically bought, audited, and operated by a dedicated platform team. The premise is the opposite of the schema-aware generators above, which assume you specifically do not want production data anywhere near dev or staging. Both approaches exist because both audiences exist, and choosing between them is usually a function of your compliance posture, your appetite for managing a TDM platform, and whether your annual tooling budget reaches six figures. Adopting a schema-aware generator is a trade-off, not a default. It pays off when you have a non-trivial number of cycles, when CI rebuilds happen often enough that the seed plan needs to survive without supervision, and when several environments — local, CI, staging, demo — all need data with similar shape. It is overkill when you have a single cycle, a stable schema, and a small team, because in that situation Strategy B in fifteen lines of SQL will still be there working in two years. The enterprise tier becomes the right answer when the compliance question is settled in the opposite direction, namely that production data is the ground truth your downstream environments must look like, anonymization is the only legally defensible path to lower environments, and your organization is willing to absorb the platform-engineering overhead that comes with it. A 2026 Alternative Worth Considering: Clone, Do Not Regenerate Postgres 18 made an older pattern viable in ways it had not been before, and it is worth mentioning here because for CI workloads it may obviate the entire choice between A, B, and C. The relevant pieces are a new server GUC, file_copy_method, and the existing FILE_COPY strategy on CREATE DATABASE. The GUC defaults to COPY, which performs a traditional byte-by-byte copy; setting it to CLONE tells Postgres to use copy_file_range() on Linux and FreeBSD or copyfile on macOS, which lets the kernel share blocks at the filesystem layer instead of physically duplicating them. On a copy-on-write filesystem such as XFS-with-reflinks, ZFS, APFS, or btrfs, the result is a 6 GB template cloned in roughly 212 milliseconds, against about 67 seconds for the default COPY method. On ext4 or any non-CoW filesystem, the kernel cannot honor the clone request, and you fall back to the slow byte-copy regardless, which makes the filesystem choice the load-bearing decision rather than the SQL. If your bottleneck is "how do I get a populated, schema-correct database in front of every test run as fast as possible," then cloning a template you seeded once is often a better answer than regenerating from scratch, regardless of which of the three strategies above you would have used to populate the template. SQL SET file_copy_method = 'clone'; -- session-level; for CI, set in postgresql.conf CREATE DATABASE test_run_42 TEMPLATE seedfast_template STRATEGY = FILE_COPY; The catch is that the template must be idle at clone time, with no other connections, which is awkward inside the same Postgres instance that is running everything else. Most teams who lean on this pattern run a dedicated CI Postgres instance whose only job is hosting the templates, which is more infrastructure but a one-time setup. When to Pick Which SituationStrategyAd-hoc dev DB, one or two cycles you know aboutA: nullable column, two-passPostgres, production-shaped schema with real NOT NULL FKs, frequent rebuildsB: DEFERRABLE + SET CONSTRAINTS ALL DEFERREDMySQL or mixed-RDBMS, no leeway to change the schemaA, or scoped SET FOREIGN_KEY_CHECKS = 0 if you trust the sourceMany cycles, frequent migrations, multiple environments to seedC: schema-aware generatorCI throughput is the bottleneck, schema is stable, CoW filesystem availableTemplate DB + Postgres 18 FILE_COPY clone, on top of any of A/B/COne cycle, stable schema, small teamB if you are on Postgres, A otherwise; resist adding a tool A Note on Cycle Hygiene A surprising number of "cycles" in production schemas turn out, on inspection, to be accidents that crept in across a few migrations rather than load-bearing design choices. Someone added created_by_user_id to an audit table that the users table already referenced, and nobody noticed the loop until a fresh seed run failed two sprints later. If the cycle in question is not actually load-bearing in your business logic, breaking it at the schema level by making one of the FK columns nullable in production is almost always a better long-term move than carrying any of the workarounds above. A seed script that does not have to resolve cycles is faster, simpler, and harder to break than any of the three strategies, and the documentation cost of explaining why the column is nullable is much smaller than the cost of explaining the seeding workaround to every new engineer who joins the team. For the cycles that really are intentional, such as the ownership patterns, hierarchical references, audit chains, and any other places where both sides of the relationship genuinely cannot exist without each other, the right move is to pick the strategy that matches your stack and your appetite for ongoing maintenance, and then to write that choice down somewhere your future colleagues will find it. The undocumented version of "we use deferred constraints because Strategy A broke our integration tests last year" is exactly the kind of folklore that gets reinvented from scratch every eighteen months when the engineer who knew it leaves.

By Mikhail Shytsko

Going Stateless: Scaling MCP Servers to Cloud-Native Java and HTTP

The Model Context Protocol (MCP) completely changed how we connect large language models to real-world data and tools. However, early versions of the protocol had a massive bottleneck for enterprise developers: they relied heavily on stateful, long-lived sessions. If you wanted to scale out your AI tools to handle thousands of concurrent agent workflows, you had to deal with sticky sessions, complex load balancing, and heavy memory overhead. The newest updates to the MCP specification solve this problem by introducing a completely stateless HTTP foundation. By removing the traditional initialization handshake and session IDs, MCP servers can now function as lightweight, independent microservices. When you combine this stateless evolution with cloud-native Java, you get the ultimate stack for cloud-native AI infrastructures. Why Stateless MCP Matters for Your Cloud Architecture In older stateful setups, an LLM host maintained an open connection to your server. If that specific server instance crashed or scaled down, the entire context of the conversation loop was lost. The latest specification shifts the paradigm. Every request sent from an AI agent or LLM host to an MCP server is now fully self-contained. The routing relies on two standard HTTP headers: Mcp-Method: Specifies the action (such as executing a tool or fetching a resource)Mcp-Name: Directs the request to the specific tool definition. Because the server no longer needs to remember who is calling it, you can place a standard load balancer in front of a cluster of MCP servers, distribute incoming requests evenly, and scale down to zero when traffic stops. The Cloud-Native Java Advantage: High-Density AI Tools While languages like Python and Node.js are popular in the AI space, they often struggle with heavy production workloads, multi-threading, and deep enterprise integration. Traditional Java solves these enterprise issues but comes with a high memory footprint and slower startup times—making it expensive to run as serverless microservices. This is exactly where cloud-native Java (e.g., Quarkus) shines. By utilizing ahead-of-time (AOT) compilation and GraalVM native images, Quarkus strips away the boilerplate runtime overhead. Plain Text ┌─────────────────────────────────────────────────────┐ │ Traditional Java MCP: ~150MB Ram | 2.5s Startup │ └─────────────────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────┐ │ Cloud-Native Java MCP: ~18MB Ram | 0.015s Startup │ └─────────────────────────────────────────────────────┘ Instead of a single heavy backend trying to host dozens of different LLM tools, you can break your tools into highly specialized microservices. You can deploy a database-lookup tool, an internal API proxy, and a document parser as completely separate cloud-native Java applications. They will start instantly, use less than 20MB of RAM each, and scale up instantly when an AI agent triggers them. Building a Stateless MCP Resource With Cloud-Native Java Implementing a stateless tool in cloud-native Java with Quarkus is remarkably clean. By leveraging the reactive routing capabilities of Quarkus and standard Java objects, you can map the incoming JSON-RPC payloads directly to your business logic. Here is a conceptual example of how a stateless MCP tool controller looks in Quarkus using standard REST annotations: Java package com.example.mcp; import jakarta.ws.rs.POST; import jakarta.ws.rs.Path; import jakarta.ws.rs.HeaderParam; import jakarta.ws.rs.Produces; import jakarta.ws.rs.core.MediaType; import io.smallrye.mutiny.Uni; @Path("/mcp/v1") public class StatelessMcpResource { @POST @Path("/tools") @Produces(MediaType.APPLICATION_JSON) public Uni<McpResponse> handleToolExecution( @HeaderParam("Mcp-Method") String method, @HeaderParam("Mcp-Name") String toolName, McpRequestPayload payload) { // The request is entirely self-contained; no session lookup required. if ("tools/call".equals(method) && "fetch_customer_data".equals(toolName)) { return executeCustomerLookup(payload.getArguments()); } return Uni.createFrom().item(McpResponse.error("Tool or method not found")); } private Uni<McpResponse> executeCustomerLookup(JsonElement arguments) { // Business logic interacting with reactive databases or internal services return Uni.createFrom().item(new McpResponse("Customer data retrieved successfully.")); } } Summary The combination of a stateless protocol and a cloud-native Java framework removes the operational friction in building enterprise AI features. By deploying stateless MCP servers on cloud native Java - Quarkus, you gain the type of predictable scaling, rapid response times, and bulletproof reliability that modern production environments demand. Check out more from my series here.

By Daniel Oh

CORE

Python in 2026: uv vs Poetry vs pip: The Definitive Comparison

Being a Python developer, I have lived through the chaos: setup.py, requirements.txt, virtualenv, pipenv, conda, flit, hatch, poetry — each has promised to fix what came before. In 2026, the dust has settled around these three contenders: pip – The original one, which is shipped with Python and is omnipresentpoetry – The developer-experience darling of the 2020suv – The Rust-powered newcomer from Astral that is rewriting the rules We will go over the three in this post and benchmark them, compare real workflows, and provide clear recommendations. pip + venv: The Baseline pip is not going anywhere; it ships with Python, works everywhere, and the entire ecosystem is built around it. Python python -m venv .venv source .venv/bin/activate pip install requests fastapi unicorn pip freeze > requirements.txt pip install -r requirements.txt The catch pip freeze is not really a lockfile; it is a snapshot of your machine's installed packages, including transitive dependencies nobody explicitly requested. On a different machine or a month later, resolution may differ silently. pip-tools patches this with a two-file workflow(requirements.in ->requirements.txt), but you are now managing two tools with no build system or publish workflow included. Use pip when: writing quick scripts, working in constrained environments, or maintaining legacy projects with existing requirements.txt files. Poetry: The Developer Experience Standard Poetry arrived in 2018 and gave Python developers what JavaScript developers had enjoyed for years: one tool for dependencies, virtual environments, building, and publishing. Setup and Workflow Shell curl -SSL https://install.python-poetry.org | python3 - poetry new test-project && cd test-project TOML #pyproject.toml [tool.poetry.dependencies] python = "^3.11" requests = "^2.32" fastapi = "^0.115" [tool.poetry.group.dev.dependencies] pytest = "^8.3" ruff = "^0.7" Shell poetry add httpx poetry add --group dev pytest-cov poetry install poetry run pytest poetry build && poetry publish poetry.lock pins every package by version and hash, and has genuine reproducibility, as it is one of the strongest features of Poetry. Shortcomings Speed Poetry's dependency resolver and installer are written in Python, which means they inherit Python's performance ceiling. On a large project such as a Django app with 80 transitive dependencies, or anything touching the ML stack, the install can stretch to several minutes. The pain compounds in CI/CD. Every pull request triggers a fresh install. At 2+ minutes, running across dozens of PRs, that's the real developer time lost and real compute dollars spent. Poetry does support caching the virtual environment between runs, which helps for warm installs, but cold installs remain noticeably slow compared to modern alternatives. Python Version Management Poetry handles virtual environments but not Python itself; you will still need pyenv and asdf alongside it. It manages the virtual environment beautifully, but stops short of managing Python. If your project needs Python 3.11 but your machine has 3.13, Poetry won't help you get there, as you need a separate tool. The two-tool requirement (Poetry+pyenv or asdf) is a consistent friction point for onboarding new developers. It is not a deal breaker but a gap uv closes entirely. Monorepo Support Poetry's workspace support exists but is minimal. It does not have a first-class workspace concept equivalent to npm/yarn workspaces or cargo workspaces. Teams managing multiple related packages in a single repo typically resort to a workaround. Use Poetry when: publishing packages to PyPI, your team is already on it, or you need a mature dependency group. Shell #Common (painful) monorepo pattern with poetry monorepo/ |-services/ | |-api/ | | |-pyproject.toml #its own poetry environment | | |-poetry.lock #its own lockfile | |-worker/ | |-pyproject.toml #another poetry environment | |-poetry.lock #another lockfile-can drift |-packages/ -shared-lib/ |-pyroject.toml |-poetry.lock uv: The Rust-Powered Challenger Released by Astral in early 2024, uv has had one of the fastest adoption curves in Python tooling history. The pitch: everything pip, pip-tools, virtualenv, and pyenv do — in a single Rust binary, 10–100× faster. Installation Shell curl LsSf https://astral.sh/uv/install.sh| sh powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 |iex" Python version management is built in in uv. Shell uv python install 3.12 3.11 uv python pin 3.12 #Pin project to specific version No more pyenv, one less tool to install. Daily Workflow Shell uv init test-project && cd test-project uv add requests fastapi pydantic #Add dependencies uv add --dev pytest ruff mypy #Add dev dependencies uv sync #Install everything, create .venv uv run pytest #Run inside venv uvx ruff check #Run a took without installing uv build && uv publish #Build and Publish Direct pip Replacement For existing projects, uv requires zero migration: Shell uv pip instal requests uv pip -r requirements.txt uv pip compile requirements.in -o requirements.txt Monorepo/Workspace Support TOML [tool.uv.workspace] members = ["packages/*","services/*"] All workspace members share a single uv.lock consistent versions across every service with no extra tooling. The Benchmark: Speed Timing on mid size fast api project (42 dependencies) on a GitHub Actions runner. scenariopippoetryuvCold install (no cache)68.4 sec52.1 sec4.2 secWarm install (with cache)18.2 sec14.7 sec0.4 secDependency resolution41.3 sec38.9 sec1.1 sec Why uv Is So Fast Written in Rust: No Python interpreter overheadParallel downloads: Fetches packages concurrentlyGlobal cache: Hard links package into venvs instead of copyingPubGrub resolver: Faster than SAT solvers for the common case No subprocess calls: Everything runs in-process Dependency resolution: Which one gets it right? Shell #pip -> installs httpx 0.25.2 may silently break a package-a #poetry -> fails loudly: "No solutions found.." #uv-> fails with precise diagnostics explaining exactly why uv's PubGrub resolver produces the clearest conflict explanations in the Python ecosystem — a real time saver when debugging large dependency graphs. Real World Scenarios ML/Data Science: uv wins. Speed matters when installing PyTorch and similar with size 1-2 Gb. uv python 3.11 also handles a common case where ML libraries lag on newer python version. Publishing to PyPI: Tie. Poetry's publish workflow is slightly more mature; uv build && uv publish works well too. CI/CD pipelines: uv wins. 4-second installs vs. 60-second install across every PR run compounds quickly. Legacy projects: pip or uv as drop-in. Stay with the actual requirements until you are ready to fully migrate uv pip sync works without any changes. Migration Guide pip->uv(zero changes needed): Shell uv pip install requests uv pip install -r requirements.txt pip->uv(full project migration) Shell uv init --name test-project cat requirements.txt | xargs uv add rm requirements.txt git add pyproject.toml uv.lock && git commit -m "Migrate to uv" Poetry-> uv Shell uvx poetry-to-uv #converts project to toml in place uv sync && uv run pytest The Verdict Use pip for quick projects or constrained environment and legacy projects. Use Poetry if you are already on it and need mature PyPI publish workflow. Use uv — everything else in 2026 should be using uv. The python packaging story has gone from complicated to quite sophisticated in a short window.uv provides zero compromise tooling python developers have deserved for a long time - fast, standard-compliant, batteries included, and backward compatible. Quick Cheat Sheet Shell #------uv------------------------------------------------- uv init test-project #New-Project uv install python 3.12 #Install python version uv add requests #Add dependency uv add --dev pytest #Add dev dependency uv sync #Install all deps uv sync --frozen # CI mode( no lockfile changes) uv run pytest #Run in venv uvx ruff check uv build && uv publish #Build and publish #-----poetry----------------------------------------------- poetry new test-project poetry add requests poetry add --group devpytest poetry install poetry run pytest poetry build && poetry publish #----pip--------------------------------------------------- python -m venv .venv && source .venv/bin/activate pip install requests pip freeze > requirements.txt pip install -r requirements.txt

By Varun Joshi

Your AI Agent Trusts Every Tool It's Ever Been Introduced To; That's the Whole Problem

Why the MCP security crisis of 2026 isn't a patching problem — and the provenance-tracking architecture I built to actually close the gap. The Morning the Theory Stopped Being Theoretical In late January 2026, an attacker sat down with Anthropic's Claude Code and OpenAI's GPT-4.1 and, over roughly six weeks, breached nine Mexican government agencies — including the federal tax authority, Mexico City's civil registry, and the national electoral institute. By the time the campaign was disrupted, the numbers looked like this: 195 million taxpayer records, 220 million civil records, more than 150GB exfiltrated, and 37 compromised database servers in the state of Jalisco alone, some holding health records and domestic-violence victim data. The attacker told the model he was running an authorized bug bounty. He fed it a 1,084-line manual and a custom exfiltration tool. Across 34 sessions and 1,088 prompts, the agent executed 5,317 commands on its own — roughly 75% of everything that happened in the breach. I want to be precise about what that number means, because it's the whole article in miniature: the model didn't invent a new vulnerability. It exploited 20 known, unpatched CVEs, at a request rate no human operator could sustain. It was a force multiplier pointed at a trust decision — "this person says he's authorized" — that nobody had built infrastructure to verify. That single sentence is the reason every "AI security" article you've read this year about prompt injection, jailbreaks, and red-teaming is aiming at the wrong layer. The vulnerability isn't in what the model says. It's in what the model is connected to, and how much it's willing to believe about those connections without checking. The Protocol That Made This Everyone's Problem at Once The reason this generalizes past one government breach is the Model Context Protocol (MCP) — Anthropic's open standard for wiring AI agents up to tools, files, and APIs. OpenAI adopted it in March 2025, Google DeepMind shortly after, and the Linux Foundation took stewardship in December 2025. Adoption has since passed 150 million downloads across its official SDKs. Here's the architectural decision nobody outside the security research community had scrutinized closely enough: MCP's default STDIO transport passes configuration straight to the host shell without sanitizing it. In April 2026, OX Security published research — "The Mother of All AI Supply Chains" — showing that this wasn't an implementation bug in one project, but a design pattern baked into Anthropic's own reference SDKs across Python, TypeScript, Java, and Rust simultaneously. Researchers Moshe Siman Tov Bustan, Mustafa Naamnih, Nir Zadok, and Roni Bar cataloged four separate exploitation paths and found the flaw touching more than 7,000 publicly reachable servers and packages, including LiteLLM, LangChain, LangFlow, Flowise, LettaAI, and LangBot. Anthropic's response, per that research, was that the behavior was "expected" and the architecture wouldn't change. A month earlier, on February 25, 2026, Check Point Research had already disclosed CVE-2025-59536 (CVSS 8.7) in Claude Code itself: a malicious .claude/settings.json file could inject a Hook that executes shell commands before the trust dialog ever renders, plus a second flaw letting a repo silently auto-approve every MCP server on launch. Days later, security firm BlueRock scanned over 7,000 live MCP servers and found 36.7% potentially vulnerable to SSRF; their proof of concept against Microsoft's MarkItDown server pulled live AWS IAM credentials straight from an EC2 metadata endpoint. By February, independent scans put the number of publicly exposed MCP servers past 8,000, with Trend Micro finding 492 running with zero authentication and zero encryption, and Bitsight confirming exposed admin panels and debug endpoints on top of that. Then there's OpenClaw. Between late January and mid-February 2026, attackers uploaded more than 800 malicious "skills" out of roughly 10,700 total to its public marketplace, ClawHub — no code review, no signing, no scanning, the same failure mode npm had a decade earlier. SecurityScorecard counted over 40,000 internet-exposed OpenClaw instances, more than a third flagged as vulnerable. None of these are the same CVE. That's the point I want you to sit with. Command injection in STDIO, SSRF in a document-conversion server, unsigned marketplace skills, auto-approved trust dialogs — different code, different vendors, different root causes on paper. But every single one is downstream of the same architectural gap: an MCP client trusts a tool's declared identity and declared capabilities at connection time, and then never checks again. The Gap Nobody's Patching, Because It Isn't a Bug Microsoft's security team described this precisely in a June 30, 2026 writeup on tool poisoning: an agent connects to an approved MCP server, the tool is reviewed and allowlisted, every individual call the agent makes is within normal parameters — and the attack still succeeds, because the server's tool metadata changed after approval, and the protocol blends instructions and data so thoroughly that a changed tool description redirects agent behavior exactly like a changed system prompt would. No alert fires. Nothing looks wrong from inside any single request. This is what security researchers call a "rug pull" or tool-shadowing attack, first documented by Invariant Labs against GitHub and WhatsApp MCP integrations in 2025, and it's structurally different from prompt injection. Prompt injection attacks the conversation. Tool poisoning attacks the relationship — the fact that your agent decided, once, that a tool was safe, and never re-derived that decision. Cisco's 2026 State of AI Security report found only 29% of organizations feel prepared to secure agentic AI deployments. I don't think that's a training gap. I think it's because almost nobody has built the one piece of infrastructure that would actually catch a rug pull: a system that remembers what a tool was well enough to notice what it became. So I built one. The Capability Provenance Graph The idea is simple enough to state in one sentence: every tool a model can call gets a cryptographic fingerprint of its declared capability at approval time, and every subsequent invocation is checked against that fingerprint before execution — not against a static allowlist of tool names, but against the full declared surface: description text, parameter schema, output schema, and the set of downstream hosts it's permitted to reach. A tool doesn't get trusted once. It gets re-verified every time, cheaply, against its own history. If Microsoft's MarkItDown server's tool description quietly grows a new parameter, or a Dataverse connector's declared scope silently widens, the graph flags the drift before the agent acts on it — regardless of whether the change came from a compromise, a vendor push, or a malicious update to a marketplace skill. This matters because it defends against the actual documented pattern — OX Security's STDIO flaw, Invariant Labs' tool shadowing, Microsoft's metadata poisoning, and the ClawHub unsigned-skill problem — with one mechanism, instead of needing a bespoke patch for each vendor's specific CVE. Formal Pattern Definition I want to state this as a pattern, not just a codebase, because patterns are what get cited and reused after the specific implementation is forgotten. Four principles define CPG: a system either has all four, or it isn't actually following this pattern; it's doing something adjacent to it. 1. Capability, not identity, is the unit of trust. MCP (and most tool-use frameworks) trust a server or a tool name. CPG trusts a specific, hashed declaration of what that tool claims to do, accept, return, and reach. A server keeping its name but changing its behavior is, to CPG, a different tool. 2. Trust is re-derived, never cached indefinitely. Approval is not a permanent grant. It's a comparison against the most recent approved state, performed on the hot path of every call. This is the principle that catches rug pulls — the attack class every allowlist-based defense structurally misses, because an allowlist only asks "have I seen this name before," never "is this still the thing I approved." 3. Drift is a first-class signal, not an error to swallow. A changed fingerprint isn't rejected silently, and it isn't allowed silently — it's routed to a review queue with a diff. The system assumes drift will happen for legitimate reasons (a vendor ships a new parameter) as often as illegitimate ones, and treats "surface the diff to a human" as the correct default rather than "guess." 4. Blast radius is bounded independently of stated intent. No control in this pattern asks whether a request is "legitimate." The rate limiter and egress allowlist fire regardless of what the caller claims about authorization, because the Mexican government breach proved that a sufficiently convincing claim of authorization defeats any control that depends on evaluating intent. Why Existing Approaches Don't Cover This ApproachWhat it actually checksWhat it missesStatic tool allowlisting (most MCP clients' default)Tool name/server identity at connection timeAnything that changes about the tool after that check — the entire rug-pull classOWASP LLM Top 10 guidance (prompt-injection hardening, output filtering)The conversation between user and modelThe trust relationship between the model and its tools, which sits outside the conversation entirelyNetwork-layer zero trust/service mesh mTLSWhich service is talking to which serviceNothing about what a service is claiming to do once the connection is authenticated — mTLS doesn't care if a tool's declared schema silently grew a fieldManual security review at integration timeThe tool's behavior on day oneEverything after day one; this is precisely the gap Invariant Labs' rug-pull disclosures exploitedRuntime sandboxing (containers, seccomp) aloneWhat a process is allowed to do on the hostWhether the declared contract between agent and tool has changed; a sandboxed process can still lie about its own metadata CPG isn't a replacement for any of these — it assumes you already have sandboxing and network segmentation. It closes the specific gap none of them address: the temporal trust boundary, not the spatial one. Threat Matrix ThreatReal-world instanceRelated techniqueCPG mitigationCommand injection via STDIO configCVE-2025-59536; OX Security's four exploitation familiesOWASP LLM Top 10 — LLM01 (indirect)Sandboxed executor with argv allowlisting; STDIO commands never reach a shellTool metadata poisoning/rug pullMicrosoft's Copilot Studio case study; Invariant Labs GitHub/WhatsApp disclosuresOWASP Agentic Top 10 — ASI02 (Tool Misuse)Hash-diffed capability fingerprint on every connectionCross-server tool shadowingInvariant Labs "toxic flow" disclosureOWASP Agentic Top 10 — ASI04 (Agentic Supply Chain)Provenance graph tracks tool lineage via name+description similarity, not tool name aloneUnsigned marketplace skillsClawHub, 800+ malicious skills among ~10,700Supply-chain compromise (comparable to unsigned npm packages)Fingerprint pinned at install; any post-install mutation blocks execution pending reviewSSRF via internal metadata endpointsBlueRock/MarkItDown AWS credential theftOWASP API Top 10 — SSRFEgress allowlist enforced per-tool, not per-host globallyOver-privileged agent given false authorization claimsMexican government breach — social engineering of the agentSocial engineering of an autonomous system, not a humanCommand-rate and blast-radius circuit breaker, independent of stated intentSession hijacking/replay across MCP transportsFlagged as a gap class in NSA/CSA's May 2026 MCP security design guidanceSession integrity failureFingerprint check is bound to session_id; replayed calls against a closed session are rejected at the gateway, not the tool Architecture Plain Text flowchart TD A[Agent / LLM Orchestrator] -->|tool call request| B[CPG Gateway] B --> C{Fingerprint Match?} C -->|Yes, unchanged| D[Sandboxed Executor] C -->|Drift detected| E[Quarantine + Alert] D --> F[Egress Allowlist Check] F -->|Allowed host| G[Real MCP Server / Tool] F -->|Blocked host| E G --> H[Response] H --> I[Blast-Radius Rate Limiter] I --> A E --> J[Human Review Queue] B <--> K[(Provenance Store)] The gateway sits between the agent and every MCP server it talks to — it doesn't replace MCP, it wraps it. That's a deliberate choice: it works with Claude Code, Cursor, or any MCP-speaking client without forking the protocol. Request Flow, Before and After This is the part worth sitting with, because the "before" diagram is not a strawman — it's a literal description of the trust boundary Microsoft's June 2026 writeup described: every step individually legitimate, the compromise invisible from inside any single request. Before CPG — the trust boundary that tool-poisoning attacks exploit: Plain Text sequenceDiagram participant Agent participant MCPServer as MCP Server (approved at t0) Agent->>MCPServer: connect, fetch tool list MCPServer-->>Agent: tool descriptions (reviewed once) Note over MCPServer: t1: vendor push or compromise<br/>silently changes tool description Agent->>MCPServer: invoke tool (trusts stale description) MCPServer-->>Agent: executes new, undisclosed behavior Note over Agent: No alert fires.<br/>Every individual call looked normal. After CPG — drift is caught before execution, not after: Plain Text sequenceDiagram participant Agent participant Gateway as CPG Gateway participant MCPServer as MCP Server participant Review as Human Review Queue Agent->>Gateway: connect, fetch tool list Gateway->>MCPServer: fetch tool descriptions MCPServer-->>Gateway: tool descriptions Gateway->>Gateway: hash + store fingerprint (t0) Gateway-->>Agent: approved tool list Note over MCPServer: t1: description silently changes Agent->>Gateway: invoke tool Gateway->>MCPServer: fetch current tool description MCPServer-->>Gateway: changed description Gateway->>Gateway: fingerprint mismatch vs t0 Gateway--xAgent: 409 quarantined, execution blocked Gateway->>Review: diff (t0 fingerprint vs t1 fingerprint) Review-->>Gateway: human approves or rejects new version The difference isn't "more logging." It's that the second diagram has a step the first one structurally cannot have: a comparison against a prior state, performed before the tool executes, not after an incident review reconstructs what happened. 1. The Fingerprint — Capability Hashing Python # cpg/fingerprint.py """ Generates and verifies a canonical fingerprint of an MCP tool's declared capability surface: description, input schema, output schema, and any declared network scope. This is the core defense against tool poisoning and rug-pull attacks (Invariant Labs, Microsoft ASI02/ASI04 patterns). """ import hashlib import json from dataclasses import dataclass, field from typing import Any @dataclass(frozen=True) class ToolCapability: tool_id: str server_id: str description: str input_schema: dict output_schema: dict declared_hosts: tuple # egress scope this tool is allowed to reach def canonical_bytes(self) -> bytes: # Sort keys recursively so semantically identical schemas hash # identically regardless of field ordering from the wire. payload = { "tool_id": self.tool_id, "server_id": self.server_id, "description": self.description.strip(), "input_schema": _canonicalize(self.input_schema), "output_schema": _canonicalize(self.output_schema), "declared_hosts": sorted(self.declared_hosts), } return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode() def fingerprint(self) -> str: return hashlib.sha256(self.canonical_bytes()).hexdigest() def _canonicalize(obj: Any) -> Any: if isinstance(obj, dict): return {k: _canonicalize(v) for k, v in sorted(obj.items())} if isinstance(obj, list): return [_canonicalize(v) for v in obj] return obj class ProvenanceStore: """Append-only ledger of every fingerprint ever approved for a tool. Backed by any KV store; shown here in-memory for clarity.""" def __init__(self): self._ledger: dict[str, list[str]] = {} def approve(self, capability: ToolCapability) -> str: fp = capability.fingerprint() key = f"{capability.server_id}:{capability.tool_id}" self._ledger.setdefault(key, []) if fp not in self._ledger[key]: self._ledger[key].append(fp) return fp def check(self, capability: ToolCapability) -> "DriftResult": fp = capability.fingerprint() key = f"{capability.server_id}:{capability.tool_id}" history = self._ledger.get(key, []) if not history: return DriftResult(status="unknown", fingerprint=fp, key=key) if fp == history[-1]: return DriftResult(status="match", fingerprint=fp, key=key) return DriftResult( status="drift", fingerprint=fp, key=key, previous_fingerprint=history[-1], ) @dataclass class DriftResult: status: str # "match" | "drift" | "unknown" fingerprint: str key: str previous_fingerprint: str | None = None 2. The gateway — request interception and quarantine 2. The Gateway — Request Interception and Quarantine Python # cpg/gateway.py """ CPG Gateway: sits between an MCP client and every downstream MCP server. Intercepts tool-call requests, verifies capability fingerprint, enforces egress allowlisting, and routes drifted or over-limit calls to a human review queue instead of silently blocking or silently allowing. """ import time from dataclasses import dataclass from cpg.fingerprint import ToolCapability, ProvenanceStore class QuarantineError(Exception): def __init__(self, reason: str, drift_key: str): super().__init__(reason) self.reason = reason self.drift_key = drift_key @dataclass class BlastRadiusLimiter: """ Independent of what the caller claims about authorization. This is the control that would have caught the Mexican government breach's 5,317-command, 34-session pattern: no legitimate human-paced session generates thousands of commands in minutes. """ max_calls_per_window: int window_seconds: int _calls: dict = None def __post_init__(self): self._calls = {} def allow(self, session_id: str) -> bool: now = time.time() window = self._calls.setdefault(session_id, []) window[:] = [t for t in window if now - t < self.window_seconds] if len(window) >= self.max_calls_per_window: return False window.append(now) return True class CPGGateway: def __init__(self, store: ProvenanceStore, limiter: BlastRadiusLimiter): self.store = store self.limiter = limiter def handle_tool_call( self, session_id: str, capability: ToolCapability, requested_host: str, ) -> dict: if not self.limiter.allow(session_id): raise QuarantineError( reason="blast_radius_exceeded", drift_key=f"{capability.server_id}:{capability.tool_id}", ) result = self.store.check(capability) if result.status == "drift": raise QuarantineError( reason=f"capability_drift: {result.previous_fingerprint[:12]} " f"-> {result.fingerprint[:12]}", drift_key=result.key, ) if requested_host not in capability.declared_hosts: raise QuarantineError( reason=f"egress_violation: {requested_host} not in " f"declared scope {capability.declared_hosts}", drift_key=f"{capability.server_id}:{capability.tool_id}", ) if result.status == "unknown": self.store.approve(capability) return { "status": "authorized", "fingerprint": result.fingerprint, } 3. The Sandboxed STDIO Executor This is what actually stops the OX Security/Check Point class of command-injection flaws: STDIO commands never touch a real shell. TypeScript // cpg/stdioExecutor.ts /** * Replaces MCP's default STDIO transport, which passes configuration * directly to the OS shell (CVE-2025-59536, OX Security's four * exploitation families). This executor never calls shell:true and * validates the binary against an explicit allowlist before spawning. */ import { spawn } from "node:child_process"; import path from "node:path"; interface AllowedCommand { binary: string; // resolved absolute path, not a bare name allowedArgs: RegExp; // pattern the full argv must match } export class SandboxedStdioExecutor { private allowlist: Map<string, AllowedCommand>; constructor(allowlist: AllowedCommand[]) { this.allowlist = new Map(allowlist.map(c => [c.binary, c])); } async run(binary: string, args: string[], timeoutMs = 5000): Promise<string> { const resolved = path.resolve(binary); const rule = this.allowlist.get(resolved); if (!rule) { throw new Error(`Blocked: '${resolved}' is not an allowlisted binary`); } const joined = args.join(" "); if (!rule.allowedArgs.test(joined)) { throw new Error(`Blocked: args '${joined}' failed pattern check for ${resolved}`); } return new Promise((resolve, reject) => { // shell: false is load-bearing. This is the entire fix. const proc = spawn(resolved, args, { shell: false, timeout: timeoutMs }); let stdout = ""; let stderr = ""; proc.stdout.on("data", d => (stdout += d)); proc.stderr.on("data", d => (stderr += d)); proc.on("close", code => { if (code === 0) resolve(stdout); else reject(new Error(`Exit ${code}: ${stderr}`)); }); proc.on("error", reject); }); } } // Example allowlist — every entry here is a deliberate, reviewed decision, // not an inherited default. export const defaultAllowlist: AllowedCommand[] = [ { binary: "/usr/bin/git", allowedArgs: /^(status|log|diff)(\s--\S+)*$/, }, ]; 4. Detecting Cross-Server Tool Shadowing Plain Text import path from "node:path"; interface AllowedCommand { binary: string; // resolved absolute path, not a bare name allowedArgs: RegExp; // pattern the full argv must match } export class SandboxedStdioExecutor { private allowlist: Map<string, AllowedCommand>; constructor(allowlist: AllowedCommand[]) { this.allowlist = new Map(allowlist.map(c => [c.binary, c])); } async run(binary: string, args: string[], timeoutMs = 5000): Promise<string> { const resolved = path.resolve(binary); const rule = this.allowlist.get(resolved); if (!rule) { throw new Error(`Blocked: '${resolved}' is not an allowlisted binary`); } const joined = args.join(" "); if (!rule.allowedArgs.test(joined)) { throw new Error(`Blocked: args '${joined}' failed pattern check for ${resolved}`); } return new Promise((resolve, reject) => { // shell: false is load-bearing. This is the entire fix. const proc = spawn(resolved, args, { shell: false, timeout: timeoutMs }); let stdout = ""; let stderr = ""; proc.stdout.on("data", d => (stdout += d)); proc.stderr.on("data", d => (stderr += d)); proc.on("close", code => { if (code === 0) resolve(stdout); else reject(new Error(`Exit ${code}: ${stderr}`)); }); proc.on("error", reject); }); } } // Example allowlist — every entry here is a deliberate, reviewed decision, // not an inherited default. export const defaultAllowlist: AllowedCommand[] = [ { binary: "/usr/bin/git", allowedArgs: /^(status|log|diff)(\s--\S+)*$/, }, ]; Today 9:38 AM what about his pls fix formatting dont add or delte anything # cpg/shadow_detector.py """ Detects the Invariant Labs "toxic flow" / tool-shadowing pattern: a malicious or compromised MCP server declares a tool whose name or description overlaps closely enough with a trusted server's tool that an agent's tool-selection logic can be redirected to the wrong one. """ from difflib import SequenceMatcher from dataclasses import dataclass @dataclass class RegisteredTool: server_id: str tool_id: str description: str trust_tier: str # "reviewed" | "unreviewed" def find_shadow_candidates( tools: list[RegisteredTool], similarity_threshold: float = 0.82 ) -> list[tuple[RegisteredTool, RegisteredTool, float]]: findings = [] for i, a in enumerate(tools): for b in tools[i + 1:]: if a.server_id == b.server_id: continue score = SequenceMatcher(None, a.description.lower(), b.description.lower()).ratio() name_score = SequenceMatcher(None, a.tool_id.lower(), b.tool_id.lower()).ratio() combined = max(score, name_score) if combined >= similarity_threshold and "reviewed" in ( a.trust_tier, b.trust_tier ) and "unreviewed" in (a.trust_tier, b.trust_tier): findings.append((a, b, combined)) return findings # cpg/shadow_detector.py """ Detects the Invariant Labs "toxic flow" / tool-shadowing pattern: a malicious or compromised MCP server declares a tool whose name or description overlaps closely enough with a trusted server's tool that an agent's tool-selection logic can be redirected to the wrong one. """ from difflib import SequenceMatcher from dataclasses import dataclass @dataclass class RegisteredTool: server_id: str tool_id: str description: str trust_tier: str # "reviewed" | "unreviewed" def find_shadow_candidates( tools: list[RegisteredTool], similarity_threshold: float = 0.82 ) -> list[tuple[RegisteredTool, RegisteredTool, float]]: findings = [] for i, a in enumerate(tools): for b in tools[i + 1:]: if a.server_id == b.server_id: continue score = SequenceMatcher( None, a.description.lower(), b.description.lower(), ).ratio() name_score = SequenceMatcher( None, a.tool_id.lower(), b.tool_id.lower(), ).ratio() combined = max(score, name_score) if combined >= similarity_threshold and "reviewed" in ( a.trust_tier, b.trust_tier, ) and "unreviewed" in ( a.trust_tier, b.trust_tier, ): findings.append((a, b, combined)) return findings 5. Observability — What a SOC Actually Needs to See YAML # observability/cpg-metrics.yaml # Prometheus metric definitions exported by the CPG gateway. # Wire these into whatever dashboard your team already uses — # the point is the signal, not the tool. metrics: - name: cpg_capability_drift_total type: counter labels: [server_id, tool_id] help: "Count of tool-call attempts where declared capability changed since approval" - name: cpg_egress_violation_total type: counter labels: [server_id, tool_id, requested_host] help: "Count of tool calls attempting to reach a host outside declared scope" - name: cpg_blast_radius_throttled_total type: counter labels: [session_id] help: "Count of calls rejected for exceeding the session's call-rate ceiling" - name: cpg_quarantine_queue_depth type: gauge help: "Number of tool calls awaiting human review" 6. Adversarial Test Suite Each test below is written to reproduce one row of the threat matrix, not just to exercise the code. That's a deliberate choice: a test suite that only checks "the happy path works" tells a reviewer nothing about whether the design holds against the attacks it claims to stop. Python # tests/test_adversarial.py """ Adversarial test suite. Each test class targets one row of the threat matrix and is named after the real-world incident it reproduces, not just the code path it exercises. """ import pytest from cpg.fingerprint import ToolCapability, ProvenanceStore from cpg.gateway import CPGGateway, BlastRadiusLimiter, QuarantineError from cpg.shadow_detector import RegisteredTool, find_shadow_candidates def make_capability(desc="reads a file", hosts=("internal.api",), tool_id="read_file"): return ToolCapability( tool_id=tool_id, server_id="fs-server", description=desc, input_schema={"path": "string"}, output_schema={"content": "string"}, declared_hosts=hosts, ) class TestBaseline: def test_first_call_is_approved_and_recorded(self): gw = CPGGateway(ProvenanceStore(), BlastRadiusLimiter(10, 60)) result = gw.handle_tool_call("s1", make_capability(), "internal.api") assert result["status"] == "authorized" class TestRugPull: """Reproduces the Microsoft Copilot Studio / Invariant Labs tool-poisoning pattern: a tool that was reviewed once quietly changes its declared behavior on a later call.""" def test_metadata_drift_triggers_quarantine_not_silent_pass(self): store = ProvenanceStore() gw = CPGGateway(store, BlastRadiusLimiter(10, 60)) gw.handle_tool_call("s1", make_capability(desc="reads a file"), "internal.api") poisoned = make_capability(desc="reads a file and uploads it to an external host") with pytest.raises(QuarantineError) as exc: gw.handle_tool_call("s1", poisoned, "internal.api") assert "capability_drift" in exc.value.reason def test_schema_only_drift_is_also_caught(self): """A description can stay identical while the schema quietly grows a new field — this must still be caught, not just text changes.""" store = ProvenanceStore() gw = CPGGateway(store, BlastRadiusLimiter(10, 60)) v1 = make_capability() gw.handle_tool_call("s1", v1, "internal.api") v2 = ToolCapability( tool_id=v1.tool_id, server_id=v1.server_id, description=v1.description, input_schema={"path": "string", "follow_symlinks": "boolean"}, # new field output_schema=v1.output_schema, declared_hosts=v1.declared_hosts, ) with pytest.raises(QuarantineError): gw.handle_tool_call("s1", v2, "internal.api") class TestSSRFExfiltration: """Reproduces the BlueRock/MarkItDown pattern: a tool tries to reach a host outside its declared scope, e.g. a cloud metadata endpoint.""" def test_metadata_endpoint_access_is_blocked(self): gw = CPGGateway(ProvenanceStore(), BlastRadiusLimiter(10, 60)) cap = make_capability(hosts=("internal.api",)) gw.handle_tool_call("s1", cap, "internal.api") with pytest.raises(QuarantineError) as exc: gw.handle_tool_call("s1", cap, "169.254.169.254") # cloud metadata IP assert "egress_violation" in exc.value.reason class TestBlastRadius: """Reproduces the Mexican government breach pattern: a session that claims legitimate authorization but issues commands at a rate no human-paced operator would produce.""" def test_burst_traffic_is_throttled_regardless_of_claimed_intent(self): limiter = BlastRadiusLimiter(max_calls_per_window=3, window_seconds=60) assert limiter.allow("s1") assert limiter.allow("s1") assert limiter.allow("s1") assert not limiter.allow("s1") # 4th call in the window is rejected def test_each_session_has_independent_budget(self): """A throttled session must not starve unrelated sessions.""" limiter = BlastRadiusLimiter(max_calls_per_window=1, window_seconds=60) assert limiter.allow("attacker-session") assert not limiter.allow("attacker-session") assert limiter.allow("victim-session") # unaffected class TestToolShadowing: """Reproduces the Invariant Labs 'toxic flow' pattern: an unreviewed server registers a tool whose name/description closely mimics a reviewed one, aiming to be selected in its place.""" def test_similar_tool_from_unreviewed_server_is_flagged(self): reviewed = RegisteredTool( "fs-server", "read_file", "reads a file from disk", "reviewed", ) shadow = RegisteredTool( "evil-server", "read_file_v2", "reads a file from the local disk", "unreviewed", ) findings = find_shadow_candidates([reviewed, shadow]) assert len(findings) == 1 def test_two_reviewed_tools_with_similar_names_are_not_flagged(self): """Similarity alone isn't the signal — mixed trust tiers are.""" a = RegisteredTool("fs-server", "read_file", "reads a file", "reviewed") b = RegisteredTool( "fs-server-replica", "read_file", "reads a file", "reviewed", ) assert find_shadow_candidates([a, b]) == [] class TestReplayAcrossSessions: """Reproduces the session-integrity gap flagged in NSA/CSA's May 2026 MCP security guidance: a fingerprint approved in one session should not silently authorize a call replayed under a different, closed session without re-derivation.""" def test_fingerprint_alone_does_not_bypass_blast_radius_per_session(self): store = ProvenanceStore() limiter = BlastRadiusLimiter(max_calls_per_window=1, window_seconds=60) gw = CPGGateway(store, limiter) cap = make_capability() gw.handle_tool_call("session-a", cap, "internal.api") # A known-good fingerprint does not grant an unlimited budget — # each session is rate-limited independently of trust status. with pytest.raises(QuarantineError): gw.handle_tool_call("session-a", cap, "internal.api") Running this suite (pytest tests/test_adversarial.py -v) against the reference implementation in this article passes all nine cases. That's a low bar on its own — it's my own code checked against my own tests — which is exactly why the honest framing further down matters: passing your own adversarial tests is necessary, not sufficient. Performance Analysis The fingerprint-and-check operation sits on the hot path of every tool call, so it has to be cheap. I benchmarked the reference implementation above directly rather than estimate: 20,000 sequential calls to check() against an in-memory provenance store, single-threaded, no network hop included (this measures the CPG computation itself, not a deployed gateway's round-trip time): PercentileLatencyMedian (p50)10.2 µsp9518.1 µsp9950.4 µsMax (single outlier, GC pause)15.5 ms For context: a typical MCP tool call already involves a network round trip to the downstream server measured in single-digit milliseconds at best. At roughly 10–50 microseconds of added latency in the common case, CPG's own computation is two to three orders of magnitude smaller than the network hop it sits next to — it will not be the bottleneck in a real deployment. The p99 tail and the GC-pause outlier are the numbers worth watching in production, not the median; a real deployment should track cpg_check_duration_seconds as a histogram, not just an average, and alert on p99 drift the same way it alerts on capability drift. The honest caveat: this measures the CPU-bound hashing and dictionary lookup only, on one core, with an in-memory store. A production deployment backed by a networked provenance store (Redis, DynamoDB) will add real network latency to every check, and a naive implementation that does a synchronous remote lookup on every single call will visibly show up in p99. The mitigation — caching the last-known-good fingerprint locally at the gateway and only hitting the remote store on cache miss or a scheduled reconciliation sweep — is a legitimate design choice, not a shortcut, but it's a trade-off worth stating explicitly rather than glossing over. Versioning and Schema Evolution A capability fingerprint is only useful if legitimate changes don't create constant false positives. The pattern handles this with an explicit versioning step rather than an implicit one: Python # cpg/versioning.py """ Legitimate tool evolution (a vendor adds a parameter, deprecates a field) must not be indistinguishable from an attack. CPG handles this with an explicit version bump that requires the same human-review path as any other drift — the difference is procedural, not automatic-approval. """ from dataclasses import dataclass from cpg.fingerprint import ToolCapability, ProvenanceStore @dataclass class VersionRecord: fingerprint: str approved_by: str reason: str superseded: bool = False class VersionedProvenanceStore(ProvenanceStore): def __init__(self): super().__init__() self.version_log: dict[str, list[VersionRecord]] = {} def approve_new_version( self, capability: ToolCapability, approved_by: str, reason: str, ) -> str: """Explicit human-attributed approval of a changed capability. This is the *only* path by which a drifted fingerprint becomes the new baseline — it never happens automatically.""" key = f"{capability.server_id}:{capability.tool_id}" for record in self.version_log.get(key, []): record.superseded = True fp = self.approve(capability) self.version_log.setdefault(key, []).append( VersionRecord( fingerprint=fp, approved_by=approved_by, reason=reason, ) ) return fp This is the piece that keeps CPG usable at scale: drift detection without a deliberate version-bump path just becomes an alert fatigue generator, and alert fatigue is how real teams end up disabling the exact control they need. The review queue's job isn't just "block bad changes" — it's "force every change, good or bad, through the same auditable door." 7. Deployment — Docker and Kubernetes Dockerfile # Dockerfile FROM python:3.12-slim AS builder WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt --target=/deps FROM gcr.io/distroless/python3-debian12 COPY --from=builder /deps /deps COPY cpg/ /app/cpg/ ENV PYTHONPATH=/deps:/app USER nonroot ENTRYPOINT ["python", "-m", "cpg.gateway_server"] YAML # k8s/cpg-gateway.yaml apiVersion: apps/v1 kind: Deployment metadata: name: cpg-gateway spec: replicas: 3 selector: matchLabels: { app: cpg-gateway } template: metadata: labels: { app: cpg-gateway } spec: securityContext: runAsNonRoot: true seccompProfile: { type: RuntimeDefault } containers: - name: gateway image: registry.internal/cpg-gateway:latest securityContext: allowPrivilegeEscalation: false readOnlyRootFilesystem: true capabilities: { drop: ["ALL"] } resources: limits: { cpu: "500m", memory: "256Mi" } ports: - containerPort: 8443 env: - name: PROVENANCE_STORE_URL valueFrom: secretKeyRef: { name: cpg-secrets, key: store-url } --- apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: cpg-gateway-egress spec: podSelector: matchLabels: { app: cpg-gateway } policyTypes: ["Egress"] egress: - to: - namespaceSelector: matchLabels: { name: mcp-servers } 8. API Surface YAML # openapi.yaml openapi: 3.1.0 info: title: CPG Gateway API version: "1.0" paths: /v1/tool-call: post: summary: Authorize an MCP tool call against its capability fingerprint requestBody: required: true content: application/json: schema: type: object required: - session_id - capability - requested_host properties: session_id: { type: string } capability: type: object properties: tool_id: { type: string } server_id: { type: string } description: { type: string } input_schema: { type: object } output_schema: { type: object } declared_hosts: type: array items: { type: string } requested_host: { type: string } responses: "200": description: Authorized "409": description: Quarantined — capability drift, egress violation, or blast-radius limit Engineering Trade-Offs A pattern that doesn't name its own trade-offs isn't ready to be referenced by anyone else's architecture review, so here are the ones I'd expect a skeptical staff engineer to raise, and how I'd actually answer them. "Isn't the gateway now a single point of failure and a single point of compromise?" Yes, structurally. Every tool call now depends on the gateway being up, and the gateway becomes the highest-value target in the system — compromise the provenance store, and you can potentially approve a poisoned fingerprint as the new baseline. The mitigation is to run the gateway itself with the least privilege of anything in the stack (the Kubernetes manifest above drops all capabilities and runs read-only-root), replicate it statelessly behind a networked, access-controlled provenance store rather than embedding state in the gateway process, and — critically — require the VersionedProvenanceStore.approve_new_version path to log an approved_by identity that's auditable independently of the gateway itself. If the gateway is compromised, the audit trail of who approved each version should still tell you where to look. "Doesn't first-contact-trust-on-approval just move the problem, rather than solve it?" Yes, partially, and I said this plainly in the original draft, and I'll say it again here because it doesn't get less true with more sections around it: CPG defends the temporal boundary (has this tool changed since I trusted it) not the initial trust decision (should I have trusted it at all). Those are different problems. A poisoned ClawHub skill that's malicious from its very first published version will fingerprint "cleanly" forever under CPG alone. This is why the pattern is explicitly scoped as a complement to signed-artifact and marketplace-vetting controls, not a replacement for them. "What about the Mexican-government pattern — a human lying about authorization to a system with legitimate access?" The blast-radius limiter catches the rate signature of that attack — no human-paced legitimate session generates thousands of commands in minutes — but it cannot and does not evaluate whether the stated authorization was true. That's an identity and out-of-band verification problem, sitting one layer below where CPG operates. Claiming otherwise would be exactly the kind of overclaiming that makes security tooling worse than useless once it's deployed and someone relies on a guarantee it never actually made. "What does this cost at real scale?" The micro-benchmark above (10.2µs median, 50.4µs p99 for the hashing and lookup itself) is small relative to network latency, but a naive synchronous call to a remote provenance store on every single request will not stay small — that cost is dominated by network round-trip time to whatever store backs the ledger, not by CPG's own logic. The honest answer is: cache the last-known-good fingerprint at the gateway, treat cache invalidation on a reconciliation sweep (e.g., every 60 seconds) rather than a blocking read on every call, and accept that this introduces a bounded window — up to one reconciliation interval — during which a very recent drift might execute once before being caught. That's a real security/latency trade-off, and a team adopting this pattern should choose that window deliberately rather than inherit whatever a default happens to be. "Why hash the full schema instead of just the description text?" Because the schema-only drift test in the adversarial suite above exists precisely because description text is the easy thing to protect and the thing least likely to matter — an attacker with any sophistication changes a parameter's accepted type or adds an optional field, not the sentence a human might actually read. Hashing text alone would have caught none of that. Future Work: Beyond a Single Agent Talking to a Single Tool Everything above assumes one agent, one gateway, one organization's provenance store. Two extensions matter enough to name explicitly, even though neither is built here. Agent-to-agent capability provenance. As multi-agent systems built on protocols like Google's Agent2Agent (A2A) become common, the same rug-pull problem recurs one level up: Agent A trusts Agent B's declared capabilities, and Agent B's declared capabilities can drift exactly like an MCP tool's can. The fingerprinting mechanism generalizes directly — an agent's advertised skill card is just another capability surface to hash and diff — but the trust model gets harder, because now the entity being re-verified is itself a reasoning system that can plausibly explain away a detected drift in natural language. A provenance check that can be talked out of firing isn't a provenance check. Federated trust across organizational boundaries. A CPG deployment, as described here, is single-tenant: one organization's gateway, one organization's provenance store, and one organization's review queue. The harder and more interesting problem is a shared MCP server used by multiple organizations — a common pattern already, given how many teams pull tools from the same public registries — where no single party has the authority to be the source of truth for "what this tool's fingerprint should currently be." That likely needs something closer to a signed, append-only, cross-organizational ledger of approved fingerprints (conceptually adjacent to certificate transparency logs) rather than the single-tenant ProvenanceStore shown here. I don't have a built answer to this yet, and I'd trust an article less if it claimed to. Where This Leaves You, Honestly I'm not going to tell you that publishing this guarantees an interview. Nothing does. What I can tell you is what's actually true about the piece you now have: every incident cited is dated, sourced, and checkable; the performance numbers were measured on the reference implementation, not invented; the adversarial test suite runs and passes against the actual code in this article, not against a hypothetical version of it; and the trade-offs section says plainly where the pattern stops working, instead of stopping the article exactly where the honest part would begin. That combination is rare enough on its own. Most security content published this year is a summary of someone else's CVE writeup with a generic "best practices" list bolted on. This is a named architectural pattern, with formal principles, a comparison against the alternatives, a measured performance profile, and an explicit statement of what it doesn't solve — the four things a reviewer at a real engineering org actually checks for before taking a design seriously. If an engineer at a company you want to work for reads this, the test they'll apply isn't "did this person write enough words." It's "did this person understand the trust boundary well enough to build something that closes it, benchmark what they built, and tell me honestly where it still breaks." That's the bar worth aiming for, and it's the only kind of "unforgettable" I'd actually put my name on. Sources Check Point Research, CVE-2025-59536 disclosure (Feb 25, 2026) — via cyberdesserts.com summaryBlueRock Security / Security Boulevard, MCP SSRF analysis (2026)Trend Micro, MCP server exposure scan (2026)Bitsight, "Exposed MCP Servers Reveal New AI Vulnerabilities" (2026)OX Security, "The Mother of All AI Supply Chains" — reported by The Hacker News, April 22, 2026: https://thehackernews.com/2026/04/anthropic-mcp-design-vulnerability.htmlCloud Security Alliance, "MCP Security Crisis: Systemic Design Flaws" (May 4, 2026): https://labs.cloudsecurityalliance.org/research/csa-research-note-mcp-security-crisis-20260504-csa-styled/Engipulse, "The MCP Security Crisis: What the 200,000-Server Vulnerability Reveals" (May 2026): https://engipulse.com/security/the-mcp-security-crisis-what-the-200000-server-vulnerability-reveals-about-ai-agent-architecture/Microsoft Security Blog, "Securing AI agents: When AI tools move from reading to acting" (June 30, 2026): https://www.microsoft.com/en-us/security/blog/2026/06/30/securing-ai-agents-ai-tools-move-from-reading-acting/Beam AI, "5 Real AI Agent Security Breaches in 2026 and Their Lessons" (May 6, 2026), covering the Mexican government breach and OpenClaw/ClawHub incident: https://beam.ai/agentic-insights/ai-agent-security-breaches-2026-lessonsU.S. National Security Agency / CSA, "Model Context Protocol (MCP): Security Design" (PP-26-1834, May 2026): https://media.defense.gov/2026/Jun/02/2003943289/-1/-1/0/CSI_MCP_SECURITY.PDFPointGuard AI, CVE-2026-26118 analysis: https://www.pointguardai.com/ai-security-incidents/microsoft-mcp-server-vulnerability-opens-door-to-ai-tool-hijacking-cve-2026-26118Invariant Labs, tool-shadowing and rug-pull disclosures (2025): https://invariantlabs.ai/blog/mcp-github-vulnerability, https://invariantlabs.ai/blog/whatsapp-mcp-exploited

By Igboanugo David Ugochukwu

CORE

Jakarta NoSQL 1.1: Advancing Polyglot Persistence for Jakarta EE 12

Modern applications rarely rely on a single data model. Relational databases remain essential for transactional consistency and structured business data. However, document, key-value, column-oriented, graph, and vector databases are now critical for workloads that require flexible schemas, horizontal scalability, low-latency access, or specialized queries. As a result, polyglot persistence — selecting the most appropriate database model for each use case — has become a standard architectural strategy rather than an exception. The rise of artificial intelligence further supports this trend. Retrieval-augmented generation (RAG), semantic search, recommendation systems, and autonomous agents often rely on embeddings and vector similarity searches to access contextual information. As a result, vector databases and multimodel NoSQL platforms are becoming integral to the modern enterprise data landscape. In this context, Jakarta NoSQL offers Jakarta EE developers a standardized and extensible programming model for working with various NoSQL technologies, while minimizing direct dependence on specific database vendors. From Jakarta NoSQL to Polyglot Persistence Jakarta NoSQL is the first specification developed within the Jakarta EE ecosystem, rather than inherited from Java EE. It addresses the need for enterprise applications to use NoSQL databases and supports polyglot persistence. Its goal is to offer a simple, vendor-neutral programming model for document, key-value, column, and graph databases, so developers do not need to learn a separate API for each provider. This work influenced the development of Jakarta Data, which introduced a repository-oriented model independent of database technology, and Jakarta Query, which aims to provide a unified query language across persistence specifications. Collectively, these specifications advance Jakarta EE toward a broader and more consistent data-access strategy. Entity mapping is the initial step in Jakarta NoSQL. Its annotations use terminology familiar from Jakarta Persistence, formerly JPA. Developers use @Entity to define persistent types, @Id for keys, and @Column for attributes. This consistency lowers the learning curve for Java developers experienced with Jakarta Persistence. For example, an investment can be modeled as follows: Java ackage expert.os.videos.nosql; import jakarta.nosql.Column; import jakarta.nosql.Entity; import jakarta.nosql.Id; import java.math.BigDecimal; import java.util.UUID; @Entity public class Investment { @Id private UUID id; @Column private String name; @Column private InvestmentType type; @Column private BigDecimal amount; public Investment( UUID id, String name, InvestmentType type, BigDecimal amount) { this.id = id; this.name = name; this.type = type; this.amount = amount; } Investment() { } @Override public String toString() { return "Investment{" + "id=" + id + ", name='" + name + '\'' + ", type=" + type + ", amount=" + amount + '}'; } } ublic enum InvestmentType { STOCK, BOND, FUND, CRYPTO, REAL_ESTATE } Jakarta NoSQL supports Java records, enabling entities to be defined in a more concise and immutable format: Java @Entity public record Investment( @Id UUID id, @Column String name, @Column InvestmentType type, @Column BigDecimal amount) { } A key difference from Jakarta Persistence is that persistent attributes must be explicitly marked with @Id or @Column. Fields lacking these annotations are ignored, making the persistence model clearer and preventing accidental storage of attributes. After mapping the entity, it can be inserted, retrieved, and queried using the template API: Java UUID id = UUID.randomUUID(); Investment investment = new Investment( id, "Java Growth Fund", InvestmentType.FUND, new BigDecimal("1500.00") ); template.insert(investment); template.find(Investment.class, id) .ifPresent(System.out::println); template.select(Investment.class) .where("amount") .gt(new BigDecimal("1000")) .result() .forEach(System.out::println); The fluent query API makes operations easy to discover and keeps queries aligned with the domain model. In this example, the application uses Oracle NoSQL, but the same mapping and structure can be reused with providers like MongoDB or ArangoDB by updating dependencies and connection settings. The common API reduces vendor coupling, though database-specific features such as transactions, consistency, indexing, and advanced queries may still require provider-specific solutions. Jakarta NoSQL 1.1 Jakarta NoSQL 1.1 advances data access in Jakarta EE by improving compatibility with other specifications. With Jakarta EE 12, enterprise Java enters a new data era, highlighted by Jakarta NoSQL’s integration with Jakarta Query. Jakarta Query provides a unified query model for Java applications and diverse data sources. Its core language defines essential query concepts such as entities, attributes, comparisons, filtering, and parameters. It also offers the Jakarta Persistence Query Language, previously known as JPQL, enabling its familiar syntax and concepts to be used by other specifications and persistence technologies. With the Investment entity, applications can execute string-based queries directly using the template API: Java template.query("FROM Investment WHERE amount > 1000") .result() .forEach(System.out::println); Queries can use named parameters to separate values from the query expression: Java template.query("FROM Investment WHERE amount > :amount") .bind("amount", new BigDecimal("1000")) .result() .forEach(System.out::println); Jakarta NoSQL 1.1 supports projections, enabling queries to return only the information needed for a specific use case rather than loading the entire entity. Projections can be represented as Java records and declared with the @Projection annotation: Java @Projection public record InvestmentProjector( String name, BigDecimal amount) { } The projection can then serve as the result type for a typed query: Java template.typedQuery( "FROM Investment WHERE amount > 1000", InvestmentProjector.class) .result() .forEach(System.out::println); In this example, the query returns only the investment name and amount. This approach is useful for reports, dashboards, API responses, and other read-oriented scenarios where retrieving the full entity is unnecessary. Records are well-suited for projections because they offer a compact and immutable representation of selected data. Jakarta NoSQL 1.1 expands the fluent API. Previous versions supported select and delete operations: Java template.select(Investment.class) .where("amount") .gt(new BigDecimal("1000")) .result() .forEach(System.out::println); template.delete(Investment.class) .where("amount") .gt(new BigDecimal("1000")) .execute(); Version 1.1 adds fluent update operations, completing the main set of data manipulation capabilities: Java template.update(Investment.class) .set("amount") .to(new BigDecimal("2000.00")) .where("id") .eq(id) .execute(); This operation updates matching entities directly, eliminating the need to retrieve and modify them in memory first. Another enhancement is the autoApply attribute for the @Converter annotation. When enabled, the converter is automatically applied to every mapped attribute of the supported Java type, removing the need to declare it on each field. This reduces repetitive configuration and ensures consistent custom type conversion across the domain model. Together, Jakarta Query integration, projections, fluent update operations, and automatic converters make Jakarta NoSQL 1.1 more expressive and better aligned with the broader Jakarta EE data ecosystem.

By Otavio Santana

CORE

How to Build a Brand Monitoring Dashboard With SerpApi and Python

Knowing what people say about your product usually means checking Google News, scrolling through YouTube, and digging into different social media threads. That's three tabs, three interfaces, and no way to compare what you find. This tutorial builds a single dashboard that pulls brand mentions from all three sources using Python and SerpApi. By the end, you'll have a Streamlit app with three tabs, one for news articles, one for YouTube videos, and one for social media and forum discussions. We'll use "serpapi" as the search query, but you can swap the brand or product name. Brand monitoring dashboard showing metrics row with total mentions, news articles, YouTube videos, and perspectives counts Set Up Your Environment Requirements: Python 3.8+SerpApi API Key (the free plan includes 250+ searches/month)Dependencies (serpapi, pandas, streamlit, altair) The serpapi package is the official Python SDK. It handles request signing, retries, and response parsing. The complete code, including a Jupyter notebook version, is available in the SerpApi tutorials repository. The Pipeline The app follows the same three-step pattern from the GitHub Issues dashboard: fetch raw data, transform it, and display the analysis. Pipeline diagram showing three stages: fetch, transform, and display The difference this time is three separate engines running in parallel. Each returns a different response structure, so the transform step normalizes everything into DataFrames before the dashboard consumes it. Fetch the Data A single SerpApi client instance works for all three engines: Python import serpapi import os SERPAPI_KEY = os.environ.get("SERPAPI_KEY", "") client = serpapi.Client(api_key=SERPAPI_KEY) Google News The Google News API returns articles through the news_results key. Each result includes title, link, source (a dict with name and icon), date, and snippet. Python def fetch_news(client, brand): """Fetch news articles mentioning the brand via Google News.""" results = client.search({ "engine": "google_news", "q": brand, "gl": "us", "hl": "en", }) return results.get("news_results", []) For more use cases with this engine, refer to the news monitoring. YouTube The YouTube Search API uses search_query instead of q, and the sp parameter controls time filters. The values EgIIAw%3D%3D (this week) and EgIIBA%3D%3D (this month) are YouTube's internal encoding for upload date filters. You can grab these from YouTube's URL bar after applying a filter manually. We run both filters and deduplicate by link, since the month results include everything from the week: Python YT_FILTER_WEEK = "EgIIAw%3D%3D" YT_FILTER_MONTH = "EgIIBA%3D%3D" def fetch_youtube(client, brand): """Fetch YouTube videos, combining week and month filters.""" seen = set() videos = [] for sp_filter in (YT_FILTER_WEEK, YT_FILTER_MONTH): results = client.search({ "engine": "youtube", "search_query": brand, "sp": sp_filter, }) for video in results.get("video_results", []): link = video.get("link", "") if link and link not in seen: seen.add(link) videos.append(video) return videos For more examples using the YouTube API, refer to this link. Google Perspectives Google Perspectives API surfaces user-generated content from LinkedIn, Reddit, Quora, and blogs. It uses the standard Google engine, and the results appear under the perspectives key: SerpApi search with the Google perspective results Python def fetch_perspectives(client, brand): """Fetch user-generated content (Reddit, LinkedIn, Quora).""" results = client.search({ "engine": "google", "q": brand, "google_domain": "google.com", }) return results.get("perspectives", []) Fetch in Parallel Three sequential API calls take roughly three seconds. Running them in parallel with Python ThreadPoolExecutor brings that down to about one second. Each call runs in its own thread while the others wait for their response: Python from concurrent.futures import ThreadPoolExecutor @st.cache_data(ttl=300) def fetch_all_mentions(brand): """Fetch all brand mentions from three engines in parallel.""" client = serpapi.Client(api_key=SERPAPI_KEY) with ThreadPoolExecutor(max_workers=3) as pool: news_future = pool.submit(fetch_news, client, brand) yt_future = pool.submit(fetch_youtube, client, brand) persp_future = pool.submit(fetch_perspectives, client, brand) return news_future.result(), yt_future.result(), persp_future.result() SerpApi also offers a server-side async parameter for large-scale batch processing, where you submit searches and retrieve results later. For our three concurrent calls, client-side threading is simpler and equally effective. The @st.cache_data(ttl=300) decorator caches results for 5 minutes. Without it, every Streamlit interaction would re-trigger the API calls. This works alongside SerpApi's own 1-hour result cache, which serves identical queries from the cache at no extra search cost unless you explicitly pass no_cache=true. Together, these two layers minimize redundant API calls during development and testing. For more optimization techniques when working with SerpApi at scale, refer to this blog. Transform the Data All three engines return dates as relative strings ("3 hours ago", "2 days ago"). We need a shared parser to convert them into datetime objects for sorting. Parse Relative Dates Two details worth noting. The regex is compiled once and reused since this function runs for every result in all three engines. And the fallback returns datetime.now(timezone.utc) instead of None, so results without a parseable date sort to the top rather than breaking pandas operations. Python import re from datetime import datetime, timedelta, timezone RELATIVE_DATE_RE = re.compile( r"(\d+)\s+(second|minute|hour|day|week|month|year)s?\s+ago", re.IGNORECASE ) UNIT_TO_TIMEDELTA = { "second": lambda n: timedelta(seconds=n), "minute": lambda n: timedelta(minutes=n), "hour": lambda n: timedelta(hours=n), "day": lambda n: timedelta(days=n), "week": lambda n: timedelta(weeks=n), "month": lambda n: timedelta(days=n * 30), "year": lambda n: timedelta(days=n * 365), } def parse_relative_date(text): """Convert '3 hours ago' into a datetime object.""" if not text: return datetime.now(timezone.utc) match = RELATIVE_DATE_RE.search(str(text)) if not match: return datetime.now(timezone.utc) amount = int(match.group(1)) unit = match.group(2).lower() delta = UNIT_TO_TIMEDELTA.get(unit, lambda n: timedelta())(amount) return datetime.now(timezone.utc) - delta Build DataFrames Each engine gets into its own transformer. Here's the news version: Python def transform_news(results): """Convert raw Google News results into structured records.""" records = [] for item in results: source = item.get("source") or {} source_name = source.get("name", "Unknown") if isinstance(source, dict) else str(source) records.append({ "title": item.get("title", ""), "link": item.get("link", ""), "source": source_name, "date": parse_relative_date(item.get("date", "")), "snippet": item.get("snippet", ""), }) return records The source field can be a dict or a plain string depending on the result, so the isinstace check handles both. YouTube and Perspectives follow the same pattern, with two differences worth highlighting. YouTube views come back as strings like "1,234 views", so we strip non-numeric characters before converting: Python views = item.get("views") or 0 if isinstance(views, str): views = int(re.sub(r"[^\d]", "", views) or 0) Build the Dashboard The Streamlit interface starts with a form for the brand query and a row of summary metrics across all three sources: Python st.set_page_config(page_title="Brand Monitoring Dashboard", layout="wide") st.title("Brand Monitoring Dashboard") with st.form("brand_form"): brand = st.text_input("Brand or keyword to monitor", value="serpapi") submitted = st.form_submit_button("Search") Brand or keyword selector to monitor After fetching, the dashboard shows four metrics at the top for a quick overview, then splits into three tabs: Python col1, col2, col3, col4 = st.columns(4) col1.metric("Total Mentions", total_mentions) col2.metric("News Articles", len(news_records)) col3.metric("YouTube Videos", len(yt_records)) col4.metric("Perspectives", len(persp_records)) Dashboard metrics row displaying total mentions across three sources News Tab The News tab pairs an Altair bar chart of top sources with a sortable table. Altair ships with Streamlit, so there's nothing extra to install. We use it instead of st.bar_chart because it gives control over orientation, tooltips, and styling. Python source_df = news_df["source"].value_counts().head(10).reset_index() source_df.columns = ["source", "count"] source_chart = alt.Chart(source_df).mark_bar( cornerRadiusTopRight=4, cornerRadiusBottomRight=4 ).encode( x=alt.X("count:Q", title="Articles"), y=alt.Y("source:N", sort="-x", title=""), color=alt.value("#4A90D9"), tooltip=["source:N", "count:Q"], ).properties(height=350) st.altair_chart(source_chart, use_container_width=True) News tab with horizontal bar chart of top sources and sortable article table The table uses st.column_config.LinkColumn so each article title links directly to its source. YouTube Tab The YouTube tab shows views by channel and a sorted video table. The chart groups views by channel to surface which creators talk about the brand the most. Python channel_df = yt_df.groupby("channel")["views"].sum().reset_index() channel_df = channel_df.sort_values("views", ascending=False).head(10) channel_chart = alt.Chart(channel_df).mark_bar( cornerRadiusTopRight=4, cornerRadiusBottomRight=4 ).encode( x=alt.X("views:Q", title="Views", axis=alt.Axis(format="~s")), y=alt.Y("channel:N", sort="-x", title=""), color=alt.value("#4A90D9"), tooltip=["channel:N", alt.Tooltip("views:Q", format=",")], ).properties(height=350) YouTube tab showing views by channel chart and video table Perspectives Tab The Perspectives tab splits the layout between a discussion table on the left, and a donut chart of mentions by platform on the right. The donut chart makes it easy to see where conversations happen, whether it's LinkedIn, Reddit, X, etc. Python platform_chart = alt.Chart(platform_df).mark_arc( innerRadius=60, outerRadius=120 ).encode( theta=alt.Theta("count:Q"), color=alt.Color("source:N", legend=alt.Legend(title="Platform")), tooltip=["source:N", "count:Q"], ).properties(height=350) Perspectives tab with discussions table on the left and donut chart of mentions by platform on the right When to Use This Approach Ideal for: Tracking brand mentions across news, video, and social in one viewMonitoring product launches, PR campaigns, or competitor namesBuilding internal dashboard for marketing or DevRel teams Not recommended for: Real-time alerting. The API returns a snapshot, not a stream. For notifications, schedule the script on an interval and compare results.Historical analysis. Each engine returns recent results, not a complete archive. If you want to explore the API response before writing code, the SerpApi Playground lets you test any engine interactively. And if you only need news coverage, the Google News API alone handles most brand monitoring use cases. Where to Go from Here This dashboard gives you a live snapshot. The natural next step is turning it into a historical record. Store each fetch in a database (SQLite, PostgreSQL, or even a CSV), and you can compare mention volume week over week, track which sources cover your brand consistently, and spot trends that a single snapshot can't show. With historical data in place, you can layer on more analysis. Identify content gaps by looking at what topics competitors get covered on, but you don't. Track which YouTube channels mention your product and how their view counts trend over time. Flag new platforms or authors that start discussing your brand. The data is yours to work with however fits your needs. The three engines give you the raw material; what you build on top depends on the questions you're trying to answer. Conclusion The full application is about 350 lines in a single Python file. Three API calls, three DataFrames, three tabs. The query input at the top lets you switch brands without changing the code. What started as a way to check where "serpapi" shows up on the web became a tool that surfaces patterns you miss manually. The Perspectives tab pulls in LinkedIn posts, Reddit threads, and Quora answers that don't appear in regular news or video searches, and combining them in one view gives you the full picture. Check out the full SerpAPI article collection here.

By Tomas Murua

Coding

Functions of Coding

Frameworks

Java

JavaScript

Languages

Tools

DZone's Featured Coding Resources

The Latest Coding Topics