DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Coding

Also known as the build stage of the SDLC, coding focuses on the writing and programming of a system. The Zones in this category take a hands-on approach to equip developers with the knowledge about frameworks, tools, and languages that they can tailor to their own build needs.

Functions of Coding

Frameworks

Frameworks

A framework is a collection of code that is leveraged in the development process by providing ready-made components. Through the use of frameworks, architectural patterns and structures are created, which help speed up the development process. This Zone contains helpful resources for developers to learn about and further explore popular frameworks such as the Spring framework, Drupal, Angular, Eclipse, and more.

Java

Java

Java is an object-oriented programming language that allows engineers to produce software for multiple platforms. Our resources in this Zone are designed to help engineers with Java program development, Java SDKs, compilers, interpreters, documentation generators, and other tools used to produce a complete application.

JavaScript

JavaScript

JavaScript (JS) is an object-oriented programming language that allows engineers to produce and implement complex features within web browsers. JavaScript is popular because of its versatility and is preferred as the primary choice unless a specific function is needed. In this Zone, we provide resources that cover popular JS frameworks, server applications, supported data types, and other useful topics for a front-end engineer.

Languages

Languages

Programming languages allow us to communicate with computers, and they operate like sets of instructions. There are numerous types of languages, including procedural, functional, object-oriented, and more. Whether you’re looking to learn a new language or trying to find some tips or tricks, the resources in the Languages Zone will give you all the information you need and more.

Tools

Tools

Development and programming tools are used to build frameworks, and they can be used for creating, debugging, and maintaining programs — and much more. The resources in this Zone cover topics such as compilers, database management systems, code editors, and other software tools and can help ensure engineers are writing clean code.

Latest Premium Content
Trend Report
Platform Engineering and DevOps
Platform Engineering and DevOps
Trend Report
Developer Experience
Developer Experience
Refcard #291
Code Review Core Practices
Code Review Core Practices
Refcard #400
Java Application Containerization and Deployment
Java Application Containerization and Deployment

DZone's Featured Coding Resources

Selective Deployment in Azure Data Factory: A Practical Blueprint for Safer CI/CD

Selective Deployment in Azure Data Factory: A Practical Blueprint for Safer CI/CD

By Sauhard Bhatt
Picture this: two features are being developed in parallel. One has already been tested in lower environments, but is still awaiting business approvalThe other is fully validated and ready to go live Naturally, you want to release the second feature to production. But you can’t, because your deployment model forces you to release everything together. If you’ve worked with Azure Data Factory (ADF), this situation probably sounds familiar. Azure Data Factory (ADF) is a cloud-based data integration service from Microsoft that helps you build and orchestrate data pipelines across systems. It works extremely well for managing data workflows — but when it comes to deployments at scale, things get tricky. As our ADF usage grew across multiple teams and environments, we started running into a recurring problem: We had control over development — but very little control over what actually got deployedA simple pipeline fix could unintentionally introduce unrelated changesParallel feature development became harder to manageProduction releases became riskier than they needed to be That’s when we realized: The issue wasn’t ADF itself — it was the deployment model we were relying on. The issue wasn’t ADF itself — it was the deployment model we were relying on. This article walks through how we addressed that challenge by implementing a selective deployment pattern, allowing us to promote only intended changes without impacting everything else. The Real Problem: Parallel Feature Releases in ADF Before diving into the solution, let’s look at a scenario that frequently occurs in real-world teams. What This Diagram Represents This diagram shows two features progressing across environments: Feature 100 Developed earlier, successfully deployed to Dev and TestCurrently in UAT (User Acceptance Testing)Still awaiting business approval before production Feature 200 Developed later, successfully completed across Dev → Test → UATFully validated and ready for production Expected Behavior At this stage, the expectation is straightforward: “Let’s release Feature 200 to production.” Feature 100 is still under testing, so it should remain in UAT. What Actually Happens in ADF Azure Data Factory follows a full-state deployment model. That means when you deploy, you are not deploying a feature; you are deploying the entire factory state. So when you attempt to release Feature 200: Feature 100 gets included automaticallyYou cannot isolate Feature 200You lose control over what reaches production Why This Becomes a Real Problem This isn’t an edge case; it becomes a recurring pattern in larger environments. You’ll encounter this when: Multiple teams are working in parallelFeatures move at different speedsUAT cycles varyProduction fixes need to be released quickly It becomes even more complex when: Existing production pipelines are modifiedPartial updates are requiredDependencies overlap across features The Core Limitation: ADF promotes state, not intent. It does not differentiate between what is ready for production and what is still under testing. Why We Had to Rethink Deployment This limitation introduced real risks: Accidental promotion of incomplete featuresDelayed production releasesIncreased coordination overheadHigher chances of breaking stable pipelines We needed a way to: Promote only Feature 200Keep Feature 100 in UATAvoid impacting unrelated artifactsReduce production risk Architecture Overview To address this challenge, we introduced a selective packaging layer between build and deployment. Flow Feature Branch → PR → Validate → Selective Packaging → ARM Export → Incremental Deploy → Trigger Control Key Idea: Instead of exporting ARM templates from the full ADF repository, we export from a filtered staging folder containing only the required artifacts. Understanding Default ADF Deployment Behavior Before implementing selective deployment, it’s important to understand how Azure Data Factory works by default. ADF follows a full-state deployment model. How Default ADF Deployment Works When you use ADF with Git integration: Developers work in a collaboration branch (typically main)Changes are committed and merged via pull requestsADF provides a Publish button in the UI When you click Publish, ADF generates ARM templates representing the entire factory state. These templates are stored in the adf_publish branch: In modern setups, instead of clicking Publish manually, teams often use @microsoft/azure-data-factory-utilities (npm-based export). This allows pipelines to validate ADF resources and export ARM templates programmatically. YAML - name: Validate ADF resources run: | set -euo pipefail FACTORY_ID="/subscriptions/${{ env.SUBSCRIPTION_ID }/resourceGroups/${{ env.RESOURCE_GROUP }/providers/Microsoft.DataFactory/factories/${{ env.SOURCE_FACTORY_NAME }" npm run build validate "${{ github.workspace }" "$FACTORY_ID" YAML - name: Export ARM templates (CI publish) run: | set -euo pipefail FACTORY_ID="/subscriptions/${{ env.SUBSCRIPTION_ID }/resourceGroups/${{ env.RESOURCE_GROUP }/providers/Microsoft.DataFactory/factories/${{ env.DEV_FACTORY_NAME }" npm run build export "${{ github.workspace }" "$FACTORY_ID" "${{ env.ARM_OUTPUT_DIR }" Whether you click Publish manually or use npm export in CI/CD, the outcome is the same: Full factory deploymentNo control over individual featuresAll changes get bundled together Selective Deployment Layer (Core Design) We can address this requirement and the associated challenges by introducing a workflow driven by a manifest to define the deployment scope, and a program to identify all necessary ADF dependencies for each manifest file. As a developer, I can now control which release is promoted to production, without worrying about releasing any other features that are not ready. The manifest controls which pipelines to deploy and which optional categories to include. Below is an example of a manifest file JSON { "pipelines": ["pl_ingest_population_selective"], "includeTriggers": false, "includeIntegrationRuntimes": false, "includeAllGlobalParameters": true, "includeLinkedServices": true, "validateLinkedServicesExist": true, "includeManagedVirtualNetwork": false, "includeManagedPrivateEndpoints": false } Workflow Explanation Let's understand the crux of the selective deployment workflow now. I am working in the release branch on my feature branch directly in ADF Studio. Since ADF Studio is integrated with Git, my development changes will be saved to my branch. Here are the steps I can take to promote my change to a higher environment. 1) Validation of ADF on PR validation This is an early validation step and a guardrail: if the PR fails, it's because objects are invalid and misaligned. This is equivalent to the "validation all" button in the ADF ui, here is this workflow Trigger: Pull requests targeting the branch selective_deployment. Purpose: Validate that the ADF JSON in the PR is valid in the context of the target factory. Main steps: CheckoutSet up Node.js 20npm installAzure login using OIDC (azure/login@v2)Validate with ADF Utilities: YAML FACTORY_ID="/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/${AZURE_RESOURCE_GROUP}/providers/Microsoft.DataFactory/factories/${DEV_FACTORY_NAME}" npm run build validate "$GITHUB_WORKSPACE" "$FACTORY_ID" 2) Release build + selective deploy to DEV adf-release-build-selective-deploy.yml Triggers: Push to selective_deploymentManual run (workflow_dispatch) with optional manifest inputDefault: deploy/manifests/release.json This workflow has two jobs: Job A: adf-build (staging + export + sanitize + artifacts) Checkout (full history)Azure login using OIDCSet up Node.js 20Install build dependencies inside build/ (npm install in build)Stage selective subset python scripts/select_adf_subset.py <manifest>, a code snippet below for the complete script, refer to the GitHub repository link given Python import json import re import shutil import sys from pathlib import Path from typing import Dict, Set, Tuple, List from collections import defaultdict # Your repo layout has pipeline/, dataset/, linkedService/ at ROOT. REPO_ROOT = Path(".") STAGE_ROOT = Path("build/adf_subset") RESOURCE_DIRS = { "pipeline": REPO_ROOT / "pipeline", "dataset": REPO_ROOT / "dataset", "linkedService": REPO_ROOT / "linkedService", "dataflow": REPO_ROOT / "dataflow", "trigger": REPO_ROOT / "trigger", "integrationRuntime": REPO_ROOT / "integrationRuntime", "credential": REPO_ROOT / "credential", "managedVirtualNetwork": REPO_ROOT / "managedVirtualNetwork", } # Copy these if present so ADF utilities behave the same on staged subset. ROOT_FILES_TO_COPY = [ "publish_config.json", "arm-template-parameters-definition.json", "arm_template_parameters-definition.json", "package.json", "package-lock.json", ] Produces: build/adf_subset/ (staged tree)build/adf_subset_report.json (dependency report)Refer to logs below (showing output of stage selective subset and debug to view output generated after select_adf_subset.py )Export ARM templates from the staged subset via ADF Utilities: npm --prefix build run build -- export "adf_subset" "$FACTORY_ID" "ArmTemplate"Produces: build/ArmTemplate/ARMTemplateForFactory.jsonbuild/ArmTemplate/ARMTemplateParametersForFactory.jsonStrip infra-owned resources scripts/strip_arm_resources.py to produce a safe template: build/ArmTemplate/ARMTemplateForFactory.safe.json⚠️ Note on Infrastructure Components (Refer to the “Future Work & Next Steps” section for follow-up topics in this series) The step above intentionally strips infrastructure-dependent components from the generated subset to avoid overwriting existing shared resources such as linked services. This implementation focuses on developer-owned artifacts (pipelines, datasets, and triggers) and assumes that infrastructure components — such as Integration Runtimes, managed private endpoints, and linked services — are pre-provisioned and managed outside of this deployment workflow.Upload artifacts: ARM templates (adf-arm)metadata (adf-release-meta)subset report (adf-subset-report) Job B: deploy_dev (deploy safe template) Download ARM artifactAzure login using OIDCEnsure az Data Factory extension is installedValidate JSON files exist/parseDeploy via azure/arm-deploy@v2(Incremental) to DEV RG/factory: Template: ARMTemplateForFactory.safe.jsonParameters: ARMTemplateParametersForFactory.json + factoryName=<DEV_FACTORY_NAME> Lesson Learned Setting up selective deployment in ADF was more than a technical task. It made us rethink our approach to deployments, ownership, and CI/CD design. Here are the main things we learned: 1. The Problem Is Not Tooling; It’s Deployment Granularity At first, we thought the limitation came from the tools we used, like UI publish or npm export. However, both methods yielded the same result: full factory templates. The real problem was that we couldn’t control the scope of deployments, not how the templates were made. 2. Dependency Awareness Is Critical Selective deployment only works when every dependency is found and included. We learned that: Pipelines often reference multiple datasets and linked services. Missing even one dependency results in deployment failure You must automate dependency discovery. 3. “Incremental” Is Often Misunderstood Incremental deployment is important, but it doesn’t work like a patch. It reapplies the full configuration for all included resources. This means: Your generated templates need to be complete for all the artifacts you include. If you use partial definitions, deployments can fail. 4. Separation of Concerns Is Key Not all ADF artifacts are the same. We began to separate them into different groups: Application-owned artifacts: pipelines, datasets, triggers Infrastructure-owned artifacts: linked service, managed virtual networks, managed private endpoints, and integration-runtime, among others. This separation proved crucial for safe, scalable deployments. 5. Selective Deployment Adds Complexity, But It’s Worth It It’s true that implementing this approach brings in additional scripts, manifest management, and CI/CD complexity. But in exchange, we gained precise control over releases, reduced production risk, and faster hotfix deployments. Future Work and Next Steps While selective deployment solved a major gap in ADF CI/CD, it also opened up new areas for improvement and standardization. 1. Defining Infrastructure vs Application Ownership One of the biggest follow-up areas is clearly defining ownership boundaries. In our experience: Application teams should own pipelines, datasets, and triggers Platform or infrastructure teams should own linked services, managed virtual networks, and managed private endpoints, among other things. Future work can focus on: Enforcing this separation in CI/CD. Preventing accidental deployment of infrastructure components Integrating Terraform or platform pipelines for infrastructure provisioning 2. Governance Around Linked Services Linked services are often shared across multiple pipelines and teams. Future improvements include: Centralizing linked service management Using Key Vault and Managed Identity consistently Preventing direct modifications through application pipelines More
A Tool Is Not a Platform (And Your Team Knows the Difference)

A Tool Is Not a Platform (And Your Team Knows the Difference)

By Jeleel Muibi
Most infrastructure teams have a moment where someone says “we should build a platform.” The motivation is real: teams are duplicating work, the current setup is hard to use consistently, and a more structured approach would help. A few months later, the platform is a Terraform module collection, a GitLab CI template, a shared repository of scripts, and a README that several people have tried to keep current. That is a useful thing. It is not a platform. The distinction is worth being clear about, not to dismiss the work, but because the word “platform” creates expectations. When internal teams hear “we have a platform,” they assume stability, a usable interface, a versioning model, and some mechanism for raising problems when things break. A toolchain with documentation does not deliver those things by default. What Makes Something a Platform A platform is defined by its contract, not its technology. The contract describes what the consumer can expect: what they call, what parameters they provide, what outputs they receive, and what stability guarantees apply to that interface. A Terraform module with a published interface is closer to a platform primitive than a pipeline that provisions the same resources through environment variables, undocumented flags, and positional arguments. The module has a contract. The pipeline has a process. The contract does not have to be formal. It needs three things. A stable surface. Consumers should be able to call the same interface next month and receive the same type of result. Internal changes to how it works do not break consumers.A versioning model. When the interface changes, that change is communicated, and consumers are not silently broken. A git tag is enough to start with. Semantic versioning is better.A feedback path. Consumers can report when the contract is violated or the interface does not behave as documented. Someone is responsible for responding. A Terraform module with these three properties is a platform primitive. A set of modules with a shared versioning model, a stable registry entry, and a team responsible for maintaining the contract is starting to look like a platform. What Teams Actually Experience The gap between a toolchain and a platform shows up in how teams actually use it. With a toolchain, onboarding a new team means pointing them at the repository and telling them to read the README. Anything not in the README requires asking someone who has been around for a while. Changes to the toolchain break existing consumers silently because there is no versioning model. The team that maintains the toolchain treats every consumer as having kept up with the latest state of the repository. With a platform, onboarding means pointing teams at interface documentation with a working example. Changes go through a version increment. Consuming teams that pin to a version are not broken by changes they did not ask for. Plain Text # Consuming a module with a pinned version module "vm" { source = "registry.example.com/hybridops/vm/proxmox" version = "~> 2.1" name = "web-01" cores = 2 memory = 4096 } This looks like a small detail. For teams consuming infrastructure modules across a growing estate, it is the difference between a managed dependency and a shared folder everyone is afraid to touch. When a Toolchain Is the Right Call Not every infrastructure system needs to be a platform. A toolchain is appropriate when the team is small and holds the full mental model, the surface area is limited, and the rate of change is low enough that everyone stays current without a formal versioning model. When those conditions hold, the overhead of maintaining a platform contract is not justified. The problem is not having a toolchain. The problem is calling it a platform when it is not, and then finding that the expectations it created are not being met. Teams told they have a stable platform, then hit with a broken workflow from an unannounced change, lose confidence quickly. That confidence is hard to rebuild. HybridOps has been working in this space: publishing Terraform modules to a registry, versioning releases, and treating module interfaces as contracts. It is not a finished platform. It is a direction, and being explicit about that direction changes how the work gets done. A Simple Test If a consuming team pins to the current version of your toolchain today, will it still work in three months without any changes on their side? If you cannot answer yes with confidence, you have a toolchain, not a platform. Both are useful. Only one creates the kind of trust that makes a growing engineering organisation move faster rather than slower. Knowing which one you have is the first step toward building the right one. More
Code and Connect: MCP + MuleSoft
Code and Connect: MCP + MuleSoft
By Ajay Singh
Reducing Alert Fatigue in the SOC Using Correlation Rules and Detection-as-Code
Reducing Alert Fatigue in the SOC Using Correlation Rules and Detection-as-Code
By Krishnaveni Musku
Architectural Cost of Rust's Orphan Rule
Architectural Cost of Rust's Orphan Rule
By Krun Dev
Implementing Asynchronous Communication Between Microservices Using Kafka and Spring Boot
Implementing Asynchronous Communication Between Microservices Using Kafka and Spring Boot

In a microservices system, that tight coupling turns a small hiccup into a cascading slowdown. Thread pools fill, retries amplify traffic, and suddenly your simple request is blocked on half the fleet. My executive summary: asynchronous messaging with Kafka helps systems keep moving when individual components inevitably slow down or fail. It does this by decoupling producers from consumers, absorbing traffic spikes, and allowing services to evolve without tying their availability directly to one another. Code Patterns in Spring Boot With Kafka Spring for Apache Kafka gives me two primitives that feel pleasantly old Spring KafkaTemplate for sending and @KafkaListener for receiving. That template/listener model is intentionally similar to other Spring integration tech, which keeps application code focused on domain logic instead of raw client plumbing. Below is a compact (but production-shaped) pattern: externalized config via @ConfigurationProperties, a service port for publishing, a REST command endpoint, a consumer with a real error strategy (DLT), and a REST error advice. Java // === Messaging config (externalized, type-safe) === @ConfigurationProperties(prefix = "messaging.orders") @Validated record OrdersMessagingProps( @NotBlank String topic, @NotBlank String dltTopic ) {} // === DTO (event contract) === public record OrderCreatedEvent(UUID orderId, UUID userId, BigDecimal total, Instant createdAt) {} // === Service port (keeps domain testable, Kafka swappable) === public interface OrderEventPublisher { void publishOrderCreated(OrderCreatedEvent event); } // === Adapter: Kafka producer === @Component class KafkaOrderEventPublisher implements OrderEventPublisher { private final KafkaTemplate<String, OrderCreatedEvent> template; private final OrdersMessagingProps props; KafkaOrderEventPublisher(KafkaTemplate<String, OrderCreatedEvent> template, OrdersMessagingProps props) { this.template = template; this.props = props; } @Override public void publishOrderCreated(OrderCreatedEvent event) { // Keying by orderId keeps per-order ordering and drives partitioning decisions. template.send(props.topic(), event.orderId().toString(), event); } } // === REST command API (synchronous edge, async core) === @RestController @RequestMapping("/v1/orders") class OrdersController { private final OrderService orderService; // domain port OrdersController(OrderService orderService) { this.orderService = orderService; } @PostMapping public ResponseEntity<Map<String, Object>> create(@Valid @RequestBody CreateOrderRequest req) { UUID orderId = orderService.create(req.userId(), req.total()); // persists + publishes event return ResponseEntity.accepted().body(Map.of("orderId", orderId, "status", "ACCEPTED")); } record CreateOrderRequest(@NotNull UUID userId, @NotNull @Positive BigDecimal total) {} } // === Domain service port (implementation can use outbox, transactions, etc.) === public interface OrderService { UUID create(UUID userId, BigDecimal total); } // === Consumer: downstream service reacts to events === @Component class BillingListener { @KafkaListener(topics = "${messaging.orders.topic}", groupId = "${spring.kafka.consumer.group-id}") void onOrderCreated(OrderCreatedEvent event) { // Idempotency belongs here: process-by-key + store processed eventId/orderId to avoid duplicates. // Do work (charge card, create invoice, etc.) } } // === Kafka consumer error handling: retries + DLT === @Configuration class KafkaErrorHandlingConfig { @Bean DefaultErrorHandler defaultErrorHandler(KafkaTemplate<Object, Object> template, OrdersMessagingProps props) { var recoverer = new DeadLetterPublishingRecoverer(template, (rec, ex) -> new TopicPartition(props.dltTopic(), rec.partition())); // Backoff and retry policy are configurable; keep it finite to avoid poison-pill loops. return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 3)); } } // === REST error handling (ProblemDetail) === @RestControllerAdvice class ApiErrors { @ExceptionHandler(IllegalArgumentException.class) @ResponseStatus(HttpStatus.BAD_REQUEST) ProblemDetail badRequest(IllegalArgumentException ex) { var pd = ProblemDetail.forStatusAndDetail(HttpStatus.BAD_REQUEST, ex.getMessage()); pd.setTitle("Invalid request"); return pd; } } A few been-burned-before notes on the code above. Spring Kafka’s reference docs are explicit that KafkaTemplate is the convenience wrapper for producing, and DefaultErrorHandler + DeadLetterPublishingRecoverer is a first-class way to route failed records to dead-letter topics after retries. If we want non-blocking retries, Spring Kafka also provides @RetryableTopic, which orchestrates retry topics and a DLT automatically useful when transient failures are common and you want predictable retry delay semantics. Containers and Local Dev With Docker Compose When I’m chasing down event flow bugs, I like local environments that feel like the old days: one command, deterministic startup order, and no mystery dependencies. Docker Compose is still the quickest way to stand up Kafka alongside your services, and Confluent publishes straightforward Docker-based tutorials and compose examples for running Kafka locally. For the service image itself, multi-stage builds are the modern classic compile in a builder stage, and copy the artifact into a slimmer runtime stage. Docker documents multi-stage builds as a way to reduce the final image contents and keep build dependencies out of production. Dockerfile # Multi-stage Dockerfile for a Spring Boot service (orders-service) FROM eclipse-temurin:21-jdk AS build WORKDIR /workspace COPY mvnw pom.xml ./ COPY .mvn .mvn RUN ./mvnw -q -DskipTests dependency:go-offline COPY src src RUN ./mvnw -q -DskipTests package FROM eclipse-temurin:21-jre WORKDIR /app COPY --from=build /workspace/target/*.jar app.jar EXPOSE 8080 ENTRYPOINT ["java","-jar","/app/app.jar"] And here’s a Compose file that wires up Kafka and Schema Registry, plus an example Spring Boot service. The exact image choices are illustrative. Your production choices are unspecified and should reflect your standards and security posture. YAML # compose.yaml (local/dev) services: zookeeper: image: confluentinc/cp-zookeeper:7.6.0 environment: ZOOKEEPER_CLIENT_PORT: 2181 kafka: image: confluentinc/cp-kafka:7.6.0 depends_on: [zookeeper] ports: ["9092:9092"] environment: KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181 KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:9092 KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 schema-registry: image: confluentinc/cp-schema-registry:7.6.0 depends_on: [kafka] ports: ["8081:8081"] environment: SCHEMA_REGISTRY_HOST_NAME: schema-registry SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://kafka:9092 orders: build: ./orders-service depends_on: [kafka] ports: ["8080:8080"] environment: SPRING_KAFKA_BOOTSTRAP_SERVERS: kafka:9092 MESSAGING_ORDERS_TOPIC: orders.events MESSAGING_ORDERS_DLTTOPIC: orders.events.dlt SCHEMA_REGISTRY_URL: http://schema-registry:8081 Deploying on Kubernetes or AWS On AWS, the Kafka decision is usually managed or self-managed. If you choose Amazon MSK, the cluster lives in your VPC, pick subnets across distinct Availability Zones, and connect clients using the cluster’s bootstrap brokers. That’s the networking baseline, and it’s not optional. MSK is VPC-first by design. For authentication/authorization, MSK supports IAM access control. AWS documents the client configuration for IAM mechanisms. In EKS, I typically pair MSK IAM with IRSA so pods can obtain AWS credentials the AWS way, while ECS services would use task roles instead. Both patterns are documented by AWS, and your choice here is unspecified. Kubernetes service discovery is usually the easy part. Services and Pods get DNS names so workloads can call each other by name rather than IP. Kafka itself is reached via bootstrap broker endpoints or via internal Services, but either way, you want the strings in externalized config, not hardcoded. Here’s a minimal Kubernetes Deployment/Service for a Kafka client service. Values like region, account IDs, and MSK endpoints are unspecified placeholders. YAML apiVersion: apps/v1 kind: Deployment metadata: name: orders namespace: apps spec: replicas: 2 selector: matchLabels: { app: orders } template: metadata: labels: { app: orders } spec: serviceAccountName: orders-sa # IRSA-bound (role ARN unspecified) containers: - name: orders image: <UNSPECIFIED_AWS_ACCOUNT_ID>.dkr.ecr.<UNSPECIFIED_REGION>.amazonaws.com/orders:<TAG> ports: [{ containerPort: 8080 }] env: - name: SPRING_KAFKA_BOOTSTRAP_SERVERS value: "<UNSPECIFIED_MSK_BOOTSTRAP_BROKERS>" - name: MESSAGING_ORDERS_TOPIC value: "orders.events" - name: MESSAGING_ORDERS_DLTTOPIC value: "orders.events.dlt" readinessProbe: httpGet: { path: /actuator/health/readiness, port: 8080 } initialDelaySeconds: 10 --- apiVersion: v1 kind: Service metadata: name: orders namespace: apps spec: selector: { app: orders } ports: - port: 80 targetPort: 8080 Operationally, MSK exposes metrics into CloudWatch (AWS/Kafka), and broker logs can be delivered to CloudWatch Logs (or S3/Firehose). That combination gives you the classic visibility loop: throughput, lag, under-replicated partitions, and error logs without running your own monitoring plane. For distributed tracing in async flows, OpenTelemetry is my default vocabulary now. Spring Boot supports OpenTelemetry export via OTLP, and OpenTelemetry defines Kafka semantic conventions so your producer/consumer spans and attributes stay consistent across tools. CI/CD and the Hard-Earned Field Notes For CI/CD, I keep it boring: build once, push an immutable image, deploy via a declarative mechanism. AWS Prescriptive Guidance provides a clear GitHub Actions pattern for building Docker images and pushing to Amazon ECR, which is a solid baseline when your region/account is unspecified until configured. YAML # .github/workflows/orders.yml name: orders on: push: branches: ["main"] jobs: build_push_deploy: runs-on: ubuntu-latest permissions: id-token: write contents: read steps: - uses: actions/checkout@v4 - uses: actions/setup-java@v4 with: distribution: temurin java-version: "21" - name: Build & test run: ./mvnw -q test package - name: Configure AWS credentials (OIDC) uses: aws-actions/configure-aws-credentials@v4 with: role-to-assume: arn:aws:iam::<UNSPECIFIED_AWS_ACCOUNT_ID>:role/<UNSPECIFIED_GHA_ROLE> aws-region: <UNSPECIFIED_REGION> - name: Login to ECR run: | aws ecr get-login-password --region <UNSPECIFIED_REGION> \ | docker login --username AWS --password-stdin <UNSPECIFIED_AWS_ACCOUNT_ID>.dkr.ecr.<UNSPECIFIED_REGION>.amazonaws.com - name: Build & push image run: | IMAGE=<UNSPECIFIED_AWS_ACCOUNT_ID>.dkr.ecr.<UNSPECIFIED_REGION>.amazonaws.com/orders:${{ github.sha } docker build -t $IMAGE ./orders-service docker push $IMAGE - name: Deploy to EKS (example) run: | aws eks update-kubeconfig --name <UNSPECIFIED_EKS_CLUSTER> --region <UNSPECIFIED_REGION> kubectl -n apps set image deploy/orders orders=$IMAGE Now, the part I wish someone had handed me in 2016: Kafka gives you strong tools, but it does not remove distributed-systems truths. You still need safeguards on the consumer side: idempotent processing, disciplined schema management, and clearly defined retry and dead-letter topic behavior. Kafka’s documentation is careful about the limits of “exactly once” guarantees. Idempotent producers and transactions can strengthen delivery semantics, but achieving true end-to-end exactly-once behavior, especially when external side effects are involved, still depends on deliberate system design. For schema governance, Kafka itself doesn’t ship a schema registry, but acknowledges third-party registries; in practice, Confluent Schema Registry and Apicurio Registry are common choices. Both store schemas out-of-band, so messages carry only a schema identifier, and both support evolvable contracts across Avro/JSON Schema/Protobuf depending on your ecosystem. Conclusion and Best Practices If you take one lesson from my legacy brain into modern event-driven systems, let it be this: asynchrony is a reliability feature, not a performance trick. Kafka’s durable log and consumer group model decouples uptime and absorbs spikes, but you only get the real benefit when you treat schemas as contracts, consumers as idempotent processors, and failure handling as first-class application behavior. On AWS, the operational baseline is non-negotiable. MSK lives in your VPC across AZ subnets, clients connect via bootstrap brokers, IAM auth is configured explicitly, and observability lives in CloudWatch. Do those fundamentals early, and Kafka stops feeling like a mysterious black box and starts feeling like the dependable workhorse it was built to be.

By Mallikharjuna Manepalli
Architectural Collapse: How Extension Poisoning, Node Vulnerabilities, and Infrastructure Fog Enabled the GitHub Repository Breach
Architectural Collapse: How Extension Poisoning, Node Vulnerabilities, and Infrastructure Fog Enabled the GitHub Repository Breach

Enterprise perimeter defenses are fundamentally built on an obsolete assumption that the developer’s workstation is a secure, trusted anchor point. The massive security breach executed by the threat group TeamPCP, resulting in the exfiltration of 3,800 internal GitHub source code repositories, completely shattered this illusion. This was not a standalone exploit. It was a multi-vector convergence where vulnerabilities in the Node/NPM ecosystem, the systemic ungoverned architecture of the Visual Studio Code Marketplace, and the tactical “fog of war” caused by a period of historic GitHub infrastructure instability came together to create the perfect attack. Phase 1: The Root Exploitation (Node/NPM and the TanStack Supply Chain Pivot) The kill chain did not begin at GitHub; it originated deep within the modern JavaScript developer tool-chain. TeamPCP executed a localized supply chain compromise targeting upstream open-source utilities, specifically targeting contributors to TanStack npm packages (a widely relied-upon suite for state management and routing). [TanStack NPM Compromise] -> [Stolen ‘gh’ CLI Tokens] -> [Nx Console Pipeline Hijack (No Multi-Admin Approval)] -> [Malicious Extension Version 18.95.0 Published] By injecting malicious code into these highly trusted downstream dependencies, the attackers performed targeted local credential harvesting. Their primary target was not production application code, but the development environments of the maintainers themselves. The exploit successfully extracted long-lived GitHub CLI (`gh`) authentication tokens from a legitimate core developer who maintained both TanStack and the Nx Console ecosystem. Because these developer access tokens lacked granular scoping restrictions, they provided direct administrative write access to the main release pipelines of secondary repositories. Phase 2: VS Code Extension Poisoning (The Nx Console Triage (CVE-2026–48027)) Armed with the stolen gh tokens, TeamPCP bypassed standard perimeter security by pivoting to the Visual Studio Marketplace. On May 18, version 18.95.0 of Nx Console (a heavily utilized Monorepo orchestration extension with over 2.2 million installs) was maliciously built and uploaded. The deployment revealed two fatal flaws within modern developer workflows, 1. The Single-Factor Release Pipeline The malicious version was uploaded directly to both the Visual Studio Marketplace and the open-source OpenVSX registry. Because the Nx Console publishing architecture lacked a “two-admin manual validation mandate” for automated releases, the publishing pipeline accepted the stolen developer token at face value without triggering a secondary verification gate. 2. The “Silent Killer”: Marketplace Metrics vs. Background Sync Microsoft’s public marketplace logs initially registered a negligible 28 manual downloads before the package was identified and yanked 18 minutes later. However, Nx’s internal telemetry revealed that ~6,000 extension activations occurred simultaneously. This massive discrepancy highlights the danger of VS Code’s background auto-update synchronization. Thousands of developer environments pulled down, unzipped, and executed the malicious version automatically while the developers’ IDEs were running in the background. JSON // Example of the target parameters within compromised developer workspaces { "extensions.autoUpdate": true, // The default vulnerability exploited by TeamPCP "terminal.integrated.profiles.osx": { "malicious-hook": { "path": "/bin/bash", "args": ["-c", "python3 ~/.local/share/kitty/cat.py &"] } } } The Node Execution Layer Upon extension activation, the poisoned payload immediately dropped an obfuscated Node.js post-install hook. Operating completely within user space to evade basic Endpoint Detection and Response (EDR) behavioral hooks, it set an environmental marker (`__DAEMONIZED=1`) and spawned a background Python process (`cat.py`). The malware systemically scanned local paths for, Infrastructure configuration: Plaintext HashiCorp Vault tokens (`~/.vault-token`), local Kubernetes kubeconfig files, and AWS/Azure IAM metadata endpoint hashes.Ecosystem identity: Plaintext .npmrc registry tokens and active GitHub tokens (`ghp_`, gho_, ghs_).Active memory subsystems: Contents of 1Password vaults via the op CLI by hijacking active, unlocked terminal sessions. Phase 3: The Climax (The Internal GitHub Breach) The payload achieved its ultimate goal when an internal GitHub software engineer, who utilized Nx Console for local workflow management, had their workstation pull down the background update. The malware executed on the engineer’s local machine, scraped their active internal enterprise session tokens, and exfiltrated them to a remote command-and-control (C2) server. TeamPCP then used these highly privileged internal access credentials to bypass GitHub’s corporate identity perimeters entirely. Because internal repository boundaries operate on flat network access structures once an authenticated developer endpoint is cleared, the threat actors systematically cloned and exfiltrated 3,800 proprietary internal GitHub source code repositories before the endpoint could be isolated. Phase 4: The Tactical Fog of War (GitHub’s Infrastructure Instability) The velocity and stealth of this attack were significantly aided by an ongoing reliability crisis within GitHub’s core infrastructure. During the 12-month window surrounding the breach, GitHub recorded a massive spike in service degradation. Total tracked incidents: 257 distinct technical incidents.Major outages: 48 major service shutdowns, totaling 112 hours and 18 minutes of total downtime.Primary failure vector: GitHub Actions experienced 57 outages, three times the incidence rate of core Git storage operations.On October 29: Outage in compute dependency (Microsoft Azure). 90% error rate across Codespaces/ Actions affected Telemetry gaps, security monitoring systems failed to track cross-border API token replication.On February 2: Configuration failure in user settings caching, cascading failures in the Git HTTPS proxy affected High volume of synchronous cache writes generated a deluge of network errors, masking anomalous Git clone calls.On February 12: Authorization claim changes in core networking dependencies, 90% Codespace provisioning failure affected Security alerts failed to populate due to misclassified severity ratings, delaying incident detection by hours. The Alert Fatigue Vulnerability This constant infrastructural noise created an ideal tactical environment for the attackers. SecOps and DevSecOps teams were caught in a continuous state of alert fatigue, dealing with broken GitHub Actions pipelines, Elasticsearch cluster degradation, and database timeouts. When TeamPCP’s automated scripts began running rapid API queries and pulling massive volumes of repository data using the compromised engineer’s token, the unusual spikes in data transfer blended into the background noise of an infrastructure already struggling with systemic capacity and networking failures. Hard Takeaways: How Developers Must Harden Their Environments If your environment was active or using automated tools during this period, you must shift your development machine from an implied trust zone to a completely zero-trust environment. 1. Kill IDE Auto-Updates Globally Never allow your IDE to pull unvetted code in the background. Explicitly configure your editor to require manual permission before updating any third-party extension. In VS Code’s global settings.json, enforce: Plain Text "extensions.autoUpdate": false, "extensions.autoCheckUpdates": false 2. Isolate Development Runtimes Stop running compilers, package managers, and complex IDE extensions directly on your bare-metal operating system. Utilize isolated, ephemeral development environments (e.g., containerized Dev containers or heavily scoped virtual environments) where local file-system access is completely decoupled from your master ~/.ssh/, ~/.aws/, or .npmrc configuration folders. 3. Implement Strict Token Volatility Eliminate long-lived personal access tokens (`ghp_`). Switch entirely to fine-grained personal access tokens configured with strict, single-repository scope constraints and a maximum 7-day expiration date. Explicitly configure your local password managers and authentication tools to require biological verification (e.g., TouchID/FaceID) or short timeout windows for every individual call executed via the terminal command line (`op signin timeout`).

By Akash Lomas
I Built a VS Code Extension to Debug Azure AI Foundry Agents Without Leaving My Editor
I Built a VS Code Extension to Debug Azure AI Foundry Agents Without Leaving My Editor

The Problem Azure AI Foundry has a genuinely great portal. You can see your agent runs, the tools it calls, the messages it sends and receives, and even a breakdown of token usage — all in a clean UI. But here's what actually happens when you're building an agent locally: Write some code, trigger a runSwitch to the browser, open the Foundry portalNavigate to your project → your agent → Traces tabFind the right runClick through to see what happenedSwitch back to VS Code to make a fixRepeat That context switch sounds minor. But when you're iterating fast — tweaking a system prompt, adjusting tool call logic, debugging why an agent handed off to the wrong sub-agent — it adds up. You're constantly pulling your attention out of your editor and into the browser and back again. What I wanted was simple: see the trace right where I'm working. What Foundry Trace Inspector Does The extension connects to your Azure AI Foundry project and gives you three views for every agent run, all inside a VS Code panel: Trajectories: The Full Span Tree A Gantt-style collapsible tree showing the full execution: Session → Invoke Agent → Chat turns → Tool calls. Every span shows duration, token counts, and cost. Click any span to open a detail drawer with the model, status, token breakdown, and raw input/output. Duration Per-span timing bars — see exactly how long each step took. Tokens Input vs output token breakdown per span. This is the view I use most during debugging. At a glance, I can see: did the tool call happen? How long did it take? What did the LLM actually receive as input? User View: Readable Conversation Replay A chat-bubble timeline of the full conversation: user messages and assistant replies rendered the way a human reads them, with the agent name and model on each assistant turn. Each assistant bubble has a "View Trace" button that jumps directly to the corresponding response in the sidebar — so you can go from "something looked off in this reply" to the raw span in one click. Token and Cost Chart A stacked bar chart (input vs output tokens per LLM turn) so you can instantly spot which turns are burning the most tokens — useful when you're trying to understand why a multi-turn conversation is getting expensive. Per span cost breakdown for both input and output tokens consumed. How It Works Under the Hood Azure AI Foundry agents use the OpenAI Responses API internally. Every agent reply produces a resp_... response ID that's visible in the Foundry portal's Traces tab. The extension fetches those responses directly via the same API and reconstructs the full conversation timeline locally. When a session spans multiple turns, each response links to the previous one via previous_response_id. Load any response in the chain and the extension walks the chain automatically — you don't need to manually track down every ID. Conversation IDs (conv_...) are discovered automatically from your saved responses, so once you track one response, the whole conversation surfaces. No intermediate server. The extension makes API calls only to the Azure endpoint you configure. Your API key is stored in VS Code's encrypted SecretStorage — it never touches settings.json and never leaves your machine. Setting It Up You need two things: An Azure AI Foundry project endpoint URL (found in the Foundry portal under your project → Overview)Either an API key or Azure CLI auth (az login) via DefaultAzureCredential Once configured, grab a conv_... conversation ID from the portal's Traces tab, paste it into the sidebar, and the extension fetches all responses in that conversation automatically. What's Next A few things I want to add in v0.2: Auto-discovery of recent runs – instead of pasting IDs manually, list recent conversations directly from the panelSide-by-side diff – compare two runs of the same agent to see what changed between runsExport to Markdown – generate a readable trace report you can paste into a PR or incident note Further Reading What is Foundry Agent Service? – official overview of the service this extension connects toUse the Azure OpenAI Responses API – the underlying API the extension fetches trace data fromMicrosoft Foundry Pricing – understand what your agents actually cost to runVS Code Webview API – how the timeline panel is builtVS Code Extension API – full reference if you want to contribute or build on top of this

By Jubin Abhishek Soni DZone Core CORE
Foxit MCP Server: Give AI Agents Direct Access to 30+ PDF Tools via Model Context Protocol
Foxit MCP Server: Give AI Agents Direct Access to 30+ PDF Tools via Model Context Protocol

Wiring a document automation agent directly to REST endpoints forces you to repeat the same plumbing for every operation: push a file up, poll until the task finishes, pull the result down, catch failures, and juggle auth tokens across several services. With PDFs, that cycle runs again for each conversion, OCR pass, or merge in your pipeline. The Foxit PDF API MCP Server replaces all of that with 30+ tools an agent can invoke directly, while the MCP Server absorbs the upstream REST mechanics behind the scenes. This article walks through registering the server, the full tool catalog it advertises, how Foxit’s eSign and DocGen REST APIs carry the same agent session forward into signing and document generation, and a concrete four-step workflow you can reproduce with your own files. MCP Architecture in 90 Seconds The MCP specification splits responsibility across three roles. The Host is the LLM runtime, such as Claude Desktop, VS Code with GitHub Copilot, or Cursor, which owns the conversation and chooses when a tool should run. The Server is the capability provider, a process that publishes tools over the MCP protocol and runs them against an underlying service. Tools are the individual operations a server makes callable, each described by a JSON schema so the host knows what goes in and what comes out. Foxit sits on both ends of this picture. Foxit PDF Editor ships as an MCP Host, the first PDF application to take that role, reaching outward to external MCP servers such as Gmail or Salesforce so its built-in AI assistant can use those services. The Foxit PDF API MCP Server points the other way, publishing Foxit’s cloud PDF Services API as 30+ tools that any MCP Host can invoke. The operations the MCP Server surfaces span format conversion, content extraction, OCR, merge, split, compress, flatten, linearize, compare, watermark, form data import/export, security, and property inspection. Foxit’s eSign API and DocGen API sit outside the MCP Server as independent REST services, which means they never appear as MCP tools. An agent workflow can still call them within the same session, just through the agent’s own code-execution layer instead of the MCP protocol itself, a difference the eSign section unpacks fully. PDF processing belongs to the MCP tools; signing and template generation belong to code the agent executes. Prerequisites and Configuration Three things need to be in place before you register the server: A Foxit developer account to obtain a client_id and client_secret (the free plan at developer-api.foxit.com needs no credit card)Python 3.11+ alongside the uv package manager, or Node.js 18+ with pnpm if you prefer the TypeScript versionAny MCP-compatible host, such as Claude Desktop, VS Code, or Cursor Grab the repo from github.com/foxitsoftware/foxit-pdf-api-mcp-server and add it to your host’s MCP configuration. Claude Desktop is the host used in the walkthrough below, but the identical command, args, and env values carry over to any MCP host. In Claude Desktop, open Settings, switch to the Developer tab, and choose Edit Config. Next, open claude_desktop_config.json in any text editor. The file lives at ~/Library/Application Support/Claude/ on macOS or %APPDATA%\Claude\ on Windows. Register the Foxit server beneath the mcpServers key: JSON { "mcpServers": { "foxit-pdf": { "command": "uv", "args": [ "--directory", "/path/to/foxit-pdf-api-mcp-server", "run", "foxit-pdf-api-mcp-server" ], "env": { "FOXIT_CLOUD_API_HOST": "https://na1.fusion.foxit.com/pdf-services", "FOXIT_CLOUD_API_CLIENT_ID": "your_client_id", "FOXIT_CLOUD_API_CLIENT_SECRET": "your_client_secret" } } } } Define FOXIT_CLOUD_API_CLIENT_ID and FOXIT_CLOUD_API_CLIENT_SECRET as system environment variables before the host process starts. Feeding credentials in through prompt context is a security exposure that any production setup should close off. The client_id and client_secret from your developer portal cover authentication for every MCP tool call against the PDF Services API. Bringing eSign into the same agent session means performing its own OAuth2 token exchange (detailed in the next section), so the two credential scopes never mix. Once you save the file, quit Claude Desktop entirely and relaunch it. On startup, it reads the config and spawns the server as a local subprocess communicating over standard input and output, which is the transport the Foxit server speaks. After the restart, the Foxit MCP server should appear as Running under local MCP servers in the Developer tab. Head to the Customize tab, open Connectors, and click foxit-pdf to inspect the tools the Foxit MCP server provides; the full set of 30+ registered tools should be listed there. If the connector never appears, the server failed to launch. Claude’s logs at ~/Library/Logs/Claude/mcp*.log usually reveal why, most often a missing uv binary or an incorrect --directory path. Invoking a tool is as simple as typing a natural-language request like “Convert this Word file to PDF and compress it.” The agent picks pdf_from_word and pdf_compress, and before each call executes, Claude Desktop displays an approval prompt listing the exact tool name and arguments; the tool’s JSON result then streams back into the chat. That per-call approval doubles as your audit point, because it shows precisely which tool the agent selected and the arguments it supplied. To run the server in VS Code instead, place the equivalent entry in .vscode/mcp.json under a top-level servers key, adding a "type": "stdio" field, so VS Code launches the process the same way: JSON { "servers": { "foxit-pdf": { "type": "stdio", "command": "uv", "args": [ "--directory", "/path/to/foxit-pdf-api-mcp-server", "run", "foxit-pdf-api-mcp-server" ], "env": { "FOXIT_CLOUD_API_HOST": "https://na1.fusion.foxit.com/pdf-services", "FOXIT_CLOUD_API_CLIENT_ID": "your_client_id", "FOXIT_CLOUD_API_CLIENT_SECRET": "your_client_secret" } } } } An alternative path is running MCP: Add Server from the Command Palette (Cmd+Shift+P or Ctrl+Shift+P), selecting Command (stdio), then choosing Workspace to store the entry in .vscode/mcp.json or Global to keep it in your user profile. After saving, VS Code displays inline Start, Stop, and Restart actions above the server entry and adds it to the MCP SERVERS - INSTALLED view, where a green indicator and the discovered tool count confirm everything is connected. PDF Services MCP Tools: Full Catalog The 30+ tools fall into seven functional categories. Nearly all of them expect a documentId produced by an earlier upload_document call and hand back a resultDocumentId you can feed to download_document whenever you need the output on disk. The one exception is pdf_from_url, which takes a URL directly. Document Lifecycle upload_document: push a PDF, Office file, image, HTML file, or plain text file to the cloud; returns a documentId used by every later operationdownload_document: pull a processed result down to a local file pathdelete_document: remove stored files from cloud storage when you are done with them PDF Creation (File to PDF) pdf_from_word, pdf_from_excel, pdf_from_ppt: turn Office documents into PDFspdf_from_text, pdf_from_image, pdf_from_html: turn plaintext, image files, or HTML into PDFspdf_from_url: fetch a live URL and render the page as a PDF PDF Conversion (PDF to File) pdf_to_word, pdf_to_excel, pdf_to_ppt: recover editable Office formats from a PDFpdf_to_text, pdf_to_html, pdf_to_image: produce text, HTML, or image representations Manipulation pdf_merge: join multiple PDFs into a single filepdf_split: divide a PDF by page ranges, page count, or one file per pagepdf_extract: lift a subset of pages out of a PDFpdf_compress: shrink file size by 30-70% depending on content typepdf_flatten: bake form fields and annotations into static content (a requirement for compliance archiving workflows)pdf_linearize: prepare a file for Fast Web View so browsers can stream pages as they loadpdf_watermark: stamp text or image watermarks with configurable position, opacity, and rotationpdf_manipulate: rotate, delete, or rearrange pages Analysis pdf_compare: diff two PDFs and produce a color-coded annotation document highlighting the changespdf_ocr: turn scanned or image-based PDFs into searchable text, with multi-language supportpdf_structural_analysis: detect document structure (titles, headings, paragraphs, tables with cell grids, images, form fields, hyperlinks, and metadata) with bounding boxes, following the Foxit PDF structural extraction engine schema. The output is JSON delivered inside a downloadable ZIP rather than a set of named business entities; it describes layout and structure only, and converting that into fields such as party names falls to the agent’s LLM, which performs the semantic extraction over the JSON Security and Forms pdf_protect: lock a document with password protection using 128-bit or 256-bit AES encryption plus granular permission flagspdf_remove_password: lift password protection off a documentexport_pdf_form_data: read form field values out as JSONimport_pdf_form_data: fill form fields from a JSON payload Properties get_pdf_properties: report page count, page dimensions, PDF version, encryption status, digital signature info, embedded files, font inventory, and document metadata In production document pipelines, the operation that gets called most is pdf_from_word. The agent uploads a DOCX, receives a documentId, then invokes pdf_from_word with that ID. Under the hood the PDF Services API performs the conversion asynchronously, but the MCP Server takes care of polling internally and hands the finished result straight back to the agent. MCP tool call: JSON { "name": "pdf_from_word", "input": { "documentId": "doc_abc123" } } MCP tool response: JSON { "success": true, "taskId": "task_xyz789", "resultDocumentId": "doc_result456", "message": "Word document converted to PDF successfully. Download using documentId: doc_result456" } From here, hand doc_result456 to download_document to save the PDF locally, or pipe it straight into the next tool in a chain, such as pdf_structural_analysis or pdf_compress. Extending to eSign: Foxit’s Signing API as a Complementary REST Layer Once the MCP tools finish PDF processing, the workflow’s next stage sends a document out for signature through Foxit’s eSign REST API, hosted at https://na1.foxitesign.foxit.com. Everything in this guide targets the na1 (US) region. Foxit also runs regional eSign hosts for the EU (eu1.foxitesign.foxit.com), Canada (na2.foxitesign.foxit.com), and Australia (au1.foxitesign.foxit.com). Payloads and endpoints stay identical across regions; only the host differs, so select whichever host satisfies your data residency requirements. The eSign API lives outside the Foxit MCP Server, so it is not an MCP tool, and that detail shapes how the agent gets to it. Most MCP hosts have no ability to fire arbitrary HTTP requests themselves, which means eSign is never reached “through MCP.” The agent instead calls eSign from its own code-execution layer, whether that takes the form of a host-provided code interpreter, an agent framework executing Python, or a custom tool you register that wraps the eSign endpoints. The cleanest pattern for production is wrapping the eSign operations you need as custom MCP tools so the host invokes them exactly as it invokes the PDF tools; the production considerations section comes back to this. The code below is what runs inside that layer. Authentication relies on OAuth2 client_credentials. This eSign token exchange is a separate flow from the PDF Services header auth that powers your MCP tools: Python import requests resp = requests.post( "https://na1.foxitesign.foxit.com/api/oauth2/access_token", data={ "client_id": ESIGN_CLIENT_ID, "client_secret": ESIGN_CLIENT_SECRET, "grant_type": "client_credentials", "scope": "read-write" } ) access_token = resp.json()["access_token"] “Folder” is the term the Foxit eSign API developer guide uses throughout its documentation. An automated signing flow centers on these endpoints: POST /api/folders/createfolder: build a signing folder from one or more PDF documents, including signers, subject, and messagePOST /api/folders/sendDraftFolder: send a draft folder out to its signersPOST /api/templates/createtemplate: store a reusable template from a PDF with pre-placed signature fields (later instantiate a folder from it via POST /api/templates/createFolder)GET /api/folders/viewActivityHistory?folderId={id}: fetch the activity audit trail for a folder after it has been sent (a draft that was never shared returns an error)Webhook channels for status callbacks: register a callback URL to get real-time events whenever signers view, sign, or decline A createfolder call accepts the PDF produced by your MCP pipeline, uploaded into eSign’s document storage after download_document fetches it, and configures the signing workflow: POST /api/folders/createfolder Authorization: Bearer {access_token} Content-Type: application/json JSON { "folderName": "Acme Corp Contract - Q3 2025", "sendNow": false, "fileUrls": ["https://your-storage.example.com/acme_contract_final.pdf"], "fileNames": ["acme_contract_final.pdf"], "parties": [ { "firstName": "John", "lastName": "Smith", "emailId": "[email protected]", "permission": "FILL_FIELDS_AND_SIGN", "sequence": 1 } ] } With sendNow at false, the call creates a draft folder you dispatch later through a separate request to /api/folders/sendDraftFolder. Setting sendNow to true instead creates and sends in one step. When a file cannot be reached by URL, include "inputType": "base64" and supply the documents as a base64FileString array in place of fileUrls; leaving out inputType causes the API to reject the base64 payload as empty. Foxit’s eSign API comes with HIPAA, eIDAS, ESIGN Act, UETA, 21 CFR Part 11, FERPA, and FINRA compliance built in. Each audit trail record captures signer location, IP address, recipient identity, event timestamp, consent confirmation, security level, and the complete folder history. If legal defensibility matters in your regulated industry, persist those fields in your own data layer as well, since depending entirely on Foxit’s folder history API for compliance record-keeping leaves a single point of failure in your audit chain. End-to-End Workflow: AI Agent Automates a Sales Contract Imagine a sales ops agent handed one natural language goal, “Generate a contract for Acme Corp, $48,000 ARR, and send it for signature.” No part of the tool sequence is hard-coded. Because the MCP Server advertises its PDF tools to the host at connection time, the agent can interpret the goal, recognize it has a template to render and a document to route for signature, and choose which operations to run and in what order. The PDF steps execute as MCP tool calls, while the DocGen and eSign steps execute from the agent’s code layer. The sequence shown below is one plausible run the agent could produce, not a fixed script assembled ahead of time. The agent starts with MCP tools to get a PDF in hand. It uploads the DOCX contract template through upload_document, gets documentId: "doc_abc" back, and runs pdf_from_word. The MCP Server manages the async conversion internally and reports resultDocumentId: "doc_pdf" when the job finishes. To understand what the PDF contains, the agent runs pdf_structural_analysis against documentId: "doc_pdf". The tool never returns named entities such as “party” or “ARR.” What comes back is a resultDocumentId pointing at a ZIP archive, so the agent fetches it with download_document, unpacks it, and reads the structural JSON describing headings, paragraphs, and table cells along with their positions. Semantic extraction is the job of the agent’s LLM, which reads that structural JSON and lifts “Acme Corp” from a heading or a contract value from a table cell, verifying the fields it needs exist. Structure comes from the tool; meaning comes from the model. If you would rather have an API return business entities directly instead of relying on the model to interpret layout, that capability belongs to Foxit’s iDox.ai Document API, a separate service purpose-built for entity and PII extraction. Holding the field values, the agent produces the finished contract via the DocGen API, posting to /document-generation/api/GenerateDocumentBase64 so the values merge into the template through {{dynamic_tags} syntax. Because DocGen is synchronous, the finalized PDF arrives in the response body with Acme Corp’s name, the $48,000 ARR figure, and the right dates filled in. There is no polling step. The last move is routing the document for signature. The agent authenticates against the eSign OAuth2 endpoint, uploads the DocGen output, builds a signing folder through /api/folders/createfolder with [email protected] as the signer, and sends it via /api/folders/sendDraftFolder. The thread running through all of this is that the model derives the order from the goal rather than following a script. PDF steps resolve to MCP tool calls the host already knows about, while DocGen and eSign steps pass through the agent’s code layer because those APIs are not MCP tools. Each step’s output feeds the next step’s input, and the only orchestration left for you to maintain is whatever exposes that code layer to the model, ideally a set of custom tools rather than ad hoc scripting. Production Considerations: Error Handling, Rate Limits, and Data Governance Calling PDF Services through the MCP Server means async polling stays inside the server process, and your agent only ever sees the final resultDocumentId once the task completes. Calling the raw PDF Services REST API directly is different, since every operation hands back a taskId you must poll yourself. The pattern below uses exponential backoff capped at 10 seconds per interval with a 30-second overall timeout: Python import time, requests API_HOST = "https://na1.fusion.foxit.com/pdf-services" auth_headers = { "client_id": "your_client_id", "client_secret": "your_client_secret" } def poll_task(task_id: str, max_wait: int = 30) -> str: delay = 1 elapsed = 0 while elapsed < max_wait: resp = requests.get( f"{API_HOST}/api/tasks/{task_id}", headers=auth_headers ) data = resp.json() if data["status"] == "COMPLETED": return data["resultDocumentId"] time.sleep(delay) elapsed += delay delay = min(delay * 2, 10) raise TimeoutError(f"Task {task_id} timed out after {max_wait}s") Since eSign and DocGen are not MCP tools, be deliberate about how the agent reaches them. Allowing the model to emit raw HTTP from a free-form code interpreter is fragile and difficult to audit. The sturdier approach is wrapping the specific eSign and DocGen operations you actually use, such as create-folder, send-folder, and generate-document, as custom MCP tools with typed inputs. The host then invokes them over the same protocol it uses for the PDF tools, credentials remain inside the tool process instead of the prompt, and the agent’s decisions surface as inspectable tool calls rather than opaque scripts. The output of pdf_structural_analysis warrants a caution of its own. For a long contract, the structural JSON can contain many thousands of elements, and pushing the whole file into the model can silently exceed its context window, a failure that usually shows up as truncated or confused extraction instead of a clean error. The code that unzips the archive should filter the JSON before the model ever sees it, retaining only the element types and pages that matter (for a contract, typically the heading blocks and the relevant table) instead of forwarding the entire document. The free developer plan at developer-api.foxit.com is sized for development and testing volumes. Production workloads beyond the free-tier threshold call for a volume plan requested through the Developer Portal. On the data governance side, every API call travels over TLS 1.2+, and documents at rest are protected with AES-256 encryption. Foxit’s API security documentation details SOC 2 Type II audit status, HIPAA BAA support, GDPR, CCPA, eIDAS, ESIGN Act, UETA, 21 CFR Part 11, FERPA, and FINRA requirements. Customer data is kept in logically segmented environments. Teams in healthcare, legal, or financial services should confirm data residency requirements before wiring up production document flows, then pick the matching regional eSign host described earlier, because the host you call determines where the data gets processed. Run Your First Tool Call Now A working MCP tool call is under 15 minutes away: Sign up for a free developer account at developer-api.foxit.com (no credit card, instant access), then copy your client_id and client_secret from the dashboard.Set the three environment variables: Shell export FOXIT_CLOUD_API_HOST="https://na1.fusion.foxit.com/pdf-services" export FOXIT_CLOUD_API_CLIENT_ID="your_client_id" export FOXIT_CLOUD_API_CLIENT_SECRET="your_client_secret" Clone the repo, register it with the config block from the Prerequisites section, restart your MCP host, and call pdf_from_url against any public URL. A confirmed PDF lands in your working directory. The Developer Portal also offers a live API Playground where you can validate request payloads against the PDF Services API before connecting them to an agent. To extend toward a full signing workflow, the smallest useful addition on top of the MCP setup is authenticating against the eSign OAuth2 endpoint and posting a static PDF to /api/folders/createfolder. From there, DocGen field population, pdf_structural_analysis extraction, and webhook callbacks build on the same pattern step by step. Claim your free API access at developer-api.foxit.com.

By Lucien Chemaly
When Valid SQL Was Still the Wrong Answer
When Valid SQL Was Still the Wrong Answer

Editor’s Note: The following is an article written for and published in DZone’s 2026 Trend Report, Cognitive Databases, Intelligent Data: Unified Infrastructure for Vector Search, AI-Optimized Queries, and Hybrid Workloads. I started working on a personal project with a simple question: If AI can analyze a database schema and generate SQL, what still makes the answer hard to trust? The first version of my prototype worked at a surface level. A user could ask a business question, and the system would retrieve the relevant schema, generate SQL, run the query against a sample analytics database, and return a result. Technically, that felt like progress. But then I tested a question like, What is monthly revenue? The SQL ran. The database returned an answer. Still, I could not say the answer was truly correct because the meaning of revenue was not clear enough. After that test, I stopped treating the prototype as a text-to-SQL demo. I started handling it as an experiment in the database context that an AI assistant needs: metric definitions, semantic retrieval, validation rules, and governance signals. The semantic registry and validation layer help move the prototype from raw text-to-SQL toward governed, context-aware analytics. The Problem: Context Is Not the Same as Understanding The model could write SQL; the problem was that SQL execution alone did not prove the answer was right. In my database, monthly revenue could mean net revenue after refunds, gross order value, or paid revenue. Active customer could mean a customer with a login, a purchase, or both. Even if the model retrieved the right tables, it still needed to understand which business definition to use. The main failure signals were: Metric names that had more than one possible meaningMultiple date columns that could change the answerJoins that were technically possible but not analytically safeMissing filters like date range, status, or regionQuestions that needed clarification before execution The Constraints: Keep It Small, But Make It Reliable Because this was a personal project, I was not trying to build a full BI platform or enterprise data catalog. I wanted to focus on one narrow and realistic piece: how to make an AI-generated database answer more trustworthy before it reaches the user. Table: Prototype Boundaries ConstraintDesign ResponseSmall project scopeFocused on a few high-risk metricsAmbiguous business termsCreated explicit metric contractsThe schema alone was not enoughAdded semantic retrieval over definitionsNo analyst review stepAdded validation before SQL executionSimple user experienceUsed clarification instead of exposing schema complexity The hard part was keeping the prototype lightweight without making it too shallow. I wanted enough structure to make the answers safer, but not so much complexity that the project turned into a full governance platform. The Tradeoffs: What I Changed in the Pipeline The first major decision was to add a lightweight semantic registry. Each metric contract included the metric name, definition, grain, default date column, required filters, safe dimensions, and approved join path. YAML metric: net_revenue definition: paid_amount - refunded_amount grain: month date_column: payment_date required_filters: - payment_status = 'completed' clarify_if_missing: - date_range I almost relied only on schema retrieval because it was easier to build. But the schema only tells the model what exists. It does not tell the model what is correct for a specific business question. The second decision was to retrieve both schema metadata and metric definitions. This made the retrieval step more useful because the model was both matching a question to tables and grounding the answer in business meaning. The third decision was to validate SQL before execution. The generated query had to pass checks for allowed tables, approved joins, required filters, and selected dimensions. If it failed, the system either regenerated the query with stronger constraints or asked the user a clarification question. Table: Decisions, Tradeoffs, and Outcomes ChoiceTradeoffOutcomeAdd metric contractsMore setup workClearer business meaningRetrieve semantic context, not just schemaMore retrieval complexityBetter groundingValidate before executionSlightly slower responseFewer misleading answers What Changed The biggest lesson was that query execution is not the same as analytical correctness. A query can run successfully and still answer the wrong question. The prototype became more reliable when I stopped treating the database as just tables and columns. The model needed more context: what the metric means, which joins are safe, which filters are required, and when it should ask the user for clarification. My main takeaway was practical: Before tuning prompts, define the meaning layer the prompt is expected to respect. For intelligent data systems, the interesting work is not only faster retrieval or cleaner SQL. It is the connection between data, definitions, governance rules, and answers people can trust. This is an excerpt from DZone’s 2026 Trend Report, Cognitive Databases, Intelligent Data: Unified Infrastructure for Vector Search, AI-Optimized Queries, and Hybrid Workloads.Read the Free Report

By Anusha Kovi DZone Core CORE
Keeping AI-Powered BI Honest: A Human-in-the-Loop (HITL) Playbook
Keeping AI-Powered BI Honest: A Human-in-the-Loop (HITL) Playbook

A few months ago, I led a BI project with a deceptively simple pitch: let business users ask questions in plain English, and hand back the answer. We wired an LLM to our warehouse, got SQL generation working, and ran a pilot. It did not go well. The model was actually right a lot of the time, and that wasn’t the problem. The problem was that nobody on the business side could tell when it was right. Prompts came in tangled; the model would interpret one clause subtly wrong, and we’d return a clean-looking number sitting on top of a clean-looking SQL query. The users couldn’t read the SQL. When we tried to surface the model’s reasoning, it was a wall of CTEs and join keys that helped no one. We had humans in the loop. Reviewers, too. The catch is they weren’t operating as a loop; they were operating as a relay. They’d glance at the SQL, agree it looked plausible, and forward the answer along. The user nodded. Two weeks later, finance would surface a number that didn’t reconcile, and by the time we traced it back, the decision had already been made. That was the painful version of the lesson I want to share. HITL is not a checkbox between the model and production. It’s a translation layer. The model produces SQL and rows; the user needs an answer they trust. A human has to do the work of turning one into the other, and the system around that human has to make the work possible. Show a reviewer raw SQL plus a confidence score, and you’ve built a relay, not a loop. Below is the playbook I wish I’d had on day one. 1. Confidence Threshold Routing Score every generated query before it runs. Self-consistency sampling is the cleanest version: generate 5 candidate SQL statements for the same prompt and check how many agree on the join logic. If 4 out of 5 join to dim_employee and one joins to dim_customer, your agreement ratio is 0.8. If your threshold is 0.85, that query gets routed to review even though it looks correct on the surface. Agreement across multiple generations is a stronger signal than any single model’s confidence score, which is famously well-calibrated for the wrong things. Aggregated log-probabilities are another option. The choice matters less than the discipline: anything below the threshold goes into a queue, never straight to execution. The threshold itself becomes a tunable lever you tighten over time as you learn which query patterns deserve more scrutiny. 2. Staged Execution With Approval Gates Confidence on its own isn’t enough for high-stakes domains. Define a list of high-impact tables such as revenue facts, employee dimensions, compliance event logs, and require human approval for any query that touches them, regardless of confidence score. The model might be certain. The business context still demands validation. In practice, the table list is governance work, not engineering work. The data team should own it, finance and HR should ratify it, and you should revisit it the same way you revisit access controls. If you let engineering pick the list alone, the list will be wrong, and nobody outside engineering will know it’s wrong until it’s too late. 3. Reviewer Tooling: The Part Everyone Underinvests In This is where my pilot fell over. Showing a non-technical reviewer raw SQL and asking “looks good?” is worse than no review at all, as it produces fake assurance. Reviewer tooling has to bridge SQL and business context. On a single screen, the reviewer needs the original natural-language prompt, the semantic-model entities the query touches (measures and dimensions, not table names), the filters being applied in human terms, and the expected shape of the result. The reviewer’s job is to validate intent, not parse syntax. Build the interface around that. If your reviewers are reading SQL out loud in their heads to figure out what a query does, you’ve shipped a relay. 4. Audit-Linked Approval Records Every reviewer decision to approve, reject, or edit has to write back to an audit log alongside the original prompt, the generated SQL, the reviewer’s identity and a timestamp. That log is the dataset you’ll need months later. It’s how you explain a number when finance comes asking. It’s how you recalibrate thresholds based on what actually shipped versus what got bounced. It’s how you find the query patterns that consistently trip up the model. Skip this step and the program loses its memory. You keep paying the human-review cost without ever compounding the learnings, which is the worst of both worlds. 5. Escalation Paths Reviewers will get stuck. They’ll sense a query is doing something odd without being able to articulate why, especially when it crosses domain boundaries. Give them a one-click route to a domain expert such as a finance lead, HR ops, or compliance, along with their concern, without freezing the user who originally submitted the query. The whole point is to prevent reluctant approvals. A reviewer who isn’t sure should never feel pressured to sign off because they have no other option. In my pilot, “no other option” was the silent failure mode. Reviewers approved because rejecting felt rude, and the loop swallowed the doubt instead of routing it. 6. HITL Bypass Logging When a query clears the confidence threshold and isn’t flagged as high-impact, it just runs. Log that bypass anyway with the score that justified it, the prompt, and the SQL. This is the data that surfaces threshold drift, model regressions, and good training examples for the next iteration. It also closes the audit gap between “approved by a human” and “approved by silence”. Without it, you can’t tell the two apart, which means you can’t defend either. Wrapping Up Shipping AI-generated SQL straight to production is reckless. The model will be wrong, and it will be wrong in ways that look right. A single bad number in a board deck can outlive whoever wrote the prompt. HITL isn’t a nice-to-have here. It’s the only thing standing between a useful BI assistant and a very fast way to make confident, well-formatted and completely wrong decisions. The lesson from my pilot wasn’t that humans should validate SQL. It’s that humans have to translate. The model speaks in joins; the business speaks in outcomes. Build the loop so the people in the middle have a real chance of bridging the two, like tooling that surfaces intent, processes that protect them from reluctant sign-off, audit trails that turn every decision into future training data. Do that, and you get a BI assistant that’s actually trusted. Skip it, and you get a relay that breaks quietly until it doesn’t. Key Takeaways Confidence threshold routing using self-consistency sampling catches semantic errors that a model’s own confidence scores miss. Generate multiple candidates and measure agreement.Staged execution with approval gates protects high-stakes queries such as revenue, headcount, and compliance regardless of model confidence.Reviewer tooling has to bridge SQL and business context. Show the prompt, the semantic entities, and the expected output shape, and never just the raw query.Audit-linked approval records are the dataset you’ll need to recalibrate thresholds and explain numbers when finance comes asking months later.Escalation paths prevent reluctant approvals. Make it one click to route to a domain expert.HITL bypass logging turns silent successes into a feedback loop and closes the gap between “approved by a human” and “approved by silence.”

By Nithish Shetty
Fix the Target, Precompute Once: A Backend-Free Word-Ladder Solver With a BFS Distance Field
Fix the Target, Precompute Once: A Backend-Free Word-Ladder Solver With a BFS Distance Field

When you build an interactive puzzle, the latency budget is unforgiving. Every keystroke needs an answer that feels instant. A daily word-ladder game has to do three of those instant jobs at once: confirm that the word a player typed is legal, tell them the best possible score for the day, and, on request, reveal the shortest solution. I ran into all three while building Poople, a daily game where you change a 4-letter word into POOP one letter at a time, and the fix turned out to be a tidy lesson in trading repeated computation for one-time precomputation. The obvious approach is to run a graph search whenever you need an answer. That works, and it is also the wrong default here. This article walks through why, then shows how fixing the destination word lets you replace every future search with a single offline pass plus an O(1) lookup. The whole solver then runs in the browser, with no backend and no per-request search. Figure 1. The expensive graph work happens once at build time. The runtime only does lookups. The Problem in Graph Terms A word ladder connects two words by changing one letter at a time while keeping a valid word at each step. The idea is old. Lewis Carroll published it as Doublets in 1877. Model it as a graph, and it becomes a textbook shortest-path problem: Each valid 4-letter word is a node.Two nodes share an edge when their words differ by exactly one letter.The shortest path between two words is the fewest steps to ladder between them. Figure 2. Distances to the fixed target POOP form a field. Every word with a finite distance has a neighbor one step closer. In Poople, the destination is always the same word, POOP, and that fixed target is the hinge the whole design turns on. The shortest distance from a word to POOP is what the game calls par, the best achievable score for that day's starting word. In an unweighted graph like this one, breadth-first search gives those shortest distances directly. The graph is small. The shipped dictionary holds about 2,300 valid 4-letter words, and the hardest starting words sit around eleven steps from POOP. Small, but not so small that you want to search for it again on every interaction. Modeling the Edges Without Storing Them You do not need an adjacency list. Because an edge is just a one-letter difference, you can generate a node's neighbors on demand by trying every single-letter change and keeping the ones that land on a real word. Membership is a Set lookup. TypeScript const ALPHABET = "abcdefghijklmnopqrstuvwxyz"; /** Every dictionary word exactly one letter away from `word`. */ function neighbors(word: string, dictionary: Set<string>): string[] { const out: string[] = []; for (let i = 0; i < word.length; i++) { for (const c of ALPHABET) { if (c === word[i]) continue; const candidate = word.slice(0, i) + c + word.slice(i + 1); if (dictionary.has(candidate)) out.push(candidate); } } return out; } For a 4-letter word, this checks 4 positions times 25 other letters, so 100 candidate strings, each an O(1) Set lookup. The graph stays implicit, which keeps the shipped data to a flat word list rather than a serialized edge structure. Figure 5. Neighbors are generated by mutating each position, then filtered by membership in the word set. Invalid strings are dropped. The Naive Approach, and Why It Does Not Fit With neighbors in hand, the textbook move is a per-query breadth-first search that returns the path. TypeScript function shortestPath(start: string, end: string, dict: Set<string>): string[] | null { if (!dict.has(start)) return null; const queue: string[][] = [[start]]; const visited = new Set<string>([start]); while (queue.length) { const path = queue.shift()!; const node = path[path.length - 1]; if (node === end) return path; for (const next of neighbors(node, dict)) { if (!visited.has(next)) { visited.add(next); queue.push([...path, next]); } } } return null; This is correct and easy to read. It also has two properties I did not want in a game loop. It stores a full path for every entry in the queue, so memory grows with the frontier. More importantly, it repeats the entire search for every word a player explores. On a game whose target never changes, that is the same work over and over. The Inversion: One Search From the Target Here is the key observation. Every query ends at the same node, POOP. So search backward from POOP exactly once. One breadth-first pass sources at the target labels every reachable word with its distance to POOP. That labeling is a distance field, the same idea as a flow field in grid pathfinding, and it answers every future query in advance. Figure 4. Searching per query repeats work. One precomputed field turns every later query into a lookup. The build step runs offline, in a script during the build, never in the player's browser. TypeScript /** Run once at build time. Distance from every reachable word to the target. */ function buildDistanceField(words: Set<string>, target = "poop"): Map<string, number> { const dist = new Map<string, number>([[target, 0]]); let frontier = [target]; while (frontier.length) { const next: string[] = []; for (const word of frontier) { const d = dist.get(word)! + 1; for (const neighbor of neighbors(word, words)) { if (!dist.has(neighbor)) { dist.set(neighbor, d); next.push(neighbor); } } } frontier = next; } return dist; Because the graph is undirected, distance from POOP to a word equals distance from that word to POOP, so one source covers the entire dictionary. The output serializes to one word,distance line per word, which is the data the game ships. TypeScript // build-distances.ts const field = buildDistanceField(allWords); const lines = [...field].map(([word, d]) => `${word},${d}`).join("\n"); writeFileSync("word-dist.ts", "export const WORD_DIST_RAW = `\n" + lines + "\n`;"); For about 2,300 words, the full pass finishes in a few milliseconds on a laptop, and the resulting table is roughly 17 KB of raw text. That table is the only artifact the runtime needs. Runtime: Lookups Instead of Searches At load, the shipped table parses once into two structures: a Map from word to distance, and a Set of valid words. After that, the three jobs from the introduction are all constant-time or close to it. TypeScript const distEntries: Array<[string, number]> = WORD_DIST_RAW .trim() .split("\n") .map((line) => { const [word, dist] = line.split(","); return [word.trim().toLowerCase(), parseInt(dist, 10)]; }); /** word -> shortest distance to POOP. This is "par". */ export const wordDist: Map<string, number> = new Map(distEntries); /** Every legal move, derived from the same table. */ export const allWords: Set<string> = new Set(distEntries.map(([w]) => w)); export function getDist(word: string): number { return wordDist.get(word.toLowerCase()) ?? -1; // -1 means unknown word } export function isWord(word: string): boolean { return allWords.has(word.toLowerCase()); } Validating a move is isWord, an O(1) Set lookup. Reading par is getDist, an O(1) Map lookup. The third job, showing a full shortest solution, is where the distance field pays off a second time. You do not need another search. From the start word, repeatedly step to any neighbor whose distance is one less than the current distance, until you reach POOP. TypeScript function solveShortestPath(start: string, target = "poop"): string[] { let current = start.toLowerCase(); const path = [current]; let dist = getDist(current); if (dist < 0) return path; // unknown word, no route while (current !== target && dist > 0) { const step = neighbors(current, allWords).find((n) => getDist(n) === dist - 1); if (!step) break; path.push(step); current = step; dist -= 1; } return path; } This greedy descent is always correct on a distance field, and that is worth stating precisely. Every node at distance d greater than zero has at least one neighbor at distance d - 1, because that is exactly how BFS assigned the labels. So a step down always exists, and the walk reaches zero in d steps. There is no queue and no visited set. The work is proportional to par, which caps near eleven, so a solution is effectively free to produce. Figure 3. One shortest solution, produced by stepping down the distance field one level at a time. A Daily Puzzle With No Database There is one more piece. The game is the same for everyone in the world on a given day, and it still has no backend. The puzzle is a pure function of the clock. Take whole days since a fixed epoch and use that integer both as the puzzle number and as the index into a list of starting words. TypeScript const DAY_MS = 86_400_000; const EPOCH_UTC = Date.UTC(2025, 7, 14, 8, 0, 0, 0); // 2025-08-14 08:00 UTC function daysSinceEpoch(nowMs = Date.now()): number { if (nowMs <= EPOCH_UTC) return 0; return Math.floor((nowMs - EPOCH_UTC) / DAY_MS); } function getStartWord(startWords: string[], dayIndex = daysSinceEpoch()): string { const len = startWords.length; return startWords[((dayIndex % len) + len) % len]; No database read, no per-user state, no synchronization. Two players who open the page at the same moment compute the same puzzle independently. The 08:00 UTC rollover is just the time component baked into the epoch. Because the result depends only on the date, the page is fully cacheable at the edge, which is what lets the whole game sit behind a CDN. Tradeoffs and Lessons Precompute when the target is fixed. The entire win comes from one constraint: every query ends at the same node. That lets a single backward search amortize across all future queries. If the target varied per day, you would rebuild the field per day, which is still cheap here but changes the calculus. A distance field beats path-in-queue BFS for repeated queries. The naive solver allocates a growing array per queued path and re-explores every time. The field uses one shared Map, and reconstruction is a greedy walk with O(par) memory. Keep the shipped data flat and parse it once. A word,distance table is trivial to generate, diff in version control, and parse into a Map and a Set at module load. There is no custom binary format to maintain. Mind the graph-construction cost if you scale up. Generating neighbors with per-position membership tests is O(N x L x 26) across the dictionary. At four letters and a 26-letter alphabet, that is nothing. For longer words or larger alphabets, the classic optimization is to bucket words by wildcard patterns such as *OOP, P*OP, PO*P, and POO*, so words sharing a bucket are neighbors. That builds adjacency in O(N x L) and is worth the switch only when the simple version starts to hurt. Guard the edges. Unknown words return a sentinel distance of -1 rather than throwing, and the descent has a natural termination because the distance strictly decreases. A small step cap is a cheap safety net against any future data inconsistency. Many shortest paths can exist. Several routes can tie for par. The greedy descent returns one valid par path, which is all the game needs to show, and scoring by step count treats every par route as equal. Where This Pattern Applies The technique generalizes to any setting where many shortest-path queries share a fixed endpoint over a static graph: Routing toward a single sink, such as a depot or an exit.Autocomplete ranking by edit distance to a fixed term.Game hint systems and grid flow fields, where a unit always heads toward one goal.Any repeated shortest-path query to a constant target where the graph rarely changes. The limits follow from the assumptions. The field assumes a fixed target and a static graph. Change the target or the word set, and you rebuild the field, which is a build-time cost rather than a request-time one. For variable targets, a bidirectional search or a small set of precomputed fields, one per target, keeps most of the benefit. The lesson that stuck with me is simple. When a search always ends in the same place, stop searching forward from the start. Search backward from the end once, write down the answer for every node, and let the runtime read instead of compute. You can see the result running live at Poople, where every par score and every shortest solution is a lookup into a table that was built before you ever opened the page.

By horus he
From Open SQL to CDS Views: Rewriting SAP Data Access for Performance at Scale
From Open SQL to CDS Views: Rewriting SAP Data Access for Performance at Scale

Modern SAP landscapes running on SAP HANA demand a rethink of how ABAP programs access data. Traditional Open SQL queries embedded in ABAP code have served developers for decades, but at large data volumes, they can become performance bottlenecks. SAP’s introduction of Core Data Services (CDS) views offers a new paradigm: push more work to the in-memory database and retrieve only what’s needed. Traditional ABAP Data Access With Open SQL Open SQL is the standard SQL interface in ABAP that allows developers to query the underlying database in a database-agnostic way. For example, an ABAP report might join two tables and fetch results like this: Plain Text SELECT bkpf~bukrs, bkpf~belnr, bkpf~gjahr, bseg~koart, bseg~wrbtr, bseg~shkzg FROM bkpf INNER JOIN bseg ON bkpf~bukrs = bseg~bukrs AND bkpf~belnr = bseg~belnr AND bkpf~gjahr = bseg~gjahr INTO TABLE @DATA(it_fi_docs) WHERE bkpf~bukrs = '1000' AND bkpf~gjahr = '2023' AND bseg~koart = 'K'. This Open SQL example joins the BKPF and BSEG tables to retrieve financial documents. Open SQL sends such queries to the database, and on SAP HANA, the heavy lifting of the join and filtering is done in-memory on the DB server. The result is then brought back to the ABAP application server. However, the challenge with Open SQL at scale comes when ABAP code handles large data sets or complex logic in the application layer. Common performance issues in legacy ABAP include: Too much data transferred: Selecting wide tables or not filtering enough leads to heavy network and memory usage. Best practice is to filter and aggregate in the query to keep the result set small and transfer only the required columns (avoid SELECT *). Multiple round-trips: Performing calculations with many small queries or loops causes repeated DB calls. It’s more efficient to push joins and subqueries into one SQL if possible. Each context switch adds overhead. Application-side processing: If business logic runs on millions of records in ABAP, the application server CPU becomes the bottleneck. The database could perform these operations faster, set-wise. In summary, while Open SQL can express complex data retrieval, ABAP developers traditionally had to be very disciplined in query design to avoid performance issues at scale. This paved the way for a new approach leveraging SAP HANA’s strengths. The Case for Change: Code-to-Data Paradigm SAP HANA’s in-memory, columnar architecture enables it to execute aggregations, filters, and joins extremely fast at the database level. To exploit this, SAP advocated the code-to-data paradigm. push computations down to the database rather than pulling data up to the code. Rewriting data access using CDS views is a key technique in this paradigm, alongside others like AMDP. By offloading heavy operations to the DB, we minimize data transfer and let HANA’s optimized engines handle crunching the data. For example, instead of reading a full table and then filtering in ABAP, you pass WHERE conditions so the DB does it. Instead of multiple selects and merges in ABAP, you perform a JOIN or a subquery in one shot. Another driver for change is SAP’s new data models in S/4HANA. Many classic transparent tables were replaced by HANA-optimized structures or compatibility views. Custom ABAP code written for ECC often breaks or needs adaptation for S/4HANA’s simplified data model. In these cases, SAP often provides CDS views as the new interface to data. As one DZone article notes, engineers moving to S/4 must switch to the S/4 equivalents to replace old data access logic. In short, adopting CDS views is not only about performance but also about aligning with SAP’s modern architecture. Introducing ABAP Core Data Services (CDS) Views ABAP CDS is a framework to define rich data models directly on the database, using a declarative syntax in ABAP Development Tools (ADT). A CDS view is essentially a view in the HANA database, defined via an ABAP DDL statement. For example, here’s a simple CDS view definition joining two tables: Plain Text @AbapCatalog.sqlViewName: 'ZDEMO_FLIGHTS' define view ZFlightInfo as select from spfli inner join scarr on spfli.carrid = scarr.carrid { scarr.carrname as carrier, spfli.connid as flight, spfli.cityfrom as departure, spfli.cityto as arrival } This CDS view ZFlightInfo performs the same join between SPFLI and SCARR as an equivalent Open SQL join would. In fact, you could copy-paste the join logic from ABAP into the CDS definition with minor syntax changes. After activating this view in ADT, the system creates a database view in HANA. ABAP programs can then consume the CDS view just like a table: SQL SELECT * FROM ZFlightInfo INTO TABLE @DATA(it_flights) ORDER BY carrier, flight. The result set it_flights from the CDS view will be identical to what an Open SQL join would produce for the same input tables. Under the hood, both approaches result in the database executing a similar SQL SELECT. So, why use CDS? The benefits become evident as complexity grows: Reusability and model centralization: CDS definitions are stored in the ABAP Dictionary and can be reused by any number of programs or even other CDS views. Instead of writing the same joins or calculations in multiple ABAP reports, you define them once in a CDS view. SAP recommends using a CDS view when you need to retrieve data from multiple related tables, because it involves the least amount of coding and can be reused in multiple objects. In large-scale systems, this consistency is key to a single source of truth for that piece of data logic. Rich expression and metadata: CDS supports advanced SQL features and built-in functions. You can define calculated fields, aggregations, and even leverage specialized HANA capabilities within the view. CDS also allows adding annotations, making the data model self-descriptive. Performance through pushdown: By moving logic into the CDS (and thus into SQL on the database), you reduce the workload on the ABAP layer. The database can apply filters, joins, and computations in parallel, using its optimized engines. Only the final result is sent back to ABAP. Secure and controlled access: CDS views integrate with the SAP authorization concept, ensuring consistent enforcement of business security rules at the data model level, rather than scattering checks in ABAP code. This means performance benefits without sacrificing governance. Tutorial: Converting an Open SQL to a CDS View (with Code) To solidify the concept, let’s walk through a simple conversion. Imagine we have an ABAP report that needs to list flight routes with the airline name. In classic ABAP, you might do this with an inner join in Open SQL as shown below: Open SQL Approach (Legacy ABAP code): Plain Text DATA: lt_flights TYPE TABLE OF zflight_info. "Structure for results SELECT scarr~carrname AS carrier, spfli~connid AS flight, spfli~cityfrom AS departure, spfli~cityto AS arrival FROM spfli INNER JOIN scarr ON spfli~carrid = scarr~carrid INTO TABLE @lt_flights ORDER BY carrname, connid. This code joins SPFLI with SCARR and populates an internal table lt_flights. It works, but the logic is embedded in the program. Now, suppose we want to reuse this same join in multiple places. We can refactor it into a CDS view: CDS View Approach: Define the view in ABAP DDL (e.g., in Eclipse ADT): Plain Text @AbapCatalog.sqlViewName: 'ZFLIGHTINF' @AccessControl.authorizationCheck: #NOT_REQUIRED define view ZFlightInfo as select from spfli inner join scarr on spfli.carrid = scarr.carrid { scarr.carrname as carrier, spfli.connid as flight, spfli.cityfrom as departure, spfli.cityto as arrival } We give the view a name ZFlightInfo. Note that this is almost identical to the Open SQL, just expressed as a view definition. Once activated, the CDS is available system-wide. Now our ABAP report can simply do: Plain Text SELECT * FROM ZFlightInfo INTO TABLE @lt_flights ORDER BY carrier, flight. The result in lt_flights will be the same. We have effectively decoupled the data retrieval logic from the program and centralized it in the DB layer. This not only improves reuse; in a HANA system, it can also improve performance. The database can better optimize a single persistent view than ad-hoc SQL scattered in code. And if we needed to adjust the join or add a new field. Performance Considerations and Best Practices When rewriting Open SQL to CDS, ABAP developers should keep a few important considerations in mind: Measure, don’t guess: Simply converting an Open SQL to a CDS view doesn’t magically speed up the query if it was already efficient. As noted earlier, for straightforward SELECTs or joins, the performance will be equivalent in many cases. The real gains come when you use CDS to do more complex processing in one go. Always use tools like ST05 SQL trace or HANA’s PlanViz to ensure the new design is actually optimal. The execution plan is what matters, not whether you wrote it in Open SQL or CDS. Avoid over-complex views: It’s possible to go overboard with stacking CDS views on top of each other. While layering is good for separation of concerns, too many nested views or excessive use of associations can lead to very complex SQL at runtime. This can confuse the optimizer or prevent predicate pushdown. Be wary of heavy calculations in a single CDS. If performance suffers, consider alternatives like ABAP Managed DB Procedures (AMDP) for really complex logic or break the problem down differently. Select only what you need: Just as with Open SQL, a CDS view should be designed to return only necessary fields and records. Don’t define a CDS with SELECT * from a wide table list the needed fields. This ensures consumer queries aren’t unknowingly pulling extra data. One common pitfall is using CDS to expose an entire table with all columns, which defeats the purpose. Instead, tailor views to use cases or use parameters in CDS to filter data. Use CDS features wisely: Leverage CDS capabilities like aggregations, calculated fields, and unions to eliminate extra work in ABAP. Reuse and consistency: Replace multiple Open SQL implementations of the same logic with a single CDS. Not only does this reuse improve maintainability, but it also means the database might handle the unified load more efficiently. SAP itself follows this approach in S/4HANA with the Virtual Data Model, hundreds of CDS views that serve as the source for Fiori apps and reports, rather than raw table access. By moving to CDS, you align your custom code to the same philosophy. Conclusion Rewriting data access from Open SQL to CDS views is a strategic move for ABAP developers aiming to maximize performance at scale. By pushing more logic to the SAP HANA database, we take full advantage of its in-memory speed and parallel processing. CDS views enable complex data gathering in one shot, reduce the load on the application server, and provide a modular, reusable data model for your SAP applications. That said, an engineer must also approach CDS with a critical eye, understanding the execution plan and ensuring that moving to CDS truly improves the situation, rather than blindly adding abstraction. Advanced ABAP development is about choosing the right tool for the job. In the case of data-intensive operations, CDS views have proven to be a powerful tool, aligning with SAP’s modern direction and delivering robust performance at scale. By rewriting your data access with CDS and following best practices, you can future-proof your ABAP code for the HANA era, achieving faster results and a cleaner, more sustainable codebase for the long run.

By Deepika Paturu
Jakarta NoSQL: Why JPA Is Not Enough for the AI Era
Jakarta NoSQL: Why JPA Is Not Enough for the AI Era

The most effective way to present this idea is to begin with the challenge architects face: AI has transformed the persistence landscape. Enterprise applications were once built almost exclusively on relational databases, making JPA a keystone of Jakarta EE. Today, modern systems use a mix of relational databases, document stores, caches, graph engines, and increasingly, vector databases that support semantic search, retrieval-augmented generation (RAG), and AI-powered applications. Polyglot persistence is now the industry standard. While Jakarta EE standardized relational persistence through JPA, it still lacks a vendor-neutral standard for non-relational persistence. This gap forces developers to rely on fragmented, proprietary solutions, creating barriers to portability, productivity, and innovation. The rise of AI makes this gap critical. Vector databases are now essential to intelligent systems, supporting semantic search, embeddings, and contextual retrieval. For Jakarta EE to remain the leading enterprise Java platform in the AI era, it must offer a standardized approach to NoSQL persistence, as it did for relational databases. Jakarta NoSQL is not just another specification; it constitutes a strategic investment in the ecosystem's future. By offering a familiar programming model, reducing vendor lock-in, and integrating with AI workloads, Jakarta NoSQL ensures that Jakarta EE remains relevant and competitive for the next generation of enterprise applications. NoSQL in the AI Era: Understanding the Modern Data Landscape For years, enterprise data persistence focused on relational databases. Systems relied on tables, rows, foreign keys, and SQL, making relational technology the standard for business applications. While still essential, modern architectures now use polyglot persistence, where multiple database types coexist, each satisfying specific requirements. Today, NoSQL refers to a family of database paradigms, each engineered for specific workloads and architectural needs, rather than just document databases. Key-value databases store data as key-value pairs, enabling fast lookups and low latency. Typical uses include caching, user sessions, feature flags, and temporary application state.Document databases store data as structured documents, such as JSON or BSON. They are effective for applications having hierarchical or evolving schemas, including web applications, e-commerce platforms, and content management systems.Column-family databases organize data by columns instead of rows, supporting high write throughput and horizontal scalability. They are used for IoT telemetry, event logging, analytics, and large-scale distributed systems.Graph databases model entities and relationships as nodes and edges. This structure is ideal for social networks, fraud detection, recommendation engines, dependency analysis, and knowledge graphs in which relationships are critical.Vector databases store high-dimensional embeddings from machine learning models and large language models (LLMs). They enable semantic search, similarity matching, retrieval-augmented generation (RAG), recommendation platforms, and other AI-driven features via understanding meaning instead of exact text matches.Time-series databases specialize in timestamped data that changes over time. They are used for observability, monitoring, financial markets, industrial sensors, and operational metrics where high-performance temporal data storage and analysis are essential. These database types often coexist within the same architecture. Modern applications may use PostgreSQL for transactions, Redis for caching, MongoDB for documents, Neo4j for relationships, InfluxDB for telemetry, and a vector database like Milvus, Pinecone, or Weaviate for AI-powered search and retrieval. This approach, known as polyglot persistence, is now standard in enterprise systems. The industry has embraced this shift. The Stack Overflow Developer Survey shows that while relational databases still dominate enterprise workloads, NoSQL technologies are now standard tools for developers. Technologies like Redis, MongoDB, and Elasticsearch are used alongside PostgreSQL and MySQL. Organizations no longer choose between SQL and NoSQL; instead, they combine multiple persistence technologies to leverage their strengths. Polyglot persistence is now the baseline for modern software systems. Vector databases are especially important among NoSQL categories, as they are basic to modern Artificial Intelligence systems. In contrast to traditional databases that store explicit business data, vector databases store numerical representations called embeddings. Generated by machine learning models, these embeddings encode the semantic meaning of words, documents, images, or other content as mathematical vectors. This enables software to search and retrieve information based on meaning rather than exact text matches. The distinction between lexical and semantic search illustrates the significance of vector databases. For example, a traditional SQL search for “Pet” returns records with that exact term, such as “Pet Shop,” but ignores related expressions like “Dog” or “Puppy.” Semantic search, by comparing embeddings, retrieves documents about dogs, puppies, or animal companions because it recognizes their semantic relationship. The search engine matches meaning, not just syntax. This function is vital for modern AI architectures. Large language models do not process relational tables directly; they use embeddings and contextual connections between concepts. Systems such as retrieval-augmented generation (RAG), enterprise knowledge search, recommendation engines, and intelligent assistants depend on similarity searches across millions of vectors. While relational databases can support some vector operations through extensions, vector databases are purpose-built for these workloads, offering optimized indexing and similarity algorithms for large-scale semantic retrieval. As AI adoption grows, vector databases are becoming a strategic component of enterprise architecture. Appreciating the importance of NoSQL, several Java ecosystems have developed their own solutions. Spring offers independent projects like Spring Data MongoDB, Spring Data Redis, and Spring Data Cassandra. These integrations provide a productive programming model but are tightly coupled to the Spring ecosystem. Quarkus supports NoSQL persistence through Panache and database-specific integrations, emphasizing developer productivity and cloud-native deployment. Micronaut Data supports several NoSQL engines, using compile-time code generation and ahead-of-time processing to improve performance and reduce execution overhead. While these solutions are effective, they remain framework-specific rather than platform standards. Developers switching frameworks encounter different APIs, abstractions, annotations, and operational models, even when solving similar persistence challenges. Jakarta EE addressed this for relational persistence with Jakarta Persistence (JPA), delivering a standardized, vendor-independent programming model. As NoSQL technologies expand and AI workloads more and more depend on vector databases, the lack of a vendor-neutral NoSQL standard is a significant gap in the Jakarta ecosystem. The Java Standardization Journey The need for a standardized NoSQL solution in the Java ecosystem has been discussed for years. During the Java EE era, several proposals tried to integrate non-relational databases into the enterprise platform. As NoSQL technologies grew in popularity throughout the 2010s, developers anticipated a dedicated specification to accompany traditional enterprise APIs at JavaOne conferences. Despite clear demand, no such initiative emerged within Java EE. The platform remained focused on relational persistence via JPA, leaving NoSQL adoption to rely on vendor-specific libraries and framework integrations. The transition of Java EE to the Eclipse Foundation provided an opportunity to address this challenge. Instead of waiting for a platform-level solution, the community launched Eclipse JNoSQL, an open-source project supplying a unified programming model for NoSQL databases. Drawing on JPA's success, Eclipse JNoSQL introduced mapping annotations, repositories, templates, and communication APIs that support document, key-value, column-family, and graph databases. The project showed that a consistent developer experience could be attained without compromising each database model's unique features. As Jakarta EE matured, Eclipse JNoSQL became the foundation for a new standardization effort: Jakarta NoSQL. Jakarta NoSQL was the first persistence specification created entirely within the Jakarta EE process. Unlike earlier specifications that migrated from Java EE, Jakarta NoSQL was conceived, developed, and released under the Eclipse Foundation governance model. It was among the first to complete the full Jakarta Specification Process from inception to release. Jakarta NoSQL's impact extended beyond its initial scope. During development, the expert group identified a common challenge for both relational and non-relational databases: developers needed a consistent repository abstraction independent of the underlying persistence engine. This led to the creation of a separate specification, Jakarta Data. The need to standardize NoSQL access patterns directly influenced the development of Jakarta Data's repository-oriented programming model, which applies across multiple persistence technologies. The relationship between these specifications highlights Jakarta NoSQL's broader influence on the Jakarta EE ecosystem. Jakarta NoSQL focuses on mapping and interacting with non-relational databases, while Jakarta Data delivers a unified repository abstraction for both relational and NoSQL implementations. Together, they significantly reduce fragmentation in enterprise persistence. This evolution continued beyond Jakarta Data. The drive to standardize modern persistence requirements has inspired new specifications, such as Jakarta Query, which aims to deliver a portable, type-safe, and expressive query language for various persistence technologies. As the Jakarta ecosystem grows, Jakarta NoSQL acts as a key milestone. It addressed the long-standing absence of a NoSQL standard and helped lay the foundation for the next generation of persistence specifications within Jakarta EE. Jakarta NoSQL: Built for NoSQL, Not Adapted to It When architects consider standardizing NoSQL development in Jakarta EE, a common question arises: why not extend Jakarta Persistence (JPA) to support NoSQL databases? JPA has long provided a unified programming model for relational databases in the Java ecosystem. The answer is based on a core architectural principle: tools should be optimized for their intended purpose. The first challenge is that JPA was designed specifically for relational databases, relying on concepts like tables, columns, joins, foreign keys, and transactional consistency. These are not simply implementation details but core elements of the specification. Forcing document, graph, key-value, or vector databases into this model creates friction and limits the use of each database’s native features. The second challenge is that NoSQL systems behave fundamentally differently. Graph databases perform path traversals, document databases store nested structures without normalization, key-value databases focus on fast lookups, and vector databases handle similarity calculations. These systems also differ in consistency, transactions, query languages, indexing, and scalability capabilities. Representing all these paradigms through a single relational abstraction leads to compromises. The third challenge is the importance of specialization. As Abraham Maslow noted, “if the only tool you have is a hammer, it is tempting to treat everything as if it were a nail.” Relational databases are effective, but not ideal for every persistence need. Semantic search, graph traversal, and high-volume telemetry storage are not relational problems. Applying a relational abstraction to all database types runs the risk of losing the unique optimizations each technology provides. Examine the analogy of transportation: cars, boats, submarines, and airplanes all address transportation but are specialized for different environments. Forcing them to use the same controls would result in mediocrity across all. Similarly, a single persistence abstraction may remove the features that make each database effective. Therefore, Jakarta NoSQL does not extend JPA beyond its intended scope. Instead, it offers a dedicated persistence model for non-relational databases, while continuing to maintain the familiar developer experience that contributed to JPA’s success. A key design goal of Jakarta NoSQL is to reduce mental effort for enterprise Java developers. Teams experienced with JPA should find the specification immediately approachable, as Jakarta NoSQL intentionally uses familiar terminology and concepts from the Jakarta EE community. Developers will encounter annotations like @Entity, @Id, and @Column, enabling a smooth transition from relational to non-relational persistence. Java @Entity public class Car { @Id private Long id; @Column private String name; @Column private CarType type; } At first glance, this entity closely resembles a JPA entity, which is intentional. However, the underlying implementation is fundamentally different. Jakarta NoSQL is built to support schema flexibility, embedded structures, nested documents, and database-specific storage models. This approach is reflected throughout the API. Instead of requiring developers to oversee low-level driver details, Jakarta NoSQL offers a high-level programming model via the Template API. Java @Inject Template template; Car ferrari = Car.builder() .id(1L) .name("Ferrari") .build(); template.insert(ferrari); List<Car> sports = template.select(Car.class) .where("type").eq(CarType.SPORT) .orderBy("name") .result(); The objective mirrors JPA’s original mission: permitting developers to focus on domain models and business logic, rather than serialization, connection management, or vendor-specific APIs. This foundation shaped Jakarta NoSQL 1.0. The initial release introduced the mapping layer, CDI integration, repository support, template operations, and standardized endpoints for four major NoSQL categories: Document databasesKey-value databasesColumn-family databasesGraph databases Jakarta NoSQL 1.0 showed that a unified Java programming model can respect the particular characteristics of each database family. Jakarta NoSQL 1.1 continued this evolution. While version 1.0 focused on mapping and persistence, version 1.1 expanded querying capabilities through integration with Jakarta Query. A key addition is support for parameterized queries, letting developers to safely bind parameters instead of manually constructing query strings. Java List<Car> cars = template.query( "FROM Car WHERE type = :type") .bind("type", CarType.SPORT) .result(); Version 1.1 also introduces projection support, allowing applications to retrieve lightweight views instead of entire entities. Java @Projection public record TechCarView( String name, CarType type) { } List<TechCarView> views = template .typedQuery( "FROM Car WHERE type = 'SPORT'", TechCarView.class) .result(); These features improve performance, reduce data transfer, and comply with modern Java features such as records. An important aspect of Jakarta NoSQL is its long-term architectural vision. While most developers use the mapping layer, the specification also defines a lower-level communication API for advanced scenarios. Java DocumentManagerFactory factory = ...; DocumentManager manager = factory.get("users"); DocumentRecord record = ...; manager.put(record); Optional<DocumentRecord> result = manager.findByKey("user:10"); manager.deleteByKey("user:10"); This communication layer is optional. Application developers can build complete systems without it, but it is valuable for database vendors, framework authors, and advanced integrations needing direct access to database capabilities. This design is fundamentally different from JDBC, which assumes communication through SQL statements and tabular result sets. That model works well because relational databases share a common language and interaction pattern. NoSQL databases do not. Document databases may use BSON, graph databases may offer traversal languages, and vector databases may provide similarity-search APIs. Others use REST endpoints, binary protocols, gRPC streams, or vendor-specific mechanisms. Forcing these models into a JDBC-style abstraction would limit their capabilities or demand ongoing vendor-specific extensions. For this reason, Jakarta NoSQL uses a layered architecture. The mapping layer offers a portable, productive programming model for developers, while the communication layer remains flexible to support diverse NoSQL systems. This architecture positions the specification for future growth. As new technologies like vector databases, time-series engines, and AI-native storage emerge, Jakarta NoSQL can evolve without imposing a relational mindset. Rather than treating every database as a nail for the JPA hammer, Jakarta NoSQL recognizes that different problems require different tools, while still presenting a consistent and familiar experience for enterprise Java developers.

By Otavio Santana DZone Core CORE
Your AI Coding Agent Can't Steal What It Never Had: The Docker Sandbox Isolation Story
Your AI Coding Agent Can't Steal What It Never Had: The Docker Sandbox Isolation Story

I ran an AI coding agent against a broken Kubernetes deployment for five minutes. The agent called Anthropic's API dozens of times — reasoning about manifests, running kubectl commands, redeploying workloads. It made fully authenticated requests throughout the entire session. The API key was never in its environment. Shell env | grep -iE "anthropic|api_key|secret|token|password" # (empty) That is Docker Sandbox's credential isolation model in action. This article is about what that actually means — and what else the isolation holds, breaks, and surprises you with when you probe it properly. Key Takeaways Docker Sandbox uses a host-side proxy to inject API credentials without the agent ever seeing them — the agent makes authenticated calls without possessing the keySeven live isolation probes confirmed the boundary held throughout real AI agent activity, not just at restNetwork policy is hostname-scoped HTTP filtering — not a full network control plane — with three specific behaviors the documentation doesn't make clearDevOps agents can run docker build and kubectl inside the sandbox without any path to the host Docker daemon or cluster credentialsThe --branch parallel agent mode is Git-level isolation, not VM-level — important distinction for threat models requiring separate credentials per agent The Setup I manage eight AKS clusters for Fortune 500 clients. My laptop has Azure service principals, SSH keys, kubeconfig files with a dozen cluster contexts, and twenty-plus repos — some with .env files containing real API keys. Running an AI agent from this machine without guardrails means the agent inherits all of it. Docker Sandbox changes that. Each sandbox is a microVM — its own Linux kernel, its own Docker daemon, its own network stack. You mount one project directory. The agent sees one project directory. Everything else on the machine does not exist inside the sandbox. I spent two weeks testing this claim. Here is what I found. Test environment: What Detail sbx version v0.31.1 · commit e658be1 Host macOS Apple Silicon Network endpoints probed 13 Isolation probes 7 targeted commands Kubernetes scenario Real agent task, two bugs, timed All findings backed by real terminal output. Full repo: github.com/opscart/docker-sandbox-devops. How the Credential Isolation Actually Works The sandbox environment has no API keys. But the agent made authenticated API calls. Here is the mechanism: Shell env | grep proxy # https_proxy=http://gateway.docker.internal:3128 # http_proxy=http://gateway.docker.internal:3128 # JAVA_TOOL_OPTIONS=-Dhttp.proxyHost=gateway.docker.internal -Dhttp.proxyPort=3128 ... Every outbound request — HTTP, HTTPS, even Java tools — routes through a proxy at gateway.docker.internal:3128. That proxy runs on the Mac host, completely outside the microVM boundary. When the agent sends a POST to api.anthropic.com, there is no Authorization header — the agent does not have the key. The request reaches the host-side proxy. The proxy checks the allowlist — api.anthropic.com is in the default AI services group under the Balanced policy. Authentication is performed by the host-side proxy using credentials stored outside the sandbox boundary. The authenticated request is forwarded to Anthropic. The agent receives the response. It has no idea what key was used, where it came from, or how to find it again. Think of it like an OAuth gateway. The proxy holds the credential and vouches for the agent's requests. The agent gets access without ever possessing the key. You cannot steal what you never had. This is architecturally different from the standard setup where ANTHROPIC_API_KEY sits in the shell environment — one echo $ANTHROPIC_API_KEY away from being exfiltrated. What the Four Isolation Layers Actually Do Docker Sandbox stacks four layers: Hypervisor isolation. Separate Linux kernel per sandbox. Host processes invisible. Other sandboxes invisible. A compromised sandbox cannot escalate to the host kernel. This is the fundamental difference from a Docker container — a container shares the host kernel. The microVM does not. Network isolation. All outbound HTTP/HTTPS routes through the host-side proxy. Raw TCP, UDP, and ICMP are blocked at the network layer. Three policy tiers: allow-all, balanced (curated dev allowlist), deny-all. Set before starting your first sandbox: Shell sbx policy set-default balanced Docker Engine isolation. Each sandbox runs a private Docker daemon with its own socket. No path to the host Docker daemon. An agent can run docker build and docker run without socket mounting — which is the tradeoff that breaks isolation in plain container-based approaches. Credential isolation. Proxy-based injection as described above. The raw key never enters the microVM. macOS host with sensitive assets and proxy on the left, Docker Sandbox microVM in the center, network policy zones on the right. Seven Isolation Proofs — Run Live After a Real Agent Task The agent exited after completing the debugging task. The sandbox remained alive, and I executed the following commands from the same shell session the agent had used — to show exactly what was accessible throughout the entire run. 1. Filesystem Boundary Shell ls /Users/opscart/ # Source ls /Users/opscart/.ssh/ 2>&1 One directory. The workspace mount. SSH keys, other repos, credential directories — none of them exist inside the sandbox. Parent directories above the workspace are read-only stubs with no siblings. One critical implication: if your workspace is your home directory, your entire home is visible and writable. Always mount a project subdirectory, not your home. 2. No Credentials in Environment Shell env | grep -iE "anthropic|api_key|aws|secret|token|password" # (empty) Confirmed. The agent that just made dozens of API calls had no raw credentials anywhere in its environment. 3. Proxy Confirms the Injection Mechanism Shell env | grep proxy # https_proxy=http://gateway.docker.internal:3128 # no_proxy=localhost,127.0.0.1,::1,[::1],gateway.docker.internal Proxy address visible. Credentials it carries: not visible. The mechanism described above confirmed live inside the running sandbox. 4. Process Namespace Shell ps aux | wc -l # 13 A macOS host runs hundreds of processes. The sandbox shows 13 — all internal. The stack includes dockerd, containerd, socat bridging SSH agent forwarding, and the coding agent. Host processes completely invisible. No way to inspect or interact with anything running on the host. 5. Private Docker Engine Shell docker info | grep -E "Server Version|Operating System|ID" # Server Version: 29.4.3 # Operating System: Ubuntu 25.10 (containerized) # ID: e6934b23-368c-4259-a873-96f879f587e5 Ubuntu 25.10. A unique daemon ID that differs from docker info on the host — confirming the sandbox runs a fully isolated daemon. The agent deployed a full Kubernetes cluster using this daemon. No path to the host Docker socket existed. 6. Host Services Unreachable Shell curl -s --max-time 3 https://localhost:6443 2>&1 || echo "blocked" # curl: (7) Failed to connect to localhost port 6443: Connection refused Port 6443 — my minikube cluster on the Mac host. From inside the sandbox, localhost is the sandbox's own loopback. Host clusters, host SSH, host services — unreachable by default. Eight AKS contexts on this machine. Zero is reachable from inside the sandbox without an explicit policy rule. 7. What the Agent Had vs. What It Didn't During the entire debugging task, the agent had full access to one project directory, kubectl to the sandbox-internal Kubernetes cluster, and full Docker capabilities against the private daemon. It could not reach any other directory, cloud credentials, other kubeconfig contexts, the host Docker daemon, or any cluster not running inside the sandbox. All seven proofs held throughout the session without exception. Three Network Policy Findings That Change How You Think About It Network policy is not a full network control plane. It is hostname-scoped HTTP filtering. Three findings define the actual scope: Finding 1: Blocking returns HTTP 403, not TCP rejection. Plain Text probe "example.com" "https://example.com" # example.com | exit=0 | http=403 Exit code 0. The curl command succeeded. The proxy returned 403 directly. An agent that retries on 403 will retry blocked requests indefinitely. It cannot distinguish a blocked domain from a legitimate server-side error by exit code. For DevOps workflows — an agent hitting a blocked container registry will keep retrying silently rather than failing fast. Finding 2: HTTP CONNECT established a tunnel to port 22 on an allowed host. Plain Text # Port 22 — SSH port curl -s --max-time 5 telnet://github.com:22 # Connected to github.com port 22 # Port 9999 — non-standard port curl -s --max-time 5 telnet://github.com:9999 # Connected to github.com port 9999 github.com is on the Balanced allowlist. HTTP CONNECT established TCP tunnels to github.com on both port 22 and the non-standard port 9999 — both succeeded. Port-based restrictions are not enforced at the proxy layer. The Balanced policy is hostname-scoped only. Any port to an allowed host is reachable via HTTP CONNECT. Finding 3: DNS is not filtered. A common assumption is that all outbound traffic routes through the HTTP proxy — including DNS. Lab results show DNS resolution occurs independently: Plain Text dig example.com +short # 172.66.147.243 A blocked domain resolved. The microVM has an internal stub resolver that forwards DNS independently of the HTTP proxy. An agent can resolve any hostname regardless of the active policy. DNS cannot serve as a secondary enforcement layer. These findings do not break the isolation model. They define its actual boundary. Network policy controls HTTP/HTTPS access by hostname. It does not control DNS, TCP tunnels to allowed hosts on arbitrary ports, or how agents interpret 403 responses. The Agent Scenario: Isolation Under Real Load The real test of isolation is not seven probe commands — it is whether the boundary holds while an agent is actively working, making API calls, running kubectl, deploying containers. I gave an AI agent a broken Kubernetes deployment: a payments-service with memory limits set to 64Mi on a service that needs ~150Mi at peak. The agent received a task file and a set of manifests. No other context. The agent completed the task in under five minutes. It found two bugs — one planted, one discovered independently by reading the manifest and noticing health check probes targeting port 8080 on an nginx container that only serves on port 80. The task said nothing about probes. Result: both pods 1/1 Running, 0 restarts. The seven isolation proofs above were verified immediately after — throughout the entire debugging session, the boundary held without exception. Full article and complete repo at opscart.com/docker-sandbox-devops. What This Means for DevOps Engineers Specifically Most Docker Sandbox articles target software developers running Claude Code on a single codebase. The DevOps case is different and more demanding. A DevOps engineer running an AI agent faces a broader attack surface: multiple cluster contexts, infrastructure credentials, IAM roles, service accounts, kubeconfigs that grant production access. The blast radius of a compromised or manipulated agent is not one repo — it is potentially every system those credentials touch. Docker Sandbox addresses this at the architecture level rather than the prompt level. You are not relying on the agent being well-behaved. You are relying on the microVM boundary, the proxy, and the private Docker daemon. The agent can be fully autonomous inside the sandbox because the guardrail is the environment, not the agent's behavior. The private Docker Engine is particularly significant. DevOps agents need to build and test containers. Every other local isolation approach that allows container operations requires socket mounting — which gives the agent direct access to the host Docker daemon and every image and volume on the host. Docker Sandbox eliminates this tradeoff. What Is Still Rough The image iteration cycle is the primary friction point. Adding a tool requires editing a Dockerfile, rebuilding, pushing to a registry, and recreating the sandbox. For a stable toolchain, this is acceptable. For rapid experimentation, it is not. The --branch parallel agent mode is Git isolation, not VM isolation. Both agents run in one microVM with shared Docker and network. For separate credentials or separate network policies per agent, you need separate workspace directories. The network policy CLI has non-obvious syntax in several places — sbx policy deny does not remove an allow rule, and external cluster access requires two policy rules not one. Neither behavior is documented. The CLI changes between minor versions. v0.31.1 changed login flow, renamed policy tiers, and introduced --clone mode. Pin your version. When Not to Use Docker Sandbox Docker Sandbox is the right tool for a specific set of problems. It is not the right tool when: You need raw UDP or ICMP. Network tracing tools (traceroute, mtr), some mTLS configurations, and anything relying on ICMP will not work — the sandbox proxy only handles HTTP/HTTPS. Your toolchain requires host-device access. USB devices, GPU passthrough beyond basic forwarding, and hardware security keys are not accessible from inside the microVM. You are on a memory-constrained machine. Each sandbox runs a full microVM plus its own Docker daemon. On a machine with 8GB RAM, running multiple sandboxes simultaneously alongside Docker Desktop and a browser will cause pressure. You need production-grade audit logging. Docker Sandbox is Experimental. Audit trails, compliance logging, and enterprise controls are not mature yet. For regulated environments, evaluate accordingly. Your agent needs to coordinate across multiple repositories simultaneously. The one-sandbox-per-workspace model means cross-repo agent work requires careful orchestration. The --clone mode helps but adds git workflow overhead. Conclusion The credential isolation model is the headline: the agent made authenticated API calls throughout the session without the API key ever entering the sandbox. Authentication was performed by the host-side proxy using credentials stored outside the sandbox boundary. The agent could use the credential — it could never see, copy, or exfiltrate it. Seven isolation proofs confirmed the boundary held under real active load. One directory visible. No credentials. No host processes. No host clusters. No host Docker daemon. The network policy findings add important nuance. The --branch mode reality is different from what the documentation implies. Docker Sandbox is Experimental, and the CLI is moving. Use it knowing what it is — and what it is not.

By Shamsher Khan DZone Core CORE

The Latest Coding Topics

article thumbnail
Mac Native Builds, Live Protocols, And Open Issues Under 350
The open issue count dropped below 350 after a push through the oldest reports, and the same week brought native Mac builds, WebSockets in the core, gRPC and GraphQL inte
June 29, 2026
by Shai Almog DZone Core CORE
· 146 Views · 1 Like
article thumbnail
Selective Deployment in Azure Data Factory: A Practical Blueprint for Safer CI/CD
Implement selective deployment in Azure Data Factory to safely promote individual features without deploying the entire factory state
June 26, 2026
by Sauhard Bhatt
· 1,008 Views · 1 Like
article thumbnail
A Tool Is Not a Platform (And Your Team Knows the Difference)
Calling a collection of tools a platform creates expectations it cannot meet. A platform has a contract. A toolchain has documentation.
June 25, 2026
by Jeleel Muibi
· 831 Views
article thumbnail
Code and Connect: MCP + MuleSoft
Understand MCP, AI agents, and assistants, and learn how Model Context Protocol connects AI applications to tools using MuleSoft.
June 25, 2026
by Ajay Singh
· 893 Views
article thumbnail
Reducing Alert Fatigue in the SOC Using Correlation Rules and Detection-as-Code
Correlation rules and risk aggregation collapse noisy alerts into fewer, higher-context escalations. Detection-as-code keeps it sustainable.
June 25, 2026
by Krishnaveni Musku
· 756 Views
article thumbnail
Architectural Cost of Rust's Orphan Rule
A deep dive into how Rust's orphan rule creates architectural trade-offs in large monorepos. Learn when to use the newtype pattern, local traits, or bridge crates
June 24, 2026
by Krun Dev
· 725 Views
article thumbnail
Implementing Asynchronous Communication Between Microservices Using Kafka and Spring Boot
Kafka decouples services, buffers spikes, and routes failures to a DLT. Schemas are contracts; consumers must be idempotent.
June 24, 2026
by Mallikharjuna Manepalli
· 1,542 Views
article thumbnail
Architectural Collapse: How Extension Poisoning, Node Vulnerabilities, and Infrastructure Fog Enabled the GitHub Repository Breach
A major GitHub breach showed how extension poisoning, Node ecosystem weaknesses, and insecure developer workstations can bypass traditional security defenses.
June 23, 2026
by Akash Lomas
· 2,149 Views · 1 Like
article thumbnail
I Built a VS Code Extension to Debug Azure AI Foundry Agents Without Leaving My Editor
Free VS Code extension for Azure AI Foundry agent traces into your editor as an interactive timeline — see tool calls, token costs, and conversation replays.
June 23, 2026
by Jubin Abhishek Soni DZone Core CORE
· 960 Views
article thumbnail
Foxit MCP Server: Give AI Agents Direct Access to 30+ PDF Tools via Model Context Protocol
Foxit MCP Server gives any AI agent direct access to 30+ PDF tools for conversion, OCR, merge, and compare via the Model Context Protocol.
June 22, 2026
by Lucien Chemaly
· 1,030 Views
article thumbnail
When Valid SQL Was Still the Wrong Answer
A personal project exploring why AI-generated SQL isn't always trustworthy and how semantic context, validation, and governance improve analytics accuracy.
June 22, 2026
by Anusha Kovi DZone Core CORE
· 892 Views · 1 Like
article thumbnail
Keeping AI-Powered BI Honest: A Human-in-the-Loop (HITL) Playbook
AI-generated SQL can look right while being wrong. Learn how human-in-the-loop workflows build trust through reviews, approvals, audits, and escalation paths.
June 22, 2026
by Nithish Shetty
· 852 Views
article thumbnail
Fix the Target, Precompute Once: A Backend-Free Word-Ladder Solver With a BFS Distance Field
Every word ladder ends at the same word. One offline BFS precomputes a distance field, making par and shortest-path queries O(1) lookups, no backend.
June 22, 2026
by horus he
· 718 Views
article thumbnail
From Open SQL to CDS Views: Rewriting SAP Data Access for Performance at Scale
Swap Open SQL for CDS views to push logic into HANA and centralize reusable data models, but verify the execution plan, not just the pattern
June 19, 2026
by Deepika Paturu
· 1,090 Views
article thumbnail
Jakarta NoSQL: Why JPA Is Not Enough for the AI Era
Jakarta NoSQL provides a familiar Java programming model while preserving the strengths of document, graph, key-value, and AI-driven vector databases.
June 19, 2026
by Otavio Santana DZone Core CORE
· 1,245 Views · 1 Like
article thumbnail
Your AI Coding Agent Can't Steal What It Never Had: The Docker Sandbox Isolation Story
Docker Sandbox runs AI agents in microVMs. The API key never enters the sandbox — the host proxy authenticates on the agent's behalf.
June 19, 2026
by Shamsher Khan DZone Core CORE
· 1,350 Views
article thumbnail
From printTriangularNumber to Duff’s Device: Mastering Java Switch Statements Old and New
This post traces that journey using triangular number computation as a practical example of intentional fall-through and connects the technique to Duff's Device.
June 19, 2026
by NaveenKumar Namachivayam DZone Core CORE
· 1,151 Views · 2 Likes
article thumbnail
Top Java Security Vulnerabilities and How to Prevent Them in Modern Java
Most Java security breaches stem from preventable coding mistakes. Follow secure coding practices, validate inputs, and keep dependencies updated to reduce risk.
June 18, 2026
by Muhammed Harris Kodavath
· 1,978 Views
article thumbnail
OpenAPI, ORM, SVG, and Lottie
Learn about Codename One's latest release with OpenAPI code generation, SQLite ORM, SVG and Lottie support, deep links, and routing.
June 17, 2026
by Shai Almog DZone Core CORE
· 2,726 Views · 1 Like
article thumbnail
The Real-Time Revolution: Why Blockchain Needs Data Stream Processing
Blockchain and data streaming are bringing unprecedented levels of security, transparency, and real-time mechanisms to move data across the digital world.
June 17, 2026
by Gautam Goswami DZone Core CORE
· 1,437 Views
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×