Languages

Programming languages allow us to communicate with computers, and they operate like sets of instructions. There are numerous types of languages, including procedural, functional, object-oriented, and more. Whether you’re looking to learn a new language or trying to find some tips or tricks, the resources in the Languages Zone will give you all the information you need and more.

DZone's Featured Languages Resources

Architectural Cost of Rust's Orphan Rule

By Krun Dev
The architectural cost of Rust's orphan rule doesn't show up on day one. It shows up when you're six months deep into a monorepo, the domain model is clean, the crate split looks sane, and then you try to wire two libraries together and the compiler just says no. No negotiation, no workaround at the language level. Foreign types don't connect to foreign traits. That's the rule.

If you're coming from Java or C#, the instinct is to extend. OOP gives you ad-hoc polymorphism for free: bolt any interface onto any object, anywhere, anytime. Rust's coherence system exists specifically to make that impossible, and the collision between those two mental models is where experienced engineers lose days they don't budget for.

You want to implement serde::Serialize for uuid::Uuid in your domain crate. Both the trait and the type are foreign. The compiler doesn't care that it's reasonable. It sees a trait and a type you don't own and closes the door.

Rust Coherence Rules Explained for Seniors: Beyond the Compiler Error

Before you blame the borrow checker, understand what trait coherence actually prevents. Without it, any crate could implement any trait for any type, and a minor upstream change in a transitive dependency could silently shadow your implementation or create an ambiguous resolution. Your build breaks not because you changed anything, but because a library three levels deep added a blanket impl that overlaps with yours. Overlapping implementations aren't a warning you suppress. They're a design conflict with no runtime fallback.

Rust chooses global uniqueness of trait implementations over developer convenience. One (Type, Trait) pair, one implementation, across every downstream crate in the compiled program. No priority system, no override mechanism. That's the contract.

Rust Conflicting Implementations of Trait: A Design Deadlock

The failure mode nobody talks about: you didn't do anything wrong. You wrote a blanket impl six months ago that made sense. You add a new dependency.
That dependency also ships a blanket impl covering overlapping types. Now you have conflicting implementations of a trait, and neither is removable without breaking something real.

Rust
    // Schematic only: TraitA and TraitB are placeholder bounds.

    // your crate, written six months ago
    impl<T: TraitA> Serialize for T { ... }

    // new_dependency, just landed in Cargo.toml
    impl<T: TraitB> Serialize for T { ... }

    // Your core type implements both.
    // Compiler: two candidates, zero resolution.

This is a design deadlock, not a syntax error. The two impls are individually coherent and jointly impossible. Resolution requires forking a crate or restructuring your trait hierarchy. Neither is a Tuesday afternoon fix.

When you hit conflicting implementations, the first question isn't "how do I fix the error"; it's "who should have owned this impl from the start."

Rust Orphan Rules Workaround: When Newtype Isn't Enough

The newtype pattern is the first thing everyone reaches for, and for good reason: it compiles. You wrap the foreign type in a local struct, implement the foreign trait on your wrapper, done. The orphan rule is satisfied because you now own the type. What the documentation skips is what it costs to keep that wrapper alive in a real codebase over real time.

Rust
    struct WrappedUuid(uuid::Uuid);

    impl serde::Serialize for WrappedUuid {
        fn serialize<S: serde::Serializer>(&self, s: S) -> Result<S::Ok, S::Error> {
            // Serialize as the hyphenated string form. Delegating with
            // self.0.serialize(s) would only compile if Uuid already
            // implemented Serialize, which is the case being worked around.
            s.serialize_str(&self.0.to_string())
        }
    }

That's the happy path. Now your system also needs Deserialize, Display, FromStr, Hash, PartialEq, Eq, Clone, Copy, Debug. None of those carry over automatically from the inner type. The std traits come back with a #[derive] line, but Display, FromStr, and Deserialize have to be written or delegated by hand, or you reach for Deref and start the trampoline.

Newtype Pattern Rust Performance Overhead: Zero-Cost or Hidden Debt?

At the machine code level, the wrapper optimizes away: identical memory layout, zero runtime overhead. The cost is compile-time. Every generic function parameterized over your newtype generates its own instantiation.
Five domain-level newtypes crossing the same generic utility functions means five times the monomorphization footprint. In a large monorepo, this compounds into CI slowdowns that are genuinely hard to attribute without a compile-time profiler.

The Deref trampoline adds a different tax. You implement Deref with Target = uuid::Uuid to stop delegating methods manually. Now you have implicit coercions in play. Reviewers track which methods belong to the wrapper and which bleed through from the inner type. Multiply by thirty newtypes with inconsistent Deref usage, and you have a real cognitive load problem, not a theoretical one.

Local Trait for Remote Type Rust: Strategies for Dependency Decoupling

The alternative: own the trait instead of wrapping the type. Define a local trait that captures the behavior you need, implement it for the foreign type (legal, because you own the trait), and depend on your abstraction internally. No wrapper, no Deref, no trampoline.

Rust
    trait Identifiable {
        fn id_bytes(&self) -> [u8; 16];
    }

    impl Identifiable for uuid::Uuid {
        fn id_bytes(&self) -> [u8; 16] {
            *self.as_bytes()
        }
    }

The limit is integration. Your local trait won't satisfy call sites that expect the foreign trait. If downstream code or framework layers speak serde::Serialize, your local abstraction is invisible to them.

Dependency decoupling through local traits works inside a bounded context. It breaks at the boundary where your code has to speak the ecosystem's language.

Implement External Trait for External Type Rust: The Coherence Wall

The classic monorepo split (API, domain, infrastructure) is clean on a whiteboard. In practice, the orphan rule makes the boundaries structural in ways the diagram doesn't show. The domain defines Order. The infrastructure needs to serialize it. serde::Serialize is foreign. Order lives in a different crate than the serialization logic. Implement Serialize for Order in infrastructure: rejected, wrong crate ownership.
Push the impl into the domain, and now your domain layer couples to a serialization library. Create a bridge crate to hold the impl, and now you have a dependency node whose only job is to satisfy a compiler rule; because the bridge owns neither Order nor Serialize, it also has to wrap Order in a type of its own before the impl compiles. All three options carry real costs. The orphan rule forces you to make the tension explicit instead of hiding it in a convenient ad-hoc impl.

The coherence wall doesn't give you a good option. It gives you a forced choice, and forces you to own it.

The Performance and Maintenance Tax Nobody Budgets For

Bridge crates solve the immediate coherence problem and create a different one. A crate that carries the Serialize impl for Order depends on both domain and serde. Every time Order gains a field, the bridge crate needs an update. Every time serde shifts its serializer interface, the same story. You've turned a single implementation concern into a maintenance obligation with its own crate, its own version, and its own CI surface.

Dependency bloat from coherence workarounds isn't just binary size. It's the surface area of code that has to move when the ecosystem changes underneath you.

Upstream Changes and the Downstream Crate Tax

Here's the scenario that produces the most quietly expensive engineering work: you implemented Serialize for WrappedUuid because the uuid crate, as you were using it, didn't ship serde support. Then a later release adds a serde feature flag. Now there are two implementations in the dependency graph: yours on the wrapper and the upstream one on Uuid. They target different types, so the compiler accepts both without complaint. Nothing tells you the wrapper is now redundant, and two serialization paths for the same value can drift apart without anyone noticing.

Removing your workaround means unwrapping every call site, verifying the upstream impl behaves identically to yours, and updating every downstream crate that imported the wrapper type. In a monorepo with ten crates depending on your domain primitives, that's a multi-day refactor for what should have been a one-line Cargo.toml change.
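The upstream scenario above can be sketched in a single file, with a module standing in for the external crate. This is an illustration under stated assumptions, not uuid or serde code: `upstream`, `Id`, `WrappedId`, and `to_wire` are hypothetical names, and `std::fmt::Display` plays the role of the foreign trait. Modules do not trigger the orphan rule, so the sketch only shows the shape of the migration problem: the wrapper's impl and the upstream impl sit on different types, both compile, and the cost lands on call sites written against the wrapper.

```rust
use std::fmt;

// Stand-in for an external crate (hypothetical, not the real uuid crate).
mod upstream {
    use std::fmt;

    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
    pub struct Id(pub u32);

    // Imagine this impl only arrives in a later release of the crate.
    impl fmt::Display for Id {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "id-{:08x}", self.0)
        }
    }
}

// The workaround written before upstream shipped its own impl.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct WrappedId(pub upstream::Id);

impl fmt::Display for WrappedId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Formats differently from the upstream impl: a behavioral gap
        // that has to be checked before the wrapper can be removed.
        write!(f, "ID:{}", (self.0).0)
    }
}

// A call site written against the wrapper; it must change during migration.
pub fn to_wire(id: WrappedId) -> String {
    id.to_string()
}

fn main() {
    let raw = upstream::Id(255);
    // Both impls coexist because they target different types.
    println!("{}", raw); // prints "id-000000ff"
    println!("{}", to_wire(WrappedId(raw))); // prints "ID:255"
}
```

The two outputs differ on purpose: checking that the upstream format matches what consumers of the wrapper already depend on is part of the migration, not an afterthought.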
Generic Over-Specialization and Monomorphization Footprint at Scale

When you can't implement a foreign trait directly, the compensation is making your own APIs more generic. Trait bounds multiply, associated types appear, and concrete types get wrapped in trait objects. Every abstraction layer added to route around the coherence wall grows the monomorphization footprint and slows the compiler. A function that was fn process(order: Order) becomes fn process<T: Serializable + Identifiable + Auditable>(entity: T), not because the domain got more complex, but because the workarounds did.

Blanket impls make this worse. Writing impl<T: MyTrait> PublishedTrait for T, for a trait your crate owns and exports (the orphan rule rejects the same shape for a foreign trait), permanently closes an extension point for every downstream crate. No crate that depends on yours can add a more specific implementation for a concrete type satisfying MyTrait. You've passed the coherence cost downstream without asking.

Before shipping a blanket impl as a workaround, ask whether you're solving the problem or relocating it.

The Verdict: Own Your Data or Own Your Behavior

The orphan rule is a feature. Not a reframe, but a literal design decision that prevents an entire class of large-scale dependency bugs. Global uniqueness of trait implementations is what makes a compiled Rust program reasonably auditable. Without it, you get C++ template conflict problems with a package manager actively making the dependency graph denser.

The practical split: if the foreign trait is fundamental to how your type participates in the ecosystem (serialization, hashing, ordering), push the implementation upstream. Open a PR, request a feature flag. This amortizes the cost across the ecosystem and eliminates the maintenance surface in your codebase entirely. If the crate is unmaintained or the timeline doesn't allow it, use a newtype, scope it to the infrastructure boundary, document it as a coherence workaround, and don't let it leak into your domain model as a first-class type.
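One way to keep such a newtype scoped is to make it private to the module that needs it. The sketch below is a minimal illustration, assuming a single-file layout in which the `infrastructure` module stands in for an infrastructure crate; `SecondsDisplay`, `render_timeout`, and `Job` are hypothetical names. It uses `std::time::Duration`, a foreign type that has no `Display` impl, so the wrapper is genuinely required.

```rust
use std::time::Duration;

mod infrastructure {
    use std::fmt;
    use std::time::Duration;

    // COHERENCE WORKAROUND: Duration (foreign type) has no Display
    // (foreign trait) impl. The wrapper is private to this module, so it
    // cannot leak into domain types or public signatures.
    struct SecondsDisplay(Duration);

    impl fmt::Display for SecondsDisplay {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "{}.{:03}s", self.0.as_secs(), self.0.subsec_millis())
        }
    }

    // The public surface accepts and returns plain types only.
    pub fn render_timeout(timeout: Duration) -> String {
        SecondsDisplay(timeout).to_string()
    }
}

// Domain code keeps using the unwrapped foreign type.
pub struct Job {
    pub timeout: Duration,
}

fn main() {
    let job = Job { timeout: Duration::from_millis(1500) };
    println!("{}", infrastructure::render_timeout(job.timeout)); // prints "1.500s"
}
```

Because the wrapper never appears in a public signature, deleting it later touches one module rather than every call site.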
When upstream ships the impl you hacked around, you'll want to know exactly where your workaround lives. Keep it tight. The compiler doesn't care about your clean architecture. But the engineer doing the migration in eight months will.

Frequently asked questions

What is trait coherence, and why does it constrain large Rust codebases?
Trait coherence guarantees one implementation per (Type, Trait) pair across the entire compiled program. Without it, upstream changes in transitive dependencies could silently break your trait resolution. The orphan rule is the enforcement mechanism. It's not a compiler quirk; it's the price of a stable dependency ecosystem.

How do overlapping implementations cause deadlocks in a monorepo?
When two impls cover the same concrete type, the compiler emits a hard error with no resolution path. The only exits are removing one impl, forking a crate, or restructuring trait ownership. In a multi-team monorepo, this is a coordination problem as much as a technical one.

Is the newtype pattern actually zero-cost in production Rust?
At runtime, yes. At compile time, no. Monomorphization footprint grows with every generic boundary the newtype crosses. In codebases with many domain-level newtypes, this compounds into measurable CI slowdowns. The Deref trampoline adds a separate maintenance tax that shows up in code review, not benchmarks.

When does a bridge crate make more sense than a newtype?
When several crates need the same impl and pushing it upstream isn't an option. A bridge crate cannot implement a foreign trait for a foreign type directly, because the orphan rule applies to it like any other crate. What it provides is a single home for the wrapper type and its impl, so the rest of the workspace shares one workaround instead of growing several. The cost is a dependency node that tracks two upstream surfaces and needs updates when either changes.

What happens when upstream ships the trait impl you worked around?
You get a migration. Your workaround becomes dead weight, call sites need unwrapping, and downstream crates need updates.
In a monorepo with deep dependencies on the wrapper type, this is days of work for what should be a version bump. Scope your workarounds tightly from day one.

Can blanket implementations lock downstream crates out of specialization?
Yes, and this is the most underappreciated long-term cost. Once you publish a blanket impl, no downstream crate can add a more specific implementation for any type it covers. That extension point is permanently closed. Every crate depending on yours inherits the constraint without being asked.
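The last answer can be made concrete with a short sketch. The names `Describe` and `Money` are hypothetical and everything lives in one crate for brevity; the same conflict applies when the concrete impl sits in a downstream crate.

```rust
use std::fmt::{self, Display};

// A trait this crate owns and exports (hypothetical name).
pub trait Describe {
    fn describe(&self) -> String;
}

// Blanket impl: every Display type now has exactly one Describe impl.
impl<T: Display> Describe for T {
    fn describe(&self) -> String {
        format!("<{}>", self)
    }
}

pub struct Money {
    pub cents: u64,
}

impl Display for Money {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{:02}", self.cents / 100, self.cents % 100)
    }
}

// Uncommenting the impl below is rejected with E0119 (conflicting
// implementations): Money is Display, so the blanket impl already covers it
// and no more specific version can be added.
//
// impl Describe for Money {
//     fn describe(&self) -> String {
//         format!("{} USD", self)
//     }
// }

fn main() {
    let price = Money { cents: 1999 };
    println!("{}", price.describe()); // prints "<19.99>"
    println!("{}", 42_u32.describe()); // prints "<42>"
}
```

If per-type customization matters, explicit impls for a known list of types keep the extension point open, at the cost of writing each one out.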
Foxit MCP Server: Give AI Agents Direct Access to 30+ PDF Tools via Model Context Protocol

By Lucien Chemaly
Wiring a document automation agent directly to REST endpoints forces you to repeat the same plumbing for every operation: push a file up, poll until the task finishes, pull the result down, catch failures, and juggle auth tokens across several services. With PDFs, that cycle runs again for each conversion, OCR pass, or merge in your pipeline. The Foxit PDF API MCP Server replaces all of that with 30+ tools an agent can invoke directly, while the MCP Server absorbs the upstream REST mechanics behind the scenes.

This article walks through registering the server, the full tool catalog it advertises, how Foxit’s eSign and DocGen REST APIs carry the same agent session forward into signing and document generation, and a concrete four-step workflow you can reproduce with your own files.

MCP Architecture in 90 Seconds

The MCP specification splits responsibility across three roles. The Host is the LLM runtime, such as Claude Desktop, VS Code with GitHub Copilot, or Cursor, which owns the conversation and chooses when a tool should run. The Server is the capability provider, a process that publishes tools over the MCP protocol and runs them against an underlying service. Tools are the individual operations a server makes callable, each described by a JSON schema so the host knows what goes in and what comes out.

Foxit sits on both ends of this picture. Foxit PDF Editor ships as an MCP Host, the first PDF application to take that role, reaching outward to external MCP servers such as Gmail or Salesforce so its built-in AI assistant can use those services. The Foxit PDF API MCP Server points the other way, publishing Foxit’s cloud PDF Services API as 30+ tools that any MCP Host can invoke.

The operations the MCP Server surfaces span format conversion, content extraction, OCR, merge, split, compress, flatten, linearize, compare, watermark, form data import/export, security, and property inspection.
Foxit’s eSign API and DocGen API sit outside the MCP Server as independent REST services, which means they never appear as MCP tools. An agent workflow can still call them within the same session, just through the agent’s own code-execution layer instead of the MCP protocol itself, a difference the eSign section unpacks fully. PDF processing belongs to the MCP tools; signing and template generation belong to code the agent executes.

Prerequisites and Configuration

Three things need to be in place before you register the server:

  • A Foxit developer account to obtain a client_id and client_secret (the free plan at developer-api.foxit.com needs no credit card)
  • Python 3.11+ alongside the uv package manager, or Node.js 18+ with pnpm if you prefer the TypeScript version
  • Any MCP-compatible host, such as Claude Desktop, VS Code, or Cursor

Grab the repo from github.com/foxitsoftware/foxit-pdf-api-mcp-server and add it to your host’s MCP configuration. Claude Desktop is the host used in the walkthrough below, but the identical command, args, and env values carry over to any MCP host.

In Claude Desktop, open Settings, switch to the Developer tab, and choose Edit Config. Next, open claude_desktop_config.json in any text editor. The file lives at ~/Library/Application Support/Claude/ on macOS or %APPDATA%\Claude\ on Windows. Register the Foxit server beneath the mcpServers key:

JSON
    {
      "mcpServers": {
        "foxit-pdf": {
          "command": "uv",
          "args": [
            "--directory",
            "/path/to/foxit-pdf-api-mcp-server",
            "run",
            "foxit-pdf-api-mcp-server"
          ],
          "env": {
            "FOXIT_CLOUD_API_HOST": "https://na1.fusion.foxit.com/pdf-services",
            "FOXIT_CLOUD_API_CLIENT_ID": "your_client_id",
            "FOXIT_CLOUD_API_CLIENT_SECRET": "your_client_secret"
          }
        }
      }
    }

Define FOXIT_CLOUD_API_CLIENT_ID and FOXIT_CLOUD_API_CLIENT_SECRET as system environment variables before the host process starts. Feeding credentials in through prompt context is a security exposure that any production setup should close off.
The client_id and client_secret from your developer portal cover authentication for every MCP tool call against the PDF Services API. Bringing eSign into the same agent session means performing its own OAuth2 token exchange (detailed in the next section), so the two credential scopes never mix.

Once you save the file, quit Claude Desktop entirely and relaunch it. On startup, it reads the config and spawns the server as a local subprocess communicating over standard input and output, which is the transport the Foxit server speaks.

After the restart, the Foxit MCP server should appear as Running under local MCP servers in the Developer tab. Head to the Customize tab, open Connectors, and click foxit-pdf to inspect the tools the Foxit MCP server provides; the full set of 30+ registered tools should be listed there. If the connector never appears, the server failed to launch. Claude’s logs at ~/Library/Logs/Claude/mcp*.log usually reveal why, most often a missing uv binary or an incorrect --directory path.

Invoking a tool is as simple as typing a natural-language request like “Convert this Word file to PDF and compress it.” The agent picks pdf_from_word and pdf_compress, and before each call executes, Claude Desktop displays an approval prompt listing the exact tool name and arguments; the tool’s JSON result then streams back into the chat. That per-call approval doubles as your audit point, because it shows precisely which tool the agent selected and the arguments it supplied.
To run the server in VS Code instead, place the equivalent entry in .vscode/mcp.json under a top-level servers key, adding a "type": "stdio" field, so VS Code launches the process the same way:

JSON
    {
      "servers": {
        "foxit-pdf": {
          "type": "stdio",
          "command": "uv",
          "args": [
            "--directory",
            "/path/to/foxit-pdf-api-mcp-server",
            "run",
            "foxit-pdf-api-mcp-server"
          ],
          "env": {
            "FOXIT_CLOUD_API_HOST": "https://na1.fusion.foxit.com/pdf-services",
            "FOXIT_CLOUD_API_CLIENT_ID": "your_client_id",
            "FOXIT_CLOUD_API_CLIENT_SECRET": "your_client_secret"
          }
        }
      }
    }

An alternative path is running MCP: Add Server from the Command Palette (Cmd+Shift+P or Ctrl+Shift+P), selecting Command (stdio), then choosing Workspace to store the entry in .vscode/mcp.json or Global to keep it in your user profile. After saving, VS Code displays inline Start, Stop, and Restart actions above the server entry and adds it to the MCP SERVERS - INSTALLED view, where a green indicator and the discovered tool count confirm everything is connected.

PDF Services MCP Tools: Full Catalog

The 30+ tools fall into seven functional categories. Nearly all of the processing tools expect a documentId produced by an earlier upload_document call and hand back a resultDocumentId you can feed to download_document whenever you need the output on disk. The notable exception is pdf_from_url, which takes a URL directly.
Document Lifecycle

  • upload_document: push a PDF, Office file, image, HTML file, or plain text file to the cloud; returns a documentId used by every later operation
  • download_document: pull a processed result down to a local file path
  • delete_document: remove stored files from cloud storage when you are done with them

PDF Creation (File to PDF)

  • pdf_from_word, pdf_from_excel, pdf_from_ppt: turn Office documents into PDFs
  • pdf_from_text, pdf_from_image, pdf_from_html: turn plain text, image files, or HTML into PDFs
  • pdf_from_url: fetch a live URL and render the page as a PDF

PDF Conversion (PDF to File)

  • pdf_to_word, pdf_to_excel, pdf_to_ppt: recover editable Office formats from a PDF
  • pdf_to_text, pdf_to_html, pdf_to_image: produce text, HTML, or image representations

Manipulation

  • pdf_merge: join multiple PDFs into a single file
  • pdf_split: divide a PDF by page ranges, page count, or one file per page
  • pdf_extract: lift a subset of pages out of a PDF
  • pdf_compress: shrink file size by 30-70% depending on content type
  • pdf_flatten: bake form fields and annotations into static content (a requirement for compliance archiving workflows)
  • pdf_linearize: prepare a file for Fast Web View so browsers can stream pages as they load
  • pdf_watermark: stamp text or image watermarks with configurable position, opacity, and rotation
  • pdf_manipulate: rotate, delete, or rearrange pages

Analysis

  • pdf_compare: diff two PDFs and produce a color-coded annotation document highlighting the changes
  • pdf_ocr: turn scanned or image-based PDFs into searchable text, with multi-language support
  • pdf_structural_analysis: detect document structure (titles, headings, paragraphs, tables with cell grids, images, form fields, hyperlinks, and metadata) with bounding boxes, following the Foxit PDF structural extraction engine schema.
The output is JSON delivered inside a downloadable ZIP rather than a set of named business entities; it describes layout and structure only, and converting that into fields such as party names falls to the agent’s LLM, which performs the semantic extraction over the JSON.

Security and Forms

  • pdf_protect: lock a document with password protection using 128-bit or 256-bit AES encryption plus granular permission flags
  • pdf_remove_password: lift password protection off a document
  • export_pdf_form_data: read form field values out as JSON
  • import_pdf_form_data: fill form fields from a JSON payload

Properties

  • get_pdf_properties: report page count, page dimensions, PDF version, encryption status, digital signature info, embedded files, font inventory, and document metadata

In production document pipelines, the operation that gets called most is pdf_from_word. The agent uploads a DOCX, receives a documentId, then invokes pdf_from_word with that ID. Under the hood the PDF Services API performs the conversion asynchronously, but the MCP Server takes care of polling internally and hands the finished result straight back to the agent.

MCP tool call:

JSON
    {
      "name": "pdf_from_word",
      "input": {
        "documentId": "doc_abc123"
      }
    }

MCP tool response:

JSON
    {
      "success": true,
      "taskId": "task_xyz789",
      "resultDocumentId": "doc_result456",
      "message": "Word document converted to PDF successfully. Download using documentId: doc_result456"
    }

From here, hand doc_result456 to download_document to save the PDF locally, or pipe it straight into the next tool in a chain, such as pdf_structural_analysis or pdf_compress.

Extending to eSign: Foxit’s Signing API as a Complementary REST Layer

Once the MCP tools finish PDF processing, the workflow’s next stage sends a document out for signature through Foxit’s eSign REST API, hosted at https://na1.foxitesign.foxit.com. Everything in this guide targets the na1 (US) region.
Foxit also runs regional eSign hosts for the EU (eu1.foxitesign.foxit.com), Canada (na2.foxitesign.foxit.com), and Australia (au1.foxitesign.foxit.com). Payloads and endpoints stay identical across regions; only the host differs, so select whichever host satisfies your data residency requirements.

The eSign API lives outside the Foxit MCP Server, so it is not an MCP tool, and that detail shapes how the agent gets to it. Most MCP hosts have no ability to fire arbitrary HTTP requests themselves, which means eSign is never reached “through MCP.” The agent instead calls eSign from its own code-execution layer, whether that takes the form of a host-provided code interpreter, an agent framework executing Python, or a custom tool you register that wraps the eSign endpoints. The cleanest pattern for production is wrapping the eSign operations you need as custom MCP tools so the host invokes them exactly as it invokes the PDF tools; the production considerations section comes back to this. The code below is what runs inside that layer.

Authentication relies on OAuth2 client_credentials. This eSign token exchange is a separate flow from the PDF Services header auth that powers your MCP tools:

Python
    import os
    import requests

    # Credentials are read from the environment here; the variable names are illustrative.
    ESIGN_CLIENT_ID = os.environ["ESIGN_CLIENT_ID"]
    ESIGN_CLIENT_SECRET = os.environ["ESIGN_CLIENT_SECRET"]

    resp = requests.post(
        "https://na1.foxitesign.foxit.com/api/oauth2/access_token",
        data={
            "client_id": ESIGN_CLIENT_ID,
            "client_secret": ESIGN_CLIENT_SECRET,
            "grant_type": "client_credentials",
            "scope": "read-write",
        },
    )
    resp.raise_for_status()  # fail loudly on a rejected token request
    access_token = resp.json()["access_token"]

“Folder” is the term the Foxit eSign API developer guide uses throughout its documentation.
An automated signing flow centers on these endpoints:

  • POST /api/folders/createfolder: build a signing folder from one or more PDF documents, including signers, subject, and message
  • POST /api/folders/sendDraftFolder: send a draft folder out to its signers
  • POST /api/templates/createtemplate: store a reusable template from a PDF with pre-placed signature fields (later instantiate a folder from it via POST /api/templates/createFolder)
  • GET /api/folders/viewActivityHistory?folderId={id}: fetch the activity audit trail for a folder after it has been sent (a draft that was never shared returns an error)
  • Webhook channels for status callbacks: register a callback URL to get real-time events whenever signers view, sign, or decline

A createfolder call accepts the PDF produced by your MCP pipeline, uploaded into eSign’s document storage after download_document fetches it, and configures the signing workflow:

    POST /api/folders/createfolder
    Authorization: Bearer {access_token}
    Content-Type: application/json

JSON
    {
      "folderName": "Acme Corp Contract - Q3 2025",
      "sendNow": false,
      "fileUrls": ["https://your-storage.example.com/acme_contract_final.pdf"],
      "fileNames": ["acme_contract_final.pdf"],
      "parties": [
        {
          "firstName": "John",
          "lastName": "Smith",
          "emailId": "[email protected]",
          "permission": "FILL_FIELDS_AND_SIGN",
          "sequence": 1
        }
      ]
    }

With sendNow at false, the call creates a draft folder you dispatch later through a separate request to /api/folders/sendDraftFolder. Setting sendNow to true instead creates and sends in one step. When a file cannot be reached by URL, include "inputType": "base64" and supply the documents as a base64FileString array in place of fileUrls; leaving out inputType causes the API to reject the base64 payload as empty.

Foxit’s eSign API comes with HIPAA, eIDAS, ESIGN Act, UETA, 21 CFR Part 11, FERPA, and FINRA compliance built in.
Each audit trail record captures signer location, IP address, recipient identity, event timestamp, consent confirmation, security level, and the complete folder history. If legal defensibility matters in your regulated industry, persist those fields in your own data layer as well, since depending entirely on Foxit’s folder history API for compliance record-keeping leaves a single point of failure in your audit chain.

End-to-End Workflow: AI Agent Automates a Sales Contract

Imagine a sales ops agent handed one natural language goal, “Generate a contract for Acme Corp, $48,000 ARR, and send it for signature.” No part of the tool sequence is hard-coded. Because the MCP Server advertises its PDF tools to the host at connection time, the agent can interpret the goal, recognize it has a template to render and a document to route for signature, and choose which operations to run and in what order. The PDF steps execute as MCP tool calls, while the DocGen and eSign steps execute from the agent’s code layer. The sequence shown below is one plausible run the agent could produce, not a fixed script assembled ahead of time.

The agent starts with MCP tools to get a PDF in hand. It uploads the DOCX contract template through upload_document, gets documentId: "doc_abc" back, and runs pdf_from_word. The MCP Server manages the async conversion internally and reports resultDocumentId: "doc_pdf" when the job finishes.

To understand what the PDF contains, the agent runs pdf_structural_analysis against documentId: "doc_pdf". The tool never returns named entities such as “party” or “ARR.” What comes back is a resultDocumentId pointing at a ZIP archive, so the agent fetches it with download_document, unpacks it, and reads the structural JSON describing headings, paragraphs, and table cells along with their positions.
Semantic extraction is the job of the agent’s LLM, which reads that structural JSON and lifts “Acme Corp” from a heading or a contract value from a table cell, verifying the fields it needs exist. Structure comes from the tool; meaning comes from the model. If you would rather have an API return business entities directly instead of relying on the model to interpret layout, that capability belongs to Foxit’s iDox.ai Document API, a separate service purpose-built for entity and PII extraction.

Holding the field values, the agent produces the finished contract via the DocGen API, posting to /document-generation/api/GenerateDocumentBase64 so the values merge into the template through {{dynamic_tags}} syntax. Because DocGen is synchronous, the finalized PDF arrives in the response body with Acme Corp’s name, the $48,000 ARR figure, and the right dates filled in. There is no polling step.

The last move is routing the document for signature. The agent authenticates against the eSign OAuth2 endpoint, uploads the DocGen output, builds a signing folder through /api/folders/createfolder with [email protected] as the signer, and sends it via /api/folders/sendDraftFolder.

The thread running through all of this is that the model derives the order from the goal rather than following a script. PDF steps resolve to MCP tool calls the host already knows about, while DocGen and eSign steps pass through the agent’s code layer because those APIs are not MCP tools. Each step’s output feeds the next step’s input, and the only orchestration left for you to maintain is whatever exposes that code layer to the model, ideally a set of custom tools rather than ad hoc scripting.

Production Considerations: Error Handling, Rate Limits, and Data Governance

Calling PDF Services through the MCP Server means async polling stays inside the server process, and your agent only ever sees the final resultDocumentId once the task completes.
Calling the raw PDF Services REST API directly is different, since every operation hands back a taskId you must poll yourself. The pattern below uses exponential backoff capped at 10 seconds per interval with a 30-second overall timeout:

Python
    import time
    import requests

    API_HOST = "https://na1.fusion.foxit.com/pdf-services"
    auth_headers = {
        "client_id": "your_client_id",
        "client_secret": "your_client_secret",
    }

    def poll_task(task_id: str, max_wait: int = 30) -> str:
        """Poll a task until it completes, backing off 1s, 2s, 4s, 8s, then 10s."""
        delay = 1
        elapsed = 0
        while elapsed < max_wait:
            resp = requests.get(
                f"{API_HOST}/api/tasks/{task_id}",
                headers=auth_headers,
            )
            resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
            data = resp.json()
            if data["status"] == "COMPLETED":
                return data["resultDocumentId"]
            if data["status"] == "FAILED":  # assumed failure status name; check the API reference
                raise RuntimeError(f"Task {task_id} failed: {data}")
            # Never sleep past the overall deadline.
            wait = min(delay, max_wait - elapsed)
            time.sleep(wait)
            elapsed += wait
            delay = min(delay * 2, 10)
        raise TimeoutError(f"Task {task_id} timed out after {max_wait}s")

Since eSign and DocGen are not MCP tools, be deliberate about how the agent reaches them. Allowing the model to emit raw HTTP from a free-form code interpreter is fragile and difficult to audit. The sturdier approach is wrapping the specific eSign and DocGen operations you actually use, such as create-folder, send-folder, and generate-document, as custom MCP tools with typed inputs. The host then invokes them over the same protocol it uses for the PDF tools, credentials remain inside the tool process instead of the prompt, and the agent’s decisions surface as inspectable tool calls rather than opaque scripts.

The output of pdf_structural_analysis warrants a caution of its own. For a long contract, the structural JSON can contain many thousands of elements, and pushing the whole file into the model can silently exceed its context window, a failure that usually shows up as truncated or confused extraction instead of a clean error.
The code that unzips the archive should filter the JSON before the model ever sees it, retaining only the element types and pages that matter (for a contract, typically the heading blocks and the relevant table) instead of forwarding the entire document.

The free developer plan at developer-api.foxit.com is sized for development and testing volumes. Production workloads beyond the free-tier threshold call for a volume plan requested through the Developer Portal.

On the data governance side, every API call travels over TLS 1.2+, and documents at rest are protected with AES-256 encryption. Foxit’s API security documentation details SOC 2 Type II audit status, HIPAA BAA support, GDPR, CCPA, eIDAS, ESIGN Act, UETA, 21 CFR Part 11, FERPA, and FINRA requirements. Customer data is kept in logically segmented environments. Teams in healthcare, legal, or financial services should confirm data residency requirements before wiring up production document flows, then pick the matching regional eSign host described earlier, because the host you call determines where the data gets processed.

Run Your First Tool Call Now

A working MCP tool call is under 15 minutes away:

1. Sign up for a free developer account at developer-api.foxit.com (no credit card, instant access), then copy your client_id and client_secret from the dashboard.
2. Set the three environment variables:

Shell
export FOXIT_CLOUD_API_HOST="https://na1.fusion.foxit.com/pdf-services"
export FOXIT_CLOUD_API_CLIENT_ID="your_client_id"
export FOXIT_CLOUD_API_CLIENT_SECRET="your_client_secret"

3. Clone the repo, register it with the config block from the Prerequisites section, restart your MCP host, and call pdf_from_url against any public URL. A confirmed PDF lands in your working directory.

The Developer Portal also offers a live API Playground where you can validate request payloads against the PDF Services API before connecting them to an agent.
To extend toward a full signing workflow, the smallest useful addition on top of the MCP setup is authenticating against the eSign OAuth2 endpoint and posting a static PDF to /api/folders/createfolder. From there, DocGen field population, pdf_structural_analysis extraction, and webhook callbacks build on the same pattern step by step. Claim your free API access at developer-api.foxit.com.
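The filtering step described under Production Considerations could look something like the sketch below. The element shape (a list of dicts with `type`, `page`, and `text` keys) and the type names `heading` and `table` are assumptions made purely for illustration; the actual pdf_structural_analysis output should be checked before adapting this.

```python
from typing import Any, Optional

# Hypothetical element shape: {"type": ..., "page": ..., "text": ...}.
# The real pdf_structural_analysis schema may differ.
def filter_elements(
    elements: list[dict[str, Any]],
    keep_types: set[str],
    keep_pages: Optional[set[int]] = None,
) -> list[dict[str, Any]]:
    """Keep only elements whose type (and, optionally, page) is of interest."""
    kept = []
    for element in elements:
        if element.get("type") not in keep_types:
            continue
        if keep_pages is not None and element.get("page") not in keep_pages:
            continue
        kept.append(element)
    return kept


# Invented sample data standing in for the unzipped structural JSON.
sample = [
    {"type": "heading", "page": 1, "text": "Master Services Agreement"},
    {"type": "paragraph", "page": 1, "text": "Boilerplate recitals"},
    {"type": "table", "page": 2, "text": "ARR | $48,000"},
    {"type": "table", "page": 9, "text": "Appendix pricing grid"},
]

trimmed = filter_elements(sample, keep_types={"heading", "table"}, keep_pages={1, 2})
print(len(trimmed), "of", len(sample), "elements forwarded to the model")
```

Filtering by an allow-list rather than a deny-list means unexpected element types are dropped by default, which keeps the prompt size predictable as documents grow.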
Reducing Alert Fatigue in the SOC Using Correlation Rules and Detection-as-Code
By Krishnaveni Musku
When Valid SQL Was Still the Wrong Answer
By Anusha Kovi, DZone Core
Keeping AI-Powered BI Honest: A Human-in-the-Loop (HITL) Playbook
By Nithish Shetty
From Open SQL to CDS Views: Rewriting SAP Data Access for Performance at Scale

Modern SAP landscapes running on SAP HANA demand a rethink of how ABAP programs access data. Traditional Open SQL queries embedded in ABAP code have served developers for decades, but at large data volumes, they can become performance bottlenecks. SAP’s introduction of Core Data Services (CDS) views offers a new paradigm: push more work to the in-memory database and retrieve only what’s needed. Traditional ABAP Data Access With Open SQL Open SQL is the standard SQL interface in ABAP that allows developers to query the underlying database in a database-agnostic way. For example, an ABAP report might join two tables and fetch results like this: Plain Text SELECT bkpf~bukrs, bkpf~belnr, bkpf~gjahr, bseg~koart, bseg~wrbtr, bseg~shkzg FROM bkpf INNER JOIN bseg ON bkpf~bukrs = bseg~bukrs AND bkpf~belnr = bseg~belnr AND bkpf~gjahr = bseg~gjahr INTO TABLE @DATA(it_fi_docs) WHERE bkpf~bukrs = '1000' AND bkpf~gjahr = '2023' AND bseg~koart = 'K'. This Open SQL example joins the BKPF and BSEG tables to retrieve financial documents. Open SQL sends such queries to the database, and on SAP HANA, the heavy lifting of the join and filtering is done in-memory on the DB server. The result is then brought back to the ABAP application server. However, the challenge with Open SQL at scale comes when ABAP code handles large data sets or complex logic in the application layer. Common performance issues in legacy ABAP include: Too much data transferred: Selecting wide tables or not filtering enough leads to heavy network and memory usage. Best practice is to filter and aggregate in the query to keep the result set small and transfer only the required columns (avoid SELECT *). Multiple round-trips: Performing calculations with many small queries or loops causes repeated DB calls. It’s more efficient to push joins and subqueries into one SQL if possible. Each context switch adds overhead. 
Application-side processing: If business logic runs on millions of records in ABAP, the application server CPU becomes the bottleneck. The database could perform these operations faster, set-wise.

In summary, while Open SQL can express complex data retrieval, ABAP developers traditionally had to be very disciplined in query design to avoid performance issues at scale. This paved the way for a new approach leveraging SAP HANA’s strengths.

The Case for Change: Code-to-Data Paradigm

SAP HANA’s in-memory, columnar architecture enables it to execute aggregations, filters, and joins extremely fast at the database level. To exploit this, SAP advocated the code-to-data paradigm: push computations down to the database rather than pulling data up to the code. Rewriting data access using CDS views is a key technique in this paradigm, alongside others like AMDP. By offloading heavy operations to the DB, we minimize data transfer and let HANA’s optimized engines handle crunching the data. For example, instead of reading a full table and then filtering in ABAP, you pass WHERE conditions so the DB does it. Instead of multiple selects and merges in ABAP, you perform a JOIN or a subquery in one shot.

Another driver for change is SAP’s new data models in S/4HANA. Many classic transparent tables were replaced by HANA-optimized structures or compatibility views. Custom ABAP code written for ECC often breaks or needs adaptation for S/4HANA’s simplified data model. In these cases, SAP often provides CDS views as the new interface to data. As one DZone article notes, engineers moving to S/4 must switch to the S/4 equivalents to replace old data access logic. In short, adopting CDS views is not only about performance but also about aligning with SAP’s modern architecture.

Introducing ABAP Core Data Services (CDS) Views

ABAP CDS is a framework to define rich data models directly on the database, using a declarative syntax in ABAP Development Tools (ADT).
A CDS view is essentially a view in the HANA database, defined via an ABAP DDL statement. For example, here’s a simple CDS view definition joining two tables: Plain Text @AbapCatalog.sqlViewName: 'ZDEMO_FLIGHTS' define view ZFlightInfo as select from spfli inner join scarr on spfli.carrid = scarr.carrid { scarr.carrname as carrier, spfli.connid as flight, spfli.cityfrom as departure, spfli.cityto as arrival } This CDS view ZFlightInfo performs the same join between SPFLI and SCARR as an equivalent Open SQL join would. In fact, you could copy-paste the join logic from ABAP into the CDS definition with minor syntax changes. After activating this view in ADT, the system creates a database view in HANA. ABAP programs can then consume the CDS view just like a table: SQL SELECT * FROM ZFlightInfo INTO TABLE @DATA(it_flights) ORDER BY carrier, flight. The result set it_flights from the CDS view will be identical to what an Open SQL join would produce for the same input tables. Under the hood, both approaches result in the database executing a similar SQL SELECT. So, why use CDS? The benefits become evident as complexity grows: Reusability and model centralization: CDS definitions are stored in the ABAP Dictionary and can be reused by any number of programs or even other CDS views. Instead of writing the same joins or calculations in multiple ABAP reports, you define them once in a CDS view. SAP recommends using a CDS view when you need to retrieve data from multiple related tables, because it involves the least amount of coding and can be reused in multiple objects. In large-scale systems, this consistency is key to a single source of truth for that piece of data logic. Rich expression and metadata: CDS supports advanced SQL features and built-in functions. You can define calculated fields, aggregations, and even leverage specialized HANA capabilities within the view. CDS also allows adding annotations, making the data model self-descriptive. 
Performance through pushdown: By moving logic into the CDS (and thus into SQL on the database), you reduce the workload on the ABAP layer. The database can apply filters, joins, and computations in parallel, using its optimized engines. Only the final result is sent back to ABAP. Secure and controlled access: CDS views integrate with the SAP authorization concept, ensuring consistent enforcement of business security rules at the data model level, rather than scattering checks in ABAP code. This means performance benefits without sacrificing governance. Tutorial: Converting an Open SQL to a CDS View (with Code) To solidify the concept, let’s walk through a simple conversion. Imagine we have an ABAP report that needs to list flight routes with the airline name. In classic ABAP, you might do this with an inner join in Open SQL as shown below: Open SQL Approach (Legacy ABAP code): Plain Text DATA: lt_flights TYPE TABLE OF zflight_info. "Structure for results SELECT scarr~carrname AS carrier, spfli~connid AS flight, spfli~cityfrom AS departure, spfli~cityto AS arrival FROM spfli INNER JOIN scarr ON spfli~carrid = scarr~carrid INTO TABLE @lt_flights ORDER BY carrname, connid. This code joins SPFLI with SCARR and populates an internal table lt_flights. It works, but the logic is embedded in the program. Now, suppose we want to reuse this same join in multiple places. We can refactor it into a CDS view: CDS View Approach: Define the view in ABAP DDL (e.g., in Eclipse ADT): Plain Text @AbapCatalog.sqlViewName: 'ZFLIGHTINF' @AccessControl.authorizationCheck: #NOT_REQUIRED define view ZFlightInfo as select from spfli inner join scarr on spfli.carrid = scarr.carrid { scarr.carrname as carrier, spfli.connid as flight, spfli.cityfrom as departure, spfli.cityto as arrival } We give the view a name ZFlightInfo. Note that this is almost identical to the Open SQL, just expressed as a view definition. Once activated, the CDS is available system-wide. 
Now our ABAP report can simply do:

Plain Text
SELECT * FROM ZFlightInfo INTO TABLE @lt_flights ORDER BY carrier, flight.

The result in lt_flights will be the same. We have effectively decoupled the data retrieval logic from the program and centralized it in the DB layer. This not only improves reuse; in a HANA system, it can also improve performance. The database can better optimize a single persistent view than ad-hoc SQL scattered in code. And if we needed to adjust the join or add a new field, we would change the view once instead of editing every program that repeats the query.

Performance Considerations and Best Practices

When rewriting Open SQL to CDS, ABAP developers should keep a few important considerations in mind:

Measure, don’t guess: Simply converting an Open SQL to a CDS view doesn’t magically speed up the query if it was already efficient. As noted earlier, for straightforward SELECTs or joins, the performance will be equivalent in many cases. The real gains come when you use CDS to do more complex processing in one go. Always use tools like ST05 SQL trace or HANA’s PlanViz to ensure the new design is actually optimal. The execution plan is what matters, not whether you wrote it in Open SQL or CDS.

Avoid over-complex views: It’s possible to go overboard with stacking CDS views on top of each other. While layering is good for separation of concerns, too many nested views or excessive use of associations can lead to very complex SQL at runtime. This can confuse the optimizer or prevent predicate pushdown. Be wary of heavy calculations in a single CDS. If performance suffers, consider alternatives like ABAP Managed DB Procedures (AMDP) for really complex logic or break the problem down differently.

Select only what you need: Just as with Open SQL, a CDS view should be designed to return only necessary fields and records. Don’t define a CDS with SELECT * from a wide table; list the needed fields instead. This ensures consumer queries aren’t unknowingly pulling extra data.
One common pitfall is using CDS to expose an entire table with all columns, which defeats the purpose. Instead, tailor views to use cases or use parameters in CDS to filter data. Use CDS features wisely: Leverage CDS capabilities like aggregations, calculated fields, and unions to eliminate extra work in ABAP. Reuse and consistency: Replace multiple Open SQL implementations of the same logic with a single CDS. Not only does this reuse improve maintainability, but it also means the database might handle the unified load more efficiently. SAP itself follows this approach in S/4HANA with the Virtual Data Model, hundreds of CDS views that serve as the source for Fiori apps and reports, rather than raw table access. By moving to CDS, you align your custom code to the same philosophy. Conclusion Rewriting data access from Open SQL to CDS views is a strategic move for ABAP developers aiming to maximize performance at scale. By pushing more logic to the SAP HANA database, we take full advantage of its in-memory speed and parallel processing. CDS views enable complex data gathering in one shot, reduce the load on the application server, and provide a modular, reusable data model for your SAP applications. That said, an engineer must also approach CDS with a critical eye, understanding the execution plan and ensuring that moving to CDS truly improves the situation, rather than blindly adding abstraction. Advanced ABAP development is about choosing the right tool for the job. In the case of data-intensive operations, CDS views have proven to be a powerful tool, aligning with SAP’s modern direction and delivering robust performance at scale. By rewriting your data access with CDS and following best practices, you can future-proof your ABAP code for the HANA era, achieving faster results and a cleaner, more sustainable codebase for the long run.
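The code-to-data idea is not specific to ABAP. As a rough illustration only, the sketch below uses Python's built-in sqlite3 module as a stand-in for the database (it is not HANA, and the `doc_items` table, its columns, and the `vendor_totals` view are invented for this example). It contrasts application-side aggregation with pushing the filter and aggregation into the query, and then defines the logic once in a view, which is the role a CDS view plays in the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_items (company TEXT, account_type TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO doc_items VALUES (?, ?, ?)",
    [("1000", "K", 100.0), ("1000", "K", 250.0), ("1000", "D", 75.0), ("2000", "K", 40.0)],
)

# Application-side processing: fetch every row, then filter and sum in code.
rows = conn.execute("SELECT company, account_type, amount FROM doc_items").fetchall()
app_side_total = sum(
    amount for company, kind, amount in rows if company == "1000" and kind == "K"
)

# Pushdown: the database filters and aggregates, and a single row comes back.
(pushed_total,) = conn.execute(
    "SELECT SUM(amount) FROM doc_items WHERE company = ? AND account_type = ?",
    ("1000", "K"),
).fetchone()

# A view defines the aggregation once so every consumer reuses the same logic.
conn.execute(
    "CREATE VIEW vendor_totals AS "
    "SELECT company, SUM(amount) AS total FROM doc_items "
    "WHERE account_type = 'K' GROUP BY company"
)
view_totals = dict(conn.execute("SELECT company, total FROM vendor_totals").fetchall())

print(app_side_total, pushed_total, view_totals)
```

Both approaches return the same total; the difference is how many rows cross the boundary between database and application, which is what dominates at scale.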

By Deepika Paturu
From printTriangularNumber to Duff’s Device: Mastering Java Switch Statements Old and New

In this blog post, we will see how the humble Java switch statement evolved from a fall-through curiosity into a powerful expression, and how understanding its mechanics unlocks classic techniques like Duff's Device. Java's switch statement has evolved from a fall-through-prone construct into a modern expression syntax introduced in Java 14. The post traces this evolution using a concrete example, a method that computes triangular numbers by intentionally allowing execution to cascade through cases without break statements. The post also connects this behavior to Duff's Device, a 1983 loop-unrolling technique that uses deliberate fall-through to handle remainder elements before processing full blocks. A comparison of old and new switch syntax outlines trade-offs, and practical guidance is offered on when each form is appropriate. The Accidental Discovery I was prepping for the OCP Java 21 exam and stumbled across a tricky question. A method named question2 used a switch statement without any break statements. The output surprised me at first. Once I traced through it, I renamed the method to printTriangularNumber. That one rename told the whole story. This post dives into why. The Old Switch Statement The traditional switch statement has been part of Java since day one. The syntax looks like this: Java int day = 3; switch (day) { case 1: System.out.println("Monday"); break; case 2: System.out.println("Tuesday"); break; case 3: System.out.println("Wednesday"); break; default: System.out.println("Unknown"); break; } As shown above, every case ends with a break. Without it, execution does not stop. It keeps going into the next case. The old switch works on int, char, String, and enum types. Fall-Through: Feature or Bug? The most misunderstood behavior in switch is fall-through. When you omit break, execution literally falls into the next case. 
Java
int x = 2;
switch (x) {
    case 3: System.out.println("three");
    case 2: System.out.println("two");   // jumps here
    case 1: System.out.println("one");   // falls through
    default: System.out.println("done"); // falls through
}

Output:

Plain Text
two
one
done

Most developers treat this as a bug waiting to happen. They are not wrong. Forgetting a break is one of the most common Java mistakes. But intentional fall-through is a different story. It is a deliberate tool. And printTriangularNumber is the perfect example.

printTriangularNumber: Fall-Through in Action

Here is the method I renamed from question2 during my OCP prep:

Java
private static void printTriangularNumber(int n) {
    int res = 0;
    switch (n) {
        case 5: res += 5;
        case 4: res += 4;
        case 3: res += 3;
        case 2: res += 2;
        case 1: res += 1;
        default: break;
    }
    System.out.println(res == 0 ? "Ok, bye." : res);
}

Let us trace through n = 4:

Jumps to case 4, adds 4. res = 4
Falls to case 3, adds 3. res = 7
Falls to case 2, adds 2. res = 9
Falls to case 1, adds 1. res = 10
Hits default, breaks
Output: 10

The pattern for each input:

n    Result    Formula
1    1         1
2    3         2+1
3    6         3+2+1
4    10        4+3+2+1
5    15        5+4+3+2+1

This is n * (n + 1) / 2, the triangular number formula. The fall-through is doing the summation for you. Each case accumulates the remaining values by simply not stopping. For n = 0 or any value above 5, no case matches, default fires immediately, and res stays 0. The ternary prints "Ok, bye.". I personally find it a beautiful example of using language semantics intentionally. This is also the kind of question the OCP exam loves to throw at you.

The New Switch Expression (Java 14+)

Java 14 introduced switch expressions as a standard feature. The arrow syntax -> eliminates fall-through entirely. Each arm is independent.

Java
int day = 3;
String name = switch (day) {
    case 1 -> "Monday";
    case 2 -> "Tuesday";
    case 3 -> "Wednesday";
    default -> "Unknown";
};
System.out.println(name); // Wednesday

A few things to notice here: Switch is now an expression.
It returns a value. The arrow -> replaces : and break together. No fall-through. Each arm executes independently. Multiple labels on a single arm: case 1, 7 -> "Weekend"; You can also use it inline:

Java
System.out.println(switch (day) {
    case 1, 7 -> "Weekend";
    default -> "Weekday";
});

Much cleaner. Much safer.

Switch Expressions With Yield

Sometimes you need more than a single expression in an arm. That is where yield comes in.

Java
int n = 4;
int result = switch (n) {
    case 1, 2 -> n * 10;
    case 3, 4 -> {
        int temp = n * n;
        System.out.println("Computing for: " + n);
        yield temp; // value produced by this block arm
    }
    default -> 0;
};
System.out.println(result); // 16

Think of yield as the return statement for a switch block arm. You need it whenever an arm of a switch expression is a block in {} that must produce a value. A common mistake is using return instead of yield inside a switch expression block. That is a compile-time error: a switch expression cannot be exited with return, even inside a method. Always use yield inside switch expression blocks.

Duff's Device: Fall-Through Taken to the Extreme

Now that we understand fall-through well, let us look at the most famous intentional use of it: Duff's Device. Tom Duff invented this in 1983 to speed up memory copy operations by reducing loop branch overhead. The trick is to unroll the copy loop and use a switch to jump into the middle of it based on the remainder.
In Java, we replicate it in two clean phases since Java does not allow interleaved switch+loop syntax:

Java
public static void duffCopy(int[] src, int[] dst, int n) {
    int i = 0;
    int rem = n % 4;
    // Phase 1: handle remainder via fall-through
    switch (rem) {
        case 3: dst[i] = src[i]; i++;
        case 2: dst[i] = src[i]; i++;
        case 1: dst[i] = src[i]; i++;
        case 0: break;
    }
    // Phase 2: full blocks of 4
    int fullBlocks = (n - rem) / 4;
    while (fullBlocks-- > 0) {
        dst[i] = src[i]; i++;
        dst[i] = src[i]; i++;
        dst[i] = src[i]; i++;
        dst[i] = src[i]; i++;
    }
}

Let us trace through n = 13:

rem = 13 % 4 = 1
Switch jumps to case 1, copies 1 element. i = 1
fullBlocks = (13 - 1) / 4 = 3
Loop runs 3 times, copying 4 elements each time
Total: 1 + 12 = 13 elements

The Python equivalent makes the two phases explicit:

Python
def duff_copy(src, n):
    dst = [None] * n
    rem = n % 4
    for i in range(rem):  # Phase 1: remainder
        dst[i] = src[i]
    i = rem
    while i < n:  # Phase 2: full blocks
        dst[i] = src[i]
        dst[i+1] = src[i+1]
        dst[i+2] = src[i+2]
        dst[i+3] = src[i+3]
        i += 4
    return dst

The connection to printTriangularNumber is direct. Both use fall-through intentionally. In printTriangularNumber, the switch jumps to the right case and accumulates downward. In Duff's Device, the switch jumps to the right case and copies the remainder before the main loop takes over.

Old vs. New Switch at a Glance

Feature             Old Switch (:)                                     New Switch (->)
Fall-through        Yes (default)                                      No
Returns value       No                                                 Yes
break needed        Yes                                                No
Multiple labels     Stacked (case 1: case 2:); comma form since 14     Yes (case 1, 2 ->)
Block with yield    No                                                 Yes
Null safe           No                                                 Yes, with an explicit case null (Java 21)
OCP exam topic      Yes                                                Yes

Which One Should You Use?

For new code, always prefer the switch expression with ->. It is safer, cleaner, and more expressive. Your reviewers will thank you. Reserve the old switch with fall-through only when you genuinely need the cascading behavior, like in printTriangularNumber or a hand-tuned loop like Duff's Device. In those cases, add a comment explaining the intent.
Otherwise, the next developer (including future you) will assume the break is missing by accident. My personal observation: the OCP Java 21 exam tests both heavily. Knowing when fall-through is intentional versus accidental is the key distinction examiners probe. Make sure you can trace through any switch block without running it. Happy testing! What is your take: is intentional fall-through clever engineering or a maintenance nightmare waiting to happen? Drop your thoughts below!
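For readers who want to check the triangular-number arithmetic without compiling Java, here is a small Python sketch. Python has no fall-through, so the cascade is emulated by walking a list of case labels from the entry point downward; the function name `triangular_by_cascade` is made up for this example and simply mirrors printTriangularNumber.

```python
def triangular_by_cascade(n: int) -> int:
    """Emulate the fall-through: enter at case n, then run every lower case."""
    cases = [5, 4, 3, 2, 1]          # same order as the Java case labels
    if n not in cases:
        return 0                      # default: nothing matched, res stays 0
    entry = cases.index(n)
    res = 0
    for label in cases[entry:]:       # "fall through" the remaining cases
        res += label
    return res


# The cascade agrees with the closed form n * (n + 1) / 2 for every matched case.
for n in range(1, 6):
    assert triangular_by_cascade(n) == n * (n + 1) // 2

print(triangular_by_cascade(4))
```

Slicing from the entry index is the Python counterpart of jumping to a case label and never hitting a break.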

By NaveenKumar Namachivayam, DZone Core
Jakarta NoSQL: Why JPA Is Not Enough for the AI Era

The most effective way to present this idea is to begin with the challenge architects face: AI has transformed the persistence landscape. Enterprise applications were once built almost exclusively on relational databases, making JPA a keystone of Jakarta EE. Today, modern systems use a mix of relational databases, document stores, caches, graph engines, and increasingly, vector databases that support semantic search, retrieval-augmented generation (RAG), and AI-powered applications. Polyglot persistence is now the industry standard. While Jakarta EE standardized relational persistence through JPA, it still lacks a vendor-neutral standard for non-relational persistence. This gap forces developers to rely on fragmented, proprietary solutions, creating barriers to portability, productivity, and innovation. The rise of AI makes this gap critical. Vector databases are now essential to intelligent systems, supporting semantic search, embeddings, and contextual retrieval. For Jakarta EE to remain the leading enterprise Java platform in the AI era, it must offer a standardized approach to NoSQL persistence, as it did for relational databases. Jakarta NoSQL is not just another specification; it constitutes a strategic investment in the ecosystem's future. By offering a familiar programming model, reducing vendor lock-in, and integrating with AI workloads, Jakarta NoSQL ensures that Jakarta EE remains relevant and competitive for the next generation of enterprise applications. NoSQL in the AI Era: Understanding the Modern Data Landscape For years, enterprise data persistence focused on relational databases. Systems relied on tables, rows, foreign keys, and SQL, making relational technology the standard for business applications. While still essential, modern architectures now use polyglot persistence, where multiple database types coexist, each satisfying specific requirements. 
Today, NoSQL refers to a family of database paradigms, each engineered for specific workloads and architectural needs, rather than just document databases.

  • Key-value databases store data as key-value pairs, enabling fast lookups and low latency. Typical uses include caching, user sessions, feature flags, and temporary application state.
  • Document databases store data as structured documents, such as JSON or BSON. They are effective for applications with hierarchical or evolving schemas, including web applications, e-commerce platforms, and content management systems.
  • Column-family databases organize data by columns instead of rows, supporting high write throughput and horizontal scalability. They are used for IoT telemetry, event logging, analytics, and large-scale distributed systems.
  • Graph databases model entities and relationships as nodes and edges. This structure is ideal for social networks, fraud detection, recommendation engines, dependency analysis, and knowledge graphs in which relationships are critical.
  • Vector databases store high-dimensional embeddings from machine learning models and large language models (LLMs). They enable semantic search, similarity matching, retrieval-augmented generation (RAG), recommendation platforms, and other AI-driven features by matching meaning instead of exact text.
  • Time-series databases specialize in timestamped data that changes over time. They are used for observability, monitoring, financial markets, industrial sensors, and operational metrics where high-performance temporal data storage and analysis are essential.

These database types often coexist within the same architecture. Modern applications may use PostgreSQL for transactions, Redis for caching, MongoDB for documents, Neo4j for relationships, InfluxDB for telemetry, and a vector database like Milvus, Pinecone, or Weaviate for AI-powered search and retrieval. This approach, known as polyglot persistence, is now standard in enterprise systems.
The industry has embraced this shift. The Stack Overflow Developer Survey shows that while relational databases still dominate enterprise workloads, NoSQL technologies are now standard tools for developers. Technologies like Redis, MongoDB, and Elasticsearch are used alongside PostgreSQL and MySQL. Organizations no longer choose between SQL and NoSQL; instead, they combine multiple persistence technologies to leverage their strengths. Polyglot persistence is now the baseline for modern software systems. Vector databases are especially important among NoSQL categories, as they are basic to modern Artificial Intelligence systems. In contrast to traditional databases that store explicit business data, vector databases store numerical representations called embeddings. Generated by machine learning models, these embeddings encode the semantic meaning of words, documents, images, or other content as mathematical vectors. This enables software to search and retrieve information based on meaning rather than exact text matches. The distinction between lexical and semantic search illustrates the significance of vector databases. For example, a traditional SQL search for “Pet” returns records with that exact term, such as “Pet Shop,” but ignores related expressions like “Dog” or “Puppy.” Semantic search, by comparing embeddings, retrieves documents about dogs, puppies, or animal companions because it recognizes their semantic relationship. The search engine matches meaning, not just syntax. This function is vital for modern AI architectures. Large language models do not process relational tables directly; they use embeddings and contextual connections between concepts. Systems such as retrieval-augmented generation (RAG), enterprise knowledge search, recommendation engines, and intelligent assistants depend on similarity searches across millions of vectors. 
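To make the lexical-versus-semantic distinction concrete, the toy sketch below compares a substring match with a cosine-similarity ranking. The three-dimensional vectors are hand-written stand-ins, not output from any real embedding model, and the document titles are invented; a real vector database would also use approximate indexes rather than the full scan shown here.

```python
import math

# Made-up 3-d "embeddings"; real models produce hundreds of dimensions.
documents = {
    "Pet Shop opening hours": [0.90, 0.10, 0.05],
    "How to train a puppy": [0.80, 0.30, 0.10],
    "Quarterly tax filing guide": [0.05, 0.10, 0.95],
}
query_text = "Pet"
query_vector = [0.85, 0.20, 0.05]   # assumed embedding of the query


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Lexical search: exact term match only.
lexical_hits = [title for title in documents if query_text.lower() in title.lower()]

# Semantic search: rank every document by similarity to the query vector.
ranked = sorted(documents, key=lambda t: cosine(query_vector, documents[t]), reverse=True)
semantic_hits = ranked[:2]

print(lexical_hits)
print(semantic_hits)
```

The lexical search finds only the title containing the literal term, while the similarity ranking also surfaces the puppy document because its vector points in nearly the same direction as the query.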
While relational databases can support some vector operations through extensions, vector databases are purpose-built for these workloads, offering optimized indexing and similarity algorithms for large-scale semantic retrieval. As AI adoption grows, vector databases are becoming a strategic component of enterprise architecture.

Recognizing the importance of NoSQL, several Java ecosystems have developed their own solutions. Spring offers independent projects like Spring Data MongoDB, Spring Data Redis, and Spring Data Cassandra. These integrations provide a productive programming model but are tightly coupled to the Spring ecosystem. Quarkus supports NoSQL persistence through Panache and database-specific integrations, emphasizing developer productivity and cloud-native deployment. Micronaut Data supports several NoSQL engines, using compile-time code generation and ahead-of-time processing to improve performance and reduce execution overhead.

While these solutions are effective, they remain framework-specific rather than platform standards. Developers switching frameworks encounter different APIs, abstractions, annotations, and operational models, even when solving similar persistence challenges. Jakarta EE addressed this for relational persistence with Jakarta Persistence (JPA), delivering a standardized, vendor-independent programming model. As NoSQL technologies expand and AI workloads increasingly depend on vector databases, the lack of a vendor-neutral NoSQL standard is a significant gap in the Jakarta ecosystem.

The Java Standardization Journey

The need for a standardized NoSQL solution in the Java ecosystem has been discussed for years. During the Java EE era, several proposals tried to integrate non-relational databases into the enterprise platform. As NoSQL technologies grew in popularity throughout the 2010s, developers anticipated a dedicated specification to accompany traditional enterprise APIs at JavaOne conferences.
Despite clear demand, no such initiative emerged within Java EE. The platform remained focused on relational persistence via JPA, leaving NoSQL adoption to rely on vendor-specific libraries and framework integrations. The transition of Java EE to the Eclipse Foundation provided an opportunity to address this challenge. Instead of waiting for a platform-level solution, the community launched Eclipse JNoSQL, an open-source project supplying a unified programming model for NoSQL databases. Drawing on JPA's success, Eclipse JNoSQL introduced mapping annotations, repositories, templates, and communication APIs that support document, key-value, column-family, and graph databases. The project showed that a consistent developer experience could be attained without compromising each database model's unique features. As Jakarta EE matured, Eclipse JNoSQL became the foundation for a new standardization effort: Jakarta NoSQL. Jakarta NoSQL was the first persistence specification created entirely within the Jakarta EE process. Unlike earlier specifications that migrated from Java EE, Jakarta NoSQL was conceived, developed, and released under the Eclipse Foundation governance model. It was among the first to complete the full Jakarta Specification Process from inception to release. Jakarta NoSQL's impact extended beyond its initial scope. During development, the expert group identified a common challenge for both relational and non-relational databases: developers needed a consistent repository abstraction independent of the underlying persistence engine. This led to the creation of a separate specification, Jakarta Data. The need to standardize NoSQL access patterns directly influenced the development of Jakarta Data's repository-oriented programming model, which applies across multiple persistence technologies. The relationship between these specifications highlights Jakarta NoSQL's broader influence on the Jakarta EE ecosystem. 
Jakarta NoSQL focuses on mapping and interacting with non-relational databases, while Jakarta Data delivers a unified repository abstraction for both relational and NoSQL implementations. Together, they significantly reduce fragmentation in enterprise persistence. This evolution continued beyond Jakarta Data. The drive to standardize modern persistence requirements has inspired new specifications, such as Jakarta Query, which aims to deliver a portable, type-safe, and expressive query language for various persistence technologies. As the Jakarta ecosystem grows, Jakarta NoSQL acts as a key milestone. It addressed the long-standing absence of a NoSQL standard and helped lay the foundation for the next generation of persistence specifications within Jakarta EE. Jakarta NoSQL: Built for NoSQL, Not Adapted to It When architects consider standardizing NoSQL development in Jakarta EE, a common question arises: why not extend Jakarta Persistence (JPA) to support NoSQL databases? JPA has long provided a unified programming model for relational databases in the Java ecosystem. The answer is based on a core architectural principle: tools should be optimized for their intended purpose. The first challenge is that JPA was designed specifically for relational databases, relying on concepts like tables, columns, joins, foreign keys, and transactional consistency. These are not simply implementation details but core elements of the specification. Forcing document, graph, key-value, or vector databases into this model creates friction and limits the use of each database’s native features. The second challenge is that NoSQL systems behave fundamentally differently. Graph databases perform path traversals, document databases store nested structures without normalization, key-value databases focus on fast lookups, and vector databases handle similarity calculations. These systems also differ in consistency, transactions, query languages, indexing, and scalability capabilities. 
Representing all these paradigms through a single relational abstraction leads to compromises.

The third challenge is the importance of specialization. As Abraham Maslow noted, “if the only tool you have is a hammer, it is tempting to treat everything as if it were a nail.” Relational databases are effective, but not ideal for every persistence need. Semantic search, graph traversal, and high-volume telemetry storage are not relational problems. Applying a relational abstraction to all database types risks losing the unique optimizations each technology provides.

Consider the analogy of transportation: cars, boats, submarines, and airplanes all address transportation but are specialized for different environments. Forcing them to use the same controls would result in mediocrity across all. Similarly, a single persistence abstraction may remove the features that make each database effective.

Therefore, Jakarta NoSQL does not extend JPA beyond its intended scope. Instead, it offers a dedicated persistence model for non-relational databases, while maintaining the familiar developer experience that contributed to JPA’s success.

A key design goal of Jakarta NoSQL is to reduce mental effort for enterprise Java developers. Teams experienced with JPA should find the specification immediately approachable, as Jakarta NoSQL intentionally uses familiar terminology and concepts from the Jakarta EE community. Developers will encounter annotations like @Entity, @Id, and @Column, enabling a smooth transition from relational to non-relational persistence.

Java
    @Entity
    public class Car {
        @Id
        private Long id;
        @Column
        private String name;
        @Column
        private CarType type;
    }

At first glance, this entity closely resembles a JPA entity, which is intentional. However, the underlying implementation is fundamentally different. Jakarta NoSQL is built to support schema flexibility, embedded structures, nested documents, and database-specific storage models.
This approach is reflected throughout the API. Instead of requiring developers to manage low-level driver details, Jakarta NoSQL offers a high-level programming model via the Template API.

Java
    @Inject
    Template template;

    Car ferrari = Car.builder()
        .id(1L)
        .name("Ferrari")
        .build();
    template.insert(ferrari);

    List<Car> sports = template.select(Car.class)
        .where("type").eq(CarType.SPORT)
        .orderBy("name")
        .result();

The objective mirrors JPA’s original mission: permitting developers to focus on domain models and business logic, rather than serialization, connection management, or vendor-specific APIs.

This foundation shaped Jakarta NoSQL 1.0. The initial release introduced the mapping layer, CDI integration, repository support, template operations, and standardized endpoints for four major NoSQL categories:

  • Document databases
  • Key-value databases
  • Column-family databases
  • Graph databases

Jakarta NoSQL 1.0 showed that a unified Java programming model can respect the particular characteristics of each database family.

Jakarta NoSQL 1.1 continued this evolution. While version 1.0 focused on mapping and persistence, version 1.1 expanded querying capabilities through integration with Jakarta Query. A key addition is support for parameterized queries, letting developers safely bind parameters instead of manually constructing query strings.

Java
    List<Car> cars = template.query(
            "FROM Car WHERE type = :type")
        .bind("type", CarType.SPORT)
        .result();

Version 1.1 also introduces projection support, allowing applications to retrieve lightweight views instead of entire entities.

Java
    @Projection
    public record TechCarView(
        String name,
        CarType type) { }

    List<TechCarView> views = template
        .typedQuery(
            "FROM Car WHERE type = 'SPORT'",
            TechCarView.class)
        .result();

These features improve performance, reduce data transfer, and align with modern Java features such as records.

An important aspect of Jakarta NoSQL is its long-term architectural vision.
While most developers use the mapping layer, the specification also defines a lower-level communication API for advanced scenarios. Java DocumentManagerFactory factory = ...; DocumentManager manager = factory.get("users"); DocumentRecord record = ...; manager.put(record); Optional<DocumentRecord> result = manager.findByKey("user:10"); manager.deleteByKey("user:10"); This communication layer is optional. Application developers can build complete systems without it, but it is valuable for database vendors, framework authors, and advanced integrations needing direct access to database capabilities. This design is fundamentally different from JDBC, which assumes communication through SQL statements and tabular result sets. That model works well because relational databases share a common language and interaction pattern. NoSQL databases do not. Document databases may use BSON, graph databases may offer traversal languages, and vector databases may provide similarity-search APIs. Others use REST endpoints, binary protocols, gRPC streams, or vendor-specific mechanisms. Forcing these models into a JDBC-style abstraction would limit their capabilities or demand ongoing vendor-specific extensions. For this reason, Jakarta NoSQL uses a layered architecture. The mapping layer offers a portable, productive programming model for developers, while the communication layer remains flexible to support diverse NoSQL systems. This architecture positions the specification for future growth. As new technologies like vector databases, time-series engines, and AI-native storage emerge, Jakarta NoSQL can evolve without imposing a relational mindset. Rather than treating every database as a nail for the JPA hammer, Jakarta NoSQL recognizes that different problems require different tools, while still presenting a consistent and familiar experience for enterprise Java developers.
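The layering described above can be sketched in plain Java. The example below is a minimal illustration and not the Jakarta NoSQL API: `DocumentStore`, `InMemoryDocumentStore`, and `CarTemplate` are hypothetical names, and an in-memory map stands in for a real database driver. It only aims to show how a mapping layer can let application code work with entities while a separate communication layer deals in database-shaped structures.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical names throughout; this is NOT the Jakarta NoSQL API.
final class LayeredPersistenceSketch {

    // Domain type used by application code.
    record Car(long id, String name, String type) {}

    // "Communication layer": speaks in database-shaped structures, not entities.
    interface DocumentStore {
        void put(String key, Map<String, Object> document);
        Optional<Map<String, Object>> findByKey(String key);
        void deleteByKey(String key);
    }

    // In-memory stand-in for a real driver (assumption for illustration).
    static final class InMemoryDocumentStore implements DocumentStore {
        private final Map<String, Map<String, Object>> data = new HashMap<>();

        @Override
        public void put(String key, Map<String, Object> document) {
            data.put(key, new LinkedHashMap<>(document)); // defensive copy
        }

        @Override
        public Optional<Map<String, Object>> findByKey(String key) {
            return Optional.ofNullable(data.get(key));
        }

        @Override
        public void deleteByKey(String key) {
            data.remove(key);
        }
    }

    // "Mapping layer": converts entities to documents and back.
    static final class CarTemplate {
        private final DocumentStore store;

        CarTemplate(DocumentStore store) {
            this.store = store;
        }

        void insert(Car car) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("_id", car.id());
            doc.put("name", car.name());
            doc.put("type", car.type());
            store.put(key(car.id()), doc);
        }

        Optional<Car> find(long id) {
            return store.findByKey(key(id)).map(doc -> new Car(
                (Long) doc.get("_id"),
                (String) doc.get("name"),
                (String) doc.get("type")));
        }

        void delete(long id) {
            store.deleteByKey(key(id));
        }

        private static String key(long id) {
            return "car:" + id;
        }
    }

    public static void main(String[] args) {
        CarTemplate cars = new CarTemplate(new InMemoryDocumentStore());
        cars.insert(new Car(1L, "Ferrari", "SPORT"));
        System.out.println(cars.find(1L).isPresent()); // prints true
        cars.delete(1L);
        System.out.println(cars.find(1L).isPresent()); // prints false
    }
}
```

Swapping `InMemoryDocumentStore` for another `DocumentStore` implementation would leave `CarTemplate` and its callers unchanged, which is the property the layered design is after.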

By Otavio Santana, DZone Core
Optimizing Databricks Spark Pipelines Using Declarative Patterns

If you've ever inherited a Spark job that runs in 35 minutes and someone asks you to make it faster, you know the routine. You start by checking partition counts, then file sizes, then shuffle stages, then broadcast hints. You find a handwritten OPTIMIZE schedule from 2022, a Z-ORDER on the wrong column, and a cluster sized for last year's data volume. By the time you've made the job fast, you've absorbed three new things to maintain. The next person to inherit it will absorb four.

This pattern, call it the hand-tuning treadmill, is what the declarative optimization story on Databricks is trying to break. It's not a single feature; it's a cluster of capabilities that collectively let teams describe what a table should look like and let the engine handle the physical optimizations. What follows is the practical view of those patterns: where they fit, what they replace, and how to migrate without a rewrite weekend.

1. The Hand-Tuning Treadmill: Why Imperative Optimization Doesn't Scale

Before getting into the declarative side, it's worth being concrete about what "imperative Spark optimization" actually means in production. The shape is consistent across teams I've audited:

  • Layout decisions frozen on day one. Somebody picks a partition column when the table is created. The data shape changes a year later. Nobody re-partitions because the migration is scary. Query plans drift toward full scans.
  • Maintenance jobs that nobody owns. An OPTIMIZE / Z-ORDER / VACUUM script lives in a notebook scheduled at 3 AM. It runs on a cluster that's slightly mis-sized. When data volume grows, the job runs into the morning workload, and people complain about latency.
  • Cluster sizing as a guess. Worker count is a heuristic from a senior engineer's memory of last year's spike. Half the time it's too big, half the time it's too small, and the cost discussion gets emotional.
  • Hint-driven plans. Broadcast hints, repartition hints, coalesce(N): sprinkled through pipelines to fix yesterday's problem, kept indefinitely because removing them feels risky.

None of these are bugs. They're symptoms of the imperative model: the team owns the layout, the maintenance, the sizing, and the plan tuning. In small pipelines, ownership is fine. At scale, it becomes the bottleneck that the team can't outsource.

2. What "Declarative" Means in the Spark Optimization Context

Declarative is a word that gets used in two different ways here, and it's worth pulling them apart. Within Lakeflow pipelines (formerly DLT), it means "describe the tables, not the steps": the engine builds the DAG and runs it. But in the broader optimization story, declarative also means "describe the desired property of the table or workload, not the operations to maintain it":

  • Layout: I want this table clustered by these columns; figure out when and how to re-cluster.
  • Maintenance: I want this table optimized and vacuumed; figure out the schedule.
  • Ingestion: I want all new files in this path picked up exactly once; figure out checkpointing and listing.
  • Quality: These rows must satisfy these expectations; enforce them and report what gets dropped.
  • Compute: I want this query fast and not wasteful; size and scale appropriately.

Each one of those bullets corresponds to a piece of the declarative stack. Used together, they replace a remarkable amount of the boilerplate that has historically lived in Spark pipelines.

The mental shift: You stop writing operations against the table and start writing properties of the table. The engine becomes the actor; you become the editor.

3. The Declarative Optimization Stack on Databricks

The chart below maps each thing the team declares to the engine capability that handles it, ending at the physical Delta table. It's the picture I draw on whiteboards when teams ask, "What's the order to adopt these in?"

Figure 1.
The declarative optimization stack: each user-facing intent at the top maps to a continuous engine behavior, which keeps the underlying Delta tables well-clustered, compacted, and statistically up-to-date — without human intervention. Two things are worth highlighting in this picture. First, every box in the engine row is something that runs continuously, not on a cron — there is no daily "optimization window" anymore. Second, the bottom layer is identical to what you'd get from any well-tuned imperative pipeline: 256 MB Parquet files with current statistics. The declarative path doesn't change what good looks like; it changes who does the work to keep things looking good. 4. Layout: Liquid Clustering Replaces Hand-Maintained Z-ORDER Liquid Clustering is the change with the largest practical impact, because partition-key choices are where most lakehouse pipelines accumulate the most technical debt. The declarative version: you specify the columns the data is most often filtered or joined by, and the engine maintains a layout that supports those access patterns — incrementally, as new data arrives, without a full rewrite. When access patterns change, you change the cluster columns, and the engine re-clusters in the background. Defining Liquid-Clustered Tables SQL -- New table, clustered by the columns most commonly filtered on. -- No more PARTITIONED BY, no more guessing at partition cardinality. CREATE TABLE prod.gold.daily_totals ( account_id STRING, region STRING, ingest_date DATE, daily_total DECIMAL(18,2), txn_count BIGINT ) USING DELTA CLUSTER BY (region, ingest_date, account_id); -- Even better: let the engine pick the clustering columns by -- observing real query patterns over time. CREATE TABLE prod.gold.events_clustered USING DELTA CLUSTER BY AUTO AS SELECT * FROM prod.silver.events; Migrating an Existing Partitioned/Z-ORDER Table SQL -- Convert a legacy partitioned table to liquid clustering. 
-- Existing data files are not rewritten immediately; the engine -- rebalances incrementally on subsequent writes + maintenance. ALTER TABLE prod.silver.transactions CLUSTER BY (account_id, ingest_date); -- Force the first clustering pass for a freshly converted table OPTIMIZE prod.silver.transactions FULL; Why this matters: the recurring 2 AM Slack thread of "can we re-partition this table?" goes away. Layout becomes a property you change with one DDL statement, not a multi-week rewrite project. 5. Maintenance: Predictive Optimization Replaces Cron-Driven OPTIMIZE/VACUUM Predictive optimization is the part that retired the most legacy code in the pipelines I've migrated. Once enabled at the catalog or schema level, the engine monitors each table's read and write patterns and decides on its own when to compact files, re-cluster, vacuum, and refresh statistics. The big win isn't the operations themselves — the imperative pipeline could already run those — it's that the timing is observed-driven, not schedule-driven. Tables that get heavy ingestion get more frequent maintenance. Cold tables get left alone. SQL -- Turn it on at the catalog level once; new tables inherit. 
    ALTER CATALOG prod ENABLE PREDICTIVE OPTIMIZATION;

    -- Or at the schema level for a phased rollout
    ALTER SCHEMA prod.gold ENABLE PREDICTIVE OPTIMIZATION;

    -- Inspect recent maintenance operations on a given table
    SELECT
      operation,
      operationMetrics.numAddedFiles   AS files_added,
      operationMetrics.numRemovedFiles AS files_removed,
      operationMetrics.numAddedBytes   AS output_bytes,
      timestamp
    FROM (DESCRIBE HISTORY prod.gold.daily_totals)
    WHERE userMetadata IS NULL -- heuristic: no user-supplied commit metadata
      AND operation IN ('OPTIMIZE', 'VACUUM START', 'VACUUM END')
      AND timestamp >= current_timestamp() - INTERVAL 7 DAYS
    ORDER BY timestamp DESC;

What you should delete after enabling this: the nightly notebook that runs OPTIMIZE on every table in a schema, the VACUUM cron job, the ANALYZE TABLE wrapper, and the alerting that wakes someone up when those jobs run long. None of them are needed anymore, and leaving them on creates duplicate work that the engine and the cron will fight over.

6. Ingestion: Auto Loader Replaces Listing-Based File Detection

Auto Loader is the declarative answer to the perennial "which files have we processed already?" problem. Instead of listing a directory, comparing it to a state file, and figuring out the new bits, you describe the source location and the format and let the engine maintain its own incremental state. It uses cloud-native event notifications (S3 events, ADLS notifications, or efficient directory listing as a fallback), and the checkpoint is just another piece of state the engine owns.

Python
    from pyspark.sql.functions import current_timestamp

    # Streaming ingest from S3 with schema inference + evolution.
    # Replaces hand-maintained checkpointing, listing logic, and
    # whatever file-tracking table the team built two years ago.
(spark.readStream .format("cloudFiles") .option("cloudFiles.format", "json") .option("cloudFiles.inferColumnTypes", "true") .option("cloudFiles.schemaLocation", "s3://acme-checkpoints/txns_schema") .option("cloudFiles.schemaEvolutionMode", "addNewColumns") .load("s3://landing/txns/") .withColumn("_ingest_ts", current_timestamp()) .writeStream .format("delta") .option("checkpointLocation", "s3://acme-checkpoints/txns_writer") .trigger(availableNow=True) # batch-style; runs to completion .toTable("prod.bronze.txns")) Two notes from production. First, schemaEvolutionMode is the option that prevents the silent-data-loss class of bugs when partner schemas change; pick the policy explicitly rather than letting it default. Second, trigger(availableNow=True) gives you batch ergonomics on a streaming source — the job runs until it has consumed everything and exits, which is what most teams actually want for daily ingestion. 7. Transforms and Quality: Declarative Pipelines Replace Bare Spark + External DQ The final piece is the transformation layer. Lakeflow pipelines (the rebrand of Delta Live Tables) let you declare each table as a Python or SQL definition, and add expectations as a first-class concept. The engine derives the DAG from the dependencies and enforces the expectations on every write — the data quality framework, the lineage layer, and the orchestration glue collapse into a single artifact. 
Python import dlt from pyspark.sql.functions import sum as _sum, col @dlt.table( name="silver_txns", table_properties={ "delta.enableChangeDataFeed": "true", "delta.tuneFileSizesForRewrites": "true", }, cluster_by=["account_id", "ingest_date"], ) @dlt.expect_or_drop("non_null_amount", "amount IS NOT NULL") @dlt.expect_or_fail("valid_currency", "currency IN ('USD','EUR','GBP')") @dlt.expect("unique_txn", "txn_id IS NOT NULL") def silver_txns(): return (dlt.read_stream("bronze_txns") .dropDuplicates(["txn_id"])) @dlt.table(name="gold_daily_totals") def gold_daily_totals(): return (dlt.read("silver_txns") .groupBy("ingest_date", "account_id", "region") .agg(_sum("amount").alias("daily_total"))) The decorators do four things at once: define the table, declare its layout (cluster_by), declare its quality rules, and let the engine infer that gold_daily_totals depends on silver_txns from the dlt.read call. There is no DAG file. There is no separate Great Expectations suite. Lineage is generated for free in Unity Catalog, including column-level edges. If you want to query how the expectations have been performing — useful for SLO dashboards or alerting — the event log surfaces it directly: SQL -- Pass / fail / drop counts per expectation, last 24 hours SELECT flow_name, details:flow_progress.data_quality.expectations[0].name AS exp_name, details:flow_progress.data_quality.expectations[0].passed_records AS passed, details:flow_progress.data_quality.expectations[0].failed_records AS failed, details:flow_progress.data_quality.expectations[0].dropped_records AS dropped, timestamp FROM event_log("<pipeline-id>") WHERE event_type = 'flow_progress' AND timestamp >= current_timestamp() - INTERVAL 1 DAY ORDER BY timestamp DESC; 8. Putting It Together: Where to Start, What to Measure Adopting all of this at once is a recipe for pain. 
The order I've seen work, and a small set of metrics to verify the change is paying off:

Step | Adopt | Retire | Verify with
1 | Predictive optimization at schema level | Nightly OPTIMIZE / VACUUM jobs | Reduction in maintenance-cluster cost
2 | Liquid clustering on top 5 tables | Static partitioning + Z-ORDER | p95 query latency on the same workloads
3 | Auto Loader for 1-2 ingestion pipelines | Custom file-tracking + listing logic | End-to-end data freshness
4 | Lakeflow pipelines for new pipelines only | External DQ + DAG glue (for new work) | Lines of pipeline code per table
5 | Serverless compute for SQL warehouses + DLT | Hand-sized job clusters | Cost-per-query, scale-up time

What you do not need to migrate: imperative pipelines that already work and aren't growing. Declarative patterns are about new work and high-pain hot spots, not a heroic rewrite of every notebook ever shipped.

9. Honest Limitations and Where Imperative Still Wins

Four places where the declarative model still bites, worth knowing before you commit:

  • Procedural logic still belongs in Jobs. If your pipeline is really a sequence of API calls with branching error handling, that's a Lakeflow Job (or external code), not a declarative table. Don't try to bend dlt around it.
  • Predictive optimization needs observation time. On a table that's a week old, the engine hasn't seen enough patterns to make great decisions. For tables under heavy initial load, an explicit OPTIMIZE FULL after the first big ingest still helps.
  • Cluster-by-column choice still matters. CLUSTER BY AUTO is great for stable workloads with predictable filters. For tables whose access pattern is genuinely heterogeneous across teams, an explicit cluster-by based on the dominant query is usually faster.
  • Hint-driven escapes are still allowed. If a particular query benefits from a /*+ BROADCAST(t) */ hint and AQE isn't catching it, the hint is fine. Just keep them rare and document why.
Conclusion The declarative optimization story isn't a single feature you toggle — it's a quiet shift in who owns the boring parts of a Spark pipeline. Layout, maintenance, ingestion bookkeeping, plan tuning, cluster sizing, data quality enforcement: every one of those was traditionally a thing the team owned and paid for in toil. The current Databricks stack lets you express each as an intent and let the engine handle the operations underneath. Adopt them in order, retire what they replace, and the optimization treadmill slows from a daily concern to a quarterly review. That's the actual win, and it's the reason the declarative paradigm has gone from a Lakeflow detail to the default mental model for new pipelines on Databricks.

By Seshendranath Balla Venkata
Top Java Security Vulnerabilities and How to Prevent Them in Modern Java

With the increasing number of security threats, organizations have invested heavily in cybersecurity initiatives to protect their applications, infrastructure, and sensitive data. Security vulnerabilities are rarely introduced intentionally. Most of them creep into applications through shortcuts, overlooked edge cases, outdated libraries, or bad coding habits. Modern Java has significantly improved its security capabilities, but no framework or JVM version can completely protect an application from insecure coding practices. As developers, we still need to understand where vulnerabilities originate and how to prevent them before they reach production.

In this article, I summarize some of the most common Java security vulnerabilities and practical techniques for preventing them. These are the same security best practices and lessons learned that I frequently share with new members of my team. I am sharing them here in the hope that they can serve as a practical handbook for Java developers looking to build more secure applications.

1. SQL Injection

SQL injection remains one of the oldest and most dangerous vulnerabilities. It occurs when user input is directly concatenated into SQL statements. Consider the following example:

Java
    String query = "SELECT * FROM users WHERE username = '" + username + "'";
    Statement stmt = connection.createStatement();
    ResultSet rs = stmt.executeQuery(query);

If an attacker enters the following value, the query can be manipulated to return unintended results.

SQL
    admin' OR '1'='1

Prevention

Always use parameterized queries.

Java
    String query = "SELECT * FROM users WHERE username = ?";
    PreparedStatement stmt = connection.prepareStatement(query);
    stmt.setString(1, username);
    ResultSet rs = stmt.executeQuery();

Prepared statements separate data from executable SQL, eliminating injection opportunities.

2. Hardcoded Secrets

One of the most common findings during security reviews is hardcoded credentials.
Java
    private static final String API_KEY = "abcd123456789";

This may seem harmless during development, but once committed to source control, secrets often remain exposed indefinitely.

Prevention

Store secrets externally.

Java
    String apiKey = System.getenv("PAYMENT_API_KEY");

Better alternatives are dedicated secret stores such as AWS Secrets Manager, Azure Key Vault, HashiCorp Vault, or Kubernetes Secrets. Secrets should never live inside source code repositories.

3. Insecure Deserialization

Java serialization has been responsible for numerous security incidents. Example:

Java
    ObjectInputStream input = new ObjectInputStream(request.getInputStream());
    Object obj = input.readObject();

The danger is that attackers can craft malicious serialized objects that execute unexpected code during deserialization.

Prevention

Avoid Java serialization whenever possible. Prefer formats such as JSON, XML (with secure parsing), or Protocol Buffers. Example using Jackson:

Java
    ObjectMapper mapper = new ObjectMapper();
    User user = mapper.readValue(json, User.class);

Using structured formats reduces attack surfaces significantly.

4. Cross-Site Scripting (XSS)

Although often associated with front-end applications, backend services can accidentally enable XSS vulnerabilities when user-generated content is returned without sanitization. Example:

Java
    String comment = request.getParameter("comment");
    response.getWriter().write(comment);

If the user submits the following comment, the browser executes the script.

HTML
    <script>alert('Hacked')</script>

Prevention

Always encode output. Using Spring:

Java
    String safeComment = HtmlUtils.htmlEscape(comment);

Additionally, validate inputs, sanitize rich text, and implement Content Security Policies (CSP).

5. Path Traversal Attacks

File download functionality often introduces path traversal vulnerabilities. Example:

Java
    String file = request.getParameter("file");
    Path path = Paths.get("/documents/" + file);

An attacker could submit the following value and potentially access sensitive files:
Shell
    ../../../etc/passwd

Prevention

Normalize and validate paths.

Java
    Path base = Paths.get("/documents");
    Path resolved = base.resolve(file).normalize();
    if (!resolved.startsWith(base)) {
        throw new SecurityException(
            "Invalid file path");
    }

Never trust file names coming directly from user input.

6. Weak Password Storage

Storing passwords improperly remains surprisingly common. Bad practice:

Java
    String passwordHash = DigestUtils.md5Hex(password);

MD5 and SHA-1 are no longer considered secure for password storage.

Prevention

Use adaptive hashing algorithms. Example with BCrypt:

Java
    BCryptPasswordEncoder encoder = new BCryptPasswordEncoder();
    String hash = encoder.encode(password);

BCrypt automatically includes salting and work-factor adjustments. Other strong alternatives include Argon2, PBKDF2, and SCrypt.

7. Dependency Vulnerabilities

Modern Java applications often contain more third-party code than custom code. A secure application can still become vulnerable because of outdated dependencies.

Prevention

Integrate dependency scanning into CI/CD pipelines. Example Maven plugin:

XML
    <plugin>
      <groupId>org.owasp</groupId>
      <artifactId>dependency-check-maven</artifactId>
    </plugin>

Additionally, tools such as Snyk can automatically identify known vulnerabilities. We have been using Snyk for the last couple of years, and it is effective. Regular dependency updates should be part of every release cycle.

8. Improper Logging of Sensitive Data

Developers often log information for troubleshooting without considering security implications. Example:

Java
    logger.info(
        "Login request received for user={} password={}",
        username, password);

This exposes credentials inside log files.

Prevention

Mask or exclude sensitive information.

Java
    logger.info(
        "Login request received for user={}",
        username);

Never log passwords, access tokens, credit card information, personal health information (PHI), or other PII.
This is especially important in regulated industries such as our own, healthcare.

9. Insufficient Authentication and Authorization

Authentication verifies identity, and authorization determines access. Many applications perform authentication correctly but fail to enforce authorization consistently. Example:

Java
    @GetMapping("/admin/users")
    public List<User> getUsers() {
        return userService.findAll();
    }

Without authorization checks, any authenticated user might gain access.

Prevention

Use role-based security.

Java
    @PreAuthorize("hasRole('ADMIN')")
    @GetMapping("/admin/users")
    public List<User> getUsers() {
        return userService.findAll();
    }

Security should be enforced at every layer, not just the UI.

10. Lack of Input Validation

Many vulnerabilities originate from accepting unexpected input. Example:

Java
    String age = request.getParameter("age");
    int userAge = Integer.parseInt(age);

Invalid input can cause exceptions or unexpected behavior.

Prevention

Validate all external input.

Java
    @Min(18)
    @Max(120)
    private Integer age;

Bean Validation provides a simple and consistent approach for validating request payloads. Never assume user input is safe.

Final Thoughts

Security is not a feature that can be added at the end of a project. It needs to be part of the development process from the very beginning. The vulnerabilities discussed here are not theoretical. They are among the most common findings during security assessments, penetration tests, and production incident investigations. Fortunately, modern Java provides mature frameworks, libraries, and tools that make secure development significantly easier than it was a decade ago. The key is building security awareness into everyday development practices:

  • Use parameterized queries
  • Protect secrets properly
  • Validate all inputs
  • Keep dependencies updated
  • Apply strong authentication and authorization
  • Log responsibly
  • Continuously scan for vulnerabilities

Security is ultimately about reducing risk.
Small improvements applied consistently across a codebase can prevent incidents that would otherwise become expensive lessons later.
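As one concrete illustration of the password-storage advice in section 6, PBKDF2 (listed there as an alternative to BCrypt) is available in the JDK without extra dependencies. The sketch below is a minimal example under stated assumptions: the class name `Pbkdf2Sketch`, the iteration count, the key length, and the `iterations:salt:hash` storage format are illustrative choices rather than recommendations from the article, and should be tuned and reviewed before any real use.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Illustrative sketch only: names, parameters, and storage format are assumptions.
final class Pbkdf2Sketch {
    private static final String ALGORITHM = "PBKDF2WithHmacSHA256";
    private static final int ITERATIONS = 210_000; // assumed work factor; tune for your hardware
    private static final int SALT_BYTES = 16;
    private static final int KEY_BITS = 256;

    // Returns "iterations:salt:hash", with salt and hash Base64-encoded.
    static String hash(char[] password) {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt); // fresh random salt per password
        byte[] derived = derive(password, salt, ITERATIONS);
        Base64.Encoder enc = Base64.getEncoder();
        return ITERATIONS + ":" + enc.encodeToString(salt) + ":" + enc.encodeToString(derived);
    }

    // Re-derives the hash with the stored salt and iteration count, then compares.
    static boolean verify(char[] password, String stored) {
        String[] parts = stored.split(":");
        if (parts.length != 3) {
            return false;
        }
        int iterations = Integer.parseInt(parts[0]);
        byte[] salt = Base64.getDecoder().decode(parts[1]);
        byte[] expected = Base64.getDecoder().decode(parts[2]);
        byte[] actual = derive(password, salt, iterations);
        // MessageDigest.isEqual avoids leaking the mismatch position through timing.
        return MessageDigest.isEqual(expected, actual);
    }

    private static byte[] derive(char[] password, byte[] salt, int iterations) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance(ALGORITHM).generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available on this JVM", e);
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    public static void main(String[] args) {
        String stored = hash("correct horse".toCharArray());
        System.out.println(verify("correct horse".toCharArray(), stored)); // prints true
        System.out.println(verify("wrong guess".toCharArray(), stored));   // prints false
    }
}
```

Storing the iteration count next to the salt and hash is a deliberate choice in this sketch: it lets the work factor be raised later while older hashes remain verifiable.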

By Muhammed Harris Kodavath
OpenAPI, ORM, SVG, and Lottie

This is the third follow-up to Friday's release post. Saturday's was about how you iterate; yesterday's was about new platform APIs in the core; today's is about a run of pieces that change how you write the structural parts of an app. The pieces are an OpenAPI client generator, a SQLite ORM, JSON and XML mappers, a component binder with validation, build-time SVG and Lottie transcoders, and a declarative router with deep links. All ride on a single build-time codegen pipeline: a Maven-plugin pass that reads annotations or declarative source files at build time and emits typed Java that compiles into your binary. No reflection, no service loader, no Class.forName. The "How it works" section at the end of this post covers the codegen plumbing once you have seen what it powers.

OpenAPI Client Generation

This is the headline of this release for any team that talks to a backend. A new cn1:generate-openapi-client Mojo reads an OpenAPI 3.x JSON spec (a URL or a local file) and writes typed Codename One client code that compiles into your app:

  • One @Mapped POJO per components.schemas entry.
  • One <Tag>Api.java class per OpenAPI tag, with one fluent method per operation.
  • Every method routes through Rest.<verb> + Mappers.toJson + fetchAsMapped / fetchAsMappedList, so the generated surface integrates with the rest of the framework instead of dragging in a separate HTTP stack.

Wire it into the project's pom.xml:

XML
    <plugin>
      <groupId>com.codenameone</groupId>
      <artifactId>codenameone-maven-plugin</artifactId>
      <executions>
        <execution>
          <id>petstore-client</id>
          <goals><goal>generate-openapi-client</goal></goals>
          <configuration>
            <specUrl>https://petstore3.swagger.io/api/v3/openapi.json</specUrl>
            <basePackage>com.example.petstore</basePackage>
          </configuration>
        </execution>
      </executions>
    </plugin>

mvn generate-sources picks the spec up, downloads it, and writes one file per schema and one per tag under target/generated-sources/.
The Petstore reference spec, exercised end-to-end, produces six model classes (Pet, Order, Customer, Tag, Category, User) and three API classes (PetApi, StoreApi, UserApi), and the nine generated .class files compile cleanly against codenameone-core. Documented at the OpenAPI codegen Maven goal.

In application code you call the generated Api class the same way you would call any other Java method:

Java
PetApi pets = new PetApi();

// Returns AsyncResource<Pet>; resolves with the deserialised object.
pets.getPetById(42).onResult((pet, err) -> {
    if (err == null) Log.p("Got " + pet.getName());
});

// Returns AsyncResource<List<Pet>>.
pets.findPetsByStatus("available").onResult((list, err) -> {
    if (err == null) {
        for (Pet p : list) Log.p(p.getName());
    }
});

// POST with a request body. addPet takes a Pet, returns a Pet.
Pet candidate = new Pet();
candidate.setName("Mittens");
candidate.setStatus("available");
pets.addPet(candidate).onResult((created, err) -> { /* ... */ });

There is no hand-rolled ConnectionRequest setup, no manual JSON parsing, no string-typed request bodies. The generated client takes a typed Pet, serializes it with Mappers.toJson(...), fires the right HTTP verb, deserializes the response with Mappers.fromJson(...), and surfaces the result through the framework's AsyncResource so your callback fires on the EDT.

For teams who already publish an OpenAPI spec as part of their backend (most modern backend frameworks do this automatically; FastAPI, Spring's springdoc-openapi, NestJS, ASP.NET Core, Go's gnostic), the practical effect is that the mobile client's bindings stay in sync with the backend without anyone hand-writing a single network call. Update the spec, re-run mvn generate-sources, and the new and changed endpoints land in your app as typed Java; the IDE picks them up immediately.
It is the kind of change that is most useful when you do not know you have it: pull a fresh spec, rebuild, and your IDE highlights every place in the codebase that called a renamed endpoint or passed the wrong type to a parameter.

SQLite ORM

@Entity marks the class; @Id and @Column shape the schema; @DbTransient opts a field out:

Java
@Entity
public class TodoItem {
    @Id @Column long id;
    @Column String title;
    @Column(name = "completed_at") Date completedAt;
    @DbTransient Object cachedView;
}

Dao<TodoItem> dao = EntityManager.open("todos.db").dao(TodoItem.class);
dao.createTable();
dao.insert(new TodoItem(0, "Read the post", null));
List<TodoItem> open = dao.find("completed_at IS NULL", new Object[] {});
TodoItem byId = dao.findById(42);
dao.delete(byId);

The generated DAO does the typed work underneath. No reflection in insert; the generated code calls setString(1, e.title) and setLong(2, e.id) directly against the SQLite PreparedStatement. Validation at build time catches missing @Id, fields that look like relationships but are not yet supported, and abstract entity classes; the build fails with a class name and a reason.

For JPA/Hibernate developers, the API is intentionally familiar. @Entity, @Id, @Column, and @Transient (here renamed @DbTransient to avoid colliding with java.beans.Transient) carry the same meaning they do under javax.persistence / jakarta.persistence. The EntityManager name is the same. Dao#findById, Dao#findAll, Dao#find(where, params), Dao#insert, Dao#update, Dao#delete line up with the basic JPA repository contract. The query language is plain SQL (there is no JPQL or Criteria DSL), but the annotation surface, the lifecycle, and the runtime methods will feel like a long-lost friend to anyone with server-side Java persistence experience.

JSON/XML Mapping

@Mapped marks a class as a transferable POJO. @JsonProperty and @XmlElement (plus @XmlRoot, @XmlAttribute, @JsonIgnore, @XmlTransient) shape the wire format.
The runtime entry points are Mappers.toJson(...), Mappers.fromJson(...), Mappers.toXml(...), Mappers.fromXml(...):

Java
@Mapped
public class User {
    @JsonProperty("user_id") long id;
    @JsonProperty String name;
    @JsonProperty("created_at") Date createdAt;
    @JsonIgnore String passwordHash;
}

String json = Mappers.toJson(user);
User back = Mappers.fromJson(json, User.class);

The same @Mapped POJO is the type the typed Rest helpers accept:

Java
Rest.get("https://api.example.com/users/42")
    .fetchAsMapped(User.class)
    .onResult((user, err) -> { /* ... */ });

Rest.get("https://api.example.com/users")
    .fetchAsMappedList(User.class)
    .onResult((users, err) -> { /* ... */ });

Rest.fetchAsJsonList (top-level JSON arrays, no {"root":[...]} envelope trick), JSONWriter (the complement of JSONParser, with fluent builders and streaming variants for Writer and OutputStream), and URLImage.setDefaultBearerToken (auth headers on image fetches) all ship alongside.

For JAXB developers, the XML surface (@XmlRoot, @XmlElement, @XmlAttribute, @XmlTransient) is a direct port of the long-established javax.xml.bind.annotation surface. The same model class can be both @XmlRoot-decorated and @JsonProperty-decorated, which gives you a single source of truth for both wire formats. The JSON surface adopts the Jackson convention (@JsonProperty, @JsonIgnore) that nearly every modern JVM JSON binding (Jackson, Moshi, kotlinx-serialization) inherited.

Component Binding With Validation

The fourth annotation processor on the same pipeline is the component binder. @Bindable marks a model class; @Bind(name = "userField") ties a field to a component on a form by the component's name.
Field-level validation annotations compose with @Bind on the same field:

Java
@Bindable
public class SignupModel {
    @Bind(name = "userField") @Required @Length(min = 3)
    private String user;

    @Bind(name = "emailField") @Required @Email
    private String email;

    @Bind(name = "ageField") @Numeric(min = 13, max = 120)
    private String age;

    @Bind(name = "roleField") @ExistIn({ "admin", "editor", "viewer" })
    private String role;
}

The matching form sets a name on each component so the binder can find them:

Java
TextField user = new TextField();
user.setName("userField");
TextField email = new TextField();
email.setName("emailField");
TextField age = new TextField();
age.setName("ageField");
ComboBox<String> role = new ComboBox<>("admin", "editor", "viewer");
role.setName("roleField");
Button submit = new Button("Sign up");

Form form = new Form("Sign Up", BoxLayout.y());
form.add(user).add(email).add(age).add(role).add(submit);
form.show();

SignupModel model = new SignupModel();
Binding binding = Binders.bind(model, form);
binding.getValidator().addSubmitButtons(submit);

Binding is the handle: refresh() re-reads the model into the components, commit() writes the components back, disconnect() tears the listeners down.

Multiple validation annotations on a single field compose via Validator.addConstraint(Component, Constraint...) and GroupConstraint (first failure wins). @Validate(MyClass.class) is the escape hatch for hand-written Constraint implementations. The validation set: @Required, @Length, @Regex, @Email, @Url, @Numeric, @ExistIn, @Validate.

The new BindAttr enum lets @Bind target a specific attribute of the component (TEXT, UIID, SELECTED, ...) when the default ("write a String field into the component's text") is not what you want.
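The "first failure wins" composition mentioned above can be illustrated with a small, framework-independent sketch. The Check interface and the required / minLength helpers below are hypothetical stand-ins invented for this example; they are not the Codename One Constraint or GroupConstraint classes, and the real API may behave differently in its details.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins for illustration only; NOT the Codename One
// Constraint / GroupConstraint classes.
final class FirstFailureWins {

    /** A check returns null when the value is valid, or an error message otherwise. */
    interface Check extends Function<String, String> {}

    static Check required() {
        return v -> (v == null || v.isEmpty()) ? "Value is required" : null;
    }

    static Check minLength(int min) {
        return v -> (v != null && v.length() >= min)
                ? null
                : "Must be at least " + min + " characters";
    }

    /** Runs the checks in declaration order and reports only the first failure. */
    static String validate(String value, List<Check> checks) {
        for (Check check : checks) {
            String error = check.apply(value);
            if (error != null) {
                return error; // later checks are not consulted
            }
        }
        return null; // every check passed
    }

    public static void main(String[] args) {
        // Mirrors a field annotated with @Required @Length(min = 3).
        List<Check> userChecks = Arrays.asList(required(), minLength(3));
        System.out.println(validate("", userChecks));
        System.out.println(validate("ab", userChecks));
        System.out.println(validate("abc", userChecks));
    }
}
```

Reporting only the first failure keeps the message shown next to a field short and stable: an empty field says "required" rather than listing every rule it also violates.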
SVG at Build Time

Drop an SVG into src/main/css/, alongside theme.css:

Shell
src/main/css/
  theme.css
  star.svg
  gradient_circle.svg
  path_arrow.svg
  rounded_button.svg
  wave.svg
  pro_badge.svg

After the next build, every SVG is a regular Codename One Image. An SVG handled by the transcoder is a vector image, but it is still an Image. Everywhere a raster Image works (Label.setIcon, Button.setIcon, BorderLayout.NORTH, the toolbar, a MultiButton's leading icon, a CSS background: url(...) rule), the SVG works too. The difference is that it stays crisp at any size: the same source file is sharp at a 16-point list-row icon, a 64-point hero header, and a 256-point launch screen, on every DPI bucket.

Figure: a grid of the static SVGs from the hellocodenameone fixture, rendered through the new pipeline.

Sizing in Millimeters

The SVG transcoder's most useful feature is also the one most easily missed: size every SVG in millimeters from CSS. SVGs in the wild routinely declare odd width / height attributes (a 1024×1024 export of a 24×24 icon, no dimensions at all, design-pixel values from one specific framework). Pinning the rendered size in millimeters sidesteps all of that.

CSS
HomeIcon {
    background: url(home.svg);
    cn1-svg-width: 6mm;
    cn1-svg-height: 6mm;
    bg-type: image_scaled_fit;
}

LogoBanner {
    background: url(logo.svg);
    cn1-svg-width: 32mm;
    cn1-svg-height: 12mm;
}

A 6 mm icon is 6 mm tall on a 1× desktop, 6 mm on a high-DPI handset, and 6 mm on a 4K tablet. The transcoder routes both values through Display.convertToPixels() at install time, the same way font-size: 3mm already behaves elsewhere in Codename One CSS. No design-pixel guesswork, no DPI bucket to choose, no scaling surprise when the artist re-exports the source SVG at a different resolution.

If a project does not use CSS for theming, the two-float constructor on the generated class takes millimeters directly: new com.codename1.generated.svg.Home(6f, 6f).
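The arithmetic behind millimeter sizing can be sketched in a few lines. This is only an illustration of the general millimeters-to-pixels relationship (25.4 mm per inch); the toPixels helper and the sample DPI values are assumptions made for this example, and the sketch does not call or reproduce Display.convertToPixels(), whose exact behavior depends on the platform.

```java
// Illustrative arithmetic only; the helper name and DPI values are
// assumptions, not part of the Codename One API.
final class MmToPixels {

    static final float MM_PER_INCH = 25.4f;

    /** Converts a physical size in millimeters to device pixels at the given density. */
    static int toPixels(float millimeters, float dotsPerInch) {
        return Math.round(millimeters * dotsPerInch / MM_PER_INCH);
    }

    public static void main(String[] args) {
        // The same 6 mm icon maps to a different pixel count on each screen,
        // which is what keeps its physical size constant.
        float[] sampleDensities = { 96f, 326f, 460f };
        for (float dpi : sampleDensities) {
            System.out.println("6 mm at " + dpi + " dpi = " + toPixels(6f, dpi) + " px");
        }
    }
}
```

The point of the sketch is that the pixel count is derived from the screen, not from the SVG file, so re-exporting the artwork at a different resolution does not change the size on screen.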
Coverage and What We Still Want Feedback On

The transcoder is a maven/svg-transcoder/ module that parses SVG with javax.xml StAX. No Batik, no Flamingo, no external dependencies. Coverage targets what real-world icon SVGs use: rect (rounded corners included), circle, ellipse, line, polyline, polygon, the full path grammar (M / L / H / V / C / S / Q / T / A / Z plus relative-coordinate and smooth-curve reflection), groups with affine transforms (translate, scale, rotate, skew, matrix), linear gradients via LinearGradientPaint, fill, stroke, stroke-width, linecap, linejoin, opacity.

SMIL animations are supported in the same pipeline: <animate>, <animateTransform> (translate, scale, rotate), and <set>. Time values interpolate against wall-clock time on every paint, with from / to / values / begin / dur / repeatCount / fill="freeze" honored.

Text and clip-path landed in the follow-up PR for the static SVG fixtures, and both are visible in the screenshot above (the "Codename One / build-time SVG" wordmark in the rounded button, the "PRO" badge text, and the clip-path-shaped rounded-corner badge underneath). <text> and <tspan> work with single-style fills and transforms; <clipPath> referenced via clip-path="url(#id)" works against rect, circle, and path clip shapes (nested clip refs are ignored).

What is still not supported: SVG filter primitives, <mask> (treated as a clip, so alpha masking falls back to opaque), <radialGradient> (falls back to the first-stop color), and CSS-in-SVG (style rules inside the SVG document; the transcoder reads presentation attributes and the inline style="..." attribute, but a <style> element with selectors is not parsed).

If you hit an SVG that does not transcode the way you expect, please open an issue at github.com/codenameone/CodenameOne/issues and attach the source file. The fastest way to extend the coverage is for us to run the failing case through the test fixtures and watch the output.
Every SVG we ship test goldens for started as somebody else's "this doesn't render right" report.

Caveat on iOS: The transcoded SVGs use the framework's shape API (fillShape, drawShape, LinearGradientPaint). The full surface is implemented on the Metal renderer. The deprecated GL ES 2 pipeline does not have parity on every operation, so an SVG drawn under ios.metal=false will often render with visible artifacts (missing gradients, clipped fills, distorted paths) rather than the placeholder you might expect. Now that Metal is the default for new iOS builds as of last Friday, this is a non-issue on most apps; if you have explicitly pinned ios.metal=false, expect some visual regressions on SVG content and let us know which.

The coverage matrix and troubleshooting are under SVG Transcoder in the developer guide.

Lottie at Build Time

The same pipeline carries Lottie. Drop a Bodymovin export into the same src/main/css/:

JSON
src/main/css/
  theme.css
  pulse.json
  spinner.json

After the next build, both are real Image instances on every platform that exposes the shape API. The same vector-everywhere story as SVG: a Lottie animation renders crisply at any size and slots into any Image slot in the framework.

Java
Image pulse = Resources.getGlobalResources().getImage("pulse");
Image spinner = Resources.getGlobalResources().getImage("spinner");

Animation runs against wall-clock time on every paint, with no Timer and no allocation in the hot path.

Figure: a capture of the hellocodenameone Lottie fixture in motion.

The Lottie transcoder lives in maven/lottie-transcoder/. It parses Bodymovin JSON with no external dependencies (the framework's built-in JSON parser carries the load) and lowers each file into the same SVGDocument model the SVG path uses. The same JavaCodeGenerator emits the same GeneratedSVGImage subclass, and the same SVGRegistry registers it under the source filename.
No new Image base class, no new registry, no per-port wiring, since the SVG path's JavaSE reflective load and iOS / Android Stub weaving already cover the new format.

Coverage in v1: shape layers (rc / el / sh) with solid fills and strokes; layer transforms (anchor, position, scale, rotation, opacity); animated rotation, position, and scale collapsed to a two-keyframe loop; solid-color layers as filled rects. Most icon-grade Bodymovin exports lower cleanly. Complex character animations from After Effects with image references, masks, and effects do not, and the transcoder logs which layers it dropped so the source of any blank output is obvious.

Same ask as for SVG: if a Lottie / Bodymovin file does not transcode the way you expect, please open an issue at github.com/codenameone/CodenameOne/issues and attach the source .json. The transcoder grows one shape family at a time from the cases the community reports.

The same iOS caveat applies: the renderer leans on the shape API, so the deprecated GL ES 2 pipeline shows artifacts on the more elaborate Lottie animations. Use the Metal default (now on by default for new iOS builds).

Deep Links and Routing

Two pieces of plumbing for apps that handle URLs from outside themselves (notification taps, marketing links, share targets, Universal Links from Safari and the equivalent App Links from Chrome on Android).

Deep Links

Codename One has had deep-link support for a long time through Display.setProperty("AppArg", url). The platform plumbing already writes the incoming URL into that property on cold launch, and an app-resume sets it again on warm launch; reading it back from start() works fine for a small number of patterns.

Where the AppArg-only approach gets fragile is consistency.
The cold and warm paths execute different lifecycle code, the value is a flat string with no parsing, and the trickiest case is the one where a user lands in the middle of the app via a link and then continues to interact: their next navigation needs to compose with the entry point, the back-stack needs to make sense as if they had arrived through the usual flow, and "fall off the edge of the app" on back is a common bug. With a hand-rolled AppArg reader it is easy to miss one of these and ship a half-working flow.

This release introduces a typed DeepLink and a single handler that fires for both cold and warm launches:

Java
Display.getInstance().setDeepLinkHandler(link -> {
    // link is a normalised DeepLink: scheme, host, path,
    // segments, query map, fragment. Same shape cold or warm.
    if ("/users".equals(link.path()) && link.segments().size() == 2) {
        showUserDetailForm(link.segments().get(1));
        return true;
    }
    return false;
});

AppArg still works for projects that depend on it, but the new handler is what we recommend going forward. The handler runs on a consistent lifecycle path on both cold and warm starts, and the parsed DeepLink value carries the scheme, host, path segments, query map, and fragment, so app code does not need to roll its own URL parser.

Routing

For projects that handle more than a handful of URL patterns, the second piece is the declarative router in com.codename1.router. We built it on the same build-time codegen pipeline as the ORM and the mappers (the router was actually the first concrete consumer of the new preprocessor), so the two surfaces compose: a deep-link handler that delegates to the router becomes a one-liner.

Each form declares its own path with a @Route annotation:

Java
@Route("/")
public class HomeForm extends Form { /* ... */ }

@Route("/users/:id")
public class UserDetailForm extends Form {
    public UserDetailForm(RouteMatch match) {
        String userId = match.param("id");
        // build UI for user `userId`
    }
}

@Route("/about")
public class AboutForm extends Form { /* ... */ }

Router.navigate("/users/42") resolves the path, instantiates UserDetailForm, and shows it. The deep-link handler now collapses to:

Java
Display.getInstance().setDeepLinkHandler(link -> Router.navigate(link.toString()));

Each form owns its own routing rule. Adding or moving a screen is a one-class change. The "what screens does this app have, and at what paths?" question is answered by an IDE search for @Route, not by reading every form constructor in the project.

For Spring developers, the shape is familiar by design. @Route plays the same role as Spring MVC's @RequestMapping: a class-level declaration that announces "this controller handles URLs of this shape". The :id parameter syntax mirrors Spring's {id} path-variable syntax; RouteMatch.param("id") is the same kind of accessor as Spring's @PathVariable. The mental model carries over from server-side Java with almost no friction. The same recognition is available to anyone with React Router, Vue Router, or Angular Router experience; the :param convention is the cross-framework default.

The build-time processor validates that each annotated class extends Form, that the path starts with /, that the constructor is accessible, and that there are no duplicate patterns. Any rule violation fails the build with a class name and a reason, not at runtime with a stack trace.
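To make the :param convention concrete, here is a minimal sketch of how a pattern such as /users/:id can be matched against an incoming path. It is an assumption-laden illustration of the general technique, not the router's actual implementation: the RoutePatternSketch class and its match method are hypothetical, and the real com.codename1.router resolves routes through generated code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of ":param" route matching; not the Codename One router.
final class RoutePatternSketch {

    /**
     * Returns the extracted parameters when the path matches the pattern,
     * or null when it does not match.
     */
    static Map<String, String> match(String pattern, String path) {
        String[] patternSegments = split(pattern);
        String[] pathSegments = split(path);
        if (patternSegments.length != pathSegments.length) {
            return null;
        }
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < patternSegments.length; i++) {
            String expected = patternSegments[i];
            if (expected.startsWith(":")) {
                // A ":name" segment captures whatever the path holds at this position.
                params.put(expected.substring(1), pathSegments[i]);
            } else if (!expected.equals(pathSegments[i])) {
                return null; // literal segment mismatch
            }
        }
        return params;
    }

    /** Splits "/users/42" into ["users", "42"]; "/" becomes an empty array. */
    private static String[] split(String path) {
        String trimmed = path.replaceAll("^/+|/+$", "");
        return trimmed.isEmpty() ? new String[0] : trimmed.split("/");
    }

    public static void main(String[] args) {
        System.out.println(match("/users/:id", "/users/42"));
        System.out.println(match("/about", "/users/42"));
    }
}
```

Matching segment by segment is also what makes duplicate-pattern detection straightforward at build time: two patterns collide when they have the same literal segments and parameters in the same positions.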
The rest of the router surface covers the kind of thing that has become table stakes in modern client routing:

  • Route guards run before navigation completes and can cancel or redirect.
  • Per-tab navigation stacks via TabsForm, where each tab keeps its own back stack.
  • Location listeners so anything in the app can subscribe to "the route changed".
  • Form.setPopGuard(PopGuard) intercepts hardware back, toolbar back, or Router.pop() with a chance to ask "are you sure?".
  • Sheet.showForResult() returns an AsyncResource<T> that auto-cancels with null if the user dismisses the sheet.

The API is opt-in. Apps that prefer the existing Form.show() / Form.showBack() flow keep using that; nothing changes.

For the link-publishing side, an AasaBuilder emits the iOS apple-app-site-association JSON and an AssetLinksBuilder emits the Android assetlinks.json. The full setup walk-through (entitlements, the Android intent-filter, the .well-known/ upload on your origin server) is at Routing and Deep Links in the developer guide.

The JavaScript port bridges the router into window.history so navigating the in-app router pushes a real entry into the browser's session history. Back and forward in the browser drive the router; reloading the page lands at the deep-link URL; sharing the URL out of the address bar takes a colleague to the same in-app location.

How It Works: The Build-Time Codegen Pipeline

Everything above sits on a single Maven-plugin pass. The plugin has an AnnotationProcessor SPI and two new Mojos: cn1:generate-annotation-stubs (in generate-sources) and cn1:process-annotations (in process-classes). The orchestrator ASM-scans target/classes, dispatches to every registered processor, validates the annotated classes, and emits a typed runtime artifact next to each one plus a tiny Index class that registers everything with a public runtime registry. Adding a new processor later is a matter of dropping it into META-INF/services with no orchestrator changes.
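The "Index class that registers everything with a runtime registry" idea can be pictured with a small sketch. Everything here is hypothetical and written only to show the shape of the technique: the Registry, UserScreen, and RoutesIndexSketch names are invented for this example and are not the generated RoutesIndex or the real Codename One registry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of a generated index feeding a runtime registry.
final class GeneratedIndexSketch {

    /** Stand-in runtime registry: route pattern -> factory for the screen. */
    static final class Registry {
        private static final Map<String, Function<String, Object>> FACTORIES = new HashMap<>();

        static void register(String pattern, Function<String, Object> factory) {
            FACTORIES.put(pattern, factory);
        }

        /** Creates the screen registered for the pattern, or returns null if none is. */
        static Object create(String pattern, String argument) {
            Function<String, Object> factory = FACTORIES.get(pattern);
            return factory == null ? null : factory.apply(argument);
        }
    }

    /** Stand-in for an application class carrying a route annotation. */
    static final class UserScreen {
        final String userId;

        UserScreen(String userId) {
            this.userId = userId;
        }
    }

    /**
     * What a build-time generated index could look like: plain constructor
     * references, so no Class.forName and no reflective lookup at runtime.
     */
    static final class RoutesIndexSketch {
        static void registerAll() {
            Registry.register("/users/:id", UserScreen::new);
        }
    }

    public static void main(String[] args) {
        RoutesIndexSketch.registerAll();
        UserScreen screen = (UserScreen) Registry.create("/users/:id", "42");
        System.out.println("Created screen for user " + screen.userId);
    }
}
```

Because the index refers to UserScreen through an ordinary constructor reference, an obfuscator or dead-code eliminator sees a normal symbol reference and can rename or keep the class consistently.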
The reason this runs against bytecode rather than against source text is that the source-regex prototype was scrapped early. The bytecode pass sees the JVM's view of the project (extends Form is a thing the JVM actually knows, not a pattern we have to hope the user wrote a specific way), rule violations come back with class names and reasons, and the build fails fast before any generated .class lands on disk. The infrastructure shares the ASM passes that the BytecodeComplianceMojo's existing String rewrites already use.

A small stub source is emitted under target/generated-sources/cn1-annotations/ during generate-sources so application code that references the generated registry resolves at compile time. The real .class overwrites the stub later in process-classes. Standard "compile against a stub, link against the real thing" pattern; it just works inside a single Maven build instead of needing a multi-module split.

cn1-core ships a no-op stub of each generated index (RoutesIndex, MappersIndex, BindersIndex, DaosIndex), so application code compiles even when the project has no annotated classes. The build-time processor shadows each stub with the real implementation before packaging.

The SVG and Lottie transcoders sit on a parallel pipeline (declarative graphics files in place of annotations), but they emit the same shape of code and obey the same constraints. The practical effect is that the kind of code that historically required reflection at runtime (with all the obfuscation hazards and surprise allocations that come with that) now happens once at build time and produces direct, dead-code-eliminable, rename-safe symbol references.

Wrapping Up

That closes this release's post series. We already have some pretty big features lined up for this Friday's release post; the headline pieces are the most substantial things to land in months and are worth checking back for.

By Shai Almog
Parallel Kafka Batch Processing With Kotlin Coroutines in Spring Boot

Managing high-volume message traffic is crucial in distributed architectures, and so is efficient use of database and CPU resources. Spring Kafka's default "BatchMessageListener" structure lets us receive messages in batches. However, the processing of those messages often hits a sequential bottleneck. This article discusses the structure and usage of Kotlin Coroutines in detail. We will examine how to maximize Kafka message processing performance using Structured Concurrency principles and Resource Throttling techniques.

Architectural Bottleneck: Sequential I/O Blocking

In the current Kafka listener, the database or external service calls made for each message add directly to the total processing time. If processing falls behind the message arrival rate and max.poll.interval.ms is exceeded, the consumer is removed from the consumer group. Rebalancing is triggered, and the partitions of that consumer are redistributed to other consumers in the group.

Kotlin
@KafkaListener(topics = ["usage-pool-topic"])
fun usagePoolListener(records: List<ConsumerRecord<String, String>>) {
    records.forEach { record ->
        processRecord(record) // Network latency + DB I/O blocking
    }
}

Solution

1. Batch-Fetch and In-Memory Map Structure

Before any concurrent code is entered, the data for all necessary entities is retrieved in bulk. Multiple separate queries are converted into a batch query before data processing begins, which solves the N+1 query problem at the application layer. All data is cached once before the work is split into concurrent operations, which significantly reduces our reliance on the database. Using the associateBy function, we transform the data into a map structure with O(1) average access time. This allows each concurrent operation to read the data safely from the maps instead of querying the database.
Kotlin
val messages = records.map { objectMapper.readValue(it.value(), UsagePoolRecord::class.java) }

// One batch query per entity type, indexed by key for in-memory lookups
val usagePoolEntities = usagePoolRepository
    .findByIds(messages.map { it.usagePoolId.toBigInteger() })
    .associateBy { it.usagePoolId }
val lockEntities = lockRepository
    .findByUserIds(messages.map { it.userId })
    .associateBy { it.userId }

2. Structured Concurrency Memory Management With Chunking

The chunk structure serves two purposes:

  • It prevents all coroutines from being created at once, which avoids unnecessary memory usage.
  • Each chunk writes to the database only after all of its coroutines have completed, which avoids unnecessary connection pool consumption.

Kotlin
messages.chunked(150).forEach { chunk ->
    // Each chunk of 150 records is processed concurrently
}

Resource Isolation With limitedParallelism

Why limitedParallelism? If the database connection pool has, for example, X connections, keeping the parallelism limit below X prevents "Connection Timeout" errors.

Kotlin
// Create the limited view once so that every coroutine shares the same limit of 15.
// On older kotlinx-coroutines versions this call may need @OptIn(ExperimentalCoroutinesApi::class).
val dbDispatcher = Dispatchers.IO.limitedParallelism(15)

runBlocking {
    messages.chunked(150).forEach { chunk ->
        val deferredResults = chunk.map { record ->
            // async inherits the runBlocking scope, so the coroutines stay structured
            async(dbDispatcher) {
                try {
                    processRecord(record, usagePoolEntities, lockEntities)
                } catch (e: Exception) {
                    log.error("Operation error: ${record.key()}", e)
                    buildErrorRecord(record, e)
                }
            }
        }
        val results = deferredResults.awaitAll() // Structural waiting
        collectAndAggregate(results)
    }
}

  • The Dispatchers.IO.limitedParallelism(X) call limits the number of coroutines running concurrently on that dispatcher to X, preventing the DB connection pool from being exhausted. The limit only holds when the same dispatcher instance is shared by all coroutines.
  • Each coroutine returns a result through async. The awaitAll() call waits for all coroutines in the chunk to finish before proceeding to the next step.

runBlocking

This function blocks the caller until all concurrent operations are complete. This is the correct approach here because:

  • It ensures that the Kafka consumer thread remains blocked until all records in the batch are processed, which keeps its offset commit behavior intact.
  • We still benefit from concurrent operation parallelism within the runBlocking block.

3. Thread-Safe Result Structure

After the awaitAll() operation, all results are collected in thread-safe queues. Then a single batch write operation takes place. Using MutableList structures to combine results returned from coroutines processed in parallel can lead to data loss. At this point, lock-free data structures should be preferred. ConcurrentLinkedQueue uses CAS (Compare-And-Swap) algorithms instead of synchronized blocks. This provides superior performance in high-contention write operations.

Why Should We Use ConcurrentLinkedQueue?

Concurrent coroutines write simultaneously to a shared collection of results. Using MutableList there leads to race conditions, whereas ConcurrentLinkedQueue performs well for safe, concurrent writes.

Kotlin
data class AggregatedRecords(
    val processedSave: ConcurrentLinkedQueue<ProcessedEntity> = ConcurrentLinkedQueue(),
    val toDelete: ConcurrentLinkedQueue<UsagePoolEntity> = ConcurrentLinkedQueue(),
    val retryQueue: ConcurrentLinkedQueue<RetryEntity> = ConcurrentLinkedQueue()
)

Handling DataIntegrityViolationException is important. When two consumer instances are processing the same record, one of them runs into a unique constraint violation. Instead of making the entire batch fail, the batch is retried with record-by-record saves.

Kotlin
aggregatedRecords.processedSave
    .chunked(150)
    .forEach { batch ->
        try {
            processedRepository.saveAll(batch)
        } catch (e: DataIntegrityViolationException) {
            // Fall back to saving one record at a time
            batch.forEach { record ->
                try {
                    processedRepository.save(record)
                } catch (inner: DataIntegrityViolationException) {
                    // Duplicate record: already saved by another consumer, skip it
                }
            }
        }
    }

4. Error Tolerance in Write Operations

Batch write (saveAll) operations are performant. However, a "Unique Constraint" error in a single record can cause the entire batch to fail. The following structure is critical to meet Optimistic Locking or Idempotency requirements.
Kotlin
aggregatedRecords.processedSave.chunked(150).forEach { batch ->
    try {
        processedRepository.saveAll(batch)
    } catch (e: DataIntegrityViolationException) {
        // Fallback: Try one by one if batch fails
        batch.forEach { record ->
            try {
                processedRepository.save(record)
            } catch (innerException: DataIntegrityViolationException) {
                log.warn("Duplicate record skipped: ${record.id}")
            }
        }
    }
}

5. Data Flow Diagram

  • Ingress: The Kafka batch is received inside runBlocking.
  • Preparation: All necessary context data is retrieved in bulk from the DB.
  • Execution: Coroutines are started asynchronously in chunks.
  • Synchronization: awaitAll() acts as a barrier that waits for all coroutines to complete.
  • Egress: Collected results are persisted with saveAll.

Performance Analysis and Results

Conclusion

Processing Kafka messages in Spring Boot with Kotlin Coroutines not only increases speed but also improves code readability and makes resource management deterministic (predictable). The use of runBlocking allows us to build a bridge between the blocking Kafka consumer thread and the suspending world without disrupting Kafka's offset management mechanism.

Dependencies

XML
<dependency>
  <groupId>org.jetbrains.kotlinx</groupId>
  <artifactId>kotlinx-coroutines-core</artifactId>
  <version>1.7.3</version>
</dependency>
<dependency>
  <groupId>org.springframework.kafka</groupId>
  <artifactId>spring-kafka</artifactId>
</dependency>
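Because ConcurrentLinkedQueue is a JDK class, the chunk-then-aggregate flow from sections 2 and 3 can also be sketched in plain Java. This is a rough analogue under stated assumptions, not a translation of the article's coroutine code: a fixed thread pool stands in for the limited dispatcher, invokeAll stands in for awaitAll(), and the processAll method and its integer "records" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Plain-Java analogue of chunked, bounded-parallelism processing with a
// thread-safe result queue. Names and the integer "records" are made up.
final class ChunkedAggregationSketch {

    /** Processes recordCount fake records in chunks and returns how many results were collected. */
    static int processAll(int recordCount, int chunkSize, int parallelism) {
        ConcurrentLinkedQueue<Integer> processed = new ConcurrentLinkedQueue<>();
        // The pool size plays the role of the parallelism limit.
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            for (int start = 0; start < recordCount; start += chunkSize) {
                int end = Math.min(start + chunkSize, recordCount);
                List<Callable<Void>> tasks = new ArrayList<>();
                for (int i = start; i < end; i++) {
                    final int record = i;
                    tasks.add(() -> {
                        processed.add(record); // safe concurrent write
                        return null;
                    });
                }
                // Barrier: returns only after every task in this chunk has finished.
                pool.invokeAll(tasks);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while processing a chunk", e);
        } finally {
            pool.shutdown();
        }
        return processed.size();
    }

    public static void main(String[] args) {
        System.out.println("Collected " + processAll(1000, 150, 15) + " results");
    }
}
```

With the concurrent queue, every write is retained regardless of how the tasks interleave; swapping it for an unsynchronized ArrayList would make the final count unreliable under contention.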

By Erkin Karanlık
On-Device Debugging and JUnit 5

This is the first follow-up to Friday's release post, and it covers the two changes from this release that affect how you iterate on a Codename One app rather than what the app itself does: on-device debugging that treats Java as Java on a real iPhone or a real Android device, and standard JUnit 5 against the JavaSE simulator. The first is the one we have been wanting for a long time, and is the one that takes the most explaining, so most of the post is about it.

On-Device Debugging That Treats Java as Java

Codename One has always supported on-device debugging in the strict technical sense. You could attach Xcode to a .ipa, you could attach Android Studio to a running APK, you could read the native call stack, you could step through Objective-C or the C that ParparVM emits.

What you could not do was set a breakpoint in MyForm.java, hit it on a real iPhone, and inspect a Java field on a Java object as a Java object. You also could not debug an iOS app without a Mac in the loop somewhere, because the only debugger that understood the binary was Xcode. The translation step between the Java you wrote and the C that ParparVM produces left no way back across the gap on the device.

PR #4999 (iOS) and PR #5012 (Android) close that gap. As of this week, any JDWP-speaking debugger (IntelliJ IDEA, jdb, VS Code's Java Debugger, Eclipse, NetBeans) can attach to a Codename One app and treat the running process as a JVM.

Supported targets:

iOS

  • The iOS Simulator (requires a Mac, because the iOS Simulator only runs on a Mac)
  • A real iPhone reached over Wi-Fi from the developer machine on the same network

You do not need a local Mac to debug on a real iPhone. The Codename One build cloud runs the iOS build for you and produces a signed .ipa; install it on your iPhone the usual way (TestFlight, ad-hoc, or the standard Build Cloud install link), and the JDWP attach over Wi-Fi works from a Linux or Windows IDE just as well as from a Mac.
The Mac is only required for the local Xcode build path and for running the iOS Simulator.

Android

  • The Android emulator
  • A real Android phone over USB
  • A real Android phone over wireless adb

The Android attach uses standard adb, so you need the Android SDK platform tools installed on the developer machine. Those are available on macOS, Linux, and Windows, so any of the three is fine for Android debugging.

What It Looks Like

Figure: a breakpoint inside an iOS app, hit on the iOS Simulator next to IntelliJ IDEA.

The same Debug tool window you use for any other Java project. The frames panel on the left has the full Java call stack. The Variables panel shows this and the locals as Java values, with the same drill-down you would get on a regular JVM. The simulator on the right is the real iOS app, paused at the breakpoint, waiting for the next step.

How the Pieces Fit Together

On iOS, the IDE never talks to the device directly. The CN1 Debug Proxy is a small Java process you run on your developer machine. It binds two TCP ports: one for the iOS app to dial into using the CN1 wire protocol, and one that speaks standard JDWP for the IDE. The IDE sees a normal remote JVM. The iOS app sees a debug proxy. The proxy translates between the two and walks the ParparVM struct layout so Java fields, method calls, and values round-trip cleanly in both directions.

On Android, the proxy is unnecessary. Dalvik and ART implement JDWP themselves, so IntelliJ attaches directly to the device through adb's built-in JDWP forwarder. The Maven plugin's new cn1:android-on-device-debugging goal does the adb orchestration and the port forwarding for you.

A capability difference between the two platforms worth knowing up front: on Android, a native interface's Impl class is regular Java, so the JDWP attach steps through it the same way it steps through any other class in your project. On iOS the Impl is Objective-C, which JDWP does not speak, so you cannot step through it from the IDE.
You can still step through the Codename One framework code and your own Java up to and through the native-interface call, and you can inspect the value the call returns; the body of the Objective-C method is the only thing that is opaque from the JDWP side. Attach Xcode in parallel if you need to step through the Objective-C as well.

Tutorial: IntelliJ + iOS

The Codename One archetype now generates two run configurations under an On-Device Debug folder in the IntelliJ run-config dropdown: CN1 Debug Proxy and CN1 Attach iOS. The tutorial below assumes a project generated from the Initializr recently enough to have those. If you have an older project, generate a new project with the Initializr and copy over the .idea directory and the Maven pom.xml files.

1. Enable the Build Hints

Open common/codenameone_settings.properties and uncomment the four lines the archetype generated:

Properties files
    ios.onDeviceDebug=true
    ios.onDeviceDebug.proxyHost=127.0.0.1
    ios.onDeviceDebug.proxyPort=55333
    ios.onDeviceDebug.waitForAttach=true

ios.onDeviceDebug=true flips the iOS build into the instrumented variant. The next two hints configure the proxy connection. The fourth hint, ios.onDeviceDebug.waitForAttach=true, is the block-on-load option, and we recommend leaving it on. With it enabled, the iOS app shows a "Waiting for debugger" overlay at launch and does not progress past Display.init until the proxy issues its first resume.

The recommendation is mostly about making the on-device-debug variant visible. Without the overlay it is easy to launch an on-device-debug build expecting the debugger to attach and not realize it is silently waiting for a proxy that is not running, and it is also easy to mistake an on-device-debug build for a regular build and then be surprised when it does not perform as smoothly as the release variant. The overlay rules out both of those.

For a physical iPhone the proxyHost value should be the laptop's LAN IP (run ifconfig | grep "inet " to find it) rather than 127.0.0.1.
The iOS Simulator can always use 127.0.0.1.

2. Build the iOS App

Either path works:
  • Local Xcode build (mvn cn1:buildIosXcodeProject) and then run from Xcode.
  • Cloud build for a real device (mvn cn1:buildIosOnDeviceDebug) and install the resulting .ipa.

Both produce an iOS binary instrumented for on-device debugging because the build hint is set.

3. Start the Proxy

In IntelliJ, pick CN1 Debug Proxy from the run-config dropdown and click the green Run button (not the bug icon; Debug on this config would attach IntelliJ to the proxy itself, which is not what you want). The Run tool window shows:

Plain Text
    On-device-debug proxy starting:
      symbols : .../cn1-symbols.txt
      device : listening on tcp://0.0.0.0:55333
      jdwp : listening on tcp://0.0.0.0:8000
    [device] listening on port 55333 for ParparVM app to dial in

When the [jdwp] line appears, the proxy is ready.

4. Attach the Debugger

Switch the run-config dropdown to CN1 Attach iOS and click the Debug button. IntelliJ connects to localhost:8000 and opens its standard Debug tool window. You can now set breakpoints anywhere in your Java code or in the framework.

5. Launch the App

Launch the iOS app under the iOS Simulator (from Xcode) or on the tethered device. With waitForAttach=true it pauses at the "Waiting for debugger" overlay until the proxy issues its first resume. Hit Resume on the IntelliJ Debug toolbar; the app proceeds, and your breakpoints fire as the app exercises them.

The proxy's Run window is also your device console. Anything the app writes to System.out, Log.p, printf, or NSLog from native code is forwarded to the proxy and printed in the CN1 Debug Proxy Run window with a [device] prefix. This is genuinely useful and is one fewer thing you need Xcode for. The caveat is that the forwarding starts when the proxy connection is established, so output written during the very first millisecond of process launch (before Display.init) is not always captured.
If you need every byte from t=0, attach Xcode's console for that specific run.

Tutorial: IntelliJ + Android

Android is simpler because the proxy is not needed. The archetype generates two run configurations under the same On-Device Debug folder: CN1 Android On-Device Debug (Maven, builds and installs the APK and forwards JDWP) and CN1 Attach Android (Remote JVM Debug at localhost:5005).

1. Enable the Build Hint

In common/codenameone_settings.properties:

Properties files
    android.onDeviceDebug=true

This single hint flips the manifest to debuggable="true" and turns R8 / ProGuard off for this build. Release builds without the hint are unaffected.

2. Run CN1 Android On-Device Debug

The goal picks up the hint, builds the APK, installs it on the connected device or emulator, sets the debug-app for wait-for-attach, launches the Activity, forwards JDWP to localhost:5005, and streams logcat --pid=<pid> into the Run window with a [device] prefix.

For wireless adb, pass -Dcn1.android.onDeviceDebug.wireless=<ip:port> and the goal will adb connect before installing. Both the Android 11+ adb pair flow and the legacy adb tcpip flow work.

3. Attach the Debugger

Switch to CN1 Attach Android and click Debug. IntelliJ connects to localhost:5005. Set breakpoints anywhere; they fire when exercised. Source resolution covers both the codenameone-core and codenameone-android sources jars, so breakpoints inside the framework or inside the Android port resolve to the right files.

On Android, native interfaces are themselves Java, so a breakpoint inside the Impl class of your own native interface fires just like a breakpoint anywhere else in your code; you can step through the implementation, inspect locals, and evaluate expressions the same way.

The dev guide has the full reference, including the wireless-pairing flows, the VS Code and Eclipse equivalents, and a troubleshooting section: iOS on-device debugging and Android on-device debugging.
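Both attach configurations end up speaking JDWP to a local TCP port: 8000 for the iOS proxy and 5005 for the Android forward. A JDWP connection opens with a fixed 14-byte ASCII handshake, JDWP-Handshake, which the debugger sends and the VM side echoes back. The sketch below, written in Python for brevity, uses that handshake to check whether something JDWP-shaped is listening on a port. The function name speaks_jdwp and the timeout value are illustrative assumptions and not part of the Codename One tooling. A JDWP endpoint typically serves one debugger at a time, so treat this as a diagnostic for a port that refuses to attach, not something to run alongside a live debug session.

```python
import socket

# Fixed 14-byte ASCII greeting defined by the JDWP specification.
JDWP_HANDSHAKE = b"JDWP-Handshake"


def speaks_jdwp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if the endpoint echoes the JDWP handshake, False otherwise.

    Hypothetical helper for illustration; not part of any Codename One tool.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(JDWP_HANDSHAKE)
            reply = b""
            # Read until we have 14 bytes or the peer closes the connection.
            while len(reply) < len(JDWP_HANDSHAKE):
                chunk = sock.recv(len(JDWP_HANDSHAKE) - len(reply))
                if not chunk:
                    break
                reply += chunk
            return reply == JDWP_HANDSHAKE
    except OSError:
        # Connection refused, timed out, or reset: nothing usable is listening.
        return False
```

For example, speaks_jdwp("127.0.0.1", 5005) would be the check for the Android forward, and speaks_jdwp("127.0.0.1", 8000) the one for the iOS proxy's JDWP side. A False result only says the handshake did not complete; it does not say why.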
When to Use It (and When Not To)

For most bugs, the JavaSE simulator is still, by a large margin, the fastest loop. Reach for on-device debugging when the bug is platform-specific: ParparVM-specific threading, an iOS-only layout glitch under the modern native theme, a real-radio Bluetooth interaction, a Touch ID gate, an Android-only manifest interaction, anything that only reproduces under iOS background memory pressure. That is the kind of bug that previously sent you reaching for Log.p and a rebuild loop. That bug now has a debugger pointed at it.

JUnit 5 Against the Simulator

The other change in this release is the new JUnit 5 integration in the JavaSE port (PR #5032). To be clear about what this is: it is standard JUnit 5. There is no fork of JUnit in com.codename1.testing.junit. That package holds a small set of annotations and a CodenameOneExtension that plugs into the regular JUnit Jupiter lifecycle. You write @Test methods using org.junit.jupiter.api.Test, you assert with org.junit.jupiter.api.Assertions, and your IDE's native test runner picks them up the way it does on any other Java project.

Why a separate integration at all? The legacy com.codename1.testing.AbstractTest framework, driven by the cn1:test Maven goal, still exists and is still the only way to run tests on a real iOS or Android device (JUnit Jupiter is not available on ParparVM). The trade-off is that AbstractTest tests have to compile under the Codename One device subset, with no reflection, no java.net.http, no java.nio.file, no Mockito, no AssertJ, no assertThrows. JUnit-style tests run only on the JavaSE simulator JVM, but that JVM is a regular JVM, so reflection, Mockito, AssertJ, and parameterized tests are all available.

Both styles coexist in the same project under common/src/test/java. You pick per test class. The runners discover disjoint sets (cn1:test looks for UnitTest implementers; Surefire looks for @Test methods), so a mvn install runs both passes in the same phase without overlap.
A Minimal Test

Tests live in common/src/test/java. The shape most apps want is one that boots the project's app class through the same init / start sequence the simulator uses, then asserts against the form the app actually opens:

Java
    package com.example.myapp;

    import com.codename1.testing.junit.CodenameOneTest;
    import com.codename1.testing.junit.RunOnEdt;
    import com.codename1.ui.CN;
    import com.codename1.ui.Display;
    import com.codename1.ui.Form;
    import org.junit.jupiter.api.Test;

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertTrue;

    @CodenameOneTest
    class GreetingFormTest {

        @Test
        @RunOnEdt
        void formShowsExpectedTitle() {
            MyAppName app = new MyAppName();
            app.init(null);
            app.start();
            assertEquals("Hi World", Display.getInstance().getCurrent().getTitle());
            assertTrue(CN.isEdt(), "@RunOnEdt method runs on the Codename One EDT");
        }
    }

That is more useful than constructing a Form directly in the test because it exercises the same startup path the simulator runs. The assertions check the form your app opens, not a form the test wrote.

The natural way to run it is from the IntelliJ gutter. Click the green icon next to the class declaration, and the results land in the standard Run tool window. Click the green icon next to a specific @Test method to run just that method. The same flow works in VS Code's Test Explorer and in Eclipse's JUnit view.

If you prefer the command line:

Shell
    mvn -Ptest test                                                  # run the JUnit suite
    mvn -Ptest test -Dtest=GreetingFormTest                          # one class
    mvn -Ptest test -Dtest=GreetingFormTest#formShowsExpectedTitle   # one method

@CodenameOneTest is the class-level entry point. It wires the simulator extension into the JUnit Jupiter lifecycle, boots Display.init(null) once per JVM (idempotent, so subsequent classes share the same Display), and skips the class with a TestAbortedException if the JVM is genuinely headless (so CI runners that have no display do not poison the rest of the run).
@RunOnEdt dispatches the test body through CN.callSerially, which is what you want any time the body touches UI state. It rethrows the body's exceptions on the JUnit thread so the stack trace stays clickable in the IDE. Place it on a method to apply it to one test, or on the class to apply it to every test.

A Couple More Common Cases

A test that exercises a plain validator, with no UI involved at all:

Java
    @CodenameOneTest
    class EmailValidatorTest {

        @Test
        void rejectsEmptyString() {
            assertFalse(new EmailValidator().isValid(""));
        }

        @Test
        void acceptsCommonAddress() {
            // Placeholder address; any ordinary, well-formed address serves here.
            assertTrue(new EmailValidator().isValid("user@example.com"));
        }
    }

This is the "pure model code" shape. No @RunOnEdt, no UI, runs on the JUnit worker thread, fast.

A test of a form under a specific visual configuration:

Java
    @CodenameOneTest
    class GreetingFormVisualTest {

        @Test
        @RunOnEdt
        @DarkMode
        @LargerText(scale = 1.6f)
        void titleStillFitsInDarkModeAtAccessibilityScale() {
            new GreetingForm().show();
            Form current = Display.getInstance().getCurrent();
            assertEquals("Hello", current.getTitle());
            assertTrue(current.getPreferredW() <= Display.getInstance().getDisplayWidth());
        }
    }

The visual-config annotations (@Theme, @DarkMode, @LargerText, @Orientation, @RTL) apply on the EDT in one batch, followed by a single theme refresh, so the test body sees the simulator in the exact configuration you asked for without flicker.

A test that injects a custom property for the duration of one method:

Java
    @Test
    @RunOnEdt
    @SimulatorProperty(name = "feature.flag", value = "on")
    void newCodePathRunsWhenFlagIsOn() {
        // Display.getProperty("feature.flag", "off") returns "on" here
        runFeature();
        assertEquals("expected", Display.getInstance().getCurrent().getTitle());
    }

Class-level @SimulatorProperty applies to every method in the class. Method-level overrides class-level. Use the container @SimulatorProperties for more than one (the package source level rules out @Repeatable).
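The precedence rule for @SimulatorProperty is small enough to state as code. The sketch below is a Python model of the rule as described above, not the extension's actual implementation: class-level properties form the baseline, and method-level entries override them by name. The function name effective_properties and the api.region key are hypothetical and exist only for illustration.

```python
def effective_properties(class_level: dict, method_level: dict) -> dict:
    """Merge simulator properties; method-level values win on a name clash.

    Hypothetical model of the documented rule, not Codename One code.
    """
    merged = dict(class_level)   # start from the class-wide baseline
    merged.update(method_level)  # method-level entries override by name
    return merged


class_props = {"feature.flag": "off", "api.region": "eu"}  # hypothetical class-level set
method_props = {"feature.flag": "on"}                      # override for one test method

print(effective_properties(class_props, method_props))
# prints {'feature.flag': 'on', 'api.region': 'eu'}
```

Properties that only appear at class level pass through unchanged, which matches the statement that a class-level annotation applies to every method unless a method says otherwise.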
The full reference, including the dependency-block XML for common/pom.xml and javase/pom.xml and the @Theme / @Orientation / @RTL details, is at Testing with JUnit 5 in the developer guide.

Wrapping Up

That is the workflow half of this release. Tomorrow's post covers the new platform APIs that moved into the core this week: AI and OAuth/OIDC are the headline pieces, with wifi/connectivity and a few smaller items alongside them.

By Shai Almog DZone Core CORE
Introducing RAI Audit Kit: Evidence-Grade Responsible AI Audits in Python

This is the first article in a 6-part series on building practical, responsible AI audit workflows with RAI Audit Kit, an open-source Python package suite.

The series will move from foundational AI systems to more advanced and production-oriented audit workflows:

  1. Launching RAI Audit Kit: why evidence-grade responsible AI audits matter
  2. Auditing ML systems: fairness, drift, data quality, and robustness
  3. Auditing deep learning systems: image models, medical imaging, robustness, and explainability
  4. Auditing LLM and RAG systems: prompt injection, faithfulness, citations, and retrieval security
  5. Auditing AI agents: tool use, memory, permissions, and trace safety
  6. Adding audit gates to CI/CD: turning audit results into engineering controls

This first article introduces the project, the problem it is designed to solve, and how the package suite is structured.

Why Responsible AI Audits Need Better Tooling

AI systems are becoming more complex. A few years ago, many teams mainly worried about model accuracy. Today, the picture is much broader. Modern AI systems may include tabular machine learning models, deep learning pipelines, LLM applications, RAG systems, and AI agents that call tools or use memory.

That means AI evaluation can no longer stop at: “Is the model accurate?”

A better question is: “Can we show evidence that this AI system was evaluated for fairness, robustness, drift, data quality, safety, security, and traceability?”

In many teams, this evidence is scattered across notebooks, scripts, screenshots, spreadsheets, and manual review documents. That makes audits hard to reproduce and harder to compare across versions. Responsible AI needs to become part of normal engineering workflows. That is why I built the RAI Audit Kit.

What Is the RAI Audit Kit?

RAI Audit Kit is an open-source Python package suite for responsible, secure, and trustworthy AI audits.
The goal is to help developers and AI teams run repeatable audits, generate structured findings, preserve evidence, and export useful reports.

It is designed to support different types of AI systems, including:

  • Classical machine learning
  • Deep learning
  • LLM applications
  • RAG systems
  • Agentic AI workflows

The package can help generate outputs such as findings, evidence manifests, model cards, audit reports, and CI/CD-friendly results.

Install:

PowerShell
    pip install rai-audit-kit

Full install:

PowerShell
    pip install "rai-audit-kit[all]"

Package Architecture

RAI Audit Kit is organized as a suite of smaller packages:

    Package             Purpose
    rai-audit-core      Reports, findings, evidence, model cards, audit history, and CI gates
    rai-audit-ml        Fairness, drift, data quality, and robustness checks for tabular ML
    rai-audit-dl        Deep learning, image, medical imaging, robustness, and explainability audits
    rai-audit-llm       LLM and RAG audits for prompt injection, toxicity, faithfulness, citations, and retrieval security
    rai-audit-agents    Agent audits for tools, memory, permissions, prompt injection, and trace behavior
    rai-audit-kit       Meta-package for unified installation and CLI usage

The structure is modular because responsible AI is not a single problem. A tabular ML system has different risks from a deep learning model. A RAG application has different risks from an autonomous agent. The suite is designed to keep those workflows connected while still allowing each package to focus on its own risk area.

Quick Start

A basic CLI workflow looks like this:

PowerShell
    rai-audit init --project responsible-ai-demo
    rai-audit run --config audit.yaml

For tabular ML, the Python API can look like this:

Python
    from rai_audit.ml import ClassificationAudit

    report = ClassificationAudit(
        y_true=y_true,
        y_pred=y_pred,
        sensitive_features=sensitive_df,
    ).run()

    report.to_html("audit_report.html")

The goal is to move from one-off evaluation scripts to repeatable audit runs that produce reviewable artifacts.

What Can It Audit?
RAI Audit Kit is designed around the idea that different AI systems need different audit lenses.

  • For machine learning systems, the focus is on fairness, drift, data quality, and robustness. A model may perform well overall but still fail for certain subgroups or become unreliable after deployment.
  • For deep learning systems, especially image and medical imaging models, the focus shifts toward robustness, explainability, patient leakage, site-level differences, and class-level performance.
  • For LLM and RAG systems, the audit scope expands to prompt injection, unsafe output, toxicity, faithfulness, citation quality, retrieval quality, and retrieval security.
  • For AI agents, the focus becomes tool use, memory, permissions, trace completeness, and prompt injection through external sources such as tools, webpages, retrieval systems, or email content.

This article will not go deep into each area. Each one will be covered separately in the rest of the series.

Why Evidence Matters

Responsible AI audits should not disappear inside notebooks. A useful audit should answer:

  • What checks were run?
  • What data or predictions were evaluated?
  • What findings were generated?
  • What evidence supports each finding?
  • Which artifacts were exported?
  • Can the audit be repeated later?
  • Can this be integrated into CI/CD?

This evidence-first mindset is one of the main ideas behind the RAI Audit Kit. Reports can be exported in formats such as HTML, Markdown, and JSON. This makes the results useful for developers, reviewers, governance teams, and automation workflows.

A simple audit flow may look like this:

Plain Text
    Run evaluation
    ↓
    Run responsible AI audit
    ↓
    Generate findings
    ↓
    Preserve evidence
    ↓
    Export reports
    ↓
    Review or gate deployment

This does not replace human judgment. It gives reviewers better evidence to work with.

Not a Compliance Shortcut

It is important to be clear about the scope. RAI Audit Kit is a technical audit and reporting toolkit.
It can help generate structured evidence and standards-oriented summaries, but it does not automatically certify that a system is compliant with any law, regulation, or internal policy. The goal is to support better review, not replace legal review, domain expertise, risk management, or organizational accountability. Responsible AI tools should help teams ask better questions and preserve better evidence. They should not create false confidence.

Why This Project Matters

Responsible AI needs practical engineering tools. Teams should be able to audit models, preserve evidence, compare results, and include risk checks in their development workflow. RAI Audit Kit is an early step in that direction. It brings together audits for ML, deep learning, LLMs, RAG systems, and AI agents under one Python suite. The core idea is simple: Responsible AI should be repeatable, evidence-backed, and built into the way we engineer AI systems.

What’s Next in This Series

In the next article, I will focus on auditing machine learning systems for fairness, drift, data quality, and robustness using the RAI Audit Kit. We will look at why accuracy alone is not enough, how subgroup performance can hide model risk, and how audit outputs can make ML review more structured and repeatable.

Project Links

  • GitHub: https://github.com/SaiTeja-Erukude/rai-audit
  • Install: pip install rai-audit-kit

If you work on responsible AI, AI safety, LLM security, RAG systems, agentic AI, or MLOps, I would love feedback, ideas, and contributions.

By Sai Teja Erukude
A Spring Boot App With Half the Startup Time

The MovieManager project has been updated to use JDK 25 and the AOT cache from Project Leyden. Project Leyden is part of the OpenJDK project and provides cached linking and cached performance statistics. That means the time spent linking at startup is moved to build time, and the statistics are created during a training run at build time as well. Because of that, the JVM loads the needed classes already linked and starts compiling the hot code paths immediately. The MovieManager application starts in less than half the time with these optimizations, without any code changes.

All these advantages come with preconditions:

  • Exactly the same JVM version at build time, training time, and run time
  • The same OS (Linux is used here) and libc at all steps (so no Alpine-based Docker images)
  • The same CPU architecture, for example, AMD64 or ARM64

The steps to use Project Leyden:

  1. Build the Spring Boot application
  2. Extract the Spring Boot application
  3. Do a training run with the extracted application to create the AOT cache
  4. Create the Docker image with the extracted application and the AOT cache

Building and Training the Application

The first step is to build the Spring Boot jar. The MovieManager project has an integrated build that builds the Angular frontend and the Spring Boot backend with this Maven command:

Shell
    ./mvnw clean install -Ddocker=true -Dnpm.test.script=test-chromium

Project Leyden does not support Spring Boot jars directly. The jar has to be extracted to help Project Leyden find the library jars the project uses. To do that, this command needs to be used:

Shell
    java -Djarmode=tools -jar backend/target/moviemanager-backend-0.0.1-SNAPSHOT.jar extract --destination extracted

The result is the directory ‘extracted’ with the application jar and a sub-directory ‘lib’ that contains the used libraries.

The second step is to create the AOT cache. To do that, the application has to run in production conditions. That means using a real PostgreSQL database with the database driver.
That enables the JDK to record all the needed classes of the project and to create realistic performance statistics for the code compilation. To do this, a PostgreSQL database has to be started (done here in a Docker container), and the application has to do the full startup. These commands are needed:

Shell
    docker pull postgres:13
    docker run --name local-postgres -e POSTGRES_PASSWORD=sven1 -e POSTGRES_USER=sven1 -e POSTGRES_DB=movies -p 5432:5432 -d postgres:13
    java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:+UseCompressedOops -XX:+UseCompactObjectHeaders \
      -XX:+ExitOnOutOfMemoryError -XX:MaxDirectMemorySize=64m -XX:+UseStringDeduplication \
      -Xlog:aot -XX:AOTCacheOutput=app.aot -Dspring.context.exit=onRefresh \
      -Djava.security.egd=file:/dev/./urandom \
      -jar extracted/moviemanager-backend-0.0.1-SNAPSHOT.jar --spring.profiles.active=prod

The Java command runs the application with the parameter ‘-Dspring.context.exit=onRefresh’, which makes Spring Boot complete its startup and then exit. The parameters ‘-Xlog:aot -XX:AOTCacheOutput=app.aot’ enable the logging of the AOT process and the creation of ‘app.aot’, which is the AOT cache. The AOT cache contains everything that is needed for a fast startup of the application. If the AOT cache should also contain information to improve production performance, the training run would have to start up and process realistic production requests. That is beyond the scope of this article.

The third step is to test the new application setup:

Shell
    java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:+UseCompressedOops -XX:+UseCompactObjectHeaders \
      -XX:+ExitOnOutOfMemoryError -XX:MaxDirectMemorySize=64m -XX:+UseStringDeduplication \
      -Xlog:class+path=info -XX:AOTCache=app.aot -Xlog:aot \
      -Djava.security.egd=file:/dev/./urandom \
      -jar extracted/moviemanager-backend-0.0.1-SNAPSHOT.jar --spring.profiles.active=prod

The start-up time of the new setup with the AOT cache can be compared to the start-up time of the Spring Boot jar.
On a medium-powered laptop, the times are:

  • 9 seconds for the Spring Boot jar
  • 3.5 seconds for the new setup with the AOT cache

Creating a Docker Image

To use the application in production, it needs to be packaged into a Docker image. The Docker image needs to contain the extracted application setup and the AOT cache. The base image needs to have the exact same JDK version, OS, and libc. That means small base images like Alpine cannot be used. The created image cannot be small because it contains 180 MB of AOT cache and a larger base image. This can be done with this Dockerfile:

Dockerfile
    FROM eclipse-temurin:25.0.3_9-jdk-jammy
    WORKDIR /application
    ARG JAR_FILE=extracted/*.jar
    COPY ${JAR_FILE} moviemanager-backend-0.0.1-SNAPSHOT.jar
    COPY extracted/ ./
    COPY app.aot app.aot
    ENV JAVA_OPTS="-XX:+UseG1GC \
      -XX:MaxGCPauseMillis=50 \
      -XX:+UseCompressedOops \
      -XX:+UseCompactObjectHeaders \
      -XX:+ExitOnOutOfMemoryError \
      -XX:MaxDirectMemorySize=64m \
      -XX:+UseStringDeduplication"
    ENTRYPOINT exec java $JAVA_OPTS -XX:+AOTClassLinking \
      -XX:AOTCache=app.aot \
      -Xlog:class+path=info \
      -Djava.security.egd=file:/dev/./urandom \
      -jar moviemanager-backend-0.0.1-SNAPSHOT.jar

It copies the new application setup into the image and adds the AOT cache. The name of the application jar is stored in the AOT cache and has to be exactly the same as during the creation of the AOT cache. The ‘JAVA_OPTS’ also have to be the same. If the JDK version in the build environment changes, the version of the base image has to be adjusted accordingly. The parameter ‘-Xlog:class+path=info’ makes analyzing AOT problems much easier. The Docker image size is 705 MB. That is about double the size of an image with a Spring Boot jar and an Alpine-based JDK base image.

Creating a Build Pipeline

Creating Docker images for an application by hand is unsustainable in a production environment. A build pipeline is needed.
The MovieManager project is hosted on GitHub; because of that, the project uses a GitHub workflow as its build pipeline. The complete code for the build pipeline is in the project's workflow script. The steps of the GitHub pipeline can be recreated in other environments too.

The first step is to set up the PostgreSQL database service to be used in this build:

YAML
    jobs:
      analyze:
        name: Analyze
        runs-on: ubuntu-latest
        env:
          POSTGRES_URL: jdbc:postgresql://localhost:5432/movies
        services:
          postgres:
            image: postgres:latest
            env:
              POSTGRES_USER: sven1
              POSTGRES_PASSWORD: sven1
              POSTGRES_DB: movies
            ports:
              - 5432:5432
            options: >-
              --health-cmd="pg_isready -U sven1 -d movies"
              --health-interval=10s
              --health-timeout=5s
              --health-retries=5

These settings set up the PostgreSQL service in the build pipeline with user, password, database name, and port. The ‘POSTGRES_URL’ is set to access the database later.

The second step is to check out the project:

YAML
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

It checks out the contents of the master branch.

The third step is to provide the JDK:

YAML
    - name: Setup Java JDK
      uses: actions/setup-java@v3
      with:
        distribution: 'temurin'
        java-version: 25

JDK version 25 is the minimum needed to use Project Leyden with linking and performance statistics.

The fourth step builds the Spring Boot jar:

YAML
    - name: Build with Maven
      if: matrix.language == 'java'
      run: |
        ./mvnw clean install -Ddocker=true

That is the Maven command to build the project.

The fifth step is to find the Spring Boot jar:

YAML
    - name: Find fat jar
      if: matrix.language == 'java'
      id: jar
      run: |
        JAR_PATH=$(find ./backend/target -type f -name "*SNAPSHOT.jar" | head -n 1)
        echo "Found JAR: $JAR_PATH"
        echo "jar=$JAR_PATH" >> $GITHUB_OUTPUT

The sixth step is to extract the Spring Boot jar:

YAML
    - name: Unpack fat jar
      if: matrix.language == 'java'
      id: UNPACK
      run: |
        java -Djarmode=tools -jar ${{ steps.jar.outputs.jar }} extract --destination extracted
        EXTRACTED_PATH=$(find . -type d -name "extracted" | head -n 1)
        echo "Found directory: $EXTRACTED_PATH"
        echo "extracted=$EXTRACTED_PATH" >> $GITHUB_OUTPUT

The seventh step is to get the name of the extracted application jar:

YAML
    - name: find extracted jar
      if: matrix.language == 'java'
      id: EXTRACT
      run: |
        EXTRACTED_JAR=$(find "${{ steps.UNPACK.outputs.extracted }}" -type f -name "*.jar" | head -n 1)
        EXTRACTED_JAR=${EXTRACTED_JAR#./}
        echo "Found extracted JAR: $EXTRACTED_JAR"
        echo "extracted=$EXTRACTED_JAR" >> $GITHUB_OUTPUT

The eighth step is to create the AOT cache:

YAML
    - name: Create AOT cache
      if: matrix.language == 'java'
      id: AOT
      env:
        JAVA_TOOL_OPTIONS: ""
        _JAVA_OPTIONS: ""
        JDK_JAVA_OPTIONS: ""
      run: |
        EXTRACTED_JAR="${{ steps.EXTRACT.outputs.extracted }}"
        echo "jar=$EXTRACTED_JAR"
        echo "JAVA_TOOL_OPTIONS=$JAVA_TOOL_OPTIONS"
        echo "_JAVA_OPTIONS=$_JAVA_OPTIONS"
        echo "JDK_JAVA_OPTIONS=$JDK_JAVA_OPTIONS"
        JAVA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:+UseCompressedOops -XX:+UseCompactObjectHeaders -XX:+ExitOnOutOfMemoryError -XX:MaxDirectMemorySize=64m -XX:+UseStringDeduplication"
        java $JAVA_OPTS \
          -XX:+AOTClassLinking \
          -XX:AOTCacheOutput=app.aot \
          -Xlog:aot \
          -Dspring.context.exit=onRefresh \
          -Dspring.datasource.url="${{ env.POSTGRES_URL }}" \
          -Dspring.profiles.active=prod \
          -jar "$EXTRACTED_JAR" || echo "AOT Training finished with exit code $?"

This runs the application startup with the PostgreSQL database to create the AOT cache.

The ninth step shows the exact JDK version used in the AOT cache generation:

YAML
    - name: Show Jdk version
      if: matrix.language == 'java'
      id: JDK
      run: |
        JDK_VERSION=$(java -version 2>&1)
        VERSION=$(echo "$JDK_VERSION" | sed -n 's/.*build \([^[:space:]]*\)-LTS.*/\1/p')
        echo "JDK_VERSION=$JDK_VERSION"
        echo "VERSION=$VERSION"
        MY_VERSION="jdk=$VERSION"

In case of problems with using the AOT cache, the first check is to compare the version shown here with the JDK version in the Docker base image.
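The version comparison from step nine can also be scripted. The sketch below mirrors the sed expression from that step in Python and compares the result with a Docker image tag. It assumes the Temurin tag convention of writing the + in the build version as _ (as in the 25.0.3_9-jdk-jammy tag used in the Dockerfile); the sample java -version text is illustrative, and both function names are hypothetical.

```python
import re
from typing import Optional


def jdk_build_version(java_version_output: str) -> Optional[str]:
    """Extract e.g. '25.0.3+9' from 'java -version' output.

    Same idea as the sed expression in step nine: take what sits between
    'build ' and '-LTS'. Returns None for builds without the -LTS suffix.
    """
    match = re.search(r"build (\S+)-LTS", java_version_output)
    return match.group(1) if match else None


def tag_matches_jdk(image_tag: str, jdk_version: str) -> bool:
    """True if the image tag starts with the JDK build version ('+' written as '_')."""
    prefix = jdk_version.replace("+", "_")
    return image_tag == prefix or image_tag.startswith(prefix + "-")


# Illustrative sample of the relevant 'java -version' lines for an LTS Temurin build.
sample = (
    "OpenJDK Runtime Environment Temurin-25.0.3+9 (build 25.0.3+9-LTS)\n"
    "OpenJDK 64-Bit Server VM Temurin-25.0.3+9 (build 25.0.3+9-LTS, mixed mode, sharing)\n"
)

version = jdk_build_version(sample)
print(version)                                          # prints 25.0.3+9
print(tag_matches_jdk("25.0.3_9-jdk-jammy", version))   # prints True
```

A check like this could run in the pipeline before the image build and fail early when the build JDK and the base image have drifted apart, instead of surfacing later as an AOT cache that the JVM silently rejects.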
The tenth step creates the Docker image:

YAML
    - name: Build and push
      uses: docker/build-push-action@v6
      if: matrix.language == 'java'
      with:
        context: .
        file: ./Dockerfile
        build-args: |
          JAR_PATH=${{ steps.EXTRACT.outputs.extracted }}
          LIB_PATH=${{ steps.aot.outputs.extracted }}
        push: false
        tags: angular2guy/moviemanager:latest

With push set to true, this step can also push the Docker image to an image repository.

Conclusion

The results of using the AOT cache of Project Leyden are impressive. Cutting the startup time in half without any code change is amazing. The effort to create the AOT cache and set up the new application layout is a one-time investment. The impact of the larger Docker images is low. The shorter startup makes scaling application instances in Kubernetes clusters up and down much more flexible because the time until a new application instance is available is much lower. In Kubernetes environments with scaling of application instances, the AOT cache is a significant step forward and should be used.

For serverless applications, a startup time of 3.5 seconds is still too slow. There, Project CRaC or Native Image would be needed. Project CRaC needs code changes and testing. Native Image has the closed-world assumption, which makes it hard to prove that larger applications work correctly. Alternatives are Node.js with Nest.js and TypeScript, or Go with its libraries.

Project Leyden is not finished in JDK 25. There are plans to add compiled code to the AOT cache in the future. The JVM is an impressive piece of technology that is still improving.

By Sven Loesekann

Top Languages Experts


Alvin Lee

Founder,
Out of the Box Development, LLC

Full-stack developer and technology consultant specializing in web architectures, microservices, and API integrations.

The Latest Languages Topics

Reducing Alert Fatigue in the SOC Using Correlation Rules and Detection-as-Code
Correlation rules and risk aggregation collapse noisy alerts into fewer, higher-context escalations. Detection-as-code keeps it sustainable.
June 25, 2026
by Krishnaveni Musku
· 343 Views
Architectural Cost of Rust's Orphan Rule
A deep dive into how Rust's orphan rule creates architectural trade-offs in large monorepos. Learn when to use the newtype pattern, local traits, or bridge crates
June 24, 2026
by Krun Dev
· 468 Views
Foxit MCP Server: Give AI Agents Direct Access to 30+ PDF Tools via Model Context Protocol
Foxit MCP Server gives any AI agent direct access to 30+ PDF tools for conversion, OCR, merge, and compare via the Model Context Protocol.
June 22, 2026
by Lucien Chemaly
· 914 Views
When Valid SQL Was Still the Wrong Answer
A personal project exploring why AI-generated SQL isn't always trustworthy and how semantic context, validation, and governance improve analytics accuracy.
June 22, 2026
by Anusha Kovi, DZone Core
· 810 Views · 1 Like
Keeping AI-Powered BI Honest: A Human-in-the-Loop (HITL) Playbook
AI-generated SQL can look right while being wrong. Learn how human-in-the-loop workflows build trust through reviews, approvals, audits, and escalation paths.
June 22, 2026
by Nithish Shetty
· 795 Views
From Open SQL to CDS Views: Rewriting SAP Data Access for Performance at Scale
Swap Open SQL for CDS views to push logic into HANA and centralize reusable data models, but verify the execution plan, not just the pattern.
June 19, 2026
by Deepika Paturu
· 1,040 Views
Jakarta NoSQL: Why JPA Is Not Enough for the AI Era
Jakarta NoSQL provides a familiar Java programming model while preserving the strengths of document, graph, key-value, and AI-driven vector databases.
June 19, 2026
by Otavio Santana, DZone Core
· 1,179 Views · 1 Like
From printTriangularNumber to Duff’s Device: Mastering Java Switch Statements Old and New
This post traces that journey using triangular number computation as a practical example of intentional fall-through and connects the technique to Duff's Device.
June 19, 2026
by NaveenKumar Namachivayam, DZone Core
· 996 Views · 2 Likes
Top Java Security Vulnerabilities and How to Prevent Them in Modern Java
Most Java security breaches stem from preventable coding mistakes. Follow secure coding practices, validate inputs, and keep dependencies updated to reduce risk.
June 18, 2026
by Muhammed Harris Kodavath
· 1,894 Views
OpenAPI, ORM, SVG, and Lottie
Learn about Codename One's latest release with OpenAPI code generation, SQLite ORM, SVG and Lottie support, deep links, and routing.
June 17, 2026
by Shai Almog, DZone Core
· 2,515 Views · 1 Like
On-Device Debugging and JUnit 5
A walk-through of the new JDWP-based on-device debugging pipeline for ParparVM iOS apps and Android apps, with a step-by-step IntelliJ tutorial for each.
June 17, 2026
by Shai Almog, DZone Core
· 1,571 Views · 1 Like
Parallel Kafka Batch Processing With Kotlin Coroutines in Spring Boot
Learn how Kotlin Coroutines improve Spring Boot Kafka batch processing with parallel execution, resource throttling, and faster database operations.
June 16, 2026
by Erkin Karanlık
· 2,101 Views · 1 Like
Introducing RAI Audit Kit: Evidence-Grade Responsible AI Audits in Python
RAI Audit Kit is an open-source Python suite for repeatable, evidence-backed AI audits across ML, deep learning, LLMs, RAG, and agents.
June 15, 2026
by Sai Teja Erukude
· 1,429 Views
A Spring Boot App With Half the Startup Time
Learn how Project Leyden and AOT caching can cut Spring Boot startup time in half, improving Kubernetes scaling and application responsiveness.
June 12, 2026
by Sven Loesekann
· 2,258 Views · 3 Likes
Implementing the Planning Pattern With Java Enterprise and LangChain4j
Learn how to implement the Planning Pattern with Enterprise Java, Jakarta EE, CDI, and LangChain4j, enabling AI to transform business goals into executable workflows.
June 12, 2026
by Otavio Santana, DZone Core
· 1,630 Views · 1 Like
Native SQL in Java Without JDBC Boilerplate — Meet Ujorm3
Ujorm3 eliminates JDBC boilerplate without a full ORM. Write native SQL with named parameters, get objects back — including nested relations.
June 11, 2026
by Pavel Ponec
· 1,985 Views · 2 Likes
Rust-Native Alternatives to Spark SQL and DataFrame Workloads
Sail is an open-source computation framework that serves as a drop-in replacement for Apache Spark (SQL and DataFrame API) in both single-host and distributed settings.
June 11, 2026
by Srinivasarao Rayankula
· 1,833 Views · 2 Likes
The Repo Tracker: Automating My Daily GitHub Catch-Up
Automate GitHub repo tracking with a local agent using Python, SQLite, and cron. Learn how to build a lightweight monitoring system for open-source projects.
June 11, 2026
by Alain Airom (Ayrom)
· 1,532 Views
Give Your AI Assistant Long-Term Memory With perag
Perag is a local, no-cloud private RAG tool that gives your AI assistant searchable access to your personal document archive via UNIX pipes and JSON.
June 10, 2026
by Peter Verhas, DZone Core
· 2,097 Views · 1 Like
I Was Tired of Flying Blind With AI Agents, So I Built AgentDog
A lightweight Python toolkit to test AI agent behavior, catch drift, and validate tool use, grounding, safety, and efficiency before production.
June 10, 2026
by Sai Teja Erukude
· 1,465 Views