Programming languages allow us to communicate with computers, and they operate like sets of instructions. There are numerous types of languages, including procedural, functional, object-oriented, and more. Whether you’re looking to learn a new language or trying to find some tips or tricks, the resources in the Languages Zone will give you all the information you need and more.
Modern SAP landscapes running on SAP HANA demand a rethink of how ABAP programs access data. Traditional Open SQL queries embedded in ABAP code have served developers for decades, but at large data volumes, they can become performance bottlenecks. SAP’s introduction of Core Data Services (CDS) views offers a new paradigm: push more work to the in-memory database and retrieve only what’s needed. Traditional ABAP Data Access With Open SQL Open SQL is the standard SQL interface in ABAP that allows developers to query the underlying database in a database-agnostic way. For example, an ABAP report might join two tables and fetch results like this: Plain Text SELECT bkpf~bukrs, bkpf~belnr, bkpf~gjahr, bseg~koart, bseg~wrbtr, bseg~shkzg FROM bkpf INNER JOIN bseg ON bkpf~bukrs = bseg~bukrs AND bkpf~belnr = bseg~belnr AND bkpf~gjahr = bseg~gjahr INTO TABLE @DATA(it_fi_docs) WHERE bkpf~bukrs = '1000' AND bkpf~gjahr = '2023' AND bseg~koart = 'K'. This Open SQL example joins the BKPF and BSEG tables to retrieve financial documents. Open SQL sends such queries to the database, and on SAP HANA, the heavy lifting of the join and filtering is done in-memory on the DB server. The result is then brought back to the ABAP application server. However, the challenge with Open SQL at scale comes when ABAP code handles large data sets or complex logic in the application layer. Common performance issues in legacy ABAP include: Too much data transferred: Selecting wide tables or not filtering enough leads to heavy network and memory usage. Best practice is to filter and aggregate in the query to keep the result set small and transfer only the required columns (avoid SELECT *). Multiple round-trips: Performing calculations with many small queries or loops causes repeated DB calls. It’s more efficient to push joins and subqueries into one SQL if possible. Each context switch adds overhead. 
Application-side processing: If business logic runs on millions of records in ABAP, the application server CPU becomes the bottleneck. The database could perform these operations faster, set-wise.

In summary, while Open SQL can express complex data retrieval, ABAP developers traditionally had to be very disciplined in query design to avoid performance issues at scale. This paved the way for a new approach leveraging SAP HANA’s strengths.

The Case for Change: Code-to-Data Paradigm

SAP HANA’s in-memory, columnar architecture enables it to execute aggregations, filters, and joins extremely fast at the database level. To exploit this, SAP advocated the code-to-data paradigm: push computations down to the database rather than pulling data up to the code. Rewriting data access using CDS views is a key technique in this paradigm, alongside others like AMDP. By offloading heavy operations to the DB, we minimize data transfer and let HANA’s optimized engines handle crunching the data. For example, instead of reading a full table and then filtering in ABAP, you pass WHERE conditions so the DB does it. Instead of multiple selects and merges in ABAP, you perform a JOIN or a subquery in one shot.

Another driver for change is SAP’s new data models in S/4HANA. Many classic transparent tables were replaced by HANA-optimized structures or compatibility views. Custom ABAP code written for ECC often breaks or needs adaptation for S/4HANA’s simplified data model. In these cases, SAP often provides CDS views as the new interface to data. As one DZone article notes, engineers moving to S/4 must switch to the S/4 equivalents to replace old data access logic. In short, adopting CDS views is not only about performance but also about aligning with SAP’s modern architecture.

Introducing ABAP Core Data Services (CDS) Views

ABAP CDS is a framework to define rich data models directly on the database, using a declarative syntax in ABAP Development Tools (ADT).
A CDS view is essentially a view in the HANA database, defined via an ABAP DDL statement. For example, here’s a simple CDS view definition joining two tables: Plain Text @AbapCatalog.sqlViewName: 'ZDEMO_FLIGHTS' define view ZFlightInfo as select from spfli inner join scarr on spfli.carrid = scarr.carrid { scarr.carrname as carrier, spfli.connid as flight, spfli.cityfrom as departure, spfli.cityto as arrival } This CDS view ZFlightInfo performs the same join between SPFLI and SCARR as an equivalent Open SQL join would. In fact, you could copy-paste the join logic from ABAP into the CDS definition with minor syntax changes. After activating this view in ADT, the system creates a database view in HANA. ABAP programs can then consume the CDS view just like a table: SQL SELECT * FROM ZFlightInfo INTO TABLE @DATA(it_flights) ORDER BY carrier, flight. The result set it_flights from the CDS view will be identical to what an Open SQL join would produce for the same input tables. Under the hood, both approaches result in the database executing a similar SQL SELECT. So, why use CDS? The benefits become evident as complexity grows: Reusability and model centralization: CDS definitions are stored in the ABAP Dictionary and can be reused by any number of programs or even other CDS views. Instead of writing the same joins or calculations in multiple ABAP reports, you define them once in a CDS view. SAP recommends using a CDS view when you need to retrieve data from multiple related tables, because it involves the least amount of coding and can be reused in multiple objects. In large-scale systems, this consistency is key to a single source of truth for that piece of data logic. Rich expression and metadata: CDS supports advanced SQL features and built-in functions. You can define calculated fields, aggregations, and even leverage specialized HANA capabilities within the view. CDS also allows adding annotations, making the data model self-descriptive. 
Performance through pushdown: By moving logic into the CDS (and thus into SQL on the database), you reduce the workload on the ABAP layer. The database can apply filters, joins, and computations in parallel, using its optimized engines. Only the final result is sent back to ABAP. Secure and controlled access: CDS views integrate with the SAP authorization concept, ensuring consistent enforcement of business security rules at the data model level, rather than scattering checks in ABAP code. This means performance benefits without sacrificing governance. Tutorial: Converting an Open SQL to a CDS View (with Code) To solidify the concept, let’s walk through a simple conversion. Imagine we have an ABAP report that needs to list flight routes with the airline name. In classic ABAP, you might do this with an inner join in Open SQL as shown below: Open SQL Approach (Legacy ABAP code): Plain Text DATA: lt_flights TYPE TABLE OF zflight_info. "Structure for results SELECT scarr~carrname AS carrier, spfli~connid AS flight, spfli~cityfrom AS departure, spfli~cityto AS arrival FROM spfli INNER JOIN scarr ON spfli~carrid = scarr~carrid INTO TABLE @lt_flights ORDER BY carrname, connid. This code joins SPFLI with SCARR and populates an internal table lt_flights. It works, but the logic is embedded in the program. Now, suppose we want to reuse this same join in multiple places. We can refactor it into a CDS view: CDS View Approach: Define the view in ABAP DDL (e.g., in Eclipse ADT): Plain Text @AbapCatalog.sqlViewName: 'ZFLIGHTINF' @AccessControl.authorizationCheck: #NOT_REQUIRED define view ZFlightInfo as select from spfli inner join scarr on spfli.carrid = scarr.carrid { scarr.carrname as carrier, spfli.connid as flight, spfli.cityfrom as departure, spfli.cityto as arrival } We give the view a name ZFlightInfo. Note that this is almost identical to the Open SQL, just expressed as a view definition. Once activated, the CDS is available system-wide. 
Now our ABAP report can simply do:

Plain Text
SELECT * FROM ZFlightInfo
  INTO TABLE @lt_flights
  ORDER BY carrier, flight.

The result in lt_flights will be the same. We have effectively decoupled the data retrieval logic from the program and centralized it in the DB layer. This not only improves reuse; in a HANA system, it can also improve performance. The database can better optimize a single persistent view than ad-hoc SQL scattered in code. And if we needed to adjust the join or add a new field, we would change the view once and every consumer would pick up the change.

Performance Considerations and Best Practices

When rewriting Open SQL to CDS, ABAP developers should keep a few important considerations in mind:

Measure, don’t guess: Simply converting an Open SQL to a CDS view doesn’t magically speed up the query if it was already efficient. As noted earlier, for straightforward SELECTs or joins, the performance will be equivalent in many cases. The real gains come when you use CDS to do more complex processing in one go. Always use tools like ST05 SQL trace or HANA’s PlanViz to ensure the new design is actually optimal. The execution plan is what matters, not whether you wrote it in Open SQL or CDS.

Avoid over-complex views: It’s possible to go overboard with stacking CDS views on top of each other. While layering is good for separation of concerns, too many nested views or excessive use of associations can lead to very complex SQL at runtime. This can confuse the optimizer or prevent predicate pushdown. Be wary of heavy calculations in a single CDS. If performance suffers, consider alternatives like ABAP Managed DB Procedures (AMDP) for really complex logic or break the problem down differently.

Select only what you need: Just as with Open SQL, a CDS view should be designed to return only necessary fields and records. Don’t define a CDS with SELECT * from a wide table; list the needed fields instead. This ensures consumer queries aren’t unknowingly pulling extra data.
One common pitfall is using CDS to expose an entire table with all columns, which defeats the purpose. Instead, tailor views to use cases or use parameters in CDS to filter data. Use CDS features wisely: Leverage CDS capabilities like aggregations, calculated fields, and unions to eliminate extra work in ABAP. Reuse and consistency: Replace multiple Open SQL implementations of the same logic with a single CDS. Not only does this reuse improve maintainability, but it also means the database might handle the unified load more efficiently. SAP itself follows this approach in S/4HANA with the Virtual Data Model, hundreds of CDS views that serve as the source for Fiori apps and reports, rather than raw table access. By moving to CDS, you align your custom code to the same philosophy. Conclusion Rewriting data access from Open SQL to CDS views is a strategic move for ABAP developers aiming to maximize performance at scale. By pushing more logic to the SAP HANA database, we take full advantage of its in-memory speed and parallel processing. CDS views enable complex data gathering in one shot, reduce the load on the application server, and provide a modular, reusable data model for your SAP applications. That said, an engineer must also approach CDS with a critical eye, understanding the execution plan and ensuring that moving to CDS truly improves the situation, rather than blindly adding abstraction. Advanced ABAP development is about choosing the right tool for the job. In the case of data-intensive operations, CDS views have proven to be a powerful tool, aligning with SAP’s modern direction and delivering robust performance at scale. By rewriting your data access with CDS and following best practices, you can future-proof your ABAP code for the HANA era, achieving faster results and a cleaner, more sustainable codebase for the long run.
In this blog post, we will see how the humble Java switch statement evolved from a fall-through curiosity into a powerful expression, and how understanding its mechanics unlocks classic techniques like Duff's Device. Java's switch statement has evolved from a fall-through-prone construct into a modern expression syntax introduced in Java 14. The post traces this evolution using a concrete example, a method that computes triangular numbers by intentionally allowing execution to cascade through cases without break statements. The post also connects this behavior to Duff's Device, a 1983 loop-unrolling technique that uses deliberate fall-through to handle remainder elements before processing full blocks. A comparison of old and new switch syntax outlines trade-offs, and practical guidance is offered on when each form is appropriate. The Accidental Discovery I was prepping for the OCP Java 21 exam and stumbled across a tricky question. A method named question2 used a switch statement without any break statements. The output surprised me at first. Once I traced through it, I renamed the method to printTriangularNumber. That one rename told the whole story. This post dives into why. The Old Switch Statement The traditional switch statement has been part of Java since day one. The syntax looks like this: Java int day = 3; switch (day) { case 1: System.out.println("Monday"); break; case 2: System.out.println("Tuesday"); break; case 3: System.out.println("Wednesday"); break; default: System.out.println("Unknown"); break; } As shown above, every case ends with a break. Without it, execution does not stop. It keeps going into the next case. The old switch works on int, char, String, and enum types. Fall-Through: Feature or Bug? The most misunderstood behavior in switch is fall-through. When you omit break, execution literally falls into the next case. 
Java
int x = 2;
switch (x) {
    case 3: System.out.println("three");
    case 2: System.out.println("two");   // jumps here
    case 1: System.out.println("one");   // falls through
    default: System.out.println("done"); // falls through
}

Output:

Plain Text
two
one
done

Most developers treat this as a bug waiting to happen. They are not wrong. Forgetting a break is one of the most common Java mistakes. But intentional fall-through is a different story. It is a deliberate tool. And printTriangularNumber is the perfect example.

printTriangularNumber: Fall-Through in Action

Here is the method I renamed from question2 during my OCP prep:

Java
private static void printTriangularNumber(int n) {
    int res = 0;
    switch (n) {
        case 5: res += 5; // no break: falls through on purpose
        case 4: res += 4;
        case 3: res += 3;
        case 2: res += 2;
        case 1: res += 1;
        default: break;
    }
    System.out.println(res == 0 ? "Ok, bye." : res);
}

Let us trace through n = 4:

Jumps to case 4, adds 4. res = 4
Falls to case 3, adds 3. res = 7
Falls to case 2, adds 2. res = 9
Falls to case 1, adds 1. res = 10
Hits default, breaks
Output: 10

The pattern for each input:

| n | Result | Formula   |
|---|--------|-----------|
| 1 | 1      | 1         |
| 2 | 3      | 2+1       |
| 3 | 6      | 3+2+1     |
| 4 | 10     | 4+3+2+1   |
| 5 | 15     | 5+4+3+2+1 |

This is n * (n + 1) / 2, the triangular number formula. The fall-through is doing the summation for you. Each case accumulates the remaining values by simply not stopping. For n = 0 or any other value outside 1 to 5, no case matches, default fires immediately, and res stays 0. The ternary prints "Ok, bye.". I personally find it a beautiful example of using language semantics intentionally. This is also the kind of question the OCP exam loves to throw at you.

The New Switch Expression (Java 14+)

Java 14 introduced switch expressions as a standard feature. The arrow syntax -> eliminates fall-through entirely. Each arm is independent.

Java
int day = 3;
String name = switch (day) {
    case 1 -> "Monday";
    case 2 -> "Tuesday";
    case 3 -> "Wednesday";
    default -> "Unknown";
};
System.out.println(name); // Wednesday

A few things to notice here:

Switch is now an expression.
It returns a value.
The arrow -> replaces : and break together.
No fall-through. Each arm executes independently.
Multiple labels on a single arm: case 1, 7 -> "Weekend";

You can also use it inline:

Java
System.out.println(switch (day) {
    case 1, 7 -> "Weekend";
    default -> "Weekday";
});

Much cleaner. Much safer.

Switch Expressions With Yield

Sometimes you need more than a single expression in an arm. That is where yield comes in.

Java
int n = 4;
int result = switch (n) {
    case 1, 2 -> n * 10;
    case 3, 4 -> {
        int temp = n * n;
        System.out.println("Computing for: " + n);
        yield temp; // value of this arm
    }
    default -> 0;
};
System.out.println(result); // 16

Think of yield as the return statement for a switch block arm. You need it whenever an arm of a switch expression is a block in {}. A common mistake is using return instead of yield inside a switch expression block. That does not compile: a switch expression cannot be exited with return, so the compiler rejects it. Always use yield inside switch expression blocks.

Duff's Device: Fall-Through Taken to the Extreme

Now that we understand fall-through well, let us look at the most famous intentional use of it: Duff's Device. Tom Duff invented this in 1983 to speed up memory copy operations by reducing loop branch overhead. The trick is to unroll the copy loop and use a switch to jump into the middle of it based on the remainder.
In Java, we replicate it in two clean phases since Java does not allow interleaved switch+loop syntax:

Java
public static void duffCopy(int[] src, int[] dst, int n) {
    int i = 0;
    int rem = n % 4;
    // Phase 1: handle remainder via fall-through
    switch (rem) {
        case 3: dst[i] = src[i]; i++;
        case 2: dst[i] = src[i]; i++;
        case 1: dst[i] = src[i]; i++;
        case 0: break;
    }
    // Phase 2: full blocks of 4
    int fullBlocks = (n - rem) / 4;
    while (fullBlocks-- > 0) {
        dst[i] = src[i]; i++;
        dst[i] = src[i]; i++;
        dst[i] = src[i]; i++;
        dst[i] = src[i]; i++;
    }
}

Let us trace through n = 13:

rem = 13 % 4 = 1
Switch jumps to case 1, copies 1 element. i = 1
fullBlocks = (13 - 1) / 4 = 3
Loop runs 3 times, copying 4 elements each time
Total: 1 + 12 = 13 elements

The Python equivalent makes the two phases explicit:

Python
def duff_copy(src, n):
    dst = [None] * n
    rem = n % 4
    for i in range(rem):  # Phase 1: remainder
        dst[i] = src[i]
    i = rem
    while i < n:  # Phase 2: full blocks
        dst[i] = src[i]
        dst[i+1] = src[i+1]
        dst[i+2] = src[i+2]
        dst[i+3] = src[i+3]
        i += 4
    return dst

The connection to printTriangularNumber is direct. Both use fall-through intentionally. In printTriangularNumber, the switch jumps to the right case and accumulates downward. In Duff's Device, the switch jumps to the right case and copies the remainder before the main loop takes over.

Old vs. New Switch at a Glance

| Feature          | Old Switch (:)                                                 | New Switch (->)                          |
|------------------|----------------------------------------------------------------|------------------------------------------|
| Fall-through     | Yes (default)                                                  | No                                       |
| Returns value    | No                                                             | Yes                                      |
| break needed     | Yes                                                            | No                                       |
| Multiple labels  | Stacked labels (case 1: case 2:); comma lists since Java 14    | Yes (case 1, 2 ->)                       |
| Block with yield | No                                                             | Yes                                      |
| Null safe        | Only with an explicit case null (Java 21)                      | Only with an explicit case null (Java 21) |
| OCP exam topic   | Yes                                                            | Yes                                      |

Which One Should You Use?

For new code, always prefer the switch expression with ->. It is safer, cleaner, and more expressive. Your reviewers will thank you. Reserve the old switch with fall-through only when you genuinely need the cascading behavior, like in printTriangularNumber or a hand-tuned loop like Duff's Device. In those cases, add a comment explaining the intent.
Otherwise, the next developer (including future you) will assume the break is missing by accident. My personal observation: the OCP Java 21 exam tests both heavily. Knowing when fall-through is intentional versus accidental is the key distinction examiners probe. Make sure you can trace through any switch block without running it. Happy testing! What is your take: is intentional fall-through clever engineering or a maintenance nightmare waiting to happen? Drop your thoughts below!
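The traces in this post can also be checked mechanically. The sketch below is an illustration rather than code from the OCP question: the class name FallThroughCheck and the two triangular... helper names are made up here, and the fall-through method returns its sum instead of printing it so the result can be compared against n * (n + 1) / 2. It assumes Java 14 or later for the arrow form.

```java
import java.util.Arrays;

final class FallThroughCheck {

    // Same cascade as printTriangularNumber, but returning the sum.
    static int triangularByFallThrough(int n) {
        int res = 0;
        switch (n) {
            case 5: res += 5; // intentional fall-through
            case 4: res += 4; // intentional fall-through
            case 3: res += 3; // intentional fall-through
            case 2: res += 2; // intentional fall-through
            case 1: res += 1; // intentional fall-through
            default: break;
        }
        return res;
    }

    // Arrow form: no fall-through, so the arm has to state the whole result.
    static int triangularByExpression(int n) {
        return switch (n) {
            case 1, 2, 3, 4, 5 -> n * (n + 1) / 2;
            default -> 0;
        };
    }

    // Two-phase copy from the post: remainder via fall-through, then blocks of 4.
    static void duffCopy(int[] src, int[] dst, int n) {
        int i = 0;
        int rem = n % 4;
        switch (rem) {
            case 3: dst[i] = src[i]; i++; // intentional fall-through
            case 2: dst[i] = src[i]; i++; // intentional fall-through
            case 1: dst[i] = src[i]; i++; // intentional fall-through
            case 0: break;
        }
        int fullBlocks = (n - rem) / 4;
        while (fullBlocks-- > 0) {
            dst[i] = src[i]; i++;
            dst[i] = src[i]; i++;
            dst[i] = src[i]; i++;
            dst[i] = src[i]; i++;
        }
    }

    public static void main(String[] args) {
        // Both forms should agree for every input, including the "no match" cases 0 and 6.
        for (int n = 0; n <= 6; n++) {
            System.out.println(n + ": fall-through=" + triangularByFallThrough(n)
                    + ", expression=" + triangularByExpression(n));
        }
        int[] src = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13};
        int[] dst = new int[src.length];
        duffCopy(src, dst, src.length);
        System.out.println("copy matches: " + Arrays.equals(src, dst));
    }
}
```

Writing the arrow version makes the trade-off visible: without fall-through the summation has to be spelled out as a formula, which is clearer to read but no longer shows the cascade the exam question is testing.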
The most effective way to present this idea is to begin with the challenge architects face: AI has transformed the persistence landscape. Enterprise applications were once built almost exclusively on relational databases, making JPA a keystone of Jakarta EE. Today, modern systems use a mix of relational databases, document stores, caches, graph engines, and increasingly, vector databases that support semantic search, retrieval-augmented generation (RAG), and AI-powered applications. Polyglot persistence is now the industry standard. While Jakarta EE standardized relational persistence through JPA, it still lacks a vendor-neutral standard for non-relational persistence. This gap forces developers to rely on fragmented, proprietary solutions, creating barriers to portability, productivity, and innovation. The rise of AI makes this gap critical. Vector databases are now essential to intelligent systems, supporting semantic search, embeddings, and contextual retrieval. For Jakarta EE to remain the leading enterprise Java platform in the AI era, it must offer a standardized approach to NoSQL persistence, as it did for relational databases. Jakarta NoSQL is not just another specification; it constitutes a strategic investment in the ecosystem's future. By offering a familiar programming model, reducing vendor lock-in, and integrating with AI workloads, Jakarta NoSQL ensures that Jakarta EE remains relevant and competitive for the next generation of enterprise applications. NoSQL in the AI Era: Understanding the Modern Data Landscape For years, enterprise data persistence focused on relational databases. Systems relied on tables, rows, foreign keys, and SQL, making relational technology the standard for business applications. While still essential, modern architectures now use polyglot persistence, where multiple database types coexist, each satisfying specific requirements. 
Today, NoSQL refers to a family of database paradigms, each engineered for specific workloads and architectural needs, rather than just document databases.

Key-value databases store data as key-value pairs, enabling fast lookups and low latency. Typical uses include caching, user sessions, feature flags, and temporary application state.

Document databases store data as structured documents, such as JSON or BSON. They are effective for applications with hierarchical or evolving schemas, including web applications, e-commerce platforms, and content management systems.

Column-family databases organize data by columns instead of rows, supporting high write throughput and horizontal scalability. They are used for IoT telemetry, event logging, analytics, and large-scale distributed systems.

Graph databases model entities and relationships as nodes and edges. This structure is ideal for social networks, fraud detection, recommendation engines, dependency analysis, and knowledge graphs in which relationships are critical.

Vector databases store high-dimensional embeddings from machine learning models and large language models (LLMs). They enable semantic search, similarity matching, retrieval-augmented generation (RAG), recommendation platforms, and other AI-driven features by matching meaning instead of exact text.

Time-series databases specialize in timestamped data that changes over time. They are used for observability, monitoring, financial markets, industrial sensors, and operational metrics where high-performance temporal data storage and analysis are essential.

These database types often coexist within the same architecture. Modern applications may use PostgreSQL for transactions, Redis for caching, MongoDB for documents, Neo4j for relationships, InfluxDB for telemetry, and a vector database like Milvus, Pinecone, or Weaviate for AI-powered search and retrieval. This approach, known as polyglot persistence, is now standard in enterprise systems.
The industry has embraced this shift. The Stack Overflow Developer Survey shows that while relational databases still dominate enterprise workloads, NoSQL technologies are now standard tools for developers. Technologies like Redis, MongoDB, and Elasticsearch are used alongside PostgreSQL and MySQL. Organizations no longer choose between SQL and NoSQL; instead, they combine multiple persistence technologies to leverage their strengths. Polyglot persistence is now the baseline for modern software systems. Vector databases are especially important among NoSQL categories, as they are basic to modern Artificial Intelligence systems. In contrast to traditional databases that store explicit business data, vector databases store numerical representations called embeddings. Generated by machine learning models, these embeddings encode the semantic meaning of words, documents, images, or other content as mathematical vectors. This enables software to search and retrieve information based on meaning rather than exact text matches. The distinction between lexical and semantic search illustrates the significance of vector databases. For example, a traditional SQL search for “Pet” returns records with that exact term, such as “Pet Shop,” but ignores related expressions like “Dog” or “Puppy.” Semantic search, by comparing embeddings, retrieves documents about dogs, puppies, or animal companions because it recognizes their semantic relationship. The search engine matches meaning, not just syntax. This function is vital for modern AI architectures. Large language models do not process relational tables directly; they use embeddings and contextual connections between concepts. Systems such as retrieval-augmented generation (RAG), enterprise knowledge search, recommendation engines, and intelligent assistants depend on similarity searches across millions of vectors. 
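To make the lexical-versus-semantic distinction concrete, here is a minimal sketch of similarity search using cosine similarity over a tiny in-memory index. Everything in it is assumed for illustration: the class and method names are hypothetical, the three-dimensional vectors are hand-picked stand-ins for real model embeddings (which typically have hundreds or thousands of dimensions), and the linear scan stands in for the approximate indexes a vector database would use.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SemanticSearchSketch {

    // Cosine similarity: dot product divided by the product of the vector lengths.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Brute-force nearest neighbours: score every document, keep the k best.
    static List<String> nearest(Map<String, double[]> index, double[] query, int k) {
        return index.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> cosine(e.getValue(), query)).reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Made-up embeddings: the first two axes loosely mean "animals", the third "finance".
    static Map<String, double[]> toyIndex() {
        Map<String, double[]> index = new LinkedHashMap<>();
        index.put("Pet Shop opening hours", new double[] {0.90, 0.80, 0.10});
        index.put("Puppy training basics", new double[] {0.80, 0.90, 0.20});
        index.put("Quarterly tax form", new double[] {0.10, 0.00, 0.90});
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical embedding for the query "dog"; no document contains that word.
        double[] dogQuery = {0.85, 0.85, 0.10};
        System.out.println(nearest(toyIndex(), dogQuery, 2));
    }
}
```

Neither of the two animal-related titles contains the word "dog", yet both rank above the tax document because their vectors point in nearly the same direction as the query. That is the behavior a LIKE-style text match cannot reproduce.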
While relational databases can support some vector operations through extensions, vector databases are purpose-built for these workloads, offering optimized indexing and similarity algorithms for large-scale semantic retrieval. As AI adoption grows, vector databases are becoming a strategic component of enterprise architecture. Appreciating the importance of NoSQL, several Java ecosystems have developed their own solutions. Spring offers independent projects like Spring Data MongoDB, Spring Data Redis, and Spring Data Cassandra. These integrations provide a productive programming model but are tightly coupled to the Spring ecosystem. Quarkus supports NoSQL persistence through Panache and database-specific integrations, emphasizing developer productivity and cloud-native deployment. Micronaut Data supports several NoSQL engines, using compile-time code generation and ahead-of-time processing to improve performance and reduce execution overhead. While these solutions are effective, they remain framework-specific rather than platform standards. Developers switching frameworks encounter different APIs, abstractions, annotations, and operational models, even when solving similar persistence challenges. Jakarta EE addressed this for relational persistence with Jakarta Persistence (JPA), delivering a standardized, vendor-independent programming model. As NoSQL technologies expand and AI workloads more and more depend on vector databases, the lack of a vendor-neutral NoSQL standard is a significant gap in the Jakarta ecosystem. The Java Standardization Journey The need for a standardized NoSQL solution in the Java ecosystem has been discussed for years. During the Java EE era, several proposals tried to integrate non-relational databases into the enterprise platform. As NoSQL technologies grew in popularity throughout the 2010s, developers anticipated a dedicated specification to accompany traditional enterprise APIs at JavaOne conferences. 
Despite clear demand, no such initiative emerged within Java EE. The platform remained focused on relational persistence via JPA, leaving NoSQL adoption to rely on vendor-specific libraries and framework integrations. The transition of Java EE to the Eclipse Foundation provided an opportunity to address this challenge. Instead of waiting for a platform-level solution, the community launched Eclipse JNoSQL, an open-source project supplying a unified programming model for NoSQL databases. Drawing on JPA's success, Eclipse JNoSQL introduced mapping annotations, repositories, templates, and communication APIs that support document, key-value, column-family, and graph databases. The project showed that a consistent developer experience could be attained without compromising each database model's unique features. As Jakarta EE matured, Eclipse JNoSQL became the foundation for a new standardization effort: Jakarta NoSQL. Jakarta NoSQL was the first persistence specification created entirely within the Jakarta EE process. Unlike earlier specifications that migrated from Java EE, Jakarta NoSQL was conceived, developed, and released under the Eclipse Foundation governance model. It was among the first to complete the full Jakarta Specification Process from inception to release. Jakarta NoSQL's impact extended beyond its initial scope. During development, the expert group identified a common challenge for both relational and non-relational databases: developers needed a consistent repository abstraction independent of the underlying persistence engine. This led to the creation of a separate specification, Jakarta Data. The need to standardize NoSQL access patterns directly influenced the development of Jakarta Data's repository-oriented programming model, which applies across multiple persistence technologies. The relationship between these specifications highlights Jakarta NoSQL's broader influence on the Jakarta EE ecosystem. 
Jakarta NoSQL focuses on mapping and interacting with non-relational databases, while Jakarta Data delivers a unified repository abstraction for both relational and NoSQL implementations. Together, they significantly reduce fragmentation in enterprise persistence. This evolution continued beyond Jakarta Data. The drive to standardize modern persistence requirements has inspired new specifications, such as Jakarta Query, which aims to deliver a portable, type-safe, and expressive query language for various persistence technologies. As the Jakarta ecosystem grows, Jakarta NoSQL acts as a key milestone. It addressed the long-standing absence of a NoSQL standard and helped lay the foundation for the next generation of persistence specifications within Jakarta EE. Jakarta NoSQL: Built for NoSQL, Not Adapted to It When architects consider standardizing NoSQL development in Jakarta EE, a common question arises: why not extend Jakarta Persistence (JPA) to support NoSQL databases? JPA has long provided a unified programming model for relational databases in the Java ecosystem. The answer is based on a core architectural principle: tools should be optimized for their intended purpose. The first challenge is that JPA was designed specifically for relational databases, relying on concepts like tables, columns, joins, foreign keys, and transactional consistency. These are not simply implementation details but core elements of the specification. Forcing document, graph, key-value, or vector databases into this model creates friction and limits the use of each database’s native features. The second challenge is that NoSQL systems behave fundamentally differently. Graph databases perform path traversals, document databases store nested structures without normalization, key-value databases focus on fast lookups, and vector databases handle similarity calculations. These systems also differ in consistency, transactions, query languages, indexing, and scalability capabilities. 
Representing all these paradigms through a single relational abstraction leads to compromises. The third challenge is the importance of specialization. As Abraham Maslow noted, “if the only tool you have is a hammer, it is tempting to treat everything as if it were a nail.” Relational databases are effective, but not ideal for every persistence need. Semantic search, graph traversal, and high-volume telemetry storage are not relational problems. Applying a relational abstraction to all database types runs the risk of losing the unique optimizations each technology provides. Examine the analogy of transportation: cars, boats, submarines, and airplanes all address transportation but are specialized for different environments. Forcing them to use the same controls would result in mediocrity across all. Similarly, a single persistence abstraction may remove the features that make each database effective. Therefore, Jakarta NoSQL does not extend JPA beyond its intended scope. Instead, it offers a dedicated persistence model for non-relational databases, while continuing to maintain the familiar developer experience that contributed to JPA’s success. A key design goal of Jakarta NoSQL is to reduce mental effort for enterprise Java developers. Teams experienced with JPA should find the specification immediately approachable, as Jakarta NoSQL intentionally uses familiar terminology and concepts from the Jakarta EE community. Developers will encounter annotations like @Entity, @Id, and @Column, enabling a smooth transition from relational to non-relational persistence. Java @Entity public class Car { @Id private Long id; @Column private String name; @Column private CarType type; } At first glance, this entity closely resembles a JPA entity, which is intentional. However, the underlying implementation is fundamentally different. Jakarta NoSQL is built to support schema flexibility, embedded structures, nested documents, and database-specific storage models. 
This approach is reflected throughout the API. Instead of requiring developers to manage low-level driver details, Jakarta NoSQL offers a high-level programming model via the Template API.
Java
@Inject
Template template;

Car ferrari = Car.builder()
    .id(1L)
    .name("Ferrari")
    .build();

template.insert(ferrari);

List<Car> sports = template.select(Car.class)
    .where("type").eq(CarType.SPORT)
    .orderBy("name")
    .result();
The objective mirrors JPA's original mission: letting developers focus on domain models and business logic, rather than serialization, connection management, or vendor-specific APIs. This foundation shaped Jakarta NoSQL 1.0. The initial release introduced the mapping layer, CDI integration, repository support, template operations, and standardized endpoints for four major NoSQL categories:
- Document databases
- Key-value databases
- Column-family databases
- Graph databases
Jakarta NoSQL 1.0 showed that a unified Java programming model can respect the particular characteristics of each database family. Jakarta NoSQL 1.1 continued this evolution. While version 1.0 focused on mapping and persistence, version 1.1 expanded querying capabilities through integration with Jakarta Query. A key addition is support for parameterized queries, letting developers safely bind parameters instead of manually constructing query strings.
Java
List<Car> cars = template.query(
        "FROM Car WHERE type = :type")
    .bind("type", CarType.SPORT)
    .result();
Version 1.1 also introduces projection support, allowing applications to retrieve lightweight views instead of entire entities.
Java
@Projection
public record TechCarView(
    String name,
    CarType type) { }

List<TechCarView> views = template
    .typedQuery(
        "FROM Car WHERE type = 'SPORT'",
        TechCarView.class)
    .result();
These features improve performance, reduce data transfer, and align with modern Java features such as records. An important aspect of Jakarta NoSQL is its long-term architectural vision.
While most developers use the mapping layer, the specification also defines a lower-level communication API for advanced scenarios. Java DocumentManagerFactory factory = ...; DocumentManager manager = factory.get("users"); DocumentRecord record = ...; manager.put(record); Optional<DocumentRecord> result = manager.findByKey("user:10"); manager.deleteByKey("user:10"); This communication layer is optional. Application developers can build complete systems without it, but it is valuable for database vendors, framework authors, and advanced integrations needing direct access to database capabilities. This design is fundamentally different from JDBC, which assumes communication through SQL statements and tabular result sets. That model works well because relational databases share a common language and interaction pattern. NoSQL databases do not. Document databases may use BSON, graph databases may offer traversal languages, and vector databases may provide similarity-search APIs. Others use REST endpoints, binary protocols, gRPC streams, or vendor-specific mechanisms. Forcing these models into a JDBC-style abstraction would limit their capabilities or demand ongoing vendor-specific extensions. For this reason, Jakarta NoSQL uses a layered architecture. The mapping layer offers a portable, productive programming model for developers, while the communication layer remains flexible to support diverse NoSQL systems. This architecture positions the specification for future growth. As new technologies like vector databases, time-series engines, and AI-native storage emerge, Jakarta NoSQL can evolve without imposing a relational mindset. Rather than treating every database as a nail for the JPA hammer, Jakarta NoSQL recognizes that different problems require different tools, while still presenting a consistent and familiar experience for enterprise Java developers.
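The layering described above can be sketched in a few lines of plain Java. This is only an illustration of the idea and not the Jakarta NoSQL API: `DocumentStore`, `InMemoryStore`, and `UserMapper` are hypothetical names. The communication layer deals in raw documents and would have one implementation per database, while the mapping layer exposes typed objects to application code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class LayeredSketch {
    // Communication layer (hypothetical): raw documents in, raw documents out.
    interface DocumentStore {
        void put(String key, Map<String, Object> document);
        Optional<Map<String, Object>> findByKey(String key);
    }

    // Stand-in for a vendor driver; a real one might speak BSON, REST, or gRPC.
    static class InMemoryStore implements DocumentStore {
        private final Map<String, Map<String, Object>> data = new HashMap<>();

        public void put(String key, Map<String, Object> document) {
            data.put(key, document);
        }

        public Optional<Map<String, Object>> findByKey(String key) {
            return Optional.ofNullable(data.get(key));
        }
    }

    record User(String id, String name) {}

    // Mapping layer (hypothetical): application code only sees typed objects.
    static class UserMapper {
        private final DocumentStore store;

        UserMapper(DocumentStore store) {
            this.store = store;
        }

        void insert(User user) {
            store.put(user.id(), Map.of("name", user.name()));
        }

        Optional<User> find(String id) {
            return store.findByKey(id)
                .map(doc -> new User(id, (String) doc.get("name")));
        }
    }

    public static void main(String[] args) {
        UserMapper users = new UserMapper(new InMemoryStore());
        users.insert(new User("user:10", "Ada"));
        // prints Optional[User[id=user:10, name=Ada]]
        System.out.println(users.find("user:10"));
    }
}
```

Swapping `InMemoryStore` for another `DocumentStore` leaves `UserMapper` and its callers untouched, which is the property the layered design is after.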
If you've ever inherited a Spark job that runs in 35 minutes and someone asks you to make it faster, you know the routine. You start by checking partition counts, then file sizes, then shuffle stages, then broadcast hints. You find a handwritten OPTIMIZE schedule from 2022, a Z-ORDER on the wrong column, and a cluster sized for last year's data volume. By the time you've made the job fast, you've absorbed three new things to maintain. The next person to inherit it will absorb four.
This pattern, call it the hand-tuning treadmill, is what the declarative optimization story on Databricks is trying to break. It's not a single feature; it's a cluster of capabilities that collectively let teams describe what a table should look like and let the engine handle the physical optimizations. What follows is the practical view of those patterns: where they fit, what they replace, and how to migrate without a rewrite weekend.
1. The Hand-Tuning Treadmill: Why Imperative Optimization Doesn't Scale
Before getting into the declarative side, it's worth being concrete about what "imperative Spark optimization" actually means in production. The shape is consistent across teams I've audited:
- Layout decisions frozen on day one. Somebody picks a partition column when the table is created. The data shape changes a year later. Nobody re-partitions because the migration is scary. Query plans drift toward full scans.
- Maintenance jobs that nobody owns. An OPTIMIZE / Z-ORDER / VACUUM script lives in a notebook scheduled at 3 AM. It runs on a cluster that's slightly mis-sized. When data volume grows, the job runs into the morning workload, and people complain about latency.
- Cluster sizing as a guess. Worker count is a heuristic from a senior engineer's memory of last year's spike. Half the time it's too big, half the time it's too small, and the cost discussion gets emotional.
- Hint-driven plans.
Broadcast hints, repartition hints, coalesce(N): sprinkled through pipelines to fix yesterday's problem, kept indefinitely because removing them feels risky.
None of these are bugs. They're symptoms of the imperative model: the team owns the layout, the maintenance, the sizing, and the plan tuning. In small pipelines, ownership is fine. At scale, it becomes the bottleneck that the team can't outsource.
2. What "Declarative" Means in the Spark Optimization Context
Declarative is a word that gets used in two different ways here, and it's worth pulling them apart. Within Lakeflow pipelines (formerly DLT), it means "describe the tables, not the steps": the engine builds the DAG and runs it. But in the broader optimization story, declarative also means "describe the desired property of the table or workload, not the operations to maintain it":
- Layout: I want this table clustered by these columns; figure out when and how to re-cluster.
- Maintenance: I want this table optimized and vacuumed; figure out the schedule.
- Ingestion: I want all new files in this path picked up exactly once; figure out checkpointing and listing.
- Quality: These rows must satisfy these expectations; enforce them and report what gets dropped.
- Compute: I want this query fast and not wasteful; size and scale appropriately.
Each one of those bullets corresponds to a piece of the declarative stack. Used together, they replace a remarkable amount of the boilerplate that has historically lived in Spark pipelines.
The mental shift: You stop writing operations against the table and start writing properties of the table. The engine becomes the actor; you become the editor.
3. The Declarative Optimization Stack on Databricks
The chart below maps each thing the team declares to the engine capability that handles it, ending at the physical Delta table. It's the picture I draw on whiteboards when teams ask, "What's the order to adopt these in?"
Figure 1.
The declarative optimization stack: each user-facing intent at the top maps to a continuous engine behavior, which keeps the underlying Delta tables well-clustered, compacted, and statistically up-to-date — without human intervention. Two things are worth highlighting in this picture. First, every box in the engine row is something that runs continuously, not on a cron — there is no daily "optimization window" anymore. Second, the bottom layer is identical to what you'd get from any well-tuned imperative pipeline: 256 MB Parquet files with current statistics. The declarative path doesn't change what good looks like; it changes who does the work to keep things looking good. 4. Layout: Liquid Clustering Replaces Hand-Maintained Z-ORDER Liquid Clustering is the change with the largest practical impact, because partition-key choices are where most lakehouse pipelines accumulate the most technical debt. The declarative version: you specify the columns the data is most often filtered or joined by, and the engine maintains a layout that supports those access patterns — incrementally, as new data arrives, without a full rewrite. When access patterns change, you change the cluster columns, and the engine re-clusters in the background. Defining Liquid-Clustered Tables SQL -- New table, clustered by the columns most commonly filtered on. -- No more PARTITIONED BY, no more guessing at partition cardinality. CREATE TABLE prod.gold.daily_totals ( account_id STRING, region STRING, ingest_date DATE, daily_total DECIMAL(18,2), txn_count BIGINT ) USING DELTA CLUSTER BY (region, ingest_date, account_id); -- Even better: let the engine pick the clustering columns by -- observing real query patterns over time. CREATE TABLE prod.gold.events_clustered USING DELTA CLUSTER BY AUTO AS SELECT * FROM prod.silver.events; Migrating an Existing Partitioned/Z-ORDER Table SQL -- Convert a legacy partitioned table to liquid clustering. 
-- Existing data files are not rewritten immediately; the engine -- rebalances incrementally on subsequent writes + maintenance. ALTER TABLE prod.silver.transactions CLUSTER BY (account_id, ingest_date); -- Force the first clustering pass for a freshly converted table OPTIMIZE prod.silver.transactions FULL; Why this matters: the recurring 2 AM Slack thread of "can we re-partition this table?" goes away. Layout becomes a property you change with one DDL statement, not a multi-week rewrite project. 5. Maintenance: Predictive Optimization Replaces Cron-Driven OPTIMIZE/VACUUM Predictive optimization is the part that retired the most legacy code in the pipelines I've migrated. Once enabled at the catalog or schema level, the engine monitors each table's read and write patterns and decides on its own when to compact files, re-cluster, vacuum, and refresh statistics. The big win isn't the operations themselves — the imperative pipeline could already run those — it's that the timing is observed-driven, not schedule-driven. Tables that get heavy ingestion get more frequent maintenance. Cold tables get left alone. SQL -- Turn it on at the catalog level once; new tables inherit. 
ALTER CATALOG prod ENABLE PREDICTIVE OPTIMIZATION;

-- Or at the schema level for a phased rollout
ALTER SCHEMA prod.gold ENABLE PREDICTIVE OPTIMIZATION;

-- Inspect recent maintenance operations on a given table.
-- operationMetrics is a map; the keys differ per operation
-- (the ones below are populated for OPTIMIZE).
SELECT operation,
       operationMetrics['numAddedFiles'] AS files_added,
       operationMetrics['numRemovedFiles'] AS files_removed,
       operationMetrics['numAddedBytes'] AS added_bytes,
       timestamp
FROM (DESCRIBE HISTORY prod.gold.daily_totals)
WHERE userMetadata IS NULL -- no user-supplied commit metadata (a rough proxy for engine-driven)
  AND operation IN ('OPTIMIZE', 'VACUUM START', 'VACUUM END')
  AND timestamp >= current_timestamp() - INTERVAL 7 DAYS
ORDER BY timestamp DESC;
What you should delete after enabling this: the nightly notebook that runs OPTIMIZE on every table in a schema, the VACUUM cron job, the ANALYZE TABLE wrapper, and the alerting that wakes someone up when those jobs run long. None of them are needed anymore, and leaving them on creates duplicate work that the engine and the cron will fight over.
6. Ingestion: Auto Loader Replaces Listing-Based File Detection
Auto Loader is the declarative answer to the perennial "which files have we processed already?" problem. Instead of listing a directory, comparing it to a state file, and figuring out the new bits, you describe the source location and the format and let the engine maintain its own incremental state. It uses cloud-native event notifications (S3 events, ADLS notifications, or efficient directory listing as a fallback), and the checkpoint is just another piece of state the engine owns.
Python
from pyspark.sql.functions import current_timestamp

# Streaming ingest from S3 with schema inference + evolution.
# Replaces hand-maintained checkpointing, listing logic, and
# whatever file-tracking table the team built two years ago.
(spark.readStream .format("cloudFiles") .option("cloudFiles.format", "json") .option("cloudFiles.inferColumnTypes", "true") .option("cloudFiles.schemaLocation", "s3://acme-checkpoints/txns_schema") .option("cloudFiles.schemaEvolutionMode", "addNewColumns") .load("s3://landing/txns/") .withColumn("_ingest_ts", current_timestamp()) .writeStream .format("delta") .option("checkpointLocation", "s3://acme-checkpoints/txns_writer") .trigger(availableNow=True) # batch-style; runs to completion .toTable("prod.bronze.txns")) Two notes from production. First, schemaEvolutionMode is the option that prevents the silent-data-loss class of bugs when partner schemas change; pick the policy explicitly rather than letting it default. Second, trigger(availableNow=True) gives you batch ergonomics on a streaming source — the job runs until it has consumed everything and exits, which is what most teams actually want for daily ingestion. 7. Transforms and Quality: Declarative Pipelines Replace Bare Spark + External DQ The final piece is the transformation layer. Lakeflow pipelines (the rebrand of Delta Live Tables) let you declare each table as a Python or SQL definition, and add expectations as a first-class concept. The engine derives the DAG from the dependencies and enforces the expectations on every write — the data quality framework, the lineage layer, and the orchestration glue collapse into a single artifact. 
Python import dlt from pyspark.sql.functions import sum as _sum, col @dlt.table( name="silver_txns", table_properties={ "delta.enableChangeDataFeed": "true", "delta.tuneFileSizesForRewrites": "true", }, cluster_by=["account_id", "ingest_date"], ) @dlt.expect_or_drop("non_null_amount", "amount IS NOT NULL") @dlt.expect_or_fail("valid_currency", "currency IN ('USD','EUR','GBP')") @dlt.expect("unique_txn", "txn_id IS NOT NULL") def silver_txns(): return (dlt.read_stream("bronze_txns") .dropDuplicates(["txn_id"])) @dlt.table(name="gold_daily_totals") def gold_daily_totals(): return (dlt.read("silver_txns") .groupBy("ingest_date", "account_id", "region") .agg(_sum("amount").alias("daily_total"))) The decorators do four things at once: define the table, declare its layout (cluster_by), declare its quality rules, and let the engine infer that gold_daily_totals depends on silver_txns from the dlt.read call. There is no DAG file. There is no separate Great Expectations suite. Lineage is generated for free in Unity Catalog, including column-level edges. If you want to query how the expectations have been performing — useful for SLO dashboards or alerting — the event log surfaces it directly: SQL -- Pass / fail / drop counts per expectation, last 24 hours SELECT flow_name, details:flow_progress.data_quality.expectations[0].name AS exp_name, details:flow_progress.data_quality.expectations[0].passed_records AS passed, details:flow_progress.data_quality.expectations[0].failed_records AS failed, details:flow_progress.data_quality.expectations[0].dropped_records AS dropped, timestamp FROM event_log("<pipeline-id>") WHERE event_type = 'flow_progress' AND timestamp >= current_timestamp() - INTERVAL 1 DAY ORDER BY timestamp DESC; 8. Putting It Together: Where to Start, What to Measure Adopting all of this at once is a recipe for pain. 
The order I've seen work, and a small set of metrics to verify the change is paying off:

| Step | Adopt | Retire | Verify with |
|------|-------|--------|-------------|
| 1 | Predictive optimization at schema level | Nightly OPTIMIZE / VACUUM jobs | Reduction in maintenance-cluster cost |
| 2 | Liquid clustering on top 5 tables | Static partitioning + Z-ORDER | p95 query latency on the same workloads |
| 3 | Auto Loader for 1-2 ingestion pipelines | Custom file-tracking + listing logic | End-to-end data freshness |
| 4 | Lakeflow pipelines for new pipelines only | External DQ + DAG glue (for new work) | Lines of pipeline code per table |
| 5 | Serverless compute for SQL warehouses + DLT | Hand-sized job clusters | Cost-per-query, scale-up time |

What you do not need to migrate: imperative pipelines that already work and aren't growing. Declarative patterns are about new work and high-pain hot spots, not a heroic rewrite of every notebook ever shipped.
9. Honest Limitations and Where Imperative Still Wins
Four places where the declarative model still bites, worth knowing before you commit:
- Procedural logic still belongs in Jobs. If your pipeline is really a sequence of API calls with branching error handling, that's a Lakeflow Job (or external code), not a declarative table. Don't try to bend dlt around it.
- Predictive optimization needs observation time. On a table that's a week old, the engine hasn't seen enough patterns to make great decisions. For tables under heavy initial load, an explicit OPTIMIZE FULL after the first big ingest still helps.
- Cluster-by-column choice still matters. CLUSTER BY AUTO is great for stable workloads with predictable filters. For tables whose access pattern is genuinely heterogeneous across teams, an explicit cluster-by based on the dominant query is usually faster.
- Hint-driven escapes are still allowed. If a particular query benefits from a /*+ BROADCAST(t) */ hint and AQE isn't catching it, the hint is fine. Just keep them rare and document why.
Conclusion The declarative optimization story isn't a single feature you toggle — it's a quiet shift in who owns the boring parts of a Spark pipeline. Layout, maintenance, ingestion bookkeeping, plan tuning, cluster sizing, data quality enforcement: every one of those was traditionally a thing the team owned and paid for in toil. The current Databricks stack lets you express each as an intent and let the engine handle the operations underneath. Adopt them in order, retire what they replace, and the optimization treadmill slows from a daily concern to a quarterly review. That's the actual win, and it's the reason the declarative paradigm has gone from a Lakeflow detail to the default mental model for new pipelines on Databricks.
With the increasing number of security threats, organizations have invested heavily in cybersecurity initiatives to protect their applications, infrastructure, and sensitive data. Security vulnerabilities are rarely introduced intentionally. Most of them creep into applications through shortcuts, overlooked edge cases, outdated libraries, or bad coding habits. Modern Java has significantly improved its security capabilities, but no framework or JVM version can completely protect an application from insecure coding practices. As developers, we still need to understand where vulnerabilities originate and how to prevent them before they reach production.
In this article, I summarize some of the most common Java security vulnerabilities and practical techniques used to prevent them. These are the same security best practices and lessons learned that I frequently share with new members of my team. I am sharing them here in the hope that they can serve as a practical handbook for Java developers looking to build more secure applications.
1. SQL Injection
SQL injection remains one of the oldest and most dangerous vulnerabilities. It occurs when user input is directly concatenated into SQL statements. Consider the following example:
Java
String query = "SELECT * FROM users WHERE username = '" + username + "'";
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(query);
If an attacker enters the following value as the username, the query can be manipulated to return unintended results:
SQL
admin' OR '1'='1
Prevention
Always use parameterized queries.
Java
String query = "SELECT * FROM users WHERE username = ?";
PreparedStatement stmt = connection.prepareStatement(query);
stmt.setString(1, username);
ResultSet rs = stmt.executeQuery();
Prepared statements separate data from executable SQL, so the bound value is never interpreted as part of the statement.
2. Hardcoded Secrets
One of the most common findings during security reviews is hardcoded credentials.
Java
private static final String API_KEY = "abcd123456789";
This may seem harmless during development, but once committed to source control, secrets often remain exposed indefinitely.
Prevention
Store secrets externally.
Java
String apiKey = System.getenv("PAYMENT_API_KEY");
Better alternatives are to store secrets in AWS Secrets Manager, Azure Key Vault, HashiCorp Vault, or Kubernetes Secrets. Secrets should never live inside source code repositories.
3. Insecure Deserialization
Java serialization has been responsible for numerous security incidents. Example:
Java
ObjectInputStream input = new ObjectInputStream(request.getInputStream());
Object obj = input.readObject();
The danger is that attackers can craft malicious serialized objects that execute unexpected code during deserialization.
Prevention
Avoid Java serialization whenever possible. Prefer formats such as JSON, XML (with secure parsing), or Protocol Buffers. Example using Jackson:
Java
ObjectMapper mapper = new ObjectMapper();
User user = mapper.readValue(json, User.class);
Using structured formats reduces the attack surface significantly.
4. Cross-Site Scripting (XSS)
Although often associated with front-end applications, backend services can accidentally enable XSS vulnerabilities when user-generated content is returned without sanitization. Example:
Java
String comment = request.getParameter("comment");
response.getWriter().write(comment);
If the user submits the following comment, the browser executes the script:
HTML
<script>alert('Hacked')</script>
Prevention
Always encode output. Using Spring:
Java
String safeComment = HtmlUtils.htmlEscape(comment);
Additionally, validate inputs, sanitize rich text, and implement a Content Security Policy (CSP).
5. Path Traversal Attacks
File download functionality often introduces path traversal vulnerabilities. Example:
Java
String file = request.getParameter("file");
Path path = Paths.get("/documents/" + file);
An attacker could submit the following value and potentially access sensitive files:
Shell
../../../etc/passwd
Prevention
Normalize and validate paths.
Java
Path base = Paths.get("/documents");
Path resolved = base.resolve(file).normalize();
if (!resolved.startsWith(base)) {
    throw new SecurityException("Invalid file path");
}
Never trust file names coming directly from user input.
6. Weak Password Storage
Storing passwords improperly remains surprisingly common. Bad practice:
Java
String passwordHash = DigestUtils.md5Hex(password);
MD5 and SHA-1 are no longer considered secure for password storage.
Prevention
Use adaptive hashing algorithms. Example with BCrypt:
Java
BCryptPasswordEncoder encoder = new BCryptPasswordEncoder();
String hash = encoder.encode(password);
BCrypt generates a salt automatically and has a configurable work factor. Other strong alternatives include Argon2, PBKDF2, and scrypt.
7. Dependency Vulnerabilities
Modern Java applications often contain more third-party code than custom code. A secure application can still become vulnerable because of outdated dependencies.
Prevention
Integrate dependency scanning into CI/CD pipelines. Example Maven plugin:
XML
<plugin>
    <groupId>org.owasp</groupId>
    <artifactId>dependency-check-maven</artifactId>
</plugin>
Additionally, tools such as Snyk can automatically identify known vulnerabilities. We have been using Snyk for the last couple of years, and it has been effective. Regular dependency updates should be part of every release cycle.
8. Improper Logging of Sensitive Data
Developers often log information for troubleshooting without considering security implications. Example:
Java
logger.info(
    "Login request received for user={} password={}",
    username,
    password);
This exposes credentials inside log files.
Prevention
Mask or exclude sensitive information.
Java
logger.info(
    "Login request received for user={}",
    username);
Never log passwords, access tokens, credit card information, protected health information (PHI), or other personally identifiable information (PII).
This is especially important in regulated industries such as healthcare, like ours.
9. Insufficient Authentication and Authorization
Authentication verifies identity, and authorization determines access. Many applications perform authentication correctly but fail to enforce authorization consistently. Example:
Java
@GetMapping("/admin/users")
public List<User> getUsers() {
    return userService.findAll();
}
Without authorization checks, any authenticated user might gain access.
Prevention
Use role-based security.
Java
@PreAuthorize("hasRole('ADMIN')")
@GetMapping("/admin/users")
public List<User> getUsers() {
    return userService.findAll();
}
Security should be enforced at every layer, not just the UI.
10. Lack of Input Validation
Many vulnerabilities originate from accepting unexpected input. Example:
Java
String age = request.getParameter("age");
int userAge = Integer.parseInt(age);
Invalid input can cause exceptions or unexpected behavior.
Prevention
Validate all external input.
Java
@Min(18)
@Max(120)
private Integer age;
Bean Validation provides a simple and consistent approach for validating request payloads. Never assume user input is safe.
Final Thoughts
Security is not a feature that can be added at the end of a project. It needs to be part of the development process from the very beginning. The vulnerabilities discussed here are not theoretical. They are among the most common findings during security assessments, penetration tests, and production incident investigations. Fortunately, modern Java provides mature frameworks, libraries, and tools that make secure development significantly easier than it was a decade ago. The key is building security awareness into everyday development practices:
- Use parameterized queries
- Protect secrets properly
- Validate all inputs
- Keep dependencies updated
- Apply strong authentication and authorization
- Log responsibly
- Continuously scan for vulnerabilities
Security is ultimately about reducing risk.
Small improvements applied consistently across a codebase can prevent incidents that would otherwise become expensive lessons later.
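As a closing illustration, the path traversal check from section 5 can be wrapped in a small standalone class and exercised against both a legitimate file name and the attack string. This is a sketch only: the class name `PathGuard`, the helper `resolveSafely`, and the `/documents` base directory are illustrative assumptions. Note that `normalize()` is purely lexical, so it does not follow symbolic links; if symlinks under the base directory are a concern, an additional check with `toRealPath()` on an existing file would be needed.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathGuard {
    // Illustrative base directory; made absolute and normalized once up front.
    private static final Path BASE =
        Paths.get("/documents").toAbsolutePath().normalize();

    // Resolves a user-supplied name under BASE and rejects anything that escapes it.
    static Path resolveSafely(String userSupplied) {
        Path resolved = BASE.resolve(userSupplied).normalize();
        if (!resolved.startsWith(BASE)) {
            throw new SecurityException("Invalid file path");
        }
        return resolved;
    }

    public static void main(String[] args) {
        // A plain relative name stays under the base directory.
        System.out.println(resolveSafely("reports/q1.pdf"));

        // The traversal attempt is rejected before any file is opened.
        try {
            resolveSafely("../../../etc/passwd");
        } catch (SecurityException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The same helper also rejects absolute inputs such as `/etc/passwd`, because `resolve` returns the absolute argument unchanged and the `startsWith` check then fails.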
This is the third follow-up to Friday's release post. Saturday's was about how you iterate; yesterday's was about new platform APIs in the core; today's is about a run of pieces that change how you write the structural parts of an app. The pieces are an OpenAPI client generator, a SQLite ORM, JSON and XML mappers, a component binder with validation, build-time SVG and Lottie transcoders, and a declarative router with deep links. All ride on a single build-time codegen pipeline: a Maven-plugin pass that reads annotations or declarative source files at build time and emits typed Java that compiles into your binary. No reflection, no service loader, no Class.forName. The "How it works" section at the end of this post covers the codegen plumbing once you have seen what it powers.
OpenAPI Client Generation
This is the headline of the release for any team that talks to a backend. A new cn1:generate-openapi-client Mojo reads an OpenAPI 3.x JSON spec (a URL or a local file) and writes typed Codename One client code that compiles into your app:
- One @Mapped POJO per components.schemas entry.
- One <Tag>Api.java class per OpenAPI tag, with one fluent method per operation.
- Every method routes through Rest.<verb> + Mappers.toJson + fetchAsMapped / fetchAsMappedList, so the generated surface integrates with the rest of the framework instead of dragging in a separate HTTP stack.
Wire it into the project's pom.xml:
XML
<plugin>
    <groupId>com.codenameone</groupId>
    <artifactId>codenameone-maven-plugin</artifactId>
    <executions>
        <execution>
            <id>petstore-client</id>
            <goals><goal>generate-openapi-client</goal></goals>
            <configuration>
                <specUrl>https://petstore3.swagger.io/api/v3/openapi.json</specUrl>
                <basePackage>com.example.petstore</basePackage>
            </configuration>
        </execution>
    </executions>
</plugin>
mvn generate-sources picks the spec up, downloads it, and writes one file per schema and one per tag under target/generated-sources/.
The Petstore reference spec exercised end-to-end produces six model classes (Pet, Order, Customer, Tag, Category, User) and three API classes (PetApi, StoreApi, UserApi), and the nine generated .class files compile cleanly against codenameone-core. Documented at the OpenAPI codegen Maven goal.
In application code you call the generated Api class the same way you would call any other Java method:
Java
PetApi pets = new PetApi();

// Returns AsyncResource<Pet>; resolves with the deserialised object.
pets.getPetById(42).onResult((pet, err) -> {
    if (err == null) Log.p("Got " + pet.getName());
});

// Returns AsyncResource<List<Pet>>.
pets.findPetsByStatus("available").onResult((list, err) -> {
    if (err == null) {
        for (Pet p : list) Log.p(p.getName());
    }
});

// POST with a request body. addPet takes a Pet, returns a Pet.
Pet candidate = new Pet();
candidate.setName("Mittens");
candidate.setStatus("available");
pets.addPet(candidate).onResult((created, err) -> { /* ... */ });
There is no hand-rolled ConnectionRequest setup, no manual JSON parsing, no string-typed request bodies. The generated client takes a typed Pet, serializes it with Mappers.toJson(...), fires the right HTTP verb, deserializes the response with Mappers.fromJson(...), and surfaces the result through the framework's AsyncResource so your callback fires on the EDT.
For teams who already publish an OpenAPI spec as part of their backend (most modern backend frameworks do this automatically; FastAPI, Spring's springdoc-openapi, NestJS, ASP.NET Core, Go's gnostic), the practical effect is that the mobile client's bindings stay in sync with the backend without anyone hand-writing a single network call. Update the spec, re-run mvn generate-sources, and the new and changed endpoints land in your app as typed Java; the IDE picks them up immediately.
It is the kind of change that is most useful when you do not know you have it: pull a fresh spec, rebuild, and your IDE highlights every place in the codebase that called a renamed endpoint or passed the wrong type to a parameter. SQLite ORM @Entity marks the class; @Id and @Column shape the schema; @DbTransient opts a field out: Java @Entity public class TodoItem { @Id @Column long id; @Column String title; @Column(name = "completed_at") Date completedAt; @DbTransient Object cachedView; } Dao<TodoItem> dao = EntityManager.open("todos.db").dao(TodoItem.class); dao.createTable(); dao.insert(new TodoItem(0, "Read the post", null)); List<TodoItem> open = dao.find("completed_at IS NULL", new Object[] {}); TodoItem byId = dao.findById(42); dao.delete(byId); The generated DAO does the typed work underneath. No reflection in insert; the generated code calls setString(1, e.title) and setLong(2, e.id) directly against the SQLite PreparedStatement. Validation at build time catches missing @Id, fields that look like relationships but are not yet supported, and abstract entity classes; the build fails with a class name and a reason. For JPA/Hibernate developers, the API is intentionally familiar. @Entity, @Id, @Column, and @Transient (here renamed @DbTransient to avoid colliding with java.beans.Transient) carry the same meaning they do under javax.persistence / jakarta.persistence. The EntityManager name is the same. Dao#findById, Dao#findAll, Dao#find(where, params), Dao#insert, Dao#update, Dao#delete line up with the basic JPA repository contract. The query language is plain SQL (there is no JPQL or Criteria DSL), but the annotation surface, the lifecycle, and the runtime methods will feel like a long-lost friend to anyone with server-side Java persistence experience. JSON/XML Mapping @Mapped marks a class as a transferable POJO. @JsonProperty and @XmlElement (plus @XmlRoot, @XmlAttribute, @JsonIgnore, @XmlTransient) shape the wire format. 
The runtime entry points are Mappers.toJson(...), Mappers.fromJson(...), Mappers.toXml(...), Mappers.fromXml(...): Java @Mapped public class User { @JsonProperty("user_id") long id; @JsonProperty String name; @JsonProperty("created_at") Date createdAt; @JsonIgnore String passwordHash; } String json = Mappers.toJson(user); User back = Mappers.fromJson(json, User.class); The same @Mapped POJO is the type the typed Rest helpers accept: Java Rest.get("https://api.example.com/users/42") .fetchAsMapped(User.class) .onResult((user, err) -> { /* ... */ }); Rest.get("https://api.example.com/users") .fetchAsMappedList(User.class) .onResult((users, err) -> { /* ... */ }); Rest.fetchAsJsonList (top-level JSON arrays, no {"root":[...]} envelope trick), JSONWriter (the complement of JSONParser, with fluent builders and streaming variants for Writer and OutputStream), and URLImage.setDefaultBearerToken (auth headers on image fetches) all ship alongside. For JAXB developers, the XML surface (@XmlRoot, @XmlElement, @XmlAttribute, @XmlTransient) is a direct port of the long-established javax.xml.bind.annotation surface. The same model class can be both @XmlRoot-decorated and @JsonProperty-decorated, which gives you a single source of truth for both wire formats. The JSON surface adopts the Jackson convention (@JsonProperty, @JsonIgnore) that nearly every modern JVM JSON binding (Jackson, Moshi, kotlinx-serialization) inherited. Component Binding With Validation The fourth annotation processor on the same pipeline is the component binder. @Bindable marks a model class; @Bind(name = "userField") ties a field to a component on a form by the component's name. 
Field-level validation annotations compose with @Bind on the same field: Java @Bindable public class SignupModel { @Bind(name = "userField") @Required @Length(min = 3) private String user; @Bind(name = "emailField") @Required @Email private String email; @Bind(name = "ageField") @Numeric(min = 13, max = 120) private String age; @Bind(name = "roleField") @ExistIn({ "admin", "editor", "viewer" }) private String role; } The matching form sets a name on each component so the binder can find them: Java TextField user = new TextField(); user.setName("userField"); TextField email = new TextField(); email.setName("emailField"); TextField age = new TextField(); age.setName("ageField"); ComboBox<String> role = new ComboBox<>("admin", "editor", "viewer"); role.setName("roleField"); Button submit = new Button("Sign up"); Form form = new Form("Sign Up", BoxLayout.y()); form.add(user).add(email).add(age).add(role).add(submit); form.show(); SignupModel model = new SignupModel(); Binding binding = Binders.bind(model, form); binding.getValidator().addSubmitButtons(submit); Binding is the handle: refresh() re-reads the model into the components, commit() writes the components back, disconnect() tears the listeners down. Multiple validation annotations on a single field compose via Validator.addConstraint(Component, Constraint...) and GroupConstraint (first failure wins). @Validate(MyClass.class) is the escape hatch for hand-written Constraint implementations. The validation set: @Required, @Length, @Regex, @Email, @Url, @Numeric, @ExistIn, @Validate. The new BindAttr enum lets @Bind target a specific attribute of the component (TEXT, UIID, SELECTED, ...) when the default ("write a String field into the component's text") is not what you want. 
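To make the "first failure wins" composition concrete, here is a minimal plain-Java sketch. It does not use the Codename One Validator, Constraint, or GroupConstraint classes; the Rule interface and the required, minLength, and group helpers are hypothetical names invented for illustration, and the real API may behave differently in its details.

```java
import java.util.Arrays;
import java.util.List;

public class FirstFailureWinsSketch {
    /** Hypothetical stand-in for a constraint: returns null when the value passes. */
    public interface Rule {
        String check(String value);
    }

    /** Rough analogue of @Required. */
    public static Rule required() {
        return v -> (v == null || v.isEmpty()) ? "value is required" : null;
    }

    /** Rough analogue of @Length(min = ...). */
    public static Rule minLength(int min) {
        return v -> (v != null && v.length() >= min)
                ? null
                : "must be at least " + min + " characters";
    }

    /** Runs the rules in declaration order and reports only the first failure. */
    public static Rule group(Rule... rules) {
        List<Rule> ordered = Arrays.asList(rules);
        return v -> {
            for (Rule r : ordered) {
                String error = r.check(v);
                if (error != null) {
                    return error;
                }
            }
            return null;
        };
    }

    public static void main(String[] args) {
        Rule userField = group(required(), minLength(3));
        System.out.println(userField.check(""));    // fails the first rule
        System.out.println(userField.check("ab"));  // passes the first, fails the second
        System.out.println(userField.check("abc")); // passes both
    }
}
```

The order of the rules matters in this sketch: an empty value reports the "required" message and never reaches the length check, which is the behavior a user expects when one field carries several annotations.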
SVG at Build Time Drop an SVG into src/main/css/, alongside theme.css: Shell src/main/css/ theme.css star.svg gradient_circle.svg path_arrow.svg rounded_button.svg wave.svg pro_badge.svg After the next build, every SVG is a regular Codename One Image. An SVG handled by the transcoder is a vector image, but it is still an Image. Everywhere a raster Image works (Label.setIcon, Button.setIcon, BorderLayout.NORTH, the toolbar, a MultiButton's leading icon, a CSS background: url(...) rule), the SVG works too. The difference is that it stays crisp at any size: the same source file is sharp at a 16-point list-row icon, a 64-point hero header, and a 256-point launch screen, on every DPI bucket. A grid of the static SVGs from the hellocodenameone fixture, rendered through the new pipeline: Sizing in Millimeters The SVG transcoder's most useful feature is also the one most easily missed: size every SVG in millimeters from CSS. SVGs in the wild routinely declare odd width / height attributes (a 1024×1024 export of a 24×24 icon, no dimensions at all, design-pixel values from one specific framework). Pinning the rendered size in millimeters sidesteps all of that. CSS HomeIcon { background: url(home.svg); cn1-svg-width: 6mm; cn1-svg-height: 6mm; bg-type: image_scaled_fit; } LogoBanner { background: url(logo.svg); cn1-svg-width: 32mm; cn1-svg-height: 12mm; } A 6 mm icon is 6 mm tall on a 1× desktop, 6 mm on a high-DPI handset, and 6 mm on a 4K tablet. The transcoder routes both values through Display.convertToPixels() at install time, the same way font-size: 3mm already behaves elsewhere in Codename One CSS. No design-pixel guesswork, no DPI bucket to choose, no scaling surprise when the artist re-exports the source SVG at a different resolution. If a project does not use CSS for theming, the two-float constructor on the generated class takes millimeters directly: new com.codename1.generated.svg.Home(6f, 6f). 
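The arithmetic behind millimeter sizing can be sketched in a few lines. This is not the implementation of Display.convertToPixels(); it is the plain physical conversion (25.4 mm per inch), and the class name, method name, and density values are assumptions chosen for illustration.

```java
public class MillimeterSizingSketch {
    private static final double MM_PER_INCH = 25.4;

    /** Converts a physical length in millimeters to whole pixels for a given density. */
    public static int mmToPixels(double mm, double pixelsPerInch) {
        return (int) Math.round(mm * pixelsPerInch / MM_PER_INCH);
    }

    public static void main(String[] args) {
        // Illustrative densities only; they are not tied to specific devices.
        double[] densities = {96, 160, 326, 480};
        for (double ppi : densities) {
            System.out.println(ppi + " ppi -> " + mmToPixels(6, ppi) + " px for a 6 mm icon");
        }
    }
}
```

The pixel count changes with density while the physical size stays fixed, which is why a millimeter value in CSS avoids choosing a DPI bucket by hand.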
Coverage and What We Still Want Feedback On The transcoder is a maven/svg-transcoder/ module that parses SVG with javax.xml StAX. No Batik, no Flamingo, no external dependencies. Coverage targets what real-world icon SVGs use: rect (rounded corners included), circle, ellipse, line, polyline, polygon, the full path grammar (M / L / H / V / C / S / Q / T / A / Z plus relative-coordinate and smooth-curve reflection), groups with affine transforms (translate, scale, rotate, skew, matrix), linear gradients via LinearGradientPaint, fill, stroke, stroke-width, linecap, linejoin, opacity. SMIL animations are supported in the same pipeline: <animate>, <animateTransform> (translate, scale, rotate), and <set>. Time values interpolate against wall-clock time on every paint, with from / to / values / begin / dur / repeatCount / fill="freeze" honored. Text and clip-path landed in the follow-up PR for the static SVG fixtures, and both are visible in the screenshot above (the "Codename One / build-time SVG" wordmark in the rounded button, the "PRO" badge text, and the clip-path-shaped rounded-corner badge underneath). <text> and <tspan> work with single-style fills and transforms; <clipPath> referenced via clip-path="url(#id)" works against rect, circle, and path clip shapes (nested clip refs are ignored). What is still not supported: SVG filter primitives, <mask> (treated as a clip, so alpha masking falls back to opaque), <radialGradient> (falls back to the first-stop color), and CSS-in-SVG (style rules inside the SVG document; the transcoder reads presentation attributes and the inline style="..." attribute, but a <style> element with selectors is not parsed). If you hit an SVG that does not transcode the way you expect, please open an issue at github.com/codenameone/CodenameOne/issues and attach the source file. The fastest way to extend the coverage is for us to run the failing case through the test fixtures and watch the output. 
Every SVG we ship test goldens for started as somebody else's "this doesn't render right" report. Caveat on iOS: The transcoded SVGs use the framework's shape API (fillShape, drawShape, LinearGradientPaint). The full surface is implemented on the Metal renderer. The deprecated GL ES 2 pipeline does not have parity on every operation, so an SVG drawn under ios.metal=false will often render with visible artifacts (missing gradients, clipped fills, distorted paths) rather than the placeholder you might expect. Now that Metal is the default for new iOS builds as of last Friday, this is a non-issue on most apps; if you have explicitly pinned ios.metal=false, expect some visual regressions on SVG content and let us know which. The coverage matrix and troubleshooting are in the SVG Transcoder in the developer guide. Lottie at Build Time The same pipeline carries Lottie. Drop a Bodymovin export into the same src/main/css/: JSON src/main/css/ theme.css pulse.json spinner.json After the next build, both are real Image instances on every platform that exposes the shape API. The same vector-everywhere story as SVG: a Lottie animation renders crisply at any size and slots into any Image slot in the framework. Java Image pulse = Resources.getGlobalResources().getImage("pulse"); Image spinner = Resources.getGlobalResources().getImage("spinner"); Animation runs against wall-clock time on every paint, with no Timer and no allocation in the hot path. A capture of the hellocodenameone Lottie fixture in motion: The Lottie transcoder lives in maven/lottie-transcoder/. It parses Bodymovin JSON with no external dependencies (the framework's built-in JSON parser carries the load) and lowers each file into the same SVGDocument model the SVG path uses. The same JavaCodeGenerator emits the same GeneratedSVGImage subclass, and the same SVGRegistry registers it under the source filename. 
No new Image base class, no new registry, no per-port wiring, since the SVG path's JavaSE reflective load and iOS / Android Stub weaving already cover the new format. Coverage in v1: shape layers (rc / el / sh) with solid fills and strokes; layer transforms (anchor, position, scale, rotation, opacity); animated rotation, position, and scale collapsed to a two-keyframe loop; solid-color layers as filled rects. Most icon-grade Bodymovin exports lower cleanly. Complex character animations from After Effects with image references, masks, and effects do not, and the transcoder logs which layers it dropped so the source of any blank output is obvious. Same ask as for SVG: if a Lottie / Bodymovin file does not transcode the way you expect, please open an issue at github.com/codenameone/CodenameOne/issues and attach the source .json. The transcoder grows one shape family at a time from the cases the community reports. The same iOS caveat applies: the renderer leans on the shape API, so the deprecated GL ES 2 pipeline shows artifacts on the more elaborate Lottie animations. Use the Metal default (now on by default for new iOS builds). Deep Links and Routing Two pieces of plumbing for apps that handle URLs from outside themselves (notification taps, marketing links, share targets, Universal Links from Safari and the equivalent App Links from Chrome on Android). Deep Links Codename One has had deep-link support for a long time through Display.setProperty("AppArg", url). The platform plumbing already writes the incoming URL into that property on cold launch, and an app-resume sets it again on warm launch; reading it back from start() works fine for a small number of patterns. Where the AppArg-only approach gets fragile is consistency. 
The cold and warm paths execute different lifecycle code, the value is a flat string with no parsing, and the trickiest case is the one where a user lands in the middle of the app via a link and then continues to interact: their next navigation needs to compose with the entry point, the back-stack needs to make sense as if they had arrived through the usual flow, and "fall off the edge of the app" on back is a common bug. With a hand-rolled AppArg reader it is easy to miss one of these and ship a half-working flow. This release introduces a typed DeepLink and a single handler that fires for both cold and warm launches:
Java
Display.getInstance().setDeepLinkHandler(link -> {
    // link is a normalised DeepLink: scheme, host, path,
    // segments, query map, fragment. Same shape cold or warm.
    if ("/users".equals(link.path()) && link.segments().size() == 2) {
        showUserDetailForm(link.segments().get(1));
        return true;
    }
    return false;
});
AppArg still works for projects that depend on it, but the new handler is what we recommend going forward. The handler runs on a consistent lifecycle path on both cold and warm starts, and the parsed DeepLink value carries the scheme, host, path segments, query map, and fragment, so app code does not need to roll its own URL parser.
Routing
For projects that handle more than a handful of URL patterns, the second piece is the declarative router in com.codename1.router. We built it on the same build-time codegen pipeline as the ORM and the mappers (the router was actually the first concrete consumer of the new preprocessor), so the two surfaces compose: a deep-link handler that delegates to the router becomes a one-liner. Each form declares its own path with a @Route annotation:
Java
@Route("/") public class HomeForm extends Form { /* ...
*/ } @Route("/users/:id") public class UserDetailForm extends Form { public UserDetailForm(RouteMatch match) { String userId = match.param("id"); // build UI for user `userId` } } @Route("/about") Router.navigate("/users/42") resolves the path, instantiates UserDetailForm, and shows it. The deep-link handler now collapses to: Java Display.getInstance().setDeepLinkHandler(link -> Router.navigate(link.toString())); Each form owns its own routing rule. Adding or moving a screen is a one-class change. The "what screens does this app have, and at what paths?" question is answered by an IDE search for @Route, not by reading every form constructor in the project. For Spring developers, the shape is familiar by design. @Route plays the same role as Spring MVC's @RequestMapping: a class-level declaration that announces "this controller handles URLs of this shape". The :id parameter syntax mirrors Spring's {id} path-variable syntax; RouteMatch.param("id") is the same kind of accessor as Spring's @PathVariable. The mental model carries over from server-side Java with almost no friction. The same recognition is available to anyone with React Router, Vue Router, or Angular Router experience; the :param convention is the cross-framework default. The build-time processor validates that each annotated class extends Form, that the path starts with /, that the constructor is accessible, and that there are no duplicate patterns. Any rule violation fails the build with a class name and a reason, not at runtime with a stack trace. 
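For readers who have not met the :param convention before, the matching step can be sketched in plain Java. This is not the router's generated code; RoutePatternSketch and its match method are hypothetical names, and the sketch ignores query strings, guards, and everything else the real router handles.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RoutePatternSketch {
    /**
     * Matches a path against a pattern such as "/users/:id".
     * Returns the captured parameters, or null when the path does not match.
     */
    public static Map<String, String> match(String pattern, String path) {
        String[] want = pattern.split("/");
        String[] got = path.split("/");
        if (want.length != got.length) {
            return null;
        }
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 0; i < want.length; i++) {
            if (want[i].startsWith(":")) {
                // A ":name" segment captures whatever the path has in that position.
                params.put(want[i].substring(1), got[i]);
            } else if (!want[i].equals(got[i])) {
                return null;
            }
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(match("/users/:id", "/users/42")); // prints {id=42}
        System.out.println(match("/about", "/users/42"));     // prints null
    }
}
```

A literal segment must match exactly, while a segment that starts with a colon captures the value under that name, which is the same role RouteMatch.param("id") plays in the form constructor above.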
The rest of the router surface covers the kind of thing that has become table stakes in modern client routing:
Route guards run before navigation completes and can cancel or redirect.
Per-tab navigation stacks via TabsForm, where each tab keeps its own back stack.
Location listeners so anything in the app can subscribe to "the route changed".
Form.setPopGuard(PopGuard) intercepts hardware back, toolbar back, or Router.pop() with a chance to ask "are you sure?".
Sheet.showForResult() returns an AsyncResource<T> that auto-cancels with null if the user dismisses the sheet.
The API is opt-in. Apps that prefer the existing Form.show() / Form.showBack() flow keep using that; nothing changes. For the link-publishing side, an AasaBuilder emits the iOS apple-app-site-association JSON and an AssetLinksBuilder emits the Android assetlinks.json. The full setup walk-through (entitlements, the Android intent-filter, the .well-known/ upload on your origin server) is at Routing and Deep Links in the developer guide. The JavaScript port bridges the router into window.history so navigating the in-app router pushes a real entry into the browser's session history. Back and forward in the browser drive the router; reloading the page lands at the deep-link URL; sharing the URL out of the address bar takes a colleague to the same in-app location.
How It Works: The Build-Time Codegen Pipeline
Everything above sits on a single Maven-plugin pass. The plugin has an AnnotationProcessor SPI and two new Mojos: cn1:generate-annotation-stubs (in generate-sources) and cn1:process-annotations (in process-classes). The orchestrator ASM-scans target/classes, dispatches to every registered processor, validates the annotated classes, and emits a typed runtime artifact next to each one plus a tiny Index class that registers everything with a public runtime registry. Adding a new processor later is a matter of dropping it into META-INF/services with no orchestrator changes.
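The META-INF/services convention is the one java.util.ServiceLoader reads, so discovery of this kind can be sketched as follows. This assumes the plugin uses ServiceLoader-style lookup, which the text implies but does not state; SketchProcessor and its name() method are hypothetical and are not the plugin's actual SPI.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ProcessorDiscoverySketch {
    /** Hypothetical SPI; the real plugin's interface and method names may differ. */
    public interface SketchProcessor {
        String name();
    }

    /**
     * Collects every provider listed in a META-INF/services file named after
     * the binary name of SketchProcessor. The caller never names a concrete class.
     */
    public static List<String> discover() {
        List<String> names = new ArrayList<>();
        for (SketchProcessor p : ServiceLoader.load(SketchProcessor.class)) {
            names.add(p.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // With no provider file on the classpath the list is empty.
        System.out.println(discover());
    }
}
```

Registering another provider means adding a class and one line in the services file; the discovery loop itself stays unchanged, which is the property the paragraph above describes.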
The reason this runs against bytecode rather than against source text is that the source-regex prototype was scrapped early. The bytecode pass sees the JVM's view of the project (extends Form is a thing the JVM actually knows, not a pattern we have to hope the user wrote a specific way), rule violations come back with class names and reasons, and the build fails fast before any generated .class lands on disk. The infrastructure shares the ASM passes that the BytecodeComplianceMojo's existing String rewrites already use. A small stub source is emitted under target/generated-sources/cn1-annotations/ during generate-sources so application code that references the generated registry resolves at compile time. The real .class overwrites the stub later in process-classes. Standard "compile against a stub, link against the real thing" pattern; it just works inside a single Maven build instead of needing a multi-module split. cn1-core ships a no-op stub of each generated index (RoutesIndex, MappersIndex, BindersIndex, DaosIndex), so application code compiles even when the project has no annotated classes. The build-time processor shadows each stub with the real implementation before packaging. The SVG and Lottie transcoders sit on a parallel pipeline (declarative graphics files in place of annotations), but they emit the same shape of code and obey the same constraints. The practical effect is that the kind of code that historically required reflection at runtime (with all the obfuscation hazards and surprise allocations that come with that) now happens once at build time and produces direct, dead-code-eliminable, rename-safe symbol references. Wrapping Up That closes this release's post series. We already have some pretty big features lined up for this Friday's release post; the headline pieces are the most substantial things to land in months and are worth checking back for. Back to the weekly index.
Managing high-volume message traffic in distributed architectures is crucial. Efficient use of database and CPU resources is also very important. There are structures that allow us to receive messages in batches. The default Spring Kafka "BatchMessageListener" structure addresses this need. However, the processing of these messages often goes through a sequential bottleneck. This article will discuss the structure and usage of Kotlin Coroutines in detail. We will examine how to maximize Kafka message processing performance using Structured Concurrency principles and Resource Throttling techniques.
Architectural Bottleneck: Sequential I/O Blocking
On the current Kafka listener, database or external service calls made for each message directly increase total processing time. If message processing lags behind message arrival and max.poll.interval.ms is exceeded, the consumer is removed from the consumer group. Rebalancing is triggered, and the partitions of that consumer are redistributed to other consumers in the group.
Kotlin
@KafkaListener(topics = ["usage-pool-topic"])
fun usagePoolListener(records: List<ConsumerRecord<String, String>>) {
    records.forEach { record ->
        processRecord(record) // Network latency + DB I/O blocking
    }
}
Solution
1. Batch-Fetch and In-Memory Map Structure
Before any concurrent code is entered, data is retrieved collectively from all necessary entities. Multiple separate queries are converted into a batch query before data processing begins. The N+1 query problem is solved at the application layer. All data is cached once before being broken down into concurrent operations. Having the data cached significantly reduces our reliance on the database. Using the associateBy function, we transform the data into a map structure with O(1) average access time. This allows each concurrent operation to read the data safely from the maps instead of querying the database.
Kotlin
val messages = records.map { objectMapper.readValue(it.value(), UsagePoolRecord::class.java) }
val usagePoolEntities = usagePoolRepository
    .findByIds(messages.map { it.usagePoolId.toBigInteger() })
    .associateBy { it.usagePoolId }
val lockEntities = lockRepository
    .findByUserIds(messages.map { it.userId })
    .associateBy { it.userId }
2. Structured Concurrency Memory Management With Chunking
The chunk structure serves two purposes. It prevents all coroutines from being created at once, which avoids unnecessary memory usage. Each chunk writes to the database only after all of its coroutines have completed, which avoids unnecessary connection pool consumption.
Kotlin
messages.chunked(150).forEach { chunk ->
    // Each chunk of 150 records is processed concurrently
}
Resource Isolation With limitedParallelism
Why limitedParallelism? If the database connection pool has, for example, X connections, keeping the parallelism limit below X prevents "Connection Timeout" errors. The limited dispatcher must be created once and shared: every call to limitedParallelism returns a new, independent view, so creating one per record would not cap anything.
Kotlin
// Create the limited view once so that all coroutines share the same cap of 15.
val dbDispatcher = Dispatchers.IO.limitedParallelism(15)

runBlocking {
    messages.chunked(150).forEach { chunk ->
        val deferredResults = chunk.map { record ->
            // async is a child of the runBlocking scope (structured concurrency).
            async(dbDispatcher) {
                try {
                    processRecord(record, usagePoolEntities, lockEntities)
                } catch (e: Exception) {
                    log.error("Operation error: ${record.key()}", e)
                    buildErrorRecord(record, e)
                }
            }
        }
        val results = deferredResults.awaitAll() // Structured waiting
        collectAndAggregate(results)
    }
}
The Dispatchers.IO.limitedParallelism(X) call limits the number of coroutines running concurrently on that dispatcher view to X, preventing the DB connection pool from being exhausted. Each coroutine returns a result through async. The awaitAll() call waits for all coroutines in the chunk to finish before proceeding to the next step.
runBlocking
This function blocks the calling thread until all concurrent operations are complete. This is the correct approach here because: It ensures that the Kafka consumer remains blocked to maintain its offset commit structure until all records in the batch are processed.
We still benefit from concurrent operation parallelism within the runBlocking block.
3. Thread-Safe Result Structure
After the awaitAll() operation, all results are collected in thread-safe queues. Then a single batch write operation takes place. Using MutableList structures to combine results returned from coroutines processed in parallel can lead to data loss. At this point, lock-free data structures should be preferred. ConcurrentLinkedQueue uses CAS (Compare-And-Swap) algorithms instead of synchronized blocks. This provides superior performance in high-contention write operations.
Why Should We Use ConcurrentLinkedQueue?
Concurrent operations (concurrent functions) perform simultaneous write operations to a shared collection of results. Using MutableList leads to race conditions. ConcurrentLinkedQueue stays safe and performs well under concurrent writes.
Kotlin
data class AggregatedRecords(
    val processedSave: ConcurrentLinkedQueue<ProcessedEntity> = ConcurrentLinkedQueue(),
    val toDelete: ConcurrentLinkedQueue<UsagePoolEntity> = ConcurrentLinkedQueue(),
    val retryQueue: ConcurrentLinkedQueue<RetryEntity> = ConcurrentLinkedQueue()
)
Handling DataIntegrityViolationException is important. When two consumer instances are processing the same record, one of them runs into a unique constraint violation. Instead of making the entire batch fail, the records are saved one by one.
Kotlin
aggregatedRecords.processedSave
    .chunked(150)
    .forEach { batch ->
        try {
            processedRepository.saveAll(batch)
        } catch (e: DataIntegrityViolationException) {
            batch.forEach { record ->
                try {
                    processedRepository.save(record)
                } catch (inner: DataIntegrityViolationException) {
                    // Duplicate record: already saved by another consumer, skip it.
                }
            }
        }
    }
4. Error Tolerance in Write Operations
Batch write (saveAll) operations are performant. However, a "Unique Constraint" error in a single record can cause the entire batch to fail. The following structure is critical to meet Optimistic Locking or Idempotency requirements.
Kotlin
aggregatedRecords.processedSave.chunked(150).forEach { batch ->
    try {
        processedRepository.saveAll(batch)
    } catch (e: DataIntegrityViolationException) {
        // Fallback: Try one by one if batch fails
        batch.forEach { record ->
            try {
                processedRepository.save(record)
            } catch (innerException: DataIntegrityViolationException) {
                log.warn("Duplicate record skipped: ${record.id}")
            }
        }
    }
}
5. Data Flow Diagram
Ingress: The Kafka batch is caught with runBlocking.
Preparation: All necessary context data is retrieved in bulk from the DB.
Execution: Coroutines are started asynchronously in chunks.
Synchronization: The completion of all coroutines is awaited as a barrier point with awaitAll().
Egress: Collected results are made permanent with saveAll.
Performance Analysis and Results
Conclusion
Processing Kafka messages in Spring Boot with Kotlin Coroutines not only increases speed but also improves code readability and makes resource management deterministic (predictable). The use of runBlocking allows us to build a bridge between the blocking Kafka consumer thread and the suspending world without disrupting Kafka's offset management mechanism.
Dependencies
XML
<dependency>
    <groupId>org.jetbrains.kotlinx</groupId>
    <artifactId>kotlinx-coroutines-core</artifactId>
    <version>1.7.3</version>
</dependency>
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
This is the first follow-up to Friday's release post, and it covers the two changes from this release that affect how you iterate on a Codename One app rather than what the app itself does: on-device debugging that treats Java as Java on a real iPhone or a real Android device, and standard JUnit 5 against the JavaSE simulator. The first is the one we have been wanting for a long time, and is the one that takes the most explaining, so most of the post is about it.
On-Device Debugging That Treats Java as Java
Codename One has always supported on-device debugging in the strict technical sense. You could attach Xcode to a .ipa, you could attach Android Studio to a running APK, you could read the native call stack, you could step through Objective-C or the C that ParparVM emits. What you could not do was set a breakpoint in MyForm.java, hit it on a real iPhone, and inspect a Java field on a Java object as a Java object. You also could not debug an iOS app without a Mac in the loop somewhere, because the only debugger that understood the binary was Xcode. The translation step between the Java you wrote and the C that ParparVM produces left no way back across the gap on the device. PR #4999 (iOS) and PR #5012 (Android) close that gap. As of this week, any JDWP-speaking debugger (IntelliJ IDEA, jdb, VS Code's Java Debugger, Eclipse, NetBeans) can attach to a Codename One app and treat the running process as a JVM.
Supported targets:
iOS
The iOS Simulator (requires a Mac, because the iOS Simulator only runs on a Mac).
A real iPhone reached over Wi-Fi from the developer machine on the same network.
You do not need a local Mac to debug on a real iPhone. The Codename One build cloud runs the iOS build for you and produces a signed .ipa; install it on your iPhone the usual way (TestFlight, ad-hoc, or the standard Build Cloud install link), and the JDWP attach over Wi-Fi works from a Linux or Windows IDE just as well as from a Mac.
The Mac is only required for the local Xcode build path and for running the iOS Simulator.
Android
The Android emulator.
A real Android phone over USB.
A real Android phone over wireless adb.
The Android attach uses standard adb, so you need the Android SDK platform tools installed on the developer machine. Those are available on macOS, Linux, and Windows, so any of the three is fine for Android debugging.
What It Looks Like
A breakpoint inside an iOS app, hit on the iOS Simulator next to IntelliJ IDEA: The same Debug tool window you use for any other Java project. The frames panel on the left has the full Java call stack. The Variables panel shows this and the locals as Java values, with the same drill-down you would get on a regular JVM. The simulator on the right is the real iOS app, paused at the breakpoint, waiting for the next step.
How the Pieces Fit Together
On iOS, the IDE never talks to the device directly. The CN1 Debug Proxy is a small Java process you run on your developer machine. It binds two TCP ports: one for the iOS app to dial into using the CN1 wire protocol, and one that speaks standard JDWP for the IDE. The IDE sees a normal remote JVM. The iOS app sees a debug proxy. The proxy translates between the two and walks the ParparVM struct layout so Java fields, method calls, and values round-trip cleanly in both directions. On Android, the proxy is unnecessary. Dalvik and ART implement JDWP themselves, so IntelliJ attaches directly to the device through adb's built-in JDWP forwarder. The Maven plugin's new cn1:android-on-device-debugging goal does the adb orchestration and the port forwarding for you. A capability difference between the two platforms worth knowing up front: on Android, a native interface's Impl class is regular Java, so the JDWP attach steps through it the same way it steps through any other class in your project. On iOS the Impl is Objective-C, which JDWP does not speak, so you cannot step through it from the IDE.
You can still step through the Codename One framework code and your own Java up to and through the native-interface call, and you can inspect the value the call returns; the body of the Objective-C method is the only thing that is opaque from the JDWP side. Attach Xcode in parallel if you need to step through the Objective-C as well.
Tutorial: IntelliJ + iOS
The Codename One archetype now generates two run configurations under an On-Device Debug folder in the IntelliJ run-config dropdown: CN1 Debug Proxy and CN1 Attach iOS. The tutorial below assumes a project generated from the Initializr recently enough to have those. If you have an older project, generate a new project with the Initializr and copy over the .idea directory and the Maven pom.xml files.
1. Enable the Build Hints
Open common/codenameone_settings.properties and uncomment the four lines the archetype generated:
Properties files
ios.onDeviceDebug=true
ios.onDeviceDebug.proxyHost=127.0.0.1
ios.onDeviceDebug.proxyPort=55333
ios.onDeviceDebug.waitForAttach=true
ios.onDeviceDebug=true flips the iOS build into the instrumented variant. The next two configure the proxy connection. The fourth hint, ios.onDeviceDebug.waitForAttach=true, is the block-on-load option, and we recommend leaving it on. With it enabled, the iOS app shows a "Waiting for debugger" overlay at launch and does not progress past Display.init until the proxy issues its first resume. The recommendation is mostly about making the on-device-debug variant visible. Without the overlay it is easy to launch an on-device-debug build expecting the debugger to attach and not realize it is silently waiting for a proxy that is not running, and it is also easy to mistake an on-device-debug build for a regular build and then be surprised when it does not perform as smoothly as the release variant. The overlay rules out both of those. For a physical iPhone the proxyHost value should be the laptop's LAN IP (run ifconfig | grep "inet " to find it) rather than 127.0.0.1.
The iOS Simulator can always use 127.0.0.1.
2. Build the iOS App
Either path works:
Local Xcode build (mvn cn1:buildIosXcodeProject) and then run from Xcode.
Cloud build for a real device (mvn cn1:buildIosOnDeviceDebug) and install the resulting .ipa.
Both produce an iOS binary instrumented for on-device debugging because the build hint is set.
3. Start the Proxy
In IntelliJ, pick CN1 Debug Proxy from the run-config dropdown and click the green Run button (not the bug icon; Debug on this config would attach IntelliJ to the proxy itself, which is not what you want). The Run tool window shows:
Plain Text
On-device-debug proxy starting:
  symbols : .../cn1-symbols.txt
  device  : listening on tcp://0.0.0.0:55333
  jdwp    : listening on tcp://0.0.0.0:8000
[device] listening on port 55333 for ParparVM app to dial in
When the [jdwp] line appears, the proxy is ready.
4. Attach the Debugger
Switch the run-config dropdown to CN1 Attach iOS and click the Debug button. IntelliJ connects to localhost:8000 and opens its standard Debug tool window. You can now set breakpoints anywhere in your Java code or in the framework.
5. Launch the App
Launch the iOS app under the iOS Simulator (from Xcode) or on the tethered device. With waitForAttach=true it pauses at the "Waiting for debugger" overlay until the proxy issues its first resume. Hit Resume on the IntelliJ Debug toolbar; the app proceeds, and your breakpoints fire as the app exercises them. The proxy's Run window is also your device console. Anything the app writes to System.out, Log.p, printf, or NSLog from native code is forwarded to the proxy and printed in the CN1 Debug Proxy Run window with a [device] prefix. This is genuinely useful and is one fewer thing you need Xcode for. The caveat is that the forwarding starts when the proxy connection is established, so output written during the very first millisecond of process launch (before Display.init) is not always captured.
If you need every byte from t=0, attach Xcode's console for that specific run.

Tutorial: IntelliJ + Android

Android is simpler because the proxy is not needed. The archetype generates two run configurations under the same On-Device Debug folder: CN1 Android On-Device Debug (Maven, builds and installs the APK and forwards JDWP) and CN1 Attach Android (Remote JVM Debug at localhost:5005).

1. Enable the Build Hint

In common/codenameone_settings.properties:

Properties files
android.onDeviceDebug=true

This single hint flips the manifest to debuggable="true" and turns R8 / Proguard off for this build. Release builds without the hint are unaffected.

2. Run CN1 Android On-Device Debug

This configuration picks up the hint, builds the APK, installs it on the connected device or emulator, sets the debug-app for wait-for-attach, launches the Activity, forwards JDWP to localhost:5005, and streams logcat --pid=<pid> into the Run window with a [device] prefix.

For wireless adb, pass -Dcn1.android.onDeviceDebug.wireless=<ip:port> and the goal will adb connect before installing. Both the Android 11+ adb pair flow and the legacy adb tcpip flow work.

3. Attach the Debugger

Switch to CN1 Attach Android and click Debug. IntelliJ connects to localhost:5005. Set breakpoints anywhere; they fire when exercised. Source resolution covers both the codenameone-core and codenameone-android sources jars, so breakpoints inside the framework or inside the Android port resolve to the right files.

On Android, native interfaces are themselves Java, so a breakpoint inside the Impl class of your own native interface fires just like a breakpoint anywhere else in your code; you can step through the implementation, inspect locals, and evaluate expressions the same way.

The dev guide has the full reference, including the wireless-pairing flows, the VS Code and Eclipse equivalents, and a troubleshooting section: iOS on-device debugging and Android on-device debugging.
When to Use It (and When Not To) For most bugs, the JavaSE simulator is still, by a large margin, the fastest loop. Reach for on-device debugging when the bug is platform-specific: ParparVM-specific threading, an iOS-only layout glitch under the modern native theme, a real-radio Bluetooth interaction, a Touch ID gate, an Android-only manifest interaction, anything that only reproduces under iOS background memory pressure. The kind of bug that previously sent you reaching for Log.p and a rebuild loop. That bug now has a debugger pointed at it. JUnit 5 Against the Simulator The other change in this release is the new JUnit 5 integration in the JavaSE port (PR #5032). To be clear about what this is: it is standard JUnit 5. There is no fork of JUnit in com.codename1.testing.junit. That package holds a small set of annotations and a CodenameOneExtension that plugs into the regular JUnit Jupiter lifecycle. You write @Test methods using org.junit.jupiter.api.Test, you assert with org.junit.jupiter.api.Assertions, and your IDE's native test runner picks them up the way it does on any other Java project. Why a separate integration at all? The legacy com.codename1.testing.AbstractTest framework, driven by the cn1:test Maven goal, still exists and is still the only way to run tests on a real iOS or Android device (JUnit Jupiter is not available on ParparVM). The trade-off is that AbstractTest tests have to compile under the Codename One device subset, with no reflection, no java.net.http, no java.nio.file, no Mockito, no AssertJ, no assertThrows. JUnit-style tests run only on the JavaSE simulator JVM, but the JVM is a regular JVM, so reflection, Mockito, AssertJ, and parameterized tests are all available. Both styles coexist in the same project under common/src/test/java. You pick per test class. The runners discover disjoint sets (cn1:test looks for UnitTest implementers; Surefire looks for @Test methods), so a mvn install runs both passes in the same phase without overlap. 
A Minimal Test

Tests live in common/src/test/java. The shape most apps want is one that boots the project's app class through the same init / start sequence the simulator uses, then asserts against the form the app actually opens:

Java
package com.example.myapp;

import com.codename1.testing.junit.CodenameOneTest;
import com.codename1.testing.junit.RunOnEdt;
import com.codename1.ui.CN;
import com.codename1.ui.Display;
import com.codename1.ui.Form;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

@CodenameOneTest
class GreetingFormTest {
    @Test
    @RunOnEdt
    void formShowsExpectedTitle() {
        MyAppName app = new MyAppName();
        app.init(null);
        app.start();
        assertEquals("Hi World", Display.getInstance().getCurrent().getTitle());
        assertTrue(CN.isEdt(), "@RunOnEdt method runs on the Codename One EDT");
    }
}

That is more useful than constructing a Form directly in the test because it exercises the same startup path the simulator runs. The assertions check the form your app opens, not a form the test wrote.

The natural way to run it is from the IntelliJ gutter. Click the green icon next to the class declaration, and the results land in the standard Run tool window. Click the green icon next to a specific @Test method to run just that method. The same flow works in VS Code's Test Explorer and in Eclipse's JUnit view.

If you prefer the command line:

Shell
mvn -Ptest test # run the JUnit suite
mvn -Ptest test -Dtest=GreetingFormTest # one class
mvn -Ptest test -Dtest=GreetingFormTest#formShowsExpectedTitle

@CodenameOneTest is the class-level entry point. It wires the simulator extension into the JUnit Jupiter lifecycle, boots Display.init(null) once per JVM (idempotent, so subsequent classes share the same Display), and skips the class with a TestAbortedException if the JVM is genuinely headless (so CI runners that have no display do not poison the rest of the run).
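The test above is marked @RunOnEdt, so its body runs on the Codename One EDT rather than on the JUnit worker thread. As a rough illustration of that general pattern, and not of how the Codename One extension is actually written, the sketch below runs a body on a dedicated thread and surfaces its failure on the calling thread. The class name, the "fake-edt" thread, and the runOnUiThread helper are all hypothetical; only JDK classes are used.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class UiThreadDispatchSketch {
    // Stand-in for a UI event thread. This is NOT the Codename One EDT.
    private static final ExecutorService UI_THREAD =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "fake-edt");
                t.setDaemon(true); // do not keep the JVM alive after main returns
                return t;
            });

    /** Runs body on the stand-in UI thread, waits for it, and rethrows its failure on the caller. */
    static void runOnUiThread(Runnable body) {
        Future<?> done = UI_THREAD.submit(body);
        try {
            done.get(); // block the calling (test) thread until the body finishes
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            // Unwrap so the caller sees the original assertion or exception type.
            if (cause instanceof RuntimeException) throw (RuntimeException) cause;
            if (cause instanceof Error) throw (Error) cause;
            throw new IllegalStateException(cause);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for UI thread", e);
        }
    }

    public static void main(String[] args) {
        runOnUiThread(() ->
                System.out.println("body ran on: " + Thread.currentThread().getName()));
        try {
            runOnUiThread(() -> { throw new AssertionError("title mismatch"); });
        } catch (AssertionError e) {
            System.out.println("caller saw: " + e.getMessage());
        }
    }
}
```

Unwrapping the ExecutionException is the design choice that matters here: the caller receives the original AssertionError instead of a wrapper, which is what lets a test runner report the failure as an ordinary assertion failure.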
@RunOnEdt dispatches the test body through CN.callSerially, which is what you want any time the body touches UI state. It rethrows the body's exceptions on the JUnit thread so the stack trace stays clickable in the IDE. Place it on the method for one test, or on the class to apply it to every test.

A Couple More Common Cases

A test that exercises a plain validator, with no UI involved at all:

Java
@CodenameOneTest
class EmailValidatorTest {
    @Test
    void rejectsEmptyString() {
        assertFalse(new EmailValidator().isValid(""));
    }

    @Test
    void acceptsCommonAddress() {
        assertTrue(new EmailValidator().isValid("[email protected]"));
    }
}

This is the "pure model code" shape. No @RunOnEdt, no UI, runs on the JUnit worker thread, fast.

A test of a form under a specific visual configuration:

Java
@CodenameOneTest
class GreetingFormVisualTest {
    @Test
    @RunOnEdt
    @DarkMode
    @LargerText(scale = 1.6f)
    void titleStillFitsInDarkModeAtAccessibilityScale() {
        new GreetingForm().show();
        Form current = Display.getInstance().getCurrent();
        assertEquals("Hello", current.getTitle());
        assertTrue(current.getPreferredW() <= Display.getInstance().getDisplayWidth());
    }
}

The visual-config annotations (@Theme, @DarkMode, @LargerText, @Orientation, @RTL) apply on the EDT in one batch, followed by a single theme refresh, so the test body sees the simulator in the exact configuration you asked for without flicker.

A test that injects a custom property for the duration of one method:

Java
@Test
@RunOnEdt
@SimulatorProperty(name = "feature.flag", value = "on")
void newCodePathRunsWhenFlagIsOn() {
    // Display.getProperty("feature.flag", "off") returns "on" here
    runFeature();
    assertEquals("expected", Display.getInstance().getCurrent().getTitle());
}

Class-level @SimulatorProperty applies to every method in the class. Method-level overrides class-level. Use the container @SimulatorProperties for more than one (the package source level rules out @Repeatable).
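To make the "container annotation instead of @Repeatable" and "method-level overrides class-level" rules concrete, here is a minimal, self-contained sketch of the general pattern. It does not use the Codename One annotations: Prop, Props, SampleTest, and effective are hypothetical names invented for this illustration, and the real @SimulatorProperties may be shaped differently.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

public class PropertyAnnotationSketch {
    // Hypothetical single-property annotation (stand-in, not the Codename One type).
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface Prop { String name(); String value(); }

    // Hypothetical explicit container, used where @Repeatable is not available.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface Props { Prop[] value(); }

    @Props({
        @Prop(name = "feature.flag", value = "off"),
        @Prop(name = "locale", value = "en")
    })
    static class SampleTest {
        @Prop(name = "feature.flag", value = "on")
        void flagOn() {}

        void defaults() {}
    }

    /** Merges class-level properties first, then method-level ones so they win on conflict. */
    static Map<String, String> effective(Class<?> type, String methodName)
            throws NoSuchMethodException {
        Method m = type.getDeclaredMethod(methodName);
        Map<String, String> out = new LinkedHashMap<>();
        collect(type.getAnnotation(Props.class), type.getAnnotation(Prop.class), out);
        collect(m.getAnnotation(Props.class), m.getAnnotation(Prop.class), out);
        return out;
    }

    private static void collect(Props many, Prop one, Map<String, String> out) {
        if (many != null) {
            for (Prop p : many.value()) out.put(p.name(), p.value());
        }
        if (one != null) out.put(one.name(), one.value());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(effective(SampleTest.class, "flagOn"));   // {feature.flag=on, locale=en}
        System.out.println(effective(SampleTest.class, "defaults")); // {feature.flag=off, locale=en}
    }
}
```

Applying the class-level values first and the method-level values second is what gives the override behavior: the later put replaces the earlier value for the same key.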
The full reference, including the dependency-block XML for common/pom.xml and javase/pom.xml and the @Theme / @Orientation / @RTL details, is at Testing with JUnit 5 in the developer guide.

Wrapping Up

That is the workflow half of this release. Tomorrow's post covers the new platform APIs that moved into the core this week: AI and OAuth/OIDC are the headline pieces, with wifi/connectivity and a few smaller items alongside them.
This is the first article in a 6-part series on building practical, responsible AI audit workflows with RAI Audit Kit, an open-source Python package suite. The series will move from foundational AI systems to more advanced and production-oriented audit workflows:

1. Launching RAI Audit Kit – why evidence-grade responsible AI audits matter
2. Auditing ML systems – fairness, drift, data quality, and robustness
3. Auditing deep learning systems – image models, medical imaging, robustness, and explainability
4. Auditing LLM and RAG systems – prompt injection, faithfulness, citations, and retrieval security
5. Auditing AI agents – tool use, memory, permissions, and trace safety
6. Adding audit gates to CI/CD – turning audit results into engineering controls

This first article introduces the project, the problem it is designed to solve, and how the package suite is structured.

Why Responsible AI Audits Need Better Tooling

AI systems are becoming more complex. A few years ago, many teams mainly worried about model accuracy. Today, the picture is much broader. Modern AI systems may include tabular machine learning models, deep learning pipelines, LLM applications, RAG systems, and AI agents that call tools or use memory.

That means AI evaluation can no longer stop at: “Is the model accurate?” A better question is: “Can we show evidence that this AI system was evaluated for fairness, robustness, drift, data quality, safety, security, and traceability?”

In many teams, this evidence is scattered across notebooks, scripts, screenshots, spreadsheets, and manual review documents. That makes audits hard to reproduce and harder to compare across versions. Responsible AI needs to become part of normal engineering workflows. That is why I built the RAI Audit Kit.

What Is the RAI Audit Kit?

RAI Audit Kit is an open-source Python package suite for responsible, secure, and trustworthy AI audits.
The goal is to help developers and AI teams run repeatable audits, generate structured findings, preserve evidence, and export useful reports. It is designed to support different types of AI systems, including:

- Classical machine learning
- Deep learning
- LLM applications
- RAG systems
- Agentic AI workflows

The package can help generate outputs such as findings, evidence manifests, model cards, audit reports, and CI/CD-friendly results.

Install:

PowerShell
pip install rai-audit-kit

Full install:

PowerShell
pip install "rai-audit-kit[all]"

Package Architecture

RAI Audit Kit is organized as a suite of smaller packages:

Package | Purpose
rai-audit-core | Reports, findings, evidence, model cards, audit history, and CI gates
rai-audit-ml | Fairness, drift, data quality, and robustness checks for tabular ML
rai-audit-dl | Deep learning, image, medical imaging, robustness, and explainability audits
rai-audit-llm | LLM and RAG audits for prompt injection, toxicity, faithfulness, citations, and retrieval security
rai-audit-agents | Agent audits for tools, memory, permissions, prompt injection, and trace behavior
rai-audit-kit | Meta-package for unified installation and CLI usage

The structure is modular because responsible AI is not a single problem. A tabular ML system has different risks from a deep learning model. A RAG application has different risks from an autonomous agent. The suite is designed to keep those workflows connected while still allowing each package to focus on its own risk area.

Quick Start

A basic CLI workflow looks like this:

PowerShell
rai-audit init --project responsible-ai-demo
rai-audit run --config audit.yaml

For tabular ML, the Python API can look like this:

Python
from rai_audit.ml import ClassificationAudit

report = ClassificationAudit(
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sensitive_df,
).run()

report.to_html("audit_report.html")

The goal is to move from one-off evaluation scripts to repeatable audit runs that produce reviewable artifacts.

What Can It Audit?
RAI Audit Kit is designed around the idea that different AI systems need different audit lenses.

- For machine learning systems, the focus is on fairness, drift, data quality, and robustness. A model may perform well overall but still fail for certain subgroups or become unreliable after deployment.
- For deep learning systems, especially image and medical imaging models, the focus shifts toward robustness, explainability, patient leakage, site-level differences, and class-level performance.
- For LLM and RAG systems, the audit scope expands to prompt injection, unsafe output, toxicity, faithfulness, citation quality, retrieval quality, and retrieval security.
- For AI agents, the focus becomes tool use, memory, permissions, trace completeness, and prompt injection through external sources such as tools, webpages, retrieval systems, or email content.

This article will not go deep into each area. Each one will be covered separately in the rest of the series.

Why Evidence Matters

Responsible AI audits should not disappear inside notebooks. A useful audit should answer:

- What checks were run?
- What data or predictions were evaluated?
- What findings were generated?
- What evidence supports each finding?
- Which artifacts were exported?
- Can the audit be repeated later?
- Can this be integrated into CI/CD?

This evidence-first mindset is one of the main ideas behind the RAI Audit Kit. Reports can be exported in formats such as HTML, Markdown, and JSON. This makes the results useful for developers, reviewers, governance teams, and automation workflows.

A simple audit flow may look like this:

Plain Text
Run evaluation
↓
Run responsible AI audit
↓
Generate findings
↓
Preserve evidence
↓
Export reports
↓
Review or gate deployment

This does not replace human judgment. It gives reviewers better evidence to work with.

Not a Compliance Shortcut

It is important to be clear about the scope. RAI Audit Kit is a technical audit and reporting toolkit.
It can help generate structured evidence and standards-oriented summaries, but it does not automatically certify that a system is compliant with any law, regulation, or internal policy. The goal is to support better review, not replace legal review, domain expertise, risk management, or organizational accountability. Responsible AI tools should help teams ask better questions and preserve better evidence. They should not create false confidence.

Why This Project Matters

Responsible AI needs practical engineering tools. Teams should be able to audit models, preserve evidence, compare results, and include risk checks in their development workflow. RAI Audit Kit is an early step in that direction. It brings together audits for ML, deep learning, LLMs, RAG systems, and AI agents under one Python suite.

The core idea is simple: Responsible AI should be repeatable, evidence-backed, and built into the way we engineer AI systems.

What’s Next in This Series

In the next article, I will focus on auditing machine learning systems for fairness, drift, data quality, and robustness using the RAI Audit Kit. We will look at why accuracy alone is not enough, how subgroup performance can hide model risk, and how audit outputs can make ML review more structured and repeatable.

Project Links

- GitHub: https://github.com/SaiTeja-Erukude/rai-audit
- Install: pip install rai-audit-kit

If you work on responsible AI, AI safety, LLM security, RAG systems, agentic AI, or MLOps, I would love feedback, ideas, and contributions.
The MovieManager project has been updated to use JDK 25 and the AOT cache from Project Leyden. Project Leyden is part of the OpenJDK project and provides cached linking and cached performance statistics. That means the time spent linking at startup is moved to build time, and the statistics are created during a training run at build time as well. Because of that, the JVM loads the needed classes already linked and starts compiling the hot code paths immediately. The MovieManager application starts in less than half the time with these optimizations, without any code changes.

All these advantages come with preconditions:

- Exactly the same JVM version at build time, training time, and run time
- The same OS (Linux is used here) and libc at all steps (no Alpine-based Docker images)
- The same CPU architecture, for example, AMD64 or ARM64

The steps to use Project Leyden:

1. Build the Spring Boot application
2. Extract the Spring Boot application
3. Do a training run with the extracted application to create the AOT cache
4. Create the Docker image with the extracted application and the AOT cache

Building and Training the Application

The first step is to build the Spring Boot jar. The MovieManager project has an integrated build that builds the Angular frontend and the Spring Boot backend with this Maven command:

Shell
./mvnw clean install -Ddocker=true -Dnpm.test.script=test-chromium

Project Leyden does not support Spring Boot jars. The jar has to be extracted to help Project Leyden find the library jars the project uses. To do that, this command needs to be used:

Shell
java -Djarmode=tools -jar backend/target/moviemanager-backend-0.0.1-SNAPSHOT.jar extract --destination extracted

The result is the directory ‘extracted’ with the application jar and a sub-directory ‘lib’ that contains the used libraries.

The second step is to create the AOT cache. To do that, the application has to run in production conditions. That means using a real PostgreSQL database with the database driver.
That enables the JDK to record all the needed classes of the project and to create realistic performance statistics for the code compilation. To do this, a PostgreSQL database has to be started (done here in a Docker container), and the application has to do the full startup. These commands are needed:

Shell
docker pull postgres:13
docker run --name local-postgres -e POSTGRES_PASSWORD=sven1 -e POSTGRES_USER=sven1 -e POSTGRES_DB=movies -p 5432:5432 -d postgres:13
java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:+UseCompressedOops \
  -XX:+UseCompactObjectHeaders -XX:+ExitOnOutOfMemoryError \
  -XX:MaxDirectMemorySize=64m -XX:+UseStringDeduplication \
  -Xlog:aot -XX:AOTCacheOutput=app.aot \
  -Dspring.context.exit=onRefresh \
  -Djava.security.egd=file:/dev/./urandom \
  -jar extracted/moviemanager-backend-0.0.1-SNAPSHOT.jar --spring.profiles.active=prod

The Java command runs the application with the parameter ‘-Dspring.context.exit=onRefresh’, which makes Spring Boot do the full startup and then exit. The parameters ‘-Xlog:aot -XX:AOTCacheOutput=app.aot’ enable the logging of the AOT process and the creation of ‘app.aot’, which is the AOT cache. The AOT cache contains everything that is needed for a fast startup of the application. For the AOT cache to also contain information that improves production performance, the application would have to start up and process realistic production requests during the training run. That is beyond the scope of this article.

The third step is to test the new application setup:

Shell
java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:+UseCompressedOops \
  -XX:+UseCompactObjectHeaders -XX:+ExitOnOutOfMemoryError \
  -XX:MaxDirectMemorySize=64m -XX:+UseStringDeduplication \
  -Xlog:class+path=info -XX:AOTCache=app.aot -Xlog:aot \
  -Djava.security.egd=file:/dev/./urandom \
  -jar extracted/moviemanager-backend-0.0.1-SNAPSHOT.jar --spring.profiles.active=prod

The start-up time of the new setup with the AOT cache can be compared to the start-up time of the Spring Boot jar.
On a medium-powered laptop, the times are:

- 9 seconds for the Spring Boot jar
- 3.5 seconds for the new setup with the AOT cache

Creating a Docker Image

To use the application in production, it needs to be packaged into a Docker image. The Docker image needs to contain the extracted application setup and the AOT cache. The base image needs to have exactly the same JDK version, OS, and libc. That means small base images like Alpine cannot be used. The created image cannot be small because it contains 180 MB of AOT cache and a larger base image. This can be done with this Dockerfile:

Dockerfile
FROM eclipse-temurin:25.0.3_9-jdk-jammy
WORKDIR /application
ARG JAR_FILE=extracted/*.jar
COPY ${JAR_FILE} moviemanager-backend-0.0.1-SNAPSHOT.jar
COPY extracted/ ./
COPY app.aot app.aot
ENV JAVA_OPTS="-XX:+UseG1GC \
  -XX:MaxGCPauseMillis=50 \
  -XX:+UseCompressedOops \
  -XX:+UseCompactObjectHeaders \
  -XX:+ExitOnOutOfMemoryError \
  -XX:MaxDirectMemorySize=64m \
  -XX:+UseStringDeduplication"
ENTRYPOINT exec java $JAVA_OPTS -XX:+AOTClassLinking \
  -XX:AOTCache=app.aot \
  -Xlog:class+path=info \
  -Djava.security.egd=file:/dev/./urandom \
  -jar moviemanager-backend-0.0.1-SNAPSHOT.jar

It copies the new application setup into the image and adds the AOT cache. The name of the application jar is stored in the AOT cache and has to be exactly the same as during the creation of the AOT cache. The ‘JAVA_OPTS’ also have to be the same. If the JDK version in the build environment changes, the version of the base image has to be adjusted accordingly. The parameter ‘-Xlog:class+path=info’ makes analyzing AOT problems much easier.

The Docker image size is 705 MB. That makes it about double the size of an image with a Spring Boot jar and an Alpine-based JDK image.

Creating a Build Pipeline

Creating Docker images for an application by hand is unsustainable in a production environment. A build pipeline is needed.
The MovieManager project is hosted on GitHub; because of that, the project uses a GitHub Workflow as a build pipeline. The complete code for the build pipeline is in the script. The steps of the GitHub pipeline can be recreated in other environments too.

The first step is to set up the PostgreSQL database service to be used in this build:

YAML
jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    env:
      POSTGRES_URL: jdbc:postgresql://localhost:5432/movies
    services:
      postgres:
        image: postgres:latest
        env:
          POSTGRES_USER: sven1
          POSTGRES_PASSWORD: sven1
          POSTGRES_DB: movies
        ports:
          - 5432:5432
        options: >-
          --health-cmd="pg_isready -U sven1 -d movies"
          --health-interval=10s
          --health-timeout=5s
          --health-retries=5

The commands set up the PostgreSQL service in the build pipeline with user, password, database name, and port. The ‘POSTGRES_URL’ is set to access the database later.

The second step is to check out the project:

YAML
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

It checks out the contents of the master branch.

The third step is to provide the JDK:

YAML
      - name: Setup Java JDK
        uses: actions/setup-java@v3
        with:
          distribution: 'temurin'
          java-version: 25

JDK version 25 is the minimum needed to use Project Leyden with linking and performance statistics.

The fourth step builds the Spring Boot jar:

YAML
      - name: Build with Maven
        if: matrix.language == 'java'
        run: |
          ./mvnw clean install -Ddocker=true

That is the Maven command to build the project.

The fifth step is to find the Spring Boot jar:

YAML
      - name: Find fat jar
        if: matrix.language == 'java'
        id: jar
        run: |
          JAR_PATH=$(find ./backend/target -type f -name "*SNAPSHOT.jar" | head -n 1)
          echo "Found JAR: $JAR_PATH"
          echo "jar=$JAR_PATH" >> $GITHUB_OUTPUT

The sixth step is to extract the Spring Boot jar:

YAML
      - name: Unpack fat jar
        if: matrix.language == 'java'
        id: UNPACK
        run: |
          java -Djarmode=tools -jar ${{ steps.jar.outputs.jar }} extract --destination extracted
          EXTRACTED_PATH=$(find . -type d -name "extracted" | head -n 1)
          echo "Found directory: $EXTRACTED_PATH"
          echo "extracted=$EXTRACTED_PATH" >> $GITHUB_OUTPUT

The seventh step is to get the name of the extracted application jar:

YAML
      - name: find extracted jar
        if: matrix.language == 'java'
        id: EXTRACT
        run: |
          EXTRACTED_JAR=$(find "${{ steps.UNPACK.outputs.extracted }}" -type f -name "*.jar" | head -n 1)
          EXTRACTED_JAR=${EXTRACTED_JAR#./}
          echo "Found extracted JAR: $EXTRACTED_JAR"
          echo "extracted=$EXTRACTED_JAR" >> $GITHUB_OUTPUT

The eighth step is to create the AOT cache:

YAML
      - name: Create AOT cache
        if: matrix.language == 'java'
        id: AOT
        env:
          JAVA_TOOL_OPTIONS: ""
          _JAVA_OPTIONS: ""
          JDK_JAVA_OPTIONS: ""
        run: |
          EXTRACTED_JAR="${{ steps.EXTRACT.outputs.extracted }}"
          echo "jar=$EXTRACTED_JAR"
          echo "JAVA_TOOL_OPTIONS=$JAVA_TOOL_OPTIONS"
          echo "_JAVA_OPTIONS=$_JAVA_OPTIONS"
          echo "JDK_JAVA_OPTIONS=$JDK_JAVA_OPTIONS"
          JAVA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:+UseCompressedOops -XX:+UseCompactObjectHeaders -XX:+ExitOnOutOfMemoryError -XX:MaxDirectMemorySize=64m -XX:+UseStringDeduplication"
          java $JAVA_OPTS \
            -XX:+AOTClassLinking \
            -XX:AOTCacheOutput=app.aot \
            -Xlog:aot \
            -Dspring.context.exit=onRefresh \
            -Dspring.datasource.url="${{ env.POSTGRES_URL }}" \
            -Dspring.profiles.active=prod \
            -jar "$EXTRACTED_JAR" || echo "AOT Training finished with exit code $?"

This runs the application startup with the PostgreSQL database to create the AOT cache.

The ninth step shows the exact JDK version used in the AOT cache generation:

YAML
      - name: Show Jdk version
        if: matrix.language == 'java'
        id: JDK
        run: |
          JDK_VERSION=$(java -version 2>&1)
          VERSION=$(echo "$JDK_VERSION" | sed -n 's/.*build \([^[:space:]]*\)-LTS.*/\1/p')
          echo "JDK_VERSION=$JDK_VERSION"
          echo "VERSION=$VERSION"
          MY_VERSION="jdk=$VERSION"

If there are problems using the AOT cache, the first check is to compare the version shown here with the JDK version in the Docker base image.
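Because the cache is only valid for the same JVM build, OS, and CPU architecture, it can help to print the same short fingerprint in the build environment and inside the container and compare the two lines. The sketch below is one hedged way to do that with standard JDK system properties; the class name JvmFingerprint is hypothetical, and the output does not cover libc, which still has to be checked through the base image.

```java
public class JvmFingerprint {
    /** Builds a one-line summary of the JVM build, OS, and CPU architecture. */
    static String fingerprint() {
        return String.join(" | ",
                "runtime=" + Runtime.version(),
                "vm=" + System.getProperty("java.vm.name")
                        + " " + System.getProperty("java.vm.version"),
                "vendor=" + System.getProperty("java.vm.vendor"),
                "os=" + System.getProperty("os.name"),
                "arch=" + System.getProperty("os.arch"));
    }

    public static void main(String[] args) {
        // Run this in the build environment and in the container, then compare the lines.
        System.out.println(fingerprint());
    }
}
```

The file can be run directly with java JvmFingerprint.java (single-file source launch), so it needs no build step in either environment.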
The tenth step creates the Docker image:

YAML
      - name: Build and push
        uses: docker/build-push-action@v6
        if: matrix.language == 'java'
        with:
          context: .
          file: ./Dockerfile
          build-args: |
            JAR_PATH=${{ steps.EXTRACT.outputs.extracted }}
            LIB_PATH=${{ steps.aot.outputs.extracted }}
          push: false
          tags: angular2guy/moviemanager:latest

With push set to true, this step can also push the Docker image to an image repository.

Conclusion

The results of using the AOT cache of Project Leyden are impressive. Cutting the startup time in half without any code change is amazing. The effort to create the AOT cache and set up the new application is a one-time investment. The impact of the larger Docker images is low. The faster startup makes scaling application instances in Kubernetes clusters up and down much more flexible because the time until a new application instance is available is much lower. In Kubernetes environments with scaling of application instances, the AOT cache is a significant step forward and should be used.

For serverless applications, 3.5 seconds of startup time is too slow. There, Project CRaC or Native Image would be needed. Project CRaC needs code changes and testing. Native Image has the closed-world assumption, which makes it hard to prove that larger applications work correctly. Alternatives are Node.js with Nest.js and TypeScript, or Go with its libraries.

Project Leyden is not finished in JDK 25. There are plans to add compiled code to the AOT cache in the future. The JVM is an impressive piece of technology that is still improving further.
Alvin Lee
Founder,
Out of the Box Development, LLC