Coding Resources

DZone's Featured Coding Resources

GraphRAG in Practice Using Spring AI, Neo4j, and Goodreads Data

By Akmal Chaudhri

CORE

Large language models (LLMs) are impressive, until they are not. If you ask one about your internal data, your product catalog, or your users' reviews, it will either hallucinate an answer or admit it does not know. The solution most teams reach for is retrieval-augmented generation (RAG): retrieve relevant data first, inject it into the prompt as context, and let the model answer from that context rather than from memory.

GraphRAG takes this a step further. Instead of retrieving only text chunks, it follows relationships between entities in a graph to build richer, more structured context. The resulting answers can be grounded in both the data and the relationships within that data.

In this article, we'll walk through a practical GraphRAG implementation using Spring AI and Neo4j, built on top of a Goodreads book and review dataset. We'll cover the data model, loading the data, setting up the vector index, running the Spring Boot application, and some lessons learned along the way. The full source code is available on GitHub.

What We Are Building

The application answers natural language queries like "find books with a happy ending" or "something encouraging" by combining two retrieval mechanisms in Neo4j:

- Vector search: embeds the search phrase via OpenAI and finds semantically similar book reviews using cosine similarity.
- Graph traversal: follows the WRITTEN_FOR relationship from matched reviews to their associated books, giving the LLM structured book context rather than raw review text.

This example uses a simple GraphRAG pattern where vector search identifies relevant reviews and graph traversal expands the retrieved context to connected books. The LLM then summarizes the retrieved books in the context of the original search phrase. The architecture looks like Figure 1.

Figure 1. Architecture.
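To make the two retrieval steps concrete, here is a minimal in-memory sketch in plain Java. It is not the application's code and does not touch Neo4j, Spring AI, or OpenAI: the Review and Book records, the three-dimensional embeddings, the sample titles, and the writtenFor map are all hypothetical stand-ins chosen for illustration. It only shows the shape of the pattern: rank reviews by cosine similarity, then follow a WRITTEN_FOR-style link from each hit to its book.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GraphRagSketch {

    // Hypothetical in-memory stand-ins for Review and Book nodes.
    record Review(String id, String text, double[] embedding) {}
    record Book(String id, String title) {}

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Step 1, vector search: the k reviews most similar to the query embedding.
    static List<Review> topK(double[] query, List<Review> reviews, int k) {
        List<Review> sorted = new ArrayList<>(reviews);
        sorted.sort(Comparator.comparingDouble((Review r) -> -cosine(query, r.embedding())));
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    // Step 2, graph traversal: follow WRITTEN_FOR (review id -> book), dropping duplicates.
    static List<Book> expandToBooks(List<Review> hits, Map<String, Book> writtenFor) {
        Set<Book> books = new LinkedHashSet<>();
        for (Review hit : hits) {
            Book book = writtenFor.get(hit.id());
            if (book != null) {
                books.add(book);
            }
        }
        return new ArrayList<>(books);
    }

    // Runs both steps on toy data and returns the titles that would be handed to the LLM.
    static List<String> demo() {
        Book bookA = new Book("b1", "Book A");
        Book bookB = new Book("b2", "Book B");
        List<Review> reviews = List.of(
                new Review("r1", "uplifting finale", new double[] {1.0, 0.0, 0.0}),
                new Review("r2", "ends on a high note", new double[] {0.9, 0.1, 0.0}),
                new Review("r3", "dense technical manual", new double[] {0.0, 1.0, 0.0}));
        Map<String, Book> writtenFor = Map.of("r1", bookA, "r2", bookA, "r3", bookB);

        double[] queryEmbedding = {1.0, 0.0, 0.0}; // pretend embedding of "happy ending"
        List<Review> hits = topK(queryEmbedding, reviews, 2);
        List<String> titles = new ArrayList<>();
        for (Book book : expandToBooks(hits, writtenFor)) {
            titles.add(book.title());
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the toy data, two reviews point at the same book, so the traversal step collapses them into a single book. That de-duplication is one reason the book-level context tends to be more compact than raw review text.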
Prerequisites

Before we start, we will need:

- Java 21 or later
- A Neo4j AuraDB instance
- An OpenAI API key

Installing Java

If Java is not already installed, the recommended distribution is Temurin from the Adoptium project, available at adoptium.net. Installers are available for Windows, macOS, and Linux. Once installed, verify with:

Shell
java -version

We should see something like openjdk version "21.x.x". The project uses the Maven wrapper, so there is no need to install Maven separately.

Setting Up Neo4j AuraDB

AuraDB is Neo4j's fully managed cloud database. A free tier is available.

1. Sign up at neo4j.com/product/auradb/.
2. Create a new AuraDB Free instance.
3. When the instance is created, download or note the credentials (the URI, username, and password). Neo4j only shows the password once, so save it somewhere safe.
4. Once the instance is running, open the built-in Query tab and verify connectivity:

Cypher
MATCH (n) RETURN count(n)

This should return 0. We are ready to load data.

AuraDB Free includes Awesome Procedures on Cypher (APOC), a utility library that provides numerous procedures and functions for data handling. We'll use APOC for the data loading steps.

The Data Model

The dataset is built around three core node types:

- Book: 10,000 books from the Goodreads UCSD dataset
- Author: 12,371 authors
- Review: 69,791 user reviews, each linked to a book via a WRITTEN_FOR relationship

There is also a User node (44,827 users) linked to reviews via a PUBLISHED relationship, although the main application focuses on Books and Reviews. The graph model is shown in Figure 2.

Figure 2. The Goodreads Dataset.

The key insight is that the Review node carries two things: the review text and a 1,536-dimension embedding generated using an OpenAI embedding model. This is what makes vector similarity search possible without a separate vector database: Neo4j handles both the graph and the vectors.
The Goodreads data used in this article is derived from the UCSD Book Graph dataset and related Goodreads datasets released by researchers at the University of California, San Diego, including Mengting Wan, Julian McAuley, and collaborators. The data is provided for research and educational purposes. If you use these datasets in your own work, please cite the following publications:

- Mengting Wan and Julian McAuley, Item Recommendation on Monotonic Behavior Chains, RecSys 2018.
- Mengting Wan, Rishabh Misra, Ndapa Nakashole, and Julian McAuley, Fine-Grained Spoiler Detection from Large-Scale Review Corpora, ACL 2019.

Loading the Data

Let's load the data step by step in the AuraDB Query tab. Run each of the following blocks separately.

Constraints and Indexes

First, let's set up the constraints and a supporting index:

Cypher
CREATE CONSTRAINT FOR (b:Book) REQUIRE b.book_id IS UNIQUE;
CREATE CONSTRAINT FOR (a:Author) REQUIRE a.author_id IS UNIQUE;
CREATE CONSTRAINT FOR (r:Review) REQUIRE r.id IS UNIQUE;
CREATE CONSTRAINT FOR (u:User) REQUIRE u.user_id IS UNIQUE;
CREATE INDEX FOR (r:Review) ON (r.user_id);

Then create the vector index on the Review node's embedding property:

Cypher
CREATE VECTOR INDEX `review-text` IF NOT EXISTS
FOR (n:Review) ON (n.embedding)
OPTIONS {
  indexConfig: {
    `vector.dimensions`: 1536,
    `vector.similarity_function`: 'cosine'
  }
};

Note the index name review-text. We will come back to this in the lessons learned section.
Loading Books and Authors

The data is hosted on Neo4j's public servers, so we can load it directly via APOC:

Cypher
CALL apoc.load.json("https://data.neo4j.com/goodreads/goodreads_books_10k.json")
YIELD value AS book
MERGE (b:Book {book_id: book.book_id})
SET b += apoc.map.clean(book, ['authors','similar_books'], [""]);

Next, we'll load the initial author stubs:

Cypher
CALL apoc.load.json("https://data.neo4j.com/goodreads/goodreads_books_10k.json")
YIELD value AS book
WITH book
UNWIND book.authors AS author
MERGE (a:Author {author_id: author.author_id});

and then populate the author nodes with the full data:

Cypher
CALL apoc.periodic.iterate(
  'CALL apoc.load.json("https://data.neo4j.com/goodreads/goodreads_book_authors.json.gz") YIELD value AS author',
  'WITH author MATCH (a:Author {author_id: author.author_id}) SET a += apoc.map.clean(author, [], [""])',
  {batchSize: 10000}
);

Next, we'll create the AUTHORED and SIMILAR_TO relationships:

Cypher
CALL apoc.load.json("https://data.neo4j.com/goodreads/goodreads_books_10k.json")
YIELD value AS book
WITH book
MATCH (b:Book {book_id: book.book_id})
WITH book, b
UNWIND book.authors AS author
MATCH (a:Author {author_id: author.author_id})
MERGE (a)-[w:AUTHORED]->(b);

Cypher
CALL apoc.load.json("https://data.neo4j.com/goodreads/goodreads_books_10k.json")
YIELD value AS book
WITH book
MATCH (b:Book {book_id: book.book_id})
WITH book, b
WHERE book.similar_books IS NOT NULL
UNWIND book.similar_books AS similarBookId
MATCH (b2:Book {book_id: similarBookId})
MERGE (b)-[r:SIMILAR_TO]->(b2);

Loading Reviews

This step can take several minutes, as it pulls and processes approximately 70,000 reviews from a gzipped JSON file:

Cypher
CALL apoc.load.json("https://data.neo4j.com/goodreads/goodreads_reviews_dedup.json.gz")
YIELD value AS review
CALL {
  WITH review
  MATCH (b:Book)
  WHERE b.book_id = review.book_id
  WITH review, b
  MERGE (r:Review {id: review.review_id})
  SET r += apoc.map.clean(review, [], [""])
  WITH b, r
  MERGE (b)<-[rel:WRITTEN_FOR]-(r)
} IN TRANSACTIONS OF 20000 ROWS;

Note that review.review_id is stored as the Review node's id property, which Spring AI expects when mapping vector search results.

Then we'll separate the User nodes from the Review data:

Cypher
MATCH (r:Review)
WHERE r.user_id IS NOT NULL
CALL {
  WITH r
  MERGE (u:User {user_id: r.user_id})
  WITH r, u
  MERGE (r)<-[:PUBLISHED]-(u)
} IN TRANSACTIONS OF 20000 ROWS;

Adding the text Property

Spring AI maps vector search results to Document objects using a property named text. Our review data uses review_text, so we need to add the text property:

Cypher
MATCH (r:Review)
CALL {
  WITH r
  SET r.text = r.review_text
} IN TRANSACTIONS OF 20000 ROWS;

Loading Pre-Generated Embeddings

Rather than generating embeddings at runtime, which costs tokens and time, we'll load pre-computed embeddings hosted by Neo4j. This step also takes several minutes:

Cypher
LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/goodreads/review_embeddings.psv" AS row FIELDTERMINATOR '|'
CALL {
  WITH row
  MATCH (r:Review {id: row.reviewId})
  CALL db.create.setNodeVectorProperty(r, 'embedding', apoc.convert.fromJsonList(row.embedding))
  RETURN r
} IN TRANSACTIONS OF 1000 ROWS
WITH r
RETURN count(r);

Once complete, we can verify the embeddings loaded correctly:

Cypher
MATCH (r:Review)
WHERE r.embedding IS NOT NULL
RETURN count(r) AS reviews_with_embeddings

We should see 69791.

Exploring the Data

Before running the application, let's take a look at what we have loaded. Here are a few useful queries to run in the AuraDB Query tab.
Browse the top-rated books:

Cypher
MATCH (b:Book)
RETURN b.title, b.average_rating
ORDER BY b.average_rating DESC
LIMIT 10

Browse books with their authors:

Cypher
MATCH (a:Author)-[:AUTHORED]->(b:Book)
RETURN a.name, b.title, b.average_rating
ORDER BY b.average_rating DESC
LIMIT 10

Inspect a sample embedding to see the first few dimensions of a review's vector:

Cypher
MATCH (r:Review)
WHERE r.embedding IS NOT NULL
RETURN r.id, r.text, r.embedding[0..5] AS embedding_sample
LIMIT 5

Building and Running the Application

Let's clone the GitHub repo and get the application running:

Shell
git clone https://github.com/JMHReif/springai-goodreads.git
cd springai-goodreads

Set the environment variables for Neo4j AuraDB and OpenAI, as follows:

Shell
export SPRING_NEO4J_URI=neo4j+s://xxxx.databases.neo4j.io
export SPRING_NEO4J_AUTHENTICATION_USERNAME=your_username_here
export SPRING_NEO4J_AUTHENTICATION_PASSWORD=your_password_here
export SPRING_AI_OPENAI_API_KEY=your_openai_key_here

These variables must be set in the terminal session used to run the Spring Boot application, specifically the window where you run ./mvnw spring-boot:run. The terminal used for curl commands does not need them. To avoid having to re-export them each time, you can add them to your shell profile (e.g. ~/.zshrc on macOS or ~/.bashrc on Linux) or save them in a small shell script and source it before starting the app.

Now we'll start the application from the root of the cloned repo, where the pom.xml and mvnw files live, as follows:

Shell
./mvnw spring-boot:run

Maven will download dependencies on the first run. Once the startup banner appears, the app is ready on port 8080.

The Four Endpoints

The application exposes four REST endpoints. The first is a connectivity check, and the other three each represent a different retrieval strategy.

/hello: Baseline LLM Call

Shell
curl "http://localhost:8080/hello"

A simple call to the LLM with no retrieval. Useful to verify the OpenAI connection is working.
/llm: LLM With No Context

Shell
curl "http://localhost:8080/llm?searchPhrase=happy%20ending"

This sends the search phrase directly to the LLM with no data from Neo4j. The model answers from its training data, which is fast but prone to hallucination and not grounded in our Goodreads data.

/vector: Vector Search Only

Shell
curl "http://localhost:8080/vector?searchPhrase=happy%20ending"

Spring AI embeds the search phrase via OpenAI, queries the review-text vector index in Neo4j, and passes the matching review text to the LLM. Semantic matching works well here: the phrase does not need to match any exact words in the reviews.

/graph: Full GraphRAG Pipeline

Shell
curl "http://localhost:8080/graph?searchPhrase=happy%20ending"

This is the full pipeline. Vector search finds the most semantically similar reviews, the graph traversal follows the WRITTEN_FOR relationship to retrieve the associated Book nodes, and the LLM receives structured book context rather than raw review text.

Let's look at the output for a few different search phrases:

Shell
curl "http://localhost:8080/graph?searchPhrase=encouragement"
curl "http://localhost:8080/graph?searchPhrase=high%20tech"
curl "http://localhost:8080/graph?searchPhrase=caffeine"

The contrast between /llm and /graph on the same phrase is the most compelling comparison: the LLM answers from memory in one case and from our actual Goodreads data in the other.

GraphRAG Uses Both Vector Search and Graph Traversal

It's worth comparing the two retrieval strategies directly, as shown in Figure 3.

Figure 3. Vector Search and Graph Traversal.

Neither approach is strictly better. Rather, they are complementary. Vector search handles fuzzy, intent-driven queries that keyword search would miss entirely. Graph traversal adds relationship-aware context that makes the LLM response richer and easier to trace back to source data. The /graph endpoint combines both.

Lessons Learned

Here are four things worth knowing before setting this up from scratch.
- Vector index naming matters. Spring AI's default vector index name is spring-ai-document-index. This project requires review-text. If the index is created with the wrong name, the application throws a runtime error that is not immediately obvious. Always check the index name configured in the application against the one created in Neo4j.
- Review nodes need id and text properties. Spring AI maps vector search results to Document objects using properties named id and text. In this dataset, review_id is mapped to the Review node's id property during loading, but the review text is stored as review_text. We therefore add a text property so Spring AI can map the results correctly. Without the expected properties, vector search returns results, but the book list comes back empty, so the model gets no context and answers from memory instead.
- Pre-generated embeddings save time and money. Generating 69,791 embeddings at runtime via the OpenAI API would be slow and costly. Loading pre-computed embeddings from a file is much faster for initial development setups. The trade-off is that the embeddings are fixed: they were generated with a specific OpenAI model and will need to be regenerated if the model changes.
- Data loading takes patience. The two long-running steps are the review load and the embedding load. Plan for this, although both steps only need to be done once and the database can be left running between sessions.

Summary

GraphRAG is a practical pattern, not just a research concept. By combining Neo4j's graph traversal with its vector index, we get two retrieval mechanisms in a single database, and no separate vector store is required for this architecture. Spring AI provides the abstractions to wire it all together in a way that will feel familiar to any Spring developer.
The Goodreads domain is approachable and familiar to many readers, but the architecture generalizes to any graph of connected entities, such as product catalogs, knowledge graphs, and collections of documents. If you have relationships in your data, a graph database gives you relationship-aware retrieval capabilities that a plain vector store does not provide. The full source code is on GitHub.

Acknowledgements

I thank my colleague Jennifer Reif for sharing the Spring AI example.

Differential Flamegraphs in Java in Jeffrey Microscope

By Petr Bouda

CORE

In the first article, we got started with Jeffrey Microscope and learned to read a single flamegraph: the timeseries, search, tooltips, and the allocation and wall-clock variants. This time we build directly on that foundation and tackle one of Jeffrey's most powerful features for real-world performance work: the differential flamegraph, which compares two recordings and shows you precisely what changed between them.

A single flamegraph tells you where your application spends its time. But the questions that matter most in practice are comparative:

- Did my optimization actually help?
- What did this refactor make slower?
- Where did the extra allocations come from?

Staring at two flamegraphs side by side and trying to spot the difference by eye is slow and error-prone: the graphs are large, and the interesting change is often a few frames buried deep in the stack. Jeffrey Microscope's differential flamegraph solves this by overlaying two recordings into a single graph and coloring every frame by how it changed:

- Red: where the primary profile spends more than the baseline (a regression).
- Green: where it spends less (an improvement).
- Deeper shades: brand-new and fully removed frames, called out distinctly.

In this article, we'll take the two recordings from the previous post (the optimized direct serialization path and the garbage-heavy DOM path), set one as a secondary profile, and let the differential view pinpoint exactly which methods account for the difference.

We start exactly where the first article left off. Open the optimized recording, jeffrey-persons-direct-serde-cpu.jfr.lz4, and head to the Visualization tab. This is our primary profile, the same CPU flamegraph we explored last time. On its own, it shows where the direct serialization path spends its time, but to turn it into a comparison we need a second recording to diff it against. That's what the Secondary Profile slot in the top bar is for; it is currently marked NOT SET.
In the next step we'll point it at the DOM-based recording and unlock the Differential view in the sidebar.

Supported Event Types

With the secondary set, the Differential page mirrors the Primary one, with a card per event type, but each card now shows both sides at once. The value on the left is the baseline (the secondary profile), the value on the right is the primary, and the badge is the relative change from one to the other: a red +N% means the primary has more of that event than the baseline (grew), and a green −N% means it has less (shrank). This lets you gauge the overall shift before opening a single graph, and tell whether the change is a rounding-error wobble or a real regression worth investigating.

Jeffrey supports differential flamegraphs for every sample-based event it can render normally:

- Execution Samples: total CPU work. More samples means more time spent on-CPU (37.3K → 39.7K, +6.4% here).
- Wall-Clock Samples: elapsed time including waiting and blocking, which can move independently of CPU (5.0M → 4.4M, −12.4%).
- Allocation Samples: memory pressure; switch on Use Total Allocation to compare bytes rather than sample count and see the true allocation cost (27.47 GiB → 30.45 GiB, +10.9%).
- CPU-Time Samples and Method Traces: empty here, but they diff identically when the recordings contain them.

Each of these numbers is just the headline; the flamegraph below breaks the same delta down frame by frame, so you can see which methods drove it. Click View Flamegraph on the Execution Samples card to open the differential CPU view.

Reading the Differential Flamegraph

Opening the differential view feels familiar, with the same timeseries, search, and tooltip as a normal flamegraph, but everything now encodes two profiles at once:

- The summary bar at the top reports the totals side by side: baseline 35,472 vs primary 39,668, a net +4,196 (+11.83%) flagged as REGRESSED. That's the headline: the primary run did more on-CPU work overall.
- The timeseries overlays both recordings as two lines, Primary in blue and Secondary (baseline) in red, so you can see where in time the profiles diverge, not just that they differ.
- The flamegraph colors encode the per-frame change: pale pink/green for frames that shifted a little, and saturated deep red/deep green for frames that exist in only one profile (brand-new work versus work that disappeared entirely).

The payoff is in the last two screenshots. Because the optimized and unoptimized paths run through differently named classes, the diff renders them as a matched pair: the deep-red EfficientPersonService.getNPersons subtree (new in the primary) sitting right next to the deep-green InefficientPersonService subtree (gone from the primary). You're literally seeing the code swap, top to bottom. And hovering a shared frame quantifies it precisely: the tooltip on PersonController.getNPersons shows baseline 854 → primary 525, an IMPROVED −329 (−38.52%) for that endpoint's own path.

The differential CPU flamegraph overlays both recordings: the timeseries plots the primary (blue) against the secondary baseline (red), and the summary bar reports baseline 35,472 → primary 39,668, a net +4,196 (+11.83%) marked REGRESSED.

The merged flamegraph colors every frame by its change. The shared Tomcat, Coyote, and Spring layers stay mostly pale pink, indicating small shifts, while the summary bar keeps the overall +11.83% delta in view.

The flamegraph also captures the JVM's own threads, not just your request path. The CompileBroker / C2Compiler stacks on the left are JIT compilation, and garbage-collection activity shows up the same way. Comparing them across the two recordings tells you whether either run triggered extra spikes in JIT or GC work, a common hidden cost when one version allocates more or churns more code.
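The arithmetic behind the summary bar and the tooltip can be sketched in a few lines of Java. This is only an illustration of the delta calculation implied by the numbers above, not Jeffrey's actual implementation: the Delta record, the Change enum, and the NEW, REMOVED, and UNCHANGED labels are assumptions introduced here, while REGRESSED and IMPROVED are the labels shown in the UI.

```java
import java.util.Locale;

public class FrameDiffSketch {

    // REGRESSED and IMPROVED appear in the article; the other labels are hypothetical.
    enum Change { NEW, REMOVED, REGRESSED, IMPROVED, UNCHANGED }

    // Sample counts for one frame (or for the whole profile) in both recordings.
    record Delta(long baseline, long primary) {

        long absolute() {
            return primary - baseline;
        }

        // Relative change versus the baseline; not meaningful when baseline is 0.
        double percent() {
            return 100.0 * absolute() / baseline;
        }

        Change classify() {
            if (baseline == 0 && primary > 0) return Change.NEW;      // only in primary
            if (primary == 0 && baseline > 0) return Change.REMOVED;  // only in baseline
            if (primary > baseline) return Change.REGRESSED;
            if (primary < baseline) return Change.IMPROVED;
            return Change.UNCHANGED;
        }

        // Formats the change in the style "+4196 (+11.83%) REGRESSED".
        String describe() {
            return String.format(Locale.ROOT, "%+d (%+.2f%%) %s",
                    absolute(), percent(), classify());
        }
    }

    public static void main(String[] args) {
        // Totals from the summary bar and the PersonController.getNPersons tooltip.
        System.out.println(new Delta(35_472, 39_668).describe());
        System.out.println(new Delta(854, 525).describe());
    }
}
```

Feeding in the totals from the summary bar (35,472 and 39,668) and the tooltip values (854 and 525) reproduces the +11.83% and −38.52% figures quoted above, apart from the ASCII minus sign and the absence of thousands separators.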
Deeper into the stack, the two implementations separate out: saturated red columns mark work that is new in the primary profile, while the deep-green columns are paths that existed only in the baseline and disappear in the primary. The optimized EfficientPersonService path (red, added) sits beside the removed InefficientPersonService path (green).

Hovering the shared PersonController.getNPersons frame quantifies the change exactly: baseline 854 → primary 525, an IMPROVED −329 (−38.52%).

Summary

From here, try the same workflow on the Wall-Clock and Allocation differential flamegraphs. The steps are identical, and each reveals a different dimension of the change: time spent waiting, and bytes allocated.

Thank you for reading! To go deeper, visit the Jeffrey pages, or reach out to me directly on LinkedIn. I'd love to hear your feedback. And stay tuned: in the next article, we'll step away from flamegraphs and explore one of Jeffrey's JVM Internals views to dig into what the runtime does under the hood.

Compliance Reporting Without Losing the Spreadsheet or the Control

By Hawk Chen

CORE

From Gherkin to Source Code Without Losing the Business Language

By Douglas Cardoso

Jeffrey Microscope for Generating Flame Graphs in Java

By Petr Bouda

CORE

Your Codename One App, Now A Native Mac App

Codename One has run on the desktop for a long time through the JavaSE target, which is the same engine that powers the simulator. What it did not have was a real native Mac binary, and the desktop output still carried a lot of phone-shaped habits: a drawn toolbar where the OS menu bar belongs, scrollbars you could not grab, no place in the menu for Preferences or Quit. With version 7.0.250, we finally have an actual native macOS application target that doesn't bundle a JVM and is as native as our iOS target.

A Native Mac Build From the iOS Pipeline

PR #5053 adds a Mac Native target that takes the existing project through the same build as the iPhone builder and the ParparVM pipeline that produces an iOS app. In this case, it emits a native Mac variant of it. We can find these targets in the standard Maven menu in IntelliJ: "Mac Native Build" sends a cloud build, and "Mac Native Project" generates an Xcode project.

These targets should work in the same way as the equivalent iOS targets. Thanks to our switch to Metal, the code for the native Mac build is very similar to the iOS code, which means the Mac native target is mostly battle-tested. We use Mac Catalyst, which is an iOS/Mac porting framework from Apple. The user-facing name is "Mac native," and a future phase might add an AppKit target sharing the same Metal renderer without changing the surface you build against.

One thing to keep in mind is that the iOS native interfaces are reused for the desktop target. This might work out fine, but in case it doesn't, you can use #ifdef to adapt code for the Mac target.

Here is a Codename One sample running as a native Mac app, built from the same Java code that produces the iOS and Android builds (it uses the new advertising API covered later this week).

Certificates

There's one major gap with the Mac target: signing. Right now our certificate wizard, settings, etc. are geared towards iOS/Android. Mac uses a different store and different signing tools.
We haven't updated all of that infrastructure yet, and it might take some time to do so. As a short-term solution, we support some build hints to configure this:

Hint | Purpose
codename1.mac.appid | Mac bundle identifier (the App Store Connect record is distinct from the iOS one).
codename1.mac.certificate | Path to the .p12 containing the Mac signing certificate. Bundle both Mac App Distribution and Developer ID Application into a single P12 when targeting both channels.
codename1.mac.certificatePassword | Password to unlock the P12.
codename1.mac.provision | Path to the Mac .provisionprofile.

Desktop Integration

PR #5136 and the follow-up PR #5170 make a desktop target behave like a desktop app rather than a tablet app in a window. Everything here is opt-in, on by default for newly generated apps, and completely inert on mobile or when disabled. It spans the core plus the desktop ports: JavaSE, Mac, and future ports.

Window Chrome and the OS Title Bar

A new build hint chooses how the window is framed:

Properties files
desktop.titleBar=native

In native mode, the Codename One Toolbar is suppressed, the form title goes to the OS title bar, and your commands are bridged to a real native menu bar (a Swing JMenuBar that becomes the macOS screen menu on JavaSE, a UIMenuBuilder menu on Mac Catalyst). The custom mode gives you an undecorated window with caption buttons and window drag drawn by Codename One; the toolbar mode keeps the classic behavior. Together these modes give you fine-grained control over how the app's window looks.
Commands Land in the Right Menu

Instead of every command piling into one synthetic menu, a command can declare where it belongs:

Java
Command prefs = Command.create("Preferences...", null, e -> showPreferences());
prefs.setDesktopMenu(Command.DESKTOP_MENU_PREFERENCES);
prefs.setDesktopShortcut(',', Command.DESKTOP_SHORTCUT_MODIFIER_PRIMARY);

Command save = Command.create("Save", null, e -> save());
save.setDesktopMenu(Command.DESKTOP_MENU_FILE);
save.setDesktopShortcut('s', Command.DESKTOP_SHORTCUT_MODIFIER_PRIMARY);

setDesktopMenu(...) takes any of DESKTOP_MENU_APP, ABOUT, PREFERENCES, QUIT, FILE, EDIT, VIEW, WINDOW, HELP, or a custom top-level title string, so Preferences and Quit show up where a Mac user expects them. setDesktopShortcut(...) attaches a keyboard accelerator; DESKTOP_SHORTCUT_MODIFIER_PRIMARY is Command on macOS and Control elsewhere, so the same code does the right thing on each desktop. The accelerator both appears next to the menu item and fires from the keyboard.

Interactive Scrollbars

Desktop scrollbars are now grab-and-drag with a draggable thumb, click-track paging, and an always-visible track, following the macOS and Material conventions. The thumb shows its hover style under the pointer and its pressed style while dragged, and a minimum thumb size keeps it grabbable on very long content. This is gated by the interactiveScrollBool theme constant and uses dedicated Desktop* UIIDs, so mobile styling is untouched.

Desktop Notifications

PR #5170 makes the standard LocalNotification API work on a real desktop build, not just in the simulator. On JavaSE, a scheduled notification surfaces through a persistent system-tray icon as a native OS notification, and clicking it dispatches to your LocalNotificationCallback on the same code path mobile uses. Mac Catalyst keeps using the iOS notification path. The same notification code you already wrote for mobile now runs on the desktop.
Generated Apps Get This for Free

New projects from the archetype and the Initializr default to desktop.titleBar=native with interactive scrollbars on, and the modern themes ship the Desktop* and Window* UIIDs in light and dark (macOS conventions in ios-modern, Material in android-material). If you have an existing app, opt in with the two hints above and check the new UIIDs against your theme.

This was validated end to end on both desktop builds: the JavaSE fat jar and the Mac Catalyst .app were each driven through the same AppleScript robot test for window title, menu placement, and native-menu command firing.

The full Desktop Integration chapter in the developer guide covers the details. The release post has the full week's index. Tomorrow's deep dive covers WebSockets, gRPC, and GraphQL in the core, continuing the theme of giving a Codename One app better ways to talk to the outside world.

By Shai Almog

CORE

Exploring A Few Java 25 Language Enhancements

Although Java 26 was released in mid-March this year, Java 25 is the latest LTS version available, so I chose to focus on it first. This article briefly outlines a few Java 25 language improvements, irrespective of whether they are finalized or still preview features. The main purpose is, first, to make developers aware that Java is continuously refined and evolved by its contributors and, second, to spark curiosity and interest in exploring these enhancements in detail.

Out of the features proposed in JDK 25 [Resource 1], the following five language enhancements are briefly explored here:

- JEP 512: Compact Source Files and Instance Main Methods
- JEP 513: Flexible Constructor Bodies
- JEP 507: Primitive Types in Patterns, instanceof and switch
- JEP 506: Scoped Values
- JEP 502: Stable Values

Compact Source Files and Instance Main Methods (JEP 512)

After its initial proposal in JDK 21 as JEP 445, 'Unnamed Classes and Instance main Methods', this feature was gradually improved in subsequent releases based on the feedback received, and it was finalized in JDK 25. The goal is clear: simplify Java's entry point for beginner developers and for small programs by reducing boilerplate and ceremony, while remaining fully compatible with the standard Java language and toolchain.
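For contrast, here is a sketch of the kind of ceremony JEP 512 aims to remove: a traditional program with an explicit class, a public static void main(String[] args) entry point, and manual console handling. It implements the same prompt-and-measure loop as the compact example that follows. The class name TraditionalEntryPoint and the use of BufferedReader are choices made for this illustration, not something prescribed by the JEP.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class TraditionalEntryPoint {

    static final String EXIT = "exit";

    static String prompt(String exit) {
        return "Enter a string (or '" + exit + "' to quit): ";
    }

    // The classic entry point: public, static, and taking a String array.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        while (true) {
            System.out.print(prompt(EXIT));
            String input = in.readLine();
            // readLine() returns null at end of input, so treat that like "exit".
            if (input == null || EXIT.equalsIgnoreCase(input)) {
                System.out.println("Goodbye!");
                break;
            }
            System.out.println("Length: " + input.length());
        }
    }
}
```

Everything outside the loop body (the class declaration, the modifiers on main, the reader setup, and the checked exception) is scaffolding that a beginner has to type before understanding it.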
Let's imagine we quickly want to write a small program that:

- prompts the user and keeps reading their input in a loop
- if the user types exit (case-insensitive), prints "Goodbye!" and ends
- otherwise, prints the length of the entered string

The code lives in a single file called CompactSourceFile.java, whose content looks as below:

Java
static final String EXIT = "exit";

String prompt(String exit) {
    return "Enter a string (or '" + exit + "' to quit): ";
}

void main() {
    while (true) {
        String input = IO.readln(prompt(EXIT));
        if (EXIT.equalsIgnoreCase(input)) {
            IO.println("Goodbye!");
            break;
        }
        IO.println("Length: " + input.length());
    }
}

Suggestive and to the point: no class declaration, just the intended piece of code. When run with a few inputs, the output is as expected:

Plain Text
Enter a string (or 'exit' to quit): joke
Length: 4
Enter a string (or 'exit' to quit): meeting
Length: 7
Enter a string (or 'exit' to quit): exit
Goodbye!

A few observations are worth making:

- The need for an explicit class declaration is removed.
- Although not visible, the compiler implicitly declares a class that is final and part of the unnamed package.
- The traditional public static void main(String[] args) is replaced with a simpler instance method that is still a clearly defined program entry point.
- The program entry point still needs to be named main, as the JVM looks for such a launchable method.
- All fields and methods belong to the implicit class, just as in the regular case.
- The simple program focuses directly on its purpose without additional details.
- It's well suited to experiments and it's straightforward. If the program turns into a real application, though, it's advisable to preserve the object-oriented structure and all known best practices.

Flexible Constructor Bodies (JEP 513)

Until JDK 25, one clear rule regarding constructors was that no statements could be written before super() or this() calls.
For the sake of expressivity and readability, JEP 513 relaxes this constraint, while existing code continues to compile and function correctly and the object's safety is fully preserved.

In Java, when an object instance is constructed, there are two stages: one before and one after the hierarchy of constructor chaining begins its execution. During the former, the memory is allocated and the instance fields are initialized; during the latter, once the this() and super() calls complete, the rest of the object is constructed. This process exists mainly for safety, that is, to ensure the inherited parts of the object are completely initialized before any child-related code is run. Joshua Bloch already advised in his 'Effective Java' book against letting the this reference escape "too early." The result is that objects are never observed partially constructed.

Simply put, starting with Java 25, statements are allowed to be executed before this() or super() in constructor bodies, without compromising the core safety of the object while it is being built.

Observations:

- Allowed statements are only those that don't depend on instance state and are guaranteed to be safe:
  - manipulation of locally declared variables that live on the stack
  - constructor parameter validation
- The syntax is made more permissive, while object safety is preserved.

Let's have a small example where we minimally model a Car through an approximate length and the number of wheels, where the former is inherited from a Vehicle super class.
Java static class Vehicle { private final long length; Vehicle(long length) { if (length < 0) { throw new IllegalArgumentException("Length must be positive"); } this.length = length; } Vehicle(double length) { long round = Math.round(length); this(round); } public long length() { return length; } } static class Car extends Vehicle { private final int wheels; Car(double length, int wheels) { if (wheels < 0) { throw new IllegalArgumentException("Wheels must be positive"); } super(length); this.wheels = wheels; } public int wheels() { return wheels; } } void main() { var car = new Car(4.6d, 4); IO.println("Car is about " + car.length() + " meters long and has " + car.wheels() + " wheels."); } If we run it, the following output is observed — Car is about 5 meters long and has 4 wheels. First, one may observe that the Vehicle#length is first rounded as it's kept as a long value (line 13) then passed to the other constructor. Secondly, the number of wheels is validated before the super constructor is invoked (line 30), then set. Let’s now model a motorcycle using records. Java record Moto(long length, int wheels) { Moto { if (length < 0) { throw new IllegalArgumentException("Length must be positive"); } if (wheels < 0) { throw new IllegalArgumentException("Wheels must be positive"); } } Moto(double length, int wheels) { long round = Math.round(length); this(round, wheels); } } void main() { var moto1 = new Moto(3, 2); IO.println("Moto 1 is about " + moto1.length() + " meters long and has " + moto1.wheels() + " wheels."); var moto2 = new Moto(2.1d, 2); IO.println("Moto 2 is about " + moto2.length() + " meters long and has " + moto2.wheels() + " wheels."); } While before Java 25, the parameters’ validation is allowed in canonical record constructors (line 2), the ability is now extended for non-canonical constructors as well (line 12), and moreover the this() call is allowed. 
If we run it, moto1 is constructed using only the canonical constructor, while moto2 via both and the output is obviously the one below. Plain Text Moto 1 is about 3 meters long and has 2 wheels. Moto 2 is about 2 meters long and has 2 wheels. Regarding enums, let’s consider the following experimental code. Java enum Bike { CITY(12), MOUNTAIN("10"); private final int weight; Bike(int weight) { if (weight < 0) { throw new IllegalArgumentException("Weight must be positive"); } this.weight = weight; } Bike(String description) { int weight = Integer.parseInt(description); this(weight); } public int weight() { return weight; } } void main() { IO.println("Bike is " + Bike.MOUNTAIN.weight() + " kg heavy."); } While validation as in the first constructor has been allowed prior to Java 25, additional operations before calling this() are now permitted as well. To conclude, at class, record or enum level, the way the constructors can now be written is cleaned and improved, while the object safety is still preserved without any compromises. Primitive Types in Patterns, instanceof and switch (JEP 507) In general, pattern matching is a language procedure that basically combines a few steps into a feature that facilitates testing a particular value. The focus is on what is being checked and not necessarily on the means of doing it. In addition to situations where pattern matching is applied in case of instanceof and switch constructs, Java 25 allows using it with primitives — byte, short, int, long, float, double, char, boolean are now part of this model. The reference type boundary is now extended, making the feature uniform and more intuitive as the applicability restrictions have been reduced significantly. 
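For contrast, this is roughly what a lossless narrowing check looks like when written by hand, without primitive patterns: an explicit range check or round trip, followed by a cast. It is a sketch under stated assumptions; ManualNarrowing and narrowToInt are hypothetical names, and the code runs on a recent JDK without any preview flags.

```java
import java.util.OptionalInt;

public class ManualNarrowing {

    // Returns the value as an int only if it lies inside the int range.
    static OptionalInt narrowToInt(long value) {
        if (value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE) {
            return OptionalInt.of((int) value);
        }
        return OptionalInt.empty();
    }

    // Same idea for double: the cast must survive a round trip unchanged.
    // NaN, infinities, and fractional values all fail this comparison.
    // (This simple version does not distinguish -0.0 from 0.0.)
    static OptionalInt narrowToInt(double value) {
        int candidate = (int) value;
        if ((double) candidate == value) {
            return OptionalInt.of(candidate);
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(narrowToInt(42L));
        System.out.println(narrowToInt(Long.MAX_VALUE));
        System.out.println(narrowToInt(42.0));
        System.out.println(narrowToInt(3.99));
    }
}
```

Each source type needs its own hand-written check, and it is easy to get the boundaries wrong; a primitive type pattern expresses the same test and the binding in one construct.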
Let's consider the following examples:

Java
void main() {
    Number doubleBoxed = 3.99;
    if (doubleBoxed instanceof int i) {
        IO.println("'num' fits in int: " + i);
    } else {
        IO.println("'num' does NOT fit losslessly in int (value=" + doubleBoxed + ")");
    }

    IO.println(describe(Byte.MAX_VALUE));
    IO.println(describe(Short.MAX_VALUE));
    IO.println(describe(42));
    IO.println(describe(Integer.MAX_VALUE));
    IO.println(describe(Long.MAX_VALUE));
    IO.println(describe(3.14f));
    IO.println(describe(2.718281828459045));
}

static String describe(Number n) {
    return switch (n) {
        case byte b -> n + " fits in byte → " + b;
        case short s -> n + " fits in short → " + s;
        case int i -> n + " fits in int → " + i;
        case long l -> n + " fits in long → " + l;
        case float f -> n + " fits in float → " + f;
        case double d -> n + " fits in double → " + d;
        case null, default -> n + " unknown numeric type";
    };
}

If run, it produces the below output:

Plain Text
'num' does NOT fit losslessly in int (value=3.99)
127 fits in byte → 127
32767 fits in short → 32767
42 fits in int → 42
2147483647 fits in int → 2147483647
9223372036854775807 fits in long → 9223372036854775807
3.14 fits in float → 3.14
2.718281828459045 fits in double → 2.718281828459045

Observations:

- describe() reports the primitive type that corresponds to the runtime wrapper of the Number (line 19). As the output shows, 42 is reported as int rather than byte, because the literal is boxed as an Integer
- A Number reference can now be pattern-matched directly to a primitive (line 20)
- When the value being tested is itself a primitive, the feature enables safe, lossless narrowing checks without manual casting or range checks

Going deeper with the exploration, what I personally find interesting regarding this feature is deeply nested patterns. The below example allows introspecting the object and directly matching the content.
Java
record Age(int years) {}

record Wine(String name, Age age) {}

void analyze(Object value) {
    IO.println("Analyzing - " + value);
    if (value instanceof Wine(String name, Age(int years))) {
        IO.println("Wine: " + name + " (" + years + " years old)");
    } else {
        IO.println("Not a wine");
    }
}

void main() {
    var value1 = new Wine("Merlot", new Age(10));
    analyze(value1);

    var value2 = "Cabernet Sauvignon";
    analyze(value2);
}

If run, the result is again obvious, but the code is clean, concise, and very expressive.

Plain Text
Analyzing - Wine[name=Merlot, age=Age[years=10]]
Wine: Merlot (10 years old)
Analyzing - Cabernet Sauvignon
Not a wine

To conclude, with the current state of the pattern matching feature in Java 25, code has a great chance to become cleaner and safer as a whole. Note that JEP 507 is still a preview feature in JDK 25, so the primitive-pattern examples need to be compiled and run with --enable-preview.

Scoped Values (JEP 506)

As Project Loom brought virtual threads to Java, it made room for another enhancement: passing immutable context between and across threads in a more structured, predictable, and safer way. Scoped values are a finalized feature in Java 25 and allow exactly this, within the boundaries of a precise execution scope.
To better understand them, let’s refer to the following simple example: Java static final ScopedValue<User> USER = ScopedValue.newInstance(); record User(int id, String name) {} static void handleFurther() { IO.println("handleFurther - start for " + USER.get()); ScopedValue.where(USER, new User(2, "AD")) .run(() -> { IO.println("handleFurther - something specific for " + USER.get()); }); IO.println("handleFurther - finished for " + USER.get()); } static void handle() { IO.println("handle - start for " + USER.get()); handleFurther(); IO.println("handle - finished for " + USER.get()); } void main() { ScopedValue.where(USER, new User(1, "HCD")) .run(() -> { IO.println("main - before handling - " + USER.get()); handle(); IO.println("main - after handling - " + USER.get()); }); //handle(); } The spot for the shared User is first created as USER. The context passed during the execution (and not as a parameter of the methods engaged) is the User instance. It might be seen as the “current” user. Once the instance is bound (line 23), its scope is clearly defined in the main() method and passed throughout the execution – to handle() and further to handleFurther(). Access is read-only; it cannot be changed. If during the execution flow it is re-set, as in handleFurther(), that is, a new (nested) sub scope is created and once this sub scope ends, the previous outer scope is continued. If run, the code produces the below output which exemplifies even more clearly what has already been stated. 
Properties files
main - before handling - User[id=1, name=HCD]
handle - start for User[id=1, name=HCD]
handleFurther - start for User[id=1, name=HCD]
handleFurther - something specific for User[id=2, name=AD]
handleFurther - finished for User[id=1, name=HCD]
handle - finished for User[id=1, name=HCD]
main - after handling - User[id=1, name=HCD]

If handle() were called outside the scope (the commented-out call at line 30) and the code re-run, a clear exception would be thrown upon reaching this point: Exception in thread "main" java.util.NoSuchElementException: ScopedValue not bound.

Key points:

- where(…).run(…) binds the value for the duration of the lambda, then unbinds it automatically; there's no need for manual cleanup.
- Immutable within scope: once bound, it cannot be changed (but can be re-bound in a nested scope).
- Cheap with virtual threads: no copying, just a reference.
- Easy to reason about: the value is always what was bound at the top of the current scope.
- A good alternative to ThreadLocal, which has an unbounded lifetime, is mutable, and is pretty hard to reason about, as its value can be changed anywhere in the call stack.
- Works beautifully with Structured Concurrency (JEP 505, still in preview in JDK 25): child tasks automatically share the parent's scoped values without copying.

To conclude, scoped values contribute a lot to the cleanness and safety of concurrent code and help prevent issues such as memory leaks or stale data leaking.

Stable Values (JEP 502)

Note that in JDK 25 this is a preview API, so the examples in this section need --enable-preview to compile and run.

I see this enhancement as enforcing effective immutability, both at instance and object level. If prior to Java 25 we created an instance, declared it final, initialized it, and documented that it shall remain unchanged, the reality was sometimes different, as some "content" of the instance was still mutable. The StableValue feature allows constructing immutable instances by all means so that once initialized, the object content is guaranteed to remain unchanged as well.
StableValues are a JVM enhancement that offers a way of achieving thread-safety and deep immutability, an alternative to accomplishing this via combining locks, synchronization, volatile variables and Atomic references. The behavior is thread-safe by design, detail ensured by the JVM’s internal handling of StableValues. Let’s examine the following code: Java static class User { private final StableValue<String> id = StableValue.of(); private final String name; public User(String name) { this.name = name; } public String id() { return id.orElseSet(() -> UUID.randomUUID().toString()); } public String name() { return name; } @Override public String toString() { return name + " (" + id() + ")"; } } private record Task(CountDownLatch latch, Runnable runnable) implements Runnable { @Override public void run() { try { latch.await(); } catch (InterruptedException e) { throw new RuntimeException(e); } runnable.run(); } } void main() { var user1 = new User("HCD"); IO.println("Created " + user1); var user2 = new User("Andrei"); IO.println("Created " + user2); IO.println("User's unique identifiers are: " + user1.id() + ", " + user2.id()); } Observations: A User is simply described by two attributes — while the name is provided at construction time, the id represents an internal unique identifier.id is declared as a StableValue and is lazily initialized when the value is read (if in a concurrent context, by the first thread that performs the action) Once initialized, this value is deeply immutable; it cannot be changed and remains as such until the object is destroyed If run, the output is the following: Properties files Created HCD (477a7dc1-c71f-4189-8c58-13994148ff95) Created Andrei (47647539-9cbe-4890-af23-050ee1fe9379) User's unique identifiers are: 477a7dc1-c71f-4189-8c58-13994148ff95, 47647539-9cbe-4890-af23-050ee1fe9379 It’s clear the ids are set when needed, and their values persist whenever read subsequently. 
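For comparison, computing the id lazily and exactly once without StableValue typically takes a volatile field and double-checked locking. The following is a sketch of that classic idiom, not code from the article's repository; LazyUser is a hypothetical name, and the class runs on any recent JDK without preview flags.

```java
import java.util.UUID;

public class LazyUser {

    private final String name;

    // volatile is required so that a value written by one thread
    // is reliably visible to all other threads.
    private volatile String id;

    public LazyUser(String name) {
        this.name = name;
    }

    public String id() {
        String local = id;              // fast path: a single volatile read
        if (local == null) {
            synchronized (this) {
                local = id;             // re-check while holding the lock
                if (local == null) {
                    local = UUID.randomUUID().toString();
                    id = local;         // publish the value exactly once
                }
            }
        }
        return local;
    }

    public String name() {
        return name;
    }

    public static void main(String[] args) {
        LazyUser user = new LazyUser("HCD");
        // Both reads return the same identifier.
        System.out.println(user.name() + " (" + user.id() + ")");
        System.out.println(user.name() + " (" + user.id() + ")");
    }
}
```

The field here cannot be declared final, and the correctness of the idiom depends on getting the volatile read, the lock, and the re-check exactly right; orElseSet() in the User class above expresses the same intent in one call.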
One last observation is worth making regarding the User#id attribute — as a StableValue, it’s automatically thread-safe and lock-free. To demonstrate this, let’s run the next piece of code. Java void main() { var user = new User("Concurrent User"); var latch = new CountDownLatch(1); try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) { Future<?> result1 = exec.submit(new Task(latch, () -> IO.println("Task1 - Id: " + user.id() + " at " + System.currentTimeMillis()))); Future<?> result2 = exec.submit(new Task(latch, () -> IO.println("Task2 - Id: " + user.id() + " at " + System.currentTimeMillis()))); Future<?> result3 = exec.submit(new Task(latch, () -> IO.println("Task3 - Id: " + user.id() + " at " + System.currentTimeMillis()))); latch.countDown(); result1.get(); result2.get(); result3.get(); } catch (ExecutionException | InterruptedException e) { throw new RuntimeException(e); } } Tasks 1, 2, and 3 are created and set to read the id of the user created in advance, then executed in parallel. The output below demonstrates that, in this particular run, Task 3 sets the id, and then Tasks 1 and 2 use the same value. Plain Text Task3 - Id: f7e12b49-5c21-4898-883b-12013824a683 at 1773834965123 Task1 - Id: f7e12b49-5c21-4898-883b-12013824a683 at 1773834965123 Task2 - Id: f7e12b49-5c21-4898-883b-12013824a683 at 1773834965123 StableValue also comes with quite a few higher-level helper methods (function(), intFunction(), list(), map(), supplier()), each of them useful and suitable in various scenarios. Below is an example of how the Singleton pattern could be implemented. 
Java
record User(int id, String name) {}

static class UserService {

    public UserService() {
        IO.println("UserService created");
    }

    public void register(User user) {
        IO.println("Registered " + user);
    }
}

static UserService getInstance() {
    return USER_SERVICE_INSTANCE.orElseSet(UserService::new);
}

private static final StableValue<UserService> USER_SERVICE_INSTANCE = StableValue.of();

void main() {
    getInstance().register(new User(1, "HCD"));
    getInstance().register(new User(2, "Andrei"));
}

The aim is to have a single instance of the UserService that can be used to register users via the designated method. If we run it, the output is the one below, which clearly shows the constructor is called only once.

Plain Text
UserService created
Registered User[id=1, name=HCD]
Registered User[id=2, name=Andrei]

To conclude, the StableValue enhancement ensures immutability enforced at JVM level: once the value is set, it's stable and visible to all threads.

Conclusions

This article briefly covered a few Java 25 language enhancements, hoping that the straight-to-the-point examples presented offer a starting point for further deep-diving into these features. Whether or not you have already migrated to the latest LTS or started exploring its additions and improvements, I consider this worth doing.

At JavaOne '26, during one of the opening keynotes, one quote stood out to me: "Java is everywhere AI needs to be." I couldn't agree more. In a world where apparently everyone is preoccupied with "Accelerated Inference," let's remain optimistic about what the future will bring and continue to build and consolidate our Java foundation by exploring the new additions, staying up to date, and gradually embracing them in our personal and professional projects.

Resources

[1] JDK 25
[2] Sample code is available here.

By Horatiu Dan

CORE

Top 10 Best Places to Prepare for Your Next Data Engineer Interview

Landing a data engineering role means clearing a gauntlet that no other software discipline has to face all at once: airtight SQL, production-grade Python, data modeling instincts, distributed-compute fluency (Spark, warehouses, ETL), and system design that has to survive real data volume. Generic coding prep barely scratches the surface, and "just grind LeetCode" advice falls apart the moment an interviewer asks you to model a slowly changing dimension or reason about a skewed join. So we did the work. We evaluated the resources data engineers actually use, judged on five things that matter: relevance to the DE interview loop, depth of practice, realism of the questions, feedback quality, and price. Below is the ranked list. A quick note on methodology: this ranking favors resources that target the data engineering loop specifically, not generic algorithm grinding. That bias is intentional, and it is why the order may surprise you. 1. DataDriven.io Most "interview prep" platforms were built for generic SWE roles and bolt on a SQL section as an afterthought. This one was built from the ground up for the data engineering loop. The catchphrase you will hear repeated in DE communities is that DataDriven.io is LeetCode for data engineers, and it fits: instead of inverting binary trees, you are writing window functions against realistic schemas, designing star schemas, debugging an ETL transform, and reasoning about partitioning, all in an in-browser SQL and Python sandbox that runs your query against real data and tells you exactly where it broke. 
It is also the rare place where the whole product is built for the job rather than adjacent to it, which is why datadriven.io is great for data engineer interview prep specifically: SQL practice that ramps to multi-CTE analytics, a deep set of Python practice problems, plus data modeling, dimensional modeling, PySpark, and system-design tracks, with execution-based feedback and a difficulty curve that reaches the staff-level questions that actually separate offers from rejections. Verdict: The most targeted, realistic data engineering interview practice available today. Earns the top spot. 2. "Cracking the Coding Interview" (the book, by Gayle Laakmann McDowell) A deserved classic, and intentionally a book rather than a website. CTCI is still the best single artifact for understanding how technical interviews are actually structured: how the conversation flows, how to think out loud so the interviewer can follow your reasoning, how to recover when you get stuck, and how to handle the behavioral and negotiation segments that strong candidates routinely fumble. Most people lose offers not because they could not solve the problem but because they could not show their work, and this book is the canonical fix for that. Where it falls short for our purposes is scope. It will not teach you windowed SQL, slowly changing dimensions, or how to design a lakehouse, and its algorithm focus skews toward generalist software roles rather than the data engineering loop. The data structures and big-O chapters are still worth a pass because algorithm screens do show up, but treat them as a refresher, not your main event. Read CTCI once early in your prep to fix your interview mechanics, internalize the communication patterns, then spend the rest of your time on hands-on, domain-specific platforms. Verdict: Essential reading for interview mechanics; not a substitute for domain practice. 3. 
"Designing Data-Intensive Applications" (the book, by Martin Kleppmann) If CTCI teaches you how to interview, "DDIA" teaches you what a data engineer is actually supposed to know. Replication, partitioning, consistency models, batch versus stream processing, storage engine internals, the failure modes of distributed systems: this is the conceptual backbone of nearly every data engineering system design round. When an interviewer asks why you would choose a log-structured merge tree over a B-tree, or how you would keep two datastores in sync without losing events, the answers live in these pages. It is dense, and it is emphatically not an interview drill book. You will not find practice questions, and you cannot cram it the night before. What it gives you instead is judgment: the candidate who has internalized DDIA answers "how would you design this pipeline" with the calm of someone who has already thought through the tradeoffs, names the failure cases before being prompted, and explains why a choice holds up under real data volume. Read it slowly over weeks, ideally early in your prep, and pair it with a hands-on platform so the concepts attach to actual queries and schemas rather than floating as theory. Verdict: The definitive conceptual reference. Read it slowly, alongside real practice. 4. LeetCode The default destination, and it earns its spot for one practical reason: the Database problem set is sizable, the algorithm catalog is enormous, and the platform's brand means a large share of companies still pull their initial coding screen straight from it. If your target company is known to run a generic algorithm round before the data-specific rounds, you need exposure here, and the sheer volume of problems plus community discussion means you will rarely be surprised by a pattern you have never seen. The catch for data engineers is that LeetCode was built for the algorithm interview, not the DE loop. 
Its SQL section is genuinely solid but secondary; the questions are puzzle-shaped rather than drawn from real schemas, and you will not find data modeling, ETL design, dimensional modeling, or Spark anywhere on the platform. There is also a real failure mode here: candidates over-invest in LeetCode because it is comfortable and gamified, then walk into a DE loop under-practiced on the things that actually decide it. Use it deliberately to clear the algorithm gate and to keep your raw coding sharp, then move the bulk of your hours to resources that target data engineering directly. Verdict: Necessary for the algorithm screen; thin for the data-engineering-specific rounds. 5. HackerRank HackerRank is where a surprising number of companies host their take-home and timed online assessments, so practicing in its environment carries a payoff most resources cannot offer: you get comfortable with the exact editor, the exact test-case runner, and the exact time-pressure UI you may actually be scored in. For an assessment you cannot retake, that familiarity is worth real points, because fighting an unfamiliar interface while the clock runs is a self-inflicted way to lose. Its SQL and problem-solving tracks are beginner-friendly, well-structured, and free to work through. The ceiling, though, is lower than you want for a senior DE loop. The problems lean academic and self-contained rather than job-realistic, the SQL rarely reaches the messy multi-table analytics that real interviews probe, and there is nothing on modeling, pipelines, or system design. The smart way to use HackerRank is as format rehearsal: run a few timed sets so the assessment environment feels routine, then build your actual depth somewhere that mirrors the work. Do not let a green checkmark on an easy problem set convince you that you are loop-ready. Verdict: Great for getting comfortable with the testing environment; limited depth. 6. 
SQLZoo A long-running, completely free interactive SQL tutorial that runs entirely in the browser with no signup, no setup, and no paywall. It walks you from SELECT basics through joins, grouping, subqueries, and window functions, with short hands-on exercises after each concept so you are writing real queries from the first lesson rather than just reading about them. For anyone whose SQL has gone rusty, or who learned it informally and has gaps they cannot quite name, it is the most painless way to rebuild muscle memory before stepping up to interview-grade problems. It is a teaching tool, not an interview platform, and you should treat it as exactly that. The problems stay introductory, the datasets are small and tidy, and there is nothing on data modeling, ETL, pipelines, or system design — the parts of the loop that actually separate data engineers from analysts. Its value is as a fast diagnostic and warm-up: work through the sections that feel shaky, confirm your fundamentals are solid, then graduate to harder, execution-based practice against realistic schemas. Linger here too long, and you will plateau well below where a real interview will push you. Verdict: A friendly free SQL primer; foundational rather than interview-level. 7. "Python for Data Analysis" (by Wes McKinney) Written by the creator of pandas, this is the reference for the kind of data-wrangling Python that shows up constantly in DE take-homes and pairing rounds: reshaping, grouping and aggregating, merging on imperfect keys, handling missing values, parsing dates, and cleaning the kind of messy tabular data that never looks like a tidy LeetCode input. Many data engineering interviews quietly assume this fluency, then hand you a notebook and a dirty CSV and watch how you move; if your Python is sharp on algorithms but clumsy on real data manipulation, this book is exactly the gap-closer. 
It is a library-and-technique book, not interview prep, and it will not touch SQL, data modeling, distributed compute, or system design. There are also no interview questions to grind, which is fine, because its job is to make the tools second nature so that during a timed exercise you are reasoning about the problem instead of fumbling for the right pandas idiom. Read the chapters on data loading, cleaning, and group operations, keep it nearby as a reference, then go apply the techniques in hands-on practice against problems that actually resemble the job. Verdict: The definitive practical Python reference for data work; not a drill book. 8. "Fundamentals of Data Engineering" (the book, by Joe Reis & Matt Housley) Another deliberate book pick, and the best single survey of the modern data engineering lifecycle: generation, ingestion, storage, transformation, and serving, plus the cross-cutting concerns like orchestration, data quality, and governance that interviewers increasingly probe. Where DDIA goes deep on systems internals, this book goes broad on how the pieces fit together into a working data platform, which is precisely the framing you want for the "walk me through how you'd build X" and "what would you consider before choosing this approach" portions of a loop. It is a framework-and-vocabulary book, not a practice book, and that is both its strength and its limit. It will give you the mental model and the shared language to discuss tradeoffs like a practitioner, which makes you sound, accurately, like someone who understands the field. But it contains no exercises, so reading it alone will not build the hands-on skill an interviewer also tests. Use it to organize everything you know into a coherent lifecycle, fill the conceptual gaps, then go write the queries and design the schemas somewhere that gives you real feedback. Verdict: The best lifecycle overview in print; conceptual, not hands-on. 9. 
Mode SQL Tutorial A free, well-regarded interactive SQL tutorial built by an analytics company, which shows in its framing: it teaches SQL the way analysts and engineers actually use it, oriented around answering real questions from data rather than solving abstract puzzles. It runs in the browser, takes you from the basics through intermediate analytics queries including aggregation and the early window-function territory, and the explanations are unusually clear about why a query is shaped the way it is. For someone shoring up SQL foundations before diving into harder problems, it is one of the cleanest no-cost on-ramps available. Like SQLZoo, it is a tutorial rather than an interview-prep platform, so it stops well short of the difficulty a real DE loop will throw at you, and it covers none of the modeling, pipeline, or system-design ground. It is best read as a companion to a hands-on platform: use Mode to internalize the analytical mindset and clean up your SQL fundamentals, then take that foundation into execution-based practice where the problems are harder, the schemas messier, and the feedback tells you exactly where your query went wrong. Verdict: A clean free SQL on-ramp; foundational rather than interview-level. 10. Pramp/Interviewing.io (mock interviews) Rounding out the list: peer and expert mock interviews. All the solo practice in the world cannot reproduce the specific pressure of explaining your reasoning out loud to a real human while a clock runs and someone is judging you, and that pressure is exactly where otherwise-prepared candidates fall apart. A handful of mock loops surface the weaknesses you cannot see in yourself: the long silences, the jumping to code before clarifying the question, the inability to narrate a tradeoff. Pramp pairs you with peers for free, while Interviewing.io connects you with experienced interviewers, often anonymously, for higher-fidelity feedback. The honest limitation is supply and specificity. 
Data-engineering-focused interviewers are scarcer than generalist software ones, so depending on availability, you may land in an algorithm or general system-design mock that only partially mirrors a true DE loop. That is still worth doing, because the communication skills (the structure, the clarifying questions, the calm narration) transfer directly regardless of the exact problem. Schedule one or two once your technical prep is underway, treat the feedback as data, and fix the delivery habits well before the interview that counts.

Verdict: Best for rehearsing delivery and nerves; DE-specific matches can be hit-or-miss.

How to Actually Use This List

You do not need all ten. A focused plan beats a scattered one:

- Build the foundation. Skim CTCI for interview mechanics and start DDIA for concepts.
- Do the reps where it counts. Spend the bulk of your time on hands-on, DE-shaped practice that maps directly onto what you will be asked (see #1).
- Patch specific gaps. Use LeetCode for the algorithm screen, SQLZoo or the Mode tutorial to shore up SQL, and a mock interview or two to rehearse out loud.

The candidates who get offers are not the ones who consumed the most content. They are the ones who practiced the actual job. Pick the resources that put you closest to it, start today, and write more queries than you read. Good luck with your loop.

By Rahul Han

From Bash Script to Operational Triage: What Eight Months of Kubernetes Debugging Taught Me

In November 2025, I published a Bash script that analyzed Kubernetes clusters in about 60 seconds. It generated HTML reports, surfaced crash loops, orphaned resources, and other operational issues that were easy to overlook. The most interesting part wasn't the script — it was what happened after people started running it. Many told me they found problems they hadn't known existed. Looking back, the bash script wasn't really solving debugging. It was solving prioritization. I just didn't have the vocabulary for it yet. That script eventually became four different experiments, then a collection of small scanners, and eventually the dashboard shown in this article. Over the next eight months, that script evolved into OpsCart Watcher — an open-source operational triage dashboard for Kubernetes. This article is about what the journey taught me, and what I think is still missing from most Kubernetes environments. OpsCart Watcher — operational triage for Kubernetes (6 minutes) The Problem the Script Revealed The script did one thing well: it looked at an entire cluster and listed what was broken. Engineers who ran it kept telling me the same thing — "I had no idea this was there." That response was the important signal. These engineers had Grafana, Prometheus, and kubectl. Visibility was not their problem. The problem was that nothing told them to look at this specific namespace, this specific pod, this specific storage volume — before it became an incident. Consider a pod in CrashLoopBackOff for 19 days with 5,000+ restarts. To a metrics dashboard, that deployment looks healthy: replica count satisfied, a pod exists in Running state between crashes, CPU and memory flat because the container barely lives long enough to consume anything. The dashboard is answering the question it was built to answer — is the cluster meeting its SLOs? — and the answer is yes. The question nobody built tooling for: what deserves attention right now? 
Layer | What It Answers | Tools
Metrics | Is the cluster meeting its SLOs? | Prometheus, Grafana, Datadog
Per-resource state | What is this specific pod doing? | kubectl, k9s, Lens
Operational triage | What deserves attention right now? | Prioritizing operational work across cluster state

What Triage Looks Like in Practice

[Figure: Overview page showing Incident Score 41/100, KPI bar, Top 5, War Room panel]

The first time I ran the rebuilt dashboard against a cluster with real failures, the top of the screen didn't show me a CrashLoopBackOff pod. It showed me four CrashLoopBackOff pods spread across three namespaces, collapsed into a single operational problem:

Plain Text
1. 4 pods crash-looping    CRITICAL
   payments/fraud-detection (1810 restarts)
   → kubectl logs fraud-detection-... -n payments --previous

That collapsing is the entire idea. Instead of inspecting every deployment individually, I was looking at a ranked list of operational problems, each with a severity, a location, and the exact kubectl command to start investigating. The full output for this environment:

Plain Text
Incident Score: 41/100 (Degraded)

Top 5 Things to Fix:
1. 4 pods crash-looping                 CRITICAL   4 pods
2. 3 image_pull_backoff issues          CRITICAL   3 items
3. 1 privileged_container issue         CRITICAL   1 item
4. 1 namespace missing NetworkPolicy    HIGH       1 ns
5. 3 orphaned PVCs wasting money        MEDIUM     80 GB

None of these had triggered an alert. All were present and accumulating before the scan.

The Incident Score, a composite 0–100 across reliability, security, and waste, exists for one reason. Engineers fix incidents. Managers remember numbers. "We moved the Incident Score from 41 to 67" is a sentence that sticks. The crash loops and NetworkPolicies are the work behind it.

The Step After Detection

Finding problems was never the hard part. Knowing where to begin was. The most common feedback on the original bash script was some version of: "I found the problem, but I still didn't know what to do next."
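Stepping back to the ranked list shown earlier in this section, the collapsing step can be sketched as a group-by on finding type followed by a sort on severity and count. This is only an illustration of the idea, not OpsCart Watcher's actual implementation; the Finding and Problem records, the Severity enum, and the sample data are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TriageSketch {

    // Declared from most to least severe, so natural order ranks CRITICAL first.
    enum Severity { CRITICAL, HIGH, MEDIUM, LOW }

    // One raw observation from a scan (hypothetical shape).
    record Finding(String type, Severity severity, String location) {}

    // All findings of one type collapsed into a single operational problem.
    record Problem(String type, Severity severity, int count) {}

    static List<Problem> rank(List<Finding> findings) {
        Map<String, List<Finding>> byType = findings.stream()
                .collect(Collectors.groupingBy(Finding::type));

        Comparator<Problem> order = Comparator
                .comparing(Problem::severity)
                .thenComparing(Comparator.comparingInt(Problem::count).reversed())
                .thenComparing(Problem::type);

        return byType.entrySet().stream()
                .map(e -> new Problem(
                        e.getKey(),
                        // the most severe finding in the group sets the group's severity
                        e.getValue().stream()
                                .map(Finding::severity)
                                .min(Comparator.naturalOrder())
                                .orElseThrow(),
                        e.getValue().size()))
                .sorted(order)
                .toList();
    }

    public static void main(String[] args) {
        List<Finding> findings = new ArrayList<>();
        // Sample data loosely mirroring the counts in the output above.
        findings.addAll(Collections.nCopies(3, new Finding("orphaned_pvc", Severity.MEDIUM, "storage")));
        findings.addAll(Collections.nCopies(4, new Finding("crash_loop", Severity.CRITICAL, "payments")));
        findings.add(new Finding("missing_network_policy", Severity.HIGH, "default"));
        findings.addAll(Collections.nCopies(3, new Finding("image_pull_backoff", Severity.CRITICAL, "batch")));

        int position = 1;
        for (Problem p : rank(findings)) {
            System.out.println(position++ + ". " + p.count() + " x " + p.type() + " " + p.severity());
        }
    }
}
```

A real ranking would presumably weigh more signals (age, blast radius, cost), but even this two-key sort turns a flat list of findings into an ordered list of problems.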
In March, I wrote about finding a container with 24,069 restarts that had been accumulating undetected. Finding it took sixty seconds. The next hour was the actual work: what do I run first? Is this configuration or code? Is it customer-facing? The investigation page is my answer to that hour. Investigation page — OpsCart Assessment, Evidence, Recommended Investigation One click from any triage finding opens a dedicated investigation view: Plain Text OpsCart Assessment This workload has restarted 1810 times over 6 days. The restart rate appears stable, suggesting a deterministic configuration or application failure rather than an intermittent infrastructure issue. No referenced ConfigMaps or Secrets were detected in the pod spec — missing configuration is unlikely to be the root cause. Investigation should begin with previous container logs. Estimated time: 5–10 minutes. Evidence [1810 Restarts] [CrashLoopBackOff] [6d] [Deployment/fraud-detection] Recommended Investigation HIGH CONFIDENCE Check previous container logs MEDIUM Verify ConfigMaps and Secrets exist LOW Check for OOMKill in events The assessment is rules-based — no AI. It reads restart count, failure pattern (stable vs accelerating), and referenced configuration objects, then produces a deterministic, auditable summary. The confidence levels reflect how a senior engineer actually reasons: previous logs are almost always the right first move for a crash loop; OOMKill is worth checking but less likely. This is the part kubectl doesn't give you. Neither does Lens, k9s, or Headlamp. From "What Is Broken?" to "What Changed?" The biggest architectural change came when the dashboard gained memory. The first version of the tool answered: "what is broken?" The current version — backed by a small embedded database recording every scan — answers "what changed?" That sounds like a minor distinction. Operationally, it changes everything. 
An incident that has existed for three days deserves different attention than one that appeared five minutes ago. A cluster whose Incident Score dropped eight points overnight is telling you something that no single scan can. War Room — critical issues with visual differentiation per type Every KPI now carries a trend arrow — critical issues up three since the last scan, waste down one — and the Incident Score shows a seven-point sparkline. Each incident is tracked with first-seen and last-seen timestamps and an active/resolved status, so "CrashLoopBackOff — first detected 6 days ago, still active" replaces "CrashLoopBackOff." Operational memory changed the tool from a scanner into something that remembers the history of a cluster. What This Is Not The triage pattern does not answer when an issue started at the metrics level, why an application is slow, or whether last Tuesday's deployment caused a regression. Prometheus, APM tooling, and deployment audit logs remain the right tools for those questions. The triage layer is not a replacement for observability. It is the layer that tells you which questions to ask of your observability stack. The Biggest Lesson When I started, I thought Kubernetes debugging was about collecting more information. It wasn't. Kubernetes already exposes almost everything an operator needs through its API. The difficult part is deciding what deserves attention first. Over eight months, I found myself spending less time searching for failures and more time ranking them. That is ultimately what OpsCart became — not another dashboard, but a prioritization engine for cluster operations. Why Open Source I considered keeping the dashboard private. Instead, I open-sourced it because operational patterns only become useful when they're tested across different clusters. Every environment fails differently, and I wanted the prioritization model to evolve from real-world feedback rather than a single infrastructure. 
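To make the earlier point about operational memory concrete, here is a minimal sketch of tracking first-seen and last-seen timestamps across scans. It is an illustration only, not OpsCart Watcher's actual implementation: the `Incident` record, the `record_scan` function, the incident keys, and the plain dict standing in for the embedded database are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    key: str               # hypothetical identifier: "namespace/workload:problem"
    first_seen: datetime
    last_seen: datetime
    active: bool = True


def record_scan(store: dict, findings: set, now: datetime) -> dict:
    """Merge one scan into the store and report what changed since the last scan."""
    new, resolved = [], []
    for key in sorted(findings):
        incident = store.get(key)
        if incident is None or not incident.active:
            # First sighting, or a resolved issue that came back: start a fresh record.
            store[key] = Incident(key, first_seen=now, last_seen=now)
            new.append(key)
        else:
            incident.last_seen = now
    for key, incident in store.items():
        if incident.active and key not in findings:
            incident.active = False
            resolved.append(key)
    return {"new": new, "resolved": sorted(resolved)}


# Two scans, six days apart (sample data, not real cluster output).
store = {}
t0 = datetime(2025, 1, 1, 9, 0)
crash = "payments/fraud-detection:CrashLoopBackOff"
pvc = "default/old-data:OrphanedPVC"

first = record_scan(store, {crash, pvc}, t0)
second = record_scan(store, {crash}, t0 + timedelta(days=6))
age_days = (store[crash].last_seen - store[crash].first_seen).days
print(second, f"crash loop first detected {age_days} days ago, still active")
```

The design choice worth noting is that a single scan only knows what exists now; the "what changed?" answer comes entirely from comparing the current findings with the stored ones.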
The Remaining Gap The conclusion from my March article is still true: the question worth asking of your environment is not whether these conditions exist — they almost certainly do — but whether your current observability layer would surface them before they become incident preconditions. Eight months of building has only made that conclusion more specific. The gap is not data. The gap is attention: knowing which five things, out of hundreds of resources, deserve a human's time right now. Eight months ago I thought I was building a better debugging script. I wasn't. I was building something that helps operators decide where to spend the next ten minutes. About the environment: The scenarios shown in this article — CrashLoopBackOff pods, orphaned PVCs, missing NetworkPolicies, privileged containers — are representative of what OpsCart finds on real production clusters. The environment shown is a dedicated demonstration cluster configured with realistic failure scenarios. No production data was used. About the tool: OpsCart Watcher is open-source at github.com/opscart/opscart-k8s-watcher. It deploys as a single read-only container: Shell kubectl apply -f https://raw.githubusercontent.com/opscart/opscart-k8s-watcher/main/deploy/dashboard.yaml kubectl port-forward -n opscart-system svc/opscart-watcher 8080:80
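As a closing illustration, the collapsing step described earlier, where individual findings become a short ranked list of problems, can be sketched as follows. The severity ordering, the field names, and the sample findings are assumptions for this example; the article does not describe how OpsCart Watcher groups or scores findings internally.

```python
from collections import defaultdict

# Assumed ordering for the sketch: a lower number sorts first.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def top_problems(findings, limit=5):
    """Collapse per-resource findings into one problem per type, ranked by severity, then size."""
    groups = defaultdict(list)
    for finding in findings:
        groups[finding["type"]].append(finding)

    problems = []
    for kind, items in groups.items():
        # A group is as severe as its worst member.
        worst = min((item["severity"] for item in items), key=SEVERITY_RANK.__getitem__)
        problems.append({
            "type": kind,
            "severity": worst,
            "count": len(items),
            "namespaces": sorted({item["namespace"] for item in items}),
        })

    problems.sort(key=lambda p: (SEVERITY_RANK[p["severity"]], -p["count"], p["type"]))
    return problems[:limit]


# Sample findings (hypothetical, loosely modeled on the scenarios in this article).
findings = [
    {"type": "crash_loop", "severity": "CRITICAL", "namespace": "payments"},
    {"type": "orphaned_pvc", "severity": "MEDIUM", "namespace": "default"},
    {"type": "crash_loop", "severity": "CRITICAL", "namespace": "orders"},
    {"type": "missing_network_policy", "severity": "HIGH", "namespace": "staging"},
]

ranked = top_problems(findings)
for position, problem in enumerate(ranked, start=1):
    print(position, problem["type"], problem["severity"], problem["count"])
```

Two crash-looping pods in different namespaces show up as one line with a count of two, which is the difference between a list of resources and a list of problems.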

By Shamsher Khan

CORE

Azure Databricks vs Microsoft Fabric: An Honest Guide to When to Use What

If you're building a data platform on Azure in 2026, you're going to be asked this question: Azure Databricks or Microsoft Fabric? Both run on Delta Lake, both integrate with ADLS Gen2, both have Spark, and both promise to be your unified data platform. The overlap is real, and the marketing doesn't help. This post is an honest breakdown of where each genuinely excels, where they overlap, and how to decide without getting lost in feature comparison tables.

Architecture Comparison

Decision Flow

Detailed Capability Comparison

Capability | Azure Databricks | Microsoft Fabric | Winner
--- | --- | --- | ---
Spark engine | Full Spark, Photon, tunable | Spark via Notebooks, less tunable | Databricks
Delta Lake | Native, full control | Via OneLake (Delta Parquet) | Tie
MLflow / MLOps | Native, full MLflow stack | Basic experiment tracking | Databricks
Model serving | Databricks Model Serving | Azure ML integration | Databricks
Power BI integration | DirectQuery via SQL Warehouse | Direct Lake (zero-copy, faster) | Fabric
SQL analytics | Serverless SQL Warehouse + Photon | SQL Analytics Endpoint | Tie
Data pipelines | Delta Live Tables, Workflows | Data Factory pipelines (mature) | Tie
Real-time intelligence | Spark Streaming + Kafka | Eventstream + KQL Database | Fabric
Setup complexity | Medium-high | Low (SaaS) | Fabric
Fine-grained governance | Unity Catalog (mature) | Purview integration (growing) | Databricks
Cost model | DBU + VM | Fabric capacity units | Comparable
Open format portability | High (standard Delta/Parquet) | Medium (OneLake but some lock-in) | Databricks

Step 1 — Reading Data from Fabric OneLake in Azure Databricks

The good news: Fabric and Databricks can share data via OneLake, which speaks Delta format. You don't have to pick one and abandon the other.
Python
# Azure Databricks reading from Microsoft Fabric OneLake
# OneLake exposes an ABFS-compatible endpoint

# Authenticate using the workspace's Managed Identity or Service Principal
tenant_id = dbutils.secrets.get("kv-scope", "sp-tenant-id")
client_id = dbutils.secrets.get("kv-scope", "sp-client-id")
client_secret = dbutils.secrets.get("kv-scope", "sp-client-secret")

# OneLake uses the same ABFS protocol as ADLS Gen2
fabric_workspace_id = "your-fabric-workspace-guid"
lakehouse_name = "your-lakehouse-name"
onelake_host = "onelake.dfs.fabric.microsoft.com"

spark.conf.set(f"fs.azure.account.auth.type.{onelake_host}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{onelake_host}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{onelake_host}", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{onelake_host}", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{onelake_host}",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

# Read a Delta table from Fabric Lakehouse
fabric_path = f"abfss://{fabric_workspace_id}@{onelake_host}/{lakehouse_name}.Lakehouse/Tables/sales_gold"
fabric_df = spark.read.format("delta").load(fabric_path)

print(f"Rows from Fabric Lakehouse: {fabric_df.count()}")
fabric_df.show(5)

Step 2 — Writing Databricks Results Back to OneLake

Run heavy ML feature engineering in Databricks, write results back to OneLake so Fabric Power BI can consume them via Direct Lake — zero-copy, sub-second dashboard refresh.
Python
from pyspark.sql.functions import current_timestamp, lit

# Run your Databricks feature engineering / ML inference here
result_df = spark.table("production.gold.churn_predictions") \
    .withColumn("_computed_at", current_timestamp()) \
    .withColumn("_source", lit("databricks-inference-job"))

# Write back to Fabric OneLake as Delta
output_path = f"abfss://{fabric_workspace_id}@{onelake_host}/{lakehouse_name}.Lakehouse/Tables/churn_predictions"

result_df.write \
    .format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .save(output_path)

print(f"Written {result_df.count()} rows to Fabric OneLake.")
print("Power BI Direct Lake will pick up changes automatically.")

Step 3 — When to Use Fabric Notebooks vs Databricks Notebooks

Not everything needs Databricks. Fabric Notebooks are good enough for lighter data prep that feeds Power BI reports.

Python
# This kind of transformation is fine in Fabric Notebooks
# Use Fabric when: output goes directly to Power BI, team is analytics-focused,
# no MLflow tracking needed, data volume < 100GB

# Fabric Notebook (PySpark — same syntax as Databricks)
from pyspark.sql.functions import col, sum as _sum, date_trunc

df = spark.read.format("delta").load("Tables/sales_silver")

summary = df \
    .withColumn("month", date_trunc("month", col("sale_ts"))) \
    .groupBy("month", "region", "product_category") \
    .agg(_sum("revenue").alias("monthly_revenue")) \
    .orderBy("month", "region")

# Write to Lakehouse table — Power BI picks it up via Direct Lake
summary.write.format("delta").mode("overwrite").saveAsTable("monthly_revenue_summary")

# Use Databricks when: MLflow tracking needed, complex ML pipeline,
# Unity Catalog governance required, data volume > 1TB, streaming workloads

When to Use Which: Decision Framework

Python
# Use this as a mental checklist when deciding
DATABRICKS_STRENGTHS = [
    "Complex ML pipelines with MLflow experiment tracking",
    "Production model serving with A/B testing",
    "Fine-grained governance via Unity Catalog (row/column security)",
    "Spark Structured Streaming with Kafka / Event Hub",
    "Very large scale ETL (multi-TB, complex joins)",
    "Open-source tool integrations (dbt, Great Expectations, etc.)",
    "Multi-cloud or portability requirements",
]

FABRIC_STRENGTHS = [
    "Power BI as the primary consumption layer (Direct Lake = fastest)",
    "Analytics-focused teams without deep Spark expertise",
    "Microsoft 365 integration (Teams, SharePoint data sources)",
    "Real-time dashboards via Eventstream + KQL",
    "Fabric Data Factory for straightforward ELT pipelines",
    "Lower operational overhead — fully SaaS managed",
    "Already licensed via Microsoft 365 E5 / Fabric capacity",
]

BOTH_TOGETHER = [
    "Heavy ML/MLOps in Databricks, results published to OneLake for Power BI",
    "Fabric Data Factory for ingestion, Databricks for complex transformation",
    "Unity Catalog governing Databricks tables, Fabric consuming via shortcuts",
]

Things to Watch in Production

OneLake shortcuts are the integration bridge. Fabric Lakehouses support shortcuts that point to external Delta tables in ADLS Gen2 — the same storage Databricks writes to. This means Databricks writes once and Fabric reads without data movement. Set up shortcuts rather than copying data between platforms.

Unity Catalog doesn't govern Fabric. Your row-level security and column masks in Unity Catalog do not apply when Fabric reads the same underlying Delta files directly. If governance is critical, either run everything through Databricks or replicate governance rules in Fabric's permission model.

Fabric capacity units and Databricks DBUs are both usage-based but measure differently. Don't try to compare them directly. Run the same workload in both and compare wall-clock time and cost on your actual data sizes.

Fabric ML is improving fast but isn't MLflow. As of early 2026, Fabric ML experiment tracking is functional but doesn't have the depth of MLflow's model registry, artifact storage, or model serving.
If MLOps maturity matters, stay on Databricks for ML.

Wrapping Up

The honest answer is: most mature Azure data platforms in 2026 use both. Azure Databricks for ML, complex transformations, governance, and streaming. Microsoft Fabric for Power BI-first analytics, simpler pipelines, and teams that don't need the full Databricks stack. OneLake shortcuts and the shared Delta format make them composable rather than competitive. Pick based on your primary consumer: if it's Power BI dashboards, start with Fabric. If it's ML models and data products, start with Databricks. When you need both, they integrate cleanly.

References

Microsoft Fabric Documentation
OneLake — The OneDrive for Data
Fabric Lakehouse vs Azure Databricks
Direct Lake in Power BI
OneLake Shortcuts
Azure Databricks and Microsoft Fabric Integration
Unity Catalog vs Fabric Data Governance
Fabric Eventstream — Real-Time Intelligence
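The rules of thumb in this post can be condensed into a small helper. This is only a sketch of the article's checklist, not vendor guidance: the `Workload` fields and the `recommend_platform` function are hypothetical names, and the 1 TB threshold is taken from the notebook comments in Step 3. Workloads between 100 GB and 1 TB are not covered by those comments, so the sketch falls back to the primary consumer for them.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    primary_consumer: str            # assumed values: "power_bi" or "ml"
    needs_mlflow: bool = False
    needs_unity_catalog: bool = False
    data_volume_gb: float = 0.0


def recommend_platform(w: Workload) -> str:
    """Return "fabric", "databricks", or "both" using the article's rules of thumb."""
    needs_databricks = (
        w.needs_mlflow
        or w.needs_unity_catalog
        or w.data_volume_gb > 1000   # "> 1TB" from the Step 3 comments
    )
    if needs_databricks and w.primary_consumer == "power_bi":
        # Heavy work in Databricks, results published to OneLake for Direct Lake.
        return "both"
    if needs_databricks or w.primary_consumer == "ml":
        return "databricks"
    return "fabric"


light_reporting = recommend_platform(Workload("power_bi", data_volume_gb=50))
ml_platform = recommend_platform(Workload("ml", needs_mlflow=True))
churn_dashboard = recommend_platform(Workload("power_bi", needs_mlflow=True))
print(light_reporting, ml_platform, churn_dashboard)
```

A function like this is no substitute for running the same workload on both platforms, but writing the rules down makes it easier to see which requirement is actually driving the choice.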

By Jubin Abhishek Soni

CORE

Getting Started With RabbitMQ in Spring Boot

RabbitMQ is an enterprise-grade open-source messaging and streaming broker. In this blog, you will learn some basic concepts of RabbitMQ and how to use it in a Spring Boot application. Enjoy! Introduction Before diving into the programmatic details, first some concepts need to be explained. Do realize that in this blog, only the surface is scratched from what is possible with RabbitMQ. A detailed overview can be found in the official RabbitMQ documentation. Several protocols are supported by RabbitMQ. In this blog, the AMQP 0-9-1 protocol will be used. AMQP stands for Advanced Message Queuing Protocol. RabbitMQ receives messages from a publisher, a producing application, and routes them to consumers, applications that process the messages. A publisher publishes messages to an exchange (like a mailbox). The exchange then routes the messages to queues using bindings. RabbitMQ then delivers the messages to the consumers who are subscribed to the queues. The process is shown in the figure below. In the examples in the remainder of this blog, you will make use of a Topic Exchange. There are different exchange types, but for the sake of simplicity, only one will be used. A topic exchange routes messages to one or many queues, based on a message routing key. Topic exchanges are commonly used for multicast routing of messages. Sources used in this blog are available on GitHub in module topics. Prerequisites Prerequisites for reading this blog are: Basic knowledge of Java;Basic knowledge of Spring Boot;Basic knowledge of Docker Compose. Create Spring Boot Application In order to get started, you navigate to the Spring Initializr and add the following dependencies: Spring Web: in order to be able to send messages via an http request.Docker Compose Support: in order to start a RabbitMQ container when the application starts.Spring for RabbitMQ: in order to integrate Spring Boot with RabbitMQ. 
You will build the following: One Exchange with one Topic.Publish a general message to the topic which will be consumed by consumer A and consumer B.Publish a specific message to the topic which will be only consumed by consumer B. In order to send a general and a specific message, two HTTP endpoints are created in the MessageController. Java @RestController public class MessageController { private MessageService messageService; public MessageController(MessageService messageService) { this.messageService = messageService; } @RequestMapping( method = RequestMethod.POST, value = "send-general" ) public ResponseEntity<Void> sendGeneralMessage(@RequestBody String message) { messageService.sendMessage("event.general.message", message); return new ResponseEntity<>(HttpStatus.CREATED); } @RequestMapping( method = RequestMethod.POST, value = "send-specific" ) public ResponseEntity<Void> sendSpecificMessage(@RequestBody String message) { messageService.sendMessage("event.specific.message", message); return new ResponseEntity<>(HttpStatus.CREATED); } } The requests are forwarded to a MessageService.sendMessage method, which takes a routingKey and the message as arguments. The message is taken from the http request body, the routingKey is hardcoded. Remember that the routingKey determines to which queue the message will be routed. In the service, you make use of Spring Boot's RabbitTemplate in order to send the message to RabbitMQ. Java @Service public class MessageService { private RabbitTemplate rabbitTemplate; public MessageService(RabbitTemplate rabbitTemplate) { this.rabbitTemplate = rabbitTemplate; } public void sendMessage(String routingKey, String message) { rabbitTemplate.convertAndSend(RabbitMqConfig.TOPIC_EXCHANGE_NAME, routingKey, message); } } Bind Consumer A Consumer A will consume general messages. The queue needs to be bound to the Topic Exchange with the routing key. 
Create a RabbitMqConfig class with:
- A TopicExchange bean with name events.exchange.
- A Queue bean for consumer A with name consumer-a.queue.
- A binding bean for consumer A connecting the queue of consumer A to the TopicExchange with the routing key for the general messages.

Do note that the name of the Queue parameter in method bindingConsumerA needs to match the queueConsumerA bean name.

Java
import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.core.TopicExchange;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RabbitMqConfig {

    public static final String QUEUE_CONSUMER_A = "consumer-a.queue";
    public static final String TOPIC_EXCHANGE_NAME = "events.exchange";
    public static final String ROUTING_KEY_GENERAL_MESSAGE = "event.general.*";

    // The exchange that all messages are published to
    @Bean
    TopicExchange eventsExchange() {
        return new TopicExchange(TOPIC_EXCHANGE_NAME);
    }

    // Non-durable queue for consumer A
    @Bean
    public Queue queueConsumerA() {
        return new Queue(QUEUE_CONSUMER_A, false);
    }

    // Routes messages matching event.general.* to the queue of consumer A
    @Bean
    Binding bindingConsumerA(Queue queueConsumerA, TopicExchange exchange) {
        return BindingBuilder.bind(queueConsumerA).to(exchange).with(ROUTING_KEY_GENERAL_MESSAGE);
    }
}

Create Consumer A

Next thing to do is to consume the messages from queue A. Create a Component named ReceiverA. Annotate the method for processing the messages with @RabbitListener and connect it to queue A. When receiving the message, just print it to the console.

Java
@Component
public class ReceiverA {

    @RabbitListener(queues = RabbitMqConfig.QUEUE_CONSUMER_A)
    public void receiveMessage(String message) {
        System.out.println("Queue Consumer A received <" + message + ">");
    }
}

Run the Application

In order to run the application, you will need RabbitMQ. Since you have added Docker Compose Support to the project earlier, you can just add a compose.yaml in the root of the repository.
YAML services: rabbitmq: image: rabbitmq:3.13-management-alpine # Stable, lightweight, includes management UI container_name: rabbitmq ports: - "5672:5672" # AMQP - "15672:15672" # Management console environment: RABBITMQ_DEFAULT_USER: secret RABBITMQ_DEFAULT_PASS: myuser Also add the connection parameters for RabbitMQ to the application.properties file. Properties files spring.rabbitmq.host=localhost spring.rabbitmq.port=5672 spring.rabbitmq.username=secret spring.rabbitmq.password=myuser Start the application. Shell mvn spring-boot:run You will notice that RabbitMQ is started automatically. Send a general message. Shell curl -X POST http://localhost:8080/send-general \ -H "Content-Type: text/plain" \ -d "This is a general message" The console log will print the following. Plain Text Queue Consumer A received <This is a general message> Stop the application. Bind Consumer B Consumer B will process general messages, but also specific messages. Add to the RabbitMqConfig the queue for consumer B, and bind it to the exchange with respectively the general message routing key and the specific message routing key. Java @Configuration public class RabbitMqConfig { public static final String QUEUE_CONSUMER_A = "consumer-a.queue"; public static final String QUEUE_CONSUMER_B = "consumer-b.queue"; public static final String TOPIC_EXCHANGE_NAME = "events.exchange"; public static final String ROUTING_KEY_GENERAL_MESSAGE = "event.general.*"; public static final String ROUTING_KEY_SPECIFIC_MESSAGE = "event.specific.*"; ... 
@Bean public Queue queueConsumerB() { return new Queue(QUEUE_CONSUMER_B, false); } @Bean Binding bindingConsumerBGeneral(Queue queueConsumerB, TopicExchange exchange) { return BindingBuilder.bind(queueConsumerB).to(exchange).with(ROUTING_KEY_GENERAL_MESSAGE); } @Bean Binding bindingConsumerBSpecific(Queue queueConsumerB, TopicExchange exchange) { return BindingBuilder.bind(queueConsumerB).to(exchange).with(ROUTING_KEY_SPECIFIC_MESSAGE); } } Create Consumer B Consumer B is created just like consumer A. Create a ReceiverB class in order to receive the queue B messages. Java @Component public class ReceiverB { @RabbitListener(queues = RabbitMqConfig.QUEUE_CONSUMER_B) public void receiveMessage(String message) { System.out.println("Queue Consumer B received <" + message + ">"); } } Run the Application Start the application. Shell mvn spring-boot:run Send a general message. Shell curl -X POST http://localhost:8080/send-general \ -H "Content-Type: text/plain" \ -d "This is a general message" The message is now received by Consumer A and Consumer B. Plain Text Queue Consumer B received <This is a general message> Queue Consumer A received <This is a general message> Send a specific message. Shell curl -X POST http://localhost:8080/send-specific \ -H "Content-Type: text/plain" \ -d "This is a specific message" The message is only received by Consumer B. Plain Text Queue Consumer B received <This is a specific message> Management Console Also take a look at the RabbitMQ management console, which is accessible at http://localhost:15672/. Here you can see the exchanges, the queues, the bindings, etc. Conclusion In this blog, you learned some basics of RabbitMQ using the AMQP 0-9-1 protocol. You learned how easy it is to integrate this within your Spring Boot application.
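To see why the general message reaches both consumers while the specific message reaches only consumer B, it can help to model topic matching outside of RabbitMQ. The sketch below is a simplified stand-in for the broker's routing, written in Python for brevity: `topic_matches` and `route` are hypothetical helpers, and the binding table mirrors the bindings defined in RabbitMqConfig. In AMQP 0-9-1 topic patterns, `*` matches exactly one word and `#` matches zero or more words.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if an AMQP-style topic pattern matches a routing key."""
    p_words = pattern.split(".")
    k_words = routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(p_words):
            return j == len(k_words)
        if p_words[i] == "#":
            # '#' may absorb zero or more words.
            return any(match(i + 1, nxt) for nxt in range(j, len(k_words) + 1))
        if j == len(k_words):
            return False
        if p_words[i] == "*" or p_words[i] == k_words[j]:
            # '*' stands for exactly one word.
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


# Same bindings as in RabbitMqConfig.
BINDINGS = {
    "consumer-a.queue": ["event.general.*"],
    "consumer-b.queue": ["event.general.*", "event.specific.*"],
}


def route(routing_key: str) -> list:
    """List the queues that would receive a message with this routing key."""
    return [queue for queue, patterns in BINDINGS.items()
            if any(topic_matches(pattern, routing_key) for pattern in patterns)]


general = route("event.general.message")
specific = route("event.specific.message")
too_short = route("event.general")
print(general, specific, too_short)
```

Note that a routing key of `event.general` is delivered to no queue in this model, because `*` requires one word to be present in that position.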

By Gunter Rotsaert

CORE

AI Is Making PHP Cool Again

Somewhere right now, an engineer is making the case to rewrite a working PHP app in Node, and the pitch includes the word "modern." I have heard a version of this for fifteen years. The app ships. The customers are happy. The code is unfashionable. And somebody wants to tear it down and rebuild it on a stack that looks better on a resume. I have shipped software for more than 20 years, and these days I spend a lot of my time watching AI coding agents write it. So here is a take that is going to sound backward: the thing everyone makes fun of PHP and Laravel for — that they are rigid, opinionated, and boring — is the exact thing that makes coding agents so good at them. When a machine writes a big chunk of your code, the most valuable thing your framework can give you is predictability, not flexibility. And the trendy, flexible stack the rewrite crowd wants is quietly making your AI tooling worse.

The Thing That Makes a Stack Feel Modern Makes AI Worse at It

A coding agent is a pattern matcher with a context window. It is good at your codebase to the degree that your codebase looks like the millions of others it trained on, and to the degree that it can guess where things go without reading the whole repo first. A bespoke Node service is the opposite of that. Node and Express enforce almost no structure, and that gets sold as a feature. You arrange the project however your team likes. One team puts routes in routes/. Another co-locates them with handlers. A third invents a domain-folder layout from a blog post someone read once. Controllers, services, models, and middleware live wherever this particular team decided. For a senior team, that freedom is genuinely nice. It is also poison for an agent. When you ask the model to add an endpoint, it first has to infer your project's private conventions from whatever it can see, then guess at the rest. Two runs of the same prompt come out different, because there is no canonical answer to "where does this go."
The agent burns its effort rebuilding context your layout never standardized, instead of writing the feature. This is not really a Node problem. It is a configuration-over-convention problem, and it shows up anywhere the layout is a per-team decision. Even Django, a real framework with real conventions, leaves you enough rope (models in one file or split across many, your pick of API layer) that the AI output wobbles more than it does in a stricter framework. The more the framework leaves up to you, the more the agent has to guess. Convention Over Configuration Was an AI Strategy Before There Was AI Now open any Laravel project, built by any team, in any country. You already know where everything is. Models in app/Models. Controllers in app/Http/Controllers. Policies in app/Policies. Migrations follow the same timestamped naming every time. This is convention over configuration, the principle Rails made famous, and Laravel built its whole developer experience around. For two decades it was sold as a way to stop bikeshedding and onboard humans faster. It turns out it was an AI strategy the whole time, and nobody knew it yet. When the file always lives in the same place, and the code always follows the same idiom, the model has effectively seen your project a million times before it ever touches it. The structure it is predicting is not your team's private invention. It is the global standard, which is exactly what the model trained on. So the generated code comes out idiomatic, lands in the right directory, and looks the same across two runs of the same prompt. Laravel even ships official AI-assisted-development docs now, plus a tool called Boost that feeds an agent the framework's own conventions. That is the tell. The thing that makes a framework easy for a new human to read — everything is where you would expect — is the same thing that makes it easy for a machine. AI just raised the payoff on being predictable. 
What This Looks Like When You Actually Ship

I am not making this argument in the abstract. I am watching it play out in my own company's products. Our newest product, ProductWave, is built entirely on PHP and Laravel. Not out of nostalgia. We got tired of the JavaScript churn, the dependency hell, the new framework every nine months, the constant re-platforming. Laravel is opinionated in the right places. You get auth, queues, an ORM, scheduling, and a sane directory structure on day one, so you stop arguing with the tooling and start shipping features. The AI part is what made the bet pay off harder than I expected. Because Laravel's conventions are so consistent, the agents we use write noticeably better code in our Laravel apps than in a from-scratch Node service where every team invented its own layout. Same file, same place, every time. So the output is idiomatic instead of improvised, and it holds up across runs. Here is the difference in the terms that actually matter when an agent is writing your code:

What the coding agent faces | Convention stack (Laravel, Rails) | Bespoke stack (hand-rolled Node)
--- | --- | ---
Where a new controller goes | Same path in every project on earth | Wherever this team decided, if anyone did
Style of the generated code | Matches the public examples it trained on | Matches your house pattern, if one exists
Two runs of the same prompt | Mostly consistent | Vary run to run
Context it must rebuild per repo | Almost none, the structure is the standard | Most of it, the layout is private
How a new engineer (or agent) reads it | Like every other project | Like a new language

None of this needs the framework to be technically better on every axis. It needs the framework to make the same decision every time, so neither your new hire nor your AI has to wonder.

PHP Got Written Off Years Ago. It Is Worth a Second Look.

I know the objection, because the rewrite pitch always carries it: PHP is slow, untyped, stuck in 2010.
If your last serious PHP experience was a PHP 5.6 codebase, that picture is more than a decade out of date. PHP 8 added a JIT compiler and a real type system. Union types, readonly properties, enums, the match expression, and Fibers for async are all standard now:

PHP
// PHP 5.6
function process($value) {
    if (is_int($value) || is_float($value)) {
        return calculate($value);
    }
}

// PHP 8.x
function process(int|float $value): float {
    return calculate($value);
}

The performance cliche is just as stale. When Tumblr moved its fleet from PHP 5 to PHP 7, the engineering team documented latency dropping by half and CPU load falling at least 50 percent, and PHP 8 kept climbing from there. This is not a dead language. By W3Techs' numbers, it still runs roughly three-quarters of the websites with a known server-side language, and it powers production at the scale of Etsy and Slack. There are good, boring reasons companies still run on PHP. It is unfashionable on Hacker News, which is a very different thing from being dead.

The Rewrite Reflex Gets It Backward

So why does the rewrite argument keep coming up? Usually it is what I call resume-driven development. The stated reason is "PHP is outdated." The real reason is that an engineer wants the trendy stack on their resume for the next interview. That is rational for the individual and a disaster for the roadmap. I say that as someone who has approved the rewrite and regretted it! Every team I have watched hit this fork landed the same way. The ones that worked said no to the rewrite, modernized the stack they had, and kept shipping customer value. The ones that did not work approved it, spent the better part of two years rebuilding what already worked, shipped nothing new in the meantime, and watched competitors eat their lunch. The AI era adds a line to that math the rewrite crowd never accounts for.
When you tear down a legible, convention-driven Laravel app and rebuild it as a bespoke service in a flexible stack, you are not just paying the old rewrite tax. You are actively making your codebase harder for the AI tooling you are betting your future speed on. You are trading a structure the model understands for one it has to relearn. You are spending two years to make your own agents worse at their job. That is the opposite of modernization. What You Should Actually Do You do not have to adopt PHP to use any of this. The principle is about convention, not about a language. For greenfield work, bias toward an opinionated framework. Laravel, Rails, and the convention-heavy frameworks in any language give an agent a predictable surface to generate against. The "we will assemble our own stack" instinct feels powerful and quietly costs you AI quality.Modernize the app you have instead of rewriting it. If you are on an old PHP or Laravel version, upgrade it and adopt the conventions fully. You will get more out of your agents from a current, consistent codebase than from a brand-new language, at a fraction of the cost and risk.If you are stuck in a flexible stack, impose convention anyway. Pick a canonical layout, document it, lint for it, and keep it identical across services. The agent cannot read your mind, but it will follow a structure you actually enforce. Most of the AI-quality gap closes the moment the layout stops being a per-team decision.Stop treating "boring" as an insult. Boring means predictable. Predictable means staffable, and now it means legible to a machine too. In an AI shop, that is the competitive choice, not the compromise. The Bottom Line For fifteen years, the knock on Laravel was that it makes your decisions for you. That was always a strange thing to complain about. Now it is the entire advantage, and the agents are the ones cashing it in.
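The advice above to pick a canonical layout and lint for it can be sketched as a tiny path check. The rule table and the `check_layout` helper below are hypothetical, and the sketch is written in Python only for brevity; the Laravel-style directories are the ones named earlier in this article, and a real project would more likely wire a rule like this into CI or an existing linter.

```python
from pathlib import PurePosixPath

# Hypothetical convention: class-name suffix -> the one directory tree it may live in.
LAYOUT_RULES = {
    "Controller": "app/Http/Controllers",
    "Policy": "app/Policies",
}


def check_layout(paths):
    """Return one message per file whose location breaks the layout convention."""
    violations = []
    for raw in paths:
        path = PurePosixPath(raw)
        for suffix, expected_dir in LAYOUT_RULES.items():
            if not path.stem.endswith(suffix):
                continue
            expected = PurePosixPath(expected_dir)
            # Allow the expected directory itself or any subdirectory of it.
            if path.parent != expected and expected not in path.parent.parents:
                violations.append(f"{raw}: expected under {expected_dir}/")
    return violations


# Sample file list (hypothetical project).
files = [
    "app/Http/Controllers/UserController.php",
    "app/Http/Controllers/Admin/ReportController.php",
    "app/Services/OrderController.php",
    "app/Policies/PostPolicy.php",
]

problems = check_layout(files)
for line in problems:
    print(line)
```

The check is deliberately dumb. The point is that once "where does this go" has exactly one enforced answer, neither a new engineer nor an agent has to infer it.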

By Matt Watson

Real-Time Face Recognition Using OpenCV, Dlib, and Python

Face recognition has become one of the most widely used applications of artificial intelligence and computer vision. From smartphone authentication and smart surveillance systems to attendance management and access control solutions, facial recognition technology plays an important role in identifying individuals automatically. Advances in machine learning and image processing have made it possible to develop accurate face recognition systems using open-source tools and libraries. This project demonstrates the implementation of a real-time face recognition application using Python, OpenCV, Dlib, and the Face Recognition library. The application captures video input, detects human faces, generates facial feature encodings, and compares them against a database of known individuals. Once a match is identified, the system displays the corresponding name on the video frame. The project serves as an excellent example of practical computer vision implementation and provides a foundation for developing more advanced AI-based recognition systems.

Environment Setup

Before executing the application, it is important to prepare the development environment properly. Since the Face Recognition library depends heavily on Dlib, several prerequisites must be installed. The first requirement is Python. Any recent Python version can be used, and users who already have Anaconda installed can use the existing Python environment. The next step involves installing CMake. CMake is required to build and compile Dlib successfully. After downloading the latest stable Windows installer, the installation process should include adding CMake to the system PATH. Once installation is completed, the system should be restarted to ensure all environment variables are updated correctly. Another important requirement is Microsoft Visual Studio Build Tools. During installation, the "Desktop Development with C++" workload must be selected.
These build tools provide the necessary compiler and development libraries required by Dlib. After installation, a system restart is recommended.

Once the system prerequisites are installed, the Python libraries can be installed using the following commands:

Shell

    pip install cmake
    pip install opencv-python
    pip install dlib
    pip install face-recognition

In some cases, Dlib installation may fail because of incomplete dependencies. Running the installation through an Anaconda Prompt with administrator privileges often resolves such issues. After installation, Dlib can be verified by importing it into Python and printing the installed version.

The complete application is listed here; the sections that follow explain each stage.

Python

    # import the required libraries
    import cv2
    import face_recognition

    # open the video source (a file here; pass 0 instead to use the default webcam)
    webcam_video_stream = cv2.VideoCapture('images/testing/image1.mp4')

    # load the sample images and compute a 128-value face encoding for each
    # ([0] takes the first face found and raises IndexError if no face is detected)
    image1_image = face_recognition.load_image_file('images/samples/image1.jpg')
    image1_face_encodings = face_recognition.face_encodings(image1_image)[0]

    image2_image = face_recognition.load_image_file('images/samples/image2.jpg')
    image2_face_encodings = face_recognition.face_encodings(image2_image)[0]

    sen_image = face_recognition.load_image_file('images/samples/sen.jpg')
    sen_face_encodings = face_recognition.face_encodings(sen_image)[0]

    # save the encodings and the corresponding labels in separate lists in the same order
    known_face_encodings = [image1_face_encodings, image2_face_encodings, sen_face_encodings]
    known_face_names = ["Person1", "Person2", "Person3"]

    # initialize the lists that hold all face locations, encodings and names
    all_face_locations = []
    all_face_encodings = []
    all_face_names = []

    # loop through every frame in the video
    while True:
        # get the current frame from the video stream as an image
        ret, current_frame = webcam_video_stream.read()
        # stop when no frame is returned (end of the file or a camera error)
        if not ret:
            break
        # resize the current frame to 1/4 size to process faster
        current_frame_small = cv2.resize(current_frame, (0, 0), fx=0.25, fy=0.25)
        # OpenCV delivers frames in BGR order; face_recognition expects RGB
        rgb_frame_small = cv2.cvtColor(current_frame_small, cv2.COLOR_BGR2RGB)
        # detect all faces in the image
        # arguments are image, number_of_times_to_upsample, model
        all_face_locations = face_recognition.face_locations(rgb_frame_small, number_of_times_to_upsample=1, model='hog')
        # compute face encodings for all the faces detected
        all_face_encodings = face_recognition.face_encodings(rgb_frame_small, all_face_locations)

        # loop through the face locations and the face encodings
        for current_face_location, current_face_encoding in zip(all_face_locations, all_face_encodings):
            # split the tuple to get the four position values of the current face
            top_pos, right_pos, bottom_pos, left_pos = current_face_location
            # scale the positions back up to the full-size video frame
            top_pos = top_pos * 4
            right_pos = right_pos * 4
            bottom_pos = bottom_pos * 4
            left_pos = left_pos * 4
            # compare the current face with every known face and get the list of matches
            all_matches = face_recognition.compare_faces(known_face_encodings, current_face_encoding)
            # string to hold the label
            name_of_person = 'Unknown face'
            # if at least one known face matches, take the index of the first match
            # and use it to look up the corresponding name
            if True in all_matches:
                first_match_index = all_matches.index(True)
                name_of_person = known_face_names[first_match_index]
            # draw a rectangle around the face
            cv2.rectangle(current_frame, (left_pos, top_pos), (right_pos, bottom_pos), (255, 0, 0), 2)
            # display the name as text in the image
            font = cv2.FONT_HERSHEY_DUPLEX
            cv2.putText(current_frame, name_of_person, (left_pos, bottom_pos), font, 0.5, (255, 255, 255), 1)

        # display the video
        cv2.imshow("Webcam Video", current_frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # release the video stream and close all OpenCV windows
    webcam_video_stream.release()
    cv2.destroyAllWindows()

Project Overview

The objective of this project is to identify known individuals appearing in a video stream. The system uses previously stored facial images as references.
Each reference image is converted into a mathematical representation known as a facial embedding. When a video frame is processed, faces are detected and converted into similar embeddings. These embeddings are compared with stored embeddings, and if a match is found, the person's name is displayed on the screen. The workflow consists of the following stages:

1. Load reference images
2. Generate face encodings
3. Capture video frames
4. Detect faces in each frame
5. Generate facial embeddings
6. Compare embeddings
7. Display recognition results

Loading Sample Images

The application begins by loading images of known individuals. These images act as training references for the recognition process. The Face Recognition library provides a convenient function called load_image_file() which reads image files and converts them into arrays suitable for processing. For each image, the system generates a 128-dimensional face encoding. These encodings capture unique facial characteristics and serve as the identity signature of an individual. The generated encodings are stored in an array together with corresponding labels. Maintaining the same order between encodings and names ensures accurate identification later during matching.

Video Stream Processing

After loading the known faces, the application opens a video source. The video can be captured from a webcam or from a video file. Each frame is processed continuously within a loop. Since facial recognition is computationally intensive, the frame is resized to one-quarter of its original size. This optimization significantly improves processing speed while maintaining sufficient accuracy. Reducing image size helps the system achieve smoother real-time performance, especially on systems without dedicated graphics hardware.

Face Detection

Once a frame is resized, the system searches for human faces within the image. The Face Recognition library internally uses Dlib's face detector to locate faces.
The code specifies the HOG (Histogram of Oriented Gradients) model, which is known for its balance between accuracy and performance. The detector returns coordinates representing the location of each detected face. These coordinates include the top, right, bottom, and left boundaries of the face. Because detection is performed on a reduced-size image, the coordinates are multiplied by four to map them back to the original frame dimensions.

Face Encoding Generation

After face locations are identified, the application generates face encodings for each detected face. A face encoding is a numerical representation consisting of 128 values. These values describe the unique characteristics of a person's face and allow efficient comparison between different faces. The encoding process is one of the most important stages of the recognition pipeline because it transforms visual information into mathematical data suitable for machine learning comparison.

Face Matching

The generated encoding from the current frame is compared against the stored encodings of known individuals. The compare_faces() function performs this comparison and returns a list of Boolean values, one per stored encoding, indicating whether that encoding matches the current face. If a match exists, the application retrieves the index of the first matching encoding and uses that index to obtain the corresponding person's name. If no match is found, the face is labeled as "Unknown face." This approach provides a simple yet effective mechanism for identifying individuals in real time.

Displaying Recognition Results

Once a face is identified, the application visually highlights the result. A rectangle is drawn around the detected face using OpenCV. The recognized person's name is displayed near the face boundary using text rendering functions. These visual annotations provide immediate feedback and allow users to observe recognition results directly within the video stream.
The video continues processing until the user presses the "Q" key to terminate execution.

Applications

The concepts demonstrated in this project can be applied to numerous real-world scenarios. Common applications include:

- Employee attendance systems
- Smart access control
- Security monitoring
- Visitor identification
- Automated authentication
- Educational research projects
- AI-powered surveillance systems

Organizations can integrate similar systems into existing infrastructure to improve security and operational efficiency.

Future Enhancements

Although the current implementation performs effectively, several improvements can be introduced. Future enhancements may include:

- GPU acceleration for faster processing
- Deep learning-based face detection models
- Multi-camera support
- Cloud-based facial databases
- Emotion detection
- Face mask recognition
- Attendance report generation
- Integration with mobile applications

These improvements would increase scalability and accuracy while enabling deployment in larger environments.

Conclusion

This project demonstrates the practical implementation of real-time face recognition using Python, OpenCV, Dlib, and the Face Recognition library. By combining face detection, feature extraction, and facial matching techniques, the system successfully identifies known individuals appearing in a video stream. The installation process, while requiring several dependencies such as CMake and Visual Studio Build Tools, provides a stable environment for Dlib and facial recognition functionality. The project highlights how modern computer vision libraries can be used to build intelligent recognition systems with relatively simple code. It serves as an excellent learning platform for students, researchers, and software developers interested in artificial intelligence, machine learning, and image processing technologies.
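As a supplement to the Face Matching stage, the sketch below shows in plain Python what a distance-based comparison between encodings looks like. It is a minimal sketch under stated assumptions, not the library's implementation: the 0.6 threshold mirrors the default tolerance of compare_faces(), the helper names are hypothetical, and the vectors in the usage note have 3 values instead of 128 to stay readable. Unlike the article's code, which takes the first match, this variant picks the closest known face.

```python
import math

# Assumption: 0.6 mirrors the default tolerance of face_recognition.compare_faces().
DEFAULT_TOLERANCE = 0.6


def euclidean_distance(a, b):
    """Straight-line distance between two encodings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(known_encodings, known_names, candidate, tolerance=DEFAULT_TOLERANCE):
    """Return the name of the closest known encoding, or 'Unknown face'."""
    if not known_encodings:
        return "Unknown face"
    distances = [euclidean_distance(known, candidate) for known in known_encodings]
    # Index of the smallest distance, i.e. the most similar known face.
    best_index = min(range(len(distances)), key=distances.__getitem__)
    if distances[best_index] <= tolerance:
        return known_names[best_index]
    return "Unknown face"
```

With toy encodings [0, 0, 0] for "Person1" and [1, 0, 0] for "Person2", the candidate [0.55, 0, 0] is within tolerance of both, so a first-match rule would report "Person1" while best_match() reports the closer "Person2". In the real application, the library's face_distance() function returns the same kind of distances for 128-value encodings.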

By venkataramaiah gude

A Step-by-Step Guide to Implementing Columnar Tables in SQL Server

Columnstore indexes were introduced in SQL Server 2012 and have been updatable, in the form of clustered columnstore indexes, since SQL Server 2014. Columnar storage is specifically designed for data warehousing and analytical workloads, where large amounts of data need to be scanned, aggregated, or analyzed efficiently. It stores data in a column-wise format rather than the traditional row-wise format, offering significant performance benefits for read-heavy operations such as reporting and analytics.

Key Benefits of Columnar Storage

- Faster read performance: Optimized for analytics where only a few columns are needed in a query.
- Compression: Since column data is homogeneous, it achieves high compression rates, saving storage space.
- Improved query performance: Aggregating or scanning specific columns is much faster in a columnar format, especially with large datasets.

Setting Up Columnar Tables in SQL Server

SQL Server implements columnar storage through the Columnstore Index. The Columnstore Index is a special kind of index used in large data tables where the data is stored in columns rather than rows. The clustered columnstore index (CCI) is the preferred method when creating columnar tables.

Step 1: Create a Sample Table

Let's start by creating a table with a large number of rows, which we will populate with random data to demonstrate the difference between row-store and column-store formats.

SQL

    -- Creating a traditional Rowstore Table
    CREATE TABLE SalesData_RowStore (
        SalesOrderID INT,
        ProductID INT,
        Quantity INT,
        SalesAmount DECIMAL(18, 2),
        OrderDate DATE
    );

Step 2: Insert Data Into Rowstore Table

For the sake of performance demonstration, we will generate a large set of random data.
MS SQL

    -- Generate a large set of random data for Rowstore Table
    DECLARE @Counter INT = 0;
    WHILE @Counter < 1000000
    BEGIN
        INSERT INTO SalesData_RowStore (SalesOrderID, ProductID, Quantity, SalesAmount, OrderDate)
        VALUES (FLOOR(RAND() * 1000) + 1,
                FLOOR(RAND() * 100) + 1,
                FLOOR(RAND() * 100) + 1,
                FLOOR(RAND() * 500) + 1,
                DATEADD(DAY, FLOOR(RAND() * 365) + 1, GETDATE()));
        SET @Counter = @Counter + 1;
    END

Implementing Columnstore Index (Columnar Table)

Step 1: Create a Columnstore Table

Now, let's create a table with a clustered columnstore index (CCI). This index allows SQL Server to store the data in a columnar format.

MS SQL

    -- Creating a Columnstore Table with Clustered Columnstore Index
    CREATE TABLE SalesData_ColumnStore (
        SalesOrderID INT,
        ProductID INT,
        Quantity INT,
        SalesAmount DECIMAL(18, 2),
        OrderDate DATE
    );

MS SQL

    CREATE CLUSTERED COLUMNSTORE INDEX CCI_SalesData ON SalesData_ColumnStore;

Step 2: Insert the Same Data Into the Columnstore Table

Copy the rows from the rowstore table so that both tables hold identical data and the comparison is fair. Rerunning the random-data loop would produce a different dataset, and single-row inserts accumulate in the row-based delta store, whereas one large INSERT ... SELECT is treated as a bulk load and compressed into columnstore rowgroups directly.

SQL

    -- Insert the same data into the Columnstore Table
    INSERT INTO SalesData_ColumnStore (SalesOrderID, ProductID, Quantity, SalesAmount, OrderDate)
    SELECT SalesOrderID, ProductID, Quantity, SalesAmount, OrderDate
    FROM SalesData_RowStore;

Query Performance Without Columnar Index

Let's execute a typical query that aggregates sales by ProductID and filters on OrderDate. This will involve scanning through a large amount of data in the rowstore table.

MS SQL

    -- Query on Rowstore Table
    SELECT ProductID, SUM(SalesAmount) AS TotalSales
    FROM SalesData_RowStore
    WHERE OrderDate > '2023-01-01'
    GROUP BY ProductID;

Expected Outcome

The query will scan all the rows in the table. Rowstore tables are not optimized for this type of query, and the performance might degrade with large datasets due to the need to read each row.
Query Performance With Columnar Index

Let's run the same query on the columnar table using a Clustered Columnstore Index.

MS SQL

    -- Query on Columnstore Table
    SELECT ProductID, SUM(SalesAmount) AS TotalSales
    FROM SalesData_ColumnStore
    WHERE OrderDate > '2023-01-01'
    GROUP BY ProductID;

Expected Outcome

The columnar index stores the data by columns, and SQL Server can read only the columns the query references (i.e., ProductID, SalesAmount, and OrderDate). Columnstore indexes are highly optimized for these types of queries, resulting in much faster query execution time.

Comparing the Performance of Both Scenarios

To compare the performance of the two scenarios, we will execute both queries and compare their I/O and timing statistics.

Step 1: Query Statistics Without Columnstore

You can use the following statements to collect I/O and timing statistics for the rowstore table.

MS SQL

    -- Display I/O and timing statistics for the Rowstore Table
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT ProductID, SUM(SalesAmount) AS TotalSales
    FROM SalesData_RowStore
    WHERE OrderDate > '2023-01-01'
    GROUP BY ProductID;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;

This will provide information on:

- Logical reads: The number of data pages read from the buffer cache (physical reads are the pages that had to be fetched from disk)
- CPU time: How much CPU time was consumed
- Elapsed time: The total time taken to execute the query

Step 2: Query Statistics With Columnstore

Now, execute the same for the columnstore table.

MS SQL

    -- Display I/O and timing statistics for the Columnstore Table
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT ProductID, SUM(SalesAmount) AS TotalSales
    FROM SalesData_ColumnStore
    WHERE OrderDate > '2023-01-01'
    GROUP BY ProductID;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;

In the statistics output for the columnstore table, SQL Server will typically show fewer reads and significantly lower CPU time, as it only scans the necessary columns.
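To make the "only scans the necessary columns" point concrete, here is a toy model in Python. It is only a sketch of the idea, not of SQL Server internals: the three sample rows are made up, and to_columns() and total_sales_by_product() are hypothetical helper names. The table is pivoted into one list per column, and the aggregate then touches three of the five columns.

```python
# Made-up sample rows with the same five fields as the article's tables.
rows = [
    {"SalesOrderID": 1, "ProductID": 10, "Quantity": 2, "SalesAmount": 50.0, "OrderDate": "2024-01-05"},
    {"SalesOrderID": 2, "ProductID": 20, "Quantity": 1, "SalesAmount": 30.0, "OrderDate": "2022-12-31"},
    {"SalesOrderID": 3, "ProductID": 10, "Quantity": 4, "SalesAmount": 25.0, "OrderDate": "2023-06-01"},
]


def to_columns(row_dicts):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {name: [] for name in row_dicts[0]}
    for row in row_dicts:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def total_sales_by_product(columns, after_date):
    """SUM(SalesAmount) per ProductID for OrderDate > after_date.

    Only the ProductID, SalesAmount, and OrderDate lists are read;
    SalesOrderID and Quantity are never touched.
    """
    totals = {}
    needed = zip(columns["ProductID"], columns["SalesAmount"], columns["OrderDate"])
    for product_id, amount, order_date in needed:
        # ISO-formatted date strings compare correctly as plain strings.
        if order_date > after_date:
            totals[product_id] = totals.get(product_id, 0.0) + amount
    return totals
```

Calling total_sales_by_product(to_columns(rows), "2023-01-01") reads 9 of the 15 stored values. A row-oriented scan would have to pull all 15 through memory to answer the same question, and the gap widens as tables gain columns.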
Performance Improvements in Columnar Tables

Scenario 1: Data Compression

Columnar storage achieves higher compression rates because data is stored in homogeneous chunks, which makes it more efficient in terms of storage. Compression reduces disk I/O during query execution.

Scenario 2: Selective Column Scanning

When querying only a few columns, columnar storage avoids scanning the entire row. In contrast, rowstore requires reading all columns in every row, even if only a subset is required for the query.

Conclusion

In this example, we demonstrated how implementing columnstore indexes in SQL Server can significantly improve query performance, especially for analytics and aggregation queries on large datasets. The comparison showed that columnar storage excels in reducing query times by optimizing disk I/O, leveraging data compression, and selectively reading only the necessary columns. As a result, columnstore indexing is a great choice for data warehousing or any scenario where read performance for large datasets is critical.
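The compression benefit described in Scenario 1 can be illustrated with run-length encoding, one simple technique that works well on homogeneous column data. The Python sketch below is an illustration of the general idea only; it does not reproduce SQL Server's actual columnstore compression, and the function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]
```

A ProductID column such as [10, 10, 10, 10, 20, 20, 30, 30, 30] encodes to three pairs instead of nine values. The same values interleaved with other fields in a row layout contain no such runs, which is one reason column-wise storage compresses better.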

By arvind toorpu

CORE

Designing Tool-Calling AI Agents That Survive Production: A LangGraph Approach

Most agent demos work beautifully on stage and fall apart the first week in production. The reason is almost always the same: the demo treats tool-calling as a happy path, and production is nothing but edge cases. A tool times out. A model hallucinates an argument. The agent loops on itself and burns through your token budget. After shipping a few of these systems, I have learned that the durable design question is not "can the agent call a tool" but "what happens when the tool call goes wrong." This tutorial walks through a tool-calling agent in LangGraph built the way I would build it for production, with the safeguards baked in from the first commit rather than bolted on after the first incident.

What We Are Building

To keep the focus on production patterns rather than business logic, we will build a small but realistic agent: a currency assistant. A user asks a plain-language question like "What is the USD to INR rate?" and the agent answers using live foreign-exchange data rather than guessing. The model itself has no idea what today's rate is, so it must recognize that it needs data, call a get_exchange_rate tool to fetch it, and then return the actual result. That is the entire reason tool-calling exists: it turns a model that can only talk into an agent that can act and ground its answers in real data. FX rates are a good teaching example because the failure modes are obvious and unforgiving. A wrong currency code, an unsupported pair, or a flaky data source are exactly the kinds of things that must not crash a production agent.

The Mental Model

A tool-calling agent is a loop. The model looks at the conversation and decides whether to answer directly or to call a tool. If it calls a tool, your code runs that tool, feeds the result back, and the model decides again. That loop is exactly what LangGraph is good at expressing: nodes do work, edges decide where control flows next.
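Stripped of any framework, the loop described above fits in a few lines of plain Python. This is a minimal sketch and not LangGraph code: run_agent_loop(), fake_decide(), the action dictionaries, and the max_steps cap are all assumptions made for illustration, and the rate string is a made-up value.

```python
def run_agent_loop(decide, tools, question, max_steps=5):
    """Alternate between a decision function and tools until an answer or the cap."""
    messages = [("user", question)]
    for _ in range(max_steps):
        action = decide(messages)
        if action["type"] == "answer":
            return action["content"]
        # The decision was a tool call: run it and feed the result back.
        result = tools[action["tool"]](**action["args"])
        messages.append(("tool", result))
    # Fail loudly instead of looping forever.
    raise RuntimeError(f"agent exceeded {max_steps} steps without answering")


def fake_decide(messages):
    """Hypothetical stand-in for a model: call the tool once, then answer."""
    role, content = messages[-1]
    if role == "tool":
        return {"type": "answer", "content": f"The rate is {content}"}
    return {"type": "tool", "tool": "get_rate", "args": {"pair": "USD/INR"}}


# Made-up value for illustration; a real tool would fetch live data.
tools = {"get_rate": lambda pair: "83.0 (made-up value)"}
```

Calling run_agent_loop(fake_decide, tools, "What is the USD to INR rate?") takes one trip through the tool and then returns the answer. A decision function that never stops requesting tools hits the step cap and raises, which is the runaway behavior a production agent has to bound.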
Step 1: A Tool That Cannot Crash Your Agent

Our agent's one capability is looking up an exchange rate, so the tool that does it has to be bulletproof. The single most important production habit is that a tool never raises into the agent loop. It validates its input and returns a readable error string that the model can reason about. A raised exception kills the run; a returned error lets the agent recover. Here, the tool checks that both currency codes are valid and that the requested pair exists, returning a clear message when either check fails. Notice the failure modes are first-class outputs, not afterthoughts. Bad arguments and missing data both produce a controlled message.

Step 2: The Nodes

The agent node asks the model what to do. The tool node executes any requested tools, and critically, it wraps every call so a failure becomes a message instead of a stack trace. It also rejects calls to tools that do not exist, which is how you contain a hallucinated tool name.

Step 3: Wiring the Loop, With a Brake

The conditional edge sends control to the tools node only when the model actually requests a tool; otherwise, the run ends. This is the whole agent loop in four lines. The brake that separates a demo from a production system is one setting at invocation time: recursion_limit, which bounds how many times the loop can cycle. Without it, a confused model can call tools indefinitely. With it, a runaway agent fails fast and loudly instead of quietly draining your budget. Treat it as a required parameter, not an optional one.

Step 4: Observability You Will Thank Yourself For

When an agent misbehaves in production, you need to see its decisions, not guess at them. A few log lines at each node turn an opaque black box into a traceable sequence. Running the agent against "What is the USD to INR rate?" produces a trace in which every step is visible: the agent decides to call a tool, the tool returns a validated result, control flows back, and the model produces its final answer.
When something breaks at 2 a.m., this trace is the difference between a five-minute fix and a five-hour investigation.

What Makes It Survive

The example is small, but the principles scale. Tools return errors instead of raising. Unknown tool names are rejected rather than executed. The loop is bounded, so it cannot run away. Every decision is logged, so failures are traceable instead of mysterious when you are debugging at scale. Swap the prototype model for ChatAnthropic or ChatOpenAI, add your real tools, and the same skeleton carries you from prototype to production without rewriting the core. The hard part of agent engineering was never getting the model to call a tool. It is designing for the moment the call goes wrong, and LangGraph gives you exactly the right place to put each safeguard.
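The first two principles above, tools that return errors and a dispatcher that rejects unknown tool names, can be sketched without any framework. The following is a minimal sketch under stated assumptions and not the article's original listing: the RATES table holds made-up values in place of a live FX source, and run_tool_call() is a hypothetical helper standing in for the tool node.

```python
import re

# Made-up rates for illustration only; a real tool would query a live FX source.
RATES = {("USD", "INR"): 83.0, ("EUR", "USD"): 1.1}
CODE_PATTERN = re.compile(r"[A-Z]{3}")


def get_exchange_rate(base, quote):
    """Return the rate as text, or a readable error string; never raise."""
    base = str(base).strip().upper()
    quote = str(quote).strip().upper()
    for code in (base, quote):
        if not CODE_PATTERN.fullmatch(code):
            return f"Error: '{code}' is not a valid 3-letter currency code."
    rate = RATES.get((base, quote))
    if rate is None:
        return f"Error: no rate available for {base}/{quote}."
    return f"1 {base} = {rate} {quote}"


TOOLS = {"get_exchange_rate": get_exchange_rate}


def run_tool_call(name, args):
    """Execute one requested tool call, turning every failure into a message."""
    if name not in TOOLS:
        # Contains a hallucinated tool name instead of crashing on it.
        return f"Error: unknown tool '{name}'. Available tools: {sorted(TOOLS)}"
    try:
        return TOOLS[name](**args)
    except Exception as exc:  # wrong argument names, unexpected tool bugs
        return f"Error: {name} failed: {exc}"
```

Every path returns a string: a bad currency code, an unsupported pair, an invented tool name, and even wrongly named arguments all come back as messages the model can read and react to, so nothing propagates into the agent loop as an exception.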

By Shubham Gupta

The Latest Coding Topics