DZone
DZone Spotlight

Death by a Thousand YAMLs: Surviving Kubernetes Tool Sprawl


By Yitaek Hwang
Editor's Note: The following is an article written for and published in DZone's 2025 Trend Report, Kubernetes in the Enterprise: Optimizing the Scale, Speed, and Intelligence of Cloud Operations.

Kubernetes is eating the world. More than 10 years after Google open-sourced its container orchestration technology, Kubernetes is now everywhere. What started as a tool primarily for managing containers in the cloud has since bled into every facet of infrastructure. We now see companies using Kubernetes to manage not just their containerized applications but also virtual machines, databases, edge deployments, and IoT devices.

The numbers are staggering. According to the State of Production Kubernetes 2025 report, over a third of organizations run more than 50 clusters in production. Those clusters aren't small either: More than half run 1,000+ nodes, and one in 10 runs 10,000+ nodes. In addition, organizations are now running clusters in more than five different clouds (e.g., AWS, Azure, GCP) and other environments (e.g., on-prem, edge, air-gapped, GPU clouds).

But that growth has come with a serious operational burden. Running a production-ready Kubernetes cluster is not simple. More than 40% of companies say they have more than 20 software elements in their Kubernetes stack. When you consider that ingress, storage, secrets management, monitoring, and GPU operators are all "add-ons," it's not hard to see why that number is so high. The result? Teams are drowning in YAML files, each managing yet another tool to keep the Kubernetes cluster humming. Developers are increasingly confused about how to deal with all these files, security teams are growing worried about new attack vectors, and DevOps teams are struggling to rein in the chaos.

The pace of innovation in the Kubernetes space has driven tremendous growth, but it has also brought meaningful challenges to teams dealing with Kubernetes sprawl. Let's break down what this looks like in practice, the pain points it creates, and some emerging solutions to help us survive.

Anatomy of Kubernetes Sprawl

When we talk about "Kubernetes sprawl," we often point to two related but distinct issues: cluster sprawl and tool sprawl.

Cluster Sprawl

When Kubernetes was first released, multi-tenancy was not natively baked in. Namespaces offered soft isolation, but the surrounding tooling was too immature to provide a true multi-tenant solution. So, at least initially, multiple clusters were created out of necessity.

Ten years later, the story is a bit more complicated. Cluster sprawl is now more a function of organic growth. The obvious driver is environment separation: Whether to limit the blast radius or to mirror organizational structure, it's common to see at least prod and non-prod separation. But if you look deeper, we now see Kubernetes clusters both locally and in CI that may run a different distribution or topology than either the prod or non-prod environments. Then we have multi-region and even multi-cloud clusters for resiliency or business-driven reasons (e.g., cost, compliance mandates). Finally, as Kubernetes bleeds into managing VMs, edge, and even other cloud workloads, separate clusters are spun up to keep separation of concerns or simply for convenience.

Tool Sprawl

Another side effect of cluster sprawl is tool sprawl. Just take a look at the CNCF landscape grid.
In order to manage and operate a production-ready Kubernetes cluster, we need related tooling for:

- Networking
- Storage
- CI/CD
- Observability
- Security
- Cost management

Even though some managed Kubernetes providers now package these as add-ons or extensions, teams still have to make choices about ingress controllers, service meshes, and secrets management, just to name a few. There are hundreds of tools with overlapping capabilities. And while each tool solves a problem, together they create confusion and duplication of work.

Pain Points From Sprawl

Kubernetes sprawl is not simply a nuisance. It creates a serious operational burden and real business risk.

Toil and Complexity

Starting with the most obvious: Sprawl causes immense toil for the teams maintaining all of those clusters and tools. Even with some automation in place, it takes time and effort to keep up with the speed of innovation in this space. Toil ranges from simply upgrading Kubernetes versions to making sure that said upgrade does not break the numerous tools that run on top of Kubernetes, not to mention the applications it supports.

It also places a huge mental burden on the developers interacting with the platform. Unless you are living and breathing Kubernetes every day, it's easy to become lost in YAML hell and lose track of how exactly code goes from a push to main to a deployment in Kubernetes with all the bells and whistles in place. Sure, some of it may be black-box magic, but when things break, how much visibility do developers have to self-service the issues themselves?

Security and Observability Gaps

Fragmented tooling and cluster sprawl lead to observability blind spots and, worse, security gaps stemming from inconsistent RBAC, uneven enforcement of policies, and a patchwork of agents and controllers scanning and alerting on security vulnerabilities. While there are efforts like OpenTelemetry and the CNCF security projects to standardize how we tackle observability and security, most teams currently struggle with a plethora of tools that each address only one of these concerns.

Cost Management

Finally, we have growing cost concerns. Over 42% of the State of Production Kubernetes 2025 report respondents cited cost as their top challenge, and 88% said their Kubernetes total cost of ownership had increased in the past year. As the number of clusters and tools grows, cost can easily balloon. It's not simply about build vs. buy either: Given the complexity of the ecosystem and growing sprawl, there isn't a single managed Kubernetes option that simply outweighs the cost of operational management. Everything becomes a tradeoff, with the rate of innovation outpacing cost control.

Emerging Solutions

While the pain from tool sprawl persists, the Kubernetes community has made some progress in combating these issues.

Platform Engineering

Platform engineering has arguably been the hottest keyword in the DevOps space in recent years. New teams focused on developing tools and workflows that unlock self-service capabilities in the cloud-native world are standardizing and defining "paved paths" to reduce drift across clusters and environments. Platform engineering teams aim to curb Kubernetes sprawl by publishing:

- Reusable pipelines
- Centralized observability
- Security guardrails

We can see this growth in the 2025 report: Over 80% of organizations say that they have a mature platform engineering team, and 90% provide an internal developer platform (IDP) to enable self-service capabilities.
But not all "platforms" are built the same, and usage doesn't always equal effectiveness.

AI-Driven Solutions

Another promising path involves using AI to drive operational efficiency. The vision here is a "Kubernetes copilot" that tunes resources, troubleshoots faster, or generates YAML manifests from natural-language prompts. The advantage of a platform engineering discipline is that large language models (LLMs) can potentially index and produce these materials faster than resource-constrained teams manually curating internal solutions. While skepticism remains, early experiments with Kubernetes MCPs and LLM-powered resource generators are helping reduce the cognitive load of sprawl.

Conclusion

In 2025, there's no doubt that Kubernetes is the mature container orchestration technology of choice for many. It is powering tens of thousands of nodes across environments, regions, clouds, and the edge. But its meteoric rise in popularity has come at a cost: too many clusters, too many tools, and too much complexity. Kubernetes sprawl is impacting teams in more ways than one.

There's no silver bullet, but several solutions are emerging. We now have a clearer picture of what platform engineering means as a discipline, along with mature IDPs and policy frameworks to enforce consistency. And whatever your thoughts on the current AI hype, it is undoubtedly affecting this ecosystem, from workload optimization and bootstrapping various YAML files to exposing more resources via natural language.

At the end of the day, sprawl is inevitable given Kubernetes' growth. But innovation will continue, bringing new tools and paradigms to conquer the chaos, just as Kubernetes did for the container ecosystem.

This is an excerpt from DZone's 2025 Trend Report, Kubernetes in the Enterprise: Optimizing the Scale, Speed, and Intelligence of Cloud Operations.
Top 7 Mistakes When Testing JavaFX Applications


By Catherine Edelveis
JavaFX is a versatile tool for creating rich, enterprise-grade GUI applications. Testing these applications is an integral part of the development lifecycle. However, Internet sources are scarce when it comes to best practices and guidelines for testing JavaFX apps. As a result, developers must rely on commercial JavaFX testing services or write their test suites through trial and error. This article summarizes the seven most common mistakes programmers make when testing JavaFX applications and ways to avoid them.

Scope and Baseline

Two projects were used to demonstrate JavaFX testing capabilities: RaffleFX and SolfeggioFX. The latter uses Spring Boot in addition to JavaFX. Note that these projects don't declare JavaFX dependencies because they are built on the open source Liberica JDK (version 21) with integrated JavaFX support.

TestFX was used as the testing framework: it is actively developed, open source, and offers a wide variety of features. FxRobot, a TestFX class, was used for interacting with the UI. Other libraries and tools used: JUnit 5, AssertJ, and JavaFX Monocle for headless testing in CI.

Mistake 1: Updating UI Off the FX Thread

JavaFX creates an application thread upon application start, and only this thread can render UI elements. This is one of the most common pitfalls in JavaFX testing: the tests run on the JUnit thread, not on the FX application thread, and it is easy to forget to perform certain actions, such as writing to or reading from the UI, explicitly on the FX thread.

Take a look at this code snippet:

```java
List<String> names = List.of("Alice", "Mike", "Linda");
TextArea area = fxRobot.lookup("#text")
        .queryAs(TextArea.class);
area.setText(String.join(System.lineSeparator(), names));
```

Here, we are trying to update the UI off the application thread: the test thread performs actions on UI elements directly. This results in:

- java.lang.IllegalStateException: Not on FX application thread
- Random NPEs inside skins
- Deadlocks
- States that never update

What can we do?

Write to the UI (mutate controls or fire handlers) on the FX thread. If you use the FxRobot class, you can achieve that by wrapping mutations in robot.interact(() -> { ... }):

```java
List<String> names = List.of("Alice", "Mike", "Linda");
TextArea area = fxRobot.lookup("#text")
        .queryAs(TextArea.class);
fxRobot.interact(() -> area.setText(String.join(System.lineSeparator(), names)));
```

Read from the UI (get text, snapshot pixels, or query layout) on the FX thread and return a value:

```java
private static Color samplePixel(Canvas canvas, Point2D p) throws Exception {
    return WaitForAsyncUtils.asyncFx(() -> {
        WritableImage img = canvas.snapshot(new SnapshotParameters(), null);
        PixelReader pr = img.getPixelReader();
        int x = (int) Math.round(p.getX());
        int y = (int) Math.round(p.getY());
        x = Math.max(0, Math.min(x, (int) canvas.getWidth() - 1));
        y = Math.max(0, Math.min(y, (int) canvas.getHeight() - 1));
        return pr.getColor(x, y);
    }).get();
}
```

On the other hand, input such as pressing, clicking, or releasing should happen on the test thread. Do not wrap it in robot.interact():

```java
robot.press(KeyCode.Q);
```

Mistake 2: Bootstrapping Tests and the FXML ClassLoader Incorrectly

When you combine JavaFX/TestFX with a framework such as Spring Boot, it is easy to boot the application the wrong way. The thing is that TestFX owns the Stage, but Spring owns the beans.
So, if you boot Spring without giving it the TestFX Stage, the beans will not be able to use it. On the other hand, if you call Application.start(...) directly, you can end up with two contexts.

Another mistake is related to the same situation of using JavaFX with Spring: FXMLLoader uses a different classloader than Spring. Therefore, the controllers Spring creates aren't the same "type" as the ones FXML asks for.

Incorrect bootstrapping results in:

- NoSuchBeanDefinitionException: ...Controller even though it's a @Component
- Random NPEs from the custom FxmlLoader because applicationContext is null
- Stack traces mentioning ClassLoader-related exceptions or "can't find bean for controller X"

What can we do?

Make FXMLLoader use the same class loader as Spring in the application code:

```java
public Parent load(String fxmlPath) throws IOException {
    FXMLLoader loader = new FXMLLoader();
    loader.setLocation(getClass().getResource(fxmlPath));
    loader.setClassLoader(getClass().getClassLoader());
    return loader.load();
}
```

Use @Start to wire up a real Stage, and dependency injection to inject fakes. Don't call new FxApplication().start(stage) if this code boots Spring internally.

```java
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.NONE)
@ExtendWith(ApplicationExtension.class)
class PianoKeyboardColorTest {

    @Autowired
    private ConfigurableApplicationContext context;

    @Start
    public void start(Stage stage) throws Exception {
        FxmlLoader loader = new FxmlLoader(context);
        Parent rootNode = loader.load("/fxml/in-build-keyboard.fxml");
        stage.setScene(new Scene(rootNode, 800, 600));
        stage.show();
        WaitForAsyncUtils.waitForFxEvents();
    }
    // ... tests omitted
}
```

Mistake 3: Confusing Handler Wiring With Real User Input

When you trigger UI behavior by calling controller methods directly, you are testing the code wiring, not the real event path the user takes (focusing, clicking, pressing, etc.). As a result, your tests may pass but miss bugs. Alternatively, a test may hang and fail because the input never fired. The other side of this coin is triggering a UI event that can't happen, such as going full screen in headless mode somewhere in CI; in this case, the assertions will time out waiting for an event that will never happen.

For example, we can trigger a button action with robot.clickOn() and button.fire(), but these methods are not equivalent. robot.clickOn() simulates a real mouse click by moving the mouse, pressing, and releasing. button.fire() triggers the button's action programmatically and skips the mouse events entirely.

What can we do?

Don't mix integration and interaction tests, i.e., avoid calling controller methods directly in UI tests.

Use robot.clickOn() or similar FxRobot methods to test user interaction and UI behavior: pressed/hover visuals, etc. Note that these methods run on the test thread, so you don't have to wrap them in interact():

```java
Canvas canvas = robot.lookup("#keyboard").queryAs(Canvas.class);
robot.interact(canvas::requestFocus);
robot.press(KeyCode.Q);
```

Use button.fire() or similar control methods to assert handler effects without relying on real pointer semantics. Note that these methods run on the FX thread, so they must be wrapped in interact():

```java
Button btn = fxRobot.lookup("#startButton").queryButton();
fxRobot.interact(btn::fire);
```

Assert by changes in the UI, such as the presence of a node in the new scene, a label text change, or button visibility, not by assuming the service call succeeded:
```java
WaitForAsyncUtils.waitFor(3, SECONDS, () ->
        robot.lookup("#startPane").tryQuery().isPresent());
```

In headless mode, if the platform can't do something like going full screen, assert a proxy signal (pseudo-classes, button state).

Mistake 4: Racing the FX Event Queue

JavaFX is a single-threaded toolkit: all UI events happen on the FX Application Thread, so events like animations, layout, etc., get queued. If you assert in tests before the queue is drained, you are testing a UI that doesn't exist yet:

- You fire an action and immediately assert. As a result, your check runs before the handler executes.
- You query the scene right after a scene switch, when the new nodes aren't attached yet.
- You read pixels or control state from the test thread while JavaFX is mid-layout.

As a result, tests pass or fail unpredictably depending on CPU, CI, and whatnot.

What can we do?

For simple changes, use WaitForAsyncUtils.waitForFxEvents() to wait for the event queue of the JavaFX Application Thread to be processed:

```java
@Start
public void start(Stage stage) throws Exception {
    FxmlLoader loader = new FxmlLoader(context);
    Parent rootNode = loader.load("/fxml/in-build-keyboard.fxml");
    stage.setScene(new Scene(rootNode, 800, 600));
    stage.show();
    WaitForAsyncUtils.waitForFxEvents();
}
```

When you are waiting for observable outcomes, use WaitForAsyncUtils.waitFor() to wait for a condition to be met:

```java
@Test
void shouldChangeSceneWhenContinueButtonIsClicked(FxRobot fxRobot) throws TimeoutException {
    Parent oldRoot = stage.getScene().getRoot();
    Button btn = fxRobot.lookup("#continueButton").queryButton();
    fxRobot.interact(btn::fire);
    WaitForAsyncUtils.waitFor(3, TimeUnit.SECONDS,
            () -> stage.getScene().getRoot() != oldRoot);
    assertThat(stage.getScene().getRoot()).isNotSameAs(oldRoot);
    assertThat(fxRobot.lookup("#startButton")
            .queryAs(Button.class)).isNotNull();
}
```

The same approach applies to animations: wait for the state to change, not for the duration the animation is supposed to run. Note that the wait condition below mirrors the assertions (the repeat button should end up hidden and disabled):

```java
@Test
void shouldHideAndDisableButtonsWhenRaffling(FxRobot fxRobot) throws TimeoutException {
    Button start = fxRobot.lookup("#startButton").queryButton();
    Button repeat = fxRobot.lookup("#repeatButton").queryButton();
    fxRobot.interact(start::fire);
    WaitForAsyncUtils.waitFor(5, TimeUnit.SECONDS,
            () -> WaitForAsyncUtils.asyncFx(() ->
                    !repeat.isVisible() && repeat.isDisabled()
            ).get()
    );
    assertThat(repeat.isVisible()).isFalse();
    assertThat(repeat.isDisabled()).isTrue();
}
```

Mistake 5: Assuming Pixel-Perfect Equality Across Platforms

Pixel colors in JavaFX applications may differ slightly across platforms for various reasons: CI uses Monocle with the software Prism pipeline whereas a laptop uses a GPU pipeline, or one machine uses LCD subpixel text and another uses grayscale. If the tests assert exact RGB equality on all platforms, they may pass locally and fail in CI or on another local machine.

What exactly happens?

- JavaFX apps can run with different DPI scaling on various displays and in various environments (the release notes, bug reports, and javadoc document this). On HiDPI and Retina displays, JavaFX renders at a scale greater than 1, so logical coordinates don't map 1:1 to physical pixels. As a result, antialiasing and rounding shift colors slightly, breaking pixel-perfect assertions.
- Headless Monocle uses software Prism, not the desktop GPU, leading to slightly different composites.
- The FontSmoothingType enum in JavaFX specifies the preferred mechanism for smoothing the edges of fonts: sub-pixel LCD or GRAY.
Due to this, the pixels may vary depending on the mode the system actually uses. Even if the mode is set in the application, JavaFX may fall back to a different mode if the first one is not supported by the system (macOS and Linux are documented examples).

What can we do?

Don't assert the exact color. Compare baseline vs. changed and allow for some tolerance in color and pixel density. For example, in SolfeggioFX, to test that the key color on the virtual piano has changed when the corresponding key was pressed, we can calculate pixel indices using Math.round() to tolerate fractional positions on HiDPI displays and Math.max()/Math.min() to avoid sampling outside the image when the Point2D value is near the edge:

```java
private static Color samplePixel(Canvas canvas, Point2D p) throws Exception {
    return WaitForAsyncUtils.asyncFx(() -> {
        WritableImage img = canvas.snapshot(new SnapshotParameters(), null);
        PixelReader pr = img.getPixelReader();
        int x = (int) Math.round(p.getX());
        int y = (int) Math.round(p.getY());
        x = Math.max(0, Math.min(x, (int) canvas.getWidth() - 1));
        y = Math.max(0, Math.min(y, (int) canvas.getHeight() - 1));
        return pr.getColor(x, y);
    }).get();
}
```

In addition, we can allow for a small absolute difference when comparing colors:

```java
private static boolean colorsClose(Color a, Color b) {
    double eps = 0.02; // tolerate small AA differences (~2%)
    return Math.abs(a.getRed() - b.getRed()) < eps
            && Math.abs(a.getGreen() - b.getGreen()) < eps
            && Math.abs(a.getBlue() - b.getBlue()) < eps;
}

@Test
void shouldHighLightPressedKey(FxRobot robot) throws Exception {
    Point2D point = Objects.requireNonNull(centers.get('Q'));
    Color before = samplePixel(canvas, point);
    robot.press(KeyCode.Q);
    WaitForAsyncUtils.waitFor(1500, TimeUnit.MILLISECONDS,
            () -> !colorsClose(
                    WaitForAsyncUtils.asyncFx(() -> samplePixel(canvas, point)).get(),
                    before));
    Color duringPress = samplePixel(canvas, point);
    assertThat(before.equals(duringPress)).isFalse();
}
```

Sample pixels inside the shape, not near the borders, to avoid picking up colors where borders blend with the background. In SolfeggioFX, we stored per-key centers in the Canvas properties when drawing the virtual piano and used this data in the tests to sample pixels near the key center:

```java
// production code: centers is a Map<Character, Point2D>
canvas.getProperties().put("keyCenters", centers);

// tests
Point2D point = Objects.requireNonNull(centers.get('Q'));
```

Mistake 6: Misconfiguring Headless CI

Running JavaFX tests in CI differs from the standard testing process. The tests must run in headless mode and be backed by Monocle, an implementation of the Glass windowing component of JavaFX for embedded systems. But simply adding the dependency on Monocle won't help much, and tests that pass locally may fail in CI due to multiple factors:

- UI tests run in parallel.
- Required modules are locked down, but Monocle uses com.sun.glass.ui reflectively. As a result, you get exceptions like IllegalAccessError: module javafx.graphics does not export com.sun.glass.ui or InaccessibleObjectException: … does not "opens com.sun.glass.ui".
- Tests assert platform features that don't exist in headless mode, for instance, Stage.setFullScreen(true). The tests hang and finally fail with a TimeoutException.

What can we do? Add the Monocle dependency and set all necessary flags to run the tests in headless mode. In addition, open the required modules with --add-opens.
Add the Monocle dependency first:

```xml
<dependency>
    <groupId>org.pdfsam</groupId>
    <artifactId>javafx-monocle</artifactId>
    <version>21</version>
    <scope>test</scope>
</dependency>
```

Then, specify all required flags in a separate plugin configuration: set the headless mode, disable parallelism, etc. Note that the --add-opens entries below are specific to the RaffleFX application used for demonstration; in your case, the modules may be different. This application is developed and compiled in CI using a Java runtime with bundled JavaFX modules, but if you add the dependencies on JavaFX modules manually, you may also have to use the --add-exports flag that allows compile-time access to Glass internals:

```xml
<profile>
    <id>headless-ci</id>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>3.5.3</version>
                <configuration>
                    <forkCount>1</forkCount>
                    <reuseForks>true</reuseForks>
                    <argLine>
                        --add-opens=javafx.graphics/com.sun.javafx.application=ALL-UNNAMED
                        --add-opens=javafx.graphics/com.sun.glass.ui=ALL-UNNAMED
                        --add-opens=javafx.graphics/com.sun.javafx.util=ALL-UNNAMED
                        --add-opens=javafx.base/com.sun.javafx.logging=ALL-UNNAMED
                        --add-opens=javafx.graphics/com.sun.glass.ui.monocle=ALL-UNNAMED
                        -Dtestfx.robot=glass
                        -Dtestfx.headless=true
                        -Dglass.platform=Monocle
                        -Dmonocle.platform=Headless
                        -Dprism.order=sw
                        -Dprism.text=t2k
                        -Djava.awt.headless=true
                    </argLine>
                </configuration>
            </plugin>
        </plugins>
    </build>
</profile>
```

Adjust the tests that wait for stage.isFullScreen(): assert a proxy signal or skip these tests in CI.

In the workflow file, make sure to install all necessary native libraries for JavaFX and run the tests with the correct profile. The file below uses Liberica JDK 21 with JavaFX in the setup-java action, so no additional FX dependencies are required:

```yaml
name: Tests

on:
  push:
    paths-ignore:
      - 'docs/**'
      - '**/*.md'
    branches: [ main ]

jobs:
  test_linux_headless:
    name: UI tests (Ubuntu + Monocle)
    runs-on: ubuntu-latest
    steps:
      - name: Install Linux packages for JavaFX
        run: |
          sudo apt-get update
          sudo apt-get install -y \
            libasound2-dev libavcodec-dev libavformat-dev libavutil-dev \
            libgl-dev libgtk-3-dev libpango1.0-dev libxtst-dev

      - uses: actions/checkout@v4

      - uses: actions/setup-java@v5
        with:
          distribution: 'liberica'
          java-version: '21'
          java-package: 'jdk+fx'
          cache: maven

      - name: Run tests (headless with Monocle)
        run: ./mvnw -B -Pheadless-ci test

      - name: Upload surefire reports
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: surefire-reports
          path: |
            **/target/surefire-reports/*
            **/target/failsafe-reports/*
```

Mistake 7: Entangling Business Logic With UI (Non-Determinism)

Last but not least, testing business logic through the UI is not the best practice. Just as you separate controller and service tests for web apps, domain logic tests should not coexist with UI tests in one class. In the worst-case scenario, the tests become slow and yield inconsistent results.

What can we do? The best solution is to move business logic to ViewModels and test it with plain JUnit. This way, you don't depend on animations and other UI events, and you make sure that your tests are always deterministic.
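As a minimal illustration (the class and property names here are hypothetical, not taken from the demo projects), the raffle-drawing rule can live in a plain ViewModel with an observable property, and a deterministic JUnit/AssertJ test can cover it without TestFX, a Stage, or the FX thread:

```java
import static org.assertj.core.api.Assertions.assertThat;

import java.util.List;
import java.util.Random;
import javafx.beans.property.ReadOnlyStringProperty;
import javafx.beans.property.ReadOnlyStringWrapper;
import org.junit.jupiter.api.Test;

// Plain ViewModel: no Stage, no controls, no FX thread required.
class RaffleViewModel {

    private final ReadOnlyStringWrapper winner = new ReadOnlyStringWrapper("");
    private final Random random;

    RaffleViewModel(Random random) {
        this.random = random; // injected so tests stay deterministic
    }

    void drawWinner(List<String> participants) {
        if (participants.isEmpty()) {
            throw new IllegalArgumentException("No participants to draw from");
        }
        winner.set(participants.get(random.nextInt(participants.size())));
    }

    ReadOnlyStringProperty winnerProperty() {
        return winner.getReadOnlyProperty();
    }
}

class RaffleViewModelTest {

    @Test
    void shouldPickWinnerFromParticipants() {
        // A seeded Random keeps the "business logic" fully deterministic.
        RaffleViewModel viewModel = new RaffleViewModel(new Random(42));

        viewModel.drawWinner(List.of("Alice", "Mike", "Linda"));

        assertThat(viewModel.winnerProperty().get()).isIn("Alice", "Mike", "Linda");
    }
}
```

The controller's only job is then to bind a label to winnerProperty() and forward button clicks, which is exactly the thin wiring the UI tests from the earlier mistakes are good at verifying.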
Conclusion

JavaFX applications need testing just like any other program. On the one hand, you verify that the application functions exactly as expected; on the other hand, you make it more maintainable in the long term. Nevertheless, the unfamiliar process of JavaFX testing may result in numerous exceptions during test runs or "mysterious" test failures.

Luckily, developers can navigate these unknown waters safely by keeping an eye on the following waymarks:

- FX thread vs. test thread: Mutate and read the UI on the FX Application Thread; send input from the test thread.
- Correct bootstrap: If you use frameworks such as Spring, make sure to start Spring/TestFX in the right order and make FXMLLoader use Spring's class loader.
- FX event queue: Wait until the FX queue is drained before making assertions, and assert by state, not duration.
- No pixel-perfect assertions: Keep in mind that the environment and platform may affect the visuals slightly, so allow for tolerance when testing colors and take samples closer to the element center.
- CI headless configuration: Configure headless testing with Monocle, open the required Glass internals, and avoid asserting platform features Monocle can't emulate.

Testing JavaFX may seem complicated, and this article covers the most common pitfalls. By following these pieces of advice, you will be able to build a reliable testing foundation for your JavaFX program.

Trend Report

Kubernetes in the Enterprise

Over a decade in, Kubernetes is the central force in modern application delivery. However, as its adoption has matured, so have its challenges: sprawling toolchains, complex cluster architectures, escalating costs, and the balancing act between developer agility and operational control. Beyond running Kubernetes at scale, organizations must also tackle the cultural and strategic shifts needed to make it work for their teams.

As the industry pushes toward more intelligent and integrated operations, platform engineering and internal developer platforms are helping teams address issues like Kubernetes tool sprawl, while AI continues cementing its usefulness for optimizing cluster management, observability, and release pipelines.

DZone's 2025 Kubernetes in the Enterprise Trend Report examines the realities of building and running Kubernetes in production today. Our research and expert-written articles explore how teams are streamlining workflows, modernizing legacy systems, and using Kubernetes as the foundation for the next wave of intelligent, scalable applications. Whether you're on your first prod cluster or refining a globally distributed platform, this report delivers the data, perspectives, and practical takeaways you need to meet Kubernetes' demands head-on.


Refcard #387

Getting Started With CI/CD Pipeline Security

By Sudip Sengupta

Refcard #216

Java Caching Essentials

By Granville Barnett

More Articles

The New API Economy With LLMs

Large language models (LLMs) are becoming more advanced in understanding context in natural language. With this, a new paradigm is emerging — using LLMs as APIs. Traditionally, an API call would be GET /users/123/orders and you would receive a JSON in return, which would return the orders for the user 123. APIs facilitate the interaction between different software systems. But what if the query was more complex? What if the requirement was to interact with another system without having much knowledge about the APIs to do so? What if the person interacting with the other system is non-technical and does not know how to call the API? Image by Authors: Traditional API vs LLM API This is where LLMs come into the picture. LLMs can function as a general-purpose API that can interpret the input and generate an output. In this case, the input is a natural language query, and the output is an API. This changes how software is built and how users interact with the software. This article discusses this aspect from both angles — how the developers should build software that supports an LLM-type API and how users can use LLMs to interact with software. You may wonder why now. Well, the answer is simple — there is an increased level of sophistication and accessibility of the models, which makes this shift possible. As the model outputs improve and the latencies decrease, this is becoming an easier shift. From Code to Natural Language Traditional APIs are deterministic: You make a specific request and get a known response. Every single time the traditional API is called, the response is the same. They are predictable, reliable, and rigid. But LLMs change that equation. Let’s see an example: Traditional API: GET /users/123/orders → JSON → Manual filtering & aggregation Image by Authors: Traditional API — Requires manual user involvement at every step LLM API: "What did user 123 order last month?" → LLM → “User placed 3 orders totaling $249” Image by Authors: LLM API — Automates every step behind the scene The two diagrams above show the difference in how the same task — retrieving and interpreting a user’s history is handled. LLM-powered API access offers a transformative shift by enabling users to interact with systems using natural language instead of structured syntax. Unlike traditional APIs that require understanding endpoints, parameters, and authentication details, LLMs interpret user intent and generate the correct API calls behind the scenes. This removes the need for technical expertise, making API access available to non-developers like analysts or product managers. LLMs also handle vague queries (e.g., "recent orders") by inferring filters, reducing the need for back-and-forth query refinement. They can combine data from multiple sources, apply reasoning, and return user-friendly summaries instead of raw JSON. Overall, LLMs abstract complexity, shorten the path from question to insight, and unlock APIs for a broader set of users. Below is another example : Traditional API: GET /weather?city=”SanFrancisco”LLM API: “What’s the weather in San Francisco this weekend? Should I pack a jacket?” If you notice, the difference is very apparent. The LLMs are doing more than just getting the weather for San Francisco in the above example. They are also able to answer a question about whether a jacket is required or not. This shows the ability of the new API surface that the LLMs can provide. 
It not only queries the traditional API style to get an answer, but is also able to answer additional queries that may or may not be present in the data. The LLM could be programmed to query the weather API, see a forecast of 60°F with wind, and then access an internal knowledge base that correlates temperature and conditions with appropriate clothing, leading to the recommendation. LLMs are no longer assistants. They are becoming programmable, context-aware software interfaces. How LLMs Are Changing the Way We Build and Use APIs This fundamental shift from structured code to unstructured language demands adaptation from everyone involved in the software lifecycle, from the developers building the APIs to the end-users interacting with them. API Developers Developers now need to transition, as this demands a new skill set. APIs are contracts, and they should be maintained as such. From the calling user endpoint, they want to call into an API and expect to receive the exact same results each time, given that the parameters are the same. With LLMs hallucinating and relying on probabilities, maintaining the contract can become challenging. Consider the example used above: “What did user 123 order last month?", internally needs to call the same API and correctly extract the user id and the timeframe from the prompt. But what if the query changes to “For the last 30 days, what did user 123 order”. For a human, it is obvious that the question is still the same, just a different way of saying it. It becomes important for the API developers to be able to extract this information consistently and correctly with the LLM. Developers need to focus on: Creating prompt templates to guide the model's behavior as per its use case.Slowly integrate LLM APIs into their code base. This is to ensure that it is a slow phased approach.Be understanding of the fact that it is not necessary to convert all the APIs to be used through an LLM API.Optimize for LLM latencies and token usage.Have good guardrails so that there is no misuse of the LLM APIs. For instance, a developer could implement a guardrail that prevents an LLM from processing a prompt containing 'delete all users’. The system would flag the suspicious intent before parsing it.Have a failure detection model to ensure that hallucination can be minimized. A failure detection model might work by having the LLM show its work. Before executing a database query, the LLM could output the query it plans to run. A separate, simpler validation model could then check that query against a set of allowed patterns, catching a potential hallucination before it returns incorrect data.Enable good access management even with LLM APIs. For example, it should not be made possible for an LLM API to bypass the permissions required to access certain data or perform certain actions.Ensuring the AI’s output is useful, safe, and aligned is as important as writing bug-free code. Image by Authors: API Developers workflow shift This gives rise to LLMOps, a new discipline focused on the lifecycle management of LLM applications, including prompt monitoring, A/B testing prompt templates, and maintaining observability for failure detection and resolution. Building with LLMs is part coding, part prompt engineering, part optimization, and part product design. API Users Callers of these LLM APIs can rejoice. Since LLM APIs are becoming a thing, this gives the callers the ability to choose which kind of API to call. 
A developer might let an LLM handle a fuzzy user query by turning it into multiple API calls (via function calling or an agent), but use a direct API call for a straightforward, deterministic operation. Let us talk about the same example used above: GET /users/123/orders. For an API user, they need this information to populate a user interface. But to get insights from it, the prompt can become “Summarize user 123 orders and identify patterns within the data”. With this modification, there can be multiple API calls: one that gets the user 123's orders and one that identifies patterns within that data. From the caller's perspective: Ensure that the API suits the purpose. While the developers will limit hallucinations or incorrect results, assume that mistakes can happen. By this logic, ensure critical work is called through well-defined APIs. Work that can benefit from LLM-level responses can be done using LLM APIs. Deciding when to use one versus the other becomes a key design choice.An important aspect is AI literacy, where callers need to understand the best way to structure their inputs to get the best outputs.Adapt their own software interactions with LLM APIs. For example, a travel booking application could use a traditional API to fetch flight prices. It could then use an LLM API to interpret a user's freeform request like, 'Find me a flight to Hawaii next month, I want to surf and stay somewhere family-friendly.' The LLM would parse the intent ('surfing,' 'family-friendly') to filter the structured data retrieved from the traditional API. Typical roles: Support agents (partner engineers), business analysts, integration engineers Image by Authors: API users workflow shift Non-Technical Users LLMs don’t just empower developers; they open doors for everyone else. Previously, a designer may have had to use a complicated User Interface to perform a certain action. Internally, when a button was pressed, an API was being called to perform the action. But now, the designer can simply prompt the LLM to perform a certain action or even guide the designer on the best way to perform that action. For this category of users, a query like GET /users/123/orders is hard to understand. Language provides the opportunity to ask so much more: “What did user 123 order within the last month, provide me with a summary and compare it with other users in the area”. This query functions as a multi-faceted question that directly serves the user's purpose. For a product manager, instead of asking the developers how certain sequence flows take place, they can easily query the LLM and get answers to it. For example, a marketing manager could ask, "Draft three social media posts for our new product launch, tailored for Twitter, LinkedIn, and Instagram, and schedule them for Monday at 9 AM." The LLM would not only generate the content but also interact with a scheduling API. Or a sales representative could prompt, "Summarize my last five email interactions with Client X and suggest three potential next steps based on their stated needs." This replaces digging through a CRM with a direct, action-oriented query. Typical roles: Product managers, designers, marketers, sales reps, operations staff Image by Authors: Non-technical users workflow shift Language as the New API Surface The biggest shift isn’t just technical — it’s conceptual. With LLMs, the API surface becomes language. You’re not just coding functions, you’re authoring instructions. You’re not designing screens, you’re enabling interactions. 
You’re not calling specific APIs, you’re asking a question. LLMs are shifting the paradigm from rigid interfaces and predefined contracts to more adaptive, conversational, and intelligent systems. It is not just the above categories that need to change. The biggest change has to be in how enterprises adopt them. LLMs expose interfaces and agents that can be customized. For enterprises, this means LLMs are not just another tool but a customizable and intelligent layer that can be wrapped around existing legacy systems. The transition to using language as an API is just the beginning. As these systems become more capable and integrated, the distinction between using software and having a conversation will reduce, fundamentally reshaping our relationship with technology. Treating LLMs like APIs is the first, crucial step into that future.
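To make the developer-side guidance above concrete, here is a minimal, hypothetical Java sketch of the guardrail-plus-intent-extraction pattern. The LlmClient interface and the order endpoint are illustrative stand-ins for whatever model SDK and backend API you actually use; the point is the shape: screen the prompt, have the model produce a structured intent, validate that intent, and only then call the traditional API.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical stand-in for a real model SDK: returns a structured intent for a prompt.
interface LlmClient {
    OrderQueryIntent extractIntent(String naturalLanguagePrompt);
}

// The "contract" the LLM must fill in: a user id plus a bounded time window in days.
record OrderQueryIntent(String userId, int lastNDays) {}

class OrderQueryGuardrail {

    private static final Pattern USER_ID = Pattern.compile("\\d{1,10}");
    private static final Set<String> BLOCKED_WORDS = Set.of("delete", "drop", "truncate");

    // Reject prompts with suspicious intent before they ever reach the model.
    boolean promptAllowed(String prompt) {
        String lower = prompt.toLowerCase();
        return BLOCKED_WORDS.stream().noneMatch(lower::contains);
    }

    // Validate the structured output before calling the traditional API.
    boolean intentAllowed(OrderQueryIntent intent) {
        return USER_ID.matcher(intent.userId()).matches()
                && intent.lastNDays() > 0
                && intent.lastNDays() <= 90;
    }
}

class OrderAssistant {

    private final LlmClient llm;
    private final OrderQueryGuardrail guardrail = new OrderQueryGuardrail();

    OrderAssistant(LlmClient llm) {
        this.llm = llm;
    }

    String answer(String prompt) {
        if (!guardrail.promptAllowed(prompt)) {
            return "Request refused by guardrail.";
        }
        OrderQueryIntent intent = llm.extractIntent(prompt);
        if (!guardrail.intentAllowed(intent)) {
            return "Could not map the question to a safe order query.";
        }
        // Deterministic path: the same GET /users/{id}/orders call a traditional client would make.
        return String.format("GET /users/%s/orders?lastDays=%d", intent.userId(), intent.lastNDays());
    }
}
```

Whether the prompt is "What did user 123 order last month?" or "For the last 30 days, what did user 123 order?", both should collapse to the same validated intent and, from there, to the same deterministic API call.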

By Rachit Jain
Key Principles of API-First Development for SaaS

Having worked in software development for over 8 years, I have repeatedly watched developers struggle to integrate APIs into platforms as an afterthought. The situation is common. Someone builds a beautiful web app, then the business team asks for mobile support, third-party integrations, and suddenly you're reverse-engineering your own application to expose endpoints that make sense. Luckily, this is changing. With API-first development, we can design the architecture with the API as part of it from day one. This is especially beneficial for SaaS products as they rely on third-party integrations and ecosystem support. In this article, I’d like to share my experience and best practices for API-first development in SaaS, highlighting five key aspects you should consider when working on such projects. Why the API-First Development Approach Gets Popular The popularity of API-first development comes from painful lessons developers like us have learned over the past decade. I've seen too many projects where the "quick MVP" approach created integration nightmares six months later. Modern SaaS applications don't exist in isolation anymore. We work every day with projects that need to integrate with Stripe for payments, SendGrid for emails, Salesforce for CRM data, etc. When you design APIs first, these integrations feel natural. When you don't, each integration becomes a custom engineering project that takes weeks instead of days. The composable nature of API-first architecture means we can swap out components without rebuilding everything. This shift makes API design a business strategy, not just an engineering decision. Traditionally, APIs were built only to connect systems internally, but today more than 62% of engineers work with APIs that go beyond being just technical tools and function as revenue-generating products. So, let’s talk about the key aspects of API-first development you need to pay attention to. 5 Best Practices for API-First Development in SaaS 1. Strong Security Mechanisms Security in distributed API architectures keeps developers awake at night sometimes. Unlike monolithic applications, where you control a few well-defined entry points, API-first systems create multiple attack surfaces that need protection. I recommend implementing authentication at the API gateway using standard protocols such as OAuth 2.0 and OpenID Connect. It’s better to avoid custom authentication schemes because they're harder to audit and more likely to contain vulnerabilities. API keys and database credentials should live in secure key management systems, never in configuration files or environment variables. Also, monitoring is crucial for catching security issues early. Log authentication failures, unusual request patterns, and access to sensitive endpoints. 2. Versioning and Backward Compatibility When using an API-first approach, you need to use semantic versioning religiously. Major versions signal breaking changes, minor versions add functionality without breaking existing code, and patches fix bugs. This system helps other developers understand the impact before they upgrade. Here's what I've learned about designing APIs that evolve gracefully: Always add optional parameters instead of required ones. If you add a new required parameter, every existing client must update their code immediately, or the integration breaks. 
Optional parameters allow old clients to continue working without change. Provide sensible defaults for new fields to ensure stability and predictable behavior. And keep deprecated endpoints running long enough for customers to migrate.

Documentation becomes critical here. You need to maintain clear migration guides that show developers exactly how to update their code. In my practice, we even built automated tools that scan customer API usage and warn them about deprecated features before we remove them.

3. Effective Data Management

Data consistency gets tricky in API-first architectures because multiple services often need the same information. This challenge is especially pronounced in industries like healthcare, where managing sensitive data requires additional tools and processes to ensure quality and security. My rule is simple: every piece of data has one authoritative source. Other services access this data through APIs, not by maintaining their own copies. This prevents data fragmentation.

Input validation needs to be consistent across all endpoints. Validate at the API gateway and again at the service level. When validation fails, return detailed error messages that tell developers exactly what went wrong and how to fix it.

I've also learned to be careful about data transformation at API boundaries. Keep the core data model simple and consistent. This simplifies integrations and reduces errors. Handle formatting and presentation logic in client applications or dedicated formatting services.

4. Design for Scalability

Scalability in API-first systems requires thinking about each service independently. Unlike monolithic applications that scale as single units, you can scale individual API services based on their specific load patterns and resource requirements.

Load balancing becomes more sophisticated with multiple services, too. I recommend using health checks to ensure traffic only goes to healthy instances. Geographic distribution also helps with latency for global customers.

Monitor performance metrics continuously across all services to detect bottlenecks early. Response times, error rates, and throughput should be tracked for every endpoint. Application performance monitoring tools will help you identify bottlenecks in request processing pipelines.

5. Comprehensive Documentation

Good documentation transforms APIs from technical implementations into business enablers. I've seen developers abandon integration projects because the documentation was confusing or incomplete.

OpenAPI and Swagger are the most supported API specification formats. Standardized specifications enable automatic documentation generation and help you create client libraries in developers' preferred languages.

Keeping documentation synchronized with the implementation is critical. Use automated testing to verify that documentation examples work against the live API. Outdated documentation creates more problems than no documentation because it leads developers down the wrong paths.

You should create getting-started guides for common integration scenarios. Step-by-step instructions that get developers from API key creation to successful integration as quickly as possible reduce onboarding friction and support overhead.
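Before wrapping up, here is a minimal Java sketch of the backward-compatible evolution described in practice 2. The order-listing handler and its fields are hypothetical; the shape is the point: a new optional parameter with a sensible default sits behind an unchanged v1 method, so the change ships as a minor version.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical order-listing handler: the new "status" filter is optional with a default,
// so existing clients that never send it keep working unchanged.
class OrderListingHandler {

    record Order(String id, String status) {}

    private final List<Order> store = List.of(
            new Order("o-1", "SHIPPED"),
            new Order("o-2", "PENDING"));

    // v1 signature kept intact: delegates to the new method with the default applied.
    List<Order> listOrders(String userId) {
        return listOrders(userId, Optional.empty());
    }

    // v1.1 adds an optional parameter instead of a required one (no breaking change).
    List<Order> listOrders(String userId, Optional<String> status) {
        // userId lookup elided in this sketch; the filter logic is what matters here.
        String effectiveStatus = status.orElse("ANY"); // sensible default
        return store.stream()
                .filter(o -> "ANY".equals(effectiveStatus) || o.status().equals(effectiveStatus))
                .toList();
    }
}
```

Existing clients keep calling the old signature; new clients opt into the filter, and deprecated variants can stay around for the documented migration window.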
Wrapping Up

API-first development is changing the paradigm of SaaS product development for both businesses and engineers. It's a strategic approach that makes products more flexible, scalable, and integration-friendly. The five principles I've outlined here come from real project experience, including mistakes that taught me valuable lessons.

Companies that embrace API-first development now will have significant advantages as software continues evolving toward composable, interconnected architectures. The transition takes time, but the flexibility and scalability benefits compound as your system grows. Trust me, your future self will thank you for making this investment.

By Mykhailo Kopyl
Testing Automation Antipatterns: When Good Practices Become Your Worst Enemy

Note: This article is a summary of a talk I gave at VLCTesting in 2023. Here's the recording (Spanish). Test automation is a fundamental tool for gaining confidence in what we build in a fast and efficient way. However, we often encounter practices that, while seemingly beneficial in the short term, generate significant problems in the long term: antipatterns. What Is an Antipattern? First, let's establish what I consider an antipattern, as it's not simply an obvious bad practice. It's characterized by: Offering an immediate benefit that's intuitiveSeeming like the right solution at the momentLeading to negative consequences over time Understanding and identifying them is important to avoid test suite degradation, slowness, unexplained failures, and ultimately abandoning automation efforts. Let's look at some examples I've encountered recurrently in teams and testers. 1. The Testing Pyramid as Dogma The testing pyramid suggests a specific distribution: many unit tests at the base, some integration tests in the middle, and few end-to-end tests at the top. The problem arises when it's applied as a universal rule without considering the project's specific context. Why is it an antipattern? The testing pyramid becomes an antipattern due to these apparent short-term benefits: Intuitive and visual model: Easy to understand and explain to stakeholdersWidespread popularity: "Cargo cult" - if Martin Fowler says it, it must be rightFeeling of doing the right thing: Following a recognized model gives immediate confidenceSimplifies decisions: No need to think about strategy, just follow the distribution Why it happens: It's much easier to follow an established model than to analyze each project's specific context. Also, questioning the pyramid might seem like you're questioning a "universal truth" of testing. Long-Term Problems Misalignment with business objectives: We should focus testing efforts on what makes sense for our application and what customers value. In some cases, this might be visual aspects, performance, stability, or the ability to integrate with other systemsLow-value tests: Adding low-value unit tests just to "comply" with the pyramid's baseStrategic rigidity: Applying the distribution without considering that frontend-heavy projects might benefit from other modelsResource waste: Time invested in tests that don't add real business value How to Solve It Identify your business core: Is it performance? Visual experience? Complex API logic?Risk analysis: Where are the biggest failure points in your specific application?Consider alternatives: "Trophy" model for SPAs, "diamond" for hybrid applicationsRemember the iceberg: The pyramid is the visible result, but it needs a solid foundation of team culture, knowledge, and strategy. 2. Local Test Execution Only This antipattern occurs when automated tests can only be executed on one specific person's local machine, usually the tester, who must run them manually instead of having them integrated into a CI/CD system. Why is it an antipattern? 
Local-only execution offers these irresistible immediate benefits: Development speed: No dependency on complex CI/CD configurationsTotal control: You have absolute dominion over the execution environmentNo blockers: No need to wait for others to configure pipelines or environmentsInitial ease: It's the quickest way to start with automationNo dependencies: No need to coordinate with DevOps or system administratorsImmediate feedback: You see results instantly without CI queues Why it happens: It's simpler to have total control and see immediate results. Also, setting up CI/CD can seem complex and "not urgent" when tests work locally. Long-Term Problems Knowledge silos: Only one person can execute the testsSlow feedback for the team: Others don't receive immediate information about code statusCritical dependency: If that person isn't available, tests don't runImpossible continuous integration: No real automation in the development pipelineInvestment loss: When the person leaves, all automation is lost How to Solve It CI/CD from day one: Every test must be able to run unattendedDockerization: Use containers to ensure consistency across environmentsShared repositories: Test code must be versioned and accessible to the entire teamClear documentation: Procedures so anyone can execute the testsInvestment in configuration: Dedicate initial time to properly configure environments 3. Cucumber: Misunderstood and Misused Cucumber allows writing tests in natural language (Gherkin) that are then linked to technical code. It becomes an antipattern when adopted, expecting it to automatically improve collaboration with business or simplify testing, without a clear BDD strategy behind it. Why is it an antipattern? Cucumber offers very attractive benefits: "Wow effect": Translating natural language to executable code seems magicalPromise of collaboration: "Now business can write tests"Ubiquitous language: Everyone will understand tests, technical, and non-technicalLiving documentation: Scenarios serve as executable specificationsMethodological justification: "We're doing real BDD"Professional differentiation: Using Cucumber makes you seem more advanced than using only unit tests Why it happens: The promise of democratizing testing is very attractive. Also, once you invest time learning Gherkin and creating step definitions, it's hard to admit it doesn't add value in your specific context. Long-Term Problems Unnecessary complexity: You add an extra layer that isn't always justifiedExpensive maintenance: Imperative scenarios become fragile with changesFalse collaboration: Business people rarely maintain or write scenariosImplementation coupling: Steps become too specific about the "how"Loss of real value: Used as a post-development testing tool instead of facilitating prior conversations How to Solve It Evaluate real context: Is there active collaboration between non-technical roles in criteria definition?Use it a priori: To generate conversations before development, not as a post-development testing toolDeclarative scenarios: Focus on what (behavior), not how (implementation), following Gherkin syntax best practicesConsider alternatives: If your entire team is technical, direct tests might be more efficient 4. Testing Through the Interface vs. Testing the Interface This antipattern consists of using UI tools (like Selenium/Cypress) to test the entire application stack when what you intend is to validate only the specific functionality of the user interface. 
It's the difference between using the UI as a vehicle to test the entire application versus testing specifically that the UI works correctly. Why is it an antipattern? Testing the entire stack through UI with an e2e tool offers very attractive immediate benefits: Total security feeling: "If it works in the browser, the entire system works"Manual testing replication: It's the most direct automation of what we'd do manuallyLess analysis required: No need to think about layers, dependencies, or architectureLess communication: No need to coordinate with other teams about what they testUniversal understanding: Anyone can understand what the test does just by watching it executeOne tool for everything: Selenium solves "all" testing needs Why it happens: It's natural to want to replicate what we do manually. Also, thinking in layers and dividing testing responsibilities requires more mental effort and team coordination. Long-Term Problems Slow and confusing feedback: When something fails, you don't know if it's the UI, API, database, or external serviceExpensive maintenance: Changes in any layer break end-to-end testsExtreme redundancy: You validate the same logic in multiple layers (e.g., email validation)Slow and unstable tests: More components = more failure pointsLimited scalability: Adding more end-to-end tests makes the suite progressively slower How to Solve It Divide responsibilities: Each test type in the most efficient layer for what it validatesStrategic mocking: Isolate the specific functionality you want to testRisk analysis: Identify which critical flows DO require complete end-to-end testsTeam communication: Coordinate to avoid unnecessary validation duplication 5. The Danger of Retries in Flaky Tests Flaky tests are tests that sometimes pass and sometimes fail without apparent changes in the code. The antipattern arises when we configure automatic retries to make these tests "go green" instead of investigating and solving the root cause of their instability. Why is it an antipattern? Configuring automatic retries for flaky tests offers irresistible immediate benefits: Quick and easy solution: A simple retry: 3 in your configuration and "problem solved"Instant green pipeline: No more broken builds due to "temporary issues"Fewer interruptions: The team isn't interrupted by false positivesExternal problem attribution: "It's the tool's or environment's fault, not ours"Delivery pressure: You need it green "now" to not block the releaseLess investigation required: You don't have to analyze each individual failure Why it happens: It's much easier to configure a retry than investigate the root cause. Also, when the test passes on the second attempt, it reinforces the belief that "it was just a temporary problem." Long-Term Problems Total confidence loss: Nobody trusts tests that "sometimes fail"Hidden real problems: Race conditions, memory leaks, concurrency issues get masked"Watermelon" suite: Green on the outside in the dashboard, but red inside with real failuresProgressive degradation: Underlying problems worsen until impossible to ignoreEventual abandonment: Teams end up disabling or completely ignoring tests How to Solve It Mandatory investigation: Every failure must be analyzed before any retryExhaustive root cause analysis: Logs, screenshots, manual reproduction, team consultationDeep technical knowledge: Understand how your testing tool works internallyQuality culture: "It's green" isn't enough if there were retries without prior investigation 6. 
6. The Coverage-Oriented Testing Illusion

This antipattern arises when the main goal of writing tests becomes reaching a specific code coverage percentage (typically 80%), instead of focusing on testing critical system behaviors. The result is tests that increase metrics but don't add real value.

Why is it an antipattern? The "80% coverage" goal offers very seductive immediate benefits:

Easy metric to measure: A clear number you can report to management
Objective justification: "We have 80% coverage, our code is good"
Fulfilled contractual obligation: Many contracts explicitly require it
Feeling of professionalism: "A good developer has high coverage"
Gamification: It's satisfying to see the percentage rise like a video game
Easy comparison: You can compare projects and teams with a simple metric

Why it happens: It's much easier to chase a number than to analyze whether tests actually add value. Also, a high unit test count gives a false sense of security along with a clear conscience.

Long-Term Problems

Tests without real value: Testing getters/setters just to increase the percentage
Extreme coupling: One test class for every production class
Abusive mocking: You end up testing your mocks, not the real code
Refactoring obstacle: Changing a constructor breaks 50 tests that didn't even detect bugs
False security: High coverage doesn't mean error-free code

How to Solve It

Focus on behavior: Tests that verify real user flows, not lines of code. "Test behavior, not implementation" (a short sketch of this contrast follows at the end of this article)
Coverage delta: What matters more is that coverage doesn't decrease when you add new code
Social tests: Tests that exercise multiple classes working together
Mutation testing: Verify your tests actually detect errors using tools like PIT or Mutmut

After identifying the most common antipatterns in test automation, it's time to transform these learnings into practical actions to avoid falling into the same traps.

Conclusions: Strategy, Context, and Collaboration

To build robust test automation that adds value:

1. Define Your Strategy (Golden Circle)
Why: What specific business problems do you want to solve?
How: Will you focus on APIs, performance, and visual experience?
What: Only at the end, choose specific tools

2. Context Determines Validity
A practice can be an antipattern in one context and a solution in another
Evaluate your specific situation: team, product, constraints

3. Collaboration and Consensus
Decisions should be made by the complete team
Avoid unilateral impositions

4. Invest in Fundamentals
Successful automation requires culture, knowledge, and time
Visible results depend on a solid, invisible foundation

5. Continuous Learning
Antipatterns evolve with tools and practices
Stay updated and willing to question your own practices

Final reflection: These antipatterns don't arise from incompetence, but from rational decisions made with limited information. The key is maintaining a long-term perspective and being willing to change when context requires it.

Remember: In test automation, what works today can be tomorrow's antipattern. The true professional isn't one who never makes mistakes, but one who constantly questions their own practices.
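To make the "test behavior, not implementation" advice from point 6 concrete, here is a minimal, hypothetical JUnit 5 sketch; the Order and DiscountService classes and the 10% discount rule are invented for illustration and are not code from this article. The first test only inflates the coverage number, while the other two verify rules a user actually relies on.

Java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Hypothetical production code, invented for illustration
class Order {
    private double total;
    public double getTotal() { return total; }
    public void setTotal(double total) { this.total = total; }
}

class DiscountService {
    // Orders of 100 or more get a 10% discount
    double priceAfterDiscount(double total) {
        return total >= 100.0 ? total * 0.9 : total;
    }
}

class DiscountServiceTest {

    // Coverage-chasing test: exercises a getter/setter, raises the metric, protects nothing
    @Test
    void coverageOnly() {
        Order order = new Order();
        order.setTotal(100.0);
        assertEquals(100.0, order.getTotal());
    }

    // Behavior-focused test: verifies the discount rule at the threshold
    @Test
    void appliesTenPercentDiscountAtTheThreshold() {
        assertEquals(90.0, new DiscountService().priceAfterDiscount(100.0), 0.001);
    }

    // Boundary behavior just below the threshold
    @Test
    void noDiscountBelowThreshold() {
        assertEquals(99.0, new DiscountService().priceAfterDiscount(99.0), 0.001);
    }
}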

By Francisco Moreno DZone Core CORE
Running AI/ML on Kubernetes: From Prototype to Production — Use MLflow, KServe, and vLLM on Kubernetes to Ship Models With Confidence

Editor's Note: The following is an article written for and published in DZone's 2025 Trend Report, Kubernetes in the Enterprise: Optimizing the Scale, Speed, and Intelligence of Cloud Operations.

After training a machine learning model, the inference phase must be fast, reliable, and cost efficient in production. Serving inference at scale, however, brings difficult problems: GPU/resource management, latency and batching, model/version rollout, observability, and orchestration of ancillary services (preprocessors, feature stores, and vector databases). Running artificial intelligence and machine learning (AI/ML) on Kubernetes gives us a scalable, portable platform for training and serving models. Kubernetes schedules GPUs and other resources so that we can pack workloads efficiently and autoscale to match traffic for both batch jobs and real-time inference. It also coordinates multi-component stacks — like model servers, preprocessors, vector DBs, and feature stores — so that complex pipelines and low-latency endpoints run reliably. Containerization enforces reproducible environments and makes CI/CD for models practical. Built-in capabilities like rolling updates, traffic splitting, and metrics/tracing help us run safe production rollouts and meet SLOs for real-time endpoints. For teams that want fewer operations, managed endpoints exist, but Kubernetes is the go-to option when control, portability, advanced orchestration, and real-time serving matter.

Let's look into a typical ML inferencing setup using KServe on Kubernetes below:

Figure 1. ML inference setup with KServe on Kubernetes

Clients (e.g., data scientists, apps, batch jobs) send requests through ingress to a KServe InferenceService. Inside, an optional Transformer pre-processes inputs, the Predictor (required) loads the model and serves predictions, and an optional explainer returns insights. Model artifacts are pulled from model storage (as seen in the diagram) and served via the chosen runtime (e.g., TensorFlow, PyTorch, scikit-learn, ONNX, Triton). Everything runs on Knative/Kubernetes with autoscaling and routing, using the CPU/GPU compute layer from providers such as NVIDIA/AMD/Intel on AWS, Azure, Google Cloud, or on-prem.

Part 1: MLflow and KServe With Kubernetes

Let's dive into the practical implementation of an AI/ML scenario. We will use a combination of MLflow to orchestrate ML processes, scikit-learn to train ML models, and KServe to serve our model for inference in Kubernetes clusters.

Introduction to MLflow

MLflow is an open-source ML framework, and we use it to bring order to the chaos that happens when models move from experiments to production. It helps us track runs (parameters, metrics, and files), save the exact environment and code that produced a result, and manage model versions so that we know which one is ready for production. In plain terms, MLflow fixes three common problems:

Lost experiment data
Missing environment or code needed to reproduce results
Confusion about which model is "the" production model

Its main pieces — Tracking, Projects, Models, and the Model Registry — map directly to those needs. We can also use MLflow to package and serve models (locally as a Docker image or via a registry), which makes it easy to hand off models to a serving platform like Kubernetes.

Using MLflow and KServe on Kubernetes

MLflow offers a straightforward way to serve models via a FastAPI-based inference server, and mlflow models build-docker lets you containerize that server for Kubernetes deployment.
However, this approach can be unsuitable for production at scale; FastAPI is lightweight and not built for extreme concurrency or complex autoscaling patterns, and manual management of numerous inference replicas creates significant operational overhead. KServe (formerly KFServing) delivers a production-grade, Kubernetes-native inference platform with high-performance, scalable, and framework-agnostic serving abstractions for popular ML libraries such as TensorFlow, XGBoost, scikit-learn, and PyTorch.

We've created a short step-by-step guide on how to train an ML model with MLflow and scikit-learn, and how to deploy to Kubernetes using KServe. This guide walks you through a complete MLflow workflow to train a linear-regression model with MLflow tracking and perform hyperparameter tuning to determine the best model:

Prerequisites – Install Docker, kubectl, and a local cluster (Kind or Minikube) or use a cloud Kubernetes cluster. See the Kind/Minikube quickstarts.
Install MLflow + MLServer support – Install MLflow with the MLServer extras (pip install mlflow[mlserver]) and review MLServer examples for MLflow.
Train and log a model – Train and save the model with mlflow.log_model() (or mlflow.sklearn.autolog()), following the MLflow tutorial.
Smoke-test locally – Serve with MLflow/MLServer to validate invocations before Kubernetes: mlflow models serve -m models:/<name> -p 1234 --enable-mlserver. See the MLflow models/MLServer examples.
Package or publish –
Option A – Build a Docker image: mlflow models build-docker -m runs:/<run_id>/model -n <your/image> --enable-mlserver, then push it to a registry.
Option B – Push artifacts to remote storage (S3/GCS) and use the storageUri in KServe. Documents and examples can be found here.
Deploy to KServe – Create a namespace and apply an InferenceService pointing to your image or storageUri. See KServe's InferenceService quickstart and repo examples. Below is an example (Docker image method + Kubernetes) InferenceService snippet:

YAML
apiVersion: "serving.kserve.io/v1beta1"
kind: InferenceService
metadata:
  name: mlflow-wine-classifier
  namespace: mlflow-kserve-test
spec:
  predictor:
    containers:
      - name: mlflow-wine-classifier
        image: "<your_docker_user>/mlflow-wine-classifier"
        ports:
          - containerPort: 8080
            protocol: TCP
        env:
          - name: PROTOCOL
            value: "v2"

Verify and productionize – Check Pods (kubectl get pods -n <ns>), call the endpoint, then add autoscaling, metrics, canary rollouts, and explainability as needed (KServe supports these features).

The official MLflow documentation also has a good step-by-step guide that covers how to package the model artifacts and dependency environment as an MLflow model, validate local serving with mlserver using mlflow models serve, and deploy the packaged model to a Kubernetes cluster with KServe.

Part 2: Managed AutoML: Azure ML to AKS

For this example, we selected Azure. However, Azure is just one of many tool providers that can work in this scenario. Azure Machine Learning is a managed platform for the full ML lifecycle — experiment tracking, model registry, training, deployment, and MLOps — that helps teams productionize models quickly. Defining a reliable ML process can be difficult, and Automated ML (AutoML) can simplify that work by automating algorithm selection, feature engineering, and hyperparameter tuning. For low-latency, real-time inference at scale, you can run containers on Kubernetes, the de facto orchestration layer for production workloads.
We pick Azure Kubernetes Service (AKS) when we need custom runtimes, strict performance tuning (GPU clusters, custom drivers), integration with existing Kubernetes infrastructure (service mesh, VNETs), or advanced autoscaling rules. If we prefer a managed, low-ops path and don't need deep cluster control, Azure ML's managed online endpoints are usually faster to adopt. We run AutoML in Azure ML to find the best model, register it, and publish it as a low-latency real-time endpoint on AKS so that we keep full control over runtime, scaling, and networking:

Prerequisites – Acquire an Azure subscription, an Azure ML workspace, the Azure CLI/ML CLI or SDK, and an AKS cluster (create one or attach an existing cluster).
Run AutoML and pick the winner – Submit an AutoML job (classification/regression/forecast) from the Azure ML studio or SDK and register the top model in the Model Registry.
Prepare scoring + environment – Add a minimal score.py (load model, handle request) and an environment spec (Conda/requirements); you can reuse examples from the azureml-examples repo.
Attach AKS and deploy – Attach your AKS compute to the workspace (or create AKS), then deploy the registered model as an online/real-time endpoint using the Azure ML CLI or Python SDK.
Test and monitor – Call the endpoint, add logging/metrics and autoscaling rules, and use rolling/canary swaps for safe updates.

As an example of how AutoML works, I will provide a typical AI/ML pipeline below:

Figure 2. Example AI/ML pipeline

This ML pipeline contains steps to select, clean up, and transform data from datasets; to split data for training, selecting the ML algorithm, and testing the model; and finally, to score and evaluate the model. All those steps can be automated with AutoML, including several options to deploy models to the AKS/Kubernetes Real-Time API endpoint.

Part 3: Serving LLMs on Kubernetes

Let's have a look into the combination of LLMs and Kubernetes. We run LLMs on Kubernetes to get reliable, scalable, and reproducible inference: Kubernetes gives us GPU scheduling, autoscaling, and the orchestration primitives to manage large models, batching, and multi-instance serving. By combining optimized runtimes, request batching, and observability (metrics, logging, and health checks), we can deliver low-latency APIs while keeping costs and operational risks under control. To do so, we can use the open-source framework vLLM, which is used when we need high-throughput, memory-efficient LLM inference. On Kubernetes, we run vLLM inside containers and couple it with a serving control plane (like KServe) so that we get autoscaling, routing, canary rollouts, and the standard InferenceService CRD without re-implementing ops logic. This combination gives us both the low-level performance of vLLM and the operational features of a Kubernetes-native inference platform.
Let's see how we can deploy an LLM to Kubernetes using vLLM and KServe:

Prepare cluster and KServe – Provision a Kubernetes cluster (AKS/GKE/EKS or on-prem) and install KServe, following the quickstart.
Get vLLM – Clone the vLLM repo or follow the docs to install vLLM and test vllm serve locally to confirm that your model loads and the API works.
Create a vLLM ServingRuntime/container – Build a container image or use the vLLM ServingRuntime configuration that KServe supports (the runtime wraps vllm serve with the correct arguments and environment variables).
Deploy an InferenceService – Apply a KServe InferenceService that references the vLLM serving runtime (or your image) and model storage (S3/HF cache). KServe will create pods, handle routing, and expose the endpoint.
Validate and tune – Hit the endpoint (through ingress/port-forward), measure latency/throughput, and tune vLLM batching/token-cache settings and KServe autoscaling to balance latency and GPU utilization.

Last but not least, we can run vLLM, KServe, and BentoML together to get high-performance LLM inference and production-grade ops. Here is a short breakdown:

vLLM – the high-throughput, GPU-efficient inference engine (token generation, KV-cache, and batching) — the runtime that actually executes the LLM
BentoML – the developer packaging layer that wraps model loading, custom pre-/post-processing, and a stable REST/gRPC API, then builds a reproducible Docker image or artifact
KServe – the Kubernetes control plane that deploys the container (Bento image or a vLLM serving image) and handles autoscaling, routing/ingress, canaries, health checks, and lifecycle management

How do they fit together? We package our model and request logic with BentoML (image or Bento bundle), which runs the vLLM server for inference. KServe then runs that container on Kubernetes as an InferenceService (or ServingRuntime), giving autoscale, traffic controls, and observability.

Pros and Cons of Kubernetes Inference Frameworks for ML

We already had a look at the KServe library. However, there are other powerful alternatives. Let's look at the table below:

Table 1. KServe alternative tools and libraries

Seldon Core
Overview: Kubernetes-native ML serving and orchestration framework offering CRDs for deployments, routing, and advanced traffic control
Pros: Kubernetes-first (CRDs, Istio/Envoy integrations); rich routing (canary, A/B); built-in telemetry and explainer integrations; supports multiple runtimes
Cons: Steeper learning curve; more operational surface to manage; heavier cluster footprint

BentoML (with Yatai)
Overview: Python-centric model packaging and serving; Yatai/Helm lets you run Bento services on Kubernetes as deployments/CRDs
Pros: Excellent developer ergonomics and reproducible images; fast local dev loop; simple CI/CD image artifacts
Cons: Less cluster-native controls out of the box (needs Yatai/Helm); autoscaling and advanced Kubernetes ops require extra setup

NVIDIA Triton Inference Server
Overview: High-performance GPU-optimized inference engine supporting TensorRT, TensorFlow, PyTorch, ONNX, and custom back ends
Pros: Exceptional GPU throughput and mixed-framework support; batch and model ensemble optimizations; production-grade performance tuning
Cons: Focused on the inference engine itself, so it is typically paired with a serving control plane (e.g., KServe or a custom operator) for routing, canaries, and autoscaling; model repository and back-end configuration have a learning curve

Conclusion

Our goal is to run reliable, low-latency AI/ML in production while keeping control of cost, performance, and repeatability.
Kubernetes gives us the orchestration primitives we need — GPU scheduling, autoscaling, traffic control, and multi-service coordination — so that models and their supporting services can run predictably at scale. Paired with optimized runtimes, serving layers, and inference engines, we get both high inference performance and production-grade operational controls. The result is portable, reproducible deployments with built-in observability, safe rollout patterns, and better resource efficiency. Start small, validate with a single model and clear SLOs, pick the serving stack that matches your performance and ops needs, then iterate. Kubernetes lets you grow from prototype to resilient, scalable serving.

This is an excerpt from DZone's 2025 Trend Report, Kubernetes in the Enterprise: Optimizing the Scale, Speed, and Intelligence of Cloud Operations. Read the Free Report

By Boris Zaikin DZone Core CORE
Spring REST API Client Flavors: From RestTemplate to RestClient

Just as humans have always preferred co-existing and communicating ideas, looking for and providing pieces of advice from and to their fellow humans, applications nowadays find themselves in the same situation, where they need to exchange data in order to collaborate and fulfill their purposes. At a very high level, applications’ interactions are carried out either conversationally (the case of REST APIs), where information is exchanged synchronously by asking and responding, or asynchronously via notifications (the case of event-driven APIs), where data is sent by producers and picked up by consumers as it becomes available and they are ready. This article is an analysis of the synchronous communication between a client and a server via REST, with a focus on the client part. Its main purpose is to present how a Spring REST API client can be implemented, first using the RestTemplate, then the newer RestClient, and seamlessly accomplish the same interaction.

A Brief History

RestTemplate was introduced in Spring Framework version 3.0, and according to the API reference, it’s a “synchronous client to perform HTTP requests, exposing a simple, template method API over underlying HTTP client libraries.” Flexible and highly configurable, for a long time it was the best choice when a fully-fledged, synchronous, blocking HTTP client had to be implemented as part of a Spring application. As time passed, its lack of non-blocking capabilities, the use of the old-fashioned template pattern, and the pretty cumbersome API significantly contributed to the emergence of a new, more modern HTTP client library, one that may also handle non-blocking and asynchronous calls. Spring Framework version 5.0 introduced WebClient, “a fluent, reactive API, over underlying HTTP client libraries.” It was especially designed for the WebFlux stack and, by following the modern and functional API style, it was much cleaner and easier for developers to use. Nevertheless, for blocking scenarios, WebClient‘s ease-of-use benefit came with an extra cost – the need to add an additional library dependency to the project. Starting with Spring Framework version 6.1 and Spring Boot version 3.2, a new component is available — RestClient — which “offers a more modern API for synchronous HTTP access.” The evolution has been quite significant: developers nowadays may choose among these three options (RestTemplate, WebClient, and RestClient), depending on the application's needs and particularities.

Implementation

As stated above, the proof of concept in this article experiments with both RestTemplate and RestClient, leaving WebClient aside as the communication here is conversational, that is, synchronous. There are two simple actors involved, two applications:

figure-service – the server that exposes the REST API and allows managing Figures
figure-client – the client that consumes the REST API and actually manages the Figures

Both are custom-made and use Java 21, Spring Boot version 3.5.3, and Maven version 3.9.9. A Figure is a generic entity that could denote a fictional character, a superhero, or a Lego mini-figure, for instance.

The Server

figure-service is a small service that allows performing common CRUD operations on simple entities that represent figures. As the focus in this article is on the client, server characteristics are only highlighted. The implementation is done in a standard, straightforward manner in accordance with the common best practices.
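For orientation only, here is a minimal, hypothetical sketch of what one of the server's read endpoints (listed in the next paragraph) might look like. It is not the actual figure-service code, which is linked further below, and the FigureService interface shown here is an assumption made purely for illustration.

Java
import java.util.List;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/v1/figures")
class FigureController {

    // Response DTO exposing only the public attributes of a figure
    record FigureResponse(long id, String name) {}

    // Hypothetical application service backing the endpoints
    interface FigureService {
        List<FigureResponse> findAll();
        FigureResponse findById(long id);
    }

    private final FigureService figureService;

    FigureController(FigureService figureService) {
        this.figureService = figureService;
    }

    @GetMapping
    List<FigureResponse> all() {
        return figureService.findAll();
    }

    @GetMapping("/{id}")
    FigureResponse one(@PathVariable long id) {
        return figureService.findById(id);
    }
}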
The service exposes a REST API to manage figures:

read all – GET /api/v1/figures
read one – GET /api/v1/figures/{id}
read a random one – GET /api/v1/figures/random
create one – POST /api/v1/figures
update one – PUT /api/v1/figures/{id}
delete one – DELETE /api/v1/figures/{id}

The operations are secured at a minimum with an API key that shall be available as a request header:

Plain Text
"x-api-key": the api key

Figure entities are stored in an in-memory H2 database, described by a unique identifier, a name, and a code, and modelled as below:

Java
@Entity
@Table(name = "figures")
public class Figure {

    @Id
    @GeneratedValue
    private Long id;

    @Column(name = "name", unique = true, nullable = false)
    private String name;

    @Column(name = "code", nullable = false)
    private String code;

    ...
}

While the id and the name are visible to the outside world, the code is considered business domain information and kept private. Thus, the used DTOs look as below:

Java
public record FigureRequest(String name) {}

public record FigureResponse(long id, String name) {}

All server exceptions are handled generically in a single ResponseEntityExceptionHandler and sent back to the client in the following form, with the corresponding HTTP status:

JSON
{
    "title": "Bad Request",
    "status": 400,
    "detail": "Figure not found.",
    "instance": "/api/v1/figures/100"
}

Java
public record ErrorResponse(String title, int status, String detail, String instance) {}

Basically, in this service implementation, a client receives either the aimed response (if any) or one highlighting the error (detailed at point 4), in case there is a service error. This resource contains the figure-service source code; it may be browsed for additional details.

The Client

Let’s assume figure-client is an application that uses Figure entities as part of its business operations. As these are managed and exposed by the figure-service, the client needs to communicate via REST with the server, but also to comply with the contract and the requirements of the service provider. In this direction, a few considerations are needed prior to the actual implementation.

Contract

Since the synchronous communication is first implemented using a RestTemplate, then modified to use the RestClient, the client operations are outlined in the interface below.

Java
public interface FigureClient {

    List<Figure> allFigures();

    Optional<Figure> oneFigure(long id);

    Figure createFigure(FigureRequest figure);

    Figure updateFigure(long id, FigureRequest figureRequest);

    void deleteFigure(long id);

    Figure randomFigure();
}

In this manner, the implementation change is isolated and does not impact other application parts.

Authentication

As the server access is secured, a valid API key is needed. Once available, it is stored as an environment variable and used via the application.properties into a ClientHttpRequestInterceptor. According to the API reference, such a component defines the contract to intercept client-side HTTP requests and allows implementers to modify the outgoing request and/or incoming response. For this use case, all requests are intercepted, and the configured API key is set as the x-api-key header, then the execution is resumed.
Java
@Component
public class AuthInterceptor implements ClientHttpRequestInterceptor {

    private final String apiKey;

    public AuthInterceptor(@Value("${figure.service.api.key}") String apiKey) {
        this.apiKey = apiKey;
    }

    @Override
    public ClientHttpResponse intercept(HttpRequest request, byte[] body,
            ClientHttpRequestExecution execution) throws IOException {
        request.getHeaders()
                .add("x-api-key", apiKey);

        return execution.execute(request, body);
    }
}

The AuthInterceptor is used in the RestTemplate configuration.

Data Transfer Objects (DTOs)

Particularly in this POC, as the Figure entities are trivial in terms of the attributes that describe them, the DTOs used in the operations of interest and by the RestTemplate are simple as well.

Java
public record FigureRequest(String name) {}

public record Figure(long id, String name) {}

Since once read, Figure objects might be further used, their name was simplified, although they denote response DTOs.

Exception Handling

RestTemplate (and then RestClient) allows setting a ResponseErrorHandler implementation during its configuration, a strategy interface used to determine whether a particular response has errors or not, and permits custom handling. In this POC, as the figure-service sends all errors in the same form, it is very convenient and easy to adopt a generic handling manner.

Java
@Component
public class CustomResponseErrorHandler implements ResponseErrorHandler {

    private static final Logger log = LoggerFactory.getLogger(CustomResponseErrorHandler.class);

    private final ObjectMapper objectMapper;

    public CustomResponseErrorHandler() {
        objectMapper = new ObjectMapper();
    }

    @Override
    public boolean hasError(ClientHttpResponse response) throws IOException {
        return response.getStatusCode().isError();
    }

    @Override
    public void handleError(URI url, HttpMethod method, ClientHttpResponse response) throws IOException {
        HttpStatusCode statusCode = response.getStatusCode();
        String body = new String(response.getBody().readAllBytes());

        if (statusCode.is4xxClientError()) {
            throw new CustomException("Client error.", statusCode, body);
        }

        String message = null;
        try {
            message = objectMapper.readValue(body, ErrorResponse.class).detail();
        } catch (JsonProcessingException e) {
            log.error("Failed to parse response body: {}", e.getMessage(), e);
        }

        throw new CustomException(message, statusCode, body);
    }

    @JsonIgnoreProperties(ignoreUnknown = true)
    private record ErrorResponse(String detail) {}
}

The logic here is the following:

Both client and server errors are considered and handled — see hasError() method.
All errors result in a custom RuntimeException decorated with an HTTP status code and a detail, the default being the general Internal Server Error and the raw response body, respectively.

Java
public class CustomException extends RuntimeException {

    private final HttpStatusCode statusCode;
    private final String detail;

    public CustomException(String message) {
        super(message);
        this.statusCode = HttpStatusCode.valueOf(HttpStatus.INTERNAL_SERVER_ERROR.value());
        this.detail = null;
    }

    public CustomException(String message, HttpStatusCode statusCode, String detail) {
        super(message);
        this.statusCode = statusCode;
        this.detail = detail;
    }

    public HttpStatusCode getStatusCode() {
        return statusCode;
    }

    public String getDetail() {
        return detail;
    }
}

In case of recoverable errors, all methods declared in FigureClient are throwing CustomExceptions, thus providing a simple exception handling mechanism.
An extraction of the detail provided by the figure-service in the response body is first attempted and, if possible, included in the CustomException; otherwise, the raw body is set as such.

Useful to Have

Although not required, it proves very useful, especially during development but not only then, to be able to see the requests and the responses exchanged in the logs of the client application. In order to accomplish this, a LoggingInterceptor is added to the RestTemplate configuration.

Java
@Component
public class LoggingInterceptor implements ClientHttpRequestInterceptor {

    private static final Logger log = LoggerFactory.getLogger(LoggingInterceptor.class);

    @Override
    public ClientHttpResponse intercept(HttpRequest request, byte[] body,
            ClientHttpRequestExecution execution) throws IOException {
        logRequest(body);

        ClientHttpResponse response = execution.execute(request, body);
        logResponse(response);

        return response;
    }

    private void logRequest(byte[] body) {
        var bodyContent = new String(body);
        log.debug("Request body : {}", bodyContent);
    }

    private void logResponse(ClientHttpResponse response) throws IOException {
        var bodyContent = StreamUtils.copyToString(response.getBody(), Charset.defaultCharset());
        log.debug("Response body: {}", bodyContent);
    }
}

Here, only the request and response bodies are logged, although other items might be of interest as well (headers, response statuses, etc.). Although useful, there is a gotcha worth explaining that needs to be taken into account. As can be seen, when the response is logged in the above interceptor, its stream is basically read and “consumed,” which causes the client to eventually end up with an empty body. To prevent this, a BufferingClientHttpRequestFactory shall be used, a component that buffers the stream content into memory and thus allows the response to be read twice. The response availability is now resolved, but buffering the entire response body into memory might not be a good idea when its size is significant. Before blindly using it out of the box, developers should analyze the possible performance impact and adapt it to each particular application.

Configuration

Having clarified the figure-service contract and requirements, and moreover, having already implemented certain “pieces,” the RestTemplate can now be configured.

Java
@Bean
public RestOperations restTemplate(LoggingInterceptor loggingInterceptor,
        AuthInterceptor authInterceptor,
        CustomResponseErrorHandler customResponseErrorHandler) {

    RestTemplateCustomizer customizer = restTemplate -> restTemplate.getInterceptors()
            .addAll(List.of(loggingInterceptor, authInterceptor));

    return new RestTemplateBuilder(customizer)
            .requestFactory(() -> new BufferingClientHttpRequestFactory(new SimpleClientHttpRequestFactory()))
            .errorHandler(customResponseErrorHandler)
            .build();
}

A RestTemplateBuilder is used, the LoggingInterceptor and AuthInterceptor are added via a RestTemplateCustomizer, while the error handler is set to a CustomResponseErrorHandler instance.

RestTemplate Implementation

Once the RestTemplate instance is constructed, it can be injected into the actual FigureClient implementation and used to communicate with the figure-service.
Java @Service public class FigureRestTemplateClient implements FigureClient { private final String url; private final RestOperations restOperations; public FigureRestTemplateClient(@Value("${figure.service.url}") String url, RestOperations restOperations) { this.url = url; this.restOperations = restOperations; } @Override public List<Figure> allFigures() { ResponseEntity<Figure[]> response = restOperations.exchange(url, HttpMethod.GET, null, Figure[].class); Figure[] figures = response.getBody(); if (figures == null) { throw new CustomException("Could not get the figures."); } return List.of(figures); } @Override public Optional<Figure> oneFigure(long id) { ResponseEntity<Figure> response = restOperations.exchange(url + "/{id}", HttpMethod.GET, null, Figure.class, id); Figure figure = response.getBody(); if (figure == null) { return Optional.empty(); } return Optional.of(figure); } @Override public Figure createFigure(FigureRequest figureRequest) { HttpEntity<FigureRequest> request = new HttpEntity<>(figureRequest); ResponseEntity<Figure> response = restOperations.exchange(url, HttpMethod.POST, request, Figure.class); Figure figure = response.getBody(); if (figure == null) { throw new CustomException("Could not create figure."); } return figure; } @Override public Figure updateFigure(long id, FigureRequest figureRequest) { HttpEntity<FigureRequest> request = new HttpEntity<>(figureRequest); ResponseEntity<Figure> response = restOperations.exchange(url + "/{id}", HttpMethod.PUT, request, Figure.class, id); Figure figure = response.getBody(); if (figure == null) { throw new CustomException("Could not update figure."); } return figure; } @Override public void deleteFigure(long id) { restOperations.exchange(url + "/{id}", HttpMethod.DELETE, null, Void.class, id); } @Override public Figure randomFigure() { ResponseEntity<Figure> response = restOperations.exchange(url + "/random", HttpMethod.GET, null, Figure.class); Figure figure = response.getBody(); if (figure == null) { throw new CustomException("Could not get a random figure."); } return figure; } } In order to observe how this solution works end-to-end, first, the figure-service is started. A CommandLineRunner is configured there, so that a few Figure entities are persisted into the database. Java @Bean public CommandLineRunner initDatabase(FigureService figureService) { return args -> { log.info("Loading data..."); figureService.create(new Figure("Lloyd")); figureService.create(new Figure("Jay")); figureService.create(new Figure("Kay")); figureService.create(new Figure("Cole")); figureService.create(new Figure("Zane")); log.info("Available figures:"); figureService.findAll() .forEach(figure -> log.info("{}", figure)); }; } Then, as part of the figure-client application, a FigureRestTemplateClient instance is injected into the following integration test. 
Java @SpringBootTest class FigureClientTest { @Autowired private FigureRestTemplateClient figureClient; @Test void allFigures() { List<Figure> figures = figureClient.allFigures(); Assertions.assertFalse(figures.isEmpty()); } @Test void oneFigure() { long id = figureClient.allFigures().stream() .findFirst() .orElseThrow(() -> new RuntimeException("No figures found")) .id(); Optional<Figure> figure = figureClient.oneFigure(id); Assertions.assertTrue(figure.isPresent()); } @Test void createFigure() { var request = new FigureRequest( "Fig " + UUID.randomUUID()); Figure figure = figureClient.createFigure(request); Assertions.assertNotNull(figure); Assertions.assertTrue(figure.id() > 0L); Assertions.assertEquals(request.name(), figure.name()); CustomException ex = Assertions.assertThrows(CustomException.class, () -> figureClient.createFigure(request)); Assertions.assertEquals("A Figure with the same 'name' already exists.", ex.getMessage()); Assertions.assertEquals(HttpStatus.BAD_REQUEST.value(), ex.getStatusCode().value()); Assertions.assertEquals(""" {"title":"Bad Request","status":400,"detail":"A Figure with the same 'name' already exists.","instance":"/api/v1/figures"}""", ex.getDetail()); } @Test void updateFigure() { List<Figure> figures = figureClient.allFigures(); long id = figures.stream() .findFirst() .orElseThrow(() -> new RuntimeException("No figures found")) .id(); var updatedRequest = new FigureRequest("Updated Fig " + UUID.randomUUID()); Figure updatedFigure = figureClient.updateFigure(id, updatedRequest); Assertions.assertNotNull(updatedFigure); Assertions.assertEquals(id, updatedFigure.id()); Assertions.assertEquals(updatedRequest.name(), updatedFigure.name()); Figure otherExistingFigure = figures.stream() .filter(f -> f.id() != id) .findFirst() .orElseThrow(() -> new RuntimeException("Not enough figures")); var updateExistingRequest = new FigureRequest(otherExistingFigure.name()); CustomException ex = Assertions.assertThrows(CustomException.class, () -> figureClient.updateFigure(id, updateExistingRequest)); Assertions.assertEquals(HttpStatus.INTERNAL_SERVER_ERROR.value(), ex.getStatusCode().value()); } @Test void deleteFigure() { long id = figureClient.allFigures().stream() .findFirst() .orElseThrow(() -> new RuntimeException("No figures found")) .id(); figureClient.deleteFigure(id); CustomException ex = Assertions.assertThrows(CustomException.class, () -> figureClient.deleteFigure(id)); Assertions.assertEquals(HttpStatus.BAD_REQUEST.value(), ex.getStatusCode().value()); Assertions.assertEquals("Figure not found.", ex.getMessage()); } @Test void randomFigure() { CustomException ex = Assertions.assertThrows(CustomException.class, () -> figureClient.randomFigure()); Assertions.assertEquals(HttpStatus.INTERNAL_SERVER_ERROR.value(), ex.getStatusCode().value()); Assertions.assertEquals("Not implemented yet.", ex.getMessage()); } } When running, for instance, the above createFigure() test, the RestTemplate and the LoggingInterceptor contribute to clearly describe what’s happening and display it in the client log: Plain Text [main] DEBUG RestTemplate#HTTP POST http://localhost:8082/api/v1/figures [main] DEBUG InternalLoggerFactory#Using SLF4J as the default logging framework [main] DEBUG RestTemplate#Accept=[application/json, application/*+json] [main] DEBUG RestTemplate#Writing [FigureRequest[name=Fig 6aa854a5-ba7a-4bbf-8160-70adf7d3e59b]] with org.springframework.http.converter.json.MappingJackson2HttpMessageConverter [main] DEBUG LoggingInterceptor#Request body : {"name":"Fig 
6aa854a5-ba7a-4bbf-8160-70adf7d3e59b"} [main] DEBUG LoggingInterceptor#Response body: {"id":8,"name":"Fig 6aa854a5-ba7a-4bbf-8160-70adf7d3e59b"} [main] DEBUG RestTemplate#Response 201 CREATED [main] DEBUG RestTemplate#Reading to [com.hcd.figureclient.service.dto.Figure] [main] DEBUG RestTemplate#HTTP POST http://localhost:8082/api/v1/figures [main] DEBUG RestTemplate#Accept=[application/json, application/*+json] [main] DEBUG RestTemplate#Writing [FigureRequest[name=Fig 6aa854a5-ba7a-4bbf-8160-70adf7d3e59b]] with org.springframework.http.converter.json.MappingJackson2HttpMessageConverter [main] DEBUG LoggingInterceptor#Request body : {"name":"Fig 6aa854a5-ba7a-4bbf-8160-70adf7d3e59b"} [main] DEBUG LoggingInterceptor#Response body: {"title":"Bad Request","status":400,"detail":"A Figure with the same 'name' already exists.","instance":"/api/v1/figures"} [main] DEBUG RestTemplate#Response 400 BAD_REQUEST And with that, a client implementation using RestTemplate is complete. RestClient Implementation The aim here, as stated from the beginning, is to be able to accomplish the same, but instead of using RestTemplate, to use a RestClient instance. As the LoggingInterceptor, AuthInterceptor and the CustomResponseErrorHandler can be reused, they are not changed, and the RestClient configured as below. Java @Bean public RestClient restClient(@Value("${figure.service.url}") String url, LoggingInterceptor loggingInterceptor, AuthInterceptor authInterceptor, CustomResponseErrorHandler customResponseErrorHandler) { return RestClient.builder() .baseUrl(url) .requestFactory(new BufferingClientHttpRequestFactory(new SimpleClientHttpRequestFactory())) .requestInterceptor(loggingInterceptor) .requestInterceptor(authInterceptor) .defaultStatusHandler(customResponseErrorHandler) .build(); } Then, the instance is injected into a new FigureClient implementation. Java @Service public class FigureRestClient implements FigureClient { private final RestClient restClient; public FigureRestClient(RestClient restClient) { this.restClient = restClient; } @Override public List<Figure> allFigures() { var figures = restClient.get() .retrieve() .body(Figure[].class); if (figures == null) { throw new CustomException("Could not get the figures."); } return List.of(figures); } @Override public Optional<Figure> oneFigure(long id) { var figure = restClient.get() .uri("/{id}", id) .retrieve() .body(Figure.class); return Optional.ofNullable(figure); } @Override public Figure createFigure(FigureRequest figureRequest) { var figure = restClient.post() .contentType(MediaType.APPLICATION_JSON) .body(figureRequest) .retrieve() .body(Figure.class); if (figure == null) { throw new CustomException("Could not create figure."); } return figure; } @Override public Figure updateFigure(long id, FigureRequest figureRequest) { var figure = restClient.put() .uri("/{id}", id) .contentType(MediaType.APPLICATION_JSON) .body(figureRequest) .retrieve() .body(Figure.class); if (figure == null) { throw new CustomException("Could not update figure."); } return figure; } @Override public void deleteFigure(long id) { restClient.delete() .uri("/{id}", id) .retrieve() .toBodilessEntity(); } @Override public Figure randomFigure() { var figure = restClient.get() .uri("/random") .retrieve() .body(Figure.class); if (figure == null) { throw new CustomException("Could not get a random figure."); } return figure; } } In addition to these, there is only one important step left: to test the client-server integration. 
In order to fulfill that, it is enough to replace the FigureRestTemplateClient instance with the FigureRestClient one above in the previous FigureClientTest.

Java
@SpringBootTest
class FigureClientTest {

    @Autowired
    private FigureRestClient figureClient;

    ...
}

If running, for instance, the same createFigure() test, the client output is similar. Apparently, RestClient is not as generous (or verbose) as RestTemplate when it comes to logging, but there is room for improvement as part of the custom LoggingInterceptor.

Plain Text
[main] DEBUG DefaultRestClient#Writing [FigureRequest[name=Fig 1155fd2c-91fe-486d-aaa3-35bf682629d4]] as "application/json" with org.springframework.http.converter.json.MappingJackson2HttpMessageConverter
[main] DEBUG LoggingInterceptor#Request body : {"name":"Fig 1155fd2c-91fe-486d-aaa3-35bf682629d4"}
[main] DEBUG LoggingInterceptor#Response body: {"id":9,"name":"Fig 1155fd2c-91fe-486d-aaa3-35bf682629d4"}
[main] DEBUG DefaultRestClient#Reading to [com.hcd.figureclient.service.dto.Figure]
[main] DEBUG DefaultRestClient#Writing [FigureRequest[name=Fig 1155fd2c-91fe-486d-aaa3-35bf682629d4]] as "application/json" with org.springframework.http.converter.json.MappingJackson2HttpMessageConverter
[main] DEBUG LoggingInterceptor#Request body : {"name":"Fig 1155fd2c-91fe-486d-aaa3-35bf682629d4"}
[main] DEBUG LoggingInterceptor#Response body: {"title":"Bad Request","status":400,"detail":"A Figure with the same 'name' already exists.","instance":"/api/v1/figures"}

That’s it, the migration from RestTemplate to RestClient is now complete.

Conclusions

When it comes to new synchronous API client implementations, I find RestClient the best choice, mostly for its functional and fluent API style. For older projects, which were started before Spring Framework version 6.1 (Spring Boot 3.2, respectively) introduced RestClient and which most probably are still using RestTemplate, I consider the migration worth planning and doing (more details in [Resource 4]). Moreover, the possibility of reusing existing components (ClientHttpRequestInterceptors, ResponseErrorHandlers, etc.) is another incentive for such a migration. Ultimately, as a last resort, it is even possible to create a RestClient instance using the already configured RestTemplate and go from there, although I find this solution pretty tangled (a minimal sketch of this bridge follows after the resource list).

Resources

RestTemplate Spring Framework API Reference
WebClient Spring Framework API Reference
RestClient Spring Framework API Reference
Migrating from RestTemplate to RestClient
figure-service source code
figure-client source code

The picture was taken in Bucharest, Romania.
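As mentioned in the conclusions, here is a minimal sketch of that last-resort bridge. It assumes the RestTemplate configured earlier in this article is exposed as an injectable RestTemplate bean (the article's configuration declares it as RestOperations, so a cast or a type change would be needed); RestClient.create(RestTemplate) builds the new client on top of the template's existing configuration.

Java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestClient;
import org.springframework.web.client.RestTemplate;

@Configuration
public class RestClientBridgeConfig {

    // Builds a RestClient on top of an already configured RestTemplate,
    // reusing its request factory, interceptors, and message converters
    @Bean
    public RestClient bridgedRestClient(RestTemplate restTemplate) {
        return RestClient.create(restTemplate);
    }
}

This keeps the existing AuthInterceptor, LoggingInterceptor, and error handling in place while the fluent RestClient API is adopted incrementally.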

By Horatiu Dan
Integrating AI Into Test Automation Frameworks With the ChatGPT API

When I first tried to implement AI in a test automation framework, I expected it to be helpful only for a few basic use cases. A few experiments later, I noticed several areas where the ChatGPT API actually saved me time and gave the test automation framework more power: producing realistic test data, analyzing logs in white-box tests, and handling flaky tests in CI/CD.

Getting Started With the ChatGPT API

The ChatGPT API is a programming interface by OpenAI that operates on top of the HTTP(S) protocol. It allows sending requests and retrieving outputs from a pre-selected model as raw text, JSON, XML, or any other format you prefer to work with. The API documentation is clear enough to get started, with examples of request/response bodies that made the first call straightforward. In my case, I just generated an API key in the OpenAI developer platform and plugged it into the framework properties to authenticate requests.

Building a Client for Integration With the API

I built the integration in both Java and Python, and the pattern is the same: send a POST with JSON and read the response, so it can be applied in almost any programming language. Since I prefer to use Java in automation, here is an example of what a client might look like (the original snippet returned the raw API response and referenced an undefined json() helper; below, the assistant message content is extracted and a minimal escaping helper is added so that the later examples can parse the model output directly):

Java
import com.fasterxml.jackson.databind.ObjectMapper;

import java.net.http.*;
import java.net.URI;
import java.time.Duration;

public class OpenAIClient {

    private final HttpClient http = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(20)).build();
    private final String apiKey;

    public OpenAIClient(String apiKey) {
        this.apiKey = apiKey;
    }

    public String chat(String userPrompt) throws Exception {
        String body = """
                {
                  "model": "gpt-5-mini",
                  "messages": [
                    {"role":"system","content":"You are a helpful assistant for test automation..."},
                    {"role":"user","content": %s}
                  ]
                }
                """.formatted(json(userPrompt));

        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create("https://api.openai.com/v1/chat/completions"))
                .timeout(Duration.ofSeconds(60))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> res = http.send(req, HttpResponse.BodyHandlers.ofString());
        if (res.statusCode() >= 300) throw new RuntimeException(res.body());

        // Return only the assistant message (choices[0].message.content) so callers can parse the model output directly
        return new ObjectMapper().readTree(res.body())
                .path("choices").path(0).path("message").path("content").asText();
    }

    // Minimal JSON string escaping for the user prompt (assumed helper, not part of the OpenAI API)
    private static String json(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }
}

As you have probably already noticed, one of the parameters in the request body is the GPT model. Models differ in speed, cost, and capabilities: some are faster, while others are slower; some are expensive, while others are cheap; and some support multimodality, while others do not. Therefore, before integrating with the ChatGPT API, I recommend that you determine which model is best suited for performing your tasks and set limits for it. On the OpenAI website, you can find a page where you can select several models and compare them to make a better choice. It is also good to know that a custom client implementation can be extended to support server-sent streaming events to show results as they're generated, and the Realtime API for multimodal purposes. This is what you can use for processing logs and errors in real time and identifying anomalies on the fly.

Integration Architecture

In my experience, integration with the ChatGPT API only makes sense in testing when applied to the correct problems. In my practice, I found three real-world scenarios I mentioned earlier, and now let's take a closer look at them.

Use Case 1: Test Data Generation

The first use case I tried was test data generation for automation tests.
Instead of relying on hardcoded values, ChatGPT can provide strong and realistic data sets, ranging from user profiles with household information to unique data used in exact sciences. In my experience, this variety of data helped uncover issues that fixed or hardcoded data would never catch, especially around boundary values and rare edge cases. The diagram below illustrates how this integration with the ChatGPT API for generating test data works. At the initial stage, the TestNG Runner launches the suite and, before running the tests, it goes to the ChatGPT API and requests test data for the automation tests. This test data is then processed at the data provider level, and the automated tests are run with the newly generated data and the expected assertions.

Java
class TestUser {
    public String firstName, lastName, email, phone;
    public Address address;
}

class Address {
    public String street, city, state, zip;
}

public List<TestUser> generateUsers(OpenAIClient client, int count) throws Exception {
    String prompt = """
            You generate test users as STRICT JSON only.
            Schema:
            {"users":[{"firstName":"","lastName":"","email":"","phone":"",
                       "address":{"street":"","city":"","state":"","zip":""}}]}
            Count = %d. Output JSON only, no prose.
            """.formatted(count);

    String content = client.chat(prompt);

    JsonNode root = new ObjectMapper().readTree(content);
    ArrayNode arr = (ArrayNode) root.path("users");

    List<TestUser> out = new ArrayList<>();
    ObjectMapper m = new ObjectMapper();
    arr.forEach(n -> out.add(m.convertValue(n, TestUser.class)));
    return out;
}

This solved the problem of repetitive test data and helped to detect errors and anomalies earlier. The main challenge was prompt reliability: if the prompt wasn't strict enough, the model would add extra text that broke the JSON parser. In my case, versioning the prompts was the best way to keep improvements under control.

Use Case 2: Log Analysis

In some recent open-source projects I came across, automated tests also validated system behavior by analyzing logs. In most of these tests, there is an expectation that a specific message should appear in the application console or in Datadog or Loggly, for example, after calling one of the REST endpoints. Such tests are needed when the team conducts white-box testing. But what if we take it a step further and try to send logs to ChatGPT, asking it to check the sequence of messages and identify potential anomalies that may be critical for the service? Such an integration might look like this: when an automated test pulls service logs (e.g., via the Datadog API), it groups them and sends a sanitized slice to the ChatGPT API for analysis. The ChatGPT API has to return a structured verdict with a confidence score. In case anomalies are flagged, the test fails and displays the reasons from the response; otherwise, it passes. This should keep assertions focused while catching unexpected patterns you didn't explicitly code for.
The Java code for this use case might look like this:

Java
// Redaction middleware (keep it simple and fast)
public final class LogSanitizer {

    private LogSanitizer() {}

    public static String sanitize(String log) {
        if (log == null) return "";
        log = log.replaceAll("(?i)(api[_-]?key\\s*[:=]\\s*)([a-z0-9-_]{8,})", "$1[REDACTED]");
        log = log.replaceAll("([A-Za-z0-9-_]{20,}\\.[A-Za-z0-9-_]+\\.[A-Za-z0-9-_]+)", "[REDACTED_JWT]");
        log = log.replaceAll("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+", "[REDACTED_EMAIL]");
        return log;
    }
}

// Ask for a structured verdict
record Verdict(String verdict, double confidence, List<String> reasons) {}

public Verdict analyzeLogs(OpenAIClient client, String rawLogs) throws Exception {
    String safeLogs = LogSanitizer.sanitize(rawLogs);

    String prompt = """
            You are a log-analysis assistant.
            Given logs, detect anomalies (errors, timeouts, stack traces, inconsistent sequences).
            Respond ONLY as JSON with this exact schema:
            {"verdict":"PASS|FAIL","confidence":0.0-1.0,"reasons":["...","..."]}
            Logs (UTC):
            ----------------
            %s
            ----------------
            """.formatted(safeLogs);

    // Chat with the model and parse the JSON content field
    String content = client.chat(prompt);

    ObjectMapper mapper = new ObjectMapper();
    JsonNode jNode = mapper.readTree(content);

    String verdict = jNode.path("verdict").asText("PASS");
    double confidence = jNode.path("confidence").asDouble(0.0);
    List<String> reasons = mapper.convertValue(
            jNode.path("reasons").isMissingNode() ? List.of() : jNode.path("reasons"),
            new com.fasterxml.jackson.core.type.TypeReference<List<String>>() {}
    );

    return new Verdict(verdict, confidence, reasons);
}

Before implementing such an integration, it is important to remember that logs often contain sensitive information, which may include API keys, JWT tokens, or user email addresses. Sending raw logs to the cloud API is therefore a security risk, and data sanitization must be performed. That is why, in my example, I added a simple LogSanitizer middleware to sanitize sensitive data before sending these logs to the ChatGPT API. It is also important to understand that this approach does not replace traditional assertions, but complements them. You can use it instead of dozens of complex checks, allowing the model to detect abnormal behavior. The most important thing is to treat the ChatGPT API verdict as a recommendation and leave the final decision to the automation framework itself, based on specified threshold values. For example, consider a test a failure only if the confidence is higher than 0.8.

Use Case 3: Test Stabilization

One of the most common problems in test automation is the occurrence of flaky tests. Tests can fail for various reasons, including changes to the API contract or interface. However, the worst scenario is when tests fail due to an unstable testing environment. For such unstable tests, teams typically enable retries, and the test is run multiple times until it passes or, conversely, fails after three unsuccessful attempts in a row. But what if we give artificial intelligence the opportunity to decide whether a test needs to be restarted or whether it can be immediately marked as failed, or vice versa? Here's how this idea can be applied in a testing framework: when a test fails, the first step is to gather as much context as possible, including the stack trace, service logs, environment configuration, and, if applicable, a code diff.
All this data should be sent to the ChatGPT API for analysis to obtain a verdict, which is then passed to the AiPolicy (a minimal sketch of such a policy component is included at the end of this article). It is essential not to let ChatGPT make decisions independently. If the confidence level is high enough, the AiPolicy can quarantine the test to prevent the pipeline from being blocked, and when the confidence level is below a specific value, the test can be retried or immediately marked as failed. I believe it is always necessary to leave the decision logic to the automation framework to maintain control over the test results, while still using the AI-based integration. The main goal of this idea is to save time on analyzing unstable tests and to reduce their number. After ChatGPT processes the data, reports become more informative and provide clearer insights into the root causes of failures.

Conclusion

I believe that integrating the ChatGPT API into a test automation framework can be an effective way to extend its capabilities, but there are compromises to this integration that need to be carefully weighed. One of the most important factors is cost. For example, in a set of 1,000 automated tests, of which about 20 fail per run, sending logs, stack traces, and environment metadata to the API can consume over half a million input tokens per run. Adding test data generation to this quickly increases token consumption. In my opinion, the key point is that the cost is directly proportional to the amount of data: the more you send, the more you pay. Another major issue I noticed is security and privacy. Logs and test data often contain sensitive information such as API keys, JWT tokens, or users' data, and sending raw data to the cloud is rarely acceptable in production. In practice, this means either using open-source LLMs like LLaMA deployed locally or providing a redaction/anonymization layer between your framework and the API so that sensitive fields are removed or replaced before anything leaves your testing environment. Model selection also plays a role. I've found that in many cases the best strategy is to combine models: using smaller ones for routine tasks, and larger ones only where higher accuracy really matters. With these considerations in mind, the ChatGPT API can bring real value to testing. It helps generate realistic test data, analyze logs more intelligently, and makes it easier to manage flaky tests. The integration also makes reporting more informative, adding context and analytics that testers would otherwise have to research manually. As I have observed in practice, utilizing AI effectively requires controlling costs, protecting sensitive data, and maintaining decision-making logic within the automation framework to enable effective regulation of AI decisions. It reminds me of the early days of automation, when teams were beginning to weigh the benefits against the limitations to determine where the real value lay.
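As referenced in Use Case 3, here is a minimal, hypothetical sketch of the policy component; the class name, thresholds, and outcomes are illustrative assumptions rather than code from the article, and the 0.8 confidence value mirrors the threshold suggested in Use Case 2.

Java
// Hypothetical policy component: the framework, not the model, makes the final call
public class AiPolicy {

    public enum Outcome { RETRY, QUARANTINE, FAIL }

    private final double quarantineConfidence; // e.g., 0.8
    private final int maxRetries;

    public AiPolicy(double quarantineConfidence, int maxRetries) {
        this.quarantineConfidence = quarantineConfidence;
        this.maxRetries = maxRetries;
    }

    public Outcome decide(boolean aiSuspectsEnvironmentIssue, double confidence, int attemptsSoFar) {
        // High-confidence "environment/flakiness" diagnosis: quarantine instead of blocking the pipeline
        if (aiSuspectsEnvironmentIssue && confidence >= quarantineConfidence) {
            return Outcome.QUARANTINE;
        }
        // Otherwise allow a bounded number of retries before marking the test as failed
        return attemptsSoFar < maxRetries ? Outcome.RETRY : Outcome.FAIL;
    }
}

A TestNG retry analyzer or JUnit extension could then call decide(...) after each failure and act on the returned outcome, keeping the decision logic inside the automation framework.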

By Serhii Romanov
How to Build Secure Knowledge Base Integrations for AI Agents

Done well, knowledge base integrations enable AI agents to deliver specific, context-rich answers without forcing employees to dig through endless folders. Done poorly, they introduce security gaps and permissioning mistakes that erode trust. The challenge for software developers building these integrations is that no two knowledge bases handle permissions the same way. One might gate content at the space level, another at the page level, and a third at the attachment level. Adding to these challenges, permissions aren't static. They change when people join or leave teams, switch roles, or when content owners update visibility rules. If your integration doesn't mirror these controls accurately and in real time, you risk exposing the wrong data to the wrong person. In building these knowledge base integrations ourselves, we've learned lots of practical tips for how to build secure, maintainable connectors that shorten the time to deployment without cutting corners on data security.

1. Treat Permissions as a First-Class Data Type

Too many integration projects prioritize syncing content over permissions. This approach is backwards. Before your AI agent processes a single page, it should understand the permission model of the source system and be able to represent it internally. This means:

Mapping every relevant permission scope in the source system (space, folder, page, attachment, comment).
Representing permissions in your data model so your AI agent can enforce them before returning a result.
Designing for exceptions. For example, if an article is generally public within a department but contains one restricted attachment, your connector should respect that partial restriction.

For example, in a Confluence integration, you should check both space-level and page-level rules for each request. If you cache content to speed up retrieval, you must also cache the permissions and invalidate them promptly when they change.

2. Sync Permissions as Often as Content

Permissions drift quickly. Someone might be promoted, transferred, or removed from a sensitive project, and the content they previously accessed is suddenly off-limits. Your AI agent should never rely on a stale permission snapshot. A practical approach is to tie permission updates to the same sync cadence as content updates. If you're fetching new or updated articles every five minutes, refresh the associated access control lists (ACLs) on the same schedule. If the source system supports webhooks or event subscriptions for permission changes, use them to trigger targeted re-syncs.

3. Respect the Principle of Least Privilege in Responses

Enforcing permissions also shapes what your AI agent returns. For example, say your AI agent receives the query, "What are the latest results from our employee engagement survey?" The underlying knowledge base contains a page with survey results visible only to HR and executives. Even if the query perfectly matches the page's content, the agent should respond with either no result or a message indicating that the content is restricted. This means filtering retrieved documents at query time based on the current user's identity and permissions, not just when content is first synced. Retrieval-augmented generation (RAG) pipelines need this filter stage before passing context to the LLM.

4. Normalize Data Without Flattening Security

Every knowledge base stores content differently, whether that's nested pages in Confluence, blocks in Notion, or articles in Zendesk.
4. Normalize Data Without Flattening Security

Every knowledge base stores content differently, whether that's nested pages in Confluence, blocks in Notion, or articles in Zendesk. Normalizing these formats makes it easier for your AI agent to handle multiple systems. But normalization should never strip away the original permission structures. For instance, when creating a unified search index, store both the normalized text and the original system's permission metadata. Your query service can then enforce the correct rules regardless of which source system the content came from.

5. Handle Hierarchies and Inheritance Carefully

Most systems allow permission inheritance, where you grant access to a top-level space, and then all child pages inherit those rights unless overridden. Your connector must understand and replicate this logic. For example, with an internal help desk AI agent, a "VPN Troubleshooting" article may inherit view rights from its parent "Network Resources" space. But if someone restricts that one article to a smaller group, your integration must override the inherited rule and enforce the more restrictive setting.
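As a rough illustration of that override logic, the sketch below resolves a page's effective ACL by walking up its parent chain. The field names (explicitAcl, allowedPrincipals) and the getParent helper are assumptions, not any particular product's API.

JavaScript

// Sketch: the closest explicit rule wins over anything inherited from ancestors.
function effectiveAcl(page, getParent) {
  let current = page;
  while (current) {
    if (current.explicitAcl) {
      return current.explicitAcl; // explicit restriction overrides inherited rights
    }
    current = getParent(current); // parent page, then the space, then nothing
  }
  return { allowedPrincipals: [] }; // no rule found anywhere: deny by default
}

Walking upward like this also keeps the connector honest when a space is public but a single article inside it has been locked down.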
6. Test With Realistic, Complex Scenarios

Permission bugs often hide in edge cases:

- Mixed inheritance and explicit restrictions
- Users with multiple, overlapping roles
- Attachments with different permissions than their parent page

Developers should build a test harness that mirrors these conditions using anonymized or synthetic data. Validate not only that your AI agent can fetch the right content, but that it never exposes restricted data, even when queried indirectly ("What did the survey results say about the marketing team?").

7. Build for Ongoing Maintenance

A secure, reliable knowledge base integration isn't a "set it and forget it" feature. It's an active part of your AI agent's architecture. Once deployed, knowledge base integrations require constant upkeep: API version changes, evolving permission models, and shifts in organizational structure. Assign ownership for monitoring and updating each connector, and automate regression tests for permission enforcement. Document your mapping between source-system roles and internal permission groups so that changes can be made confidently when needed. By giving permissions the same engineering rigor as content retrieval, you protect sensitive data and preserve trust in the system. That trust is what ultimately allows these AI agents to be embedded into the real workflows where they deliver the most value.

You may be looking at the steps involved in building knowledge base connectors and wonder why they matter. When implemented well, they can transform workflows:

- Enterprise AI search: By integrating with a company's wiki, CRM, and file storage, a search agent can answer multi-step queries like, "What's the status of the Acme deal?" pulling from sales notes, internal strategy docs, and shared project plans. Permissions ensure that deal details remain visible only to the account team.
- IT help desk agent: When connected to a knowledge base, the agent can deliver precise, step-by-step troubleshooting guides to employees. If a VPN setup page is restricted to IT staff, the agent won't surface it to non-IT users.
- New hire onboarding bot: Integrated with the company wiki and messaging platform, an agent can answer questions about policies, teams, and tools. Each answer is filtered through the same rules that would apply if the employee searched manually.

These examples work not because the AI agent "knows everything," but because it knows how to retrieve the right things for the right person at the right time. As knowledge base integrations become the standard for AI agents, it's critical to manage them in a way that prioritizes data security and trust.

By Gil Feig
Isolation Level for MongoDB Multi-Document Transactions (Strong Consistency)

Many outdated or imprecise claims about transaction isolation levels in MongoDB persist. These claims are outdated because they are often based on MongoDB 4.0, the version where multi-document transactions were introduced (such as the old Jepsen report), and the issues found back then have since been fixed. They are also imprecise because people attempt to map MongoDB's transaction isolation to SQL isolation levels, which is inappropriate, as the SQL Standard definitions ignore Multi-Version Concurrency Control (MVCC), utilized by most databases, including MongoDB. Martin Kleppmann has discussed this issue and provided tests to assess transaction isolation and potential anomalies. I will conduct these tests on MongoDB to explain how multi-document transactions work and avoid anomalies.

I followed the structure of Martin Kleppmann's tests on PostgreSQL and ported them to MongoDB. The read isolation level in MongoDB is controlled by the Read Concern. The "snapshot" read concern is the only one comparable to other MVCC SQL databases: it maps to Snapshot Isolation, improperly called Repeatable Read when using the closest SQL-standard term. As I test on a single-node lab, I use "majority" to show that it does more than Read Committed. The write concern should also be set to "majority" to ensure that at least one node is common between the read and write quorums.

Recap on Isolation Levels in MongoDB

Let me quickly explain the other isolation levels and why they cannot be mapped to the SQL standard:

- readConcern: { level: "local" } is sometimes compared to Uncommitted Reads because it may show a state that can be later rolled back in case of failure. However, some SQL databases may show the same behavior in some rare conditions (example here) and still call that Read Committed.
- readConcern: { level: "majority" } is sometimes compared to Read Committed, because it avoids uncommitted reads. However, Read Committed was defined for wait-on-conflict databases to reduce the lock duration in two-phase locking, whereas MongoDB multi-document transactions use fail-on-conflict to avoid waits. Some databases consider that Read Committed can allow reads from multiple states (example here), while others consider it must be a statement-level snapshot isolation (examples here). In a multi-shard transaction, majority may show a result from multiple states; snapshot is the one that is timeline consistent.
- readConcern: { level: "snapshot" } is the real equivalent to Snapshot Isolation, and prevents more anomalies than Read Committed. Some databases even call that "serializable" (example here) because the SQL standard ignores the write-skew anomaly.
- readConcern: { level: "linearizable" } is comparable to serializable, but only for a single document, and is not available for multi-document transactions. This is similar to many SQL databases that do not provide serializable, as it reintroduces the scalability problems of read locks that MVCC avoids.

Read Committed Basic Requirements (G0, G1a, G1b, G1c)

Here are some tests for anomalies typically prevented in Read Committed. I'll run them with readConcern: { level: "majority" }, but keep in mind that readConcern: { level: "snapshot" } may be better if you want a consistent snapshot across multiple shards.
MongoDB Prevents Write Cycles (G0) With Conflict Error

JavaScript

// init
use test_db;
db.test.drop();
db.test.insertMany([ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]);

// T1
const session1 = db.getMongo().startSession();
const T1 = session1.getDatabase("test_db");
session1.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

// T2
const session2 = db.getMongo().startSession();
const T2 = session2.getDatabase("test_db");
session2.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

T1.test.updateOne({ _id: 1 }, { $set: { value: 11 } });
T2.test.updateOne({ _id: 1 }, { $set: { value: 12 } });

In a two-phase locking database, with wait-on-conflict behavior, the second transaction would wait for the first one to avoid anomalies. However, MongoDB with transactions is fail-on-conflict and raises a retriable error to avoid the anomaly. Each transaction touched only one document, but it was declared explicitly with a session and startTransaction() to allow multi-document transactions, and this is why we observed the fail-on-conflict behavior, which lets the application apply its retry logic for complex transactions. If the conflicting update was run as a single-document transaction, equivalent to an auto-commit statement, it would have used a wait-on-conflict behavior. I can test it by immediately running this while the T1 transaction is still active:

JavaScript

const db = db.getMongo().getDB("test_db");
print(`Elapsed time: ${
  ((startTime = new Date()) && db.test.updateOne({ _id: 1 }, { $set: { value: 12 } })) && (new Date() - startTime)
} ms`);

Elapsed time: 72548 ms

I've run the updateOne({ _id: 1 }) without an explicit transaction. It waited for the other transaction to terminate, which happened after a 60-second timeout, and then the update was successful. The first transaction that timed out is aborted:

JavaScript

session1.commitTransaction();

MongoServerError[NoSuchTransaction]: Transaction with { txnNumber: 2 } has been aborted.

The behavior of conflicts in transactions differs:

- wait-on-conflict for implicit single-document transactions
- fail-on-conflict for explicit multi-document transactions, which fail immediately with a transient error, without waiting, to let the application roll back and retry.

MongoDB Prevents Aborted Reads (G1a)

JavaScript

// init
use test_db;
db.test.drop();
db.test.insertMany([ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]);

// T1
const session1 = db.getMongo().startSession();
const T1 = session1.getDatabase("test_db");
session1.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

// T2
const session2 = db.getMongo().startSession();
const T2 = session2.getDatabase("test_db");
session2.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

T1.test.updateOne({ _id: 1 }, { $set: { value: 101 } });

T2.test.find();
[ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]

session1.abortTransaction();

T2.test.find();
[ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]

session2.commitTransaction();

MongoDB prevents reading an aborted transaction by reading only the committed value when Read Concern is 'majority' or 'snapshot.'
MongoDB Prevents Intermediate Reads (G1b)

JavaScript

// init
use test_db;
db.test.drop();
db.test.insertMany([ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]);

// T1
const session1 = db.getMongo().startSession();
const T1 = session1.getDatabase("test_db");
session1.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

// T2
const session2 = db.getMongo().startSession();
const T2 = session2.getDatabase("test_db");
session2.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

T1.test.updateOne({ _id: 1 }, { $set: { value: 101 } });

T2.test.find();
[ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]

The non-committed change from T1 is not visible to T2.

JavaScript

T1.test.updateOne({ _id: 1 }, { $set: { value: 11 } });
session1.commitTransaction(); // T1 commits

T2.test.find();
[ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]

The committed change from T1 is still not visible to T2 because it happened after T2 started. This is different from the majority of Multi-Version Concurrency Control SQL databases. To minimize the performance impact of wait-on-conflict, they reset the read time before each statement in Read Committed, as phantom reads are allowed. They would have displayed the newly committed value with this example. MongoDB never does that; the read time is always the start of the transaction, and no phantom read anomaly happens. However, it doesn't wait to see if the conflict is resolved or must fail with a deadlock, and fails immediately to let the application retry it.

MongoDB Prevents Circular Information Flow (G1c)

JavaScript

// init
use test_db;
db.test.drop();
db.test.insertMany([ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]);

// T1
const session1 = db.getMongo().startSession();
const T1 = session1.getDatabase("test_db");
session1.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

// T2
const session2 = db.getMongo().startSession();
const T2 = session2.getDatabase("test_db");
session2.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

T1.test.updateOne({ _id: 1 }, { $set: { value: 11 } });
T2.test.updateOne({ _id: 2 }, { $set: { value: 22 } });

T1.test.find({ _id: 2 });
[ { _id: 2, value: 20 } ]

T2.test.find({ _id: 1 });
[ { _id: 1, value: 10 } ]

session1.commitTransaction();
session2.commitTransaction();

In both transactions, the uncommitted changes are not visible to others.

MongoDB Prevents Observed Transaction Vanishes (OTV)

JavaScript

// init
use test_db;
db.test.drop();
db.test.insertMany([ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]);

// T1
const session1 = db.getMongo().startSession();
const T1 = session1.getDatabase("test_db");
session1.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

// T2
const session2 = db.getMongo().startSession();
const T2 = session2.getDatabase("test_db");
session2.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

// T3
const session3 = db.getMongo().startSession();
const T3 = session3.getDatabase("test_db");
session3.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

T1.test.updateOne({ _id: 1 }, { $set: { value: 11 } });
T1.test.updateOne({ _id: 2 }, { $set: { value: 19 } });
T2.test.updateOne({ _id: 1 }, { $set: { value: 12 } });
MongoServerError[WriteConflict]: Caused by :: Write conflict during plan execution and yielding is disabled. :: Please retry your operation or multi-document transaction.

This anomaly is prevented by fail-on-conflict with an explicit transaction. With an implicit single-document transaction, it would have to wait for the conflicting transaction to end.

MongoDB Prevents Predicate-Many-Preceders (PMP)

With a SQL database, this anomaly would require the Snapshot Isolation level because Read Committed uses different read times per statement. However, I can show that MongoDB prevents it with the 'majority' read concern; 'snapshot' is required only to get cross-shard snapshot consistency.

JavaScript

// init
use test_db;
db.test.drop();
db.test.insertMany([ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]);

// T1
const session1 = db.getMongo().startSession();
const T1 = session1.getDatabase("test_db");
session1.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

// T2
const session2 = db.getMongo().startSession();
const T2 = session2.getDatabase("test_db");
session2.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

T1.test.find({ value: 30 }).toArray();
[]

T2.test.insertOne( { _id: 3, value: 30 } );
session2.commitTransaction();

T1.test.find({ value: { $mod: [3, 0] } }).toArray();
[]

The newly inserted document is not visible because it was committed by T2 after the start of T1. Martin Kleppmann's tests include some variations with a delete statement and a write predicate:

JavaScript

// init
use test_db;
db.test.drop();
db.test.insertMany([ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]);

// T1
const session1 = db.getMongo().startSession();
const T1 = session1.getDatabase("test_db");
session1.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

// T2
const session2 = db.getMongo().startSession();
const T2 = session2.getDatabase("test_db");
session2.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

T1.test.updateMany({}, { $inc: { value: 10 } });

T2.test.deleteMany({ value: 20 });
MongoServerError[WriteConflict]: Caused by :: Write conflict during plan execution and yielding is disabled. :: Please retry your operation or multi-document transaction.

As it is an explicit transaction, rather than blocking, the delete detects the conflict and raises a retriable exception to prevent the anomaly. Compared to PostgreSQL, which prevents that in Repeatable Read, it saves the waiting time before failure, but requires the application to implement retry logic.

MongoDB Prevents Lost Update (P4)

JavaScript

// init
use test_db;
db.test.drop();
db.test.insertMany([ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]);

// T1
const session1 = db.getMongo().startSession();
const T1 = session1.getDatabase("test_db");
session1.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

// T2
const session2 = db.getMongo().startSession();
const T2 = session2.getDatabase("test_db");
session2.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

T1.test.find({ _id: 1 });
[ { _id: 1, value: 10 } ]

T2.test.find({ _id: 1 });
[ { _id: 1, value: 10 } ]

T1.test.updateOne({ _id: 1 }, { $set: { value: 11 } });

T2.test.updateOne({ _id: 1 }, { $set: { value: 11 } });
MongoServerError[WriteConflict]: Caused by :: Write conflict during plan execution and yielding is disabled. :: Please retry your operation or multi-document transaction.
As it is an explicit transaction, the update doesn't wait and raises a retriable exception, so it is impossible to overwrite the other update without waiting for its completion.

MongoDB Prevents Read Skew (G-single)

JavaScript

// init
use test_db;
db.test.drop();
db.test.insertMany([ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]);

// T1
const session1 = db.getMongo().startSession();
const T1 = session1.getDatabase("test_db");
session1.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

// T2
const session2 = db.getMongo().startSession();
const T2 = session2.getDatabase("test_db");
session2.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

T1.test.find({ _id: 1 });
[ { _id: 1, value: 10 } ]

T2.test.find({ _id: 1 });
[ { _id: 1, value: 10 } ]

T2.test.find({ _id: 2 });
[ { _id: 2, value: 20 } ]

T2.test.updateOne({ _id: 1 }, { $set: { value: 12 } });
T2.test.updateOne({ _id: 2 }, { $set: { value: 18 } });
session2.commitTransaction();

T1.test.find({ _id: 2 });
[ { _id: 2, value: 20 } ]

In SQL databases with Read Committed isolation, a read skew anomaly could display the value 18. However, MongoDB avoids this issue by reading the same value of 20 consistently throughout the transaction, as it reads data as of the start of the transaction. Martin Kleppmann's tests include a variation with a predicate dependency:

JavaScript

// init
use test_db;
db.test.drop();
db.test.insertMany([ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]);

// T1
const session1 = db.getMongo().startSession();
const T1 = session1.getDatabase("test_db");
session1.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

// T2
const session2 = db.getMongo().startSession();
const T2 = session2.getDatabase("test_db");
session2.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

T1.test.findOne({ value: { $mod: [5, 0] } });
{ _id: 1, value: 10 }

T2.test.updateOne({ value: 10 }, { $set: { value: 12 } });
session2.commitTransaction();

T1.test.find({ value: { $mod: [3, 0] } }).toArray();
[]

The value 12, a multiple of 3 committed by T2 after T1 started, is not visible to the transaction that started before.

Another test includes a variation with a write predicate in a delete statement:

JavaScript

// init
use test_db;
db.test.drop();
db.test.insertMany([ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]);

// T1
const session1 = db.getMongo().startSession();
const T1 = session1.getDatabase("test_db");
session1.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

// T2
const session2 = db.getMongo().startSession();
const T2 = session2.getDatabase("test_db");
session2.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

T1.test.find({ _id: 1 });
[ { _id: 1, value: 10 } ]

T2.test.find();
[ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]

T2.test.updateOne({ _id: 1 }, { $set: { value: 12 } });
T2.test.updateOne({ _id: 2 }, { $set: { value: 18 } });
session2.commitTransaction();

T1.test.deleteMany({ value: 20 });
MongoServerError[WriteConflict]: Caused by :: Write conflict during plan execution and yielding is disabled. :: Please retry your operation or multi-document transaction.

This read skew anomaly is prevented by the fail-on-conflict behavior when writing a document that has been changed by another transaction since this transaction's snapshot.
Write Skew (G2-item) Must Be Managed by the Application

JavaScript

// init
use test_db;
db.test.drop();
db.test.insertMany([ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]);

// T1
const session1 = db.getMongo().startSession();
const T1 = session1.getDatabase("test_db");
session1.startTransaction({ readConcern: { level: "majority" }, writeConcern: { w: "majority" } });

// T2
const session2 = db.getMongo().startSession();
const T2 = session2.getDatabase("test_db");
session2.startTransaction({ readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } });

T1.test.find({ _id: { $in: [1, 2] } })
[ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]

T2.test.find({ _id: { $in: [1, 2] } })
[ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]

T1.test.updateOne({ _id: 1 }, { $set: { value: 11 } });
T2.test.updateOne({ _id: 2 }, { $set: { value: 21 } });

session1.commitTransaction();
session2.commitTransaction();

MongoDB doesn't detect the read/write conflict when one transaction has read a value updated by the other, and then writes something that may have depended on this value. The Read Concern doesn't provide the Serializable guarantee. Such isolation requires acquiring range or predicate locks during reads, and doing that prematurely would hinder the performance of a database designed to scale. For the transactions that need to avoid this, the application can transform the read/write conflict into a write/write conflict by updating a field in the document that was read, to be sure that other transactions do not modify it, or by re-checking the value when updating.

Anti-Dependency Cycles (G2) Must Be Managed by the Application

JavaScript

// init
use test_db;
db.test.drop();
db.test.insertMany([ { _id: 1, value: 10 }, { _id: 2, value: 20 } ]);

// T1
const session1 = db.getMongo().startSession();
const T1 = session1.getDatabase("test_db");
session1.startTransaction({ readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } });

// T2
const session2 = db.getMongo().startSession();
const T2 = session2.getDatabase("test_db");
session2.startTransaction({ readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } });

T1.test.find({ value: { $mod: [3, 0] } }).toArray();
[]

T2.test.find({ value: { $mod: [3, 0] } }).toArray();
[]

T1.test.insertOne( { _id: 3, value: 30 } );
T2.test.insertOne( { _id: 4, value: 42 } );

session1.commitTransaction();
session2.commitTransaction();

T1.test.find({ value: { $mod: [3, 0] } }).toArray();
[ { _id: 3, value: 30 }, { _id: 4, value: 42 } ]

The read/write conflict was not detected, and both transactions were able to write, even if they may have depended on a previous read that had been modified by the other transaction. MongoDB does not acquire locks across read and write calls. If you run a multi-document transaction where the writes depend on the reads, the application must explicitly write to the read set in order to detect the write conflict and avoid the anomaly.
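For illustration, here is a minimal sketch of that pattern, reusing the session setup from the write-skew example above. The version field is just an illustrative name for a field used to materialize the read dependency.

JavaScript

// T1 read both documents and only intends to modify _id: 1, but it also
// "touches" _id: 2, the document its decision depended on.
T1.test.updateOne({ _id: 1 }, { $set: { value: 11 } });
T1.test.updateOne({ _id: 2 }, { $inc: { version: 1 } });

// T2 read the same documents and now tries to write _id: 2. The write/write
// conflict on _id: 2 raises a retriable WriteConflict error instead of
// silently committing a write skew.
T2.test.updateOne({ _id: 2 }, { $set: { value: 21 } });

One of the two transactions then retries, which gives the same outcome a serializable schedule would produce for this particular pair of transactions.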
All those tests were based on https://github.com/ept/hermitage. There's a lot of information about MongoDB transactions in the MongoDB Multi-Document ACID Transactions whitepaper from 2020. While the document model offers simplicity and performance when a single document matches the business transaction, MongoDB supports multi-statement transactions with Snapshot Isolation, similar to many SQL databases using Multi-Version Concurrency Control (MVCC), but favoring fail-on-conflict rather than wait-on-conflict. Despite outdated myths surrounding NoSQL or claims based on old versions, its transaction implementation is robust and effectively prevents common transactional anomalies.

By Franck Pachot
Spring Boot WebSocket: Building a Multichannel Chat in Java

As you may have already guessed from the title, the topic for today will be Spring Boot WebSockets. Some time ago, I provided an example of a WebSocket chat based on Akka toolkit libraries. However, this chat will have somewhat more features and a quite different design. I will skip some parts so as not to duplicate too much content from the previous article. Here you can find a more in-depth intro to WebSockets. Please note that all the code that’s used in this article is also available in the GitHub repository.

Spring Boot WebSocket: Tools Used

Let’s start the technical part of this text with a description of the tools that will be further used to implement the whole application. As I cannot fully grasp how to build a real WebSocket API with the classic Spring STOMP overlay, I decided to go for Spring WebFlux and make everything reactive.

- Spring Boot – No modern Java app based on Spring can exist without Spring Boot; all the autoconfiguration is priceless.
- Spring WebFlux – A reactive version of classic Spring; it provides quite a nice and descriptive toolkit for handling both WebSockets and REST. I would dare to say that it is the only way to actually get WebSocket support in Spring.
- Mongo – One of the most popular NoSQL databases; I am using it for storing message history.
- Spring Reactive Mongo – Spring Boot starter for handling Mongo access in a reactive fashion. Using reactive in one place but not the other is not the best idea, thus I decided to make DB access reactive as well.

Let’s start the implementation!

Spring Boot WebSocket: Implementation

Dependencies and Config

pom.xml

XML

<dependencies>
    <!--Compile-->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-mongodb-reactive</artifactId>
    </dependency>
</dependencies>

application.properties

Properties files

spring.data.mongodb.uri=mongodb://chats-admin:admin@localhost:27017/chats

I prefer .properties over .yml — in my honest opinion, YAML is not readable and not maintainable on a larger scale.

WebSocketConfig

Java

@Configuration
class WebSocketConfig {

    @Bean
    ChatStore chatStore(MessagesStore messagesStore) {
        return new DefaultChatStore(Clock.systemUTC(), messagesStore);
    }

    @Bean
    WebSocketHandler chatsHandler(ChatStore chatStore) {
        return new ChatsHandler(chatStore);
    }

    @Bean
    SimpleUrlHandlerMapping handlerMapping(WebSocketHandler wsh) {
        Map<String, WebSocketHandler> paths = Map.of("/chats/{id}", wsh);
        return new SimpleUrlHandlerMapping(paths, 1);
    }

    @Bean
    WebSocketHandlerAdapter webSocketHandlerAdapter() {
        return new WebSocketHandlerAdapter();
    }
}

And surprise, all four beans defined here are very important.

- ChatStore – Custom bean for operating on chats; I will go into more detail on this bean in the following steps.
- WebSocketHandler – Bean that will store all the logic related to handling WebSocket sessions.
- SimpleUrlHandlerMapping – Responsible for mapping URLs to the correct handler; the full URL for this one will look more or less like ws://localhost:8080/chats/{id}.
- WebSocketHandlerAdapter – A kind of capability bean; it adds WebSocket handling support to the Spring WebFlux dispatcher.
ChatsHandler

Java

class ChatsHandler implements WebSocketHandler {

    private final Logger log = LoggerFactory.getLogger(ChatsHandler.class);
    private final ChatStore store;

    ChatsHandler(ChatStore store) {
        this.store = store;
    }

    @Override
    public Mono<Void> handle(WebSocketSession session) {
        String[] split = session.getHandshakeInfo()
                .getUri()
                .getPath()
                .split("/");
        String chatIdStr = split[split.length - 1];
        int chatId = Integer.parseInt(chatIdStr);
        ChatMeta chatMeta = store.get(chatId);
        if (chatMeta == null) {
            return session.close(CloseStatus.GOING_AWAY);
        }
        if (!chatMeta.canAddUser()) {
            return session.close(CloseStatus.NOT_ACCEPTABLE);
        }
        String sessionId = session.getId();
        store.addNewUser(chatId, session);
        log.info("New User {} join the chat {}", sessionId, chatId);
        return session
                .receive()
                .map(WebSocketMessage::getPayloadAsText)
                .flatMap(message -> store.addNewMessage(chatId, sessionId, message))
                .flatMap(message -> broadcastToSessions(sessionId, message, store.get(chatId).sessions()))
                .doFinally(sig -> store.removeSession(chatId, session.getId()))
                .then();
    }

    private Mono<Void> broadcastToSessions(String sessionId, String message, List<WebSocketSession> sessions) {
        return sessions
                .stream()
                .filter(session -> !session.getId().equals(sessionId))
                .map(session -> session.send(Mono.just(session.textMessage(message))))
                .reduce(Mono.empty(), Mono::then);
    }
}

As I mentioned above, here you can find all the logic related to handling WebSocket sessions. First, we parse the ID of a chat from the URL to get the target chat, and we respond with different close statuses depending on the state of that particular chat. Additionally, I am broadcasting each message to all the other sessions related to the particular chat, so users can actually exchange messages. I have also added a doFinally trigger that clears closed sessions from the ChatStore, to reduce redundant communication. As the whole of this code is reactive, there are some restrictions I need to follow. I have tried to make it as simple and readable as possible; if you have any ideas on how to improve it, I am open to them.

ChatsRouter

Java

@Configuration(proxyBeanMethods = false)
class ChatRouter {

    private final ChatStore chatStore;

    ChatRouter(ChatStore chatStore) {
        this.chatStore = chatStore;
    }

    @Bean
    RouterFunction<ServerResponse> routes() {
        return RouterFunctions
                .route(POST("api/v1/chats/create"), e -> create(false))
                .andRoute(POST("api/v1/chats/create-f2f"), e -> create(true))
                .andRoute(GET("api/v1/chats/{id}"), this::get)
                .andRoute(DELETE("api/v1/chats/{id}"), this::delete);
    }
}

WebFlux's approach to defining REST endpoints is quite different from the classic Spring. Above, you can see the definition of four endpoints for managing chats. Similar to the Akka implementation, I want to have a REST API for managing chats and a WebSocket API for actually handling them. I will skip the function implementations as they are pretty trivial; you can see them on GitHub.
ChatStore

First, the interface:

Java

public interface ChatStore {

    int create(boolean isF2F);

    void addNewUser(int id, WebSocketSession session);

    Mono<String> addNewMessage(int id, String userId, String message);

    void removeSession(int id, String session);

    ChatMeta get(int id);

    ChatMeta delete(int id);
}

Then the implementation:

Java

public class DefaultChatStore implements ChatStore {

    private final Map<Integer, ChatMeta> chats;
    private final AtomicInteger idGen;
    private final MessagesStore messagesStore;
    private final Clock clock;

    public DefaultChatStore(Clock clock, MessagesStore store) {
        this.chats = new ConcurrentHashMap<>();
        this.idGen = new AtomicInteger(0);
        this.clock = clock;
        this.messagesStore = store;
    }

    @Override
    public int create(boolean isF2F) {
        int newId = idGen.incrementAndGet();
        ChatMeta chatMeta = chats.computeIfAbsent(newId, id -> {
            if (isF2F) {
                return ChatMeta.ofId(id);
            }
            return ChatMeta.ofIdF2F(id);
        });
        return chatMeta.id;
    }

    @Override
    public void addNewUser(int id, WebSocketSession session) {
        chats.computeIfPresent(id, (k, v) -> v.addUser(session));
    }

    @Override
    public void removeSession(int id, String sessionId) {
        chats.computeIfPresent(id, (k, v) -> v.removeUser(sessionId));
    }

    @Override
    public Mono<String> addNewMessage(int id, String userId, String message) {
        ChatMeta meta = chats.getOrDefault(id, null);
        if (meta != null) {
            Message messageDoc = new Message(id, userId, meta.offset.getAndIncrement(), clock.instant(), message);
            return messagesStore.save(messageDoc)
                    .map(Message::getContent);
        }
        return Mono.empty();
    }

    // omitted
}

The base of ChatStore is the ConcurrentHashMap that holds the metadata of all open chats. Most of the methods from the interface are self-explanatory, and there is nothing special behind them.

- create – Creates a new chat, with a boolean attribute denoting whether the chat is f2f or group.
- addNewUser – Adds a new user to an existing chat.
- removeUser – Removes a user from an existing chat.
- get – Gets the metadata of a chat with a given ID.
- delete – Deletes the chat from the ConcurrentHashMap.

The only complex method here is addNewMessage. It increments the message counter within the chat and persists the message content in MongoDB, for durability.

MongoDB Message Entity

Java

public class Message {

    @Id
    private String id;
    private int chatId;
    private String owner;
    private long offset;
    private Instant timestamp;
    private String content;
}

A model for the message content stored in the database; there are three important fields here:

- chatId – Represents the chat in which a particular message was sent.
- owner – The userId of the message sender.
- offset – Ordinal number of the message within the chat, used for retrieval ordering.

MessagesStore

Java

public interface MessagesStore extends ReactiveMongoRepository<Message, String> {}

Nothing special: a classic Spring repository, but in a reactive fashion, providing the same set of features as JpaRepository. It is used directly in ChatStore. Additionally, in the main application class, WebsocketsChatApplication, I am activating reactive repositories by using @EnableReactiveMongoRepositories. Without this annotation, the MessagesStore from above would not work.

And here we go, we have the whole chat implemented. Let's test it!

Spring Boot WebSocket: Testing

For tests, I'm using Postman and Simple WebSocket Client. I'm creating a new chat using Postman. In the response body, I got a WebSocket URL to the recently created chat. Now it is time to use them and check if users can communicate with one another. Simple WebSocket Client comes into play here; thus, I am connecting to the newly created chat here.
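If you prefer scripting that check instead of a browser extension, a plain browser-console snippet is enough. The chat ID and port below are assumptions based on the configuration shown earlier.

JavaScript

// Run this in two separate browser tabs (or Node with a WebSocket client)
// to simulate two users of chat 1.
const socket = new WebSocket("ws://localhost:8080/chats/1");
socket.onopen = () => socket.send("Hello from user A");
socket.onmessage = (event) => console.log("Received:", event.data);
socket.onclose = (event) => console.log("Connection closed with code", event.code);

Whatever one tab sends should show up in the console of the other, while the sender itself is filtered out by broadcastToSessions.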
Here we are, everything is working, and users can communicate with each other. There is one last thing to do: let's spend a moment looking at things that can be done better.

What Can Be Done Better

As what I have just built is the most basic chat app, there are a few (or in fact quite a lot of) things that may be done better. Below, I have listed the things I find worthy of improvement:

- Authentication and rejoining support – Right now, everything is based on the sessionId. It is not an optimal approach. It would be better to have some authentication in place and actual rejoining based on user data.
- Sending attachments – For now, the chat only supports simple text messages. While texting is the basic function of a chat, users enjoy exchanging images and audio files, too.
- Tests – There are no tests for now, but why leave it like this? Tests are always a good idea.
- Overflow in offset – Currently, it is a simple int. If we were to track the offset for a very long time, it would overflow sooner or later.

Summary

Et voilà! The Spring Boot WebSocket chat is implemented, and the main task is done. You have some ideas on what to develop in the next steps. Please keep in mind that this chat case is very simple, and it will require lots of changes and development for any type of commercial project. Anyway, I hope that you learned something new while reading this article. Thank you for your time.

These other resources might interest you:

- Lock-Free Programming in Java
- 7 API Integration Patterns

By Bartłomiej Żyliński DZone Core CORE
Your SDLC Has an Evil Twin — and AI Built It

You think you know your SDLC like the back of your carpal-tunnel-riddled hand: You've got your gates, your reviews, your carefully orchestrated dance of code commits and deployment pipelines. But here's a plot twist straight out of your auntie's favorite daytime soap: there's an evil twin lurking in your organization (cue the dramatic organ music). It looks identical to your SDLC — same commits, same repos, the same shiny outputs flowing into production. But this fake-goatee-wearing doppelgänger plays by its own rules, ignoring your security governance and standards.

Welcome to the shadow SDLC — the one your team built with AI when you weren't looking: It generates code, dependencies, configs, and even tests at machine speed, but without any of your governance, review processes, or security guardrails. Checkmarx's August Future of Application Security report, based on a survey of 1,500 CISOs, AppSec managers, and developers worldwide, just pulled back the curtain on this digital twin drama:

- 34% of developers say more than 60% of their code is now AI-generated.
- Only 18% of organizations have policies governing AI use in development.
- 26% of developers admit AI tools are being used without permission.

It's not just about insecure code sneaking into production, but rather about losing ownership of the very processes you've worked to streamline. Your "evil twin" SDLC comes with:

- Unknown provenance → You can't always trace where AI-generated code or dependencies came from.
- Inconsistent reliability → AI may generate tests or configs that look fine but fail in production.
- Invisible vulnerabilities → Flaws that never hit a backlog because they bypass reviews entirely.

This isn't a story about AI being "bad," but about AI moving faster than your controls — and the risk that your SDLC's evil twin becomes the one in charge. The rest of this article is about how to prevent that. Specifically:

- How the shadow SDLC forms (and why it's more than just code)
- The unique risks it introduces to security, reliability, and governance
- What you can do today to take back ownership — without slowing down your team

How the Evil Twin SDLC Emerges

The evil twin isn't malicious by design — it's a byproduct of AI's infiltration into nearly every stage of development:

- Code creation – AI writes large portions of your codebase at scale.
- Dependencies – AI pulls in open-source packages without vetting versions or provenance.
- Testing – AI generates unit tests or approves changes that may lack rigor.
- Configs and infra – AI auto-generates Kubernetes YAMLs, Dockerfiles, and Terraform templates.
- Remediation – AI suggests fixes that may patch symptoms while leaving root causes.

The result is a pipeline that resembles your own — but lacks the data integrity, reliability, and governance you've spent years building.

Sure, It's a Problem. But Is It Really That Bad?

You love the velocity that AI provides, but this parallel SDLC compounds risk by its very nature. Unlike human-created debt, AI can replicate insecure patterns across dozens of repos in hours. And the stats from the FOA report speak for themselves:

- 81% of orgs knowingly ship vulnerable code — often to meet deadlines.
- 33% of developers admit they "hope vulnerabilities won't be discovered" before release.
- 98% of organizations experienced at least one breach from vulnerable code in the past year — up from 91% in 2024 and 78% in 2023.
- The share of orgs reporting 4+ breaches jumped from 16% in 2024 to 27% in 2025.

That surge isn't random.
It correlates with the explosive rise of AI use in development. As more teams hand over larger portions of code creation to AI without governance, the result is clear: risk is scaling at machine speed, too.

Taking Back Control From the Evil Twin

You can't stop AI from reshaping your SDLC. But you can stop it from running rogue. Here's how:

1. Establish Robust Governance for AI in Development

- Whitelist approved AI tools with built-in scanning and keep a lightweight approval workflow so devs don't default to Shadow AI.
- Enforce provenance standards like SLSA or SBOMs for AI-generated code.
- Audit usage and tag AI contributions — use CodeQL to detect AI-generated code patterns and require devs to mark AI commits for transparency. This builds reliability and integrity into the audit trail.

2. Strengthen Supply Chain Oversight

AI assistants are now pulling in OSS dependencies you didn't choose — sometimes outdated, sometimes insecure, sometimes flat-out malicious. While your team already uses hygiene tools like Dependabot or Renovate, they're only table stakes that don't provide governance. They won't tell you if AI just pulled in a transitive package with a critical vulnerability, or if your dependency chain is riddled with license risks. That's why modern SCA is essential in the AI era. It goes beyond auto-bumping versions to:

- Generate SBOMs for visibility into everything AI adds to your repos.
- Analyze transitive dependencies several layers deep.
- Provide exploitable-path analysis so you prioritize what's actually risky.

Auto-updaters are hygiene. SCA is resilience.

3. Measure and Manage Debt Velocity

- Track debt velocity — measure how fast vulnerabilities are introduced and fixed across repos (see the sketch after this list).
- Set sprint-based SLAs — if issues linger, AI will replicate them across projects before you've logged the ticket.
- Flag AI-generated commits for extra review to stop insecure patterns from multiplying.
- Adopt Agentic AI AppSec Assistants — the FOA report highlights that traditional remediation cycles can't keep pace with machine-speed risk, making autonomous prevention and real-time remediation a necessity, not a luxury.
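As a rough illustration of what a debt-velocity metric can look like, here is a small sketch in plain JavaScript. The findings format is an assumption, standing in for whatever export your scanner of choice provides.

JavaScript

// Sketch: compare how many vulnerabilities were introduced versus fixed
// in a given time window, across one or more repos.
// `findings` is assumed to be [{ introducedAt: Date, fixedAt: Date | null }, ...].
function debtVelocity(findings, windowStart, windowEnd) {
  const introduced = findings.filter(
    (f) => f.introducedAt >= windowStart && f.introducedAt < windowEnd
  ).length;
  const fixed = findings.filter(
    (f) => f.fixedAt && f.fixedAt >= windowStart && f.fixedAt < windowEnd
  ).length;
  // A positive net value means debt is accumulating faster than it is paid down.
  return { introduced, fixed, net: introduced - fixed };
}

Tracking that net value per sprint is one simple way to back the SLAs above with data instead of gut feeling.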
4. Foster a Culture of Reliable AI Use

- Train on AI risks like data poisoning and prompt injection.
- Make secure AI adoption part of the "definition of done."
- Align incentives with delivery, not just speed.
- Create a reliable feedback loop — encourage devs to challenge governance rules that hurt productivity. Collaboration beats resistance.

5. Build Resilience for Legacy Systems

Legacy apps are where your evil twin SDLC hides best. With years of accumulated debt and brittle architectures, AI-generated code can slip in undetected. These systems were built when cyber threats were far less sophisticated, lacking modern security features like multi-factor authentication, advanced encryption, and proper access controls. When AI is bolted onto these antiquated platforms, it doesn't just inherit the existing vulnerabilities; it can rapidly propagate insecure patterns across interconnected systems that were never designed to handle AI-generated code. The result is a cascade effect where a single compromised AI interaction can spread through poorly secured legacy infrastructure faster than your security team can detect it. Here's what's often missed:

- Manual before automatic: Running full automation on legacy repos without a baseline can drown teams in false positives and noise. Start with manual SBOMs on the most critical apps to establish trust and accuracy, then scale automation.
- Triage by risk, not by age: Not every legacy system deserves equal attention. Prioritize repos with heavy AI use, repeated vulnerability patterns, or high business impact.
- Hybrid skills are mandatory: Devs need to learn how to validate AI-generated changes in legacy contexts, because AI doesn't "understand" old frameworks. A dependency bump that looks harmless in 2025 might silently break a 2012-era API.

Conclusion: Bring the 'Evil Twin' Back into the Family

The "evil twin" of your SDLC isn't going away. It's already here, writing code, pulling dependencies, and shaping workflows. The question is whether you'll treat it as an uncontrolled shadow pipeline — or bring it under the same governance and accountability as your human-led one. Because in today's environment, you don't just own the SDLC you designed. You also own the one AI is building — whether you control it or not.

By Eran Kinsbruner
