DZone Spotlight

Thursday, January 30

Using Spring AI to Generate Images With OpenAI's DALL-E 3

By Danil Temnikov

Hi, community! This is the first article in a series introducing Spring AI. Today, we will see how easily we can generate pictures from text prompts. To achieve this, we will use the OpenAI API and the DALL-E 3 model.

In this article, I'll skip the explanation of basic Spring concepts like bean management and starters, as the main goal is to explore Spring AI's capabilities. For the same reason, I won't give detailed instructions on how to generate an OpenAI API key.

Prerequisite

If you don't have an active OpenAI API key, do the following:

- Create an account on OpenAI.
- Generate the token on the API Keys page.

Step 1: Set Up a Project

To quickly generate a project template with all the necessary dependencies, you may use https://start.spring.io/. In my example, I'll use Java 17 and Spring Boot 3.4.1. We also need to include the following dependencies:

- Spring Web: This dependency allows us to create a web server and expose REST endpoints as entry points to our application.
- OpenAI: This dependency provides smooth integration with OpenAI with just a couple of lines of code and a few lines of configuration.

After clicking Generate, open the downloaded files in your IDE and verify that all the necessary dependencies exist in pom.xml.

XML
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

Step 2: Set Up a Configuration File

As a next step, we need to configure our property file. By default, Spring uses an application.yaml or application.properties file. In this example, I'm going to use the YAML format. You may convert the configuration to .properties if you feel more comfortable with that format.
Here are all the configs we need to add to the application.yaml file:

YAML
spring:
  ai:
    openai:
      api-key: [your OpenAI api key]
      image:
        options:
          model: dall-e-3
          size: 1024x1024
          style: vivid
          quality: standard
          response-format: url

- Model: We are going to use the dall-e-3 model, the only model available in Spring AI at the time of writing.
- Size: Configures the size of the generated image. For the dall-e-3 model, it must be one of 1024x1024, 1792x1024, or 1024x1792.
- Style: The vivid style generates more hyper-realistic images. If you want your pictures to look more natural, set the value to natural.
- Quality: One of two options: standard or hd.
- Response-format: One of two options: url or b64_json. I'll use url for demo purposes and simplicity. The image stays available at the URL for one hour after generation.

Step 3: Create ImageGenerationService

Let's create a service that will be responsible for generating images.

Java
import org.springframework.ai.image.ImageModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class ImageGenerationService {

    // Injected by Spring; the OpenAI starter supplies the implementation
    @Autowired
    ImageModel imageModel;
}

We created a new class and annotated it as a Service. We also autowired the ImageModel bean. ImageModel is the main interface used to generate pictures. As we provided all the necessary configurations in Step 2, the Spring Boot starter will automatically create an implementation of this interface, called OpenAiImageModel, for us.

With the class configured, we can implement a method that calls the OpenAI API to generate pictures from our prompts. This is where the real magic of Spring AI happens. Let's take a look at it.

Java
public String generateImage(String prompt) {
    // Wrap the plain-text prompt; options fall back to the values in application.yaml
    ImagePrompt imagePrompt = new ImagePrompt(prompt);
    // Send the prompt to the OpenAI image API
    ImageResponse imageResponse = imageModel.call(imagePrompt);
    // With response-format set to url, the output carries a link to the image
    return imageResponse.getResult().getOutput().getUrl();
}

That's it! We just need three lines of code to actually generate an image with Spring AI. Isn't that amazing?

In the first step, we created a new ImagePrompt just by providing a string prompt.
Next, we made an API call using imageModel.call(imagePrompt) and stored the response in an ImageResponse variable. In the last step, we returned the URL of the generated image. Remember, the image is only available for one hour; after that, the link stops working. So don't forget to save your masterpiece!

Step 4: Create ImageGenerationController to Run Our Code

We need to create one last file to allow users to execute our integration. It may look like this:

Java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/image")
public class ImageGenerationController {

    @Autowired
    ImageGenerationService imageService;

    // GET /image/generate?prompt=... returns the URL of the generated image
    @GetMapping("/generate")
    public ResponseEntity<String> generateImage(@RequestParam String prompt) {
        return ResponseEntity.ok(imageService.generateImage(prompt));
    }
}

As you can see, we created a simple controller with a single GET endpoint. This endpoint will be available at localhost:8080/image/generate.

Step 5: Run Our Application

To start our application, we need to run the following command:

Plain Text
mvn spring-boot:run

When the application is running, we can check the result by executing the following curl command with any prompt you want. I used the following: Cute cat playing chess. Don't forget to replace whitespace with %20 if you call your endpoint from the command line:

Shell
curl -X GET "http://localhost:8080/image/generate?prompt=Cute%20cat%20playing%20chess"

After executing it, wait a few seconds, as it takes some time for OpenAI to generate your image, and voila!

Congratulations! You've just created and tested your first Spring AI application, which generates images using custom prompts!

Step 6: Give More Flexibility in Generating Images (Optional)

In the second step, we configured the default behavior of our model and provided all the necessary configurations in the application.yaml file. But can we give more flexibility to our users and let them provide their own configurations? The answer is yes!
To do this, we need to use the ImageOptions interface. Here is an example:

Java
public String generateImage(GenerateImageRequest imageRequest) {
    // Values are hardcoded here for brevity; they could equally be read from imageRequest
    ImageOptions options = OpenAiImageOptions.builder()
            .withQuality("standard")
            .withStyle("vivid")
            .withHeight(1024)
            .withWidth(1024)
            .withResponseFormat("url")
            .build();
    // Options passed with the prompt apply to this call instead of the application.yaml defaults
    ImagePrompt imagePrompt = new ImagePrompt(imageRequest.getPrompt(), options);
    ImageResponse imageResponse = imageModel.call(imagePrompt);
    return imageResponse.getResult().getOutput().getUrl();
}

To achieve this, we build the options programmatically, mirroring the configs we set up in application.yaml, and provide them when creating the ImagePrompt object. You may find more configuration options in the Spring AI docs.

Conclusion

Spring AI is a great tool that helps developers smoothly integrate with different AI models. As of writing this article, Spring AI supports five image models, including but not limited to Azure AI and Stability.

I hope you found this article helpful and that it will inspire you to explore Spring AI more deeply.
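Step 6 accepts a GenerateImageRequest, but the article never shows that type. The sketch below is one hypothetical shape for it, not the author's class: the field names, the constructor, and the size check are all assumptions. It validates the size against the three values listed for dall-e-3 in Step 2 and splits it into a width and a height that could be handed to the options builder.

```java
import java.util.Set;

/** Hypothetical request type for Step 6; the article does not define it. */
class GenerateImageRequest {

    // Sizes the article lists as valid for the dall-e-3 model
    private static final Set<String> ALLOWED_SIZES =
            Set.of("1024x1024", "1792x1024", "1024x1792");

    private final String prompt;
    private final int width;
    private final int height;

    GenerateImageRequest(String prompt, String size) {
        if (prompt == null || prompt.isBlank()) {
            throw new IllegalArgumentException("prompt must not be empty");
        }
        if (!ALLOWED_SIZES.contains(size)) {
            throw new IllegalArgumentException("unsupported size: " + size);
        }
        // Sizes are written as WIDTHxHEIGHT
        String[] parts = size.split("x");
        this.prompt = prompt;
        this.width = Integer.parseInt(parts[0]);
        this.height = Integer.parseInt(parts[1]);
    }

    String getPrompt() { return prompt; }

    int getWidth() { return width; }

    int getHeight() { return height; }

    public static void main(String[] args) {
        GenerateImageRequest request =
                new GenerateImageRequest("Cute cat playing chess", "1792x1024");
        System.out.println(request.getWidth() + " x " + request.getHeight());
    }
}
```

With a type like this, the hardcoded 1024 values in the builder could be replaced by request.getWidth() and request.getHeight(), so each caller chooses the image size.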

Stop Shipping Waste: Fix Your Product Backlog

By Stefan Wolpers


TL;DR: Stop Shipping Waste

When product teams fail to establish stakeholder alignment and implement rigorous Product Backlog management, they get caught in an endless cycle of competing priorities, reactive delivery, and shipping waste. The result? Wasted resources, frustrated teams, and missed business opportunities. Success in 2025 requires turning your Product Backlog from a chaotic wish list into a strategic tool that connects vision to value delivery. Learn how to do so.

Two Systemic Failures Leading to Shipping Waste

Product management is a balancing act. Teams must manage customer needs, stakeholder expectations, technical constraints, and business goals while delivering measurable outcomes. Yet, despite their best intentions, many product teams fall short. Why? Two pervasive issues often lie at the root of this failure: a lack of alignment and a broken Product Backlog management process. Let’s unpack why these failures matter and how overcoming them can transform your team’s impact. (And, possibly, your career!)

Failure #1: The Alignment Gap

Imagine a scenario: Developers build features stakeholders think customers want, only to discover post-launch that the solution misses the mark. Sales teams push for one priority, product leadership has a different idea, engineering advocates for another, and executives demand faster timelines. The result? Often wasted efforts, frustrated teams, disappointed customers, and missed business objectives.

Misalignment isn’t just inconvenient; it’s costly. When stakeholders operate in silos, ignoring the benefits of product leadership, product teams lose sight of the “why” behind their work. Product roadmaps become wish lists, product strategy feels disconnected from execution, and collaboration dissolves into competing agendas. Without shared ownership of priorities, even the most talented product teams struggle to deliver meaningful outcomes.
The Fix

Alignment isn’t about enforcing consensus; it’s about creating clarity. Teams need frameworks to connect product vision to daily work. Tools like user story mapping, outcome-focused roadmaps, and structured stakeholder workshops can bridge gaps. Moreover, by frequently integrating customer insights and data with business objectives, product teams foster collaboration, ensuring everyone rallies behind the same objectives.

Failure #2: The Backlog Black Hole

The Product Backlog is meant to be a strategic asset. Yet, for many teams, it’s an overwhelming, chaotic list of tasks: a “black hole” where ideas go to die. Common symptoms of dysfunctional Product Backlogs include:

- Endless, low-value items drowning critical priorities.
- Stakeholders bypassing processes to demand urgent work.
- Teams stuck in reactive mode, shipping outputs without measurable impact.

A poorly managed backlog erodes trust. Stakeholders see delays and confusion; product teams feel overwhelmed by shifting demands. Worse, without transparency, the backlog becomes a source of conflict rather than a tool for value delivery.

The Fix

Effective Product Backlog management requires rigor and strategy. Teams need processes to prioritize ruthlessly, validate assumptions, and align backlog items with customer and business outcomes. Techniques like weighted scoring, value vs. effort analysis, and anti-pattern identification can transform Product Backlogs into dynamic, transparent tools.

The Cost of Ignoring These Failures

When alignment and Product Backlog management break down, the consequences ripple across organizations:

- Lost opportunities: Teams waste precious capacity on low-impact work while competitors innovate.
- Stagnant careers: Product leaders lose credibility when they can’t articulate progress or outcomes.
- Cultural erosion: Misalignment breeds frustration, burnout, and attrition.

But teams that address these challenges unlock transformative results.
They ship solutions customers love (and contribute to the bottom line), build stakeholder trust, and create cultures where collaboration thrives.

Why This Matters for Your Career

Let me be blunt: The market doesn’t need more Product Owners who “manage” Product Backlogs. It requires product leaders who wield them strategically. When you master alignment and backlog rigor, you stop being seen as a “task coordinator” and become the person who delivers results.

The product teams that thrive in 2025 and beyond will:

- Ship solutions customers love, not just tolerate.
- Turn stakeholders into collaborators, not critics.
- Use the backlog to drive decisions, not document them.

This isn’t about process; it’s about impact.

Conclusion

Product teams often struggle with misalignment and chaotic Product Backlogs, leading to wasted effort, frustrated teams, and missed opportunities. By addressing these issues, teams can turn their Product Backlog into a strategic tool that drives value and aligns everyone around a shared vision.

Success comes from fostering clarity and collaboration, prioritizing customer-centric decisions, and implementing rigorous Product Backlog management. Teams that embrace these principles will ship solutions customers love, build trust, and create a culture of accountability.

For product leaders, this is a chance to elevate your career. Master alignment and backlog management to become a strategic leader who delivers measurable outcomes. Stop shipping waste and start delivering value.

Trend Report

Observability and Performance

The dawn of observability across the software ecosystem has fully disrupted standard performance monitoring and management. Enhancing these approaches with sophisticated, data-driven, and automated insights allows your organization to better identify anomalies and incidents across applications and wider systems. While monitoring and standard performance practices are still necessary, they now serve to complement organizations' comprehensive observability strategies. This year's Observability and Performance Trend Report moves beyond metrics, logs, and traces — we dive into essential topics around full-stack observability, like security considerations, AIOps, the future of hybrid and cloud-native observability, and much more.

Refcard #400

Java Application Containerization and Deployment

By Mark Heckler

Refcard #392

Software Supply Chain Security

By Justin Albano


Why You Don’t Need That New JavaScript Library

Libraries can rise to stardom in months, only to crash and fade into obscurity just as fast. We’ve all seen this happen in the software development world, and my own journey has been filled with “must-have” JavaScript libraries, each claiming to be more revolutionary than the one before.

But over the years, I’ve come to realize that the tools we need have been with us all along. In this article, I’ll explain why it’s worth sticking to the fundamentals, how new libraries can become liabilities, and why stable, proven solutions usually serve us best in the long run.

The Allure of New Libraries

I’ll be the first to admit that I’ve been seduced by shiny new libraries before. Back in 2018, I led a team overhaul of our front-end architecture. We added a number of trendy state management tools and UI component frameworks, certain that they would streamline our workflow. Our package.json ballooned with dependencies, each seemingly indispensable. At first, it felt like we were riding a wave of innovation.

Then, about six months in, a pattern emerged. A few libraries became outdated; some were abandoned by their maintainers. Every time we audited our dependencies, it seemed we were juggling security patches and version conflicts far more often than we shipped new features. The headache of maintenance made one thing crystal clear: every new dependency is a promise you make to maintain and update someone else’s code.

The True Cost of Dependencies

When we adopt a new library, we’re not just adding functionality; we’re also taking on significant risks. Here are just some of the hidden costs that frequently go overlooked:

Maintenance Overhead

New libraries don’t just drop into your project and remain stable forever. They require patching for security vulnerabilities, updating for compatibility with other tools, and diligence when major releases introduce breaking changes. If you’re not on top of these updates, you risk shipping insecure or buggy code to production.
Version Conflicts

Even robust tools like npm and yarn can’t guarantee complete harmony among your dependencies. One library might require a specific version of a package that conflicts with another library’s requirements. Resolving these inconsistencies can be a maddening, time-consuming process.

Performance Implications

Front-end libraries can increase bundle size considerably. One specialized library may add tens or hundreds of kilobytes to your final JavaScript payload, which means slower load times and worse user experiences.

Security Vulnerabilities

In a recent audit for a client, 60% of their app’s vulnerabilities came from third-party packages, often many layers deep in the dependency tree. Sometimes, to patch one library, multiple interdependent packages need to be updated, which is rarely an easy process.

A colleague and I once needed a date picker for a project. The hip thing to do would have been to install some feature-rich library and quickly drop it in. Instead, we built our own lightweight date picker in vanilla JavaScript, using the native Date object. It was a fraction of the size, had zero external dependencies, and was completely ours to modify. That tiny decision spared us from possible library update headaches, conflicts, or abandonment issues months later.

The Power of Vanilla JavaScript

Modern JavaScript is almost unrecognizable from what it was ten years ago. Many features that previously required libraries like Lodash or Moment are now part of the language, or can be replicated with a few lines of code. For example:

JavaScript
// Instead of installing Lodash to remove duplicates:
const uniqueItems = [...new Set(items)];

// Instead of using a library for deep cloning:
const clonedObject = structuredClone(complexObject);

A deep familiarity with the standard library can frequently replace entire suites of utility functions.
These days, JavaScript’s built-in methods handle most common tasks elegantly, making large chunks of external code unnecessary.

When to Use External Libraries

None of this is to say you should never install a third-party package. The key lies in discernment: knowing when a problem is big enough or specialized enough to benefit from a well-tested, well-maintained library. For instance:

- Critical complexity: Frameworks like React have proven their mettle for managing complex UI states in large-scale applications.
- Time-to-market: Sometimes, a short-term deliverable calls for a robust, out-of-the-box solution, and it makes sense to bring in a trusted library rather than build everything from scratch.
- Community and maintenance: Popular libraries with long track records and active contributor communities, like D3.js for data visualization, can be safer bets, especially if they’re solving well-understood problems.

The key is to evaluate the cost-benefit ratio:

- Can this be done with native APIs or a small custom script?
- Do I trust this library’s maintainer track record?
- Is it solving a core problem or offering only minor convenience?
- Will my team actually use enough of its features to justify the extra weight?

Strategies for Avoiding Unnecessary Dependencies

To keep your projects lean and maintainable, here are a few best practices:

1. Evaluate Built-In Methods First

You’d be surprised how many tasks modern JavaScript can handle without third-party code. Spend time exploring the newer ES features, such as array methods, Map/Set, async/await, and the Intl API for localization.

2. Document Your Choices

If you do bring in a new library, record your reasoning in a few sentences. State the problem it solves, the alternatives you considered, and any trade-offs. Future maintainers (including your future self) will appreciate the context if questions arise later.

3. Regular Dependency Audits

Re-scan your package.json every quarter or so. Is this library still maintained?
Are you really using its features? Do a small cleanup of the project to remove dead weight and reduce the potential for security flaws.

4. Aggressive Dependency vs. DevDependency Separation

Put build tooling, testing frameworks, and other non-production packages into your devDependencies. Keep your production dependency list lean, limited to what the application really needs at runtime.

The Case for Core Libraries

A team I recently worked with had some advanced charting and visualization requirements. Although a newer charting library promised flashy animations and out-of-the-box UI components, we decided to use D3.js, a stalwart in the data visualization space. The maturity of the library, thorough documentation, and huge community made it a stable foundation for our custom charts. By building directly on top of D3’s fundamentals, we had full control over our final visualizations, avoiding the limitations of less established abstractions.

That mindset of embracing a core, proven library rather than chasing every new offering paid off in performance, maintainability, and peace of mind. Instead of spending time adapting our data to a proprietary system or debugging half-baked features, we could focus on real product needs, confident that D3 would remain stable and well-supported.

Performance Gains

Libraries aren’t just maintenance overhead; they affect your app’s performance too. In one recent project, we reduced the initial bundle size by 60% simply by removing niche libraries and replacing them with native code. The numbers told the story:

- Load time dropped from 3.2s to 1.4s.
- Time to interactive improved by nearly half.
- Memory usage fell by roughly 30%.

These results didn’t come from advanced optimizations but from the simpler act of removing unnecessary dependencies. In an age of ever-growing user expectations, the performance benefits alone can justify a more minimal approach.
Building for the Long Term

Software is never static. Today’s must-have library may turn out to be tomorrow’s orphaned repository. Reliable, stable code tends to come from developers who favor well-understood, minimal solutions over ones that rely too heavily on external, fast-moving packages.

Take authentication, for example: with the hundreds of packages that exist to handle user login flows, rolling a simple system with few dependencies may result in something easier to audit, more transparent, and less subject to churn from external libraries. The code might be a bit more verbose, but it’s also explicit, predictable, and directly under your control.

Teaching and Team Growth

One of the underrated benefits of using fewer libraries is how it fosters stronger problem-solving skills within your team. Having to implement features themselves forces developers to build a deep understanding of core concepts, which pays dividends when debugging, performance tuning, or even evaluating new technologies in the future. Relying too much on abstractions from someone else can stunt that growth and transform capable coders into “framework operators.”

Conclusion

The next time you think about installing yet another trending package, reflect on whether it solves a pressing need or merely offers novelty. As experience has drummed into my head, every new dependency is a long-term commitment. Relying on built-in capabilities, well-tried libraries, and a deep understanding of fundamentals is the way to get lightweight solutions that are secure and easier to maintain.

Ultimately, “boring” but reliable libraries, and sometimes just vanilla JavaScript, tend to stand the test of time better than flashy newcomers. Balancing innovation with pragmatism is the hallmark of a seasoned developer. In an era of endless frameworks and packages, recognizing when you can simply reach for the tools you already have may be the most valuable skill of all.

By Denis Ermakov

How to Split PDF Files into Separate Documents Using Java

Asking our Java file-processing applications to manipulate PDF documents can only increase their value in the long run. PDF is by far the most popular, widely used file type in the world today, and that’s unlikely to change any time soon.

Introduction

In this article, we’ll specifically learn how to divide PDF files into a series of separate PDF documents in Java, resulting in exactly one new PDF per page of the original file, and we’ll discuss open-source and third-party web API options for implementing that programmatic workflow in our code. We’ll start with a high-level overview of how PDF files are structured to make this type of workflow possible.

Distinguishing PDF From Office Open XML File Types

I’ve written a lot about Office Open XML (OOXML) files (e.g., DOCX, XLSX, etc.) in recent months, and it’s worth noting right away that PDF files are extremely different. Where OOXML files are structured as a zip-compressed package of XML files containing document formatting instructions, PDF files use a binary file format that prioritizes layout fidelity over structured data representation and editability. In other words, PDF files care more about the visual appearance of content than its accessibility; we might’ve noticed this for ourselves if we’ve tried to copy and paste information directly from PDF files into another document.

Understanding How PDF Files Manage Individual Pages

Each individual page within a PDF document is organized in a hierarchical section called the Page Tree. Within this tree, each page is represented as its own independent object, and each page object references its own content streams (i.e., how the file should render the page when it’s opened) and resources (i.e., which fonts, images, or other objects the file should use on each page).
Each of these objects is located through a byte offset recorded in the file’s cross-reference table, which tells a PDF reader where in the file to find the object it needs to load. If we’ve spent time looking at any document file structures in the past, this should all sound pretty familiar. What might be less familiar is the path to building a series of new, independent PDF documents using each page object found within a PDF Page Tree.

Creating New PDF Files From Split PDF Pages

The latter stage of our process involves extracting and subsequently cloning PDF page content, which includes retaining the necessary resources (page rendering instructions) and maintaining the right object references (content location instructions) for each PDF page. The API we use to handle this stage of the process will often duplicate shared resources from the original PDF document to avoid issues in the subsequent standalone documents.

Handling this part correctly is crucial to ensure the resulting independent PDF documents contain the correct content; this consideration is one of the many reasons why we (probably) wouldn’t enjoy writing a program to handle this workflow from scratch.

Once each page is successfully cloned, a new PDF document must be created for each page object with a Page Tree that defines only one page, and the result of this process must be serialized. The original PDF metadata object (which includes information like the document title, author, creation date, etc.) may be retained or deleted, depending on the API.

Splitting PDFs With an Open-Source Library

If we’re heading in an open-source API direction for our project, we might’ve already guessed that we’d land on an Apache library. Like most Apache APIs, the Apache PDFBox library is extremely popular thanks to its frequent updates, extensive support, and exhaustive documentation. Apache PDFBox has a utility called PDFSplit, which conveniently facilitates the PDF splitting process.
More specifically, the PDFSplit utility is represented by the Splitter class from the Apache PDFBox library. After we create a Splitter instance in our code (this configures logic for splitting a PDF document), we can call the split() method, which breaks our loaded PDF into a series of independent PDF document objects. Each new PDF document can then be stored independently with the save() method, and when our process is finished, we can invoke the close() method to prevent memory leaks from occurring in our program.

As with any library, we can add Apache PDFBox to our Java project by adding the required dependencies to our pom.xml (for Maven projects) or to our build.gradle (for Gradle projects).

Splitting PDFs With a Web API

One of the challenges we often encounter using open-source APIs for complex file operations is the overhead incurred from local resource usage (i.e., on the machine running the code). When we split larger PDF files, for example, we consume a significant amount of local RAM, CPU, and disk space on our server.

Sometimes, it’s best to issue our file processing action as a web request and ask it to take place on another server entirely. This offloads the bulk of our file processing overhead to another server, distributing the workload more effectively. We could deploy a new server on our own, or we could lean on third-party web API servers with easy accessibility and robust features. This depends entirely on the scope and requirements of our project; we may not have permission to provision a new server or use a third-party service.

We’ll now look at one example of a simple web API request that can offload PDF splitting and document generation on our behalf.

Demonstration

The below solution is free to use and requires an API key in the configuration step.
For a Maven project, we can install it by first adding the below reference to the repositories section of our pom.xml:

XML
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

And then adding the below reference to the dependencies section of our pom.xml:

XML
<dependencies>
    <dependency>
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <version>v4.25</version>
    </dependency>
</dependencies>

Alternatively, for a Gradle project, we’ll add the below in our root build.gradle (at the end of repositories):

Groovy
allprojects {
    repositories {
        ...
        maven { url 'https://jitpack.io' }
    }
}

We’ll then add the following dependency in build.gradle:

Groovy
dependencies {
    implementation 'com.github.Cloudmersive:Cloudmersive.APIClient.Java:v4.25'
}

Next, we’ll place the imports at the top of our file:

Java
import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.*;
import com.cloudmersive.client.SplitDocumentApi;

Then, we’ll add our API key configuration directly after:

Java
ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

Finally, we’ll create an instance of SplitDocumentApi and call the apiInstance.splitDocumentPdfByPage() method with our input PDF file:

Java
SplitDocumentApi apiInstance = new SplitDocumentApi();
File inputFile = new File("/path/to/inputfile"); // File | Input file to perform the operation on.
// Boolean | Set to true to directly return all of the document contents in the DocumentContents field;
// set to false to return contents as temporary URLs (more efficient for large operations). Default is false.
Boolean returnDocumentContents = true;
try {
    SplitPdfResult result = apiInstance.splitDocumentPdfByPage(inputFile, returnDocumentContents);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling SplitDocumentApi#splitDocumentPdfByPage");
    e.printStackTrace();
}

We'll most likely want to keep returnDocumentContents set to true in our code, just like the above example. This specifies that the API will return file byte strings in our response array rather than temporary URLs (which are used to "chain" edits together by referencing modified file content in a cache on the endpoint server). Our try/catch block will print errors (with stack trace) to the console for easy debugging.

In our API response, we can expect an array of new PDF documents. Here's a JSON response model for reference:

JSON
{
  "Successful": true,
  "Documents": [
    {
      "PageNumber": 0,
      "URL": "string",
      "DocumentContents": "string"
    }
  ]
}

And an XML version of the same, if that's more helpful:

XML
<?xml version="1.0" encoding="UTF-8"?>
<SplitPdfResult>
    <Successful>true</Successful>
    <Documents>
        <PageNumber>0</PageNumber>
        <URL>string</URL>
        <DocumentContents>string</DocumentContents>
    </Documents>
</SplitPdfResult>

Conclusion

In this article, we learned about how PDF files are structured, and we focused our attention on the way PDF pages are organized within the PDF file structure. We learned about the high-level steps involved in splitting a PDF file into a series of separate documents, and we then explored two Java options, one open-source library and one third-party web API, to facilitate adding this workflow into our own Java project.
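Whichever option produces the split pages, the last step is usually to write each page out as its own file. The sketch below shows that step using only the Java standard library. SplitPage is a hypothetical holder (a page number plus raw bytes), not a type from PDFBox or the web API client, and the page-N.pdf naming is an assumption for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class SplitPageWriter {

    /** Hypothetical holder for one split page: its page number and raw PDF bytes. */
    record SplitPage(int pageNumber, byte[] contents) {}

    /** Writes each page to outputDir as page-N.pdf and returns the paths written. */
    static List<Path> writePages(List<SplitPage> pages, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        List<Path> written = new ArrayList<>();
        for (SplitPage page : pages) {
            Path target = outputDir.resolve("page-" + page.pageNumber() + ".pdf");
            Files.write(target, page.contents());
            written.add(target);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes stand in for real PDF content from a split operation
        List<SplitPage> pages = List.of(
                new SplitPage(1, "placeholder page 1".getBytes(StandardCharsets.US_ASCII)),
                new SplitPage(2, "placeholder page 2".getBytes(StandardCharsets.US_ASCII)));
        Path dir = Files.createTempDirectory("split-pdf");
        writePages(pages, dir).forEach(System.out::println);
    }
}
```

In a real project, the pages list would be filled from the Documents array in the response, after converting each DocumentContents value to bytes in whatever way the client library exposes it.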

By Brian O'Neill

CORE

Integrating AI With Spring Boot: A Beginner’s Guide

Do you need to integrate artificial intelligence into your Spring Boot application? Spring AI reduces complexity by using the abstractions you are used to applying within Spring Boot. Let’s dive into the basics in this blog post. Enjoy!

Introduction

Artificial intelligence is not a Python-only party anymore. LangChain4j basically opened the Java toolbox for integrating with AI. Spring AI is the Spring solution for AI integration. It tries to reduce the complexity of integrating AI within a Java application, just like LangChain4j does. The difference is that you can use the same abstractions you are used to applying within Spring Boot. At the time of writing, only a milestone release is available, but the first General Availability (GA) release is expected within a matter of months. In this blog, basic functionality will be demonstrated, mainly based on the official documentation of Spring AI. So, do check out the official documentation alongside this blog. Sources used in this blog are available on GitHub.

Prerequisites

Prerequisites for reading this blog are:
Basic knowledge of Java;
Basic knowledge of Spring Boot;
Basic knowledge of large language models (LLMs).

Project Setup

Navigate to Spring Initializr and add the Ollama and Spring Web dependencies. Spring Web will be used to expose REST endpoints; Ollama will be used as the LLM provider. An LLM provider is used to run an LLM. Take a look at the pom and notice that the spring-ai-ollama-spring-boot-starter dependency and the spring-ai-bom have been added. As mentioned before, Spring AI is still a milestone release; therefore, the Spring Milestones repository needs to be added.

XML
<properties>
    ...
    <spring-ai.version>1.0.0-M5</spring-ai.version>
</properties>

<dependencies>
    ...
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    </dependency>
    ...
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>

As an LLM provider, Ollama will be used. Install it according to the installation instructions and install a model. In this blog, Llama 3.2 will be used as a model. Install and run the model with the following command:

Shell
$ ollama run llama3.2

Detailed information about the Ollama commands can be found on the GitHub page. The Spring Boot Ollama Starter comes with some defaults. By default, the Mistral model is configured. The default can be changed in the application.properties file.

Properties files
spring.ai.ollama.chat.options.model=llama3.2

Chat Responses

1. String Content

Take a look at the following code snippet, which takes a message as an input parameter, sends it to Ollama, and returns the response.
A preconfigured ChatClient.Builder instance is injected in the constructor of MyController.
A chatClient is constructed.
The prompt method is used to start creating the prompt.
The user message is added.
The call method sends the prompt to Ollama.
The content method returns the response as a String.

Java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
class MyController {

    private final ChatClient chatClient;

    // Spring Boot injects a preconfigured builder for the Ollama model.
    public MyController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    @GetMapping("/basic")
    String basic(@RequestParam String message) {
        return this.chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}

Run the application.

Shell
$ mvn spring-boot:run

Invoke the endpoint with the message 'tell me a joke.' A joke is returned.
Shell
$ curl "http://localhost:8080/basic?message=tell%20me%20a%20joke"
Here's one: What do you call a fake noodle? An impasta.

2. ChatResponse

Instead of just returning the response text from Ollama, it is also possible to retrieve a ChatResponse object, which contains some metadata, such as the number of input tokens (a token is a part of a word) and the number of output tokens (the number of tokens in the response). This might be interesting if you are using cloud models because these charge you based on the number of tokens.

Java
@GetMapping("/chatresponse")
String chatResponse(@RequestParam String message) {
    ChatResponse chatResponse = this.chatClient.prompt()
            .user(message)
            .call()
            .chatResponse();
    return chatResponse.toString();
}

Run the application and invoke the endpoint.

Shell
$ curl "http://localhost:8080/chatresponse?message=tell%20me%20a%20joke"
ChatResponse [metadata={ id: , usage: { promptTokens: 29, generationTokens: 14, totalTokens: 43 }, rateLimit: org.springframework.ai.chat.metadata.EmptyRateLimit@c069511 }, generations=[Generation [assistantMessage= AssistantMessage [ messageType=ASSISTANT, toolCalls=[], textContent=Why don't scientists trust atoms? Because they make up everything., metadata={messageType=ASSISTANT} ], chatGenerationMetadata= ChatGenerationMetadata{finishReason=stop,contentFilterMetadata=null} ] ] ]

Besides the metadata, the ChatResponse also holds the response itself, which is of type ASSISTANT because the model created this message.

3. Entity Response

You might want to process the response of the LLM in your application. In this case, it is very convenient when the response is returned as a Java object instead of having to parse it yourself. This can be done by means of the entity method. Create a record ArtistSongs that contains the artist's name and a list of songs. When invoking the entity method, you specify that you want the response to be returned as an ArtistSongs record.
Java @GetMapping("/entityresponse") String entityResponse() { ArtistSongs artistSongs = this.chatClient.prompt() .user("Generate a list of songs of Bruce Springsteen. Limit the list to 10 songs.") .call() .entity(ArtistSongs.class); return artistSongs.toString(); } record ArtistSongs(String artist, List<String> songs) {} Run the application and invoke the endpoint. Shell $ curl "http://localhost:8080/entityresponse" ArtistSongs[artist=Bruce Springsteen, songs=[Born to Run, Thunder Road, Dancing in the Dark, Hungry Heart, Jungleland, The River, Devil's Arcade, Badlands, Sherry Darling, Rosalita (Come Out Tonight)]] However, sometimes the response is empty. It might be that the model does not return any songs at all and that the prompt should be made more specific (e.g., at least one song). Shell $ curl "http://localhost:8080/entityresponse" ArtistSongs[artist=null, songs=null] You might run into an exception when you invoke the endpoint many times. The cause has not been investigated, but it seems that Spring AI adds instructions to the model in order to return the response as a JSON object so that it can be converted easily to a Java object. Sometimes, the model might not return valid JSON. 
Plain Text 2024-11-16T13:01:06.980+01:00 ERROR 21595 --- [MySpringAiPlanet] [nio-8080-exec-1] o.s.ai.converter.BeanOutputConverter : Could not parse the given text to the desired target type:{ "artist": "Bruce Springsteen", "songs": [ "Born in the U.S.A.", "Thunder Road", "Dancing in the Dark", "Death to My Hometown", "The River", "Badlands", "Jungleland", "Streets of Philadelphia", "Born to Run", "Darkness on the Edge of Town" into org.springframework.ai.converter.BeanOutputConverter$CustomizedTypeReference@75a77425 2024-11-16T13:01:06.981+01:00 ERROR 21595 --- [MySpringAiPlanet] [nio-8080-exec-1] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed: java.lang.RuntimeException: com.fasterxml.jackson.databind.JsonMappingException: Unexpected end-of-input: expected close marker for Array (start marker at [Source: REDACTED (`StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION` disabled); line: 3, column: 12]) at [Source: REDACTED (`StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION` disabled); line: 13, column: 35] (through reference chain: com.mydeveloperplanet.myspringaiplanet.MyController$ArtistSongs["songs"]->java.util.ArrayList[10])] with root cause com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Array (start marker at [Source: REDACTED (`StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION` disabled); line: 3, column: 12]) at [Source: REDACTED (`StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION` disabled); line: 13, column: 35] at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:585) ~[jackson-core-2.17.2.jar:2.17.2] at com.fasterxml.jackson.core.base.ParserBase._handleEOF(ParserBase.java:535) ~[jackson-core-2.17.2.jar:2.17.2] at com.fasterxml.jackson.core.base.ParserBase._eofAsNextChar(ParserBase.java:552) ~[jackson-core-2.17.2.jar:2.17.2] at 
com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd(ReaderBasedJsonParser.java:2491) ~[jackson-core-2.17.2.jar:2.17.2] at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:673) ~[jackson-core-2.17.2.jar:2.17.2] at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextTextValue(ReaderBasedJsonParser.java:1217) ~[jackson-core-2.17.2.jar:2.17.2] at com.fasterxml.jackson.databind.deser.std.StringCollectionDeserializer.deserialize(StringCollectionDeserializer.java:203) ~[jackson-databind-2.17.2.jar:2.17.2] at com.fasterxml.jackson.databind.deser.std.StringCollectionDeserializer.deserialize(StringCollectionDeserializer.java:184) ~[jackson-databind-2.17.2.jar:2.17.2] at com.fasterxml.jackson.databind.deser.std.StringCollectionDeserializer.deserialize(StringCollectionDeserializer.java:27) ~[jackson-databind-2.17.2.jar:2.17.2] at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:545) ~[jackson-databind-2.17.2.jar:2.17.2] at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:576) ~[jackson-databind-2.17.2.jar:2.17.2] at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:446) ~[jackson-databind-2.17.2.jar:2.17.2] at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1493) ~[jackson-databind-2.17.2.jar:2.17.2] at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:348) ~[jackson-databind-2.17.2.jar:2.17.2] at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:185) ~[jackson-databind-2.17.2.jar:2.17.2] at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:342) ~[jackson-databind-2.17.2.jar:2.17.2] at 
com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4905) ~[jackson-databind-2.17.2.jar:2.17.2] at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3848) ~[jackson-databind-2.17.2.jar:2.17.2] at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3831) ~[jackson-databind-2.17.2.jar:2.17.2] at org.springframework.ai.converter.BeanOutputConverter.convert(BeanOutputConverter.java:191) ~[spring-ai-core-1.0.0-M3.jar:1.0.0-M3] at org.springframework.ai.converter.BeanOutputConverter.convert(BeanOutputConverter.java:58) ~[spring-ai-core-1.0.0-M3.jar:1.0.0-M3] at org.springframework.ai.chat.client.DefaultChatClient$DefaultCallResponseSpec.doSingleWithBeanOutputConverter(DefaultChatClient.java:349) ~[spring-ai-core-1.0.0-M3.jar:1.0.0-M3] at org.springframework.ai.chat.client.DefaultChatClient$DefaultCallResponseSpec.entity(DefaultChatClient.java:355) ~[spring-ai-core-1.0.0-M3.jar:1.0.0-M3] at com.mydeveloperplanet.myspringaiplanet.MyController.entityResponse(MyController.java:54) ~[classes/:na] at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[na:na] at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[na:na] at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:255) ~[spring-web-6.1.14.jar:6.1.14] at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:188) ~[spring-web-6.1.14.jar:6.1.14] at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118) ~[spring-webmvc-6.1.14.jar:6.1.14] at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:926) ~[spring-webmvc-6.1.14.jar:6.1.14] at 
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:831) ~[spring-webmvc-6.1.14.jar:6.1.14] at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87) ~[spring-webmvc-6.1.14.jar:6.1.14] at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1089) ~[spring-webmvc-6.1.14.jar:6.1.14] at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:979) ~[spring-webmvc-6.1.14.jar:6.1.14] at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014) ~[spring-webmvc-6.1.14.jar:6.1.14] at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:903) ~[spring-webmvc-6.1.14.jar:6.1.14] at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:564) ~[tomcat-embed-core-10.1.31.jar:6.0] at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885) ~[spring-webmvc-6.1.14.jar:6.1.14] at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:658) ~[tomcat-embed-core-10.1.31.jar:6.0] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:195) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:140) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51) ~[tomcat-embed-websocket-10.1.31.jar:10.1.31] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:164) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:140) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100) ~[spring-web-6.1.14.jar:6.1.14] at 
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) ~[spring-web-6.1.14.jar:6.1.14] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:164) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:140) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93) ~[spring-web-6.1.14.jar:6.1.14] at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) ~[spring-web-6.1.14.jar:6.1.14] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:164) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:140) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201) ~[spring-web-6.1.14.jar:6.1.14] at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) ~[spring-web-6.1.14.jar:6.1.14] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:164) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:140) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:167) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:483) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:115) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:344) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:384) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:905) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1741) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1190) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63) ~[tomcat-embed-core-10.1.31.jar:10.1.31] at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na] Stream Responses When a large response is returned, it is better to stream the response in order that the user can start reading already instead of waiting for the full response to be returned. The only thing you need to do is to replace the call method with stream. The response will be a Flux. Java @GetMapping("/stream") Flux<String> stream(@RequestParam String message) { return this.chatClient.prompt() .user(message) .stream() .content(); } Run the application and invoke the endpoint. 
The result is that characters are displayed one after another. Shell $ curl "http://localhost:8080/stream?message=tell%20me%20a%20joke" Here's one: What do you call a fake noodle? An impasta. System Message A system message is used to instruct the LLM on how it should behave. This can be done by invoking the system method. In the code snippet, the LLM is instructed to use quotes from the movie The Terminator in the response. Java @GetMapping("/system") String system() { return this.chatClient.prompt() .system("You are a chat bot who uses quotes of The Terminator when responding.") .user("Who is Bruce Springsteen?") .call() .content(); } Run the application and invoke the endpoint. The response contains random quotes. Shell $ curl "http://localhost:8080/system" "Hasta la vista, baby!" Just kidding, I think you meant to ask about the Boss himself, Bruce Springsteen! He's a legendary American singer-songwriter and musician known for his heartland rock style and iconic songs like "Born to Run," "Thunder Road," and many more. A true "I'll be back" kind of artist, with a career spanning over 40 years and countless hits that have made him one of the most beloved musicians of all time! The system message can be applied to the ChatClient.Builder itself, besides other more general options. This way, you only need to add this once, or you can create your own defaults and override it when necessary. Chat Memory You prompt an LLM, and it responds. Based on this response, you prompt again. However, the LLM will not know anything about the previous prompt and response. Let’s use the first endpoint and prompt your name to the LLM (my name is Gunter). After this, you ask the LLM your name (what is my name). Shell $ curl "http://localhost:8080/basic?message=my%20name%20is%20Gunter" Hallo Gunter! It's nice to meet you. Is there something I can help you with, or would you like to chat? 
$ curl "http://localhost:8080/basic?message=what%20is%20my%20name" I don't have any information about your name. This conversation just started, and I don't have any prior knowledge or context to know your name. Would you like to share it with me? As you can see, the LLM does not remember the information that was given before. In order to solve this, you need to add chat memory to consecutive prompts. In Spring AI, this can be done by adding advisors. In the code snippet below, an in-memory chat memory is used, but you are also able to persist it to Cassandra if needed. Java private final InMemoryChatMemory chatMemory = new InMemoryChatMemory(); @GetMapping("/chatmemory") String chatMemory(@RequestParam String message) { return this.chatClient.prompt() .advisors(new MessageChatMemoryAdvisor(chatMemory)) .user(message) .call() .content(); } Run the application and invoke this new endpoint. The LLM does know your name now. Shell $ curl "http://localhost:8080/chatmemory?message=my%20name%20is%20Gunter" Hallo Gunter! It's nice to meet you. Is there something I can help you with, or would you like to chat for a bit? $ curl "http://localhost:8080/chatmemory?message=what%20is%20my%20name" Your name is Gunter. You told me that earlier! Is there anything else you'd like to talk about? Prompt Templates Prompt templating allows you to create a template prompt and fill in some parameters. This is quite useful because creating a good prompt can be quite challenging, and you do not want to bother your users with it. The following code snippet shows how to create a PromptTemplate and how to add the parameters to the template by adding key-value pairs. 
Java
@GetMapping("/promptwhois")
String promptWhoIs(@RequestParam String name) {
    PromptTemplate promptTemplate = new PromptTemplate("Who is {name}");
    Prompt prompt = promptTemplate.create(Map.of("name", name));
    return this.chatClient.prompt(prompt)
            .call()
            .content();
}

Run the application and invoke the endpoint with different parameters.

Shell
$ curl "http://localhost:8080/promptwhois?name=Bruce%20Springsteen"
Bruce Springsteen (born September 23, 1949) is an American singer-songwriter and musician. He is one of the most influential and iconic figures in popular music, known for his heartland rock style and poignant lyrics. ...
$ curl "http://localhost:8080/promptwhois?name=Arnold%20Schwarzenegger"
Arnold Schwarzenegger is a world-renowned Austrian-born American actor, filmmaker, entrepreneur, and former politician. He is one of the most successful and iconic figures in the entertainment industry. ...

You can apply this to system messages as well. In the next code snippet, you instruct the LLM to add quotes from a certain movie to the response. The user message is fixed in this example. By means of a Prompt object, you create a list of the system and user messages and pass it to the prompt method.

Java
@GetMapping("/promptmessages")
String promptMessages(@RequestParam String movie) {
    Message userMessage = new UserMessage("Tell me a joke");
    SystemPromptTemplate systemPromptTemplate =
            new SystemPromptTemplate("You are a chat bot who uses quotes of {movie} when responding.");
    Message systemMessage = systemPromptTemplate.createMessage(Map.of("movie", movie));
    Prompt prompt = new Prompt(List.of(systemMessage, userMessage));
    return this.chatClient.prompt(prompt)
            .call()
            .content();
}

Run the application and invoke the endpoint using different movies like The Terminator and Die Hard.

Shell
$ curl "http://localhost:8080/promptmessages?movie=The%20Terminator"
"Hasta la vista, baby... to your expectations! Why couldn't the bicycle stand up by itself? Because it was two-tired!"
$ curl "http://localhost:8080/promptmessages?movie=Die%20Hard"
"Yippee ki yay, joke time!" Here's one: Why did the scarecrow win an award? Because he was outstanding in his field! (get it?) "Now we're talking!" Hope that made you smile!

Conclusion

Spring AI already offers quite a bit of functionality for interacting with artificial intelligence systems. It is easy to use and relies on familiar Spring abstractions. In this blog, the basics were covered. Now, it is time to experiment with more complex use cases!
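The intermittent invalid-JSON failure described under Entity Response can be softened by retrying the call a few times. The sketch below is plain Java and not part of Spring AI: `callWithRetry` and `maxAttempts` are hypothetical names, and the supplier stands in for a call such as the `entity(ArtistSongs.class)` chain shown earlier. It relies on the failure surfacing as a RuntimeException, as it does in the stack trace above.

```java
import java.util.function.Supplier;

public class EntityRetry {

    // Invokes the call until it returns without throwing, at most maxAttempts times.
    // If every attempt fails, the last exception is rethrown.
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // e.g. a JSON mapping failure wrapped in a RuntimeException
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated model call that fails twice before returning a value.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("invalid JSON");
            }
            return "ArtistSongs[...]";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Note that this only covers exceptions. An empty result such as ArtistSongs[artist=null, songs=null] returns normally, so it would need an explicit check if it should also trigger a retry.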

By Gunter Rotsaert

CORE

AWS Lambda Enhances Local IDE Experience With AI Support

AWS Lambda is enhancing the local IDE experience to make developing Lambda-based applications more efficient. These new features enable developers to author, build, debug, test, and deploy Lambda applications seamlessly within their local IDE using Visual Studio Code (VS Code).

Overview

The improved IDE experience is part of the AWS Toolkit for Visual Studio Code. It includes a guided setup walkthrough that helps developers configure their local environment and install necessary tools. The toolkit also includes sample applications that demonstrate how to iterate on your code both locally and in the cloud. Developers can save and configure build settings to accelerate application builds and generate configuration files for setting up a debugging environment.

With these enhancements, you can sync local code changes quickly to the cloud or perform full application deployments, enabling faster iteration. You can test functions locally or in the cloud and create reusable test events to streamline the testing process. The toolkit also provides quick action buttons for building, deploying, and invoking functions locally or remotely. Additionally, it integrates with AWS Infrastructure Composer, allowing for a visual application-building experience directly within the IDE.

If you have worked with AWS Lambda, you will have found that its built-in code editor is not developer-friendly and has a poor UI. It is hard to make code changes and test them from that editor. On top of that, if you don't want to use AWS-based CI/CD services, automated deployment can be a bit challenging for a developer. You can use Terraform or GitHub Actions, but AWS now offers another option for deploying and testing Lambda code.

Considering these challenges, AWS Lambda recently announced the VS Code integration feature, which is part of the AWS Toolkit. It makes it easier for developers to push, build, test, and deploy code. Although it still has a 50 MB code size restriction, it now provides an IDE experience similar to what VS Code is like on your local host. This includes dependency installation through extensions, a split-screen layout, writing code and running test events without opening new windows, and live logs from CloudWatch for efficient debugging. In addition, Amazon Q in the console can be used as a coding assistant, similar to a copilot. This provides a better developer experience.

To start using VS Code for AWS Lambda:

1. You should have VS Code installed locally. After that, install the AWS Toolkit from the marketplace. The marketplace webpage will redirect to VS Code and open the extension's tab, where you can go ahead and install it.

2. After installing the AWS Toolkit, you will see the AWS logo on the left sidebar under extensions. Click on that.

3. Now, select the option to connect with your AWS account.

4. After a successful connection, you will get a tab to invoke the Lambda function locally. As you can see below, this option requires AWS SAM to be installed to invoke Lambda locally. After login, it will also pull all your Lambda functions from your AWS account. If you want to update those, you can right-click on the Lambda function and select Upload Lambda. It will ask you for the zip file of the Lambda function. Alternatively, you can select samples from the explorer option in the left sidebar. If you want to go with remote invoke, you can click on any of the Lambda functions visible to you in the sidebar.

5. If you want to create your own Lambda function and test the integration, you can click on the Application Builder option and select AWS CLI or SAM. If you want to deploy the Lambda code to the AWS account, you can select the last option, as shown in the above screenshot. After that, if you are not yet logged into your AWS account, you will be asked to log in. Then, it will let you deploy the code. This way, you can easily deploy Lambda code from your IDE, which can be convenient for developer testing.

Conclusion

Lambda is enhancing the local development experience for Lambda-based applications by integrating with the VS Code IDE and AWS Toolkit. This upgrade simplifies the code-test-deploy-debug workflow. A step-by-step walkthrough helps you set up your local environment and explore Lambda functionality through sample applications. With intuitive icon shortcuts and the Command Palette, you can build, debug, test, and deploy Lambda applications seamlessly, enabling faster iteration without the need to switch between multiple tools.

By Swati Tyagi

The Evolution of User Authentication With Generative AI

Remember when you had to squint at wonky text or click on traffic lights to prove you're human? Those classic CAPTCHAs are being rendered obsolete by the day. As artificial intelligence improves, these once-reliable gatekeepers let automated systems through. That poses a challenge — and an opportunity — for developers to think again about how they verify human users. What’s Wrong With Traditional CAPTCHAs? Traditional CAPTCHAs have additional problems besides becoming increasingly ineffective against AI. Modern users expect seamless experiences, and presenting them with puzzles creates serious friction in their flow. Even more so, these systems introduce real accessibility challenges for users with visual or cognitive disabilities [1]. Recent research shows that traditional text-based CAPTCHAs can be solved with up to 99% accuracy using modern AI systems. Worse still, image recognition tasks — such as recognizing crosswalks and traffic lights — are trivial for state-of-the-art computer vision systems [2]. Why Has User Authentication Remained So Stagnant? The challenges are numerous and complex, but they also present an exciting opportunity for us as developers to innovate and adapt. The architecture of modern authentication systems is shifting from explicit challenges (a.k.a. "prove you’re human") to implicit verification ("we can tell you're human by how you interact"). Increases in underlying heuristics are driving higher and higher levels of frictionless, implicit authentication, marking a paradigm shift in our thinking about authentication [3]. The new systems function based on three essential qualities: User interactivity: They observe how users interact organically with websites and applications. 
A human's mouse, keyboard, and scroll behavior is distinctive and hard for machines to replicate with full fidelity.
Analysis of context: They process the context of every interaction, including when and how users access services, which devices they use, and their general behavior patterns.
Adaptive security: The level of security changes with the risk factors involved. Instead of applying the same checks to everyone, these systems step up security measures when something seems suspicious while remaining almost invisible to legitimate users.

The AI Challenge: Claude's Computer Use

Recent developments in AI, including Anthropic's Claude 3.5 Sonnet, have further complicated the authentication landscape. Claude can now, in many ways, independently take control of a user's computer and browse the internet, doing things like building websites or planning vacations [4]. This adds yet another layer of difficulty in distinguishing humans from machines. While it opens up exciting automation possibilities, it also calls for more advanced authentication mechanisms to prevent AI impersonation [5].

CAPTCHA in the Age of Generative AI

Traditional CAPTCHA systems are becoming less effective as generative AI improves. Builders of customer-facing products must evolve their authentication frameworks to stay ahead in the arms race against sophisticated bots without compromising the user experience. Here's a way to approach this challenge in the GenAI era:

1. Adopt Multi-Layered Authentication

That means not relying only on visual or text-based challenges but taking a multi-faceted approach:

Behavioral analysis: Use AI to analyze how users interact with the application (e.g., mouse movement, typing patterns, and more) [6].
Contextual verification: Assess device data, access patterns, and historical data [6].
Adaptive security: Adjust the security response in real time based on risk [6].

2. Focus on the User Experience

Reduce friction in the authentication process:

Work towards invisible authentication methods that operate behind the scenes [7].
Make any challenges that remain necessary fast and intuitive to solve.
Offer accessible alternatives for users with disabilities [8].

3. Use Powerful AI Techniques

Protect yourself from malicious AI by using advanced technologies:

Deploy machine learning models to distinguish human responses from AI-generated ones [9].
Use federated learning to improve detection without compromising user anonymity.
Investigate the use of adversarial examples to fool AI-based CAPTCHA solvers [9].

4. Institute Continuous Monitoring and Adjustment

The AI landscape is changing quickly, and we need to stay on guard:

Continuously evaluate the strength of your authentication mechanism in light of emerging AI advancements [10].
Invest in real-time monitoring and in threat detection and response systems.
Be ready to deploy updates and patches at short notice as new vulnerabilities come to light.

5. Explore Other Authentication Options

Go beyond conventional CAPTCHA systems:

Investigate biometric authentication (e.g., fingerprint or facial recognition) [9].
Use risk-based authentication that only presents a challenge on suspicious activity.
Employ proof-of-work systems that are computationally expensive for bots but inexpensive for real users [7].

6. Maintain Transparency and User Trust

As authentication systems get more complex, it's essential to maintain users' trust:

Inform users about your security practices.
Give users visibility into, and control over, how their data is used during authentication.
Stay compliant with privacy laws such as GDPR and CCPA [7].

Product builders can use this framework to create robust authentication mechanisms that defend against AI-enabled attacks without hindering the user experience.
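The proof-of-work and risk-based ideas from the list above can be combined in a few lines. What follows is only a minimal Python sketch of a hashcash-style puzzle, not something taken from the cited sources: the function names, the use of SHA-256, and the mapping from a risk score to a number of leading zero bits are all illustrative assumptions.

```python
import hashlib
import itertools


def difficulty_for_risk(risk_score: float) -> int:
    """Map a risk score in [0, 1] to a required number of leading zero bits.

    The 4..20 bit range is an arbitrary, illustrative choice.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    # Low-risk sessions get a near-free puzzle; suspicious ones a costly one.
    return 4 + int(risk_score * 16)


def _leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge: str, nonce: int, bits: int) -> bool:
    """Server side: a single hash is enough to check a submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return _leading_zero_bits(digest) >= bits


def solve(challenge: str, bits: int) -> int:
    """Client side: brute-force a nonce; expected work doubles per extra bit."""
    for nonce in itertools.count():
        if verify(challenge, nonce, bits):
            return nonce
```

A session scored at 0.25 would need `solve("session-123", difficulty_for_risk(0.25))`, which takes a few hundred hashes on average, while a session scored at 1.0 would need roughly a million. Verification costs one hash in both cases, which is the asymmetry the approach relies on.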
The aim is not so much to make authentication impossible for AI altogether as to make it considerably more expensive for bots than for legitimate users.

Thinking Creatively: Challenging the AI Gods

To address these issues, researchers are proposing new CAPTCHA ideas. Recently, a group from UCSF introduced creative solutions that exploit aspects of human cognition that contemporary AI models cannot yet reproduce [11]. Their approach includes:

Logical reasoning challenges: Problems that require human-like logical reasoning, which data-driven algorithms may struggle to solve quickly.
Dynamic challenge generation: Unique CAPTCHAs generated on the fly that are hard for AI systems to learn or predict.
After-image visual patterns: Challenges involving visual perception of time-based movements and patterns, which are beyond the current capabilities of AI that processes static images.
Scalable complexity: Puzzles of increasing difficulty, from simple image-selection challenges to more complicated ones that require pattern detection.

These methods are designed to provide a more robust defense against AI imitation while remaining accessible to human users. As AI capabilities advance, such solutions will become necessary to preserve the integrity of user authentication.

The Future of Authentication

As we look ahead, a few trends will define the future of authentication. To keep access both convenient and secure, authentication is becoming more tailored to individual user behavior, within acceptable privacy limits [12]. Integration with existing identity systems continues to become more seamless, minimizing the need for separate authentication steps. Meanwhile, both authentication systems and attackers keep developing new approaches using machine learning, which creates a continuous arms race.
This migration away from traditional CAPTCHAs is a big step forward in verifying user identity. We can move to more advanced, intuitive approaches and devise systems that are both more secure and more pleasant to use [13]. Soon, the challenge of authentication may no longer be about making humans solve puzzles but about designing intelligent systems that can recognize human behavior while safeguarding the privacy and security of individuals. Understanding and adopting these practices now allows us to build better, more secure applications for everyone.

Disclaimer: The views and opinions expressed in this article are those of the authors solely and do not reflect the official policy or position of any institution, employer, or organization with which the authors may be affiliated.

By Abhai Pratap Singh

Get Started With Vector Search in Azure Cosmos DB

This is a guide for folks who are looking for a way to quickly and easily try out the Vector Search feature in Azure Cosmos DB for NoSQL. This app uses a simple dataset of movies to find similar movies based on given criteria. It's implemented in four languages: Python, TypeScript, .NET, and Java. There are instructions that walk you through the process of setting things up, loading data, and then executing similarity search queries.

A vector database is designed to store and manage vector embeddings, which are mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, and tens of thousands of dimensions might be used to represent data. A vector's position in this space represents its characteristics. Words, phrases, or entire documents, as well as images, audio, and other types of data, can all be vectorized. These vector embeddings are used in similarity search, multi-modal search, recommendation engines, large language models (LLMs), etc.

Prerequisites

You will need:

An Azure subscription. If you don't have one, you can create a free Azure account. If, for some reason, you cannot create an Azure subscription, try Azure Cosmos DB for NoSQL free.
Once that's done, go ahead and create an Azure Cosmos DB for NoSQL account.
Create an Azure OpenAI Service resource. Azure OpenAI Service provides access to OpenAI's models, including GPT-4o and GPT-4o mini (and more), as well as embedding models. In this example, we will use the text-embedding-ada-002 embedding model. Deploy this model using the Azure AI Foundry portal.

I am assuming you already have the required programming language set up. To run the Java example, you need to have Maven installed (most likely you do, but I wanted to call it out).

Configure Integrated Vector Database in Azure Cosmos DB for NoSQL

Before you start loading data, make sure to configure the vector database in Azure Cosmos DB.
Enable the Feature

This is a one-time operation: you will need to explicitly enable the vector indexing and search feature.

Create a Database and Container

Once you have done that, go ahead and create a database and container. I created a database named movies_db and a container named movies with the partition key set to /id.

Create Policies

You will need to configure a vector embedding policy as well as an indexing policy for the container. For now, you can do it manually via the Azure portal (it's possible to do it programmatically as well) as part of the container creation process. Use the following policy settings, at least for this sample app:

Choice of index type: Note that I have chosen the diskANN index type and a dimension of 1536 for the vector embeddings. The embedding model I chose was text-embedding-ada-002, which supports a dimension size of 1536. I would recommend that you stick to these values for running this sample app. You can change the index type, but you may then need to change the embedding model so that its dimension fits what the chosen index type supports.

Alright, let's move on.

Load Data in Azure Cosmos DB

To keep things simple, I have a small dataset of movies in JSON format (in the movies.json file). The process is straightforward:

Read the movie data from the JSON file,
Generate vector embeddings (of the movie description), and
Insert the complete data (title, description, and embeddings) into the Azure Cosmos DB container.

As promised, here are the language-specific instructions; refer to the one that's relevant to you.
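For reference before loading any data, the two container policies described above could be written out roughly as follows. This is a hedged sketch that only builds the policy documents as Python dictionaries. The /embeddings path is inferred from the c.embeddings field used in the search query, cosine is the distance function mentioned in the closing notes, and the float32 data type and the exclusion of the vector path from regular indexing are assumptions rather than settings confirmed by the article.

```python
import json

EMBEDDING_PATH = "/embeddings"  # assumed: matches c.embeddings in the search query
DIMENSIONS = 1536               # output size of text-embedding-ada-002


def build_vector_embedding_policy(path=EMBEDDING_PATH, dimensions=DIMENSIONS):
    """Describe where the vectors live and how distance is computed."""
    return {
        "vectorEmbeddings": [
            {
                "path": path,
                "dataType": "float32",         # assumption, not stated in the article
                "distanceFunction": "cosine",  # per the closing notes
                "dimensions": dimensions,
            }
        ]
    }


def build_indexing_policy(path=EMBEDDING_PATH, index_type="diskANN"):
    """Describe the vector index; other types are flat and quantizedFlat."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        # Assumption: keep the raw vector out of the regular (non-vector) index.
        "excludedPaths": [{"path": f"{path}/*"}],
        "vectorIndexes": [{"path": path, "type": index_type}],
    }


def policies_as_json() -> str:
    """Render both policies for pasting into a portal or script."""
    return json.dumps(
        {
            "vectorEmbeddingPolicy": build_vector_embedding_policy(),
            "indexingPolicy": build_indexing_policy(),
        },
        indent=2,
    )
```

The path in the embedding policy and the path in `vectorIndexes` have to agree, which is why both builders take it from the same constant.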
Irrespective of the language, you need to set the following environment variables:

Plain Text
export COSMOS_DB_CONNECTION_STRING=""
export DATABASE_NAME=""
export CONTAINER_NAME=""
export AZURE_OPENAI_ENDPOINT=""
export AZURE_OPENAI_KEY=""
export AZURE_OPENAI_VERSION="2024-10-21"
export EMBEDDINGS_MODEL="text-embedding-ada-002"

Before moving on, don't forget to clone this repository:

Plain Text
git clone https://github.com/abhirockzz/cosmosdb-vector-search-python-typescript-java-dotnet
cd cosmosdb-vector-search-python-typescript-java-dotnet

Load Vector Data Using Python SDK for Azure Cosmos DB

Set up the Python environment and install the required dependencies:

Shell
cd python
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

To load the data, run the following command:

Shell
python load.py

Load Vector Data Using TypeScript SDK for Azure Cosmos DB

Install the required dependencies:

Plain Text
cd typescript
npm install

Build the program and then load the data:

Plain Text
npm run build
npm run load

Load Vector Data Using Java SDK for Azure Cosmos DB

Install dependencies and build the application:

Plain Text
cd java
mvn clean install

Load the data:

Plain Text
java -jar target/cosmosdb-java-vector-search-1.0-SNAPSHOT.jar load

Load Vector Data Using .NET SDK for Azure Cosmos DB

Install dependencies and load the data:

Plain Text
cd dotnet
dotnet restore
dotnet run load

Irrespective of the language, you should see output similar to this (with slight differences):

Plain Text
database and container ready....
Generated description embedding for movie: The Matrix
Added data to Cosmos DB for movie: The Matrix
....

Verify Data in Azure Cosmos DB

Check the data in the Azure portal. You can also use the Visual Studio Code extension, which is pretty handy!

Let's move on to the search part!
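Whichever SDK you pick, the load programs follow the same three steps listed earlier: read, embed, insert. The Python sketch below shows that shape without calling Azure at all. It is not the repository's load.py: the layout of movies.json (a JSON array of objects with title and description), the generated id values, and the embed and upsert callables are all assumptions made for illustration, and fake_embed is a placeholder rather than a real embedding model.

```python
import json
import uuid
from typing import Callable, Iterable


def build_documents(movies: Iterable[dict], embed: Callable[[str], list]) -> list:
    """Attach an id and an embedding of the description to every movie."""
    documents = []
    for movie in movies:
        documents.append(
            {
                "id": str(uuid.uuid4()),  # the container is partitioned on /id
                "title": movie["title"],
                "description": movie["description"],
                "embeddings": embed(movie["description"]),
            }
        )
    return documents


def load(movies_json: str, embed: Callable[[str], list], upsert: Callable[[dict], None]) -> int:
    """Read movies, embed them, and hand each document to the upsert callable."""
    movies = json.loads(movies_json)
    documents = build_documents(movies, embed)
    for document in documents:
        upsert(document)
    return len(documents)


def fake_embed(text: str) -> list:
    """Placeholder for the embedding model call; NOT a meaningful embedding."""
    return [float(len(text)), float(text.count(" "))]
```

In a real run, `embed` would wrap a call to the deployed text-embedding-ada-002 model and `upsert` would write to the Cosmos DB container; passing them in as callables keeps the three steps visible and lets the flow be exercised with an in-memory list.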
Vector/Similarity Search

The search component queries the Azure Cosmos DB container to find similar movies based on given search criteria. For example, you can search for comedy movies. This is done using the VectorDistance function to get the similarity score between two vectors. Again, the process is quite simple:

Generate a vector embedding for the search criteria, and
Use the VectorDistance function to compare it with the stored movie embeddings.

This is what the query looks like:

Plain Text
SELECT TOP @num_results c.id, c.description, VectorDistance(c.embeddings, @embedding) AS similarityScore FROM c ORDER BY VectorDistance(c.embeddings, @embedding)

Just like data loading, the search is also language-specific. Here are the instructions for each language. I am assuming you have already set the environment variables and loaded the data. Invoke the respective program with your search criteria (e.g., inspiring, comedy, etc.) and the number of results (top N) you want to see.

Python

Plain Text
python search.py "inspiring" 3

TypeScript

Plain Text
npm run search "inspiring" 3

Java

Plain Text
java -jar target/cosmosdb-java-vector-search-1.0-SNAPSHOT.jar search "inspiring" 3

.NET

Plain Text
dotnet run search "inspiring" 3

Irrespective of the language, you should get results similar to this. For example, my search query was "inspiring," and I got the following results:

Plain Text
Search results for query: inspiring
Similarity score: 0.7809536662138555
Title: Forrest Gump
Description: The story of a man with a low IQ who achieves incredible feats in his life, meeting historical figures and finding love along the way.
=====================================
Similarity score: 0.771059411474658
Title: The Shawshank Redemption
Description: Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.
=====================================
Similarity score: 0.768073216615931
Title: Avatar
Description: A paraplegic Marine dispatched to the moon Pandora on a unique mission becomes torn between following his orders and protecting the world he feels is his home.
=====================================

Closing Notes

I hope you found this useful! Before wrapping up, here are a few things to keep in mind:

There are different vector index types you should experiment with (flat, quantizedFlat).
Consider the metric you are using to compute distance/similarity (I used cosine, but you can also use euclidean or dot product).
Which embedding model you use is also an important consideration. I used text-embedding-ada-002, but there are other options, such as text-embedding-3-large and text-embedding-3-small.
You can also use Azure Cosmos DB for MongoDB vCore for vector search.
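To make the distance metrics and the TOP N query above more concrete, here is a small pure-Python sketch of what they compute. It is a brute-force illustration only; the function names are made up for this example, and Cosmos DB evaluates VectorDistance through its vector index rather than by scanning every item like this.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 1.0 means same direction."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / norm


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_n(query, items, n):
    """Rank items by cosine similarity to the query, most similar first."""
    scored = [
        (cosine_similarity(query, item["embeddings"]), item["title"])
        for item in items
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

Note that the ordering direction depends on the metric: with cosine or dot product a higher score is a closer match, while with euclidean a lower value is closer, which matters when you switch the distance function in the embedding policy.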

By Abhishek Gupta


How Event-Driven Ansible Works for Configuration Monitoring

Configuration files control how applications, systems, and security policies work, making them crucial for keeping systems reliable and secure. If these files are changed accidentally or without permission, it can cause system failures, security risks, or compliance issues. Manually checking configuration files takes a lot of time, is prone to mistakes, and isn’t reliable, especially in complex IT systems. Event-driven Ansible offers a way to automatically monitor and manage configuration files. It reacts to changes as they happen, quickly detects them, takes automated actions, and works seamlessly with the tools and systems you already use. In this article, I will demonstrate how to use Ansible to monitor the Nginx configuration file and trigger specific actions if the file is modified. In the example below, I use the Ansible debug module to print the message to the host. However, this setup can be integrated with various Ansible modules depending on the organization's requirements. About the Module The ansible.eda.file_watch module is a part of event-driven Ansible and is used to monitor changes in specified files or directories. It can detect events such as file creation, modification, or deletion and trigger automated workflows based on predefined rules. This module is particularly useful for tasks like configuration file monitoring and ensuring real-time responses to critical file changes. Step 1 To install Nginx on macOS using Homebrew, run the command brew install nginx, which will automatically download and install Nginx along with its dependencies. By default, Homebrew places Nginx in the directory /usr/local/Cellar/nginx/ and configures it for use on macOS systems. After installation, edit the configuration file at /usr/local/etc/nginx/nginx.conf to set the listen directive to listen 8080;, then start the Nginx service with brew services start nginx. To confirm that Nginx is running, execute the command curl http://localhost:8080/ in the terminal. 
If Nginx is properly configured, you will receive an HTTP response indicating that it is successfully serving content on port 8080.

Step 2

In the example below, the configwatch.yml rulebook is used to monitor the Nginx configuration file at /usr/local/etc/nginx/nginx.conf. It continuously observes the file for any changes. When a modification is detected, the rule triggers an event that executes the print-console-message.yml playbook.

YAML
---
- name: Check if the nginx config file is modified
  hosts: localhost
  sources:
    - name: file_watch
      ansible.eda.file_watch:
        path: /usr/local/etc/nginx/nginx.conf
        recursive: true
  rules:
    - name: Run the action if the /usr/local/etc/nginx/nginx.conf is modified
      condition: event.change == "modified"
      action:
        run_playbook:
          name: print-console-message.yml

The second file is a playbook that prints a debug message to the console. Together, the rulebook and the playbook provide automated monitoring and instant feedback whenever the configuration file is altered.

YAML
---
- name: Playbook for printing the message in console
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Error message in the console
      debug:
        msg: "Server config altered"

Demo

To monitor the Nginx configuration file for changes, execute the command ansible-rulebook -i localhost -r configwatch.yml, where -i localhost specifies the inventory as the local system, and -r configwatch.yml points to the rulebook file that defines the monitoring rules and actions. This command starts the monitoring process, and Ansible continuously watches the specified Nginx configuration file for any modifications. When changes are detected, the rules in the configwatch.yml file trigger the action that runs the print-console-message.yml playbook.

Check the last modified time of /usr/local/etc/nginx/nginx.conf by running the ls command. Use the touch command to update the last modified timestamp, then run the ls command again to display the new timestamp in the console.
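The touch-then-check step above boils down to comparing a file's modification time before and after. The Python sketch below illustrates that idea on a scratch file. It is a simplified polling illustration with hypothetical helper names, and it is not how the ansible.eda.file_watch plugin is implemented.

```python
import os
import tempfile


def snapshot(path: str) -> int:
    """Return the file's last-modified time in nanoseconds."""
    return os.stat(path).st_mtime_ns


def has_changed(path: str, previous_mtime_ns: int) -> bool:
    """True if the modification time differs from an earlier snapshot."""
    return snapshot(path) != previous_mtime_ns


def simulate_touch(path: str, seconds_later: int = 5) -> None:
    """Mimic running `touch` a few seconds later by moving mtime forward."""
    new_time = snapshot(path) + seconds_later * 1_000_000_000
    os.utime(path, ns=(new_time, new_time))


def demo() -> bool:
    """Create a scratch file, 'touch' it, and report whether that was seen."""
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        path = handle.name
    try:
        before = snapshot(path)
        simulate_touch(path)
        return has_changed(path, before)
    finally:
        os.remove(path)
```

As in the demo with touch, the file's content never changes here; only the timestamp does, and that alone is enough to register as a modification.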
The output of the ansible-rulebook -i localhost -r configwatch.yml command shows that it detected the change to the file's modification timestamp and triggered the corresponding action.

Benefits of Event-Driven Ansible for Configuration Monitoring

Event-driven Ansible simplifies configuration monitoring by detecting changes as they happen and responding immediately. Organizations can extend the functionality to automatically fix issues without manual intervention, enhancing security by preventing unauthorized modifications. It also supports compliance by maintaining records and adhering to regulations while efficiently managing large and complex environments.

Use Cases

The Event-Driven Ansible File Watch module can serve as a security compliance tool by monitoring critical configuration files, such as SSH or firewall settings, to ensure they align with organizational policies. It can also act as a disaster recovery solution, automatically restoring corrupted or deleted configuration files from predefined backups. Additionally, it can be used as a multi-environment management tool, ensuring consistency across deployments by synchronizing configurations.

Conclusion

Event-driven Ansible is a reliable and flexible tool for monitoring configuration files in real time. It automatically detects changes, helping organizations keep systems secure and compliant. As systems become more complex, it offers a modern and easy-to-adapt way to manage configurations effectively.

Note: The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

By Binoj Melath Nalinakshan Nair

Powering LLMs With Apache Camel and LangChain4j

LLMs need to connect to the real world. LangChain4j tools, combined with Apache Camel, make this easy. Camel provides robust integration, connecting your LLM to any service or API. This lets your AI interact with databases, queues, and more, creating truly powerful applications. We'll explore this powerful combination and its potential.

Setting Up the Development Environment

Ollama: Provides a way to run large language models (LLMs) locally. You can run many models, such as Llama 3, Mistral, CodeLlama, and many others on your machine, with full CPU and GPU support.
Visual Studio Code: With Kaoto, Java, and Quarkus plugins installed.
OpenJDK 21
Maven
Quarkus 3.17
Quarkus Dev Services: A feature of Quarkus that simplifies the development and testing of applications that rely on external services such as databases, messaging systems, and other resources.

You can download the complete code at the following GitHub repo.

The following instructions will be executed in Visual Studio Code:

1. Creating the Quarkus Project

Shell
mvn io.quarkus:quarkus-maven-plugin:3.17.6:create \
  -DprojectGroupId=dev.mikeintoch \
  -DprojectArtifactId=camel-agent-tools \
  -Dextensions="camel-quarkus-core,camel-quarkus-langchain4j-chat,camel-quarkus-langchain4j-tools,camel-quarkus-platform-http,camel-quarkus-yaml-dsl"

2. Adding langchain4j Quarkus Extensions

Shell
./mvnw quarkus:add-extension -Dextensions="io.quarkiverse.langchain4j:quarkus-langchain4j-core:0.22.0"
./mvnw quarkus:add-extension -Dextensions="io.quarkiverse.langchain4j:quarkus-langchain4j-ollama:0.22.0"

3. Configure Ollama to Run the LLM

Open the application.properties file and add the following lines:

Properties files
#Configure Ollama local model
quarkus.langchain4j.ollama.chat-model.model-id=qwen2.5:0.5b
quarkus.langchain4j.ollama.chat-model.temperature=0.0
quarkus.langchain4j.ollama.log-requests=true
quarkus.langchain4j.log-responses=true
quarkus.langchain4j.ollama.timeout=180s

Quarkus uses Ollama to run the LLM locally and also auto-wires the configuration for use by the Apache Camel components in the following steps.

4. Creating Apache Camel Route Using Kaoto

Create a new folder named routes in the src/main/resources folder. Create a new file in the src/main/resources/routes folder and name it route-main.camel.yaml; Visual Studio Code opens the Kaoto visual editor. Click on the +New button, and a new route will be created. Click on the circular arrows to replace the timer component. Search for and select the platform-http component from the catalog. Configure the required platform-http properties:

Set path with the value /camel/chat

By default, platform-http will be serving on port 8080.

Click on the Add Step icon in the arrow after the platform-http component. Search for and select the langchain4j-tools component in the catalog. Configure the required langchain4j-tools properties:

Set Tool Id with the value my-tools.
Set Tags with store (tags group the tools that can be used with the LLM).

The user input message must first be converted into a form the langchain4j-tools component can use, so click on the Add Step icon in the arrow after the platform-http component. Search for and select the Process component in the catalog. Configure the required properties:

Set Ref with the value createChatMessage.

The Process component will use the createChatMessage method you will create in the following step.

5. Create a Processor to Send User Input to the LLM

Create a new Java class named Bindings.java in the src/main/java folder.
Java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.HashMap;

import org.apache.camel.BindToRegistry;
import org.apache.camel.Exchange;
import org.apache.camel.Processor;
import org.apache.camel.builder.RouteBuilder;

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.UserMessage;

public class Bindings extends RouteBuilder {

    @Override
    public void configure() throws Exception {
        // Routes are loaded from YAML files.
    }

    @BindToRegistry(lazy = true)
    public static Processor createChatMessage() {
        return new Processor() {
            public void process(Exchange exchange) throws Exception {
                String payload = exchange.getMessage().getBody(String.class);
                List<ChatMessage> messages = new ArrayList<>();

                String systemMessage = """
                    You are an intelligent store assistant. Users will ask you questions about store product.
                    Your task is to provide accurate and concise answers.

                    In the store have shirts, dresses, pants, shoes with no specific category

                    %s

                    If you are unable to access the tools to answer the user's query,
                    Tell the user that the requested information is not available at this time and that they can try again later.
                    """;

                String tools = """
                    You have access to a collection of tools
                    You can use multiple tools at the same time
                    Complete your answer using data obtained from the tools
                    """;

                messages.add(new SystemMessage(systemMessage.formatted(tools)));
                messages.add(new UserMessage(payload));

                exchange.getIn().setBody(messages);
            }
        };
    }
}

This class creates a Camel Processor that transforms the user input into an object that the langchain4j component in the route can handle. It also gives the LLM context for using tools and explains the agent's task.

6. Creating Apache Camel Tools for Use With the LLM

Create a new file in the src/main/resources/routes folder and name it route-tool-products.camel.yaml; Visual Studio Code opens the Kaoto visual editor.
Click on the +New button, and a new route will be created. Click on the circular arrows to replace the timer component. Search for and select the langchain4j-tools component in the catalog. To configure langchain4j-tools, click on the All tab and search for the Endpoint properties.

Set Tool Id with the value productsbycategoryandcolor.
Set Tags with store (the same as in the main route).
Set Description with the value Query database products by category and color (a brief description of the tool).

Add the parameters that will be used by the tool:

NAME: category, VALUE: string
NAME: color, VALUE: string

These parameters will be assigned by the LLM for use in the tool and are passed via headers.

Add an SQL component to query the database: click on Add Step after the langchain4j-tools component, then search for and select the SQL component. Configure the required SQL properties:

Query with the following value.

SQL
Select name, description, category, size, color, price, stock from products where Lower(category)= Lower (:#category) and Lower(color) = Lower(:#color)

To handle the parameters used in the query, add a Convert Header component that converts them to the correct object type. Click on the Add Step button after langchain4j-tools, then search for and select the Convert Header To transformation in the catalog. Configure the required properties for the component:

Name with the value category
Type with the value String

Repeat the steps with the following values:

Name with the value color
Type with the value String

Finally, you need to transform the query result into an object that the LLM can handle; in this example, you transform it into JSON. Click the Add Step button after the SQL component, and add the Marshal component. Configure the data format properties for the Marshal component and select JSon from the list.

7. Configure Quarkus Dev Services for PostgreSQL

Add the Quarkus extension that provides PostgreSQL for dev purposes by running the following command in a terminal.
Shell
./mvnw quarkus:add-extension -Dextensions="io.quarkus:quarkus-jdbc-postgresql"

Open application.properties and add the following lines:

Properties files
#Configuring devservices for Postgresql
quarkus.datasource.db-kind=postgresql
quarkus.datasource.devservices.port=5432
quarkus.datasource.devservices.init-script-path=db/schema-init.sql
quarkus.datasource.devservices.db-name=store

Finally, create the SQL script that loads the database. Create a folder named db in src/main/resources, and in this folder, create a file named schema-init.sql with the following content.

SQL
DROP TABLE IF EXISTS products;

CREATE TABLE IF NOT EXISTS products (
    id SERIAL NOT NULL,
    name VARCHAR(100) NOT NULL,
    description varchar(150),
    category VARCHAR(50),
    size VARCHAR(20),
    color VARCHAR(20),
    price DECIMAL(10,2) NOT NULL,
    stock INT NOT NULL,
    CONSTRAINT products_pk PRIMARY KEY (id)
);

INSERT INTO products (name, description, category, size, color, price, stock) VALUES
('Blue shirt', 'Cotton shirt, short-sleeved', 'Shirts', 'M', 'Blue', 29.99, 10),
('Black pants', 'Jeans, high waisted', 'Pants', '32', 'Black', 49.99, 5),
('White Sneakers', 'Sneakers', 'Shoes', '40', 'White', 69.99, 8),
('Floral Dress', 'Summer dress, floral print, thin straps.', 'Dress', 'M', 'Pink', 39.99, 12),
('Skinny Jeans', 'Dark denim jeans, high waist, skinny fit.', 'Pants', '28', 'Blue', 44.99, 18),
('White Sneakers', 'Casual sneakers, rubber sole, minimalist design.', 'Shoes', '40', 'White', 59.99, 10),
('Beige Chinos', 'Casual dress pants, straight cut, elastic waist.', 'Pants', '32', 'Beige', 39.99, 15),
('White Dress Shirt', 'Cotton shirt, long sleeves, classic collar.', 'Shirts', 'M', 'White', 29.99, 20),
('Brown Hiking Boots', 'Waterproof boots, rubber sole, perfect for hiking.', 'Shoes', '42', 'Brown', 89.99, 7),
('Distressed Jeans', 'Distressed denim jeans, mid-rise, regular fit.', 'Pants', '30', 'Blue', 49.99, 12);

8. Include Our Route to Be Loaded by the Quarkus Project

Camel Quarkus supports several domain-specific languages (DSLs) for defining Camel routes. To include YAML DSL routes, add the following line to the application.properties file.

Properties files
# routes to load
camel.main.routes-include-pattern = routes/*.yaml

This will load all routes in the src/main/resources/routes folder.

9. Test the App

Run the application using Maven: open a terminal in Visual Studio Code and run the following command.

Shell
mvn quarkus:dev

Once it has started, Quarkus calls Ollama and runs your LLM locally. Open a terminal and verify this with the following command.

Shell
ollama ps
NAME ID SIZE PROCESSOR UNTIL
qwen2.5:0.5b a8b0c5157701 1.4 GB 100% GPU 4 minutes from now

Also, Quarkus creates a container running PostgreSQL and creates a database and schema. You can connect using the psql command.

Shell
psql -h localhost -p 5432 -U quarkus -d store

And query the products table:

Shell
store=# select * from products;
id | name | description | category | size | color | price | stock
----+--------------------+----------------------------------------------------+----------+------+-------+-------+-------
1 | Blue shirt | Cotton shirt, short-sleeved | Shirts | M | Blue | 29.99 | 10
2 | Black pants | Jeans, high waisted | Pants | 32 | Black | 49.99 | 5
3 | White Sneakers | Sneakers | Shoes | 40 | White | 69.99 | 8
4 | Floral Dress | Summer dress, floral print, thin straps. | Dress | M | Pink | 39.99 | 12
5 | Skinny Jeans | Dark denim jeans, high waist, skinny fit. | Pants | 28 | Blue | 44.99 | 18
6 | White Sneakers | Casual sneakers, rubber sole, minimalist design. | Shoes | 40 | White | 59.99 | 10
7 | Beige Chinos | Casual dress pants, straight cut, elastic waist. | Pants | 32 | Beige | 39.99 | 15
8 | White Dress Shirt | Cotton shirt, long sleeves, classic collar. | Shirts | M | White | 29.99 | 20
9 | Brown Hiking Boots | Waterproof boots, rubber sole, perfect for hiking. | Shoes | 42 | Brown | 89.99 | 7
10 | Distressed Jeans | Distressed denim jeans, mid-rise, regular fit. | Pants | 30 | Blue | 49.99 | 12
(10 rows)

To test the app, send a POST request to localhost:8080/camel/chat with a plain text body asking about some product. If the LLM appears to have hallucinated, try again after modifying your request slightly.

You can see how the LLM uses the tool and gets information from the database based on the natural language request provided. The LLM identifies the parameters and sends them to the tool. If you look in the request log, you can find the tools and parameters the LLM is using to create the answer.

Conclusion

You've explored how to leverage the power of LLMs within your integration flows using Apache Camel and the LangChain4j component. We've seen how this combination allows you to seamlessly integrate powerful language models into your existing Camel routes, enabling you to build sophisticated applications that can understand, generate, and interact with human language.
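The POST request from the test step can also be sent from a short script instead of a GUI client. The Python sketch below uses only the standard library. The endpoint path and port come from the route configuration above, while the helper names, the sample question, and the 180-second timeout (chosen to mirror the Ollama timeout in application.properties) are assumptions for illustration.

```python
import urllib.request


def build_chat_request(
    question: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build a plain-text POST request for the /camel/chat route."""
    return urllib.request.Request(
        url=f"{base_url}/camel/chat",
        data=question.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def ask(question: str, timeout_seconds: int = 180) -> str:
    """Send the question and return the LLM's answer as text.

    Requires the Quarkus app to be running locally (mvn quarkus:dev).
    """
    request = build_chat_request(question)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.read().decode("utf-8")
```

With the app running, something like `print(ask("Do you have blue pants?"))` should exercise the tool route, since the sample data contains blue items in the Pants category.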

By Miguel Delgadillo

"Fix with AI" Button to Automate Playwright Test Fixes

End-to-end tests are essential for ensuring the reliability of your application, but they can also be a source of frustration. Even small changes to the user interface can cause tests to fail, leading developers and QA teams to spend hours troubleshooting. In this blog post, I’ll show you how to utilize AI tools like ChatGPT or Copilot to automatically fix Playwright tests. You’ll learn how to create an AI prompt for any test that fails and attach it to your HTML report. This way, you can easily copy and paste the prompt into AI tools for quick suggestions on fixing the test. Join me to streamline your testing process and improve application reliability! Let’s dive in! Plan The solution comes down to three simple steps: Identify when a Playwright test fails.Create an AI prompt with all the necessary context: The error messageA snippet of the test codeAn ARIA snapshot of the pageIntegrate the prompt into the Playwright HTML report. By following these steps, you can enhance your end-to-end testing process and make fixing Playwright tests a breeze. Step-by-Step Guide Step 1: Detecting a Failed Test To detect a failed test in Playwright, you can create a custom fixture that checks the test result during the teardown phase, after the test has completed. If there’s an error in testInfo.error and the test won't be retried, the fixture will generate a helpful prompt. Check out the code snippet below: JavaScript import { test as base } from '@playwright/test'; import { attachAIFix } from '../../ai/fix-with-ai' export const test = base.extend({ fixWithAI: [async ({ page }, use, testInfo) => { await use() await attachAIFix(page, testInfo) }, { scope: 'test', auto: true }] }); Step 2: Building the Prompt Prompt Template I'll start with a simple proof-of-concept prompt (you can refine it later): You are an expert in Playwright testing. Your task is to fix the error in the Playwright test titled "{title}". - First, provide a highlighted diff of the corrected code snippet. 
- Base your fix solely on the ARIA snapshot of the page.
- Do not introduce any new code.
- Avoid adding comments within the code.
- Ensure that the test logic remains unchanged.
- Use only role-based locators such as getByRole, getByLabel, etc.
- For any 'heading' roles, try to adjust the heading level first.
- At the end, include concise notes summarizing the changes made.
- If the test appears to be correct and the issue is a bug on the page, please note that as well.

Input:
{error}

Code snippet of the failing test:
{snippet}

ARIA snapshot of the page:
{ariaSnapshot}

Let’s fill the prompt with the necessary data.

Error Message

Playwright stores the error message in testInfo.error.message. However, it includes ANSI escape codes for coloring output in the terminal (such as [2m or [22m):

TimeoutError: locator.click: Timeout 1000ms exceeded.
Call log:
[2m  - waiting for getByRole('button', { name: 'Get started' })[22m

After investigating Playwright’s source code, I found a stripAnsiEscapes function that removes these special symbols:

JavaScript

const clearedErrorMessage = stripAnsiEscapes(testInfo.error.message);

Cleared error message:

TimeoutError: locator.click: Timeout 1000ms exceeded.
Call log:
  - waiting for getByRole('button', { name: 'Get started' })

This cleaned-up message can be inserted into the prompt template.

Code Snippet

The test code snippet is crucial for AI to generate the necessary code changes. Playwright often includes these snippets in its reports, for example:

  4 | test('get started link', async ({ page }) => {
  5 |   await page.goto('https://playwright.dev');
> 6 |   await page.getByRole('button', { name: 'Get started' }).click();
    |                                                           ^
  7 |   await expect(page.getByRole('heading', { level: 3, name: 'Installation' })).toBeVisible();
  8 | });

You can see how Playwright internally generates these snippets.
I’ve extracted the relevant code into a helper function, getCodeSnippet(), to retrieve the source code lines from the error stack trace:

JavaScript

const snippet = getCodeSnippet(testInfo.error);

ARIA Snapshot

ARIA snapshots, introduced in Playwright 1.49, provide a structured view of the page’s accessibility tree. Here’s an example ARIA snapshot showing the navigation menu on the Playwright homepage:

- document:
  - navigation "Main":
    - link "Playwright logo Playwright":
      - img "Playwright logo"
      - text: Playwright
    - link "Docs"
    - link "API"
    - button "Node.js"
    - link "Community"
  ...

While ARIA snapshots are primarily used for snapshot comparison, they are also a game-changer for AI prompts in web testing. Compared to raw HTML, ARIA snapshots offer:

- Small size → Less risk of hitting prompt limits
- Less noise → Less unnecessary context
- Role-based structure → Encourages AI to generate role-based locators

Playwright provides .ariaSnapshot(), which you can call on any element. For AI to fix a test, it makes sense to include the ARIA snapshot of the entire page, retrieved from the root <html> element:

JavaScript

const ariaSnapshot = await page.locator('html').ariaSnapshot();

Assembling the Prompt

Finally, combine all the pieces into one prompt:

JavaScript

const errorMessage = stripAnsiEscapes(testInfo.error.message);
const snippet = getCodeSnippet(testInfo.error);
const ariaSnapshot = await page.locator('html').ariaSnapshot();

const prompt = promptTemplate
  .replace('{title}', testInfo.title)
  .replace('{error}', errorMessage)
  .replace('{snippet}', snippet)
  .replace('{ariaSnapshot}', ariaSnapshot);

Step 3: Attach the Prompt to the Report

When the prompt is built, you can attach it to the test using testInfo.attach:

TypeScript

export async function attachAIFix(page: Page, testInfo: TestInfo) {
  // Skip the prompt if Playwright is going to retry this test.
  const willRetry = testInfo.retry < testInfo.project.retries
  if (testInfo.error && !willRetry) {
    const prompt = generatePrompt({
      title: testInfo.title,
      error: testInfo.error,
      ariaSnapshot: await page.locator('html').ariaSnapshot(),
    });
    await testInfo.attach('AI Fix: Copy below prompt and paste to Github Copilot Edits to see the magic', { body: prompt })
  }
}

Now, whenever a test fails, the HTML report will include the "Fix with AI" prompt as an attachment, listed under the name passed to testInfo.attach.

Fix Using Copilot Edits

When you use ChatGPT to fix tests, you typically have to apply the suggested changes by hand. You can make this process much more efficient by using Copilot. Instead of pasting the prompt into ChatGPT, open the Copilot edits window in VS Code and paste your prompt there. Copilot will then recommend code changes that you can quickly review and apply, all from within your editor.

Integrating "Fix with AI" into Your Project

Vitaliy Potapov created a fully working GitHub repository demonstrating the "Fix with AI" workflow. Feel free to explore it, run tests, check out the generated prompts, and fix errors with AI help.

To integrate the "Fix with AI" flow into your own project, follow these steps:

1. Ensure you’re on Playwright 1.49 or newer.
2. Copy the fix-with-ai.ts file into your test directory.
3. Register the AI-attachment fixture:

JavaScript

import { test as base } from '@playwright/test';
import { attachAIFix } from '../../ai/fix-with-ai'

export const test = base.extend({
  fixWithAI: [async ({ page }, use, testInfo) => {
    await use()
    await attachAIFix(page, testInfo)
  }, { scope: 'test', auto: true }]
});

4. Run your tests and open the HTML report to see the “Fix with AI” attachment under any failed test.

From there, simply copy and paste the prompt into ChatGPT or GitHub Copilot, or use Copilot’s edits mode to automatically apply the code changes.
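The article calls generatePrompt() without showing its body. The following is a minimal sketch of what such a helper could look like, not the code from the repository: the name buildPrompt, the shortened template, and the regex-based ANSI stripping (which only covers color codes like the ones shown earlier) are all assumptions made for illustration.

```javascript
// Simplified pattern for ANSI color sequences such as ESC[2m or ESC[22m.
// Assumption: only color (SGR) codes need to be removed.
const ansiColorPattern = /\u001b\[[0-9;]*m/g;

// Shortened stand-in for the prompt template described in the article.
const promptTemplate = [
  'You are an expert in Playwright testing.',
  'Your task is to fix the error in the Playwright test titled "{title}".',
  '',
  'Input:',
  '{error}',
  '',
  'Code snippet of the failing test:',
  '{snippet}',
  '',
  'ARIA snapshot of the page:',
  '{ariaSnapshot}',
].join('\n');

// Hypothetical helper: fills every placeholder in a single pass.
function buildPrompt({ title, errorMessage, snippet, ariaSnapshot }) {
  const values = {
    title,
    error: errorMessage.replace(ansiColorPattern, ''),
    snippet,
    ariaSnapshot,
  };
  // A replacer function inserts each value literally, so sequences such as
  // "$&" inside a code snippet are not treated as replacement patterns.
  return promptTemplate.replace(/\{(\w+)\}/g, (placeholder, key) =>
    Object.hasOwn(values, key) ? values[key] : placeholder
  );
}

// Example with made-up input values.
const examplePrompt = buildPrompt({
  title: 'get started link',
  errorMessage: 'TimeoutError: locator.click: Timeout 1000ms exceeded.\n\u001b[2m  - waiting for getByRole(\'button\')\u001b[22m',
  snippet: "await page.getByRole('button', { name: 'Get started' }).click();",
  ariaSnapshot: '- link "Docs"',
});
console.log(examplePrompt);
```

Filling the template in one pass with a replacer function is a deliberate choice here: chained .replace() calls with string replacements interpret `$` sequences in the inserted text and can also match placeholders that appear inside already-inserted content.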
Relevant Links

- Fully working GitHub repository
- Originally written by Vitaliy Potapov: https://dev.to/vitalets/fix-with-ai-button-in-playwright-html-report-2j37

I’d love to hear your thoughts or prompt suggestions for making the “Fix with AI” process even more seamless. Feel free to share your feedback in the comments. Thanks for reading, and happy testing with AI!

By Shivam Bharadwaj

Building Call Graphs for Code Exploration Using Tree-Sitter

Exploring a large legacy codebase is heavy lifting. Manual exploration is error-prone and time-consuming, and automated data collection and visualization can ease the process to some extent. To extract key insights such as code composition or lines of code (LoC), we may need various data collection tools. However, using those tools is challenging because most of them are commercial. The available FOSS tools either handle only smaller codebases or support only a limited set of technology stacks.

One such tool is Doxygen, which generates documentation from codebases and helps extract various metadata elements that can be processed and used for further exploration. However, it allows very little control over how it collects data and is very heavy to run on large codebases.

To solve this problem, we built a custom data collection process to collect call graphs from codebases. The core component of the tool is a parser that parses source code, builds a call graph, and stores it in a graph datastore. This tutorial will guide you through setting up the tree-sitter parsing library in Python and using its different APIs to parse code for various use cases.

Introduction

Tree-sitter is a powerful and performant parser generator library implemented in C and built to run cross-platform. It provides grammars for most popular high-level programming languages, and it has bindings for multiple languages so that it can be integrated into any type of application. For our implementation, we used the C-family parsers and the Python bindings.
Setup and Installation

To get started with tree-sitter in Python, install the packages below.

Base Package

Plain Text

pip install tree-sitter

This package provides the core classes that the language-specific packages build on:

- Language: A class that defines how to parse a particular language.
- Tree: A tree that represents the syntactic structure of a source code file.
- Node: A single node within a syntax Tree.
- Parser: A class that is used to produce a Tree based on some source code.

Language Packages

This tutorial focuses on parsing codebases written in C-family languages, so it requires the packages below:

Plain Text

pip install tree-sitter-c
pip install tree-sitter-cpp
pip install tree-sitter-c-sharp

Each of these packages provides a language and parser implementation that can be used to parse code written in that language.

Getting Started

Basic parsing requires a parser instance for each language. All of them follow the same API.

Python

from tree_sitter import Language, Parser
import tree_sitter_c as tsc
import tree_sitter_cpp as tscpp
import tree_sitter_c_sharp as tscs

parser_c = Parser(Language(tsc.language()))
parser_cpp = Parser(Language(tscpp.language()))
parser_cs = Parser(Language(tscs.language()))

To parse source code, read the file and load it as bytes.

Python

def readFile(file_path):
    # Read the file as text, then encode it because the parser works on bytes.
    with open(file_path, 'r', encoding = 'utf-8') as file:
        file_str = file.read()
        file_bytes = bytes(file_str, "utf8")
        return file_bytes

Then, the loaded bytes are passed to the parse method, which creates and returns a tree object representing the Abstract Syntax Tree of the parsed source code.

Python

file_bytes = readFile('C:/Data/RnD/memgraph-demo/alternative.c')
tree = parser_c.parse(file_bytes)
print("tree:- ", tree)

The Tree points to the root node, whose children are created according to the grammar rules of that parser.
Traversing the Parsed Tree

The tree can be traversed using multiple parser APIs.

Traversing Using Children

The simplest way to traverse the tree is through the direct children of each node. Each node has a type associated with it. The tree does not store the source text of a node; to retrieve it, slice the source bytes using the node's start and end byte offsets.

Python

def node_val(source_byte, node):
    return source_byte[node.start_byte:node.end_byte].decode('utf8')

For example, to retrieve all the function names in a C file, first reach each function_definition node, then traverse to its function_declarator, and finally to its identifier node.

Python

def print_functions_c(file_bytes, tree):
    root_children = tree.root_node.children
    for root_child in root_children:
        if(root_child.type == "function_definition"):
            func_def_children = root_child.children
            for func_def_child in func_def_children:
                if(func_def_child.type == 'function_declarator'):
                    func_dec_children = func_def_child.children
                    for func_dec_child in func_dec_children:
                        if(func_dec_child.type == 'identifier'):
                            identifier = node_val(file_bytes, func_dec_child)
                            print(identifier)

Traversing Using Recursion

The above code can be simplified by traversing the nodes recursively, skipping all the intermediate children until reaching an 'identifier' node. Note that this version visits every identifier in the file, so it also prints parameter and variable names, not only function names.

Python

def print_identifiers(node, file_bytes):
    if node.type == 'identifier':
        identifier = node_val(file_bytes, node)
        print('identifier', ":-", identifier )
    for child in node.children:
        print_identifiers(child, file_bytes)

def print_functions_c(file_bytes, tree):
    # Start from the root node itself, not from its list of children.
    print_identifiers(tree.root_node, file_bytes)

Traversing Using Cursor API

Parser’s tree provides a very efficient Cursor API that keeps track of nodes being processed.
Based on your logic, it can move to the next, previous, parent, or child node:

cursor = tree.walk()
cursor.goto_first_child()
cursor.goto_next_sibling()
cursor.goto_parent()

To traverse using the cursor, you can use it inside a recursive function to reach a particular node while skipping all unnecessary nodes.

Python

def print_fn_defs_cs(file_bytes, cursor):
    if(cursor.node.type == "method_declaration"):
        # The method name is exposed through the 'name' field.
        identifier = cursor.node.child_by_field_name('name')
        fn_name = node_val(file_bytes, identifier)
        print("fn_name: ", fn_name)
    if(len(cursor.node.children) > 0):
        status = cursor.goto_first_child()
    else:
        status = cursor.goto_next_sibling()
        # No sibling left: climb up until an ancestor has a next sibling.
        while(status == False):
            status = cursor.goto_parent()
            if(status == False):
                break
            status = cursor.goto_next_sibling()
    if(status == True):
        print_fn_defs_cs(file_bytes, cursor)

Building the Call Graph

Using one of the above techniques, you can traverse the tree and extract the function definitions and function calls within a source file. The extracted data then needs to be pushed to a graph structure so that relationships between method definitions and method calls can be built across source files within a large codebase.

This tutorial focused on parsing and retrieving relationships between functions. The next part of this tutorial will focus on how to store and visualize the call graphs using graph stores like neo4j/memgraph.
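As a rough illustration of the graph-building step described above, the sketch below links extracted definitions and calls into an in-memory adjacency map. It is not the tool's actual implementation: build_call_graph, the input tuple shapes, and the sample data are assumptions, and matching functions by name only ignores overloads and namespaces.

```python
from collections import defaultdict


def build_call_graph(definitions, calls):
    """Link callers to callees that are defined somewhere in the codebase.

    definitions: iterable of (function_name, file_path) pairs
    calls: iterable of (caller_name, callee_name) pairs
    Returns (edges, unresolved), both mapping a caller to a set of callees.
    """
    defined_in = {name: path for name, path in definitions}
    edges = defaultdict(set)
    unresolved = defaultdict(set)
    for caller, callee in calls:
        # Calls to functions without a known definition (for example
        # library functions) are kept apart instead of being dropped.
        target = edges if callee in defined_in else unresolved
        target[caller].add(callee)
    return dict(edges), dict(unresolved)


# Hypothetical data, as it might be collected from two C files.
definitions = [("main", "app.c"), ("add", "math.c")]
calls = [("main", "add"), ("main", "printf")]

edges, unresolved = build_call_graph(definitions, calls)
print(edges)
print(unresolved)
```

Keeping unresolved calls in a separate map makes it easy to see which callees come from outside the parsed codebase before the edges are written to a graph store.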

By Vinod Pahuja