Artificial intelligence (AI) and machine learning (ML) are two fields that work together to create computer systems capable of perception, recognition, decision-making, and translation. Separately, AI is the ability of a computer system to mimic human intelligence through math and logic, while ML builds on AI by developing methods that "learn" from experience rather than requiring explicit instruction. In the AI/ML Zone, you'll find resources ranging from tutorials to use cases that will help you navigate this rapidly growing field.
Semi-Supervised Learning: How to Overcome the Lack of Labels
Overview of Classical Time Series Analysis: Techniques, Applications, and Models
GenAI and Coding: A Short Story

When I first heard about the Semantic Kernel, I was confused by the term. As a former C++ guy who has worked with operating systems, I thought it had something to do with an OS kernel. Though I have followed the Generative AI landscape, I felt most of it was the initial hype (Gartner's hype curve). Predicting tokens was lovely, but I never thought it would influence and create usable bots that could integrate with existing applications. Then came LangChain, which immediately became one of the fastest-growing open-source projects of 2023. LangChain is an AI orchestrator that helps developers sprinkle the magic of AI into their new and existing applications. LangChain came in Python, Java, and JavaScript, and there was a void for C# folks. Semantic Kernel filled this gap: although it differs from LangChain in principle and style, it was essentially built to make the .NET folks happy. Icing on the cake? It also supports Java and Python.

What Is Semantic Kernel?

In the evolving landscape of AI, integrating intelligent, context-aware functionalities into applications has become more crucial than ever. Microsoft’s Semantic Kernel (SK) is a robust framework that allows developers to embed AI capabilities, such as natural language processing, into .NET applications. Whether you’re looking to build chatbots, automate workflows, or enhance decision-making processes, Semantic Kernel provides a solid foundation.

Semantic Kernel is an extensible framework that leverages AI models like OpenAI's GPT to enable natural language understanding and generation within .NET applications. It provides a set of APIs and tools that allow developers to create AI-driven agents capable of processing text, generating content, and meaningfully interacting with users. At its heart, Semantic Kernel focuses on "plugins" — modular, reusable components that encapsulate specific capabilities such as understanding user intent, summarizing content, or generating responses. These can be combined and orchestrated to build sophisticated AI-driven applications.

Why Semantic Kernel?

LangChain is excellent, and Semantic Kernel is equally fabulous. Choosing one over the other should depend on your style, programming languages, or specific use cases. For example, if you need to integrate GenAI capabilities into a browser-based solution such as a ReactJS/Angular/Vue or vanilla web application, I would use LangChain, as it supports JavaScript.

Are We Only Going to Talk About Semantic Kernel?

No: in this multi-part series, though the primary focus will be on Semantic Kernel, we will still explore use cases of LangChain and use it as a cousin to Semantic Kernel for specific scenarios and use cases. Enough talk! Let's build something with SK!

Prerequisites

Before diving into the code, ensure you have the following prerequisites:
.NET 7.0 SDK or later: Download it from the .NET website.
Visual Studio 2022: Ensure the ASP.NET Core workload is installed.
Azure AI model: It is possible to use OpenAI or other models directly, but for this series, I will stick to AI models that are deployed in Azure, as I have enough Azure credits as an MS MVP. (P.S.: If you plan to use OpenAI’s models, you’ll need an API key, which you can obtain from the OpenAI website.)

Setting Up Your Project

The first step in integrating Semantic Kernel into your application is to set up the environment. Let’s start with a simple console application and then walk through adding Semantic Kernel to the mix.
1. Create a New Console Project

Create a new console project (it's fine if you prefer to create it using Visual Studio or VS Code instead):

Shell
dotnet new console -n sk-console
cd sk-console

2. Add the Semantic Kernel NuGet Package

Add the Semantic Kernel NuGet package using the following command:

Shell
dotnet add package Microsoft.SemanticKernel

3. Set Up Semantic Kernel

Open your Program.cs file and configure the Semantic Kernel service. This service will handle interactions with the AI model. Take a look at the function AddAzureOpenAIChatCompletion(). As the name suggests, this function wires OpenAI chat completion, using the OpenAI model hosted in Azure, into our Semantic Kernel builder. The parameter values come from my already-deployed gpt-4o model on Azure AI Studio. I will write a separate article on deploying AI models using Azure AI Studio and link it here later.

C#
var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "<Your_Deployment_Name>",
    endpoint: "<Azure-Deployment-Endpoint-Ends-In:openai.azure.com>",
    apiKey: "<Your_API_Key>"
);
var kernel = builder.Build();

Think of this KernelBuilder as similar to the ASP.NET Core HostBuilder. Before the Build() call, you need to supply all of your plugin information (more on plugins later) so that SK is aware of it.

4. Ask the First Question

C#
Console.WriteLine(await kernel.InvokePromptAsync("What is Gen AI?"));

5. Running the Application

With everything configured, you’re ready to run your application. Run the command below in the terminal.

Shell
dotnet run

6. We Did It!

All is well. Our Semantic Kernel configuration used the deployed Azure OpenAI model to answer our question. Hooray! I know, I know, this isn't much. But I still published the source code on GitHub here. This is the starting point, and we will build from here.

Conclusion

Semantic Kernel is a powerful tool for bringing advanced AI capabilities to your .NET applications. Following the steps outlined in this multi-part series, you can quickly get started with Semantic Kernel and integrate intelligent, context-aware functionalities into your projects. The possibilities are vast, from simple chatbots to complex, AI-driven workflows. As we dive deeper, remember that the key to effectively leveraging Semantic Kernel is in how you define and orchestrate your plugins. With a solid understanding of these basics, you're well on your way to building the next generation of intelligent applications.

What's Next?

Are we done? No. Now that we know how to add Semantic Kernel to a .NET application, it is time to take this flight off the ground. We will dig deeper and deeper as we go along with this multi-part series. In "Part 2: Understanding Plugins in Semantic Kernel, A Deep Dive," we will dive deeper into plugins in Semantic Kernel. We won't stop there: in the following parts, we will discuss agents, local SLMs and Ollama, Semantic Kernel in ASP.NET Core applications, mixing SK with AutoGen and LangChain, and more.
In this article, I will discuss, in a practical and objective way, the integration of the Spring Framework with the resources of the OpenAI API, one of the main artificial intelligence products on the market. The use of artificial intelligence resources is becoming increasingly necessary in several products, and presenting its application in a Java solution through the Spring Framework allows a huge number of projects currently in production to benefit from this resource.

All of the code used in this project is available via GitHub. To download it, simply run the following command: git clone https://github.com/felipecaparelli/openai-spring.git (or clone it via SSH).

Note: It is important to note that there is a cost associated with using this API on your OpenAI account. Make sure that you understand the prices related to each request (they will vary with the number of tokens used in the request and present in the response).

Assembling the Project

1. Get API Access

As defined in the official documentation, first, you will need an API key from OpenAI to use the GPT models. Sign up at OpenAI's website if you don’t have an account and create an API key from the API dashboard. Going to the API Keys page, select the option Create new secret key. Then, in the popup, set a name to identify your key (optional) and press Create secret key. Now copy the API key value that will be used in your project configuration.

2. Configure the Project Dependencies

The easiest way to prepare your project structure is via the Spring tool called Spring Initializr. It will generate the basic skeleton of your project, add the necessary libraries and configuration, and also the main class to start your application. You must select at least the Spring Web dependency. For the project type, I've selected Maven with Java 17. I've also included the library httpclient5 because it will be necessary to configure our SSL connector. Here is a snippet of the generated pom.xml:

XML
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.3.2</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>br.com.erakles</groupId>
    <artifactId>spring-openai</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>spring-openai</name>
    <description>Demo project to explain the Spring and OpenAI integration</description>
    <properties>
        <java.version>17</java.version>
        <spring-ai.version>1.0.0-M1</spring-ai.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents.client5</groupId>
            <artifactId>httpclient5</artifactId>
            <version>5.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

3. Basic Configuration

In your configuration file (application.properties), set the OpenAI secret key in the property openai.api.key.
You can also replace the model version in the properties file to use a different API version, like gpt-4o-mini.

Properties files
spring.application.name=spring-openai
openai.api.url=https://api.openai.com/v1/chat/completions
openai.api.key=YOUR-OPENAI-API-KEY-GOES-HERE
openai.api.model=gpt-3.5-turbo

A tricky part about connecting to this service from Java is that, by default, the JDK requires a valid certificate chain to be validated while executing HTTPS requests. To keep this demo simple, we will skip this validation step.

3.1 Skip the SSL Validation

To disable the certificate validation that the JDK requires for HTTPS requests, include the following modifications in your RestTemplate bean, via a configuration class (keep in mind that skipping certificate validation is acceptable for a local demo only and should not be done in production):

Java
import org.apache.hc.client5.http.classic.HttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.client5.http.impl.io.BasicHttpClientConnectionManager;
import org.apache.hc.client5.http.socket.ConnectionSocketFactory;
import org.apache.hc.client5.http.socket.PlainConnectionSocketFactory;
import org.apache.hc.client5.http.ssl.NoopHostnameVerifier;
import org.apache.hc.client5.http.ssl.SSLConnectionSocketFactory;
import org.apache.hc.core5.http.config.Registry;
import org.apache.hc.core5.http.config.RegistryBuilder;
import org.apache.hc.core5.ssl.SSLContexts;
import org.apache.hc.core5.ssl.TrustStrategy;
import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;
import javax.net.ssl.SSLContext;

@Configuration
public class SpringOpenAIConfig {

    @Bean
    public RestTemplate secureRestTemplate(RestTemplateBuilder builder) throws Exception {
        // This configuration allows your application to skip the SSL check
        final TrustStrategy acceptingTrustStrategy = (cert, authType) -> true;
        final SSLContext sslContext = SSLContexts.custom()
            .loadTrustMaterial(null, acceptingTrustStrategy)
            .build();
        final SSLConnectionSocketFactory sslsf = new SSLConnectionSocketFactory(sslContext, NoopHostnameVerifier.INSTANCE);
        final Registry<ConnectionSocketFactory> socketFactoryRegistry = RegistryBuilder.<ConnectionSocketFactory> create()
            .register("https", sslsf)
            .register("http", new PlainConnectionSocketFactory())
            .build();
        final BasicHttpClientConnectionManager connectionManager = new BasicHttpClientConnectionManager(socketFactoryRegistry);
        HttpClient client = HttpClients.custom()
            .setConnectionManager(connectionManager)
            .build();
        return builder
            .requestFactory(() -> new HttpComponentsClientHttpRequestFactory(client))
            .build();
    }
}

4. Create a Service To Call the OpenAI API

Now that we have all of the configuration ready, it is time to implement a service that will handle the communication with the ChatGPT API. I am using the Spring component RestTemplate, which allows the execution of the HTTP requests to the OpenAI endpoint.
Java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class JavaOpenAIService {

    @Value("${openai.api.url}")
    private String apiUrl;

    @Value("${openai.api.key}")
    private String apiKey;

    @Value("${openai.api.model}")
    private String modelVersion;

    private final RestTemplate restTemplate;

    public JavaOpenAIService(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    /**
     * @param prompt - the question you are expecting to ask ChatGPT
     * @return the response in JSON format
     */
    public String ask(String prompt) {
        HttpEntity<String> entity = new HttpEntity<>(buildMessageBody(modelVersion, prompt), buildOpenAIHeaders());
        return restTemplate
            .exchange(apiUrl, HttpMethod.POST, entity, String.class)
            .getBody();
    }

    private HttpHeaders buildOpenAIHeaders() {
        HttpHeaders headers = new HttpHeaders();
        headers.set("Authorization", "Bearer " + apiKey);
        headers.set("Content-Type", MediaType.APPLICATION_JSON_VALUE);
        return headers;
    }

    private String buildMessageBody(String modelVersion, String prompt) {
        // Note: in production code, build this JSON with a library (e.g., Jackson) so that
        // special characters in the prompt are properly escaped.
        return String.format("{ \"model\": \"%s\", \"messages\": [{\"role\": \"user\", \"content\": \"%s\"}]}", modelVersion, prompt);
    }
}

5. Create Your REST API

Then, you can create your own REST API to receive the questions and forward them to your service.

Java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import br.com.erakles.springopenai.service.JavaOpenAIService;

@RestController
public class SpringOpenAIController {

    private final JavaOpenAIService javaOpenAIService;

    SpringOpenAIController(JavaOpenAIService javaOpenAIService) {
        this.javaOpenAIService = javaOpenAIService;
    }

    @GetMapping("/chat")
    public ResponseEntity<String> sendMessage(@RequestParam String prompt) {
        return ResponseEntity.ok(javaOpenAIService.ask(prompt));
    }
}

Conclusion

These are the steps required to integrate your web application with the OpenAI service, and you can improve it later by adding more features like sending voice, images, and other files to their endpoints. After starting your Spring Boot application (./mvnw spring-boot:run), to test your web service, call the following URL: http://localhost:8080/chat?prompt={add-your-question}. If you did everything right, you will be able to read the result in your response body as follows:

JSON
{
  "id": "chatcmpl-9vSFbofMzGkLTQZeYwkseyhzbruXK",
  "object": "chat.completion",
  "created": 1723480319,
  "model": "gpt-3.5-turbo-0125",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Scuba stands for \"self-contained underwater breathing apparatus.\" It is a type of diving equipment that allows divers to breathe underwater while exploring the underwater world. Scuba diving involves using a tank of compressed air or other breathing gas, a regulator to control the flow of air, and various other accessories to facilitate diving, such as fins, masks, and wetsuits.
        Scuba diving allows divers to explore the underwater environment and observe marine life up close.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 90,
    "total_tokens": 102
  },
  "system_fingerprint": null
}

I hope this tutorial helped in your first interaction with the OpenAI API and makes your life easier while diving deeper into your AI journey. If you have any questions or concerns, don't hesitate to send me a message.
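As a small follow-up to the tutorial (this class is illustrative and not part of the original project), you could extract only the assistant's reply from the JSON response shown above using Jackson, which is already on the classpath through spring-boot-starter-web:

Java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Extracts choices[0].message.content from the chat completion response body
public class OpenAIResponseParser {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static String extractContent(String responseBody) throws Exception {
        JsonNode root = MAPPER.readTree(responseBody);
        return root.path("choices").path(0).path("message").path("content").asText();
    }
}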
I started evaluating Google's Gemini Code Assist in December 2023, right around its launch. The aim of this article is to cover its usage and impact beyond basic code generation on all the activities that a developer is supposed to do in their daily work (especially with the additional responsibilities entrusted to developers these days with the advent of "Shift-Left" and full-stack development roles).

Gemini Code Assist

Gemini Code Assist can be tried at no cost until November 2024. These are the core features it offered at the time of carrying out this exercise: AI code assistance, natural language chat, AI-powered smart actions, and enterprise security and privacy. Refer to the link for more details and pricing: Gemini Code Assist.

Note: Gemini Code Assist was formerly known as Duet AI. The entire content of the study has been divided into two separate articles. Interested readers should go through both of them in sequential order. The second part will be linked following its publication. This review expresses a personal view specific to Gemini Code Assist only. As intelligent code assist is an evolving field, the review points are valid based on the features available at the time of carrying out this study.

Gemini Code Assist Capabilities: What’s Covered in the Study as per Features Availability

| Gemini Pro | Code Customization | Code Transformations |
|---|---|---|
| ✓ Available for all users | ✓ Local context from relevant files in the local folder | ✗ Use natural language to modify existing code (e.g., Java 8 to Java 21) |
| ✓ Chat | ✗ Remote context from private codebases | ✓ Improve code generations |
| ✓ Smart Actions | | |

Note: 1. Items marked with ✗ will be available in future releases of Gemini Code Assist. 2. Code Transformations is not released publicly and is in preview at the time of writing.

Technical Tools

Below are the technical tools used for different focus areas during the exercise. The study was done on the tools, languages, and frameworks specified below, but the results can be applicable to other similar modern languages and frameworks with minor variations.

| Focus Areas | Tools |
|---|---|
| Language and framework | Java 11 & 17; Spring Boot 2.2.3 & 3.2.5 |
| Database | Postgres |
| Testing | JUnit, Mockito |
| IDE and plugins | VS Code with extensions: Cloud Code, Gemini Code Assist |
| Cloud platform | GCP with Gemini API enabled on a project (prerequisite); Docker; Cloud SQL (Postgres); Cloud Run |

Development Lifecycle Stages and Activities

For simplicity, the entire development lifecycle has been divided into different stages (below) encompassing the different sets of activities that developers would normally do. For each lifecycle stage, some activities were selected and tried out in the VS Code editor using Gemini Code Assist.

| S.No | Stage | Activities |
|---|---|---|
| 1 | Bootstrapping | Gain deeper domain understanding via enterprise knowledge base: Confluence, Git repos, etc.; generate scaffolding code for microservices: controller, services, repository, models; pre-generated templates for unit and integration tests; database schema: table creation, relationships, scripts, test-data population |
| 2 | Build and Augment | Implement business logic/domain rules; leverage implementation patterns, e.g., configuration mgt., circuit breaker, etc.; exception and error handling; logging/monitoring; optimized code for performance: asynchronous, time-outs, concurrency, non-blocking, remove boilerplate |
| 3 | Testing and Documentation | Debugging: using Postman to test API endpoints; unit/integration tests; OpenAPI specs creation; code coverage, quality, code smells; test plan creation |
| 4 | Troubleshoot | Invalid/no responses or application errors |
| 5 | Deployment | Deploy services to GCP stack: Cloud Run/GKE/App Engine, Cloud SQL |
| 6 | Operate | Get assistance modifying/upgrading existing application code and ensuring smooth operations |

Requirements

Let's now consider a fictitious enterprise whose background and some functional requirements are given below. We will see to what extent Gemini Code Assist can help in fulfilling them.

Background: A fictitious enterprise that moved to the cloud or adopted "cloud-native" a few years back. Domain: e-commerce. Let's keep the discussion centric to "microservices" using Spring Boot and Java. The enterprise is grappling with multi-fold technical challenges: green field (new microservices to be created), brown field (breaking a monolith into microservices, integration with legacy systems), and iterative development (incremental updates to microservices, upgrades, code optimization, patches).

Functional Requirements: Allow listing products in the catalog; add, modify, and delete products in the catalog; a recommendation service; an asynchronous implementation to retrieve the latest price; query affiliated shops for a product and fetch the lowest price for a product; bulk addition of products and grouping of processed results based on success and failure status. Rules: A product will belong to a single category, and a category may have many products.

Let's start with Stage 1, Bootstrapping, to gain deeper domain understanding.

1. Bootstrapping

During this phase, developers will: need more understanding of the domain (i.e., e-commerce, in this case) from enterprise knowledge management (Confluence, Git, Jira, etc.); get more details about the specific services that will need to be created; and get a viewpoint on the choice of tech stack (i.e., Java and Spring Boot) with steps to follow to develop new services. Let's see how Gemini Code Assist can help in this regard and to what extent.

Prompt: "I want to create microservices for an e-commerce company. What are typical domains and services that need to be created for this business domain"

Note: The responses above by Gemini Code Assist Chat are based on information retrieved from public online/web sources on which it is trained, and not retrieved from the enterprise's own knowledge sources, such as Confluence. Though helpful, this is generic e-commerce information. In the future, when Gemini Code Assist provides information more contextual to the enterprise, it will be more effective.

Let's now try to generate some scaffolding code for the catalog and recommendation services first, as suggested by Code Assist. First, we will build a Catalog Service through Gemini Code Assist. A total of 7 steps, along with code snippets, were generated. Relevant endpoints for the REST API methods to test the service are also provided once the service is up. Let's begin with the first recommended step, "Create a new Spring Boot project."

Building Catalog Service, Step 1

Generate the project through Spring Initializr:

Note: Based on user prompts, Gemini Code Assist generates code and instructions to follow in textual form. Direct generation of files and artifacts is not supported yet. Generated code needs to be copied to files at the appropriate location.
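To make the entity-definition steps (steps 2 and 3 below) more concrete, here is a minimal sketch of what the generated Product and Category JPA entities might look like. This is illustrative only, not the literal assistant output: the entity and field names are assumed, and the jakarta.persistence imports assume a Spring Boot 3.x project.

Java
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.JoinColumn;
import jakarta.persistence.ManyToOne;
import jakarta.persistence.OneToMany;
import java.math.BigDecimal;
import java.util.List;

// Category.java -- one category may have many products
@Entity
public class Category {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;

    @OneToMany(mappedBy = "category")
    private List<Product> products;

    // getters and setters omitted for brevity
}

// Product.java (separate file) -- each product belongs to a single category
@Entity
public class Product {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;
    private BigDecimal price;

    @ManyToOne
    @JoinColumn(name = "category_id")
    private Category category;

    // getters and setters omitted for brevity
}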
Building Catalog Service, Steps 2 and 3

Add the dependency for JPA, and define the Product and Category entities:

Building Catalog Service, Step 4

Create the repository interfaces:

Building Catalog Service, Step 5

Update the service layer:

Building Catalog Service, Steps 6 and 7

Update the controller and run the application:

Building Catalog Service, Additional Step: Postgres Database Specific

This step was not initially provided by Gemini Code Assist, but is part of an extended conversation/prompt by the developer. Some idiosyncrasies had to be corrected before using the generated scripts; for example, the Postgres database name cannot contain hyphens.

Building Through Gemini Code Assist vs. Code Generators

A counterargument to using Gemini Code Assist can be that a seasoned developer without Gemini Code Assist may be able to generate scaffolding code with JPA entities quickly, based on their past experience and familiarity with an existing codebase, using tools such as Spring Roo, JHipster, etc. However, there may be a learning curve, configuration, or approvals required before such tools can be used in an enterprise setup. The ease of use of Gemini Code Assist and its flexibility to cater to diverse use cases across domains make it a viable option even for a seasoned developer, and it can, in fact, complement code-gen tools and be leveraged as the next step after initial scaffolding.

2. Build and Augment

Now let's move to the second stage, Build and Augment, and evolve the product catalog service further by adding, updating, and deleting products generated through prompts. Generate a method to save the product by specifying comments at the service layer. Along similar lines to the product-catalog service, we created a Recommendation Service. Each of the steps can be drilled down further as we did during the product-catalog service creation.

Now, let's add some business logic by adding a comment and using Gemini Code Assist Smart Actions to generate code. Code suggestions can be generated not only from a comment; Gemini Code Assist is also intelligent enough to provide suggestions dynamically based on the developer's keyboard inputs and intent. Re-clicking Smart Actions can give multiple options for code. Another interactive option to generate code is the Gemini Code Assist chat feature.

Let's now try to change existing business logic. Say we want to return a map of successful and failed product lists instead of a single list, to discern which products were processed successfully and which ones failed. Let's try to improve the existing method with an async implementation using Gemini Code Assist. Next, let's try to refactor existing code by applying a strategy pattern through Gemini Code Assist.

Note: The suggested code builds a PricingStrategy for Shops, e.g., RandomPricing and ProductLength pricing. But this is still too much boilerplate code, so a developer, based on their experience, should probe further with prompts to reduce the boilerplate. Let's try to reduce the boilerplate code through Gemini Code Assist.

Note: Based on the input prompt, the suggestion is to modify the constructor of the Shop class to accept an additional function parameter for the pricing strategy using lambdas. Dynamic behavior can then be passed during the instantiation of Shop class objects.
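A minimal, hypothetical sketch of that suggestion follows (class, field, and method names are assumed and simplified from the generated code): the Shop constructor accepts the pricing behavior as a function, so no separate PricingStrategy class hierarchy is needed.

Java
import java.math.BigDecimal;
import java.util.function.Function;

// Shop.java -- pricing behavior is injected as a function, removing the
// boilerplate of a dedicated PricingStrategy interface and its implementations.
public class Shop {

    private final String name;
    private final Function<Product, BigDecimal> pricingStrategy;

    public Shop(String name, Function<Product, BigDecimal> pricingStrategy) {
        this.name = name;
        this.pricingStrategy = pricingStrategy;
    }

    public BigDecimal priceFor(Product product) {
        return pricingStrategy.apply(product);
    }

    public String getName() {
        return name;
    }
}

// Usage: dynamic behavior is passed as a lambda when instantiating Shop objects, e.g.,
// new Shop("random-pricing-shop", p -> BigDecimal.valueOf(Math.random() * 100));
// new Shop("length-pricing-shop", p -> BigDecimal.valueOf(p.getName().length() * 2L));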
3. Testing and Documentation

Now, let's move to stage 3, testing and documentation, and probe Gemini Code Assist on how to test the endpoint. As per the response, Postman, curl, unit tests, and integration tests are some options for testing provided by Gemini Code Assist. Now, let's generate the payload from Gemini Code Assist to test the /bulk endpoint via Postman. Let's see how effective the Gemini Code Assist-generated payloads are by hitting the /bulk endpoint. Let's see if we can fix it with Gemini Code Assist so that invalid category IDs can be handled during product creation.

Next, let's generate OpenAPI specifications for our microservices using Gemini Code Assist.

Note: Documenting APIs so that it becomes easy for API consumers to call and integrate these API(s) into their applications is a common requirement in microservices projects. However, it is often a time-consuming activity for developers. Swagger/OpenAPI specs are a common format followed to document REST APIs. Gemini Code Assist generated OpenAPI specs that matched the expectations in this regard.

Next, we are generating unit test cases at the controller layer. Following a similar approach, unit test cases can be generated at other layers, i.e., service and repository, too. Next, we ran the generated unit test cases and checked if we encountered any errors.

4. Troubleshooting

While running this application, we encountered an error caused by a mismatch between the table name and the entity name, which we were able to rectify with Gemini Code Assist's help. Next, we encountered empty results on the get-products call when data existed in the products table. To overcome this issue, we included Lombok dependencies for the missing getters and setters.

Debugging: An Old Friend to the Developer’s Rescue

The debugging skill of the developer will be handy, as there will be situations where the results of generated code are not as expected, or the assistant hallucinates. We noted that a developer needs to be aware of concepts such as marshalling, unmarshalling, and annotations such as @RequestBody to troubleshoot such issues and then get more relevant answers from Gemini Code Assist. This is where a sound development background will come in handy. An interesting exploration in this area could be whether code assist tools can learn and be trained on issues that other developers in an enterprise have encountered while implementing similar coding patterns. The API call to create a new product finally worked after incorporating the suggestion of adding @RequestBody.

Handling exceptions in a consistent manner is a standard requirement for all enterprise projects. Create a new package for exceptions, a base class to extend, and other steps to implement custom exceptions. Gemini Code Assist does a good job of meeting this requirement, including handling specific exceptions such as "ProductNotFound" (a minimal sketch of this pattern follows after the conclusion below).

Part 1 Conclusion

This concludes Part 1 of the article. In Part 2, I will cover the impact of Gemini Code Assist on the remaining lifecycle stages, Deployment and Operate, as well as productivity improvements in the different development lifecycle stages and the next steps prescribed thereof.
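To round off the exception-handling discussion above, here is a minimal, hypothetical sketch (package, class, and handler names are assumed, not the assistant's literal output) of a custom ProductNotFoundException with a centralized Spring handler:

Java
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

// ProductNotFoundException.java -- a domain-specific exception extending a runtime base class
public class ProductNotFoundException extends RuntimeException {
    public ProductNotFoundException(Long productId) {
        super("Product not found: " + productId);
    }
}

// GlobalExceptionHandler.java (separate file) -- maps exceptions to consistent HTTP responses
@RestControllerAdvice
public class GlobalExceptionHandler {

    @ExceptionHandler(ProductNotFoundException.class)
    public ResponseEntity<String> handleProductNotFound(ProductNotFoundException ex) {
        return ResponseEntity.status(HttpStatus.NOT_FOUND).body(ex.getMessage());
    }
}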
Our world is undergoing an AI revolution powered by very deep neural networks. With the advent of Apple Intelligence and Gemini, AI has reached the hands of every human being with a mobile phone. Apart from consumer AI, we also have deep learning models being used in several industries, such as automotive, finance, medical science, and manufacturing. This has motivated many engineers to learn deep learning techniques and apply them to solve complex problems in their projects. In order to help these engineers, it becomes imperative to lay down certain guiding principles to prevent common pitfalls when building these black-box models.

Any deep learning project involves five basic elements: data, model architecture, loss functions, optimizer, and evaluation process. It is critical to design and configure each of these appropriately to ensure proper convergence of models. This article covers some of the recommended practices and common problems and their solutions associated with each of these elements.

Data

All deep learning models are data-hungry and require several thousands of examples at a minimum to reach their full potential. To begin with, it is important to identify the different sources of data and devise a proper mechanism for selecting and labeling data if required. It helps to build some heuristics for data selection and to give careful consideration to balancing the data to prevent unintentional biases. For instance, if we are building an application for face detection, it is important to ensure that there is no racial or gender bias in the data, and that the data is captured under different environmental conditions to ensure model robustness. Data augmentations for brightness, contrast, lighting conditions, random crop, and random flip also help to ensure proper data coverage.

The next step is to carefully split the data into train, validation, and test sets while ensuring that there is no data leakage. The data splits should have similar data distributions, but identical or very closely related samples should not be present in both the train and test sets. This is important: if training samples are present in the test set, we may see high test performance metrics yet still face several unexplained critical issues in production. Also, data leakage makes it almost impossible to know whether alternate ideas for model improvement are bringing about any real improvement or not. Thus, a diverse, leak-proof, balanced test dataset representative of the production environment is your best safeguard to deliver a robust deep learning-based model and product.

Model Architecture

In order to get started with model design, it makes sense to first identify the latency and performance requirements of the task at hand. Then, one can look at open-source benchmarks like this one to identify some suitable papers to work with. Whether we use CNNs or transformers, it helps to have some pre-trained weights to start with, to reduce training time. If no pre-trained weights are available, then suitable model initialization for each model layer is important to ensure that the model converges in a reasonable time. Also, if the dataset available is quite small (a few hundred samples or less), then it doesn’t make sense to train the whole model; rather, just the last few task-specific layers should be fine-tuned. Now, whether to use CNNs, transformers, or a combination of them is very specific to the problem. For natural language processing, transformers have been established as the best choice.
For vision, if the latency budget is very tight, CNNs are still the better choice; otherwise, both CNNs and transformers should be experimented with to get the desired results.

Loss Functions

The most popular loss function for classification tasks is the cross-entropy loss, and for regression tasks, the L1 or L2 (MSE) losses. However, there are certain variations of them available for numerical stability during model training. For instance, in PyTorch, BCEWithLogitsLoss combines the sigmoid layer and BCELoss into a single class and uses the log-sum-exp trick, which makes it more numerically stable than a sigmoid layer followed by BCELoss. Another example is SmoothL1Loss, which can be seen as a combination of L1 and L2 loss and makes the L1 loss smooth near zero. However, care must be taken when using the Smooth L1 loss to set beta appropriately, as its default value of 1.0 may not be suitable for regressing values in sine and cosine domains. The figures below show the loss values for L1, L2 (MSE), and Smooth L1 losses, and also the change in the Smooth L1 loss value for different beta values.
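For reference (the figures from the original article are not reproduced here), the PyTorch-style Smooth L1 loss with threshold beta discussed above can be written as:

LaTeX
\[
\mathrm{SmoothL1}_{\beta}(x, y) =
\begin{cases}
\dfrac{0.5\,(x - y)^{2}}{\beta}, & \text{if } |x - y| < \beta \\[6pt]
|x - y| - 0.5\,\beta, & \text{otherwise}
\end{cases}
\]

As beta approaches 0, this tends toward the plain L1 loss. For targets bounded in [-1, 1], such as sine and cosine values, the default beta of 1.0 keeps the loss almost entirely in its quadratic (L2-like) region, which is one reason that default may not be suitable in those domains.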
Optimizer

Stochastic gradient descent with momentum has traditionally been a very popular optimizer among researchers for most problems. However, in practice, Adam is generally easier to use but suffers from generalization problems. Transformer papers have popularized the AdamW optimizer, which decouples the choice of the weight-decay factor from the learning rate and significantly improves the generalization ability of the Adam optimizer. This has made AdamW the preferred choice of optimizer these days. Also, it isn’t necessary to use the same learning rate for the whole network. Generally, if starting from a pre-trained checkpoint, it is better to freeze or keep a low learning rate for the initial layers and a higher learning rate for the deeper, task-specific layers.

Evaluation and Generalization

Developing a proper framework for evaluating the model is the key to preventing issues in production. This should involve both quantitative and qualitative metrics, not only for the full benchmark dataset but also for specific scenarios. This should be done to ensure that performance is acceptable in every scenario and there is no regression. Performance metrics should be carefully chosen to ensure that they appropriately represent the task to be achieved. For example, precision/recall or F1 score may be better than accuracy in many unbalanced problems. At times, we may have several metrics to compare alternate models; then it generally helps to come up with a single weighted metric that can simplify the comparison process. For instance, the nuScenes dataset introduced NDS (nuScenes detection score), which is a weighted sum of mAP (mean average precision), mATE (mean average translation error), mASE (mean average scale error), mAOE (mean average orientation error), mAVE (mean average velocity error), and mAAE (mean average attribute error), to simplify the comparison of various 3D object detection models.

Further, one should also visualize the model outputs whenever possible. This could involve drawing bounding boxes on input images for 2D object detection models or plotting cuboids on LIDAR point clouds for 3D object detection models. This manual verification ensures that model outputs are reasonable and that there is no apparent pattern in model errors. Additionally, it helps to pay close attention to the training and validation loss curves to check for overfitting or underfitting.

Overfitting is a problem wherein the validation loss diverges from the training loss and starts increasing, indicating that the model is not generalizing well. This problem can generally be fixed by adding proper regularization such as weight decay or dropout layers, adding more data augmentation, or using early stopping. Underfitting, on the other hand, represents the case where the model doesn’t have enough capacity to even fit the training data. This can be identified by the training loss not going down enough and/or remaining more or less flat over the epochs. This problem can be addressed by adding more layers to the model, reducing data augmentations, or selecting a different model architecture. The figures below show examples of overfitting and underfitting through the loss curves.

The Deep Learning Journey

Unlike traditional software engineering, deep learning is more experimental and requires careful tuning of hyper-parameters. However, if the fundamentals mentioned above are taken care of, this process can be more manageable. Since the models are black boxes, we have to leverage the loss curves, output visualizations, and performance metrics to understand model behavior and correspondingly take corrective measures. Hopefully, this guide can make your deep learning journey a little less taxing.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Database Systems: Modernization for Data-Driven Architectures. Modern database practices enhance performance, scalability, and flexibility while ensuring data integrity, consistency, and security. Some key practices include leveraging distributed databases for scalability and reliability, using cloud databases for on-demand scalability and maintenance, and implementing NoSQL databases for handling unstructured data. Additionally, data lakes store vast amounts of raw data for advanced analytics, and in-memory databases speed up data retrieval by storing data in main memory. The advent of artificial intelligence (AI) is rapidly transforming database development and maintenance by automating complex tasks, enhancing efficiency, and ensuring system robustness. This article explores how AI can revolutionize development and maintenance through automation, best practices, and AI technology integration. The article also addresses the data foundation for real-time AI applications, offering insights into database selection and architecture patterns to ensure low latency, resiliency, and high-performance systems. How Generative AI Enables Database Development and Maintenance Tasks Using generative AI (GenAI) for database development can significantly enhance productivity and accuracy by automating key tasks, such as schema design, query generation, and data cleaning. It can generate optimized database structures, assist in writing and optimizing complex queries, and ensure high-quality data with minimal manual intervention. Additionally, AI can monitor performance and suggest tuning adjustments, making database development and maintenance more efficient. Generative AI and Database Development Let's review how GenAI can assist some key database development tasks: Requirement analysis. The components that need additions and modifications for each database change request are documented. Utilizing the document, GenAI can help identify conflicts between change requirements, which will help in efficient planning for implementing change requests across dev, QA, and prod environments. Database design. GenAI can help develop the database design blueprint based on the best practices for normalization, denormalization, or one big table design. The design phase is critical and establishing a robust design based on best practices can prevent costly redesigns in the future. Schema creation and management. GenAI can generate optimized database schemas based on initial requirements, ensuring best practices are followed based on normalization levels and partition and index requirements, thus reducing design time. Packages, procedures, and functions creation. GenAI can help optimize the packages, procedures, and functions based on the volume of data that is processed, idempotency, and data caching requirements. Query writing and optimization. GenAI can assist in writing and optimizing complex SQL queries, reducing errors, and improving execution speed by analyzing data structures based on data access costs and available metadata. Data cleaning and transformation. GenAI can identify and correct anomalies, ensuring high-quality data with minimal manual intervention from database developers. Generative AI and Database Maintenance Database maintenance to ensure efficiency and security is crucial to a database administrator's (DBA) role. 
Here are some ways that GenAI can assist critical database maintenance tasks:

Backup and recovery. AI can automate backup schedules, monitor backup processes, and predict potential failures. GenAI can generate scripts for recovery scenarios and simulate recovery processes to test their effectiveness.

Performance tuning. AI can analyze query performance data, suggest optimizations, and generate indexing strategies based on access paths and cost optimizations. It can also predict query performance issues based on historical data and recommend configuration changes.

Security management. AI can identify security vulnerabilities, suggest best practices for permissions and encryption, generate audit reports, monitor unusual activities, and create alerts for potential security breaches.

Database monitoring and troubleshooting. AI can provide real-time monitoring, anomaly detection, and predictive analytics. It can also generate detailed diagnostic reports and recommend corrective actions.

Patch management and upgrades. AI can recommend optimal patching schedules, generate patch impact analysis reports, and automate patch testing in a sandbox environment before patches are applied to production.

Enterprise RAG for Database Development

Retrieval augmented generation (RAG) helps in schema design, query optimization, data modeling, indexing strategies, performance tuning, security practices, and backup and recovery plans. RAG improves efficiency and effectiveness by retrieving best practices and generating customized, context-aware recommendations and automated solutions. Implementing RAG involves building a knowledge base, developing retrieval mechanisms, integrating generation models, and establishing a feedback loop. To ensure efficient, scalable, and maintainable database systems, RAG aids in avoiding mistakes by recommending proper schema normalization, balanced indexing, efficient transaction management, and externalized configurations.

RAG Pipeline

When a user query or prompt is input into the RAG system, it first interprets the query to understand what information is being sought. Based on the query, the system searches a vast database or document store for relevant information. This is typically accomplished using vector embeddings, where both the query and the documents are converted into vectors in a high-dimensional space, and similarity measures are used to retrieve the most relevant documents. The retrieved information, along with the original query, is fed into a language model. This model uses both the input query and the context provided by the retrieved documents to generate a more informed, accurate, and relevant response or output.

Figure 1. Simple RAG pipeline

Vector Databases for RAG

Vector databases are tailored for high-dimensional vector operations, making them perfect for similarity searches in AI applications. Non-vector databases, however, manage transactional data and complex queries across structured, semi-structured, and unstructured data formats. The table below outlines the key differences between vector and non-vector databases:

Table 1. Vector databases vs. non-vector databases
| Feature | Vector Databases | Non-Vector Databases |
|---|---|---|
| Primary use case | Similarity search, machine learning, AI | Transactional data, structured queries |
| Data structure | High-dimensional vectors | Structured data (tables), semi-structured data (JSON), unstructured data (documents) |
| Indexing | Specialized indexes for vector data | Traditional indexes (B-tree, hash) |
| Storage | Vector embeddings | Rows, documents, key-value pairs |
| Query types | k-NN (k-nearest neighbors), similarity search | CRUD operations, complex queries (joins, aggregations) |
| Performance optimization | Optimized for high-dimensional vector operations | Optimized for read/write operations and complex queries |
| Data retrieval | Nearest neighbor search, approximate nearest neighbor (ANN) search | SQL queries, NoSQL queries |

When taking the vector database route, choosing a suitable vector database involves evaluating data compatibility, performance, scalability, integration capabilities, operational considerations, cost, security, features, community support, and vendor stability. By carefully assessing these aspects, one can select a vector database that meets the application's requirements and supports its growth and performance objectives.

Vector Databases for RAG

Several vector databases in the industry are commonly used for RAG, each offering unique features to support efficient vector storage, retrieval, and integration with AI workflows: Qdrant and Chroma are powerful vector databases designed to handle high-dimensional vector data, which is essential for modern AI and machine learning tasks. Milvus, an open-source and highly scalable database, supports various vector index types and is used for video/image retrieval and large-scale recommendation systems. Faiss, a library for efficient similarity search, is widely used for large-scale similarity search and AI inference due to its high efficiency and support for various indexing methods. These databases are chosen based on specific use cases, performance requirements, and ecosystem compatibility.

Vector Embeddings

Vector embeddings can be created for diverse content types, such as data architecture blueprints, database documents, podcasts on vector database selection, and videos on database best practices for use in RAG. A unified, searchable knowledge base can be constructed by converting these varied forms of information into high-dimensional vector representations. This enables efficient and context-aware retrieval of relevant information across different media formats, enhancing the ability to provide precise recommendations, generate optimized solutions, and support comprehensive decision-making processes in database development and maintenance.

Figure 2. Vector embeddings

Vector Search and Retrieval

Vector search and retrieval in RAG involve converting diverse data types (e.g., text, images, audio) into high-dimensional vector embeddings using machine learning models. These embeddings are indexed using techniques like hierarchical navigable small world (HNSW) or ANN to enable efficient similarity searches. When a query is made, it is also converted into a vector embedding and compared against the indexed vectors using distance metrics, such as cosine similarity or Euclidean distance, to retrieve the most relevant data. This retrieved information is then used to augment the generation process, providing context and improving the relevance and accuracy of the generated output.
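To illustrate the retrieval step described above, here is a minimal Java sketch (Java 17 syntax; illustrative only, since a real RAG system would delegate this to a vector database with HNSW/ANN indexes rather than a linear scan) that ranks stored embeddings by cosine similarity against a query embedding:

Java
import java.util.Comparator;
import java.util.List;

public class SimilaritySearch {

    // A stored chunk of content together with its embedding vector
    public record Document(String text, float[] embedding) {}

    // Cosine similarity between two vectors of equal length
    static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the k documents most similar to the query embedding (linear scan)
    static List<Document> topK(float[] queryEmbedding, List<Document> corpus, int k) {
        return corpus.stream()
                .sorted(Comparator.comparingDouble(
                        (Document d) -> cosineSimilarity(queryEmbedding, d.embedding())).reversed())
                .limit(k)
                .toList();
    }
}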
Vector search and retrieval are highly effective for applications such as semantic search, where queries are matched to similar content, and recommendation systems, where user preferences are compared to similar items to suggest relevant options. They are also used in content generation, where the most appropriate information is retrieved to enhance the accuracy and context of the generated output.

LLMOps for AI-Powered Database Development

Large language model operations (LLMOps) for AI-powered database development leverages foundational and fine-tuned models, effective prompt management, and model observability to optimize performance and ensure reliability. These practices enhance the accuracy and efficiency of AI applications, making them well suited for diverse, domain-specific, and robust database development and maintenance tasks.

Foundational Models and Fine-Tuned Models

Leveraging large, pre-trained GenAI models offers a solid base for developing specialized applications because of their training on diverse datasets. Domain adaptation involves additional training of these foundational models on domain-specific data, increasing their relevance and accuracy in fields such as finance and healthcare. A small language model is designed for computational efficiency, featuring fewer parameters and a smaller architecture compared to large language models (LLMs). Small language models aim to balance performance with resource usage, making them ideal for applications with limited computational power or memory. Fine-tuning these smaller models on specific datasets enhances their performance for particular tasks while maintaining computational efficiency and keeping them up to date. Custom deployment of fine-tuned small language models ensures they operate effectively within existing infrastructure and meet specific business needs.

Prompt Management

Effective prompt management is crucial for optimizing the performance of LLMs. This includes using various prompt types, such as zero-shot, single-shot, few-shot, and many-shot learning, to customize responses based on the examples provided. Prompts should be clear, concise, relevant, and specific to enhance output quality. Advanced techniques such as recursive prompts and explicit constraints help ensure consistency and accuracy. Methods like chain-of-thought (CoT) prompts, sentiment directives, and directional stimulus prompting (DSP) guide the model toward more nuanced and context-aware responses. Prompt templating standardizes the approach, ensuring reliable and coherent results across tasks. Template creation involves designing prompts tailored to different analytical tasks, while version control manages updates systematically using tools like Codeberg. Continuous testing and refining of prompt templates further improve the quality and relevance of generated outputs.

Model Observability

Model observability ensures models function optimally through real-time monitoring, anomaly detection, performance optimization, and proactive maintenance. By enhancing debugging, ensuring transparency, and enabling continuous improvement, model observability improves AI systems' reliability, efficiency, and accountability, reducing operational risks and increasing trust in AI-driven applications. It encompasses synchronous and asynchronous methods to ensure the models function as intended and deliver reliable outputs.
Generative AI-Enabled Synchronous Observability and AI-Enabled Asynchronous Data Observability

Using AI for synchronous and asynchronous data observability in database development and maintenance enhances real-time and historical monitoring capabilities. Synchronous observability provides real-time insights and alerts on database metrics, enabling immediate detection of and response to anomalies. Asynchronous observability leverages AI to analyze historical data, identify long-term trends, and predict potential issues, thus facilitating proactive maintenance and deep diagnostics. Together, these approaches ensure robust performance, reliability, and efficiency in database operations.

Figure 3. LLMOps for model observability and database development

Conclusion

Integrating AI into database development and maintenance drives efficiency, accuracy, and scalability by automating tasks and enhancing productivity. In particular: Enterprise RAG, supported by vector databases and LLMOps, further optimizes database management through best practices. Data observability ensures comprehensive monitoring, enabling proactive and real-time responsiveness. Establishing a robust data foundation is crucial for real-time AI applications, ensuring systems meet real-time demands effectively. Integrating generative AI into data architectures and database selections, analytics layer building, data cataloging, data fabric, and data mesh development will increase automation and optimization, leading to more efficient and accurate data analytics.

The benefits of leveraging AI in database development and maintenance will allow organizations to continuously improve performance and their databases' reliability, thus increasing their value and standing in the industry.

Additional resources: Getting Started With Vector Databases by Miguel Garcia, DZone Refcard; Getting Started With Large Language Models by Tuhin Chattopadhyay, DZone Refcard.

This is an excerpt from DZone's 2024 Trend Report, Database Systems: Modernization for Data-Driven Architectures.
During my 10+ years of experience in agile product development, I have seen the difficulties of meeting the rapid requirements of the digital market. Manual procedures can slow down highly flexible software engineering and delivery teams, resulting in missed opportunities and delayed launches. With AI and large language models (LLMs) becoming more prevalent, we are on the verge of a major change. Gartner points out a 25% increase in project success rates for those using predictive analytics (Gartner, 2021). These technologies are changing the way agile product development is optimized: by automating tasks, improving decision-making, and forecasting future trends. As stated in a report from McKinsey, companies using AI experience a 20% decrease in project costs (McKinsey & Company, 2023). In this article, I discuss how agile product development, including product experiences and user journeys, can be improved with AI and LLM integrations across the development lifecycle.

Also Read: "The Foundation of AI and Analytics Success: Why Architecture Matters"

AI and LLM Integration Phases for Agile Product Development

Automating User Story Generation

Creating user stories is crucial for agile development, although it can be time-consuming. LLMs such as OpenAI's GPT-4, for example, are able to streamline the process by creating comprehensive user stories from available documentation and feedback. This speeds up the process while also enhancing precision and relevance.

Application Scenario

For example, I focus on utilizing AI- or LLM-based methods for streamlining, optimizing, and automating the creation of user stories. Integrating such methods with a comprehensive backlog has allowed me to improve product development lifecycles and engineering prioritization. This significantly reduces user story creation time, is also helpful for solutions architects, and increases user satisfaction through more relevant and accurate feature development.

Significance and Advantages

The automation of generating user stories is essential as it reduces the monotonous job of creating stories by hand, enabling product managers and software engineers to concentrate on more strategic tasks. This process guarantees that user stories are created uniformly and in line with user requirements, resulting in improved prioritization and quicker development cycles, assisting agile teams in sustaining their progress and releasing features that better align with user needs. Additionally, organizations that adopt AI for generating user stories usually see a 50% reduction in story creation time (Menzies & Zimmermann, 2022).

Also Read: "User Story Reflections"

Optimizing Backlog Prioritization

Effective prioritization of the backlog is key to swift value delivery. AI algorithms analyze user feedback, market trends, and technical dependencies to forecast the most valuable features. This data-driven approach assists product managers in making well-informed choices.

Application Scenario

For example, during the development of a digital healthcare consumer platform, I utilized AI tools to review user feedback and determine which backlog items to focus on first. This was mapped across different prioritization techniques, as well as how engineering would execute them based on complexity. As a result, there was a 40% rise in feature utilization and a 20% decrease in feature development duration, which also helped the software engineering team improve their metrics.
Significance and Advantages It is crucial to optimize backlog prioritization in order to make informed decisions that improve the value of the product and customer satisfaction. Utilizing AI for prioritization aids agile teams in determining which features will yield the greatest benefit, enabling them to utilize resources effectively and concentrate on tasks with significant impact. Companies that have implemented AI for prioritizing their backlog have seen a 40% growth in feature adoption (Buch & Pokiya, 2020). Leveraging Predictive Analytics Predictive analytics offers insight to help shape development tactics. AI models can predict risks and estimate delivery times by examining historical data, helping teams address issues and align development efforts with market changes. Further, this can help agile product development teams assess how to staff across sprints and ensure workforce optimization to improve feature velocity. Application Scenario For example, I use predictive analytics in collaboration with engineering development and delivery teams to predict how new features would affect sprint planning, sprint allocation, and user engagement. The information assisted in determining which updates were most important and needed execution in upcoming sprints, and it has allowed me to optimize MVPs, resulting in a ~25% rise in user retention and a ~15% increase in new user acquisition across two different products. Significance and Advantages Predictive analytics offers practical insights that steer strategic choices in agile product development. Teams can prioritize new features that will have the greatest impact on user engagement and retention by predicting their effects. Businesses that use predictive analytics have observed a 25% rise in customer retention (Forrester, 2019). Improving Product Experiences and User Journeys AI and LLMs improve user journeys and product experiences through a more user-focused approach to development. Automated creation of user stories guarantees that features are developed according to genuine user requirements, resulting in products that are more intuitive and engaging. This alignment improves user satisfaction and involvement by customizing features to meet specific needs and desires. Use Case For example, I used LLMs to analyze user feedback and create features that directly addressed user pain points. This resulted in streamlining and optimizing how different product features are lined up along with tech debt for engineering execution. I have seen a ~35% increase in user engagement and a significant reduction in user churn rates. Significance and Advantages Improving product experiences and user journeys with AI and LLMs ensures a user-focused approach in product development, resulting in more user-friendly and personalized experiences. Aligning with user needs not only boosts satisfaction but also enhances engagement and retention. After incorporating AI-driven improvements, companies have experienced a 35% rise in user engagement (Ransbotham, Kiron, Gerbert, & Reeves, 2018). Supporting Agile Product Development and Product Management Incorporating AI and LLMs into agile product development changes how teams tackle and carry out projects, providing numerous advantages. To begin with, these technologies simplify the process of developing user stories, cutting down on manual work and allowing more time for strategic duties. This results in enhanced precision and relevance in feature advancement.
Also, by using AI to prioritize the backlog, teams can concentrate on important tasks, leading to better use of resources and increased overall productivity. Predictive analytics enhances value by predicting feature performance, allowing teams to make educated decisions that increase user retention and engagement. From my own experience, I've noticed that these advancements not only speed up the development process but also make products better suited to user requirements, resulting in a more agile and adaptable development setting. The integration of AI in agile product development leads to improved product management, faster iterations, and enhanced user experience. For example, the global AI-assisted custom application development market is expected to grow to as much as $61B and from 21% to 28% by 2024 (Deloitte Insights, 2020). As a product manager working across multiple software engineering teams, I have found that AI and LLMs simplify decision-making by automating routine tasks and providing actionable insights. Automated user story generation and backlog prioritization free up time to focus on strategic aspects, while predictive analytics offers data-driven forecasts and trend analysis. This results in a more agile and responsive product management process, where decisions are guided by comprehensive data and real-time insights, ultimately leading to more successful product outcomes and better market alignment. Benefits of AI and LLMs for Agile Product Development Conclusion and Next Steps The incorporation of AI and LLMs in agile product development marks a dynamic shift. In my opinion, these tools have revolutionized the way tasks are done by automating them, streamlining processes, and forecasting trends accurately. They have made workflows more efficient and enhanced product experiences, resulting in more agile and responsive development cycles. As we further adopt and improve these technologies, I look forward to witnessing how their developing abilities will continue to change our strategy for creating and providing outstanding products. The process of incorporating AI and LLMs into agile product development methods is indeed exciting and filled with potential. Key Takeaways Start using AI and LLM tools to automate and improve the generation of user stories and prioritize backlogs in your development processes (a brief sketch follows below). Utilize predictive analytics: Employ predictive analytics to gain insight into potential project risks and market trends, enabling proactive modifications. Prioritize user-centric development: Utilize AI-generated insights to enhance product experiences for better user satisfaction and retention.
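To make the first takeaway concrete, here is a minimal sketch of automated user story generation, assuming the OpenAI Python SDK and a hypothetical list of raw feedback snippets; the model name, prompt wording, and helper function are illustrative, not the exact setup used in the scenarios above.
Python
# Minimal sketch (assumptions noted): draft agile user stories from raw feedback
# with an LLM. Assumes the OpenAI Python SDK and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

feedback_snippets = [
    "I can't filter meal plans by allergy.",
    "Exporting my sprint report takes too many clicks.",
]

def draft_user_stories(snippets):
    # Build one prompt asking for stories in the standard "As a ... I want ..." format.
    prompt = (
        "Turn each piece of user feedback into an agile user story using the format: "
        "As a <role>, I want <goal>, so that <benefit>. "
        "Add three acceptance criteria per story.\n\n" + "\n".join(snippets)
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model could be used here
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(draft_user_stories(feedback_snippets))
In practice, the drafted stories would land in the backlog tool for a product manager to review and refine rather than being accepted automatically.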
Software development and architecture are continuously evolving with artificial intelligence (AI). AI-assisted code generation stands out as a particularly revolutionary advancement, offering developers the ability to create high-quality code more efficiently and accurately than ever before. This innovation not only enhances productivity, but also opens the door to new possibilities in software creation, particularly in the realm of microservices development. The Evolution of Code Generation: Traditional Coding vs. AI-Assisted Coding Traditional coding requires developers to write and test extensive lines of code manually. This process is time-consuming and prone to errors. Conversely, AI-assisted code generation leverages machine learning algorithms to analyze patterns in existing codebases, understand programming logic, and generate code snippets or entire programs based on specific requirements. This technology can drastically reduce the time spent on repetitive coding tasks and minimize human errors. It is not a substitute for developers, but rather a productivity tool that eliminates tedious and monotonous infrastructure and plumbing code. Benefits of AI-Assisted Code Generation Below is a list of some of the key benefits of leveraging AI-assisted code generation. Increased Efficiency: AI can quickly generate code, which allows developers to focus on more complex and creative aspects of software development. This leads to faster project completion times and the ability to tackle more projects simultaneously. Improved Code Quality: By learning from vast datasets of existing code, AI can produce high-quality code that adheres to best practices and industry standards. This results in more robust and maintainable software. Enhanced Collaboration: AI tools can bridge the gap between different development teams by providing consistent code styles and standards. This facilitates better collaboration and smoother integration of different software components. Rapid Prototyping: With AI-assisted code generation, developers can quickly create prototypes to test new ideas and functionalities. This accelerates the innovation cycle and helps bring new products to market faster. The Relationship Between AI and Microservices Microservices architecture has gained popularity in recent years because of its ability to break down complex applications into smaller, manageable services. Each service can be developed, deployed, and scaled independently, offering greater flexibility and resilience than a monolithic architecture. AI-assisted code generation is particularly well-suited for creating microservices, as it can handle the intricacies of defining and managing numerous small, interconnected services. A Platform for AI-Generated Microservices One example of AI in practice is ServiceBricks, an open-source platform that uses AI to generate microservices. Users provide human-readable text, which the AI then converts into fully functional microservices, including REST APIs for create, update, delete, get, and query operations. The platform also generates DTO models, source code, project files, class files, unit tests, and integration tests, thereby automating parts of the development process and reducing the time and effort needed to build scalable, maintainable microservices. The Future of AI-Assisted Development As AI technology continues to advance, its role in software development will only expand.
Future iterations of AI-assisted code generation tools will likely become even more intuitive and capable, handling more complex programming tasks and integrating seamlessly with various development environments. The ultimate goal is to create a synergistic relationship between human developers and AI, where each leverages their strengths to produce superior software solutions. Conclusion AI-assisted code generation is transforming software development by enhancing efficiency, code quality, and innovation. This technology is reshaping how microservices and other essential components are developed, paving the way for greater productivity and creativity. As AI technology continues to evolve, it holds the potential to drive further advancements in software development, enabling developers to reach new heights in excellence and innovation worldwide.
Goal of This Application In this article, we will build an advanced data model and use it for ingestion and various search options. For the notebook portion, we will run a hybrid multi-vector search, re-rank the results, and display the resulting text and images. Ingest data fields, enrich data with lookups, and format: Learn to ingest data including JSON and images, then format and transform it to optimize hybrid searches. This is done inside the streetcams.py application. Store data into Milvus: Learn to store data in Milvus, an efficient vector database designed for high-speed similarity searches and AI applications. In this step, we are optimizing the data model with scalar and multiple vector fields — one for text and one for the camera image. We do this in the streetcams.py application. Use open source models for data queries in a hybrid multi-modal, multi-vector search: Discover how to use scalars and multiple vectors to query data stored in Milvus and re-rank the final results in this notebook. Display resulting text and images: Build a quick output for validation and checking in this notebook. Simple Retrieval-Augmented Generation (RAG) with LangChain: Build a simple Python RAG application (streetcamrag.py) that uses Milvus for asking about the current weather via Ollama. While outputting to the screen, we also send the results to Slack formatted as Markdown. Summary By the end of this application, you'll have a comprehensive understanding of using Milvus, ingesting semi-structured and unstructured data, and using open source models to build a robust and efficient data retrieval system. For future enhancements, we can use these results to build prompts for LLMs and Slack bots, stream data to Apache Kafka, and power a street camera search engine. Milvus: Open Source Vector Database Built for Scale Milvus is a popular open-source vector database that powers applications with highly performant and scalable vector similarity searches. Milvus has a distributed architecture that separates compute and storage, and distributes data and workloads across multiple nodes. This is one of the primary reasons Milvus is highly available and resilient. Milvus is optimized for various hardware and supports a large number of indexes. You can get more details in the Milvus Quickstart. For other options for running Milvus, check out the deployment page. New York City 511 Data REST Feed of Street Camera information with latitude, longitude, roadway name, camera name, camera URL, disabled flag, and blocked flag: JSON
{
  "Latitude": 43.004452,
  "Longitude": -78.947479,
  "ID": "NYSDOT-badsfsfs3",
  "Name": "I-190 at Interchange 18B",
  "DirectionOfTravel": "Unknown",
  "RoadwayName": "I-190 Niagara Thruway",
  "Url": "https://nyimageurl",
  "VideoUrl": "https://camera:443/rtplive/dfdf/playlist.m3u8",
  "Disabled": true,
  "Blocked": false
}
We then ingest the image from the camera URL endpoint for the camera image. After we run it through Ultralytics YOLO, we will get a marked-up version of that camera image. NOAA Weather Current Conditions for Lat/Long We also ingest a REST feed for weather conditions matching the latitude and longitude passed in from the camera record that includes elevation, observation date, wind speed, wind direction, visibility, relative humidity, and temperature.
JSON "currentobservation":{ "id":"KLGA", "name":"New York, La Guardia Airport", "elev":"20", "latitude":"40.78", "longitude":"-73.88", "Date":"27 Aug 16:51 pm EDT", "Temp":"83", "Dewp":"60", "Relh":"46", "Winds":"14", "Windd":"150", "Gust":"NA", "Weather":"Partly Cloudy", "Weatherimage":"sct.png", "Visibility":"10.00", "Altimeter":"1017.1", "SLP":"30.04", "timezone":"EDT", "state":"NY", "WindChill":"NA" } Ingest and Enrichment We will ingest data from the NY REST feed in our Python loading script. In our streetcams.py Python script does our ingest, processing, and enrichment. We iterate through the JSON results from the REST call then enrich, update, run Yolo predict, then we run a NOAA Weather lookup on the latitude and longitude provided. Build a Milvus Data Schema We will name our collection: "nycstreetcameras". We add fields for metadata, a primary key, and vectors. We have a lot of varchar variables for things like roadwayname, county, and weathername. Python FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True), FieldSchema(name='latitude', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='longitude', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='name', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='roadwayname', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='directionoftravel', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='videourl', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='url', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='filepath', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='creationdate', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='areadescription', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='elevation', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='county', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='metar', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='weatherid', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='weathername', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='observationdate', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='temperature', dtype=DataType.FLOAT), FieldSchema(name='dewpoint', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='relativehumidity', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='windspeed', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='winddirection', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='gust', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='weather', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='visibility', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='altimeter', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='slp', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='timezone', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='state', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='windchill', dtype=DataType.VARCHAR, max_length=200), FieldSchema(name='weatherdetails', dtype=DataType.VARCHAR, max_length=8000), FieldSchema(name='image_vector', dtype=DataType.FLOAT_VECTOR, dim=512), FieldSchema(name='weather_text_vector', dtype=DataType.FLOAT_VECTOR, dim=384) The two vectors are image_vector and weather_text_vector, which contain an image vector and text vector. We add an index for the primary key id and for each vector. 
We have a lot of options for these indexes, and they can greatly improve performance. Insert Data Into Milvus We then do a simple insert into our collection with our scalar fields matching the schema name and type. We have to run an embedding function on our image and weather text before inserting. Once that is done, the record is inserted, and we can check our data with Attu. Building a Notebook for Report We will build a Jupyter notebook to query and report on our multi-vector dataset. Prepare Hugging Face Sentence Transformers for Embedding Sentence Text We utilize a model from Hugging Face, "all-MiniLM-L6-v2", a sentence transformer, to build our dense embedding for our short text strings. This text is a short description of the weather details for the nearest location to our street camera. See: Integrate with HuggingFace Prepare Embedding Model for Images We utilize a standard resnet34 PyTorch feature extractor that we often use for images. Instantiate Milvus As stated earlier, Milvus is a popular open-source vector database that powers AI applications with highly performant and scalable vector similarity search. For our example, we are connecting to Milvus running in Docker. Setting the URI as a local file, e.g., ./milvus.db, is the most convenient method, as it automatically utilizes Milvus Lite to store all data in this file. If you have a large scale of data, say more than a million vectors, you can set up a more performant Milvus server on Docker or Kubernetes. In this setup, please use the server URI, e.g., http://localhost:19530, as your uri. If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, adjust the URI and token, which correspond to the Public Endpoint and API key in Zilliz Cloud. Prepare Our Search We build two searches (AnnSearchRequest) and combine them into a hybrid search that includes a reranker (a standalone sketch of this step appears at the end of this article). Display Our Results We display the results of our re-ranked hybrid search of two vectors. We show some of the output scalar fields and an image we read from the stored path. The results from our hybrid search can be iterated, and we can easily access all the output fields we choose. filepath contains the link to the locally stored image and can be accessed from key.entity.filepath. The key contains all our results, while key.entity has all of our output fields chosen in our hybrid search in the previous step. We iterate through our re-ranked results and display the image and our weather details. RAG Application Since we have loaded a collection with weather data, we can use that as part of a Retrieval-Augmented Generation (RAG) application. We will build a completely open-source RAG application utilizing the local Ollama, LangChain, and Milvus. We set up our vector_store as Milvus with our collection. Python
vector_store = Milvus(
    embedding_function=embeddings,
    collection_name="CollectionName",
    primary_field="id",
    vector_field="weather_text_vector",
    text_field="weatherdetails",
    connection_args={"uri": "http://localhost:19530"},
)
We then connect to Ollama. Python
llm = Ollama(
    model="llama3",
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    stop=["<|eot_id|>"],
)
We prompt the user for a query. Python
query = input("\nQuery: ")
We set up a RetrievalQA connection between our LLM and our vector store. We pass in our query and get the result.
Python
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vector_store.as_retriever(collection=SC_COLLECTION_NAME),
)
result = qa_chain({"query": query})
resultforslack = str(result["result"])
We then post the results to a Slack channel. Python
response = client.chat_postMessage(
    channel="C06NE1FU6SE",
    text="",
    blocks=[
        {"type": "section",
         "text": {"type": "mrkdwn", "text": str(query) + " \n\n"}},
        {"type": "divider"},
        {"type": "section",
         "text": {"type": "mrkdwn", "text": str(resultforslack) + "\n"}},
    ],
)
Below is the output from our chat to Slack. You can find all the source code for the notebook, the ingest script, and the interactive RAG application in GitHub below. Source Code Conclusion In this notebook, you have seen how you can use Milvus to do a hybrid search on multiple vectors in the same collection and re-rank the results. You also saw how to build a complex data model that includes multiple vectors and many scalar fields representing a lot of metadata related to our data. You learned how to ingest JSON, images, and text into Milvus with Python. And finally, we built a small chat application to check the weather for locations near traffic cameras. To build your own applications, please check out the resources below. Resources In the following list, you can find resources helpful in learning more about using pre-trained embedding models for Milvus, performing searches on text data, and a great example notebook for embedding functions. Milvus Reranking Milvus Hybrid Search 511NY: GET api/GetCameras Using PyMilvus's Model To Generate Text Embeddings HuggingFace: sentence-transformers/all-MiniLM-L6-v2 Pretrained Models Milvus: SentenceTransformerEmbeddingFunction Vectorizing JSON Data with Milvus for Similarity Search Milvus: Scalar Index Milvus: In-memory Index Milvus: On-disk Index GPU Index Not Every Field is Just Text, Numbers, or Vectors How good is Quantization in Milvus?
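As a companion to the "Prepare Our Search" step described earlier, here is a minimal sketch of a two-vector hybrid search with pymilvus; the random stand-in query vectors, limits, and the RRF reranker choice are illustrative assumptions rather than the notebook's exact settings.
Python
# Minimal sketch (assumptions noted): hybrid search over the image and
# weather-text vectors in "nycstreetcameras", re-ranked with reciprocal rank fusion.
import numpy as np
from pymilvus import connections, Collection, AnnSearchRequest, RRFRanker

connections.connect(uri="http://localhost:19530")
collection = Collection("nycstreetcameras")

# Stand-ins for real embeddings: 512-dim image vector, 384-dim text vector.
query_image_vector = np.random.rand(512).tolist()
query_text_vector = np.random.rand(384).tolist()

image_req = AnnSearchRequest(
    data=[query_image_vector],
    anns_field="image_vector",
    param={"metric_type": "COSINE"},
    limit=5,
)
text_req = AnnSearchRequest(
    data=[query_text_vector],
    anns_field="weather_text_vector",
    param={"metric_type": "COSINE"},
    limit=5,
)

results = collection.hybrid_search(
    reqs=[image_req, text_req],
    rerank=RRFRanker(),
    limit=5,
    output_fields=["name", "roadwayname", "weatherdetails", "filepath"],
)

for hits in results:
    for hit in hits:
        print(hit.entity.get("name"), hit.entity.get("filepath"))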
Today, several significant and safety-critical decisions are being made by deep neural networks. These include driving decisions in autonomous vehicles, diagnosing diseases, and operating robots in manufacturing and construction. In all such cases, scientists and engineers claim that these models help make better decisions than humans and hence, help save lives. However, how these networks reach their decisions is often a mystery, not just for their users but also for their developers. These changing times, thus, necessitate that as engineers we spend more time unboxing these black boxes so that we can identify the biases and weaknesses of the models that we build. This may also allow us to identify which part of the input is most critical for the model and hence, ensure its correctness. Finally, explaining how models make their decisions will not only build trust between AI products and their consumers but also help meet the diverse and evolving regulatory requirements. The whole field of explainable AI is dedicated to figuring out the decision-making process of models. In this article, I wish to discuss some of the prominent explanation methods for understanding how computer vision models arrive at a decision. These techniques can also be used to debug models or to analyze the importance of different components of the model. The most common way to understand model predictions is to visualize heat maps of layers close to the prediction layer. These heat maps, when projected onto the image, allow us to understand which parts of the image contribute more to the model's decision. Heat maps can be generated either using gradient-based methods like CAM or Grad-CAM, or perturbation-based methods like I-GOS or I-GOS++. A bridge between these two approaches, Score-CAM, uses the increase in model confidence scores to provide a more intuitive way of generating heat maps. In contrast to these techniques, another class of papers argues that these models are too complex for us to expect just a single explanation for their decision. Most significant among these papers is the Structured Attention Graphs method which generates a tree to provide multiple possible explanations for a model to reach its decision. Class Activation Map (CAM) Based Approaches 1. CAM Class Activation Map (CAM) is a technique for explaining the decision-making of specific types of image classification models. Such models have their final layers consisting of a convolutional layer followed by global average pooling, and a fully connected layer to predict the class confidence scores. This technique identifies the important regions of the image by taking a weighted linear combination of the activation maps of the final convolutional layer. The weight of each channel comes from its associated weight in the following fully connected layer. It's quite a simple technique, but since it works only for a very specific architectural design, its application is limited. Mathematically, the CAM approach for a specific class c can be written as: $L^{c}_{CAM} = \mathrm{ReLU}\left(\sum_{k} w^{c}_{k} A^{k}\right)$, where $w^{c}_{k}$ is the weight for the activation map $A^{k}$ of the kth channel of the convolutional layer. ReLU is used as only positive contributions of the activation maps are of interest for generating the heat map. 2. Grad-CAM The next step in CAM evolution came through Grad-CAM, which generalized the CAM approach to a wider variety of CNN architectures. Instead of using the weights of the last fully connected layer, it determines the gradient flowing into the last convolutional layer and uses that as its weight.
So for the convolutional layer of interest A, and a specific class c, they compute the gradient of the score for class c with respect to the feature map activations of a convolutional layer. Then, this gradient is globally average pooled to obtain the weights for the activation maps. The final obtained heat map is of the same shape as the feature map output of that layer, so it can be quite coarse. Grad-CAM maps become progressively worse as we move to earlier layers due to the smaller receptive fields of those layers. Also, gradient-based methods suffer from vanishing gradients due to the saturation of sigmoid layers or zero-gradient regions of the ReLU function. 3. Score-CAM Score-CAM addresses some of these shortcomings of Grad-CAM by using the Channel-wise Increase of Confidence (CIC) as the weight for the activation maps. Since it does not use gradients, all gradient-related shortcomings are eliminated. Channel-wise Increase of Confidence is computed by following the steps below: Upsampling the channel activation maps to the input size and then normalizing them Then, computing the pixel-wise product of the normalized maps and the input image Followed by taking the difference of the model output for the above input tensors and some base images, which gives the increase in confidence Finally, applying softmax to normalize the weights of the activation maps to [0, 1] The Score-CAM approach can be applied to any layer of the model and provides one of the most reasonable heat maps among the CAM approaches. In order to illustrate the heat maps generated by the Grad-CAM and Score-CAM approaches, I selected three images: bison, camel, and school bus images. For the model, I used the Convnext-Tiny implementation in TorchVision. I extended the PyTorch Grad-CAM repo to generate heat maps for the layer convnext_tiny.features[7][2].block[5]. From the visualization below, one can observe that Grad-CAM and Score-CAM highlight similar regions for the bison image. However, Score-CAM's heat map seems to be more intuitive for the camel and school bus examples. Perturbation-Based Approaches Perturbation-based approaches work by masking part of the input image and then observing how this affects the model's performance. These techniques directly solve an optimization problem to determine the mask that can best explain the model's behavior. I-GOS and I-GOS++ are the most popular techniques under this category. 1. Integrated Gradients Optimized Saliency (I-GOS) The I-GOS paper generates a heat map by finding the smallest and smoothest mask that optimizes the deletion metric. This involves identifying a mask such that if the masked portions of the image are removed, the model's prediction confidence will be significantly reduced. Thus, the masked region is critical for the model's decision-making. The mask in I-GOS is obtained by finding a solution to an optimization problem. One way to solve this optimization problem is by applying conventional gradients in the gradient descent algorithm. However, such a method can be very time-consuming and is prone to getting stuck in local optima. Thus, instead of using conventional gradients, the authors recommend using integrated gradients to provide a better descent direction. Integrated gradients are calculated by going from a baseline image (giving very low confidence in model outputs) to the original image and accumulating gradients on images along this line. 2. I-GOS++ I-GOS++ extends I-GOS by also optimizing for the insertion metric.
This metric implies that only keeping the highlighted portions of the heat map should be sufficient for the model to retain confidence in its decision. The main argument for incorporating insertion masks is to prevent adversarial masks which don't explain the model behavior but are very good at the deletion metric. In fact, I-GOS++ tries to optimize for three masks: a deletion mask, an insertion mask, and a combined mask. The combined mask is the dot product of the insertion and deletion masks and is the output of the I-GOS++ technique. This technique also adds regularization to make masks smooth on image areas with similar colors, thus enabling the generation of better high-resolution heat maps. Next, we compare the heat maps of I-GOS and I-GOS++ with the Grad-CAM and Score-CAM approaches. For this, I made use of the I-GOS++ repo to generate heat maps for the Convnext-Tiny model for the bison, camel, and school bus examples used above. One can notice in the visualization below that the perturbation techniques provide less diffused heat maps compared to the CAM approaches. In particular, I-GOS++ provides very precise heat maps. Structured Attention Graphs for Image Classification The Structured Attention Graphs (SAG) paper presents a counter view that a single explanation (heat map) is not sufficient to explain a model's decision-making. Rather, multiple possible explanations exist that can explain the model's decision equally well. Thus, the authors suggest using beam search to find all such possible explanations and then using SAGs to concisely present this information for easier analysis. SAGs are basically "directed acyclic graphs" where each node is an image patch and each edge represents a subset relationship. Each subset is obtained by removing one patch from the root node's image. Each root node represents one of the possible explanations for the model's decision. To build the SAG, we need to solve a subset selection problem to identify a diverse set of candidates that can serve as the root nodes. The child nodes are obtained by recursively removing one patch from the parent node. Then, the scores for each node are obtained by passing the image represented by that node through the model. Nodes below a certain threshold (40%) are not expanded further. This leads to a meaningful and concise representation of the model's decision-making process. However, the SAG approach is limited to coarser representations, as combinatorial search is very computationally expensive. Some illustrations of Structured Attention Graphs are provided below using the SAG GitHub repo. For the bison and camel examples for the Convnext-Tiny model, we only get one explanation; but for the school bus example, we get three independent explanations. Applications of Explanation Methods Model Debugging The I-GOS++ paper presents an interesting case study substantiating the need for model explainability. The model in this study was trained to detect COVID-19 cases using chest X-ray images. However, using the I-GOS++ technique, the authors discovered a bug in the decision-making process of the model. The model was paying attention not only to the area in the lungs but also to the text written on the X-ray images. Obviously, the text should not have been considered by the model, indicating a possible case of overfitting. To alleviate this issue, the authors pre-processed the images to remove the text, and this improved the performance of the original diagnosis task.
Thus, a model explainability technique, I-GOS++, helped debug a critical model. Understanding Decision-Making Mechanisms of CNNs and Transformers Jiang et al., in their CVPR 2024 paper, deployed the SAG, I-GOS++, and Score-CAM techniques to understand the decision-making mechanisms of the most popular types of networks: Convolutional Neural Networks (CNNs) and Transformers. This paper applied explanation methods on a dataset basis instead of a single image and gathered statistics to explain the decision-making of these models. Using this approach, they found that Transformers have the ability to use multiple parts of an image to reach their decisions, in contrast to CNNs, which use several smaller disjoint sets of image patches to reach their decision. Key Takeaways Several heat map techniques like Grad-CAM, Score-CAM, I-GOS, and I-GOS++ can be used to generate visualizations to understand which parts of the image a model focuses on when making its decisions (a brief code sketch follows below). Structured Attention Graphs provide an alternate visualization to provide multiple possible explanations for the model's confidence in its predicted class. Explanation techniques can be used to debug models and can also help better understand model architectures.
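To make the first takeaway concrete, here is a minimal sketch of generating Grad-CAM and Score-CAM heat maps with the pytorch-grad-cam package for a ConvNeXt-Tiny classifier; the image path, preprocessing, and target layer are illustrative assumptions (the article used an extended version of the repo to target a specific ConvNeXt block).
Python
# Minimal sketch (assumptions noted): Grad-CAM and Score-CAM overlays for a
# ConvNeXt-Tiny model using the pytorch-grad-cam package.
import numpy as np
from PIL import Image
from torchvision import models, transforms
from pytorch_grad_cam import GradCAM, ScoreCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

model = models.convnext_tiny(weights="IMAGENET1K_V1").eval()
target_layers = [model.features[-1]]  # last feature stage; an assumption for illustration

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img = Image.open("bison.jpg").convert("RGB")          # placeholder image path
input_tensor = preprocess(img).unsqueeze(0)
rgb = np.array(img.resize((224, 224))).astype(np.float32) / 255.0

for cam_cls in (GradCAM, ScoreCAM):
    cam = cam_cls(model=model, target_layers=target_layers)
    # targets=None explains the model's top predicted class for this image.
    grayscale = cam(input_tensor=input_tensor, targets=None)[0]
    overlay = show_cam_on_image(rgb, grayscale, use_rgb=True)
    Image.fromarray(overlay).save(f"{cam_cls.__name__}_heatmap.png")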
If you haven't already, be sure to review Part 1 where we reviewed data collection and prepared a dataset for our model to train on. In the previous section, we gathered the crucial "ingredients" for our AI creation — the data. This forms the foundation of our model. Remember, the quality of the ingredients (your data) directly impacts the quality of the final dish (your model's performance). Now, we'll transform that data into a fully functioning Large Language Model (LLM). By the end of this section, you'll be interacting with your very own AI! Choosing Your Base Layer Before we dive into training, we’ll explore the different approaches to training your LLM. This is like choosing the right flour for your bread recipe — it significantly influences the capabilities and limitations of your final creation. There are many ways to go about training an ML model. This is also an active area of research, with new methodologies emerging every day. Let’s take a look at the major tried-and-true categories of methods of model development. (Note: These methods are not necessarily mutually exclusive.) Key Approaches 1. Start From Scratch (Pretraining Your Own Model) This offers the most flexibility, but it's the most resource-intensive path. The vast amounts of data and compute resources required here mean that only the most well-resourced corporations are able to train novel pre-trained models. 2. Fine-Tuning (Building on a Pre-trained Model) This involves starting with a powerful, existing LLM and adapting it to our specific meal-planning task. It's like using pre-made dough — you don't have to start from zero, but you can still customize it. 3. Leveraging Open-Source Models Explore a growing number of open source models, often pre-trained on common tasks, to experiment without the need for extensive pre-training. 4. Using Commercial Off-the-Shelf Models For production-ready applications, consider commercial LLMs (e.g., from Google, OpenAI, Microsoft) for optimized performance, but with potential customization limits. 5. Cloud Services Streamline training and deployment with powerful tools and managed infrastructure, simplifying the process. Choosing the Right Approach The best foundation for your LLM depends on your specific needs: Time and resources: Do you have the capacity for pretraining, or do you need a faster solution? Customization: How much control over the model's behavior do you require? Cost: What's your budget? Can you invest in commercial solutions? Performance: What level of accuracy and performance do you need? Capabilities: What level of technical skills and/or compute resources do you have access to? Moving Forward We'll focus on fine-tuning Gemini Pro in this tutorial, striking a balance between effort and functionality for our meal-planning model. Getting Ready to Train: Export Your Dataset Now that we've chosen our base layer, let's get our data ready for training. Since we're using Google Cloud Platform (GCP), we need our data in JSONL format. Note: Each model might have specific data format requirements, so always consult the documentation before proceeding. Luckily, converting data from Google Sheets to JSONL is straightforward with a little Python. Export to CSV: First, export your data from Google Sheets as a CSV file. 
Convert CSV to JSONL: Run the following Python script, replacing your_recipes.csv with your actual filename: Python
import csv
import json

csv_file = 'your_recipes.csv'  # Replace 'your_recipes.csv' with your CSV filename
jsonl_file = 'recipes.jsonl'

with open(csv_file, 'r', encoding='utf-8') as infile, open(jsonl_file, 'w', encoding='utf-8') as outfile:
    reader = csv.DictReader(infile)
    for row in reader:
        row['Prompt'] = row['Prompt'].splitlines()
        row['Response'] = row['Response'].splitlines()
        json.dump(row, outfile)
        outfile.write('\n')
This will create a recipes.jsonl file where each line is a JSON object representing a meal plan. Training Your Model We’re finally ready to start training our LLM. Let’s dive in! 1. Project Setup Google Cloud Project: Create a new Google Cloud project if you don't have one already (free tier available). Enable APIs: Search for "Vertex AI" in your console, and on the Vertex AI page, click Enable All Recommended APIs. Authentication: Search for "Service Accounts," and on that page, click Create Service Account. Use the walkthrough to set up a service account and download the required credentials for secure access. Cloud Storage Bucket: Find the "Cloud Storage" page and create a storage bucket. 2. Vertex AI Setup Navigate to Vertex AI Studio (free tier available). Click Try it in Console in a browser where you are already logged in to your Google Cloud Account. In the left-hand pane find and click Language. Navigate to the “Tune and Distill” tab: 3. Model Training Click Create Tuned Model. For this example, we’ll do a basic fine-tuning task, so select “Supervised Tuning” (should be selected by default). Give your model a name. Select a base model: We’ll use Gemini Pro 1.0 002 for this example. Click Continue. Upload your JSONL file that you generated in Step 2. You’ll be asked for a “dataset location.” This is just where your JSONL file is going to be located in the cloud. You can use the UI to very easily create a "bucket" to store this data. Click start and wait for the model to be trained! With this step, you have now entered the LLM AI arena. The quality of the model you produce is only limited by your imagination and the quality of the data you can find, prepare, and/or generate for your use case. For our use case, we used the data we generated earlier, which included prompts about how individuals could achieve their specific health goals, and meal plans that matched those constraints. 4. Test Your Model Once your model is trained, you can test it by navigating to it on the Tune and Distill main page. In that interface, you can interact with the newly created model the same way you would with any other chatbot. In the next section, we will show you how to host your newly created model to run evaluations and wire it up for an actual application! Deploying Your Model You've trained your meal planning LLM on Vertex AI, and it's ready to start generating personalized culinary masterpieces. Now it's time to make your AI chef accessible to the world! This post will guide you through deploying your model on Vertex AI and creating a user-friendly bot interface. Create an endpoint: Navigate to the Vertex AI section in the Google Cloud Console. Select "Endpoints" from the left-hand menu and click "Create Endpoint." Give your endpoint a descriptive name (e.g., "meal-planning-endpoint"). Deploy your model: Within your endpoint, click "Deploy model." Select your trained model from the Cloud Storage bucket where you saved it.
Specify a machine type suitable for serving predictions (consider traffic expectations). Choose a deployment scale (e.g., "Manual Scaling" for initial testing, "Auto Scaling" for handling variable demand). Deploy the model. Congratulations! You've now trained and tested your very own LLM on Google's Vertex AI. You are now an AI engineer! In the next and final installment of this series, we'll take you through the exciting steps of deploying your model, creating a user-friendly interface, and unleashing your meal-planning AI upon the world! Stay tuned for the grand finale of our LLM adventure.
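Once the endpoint is live, you can call it from application code. Below is a minimal sketch using the Vertex AI Python SDK; the project ID, region, endpoint ID, and request payload are placeholders, and the exact schema depends on how your tuned model is exposed (tuned Gemini models may instead be called through the vertexai generative model interfaces, so check the Vertex AI documentation for the payload your model expects).
Python
# Minimal sketch (assumptions noted): calling a deployed Vertex AI endpoint.
# Project, region, and endpoint ID below are placeholders, not real values.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")

endpoint = aiplatform.Endpoint(
    "projects/my-gcp-project/locations/us-central1/endpoints/1234567890"
)

response = endpoint.predict(
    instances=[{"prompt": "Create a 3-day high-protein meal plan for a runner."}]
)
print(response.predictions)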
Tuhin Chattopadhyay
CEO at Tuhin AI Advisory and Professor of Practice,
JAGSoM
Yifei Wang
Senior Machine Learning Engineer,
Meta
Austin Gil
Developer Advocate,
Akamai
Tim Spann
Principal Developer Advocate,
Zilliz