Artificial intelligence (AI) and machine learning (ML) are two fields that work together to create computer systems capable of perception, recognition, decision-making, and translation. Taken separately, AI is the ability of a computer system to mimic human intelligence through math and logic, while ML builds on AI with methods that "learn" from experience rather than following explicit instructions. In the AI/ML Zone, you'll find resources ranging from tutorials to use cases that will help you navigate this rapidly growing field.
The year 2025 is the year of AI agents. For the purposes of this article, an AI agent is a system that can leverage AI to achieve a goal by following a series of steps, possibly reasoning on its results and making corrections. In practice, the steps that an agent follows can constitute a graph. We will build a reactive agent (meaning that it reacts to a stimulus, in our case, the input from a user) to help people find their perfect vacation. Our agent will find the best city in the specified country, considering the food, sea, and activity specified by the user. The agent works in two phases: in the first phase, it collects information in parallel, ranking the cities by a single characteristic; the last step uses this information to choose the best city.

You could use a search engine to collect information, but we will use ChatGPT for all the steps, though with different models. You could write all the code by hand or use some library to simplify it a bit. Today, we will use a new feature that I added to Fibry, my Actor System, to implement the graph and control the parallelism in great detail. Fibry is a simple and small Actor System that provides an easy way to leverage actors to simplify multi-threading code and has no dependencies. Fibry also implements a Finite State Machine, so I decided to extend it to make it easier to write agents in Java. My inspiration has been LangGraph. As Fibry is about multi-threading, the new features allow plenty of flexibility in deciding the level of parallelism while keeping everything as simple as possible. You should use Fibry 3.0.2, for example:

Plain Text
compile group: 'eu.lucaventuri', name: 'fibry', version: '3.0.2'

Defining the Prompts

The first step is defining the prompts that we need for the LLM:

Java
public static class AiAgentVacations {
    private static final String promptFood = "You are a foodie from {country}. Please tell me the top 10 cities for food in {country}.";
    private static final String promptActivity = "You are from {country}, and know it inside out. Please tell me the top 10 cities in {country} where I can {goal}";
    private static final String promptSea = "You are an expert traveler, and you know {country} inside out. Please tell me the top 10 cities for sea vacations in {country}.";
    private static final String promptChoice = """
        You enjoy traveling, eating good food and staying at the sea, but you also want to {activity}.
        Please analyze the following suggestions from your friends for a vacation in {country} and choose the best city to visit, offering the best mix of food and sea and where you can {activity}.
        Food suggestions: {food}.
        Activity suggestions: {activity}.
        Sea suggestions: {sea}.
        """;
}

Defining the States

Normally, you would define four states, one for each step. However, since branching out and back is quite common, I added a feature to handle this with only a single state. As a result, we need only two states: CITIES, where we collect information, and CHOICE, where we choose the city.

Plain Text
enum VacationStates { CITIES, CHOICE }

Defining the Context

The different steps of the agent will collect information that needs to be stored somewhere; let's call it context. Ideally, you would want every step to be independent and know as little as possible about the others, but achieving this in a simple way, with a small amount of code, while keeping as much type safety as possible and maintaining thread safety, is not exactly straightforward.
As a result, I chose to force the context to be a record, providing some functionality to update the values of the record (using reflection underneath) while we wait for JEP 468 (Derived Record Creation) to be implemented.

Java
public record VacationContext(String country, String goal, String food, String activity, String sea, String proposal) {
    public static VacationContext from(String country, String goal) {
        return new VacationContext(country, goal, null, null, null, null);
    }
}

Defining the Nodes

Now, we can define the logic of the agent. We will allow the user to use two different LLM models, for example, a "normal" LLM for the search and a "reasoning" one for the choice step. This is where things become a bit trickier, as it is quite dense:

Java
AgentNode<VacationStates, VacationContext> nodeFood = state -> state.setAttribute("food", modelSearch.call("user", replaceField(promptFood, state.data(), "country")));
AgentNode<VacationStates, VacationContext> nodeActivity = state -> state.setAttribute("activity", modelSearch.call("user", replaceField(promptActivity, state.data(), "country")));
AgentNode<VacationStates, VacationContext> nodeSea = state -> state.setAttribute("sea", modelSearch.call("user", replaceField(promptSea, state.data(), "country")));
AgentNode<VacationStates, VacationContext> nodeChoice = state -> {
    var prompt = replaceAllFields(promptChoice, state.data());
    System.out.println("***** CHOICE PROMPT: " + prompt);
    return state.setAttribute("proposal", modelThink.call("user", prompt));
};

As you might have guessed, modelSearch is the model used for search (e.g., ChatGPT 4o), and modelThink could be a "reasoning model" (e.g., ChatGPT o1). Fibry provides a simple LLM interface and a simple implementation for ChatGPT, exposed by the class ChatGpt. Please note that calling the ChatGPT API requires an API key that you need to define using the "-DOPENAI_API_KEY=xxxx" JVM parameter. Different and more advanced use cases will require custom implementations or the usage of a library.

There is also a small issue related to the philosophy of Fibry, as Fibry is meant not to have any dependencies, and this gets tricky with JSON. As a result, Fibry can now operate in two ways:

- If Jackson is detected, Fibry will use it with reflection to parse JSON.
- If Jackson is not detected, a very simple custom parser (that seems to work with ChatGPT output) is used. This is recommended only for quick tests, not for production.
- Alternatively, you can provide your own JSON processor implementation and call JsonUtils.setProcessor(), possibly checking JacksonProcessor for inspiration.

The replaceField() and replaceAllFields() methods are defined by RecordUtils and are just convenience methods to replace text in the prompt, so that we can provide our data to the LLM. The setAttribute() function is used to set the value of an attribute in the state without you having to manually recreate the record or define a list of "wither" methods. There are other methods that you might use, like mergeAttribute(), addToList(), addToSet(), and addToMap().

Building the Agent

Now that we have the logic, we need to describe the graph of dependencies between states and specify the parallelism we want to achieve. If you imagine a big multi-agent system in production, being able to express the parallelism required to maximize performance without exhausting resources, hitting rate limiters, or exceeding the parallelism allowed by external systems is a critical feature.
This is where Fibry can help, making everything explicit but relatively easy to set up. Let's start creating the agent builder:

Plain Text
var builder = AiAgent.<VacationStates, VacationContext>builder(true);

The parameter autoGuards is used to put automatic guards on the states, which means that they are executed with an AND logic, and a state is executed only after all the incoming states have been processed. If the parameter is false, the state is called once for each incoming state. For example, if state D follows both A and C and the intention is to execute D once after A and once after C, then autoGuards should be false, while if you want D to be called only once after both have been executed, then autoGuards should be true. But let's continue with the vacation agent.

Plain Text
builder.addState(VacationStates.CHOICE, null, 1, nodeChoice, null);

Let's start with the method addState(). It is used to specify that a certain state should be followed by another state and execute a certain logic. In addition, you can specify the parallelism (more on that soon) and the guards. In this case:

- The state is CHOICE
- There is no default following state (i.e., this is a final state)
- The parallelism is 1
- There is no guard

The next state is just a default because the node has the possibility to overwrite the next state, which means that the graph can dynamically change at runtime, and in particular, it can perform cycles, for example, if some steps need to be repeated to collect more or better information. This is an advanced use case.

An unexpected concept might be the parallelism. This has no consequences in a single run of the agent, but it is meaningful in production at scale. In Fibry, every node is backed by an actor, which, from a practical point of view, is a thread with a list of messages to process. Every message is an execution step. So parallelism is the number of messages that can be executed at the same time. In practice:

- parallelism == 1 means there is only one thread managing the step, so only one execution at a time.
- parallelism > 1 means that there is a thread pool backing the actor, with the number of threads specified by the user. By default, it uses virtual threads.
- parallelism == 0 means that every message creates a new actor backed by a virtual thread, so the parallelism can be as high as necessary.

Every step can be configured independently, which should allow you to tune performance and resource usage quite well. Please consider that if parallelism != 1, you might have multi-threading, as the thread confinement typically associated with actors is lost. This was a lot to digest. If it is clear, you can check state compression.

State Compression

As said earlier, it is quite common to have a few related states that need to be performed in parallel and then join before moving to a common state. In this case, you do not need to define multiple states; you can use only one:

Plain Text
builder.addStateParallel(VacationStates.CITIES, VacationStates.CHOICE, 1, List.of(nodeFood, nodeActivity, nodeSea), null);

Here the CITIES state is defined by three nodes, and addStateParallel() takes care of executing them in parallel and waiting for all of them to finish. The parallelism is applied to each node, so in this case, you will get three single-threaded actors. Please note that if you do not use autoGuards, this basically allows you to mix OR and AND logic.
In case you want to merge some nodes in the same state, but they need to be executed serially (e.g., because they need information generated by the previous node), the addStateSerial() method is also available.

AIAgent creation is simple, but there are a few parameters to specify:

- The initial state
- The final state (which can be null)
- A flag to execute states in parallel when possible

Plain Text
var vacationsAgent = builder.build(VacationStates.CITIES, null, true);

Now we have an agent, and we can use it by calling process():

Plain Text
vacationsAgent.process(AiAgentVacations.VacationContext.from("Italy", "Dance Salsa and Bachata"), (state, info) -> System.out.println(state + ": " + info));

This version of process() takes two parameters:

- The initial context, which contains the information required by the agent to perform its actions
- An optional listener, for example, if you want to print the output of each step

If you need to start the action and check its return value later, you can use processAsync(). If you are interested in learning more about the parallelism options, I recommend you check the unit test TestAIAgent. It simulates an agent with nodes that sleep for a while and can help you see the impact of each choice. But I promised you a multi-agent, didn't I?

Extending to Multi-Agents

The AIAgent that you just created is an actor, so it runs on its own thread (plus all the threads used by the nodes), and it also implements the Function interface, in case you need it. There is actually nothing special about a multi-agent; one or more nodes of an agent simply ask another agent to perform an action. However, you can build a library of agents and combine them in the best way while simplifying the whole system. Let's imagine that we want to leverage the output of our previous agent and use it to calculate how much that vacation would cost, so the user can decide if it is affordable enough. Like a real Travel Agent! First, we need prompts to extract the destination and compute the cost.

Java
private static final String promptDestination = "Read the following text describing a destination for a vacation and extract the destination as a simple city and country, no preamble. Just the city and the country. {proposal}";
private static final String promptCost = "You are an expert travel agent. A customer asked you to estimate the cost of travelling from {startCity}, {startCountry} to {destination}, for {adults} adults and {kids} kids";

We just need two states, one to research the cities, which is done by the previous agent, and one to calculate the cost.

Plain Text
enum TravelStates { SEARCH, CALCULATE }

We also need a context, which should also hold the proposal from the previous agent.

Plain Text
public record TravelContext(String startCity, String startCountry, int adults, int kids, String destination, String cost, String proposal) { }

Then we can define the agent logic, which takes another agent as a parameter. The first node calls the previous agent to get the proposal.

Java
var builder = AiAgent.<TravelStates, TravelContext>builder(false);
AgentNode<TravelStates, TravelContext> nodeSearch = state -> {
    var vacationProposal = vacationsAgent.process(AiAgentVacations.VacationContext.from(country, goal), 1, TimeUnit.MINUTES, (st, info) -> System.out.print(debugSubAgentStates ?
st + ": " + info : "")); return state.setAttribute("proposal", vacationProposal.proposal()) .setAttribute("destination", model.call(promptDestination.replaceAll("\\{proposal\\}", vacationProposal.proposal()))); }; The second node computes the cost: Plain Text AgentNode<TravelStates, TravelContext> nodeCalculateCost = state -> state.setAttribute("cost", model.call(replaceAllFields(promptCost, state.data()))); Then, we can define the graph and build the agent Java builder.addState(TravelStates.SEARCH, TravelStates.CALCULATE, 1, nodeSearch, null); builder.addState(TravelStates.CALCULATE, null, 1, nodeCalculateCost, null); var agent = builder.build(TravelStates.SEARCH, null, false); Now we can instantiate the two agents (I chose to use ChatGPT 4o and ChatGPT 01-mini) and use them: Java try (var vacationsAgent = AiAgentVacations.buildAgent(ChatGPT.GPT_MODEL_4O, ChatGPT.GPT_MODEL_O1_MINI)) { try (var travelAgent = AiAgentTravelAgency.buildAgent(ChatGPT.GPT_MODEL_4O, vacationsAgent, "Italy", "Dance Salsa and Bachata", true)) { var result = travelAgent.process(new AiAgentTravelAgency.TravelContext("Oslo", "Norway", 2, 2, null, null, null), (state, info) -> System.out.println(state + ": " + info)); System.out.println("*** Proposal: " + result.proposal()); System.out.println("\n\n\n*** Destination: " + result.destination()); System.out.println("\n\n\n*** Cost: " + result.cost()); } } Final Outputs If you wonder what the result is, here is the long output that you can get when stating that what you want to do is to dance Salsa and Bachata: Destination Plain Text Naples, Italy Proposal Plain Text Based on the comprehensive analysis of your friends' suggestions, **Naples** emerges as the ideal city for your vacation in Italy. Here's why Naples stands out as the best choice, offering an exceptional mix of excellent food, beautiful seaside experiences, and a vibrant salsa and bachata dance scene: ### **1. Vibrant Dance Scene** - **Dance Venues:** Naples boasts numerous venues and events dedicated to salsa and bachata, ensuring that you can immerse yourself in lively dance nights regularly. - **Passionate Culture:** The city's passionate and energetic atmosphere enhances the overall dance experience, making it a hotspot for Latin dance enthusiasts. ### **2. Culinary Excellence** - **Authentic Neapolitan Pizza:** As the birthplace of pizza, Naples offers some of the best and most authentic pizzerias in the world. - **Fresh Seafood:** Being a coastal city, Naples provides access to a wide variety of fresh seafood dishes, enhancing your culinary adventures. - **Delicious Pastries:** Don't miss out on local specialties like **sfogliatella**, a renowned Neapolitan pastry that is a must-try for any foodie. ### **3. Stunning Seaside Location** - **Bay of Naples:** Enjoy breathtaking views and activities along the Bay of Naples, including boat tours and picturesque sunsets. - **Proximity to Amalfi Coast:** Naples serves as a gateway to the famous Amalfi Coast, allowing you to explore stunning coastal towns like Amalfi, Positano, and Sorrento with ease. - **Beautiful Beaches:** Relax on the city's beautiful beaches or take short trips to nearby seaside destinations for a perfect blend of relaxation and exploration. ### **4. Cultural Richness** - **Historical Sites:** Explore Naples' rich history through its numerous museums, historic sites, and UNESCO World Heritage landmarks such as the Historic Centre of Naples. 
- **Vibrant Nightlife:** Beyond dancing, Naples offers a lively nightlife scene with a variety of bars, clubs, and entertainment options to suit all tastes. ### **5. Accessibility and Convenience** - **Transportation Hub:** Naples is well-connected by air, rail, and road, making it easy to travel to other parts of Italy and beyond. - **Accommodation Options:** From luxury hotels to charming boutique accommodations, Naples offers a wide range of lodging options to fit your preferences and budget. ### **Conclusion** Naples perfectly balances a thriving dance scene, exceptional culinary offerings, and beautiful seaside attractions. Its unique blend of culture, history, and vibrant nightlife makes it the best city in Italy to fulfill your desires for travel, good food, and lively dance experiences. Whether you're dancing the night away, savoring authentic pizza by the sea, or exploring nearby coastal gems, Naples promises an unforgettable vacation. ### **Additional Recommendations** - **Day Trips:** Consider visiting nearby attractions such as Pompeii, the Isle of Capri, and the stunning Amalfi Coast to enrich your travel experience. - **Local Experiences:** Engage with locals in dance classes or attend festivals to dive deeper into Naples' vibrant cultural scene. Enjoy your trip to Italy, and may Naples provide you with the perfect blend of everything you're looking for! Cost Plain Text To estimate the cost of traveling from Oslo, Norway, to Naples, Italy, for two adults and two kids, we need to consider several key components of the trip: flights, accommodations, local transportation, food, and activities. Here's a breakdown of potential costs: 1. **Flights**: - Round-trip flights from Oslo to Naples typically range from $100 to $300 per person, depending on the time of booking, the season, and the airline. Budget airlines might offer lower prices, while full-service carriers could be on the higher end. - For a family of four, the cost could range from $400 to $1,200. 2. **Accommodations**: - Hotels in Naples can vary significantly. Expect to pay approximately $70 to $150 per night for a mid-range hotel room that accommodates a family. Vacation rentals might offer more flexibility and potentially lower costs. - For a typical 5-night stay, this would range from $350 to $750. 3. **Local Transportation**: - Public transportation in Naples (buses, metro, trams) is affordable, and daily tickets cost around $4 per person. - Assume about $50 to $100 for the family's local transport for the entire trip, depending on usage. 4. **Food**: - Dining costs are highly variable. A budget for meals might be around $10-$20 per person per meal at casual restaurants, while dining at mid-range restaurants could cost $20-$40 per person. - A family of four could expect to spend around $50 to $100 per day, reaching a total of $250 to $500 for five days. 5. **Activities**: - Entry fees for attractions can vary. Some museums and archaeological sites charge around $10 to $20 per adult, with discounts for children. - Budget around $100 to $200 for family activities and entrance fees. 6. **Miscellaneous**: - Always allow a little extra for souvenirs, snacks, and unexpected expenses. A typical buffer might be $100 to $200. **Estimated Total Cost**: - **Low-end estimate**: $1,250 - **High-end estimate**: $2,950 These are general estimates and actual costs can vary based on when you travel, how far in advance you book, and your personal preferences for accommodation and activities. 
For the most accurate assessment, consider reaching out to airlines for current flight prices, hotels for room rates, and looking into specific attractions you wish to visit. That was a lot, and this is only the output of the two “reasoning” models! But the result is quite interesting. Naples is on my bucket list, and I am curious to see if the agent is correct! Let’s also check the intermediate results to see how it reached this conclusion, which seems reasonable to me. Intermediate Outputs If you are curious, there are intermediate results. Food Plain Text As a foodie exploring Italy, you're in for a treat, as the country boasts a rich culinary heritage with regional specialties. Here's a list of the top 10 cities in Italy renowned for their food: 1. **Bologna** - Often referred to as the gastronomic heart of Italy, Bologna is famous for its rich Bolognese sauce, tasty mortadella, and fresh tagliatelle. 2. **Naples** - The birthplace of pizza, Naples offers authentic Neapolitan pizza, as well as delicious seafood and pastries like sfogliatella. 3. **Florence** - Known for its Florentine steak, ribollita (a hearty bread and vegetable soup), and delicious wines from the surrounding Tuscany region. 4. **Rome** - Enjoy classic Roman dishes such as carbonara, cacio e pepe, and Roman-style artichokes in the bustling capital city. 5. **Milan** - A city that blends tradition and innovation, Milan offers risotto alla milanese, ossobuco, and an array of high-end dining experiences. 6. **Turin** - Known for its chocolate and coffee culture, as well as traditional dishes like bagna cauda and agnolotti. 7. **Palermo** - Sample the vibrant street food scene with arancini, panelle, and sfincione, as well as fresh local seafood in this Sicilian capital. 8. **Venice** - Famous for its seafood risotto, sarde in saor (sweet and sour sardines), and cicchetti (Venetian tapas) to enjoy with a glass of prosecco. 9. **Parma** - Home to the famous Parmigiano-Reggiano cheese and prosciutto di Parma, it’s a haven for lovers of cured meats and cheeses. 10. **Genoa** - Known for its pesto Genovese, focaccia, and variety of fresh seafood dishes, Genoa offers a unique taste of Ligurian cuisine. Each of these cities offers a distinct culinary experience influenced by local traditions and ingredients, making them must-visit destinations for any food enthusiast exploring Italy. Sea Plain Text Italy is renowned for its stunning coastline and beautiful seaside cities. Here are ten top cities and regions perfect for a sea vacation: 1. **Amalfi** - Nestled in the famous Amalfi Coast, this city is known for its dramatic cliffs, azure waters, and charming coastal villages. 2. **Positano** - Also on the Amalfi Coast, Positano is famous for its colorful buildings, steep streets, and picturesque pebble beachfronts. 3. **Sorrento** - Offering incredible views of the Bay of Naples, Sorrento serves as a gateway to the Amalfi Coast and provides a relaxing seaside atmosphere. 4. **Capri** - The island of Capri is known for its rugged landscape, upscale hotels, and the famous Blue Grotto, a spectacular sea cave. 5. **Portofino** - This quaint fishing village on the Italian Riviera is known for its picturesque harbor, pastel-colored houses, and luxurious coastal surroundings. 6. **Cinque Terre** - Comprising five stunning villages along the Ligurian coast, Cinque Terre is a UNESCO World Heritage site known for its dramatic seaside and hiking trails. 7. 
**Taormina** - Situated on a hill on the east coast of Sicily, Taormina offers sweeping views of the Ionian Sea and beautiful beaches like Isola Bella. 8. **Rimini** - Located on the Adriatic coast, Rimini is known for its long sandy beaches and vibrant nightlife, making it a favorite for beach-goers and party enthusiasts. 9. **Alghero** - A city on the northwest coast of Sardinia, Alghero is famous for its medieval architecture, stunning beaches, and Catalan culture. 10. **Lerici** - Near the Ligurian Sea, Lerici is part of the stunning Gulf of Poets and is known for its beautiful bay, historic castle, and crystal-clear waters. Each of these destinations offers a unique blend of beautiful beaches, cultural sites, and local cuisine, making Italy a fantastic choice for a sea vacation. Activity Plain Text Italy has a vibrant dance scene with many cities offering great opportunities to enjoy salsa and bachata. Here are ten cities where you can indulge in these lively dance styles: 1. **Rome** - The capital city has a bustling dance scene with numerous salsa clubs and events happening regularly. 2. **Milan** - Known for its nightlife, Milan offers various dance clubs and events catering to salsa and bachata enthusiasts. 3. **Florence** - A cultural hub, Florence has several dance studios and clubs where you can enjoy Latin dances. 4. **Naples** - Known for its passionate culture, Naples offers several venues and events for salsa and bachata lovers. 5. **Turin** - This northern city has a growing salsa community with events and social dances. 6. **Bologna** - Known for its lively student population, Bologna has a number of dance clubs and events for salsa and bachata. 7. **Venice** - While famous for its romantic canals, Venice also hosts various dance events throughout the year. 8. **Palermo** - In Sicily, Palermo has a vibrant Latin dance scene reflecting the island's festive culture. 9. **Verona** - Known for its romantic setting, Verona has several dance studios and clubs for salsa and bachata. 10. **Bari** - This coastal city in the south offers dance festivals and clubs perfect for salsa and bachata enthusiasts. These cities offer a mix of cultural experiences and lively dance floors, ensuring you can enjoy salsa and bachata across Italy. Interestingly enough, Naples does not top any of the lists, though the first four cities in the sea list are all close to Naples. Licensing Details Before closing the article, just two words on the license of Fibry. Fibry is no longer distributed as a pure MIT license. The main difference now is that if you want to build a system to generate code at scale for third parties (like a software engineer agent), you need a commercial license. Also, it is forbidden to include it in any datasets to train systems to generate code (e.g., ChatGPT should not be trained on the source code of Fibry). Anything else, you are good to go. I can provide commercial support and develop features on demand. Conclusion I hope you had fun and could get an idea of how to use Fibry to write AI agents. If you think that a multi-agent system needs to be distributed and run on multiple nodes, Fibry has got you covered! While we’ll save the details for another article, it’s worth noting that setting up Fibry actors in a distributed system is straightforward, and your agents are already actors: when you call process() or processAsync(), a message is sent to the underlying actor. 
In Fibry, sending and receiving messages over the network is abstracted away, so you don’t even need to modify your agent logic to enable distribution. This makes Fibry uniquely simple for scaling across nodes without rewriting core logic. Happy coding!
Testing is a critical yet often time-consuming process. Ensuring that every feature, flow, and edge case works as intended can take up significant resources — both in terms of time and manpower. Manual testing, while thorough, is prone to human error and inefficiency, especially when dealing with repetitive tasks or complex workflows. OpenAI recently introduced Operator, an advanced AI agent that could change how we approach software testing. In this article, we'll explore what Operator is, how it functions, and, most importantly, how it can drastically reduce manual testing time for developers and QA teams. We'll also walk through some real-world examples to demonstrate its potential impact on testing various application flows, along with some potential limitations.

What Is Operator?

Operator is an AI-powered agent designed to interact with digital systems in a way that mimics human behavior. Unlike traditional automation tools that require explicit scripting and predefined rules, Operator leverages natural language processing (NLP) and machine learning to understand instructions and execute actions dynamically. It's like having a virtual assistant that can navigate applications, perform tasks, and even troubleshoot issues — all without requiring extensive coding knowledge. The key features of Operator include:

- Natural language understanding. You can provide instructions in plain English, such as "Log into the app using test credentials" or "Verify if the payment gateway redirects correctly."
- Dynamic adaptability. Operator adapts to changes in UI elements, making it more resilient than static scripts.
- Task automation. From filling out forms to simulating multi-step user journeys, Operator handles repetitive tasks effortlessly.
- Error detection. The agent can identify anomalies during execution and flag them for review.

These capabilities make Operator particularly well-suited for automating end-to-end testing scenarios, where flexibility and adaptability are crucial.

Why Manual Testing Still Dominates and Its Challenges

Despite advances in automated testing frameworks, many organizations still rely heavily on manual testing for several reasons:

- Complex workflows. Some applications have intricate user paths that are difficult to script.
- Frequent updates. Agile development cycles mean frequent updates, rendering pre-written scripts obsolete quickly.
- Edge cases. Identifying and testing rare but critical edge cases requires creativity and intuition, which scripted tests lack.

However, manual testing comes with its own set of challenges:

- Time-consuming. Repetitive tasks eat up valuable hours that could be spent on innovation.
- Human error. Even experienced testers can miss subtle bugs due to fatigue or oversight.
- Scalability issues. As projects grow larger, scaling manual efforts becomes impractical.

This is where Operator shines — it combines the precision of automation with the adaptability of human-like interaction, addressing these pain points effectively.

Reducing Manual Testing Time With Operator

Let's dive into a practical example to illustrate how Operator can streamline testing processes and save time. Imagine you're working on an e-commerce platform with the following core functionalities:

- User registration and login
- Product search and filtering
- Adding items to the cart
- Checkout process, including payment integration

Each of these steps involves multiple sub-tasks, validations, and possible error conditions. Let's see how Operator can help automate the testing of these flows.
Scenario 1: Testing User Registration and Login

Traditional Approach

A manual tester would need to:

- Create new accounts repeatedly with different datasets (valid emails, invalid formats, duplicate entries)
- Test password strength requirements
- Attempt logins with correct/incorrect credentials
- Check email verification links

This process could easily take 1–2 hours per round of testing, depending on the number of variations.

With Operator

You simply instruct Operator in natural language:

Prompt
Create five new user accounts with valid details, one account with an invalid email format, and another with a weak password. Then, attempt to log in with each set of credentials and verify error messages.

Operator will:

- Generate test data automatically
- Execute registration attempts across all specified scenarios
- Log in with each credential combination
- Validate responses against expected outcomes

What once took hours now takes mere minutes, freeing up your team to focus on higher-value activities.

Scenario 2: Testing Product Search and Filtering

Traditional Approach

Testers manually search for products using various keywords, filters (price range, category), and sorting options. They must ensure results align with expectations and handle cases where no matches exist.

With Operator

Provide a simple command:

Prompt
Search for 'laptop' and apply filters: price between $100–$1000, brand='Apple', sort by relevance. Repeat with non-existent product names like 'unicorn laptop.'

Operator will:

- Perform searches and apply filters systematically
- Compare actual results with expected outputs
- Flag discrepancies, such as incorrect filter applications or missing items

Scenario 3: End-to-End Checkout Process

Traditional Approach

Manually adding items to the cart, entering shipping details, selecting payment methods, and verifying confirmation pages is tedious. Any change in the checkout flow necessitates retesting everything from scratch.

With Operator

Use a straightforward instruction:

Prompt
Add three random products to the cart, proceed to checkout, enter dummy shipping info, select PayPal as the payment method, and confirm the order.

Operator will:

- Automate the entire checkout journey
- Handle both successful and failure scenarios
- Ensure error messages appear appropriately and transactions reflect accurately

Benefits Beyond Time Savings

While reducing manual testing time is a significant advantage, Operator offers additional benefits that enhance the overall testing process:

- Improved accuracy. Operator eliminates human errors associated with repetitive tasks, leading to more reliable results.
- Enhanced collaboration. Since Operator uses natural language, non-technical stakeholders can easily participate in defining test scenarios.
- Cost efficiency. Automating routine tests reduces dependency on large QA teams, lowering operational costs.
- Focus on innovation. Freed from manual tasks, testers can dedicate more time to exploratory testing and creative problem-solving.

Potential Limitations and Considerations

While Operator holds immense promise, it's essential to acknowledge certain limitations:

- Learning curve. Teams must learn to phrase test requirements effectively for the AI.
- Complex UI interactions. Highly dynamic interfaces (e.g., games, AR apps) may still require human intervention.
- Ethical oversight. Over-reliance on AI could lead to complacency. Human review remains essential for critical systems.

That said, these challenges are outweighed by the long-term gains in efficiency and reliability.
Conclusion As software complexity continues to rise, so does the demand for smarter, faster, and more adaptable testing solutions. Operator represents a paradigm shift in how we approach quality assurance, bridging the gap between human expertise and machine efficiency. With Operator, development teams can significantly cut down on manual testing time, achieve broader test coverage, and deliver high-quality products at a faster pace. In my next blog, I will provide a live example and explain it in greater detail.
Can an identity exist without being referenced by another identity? How would we know? That might seem a bit philosophical for a security tech article, but it is an important point to keep in mind when tackling the subject of non-human identities. A better question around security would actually be, "Should an identity exist if it can not be interacted with?" We might not be able to reach the answer to that first question, as proving the nature of reality is a little out of scope for computer science. However, a lot of folks have been hard at work building the NHI Governance tools to determine if a machine identity exists, why it exists, and answer the question of whether it should exist. The future of eliminating secrets sprawl means getting a handle on the lifecycles and interdependencies of the non-human identities that rely on secrets. But why now? Let's step back and re-examine some of our assumptions about NHIs and their existence. What Are Non-Human Identities? Before we proceed, let's define NHI in the context of this conversation. In the simplest terms, a non-human identity, also commonly referred to as a machine identity or a workload identity, is any entity that is not human and can perform an action within your system, most commonly interacting exclusively with other non-humans. This could be a Kubernetes pod that needs to interact with a data source and send the processed data to a reporting system. This could be an Internet of Things (IoT) sensor feeding data to a central server. This could be a Slack-based chatbot. If no human input is directly needed after the initial creation for the entity to get work done, then we should consider that identity 'non-human.' The one thing all these examples have in common is that they interact with another system. If we want them to communicate with the entire world, that is easy, as we simply point to the other non-human identities and programmatically describe how they should interact. However, we most likely want these systems to communicate securely, only authorizing specific identities under specific circumstances. This has driven the evolution of secrets for access management, from simple username/password pairs to API keys to certificates. Admittedly, that is a broad definition of NHI. However, we can narrow down what we care about with machine identities by stepping back and considering how these entities relate to one another through the lens of their secrets, allowing access and communication. All NHIs Connect to Other Systems Can you build a stand-alone application that does not take in any input, produce any output, and has no addressable interface? Does such an application exist outside of a thought experiment? While fun to think about, the reality is that all NHIs we care about exist to communicate with other identities. NHIs inherently require connections to other systems and services to fulfill their purpose. This interconnectivity means every NHI becomes a node in a web of interdependencies. From an NHI governance perspective, this necessitates maintaining an accurate and dynamic inventory of these connections to manage the associated risks. For example, if a single NHI is compromised, what does it connect to, and what would an attacker be able to access to laterally move into? Proper NHI governance must include tools to map and monitor these relationships. While there are many ways to go about this manually, what we actually want is an automated way to tell what is connected to what, what is used for what, and by whom. 
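As a rough illustration of what such an automated inventory could start from, here is a minimal sketch, assuming AWS Secrets Manager is the vault and that each secret carries hypothetical "owner" and "consumed-by" tags (your own tagging conventions will differ):

Python
import boto3

# Minimal sketch: bootstrap an NHI inventory from secrets metadata.
# Assumes AWS Secrets Manager as the vault and hypothetical tagging
# conventions ("owner", "consumed-by") applied to each secret.
client = boto3.client("secretsmanager", region_name="us-east-1")

inventory = {}
for page in client.get_paginator("list_secrets").paginate():
    for secret in page["SecretList"]:
        tags = {t["Key"]: t["Value"] for t in secret.get("Tags", [])}
        inventory[secret["Name"]] = {
            "owner": tags.get("owner", "unknown"),              # who introduced it
            "consumed_by": tags.get("consumed-by", "unknown"),  # which NHIs use it
            "created": secret.get("CreatedDate"),
            "last_accessed": secret.get("LastAccessedDate"),
        }

for name, meta in inventory.items():
    print(name, meta)

A real governance tool would enrich this with scan results, vault audit logs, and permission scopes, but even a simple inventory like this starts to make the web of connections visible.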
When thinking in terms of securing our systems, we can leverage another important fact about all NHIs in a secured application to build that map, they all, necessarily, have secrets. All Secure NHIs Must Have a Secret In order to establish trusted communication between any two NHIs, a unique secret, such as an API key, token, or certificate, must exist for those entities to authenticate. We can use the secret to prove an NHI's identity and map it in the ecosystem. The question becomes, where do we look for these secrets? In the modern enterprise, especially larger ones, there are essentially only two places a secret can live. Your first option is the best practice and safest option: a secrets management system, such as CyberArk's Conjur, Vault by HashiCorp, or AWS Secrets Manager. The other option is much less secure but, unfortunately, all too common: outside of a vault, in code, or configuration in plaintext. Enterprise secrets management platforms, often referred to as vaults, are critical for storing and protecting secrets used by NHIs. Vaults can provide a single source of truth for all secrets, ensuring they are encrypted at rest, tightly access-controlled, and monitored for unauthorized access attempts. This assumes you have standardized on a single enterprise secret management platform. Most organizations actually have many vaults in use at the same time, making synchronization between all vaults an additional challenge. Teams can map all existing machine identities based on the existence of these secrets. For enterprises with multiple secret management solutions in place, you need to know which vaults do and do not contain a secret and to reduce the overhead of storing the same key redundantly across several vaults. All NHI Secrets Have an Origin Story Machines can't grant themselves permissions and access. Every machine identity was created by or represents a human identity. Governance of NHIs must include secret creation tracking to ensure every secret is traceable to its origin, securely distributed, and linked to a legitimate identity. While this aspect could be accounted for with the proper use of a secret management platform, our data keeps telling us that a certain percentage of secrets leak year after year because we are not consistently using these vault solutions. We know from years of experience helping teams remediate incidents that the creator of a secret will almost always be the person who first introduces the credential into an ecosystem. We can also tell from the code history or other system timestamp information when this was first seen, which is the most probable time for it to be created or at least come into existence in a meaningful way. This is a critical detail that might never have been properly logged or documented anywhere else. Once you understand who created a secret to be able to leverage an NHI, then you truly understand the beginning of our NHI lifecycle. All NHI Secrets Must Grant Some Set of Permissions When created, every NHI secret must be granted a certain set of permissions. The scope determines what actions an identity can perform and on which systems. This makes permission scoping and enforcement crucial components of governance. Essentially, two risks make understanding the scope of a secret critical for enterprise security. First is that misconfigured or over-privileged secrets can inadvertently grant access to sensitive data or critical systems, significantly increasing the attack surface. 
Imagine accidentally giving write privileges to a system that can access your customer's PII. That is a ticking clock waiting for a threat actor to find and exploit it. Also, just as troubling is that when a secret is leaked or compromised, a team can not replace it until they first understand how those permissions were configured. For example, suppose you know a mission-critical microservice's secret was accidentally pushed to a public GitHub repo. In that case, it is only a matter of time before it will be discovered and used by someone outside of your organization. In our recent Voice of the Practitioner report, IT decision-makers admitted it took, on average, 27 days to rotate these critical secrets. Teams should be able to act in seconds or minutes, not days. Tools that provide additional context about detected secrets, including their roles and permissions, are needed. Rapidly understanding what assets are exposed when a leak occurs and what potential damage can be inflicted by a threat actor goes a long way when responding to an incident. Knowing exactly how to replace it from a dashboard view or API call can mean the difference between a breach and a frustrated attacker finding the key they have is invalid. All NHI Secrets Need to be Rotated A machine identity can, and likely should, have many secrets in its lifetime. If credentials are left to live for months or years, or in the worst case, forever, NHI secrets exposure or compromise becomes increasingly likely. Manual rotation is error-prone and operationally taxing, particularly in environments with thousands of NHIs. Automating the secret rotation process is a cornerstone of NHI governance, ensuring that secrets are refreshed before they expire or are leaked. For any of the secrets in your vaults, rotation should be a simple matter of scripting. Most secret management platforms provide scripts or some other mechanism to handle the delicate dance of safely replacing and revoking the old secret. But what about the NHI secrets that live outside of these vaults, or perhaps the same secret that is spread across multiple vaults? A good secret scanning platform needs seamless integration with these vaults so that your team can more easily find and safely store these secrets in the secrets manager and prepare the way for automated rotation. GitGuardian's reference implementation with CyberArk's Conjur goes into more detail on how you can fully automate the entire storage and rotation process. By identifying all the NHIs and knowing when they were created, we can also predict when they need to be rotated. While every team will judge exactly how long each secret should live, any secrets that have never been rotated after creation are ripe to be replaced. Any secret older than a year, or for some mission-critical systems, a few days should also be prioritized for rotation asap. All NHIs Will Have an End-of-Life NHIs, like their human counterparts, have finite lifecycles. They may be decommissioned when a service is retired, replaced, or no longer needed. Without addressing the deactivation and cleanup of NHIs to prevent the persistence of unused secrets or stale connections, we are creating security blind spots. But how do we know when we are at the end of the road for an NHI, especially if its secret remains valid? One answer is that it should no longer exist when an NHI no longer connects to another active system. This ensures attackers cannot exploit defunct NHI secrets to gain a foothold in your environment. 
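Secret age is one concrete signal for both rotation and end-of-life decisions. As a minimal sketch, again assuming AWS Secrets Manager as the vault and using an example one-year policy threshold, you could flag secrets that have never been rotated or are overdue:

Python
from datetime import datetime, timedelta, timezone

import boto3

# Minimal sketch: flag secrets that have never been rotated, or whose last
# rotation is older than a policy threshold (365 days here, as an example).
client = boto3.client("secretsmanager", region_name="us-east-1")
max_age = timedelta(days=365)
now = datetime.now(timezone.utc)

for page in client.get_paginator("list_secrets").paginate():
    for secret in page["SecretList"]:
        last_rotated = secret.get("LastRotatedDate")
        created = secret.get("CreatedDate")
        if last_rotated is None:
            print(f"{secret['Name']}: never rotated (created {created})")
        elif now - last_rotated > max_age:
            print(f"{secret['Name']}: last rotated {last_rotated}, overdue for rotation")

Secrets that keep appearing in such a report, but whose NHIs no longer connect to anything, are prime candidates for retirement rather than rotation.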
Remember that attackers do not care how a secret should be appropriately used; they only care about what they can do with it. By mapping all the relationships an NHI's secrets allow, you can identify when a system is no longer connected to any other identity. Once there are no more ways for an identity to communicate, then it and its secrets should no longer exist. It also means the secret no longer needs to be stored in your secrets managers, giving you one less thing to store and manage. Understanding the World Around Your NHIs is Critical to Security In 2022, CyberArk's research showed that for every human identity in an environment, at least 45 non-human identities need to be managed. That ratio today is likely closer to 1 to 100 and is ever-increasing. The best time to come to terms with your NHI governance and lifecycle management was years ago. The next best time is right now. It is time for a full-cycle approach to non-human identity security, mapping out not just where your NHI secrets are but, just as importantly, what other NHIs are connected. We are overdue, across all industries, to implement NHI governance at scale. Finding and properly storing your secrets is just the beginning of the story. We must better document and understand the scope of NHI secrets, their age, who implemented them, and other contextual information, such as when they should be rotated. Even though machine identities outnumber human beings, there is no reason to work alone to solve this problem; we are all in it together.
Artificial intelligence (AI) lets you manage WordPress in many ways, including generating AI content, creating images, improving SEO, and more. You can use AI to do the following:

- Generate WordPress AI content
- Generate WordPress art and images
- Improve WordPress SEO
- Design and build a WordPress site
- Enhance user experience with AI-powered chatbots
- Improve WordPress web forms with AI
- Translate WordPress into a multilingual site

And much more! In this article, we'll learn how to use AI with WordPress using plugins. Let's get started.

How to Use AI With WordPress

There are many ways you can use AI with WordPress. You can use WordPress AI plugins like ShortPixel to carry out specific tasks like creating and optimizing images. Alternatively, you can use Uncanny Automator to use AI in WordPress creatively. Uncanny Automator lets you integrate OpenAI models such as Ada, Babbage, and Davinci, enabling you to create an AI-powered workflow for different tasks such as:

- Create AI blog post content
- Generate images using the DALL-E text prompt
- Automate replies to questions in forums or chats

Furthermore, it works with the most popular WordPress plugins, including WPForms, MemberPress, AffiliateWP, etc. To install Uncanny Automator, follow this guide, and to connect to OpenAI, use this guide. Now, let's discuss the different use cases and the AI tools/plugins you need to carry them out.

Generate AI Content and Improve Existing Content

Using generative AI, you can write or improve content. Plenty of tools let you create content, such as Jasper and ChatGPT. However, the challenge is to make them work directly from WordPress. To overcome this, I recommend using AI Engine, which lets you use OpenAI directly within WordPress. Once installed and connected (through the OpenAI API key), you can create blog content, optimize SEO, and improve the content's grammar without leaving WordPress. Alternatively, you can also opt for:

- Divi AI. It directly integrates with WordPress, offering powerful contextual text generation.
- Uncanny Automator. Set up custom steps to generate content directly in WordPress using forms.

Generate and Optimize Images for Your WordPress Site

You can use GenAI tools like DALL-E 2 to generate and optimize images, just as you can for content. If you're new to this, I suggest directly visiting the DALL-E 2 site and generating images there. However, if you want to integrate DALL-E 2 in WordPress, use Uncanny Automator, as it lets you create workflows directly in WordPress.

Source: Automator Plugin

With Divi AI, you can generate unique, stunning images directly from Divi Builder. It also offers AI refinement of existing images — all from text input, without the need for coding or image editing skills. ShortPixel is an AI-powered plugin that optimizes images using advanced compression algorithms. The AI manages image resizing and scaling, ensuring it matches the user's device resolution without affecting the site's functionality and load time (as it runs in the background).

Improve WordPress SEO

If you need help with WordPress SEO, you can try the AIOSEO (All in One SEO) and Rank Math AI plugins. With AIOSEO, you can generate titles and meta descriptions. Rank Math, on the other hand, offers a complete package that you can use to generate SEO content, links, and headings. Rank Math's AI also lets you improve the content by suggesting additional keywords and providing dynamic optimization recommendations. Both these tools are free to install.
However, if you want to use the AI features, you must buy credits (each action consumes one credit).

Design and Build a WordPress Site

You can also use AI to help you design and build a WordPress site. One such plugin is Elementor AI, an AI-powered tool for Elementor Builder that helps you design and build your WordPress site. With Elementor AI, you can generate container layouts matching the desired design, all using prompts. And if you are new to prompts, you can use its "Enhance Prompt" feature, which offers suggestions to provide better input and get the desired output. Additionally, I like its prompt history, as it records your decision-making. Some other useful Elementor AI features include:

- Library-based AI container variations
- Ability to generate wireframes and designs
- Generate editable images (change resolution, resize, replace image background, etc.) directly from the builder
- Generate contextual content for your site: headers, posts, and translations
- Generate custom code (HTML and CSS) to add animations and visual effects

Alternatively, you can use Divi AI, which lets you build AI-designed layouts, fine-tune the Divi module codebase, and define custom AI styles.

Enhance User Experience With AI-Powered Chatbots

AI chatbots are becoming more human-like, allowing you to use them to improve visitors' real-time interactions. You can use WordPress AI plugins such as Chatbase to create a customizable chatbot. It can handle pre-trained data, file types, text, and content from specific URLs. It also learns over time by interacting with customers. However, it is an AI-focused tool, so you cannot add human interactivity. If you want a mix of AI and human support, ChatBot is what you need to use. I also suggest checking out AI ChatBot by QuantumCloud and Tidio, as they offer excellent AI-powered chatbots for your WordPress site.

Improve Web Forms With AI

Another way you can use AI to improve web forms on your WordPress site is by predicting user input, improving form interactions, and personalizing them based on individual preferences. To enhance web forms with AI, check out:

- Formidable Forms. An AI-powered form plugin that uses OpenAI to generate AI responses to form field inputs. I prefer Formidable Forms because it is easy to set up.
- Gravity Forms. It offers similar functionality, but you have to use the OpenAI Addon by Gravity Wiz to make it work.

Translate WordPress into Multilingual Sites

If you want to translate WordPress into a multilingual site, you can use an AI-powered translation plugin such as Weglot, WPML, or Loco Translate. With Weglot, you can translate into other languages and reach a global audience. Weglot also ensures that Google indexes your newly generated translated posts and pages.

Conclusion

WordPress with AI opens up endless possibilities. Apart from the mentioned use cases, you can also use AI to help block WordPress comment spam, generate product descriptions and ad copy, or create event summaries. Just pick the right WordPress AI plugin, and you're good to go. However, if you want more flexibility and control, you need Uncanny Automator. It is completely customizable, offering multiple combinations of steps that enable you to perform diverse tasks such as sentiment analysis, automating customer support, drafting emails, etc. So, which AI WordPress plugin/tool are you planning to use? Comment below and let us know.
Generative AI has been the cutting-edge technology that greatly reshaped the enterprise search landscape. But now, artificial intelligence (AI) development communities are delving into a new industry-leading innovation: Agentic AI. Agentic AI is a system that exhibits a high degree of autonomy. It designs workflows and uses available tools to take action independently on behalf of users and solve complex problems that require multi-step solutions. It also interacts with external environments and goes beyond the data on which the system's machine learning models were trained. AI agents, powered by advanced machine learning techniques such as reinforcement learning, learn from user behavior and improve over time. These agents use multiple tools that enable them to work effectively in dynamic conditions. This blog explains the key problems that Agentic AI resolves in enterprise search.

Critical Challenges in Enterprise Search That Agentic AI Addresses

Ambiguity in User Queries

Users usually search with a few keywords rather than fully formed queries. Due to the vague nature of such a query, it becomes challenging for traditional AI models to comprehend the intent and deliver relevant results. AI agents, however, can decide to rephrase or augment the query. A query rephrase tool autonomously refines vague or invalid search terms by analyzing historical data and previous query context. Consider a user who searches for "watches." The query is ambiguous and incomplete and doesn't indicate what kind of watches the user is looking for, smart or regular. Now, suppose the user previously searched for "tracking burned calories." The query rephrase tool will rephrase the query based on the user's browsing history and previous query context and deliver search results for "smartwatches."

Inconsistent Sentiment Analysis

Sentiments are the range of emotions that customers experience throughout their brand journey. Deciphering those sentiments is one crucial aspect of boosting customer satisfaction (CSAT) scores. Traditional AI models fall short of understanding user query sentiment in many scenarios. Moreover, you have to rely on approaches built around pre-made dictionaries of words with sentiment scores (positive, negative, or neutral) and predefined rules to determine text sentiment. AI agents, however, autonomously analyze the query sentiment and take further action based on it without human help. A sentiment analyzer tool captures the overall sentiment of complex sentences, goes beyond just positive or negative sentiment, and distinguishes fine-grained sentiment expressions. Suppose a customer searched for "I tried everything but did not get my answers, feeling frustrated." An AI agent interprets the query sentiment ("the user is frustrated") and recognizes that an unhelpful suggestion could aggravate their anger. So, it will either create a support ticket for the customer or directly connect them with a live support agent to resolve their query.

Identifying Key Entities in Data

Earlier, exact-match and regex methods were used to find string values to tag the data. However, these methods miss the mark when it comes to contextual tagging and synonyms that share the same lemma and stem. AI agents, by contrast, can perform Named Entity Recognition (NER) independently. An NER tool identifies and extracts key entities such as names, dates, locations, organizations, or products from unstructured data without the need for manual tagging.
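The agentic NER tooling described here is proprietary, but the underlying technique is well established. As a minimal sketch of what entity extraction itself looks like, here is an example using the open-source spaCy library (an assumption for illustration, not the tooling discussed in this article):

Python
# Minimal NER sketch using spaCy (illustration only).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I haven't received my iPhone 16 pro, which I ordered on September 30.")

for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g., "September 30" is typically tagged as DATE

An agentic system wraps this kind of extraction in a tool the agent can call autonomously and then acts on the extracted entities.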
This capability of agentic AI enhances the customer experience by making support service faster and more efficient. Imagine a customer raising a support ticket mentioning, "I haven't received my iPhone 16 Pro, which I ordered on September 30." The AI agent's tool autonomously performs NER and identifies the key entities in the query: iPhone 16 Pro (product) and September 30 (date). Then, it automatically cross-checks the information against the order database to find the reason for the delay. Based on this analysis, the AI agent takes further action: it informs the customer of the reason for the delay, initiates a refund, or escalates directly to a live support agent. Therefore, agentic AI reduces resolution time and enhances customer satisfaction. Irrelevant Search Results Users, both customers and support agents, usually want relevant, accurate, and contextual results for solving their queries. However, traditional models struggle to capture evolving query intent and to analyze situations proactively in such nuanced contexts. These limitations cause traditional models to lag behind in improving user satisfaction and efficiency. AI agents, on the contrary, rerank and refine search results. They automatically adapt to changing user inputs, analyze that user's past interactions, decipher the evolving query intent, keep the previous context in memory, and then refine and rerank the search results based on these analyses. Picture this: when a user searches for "best laptops for gaming," agentic AI digs deeper into the query intent and considers various factors such as gaming performance, affordability, and customer reviews. The results are then reranked to bring the most relevant ones to the top. This ability of agentic AI to autonomously fine-tune and prioritize relevant search results improves the user experience. How the Tools Integrate Seamlessly for Better Efficiency When a search query comes in, the LLM can determine whether it is related to a previous query. Based on this, it works out how to fold the previous conversation into the current one and rephrases the search query if it is incomplete or vague. Using NER, it automatically selects facets. Simultaneously, it analyzes user sentiment (happy, neutral, or frustrated) and determines whether the ticket needs to be escalated to a support agent. If you give the agent autonomy, it will figure out to whom to assign the case. A simplified sketch of this flow is shown after the conclusion below. Conclusion To sum up, AI agents can enhance search accuracy, perform complex reasoning tasks, improve user experience, and complete tasks autonomously without human intervention.
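To make the integration flow described above concrete, here is a deliberately simplified, hypothetical sketch. The helper functions are stand-ins for the LLM-backed tools discussed in this article (query rephrasing, NER, sentiment analysis, reranking), not any vendor's actual API.
Python
# Hypothetical orchestration sketch; each helper is a stand-in for an LLM-backed tool.

def detect_sentiment(query: str) -> str:
    # Stand-in: a real agent would call an LLM or a trained classifier.
    negative_cues = ("frustrated", "angry", "did not get", "not working")
    return "negative" if any(cue in query.lower() for cue in negative_cues) else "neutral"

def rephrase_query(query: str, history: list[str]) -> str:
    # Stand-in: a real agent would ask an LLM to rewrite the query using prior context.
    return f"{query} (interpreted with context: {history[-1]})" if history else query

def extract_entities(query: str) -> dict:
    # Stand-in for an NER tool that would select facets automatically.
    return {"product": None, "date": None}

def handle_query(query: str, history: list[str]) -> str:
    if detect_sentiment(query) == "negative":
        return "escalate: open a ticket or route to a live support agent"
    refined = rephrase_query(query, history)
    facets = extract_entities(refined)
    # A real system would now retrieve candidates and rerank them using intent and facets.
    return f"search and rerank: '{refined}' with facets {facets}"

print(handle_query("watches", ["tracking burn calories"]))
print(handle_query("I tried everything but did not get my answers, feeling frustrated", []))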
As generative AI revolutionizes various industries, developers increasingly seek efficient ways to integrate large language models (LLMs) into their applications. Amazon Bedrock is a powerful solution. It offers a fully managed service that provides access to a wide range of foundation models through a unified API. This guide will explore the key benefits of Amazon Bedrock, how to integrate different LLM models into your projects, how to simplify the management of the various LLM prompts your application uses, and best practices to consider for production usage. Key Benefits of Amazon Bedrock Amazon Bedrock simplifies the initial integration of LLMs into any application by providing all the foundational capabilities needed to get started. Simplified Access to Leading Models Bedrock provides access to a diverse selection of high-performing foundation models from industry leaders such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon. This variety allows developers to choose the most suitable model for their use case and switch models as needed without managing multiple vendor relationships or APIs. Fully Managed and Serverless As a fully managed service, Bedrock eliminates the need for infrastructure management. This allows developers to focus on building applications rather than worrying about the underlying complexities of infrastructure setup, model deployment, and scaling. Enterprise-Grade Security and Privacy Bedrock offers built-in security features, ensuring that data never leaves your AWS environments and is encrypted in transit and at rest. It also supports compliance with various standards, including ISO, SOC, and HIPAA. Stay Up-to-Date With the Latest Infrastructure Improvements Bedrock regularly releases new features that push the boundaries of LLM applications and require little to no setup. For example, it recently released an optimized inference mode that improves LLM inference latency without compromising accuracy. Getting Started With Bedrock In this section, we'll use the AWS SDK for Python to build a small application on your local machine, providing a hands-on guide to getting started with Amazon Bedrock. This will help you understand the practical aspects of using Bedrock and how to integrate it into your projects. Prerequisites
You have an AWS account.
You have Python installed. If not, get it by following this guide.
You have the Python AWS SDK (Boto3) installed and configured correctly. It's recommended to create an AWS IAM user that Boto3 can use. Instructions are available in the Boto3 Quickstart guide.
If using an IAM user, ensure you add the AmazonBedrockFullAccess policy to it. You can attach policies using the AWS console.
Request access to one or more models on Bedrock by following this guide.
1. Creating the Bedrock Client Bedrock has multiple clients available within the AWS SDK. The Bedrock client lets you interact with the service to create and manage models, while the BedrockRuntime client enables you to invoke existing models. We will use one of the existing off-the-shelf foundation models for our tutorial, so we'll just work with the BedrockRuntime client. Python import boto3 import json # Create a Bedrock client bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1') 2. Invoking the Model In this example, I've used the Amazon Nova Micro model (with modelId amazon.nova-micro-v1:0), one of Bedrock's cheapest models.
We’ll provide a simple prompt to ask the model to write us a poem and set parameters to control the length of the output and the level of creativity (called “temperature”) the model should provide. Feel free to play with different prompts and tune parameters to see how they impact the output. Python import boto3 import json # Create a Bedrock client bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1') # Select a model (Feel free to play around with different models) modelId = 'amazon.nova-micro-v1:0' # Configure the request with the prompt and inference parameters body = json.dumps({ "schemaVersion": "messages-v1", "messages": [{"role": "user", "content": [{"text": "Write a short poem about a software development hero."}]}], "inferenceConfig": { "max_new_tokens": 200, # Adjust for shorter or longer outputs. "temperature": 0.7 # Increase for more creativity, decrease for more predictability } }) # Make the request to Bedrock response = bedrock.invoke_model(body=body, modelId=modelId) # Process the response response_body = json.loads(response.get('body').read()) print(response_body) We can also try this with another model like Anthropic’s Haiku, as shown below. Python import boto3 import json # Create a Bedrock client bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1') # Select a model (Feel free to play around with different models) modelId = 'anthropic.claude-3-haiku-20240307-v1:0' # Configure the request with the prompt and inference parameters body = json.dumps({ "anthropic_version": "bedrock-2023-05-31", "messages": [{"role": "user", "content": [{"type": "text", "text": "Write a short poem about a software development hero."}]}], "max_tokens": 200, # Adjust for shorter or longer outputs. "temperature": 0.7 # Increase for more creativity, decrease for more predictability }) # Make the request to Bedrock response = bedrock.invoke_model(body=body, modelId=modelId) # Process the response response_body = json.loads(response.get('body').read()) print(response_body) Note that the request/response structures vary slightly between models. This is a drawback that we will address by using predefined prompt templates in the next section. To experiment with other models, you can look up the modelId and sample API requests for each model from the “Model Catalog” page in the Bedrock console and tune your code accordingly. Some models also have detailed guides written by AWS, which you can find here. 3. Using Prompt Management Bedrock provides a nifty tool to create and experiment with predefined prompt templates. Instead of defining prompts and specific parameters such as token lengths or temperature in your code every time you need them, you can create pre-defined templates in the Prompt Management console. You specify input variables that will be injected during runtime, set up all the required inference parameters, and publish a version of your prompt. Once done, your application code can invoke the desired version of your prompt template. Key advantages of using predefined prompts: It helps your application stay organized as it grows and uses different prompts, parameters, and models for various use cases.It helps with prompt reuse if the same prompt is used in multiple places.Abstracts away the details of LLM inference from our application code.Allows prompt engineers to work on prompt optimization in the console without touching your actual application code.It allows for easy experimentation, leveraging different versions of prompts. 
You can tweak the prompt input, parameters like temperature, or even the model itself. Let's try this out now:
Head to the Bedrock console and click "Prompt Management" on the left panel.
Click on "Create Prompt" and give your new prompt a name.
Input the text that we want to send to the LLM, along with a placeholder variable. I used Write a short poem about a {{topic}}.
In the Configuration section, specify which model you want to use and set the values of the same parameters we used earlier, such as "Temperature" and "Max Tokens." If you prefer, you can leave the defaults as-is.
It's time to test! At the bottom of the page, provide a value for your test variable. I used "Software Development Hero." Then, click "Run" on the right to see if you're happy with the output.
For reference, here is my configuration and the results. To use this prompt in your application, we need to publish a new Prompt Version. To do so, click the "Create Version" button at the top. This creates a snapshot of your current configuration. If you want to play around with it, you can continue editing and creating more versions. Once published, we need to find the ARN (Amazon Resource Name) of the Prompt Version by navigating to the page for your prompt and clicking on the newly created version. Copy the ARN of this specific prompt version to use in your code. Once we have the ARN, we can update our code to invoke this predefined prompt. We only need the prompt version's ARN and the values for any variables we inject into it. Python import boto3 import json # Create a Bedrock client bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1') # Select your prompt identifier and version promptArn = "<ARN from the specific prompt version>" # Define any required prompt variables body = json.dumps({ "promptVariables": { "topic":{"text":"software development hero"} } }) # Make the request to Bedrock response = bedrock.invoke_model(modelId=promptArn, body=body) # Process the response response_body = json.loads(response.get('body').read()) print(response_body) As you can see, this simplifies our application code by abstracting away the details of LLM inference and promoting reusability. Feel free to play around with parameters within your prompt, create different versions, and use them in your application. You could extend this into a simple command-line application that takes user input and writes a short poem on that topic. Next Steps and Best Practices Once you're comfortable with using Bedrock to integrate an LLM into your application, explore some practical considerations and best practices to get your application ready for production usage. Prompt Engineering The prompt you use to invoke the model can make or break your application. Prompt engineering is the process of creating and optimizing instructions to get the desired output from an LLM. With the predefined prompt templates explored above, skilled prompt engineers can get started with prompt engineering without interfering with your application's software development process. You may need to tailor your prompt to the specific model you would like to use. Familiarize yourself with the prompt techniques specific to each model provider; Bedrock provides guidelines for the most commonly used models. Model Selection Making the right model choice is a balance between the needs of your application and the cost incurred. More capable models tend to be more expensive.
Not all use cases require the most powerful model, while the cheapest models may not always provide the performance you need. Use the Model Evaluation feature to quickly evaluate and compare the outputs of different models and determine which one best meets your needs. Bedrock offers multiple options to upload test datasets and configure how model accuracy should be evaluated for individual use cases. Fine-Tune and Extend Your Model With RAG and Agents If an off-the-shelf model doesn't work well enough for you, Bedrock offers options to tune a model to your specific use case. Create your training data, upload it to S3, and use the Bedrock console to initiate a fine-tuning job. You can also extend your models using techniques such as retrieval-augmented generation (RAG) to improve performance for specific use cases. Connect existing data sources, which Bedrock will make available to the model to enhance its knowledge. Bedrock also offers the ability to create agents that plan and execute complex multi-step tasks using your existing company systems and data sources. Security and Guardrails With Guardrails, you can ensure that your generative application gracefully avoids sensitive topics (e.g., racism, sexual content, and profanity) and that the generated content is grounded to prevent hallucinations. This feature is crucial for maintaining your application's ethical and professional standards. Leverage Bedrock's built-in security features and integrate them with your existing AWS security controls. Cost Optimization Before widely releasing your application or feature, consider the cost that Bedrock inference and extensions such as RAG will incur.
If you can predict your traffic patterns, consider using Provisioned Throughput for more efficient and cost-effective model inference.
If your application consists of multiple features, you can use different models and prompts for every feature to optimize costs on an individual basis.
Revisit your choice of model as well as the size of the prompt you provide for each inference. Bedrock generally prices on a "per-token" basis, so longer prompts and larger outputs incur more costs.
Conclusion Amazon Bedrock is a powerful and flexible platform for integrating LLMs into applications. It provides access to many models, simplifies development, and delivers robust customization and security features. Thus, developers can harness the power of generative AI while focusing on creating value for their users. This article showed how to get started with an essential Bedrock integration and how to keep your prompts organized. As AI evolves, developers should stay updated with the latest features and best practices in Amazon Bedrock to build their AI applications.
At the ASF's flagship Community Over Code North America conference in October 2024, keynote speakers underscored the vital role of open-source communities in driving innovation, enhancing security, and adapting to new challenges. By highlighting the Cybersecurity and Infrastructure Security Agency's (CISA) intensified focus on open source security, citing examples of open source-driven innovation, and reflecting on the ASF's 25-year journey, the keynotes showcased a thriving but rapidly changing ecosystem for open source. Opening Keynote: CISA's Vision for Open Source Security Aeva Black from CISA opened the conference with a talk about the government's growing engagement with open source security. Black, a long-time open source contributor who helps shape federal policy, emphasized how deeply embedded open source has become in critical infrastructure. To help illustrate open source's pervasiveness, Black noted that modern European cars have more than 100 computers, "most of them running open source, including open source orchestration systems to control all of it." CISA's open-source roadmap aims to "foster an open source ecosystem that is secure, sustainable and resilient, supported by a vibrant community." Black also highlighted several initiatives, including new frameworks for assessing supply chain risk, memory safety requirements, and increased funding for security tooling. Notably, in the annual Administration Cybersecurity Priorities Memo M-24-14, the White House has encouraged Federal agencies to include budget requests to establish Open Source Program Offices (OSPOs) to secure their open source usage and develop contribution policies. Innovation Showcase: The O.A.S.I.S Project Chris Kersey delivered a keynote demonstrating the O.A.S.I.S Project, an augmented-reality helmet system built entirely with open-source software. His presentation illustrated how open source enables individuals to create sophisticated systems by building upon community-maintained ecosystems. Kersey's helmet integrates computer vision, voice recognition, local AI processing, and sensor fusion - all powered by open source. "Open source is necessary to drive this level of innovation because none of us know all of this technology by ourselves, and by sharing what we know with each other, we can build amazing things," Kersey emphasized while announcing the open-sourcing of the O.A.S.I.S Project. State of the Foundation: Apache at 25 David Nalley, President of the Apache Software Foundation (ASF), closed the conference with the annual 'State of the Foundation' address, reflecting on the ASF's evolution over 25 years. He highlighted how the foundation has grown from primarily hosting the Apache web server to becoming a trusted home for hundreds of projects that "have literally changed the face of the (open source) ecosystem and set a standard that the rest of the industry is trying to copy." Nalley emphasized the ASF's critical role in building trust through governance: "When something carries the Apache brand, people know that means there's going to be governance by consensus, project management committees, and people who are acting in their capacity as an individual, not as a representative of some other organization." Looking ahead, Nalley acknowledged the need for the ASF to adapt to new regulatory requirements like Europe's Cyber Resiliency Act while maintaining its core values. 
He highlighted ongoing collaboration with other foundations like the Eclipse Foundation to set standards for open-source security compliance. "There is a lot of new work we need to do. We cannot continue to do the things that we have done for many years in the same way that we did them 25 years ago," Nalley noted while expressing confidence in the foundation's ability to evolve. Conclusion This year's Community Over Code keynotes highlighted a maturing open-source ecosystem tackling new challenges around security, regulation, and scalability while showing how community-driven innovation continues to push technical limits. Speakers stressed that the ASF's model of community-led development and strong governance is essential for fostering trust and driving innovation in today's complex technology landscape.
Large language models (LLMs) have disrupted AI with their ability to generate coherent text, translate languages, and even carry on conversations. However, despite their impressive capabilities, LLMs still face significant challenges when it comes to reasoning and understanding complex contexts. These models, while adept at identifying and replicating patterns in vast amounts of training text, often struggle with tasks that require true comprehension and logical reasoning. This can result in problems such as inconsistencies in long conversations, errors in connecting disparate pieces of information, and difficulties in maintaining context over extended narratives. Understanding these reasoning problems is crucial for improving the future development and application of LLMs. Key Reasoning Challenges Lack of True Understanding Language models operate by predicting the next token based on patterns they have learned from extensive data during training. However, they lack a deep, inherent understanding of the environment and the concepts they discuss. As a result, they may find complex reasoning tasks that demand true comprehension challenging. Contextual Limitations Although modern language models excel at grasping short contexts, they often struggle to maintain coherence and context over extended conversations or larger text segments. This can result in reasoning errors when the model must link information from various parts of the dialogue or text. In a lengthy discussion or intricate narrative, the model might forget or misinterpret earlier details, leading to contradictions or inaccurate conclusions later on. Inability to Perform Planning Many reasoning tasks involve multiple steps of logic or the ability to track numerous facts over time. Current language models often struggle with tasks that demand long-term coherence or multi-step logical deductions. They may have difficulty solving puzzles that require several logical operations. Answering Unsolvable Problems Answering unsolvable problems is a critical challenge for LLMs and highlights the limitations of their reasoning capabilities. When presented with an unsolvable problem, such as a paradox, a question with no clear answer, or a question that contradicts established facts, LLMs can struggle to provide meaningful or coherent responses. Instead of recognizing the inherent impossibility of the problem, the model may attempt to offer a solution based on patterns in the data it has been trained on, which can lead to misleading or incorrect answers. State Space Computation Complexity Some problems necessitate exploring all possible states from the initial state to the goal state. For instance, travel planning can involve numerous options, and with additional constraints like budget and mode of travel, the search space can explode combinatorially. It would be impractical for a language model to compute and respond to all these possibilities. Instead, it relies on the heuristics it has learned to provide a plausible-looking solution that may not be correct. A Real-Life Example of Incorrect Reasoning Let's take the question: Plain Text "A jug is filled with 8 units of water, and there are two empty jugs of sizes 5 and 5. The solver must pour the water so that the first and second jugs both contain 4 units, and the third is empty. Each step, pouring water from a source jug to a destination jug stops when either the source jug is empty or the destination jug is full, whichever happens first."
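As a quick check (this sketch is added here for illustration and is not part of the original write-up), a brute-force breadth-first search over every state reachable by legal pours confirms that the target split (4, 4, 0) can never be reached from (8, 0, 0) with jug capacities of 8, 5, and 5:
Python
from collections import deque

def reachable_states(capacities=(8, 5, 5), start=(8, 0, 0)):
    """Breadth-first search over all states reachable by legal pours."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for src in range(3):
            for dst in range(3):
                if src == dst:
                    continue
                # A pour stops when the source is empty or the destination is full.
                amount = min(state[src], capacities[dst] - state[dst])
                if amount == 0:
                    continue
                nxt = list(state)
                nxt[src] -= amount
                nxt[dst] += amount
                nxt = tuple(nxt)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen

states = reachable_states()
print(sorted(states))        # Only 7 states are ever reachable
print((4, 4, 0) in states)   # False: the puzzle has no solution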
We can see from the responses below that today's LLMs give incorrect answers. This problem is actually not solvable, but all of the LLMs try to give an answer as if they had found a solution.
ChatGPT's Response
Google's Response
Bing Copilot's Response
LLMs Reciting vs. Reasoning However, if you were to change the question to have "two empty jugs of sizes 5 and 4" instead of "two empty jugs of sizes 5 and 5", then all LLMs would answer the memorized question correctly. What Are Researchers Proposing to Help With Reasoning? While some researchers focus on improving the dataset and using the chain-of-thought approach, others propose the use of external verifiers and solvers. Each of these techniques aims to bring improvements by addressing a different dimension of the problem. Improving the Dataset Some researchers propose improving the quality and diversity of the data used to train language models. By curating more comprehensive and varied datasets, the models can learn from a broader range of contexts and examples. This approach aims to increase the model's ability to handle diverse scenarios. Chain-of-Thought This technique involves training models to follow a structured reasoning process, step by step, similar to human thinking. By encouraging models to generate intermediate reasoning steps explicitly, researchers hope to improve the models' ability to tackle complex reasoning tasks and provide more accurate, logically consistent responses. Using External Verifiers To address the issue of models generating incorrect or misleading information, some researchers propose integrating external verification mechanisms. These verifiers can cross-check the model's output against trusted sources or use additional algorithms to validate the accuracy of the information before presenting it to the user. This helps ensure that the generated content is reliable and factually correct. Using Solvers Another approach involves incorporating specialized solvers that are designed to handle specific types of reasoning tasks. These solvers can be used to perform calculations, solve equations, or process logical statements, complementing the language model's capabilities. By delegating these tasks to solvers, the overall system can achieve more accurate and reliable results. Conclusion Despite impressive progress in areas like text generation and comprehension, current language models struggle with intricate, multi-layered reasoning tasks because they cannot fully grasp meaning or maintain consistent context, and because they rely solely on patterns extracted from large but potentially flawed training data. To address these limitations, future models will likely require more sophisticated architectures alongside ongoing research into common-sense reasoning. References
Water pouring puzzle
Learning to reason with LLMs
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
Retrieval-augmented generation (RAG) is the most popular approach for obtaining real-time or updated data from a data source based on text input from users, empowering our search applications with state-of-the-art neural search. In RAG search systems, each user request is converted into a vector representation by an embedding model, and this vector is compared, using various algorithms such as cosine similarity, longest common sub-sequence, etc., against the existing vector representations stored in our vector-supporting database. The existing vectors stored in the vector database are generated or updated asynchronously by a separate background process. This diagram provides a conceptual overview of vector comparison. To use RAG, we need at least an embedding model and a vector storage database for the application. Contributions from the community and open-source projects provide us with an amazing set of tools that help us build effective and efficient RAG applications. In this article, we will implement the usage of a vector database and an embedding model in a Python application. Whether you are reading about this concept for the first time or the nth time, you only need the tools to follow along, and no subscription is needed for any of them. You can simply download the tools and get started. Our tech stack consists of the following open-source and free-to-use tools:
Operating system – Ubuntu Linux
Vector database – Apache Cassandra
Embedding model – nomic-embed-text
Programming language – Python
Key Benefits of This Stack:
Open source
Isolated data to meet data compliance standards
This diagram provides a high-level dependency architecture of the system. Implementation Walkthrough You may implement and follow along if the prerequisites are fulfilled; otherwise, read to the end to understand the concepts. Prerequisites
Linux (in my case, Ubuntu 24.04.1 LTS)
Java setup (OpenJDK 17.0.2)
Python (3.11.11)
Ollama
Ollama Model Setup Ollama is an open-source middleware server that acts as an abstraction layer between generative AI models and applications; it installs all the necessary tooling to make generative AI models available for consumption as a CLI and an API on a machine. It has most of the openly available models, like llama, phi, mistral, snowflake-arctic-embed, etc. It is cross-platform and can be easily configured on any OS. In Ollama, we will pull the nomic-embed-text model to generate embeddings. Run in the command line: Plain Text ollama pull nomic-embed-text This model generates embedding vectors of size 768. Apache Cassandra Setup and Scripts Cassandra is an open-source NoSQL database designed to handle heavy workloads that require high scalability, as industry needs demand. Version 5.0 recently added support for vector search, which facilitates our RAG use case. Note: Cassandra requires a Linux OS to work; it can also be installed as a Docker image. Installation Download Apache Cassandra from https://cassandra.apache.org/_/download.html. Configure Cassandra in your PATH. Start the server by running the following command in the command line: Plain Text cassandra Table Open a new Linux terminal and type cqlsh; this will open the shell for the Cassandra Query Language. Now, execute the scripts below to create the embeddings keyspace, the document_vectors table, and the edv_ann_index index necessary to perform a vector search.
SQL CREATE KEYSPACE IF NOT EXISTS embeddings WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' }; USE embeddings; CREATE TABLE IF NOT EXISTS embeddings.document_vectors ( record_id timeuuid, id uuid, content_chunk text, content_vector VECTOR <FLOAT, 768>, created_at timestamp, PRIMARY KEY (id, created_at) ) WITH CLUSTERING ORDER BY (created_at DESC); CREATE INDEX IF NOT EXISTS edv_ann_index ON embeddings.document_vectors(content_vector) USING 'sai'; Note: content_vector VECTOR <FLOAT, 768> stores the 768-dimension vectors generated by the model. Milestone 1: We are ready with the database setup to store vectors. Python Code This programming language certainly needs no introduction; it is easy to use and loved by the industry, with strong community support. Virtual Environment Set up the virtual environment: Plain Text sudo apt install python3-virtualenv && python3 -m venv myvenv Activate the virtual environment: Plain Text source /media/setia/Data/Tutorials/cassandra-ollama-app/myvenv/bin/activate Packages Download the DataStax Cassandra driver package: Plain Text pip install cassandra-driver Download the requests package: Plain Text pip install requests File Create a file named app.py. Now, write the code below to insert sample documents into Cassandra. Inserting data into the database is always the first step; it can be done asynchronously by a separate process. For demo purposes, I have written a method that inserts documents into the database first. Later on, we can comment out this method once the insertion of documents is successful. Python from cassandra.cluster import Cluster from cassandra.query import PreparedStatement, BoundStatement import uuid import datetime import requests cluster = Cluster(['127.0.0.1'],port=9042) session = cluster.connect() def generate_embedding(text): embedding_url = 'http://localhost:11434/api/embed' body = { "model": "nomic-embed-text", "input": text } response = requests.post(embedding_url, json = body) return response.json()['embeddings'][0] def insert_chunk(content, vector): id = uuid.uuid4() content_chunk = content content_vector = vector created_at = datetime.datetime.now() insert_query = """ INSERT INTO embeddings.document_vectors (record_id, id, content_chunk, content_vector, created_at) VALUES (now(), ?, ?, ?, ?)
""" prepared_stmt = session.prepare(insert_query) session.execute(prepared_stmt, [ id, content_chunk, content_vector, created_at ]) def insert_sample_data_in_cassandra(): sentences = [ "The aroma of freshly baked bread wafted through the quaint bakery nestled in the cobblestone streets of Paris, making Varun feel like time stood still.", "Sipping a spicy masala chai in a bustling tea stall in Mumbai, Varun felt he was tasting the very soul of the city.", "The sushi in a small Tokyo diner was so fresh, it felt like Varun was on a culinary journey to the sea itself.", "Under the starry desert sky in Morocco, Varun enjoyed a lamb tagine that tasted like a dream cooked slowly over a fire.", "The cozy Italian trattoria served the creamiest risotto, perfectly capturing the heart of Tuscany on a plate, which Varun savored with delight.", "Enjoying fish tacos on a sunny beach in Mexico, with the waves crashing nearby, made the flavors unforgettable for Varun.", "The crispy waffles drizzled with syrup at a Belgian café were worth every minute of waiting, as Varun indulged in the decadent treat.", "A bowl of warm pho in a roadside eatery in Hanoi felt like comfort wrapped in a broth of herbs and spices, giving Varun a sense of warmth.", "Sampling chocolate truffles in a Swiss chocolate shop, Varun found himself in a moment of pure bliss amidst snow-capped mountains.", "The street food stalls in Bangkok served fiery pad Thai that left Varun with a tangy memory of the city’s vibrant energy." ] for sentence in sentences: vector = generate_embedding(sentence) insert_chunk(sentence, vector) insert_sample_data_in_cassandra() Now, run this file using the commandline in the virtual environment: Plain Text python app.py Once the file is executed and documents are inserted, this can be verified by querying the Cassandra database from the cqlsh console. For this, open cqlsh and execute: SQL SELECT content_chunk FROM embeddings.document_vectors; This will return 10 documents inserted in the database, as seen in the screenshot below. Milestone 2: We are done with data setup in our vector database. Now, we will write code to query documents based on cosine similarity. Cosine similarity is the dot product of two vector values. Its formula is A.B / |A||B|. This cosine similarity is internally supported by Apache Cassandra, helping us to compute everything in the database and handle large data efficiently. The code below is self-explanatory; it fetches the top three results based on cosine similarity using ORDER BY <column name> ANN OF <text_vector> and also returns cosine similarity values. To execute this code, we need to ensure that indexing is applied to this vector column. Python def query_rag(text): text_embeddings = generate_embedding(text) select_query = """ SELECT content_chunk,similarity_cosine(content_vector, ?) FROM embeddings.document_vectors ORDER BY content_vector ANN OF ? LIMIT 3 """ prepared_stmt = session.prepare(select_query) result_rows = session.execute(prepared_stmt, [ text_embeddings, text_embeddings ]) for row in result_rows: print(row[0], row[1]) query_rag('Tell about my Bangkok experiences') Remember to comment insertion code: Python #insert_sample_data_in_cassandra() Now, execute the Python code by using python app.py. We will get the output below: Plain Text (myvenv) setia@setia-Lenovo-IdeaPad-S340-15IIL:/media/setia/Data/Tutorials/cassandra-ollama-app$ python app.py The street food stalls in Bangkok served fiery pad Thai that left Varun with a tangy memory of the city’s vibrant energy. 
0.8205469250679016 Sipping a spicy masala chai in a bustling tea stall in Mumbai, Varun felt he was tasting the very soul of the city. 0.7719690799713135 A bowl of warm pho in a roadside eatery in Hanoi felt like comfort wrapped in a broth of herbs and spices, giving Varun a sense of warmth. 0.7495554089546204 You can see the cosine similarity of "The street food stalls in Bangkok served fiery pad Thai that left Varun with a tangy memory of the city’s vibrant energy." is 0.8205469250679016, which is the closest match. Final Milestone: We have implemented the RAG search. Enterprise Applications Apache Cassandra For enterprises, we can use Apache Cassandra 5.0 from popular cloud vendors such as Microsoft Azure, AWS, GCP, etc. Ollama This middleware requires a VM compatible with Nvidia-powered GPU for running high-performance models, but we don't need high-end VMs for models used for generating vectors. Depending upon traffic requirements, multiple VMs can be used, or any generative AI service like Open AI, Anthropy, etc, whichever Total Cost of Ownership is lower for scaling needs or Data Governance needs. Linux VM Apache Cassandra and Ollama can be combined and hosted in a single Linux VM if the use case doesn't require high usage to lower the Total Cost of Ownership or to address Data Governance needs. Conclusion We can easily build RAG applications by using Linux OS, Apache Cassandra, embedding models (nomic-embed-text) used via Ollama, and Python with good performance without needing any additional cloud subscription or services in the comfort of our machines/servers. However, hosting a VM in server(s) or opt for a cloud subscription for scaling as an enterprise application compliant with scalable architectures is recommended. In this Apache, Cassandra is a key component to do the heavy lifting of our vector storage and vector comparison and Ollama server for generating vector embeddings. That's it! Thanks for reading 'til the end.
Modern data architecture is not a choice; it is a necessity for organizations trying to remain competitive. Organizations are finding it difficult to use the exponentially expanding amounts of data effectively. Importance of Modern Data Architectures Modern data architectures remain relevant because they give businesses a systematic way of dealing with large quantities of data and, in return, enable faster decisions. Modern businesses rely on these architectures because they provide real-time processing, powerful analytics, and support for numerous data sources. Understanding Modern Data Architectures Modern data architectures are frameworks enabling data collection, processing, and analysis at scale. Usually, they comprise elements including data lakes, data warehouses, real-time processing, and analytics tools. Important components include:
Scalability. The capability to handle increasing volumes of data over time while remaining efficient.
Flexibility. The ability to work with different data types, irrespective of their formats.
Security. Measures to protect data and keep it confidential.
Modern data architectures provide better data integration, more analytics power, and lower operational costs. Common uses include predictive analytics, real-time data processing, and solutions tailored to each client. Key Features of Azure for Data Architecture Microsoft Azure provides data services tailored for modern data architectures. These services empower organizations to store, maintain, process, and analyze data in a safe, scalable, and efficient manner, meeting the need for robust, scalable data solutions. The following is a description of some of the important Azure tools for modern data architecture: 1. Azure Data Factory Azure Data Factory is a cloud-based ETL and data integration tool oriented toward building data-centric processes. It lets users build workflows that schedule and control data movement and transformation, and it ensures proper data integration by allowing organizations to centralize data from various sources in one location. 2. Azure Synapse Analytics Azure Synapse Analytics is a sophisticated analytics service that covers both big data analytics and data warehousing. It allows enterprises to perform large-scale analytics and offers a unified approach to the ingestion, preparation, governance, and serving of data. 3. Azure Data Lake Storage Azure Data Lake Storage is meant for secure, scalable cloud-based storage. It combines low-cost storage with high throughput, making it a good fit for big data technologies. 4. Azure Databricks Azure Databricks is a fast, simple, and collaborative Apache Spark-based analytics platform. It's a great choice for creating scalable data pipelines, machine learning models, and data-driven apps since it integrates tightly with Azure services. Designing a Modern Data Architecture Modern data architecture is designed with a deliberate strategy to combine analytics tools, processing frameworks, and many data sources. Organizations can develop scalable, safe, and efficient architectures supporting their data-driven objectives using a disciplined design approach. Steps to Design: Assess, Plan, Design, Implement, and Manage Step 1. Assess. Determine how far the present data implementation has gone and where it needs improvement.
Step 2. Plan. Provide a blueprint that describes how compliance requirements will be met and what capacity and data governance are needed. Step 3. Design. Model an architecture consisting of the analytics applications, processing systems, and databases. Step 4. Implement. Build the architecture using the Azure services appropriate to your specific requirements. Step 5. Manage. Monitor and optimize security, compute, availability, and performance across the entire environment. Best Practices for Scalability, Performance, and Security Following best practices on the platform improves operational performance and service availability. Key practices include regular audits, limiting user access, and data encryption. Implementation Steps Implementing modern data architecture principles requires systematic planning across data scope, structural design, transformation, and analysis. Organizations can streamline these processes and develop an organized, efficient data ecosystem using the powerful tools of Azure. 1. Data Ingestion Strategies Data ingestion is the process of bringing data from multiple sources into one system. Azure Data Factory and Azure Event Hubs provide effective ingestion capabilities for both batch and real-time data. 2. Data Transformation and Processing Use Azure Databricks and Azure Synapse Analytics to interpret and process the data. These tools assist in cleaning, transforming, and preparing data for analytics. 3. Management and Data Storage Azure Cosmos DB and Azure Data Lake Storage provide abundant, efficient, and secure storage options. They deliver high availability and performance and support multiple data types. 4. Visualization and Data Analysis The augmented analytics and visualizations offered by Azure Machine Learning, Power BI, and Azure Synapse Analytics allow decision-makers to execute strategies based on real-time insights. Challenges and Solutions Modern data architecture addresses today's needs, but it also brings integration, security, and scalability challenges. Microsoft Azure offers capabilities that help organizations address these challenges and get more out of their data strategies. Common Challenges in Building Data Architectures Cleaning data, integrating various data sources, and ensuring data security are complex tasks. In addition, there's the challenge of scaling designs as data volumes grow. How Azure Addresses These Challenges To solve these problems, Azure provides built-in security features and flexible, scalable data services that can grow with the needs of the business. Data Architecture Future Trends Data architecture is likely to be characterized by edge computing, AI-based analytics, and the use of blockchain technology for protecting data assets. Looking ahead, Azure's steady pace of improvement positions it well to track these worldwide trends and provide firms with the resources they need to stay competitive. Conclusion Organizations trying to maximize the value of their data depend on modern data architectures. Microsoft Azure offers thorough, scalable solutions for every aspect of data management.
These technologies allow companies to create strong data systems that stimulate innovation and expansion.
Tuhin Chattopadhyay
CEO at Tuhin AI Advisory and Professor of Practice,
JAGSoM
Frederic Jacquet
Technology Evangelist,
AI[4]Human-Nexus
Suri Nuthalapati
Data & AI Practice Lead, Americas,
Cloudera
Pratik Prakash
Principal Solution Architect,
Capital One