For years, developers have dreamed of having a coding buddy who would understand their projects well enough to automatically create intelligent code, not just pieces of it. We've all struggled with inconsistent naming of variables across files, tried to recall exactly what function signature was defined months ago, and wasted valuable hours manually stitching pieces of our codebase together. This is where large language models (LLMs) come in — not as chatbots, but as powerful engines in our IDEs, changing how we produce code by finally grasping the context of our work.

Traditional code generation tools, and even basic IDE auto-completion, usually fall short because they lack a deep understanding of the broader context; they operate on a very limited view, such as only the current file or a small window of code. The result is syntactically correct but semantically inappropriate suggestions, which the developer must constantly correct and integrate by hand. Think about a suggested variable name that is already used in another crucial module with a different meaning — a frustrating experience we've all encountered.

LLMs change this game entirely by bringing a much deeper understanding to the table: analyzing your whole project, from variable declarations in several files down to function call hierarchies and even your coding style. Think of an IDE that truly understands not just the what of your code but also the why and how in the bigger scheme of things. That is the promise of LLM-powered IDEs, and it's real.

Take, for example, a state-of-the-art IDE using LLMs, like Cursor. It's not simply looking at the line you're typing; it knows what function you are in, what variables you have defined in this and related files, and the general structure of your application. That deep understanding is achieved through a couple of architectural components. First, it is built upon what's called an Abstract Syntax Tree, or AST: the IDE parses your code into a tree-like representation of its grammatical constructs, which gives the LLM at least an elementary understanding of the code, far superior to plain text. Second, to properly capture semantics across files, a knowledge graph is generated. It interlinks all of the class, function, and variable relationships throughout your whole project and builds an understanding of these dependencies and relationships.

Consider a simplified JavaScript example of how context is modeled:

JavaScript

/* Context model based on a single edited document and its external imports */
function Context(codeText, lineInfo, importedDocs) {
  this.current_line_code = codeText; // Line with the active text selection
  this.lineInfo = lineInfo;          // Line number, location, code document structure, etc.
  this.relatedContext = {
    importedDocs: importedDocs,      // Info on imported documents and dependencies within the text
  };
  // ... additional code details ...
}

This flowchart shows how information flows when a developer changes their code.

Markdown

graph LR
    A[Editor(User Code Modification)] --> B(Context Extractor);
    B --> C{AST Structure Generation};
    C --> D[Code Graph Definition Creation];
    D --> E(LLM Context API Input);
    E --> F[LLM API Call];
    F --> G(Generated Output);
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style F fill:#aaf,stroke:#333,stroke-width:2px

The Workflow of LLM-Powered IDEs

1. Editor

The process starts with a change that you, as the developer, make in the code using the code editor.
Perhaps you typed some new code, deleted some lines, or edited some statements. This is represented by node A.

2. Context Extractor

The change you have just made triggers the Context Extractor. This module collects all the information around your modification within the code — somewhat like an IDE detective looking for clues in the surrounding files. This is represented by node B.

3. AST Structure Generation

The code snippet is fed to the AST Structure Generation module. AST is the abbreviation for Abstract Syntax Tree. This module parses your code, much as a compiler would, and builds a tree-like representation of the grammatical structure of your code. For LLMs, such a structured view is important for understanding the meaning of, and the relationships among, the various parts of the code. This is represented by node C, shown within curly braces.

4. Code Graph Definition Creation

Next, the Code Graph Definition is created. This module takes the structured information from the AST and builds an even broader understanding of how your code fits in with the rest of your project. It infers dependencies between files, functions, classes, and variables and extends the knowledge graph, creating a big picture of the general context of your codebase. This is represented by node D.

5. LLM Context API Input

All the context gathered and structured — the current code, the AST, and the code graph — is finally transformed into an input structure suitable for the large language model. This input is then sent to the LLM in a request asking for code generation or completion. This is represented by node E.

6. LLM API Call

Now it is time to actually call the LLM. At this moment, the well-structured context is passed to the LLM's API. This is where the magic happens: based on its training material and the given context, the LLM produces suggestions for code. This is represented by node F, colored blue to indicate that it is an important node.

7. Generated Output

The LLM returns its suggestions, and the user sees them inside the code editor. These could be code completions, code block suggestions, or even refactoring options, depending on how well the IDE understands the current context of your project. This is represented by node G.
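Before looking at the numbers, here is a small, illustrative Java sketch of how the context gathered in steps 2 through 4 might be packed into a single prompt for the API call in step 6. The class, field, and method names are hypothetical and deliberately simplified; real IDEs use far richer internal representations.

Java

import java.util.List;

// Hypothetical container for the context gathered in steps 2-4.
public record LlmContextRequest(
        String currentCode,          // code surrounding the developer's edit (step 2)
        String astSummary,           // condensed textual view of the AST (step 3)
        List<String> relatedSymbols  // neighboring symbols pulled from the code graph (step 4)
) {

    // Flatten the structured context into one prompt string for the LLM call (steps 5 and 6).
    public String toPrompt(String instruction) {
        return """
                %s

                ### Current code
                %s

                ### AST summary
                %s

                ### Related symbols from the project graph
                %s
                """.formatted(instruction, currentCode, astSummary,
                String.join("\n", relatedSymbols));
    }
}

An IDE would send the result of toPrompt(...) to its LLM endpoint and surface the returned completion in the editor, which corresponds to step 7 above.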
So, how does this translate to real-world improvements? We've run benchmarks comparing traditional code completion methods with those powered by LLMs in context-aware IDEs. The results are compelling:

Metric | Baseline (Traditional Methods) | LLM-Powered IDE (Context Aware) | Improvement
Accuracy of Suggestions (Score 0-1) | 0.55 | 0.91 | 65% higher
Average Latency (ms) | 20 | 250 | Acceptable for the benefit
Token Count in Prompt | Baseline | ~30% less (optimized context) | Optimized prompt size

Graph: Comparison of suggestion accuracy scores across 10 different code generation tasks. A higher score indicates better accuracy.

Markdown

graph LR
    A[Test Case 1] -->|Baseline: 0.5| B(0.9);
    A -->|LLM IDE: 0.9| B;
    C[Test Case 2] -->|Baseline: 0.6| D(0.88);
    C -->|LLM IDE: 0.88| D;
    E[Test Case 3] -->|Baseline: 0.7| F(0.91);
    E -->|LLM IDE: 0.91| F;
    G[Test Case 4] -->|Baseline: 0.52| H(0.94);
    G -->|LLM IDE: 0.94| H;
    I[Test Case 5] -->|Baseline: 0.65| J(0.88);
    I -->|LLM IDE: 0.88| J;
    K[Test Case 6] -->|Baseline: 0.48| L(0.97);
    K -->|LLM IDE: 0.97| L;
    M[Test Case 7] -->|Baseline: 0.58| N(0.85);
    M -->|LLM IDE: 0.85| N;
    O[Test Case 8] -->|Baseline: 0.71| P(0.90);
    O -->|LLM IDE: 0.90| P;
    Q[Test Case 9] -->|Baseline: 0.55| R(0.87);
    Q -->|LLM IDE: 0.87| R;
    S[Test Case 10] -->|Baseline: 0.62| T(0.96);
    S -->|LLM IDE: 0.96| T;
    style B fill:#ccf,stroke:#333,stroke-width:2px
    style D fill:#ccf,stroke:#333,stroke-width:2px

Let's break down how these coding tools performed, like watching a head-to-head competition. Imagine each row in our results table as a different coding challenge (we called them "Test Case 1" through "Test Case 10"). For each challenge, we pitted two approaches against each other:

- The Baseline: Think of this as the "old-school" method, either standard code suggestions or a basic AI that doesn't really "know" the project inside and out. You'll see an arrow pointing from the test case (like 'Test Case 1', which we labeled node A) to its score — that's how well the baseline did.
- The LLM IDE: This is the "smart" IDE we've built, the one with a deep understanding of the entire project, as if it has been studying it for weeks. Another arrow points from the same test case to the same score node, but this time it tells you how the intelligent IDE performed. Notice how the result node (like node B) is highlighted in light blue? That's our visual cue to show where the smart IDE really shined.

Take Test Case 1 (that's node A) as an example:

- The arrow marked 'Baseline: 0.5' means the traditional method got it right about half the time for that task.
- But look at the arrow marked 'LLM IDE: 0.9'! The smart IDE, because it understands the bigger picture of the project, nailed it almost every time.

If you scan through each test case, you'll quickly see a pattern: the LLM-powered IDE consistently and significantly outperforms the traditional approach. It's like having a super-knowledgeable teammate who always seems to know the right way to do things because they understand the entire project.

The big takeaway here is the massive leap in accuracy when the AI truly grasps the context of your project. Yes, there's a bit more waiting time involved as the IDE does its deeper analysis, but the huge jump in accuracy, and the fact that you'll spend far less time fixing errors, makes it a no-brainer for developers.

But it's more than just the numbers. Think about the actual experience of coding. Engineers who've used these smarter IDEs say it feels like a weight has been lifted. They're not constantly having to keep every tiny detail of the project in their heads. They can focus on the bigger, more interesting problems, trusting that their IDE has their back on the details. Even tricky work like reorganizing code becomes less of a headache, and getting up to speed on a new project becomes much smoother because the AI acts like a built-in expert, helping you connect the dots.

These LLM-powered IDEs aren't just about spitting out code; they're about making developers more powerful. By truly understanding the intricate connections within a project, these tools are poised to change how software is built.
They'll make us faster and more accurate and, ultimately, allow us to focus on building truly innovative things. The future of coding assistance is here, and it's all about having that deep contextual understanding.
The Bloom filter is a lesser-known data structure that is not widely used by developers, yet it is a space-efficient, probabilistic data structure that every developer should be familiar with. It can significantly speed up exact-match queries, especially on fields that have not been indexed. Its space efficiency also makes it practical to maintain filters for multiple fields.

How It Works

Reading from a database or storage is a costly operation. To optimize this, we use a Bloom filter to check the availability of a key-value pair and only perform a database read if the filter responds with a 'Yes.' Bloom filters are space-efficient and can be stored in memory, and the lookup test for a value can be performed in O(1) time. More on this later.

Let's explore this concept with an example: we create a Bloom filter for the key 'Key1.' When clients request data associated with any value of Key1, we first check the availability of the value by passing it through the Bloom filter. In the diagram above, we attempt to read the payload for three different values of Key1:

- For the first value, 'Value1,' the filter responds with a 'No,' allowing us to short-circuit the read operation and return "not available."
- In the second example, 'Value2' exists in the database. The filter responds with a 'Yes,' prompting a database read to retrieve the payload associated with Key1 = Value2.
- The third example is more interesting: the filter responds with a 'Yes' for 'Value3,' but the database does not contain any payload for Key1 = Value3. This is known as a false positive. In this case, a database read is performed, but no value is returned.

Although false positives can occur, there are never false negatives — it is impossible for a key-value combination to exist while the filter returns a 'No.'

How to Implement a Bloom Filter

A Bloom filter is a bit array of length 'n,' with all bits initialized to 0. It requires 'h' distinct hash functions to populate the bit array. When a value is added to the filter, each hash function is applied to the value to generate an index between 0 and n-1, and the corresponding 'h' bits are set in the bit array. To test whether an element exists, the same set of hash functions is applied to the element, generating 'h' indices between 0 and n-1. We then check whether the corresponding bits in the bit array are set to 1. If all of them are, it is considered a hit, and the filter returns true.

As shown in the example above, Value1, Value2, and Value3 are added to the filter. Each value is passed through three hash functions, and the corresponding bits in the bit array are set. Later, during testing, Value1, Value4, and Value5 are passed through the same set of hash functions to determine whether the values exist. Because hash functions have a finite range of output values, multiple inputs can generate the same hash value, leading to collisions. Collisions can result in false positives, as demonstrated in the case of Value5. Selecting an appropriate hash function and an adequate length for the bit array can help reduce collisions. Additionally, understanding the range of possible input values, such as all strings or numbers within a finite range, can assist in choosing an optimal hash function and bit array length.
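To make the add/test flow concrete, here is a minimal, illustrative Java sketch using a BitSet and the common double-hashing trick to derive 'h' indices from two base hashes. It is a simplified example under those assumptions, not a production-ready implementation.

Java

import java.util.BitSet;

// Simplified Bloom filter: n bits, h hash functions derived by double hashing.
public class SimpleBloomFilter {

    private final BitSet bits;
    private final int size;       // n
    private final int hashCount;  // h

    public SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive the i-th index from two base hashes (Kirsch-Mitzenmacher style).
    private int index(String value, int i) {
        int h1 = value.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9; // cheap second hash
        return Math.floorMod(h1 + i * h2, size);
    }

    // Adding a value sets the h corresponding bits.
    public void add(String value) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(value, i));
        }
    }

    // A hit only if all h bits are set: false positives possible, false negatives not.
    public boolean mightContain(String value) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }
}

A caller would consult mightContain(value) first and only issue the database read when it returns true, accepting that an occasional false positive still leads to an empty read.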
Time and Space Complexity

Hash functions operate in O(1) time, making both the set and test operations constant-time operations. The space required depends on the length of the bit array; however, compared to a traditional index, this space is minimal because the Bloom filter does not store actual values, only a few bits per value.

Limitations

1. Bloom filters can only support exact matches because their operation relies on passing input through hash functions, which require an exact match.
2. Deleting a value from a Bloom filter is not straightforward, since bits in the array cannot simply be unset: a set bit may correspond to multiple values. To support deletions, the bit array would need to be recreated.
3. If the hash functions and bit array length are not chosen correctly, the filter may produce an increased number of false positives, making it inefficient and potentially an overhead. In the worst case, all bits in the array could be set after adding all the values; at that point, every test yields a positive response. To address this, the filter can be recreated with a resized bit array and new hash functions tailored to the requirements.

Final Thoughts

Hopefully, this article has introduced you to Bloom filters, if you weren't already familiar with them, and provided you with a valuable data structure to expand your knowledge. Like any other data structure, its usage depends on the specific use case, but it can prove highly useful when the right opportunity arises.
Serverless computing is a cloud computing model in which cloud providers like AWS, Azure, and GCP manage the server infrastructure, dynamically allocating resources as needed. Developers either invoke APIs directly or write code in the form of functions, and the cloud provider executes these functions in response to certain events. This means developers can scale applications automatically without worrying about server management and deployments, leading to cost savings and improved agility. The main advantage of serverless computing is that it abstracts away much of the complexity related to release management; developers don't need to worry about capacity planning, hardware management, or even operating systems. This simplicity frees up time and resources to focus on building innovative applications and services on top of the deployed models.

AI Model Deployment

Model deployment involves several critical steps to take a machine learning or AI model from development to production, ensuring it is scalable, reliable, and effective. Key elements include model training and optimization, where the model is fine-tuned for performance, and model versioning, which helps manage different iterations. Once trained, the model is serialized and packaged with its necessary dependencies, ready to be deployed in an appropriate runtime environment, such as a cloud platform or containerized service. The model is exposed via APIs or web services, allowing it to provide real-time predictions to external applications.

In addition to deployment, continuous monitoring and the establishment of CI/CD pipelines for automated retraining and model updates are crucial. Security measures are also essential to safeguard data privacy and ensure compliance with regulations. Models must be interpretable, particularly in industries that require an explanation of AI decisions, and feedback loops should be incorporated to refine the model over time based on user input or data changes. Managing resources efficiently to optimize operational costs is another key element, ensuring that the deployed model remains cost-effective and sustainable. Collectively, these elements ensure that a machine learning model can operate efficiently, securely, and with high performance in a production environment.

Serverless AI Inference

Serverless AI inference refers to the use of serverless computing platforms to deploy and execute machine learning models for making predictions without the need to manage infrastructure or worry about scaling resources. In this setup, the model is hosted behind an API endpoint, and users are charged only for the compute time their models actually use, offering cost efficiency and flexibility. Serverless platforms like AWS Lambda, Google Cloud Functions, and Azure Functions enable developers to upload their trained models and expose them through APIs for real-time predictions. This allows businesses to integrate AI-driven decision-making into their applications without needing to manage complex server infrastructure.

One of the primary advantages of serverless AI inference is its ability to scale seamlessly with varying request volumes, making it ideal for use cases like fraud detection, recommendation systems, and real-time image or speech recognition. It also reduces operational overhead, enabling data scientists and developers to focus on the model's accuracy and performance rather than on managing infrastructure. Serverless AI inference is becoming increasingly popular for lightweight, low-latency applications that require fast and cost-effective AI predictions without the need for dedicated infrastructure.
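As a concrete illustration of the pattern, here is a minimal sketch of a serverless inference function written as an AWS Lambda handler in Java, assuming the aws-lambda-java-core dependency is on the classpath. The "model" is a stand-in linear scorer with hard-coded weights rather than a real serialized model, and the class and field names are hypothetical.

Java

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.List;
import java.util.Map;

// Hypothetical serverless inference endpoint: receives feature values, returns a score.
public class InferenceHandler implements RequestHandler<Map<String, Object>, Map<String, Object>> {

    // Initialized once per container and reused across warm invocations, which softens cold starts.
    private static final double[] WEIGHTS = {0.42, -0.17, 0.08}; // stand-in for a real loaded model

    @Override
    public Map<String, Object> handleRequest(Map<String, Object> input, Context context) {
        @SuppressWarnings("unchecked")
        List<Number> features = (List<Number>) input.get("features");

        double score = 0.0;
        for (int i = 0; i < WEIGHTS.length && i < features.size(); i++) {
            score += WEIGHTS[i] * features.get(i).doubleValue();
        }

        // The platform scales instances of this function with demand; the handler only returns the prediction.
        return Map.of("score", score, "model_version", "v1");
    }
}

The same shape applies on Google Cloud Functions or Azure Functions; only the handler interface and deployment packaging change.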
Advantages of Serverless AI

Traditional AI models often require significant resources to deploy and scale, especially in production environments. With serverless infrastructure, developers can tap into a highly flexible, pay-as-you-go model that optimizes both cost and efficiency. Here are several key advantages of serverless AI:

Simplicity

AI models typically require a lot of configuration, especially when scaling across multiple machines for distributed computing. Serverless computing abstracts away much of the infrastructure management and allows developers to quickly deploy and iterate on their AI models. Developers can focus solely on the core logic, and as a result, businesses can develop AI-powered solutions faster than ever before.

Scalability

Serverless computing offers virtually unlimited scalability, allowing applications to handle increased demand without additional setup or configuration. For instance, if an AI model serving real-time predictions for a web app suddenly faces a spike in users, serverless infrastructure can automatically scale to handle the surge without manual intervention.

Cost-Efficiency

Serverless computing operates on a consumption-based pricing model, where users only pay for the resources actually used. This is particularly advantageous for AI, as many AI workloads are bursty: they need heavy resources at certain times and little or none at others.

Event-Driven Architecture

Serverless platforms are inherently event-driven, making them ideal for AI applications that need to respond to real-time data. This is crucial for scenarios such as fraud detection and anomaly detection.

Serverless Solutions

By leveraging a serverless ecosystem, organizations can focus on innovation, benefit from automatic scaling, optimize costs, and deliver applications faster, all while maintaining a secure and efficient development environment.

- Serverless with AWS: AWS provides a range of services that support serverless AI, such as AWS Lambda, which lets users run code in response to events without provisioning or managing servers. For machine learning tasks, services like Amazon SageMaker enable developers to quickly train, deploy, and manage models at scale.
- Serverless with Microsoft Azure: Azure's serverless offerings, such as Azure Functions, allow developers to run AI models and code in response to specific events or triggers, automatically scaling based on demand. Azure also provides robust machine learning services through Azure Machine Learning, which offers tools for training, deploying, and managing AI models at scale.
- Serverless with GCP: GCP provides key serverless services like Cloud Functions for event-driven computing. These services integrate seamlessly with GCP's AI and machine learning offerings, such as Vertex AI, allowing businesses to easily deploy AI models and process real-time data.

Serverless Challenges

Cold Start Latency

Serverless functions can experience a delay when they are invoked after a period of inactivity. For AI models that require high responsiveness, cold starts can introduce latency, which might be a problem for real-time applications.

State Management

Serverless functions are stateless by design, which means that managing the state of an AI model during inference can be tricky.
Developers must design their applications to handle session persistence or state externally, using databases or distributed caches.

Resource Governance

Many serverless platforms impose limitations on memory, execution time, and CPU/GPU usage. For particularly resource-intensive AI models, this can pose a problem, though it is often possible to design efficient models or split large tasks into smaller functions.

Scheduling Fairness

Scheduling fairness in serverless AI inference ensures equitable resource allocation across concurrent tasks, preventing resource monopolization and delays. It is crucial for balancing latency-sensitive and resource-intensive workloads while maintaining consistent performance. Achieving fairness requires strategies like priority queues, load balancing, and predictive scheduling, though the dynamic nature of serverless environments makes this challenging. Effective scheduling is key to optimizing throughput and responsiveness in AI inference tasks.

Conclusion

Serverless architectures revolutionize the way developers and businesses approach technology by delivering unparalleled scalability, cost efficiency, and simplicity. By eliminating the need to manage and maintain underlying infrastructure, they allow developers to channel their energy into innovation, enabling them to design and implement cutting-edge AI applications with ease. Businesses leveraging serverless computing gain the ability to rapidly adapt to changing demands, reduce operational costs, and accelerate development cycles. This agility fosters the creation of more efficient and powerful AI-driven solutions.
There are only two hard things in Computer Science: cache invalidation and naming things. - Phil Karlton

Caching is an important technique in system design, and it offers several benefits: with caching, you can improve the performance and availability of your system while simultaneously reducing the cost of operating your service. Caching is the Swiss Army knife of system design. In this article, I will primarily talk about availability and resiliency, which indirectly help with the other aspects of system design. For example, a service that serves its user requests mostly through a cache can reduce the number of calls made to the back-end system, one of the main sources of the cost incurred to run the service. Similarly, because the cache can serve customer requests quickly, the service can support a higher rate of requests per second (RPS). Not only that, but if a service can serve requests through cached data, it reduces the pressure on downstream dependencies, preventing them from failing or browning out.

But as the saying goes, there is no free lunch; caches come with downsides as well. The performance of a cache is typically measured in terms of cache hits and misses. Everything works as expected as long as you have a high cache hit ratio; the moment you start to see a relatively high ratio of cache misses, the downsides appear. For example, a service capable of serving 10k transactions per second (TPS) may be built on a back end that can only do 1k TPS. On a good day, there is no problem, and it can withstand a failure rate of up to 10%. But on a bad day, when for some reason more than 10% of requests fail to get data from the cache, those requests end up calling the back-end service and risk browning it out. And if the back end takes time to recover, cache entries that are good right now will also expire and cause further misses, leading to a further increase in traffic to the back-end service. In my last two articles, about the retry dilemma and backpressure, I talked about how a slight change in behavior leads to elongated outages if proper guardrails are not in place. Similarly, caches need guardrails and careful consideration. Before going into the details of what can be done to avoid these situations, let's look at some common cache failure modes.

Failure Modes

Cascading Failure: The Doom Loop

I call this a doom loop because, once it starts, it can cause a complete outage. Let me explain. Typically, a cache comprises multiple cache nodes to distribute the load. If one node fails, the load shifts to the other nodes while a new node comes up and starts to take traffic. Now, imagine a scenario where every node in the cache cluster is running hot and one node fails. The traffic from the failed node spills over to the other nodes in the cluster, causing them to become overloaded and eventually crash. With more nodes out of the cluster, the load shifts to the remaining healthy nodes, potentially leading to a complete outage. These kinds of outages are not unheard of and have occurred in the real world. Even a simple deployment, where one node is removed and then re-added to the cluster, can cause this kind of outage.

Thundering Herd: Generating Excessive Back-End Load

This is another cache failure scenario that is often seen when discussing caches and their failure modes. Consider a popular website where many users access the same information, like a news article. This article is often stored in cache to speed up retrieval times.
Everything works as expected until it doesn't. Let's say a cache entry is serving thousands of people successfully, but it reaches its expiry. Suddenly, all of these users are sent to the database to have their queries served. This overloads the back end and leads to slowdowns, errors, and even a complete crash. This is commonly known as the thundering herd problem. The increased load on the back end can in turn cause cascading failures in other dependent services that rely on everything behaving correctly.

Cache Coherence: Inconsistent Data in Cache

In large-scale systems, caches are often distributed across multiple nodes and levels, such as L1 (in-memory) and L2 (source database), to serve user requests. However, maintaining consistency across these nodes for a given user can be challenging due to varying access patterns and data requirements. Caching inappropriate data, such as frequently changing data, negates the benefits of caching and leads to increased overhead and cache misses. This can create unreliable behavior, where different applications using different nodes process and generate inconsistent results.

Forever Caches: Evicting and Invalidating/Expiring Cache Entries

The story of caching is incomplete without discussing cache eviction and invalidation. Removing entries from the cache is as important as adding them. Common cache eviction techniques include Least Recently Used (LRU), Least Frequently Used (LFU), and FIFO. Cache eviction typically occurs when the cache is at capacity. However, consider a scenario where the cache is full and you remove a frequently used data point. This removal will lead to a cache miss and cause a performance regression through the thundering herd effect. Another possibility is thrashing, where the same data is repeatedly added and removed due to conflicts with the selected policy, degrading cache performance. Invalidation, on the other hand, is necessary when data becomes stale. Common invalidation techniques are time-based or event-driven expiration. A common failure occurs when invalidation is implemented improperly, leading to service outages. For example, if you haven't implemented any cache entry removal (expiration or eviction), your cache will grow in size and eventually crash the cache node. This increases the load not only on the back end but also on the other cache nodes, potentially triggering the doom loop explained above. People often forget this when implementing caches. I have seen production systems operate for years before this behavior was discovered, leading to multi-hour outages. As someone aptly said, "Everything happens for a reason."

While these are certainly not the only cache failure modes, I want to focus on them for now. Let's discuss some practical strategies to resolve these failure modes and make caches useful for high-volume production use cases.

Practical Approaches

Intermediary Caches: L2 Caching Layer

In the case of caches, two is better than one. The idea is to introduce an intermediate layer between the in-memory cache and the back-end database. This middle cache is highly available and decoupled from the back-end database. If a cache miss occurs, the application first checks this middle layer (let's call it L2), and only if the data is not present there does it hit L3 (the back end). This removes the direct coupling between the application and the back-end database and simplifies deployment; when cache node eviction occurs, the load falls back to L2 instead of the back end itself.
Because this layer is separate from the main application, it can have a dedicated setup and more relaxed memory constraints for better performance. However, as you can now see, you have a new caching fleet to maintain. This increases costs, and the system now has a new mode of operation to deal with. All the best practices mentioned in the previous articles should be considered when dealing with failures here.

Request Coalescing

In the previous section, I discussed introducing an L2 caching layer. While helpful, it can still suffer from the thundering herd problem and needs safeguards to prevent overload. The application using the cache can also help mitigate this. Application code can employ a technique called request coalescing, where multiple similar requests are combined to reduce pressure on the back end. With this technique, only one request among the similar ones is actually made, and the data returned is shared with all of the waiting callers. There are multiple ways to implement request coalescing. One popular approach is to introduce a queue where requests are staged, and a single processor thread makes the back-end call and distributes the response to all the waiting requests. Locking is another widely adopted technique; a minimal sketch of the idea follows.
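Here is a minimal, illustrative Java sketch of request coalescing using a map of in-flight futures, so that concurrent callers for the same key share a single back-end call. The class and method names are hypothetical, and a production version would also need timeouts and richer error handling.

Java

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Coalesces concurrent loads for the same key into a single back-end call.
public class CoalescingLoader<K, V> {

    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> backendLoader; // the expensive call we want to make only once

    public CoalescingLoader(Function<K, V> backendLoader) {
        this.backendLoader = backendLoader;
    }

    public CompletableFuture<V> get(K key) {
        CompletableFuture<V> fresh = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, fresh);
        if (existing != null) {
            return existing; // coalesce: another caller already started this load
        }
        // This caller won the race and performs the single back-end call.
        CompletableFuture
                .supplyAsync(() -> backendLoader.apply(key))
                .whenComplete((value, error) -> {
                    if (error != null) {
                        fresh.completeExceptionally(error);
                    } else {
                        fresh.complete(value);
                    }
                    inFlight.remove(key, fresh); // allow later misses to trigger a fresh load
                });
        return fresh;
    }
}

All waiting callers receive the same future, so a burst of identical misses produces exactly one call to the back end instead of a stampede.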
Sensible Caching

The success of a caching solution is measured by its cache hit rate. To increase the chances of success, you can pre-warm the cache with frequently accessed data or data that has a high read-to-write ratio. This reduces the load on the back end and is commonly referred to as a solution to the cold cache problem. Another useful technique is negative caching, where you cache the fact that data is not present in the back-end database and update the application code to handle it. In this case, when a query is made for a particular record that is present in neither the cache nor the back end, it can be served with a "no data" response without querying the back end. Additionally, well-researched caching strategies such as "write-through cache," "write-back cache," and "read-through cache" should be considered based on the application's needs.

Proper Cache Eviction and Invalidation/Expiration

Core to a performant cache is how eviction and record invalidation are handled. Eviction is necessary to maintain the right cache size. The eviction policy decides which items are removed from the cache when it reaches capacity: for example, if temporal locality is important, use LRU; if you have a consistent access pattern, use LFU. Invalidation/expiration determines how long to keep a cache entry in memory. The most common technique for setting expiration is absolute time-based expiration, i.e., setting a time to live (TTL). Setting the appropriate TTL is important to avoid mass eviction or eviction of a hot key, which leads to a stampede effect. You can jitter the TTL to avoid concurrent eviction. Generally, the TTL is set based on the staleness sensitivity of the application relying on it. While preparation is essential, nothing compares to real-world data: start with reasonable settings, then rely on metrics and alarms to measure cache performance and tune it to production needs.

Caches Will Fail: It's a Matter of When, Not If

Instead of only preparing to deal with cache failure after it happens, let's bake failure mitigation into the design itself. You can start with the assumption that the cache will eventually fail and design your system to deal with it. Let's take the thundering herd scenario explained above as an example. To prepare for it, I suggest developers introduce a ~10% (or whatever they see as appropriate) failure rate in cache query requests and observe how their system behaves. It could be a canary that, at regular intervals, causes this to occur and forces the system to exercise this failure mode. This helps highlight both the thundering herd and doom loop failure modes. It is often overlooked, yet it is the most powerful tool in the developer's toolset.

Monitoring caches: analyze cache performance on a regular basis and change strategies to keep the cache tuned to how it is actually used.

Conclusion

Caching can bring much-needed performance and scalability to distributed systems, but it requires critical thinking and strategic planning. Understanding the potential pitfalls of caching will enable a developer to optimize their systems effectively while reducing risks. By implementing strategic caching, strong invalidation and consistency mechanisms, and comprehensive monitoring, developers can create truly efficient and resilient distributed systems.
The LLM can work with the knowledge it has from its training data. To extend that knowledge, retrieval-augmented generation (RAG) can be used: it retrieves relevant information from a vector database and adds it to the prompt context. To provide truly up-to-date information, function calls can be used to request the current information (flight arrival times, for example) from the responsible system. That enables the LLM to answer questions that require current information for an accurate response.

The AIDocumentLibraryChat project has been extended to show how to use the function call API of Spring AI to call the OpenLibrary API. The REST API provides book information for authors, titles, and subjects. The response can be a text answer or an LLM-generated JSON response. For the JSON response, the Structured Output feature of Spring AI is used to map the JSON into Java objects.

Architecture

The request flow looks like this:

- The LLM gets the prompt with the user question.
- The LLM decides, based on the function descriptions, whether to call a function.
- The LLM uses the function call response to generate the answer.
- Spring AI formats the answer as JSON or text according to the request parameter.

Implementation

Backend

To use the function calling feature, the LLM has to support it. The Llama 3.1 model with function calling support is used by the AIDocumentLibraryChat project. The properties file:

Properties files

# function calling
spring.ai.ollama.chat.model=llama3.1:8b
spring.ai.ollama.chat.options.num-ctx=65535

The Ollama model is set, and the context window is set to 64k because large JSON responses need a lot of tokens.

The function is provided to Spring AI in the FunctionConfig class:

Java

@Configuration
public class FunctionConfig {

    private final OpenLibraryClient openLibraryClient;

    public FunctionConfig(OpenLibraryClient openLibraryClient) {
        this.openLibraryClient = openLibraryClient;
    }

    @Bean
    @Description("Search for books by author, title or subject.")
    public Function<OpenLibraryClient.Request, OpenLibraryClient.Response> openLibraryClient() {
        return this.openLibraryClient::apply;
    }
}

First, the OpenLibraryClient gets injected. Then, a Spring bean is defined with the @Bean annotation, and the @Description annotation provides the context information the LLM uses to decide whether the function should be called. Spring AI uses OpenLibraryClient.Request for the call and OpenLibraryClient.Response for the answer of the function. The method name openLibraryClient is used as the function name by Spring AI.
The request/response definition for openLibraryClient() is in the OpenLibraryClient interface:

Java

public interface OpenLibraryClient extends Function<OpenLibraryClient.Request, OpenLibraryClient.Response> {

    @JsonIgnoreProperties(ignoreUnknown = true)
    record Book(
        @JsonProperty(value = "author_name", required = false) List<String> authorName,
        @JsonProperty(value = "language", required = false) List<String> languages,
        @JsonProperty(value = "publish_date", required = false) List<String> publishDates,
        @JsonProperty(value = "publisher", required = false) List<String> publishers,
        String title,
        String type,
        @JsonProperty(value = "subject", required = false) List<String> subjects,
        @JsonProperty(value = "place", required = false) List<String> places,
        @JsonProperty(value = "time", required = false) List<String> times,
        @JsonProperty(value = "person", required = false) List<String> persons,
        @JsonProperty(value = "ratings_average", required = false) Double ratingsAverage) {}

    @JsonInclude(Include.NON_NULL)
    @JsonClassDescription("OpenLibrary API request")
    record Request(
        @JsonProperty(required = false, value = "author") @JsonPropertyDescription("The book author") String author,
        @JsonProperty(required = false, value = "title") @JsonPropertyDescription("The book title") String title,
        @JsonProperty(required = false, value = "subject") @JsonPropertyDescription("The book subject") String subject) {}

    @JsonIgnoreProperties(ignoreUnknown = true)
    record Response(Long numFound, Long start, Boolean numFoundExact, List<Book> docs) {}
}

The @JsonPropertyDescription annotation is used by Spring AI to describe the function parameters for the LLM. The annotation is used on the request record and each of its parameters to enable the LLM to provide the right values for the function call. The response JSON is mapped into the response record by Spring and does not need any description.

The FunctionService processes the user questions and provides the responses:

Java

@Service
public class FunctionService {

    private static final Logger LOGGER = LoggerFactory.getLogger(FunctionService.class);

    private final ChatClient chatClient;

    @JsonPropertyOrder({ "title", "summary" })
    public record JsonBook(String title, String summary) { }

    @JsonPropertyOrder({ "author", "books" })
    public record JsonResult(String author, List<JsonBook> books) { }

    private final String promptStr = """
        Make sure to have a parameter when calling a function.
        If no parameter is provided ask the user for the parameter.
        Create a summary for each book based on the function response subject.
        User Query: %s
        """;

    @Value("${spring.profiles.active:}")
    private String activeProfile;

    public FunctionService(Builder builder) {
        this.chatClient = builder.build();
    }

    public FunctionResult functionCall(String question, ResultFormat resultFormat) {
        if (!this.activeProfile.contains("ollama")) {
            return new FunctionResult(" ", null);
        }
        FunctionResult result = switch (resultFormat) {
            case ResultFormat.Text -> this.functionCallText(question);
            case ResultFormat.Json -> this.functionCallJson(question);
        };
        return result;
    }

    private FunctionResult functionCallText(String question) {
        var result = this.chatClient.prompt()
            .user(this.promptStr + question)
            .functions("openLibraryClient")
            .call().content();
        return new FunctionResult(result, null);
    }

    private FunctionResult functionCallJson(String question) {
        var result = this.chatClient.prompt()
            .user(this.promptStr + question)
            .functions("openLibraryClient")
            .call().entity(new ParameterizedTypeReference<List<JsonResult>>() {});
        return new FunctionResult(null, result);
    }
}

The records for the responses are defined in the FunctionService. Then the prompt string is created, and the active profiles are read into the activeProfile property. The constructor creates the chatClient property with its Builder.

The functionCall(...) method takes the user question and the result format as parameters. It checks for the ollama profile and then selects the method for the result format. The function call methods use the chatClient property to call the LLM with the available functions (multiple are possible). The method name of the bean that provides the function is used as the function name, and multiple names can be comma-separated. The response of the LLM can be obtained either with .content() as an answer string or with .entity(...) as JSON mapped into the provided classes. Then the FunctionResult record is returned.

Conclusion

Spring AI provides an easy-to-use API for function calling that abstracts the hard parts of creating the function call and returning the response as JSON. Multiple functions can be provided to the ChatClient. The descriptions can be provided easily by annotations on the function method and on the request with its parameters. The JSON response can be created with just the .entity(...) method call. That enables the display of the result in a structured component like a tree. Spring AI is a very good framework for working with AI and enables all its users to work with LLMs easily.

Frontend

The frontend supports both a text response and a JSON response. The text response is displayed directly in the frontend. The JSON response enables the display in an Angular Material Tree component.

Response with a tree component:

The component template looks like this:

XML

<mat-tree [dataSource]="dataSource" [treeControl]="treeControl" class="example-tree">
  <mat-tree-node *matTreeNodeDef="let node" matTreeNodeToggle>
    <div class="tree-node">
      <div>
        <span i18n="@@functionSearchTitle">Title</span>: {{ node.value1 }}
      </div>
      <div>
        <span i18n="@@functionSearchSummary">Summary</span>: {{ node.value2 }}
      </div>
    </div>
  </mat-tree-node>
  <mat-nested-tree-node *matTreeNodeDef="let node; when: hasChild">
    <div class="mat-tree-node">
      <button mat-icon-button matTreeNodeToggle>
        <mat-icon class="mat-icon-rtl-mirror">
          {{ treeControl.isExpanded(node) ? "expand_more" : "chevron_right" }}
        </mat-icon>
      </button>
      <span class="book-author" i18n="@@functionSearchAuthor">Author</span>
      <span class="book-author">: {{ node.value1 }}</span>
    </div>
    <div [class.example-tree-invisible]="!treeControl.isExpanded(node)" role="group">
      <ng-container matTreeNodeOutlet></ng-container>
    </div>
  </mat-nested-tree-node>
</mat-tree>

The Angular Material Tree needs the dataSource, hasChild, and treeControl to work. The dataSource contains a tree structure of objects with the values that need to be displayed. The hasChild function checks whether a tree node has children that can be opened. The treeControl controls the opening and closing of the tree nodes.

The <mat-tree-node> contains the tree leaf that displays the title and summary of the book. The <mat-nested-tree-node> is the base tree node that displays the author's name. The treeControl toggles the icon and shows the tree leaf. The tree leaf is shown in the <ng-container matTreeNodeOutlet> component.

The component class looks like this:

TypeScript

export class FunctionSearchComponent {
  ...
  protected treeControl = new NestedTreeControl<TreeNode>((node) => node.children);
  protected dataSource = new MatTreeNestedDataSource<TreeNode>();
  protected responseJson = [{ value1: "", value2: "" } as TreeNode];
  ...
  protected hasChild = (_: number, node: TreeNode) =>
    !!node.children && node.children.length > 0;
  ...
  protected search(): void {
    this.searching = true;
    this.dataSource.data = [];
    const startDate = new Date();
    this.repeatSub?.unsubscribe();
    this.repeatSub = interval(100)
      .pipe(map(() => new Date()), takeUntilDestroyed(this.destroyRef))
      .subscribe((newDate) => (this.msWorking = newDate.getTime() - startDate.getTime()));
    this.functionSearchService
      .postLibraryFunction({
        question: this.searchValueControl.value,
        resultFormat: this.resultFormatControl.value,
      } as FunctionSearch)
      .pipe(
        tap(() => this.repeatSub?.unsubscribe()),
        takeUntilDestroyed(this.destroyRef),
        tap(() => (this.searching = false))
      )
      .subscribe((value) =>
        this.resultFormatControl.value === this.resultFormats[0]
          ? (this.responseText = value.result || "")
          : (this.responseJson = this.addToDataSource(
              this.mapResult(value.jsonResult || ([{ author: "", books: [] }] as JsonResult[]))
            ))
      );
  }
  ...
  private addToDataSource(treeNodes: TreeNode[]): TreeNode[] {
    this.dataSource.data = treeNodes;
    return treeNodes;
  }
  ...
  private mapResult(jsonResults: JsonResult[]): TreeNode[] {
    const createChildren = (books: JsonBook[]) =>
      books.map((value) => ({ value1: value.title, value2: value.summary } as TreeNode));
    const rootNode = jsonResults.map((myValue) => ({
      value1: myValue.author,
      value2: "",
      children: createChildren(myValue.books),
    } as TreeNode));
    return rootNode;
  }
  ...
}

The Angular FunctionSearchComponent defines the treeControl, dataSource, and hasChild for the tree component. The search() method first creates a 100 ms interval to display the time the LLM needs to respond; the interval is stopped when the response has been received. Then the postLibraryFunction(...) function is used to request the response from the backend/AI. The .subscribe(...) function is called when the result is received and maps the result, using the methods addToDataSource(...) and mapResult(...), into the dataSource of the tree component.

Conclusion

The Angular Material Tree component is easy to use for the functionality it provides. The Spring AI Structured Output feature enables the display of the response in the tree component. That makes the AI results much more useful than just text answers.
Bigger results can be displayed in a structured manner that would otherwise be a lengthy text.

A Hint at the End

The Angular Material Tree component creates all leaves at creation time. With a large tree whose leaves contain costly components, such as Angular Material tables, the tree can take seconds to render. To avoid this, treeControl.isExpanded(node) can be used with @if to render the tree leaf content only when it is expanded. The tree then renders fast, and the tree leaves render fast, too.
LLMs need to connect to the real world. LangChain4j tools, combined with Apache Camel, make this easy. Camel provides robust integration, connecting your LLM to any service or API. This lets your AI interact with databases, queues, and more, creating truly powerful applications. We'll explore this powerful combination and its potential.

Setting Up the Development Environment

- Ollama: Provides a way to run large language models (LLMs) locally. You can run many models, such as Llama 3, Mistral, CodeLlama, and others, on your machine, with full CPU and GPU support.
- Visual Studio Code: With the Kaoto, Java, and Quarkus plugins installed.
- OpenJDK 21
- Maven
- Quarkus 3.17
- Quarkus Dev Services: A feature of Quarkus that simplifies the development and testing of applications that rely on external services such as databases, messaging systems, and other resources.

You can download the complete code at the following GitHub repo. The following instructions will be executed in Visual Studio Code:

1. Creating the Quarkus Project

Shell

mvn io.quarkus:quarkus-maven-plugin:3.17.6:create \
    -DprojectGroupId=dev.mikeintoch \
    -DprojectArtifactId=camel-agent-tools \
    -Dextensions="camel-quarkus-core,camel-quarkus-langchain4j-chat,camel-quarkus-langchain4j-tools,camel-quarkus-platform-http,camel-quarkus-yaml-dsl"

2. Adding langchain4j Quarkus Extensions

Shell

./mvnw quarkus:add-extension -Dextensions="io.quarkiverse.langchain4j:quarkus-langchain4j-core:0.22.0"
./mvnw quarkus:add-extension -Dextensions="io.quarkiverse.langchain4j:quarkus-langchain4j-ollama:0.22.0"

3. Configure Ollama to Run the LLM

Open the application.properties file and add the following lines:

Properties files

#Configure Ollama local model
quarkus.langchain4j.ollama.chat-model.model-id=qwen2.5:0.5b
quarkus.langchain4j.ollama.chat-model.temperature=0.0
quarkus.langchain4j.ollama.log-requests=true
quarkus.langchain4j.log-responses=true
quarkus.langchain4j.ollama.timeout=180s

Quarkus uses Ollama to run the LLM locally and also auto-wires the configuration for use in the Apache Camel components in the following steps.

4. Creating the Apache Camel Route Using Kaoto

Create a new folder named routes in the src/main/resources folder. Create a new file in the src/main/resources/routes folder, name it route-main.camel.yaml, and Visual Studio Code opens the Kaoto visual editor.

- Click on the +New button, and a new route will be created.
- Click on the circular arrows to replace the timer component.
- Search for and select the platform-http component from the catalog.
- Configure the required platform-http properties: set Path with the value /camel/chat. By default, platform-http will serve on port 8080.
- Click on the Add Step icon in the arrow after the platform-http component.
- Search for and select the langchain4j-tools component in the catalog.
- Configure the required langchain4j-tools properties:
  - Set Tool Id with the value my-tools.
  - Set Tags with store (defining tags groups the tools to use with the LLM).
- You must process the user input message into a form the langchain4j-tools component can use, so click on the Add Step icon in the arrow after the platform-http component.
- Search for and select the Process component in the catalog.
- Configure the required properties: set Ref with the value createChatMessage. The Process component will use the createChatMessage method you will create in the following step.

5. Create a Process to Send User Input to the LLM

Create a new Java class in the src/main/java folder named Bindings.java.
Java

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.HashMap;

import org.apache.camel.BindToRegistry;
import org.apache.camel.Exchange;
import org.apache.camel.Processor;
import org.apache.camel.builder.RouteBuilder;

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.UserMessage;

public class Bindings extends RouteBuilder {

    @Override
    public void configure() throws Exception {
        // Routes are loaded from YAML files.
    }

    @BindToRegistry(lazy = true)
    public static Processor createChatMessage() {
        return new Processor() {
            public void process(Exchange exchange) throws Exception {
                String payload = exchange.getMessage().getBody(String.class);
                List<ChatMessage> messages = new ArrayList<>();
                String systemMessage = """
                    You are an intelligent store assistant. Users will ask you questions about store product.
                    Your task is to provide accurate and concise answers.

                    In the store have shirts, dresses, pants, shoes with no specific category

                    %s

                    If you are unable to access the tools to answer the user's query,
                    Tell the user that the requested information is not available at this time and that they can try again later.
                    """;
                String tools = """
                    You have access to a collection of tools
                    You can use multiple tools at the same time
                    Complete your answer using data obtained from the tools
                    """;
                messages.add(new SystemMessage(systemMessage.formatted(tools)));
                messages.add(new UserMessage(payload));
                exchange.getIn().setBody(messages);
            }
        };
    }
}

This class creates a Camel Processor that transforms the user input into an object the langchain4j component in the route can handle. It also gives the LLM context for using tools and explains the agent's task.

6. Creating Apache Camel Tools for Use With the LLM

Create a new file in the src/main/resources/routes folder, name it route-tool-products.camel.yaml, and open it with the Kaoto visual editor in Visual Studio Code.

- Click on the +New button, and a new route will be created.
- Click on the circular arrows to replace the timer component.
- Search for and select the langchain4j-tools component in the catalog.
- Configure langchain4j-tools: click on the All tab and search the Endpoint properties.
  - Set Tool Id with the value productsbycategoryandcolor.
  - Set Tags with store (the same as in the main route).
  - Set Description with the value Query database products by category and color (a brief description of the tool).
- Add the parameters that will be used by the tool:
  - NAME: category, VALUE: string
  - NAME: color, VALUE: string

These parameters will be assigned by the LLM for use in the tool and are passed via headers.

Add the SQL component to query the database: click on Add Step after the langchain4j-tools component, then search for and select the SQL component. Configure the required SQL properties, setting Query to the following value.

SQL

Select name, description, category, size, color, price, stock
from products
where Lower(category) = Lower(:#category) and Lower(color) = Lower(:#color)

To handle the parameters used in the query, add a Convert Header component to convert the parameters to the correct object type. Click on the Add Step button after langchain4j-tools, then search for and select the Convert Header To transformation in the catalog.
Configure the required properties for the component:
Name with the value category
Type with the value String
Repeat the steps with the following values:
Name with the value color
Type with the value String
As a result, this is how the route looks.
Finally, you need to transform the query result into an object that the LLM can handle; in this example, you transform it into JSON. Click the Add Step button after the SQL component and add the Marshal component. Configure the data format properties for the Marshal component and select JSON from the list.

7. Configure Quarkus Dev Services for PostgreSQL
Add the Quarkus extension that provides PostgreSQL for development purposes by running the following command in a terminal.
Shell
./mvnw quarkus:add-extension -Dextensions="io.quarkus:quarkus-jdbc-postgresql"

Open application.properties and add the following lines:
Properties files
#Configuring devservices for Postgresql
quarkus.datasource.db-kind=postgresql
quarkus.datasource.devservices.port=5432
quarkus.datasource.devservices.init-script-path=db/schema-init.sql
quarkus.datasource.devservices.db-name=store

Finally, create the SQL script that loads the database. Create a folder named db in src/main/resources and, in this folder, create a file named schema-init.sql with the following content.
SQL
DROP TABLE IF EXISTS products;

CREATE TABLE IF NOT EXISTS products (
    id SERIAL NOT NULL,
    name VARCHAR(100) NOT NULL,
    description varchar(150),
    category VARCHAR(50),
    size VARCHAR(20),
    color VARCHAR(20),
    price DECIMAL(10,2) NOT NULL,
    stock INT NOT NULL,
    CONSTRAINT products_pk PRIMARY KEY (id)
);

INSERT INTO products (name, description, category, size, color, price, stock) VALUES
('Blue shirt', 'Cotton shirt, short-sleeved', 'Shirts', 'M', 'Blue', 29.99, 10),
('Black pants', 'Jeans, high waisted', 'Pants', '32', 'Black', 49.99, 5),
('White Sneakers', 'Sneakers', 'Shoes', '40', 'White', 69.99, 8),
('Floral Dress', 'Summer dress, floral print, thin straps.', 'Dress', 'M', 'Pink', 39.99, 12),
('Skinny Jeans', 'Dark denim jeans, high waist, skinny fit.', 'Pants', '28', 'Blue', 44.99, 18),
('White Sneakers', 'Casual sneakers, rubber sole, minimalist design.', 'Shoes', '40', 'White', 59.99, 10),
('Beige Chinos', 'Casual dress pants, straight cut, elastic waist.', 'Pants', '32', 'Beige', 39.99, 15),
('White Dress Shirt', 'Cotton shirt, long sleeves, classic collar.', 'Shirts', 'M', 'White', 29.99, 20),
('Brown Hiking Boots', 'Waterproof boots, rubber sole, perfect for hiking.', 'Shoes', '42', 'Brown', 89.99, 7),
('Distressed Jeans', 'Distressed denim jeans, mid-rise, regular fit.', 'Pants', '30', 'Blue', 49.99, 12);

8. Include Our Routes to Be Loaded by the Quarkus Project
Camel Quarkus supports several domain-specific languages (DSLs) for defining Camel routes. It is also possible to include YAML DSL routes by adding the following line to the application.properties file.
Properties files
# routes to load
camel.main.routes-include-pattern = routes/*.yaml

This will load all routes in the src/main/resources/routes folder.

9. Test the App
Run the application using Maven: open a terminal in Visual Studio Code and run the following command.
Shell
mvn quarkus:dev

Once it has started, Quarkus calls Ollama to run your LLM locally. Open a terminal and verify this with the following command.
Shell
ollama ps
NAME            ID              SIZE      PROCESSOR    UNTIL
qwen2.5:0.5b    a8b0c5157701    1.4 GB    100% GPU     4 minutes from now

Also, Quarkus creates a container running PostgreSQL and creates the database and schema. You can connect using the psql command.
Shell
psql -h localhost -p 5432 -U quarkus -d store

And query the products table:
Shell
store=# select * from products;
 id |        name        |                    description                     | category | size | color | price | stock
----+--------------------+----------------------------------------------------+----------+------+-------+-------+-------
  1 | Blue shirt         | Cotton shirt, short-sleeved                        | Shirts   | M    | Blue  | 29.99 |    10
  2 | Black pants        | Jeans, high waisted                                | Pants    | 32   | Black | 49.99 |     5
  3 | White Sneakers     | Sneakers                                           | Shoes    | 40   | White | 69.99 |     8
  4 | Floral Dress       | Summer dress, floral print, thin straps.           | Dress    | M    | Pink  | 39.99 |    12
  5 | Skinny Jeans       | Dark denim jeans, high waist, skinny fit.          | Pants    | 28   | Blue  | 44.99 |    18
  6 | White Sneakers     | Casual sneakers, rubber sole, minimalist design.   | Shoes    | 40   | White | 59.99 |    10
  7 | Beige Chinos       | Casual dress pants, straight cut, elastic waist.   | Pants    | 32   | Beige | 39.99 |    15
  8 | White Dress Shirt  | Cotton shirt, long sleeves, classic collar.        | Shirts   | M    | White | 29.99 |    20
  9 | Brown Hiking Boots | Waterproof boots, rubber sole, perfect for hiking. | Shoes    | 42   | Brown | 89.99 |     7
 10 | Distressed Jeans   | Distressed denim jeans, mid-rise, regular fit.     | Pants    | 30   | Blue  | 49.99 |    12
(10 rows)

To test the app, send a POST request to localhost:8080/camel/chat with a plain-text body asking about some product (a minimal Java client for this is sketched after the conclusion). Note that the LLM may occasionally hallucinate; if that happens, try again, modifying your request slightly.
You can see how the LLM uses the tool and gets information from the database based on the natural-language request provided. The LLM identifies the parameters and sends them to the tool. If you look at the request log, you can find the tools and parameters the LLM uses to create the answer.

Conclusion
You've explored how to leverage the power of LLMs within your integration flows using Apache Camel and the LangChain4j component. We've seen how this combination allows you to seamlessly integrate powerful language models into your existing Camel routes, enabling you to build sophisticated applications that can understand, generate, and interact with human language.
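If you would rather exercise the /camel/chat endpoint from code instead of curl or an HTTP client GUI, a minimal Java 11+ java.net.http client might look like the following sketch. The class name and the question text are just examples; only the URL and the plain-text body follow the tutorial.
Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChatClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Plain-text question for the store assistant exposed at /camel/chat.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/camel/chat"))
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString("Do you have blue pants in stock?"))
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // The assistant's answer, built from the SQL tool results
    }
}

Running it while mvn quarkus:dev is active should print the HTTP status and the answer the LLM assembled from the tool's database results.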
In the software development lifecycle (SDLC), testing is one of the important stages where we ensure that the application works as expected and meets end-user requirements. Among the various techniques that we use for testing, mocking plays a crucial role in testing different components of a system, especially when the external services that the application depends on are not yet ready or deployed. With that being said, let's try to understand what mocking is and how it helps in integration testing and end-to-end (E2E) testing.

What Is Mocking?
Mocking is the process of simulating the behavior of real objects or services that an application interacts with. In other words, when you mock something, you are creating a fake version of the real-world entity that behaves like the real thing but in a controlled way. For example, imagine you are building an e-commerce application. The application might depend on a payment gateway to process payments. However, during testing, it might not be feasible to use the actual payment gateway service due to various factors like cost, service unavailability, not being able to control the response, etc. Here comes the concept of mocking, which we can use to test our application in a controllable way. Mocks can be used to replace dependencies (APIs, databases, etc.) and test our application in isolation.

The Importance of Mocking
Faster tests: Most of the time, when we interact with external services, tests tend to be either flaky or long-running because the external service is unavailable or takes a long time to respond. Mocks, however, are usually fast and reliable, which helps tests execute faster.
Ability to test edge cases: When we use mocks, we have complete control over the response that a service can return. This is helpful when we want to test edge cases like exception scenarios, timeouts, errors, etc. (a short WireMock sketch of this appears at the end of this article).
Isolation: With mocking, we can test specific functionality in an isolated way. For instance, if the application relies on a database, we can mock the database response in case we have challenges in setting up specific test data.
Eliminate dependencies: If the application depends on a lot of external services that can make our tests unreliable and flaky, we can use mocks to help make our tests reliable.

How to Mock an API?
Now, let's look at an example of how to mock an API call. For illustration purposes, we will use Java, Maven, JUnit 4, and WireMock.

1. Add WireMock as a dependency to your project:
XML
<dependency>
    <groupId>org.wiremock</groupId>
    <artifactId>wiremock</artifactId>
    <version>3.10.0</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.assertj</groupId>
    <artifactId>assertj-core</artifactId>
    <version>3.26.3</version>
    <scope>test</scope>
</dependency>

2. Add the WireMock static import (plus the JUnit and AssertJ imports used below):
Java
import static com.github.tomakehurst.wiremock.client.WireMock.*;
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.Rule;
import org.junit.Test;

import com.github.tomakehurst.wiremock.junit.WireMockRule;

3. Set up WireMock:
Java
@Rule
public WireMockRule wireMockRule = new WireMockRule(8089); // Runs on port 8089; the no-args constructor defaults to 8080

4. Mock an API response:
Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
...
@Test
public void exampleTest() throws Exception {
    // Set up the WireMock mapping stub for the test
    stubFor(post("/my/resource")
        .withHeader("Content-Type", containing("xml"))
        .willReturn(ok()
            .withHeader("Content-Type", "text/xml")
            .withBody("<response>SUCCESS</response>")));

    // Set up the HTTP POST request (with the HTTP Client embedded in Java 11+)
    final HttpClient client = HttpClient.newBuilder().build();
    final HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(wireMockRule.url("/my/resource")))
        .header("Content-Type", "text/xml")
        .POST(HttpRequest.BodyPublishers.ofString("<request/>"))
        .build();

    // Send the request and receive the response
    final HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

    // Verify the response (with AssertJ)
    assertThat(response.statusCode()).as("Wrong response status code").isEqualTo(200);
    assertThat(response.body()).as("Wrong response body").contains("<response>SUCCESS</response>");
}

Best Practices
Use mocks only when required: Mocks help isolate external services and test the application in a controlled way. However, overusing mocks can let bugs slip into production if the application is not also tested against real services in staging environments.
Mock external services only: Only external services should be mocked, not the business logic.
Always update mocks with the latest service contracts: Whenever there is a change in the real service contract or response, make sure the mock is also updated accordingly. Otherwise, we might be testing inaccurately.

Conclusion
Mocking comes in very handy for integration and end-to-end testing. Especially under tight deadlines, when changes to external services are not yet ready for testing in staging environments, mocking helps you start testing early and discover potential bugs in the application. However, we always need to ensure that the application is tested with the real services before deploying to production.
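To make the edge-case point from earlier concrete, here is a small, hypothetical sketch that stubs a failing and a slow "payment gateway" so a client's error handling and timeouts can be exercised. The endpoint paths, port, and class name are illustrative; the DSL calls (serverError, withFixedDelay) are standard WireMock.
Java
import static com.github.tomakehurst.wiremock.client.WireMock.*;

import org.junit.Rule;
import org.junit.Test;

import com.github.tomakehurst.wiremock.junit.WireMockRule;

public class PaymentGatewayEdgeCaseTest {

    @Rule
    public WireMockRule wireMockRule = new WireMockRule(8089);

    @Test
    public void simulatesFailingAndSlowGateway() {
        // The status endpoint returns HTTP 500, so the client's error handling can be exercised.
        stubFor(get(urlEqualTo("/payments/status"))
            .willReturn(serverError()));

        // The charge endpoint answers only after 5 seconds, to exercise client-side timeouts.
        stubFor(post(urlEqualTo("/payments/charge"))
            .willReturn(ok()
                .withHeader("Content-Type", "application/json")
                .withBody("{\"status\":\"ACCEPTED\"}")
                .withFixedDelay(5000)));

        // Point your payment client at http://localhost:8089 here and assert that it
        // degrades gracefully (retries, timeout handling, fallback behavior, etc.).
    }
}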
Cardinality is the number of distinct items in a dataset. Whether it's counting the number of unique users on a website or estimating the number of distinct search queries, estimating cardinality becomes challenging when dealing with massive datasets. That's where the HyperLogLog algorithm comes into the picture. In this article, we will explore the key concepts behind HyperLogLog and its applications.

HyperLogLog
HyperLogLog is a probabilistic algorithm designed to estimate the cardinality of a dataset with both high accuracy and low memory usage. Traditional methods for counting distinct items require storing all the items seen so far (e.g., storing all the user information in the user dataset), which can quickly consume a significant amount of memory. HyperLogLog, on the other hand, uses a fixed amount of memory, just a few kilobytes, and still provides accurate estimates of cardinality, making it ideal for large-scale data analysis.

Use Cases
HyperLogLog is particularly useful in the following scenarios:
Limited memory: If working with massive datasets, such as logs from millions of users or network traffic data, storing every unique item might not be feasible due to memory constraints.
Approximate count: In many cases, an exact count isn't necessary, and a good estimate is sufficient. HyperLogLog gives an estimate that is close enough to the true value without the overhead of precise computation.
Streaming data: When working with continuous streams of data, such as live website traffic or social media feeds, HyperLogLog can update its estimate without needing to revisit past data.

Some notable application use cases include the following:
Web analytics: Estimating the number of unique users visiting a website.
Social media analysis: Counting unique hashtags, mentions, or other distinct items in social media streams.
Database systems: Efficiently counting distinct keys or values in large databases.
Big data systems: Frameworks like Apache Hadoop and Apache Spark use HyperLogLog to count distinct items in big data pipelines.
Network monitoring: Estimating the number of distinct IP addresses or packets in network traffic analysis.

Existing Implementations
HyperLogLog has been implemented in various languages and data processing frameworks. Some popular tools that implement HyperLogLog are the following:
Redis provides a native implementation of HyperLogLog for approximate cardinality estimation via the PFADD, PFCOUNT, and PFMERGE commands. Redis allows users to efficiently track unique items in a dataset while consuming minimal memory.
Google BigQuery provides a built-in function called APPROX_COUNT_DISTINCT that uses HyperLogLog to estimate the count of distinct items in a large dataset. BigQuery optimizes query processing by using HyperLogLog to offer highly efficient cardinality estimation without requiring the full storage of data.
Apache DataSketches is a collection of algorithms for approximate computations, including HyperLogLog. It is implemented in Java and is often used in distributed computing environments for large-scale data processing.
The Python package hyperloglog is an implementation of HyperLogLog that allows you to compute the approximate cardinality of a dataset with a small memory footprint.
The function approx_count_distinct is available in PySpark's DataFrame API and is used to calculate an approximate count of distinct values in a column of a DataFrame. It is based on the HyperLogLog algorithm, providing a highly memory-efficient way of estimating distinct counts.
Example Usage
Python
from pyspark.sql import SparkSession
from pyspark.sql import functions

spark = SparkSession.builder.appName('Test').getOrCreate()
df = spark.createDataFrame([("user1", 1), ("user2", 2), ("user3", 3)])
distinct_count_estimate = df.agg(functions.approx_count_distinct("_1").alias("distinct_count")).collect()
print(distinct_count_estimate)

Logic
The basic idea behind HyperLogLog is to use hash functions to map each item in the dataset to a position in a range of values. By analyzing the positions of these items, the algorithm can estimate how many distinct items exist without storing them explicitly. Here's a step-by-step breakdown of how it works (a toy sketch of this register logic appears after the references at the end of this article):
Each item in the set is hashed using a hash function. The output of the hash function is a binary string.
HyperLogLog focuses on the leading zeros in the binary representation of the hash value. The more leading zeros, the rarer the value. Specifically, the position of the first 1 bit in the hash is tracked, which gives an idea of how large the number of distinct items could be.
HyperLogLog divides the range of possible hash values into multiple buckets or registers. Each register tracks the largest number of leading zeros observed for any item hashed to that register.
After processing all items, HyperLogLog combines the information from all registers to compute an estimate of the cardinality. The more registers and the higher the number of leading zeros observed, the more accurate the estimate.

HyperLogLog provides an estimate with an error margin. The error rate depends on the number of registers used in the algorithm. The more registers in use, the smaller the error margin, but also the higher the memory usage. The accuracy can be fine-tuned based on the needs of the application.

Advantages
Here are some of the key advantages of using HyperLogLog.
Space complexity: Unlike traditional methods, which require storing each unique item, HyperLogLog uses a small, fixed set of registers; memory grows with the desired accuracy rather than with the number of distinct items. This makes it ideal for large-scale datasets.
Time complexity: HyperLogLog is highly efficient in terms of processing speed. It requires constant time for each item processed, making it suitable for real-time or streaming applications.
Scalability: HyperLogLog scales well with large datasets and is often used in distributed systems or data processing frameworks where handling massive volumes of data is a requirement.
Simplicity: The algorithm is relatively simple to implement and does not require complex data structures or operations.

Other Approaches
There are several other approaches for cardinality estimation, such as Count-Min Sketch and Bloom filters. While each of these methods has its strengths, HyperLogLog stands out in terms of its balance between accuracy and space complexity.
Bloom filters: Bloom filters are great for checking whether an item exists, but they do not provide an estimate of cardinality. HyperLogLog, on the other hand, can estimate cardinality without needing to store all items.
Count-Min Sketch: This is a probabilistic data structure used for frequency estimation, but it requires more memory than HyperLogLog for the same level of accuracy in cardinality estimation.

Conclusion
HyperLogLog is an incredibly efficient and accurate algorithm for estimating cardinality in large datasets.
By utilizing probabilistic techniques and hash functions, it handles big data with minimal memory usage, making it an essential tool for applications in data analytics, distributed systems, and streaming data.

References
https://cloud.google.com/bigquery/docs/reference/standard-sql/hll_functions
https://redis.io/docs/latest/develop/data-types/probabilistic/hyperloglogs/
https://datasketches.apache.org/docs/HLL/HllMap.html
https://pypi.org/project/hyperloglog/
https://docs.databricks.com/en/sql/language-manual/functions/approx_count_distinct.html
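To make the register logic described above concrete, here is a deliberately simplified toy sketch in Java. It is illustrative only: it derives a 64-bit hash from MD5, uses the common bias constant, and omits the small- and large-range corrections that real implementations (such as the libraries referenced above) apply.
Java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Toy HyperLogLog: illustrates registers and leading-zero ranks, not production quality. */
public class ToyHyperLogLog {

    private final int p;           // number of index bits; 2^p registers
    private final int[] registers; // each register keeps the maximum rank seen

    public ToyHyperLogLog(int p) {
        this.p = p;
        this.registers = new int[1 << p];
    }

    public void add(String item) {
        long hash = hash64(item);
        int index = (int) (hash >>> (64 - p));               // first p bits choose the register
        long remaining = hash << p;                          // the rest is used for the rank
        int rank = Long.numberOfLeadingZeros(remaining) + 1; // position of the first 1-bit
        registers[index] = Math.max(registers[index], rank);
    }

    public double estimate() {
        int m = registers.length;
        double harmonicSum = 0.0;
        for (int r : registers) {
            harmonicSum += Math.pow(2.0, -r);
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);           // bias constant for m >= 128
        return alpha * m * m / harmonicSum;                  // raw estimate, no range corrections
    }

    // 64-bit hash derived from MD5; any well-mixed 64-bit hash would do.
    private static long hash64(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0L;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFFL);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ToyHyperLogLog hll = new ToyHyperLogLog(12);         // 4,096 registers, a few KB of memory
        for (int i = 0; i < 100_000; i++) {
            hll.add("user-" + i);
        }
        System.out.printf("Estimated distinct count: %.0f (true value: 100000)%n", hll.estimate());
    }
}

Running this should produce an estimate within a few percent of the true count while storing only 4,096 small integers, which is the whole point of the algorithm.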
Cold emailing remains one of the most effective ways to reach potential employers or clients, but crafting personalized, compelling messages at scale can be challenging. CrewAI is a framework for creating AI agent teams to automate and enhance cold email outreach. In this tutorial, we'll build a sophisticated cold email system using CrewAI that researches companies, generates personalized templates, and provides strategic insights.

The Challenge With Traditional Cold Emailing
Traditional cold emailing faces several challenges:
Time-consuming research for each company
Difficulty maintaining personalization at scale
Inconsistent messaging and value propositions
Limited ability to analyze and improve performance
Our CrewAI-powered system addresses these challenges by creating a crew of specialized AI agents who work together to craft effective cold emails.

Setting Up the Project
First, let's set up our project structure:

cold_emailer/
├── config/
│   ├── agents.yaml
│   └── tasks.yaml
├── cold_emailer_agent/
│   ├── __init__.py
│   └── crew.py
└── main.py

Install the required dependencies:

pip install crewai crewai-tools

Defining Our AI Agents
Our system uses three specialized agents, each with a specific role:
Email researcher: Investigates companies and identifies personalization opportunities
Email strategist: Crafts compelling email templates based on research
Outreach analyst: Analyzes templates and suggests improvements

Here's how we configure our agents in agents.yaml:
YAML
email_researcher:
  role: >
    Cold Email Research Specialist for {industry}
  goal: >
    Research companies and identify personalized connection points for cold emails
  backstory: >
    You're an expert at finding meaningful insights about companies and their pain points.
    You combine public information, technical analysis, and industry trends to identify
    compelling conversation starters for cold emails.

# ... [similar configurations for email_strategist and outreach_analyst]

Creating Tasks for Our Agents
Each agent needs specific tasks to complete. We define these in tasks.yaml:
YAML
research_task:
  description: >
    Research {company_name} to identify:
    1. Recent company news, tech stack changes, or public challenges
    2. Specific technical improvement opportunities
    3. Relevant projects or innovations they might be interested in
    4. Key decision makers and their priorities
  expected_output: >
    A detailed research report with specific insights that can be used to personalize cold emails
  agent: email_researcher

# ... [similar configurations for strategy_task and analysis_task]

Implementing the CrewAI System
The heart of our system is the ColdEmailCrew class, which orchestrates our agents and tasks:
Python
@CrewBase
class ColdEmailCrew:
    """Crew for generating personalized cold emails"""

    agents_config = 'config/agents.yaml'
    tasks_config = 'config/tasks.yaml'

    @agent
    def email_researcher(self) -> Agent:
        """Create the research specialist agent"""
        return Agent(
            config=self.agents_config['email_researcher'],
            verbose=True,
            tools=[SerperDevTool(), SeleniumScrapingTool()]
        )

    # ...
[similar methods for email_strategist and outreach_analyst]

    @crew
    def crew(self) -> Crew:
        """Creates the ColdEmailCrew"""
        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,
            verbose=True
        )

Running the System
To use our cold email system:
Python
from cold_emailer_agent.crew import ColdEmailCrew


def run():
    """Run the crew with example inputs"""
    inputs = {
        "industry": "tech",
        "company_name": "Google"
    }

    # Create and run the crew
    ColdEmailCrew().crew().kickoff(inputs=inputs)


if __name__ == "__main__":
    run()

Example Output
When we run our system targeting Google in the tech industry, it generates:
Research insights about Google's tech stack and infrastructure
A personalized email template with multiple subject line variations
Detailed analysis with A/B testing suggestions

The email template includes personalization opportunities:

Subject Line: Improving Google's Tech Stack: Insights from Industry Experts

Hi [Recipient],

I came across your work on improving Google's tech stack, and I wanted to share some insights that might be relevant to your team. As we've analyzed Google's infrastructure, we noticed that they're using a combination of open-source technologies like Kubernetes, TensorFlow, and Apache Beam. While this is impressive, there are potential areas for improvement to enhance scalability and efficiency.

[Rest of template...]

Analysis and Improvements
The system also provides a detailed analysis of the generated template:
Personalization effectiveness score: 7/10
Value proposition clarity: 8/10
Specific improvement recommendations
A/B testing scenarios for optimization

Future Enhancements
Potential improvements to the system could include:
Integration with email delivery systems
Advanced analytics tracking
Machine learning for response prediction
Dynamic template adjustment based on feedback

Conclusion
Combining specialized AI agents for research, strategy, and analysis can create more effective, personalized cold emails at scale. The system demonstrates how AI can augment human capabilities in business communication while maintaining authenticity and relevance. Try implementing this system in your outreach efforts and see how it can transform your cold email process. Test and refine the output to match your specific needs and voice.
OpenAI's latest announcement about its reasoning models has really made me pause and think about where AI is headed. Over the years, I have seen GPT models evolve from something experimental to tools we now rely on daily for everything from content creation to customer support. But as impressive as GPT is, we have all noticed its shortcomings, especially when it's tasked with solving complex problems or making logical connections. That's why the idea of reasoning models feels like such a big step forward. It's not just an upgrade; it's a shift in what AI is capable of. So, what are reasoning models really about? And how will they change the AI landscape we have gotten so used to?

From Generating Words to Solving Problems
If GPT has been our creative partner, reasoning models feel like the analytical minds we have been waiting for. GPT is great at generating text that flows beautifully, but let's be honest, it can struggle when the task demands deeper thinking. I have seen it firsthand, and I'm sure many of you have too: asking GPT to solve a multi-step problem or make sense of something nuanced can be frustrating. It tries, but it doesn't always deliver. Reasoning models, on the other hand, seem to be stepping into that gap. Instead of focusing solely on writing content that sounds good, they are designed to think through information logically. Picture an AI that doesn't just draft an email but helps you troubleshoot a complex issue in your code or untangle a tricky ethical dilemma. It's like moving from a gifted writer to a sharp, analytical problem-solver.

Why It Feels Like a Big Deal
This isn't just about making AI smarter for the sake of it. It's about unlocking capabilities that could genuinely transform how we use technology.
Tackling real challenges: Imagine an AI that doesn't just identify issues but actually walks you through the solution, whether it's debugging software, analyzing legal documents, or providing logical advice in medical cases. It's like having a consultant on demand.
Keeping context intact: One thing I've always found frustrating about GPT is how it loses the thread in longer conversations. You ask it something detailed, and a few exchanges later, it feels like it's forgotten half of what you discussed. Reasoning models might finally fix this, offering more consistent and meaningful interactions.
A glimpse of AGI: There's a lot of talk about Artificial General Intelligence, AI that thinks and reasons like humans. Reasoning models might not get us there overnight, but they feel like a solid step in that direction.

Is GPT Going Away?
Now, this is something I have been thinking about a lot: Does this mean GPT models will become obsolete? Honestly, I don't think so. I see reasoning models as a complement to GPT, not a replacement. The way I imagine it, future AI systems will combine the best of both worlds. You will have GPT for creative, flowing tasks like brainstorming or writing, and reasoning models for when you need logic and precision. It's like having a creative writer and an analyst on the same team. Together, they could redefine what AI can do.

Challenges Ahead
As exciting as this is, there are many challenges. Developing reasoning models is no small feat. They'll require even more computational power to train and deploy, and that comes with significant costs. Then there's the issue of bias. Just because an AI can reason logically doesn't mean it's immune to the biases in its training data.
And we can't ignore the potential for misuse: AI that reasons well could be used to create more convincing disinformation or even to automate malicious decision-making. These are things we'll need to navigate carefully as this technology evolves.

Why It Matters to Me and Maybe to You
OpenAI's new reasoning models aim to go beyond GPT's creativity by focusing on logical problem-solving and context retention. These models could transform AI into true collaborators, solving complex challenges in fields like coding, medicine, and law. While GPT will remain relevant for creative tasks, reasoning models mark a step toward AI that thinks and reasons like humans. However, challenges like bias, computational demands, and ethical concerns must be addressed as we embrace this new frontier. For me, the shift from generative models like GPT to reasoning models isn't just a technical upgrade. It's a reimagining of what AI can be in our lives. It's no longer just about automating tasks or drafting text; it's about building systems that help us think, solve, and make better decisions. Imagine a world where AI doesn't just assist but genuinely collaborates with you, solving your hardest problems and making life just a little bit easier. That's the potential I see in reasoning models, and it's why I can't wait to see how this unfolds.