Welcome to the Data Engineering category of DZone, where you will find all the information you need for AI/ML, big data, data, databases, and IoT. As you determine the first steps for new systems or reevaluate existing ones, you're going to require tools and resources to gather, store, and analyze data. The Zones within our Data Engineering category contain resources that will help you expertly navigate through the SDLC Analysis stage.
Artificial intelligence (AI) and machine learning (ML) are two fields that work together to create computer systems capable of perception, recognition, decision-making, and translation. Separately, AI is the ability of a computer system to mimic human intelligence through math and logic, and ML builds on AI by developing methods that "learn" through experience and do not require explicit instructions. In the AI/ML Zone, you'll find resources ranging from tutorials to use cases that will help you navigate this rapidly growing field.
Big data comprises datasets that are massive, varied, complex, and can't be handled traditionally. Big data can include both structured and unstructured data, and it is often stored in data lakes or data warehouses. As organizations grow, big data becomes increasingly more crucial for gathering business insights and analytics. The Big Data Zone contains the resources you need for understanding data storage, data modeling, ELT, ETL, and more.
Data is at the core of software development. Think of it as information stored in anything from text documents and images to entire software programs, and these bits of information need to be processed, read, analyzed, stored, and transported throughout systems. In this Zone, you'll find resources covering the tools and strategies you need to handle data properly.
A database is a collection of structured data that is stored in a computer system, and it can be hosted on-premises or in the cloud. As databases are designed to enable easy access to data, our resources are compiled here for smooth browsing of everything you need to know from database management systems to database languages.
IoT, or the Internet of Things, is a technological field that makes it possible for users to connect devices and systems and exchange data over the internet. Through DZone's IoT resources, you'll learn about smart devices, sensors, networks, edge computing, and many other technologies — including those that are now part of the average person's daily life.
Observability and DevTool Platforms for AI Agents
Agentic Workflows for Unlocking User Engagement Insights
Video deduplication is a crucial process for managing large-scale video inventory, where duplicates consume storage, increase processing costs, and degrade data quality. This article explores a robust architecture for deduplication using video segmentation, frame embedding extraction, and clustering techniques. It also highlights key methodologies like video hashing, CLIP embeddings, and temporal alignment for effective deduplication. Challenges in Video Deduplication Scale Video datasets are orders of magnitude larger than image datasets, with each video containing thousands of frames. This presents challenges such as: Data volume. Gigabytes to terabytes of data requiring efficient I/O handling. Frame explosion. Extracting frames for embedding generation results in millions of data points. Accuracy Videos often have slight variations, such as: Different resolutions, formats, compression levels, etc. Trivial scene changes, like camera movements or overlays, which should not be treated as duplicates. Latency Real-time deduplication workflows, such as content moderation, require pipelines that minimize latency while handling massive data volumes. Architecture Video Segmentation The first step in deduplication is segmenting videos into manageable components. We reduce redundant frame comparisons and improve efficiency by identifying scene changes or sampling at fixed time intervals. Efficiency. Analyzing the entire video frame by frame is computationally expensive. Segmentation reduces the workload by focusing on representative frames. Focus. Keyframes capture the essence of scenes, improving the accuracy of deduplication. Python
import cv2

# Video segmentation by sampling representative frames
video_path = "input_video"

def segment_video(video_path):
    cap = cv2.VideoCapture(video_path)
    frame_count = 0
    segments = []
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # Keep every 30th frame as a representative keyframe - can be tuned
        if frame_count % 30 == 0:
            # Convert BGR (OpenCV default) to RGB before feeding frames to CLIP
            segments.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        frame_count += 1
    cap.release()
    return segments

segments = segment_video(video_path)
This implementation showcases a simple fixed-interval sampling approach; more advanced methods, such as histogram-based scene-change detection or deep learning-based scene detection, can provide better accuracy at the cost of higher compute. Frame Embedding Extraction After segmentation, representative frames are converted into embeddings using CLIP. These embeddings capture semantic features for similarity comparison. Why CLIP? Cross-modal understanding. CLIP embeddings excel at capturing semantic relationships across modalities, making them ideal for complex data, such as videos. Efficiency. Pre-trained models provide high-quality embeddings without extensive training. Python
from transformers import CLIPProcessor, CLIPModel
import torch

# Load pre-trained CLIP model
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").cuda()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def extract_frame_embeddings(frames):
    inputs = processor(images=frames, return_tensors="pt", padding=True).to("cuda")
    with torch.no_grad():
        embeddings = model.get_image_features(**inputs)
    return embeddings.cpu().numpy()

frame_embeddings = extract_frame_embeddings(segments)
CUDA acceleration ensures that large batches of frames are processed efficiently, enabling high-throughput pipelines. Temporal Alignment for Embedding Comparison Temporal alignment involves matching embeddings from different videos to identify duplicates.
By aligning embeddings based on timestamps, we ensure that comparisons are meaningful. Why Temporal Alignment? Context preservation. Aligning embeddings ensures that comparisons account for video timelines, reducing false positives. Scalability. By focusing on aligned frames, computational requirements are minimized. Python
import numpy as np

def temporal_alignment(embeddings_a, embeddings_b, threshold=0.8):
    aligned_pairs = []
    for i, emb_a in enumerate(embeddings_a):
        for j, emb_b in enumerate(embeddings_b):
            similarity = np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
            if similarity > threshold:
                aligned_pairs.append((i, j, similarity))
    return aligned_pairs

# In practice, pass embeddings from two different videos; the self-comparison here is for illustration
aligned_pairs = temporal_alignment(frame_embeddings, frame_embeddings)
This implementation uses cosine similarity-based alignment. Advanced methods can incorporate dynamic time warping for non-linear alignments. Clustering for Deduplication Clustering groups similar embeddings together and identifies duplicates across videos. Scalability. Clustering reduces computational overhead by summarizing similarity scores into groups. Flexibility. Techniques like DBSCAN dynamically adapt to clusters of varying densities. Python
from sklearn.cluster import DBSCAN

# Clustering with DBSCAN
clustering = DBSCAN(eps=0.5, min_samples=5, metric='cosine').fit(frame_embeddings)

# Cluster assignments
cluster_labels = clustering.labels_
for frame, label in zip(segments, cluster_labels):
    print(f"Frame belongs to cluster {label}")
DBSCAN is preferred for its ability to handle noisy data and adapt to non-spherical cluster shapes. HDBSCAN can also be used if compute permits. Techniques for Enhanced Deduplication Video Hashing Video hashing generates unique signatures for videos, enabling quick deduplication. Techniques like perceptual video hashing consider temporal features for improved accuracy. Python
from moviepy.editor import VideoFileClip
from imagehash import phash
from PIL import Image

# Generate a perceptual hash for a video
# iter_frames() yields NumPy arrays, so convert each frame to a PIL image before hashing
video = VideoFileClip(video_path)
frame_hashes = [phash(Image.fromarray(frame)) for frame in video.iter_frames()]
hash_signature = ''.join(map(str, frame_hashes))
print("Video Hash Signature:", hash_signature)
Combining Temporal Alignment With Clustering Integrating temporal alignment with clustering improves precision by filtering outliers and emphasizing aligned embeddings, although the required compute is significantly higher. Conclusion Deduplication of videos at scale requires a blend of techniques, including video segmentation, CLIP embeddings, and temporal alignment. Massive video assets can be efficiently managed by utilizing CUDA acceleration, clustering algorithms, and advanced embedding models. This architecture optimizes storage and ensures data quality, keeping downstream applications like content recommendation and analytics free of duplicate-induced bias.
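To complement the video hashing technique above, here is a minimal sketch of how two frame-hash sequences could be compared. It assumes both videos were hashed with imagehash's phash as shown earlier; imagehash exposes the Hamming distance between two hashes via the subtraction operator, and the sampling step, distance cutoff, and 90% match ratio used here are illustrative values that would need tuning for a real inventory. Python
from PIL import Image
from imagehash import phash
from moviepy.editor import VideoFileClip

def video_phashes(path, step=30):
    # Hash every Nth frame to keep the signature compact
    clip = VideoFileClip(path)
    return [phash(Image.fromarray(frame))
            for i, frame in enumerate(clip.iter_frames()) if i % step == 0]

def likely_duplicates(hashes_a, hashes_b, max_distance=8):
    # Compare position-aligned hashes; the "-" operator returns the Hamming distance
    pairs = zip(hashes_a, hashes_b)
    close = sum(1 for ha, hb in pairs if (ha - hb) <= max_distance)
    return close / max(1, min(len(hashes_a), len(hashes_b))) > 0.9

# Illustrative usage with two hypothetical file paths
# print(likely_duplicates(video_phashes("video_a.mp4"), video_phashes("video_b.mp4")))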
In a world obsessed with artificial intelligence, there's a new player in town — AI agents. But before you roll your eyes and think, "Great, another tech term to pretend I understand at meetings," let’s break it down. What the Heck Are AI Agents? Imagine you have a really smart assistant — not just one that tells you the weather or suggests a new Netflix show — but one that thinks, plans, and acts without you having to spell everything out. That’s what AI agents are all about. Unlike simple chatbots or automation scripts that follow rigid, predefined paths, AI agents are designed to be autonomous. They don’t just react; they perceive, decide, and take action based on goals. At their core, AI agents have three main components: A model. The brain behind the agent, often powered by a language model (like GPT) or a combination of AI techniques.Tools. The “hands” of the agent, allowing it to interact with databases, APIs, and external systems.An orchestration layer. This governs how the agent perceives its environment, plans, and acts. Think of an AI agent as a chef in a high-end restaurant. It looks at the ingredients available (perception), decides what to cook (reasoning), and then actually prepares the dish (action). This cycle repeats and improves over time, making the agent more efficient and effective. Not Everything That Talks Back Is an AI Agent Here’s where we need to clear up some confusion. Just because something is labeled as “AI” doesn’t make it an agent. A large language model, like the ones behind your favorite chatbots, is an impressive text generator. It predicts words based on patterns it has learned but doesn’t actually understand what it’s saying. It’s like a parrot that repeats words convincingly but doesn’t grasp their meaning. Similarly, chatbots and automated customer service assistants might give helpful responses, but they’re simply regurgitating predefined scripts — they don’t make decisions or adapt dynamically. AI agents, on the other hand, are goal-oriented problem solvers. They don’t just answer questions; they analyze real-time data, make informed decisions, and adapt their behavior to achieve complex objectives. Imagine hiring a new employee — one that doesn’t just do what they’re told but also figures out what needs to be done, identifies the best way to do it, and improves over time. That’s the difference between a basic chatbot and an AI Agent. How Are AI Agents Built? AI agents are not just simple programs following a script; they are complex systems built with multiple interdependent components. Their architecture can be broken down into three fundamental parts: The Model This is the core decision-making unit of an AI agent. It typically consists of machine learning models, including large language models (LLMs), neural networks, and other AI techniques. These models process input data, generate predictions, and make informed decisions based on patterns and learned behaviors. The Tools AI agents extend their capabilities through external tools such as APIs, databases, search engines, or specialized functions. These tools allow agents to retrieve real-time information, interact with digital systems, and even execute specific tasks beyond their initial training data. The Orchestration Layer This governs the entire operational cycle of an AI agent. It includes mechanisms for perception (input processing), reasoning (decision-making), and action (executing tasks). 
The orchestration layer ensures the agent dynamically adapts to new inputs and refines its responses over time. Cognitive Architecture: The Brain of AI Agents The cognitive architecture of an AI agent defines how it processes information, reasons through problems, and interacts with its environment. This architecture typically includes the following: 1. Perception Module The agent collects raw data from its surroundings, which can include structured databases, real-time web scraping, or even IoT sensor inputs. 2. Memory and Knowledge Graphs AI agents store and retrieve relevant information to maintain context over time. This includes both short-term memory (session-based interactions) and long-term memory (historical learning and pattern recognition). 3. Decision-Making and Planning Agents use frameworks such as Chain-of-Thought (CoT) or Tree-of-Thought (ToT) reasoning to break complex tasks into manageable steps, analyze multiple solutions, and select the best course of action. 4. Action Execution Once a decision is made, the agent interacts with its environment using predefined tools, API calls, or even physical actuators in robotics-based implementations. 5. Feedback Loop and Continuous Learning AI agents refine their decision-making process over time through reinforcement learning, self-supervised learning, or user feedback mechanisms. Think of an AI agent like a self-driving car. The model is the brain that makes driving decisions, the tools include sensors and navigation systems to interact with the road, and the orchestration layer ensures all these components work in sync to drive safely and efficiently. The cognitive architecture enables the car to not only drive but also learn from past trips, anticipate potential obstacles, and adapt to new routes dynamically. Why Should You Care? AI agents are not just an evolution of AI; they are a fundamental shift in IT operations and decision-making. These agents are being increasingly integrated into Predictive AIOps (Artificial Intelligence for IT Operations), where they autonomously manage, optimize, and troubleshoot systems without human intervention. Unlike traditional automation, which follows pre-defined scripts, AI agents dynamically predict, adapt, and respond to system conditions in real time. Some key benefits of AI agents include: Proactive issue resolution. AI agents in AIOps identify potential failures before they occur, reducing downtime and ensuring system resilience.Autonomous decision-making. They optimize system performance, allocate resources, and resolve errors without waiting for human input.Scalability and adaptability. AI agents continuously learn from system data, adjust in real time, and enhance operational efficiency without requiring frequent manual updates.Enhanced IT autonomy. By leveraging reinforcement learning and predictive analytics, AI agents create self-sustaining IT ecosystems, minimizing operational risks and human workload. Okay, so AI agents sound cool, but what can they actually do? Adaptive and self-sustaining AI systems. AI agents are transforming IT management and operational resilience. Instead of just replacing workflows, they now optimize and predict system health, automatically mitigating risks and reducing downtime. Whether it's self-repairing IT infrastructure, real-time cybersecurity monitoring, or orchestrating distributed cloud environments, AI Agents are pushing technology toward self-governing, intelligent automation.Dynamic decision-making. 
AI agents continuously analyze complex systems in real time, using advanced cognitive architectures to make decisions without predefined rules. This allows them to detect anomalies, mitigate security risks, and reconfigure environments autonomously.Autonomous systems in IT and cybersecurity. AI agents are not just digital assistants but active participants in managing IT infrastructure. They autonomously allocate resources, detect vulnerabilities, and adapt to emerging threats, enhancing system resilience without human oversight.Self-learning and predictive adaptation. AI agents employ reinforcement learning techniques, meaning they refine their behavior based on past experiences. Whether it’s optimizing system performance, predicting potential failures, or automating complex workflows, these agents continuously improve without requiring manual intervention. What’s Next for AI Agents? The future of AI agents is both thrilling and terrifying. Companies are investing in large action models (LAMs) — next-gen AI that doesn’t just generate text but actually does things. We’re talking about AI that can manage entire business processes or run a company’s operations without human intervention. But with great power comes great responsibility, right? AI agents will also need governance, ethical considerations, and built-in safeguards to prevent them from going rogue (because, let’s face it, we’ve all seen Terminator). Final Thoughts: Hype or Reality? AI agents aren’t just another tech buzzword — they represent a fundamental shift in how AI interacts with the world. Sure, we’re still in the early days, and there’s a lot of fluff in the market, but make no mistake: AI agents will change the way we work, live, and do business. The question is: Are you ready for them, or will you be left scrambling to catch up? Further Reading and Sources For those interested in diving deeper into the world of AI agents and their applications, I highly recommend exploring the research behind Predictive AIOps and cognitive AI architectures. The insights presented in Agentic AI in Predictive AIOps: Enhancing IT Autonomy and Performance provide a strong foundation for understanding how AI agents are transforming IT operations and decision-making processes. Additionally, the whitepaper Agents explores the intricate details of AI agent architectures, including cognitive reasoning, decision-making models, and integration with external tools. This paper highlights how AI agents bridge the gap between foundational models and real-world applications, extending their utility far beyond simple automation. If you're curious about the frameworks and methodologies that power AI agents, both of these sources will help you gain a more comprehensive understanding of the technology and its implications. AI agents are not just a futuristic concept; they are already reshaping industries. The key question remains — will you be a passive observer or an active participant in this revolution?
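For readers who want to connect the cognitive architecture described earlier to something runnable, here is a deliberately tiny, illustrative Python sketch of the perceive, reason, act, and learn cycle. Every name in it (SimpleAgent, the tool registry, the rule-based reason method) is hypothetical; a production agent would delegate the reasoning step to a language model and rely on an orchestration framework rather than hand-written rules. Python
# Illustrative sketch of the perceive -> reason -> act -> learn cycle (hypothetical names)

class SimpleAgent:
    def __init__(self, tools):
        self.tools = tools    # the "hands": callables the agent may invoke
        self.memory = []      # short-term memory of observations and actions

    def perceive(self, environment):
        # Collect raw input from the environment (API response, metric, sensor reading)
        observation = environment.get("cpu_load")
        self.memory.append(("observation", observation))
        return observation

    def reason(self, observation, goal):
        # The "model": decide which tool to use; a real agent would call an LLM here
        if observation is None:
            return ("noop", None)
        return ("scale_up", observation) if observation > goal else ("scale_down", observation)

    def act(self, decision):
        action, payload = decision
        result = self.tools.get(action, lambda _: "no action taken")(payload)
        self.memory.append(("action", action, result))
        return result


tools = {
    "scale_up": lambda load: f"provisioned extra capacity at load {load}",
    "scale_down": lambda load: f"released spare capacity at load {load}",
}

agent = SimpleAgent(tools)
observation = agent.perceive({"cpu_load": 87})
decision = agent.reason(observation, goal=75)
print(agent.act(decision))  # -> "provisioned extra capacity at load 87"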
Stored procedures and functions are implementing the business logic of the database. When migrating the SQL Server database to PostgreSQL, you will need to convert stored procedures and functions properly, paying attention to parameter handling, rowset retrieval, and other specific syntax constructions. SQL Server uses a dialect of SQL called Transact-SQL (or T-SQL) for stored procedures and functions, while PostgreSQL uses Procedural Language/PostgreSQL (or PL/pgSQL) for the same. These languages have significantly different syntax and capabilities, so stored procedures and functions must be carefully analyzed and converted. Also, some T-SQL features have no direct equivalents in PL/pgSQL, and therefore, alternative implementation is required for those cases. Finally, stored procedures and functions must be optimized for the PostgreSQL engine to ensure they perform efficiently. Returning a Rowset Both SQL Server and PostgreSQL allow the return of a rowset, usually the result of a SELECT query, from stored procedures or functions, but the syntax is distinguished. If the stored procedure in T-SQL contains SELECT as the last statement of the body, this means it returns rowset. PL/pgSQL requires either forward declaration of returned rowset as a table or fetching data through refcursor. When returning rowset has just a few columns with clear types, you can use the RETURNS TABLE feature of PostgreSQL. In T-SQL: SQL CREATE PROCEDURE GetCustomerOrders @CustomerID INT AS SELECT OrderID, OrderDate, Amount FROM Orders WHERE CustomerID = @CustomerID; GO In PL/pgSQL, the same may look like this: SQL CREATE OR REPLACE FUNCTION GetCustomerOrders(CustomerID INT) RETURNS TABLE(OrderID INT, OrderDate TIMESTAMP, Amount DECIMAL) AS $$ BEGIN RETURN QUERY SELECT OrderID, OrderDate, Amount FROM Orders WHERE CustomerID = GetCustomerOrders.CustomerID; END; $$ LANGUAGE plpgsql; And the caller PostgreSQL code may look like this: SQL SELECT * FROM GetCustomerOrders(5); If the returning rowset is more complicated and it is hard to determine the data type for each column, the approach above may not work. For those cases, the workaround is to use refcursor. 
In T-SQL: SQL CREATE PROCEDURE GetSalesByRange @DateFrom DATETIME, @DateTo DATETIME AS SELECT C.CustomerID, C.Name AS CustomerName, C.FirstName, C.LastName, C.Email AS CustomerEmail, C.Mobile, C.AddressOne, C.AddressTwo, C.City, C.ZipCode, CY.Name AS Country, ST.TicketID, TT.TicketTypeID, TT.Name AS TicketType, PZ.PriceZoneID, PZ.Name AS PriceZone, ST.FinalPrice AS Price, ST.Created, ST.TransactionType, COALESCE(VME.ExternalEventID, IIF(E.ExternalID = '', NULL, E.ExternalID), '0') AS ExternalID, E.EventID, ES.[Name] AS Section, ST.RowName, ST.SeatName FROM [Event] E WITH (NOLOCK) INNER JOIN EventCache EC WITH (NOLOCK) ON E.EventID = EC.EventID INNER JOIN SaleTicket ST WITH (NOLOCK) ON E.EventID = ST.EventID INNER JOIN EventSection ES WITH (NOLOCK) ON ST.EventSectionID = ES.EventSectionID INNER JOIN Customer C WITH (NOLOCK) ON ST.CustomerID = C.CustomerID INNER JOIN Country CY WITH (NOLOCK) ON C.CountryID = CY.CountryID INNER JOIN TicketType TT WITH (NOLOCK) ON ST.TicketTypeID = TT.TicketTypeID INNER JOIN PriceZone PZ WITH (NOLOCK) ON ST.PriceZoneID = PZ.PriceZoneID LEFT OUTER JOIN VenueManagementEvent VME ON VME.EventID = E.EventID WHERE ST.Created BETWEEN @DateFrom AND @DateTo ORDER BY ST.Created GO In PL/pgSQL: SQL CREATE OR REPLACE FUNCTION GetSalesByRange ( V_DateFrom TIMESTAMP(3), V_DateTo TIMESTAMP(3), V_rc refcursor ) RETURNS refcursor AS $$ BEGIN OPEN V_rc FOR SELECT C.CustomerID, C.Name AS CustomerName, C.FirstName, C.LastName, C.Email AS CustomerEmail, C.Mobile, C.AddressOne, C.AddressTwo, C.City, C.ZipCode, CY.Name AS Country, ST.TicketID, TT.TicketTypeID, TT.Name AS TicketType, PZ.PriceZoneID, PZ.Name AS PriceZone, ST.FinalPrice AS Price, ST.Created, ST.TransactionType, COALESCE( VME.ExternalEventID, (CASE WHEN E.ExternalID = '' THEN NULL ELSE E.ExternalID END), '0') AS ExternalID, E.EventID, ES.Name AS Section, ST.RowName, ST.SeatName FROM Event E INNER JOIN EventCache EC ON E.EventID = EC.EventID INNER JOIN SaleTicket ST ON E.EventID = ST.EventID INNER JOIN EventSection ES ON ST.EventSectionID = ES.EventSectionID INNER JOIN Customer C ON ST.CustomerID = C.CustomerID INNER JOIN Country CY ON C.CountryID = CY.CountryID INNER JOIN TicketType TT ON ST.TicketTypeID = TT.TicketTypeID INNER JOIN PriceZone PZ ON ST.PriceZoneID = PZ.PriceZoneID LEFT OUTER JOIN VenueManagementEvent VME ON VME.EventID = E.EventID WHERE ST.Created BETWEEN V_DateFrom AND V_DateTo ORDER BY ST.Created; RETURN V_rc; END; $$ LANGUAGE plpgsql; And the caller PostgreSQL code may look like this: SQL BEGIN; SELECT GetSalesByRange( '2024-01-01'::TIMESTAMP(3), '2025-01-01'::TIMESTAMP(3), 'mycursorname' ); FETCH 4 FROM mycursorname; COMMIT; Declaration of Local Variables T-SQL allows local variables to be declared everywhere inside a stored procedure or function body. PL/pgSQL requires that all local variables are declared before BEGIN keyword: SQL CREATE OR REPLACE FUNCTION CreateEvent(…) AS $$ DECLARE v_EventID INT; v_EventGroupID INT; BEGIN … END; $$ LANGUAGE plpgsql; In SQL Server, table variables can be declared as follows: SQL DECLARE @Products TABLE ( ProductID int, ProductTitle varchar(100), ProductPrice decimal (8,2) ) PostgreSQL does not support this feature; temporary tables should be used instead: SQL CREATE TEMP TABLE Products ( ProductID int, ProductTitle varchar(100), ProductPrice decimal (8,2) ) Remember that temporary tables are automatically dropped at the end of the session or the current transaction. 
If you need to manage the lifetime of the table explicitly, use the DROP TABLE IF EXISTS statement. Pay attention to the appropriate SQL Server to PostgreSQL type mapping when converting variable declarations. Last Value of Auto-Increment Column After running an INSERT query, you may need to get the generated value of the auto-increment column. In T-SQL, it may be obtained as: SQL
CREATE TABLE aitest (id int identity, val varchar(20));
INSERT INTO aitest(val) VALUES ('one'),('two'),('three');
SELECT @LastID = SCOPE_IDENTITY();
PostgreSQL allows access to the last inserted value via an automatically generated sequence that always has the name {tablename}_{columnname}_seq: SQL
CREATE TABLE aitest (id serial, val varchar(20));
INSERT INTO aitest(val) VALUES ('one'),('two'),('three');
LastID := currval('aitest_id_seq');
Built-In Functions When migrating stored procedures and functions from SQL Server to PostgreSQL, all specific built-in functions and operators must be converted into equivalents according to the rules below:
Function CHARINDEX must be replaced by the PostgreSQL equivalent POSITION.
Function CONVERT must be migrated into PostgreSQL according to the rules specified in this article.
Function DATEADD($interval, $n_units, $date) can be converted into PostgreSQL expressions that use the + operator, depending on the $interval value, as follows:
DAY / DD / D / DAYOFYEAR / DY: ($date + $n_units * interval '1 day')::date
HOUR / HH: ($date + $n_units * interval '1 hour')::date
MINUTE / MI / N: ($date + $n_units * interval '1 minute')::date
MONTH / MM / M: ($date + $n_units * interval '1 month')::date
QUARTER / QQ / Q: ($date + $n_units * 3 * interval '1 month')::date
SECOND / SS / S: ($date + $n_units * interval '1 second')::date
WEEK / WW / WK: ($date + $n_units * interval '1 week')::date
WEEKDAY / DW / W: ($date + $n_units * interval '1 day')::date
YEAR / YY: ($date + $n_units * interval '1 year')::date
Function DATEDIFF($interval, $date1, $date2) of SQL Server can be emulated in PostgreSQL via DATE_PART as follows:
DAY / DD / D / DAYOFYEAR / DY: date_part('day', $date2 - $date1)::int
HOUR / HH: 24 * date_part('day', $date2 - $date1)::int + date_part('hour', $date2 - $date1)
MINUTE / MI / N: 1440 * date_part('day', $date2 - $date1)::int + 60 * date_part('hour', $date2 - $date1) + date_part('minute', $date2 - $date1)
MONTH / MM / M: (12 * (date_part('year', $date2) - date_part('year', $date1))::int + date_part('month', $date2) - date_part('month', $date1))::int
SECOND / SS / S: 86400 * date_part('day', $date2 - $date1)::int + 3600 * date_part('hour', $date2 - $date1) + 60 * date_part('minute', $date2 - $date1) + date_part('second', $date2 - $date1)
WEEK / WW / WK: TRUNC(date_part('day', $date2 - $date1) / 7)
WEEKDAY / DW / W: date_part('day', $date2 - $date1)::int
YEAR / YY: (date_part('year', $date2) - date_part('year', $date1))::int
Every occurrence of DATEPART must be replaced by DATE_PART.
SQL Server function GETDATE must be converted into PostgreSQL NOW().
The conditional operator IIF($condition, $first, $second) must be converted into CASE WHEN $condition THEN $first ELSE $second END.
Every occurrence of ISNULL must be replaced by COALESCE.
SQL Server function REPLICATE must be converted into its PostgreSQL equivalent, REPEAT.
Every occurrence of SPACE($n) must be replaced by REPEAT(' ', $n).
Conclusion The migration of stored procedures and functions between two DBMSs is quite a complicated procedure requiring much time and effort. Although it cannot be completely automated, some tools available online can help partially automate the procedure.
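To illustrate several of the conversion rules above in one place, here is a short worked example. The table and column names are invented for illustration; only the function rewrites (CHARINDEX, ISNULL, IIF, GETDATE, DATEADD) follow the rules listed in this article. In T-SQL: SQL
-- Classify recent orders for VIP customers
SELECT OrderID,
       ISNULL(Comment, '') AS Comment,
       IIF(OrderDate >= DATEADD(DAY, -30, GETDATE()), 'recent', 'old') AS Freshness
FROM Orders
WHERE CHARINDEX('VIP', CustomerTag) > 0;
In PostgreSQL, after applying the rules: SQL
-- The same query converted with POSITION, COALESCE, CASE, NOW(), and interval arithmetic
SELECT OrderID,
       COALESCE(Comment, '') AS Comment,
       CASE WHEN OrderDate >= (NOW() + (-30) * interval '1 day')::date
            THEN 'recent' ELSE 'old' END AS Freshness
FROM Orders
WHERE POSITION('VIP' IN CustomerTag) > 0;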
In programming, object mutation means that an object's state or data is changed after creation. In other words, the operation that changes the attributes of an object in JavaScript is known as object mutation. Object mutation alters an object's values directly, which becomes challenging, particularly in applications where multiple operations may try to read from or write to an object simultaneously. This article presents a discussion on object mutation in JavaScript with relevant code examples wherever necessary. Data Types in JavaScript Data types denote the type of data a variable or an object can hold. JavaScript supports two distinct categories of data types: primitive and user-defined or reference types. Primitive Data Types In JavaScript, all primitive data types are immutable by nature, i.e., you cannot alter them after they have been created. Number, Boolean, String, BigInt, undefined, null, and Symbol are the primitive types. User-Defined or Reference Data Types User-defined data types or reference data types are objects created using primitive types or a combination of primitive and user-defined types. Typical examples of user-defined or reference types are objects and arrays. How Variables Are Assigned and Reassigned in JavaScript When you assign a primitive type variable to another primitive type variable, the two variables hold the same value, but they are stored in different storage locations. For example, assume that you have two variables varA and varB and you assign one variable to another in the following way: JavaScript
var varA = 100;
var varB = varA;
console.log(varB);
When you execute the preceding piece of code, the number 100 will be displayed on the console. Now, change the value of one of the two variables (say varB) as shown here. JavaScript
var varA = 100;
var varB = varA;
varB = 500;
console.log(varA);
Note how the value of the variable varB has been changed to 500. When you print the value of varA, it will still display 100. This is because the variables varA and varB are stored in two different memory locations, so if you change one of them, the new value will not be reflected in the other. What Is Object Mutation in JavaScript? In JavaScript, a value can belong to either of two categories: primitive or non-primitive. While primitive types are immutable, i.e., you cannot change them after creating them, you can alter non-primitive types, i.e., objects and arrays. Objects always allow their values to be changed. Hence, you can change the state of the fields of a mutable type without creating a new instance. Object mutation can create several problems, such as the following: Mutated objects can often lead to race conditions because of concurrency and thread-safety issues. Mutation can introduce complexity into the source code because it hurts predictability. Mutation can often lead to bugs that are difficult to identify in the application's source code. Mutation makes testing and debugging the code difficult because tracking code that relies on mutation becomes a challenge. Code Examples That Demonstrate Object Mutation Object mutation can occur in any of the following scenarios: adding, editing, or removing properties, or using methods that exhibit mutation. When you alter the properties of an object, either directly or indirectly, you are essentially mutating the object. The following code snippet shows how you can mutate an object by changing its property.
JavaScript const author = { id: 1, name: "Joydip Kanjilal"}; author.id = 2; author.city = "Hyderabad, INDIA"; console.log(author); In the preceding piece of code, we create an object named author that contains two properties, namely, id and name. While the id property is used to store the id of the author record, the name property stores the name of the author. Note how we mutate the author object by altering the value pertaining to the id property. Next, we add a new property, named city, to the author object and assign a value to the property. When you run the preceding piece of code, the properties and their values of the author object will be displayed as shown below: JavaScript { name: 'Joydip Kanjilal', city: 'Hyderabad, INDIA' } When you pass an object to a function or assign it to a variable in JavaScript, you're essentially passing the reference to the object and not a copy of it. This implies that any change you make to the new object created by passing an object or assigning it to the variable will apply to all references of the actual object. Consider the following piece of code that shows how you can create an object in JavaScript and then assign it to a variable. JavaScript const objA = { id: 1, name: 'Joydip Kanjilal', city: 'Hyderabad, INDIA', pincode: 500089 } const objB = objA; objB.pincode = 500034; console.log(objA); In the preceding piece of code, the object objA is assigned to objB, and the value of the pincode property of objA is changed, i.e., the object objA is mutated. When you execute the program, the following data will be displayed. JavaScript { id: 1, name: 'Joydip Kanjilal', city: 'Hyderabad, INDIA', pincode: 500034 } Note that the value of the pincode property has been changed. Preventing Object Mutation in JavaScript In JavaScript, you can prevent mutation in several ways, such as the following: Using object cloning by taking advantage of the Object.assign() method or the spread operator (...)Using the Object.seal() method to prevent adding or deleting properties of an objectUsing the Object.freeze() method to prevent adding, editing, or deleting properties of an object Using Cloning Refer to the following piece of code that shows how you can clone an object in JavaScript using the spread operator. JavaScript let originalObj = { x: 10, y: 100 }; let clonedObj = { ...originalObj }; Here, the name of the cloned object is clonedObj, and it is identical to the original object named originalObj. So, if you display the values of the two properties of these two objects, the results will be the same. Now, change the value of one of the properties of the cloned object named, clonedObj to your desired value, as shown in the piece of code given below. Plain Text clonedObj.x = 50; Now, write the following piece of code to display the value of the property named x pertaining to the two objects originalObj and clonedObj. Plain Text console.log(originalObj.x); console.log(clonedObj.x); When you run the program, you'll observe that the value of the property x in the original object is unchanged. The values will be displayed at the console as shown below: Plain Text 10 50 Using the Object.freeze() Method The Object.freeze() method can make an object immutable by preventing any alterations to any of its properties. 
JavaScript
const author = { id: 1, name: "Joydip Kanjilal", city: "Hyderabad", state: "Telangana", country: "India", pincode: 500089 };
Object.freeze(author);
author.city = "Bangalore";
author.state = "Karnataka";
author.pincode = 560010;
console.log(author);
When you execute the preceding piece of code, the results will be similar to this: JavaScript
{ id: 1, name: 'Joydip Kanjilal', city: 'Hyderabad', state: 'Telangana', country: 'India', pincode: 500089 }
As you can see from the output, even though you've assigned values to the city, state, and pincode properties, there is no effect. No changes have been made to the data contained in any of the properties of the object. Using the Object.seal() Method You can also use the Object.seal() method to prevent object mutation in JavaScript. This method lets you alter the values of existing properties, but you cannot add or delete properties of the object. The following code example illustrates this: JavaScript
const author = { id: 1, name: "Joydip Kanjilal", city: "Hyderabad", state: "Telangana", country: "India", pincode: 500089 };
Object.seal(author);
author.city = "Bangalore";
author.state = "Karnataka";
author.pincode = 560005;
author.booksauthored = 3;
console.log(author);
In the preceding code snippet, modifications to the existing properties of the author object are allowed, but neither addition nor deletion of properties is allowed. When you run the program, you'll see that the modified property values are reflected in the result, while the statement that adds a property is ignored. Here's how the output looks at the console: JavaScript
{ id: 1, name: 'Joydip Kanjilal', city: 'Bangalore', state: 'Karnataka', country: 'India', pincode: 560005 }
Using the Object.defineProperty() Method You can also leverage the Object.defineProperty() method in JavaScript to control the mutability of an object's individual properties. The following code snippet shows how you can use this method to disallow alterations to the value contained in a property whose mutability is restricted. JavaScript
const author = { id: 1, name: "Joydip Kanjilal" };
Object.defineProperty(author, "booksauthored", {
  value: 3,
  writable: false,
});
author.booksauthored = 5;
console.log(author.booksauthored);
When you execute the preceding piece of code, you'll see that the number 3 is displayed on the console. Key Takeaways JavaScript values fall into two distinct categories: primitives (immutable) and objects (mutable). The term object mutation refers to the operations that alter or change an object after it has been created. While primitive values such as numbers cannot be altered, you can always change objects after they have been created. Since strings in JavaScript are immutable, you cannot alter them once they have been created. Although mutation by itself is not necessarily bad, you should manage it carefully to reduce bugs in your applications. You can reduce or eliminate mutation in JavaScript by following the recommended practices and leveraging immutable data structures.
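The cloning section above demonstrates the spread operator; for completeness, here is a minimal sketch of the other cloning option mentioned there, Object.assign(). Like the spread operator, it produces a shallow copy, so nested objects remain shared between the original and the clone, as the example illustrates. JavaScript
const originalObj = { x: 10, y: 100, nested: { z: 1 } };

// Shallow clone with Object.assign()
const clonedObj = Object.assign({}, originalObj);

clonedObj.x = 50;           // does not affect originalObj
clonedObj.nested.z = 99;    // shared reference: also changes originalObj.nested.z

console.log(originalObj.x);        // 10
console.log(clonedObj.x);          // 50
console.log(originalObj.nested.z); // 99 (shallow copy caveat)
For deep copies, the built-in structuredClone() function (available in modern browsers and Node.js 17+) avoids this caveat.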
The cloud has proven to be a main enabler for large AI deployments because it provides AI-native APIs for rapid prototyping, elastic computing, and storage to address scaling problems. This article covers how to build and scale GenAI applications in the cloud. The Importance of the Cloud in GenAI The cloud is critical for contemporary GenAI applications because it can accommodate vast processing power, data storage, and distributed processes necessary for AI models. Traditional deployments often need more flexibility and performance to adapt to changing business requirements. Microsoft Azure, AWS, and Google Cloud are examples of cloud AI service providers. For example, Azure AI provides ready-to-utilize algorithms and models and the necessary infrastructural tools for building and expanding AI applications. In addition, GenAI projects that are cloud-based also benefit from the following advantages: Elastic provisioning: Resources are provisioned automatically or manually depending on business needs.Cost optimization: AI tools and AI-enabled tool configurations, plus automatic on-the-fly scaling can optimize operational costs. Not to mention the pay-as-you-go pricing model and hybrid cloud supported by large cloud providers. All of these improvements facilitate more focus on model development instead of hardware and infrastructural backing management.Integrated AI Services: Integration makes it possible to market faster by using pre-trained models and APIs or OpenAI and all advanced toolkits. Due to these advantages, the cloud is the core of the development of current generative AI, starting from the large language models (LLMs) to the multimodal AI systems. Data Preparation Any effective GenAI application relies on high-quality data. Training models on different, well-prepared datasets gives greater generalizability and resilience. Steps to Prepare Data Data collection and ingestion: This feature allows cataloging datasets in the data storage tool of your choice. It also allows automatic data flow from many sources with the help of automated ingestion pipelines.Data cleaning and transformation: Certain data applications assist in cleansing and shaping unprocessed data into meaningful, useful forms.Data annotation and governance: Annotating specific datasets necessary for certain GenAI models can be done using annotation tools or cloud services. The more ample and well-structured the training sets are will help widen the ‘temporal cycles’ that can fit the models. Best Practices for GenAI Data Preparation Data governance: Ensure security through strict data protection, access, and legislative compliance regulations.Cloud-native compliance: Apply policies with the technology provider of your choice for user compliance verification.Data protection: Protect data access and ensure compliance with applicable legislation through regulatory data protection measures. Ensure you have a wide range of compliance certifications, including but not limited to SOC, GDPR, and HIPAA, which promise improved management of sensitive data.Cloud-native security: Take advantage of the tool provider of your choice's pre-existing security aspects, if available, which assist in advanced threat prevention with its ongoing surveillance and assurance of meeting set standards. Fine-Tuning Models Major cloud services would provide all the necessary resources to train and fine-tune GenAI models, including resources that can easily be reconfigured. 
Pre-trained model : Time and cost are greatly spared when employing already trained models, such as OpenAI's GPT-4 or DALL-E. Cloud GPUs or TPUs and frameworks such as Hugging Face, all of which allow for the adaptation of these models.Distributed training: Certain machine learning tools come with distributed training capabilities that enable good scaling across multiple nodes on the cloud. Moreover, it might be important for all programs to seek solutions for the development and resolution of problems of ethical artificial intelligence. Legitimate concerns regarding bias and fairness in AI can be effectively addressed with these tools, which often provide insights into model behavior and the detection and mitigation of biases. GenAI Modeling Factors for Deployment at Scale The evaluation of GenAI models in the revolutionary setting is always preceded by analyzing the cost of scalability, latency, and maintenance of the systems. Hosting models: Some OpenAI model deployments are achieved through scalable endpoints meant for ultra-low latency high-volume inferencing. Their sophisticated load balancer and elastically scaling resources buffer ensure that service delivery is superb regardless of the dynamic load. Serverless architectures: Serverless computing can automatically create the appropriate scale without the need for operable cost, although no per-infrastructural management is required. CI/CD integrates well with machine learning models, allowing model re-training and testing deployment to pipelines to be automated. The built-in monitoring and rollback feature guarantees rapid updates without excessive risk, making it perfect for managing highly available and reliable AI systems. Inference and Real-World Applications Inference, or the outputs produced from trained models, must be made while considering the aspects of latency, throughput, and cost. Considerations for Real-Time Inference Try using quantization or model pruning optimization techniques wherever possible to reduce the inference time. Be sure to employ managed inference services. Real-World Use Cases Predictive analytics: Knowing different patterns and facts using analytical methods drastically improves finance, health care, and logistics decisions.Automated content creation: Content generation employs AI to generate written content for various purposes, including creative writing, marketing, or product details. Challenges of Using GenAI Though GenAI offers promise, efforts at scaling its applications in the cloud have difficulties, including: Cost of infrastructure: Failure to properly understand the infrastructure requirements can lead to over-provisioning of resources or waste of vital infrastructure. Load testing and careful estimating of future demand are essential.Interdisciplinary collaboration: Even a functioning prototype often requires constructing and integrating cross-functional teams with technical and domain knowledge.Business alignment: Each model must be designed to solve so that value can be derived for each business. Modeling boosts development when data scientists, product management, and other stakeholders begin working together. Conclusion GenAI, when paired with cloud technology, provides an unparalleled possibility for innovation and scale. Organizations may overcome scaling problems by embracing the cloud's flexibility, enhanced capabilities, and cost-effectiveness, allowing GenAI to reach its disruptive promise.
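As a concrete illustration of the managed inference services mentioned above, here is a minimal sketch that calls a hosted model endpoint using the AWS SDK for Python (boto3) and its SageMaker runtime client. The endpoint name, region, and payload shape are hypothetical and depend entirely on how the model was deployed; error handling and retries are omitted for brevity. Python
import json
import boto3

# SageMaker runtime client; credentials and region come from the standard AWS config chain
runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

payload = {"inputs": "Summarize the latest customer feedback in two sentences."}

# "genai-demo-endpoint" is a hypothetical endpoint name created at deployment time
response = runtime.invoke_endpoint(
    EndpointName="genai-demo-endpoint",
    ContentType="application/json",
    Body=json.dumps(payload),
)

result = json.loads(response["Body"].read())
print(result)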
Microservices and containers are revolutionizing how modern applications are built, deployed, and managed in the cloud. However, developing and operating microservices can introduce significant complexity, often requiring developers to spend valuable time on cross-cutting concerns like service discovery, state management, and observability. Dapr, or Distributed Application Runtime, is an open-source runtime for building microservices on cloud and edge environments. It provides platform-agnostic building blocks like service discovery, state management, pub/sub messaging, and observability out of the box. Dapr moved to the graduated maturity level of CNCF (Cloud Native Computing Foundation) and is currently used by many enterprises. When combined with Amazon Elastic Kubernetes Service (Amazon EKS), a managed Kubernetes service from AWS, Dapr can accelerate the adoption of microservices and containers, enabling developers to focus on writing business logic without worrying about infrastructure plumbing. Amazon EKS makes managing Kubernetes clusters easy, enabling effortless scaling as workloads change. In this blog post, we'll explore how Dapr simplifies microservices development on Amazon EKS. We'll start by diving into two essential building blocks: service invocation and state management. Service Invocation Seamless and reliable communication between microservices is crucial. However, developers often struggle with complex tasks like service discovery, standardizing APIs, securing communication channels, handling failures gracefully, and implementing observability. With Dapr's service invocation, these problems become a thing of the past. Your services can effortlessly communicate with each other using industry-standard protocols like gRPC and HTTP/HTTPS. Service invocation handles all the heavy lifting, from service registration and discovery to request retries, encryption, access control, and distributed tracing. State Management Dapr's state management building block simplifies the way developers work with the state in their applications. It provides a consistent API for storing and retrieving state data, regardless of the underlying state store (e.g., Redis, AWS DynamoDB, Azure Cosmos DB). This abstraction enables developers to build stateful applications without worrying about the complexities of managing and scaling state stores. Prerequisites In order to follow along this post, you should have the following: An AWS account. If you don’t have one, you can sign up for one.An IAM user with proper permissions. The IAM security principal that you're using must have permission to work with Amazon EKS IAM roles, service-linked roles, AWS CloudFormation, a VPC, and related resources. For more information, see Actions, resources, and condition keys for Amazon Elastic Container Service for Kubernetes and Using service-linked roles in the AWS Identity and Access Management User Guide. Application Architecture In the diagram below, we have two microservices: a Python app and a Node.js app. The Python app generates order data and invokes the /neworder endpoint exposed by the Node.js app. The Node.js app writes the incoming order data to a state store (in this case, Amazon ElastiCache) and returns an order ID to the Python app as a response. By leveraging Dapr's service invocation building block, the Python app can seamlessly communicate with the Node.js app without worrying about service discovery, API standardization, communication channel security, failure handling, or observability. 
It implements mTLS to provide secure service-to-service communication. Dapr handles these cross-cutting concerns, allowing developers to focus on writing the core business logic. Additionally, Dapr's state management building block simplifies how the Node.js app interacts with the state store (Amazon ElastiCache). Dapr provides a consistent API for storing and retrieving state data, abstracting away the complexities of managing and scaling the underlying state store. This abstraction enables developers to build stateful applications without worrying about the intricacies of state store management. The Amazon EKS cluster hosts a namespace called dapr-system, which contains the Dapr control plane components. The dapr-sidecar-injector automatically injects a Dapr runtime into the pods of Dapr-enabled microservices. Service Invocation Steps The order generator service (Python app) invokes the Node app’s method, /neworder. This request is sent to the local Dapr sidecar, which is running in the same pod as the Python app. Dapr resolves the target app using the Amazon EKS cluster’s DNS provider and sends the request to the Node app’s sidecar.The Node app’s sidecar then sends the request to the Node app microservice.Node app then writes the order ID received from the Python app to Amazon ElasticCache.The node app sends the response to its local Dapr sidecar.Node app’s sidecar forwards the response to the Python app’s Dapr sidecar. Python app side car returns the response to the Python app, which had initiated the request to the Node app's method /neworder. Deployment Steps Create and Confirm an EKS Cluster To set up an Amazon EKS (Elastic Kubernetes Service) cluster, you'll need to follow several steps. Here's a high-level overview of the process: Prerequisites Install and configure the AWS CLIInstall eksctl, kubectl, and AWS IAM Authenticator 1. Create an EKS cluster. Use eksctl to create a basic cluster with a command like: Shell eksctl create cluster --name my-cluster --region us-west-2 --node-type t3.medium --nodes 3 2. Configure kubectl. Update your kubeconfig to connect to the new cluster: Shell aws eks update-kubeconfig --name my-cluster --region us-west-2 3. Verify the cluster. Check if your nodes are ready: Shell kubectl get nodes Install DAPR on Your EKS cluster 1. Install DAPR CLI: Shell wget -q https://raw.githubusercontent.com/dapr/cli/master/install/install.sh -O - | /bin/bash 2. Verify installation: Shell dapr -h 3. Install DAPR and validate: Shell dapr init -k --dev dapr status -k The Dapr components statestore and pubsub are created in the default namespace. You can check it by using the command below: Shell dapr components -k Configure Amazon ElastiCache as Your Dapr StateStore Create Amazon ElastiCache to store the state for the microservice. In this example, we are using ElastiCache serverless, which quickly creates a cache that automatically scales to meet application traffic demands with no servers to manage. Configure the security group of the ElastiCache to allow connections from your EKS cluster. For the sake of simplicity, keep it in the same VPC as your EKS cluster. Take note of the cache endpoint, which we will need for the subsequent steps. Running a Sample Application 1. Clone the Git repo of the sample application: Shell git clone https://github.com/dapr/quickstarts.git 2. 
Create redis-state.yaml and provide an Amazon ElasticCache endpoint for redisHost: YAML apiVersion: dapr.io/v1alpha1 kind: Component metadata: name: statestore namespace: default spec: type: state.redis version: v1 metadata: - name: redisHost value: redisdaprd-7rr0vd.serverless.use1.cache.amazonaws.com:6379 - name: enableTLS value: true Apply yaml configuration for state store component using kubectl. Shell kubectl apply -f redis-state.yaml 3. Deploy microservices with the sidecar. For the microservice node app, navigate to the /quickstarts/tutorials/hello-kubernetes/deploy/node.yaml file and you will notice the below annotations. It tells the Dapr control plane to inject a sidecar and also assigns a name to the Dapr application. YAML annotations: dapr.io/enabled: "true" dapr.io/app-id: "nodeapp" dapr.io/app-port: "3000" Add an annotation service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing" in node.yaml to create AWS ELB. YAML kind: Service apiVersion: v1 metadata: name: nodeapp annotations: service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing" labels: app: node spec: selector: app: node ports: - protocol: TCP port: 80 targetPort: 3000 type: LoadBalancer Deploy the node app using kubectl. Navigate to the directory /quickstarts/tutorials/hello-kubernetes/deploy and execute the below command. Shell kubectl apply -f node.yaml Obtain the AWS NLB, which appears under External IP, on the output of the below command. Shell kubectl get svc nodeapp http://k8s-default-nodeapp-3a173e0d55-f7b14bedf0c4dd8.elb.us-east-1.amazonaws.com Navigate to the /quickstarts/tutorials/hello-kubernetes directory, which has sample.json file to execute the below step. Shell curl --request POST --data "@sample.json" --header Content-Type:application/json http://k8s-default-nodeapp-3a173e0d55-f14bedff0c4dd8.elb.us-east-1.amazonaws.com/neworder You can verify the output by accessing /order endpoint using the load balancer in a browser. Plain Text http://k8s-default-nodeapp-3a173e0d55-f7b14bedff0c4dd8.elb.us-east-1.amazonaws.com/order You will see the output as {“OrderId”:“42”} Next, deploy the second microservice Python app, which has a business logic to generate a new order ID every second and invoke the Node app’s method /neworder. Navigate to the directory /quickstarts/tutorials/hello-kubernetes/deploy and execute the below command. Shell kubectl apply -f python.yaml 4. Validating and testing your application deployment. Now that we have both the microservices deployed. The Python app is generating orders and invoking /neworder as evident from the logs below. Shell kubectl logs --selector=app=python -c daprd --tail=-1 SystemVerilog time="2024-03-07T12:43:11.556356346Z" level=info msg="HTTP API Called" app_id=pythonapp instance=pythonapp-974db9877-dljtw method="POST /neworder" scope=dapr.runtime.http-info type=log useragent=python-requests/2.31.0 ver=1.12.5 time="2024-03-07T12:43:12.563193147Z" level=info msg="HTTP API Called" app_id=pythonapp instance=pythonapp-974db9877-dljtw method="POST /neworder" scope=dapr.runtime.http-info type=log useragent=python-requests/2.31.0 ver=1.12.5 We can see that the Node app is receiving the requests and writing to the state store Amazon ElasticCache in our example. Shell kubectl logs —selector=app=node -c node —tail=-1 SystemVerilog Got a new order! Order ID: 367 Successfully persisted state for Order ID: 367 Got a new order! Order ID: 368 Successfully persisted state for Order ID: 368 Got a new order! 
Order ID: 369 Successfully persisted state for Order ID: 369
In order to confirm that the data is persisted in Amazon ElastiCache, we access the /order endpoint below. It returns the latest order ID, which was generated by the Python app. Plain Text
http://k8s-default-nodeapp-3a173e0d55-f7b14beff0c4dd8.elb.us-east-1.amazonaws.com/order
You will see an output with the most recent order as {"OrderId":"370"}. Clean Up Run the commands below to delete the Node app and Python app deployments along with the state store component. Navigate to the /quickstarts/tutorials/hello-kubernetes/deploy directory to execute them. Shell
kubectl delete -f node.yaml
kubectl delete -f python.yaml
You can tear down your EKS cluster using eksctl and delete the Amazon ElastiCache cache. If you created the cluster with the inline eksctl command shown earlier, delete it with: Shell
eksctl delete cluster --name my-cluster --region us-west-2
Conclusion Dapr and Amazon EKS form a powerful alliance for microservices development. Dapr simplifies cross-cutting concerns, while EKS manages Kubernetes infrastructure, allowing developers to focus on core business logic and boost productivity. This combination accelerates the creation of scalable, resilient, and observable applications, significantly reducing operational overhead. It's an ideal foundation for your microservices journey. Watch for upcoming posts exploring Dapr and EKS's capabilities in distributed tracing and observability, offering deeper insights and best practices.
Coupling Go's lightweight programming capabilities with AWS' robust AI services allows developers to build performant, scalable, and intelligent microservices tailored to diverse business needs. This blog explains how Go and AWS AI services can be combined to create intelligent microservices, discusses the benefits of this approach, and provides a step-by-step guide to getting started. Why Use Go for Microservices? Golang, or Go, is a statically typed, compiled programming language developed at Google. It was designed for simplicity, performance, and scalability, qualities that together make it an excellent choice for building microservices: Concurrency. Built-in concurrency support through goroutines and channels lets developers handle many tasks at once without significant performance overhead. Fast compilation and execution. Because it is a compiled language, Go offers high execution speeds and fast build times, which is essential for microservices needing to respond quickly to user requests. Minimal memory footprint. Go's efficient memory usage keeps microservices lean and inexpensive to run. Rich standard library. Go's standard library includes tools for networking, HTTP handling, and JSON parsing, making it easier to develop microservices. Scalability. Go was designed from the start around simplicity, which helps developers build and maintain scalable systems with ease. Why Choose AWS AI Services? AWS offers developers a suite of AI services for NLP, computer vision, ML, and predictive analytics. Combining AWS AI services with microservices offers the following: AWS AI services expose SDKs and APIs that make integration with Go-based microservices straightforward. AWS automatically scales its services with demand to maintain consistent performance under varying workloads. AWS's pay-as-you-go model ensures you only pay for the resources you use. Pre-trained models are available for NLP (Amazon Comprehend), image recognition (Amazon Rekognition), text-to-speech (Amazon Polly), and more. AWS follows industry-standard security practices to protect user data across its AI services. Key AWS AI Services for Intelligent Microservices Highlighted below are some AWS AI services that can be used for building intelligent microservices: Amazon Rekognition. Provides image and video analysis capabilities such as object detection, facial recognition, and content moderation. Amazon Comprehend. Offers natural language processing features such as sentiment analysis, entity recognition, and language detection. Amazon Polly. A text-to-speech service for building voice-enabled applications. Amazon SageMaker. A service for building, training, and deploying ML models. Amazon Translate. Provides real-time and batch language translation. Amazon Textract. Extracts text and data from forms and tables in scanned documents. Amazon Lex. Enables the creation of conversational interfaces for applications using voice and text. Amazon Transcribe. Converts speech into text for applications like transcription services and voice analytics. The Architecture of Intelligent Microservices With Go and AWS The architecture of intelligent microservices involves several layers: Frontend layer. User interfaces or APIs that interact with end users. Microservices layer. Go-based microservices that handle specific business functionalities.
Each microservice communicates with the AWS AI services for processing. Data layer. Includes databases or data storage solutions, such as Amazon RDS, DynamoDB, or S3, for managing application data. AWS AI integration layer. AWS AI services that process data and return results to the microservices. Monitoring and logging. Tools like AWS CloudWatch and AWS X-Ray to monitor performance and diagnose issues in the microservices. A Step-by-Step Guide Step 1: Setting Up the Development Environment Go Configuration Basics Download and install Go from the official Go website. After installation, set up your Go workspace and specify the environment variables. Once Go is ready, install the AWS SDK for Go for AWS service integration. Configure your AWS credentials using the AWS CLI for secure, authenticated access to your services. Step 2: Design the Microservices Design each microservice around a single specialization. For an image analysis service, use Amazon Rekognition to identify objects in an image; use Amazon Comprehend for a sentiment analysis service that analyzes user feedback; and use Amazon Polly for a text-to-speech service that speaks textual notifications. Each microservice solves a particular business requirement without losing flexibility. Step 3: Integrating AWS AI Services Connect the microservices to AWS AI services by creating an AWS session, initializing the service client, and calling the appropriate APIs. Keeping this communication correct and efficient is what allows the microservices to return intelligent results. Step 4: Deployment of the Microservices After development, dockerize the microservices for portability and consistent behavior across environments, and configure the containers appropriately for each service. Use Kubernetes or AWS ECS to orchestrate and manage the deployment of the containerized microservices for greater availability and scalability. Monitor performance and enable logging through AWS CloudWatch, and use Auto Scaling groups to handle varying workloads. Step 5: Testing and Optimization Conduct thorough unit and integration tests to verify that every microservice works as it should. Profile how the microservices communicate with AWS services to improve responsiveness and resource utilization. Frequent testing and iteration help ensure the reliability and scalability of the system. Benefits of Using Go and AWS AI Services Improved productivity. Go's simplicity and AWS's managed services reduce the time and effort needed to build intelligent applications. Improved scalability. Lightweight Go services combined with elastic AWS infrastructure allow microservices to scale seamlessly. Cost efficiency. AWS's pay-as-you-go pricing and Go's low memory footprint keep costs down. Intelligence. AWS AI services add advanced capabilities such as sentiment analysis, image recognition, and speech synthesis to microservices. Conclusion Combining Go and AWS AI services to build intelligent microservices offers strong performance, scale, and advanced functionality. By drawing on Go's efficient design and AWS's AI technologies, developers can create microservices that meet modern business needs.
Whatever the goal, whether a better customer experience, improved business propositions, or real-time analysis, integrating Go and AWS brings both adaptability and sturdiness to application ecosystems. Deploying microservices allows businesses to innovate faster and adapt easily to changing requirements without breaking the whole system. Meanwhile, AWS AI services provide many easily integrated pre-trained models and tools, reducing the complexity of AI-driven solutions and giving teams the time and space to deliver value to their users.
The Rise of LLMs and the Need for Efficiency In recent years, large language models (LLMs) such as GPT, Llama, and Mistral have transformed natural language understanding and generation. However, a significant challenge in deploying these models lies in optimizing their performance, particularly for tasks involving long text generation. One powerful technique to address this challenge is key-value caching (KV cache). In this article, we will delve into how KV caching works, its role within the attention mechanism, and how it enhances efficiency in LLMs. How Large Language Models Generate Text To truly understand token generation, we need to start with the basics of how sentences are processed in LLMs. Step 1: Tokenization Before a model processes a sentence, it breaks it into smaller pieces called tokens. Example sentence: Why is the sky blue? Tokens can represent words, subwords, or even characters, depending on the tokenizer used. For simplicity, let’s assume the sentence is tokenized as: ['Why', 'is', 'the', 'sky', 'blue', '?'] Each token is assigned a unique ID, forming a sequence like: [1001, 1012, 2031, 3021, 4532, 63] Step 2: Embedding Lookup Token IDs are mapped to high-dimensional vectors, called embeddings, using a learned embedding matrix. Example: Token “Why” (ID: 1001) → Vector: [-0.12, 0.33, 0.88, ...] Token “is” (ID: 1012) → Vector: [0.11, -0.45, 0.67, ...] The sentence is then represented as a sequence of embedding vectors: [Embedding("Why"), Embedding("is"), Embedding("the"), ...] Step 3: Contextualizing Tokens With Attention Raw embeddings don’t capture context. For instance, the meaning of “sky” differs in the sentences “Why is the sky blue?” and “The sky is clear today.” To add context, LLMs use the attention mechanism. How Attention Works: (Keys, Queries, and Values) The attention mechanism uses three components: Query (Q). Represents the current token’s embedding, transformed through a learned weight matrix. It determines how much attention to give to other tokens in the sequence. Key (K). Encodes information about each token (including previous ones), transformed through a learned weight matrix. It is used to assess relevance by comparing it to the query (Q). Value (V). Represents the actual content of the tokens, providing the information that the model “retrieves” based on the attention scores. Example: Let's consider the LLM processing the sentence in the example, and the current token is “the.” When processing the token “the,” the model attends to all previously processed tokens (“Why,” “is,” “the”) using their key (K) and value (V) representations. Query (Q) for “the”: The Query vector for “the” is derived by applying a learned weight matrix to its embedding: Q("the") = WQ ⋅ Embedding("the") Keys (K) and Values (V) for previous tokens: Each previous token generates: Key (K): K("why") = WK ⋅ Embedding("why") Value (V): V("why") = WV ⋅ Embedding("why") Attention Calculation The model calculates relevance by comparing Q (“the”) with all previous K vectors (“why”, “is”, and “the”) using a dot product. The resulting scores are normalized with softmax to compute attention weights. These weights are applied to the corresponding V vectors to update the contextual representation of “the.” In summary: Q (the). The embedding of “the” passed through a learned weight matrix WQ to form the query vector Q for the token “the.” This query is used to determine how much attention “the” should pay to other tokens. K (why).
The embedding of “why,” passed through a learned weight matrix WK to form the key vector K for “why.” This key is compared with Q (the) to compute attention relevance.V (why). The embedding of “why,” passed through a learned weight matrix WV to form the value vector V for “why.” This value contributes to updating the contextual representation of “the” based on its attention weight relative to Q (the). Step 4: Updating the Sequence Each token’s embedding is updated based on its relationships with all other tokens. This process is repeated across multiple attention layers, with each layer refining the contextual understanding. Step 5: Generating the Next Token (Sampling) Once embeddings are contextualized across all layers, the model outputs a logits vector — a raw score distribution over the vocabulary — for each token position. For text generation, the model focuses on the logits for the last position. The logits are converted into probabilities using a softmax function. Sampling Strategies Greedy sampling. Selects the token with the highest probability (in the image above, it uses greedy sampling and selects “because”).Top-k sampling. Chooses randomly among the top k probable tokens.Temperature sampling. Adjusts the probability distribution to control randomness (e.g., higher temperature = more random choices). How Key-Value Cache Helps Without a KV Cache At each generation step, the model recomputes the keys and values for all tokens in the sequence, even those already processed. This results in a quadratic computational cost (O(n²)), where n is the number of tokens, making it inefficient for long sequences. With a KV Cache The model stores the keys and values for previously processed tokens in memory. When generating a new token, it reuses the cached keys and values, and computes only the key, value, and query for the new token. This optimization significantly reduces the need for recalculating attention components for the entire sequence, improving both computational time and memory usage. Code With KV Cache Suppose the model has already generated the sequence “Why is the sky.” The keys and values for these tokens are stored in the cache. When generating the next token, “blue”: The model retrieves the cached keys and values for the tokens “Why,” “is,” “the,” and “sky.”It computes the query, key, and value for “blue” and performs attention calculations using the query for “blue” with the cached keys and values.The newly calculated key and value for “blue” are added to the cache for future use. Python import torch import time from transformers import AutoTokenizer, AutoModelForCausalLM # Load tokenizer and model tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B") model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # Move model to the appropriate device device = torch.device("cuda" if torch.cuda.is_available() else "cpu") model.to(device) # Input text input_text = "Why is the sky blue?" input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device) def generate_tokens(use_cache, steps=100): """ Function to generate tokens with or without caching. Args: use_cache (bool): Whether to enable cache reuse. steps (int): Number of new tokens to generate. Returns: generated_text (str): The generated text. duration (float): Time taken for generation. 
""" past_key_values = None # Initialize past key values input_ids_local = input_ids # Start with initial input generated_tokens = tokenizer.decode(input_ids_local[0]).split() start_time = time.time() for step in range(steps): outputs = model( input_ids=input_ids_local, use_cache=use_cache, past_key_values=past_key_values, ) logits = outputs.logits past_key_values = outputs.past_key_values if use_cache else None # Cache for next iteration # Get the next token (argmax over logits) next_token_id = torch.argmax(logits[:, -1, :], dim=-1) # Decode and append the new token new_token = tokenizer.decode(next_token_id.squeeze().cpu().numpy()) generated_tokens.append(new_token) # Update input IDs for next step if use_cache: input_ids_local = next_token_id.unsqueeze(0) # Only the new token for cached mode else: input_ids_local = torch.cat([input_ids_local, next_token_id.unsqueeze(0)], dim=1) end_time = time.time() duration = end_time - start_time generated_text = " ".join(generated_tokens) return generated_text, duration # Measure time with and without cache steps_to_generate = 200 # Number of tokens to generate print("Generating tokens WITHOUT cache...") output_no_cache, time_no_cache = generate_tokens(use_cache=False, steps=steps_to_generate) print(f"Output without cache: {output_no_cache}") print(f"Time taken without cache: {time_no_cache:.2f} seconds\n") print("Generating tokens WITH cache...") output_with_cache, time_with_cache = generate_tokens(use_cache=True, steps=steps_to_generate) print(f"Output with cache: {output_with_cache}") print(f"Time taken with cache: {time_with_cache:.2f} seconds\n") # Compare time difference time_diff = time_no_cache - time_with_cache print(f"Time difference (cache vs no cache): {time_diff:.2f} seconds") When Is Key-Value Caching Most Effective? The benefits of KV cache depend on several factors: Model size. Larger models (e.g., 7B, 13B) perform more computations per token, so caching saves more time.Sequence length. KV cache is more effective for longer sequences (e.g., generating 200+ tokens).Hardware. GPUs benefit more from caching compared to CPUs, due to parallel computation. Extending KV Cache: Prompt Caching While KV cache optimizes text generation by reusing keys and values for previously generated tokens, prompt caching goes a step further by targeting the static nature of the input prompt. Let’s explore what prompt caching is and its significance. What Is Prompt Caching? Prompt caching involves pre-computing and storing the keys and values for the input prompt before the generation process starts. Since the input prompt does not change during text generation, its keys and values remain constant and can be efficiently reused. Why Prompt Caching Matters Prompt caching offers distinct advantages in scenarios with large prompts or repeated use of the same input: Avoids redundant computation. Without prompt caching, the model recalculates the keys and values for the input prompt every time it generates a token. This leads to unnecessary computational overhead.Speeds up generation. By pre-computing these values once, prompt caching significantly accelerates the generation process, particularly for lengthy input prompts or when generating multiple completions.Optimized for batch processing. Prompt caching is invaluable in cases where the same prompt is reused across multiple batched requests or slight variations, ensuring consistent efficiency. 
Python import time import torch from transformers import AutoModelForCausalLM, AutoTokenizer # Load model and tokenizer model_name = "mistralai/Mistral-7B-v0.1" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16) assistant_prompt = "You are a helpful and knowledgeable assistant. Answer the following question thoughtfully:\n" # Tokenize the assistant prompt input_ids = tokenizer(assistant_prompt, return_tensors="pt").to(model.device) # Step 1: Cache Keys and Values for the assistant prompt with torch.no_grad(): start_time = time.time() outputs = model(input_ids=input_ids.input_ids, use_cache=True) past_key_values = outputs.past_key_values # Cache KV pairs for the assistant prompt prompt_cache_time = time.time() - start_time print(f"Prompt cached in {prompt_cache_time:.2f} seconds\n") # Function to generate responses for separate questions def generate_response(question, past_key_values): question_prompt = f"Question: {question}\nAnswer:" question_ids = tokenizer(question_prompt, return_tensors="pt").to(model.device) # Append question tokens after assistant cached tokens input_ids_combined = torch.cat((input_ids.input_ids, question_ids.input_ids), dim=-1) generated_ids = input_ids_combined # Initialize with prompt + question num_new_tokens = 50 # Number of tokens to generate with torch.no_grad(): for _ in range(num_new_tokens): outputs = model(input_ids=generated_ids, past_key_values=past_key_values, use_cache=True) next_token_id = outputs.logits[:, -1].argmax(dim=-1).unsqueeze(0) # Pick next token generated_ids = torch.cat((generated_ids, next_token_id), dim=-1) # Append next token past_key_values = outputs.past_key_values # Update KV cache response = tokenizer.decode(generated_ids[0], skip_special_tokens=True) return response, past_key_values # Step 2: Pass multiple questions questions = [ "Why is the sky blue?", "What causes rain?", "Why do we see stars at night?" ] # Generate answers for each question for i, question in enumerate(questions, 1): start_time = time.time() response, past_key_values = generate_response(question, past_key_values) response_time = time.time() - start_time print(f"Question {i}: {question}") print(f"Generated Response: {response.split('Answer:')[-1].strip()}") print(f"Time taken: {response_time:.2f} seconds\n") For example: Customer support bots. The system prompt often remains unchanged for every user interaction. Prompt caching allows the bot to generate responses efficiently without recomputing the keys and values of the static system prompt. Creative content generation. When multiple completions are generated from the same input prompt, varying randomness (e.g., temperature settings) can be applied while reusing cached keys and values for the input. Conclusion Key-value caching (KV cache) plays a crucial role in optimizing the performance of LLMs. Reusing previously computed keys and values reduces computational overhead, speeds up generation, and improves efficiency, particularly for long sequences and large models. Implementing KV caching is essential for real-world applications like summarization, translation, and dialogue systems, enabling LLMs to scale effectively and provide faster, more reliable results. Combined with techniques like prompt caching, KV cache ensures that LLMs can handle complex and resource-intensive tasks with improved efficiency. I hope you found this article useful, and if you did, consider giving claps.
Problem Statement Challenge Organizations running containerized applications in Kubernetes often need to capture and preserve the state of running containers for: Disaster recovery Application migration Debug/troubleshooting State preservation Environment reproduction However, there's no straightforward, automated way to: Create container checkpoints on-demand Store these checkpoints in a standardized format Make them easily accessible across clusters Trigger checkpointing through a standard interface Current Limitations Manual checkpoint creation requires direct cluster access No standardized storage format for checkpoints Limited integration with container registries Lack of programmatic access for automation Complex coordination between containerd and storage systems Solution A Kubernetes sidecar service that: Exposes checkpoint functionality via REST API Automatically converts checkpoints to OCI-compliant images Stores images in ECR for easy distribution Integrates with existing Kubernetes infrastructure Provides a standardized interface for automation This solves the core problems by: Automating the checkpoint process Standardizing checkpoint storage Making checkpoints portable Enabling programmatic access Simplifying integration with existing workflows Target users: DevOps teams Platform engineers Application developers Site Reliability Engineers (SREs) Forensic container checkpointing is based on Checkpoint/Restore In Userspace (CRIU) and allows the creation of stateful copies of a running container without the container knowing that it is being checkpointed. The copy of the container can be analyzed and restored in a sandbox environment multiple times without the original container being aware of it. Forensic container checkpointing was introduced as an alpha feature in Kubernetes v1.25. This article will guide you on how to deploy Golang code that can be used to take a container checkpoint using an API. The code takes a pod identifier as input, retrieves the container ID from containerd, and then uses the ctr command to checkpoint the specific container in containerd's k8s.io namespace: Prerequisites Kubernetes cluster The ctr command-line tool: check whether you can run ctr commands on the kubelet or worker node; if not, install it or adjust the AMI to include ctr.
kubectl configured to communicate with your clusterDocker installed locallyAccess to a container registry (e.g., Docker Hub, ECR)Helm (for installing Nginx Ingress Controller) Step 0: Code to Create Container Checkpoint Using GO Create a file named checkpoint_container.go with the following content: Go package main import ( "context" "fmt" "log" "os" "os/exec" "strings" "github.com/aws/aws-sdk-go/aws" "github.com/aws/aws-sdk-go/aws/session" "github.com/aws/aws-sdk-go/service/ecr" "github.com/containerd/containerd" "github.com/containerd/containerd/namespaces" ) func init() { log.SetOutput(os.Stdout) log.SetFlags(log.Ldate | log.Ltime | log.Lmicroseconds | log.Lshortfile) } func main() { if len(os.Args) < 4 { log.Fatal("Usage: checkpoint_container <pod_identifier> <ecr_repo> <aws_region>") } podID := os.Args[1] ecrRepo := os.Args[2] awsRegion := os.Args[3] log.Printf("Starting checkpoint process for pod %s", podID) containerID, err := getContainerIDFromPod(podID) if err != nil { log.Fatalf("Error getting container ID: %v", err) } err = processContainerCheckpoint(containerID, ecrRepo, awsRegion) if err != nil { log.Fatalf("Error processing container checkpoint: %v", err) } log.Printf("Successfully checkpointed container %s and pushed to ECR", containerID) } func getContainerIDFromPod(podID string) (string, error) { log.Printf("Searching for container ID for pod %s", podID) client, err := containerd.New("/run/containerd/containerd.sock") if err != nil { return "", fmt.Errorf("failed to connect to containerd: %v", err) } defer client.Close() ctx := namespaces.WithNamespace(context.Background(), "k8s.io") containers, err := client.Containers(ctx) if err != nil { return "", fmt.Errorf("failed to list containers: %v", err) } for _, container := range containers { info, err := container.Info(ctx) if err != nil { continue } if strings.Contains(info.Labels["io.kubernetes.pod.uid"], podID) { log.Printf("Found container ID %s for pod %s", container.ID(), podID) return container.ID(), nil } } return "", fmt.Errorf("container not found for pod %s", podID) } func processContainerCheckpoint(containerID, ecrRepo, region string) error { log.Printf("Processing checkpoint for container %s", containerID) checkpointPath, err := createCheckpoint(containerID) if err != nil { return err } defer os.RemoveAll(checkpointPath) imageName, err := convertCheckpointToImage(checkpointPath, ecrRepo, containerID) if err != nil { return err } err = pushImageToECR(imageName, region) if err != nil { return err } return nil } func createCheckpoint(containerID string) (string, error) { log.Printf("Creating checkpoint for container %s", containerID) checkpointPath := "/tmp/checkpoint-" + containerID cmd := exec.Command("ctr", "-n", "k8s.io", "tasks", "checkpoint", containerID, "--checkpoint-path", checkpointPath) output, err := cmd.CombinedOutput() if err != nil { return "", fmt.Errorf("checkpoint command failed: %v, output: %s", err, output) } log.Printf("Checkpoint created at: %s", checkpointPath) return checkpointPath, nil } func convertCheckpointToImage(checkpointPath, ecrRepo, containerID string) (string, error) { log.Printf("Converting checkpoint to image for container %s", containerID) imageName := ecrRepo + ":checkpoint-" + containerID cmd := exec.Command("buildah", "from", "scratch") containerId, err := cmd.Output() if err != nil { return "", fmt.Errorf("failed to create container: %v", err) } cmd = exec.Command("buildah", "copy", string(containerId), checkpointPath, "/") err = cmd.Run() if err != nil { return "", 
fmt.Errorf("failed to copy checkpoint: %v", err) } cmd = exec.Command("buildah", "commit", string(containerId), imageName) err = cmd.Run() if err != nil { return "", fmt.Errorf("failed to commit image: %v", err) } log.Printf("Created image: %s", imageName) return imageName, nil } func pushImageToECR(imageName, region string) error { log.Printf("Pushing image %s to ECR in region %s", imageName, region) sess, err := session.NewSession(&aws.Config{ Region: aws.String(region), }) if err != nil { return fmt.Errorf("failed to create AWS session: %v", err) } svc := ecr.New(sess) authToken, registryURL, err := getECRAuthorizationToken(svc) if err != nil { return err } err = loginToECR(authToken, registryURL) if err != nil { return err } cmd := exec.Command("podman", "push", imageName) err = cmd.Run() if err != nil { return fmt.Errorf("failed to push image to ECR: %v", err) } log.Printf("Successfully pushed checkpoint image to ECR: %s", imageName) return nil } func getECRAuthorizationToken(svc *ecr.ECR) (string, string, error) { log.Print("Getting ECR authorization token") output, err := svc.GetAuthorizationToken(&ecr.GetAuthorizationTokenInput{}) if err != nil { return "", "", fmt.Errorf("failed to get ECR authorization token: %v", err) } authData := output.AuthorizationData[0] log.Print("Successfully retrieved ECR authorization token") return *authData.AuthorizationToken, *authData.ProxyEndpoint, nil } func loginToECR(authToken, registryURL string) error { log.Printf("Logging in to ECR at %s", registryURL) cmd := exec.Command("podman", "login", "--username", "AWS", "--password", authToken, registryURL) err := cmd.Run() if err != nil { return fmt.Errorf("failed to login to ECR: %v", err) } log.Print("Successfully logged in to ECR") return nil } Step 1: Initialize the go Module Shell go mod init checkpoint_container Modify the go.mod file: Go module checkpoint_container go 1.23 require ( github.com/aws/aws-sdk-go v1.44.298 github.com/containerd/containerd v1.7.2 ) require ( github.com/jmespath/go-jmespath v0.4.0 // indirect github.com/opencontainers/go-digest v1.0.0 // indirect github.com/opencontainers/image-spec v1.1.0-rc2.0.20221005185240-3a7f492d3f1b // indirect github.com/pkg/errors v0.9.1 // indirect google.golang.org/genproto v0.0.0-20230306155012-7f2fa6fef1f4 // indirect google.golang.org/grpc v1.53.0 // indirect google.golang.org/protobuf v1.30.0 // indirect ) Run the following command: Shell go mod tidy Step 2: Build and Publish Docker Image Create a Dockerfile in the same directory: Dockerfile # Build stage FROM golang:1.20 as builder WORKDIR /app COPY . . RUN CGO_ENABLED=0 GOOS=linux go build -o checkpoint_container # Final stage FROM amazonlinux:2 # Install necessary tools RUN yum update -y && \ amazon-linux-extras install -y docker && \ yum install -y awscli containerd skopeo && \ yum clean all # Copy the built Go binary COPY --from=builder /app/checkpoint_container /usr/local/bin/checkpoint_container EXPOSE 8080 ENTRYPOINT ["checkpoint_container"] This Dockerfile does the following: Uses golang:1.20 as the build stage to compile your Go application.Uses amazonlinux:2 as the final base image.Installs the AWS CLI, Docker (which includes containerd), and skopeo using yum and amazon-linux-extras.Copies the compiled Go binary from the build stage. Shell docker build -t <your-docker-repo>/checkpoint-container:v1 . docker push <your-docker-repo>/checkpoint-container:v1 Replace <your-docker-repo> with your actual Docker repository. 
Step 3: Apply the RBAC Resources Create a file named rbac.yaml: YAML apiVersion: v1 kind: ServiceAccount metadata: name: checkpoint-sa namespace: default --- apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: name: checkpoint-role namespace: default rules: - apiGroups: [""] resources: ["pods"] verbs: ["get", "list"] --- apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: name: checkpoint-rolebinding namespace: default subjects: - kind: ServiceAccount name: checkpoint-sa namespace: default roleRef: kind: Role name: checkpoint-role apiGroup: rbac.authorization.k8s.io Apply the RBAC resources: Shell kubectl apply -f rbac.yaml Step 4: Create a Kubernetes Deployment Create a file named deployment.yaml: YAML apiVersion: apps/v1 kind: Deployment metadata: name: main-app namespace: default spec: replicas: 1 selector: matchLabels: app: main-app template: metadata: labels: app: main-app spec: serviceAccountName: checkpoint-sa containers: - name: main-app image: nginx:latest # Replace with your main application image - name: checkpoint-sidecar image: <your-docker-repo>/checkpoint-container:v1 ports: - containerPort: 8080 securityContext: privileged: true volumeMounts: - name: containerd-socket mountPath: /run/containerd/containerd.sock volumes: - name: containerd-socket hostPath: path: /run/containerd/containerd.sock type: Socket In deployment.yaml, update the following to point at your image: YAML image: <your-docker-repo>/checkpoint-container:v1 Apply the deployment: Shell kubectl apply -f deployment.yaml Step 5: Kubernetes Service Create a file named service.yaml: YAML apiVersion: v1 kind: Service metadata: name: checkpoint-service namespace: default spec: selector: app: main-app ports: - protocol: TCP port: 80 targetPort: 8080 Apply the service: Shell kubectl apply -f service.yaml Step 6: Install the Nginx Ingress Controller Shell helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx helm repo update helm install ingress-nginx ingress-nginx/ingress-nginx Step 7: Create Ingress Resource Create a file named ingress.yaml: YAML apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: checkpoint-ingress annotations: kubernetes.io/ingress.class: nginx nginx.ingress.kubernetes.io/ssl-redirect: "false" spec: rules: - http: paths: - path: /checkpoint pathType: Prefix backend: service: name: checkpoint-service port: number: 80 Apply the Ingress: Shell kubectl apply -f ingress.yaml Step 8: Test the API Shell kubectl get services ingress-nginx-controller Shell curl -X POST http://<EXTERNAL-IP>/checkpoint \ -H "Content-Type: application/json" \ -d '{"podId": "your-pod-id", "ecrRepo": "your-ecr-repo", "awsRegion": "your-aws-region"}' Replace <EXTERNAL-IP> with the actual external IP. Additional Considerations Security. Implement HTTPS by setting up TLS certificates, and add authentication to the API. Monitoring. Set up logging and monitoring for the API and checkpoint process. Resource management. Configure resource requests and limits for the sidecar container. Error handling. Implement robust error handling in the Go application. Testing. Thoroughly test the setup in a non-production environment before deploying it to production. Documentation. Maintain clear documentation on how to use the checkpoint API. Conclusion This setup deploys the checkpoint container as a sidecar in Kubernetes and exposes its functionality through an API accessible from outside the cluster. It provides a flexible solution for managing container checkpoints in a Kubernetes environment.
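Because the sidecar exposes checkpointing over plain HTTP, the curl call from Step 8 can also be scripted for automation. Below is a minimal sketch in Python, not part of the original setup: the endpoint placeholder, timeout, and field values are illustrative, it assumes the /checkpoint path and JSON body shown in Step 8, and it requires the third-party requests package.

Python
import requests

# Hypothetical endpoint; replace <EXTERNAL-IP> with your Ingress address.
CHECKPOINT_URL = "http://<EXTERNAL-IP>/checkpoint"

payload = {
    "podId": "your-pod-id",          # pod identifier passed to the sidecar
    "ecrRepo": "your-ecr-repo",      # target ECR repository for the checkpoint image
    "awsRegion": "your-aws-region",  # region used for the ECR push
}

# POST the same JSON body used in the curl example; requests sets the Content-Type header.
response = requests.post(CHECKPOINT_URL, json=payload, timeout=300)
response.raise_for_status()
print("Checkpoint request accepted:", response.status_code)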
AWS/EKS Specific Step 7: Install the AWS Load Balancer Controller Instead of using the Nginx Ingress Controller, we'll use the AWS Load Balancer Controller. This controller will create and manage ALBs for our Ingress resources. 1. Add the EKS chart repo to Helm: Shell helm repo add eks https://aws.github.io/eks-charts 2. Install the AWS Load Balancer Controller: Shell helm install aws-load-balancer-controller eks/aws-load-balancer-controller \ -n kube-system \ --set clusterName=<your-cluster-name> \ --set serviceAccount.create=false \ --set serviceAccount.name=aws-load-balancer-controller Replace <your-cluster-name> with your EKS cluster name. Note: Ensure that you have the necessary IAM permissions set up for the AWS Load Balancer Controller. You can find the detailed IAM policy in the AWS documentation. Step 8: Create Ingress Resource Create a file named ingress.yaml: YAML apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: checkpoint-ingress annotations: kubernetes.io/ingress.class: alb alb.ingress.kubernetes.io/scheme: internet-facing alb.ingress.kubernetes.io/target-type: ip spec: rules: - http: paths: - path: /checkpoint pathType: Prefix backend: service: name: checkpoint-service port: number: 80 Apply the Ingress: Shell kubectl apply -f ingress.yaml Step 9: Test the API 1. Get the ALB DNS name: Shell kubectl get ingress checkpoint-ingress Look for the ADDRESS field, which will be the ALB's DNS name. 2. Send a test request: Shell curl -X POST http://<ALB-DNS-NAME>/checkpoint \ -H "Content-Type: application/json" \ -d '{"podId": "your-pod-id", "ecrRepo": "your-ecr-repo", "awsRegion": "your-aws-region"}' Replace <ALB-DNS-NAME> with the actual DNS name of your ALB from step 1. Additional Considerations for AWS ALB 1. Security groups. The ALB will have a security group automatically created. Ensure it allows inbound traffic on port 80 (and 443 if you set up HTTPS). 2. SSL/TLS: To enable HTTPS, you can add the following annotations to your Ingress: YAML alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]' alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:region:account-id:certificate/certificate-id 3. Access logs. Enable access logs for your ALB by adding the following: YAML alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket=your-log-bucket,access_logs.s3.prefix=your-log-prefix 4. WAF integration. If you want to use AWS WAF with your ALB, you can add: YAML alb.ingress.kubernetes.io/waf-acl-id: your-waf-web-acl-id 5. Authentication. You can set up authentication using Amazon Cognito or OIDC by using the appropriate ALB Ingress Controller annotations. These changes will set up your Ingress using an AWS Application Load Balancer instead of Nginx. The ALB Ingress Controller will automatically provision and configure the ALB based on your Ingress resource. Conclusion Remember to ensure that your EKS cluster has the necessary IAM permissions to create and manage ALBs. This typically involves creating an IAM policy and a service account with the appropriate permissions. This setup will now use AWS's native load-balancing solution, which integrates well with other AWS services and can be more cost-effective in an AWS environment.
In the digital age, the ability to find relevant information quickly and accurately has become increasingly critical. From simple web searches to complex enterprise knowledge management systems, search technology has evolved dramatically to meet growing demands. This article explores the journey from index-based basic search engines to retrieval-based generation, examining how modern techniques are revolutionizing information access. The Foundation: Traditional Search Systems Traditional search systems were built on relatively simple principles: matching keywords and ranking results based on relevance, user signals, frequency, positioning, and many more. While effective for basic queries, these systems faced significant limitations. They struggled with understanding context, handling complex multi-part queries, resolving indirect references, performing nuanced reasoning, and providing user-specific personalization. These limitations became particularly apparent in enterprise settings, where information retrieval needs to be both precise and comprehensive. Python from collections import defaultdict import math class BasicSearchEngine: def __init__(self): self.index = defaultdict(list) self.document_freq = defaultdict(int) self.total_docs = 0 def add_document(self, doc_id, content): # Simple tokenization terms = content.lower().split() # Build inverted index for position, term in enumerate(terms): self.index[term].append((doc_id, position)) # Update document frequencies unique_terms = set(terms) for term in unique_terms: self.document_freq[term] += 1 self.total_docs += 1 def search(self, query): terms = query.lower().split() scores = defaultdict(float) for term in terms: if term in self.index: idf = math.log(self.total_docs / self.document_freq[term]) for doc_id, position in self.index[term]: tf = 1 # Simple TF scoring scores[doc_id] += tf * idf return sorted(scores.items(), key=lambda x: x[1], reverse=True) # Usage example search_engine = BasicSearchEngine() search_engine.add_document("doc1", "Traditional search systems use keywords") search_engine.add_document("doc2", "Modern systems employ advanced techniques") results = search_engine.search("search systems") Enterprise Search: Bridging the Gap Enterprise search introduced new complexities and requirements that consumer search engines weren't designed to handle. Organizations needed systems that could search across diverse data sources, respect complex access controls, understand domain-specific terminology, and maintain context across different document types. These challenges drove the development of more sophisticated retrieval techniques, setting the stage for the next evolution in search technology. The Paradigm Shift: From Document Retrieval to Answer Generation The landscape of information access underwent a dramatic transformation in early 2023 with the widespread adoption of large language models (LLMs) and the emergence of retrieval-augmented generation (RAG). Traditional search systems, which primarily focused on returning relevant documents, were no longer sufficient. Instead, organizations needed systems that could not only find relevant information but also provide it in a format that LLMs could effectively use to generate accurate, contextual responses. 
This shift was driven by several key developments: The emergence of powerful embedding models that could capture semantic meaning more effectively than keyword-based approaches The development of efficient vector databases that could store and query these embeddings at scale The recognition that LLMs, while powerful, needed accurate and relevant context to provide reliable responses The traditional retrieval problem thus evolved into an intelligent, contextual answer generation problem, where the goal wasn't just to find relevant documents, but to identify and extract the most pertinent pieces of information that could be used to augment LLM prompts. This new paradigm required rethinking how we chunk, store, and retrieve information, leading to the development of more sophisticated ingestion and retrieval techniques. Python import numpy as np from transformers import AutoTokenizer, AutoModel import torch class ModernRetrievalSystem: def __init__(self, model_name="sentence-transformers/all-MiniLM-L6-v2"): self.tokenizer = AutoTokenizer.from_pretrained(model_name) self.model = AutoModel.from_pretrained(model_name) self.document_store = {} def _get_embedding(self, text: str) -> np.ndarray: """Generate embedding for a text snippet""" inputs = self.tokenizer(text, return_tensors="pt", max_length=512, truncation=True, padding=True) with torch.no_grad(): outputs = self.model(**inputs) embedding = outputs.last_hidden_state[:, 0, :].numpy() return embedding[0] def chunk_document(self, text: str, chunk_size: int = 512) -> list: """Implement late chunking strategy""" # Get document-level embedding first doc_embedding = self._get_embedding(text) # Chunk the document words = text.split() chunks = [] current_chunk = [] current_length = 0 for word in words: word_length = len(self.tokenizer.encode(word)) if current_length + word_length > chunk_size: chunks.append(" ".join(current_chunk)) current_chunk = [word] current_length = word_length else: current_chunk.append(word) current_length += word_length if current_chunk: chunks.append(" ".join(current_chunk)) return chunks def add_document(self, doc_id: str, content: str): """Process and store document with context-aware chunking""" chunks = self.chunk_document(content) for i, chunk in enumerate(chunks): context = f"Document: {doc_id}, Chunk: {i+1}/{len(chunks)}" enriched_chunk = f"{context}\n\n{chunk}" embedding = self._get_embedding(enriched_chunk) self.document_store[f"{doc_id}_chunk_{i}"] = { "content": chunk, "context": context, "embedding": embedding } The Rise of Modern Retrieval Systems An Overview of Modern Retrieval Using Embedding Models Modern retrieval systems employ a two-phase approach to efficiently access relevant information. During the ingestion phase, documents are intelligently split into meaningful chunks, which preserve context and document structure. These chunks are then transformed into high-dimensional vector representations (embeddings) using neural models and stored in specialized vector databases. During retrieval, the system converts the user's query into an embedding using the same neural model and then searches the vector database for chunks whose embeddings have the highest cosine similarity to the query embedding. This similarity-based approach allows the system to find semantically relevant content even when exact keyword matches aren't present, making retrieval more robust and context-aware than traditional search methods. 
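The ModernRetrievalSystem class above only covers the ingestion phase. As a rough illustration of the retrieval phase just described, the sketch below (an addition for illustration, not part of the original code) embeds the query with the same model and ranks stored chunks by cosine similarity; the search function name and the sample document are assumptions.

Python
import numpy as np

def search(system: "ModernRetrievalSystem", query: str, top_k: int = 5):
    """Rank stored chunks by cosine similarity between query and chunk embeddings."""
    query_embedding = system._get_embedding(query)
    scores = []
    for chunk_id, record in system.document_store.items():
        chunk_embedding = record["embedding"]
        # Cosine similarity; small epsilon guards against zero-norm vectors
        similarity = np.dot(query_embedding, chunk_embedding) / (
            np.linalg.norm(query_embedding) * np.linalg.norm(chunk_embedding) + 1e-10
        )
        scores.append((chunk_id, float(similarity)))
    return sorted(scores, key=lambda x: x[1], reverse=True)[:top_k]

# Usage example
retriever = ModernRetrievalSystem()
retriever.add_document("doc1", "Embeddings capture semantic meaning for retrieval.")
results = search(retriever, "semantic search with embeddings")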
At the heart of these modern systems lies the critical process of document chunking and retrieval from embeddings, which has evolved significantly over time. Evolution of Document Ingestion The foundation of modern retrieval systems starts with document chunking — breaking down large documents into manageable pieces. This critical process has evolved from basic approaches to more sophisticated techniques: Traditional Chunking Document chunking began with two fundamental approaches: Fixed-size chunking. Documents are split into chunks of exactly specified token length (e.g., 256 or 512 tokens), with configurable overlap between consecutive chunks to maintain context. This straightforward approach ensures consistent chunk sizes but may break natural textual units. Semantic chunking. A more sophisticated approach that respects natural language boundaries while maintaining approximate chunk sizes. This method analyzes the semantic coherence between sentences and paragraphs to create more meaningful chunks Drawbacks of Traditional Chunking Consider an academic research paper split into 512-token chunks. The abstract might be split midway into two chunks, disconnecting the context of its introduction and conclusions. A retrieval model would struggle to identify the abstract as a cohesive unit, potentially missing the paper’s central theme. In contrast, semantic chunking may keep the abstract intact but might struggle with other sections, such as cross-referencing between the discussion and conclusion. These sections might end up in separate chunks, and the links between them could still be missed. Late Chunking: A Revolutionary Approach Legal documents, such as contracts, frequently contain references to clauses defined in other sections. Consider a 50-page employment contract where Section 2 states, 'The Employee shall be subject to the non-compete obligations detailed in Schedule A' while Schedule A, appearing 40 pages later, contains the actual restrictions like 'may not work for competing firms within 100 miles.' If someone searches for 'what are the non-compete restrictions?', traditional chunking that processes sections separately would likely miss this connection — the chunk with Section 2 lacks the actual restrictions, while the Schedule A chunk lacks the context that these are employee obligations Traditional chunking methods would likely split these references across chunks, making it difficult for retrieval models to maintain context. Late chunking, by embedding the entire document first, captures these cross-references seamlessly, enabling precise extraction of relevant clauses during a legal search. Late chunking represents a significant advancement in how we process documents for retrieval. 
Unlike traditional methods that chunk documents before processing, late chunking: First, processes the entire document through a long context embedding model Creates embeddings that capture the full document context Only then applies chunking boundaries to create final chunk representations This approach offers several advantages: Preserves long-range dependencies between different parts of the document Maintains context across chunk boundaries Improves handling of references and contextual elements Late chunking is particularly effective when combined with reranking strategies, where it has been shown to reduce retrieval failure rates by up to 49% Contextual Enablement: Adding Intelligence to Chunks Consider a 30-page annual financial report where critical information is distributed across different sections. The Executive Summary might mention "ACMECorp achieved significant growth in the APAC region," while the Regional Performance section states, "Revenue grew by 45% year-over-year," the Risk Factors section notes, "Currency fluctuations impacted reported earnings," and the Footnotes clarify "All APAC growth figures are reported in constant currency, excluding the acquisition of TechFirst Ltd." Now, imagine a query like "What was ACME's organic revenue growth in APAC?" A basic chunking system might return just the "45% year-over-year" chunk because it matches "revenue" and "growth." However, this would be misleading as it fails to capture critical context spread across the document: that this growth number includes an acquisition, that currency adjustments were made, and that the number is specifically for APAC. A single chunk in isolation could lead to incorrect conclusions or decisions — someone might cite the 45% as organic growth in investor presentations when, in reality, a significant portion came from M&A activity. One of the major limitations of basic chunking is the loss of context. This method aims to solve that context problem by adding relevant context to each chunk before processing. The process works by: Analyzing the original document to understand the broader context Generating concise, chunk-specific context (typically 50-100 tokens) Prepending this context to each chunk before creating embeddings Using both semantic embeddings and lexical matching (BM25) for retrieval This technique has shown impressive results, reducing retrieval failure rates by up to 49% in some implementations. Evolution of Retrieval Retrieval methods have seen dramatic advancement from simple keyword matching to today's sophisticated neural approaches. Early systems like BM25 relied on statistical term-frequency methods, matching query terms to documents based on word overlap and importance weights. The rise of deep learning brought dense retrieval methods like DPR (Dense Passage Retriever), which could capture semantic relationships by encoding both queries and documents into vector spaces. This enabled matching based on meaning rather than just lexical overlap. More recent innovations have pushed retrieval capabilities further. Hybrid approaches combining sparse (BM25) and dense retrievers help capture both exact matches and semantic similarity. The introduction of cross-encoders allowed for more nuanced relevance scoring by analyzing query-document pairs together rather than independently. With the emergence of large language models, retrieval systems gained the ability to understand and reason about content in increasingly sophisticated ways. 
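To make the hybrid idea concrete, here is a minimal sketch that fuses BM25 scores with dense cosine similarities through a simple weighted sum. It is illustrative only: it assumes the third-party rank_bm25 package, an embed() function such as the _get_embedding method shown earlier, and an arbitrary default weight of 0.5; production systems typically pass the fused candidates to a cross-encoder reranker as described above.

Python
import numpy as np
from rank_bm25 import BM25Okapi  # assumed third-party dependency

def hybrid_search(query: str, documents: list, embed, alpha: float = 0.5, top_k: int = 3):
    """Combine sparse (BM25) and dense (embedding) relevance scores."""
    # Sparse scores from term-frequency statistics
    tokenized_docs = [doc.lower().split() for doc in documents]
    bm25 = BM25Okapi(tokenized_docs)
    sparse_scores = np.array(bm25.get_scores(query.lower().split()))

    # Dense scores from cosine similarity in embedding space
    query_vec = embed(query)
    doc_vecs = np.array([embed(doc) for doc in documents])
    dense_scores = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-10
    )

    # Min-max normalize each signal so the weighted sum is balanced
    def normalize(scores):
        span = scores.max() - scores.min()
        return (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)

    combined = alpha * normalize(sparse_scores) + (1 - alpha) * normalize(dense_scores)
    ranked = np.argsort(combined)[::-1][:top_k]
    return [(documents[i], float(combined[i])) for i in ranked]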
Recursive Retrieval: Understanding Relationships Recursive retrieval advances the concept further by exploring relationships between different pieces of content. Instead of treating each chunk as an independent unit, it recognizes that chunks often have meaningful relationships with other chunks or structured data sources. Consider a real-world example of a developer searching for help with a memory leak in a Node.js application: 1. Initial Query "Memory leak in Express.js server handling file uploads." The system first retrieves high-level bug report summaries with similar symptoms. A matching bug summary describes: "Memory usage grows continuously when processing multiple file uploads" 2. First Level Recursion From this summary, the system follows relationships to: Detailed error logs showing memory patterns Similar bug reports with memory profiling data Discussion threads about file upload memory management 3. Second Level Recursion Following the technical discussions, the system retrieves: Code snippets showing proper stream handling in file uploads Memory leak fixes in similar scenarios Relevant middleware configurations 4. Final Level Recursion For implementation, it retrieves: Actual code commit diffs that fixed similar issues Unit tests validating the fixes Performance benchmarks before and after fixes At each level, the retrieval becomes more specific and technical, following the natural progression from problem description to solution implementation. This layered approach helps developers not only find solutions but also understand the underlying causes and verification methods. This example demonstrates how recursive retrieval can create a comprehensive view of a problem and its solution by traversing relationships between different types of content. Other applications might include: A high-level overview chunk linking to detailed implementation chunks A summary chunk referencing an underlying database table A concept explanation connecting to related code examples During retrieval, the system not only finds the most relevant chunks but also explores these relationships to gather comprehensive context. A Special Case of Recursive Retrieval Hierarchical chunking represents a specialized implementation of recursive retrieval, where chunks are organized in a parent-child relationship.
The system maintains multiple levels of chunks: Parent chunks – larger pieces providing a broader context Child chunks – smaller, more focused pieces of content The beauty of this approach lies in its flexibility during retrieval: Initial searches can target precise child chunks The system can then "zoom out" to include parent chunks for additional context Overlap between chunks can be carefully managed at each level Python import networkx as nx from typing import Set, Dict, List class RecursiveRetriever: def __init__(self, base_retriever): self.base_retriever = base_retriever self.relationship_graph = nx.DiGraph() def add_relationship(self, source_id: str, target_id: str, relationship_type: str): """Add a relationship between chunks""" self.relationship_graph.add_edge(source_id, target_id, relationship_type=relationship_type) def recursive_search(self, query: str, max_depth: int = 2) -> Dict[str, List[str]]: """Perform recursive retrieval""" results = {} visited = set() # Get initial results initial_results = self.base_retriever.search(query) first_level_ids = [doc_id for doc_id, _ in initial_results] results["level_0"] = first_level_ids visited.update(first_level_ids) # Recursively explore relationships for depth in range(max_depth): current_level_results = [] for doc_id in results[f"level_{depth}"]: related_docs = self._get_related_documents(doc_id, visited) current_level_results.extend(related_docs) visited.update(related_docs) if current_level_results: results[f"level_{depth + 1}"] = current_level_results return results # Usage example retriever = ModernRetrievalSystem() recursive = RecursiveRetriever(retriever) # Add relationships recursive.add_relationship("doc1_chunk_0", "doc2_chunk_0", "related_concept") results = recursive.recursive_search("modern retrieval techniques") Putting It All Together: Modern Retrieval Architecture Modern retrieval systems often combine multiple techniques to achieve optimal results. A typical architecture might: Use hierarchical chunking to maintain document structure Apply contextual embeddings to preserve semantic meaning Implement recursive retrieval to explore relationships Employ reranking to fine-tune results This combination can reduce retrieval failure rates by up to 67% compared to basic approaches. Multi-Modal Retrieval: Beyond Text As organizations increasingly deal with diverse content types, retrieval systems have evolved to handle multi-modal data effectively. The challenge extends beyond simple text processing to understanding and connecting information across images, audio, and video formats. The Multi-Modal Challenge Multi-modal retrieval faces two fundamental challenges: 1. Modality-Specific Complexity Each type of content presents unique challenges. Images, for instance, can range from simple photographs to complex technical diagrams, each requiring different processing approaches. A chart or graph might contain dense information that requires specialized understanding. 2. Cross-Modal Understanding Perhaps the most significant challenge is understanding relationships between different modalities. How does an image relate to its surrounding text? How can we connect a technical diagram with its explanation? These relationships are crucial for accurate retrieval. Solutions and Approaches Modern systems address these challenges through three main approaches: 1. 
Unified Embedding Space Uses models like CLIP to encode all content types in a single vector space Enables direct comparison between different modalities Simplifies retrieval but may sacrifice some nuanced understanding 2. Text-Centric Transformation Converts all content into text representations Leverages advanced language models for understanding Works well for text-heavy applications but may lose modal-specific details 3. Hybrid Processing Maintains specialized processing for each modality Uses sophisticated reranking to combine results Achieves better accuracy at the cost of increased complexity The choice of approach depends heavily on specific use cases and requirements, with many systems employing a combination of techniques to achieve optimal results. Looking Forward: The Future of Retrieval As AI and machine learning continue to advance, retrieval systems are becoming increasingly sophisticated. Future developments might include: More nuanced understanding of document structure and relationships Better handling of multi-modal content (text, images, video) Improved context preservation across different types of content More efficient processing of larger knowledge bases Conclusion The evolution from basic retrieval to answer generation systems reflects our growing need for more intelligent information access. Organizations can build more effective knowledge management systems by understanding and implementing techniques like contextual retrieval, recursive retrieval, and hierarchical chunking. As these technologies continue to evolve, we can expect even more sophisticated approaches to emerge, further improving our ability to find and utilize information effectively.