Kubernetes in the Enterprise
In 2014, Kubernetes' first commit was pushed to GitHub. Ten years later, it is one of the most prolific open-source systems in the software development space. So what made Kubernetes so deeply entrenched within organizations' systems architectures? Its promise of scale, speed, and delivery — and Kubernetes isn't going anywhere any time soon.

DZone's fifth annual Kubernetes in the Enterprise Trend Report dives further into the nuances and evolving requirements of the now 10-year-old platform. Our original research explored topics such as architectural evolutions in Kubernetes, emerging cloud security threats, advancements in Kubernetes monitoring and observability, and the impact and influence of AI; the results are featured in the research findings.

As we celebrate a decade of Kubernetes, we also look toward ushering in its future, discovering how developers and other Kubernetes practitioners are guiding the industry toward a new era. In the report, you'll find insights from several of our community experts; these practitioners guide essential discussions around mitigating the Kubernetes threat landscape, observability lessons learned from running Kubernetes, considerations for effective AI/ML Kubernetes deployments, and much more.
Are you a software developer or other tech professional? If you’re reading this, chances are pretty good that the answer is "yes." Long story short — we want DZone to work for you! We're asking that you take our annual community survey so we can better serve you! You can also enter the drawing for a chance to receive an exclusive DZone Swag Pack! The software development world moves fast, and we want to keep up! Across our community, we found that readers come to DZone for various reasons, including to learn about new development trends and technologies, find answers to help solve problems they have, connect with other peers, publish their content, and expand their personal brand's audience. In order to continue helping the DZone Community reach goals such as these, we need to know more about you, your learning preferences, and your overall experience on dzone.com and with the DZone team. For this year's DZone Community research, our primary goals are to: Learn about developer tech preferences and habits Identify content types and topics that developers want to get more information on Share this data for public consumption! To support our Community research, we're focusing on several primary areas in the survey: You, including your experience, the types of software you work on, and the tools you use How you prefer to learn and what you want to learn more about on dzone.com The ways in which you engage with DZone, your content likes vs. dislikes, and your overall journey on dzone.com As a community-driven site, our relationships with our members and contributors are invaluable, and we want to make sure that we continue to serve our audience to the best of our ability. If you're curious to see the report from the 2023 Community survey, feel free to check it out here! Thank you in advance for your participation! — Your favorite DZone Content and Community team
These days, restaurants, food banks, home kitchens, and any other business that deals with products and foods that go bad quickly need to have good food inventory management. Kitchens stay organized and waste is kept to a minimum by keeping track of stock, checking expiration dates, and managing usage well. I will show you how to make a Food Inventory Management App in this guide. With this app, users can: Add food items or ingredients to the inventory. Monitor the quantity of each item. Remove items when they’re used or expired. Optionally, generate recipes or suggest uses for the items. The Food Inventory Management App will not only track food items but also generate recipes based on the available ingredients using a Hugging Face model. I will use Next.js for the front end, Material-UI for the user interface, Firebase Firestore for real-time database functionality, and a Hugging Face model for recipe generation. Setting Up the Environment for Development We need to set up our working environment before we start writing code for our Food Inventory Management App. 1. Install Node.js and npm The first step is to install Node.js and npm. Go to the Node.js website and get the Long Term Support version for your computer's running system. Follow the steps given for installation. 2. Making a Project With Next.js Start up your terminal and go to the location where you want to make your project. After that, run these commands: npx create-next-app@latest food-inventory-management-app (With the @latest flag, npm gets the most recent version of the Next.js starting setup.) cd food-inventory-management-app It will make a new Next.js project and take you to its path. You'll be given a number of configuration choices during the setup process, set them as given below: Would you like to use TypeScript? No Would you like to use ESLint? Yes Would you like to use Tailwind CSS? No Would you like to use the src/ directory? No Would you like to use App Router? Yes Would you like to customize the default import alias? No 3. Installing Firebase and Material-UI In the directory of your project, execute the following command: npm install @mui/material @emotion/react @emotion/styled firebase Setting Up Firebase Launch a new project on the Firebase Console. Click "Add app" after your project has been built, then choose the web platform (</>). Give your app a name when you register it, such as "Food Inventory Management App." Make a copy of the Firebase setup file. Afterwards, this will be useful. 4. Create a Firebase Configuration File Make a new file called firebase.js in the root directory of your project and add the following code, replacing the placeholders with the real Firebase settings for your project: JavaScript import { initializeApp } from 'firebase/app'; import { getFirestore } from 'firebase/firestore'; const firebaseConfig = { apiKey: "YOUR_API_KEY", authDomain: "YOUR_PROJECT_ID.firebaseapp.com", projectId: "YOUR_PROJECT_ID", storageBucket: "YOUR_PROJECT_ID.appspot.com", messagingSenderId: "YOUR_MESSAGING_SENDER_ID", appId: "YOUR_APP_ID" }; const app = initializeApp(firebaseConfig); const db = getFirestore(app); export { db }; Building a Flask API for Recipe Generation Using Hugging Face I'll show you how to make a Flask-based API that uses a Hugging Face model to make recipes. With a POST request, users will be able to send ingredients to the API. It will then use a pre-trained model from Hugging Face to return a recipe based on those ingredients. 
We will use environment variables to safely handle Hugging Face tokens. 1. Setting Up the Python Environment Install Python if not already present (brew install python). Verify installation (python3 --version). Install dependencies (pip install Flask flask-cors transformers huggingface_hub). 2. Setting Up the Hugging Face Token Go to the Hugging Face website. If you already have an account, click on Sign In. If not, click Sign Up to create a new account. Navigate to the dropdown menu, and select Settings. In the Settings menu, look for the Access Tokens tab on the left side of the page and click on it. Under the Access Tokens section, you will see a button to create a new token. Click on New Token. Give your token a descriptive name (e.g., "Food Inventory App Token"). Choose Read as the token scope for basic access to models and datasets. If you need write access for uploading models or data, choose Write. Click Generate Token. The token will be displayed on the screen. After generating the token, copy it. Make sure to save it in a secure place, as you will need it for authentication when making API calls. 3. Keeping Hugging Face API Token Safe Your Hugging Face API token should be kept safely in an environment variable instead of being written in your script as code. To do this: Create an .env file in the root of your project: (touch .env). Inside this file, add your Hugging Face token (HF_TOKEN=your_hugging_face_token_here). Load this environment variable securely in your Flask app using Python’s os module. Python import os huggingface_token = os.getenv('HF_TOKEN') 4. Building the Flask API Flask app with Hugging Face's recipe generation model (note: this sample model is free). Name the file as backend.py. Python import os from flask import Flask, request, jsonify from flask_cors import CORS from huggingface_hub import login from transformers import pipeline app = Flask(__name__) CORS(app) # Securely get the Hugging Face token from the environment huggingface_token = os.getenv('HF_TOKEN') if huggingface_token: login(token=huggingface_token) # Load Hugging Face food recipe model pipeline try: model_name = "flax-community/t5-recipe-generation" recipe_generator = pipeline("text2text-generation", model=model_name) except Exception as e: print(f"Error loading model: {e}") recipe_generator = None @app.route('/generate_recipe', methods=['POST']) def generate_recipe(): data = request.json print("Hello") ingredients = data.get('ingredients') if not ingredients: return jsonify({"error": "Ingredients not provided."}), 500 if recipe_generator: try: response = recipe_generator(f"Generate a recipe using the following ingredients: {ingredients}") return jsonify({"recipe": response[0]['generated_text']}) except Exception as e: print(f"Error generating recipe: {e}") return jsonify({"error": "Error generating recipe"}), 500 else: return jsonify({"error": "Recipe generator model is not available."}), 500 if __name__ == '__main__': app.run(debug=True, port=5001) Note: The flax-community/t5-recipe-generation model is loaded using the Hugging Face pipeline. This model can be utilized to generate recipes using the given/stored ingredients. Building the Core Components for the Food Inventory Management 1. 
Import All Necessary Libraries TypeScript 'use client'; import React, { useEffect, useState } from 'react'; import { Box, Stack, Typography, Button, TextField, IconButton, Tabs, Tab } from '@mui/material'; import { DatePicker, LocalizationProvider } from '@mui/x-date-pickers'; import { AdapterDayjs } from '@mui/x-date-pickers/AdapterDayjs'; import { collection, addDoc, deleteDoc, doc, onSnapshot, updateDoc, query, where, getDocs } from 'firebase/firestore'; import { db } from './firebase'; // Firebase configuration import DeleteIcon from '@mui/icons-material/Delete'; import dayjs from 'dayjs'; import axios from 'axios'; In this step, we set up our component with the basic layout and imports it needs. This is a client-side component, as shown by the 'use client' command at the top. 2. Utility Functions TypeScript # We define a utility function that capitalizes the first letter of a string. const capitalizeFirstLetter = (string) => string.charAt(0).toUpperCase() + string.slice(1); 3. State Management and useEffect for Firestore Snapshot We need to set up states to keep track of pantry items, new item input, expiration dates, search queries, active tabs, and recipe suggestions. items: Stores pantry items newItem: Stores the name of the item to be added expirationDate: Stores the expiration date of the new item searchQuery: Stores the search input for filtering items tabIndex: Stores the current tab (Available, Soon to Expire, Expired) recipe: Stores the generated recipe The useEffect hook monitors changes in Firestore data using the onSnapshot method, ensuring that the pantry items are always up to date. TypeScript export default function Pantry() { const [items, setItems] = useState([]); const [newItem, setNewItem] = useState(''); const [expirationDate, setExpirationDate] = useState(null); const [searchQuery, setSearchQuery] = useState(''); const [tabIndex, setTabIndex] = useState(0); const [recipe, setRecipe] = useState(''); // Fetch items from Firestore and update state in real-time useEffect(() => { const unsubscribe = onSnapshot(collection(db, 'pantryItems'), (snapshot) => { const itemsList = snapshot.docs.map((doc) => ({ id: doc.id, name: doc.data().name, quantity: doc.data().quantity, expirationDate: doc.data().expirationDate, })); setItems(itemsList); }); return () => unsubscribe(); }, []); 4. Add a New Item to Firestore This function is used to add a new item to the Firestore database. If the item is already present, its quantity is increased. Alternatively, the new item can be added with a designated expiration date. TypeScript // Add a new item to Firestore const addItemToFirestore = async () => { if (newItem.trim() !== '' && expirationDate) { const q = query(collection(db, 'pantryItems'), where('name', '==', newItem)); const querySnapshot = await getDocs(q); if (querySnapshot.empty) { await addDoc(collection(db, 'pantryItems'), { name: newItem, quantity: 1, expirationDate: expirationDate.toISOString() }); } else { querySnapshot.forEach(async (document) => { const itemRef = doc(db, 'pantryItems', document.id); await updateDoc(itemRef, { quantity: document.data().quantity + 1 }); }); } setNewItem(''); setExpirationDate(null); } }; 5. Remove an Item from Firestore or Decrease Its Quantity This function either decreases the quantity of an existing item or removes the item entirely if the quantity reaches zero. 
TypeScript // Remove an item or decrease its quantity const removeItemFromFirestore = async (id) => { const itemRef = doc(db, 'pantryItems', id); const itemDoc = await getDoc(itemRef); if (itemDoc.exists()) { const currentQuantity = itemDoc.data().quantity; if (currentQuantity > 1) { await updateDoc(itemRef, { quantity: currentQuantity - 1 }); } else { await deleteDoc(itemRef); } } }; 6. Fetch Recipe Suggestions From Flask Backend This function sends a list of available items and items that are close to their expiration date to the Flask backend for recipe generation. The backend generates a recipe and stores it in the recipe state. TypeScript // Fetch recipe suggestions using ingredients const fetchRecipeSuggestions = async (availableItems, soonToExpireItems) => { const ingredients = [...availableItems, ...soonToExpireItems].map(item => item.name).join(', '); try { const response = await axios.post('http://127.0.0.1:5001/generate_recipe', { ingredients }); setRecipe(response.data.recipe); } catch (error) { console.error('Error fetching recipe suggestions:', error.message); setRecipe('Error fetching recipe suggestions. Please try again later.'); } }; 7. Filter and Categorize Items Based on Expiration The pantry items are sorted according to their expiration dates. Three categories can be established: Available Items, Soon to Expire, and Expired Items. TypeScript // Filter and categorize items based on expiration const filteredItems = items.filter((item) => item.name.toLowerCase().includes(searchQuery.toLowerCase())); const soonToExpireItems = filteredItems.filter((item) => dayjs(item.expirationDate).diff(dayjs(), 'day') <= 7); const expiredItems = filteredItems.filter((item) => dayjs(item.expirationDate).diff(dayjs(), 'day') <= 0); const availableItems = filteredItems.filter((item) => !soonToExpireItems.includes(item) && !expiredItems.includes(item)); Building the UI Components for the Food Inventory Management TypeScript return ( <LocalizationProvider dateAdapter={AdapterDayjs}> <Box> {/* Add new pantry item */} <Stack spacing={2}> <Typography>Add Pantry Item</Typography> <TextField label="Add Pantry Item" value={newItem} onChange={(e) => setNewItem(e.target.value)} /> <DatePicker label="Expiration Date" value={expirationDate} onChange={(newValue) => setExpirationDate(newValue)} /> <Button onClick={addItemToFirestore}>Add Item</Button> </Stack> {/* Search and Tabs */} <TextField label="Search Pantry Items" value={searchQuery} onChange={(e) => setSearchQuery(e.target.value)} /> <Tabs value={tabIndex} onChange={(e, newValue) => setTabIndex(newValue)}> <Tab label="Available Items" /> <Tab label="Soon to Expire" /> <Tab label="Expired Items" /> </Tabs> {/* Display Items */} {tabIndex === 0 && availableItems.map((item) => ( <Box key={item.id}> <Typography>{capitalizeFirstLetter(item.name)} - {item.quantity}</Typography> <IconButton onClick={() => removeItemFromFirestore(item.id)}><DeleteIcon /></IconButton> </Box> ))} {tabIndex === 1 && soonToExpireItems.map((item) => ( <Box key={item.id}> <Typography>{capitalizeFirstLetter(item.name)} - {item.quantity} (Expires: {dayjs(item.expirationDate).format('YYYY-MM-DD')})</Typography> <IconButton onClick={() => removeItemFromFirestore(item.id)}><DeleteIcon /></IconButton> </Box> ))} {tabIndex === 2 && expiredItems.map((item) => ( <Box key={item.id}> <Typography>{capitalizeFirstLetter(item.name)} - {item.quantity} (Expired: {dayjs(item.expirationDate).format('YYYY-MM-DD')})</Typography> <IconButton onClick={() => 
removeItemFromFirestore(item.id)}><DeleteIcon /></IconButton> </Box> ))} {/* Fetch Recipe Suggestions */} <Button onClick={() => fetchRecipeSuggestions(availableItems, soonToExpireItems)}>Get Recipe Suggestions</Button> {recipe && <Typography>{recipe}</Typography>} </Box> </LocalizationProvider> ); } Explanation of UI Components 1. <Box> (Container) Purpose: Acts as a flexible container for managing layout, padding, and alignment Used for: Wrapping sections like the form, search bar, and item lists 2. <Stack> (Vertical/Horizontal Layout) Purpose: Organizes child components in a vertical or horizontal layout Used for: Structuring form elements and item listings with proper spacing 3. <Typography> (Text Display) Purpose: Renders and styles text content Used for: Displaying headings, item names, expiration dates, and recipe suggestions 4. <TextField> (Input Field) Purpose: Provides a text input field. Used for: Inputting new pantry item names and search queries 5. <DatePicker> (Date Selection) Purpose: Allows users to pick a date from a calendar Used for: Selecting expiration dates for pantry items, integrated with the Day.js adapter 6. <Button> (Clickable Button) Purpose: A clickable button for actions Used for: Adding items to Firestore, fetching recipes, and interacting with the database 7. <Tabs> and <Tab> (Tab Navigation) Purpose: Creates a tabbed interface for navigation Used for: Switching between available, soon-to-expire, and expired items 8. <IconButton> (Icon-Based Button) Purpose: Button with an icon for quick actions. Used for: Deleting or reducing the quantity of items, using a delete icon 9. <LocalizationProvider> (Date Localization) Purpose: Manages date localization and formatting Used for: Ensuring correct display and handling of dates in the date picker 10. <DeleteIcon> (Icon) Purpose: Displays a delete icon for action Used for: Indicating delete action on buttons for item removal 11. Recipe Suggestion Section Purpose: Displays recipe suggestions based on available ingredients Used for: Showing the recipe generated by the Flask API when the "Get Recipe Suggestions" button is clicked 12. <Grid> (Responsive Layout) Purpose: Creates responsive layouts with flexible columns. Used for aligning content: Organizing elements like forms and buttons within a structured grid. Dividing UI into columns: Structuring content into columns and rows for a clean, responsive layout on various screen sizes. Running the Food Inventory Management Application 1. Start the Development Server TypeScript npm run dev Navigate to http://localhost:3000 as prompted by opening your browser. 2. Start the Flask Development Server Start the Flask Development Server along with the below. Python python backend.py This will initiate the Flask API at http://127.0.0.1:5001. Please remember the interaction between React and Flask is done using Axios to send HTTP requests and display the results in real time. The Hugging Face model I used is free to use. If you want to use a different model, like Llama, you can do that too. Sample Image of the Food Inventory Management Application After Development Conclusion Congratulations! You have successfully developed a functional Food Inventory Management Application. Happy coding!
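If you ever want to smoke-test the Flask backend without going through the Next.js UI, any plain HTTP client will do. The sketch below is illustrative only: it assumes the backend.py server from above is running locally on port 5001, and the class name is hypothetical.

Java

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical smoke test for the /generate_recipe endpoint defined in backend.py.
public class RecipeApiSmokeTest {

    public static void main(String[] args) throws Exception {
        // JSON body in the same shape the React front end sends via Axios
        String body = "{\"ingredients\": \"tomatoes, basil, mozzarella\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://127.0.0.1:5001/generate_recipe"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // On success the body is a JSON object such as {"recipe": "..."}
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + ": " + response.body());
    }
}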
Java 23 is finally out, and we can start migrating our project to it. The very first pitfall comes quickly when switching to the latest JDK 23 with compilation issues when using the Lombok library in your project. Let's begin with the symptom description first. Description The Lombok library heavily relies on annotations. It's used for removing a lot of boilerplate code; e.g., getters, setters, toString, loggers, etc. @Slf4j usage for simplified logging configuration Maven compilation errors coming from Lombok and Java 23 look like this: Plain Text [INFO] --- [compiler:3.13.0:compile [ (default-compile) @ sat-core --- [WARNING] Parameter 'forceJavacCompilerUse' (user property 'maven.compiler.forceJavacCompilerUse') is deprecated: Use forceLegacyJavacApi instead [INFO] Recompiling the module because of changed source code [INFO] Compiling 50 source files with javac [debug parameters release 23] to target\classes [INFO] ------------------------------------------------------------- [ERROR] COMPILATION ERROR : [INFO] ------------------------------------------------------------- [ERROR] spring-advanced-training\sat-core\src\main\java\com\github\aha\sat\core\aop\BeverageLogger.java:[21,2] error: cannot find symbol symbol: variable log location: class BeverageLogger ... [INFO] 16 errors [INFO] ------------------------------------------------------------- [INFO] ------------------------------------------------------------------------ [INFO] [BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 3.090 s [INFO] Finished at: 2024-09-26T08:45:59+02:00 [INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal [org.apache.maven.plugins:maven-compiler-plugin:3.13.0:compile (default-compile) on project sat-core: Compilation failure: Compilation failure: [ERROR] spring-advanced-training\sat-core\src\main\java\com\github\aha\sat\core\aop\BeverageLogger.java:[21,2] error: cannot find symbol [ERROR] symbol: variable log [ERROR] location: class BeverageLogger ... Note: The @Slf4j annotation is just an example. It's demonstrated here because these are the first errors in the build logs. However, it's related to any other already mentioned Lombok annotation. Explanation The compilation error is caused by a change in the behavior of annotation processing in Java 23. See JDK 23 Release notes and this statement: As of JDK 23, annotation processing is only run with some explicit configuration of annotation processing or with an explicit request to run annotation processing on the javac command line. This is a change in behavior from the existing default of looking to run annotation processing by searching the class path for processors without any explicit annotation processing related options needing to be present. You can find more details about it here. Solution In order to be able to use Lombok with the new Java 23, we need to turn on the full compilation processing. It can be done in Maven as: To have the latest maven-compiler-version (it's version 3.13.0 at the time of writing this article) Setup maven.compiler.proc property with full value. XML <properties> ... <java.version>23</java.version> <maven-compiler-plugin.version>3.13.0</maven-compiler-plugin.version> <maven.compiler.proc>full</maven.compiler.proc> </properties> <build> <plugins> ... 
<plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-compiler-plugin</artifactId> <version>${maven-compiler-plugin.version}</version> <configuration> <source>${java.version}</source> <target>${java.version}</target> </configuration> </plugin> </plugins> </build> It's all we need to make our project compilable again. Plain Text [INFO] --- compiler:3.13.0:compile (default-compile) @ sat-core --- [WARNING] Parameter 'forceJavacCompilerUse' (user property 'maven.compiler.forceJavacCompilerUse') is deprecated: Use forceLegacyJavacApi instead [INFO] Recompiling the module because of changed source code. [INFO] Compiling 50 source files with javac [debug parameters release 23] to target\classes [INFO] [INFO] --- resources:3.3.1:testResources (default-testResources) @ sat-core --- [INFO] Copying 2 resources from src\test\resources to target\test-classes Conclusion This article has covered the issue related to using the Lombok library and upgrading to JDK 23. The complete change (but with more changes) is visible in this GitHub commit.
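For reference, the kind of class that runs into this error is nothing special. The following is a hypothetical example (not the BeverageLogger class from the referenced project): Lombok's @Slf4j generates a static log field during annotation processing, so if that processing does not run on JDK 23, every reference to log fails with "cannot find symbol."

Java

import lombok.extern.slf4j.Slf4j;

// Hypothetical example. Lombok generates a static SLF4J logger field named "log"
// for the @Slf4j annotation. Without annotation processing enabled on JDK 23,
// the field is never generated and the compiler reports:
//   error: cannot find symbol - variable log
@Slf4j
public class BeverageService {

    public void serve(String beverage) {
        log.info("Serving beverage: {}", beverage);
    }
}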
In this article, we’ll dive deep into the concept of database sharding, a critical technique for scaling databases to handle large volumes of data and high levels of traffic. Here’s what you can expect to learn: What is Sharding?: We’ll start by defining what sharding is and why it’s essential for modern, high-performance databases. You’ll understand how sharding can help overcome the limitations of traditional database scaling methods. Types of Sharding: Next, we’ll explore the different types of sharding, including horizontal and vertical sharding. We’ll discuss the benefits and challenges of each approach, helping you decide which might be best for your use case. Selecting a Shard Key: Choosing the right shard key is crucial for the success of a sharded database. In this section, we’ll walk through the factors to consider when selecting a shard key, common mistakes to avoid, and how to balance performance with even data distribution. Routing Requests to Shards: Finally, we’ll cover the methods for routing queries to the correct shard. This section will provide insights into the architecture and strategies to ensure efficient query processing in a sharded environment. By the end of this guide, you’ll have a comprehensive understanding of database sharding, enabling you to implement it effectively in your systems. What Is Sharding? Sharding is a database architecture pattern that involves partitioning your data into smaller, more manageable pieces, known as “shards.” Each shard is a separate database that contains a subset of the total data. The primary goal of sharding is to distribute the load across multiple databases, enabling the system to scale horizontally as data volume and traffic increase. In a traditional, single-database setup, all data is stored in one place. As your application grows, this database can become a bottleneck, leading to performance issues like slow query response times and limited capacity for handling concurrent users. Sharding helps mitigate these issues by spreading the data across multiple servers, each responsible for a specific portion of the data. Sharding is particularly beneficial for applications with large datasets, high transaction volumes, or the need for geographic distribution of data. By breaking down the data into smaller pieces, you can improve performance, reduce the risk of downtime, and scale your system more efficiently. However, sharding is not without its challenges. It introduces additional complexity in managing data consistency, query routing, and maintaining balanced shards. Therefore, it’s essential to carefully plan and implement sharding to maximize its benefits and minimize potential downsides. Types of Sharding When it comes to sharding, there are several approaches you can take, each with its own advantages and trade-offs. The two most common types of sharding are horizontal sharding and vertical sharding. Horizontal Sharding (Range-Based Sharding) Horizontal sharding involves splitting rows of a table across multiple shards. Each shard contains a subset of the rows, usually based on a range of values in a particular column, such as user IDs or timestamps. 
Example Shard 1: Users with IDs from 1 to 1,000,000 Shard 2: Users with IDs from 1,000,001 to 2,000,000 Advantages Scalability: Easily add more shards as data grows Load distribution: Distributes load evenly if data is uniformly accessed Challenges Complex queries: Queries spanning multiple shards can be complex Data skew: Uneven data distribution can lead to overloaded shards Vertical Sharding Vertical sharding involves splitting a database by separating tables into different shards. Each shard contains a subset of the columns or tables. Example Shard 1: User profiles and authentication data Shard 2: Transaction data and order history Advantages Simplified queries: Queries involving only one shard are faster. Specialization: Optimize each shard for specific data types. Challenges Cross-shard joins: Joins across shards are inefficient. Limited scalability: Scalability is limited by the number of tables. Hybrid Sharding Hybrid sharding combines horizontal and vertical sharding to leverage the benefits of both. Example Horizontally shard: Large tables like user data across multiple shards Vertically shard: Separate less frequently accessed tables. Advantages Flexibility: Tailor sharding strategies to different data types. Optimized performance: Optimize shards based on specific needs. Challenges Increased complexity: Managing multiple sharding strategies is complex Sophisticated routing: Requires advanced query routing logic Selecting a Shard Key Choosing the right shard key is one of the most critical decisions when implementing database sharding. The shard key determines how your data is distributed across different shards, directly impacting the system’s performance, scalability, and complexity. What Is a Shard Key? A shard key is a column or a set of columns from your database that is used to determine the distribution of data across shards. Essentially, it is the basis for partitioning your data, where each shard will handle a specific range or set of values based on the shard key. Criteria for Selecting a Shard Key Uniform distribution of data: The primary goal when selecting a shard key is to ensure an even distribution of data across all shards. If your shard key leads to uneven data distribution (data skew), some shards will become hotspots, handling more data and traffic than others. This imbalance can negate the benefits of sharding by creating bottlenecks. For instance, if you choose a shard key that correlates with time, such as a timestamp, you might end up with all recent data being stored in a single shard, overloading it while other shards remain underutilized. Query performance: Your shard key should be chosen with the most common queries in mind. If most queries filter data based on a specific column, using that column as the shard key can lead to efficient query routing, as the system will know exactly which shard to query. For example, if user-related queries are predominant, a shard key based on user ID can direct queries to the correct shard without unnecessary lookups across multiple shards. Scalability: Consider future growth when selecting your shard key. The shard key should allow for easy addition of new shards as data volume increases. Keys that naturally support range or hash-based distribution are often good candidates for scaling. For example, a hash-based shard key evenly distributes data by hashing the key values. This approach makes it easier to add new shards by redistributing the hash space, minimizing the need for complex data migrations.
Minimizing cross-shard operations: Cross-shard operations, such as joins or transactions that span multiple shards, can be costly in terms of performance and complexity. Choosing a shard key that aligns with your application’s data access patterns can help minimize these operations. For instance, if your application frequently performs transactions that involve a user’s orders, sharding by user ID ensures that all related data resides in the same shard, avoiding expensive cross-shard operations. Common Shard Key Strategies Range-Based Shard Key A range-based shard key involves dividing data into shards based on a continuous range of values. This approach works well when the data distribution is relatively uniform and predictable, such as numeric IDs or dates. However, it can lead to data skew if the distribution is uneven. Hash-Based Shard Key A hash-based shard key distributes data based on a hash function applied to the key value. This approach typically results in a more uniform distribution of data and is less prone to skew. However, it may complicate range queries, as data is spread non-sequentially across shards. Composite Shard Key A composite shard key uses multiple columns to determine the shard placement. This strategy can provide more granular control over data distribution and help optimize query performance by accounting for multiple access patterns. Potential Pitfalls Hotspots: Avoid shard keys that could lead to hotspots, where a significant portion of queries target a single shard. Imbalanced shards: Be cautious of shard keys that might result in uneven data distribution, causing some shards to store significantly more data than others. Complexity in query routing: Ensure that the shard key simplifies, rather than complicates, the process of routing queries to the correct shard. Selecting the right shard key is a balancing act that requires careful consideration of your data distribution, query patterns, and scalability needs. A well-chosen shard key can significantly enhance the performance and efficiency of your sharded database, while a poor choice can lead to a range of issues, from performance bottlenecks to complex query routing challenges. Routing Requests to Shards Once you’ve established a sharded database architecture, the next critical challenge is ensuring that queries and data operations are efficiently routed to the correct shard. Proper routing is essential for maintaining high performance and ensuring that your application scales effectively. Understanding the Routing Process In a sharded database, routing refers to the process of determining which shard should handle a particular query or data operation. This decision is based on the shard key, which, as discussed in the previous section, is used to partition the data across different shards. The routing process ensures that queries are directed to the shard containing the relevant data, thereby reducing the load on the overall system and improving query response times. Common Routing Strategies Application-Level Routing In application-level routing, the logic for determining which shard to query is built directly into the application code. The application uses the shard key to calculate which shard should handle a specific request. Advantages: Customization: Application-level routing allows for customized logic, making it easier to optimize for specific use cases. Flexibility: The application can implement complex routing rules or adjust the routing logic dynamically based on real-time data. 
Challenges: Complexity: Implementing and maintaining routing logic in the application adds complexity, requiring developers to manage and update the logic as the application evolves. Increased Latency: If not optimized, application-level routing can introduce additional latency, as the application must determine the appropriate shard before executing the query. Middleware or Proxy-Based Routing Middleware or proxy-based routing involves using an intermediary layer between the application and the database. This middleware is responsible for routing queries to the correct shard based on the shard key. Advantages: Centralized management: Routing logic is centralized, making it easier to manage and update without changing the application code. Consistency: Middleware ensures consistent routing logic across different parts of the application. Challenges: Single point of failure: The middleware layer can become a bottleneck or a single point of failure if not properly scaled or managed. Additional overhead: Introducing a middleware layer can add extra overhead to the query execution process, potentially impacting performance. Database-Level Routing In database-level routing, the database system itself handles the routing of queries to the appropriate shard. This approach is common in databases that natively support sharding, where the database automatically routes queries based on the shard key. Advantages: Simplicity: Database-level routing abstracts the complexity of routing from the application, allowing developers to focus on business logic rather than database management. Automatic load balancing: Many database systems with built-in sharding capabilities also include features for load balancing across shards, optimizing performance. Challenges: Limited customization: Relying on database-level routing may limit the ability to implement custom routing logic tailored to specific application needs. Vendor lock-in: Using database-level routing often ties you to a specific database vendor or technology, making it harder to switch systems in the future. Factors to Consider When Implementing Routing Query Patterns Analyze the query patterns in your application to ensure that the chosen routing strategy optimizes for the most common types of queries. For instance, if your application frequently retrieves data based on a specific user ID, ensure that the routing logic efficiently handles these queries. Scalability As your data grows and the number of shards increases, the routing strategy should scale accordingly. Middleware and database-level routing solutions often include built-in mechanisms for scaling, whereas application-level routing might require additional development effort to manage scalability. Fault Tolerance Ensure that your routing strategy includes mechanisms for handling shard failures or unavailability. For example, middleware-based solutions can include fallback mechanisms to reroute queries to backup shards in the event of a failure. Latency Minimize the latency introduced by the routing process. Each layer of routing logic adds potential delays to query execution, so it’s crucial to optimize the routing path to maintain high performance. Best Practices 1. Careful Shard Key Selection Analyze data access patterns: Choose a shard key that ensures even data distribution and aligns with query patterns. Avoid hotspots: Prevent any shard from becoming a performance bottleneck. 2. Design for Scalability Modular architecture: Facilitate easy addition of new shards. 
Future-proofing: Plan for data growth and increased traffic. 3. Efficient Routing Logic Optimize query routing: Ensure quick and accurate routing to the correct shard. Implement fallbacks: Prepare for shard failures with robust error handling. 4. Maintain Data Consistency Limit cross-shard transactions: Reduce complexity and performance overhead. Consistency models: Adopt appropriate data consistency models. 5. Monitor Performance Use monitoring tools: Track performance metrics and shard health. Regular audits: Periodically assess the effectiveness of your sharding strategy. 6. Automate Maintenance Scheduled tasks: Automate routine maintenance like backups and rebalancing. Disaster recovery: Regularly test backup and recovery procedures. 7. Security and Compliance Data protection: Secure data using encryption and access controls. Regulatory compliance: Ensure adherence to laws like GDPR. Final Recommendations Start simple: Begin with a straightforward strategy and adapt as needed. Stay updated: Keep abreast of the latest developments in sharding technologies. Seek expertise: Consult with experienced professionals when necessary. Conclusion Database sharding is a powerful technique for building scalable and high-performance systems. By understanding the different types of sharding, carefully selecting your shard key, efficiently routing requests, and adhering to best practices, you can overcome the limitations of traditional database scaling methods. Implementing sharding requires careful planning and ongoing maintenance, but the benefits in terms of performance and scalability make it a worthwhile investment for many applications. Stay vigilant for signs that resharding or rebalancing may be necessary, and be proactive in addressing these challenges to ensure your system remains robust and efficient.
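To make the shard key and routing strategies discussed above more concrete, here is a minimal, illustrative sketch of hash-based shard selection combined with application-level routing. Every name in it (ShardRouter, the JDBC URLs, the shard count) is hypothetical, and a real deployment would also need connection pooling, failover handling, and a rebalancing strategy.

Java

import java.util.List;

// Hypothetical application-level router: it maps a shard key (e.g., a user ID)
// to one of N shards with a stable hash and returns the matching connection
// string so the caller can open a connection to the correct shard.
public class ShardRouter {

    private final List<String> shardJdbcUrls;

    public ShardRouter(List<String> shardJdbcUrls) {
        this.shardJdbcUrls = shardJdbcUrls;
    }

    // Hash-based shard selection; floorMod keeps the index non-negative
    // even when hashCode() returns a negative value.
    public int shardIndexFor(String shardKey) {
        return Math.floorMod(shardKey.hashCode(), shardJdbcUrls.size());
    }

    public String jdbcUrlFor(String shardKey) {
        return shardJdbcUrls.get(shardIndexFor(shardKey));
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of(
                "jdbc:postgresql://shard-0.example.com/app",
                "jdbc:postgresql://shard-1.example.com/app",
                "jdbc:postgresql://shard-2.example.com/app"));

        // All reads and writes for "user-42" consistently land on the same shard.
        System.out.println(router.jdbcUrlFor("user-42"));
    }
}

Note that with simple modulo hashing, adding a shard changes most key-to-shard assignments; consistent hashing or a shard lookup table is the usual refinement when shards are expected to be added over time.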
The AIDocumentLibraryChat project has been extended to generate test code (Java code has been tested). The project can generate test code for publicly available GitHub projects. The URL of the class to test can be provided then the class is loaded, the imports are analyzed and the dependent classes in the project are also loaded. That gives the LLM the opportunity to consider the imported source classes while generating mocks for tests. The testUrl can be provided to give an example to the LLM to base the generated test. The granite-code and deepseek-coder-v2 models have been tested with Ollama. The goal is to test how well the LLMs can help developers create tests. Implementation Configuration To select the LLM model the application-ollama.properties file needs to be updated: Properties files spring.ai.ollama.base-url=${OLLAMA-BASE-URL:http://localhost:11434} spring.ai.ollama.embedding.enabled=false spring.ai.embedding.transformer.enabled=true document-token-limit=150 embedding-token-limit=500 spring.liquibase.change-log=classpath:/dbchangelog/db.changelog-master-ollama.xml ... # generate code #spring.ai.ollama.chat.model=granite-code:20b #spring.ai.ollama.chat.options.num-ctx=8192 spring.ai.ollama.chat.options.num-thread=8 spring.ai.ollama.chat.options.keep_alive=1s spring.ai.ollama.chat.model=deepseek-coder-v2:16b spring.ai.ollama.chat.options.num-ctx=65536 The spring.ai.ollama.chat.model selects the LLM code model to use. The spring.ollama.chat.options.num-ctx sets the number of tokens in the context window. The context window contains the tokens required by the request and the tokens required by the response. The spring.ollama.chat.options.num-thread can be used if Ollama does not choose the right amount of cores to use. The spring.ollama.chat.options.keep_alive sets the number of seconds the context window is retained. Controller The interface to get the sources and to generate the test is the controller: Java @RestController @RequestMapping("rest/code-generation") public class CodeGenerationController { private final CodeGenerationService codeGenerationService; public CodeGenerationController(CodeGenerationService codeGenerationService) { this.codeGenerationService = codeGenerationService; } @GetMapping("/test") public String getGenerateTests(@RequestParam("url") String url, @RequestParam(name = "testUrl", required = false) String testUrl) { return this.codeGenerationService.generateTest(URLDecoder.decode(url, StandardCharsets.UTF_8), Optional.ofNullable(testUrl).map(myValue -> URLDecoder.decode(myValue, StandardCharsets.UTF_8))); } @GetMapping("/sources") public GithubSources getSources(@RequestParam("url") String url, @RequestParam(name="testUrl", required = false) String testUrl) { var sources = this.codeGenerationService.createTestSources( URLDecoder.decode(url, StandardCharsets.UTF_8), true); var test = Optional.ofNullable(testUrl).map(myTestUrl -> this.codeGenerationService.createTestSources( URLDecoder.decode(myTestUrl, StandardCharsets.UTF_8), false)) .orElse(new GithubSource("none", "none", List.of(), List.of())); return new GithubSources(sources, test); } } The CodeGenerationController has the method getSources(...). It gets the URL and optionally the testUrl for the class to generate tests for and for the optional example test. It decodes the request parameters and calls the createTestSources(...) method with them. The method returns the GithubSources with the sources of the class to test, its dependencies in the project, and the test example. 
The method getGenerateTests(...) gets the url for the test class and the optional testUrl to be url decoded and calls the method generateTests(...) of the CodeGenerationService. Service The CodeGenerationService collects the classes from GitHub and generates the test code for the class under test. The Service with the prompts looks like this: Java @Service public class CodeGenerationService { private static final Logger LOGGER = LoggerFactory .getLogger(CodeGenerationService.class); private final GithubClient githubClient; private final ChatClient chatClient; private final String ollamaPrompt = """ You are an assistant to generate spring tests for the class under test. Analyse the classes provided and generate tests for all methods. Base your tests on the example. Generate and implement the test methods. Generate and implement complete tests methods. Generate the complete source of the test class. Generate tests for this class: {classToTest} Use these classes as context for the tests: {contextClasses} {testExample} """; private final String ollamaPrompt1 = """ You are an assistant to generate a spring test class for the source class. 1. Analyse the source class 2. Analyse the context classes for the classes used by the source class 3. Analyse the class in test example to base the code of the generated test class on it. 4. Generate a test class for the source class, use the context classes as sources for it and base the code of the test class on the test example. Generate the complete source code of the test class implementing the tests. {testExample} Use these context classes as extension for the source class: {contextClasses} Generate the complete source code of the test class implementing the tests. Generate tests for this source class: {classToTest} """; @Value("${spring.ai.ollama.chat.options.num-ctx:0}") private Long contextWindowSize; public CodeGenerationService(GithubClient githubClient, ChatClient chatClient) { this.githubClient = githubClient; this.chatClient = chatClient; } This is the CodeGenerationService with the GithubClient and the ChatClient. The GithubClient is used to load the sources from a publicly available repository and the ChatClient is the Spring AI interface to access the AI/LLM. The ollamaPrompt is the prompt for the IBM Granite LLM with a context window of 8k tokens. The {classToTest} is replaced with the source code of the class under test. The {contextClasses} can be replaced with the dependent classes of the class under test and the {testExample} is optional and can be replaced with a test class that can serve as an example for the code generation. The ollamaPrompt2 is the prompt for the Deepseek Coder V2 LLM. This LLM can "understand" or work with a chain of thought prompt and has a context window of more than 64k tokens. The {...} placeholders work the same as in the ollamaPrompt. The long context window enables the addition of context classes for code generation. The contextWindowSize property is injected by Spring to control if the context window of the LLM is big enough to add the {contextClasses} to the prompt. The method createTestSources(...) 
collects and returns the sources for the AI/LLM prompts: Java public GithubSource createTestSources(String url, final boolean referencedSources) { final var myUrl = url.replace("https://github.com", GithubClient.GITHUB_BASE_URL).replace("/blob", ""); var result = this.githubClient.readSourceFile(myUrl); final var isComment = new AtomicBoolean(false); final var sourceLines = result.lines().stream().map(myLine -> myLine.replaceAll("[\t]", "").trim()) .filter(myLine -> !myLine.isBlank()).filter(myLine -> filterComments(isComment, myLine)).toList(); final var basePackage = List.of(result.sourcePackage() .split("\\.")).stream().limit(2) .collect(Collectors.joining(".")); final var dependencies = this.createDependencies(referencedSources, myUrl, sourceLines, basePackage); return new GithubSource(result.sourceName(), result.sourcePackage(), sourceLines, dependencies); } private List<GithubSource> createDependencies(final boolean referencedSources, final String myUrl, final List<String> sourceLines, final String basePackage) { return sourceLines.stream().filter(x -> referencedSources) .filter(myLine -> myLine.contains("import")) .filter(myLine -> myLine.contains(basePackage)) .map(myLine -> String.format("%s%s%s", myUrl.split(basePackage.replace(".", "/"))[0].trim(), myLine.split("import")[1].split(";")[0].replaceAll("\\.", "/").trim(), myUrl.substring(myUrl.lastIndexOf('.')))) .map(myLine -> this.createTestSources(myLine, false)).toList(); } private boolean filterComments(AtomicBoolean isComment, String myLine) { var result1 = true; if (myLine.contains("/*") || isComment.get()) { isComment.set(true); result1 = false; } if (myLine.contains("*/")) { isComment.set(false); result1 = false; } result1 = result1 && !myLine.trim().startsWith("//"); return result1; } The method createTestSources(...) with the source code of the GitHub source url and depending on the value of the referencedSources the sources of the dependent classes in the project provide the GithubSource records. To do that the myUrl is created to get the raw source code of the class. Then the githubClient is used to read the source file as a string. The source string is then turned in source lines without formatting and comments with the method filterComments(...). To read the dependent classes in the project the base package is used. For example in a package ch.xxx.aidoclibchat.usecase.service the base package is ch.xxx. The method createDependencies(...) is used to create the GithubSource records for the dependent classes in the base packages. The basePackage parameter is used to filter out the classes and then the method createTestSources(...) is called recursively with the parameter referencedSources set to false to stop the recursion. That is how the dependent class GithubSource records are created. The method generateTest(...) 
is used to create the test sources for the class under test with the AI/LLM: Java public String generateTest(String url, Optional<String> testUrlOpt) { var start = Instant.now(); var githubSource = this.createTestSources(url, true); var githubTestSource = testUrlOpt.map(testUrl -> this.createTestSources(testUrl, false)) .orElse(new GithubSource(null, null, List.of(), List.of())); String contextClasses = githubSource.dependencies().stream() .filter(x -> this.contextWindowSize >= 16 * 1024) .map(myGithubSource -> myGithubSource.sourceName() + ":" + System.getProperty("line.separator") + myGithubSource.lines().stream() .collect(Collectors.joining(System.getProperty("line.separator"))) .collect(Collectors.joining(System.getProperty("line.separator"))); String testExample = Optional.ofNullable(githubTestSource.sourceName()) .map(x -> "Use this as test example class:" + System.getProperty("line.separator") + githubTestSource.lines().stream() .collect(Collectors.joining(System.getProperty("line.separator")))) .orElse(""); String classToTest = githubSource.lines().stream() .collect(Collectors.joining(System.getProperty("line.separator"))); LOGGER.debug(new PromptTemplate(this.contextWindowSize >= 16 * 1024 ? this.ollamaPrompt1 : this.ollamaPrompt, Map.of("classToTest", classToTest, "contextClasses", contextClasses, "testExample", testExample)).createMessage().getContent()); LOGGER.info("Generation started with context window: {}", this.contextWindowSize); var response = chatClient.call(new PromptTemplate( this.contextWindowSize >= 16 * 1024 ? this.ollamaPrompt1 : this.ollamaPrompt, Map.of("classToTest", classToTest, "contextClasses", contextClasses, "testExample", testExample)).create()); if((Instant.now().getEpochSecond() - start.getEpochSecond()) >= 300) { LOGGER.info(response.getResult().getOutput().getContent()); } LOGGER.info("Prompt tokens: " + response.getMetadata().getUsage().getPromptTokens()); LOGGER.info("Generation tokens: " + response.getMetadata().getUsage().getGenerationTokens()); LOGGER.info("Total tokens: " + response.getMetadata().getUsage().getTotalTokens()); LOGGER.info("Time in seconds: {}", (Instant.now().toEpochMilli() - start.toEpochMilli()) / 1000.0); return response.getResult().getOutput().getContent(); } To do that the createTestSources(...) method is used to create the records with the source lines. Then the string contextClasses is created to replace the {contextClasses} placeholder in the prompt. If the context window is smaller than 16k tokens the string is empty to have enough tokens for the class under test and the test example class. Then the optional testExample string is created to replace the {testExample} placeholder in the prompt. If no testUrl is provided the string is empty. Then the classToTest string is created to replace the {classToTest} placeholder in the prompt. The chatClient is called to send the prompt to the AI/LLM. The prompt is selected based on the size of the context window in the contextWindowSize property. The PromptTemplate replaces the placeholders with the prepared strings. The response is used to log the amount of the prompt tokens, the generation tokens, and the total tokens to be able to check if the context window boundary was honored. Then the time to generate the test source is logged and the test source is returned. If the generation of the test source took more than 5 minutes the test source is logged as protection against browser timeouts. 
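Because the controller URL-decodes the url and testUrl request parameters, it is convenient to build requests to the endpoint programmatically. The following is an illustrative client sketch (the class name is hypothetical; the GitHub URLs are the sample ones used for the tests in the conclusion below):

Java

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the /rest/code-generation/test endpoint shown above.
public class TestGenerationClient {

    public static void main(String[] args) throws Exception {
        String url = "https://github.com/Angular2Guy/MovieManager/blob/master/backend/src/main/java/ch/xxx/moviemanager/adapter/controller/ActorController.java";
        String testUrl = "https://github.com/Angular2Guy/MovieManager/blob/master/backend/src/test/java/ch/xxx/moviemanager/adapter/controller/MovieControllerTest.java";

        // The controller decodes both parameters, so they are URL-encoded here.
        String requestUri = "http://localhost:8080/rest/code-generation/test?url="
                + URLEncoder.encode(url, StandardCharsets.UTF_8)
                + "&testUrl=" + URLEncoder.encode(testUrl, StandardCharsets.UTF_8);

        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder().uri(URI.create(requestUri)).GET().build(),
                HttpResponse.BodyHandlers.ofString());

        // The response body is the generated test class source.
        System.out.println(response.body());
    }
}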
Conclusion Both models have been tested to generate Spring Controller tests and Spring service tests. The test URLs have been: http://localhost:8080/rest/code-generation/test?url=https://github.com/Angular2Guy/MovieManager/blob/master/backend/src/main/java/ch/xxx/moviemanager/adapter/controller/ActorController.java&testUrl=https://github.com/Angular2Guy/MovieManager/blob/master/backend/src/test/java/ch/xxx/moviemanager/adapter/controller/MovieControllerTest.java http://localhost:8080/rest/code-generation/test?url=https://github.com/Angular2Guy/MovieManager/blob/master/backend/src/main/java/ch/xxx/moviemanager/usecase/service/ActorService.java&testUrl=https://github.com/Angular2Guy/MovieManager/blob/master/backend/src/test/java/ch/xxx/moviemanager/usecase/service/MovieServiceTest.java The granite-code:20b LLM on Ollama has a context window of 8k tokens. That is too small to provide contextClasses and have enough tokens for a response. That means the LLM just had the class under test and the test example to work with. The deepseek-coder-v2:16b LLM on Ollama has a context window of more than 64k tokens. That enabled the addition of the contextClasses to the prompt, and it is able to work with a chain of thought prompt. Results The Granite-Code LLM was able to generate a buggy but useful basis for a Spring service test. No test worked, but the missing parts could be explained by the missing context classes. The Spring Controller test was not as good. It missed too much code to be useful as a basis. The test generation took more than 10 minutes on a medium-power laptop CPU. The Deepseek-Coder-V2 LLM was able to create a Spring service test with the majority of the tests working. That was a good basis to work with, and the missing parts were easy to fix. The Spring Controller test had more bugs but was a useful basis to start from. The test generation took less than 10 minutes on a medium-power laptop CPU. Opinion The Deepseek-Coder-V2 LLM can help with writing tests for Spring applications. For productive use, GPU acceleration is needed. The LLM is not able to create non-trivial code correctly, even with context classes available. The help an LLM can provide is very limited because LLMs do not understand the code. Code is just characters for an LLM, and without an understanding of language syntax, the results are not impressive. The developer has to be able to fix all the bugs in the tests. That means it just saves some time typing the tests. The experience with GitHub Copilot is similar to the Granite-Code LLM. As of September 2024, the context window is too small to do good code generation, and the code completion suggestions need to be ignored too often. Is an LLM a help? Yes. Is an LLM a large time-saver? No.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Kubernetes in the Enterprise: Once Decade-Defining, Now Forging a Future in the SDLC. A decade ago, Google introduced Kubernetes to simplify the management of containerized applications. Since then, it has fundamentally transformed the software development and operations landscape. Today, Kubernetes has seen numerous enhancements and integrations, becoming the de facto standard for container orchestration. This article explores the journey of Kubernetes over the past 10 years, its impact on the software development lifecycle (SDLC) and developers, and the trends and innovations that will shape its next decade. The Evolution of Kubernetes Kubernetes, often referred to as K8s, had its first commit pushed to GitHub on June 6, 2014. About a year later, on July 21, 2015, Kubernetes V1 was released, featuring 14,000 commits from 400 contributors. Simultaneously, the Linux Foundation announced the formation of the Cloud Native Computing Foundation (CNCF) to advance state-of-the-art technologies for building cloud-native applications and services. After that, Google donated Kubernetes to the CNCF, marking a significant milestone in its development. Kubernetes addressed a critical need in the software industry: managing the lifecycle of containerized applications. Before Kubernetes, developers struggled with orchestrating containers, leading to inefficiencies and complexities in deployment processes. Kubernetes brought advanced container management functionality and quickly gained popularity due to its robust capabilities in automating the deployment, scaling, and operations of containers. While early versions of Kubernetes introduced the foundation for container orchestration, the project has since undergone significant improvements. Major updates have introduced sophisticated features such as StatefulSets for managing stateful applications, advanced networking capabilities, and enhanced security measures. The introduction of Custom Resource Definitions (CRDs) and Operators has further extended its functionality, allowing users to manage complex applications and workflows with greater ease. In addition, the community has grown significantly over the past decade. According to the 2023 Project Journey Report, Kubernetes now has over 74,680 contributors, making it the second-largest open-source project in the world after Linux. Over the years, Kubernetes has seen numerous enhancements and integrations, becoming the de facto standard for container orchestration. The active open source community and the extensive ecosystem of tools and projects have made Kubernetes an essential technology for modern software development. It is now the "primary container orchestration tool for 71% of Fortune 100 companies" (Project Journey Report). Kubernetes' Impact on the SDLC and Developers Kubernetes abstracts away the complexities of container orchestration and allows developers to focus on development rather than worry about application deployment and orchestration. The benefits and key impacts on the SDLC and developer workflows include enhanced development and testing, efficient deployment, operational efficiency, improved security, and support for microservices architecture. Enhanced Development and Testing Kubernetes ensures consistency for applications running across testing, development, and production environments, regardless of whether the infrastructure is on-premises, cloud based, or a hybrid setup. 
This level of consistency, along with the capability to quickly spin up and tear down environments, significantly accelerates development cycles. By promoting portability, Kubernetes also helps enterprises avoid vendor lock-in and refine their cloud strategies, leading to a more flexible and efficient development process. Efficient Deployment Kubernetes automates numerous aspects of application deployment, such as service discovery, load balancing, scaling, and self-healing. This automation reduces manual effort, minimizes human error, and ensures reliable and repeatable deployments, reducing downtime and deployment failures. Operational Efficiency Kubernetes efficiently manages resources by dynamically allocating them based on the application's needs. It ensures operations remain cost effective while maintaining optimal performance and use of computing resources by scheduling containers based on resource requirements and availability. Security Kubernetes enhances security by providing container isolation and managing permissions. Its built-in security features allow developers to build secure applications without deep security expertise. Such built-in features include role-based access control, which ensures that only authorized users can access specific resources and perform certain actions. It also supports secrets management to securely store and manage sensitive information like passwords and API keys. Microservices Architecture Kubernetes has facilitated the adoption of microservices architecture by enabling developers to deploy, manage, and scale individual microservices independently. Each microservice can be packaged into a separate container, providing isolation and ensuring that dependencies are managed within the container. Kubernetes' service discovery and load balancing features enable communication between microservices, while its support for automated scaling and self-healing ensures high availability and resilience. Predictions for the Next Decade After a decade, it has become clear that Kubernetes is now the standard technology for container orchestration that's used by many enterprises. According to the CNCF Annual Survey 2023, the usage of Kubernetes continues to grow, with significant adoption across different industries and use cases. Its reliability and flexibility make it a preferred choice for mission-critical applications, including databases, CI/CD pipelines, and AI and machine learning (ML) workloads. As a result, there is a growing demand for new features and enhancements, as well as simplifying concepts for users. The community is now prioritizing improvements that not only enhance user experiences but also promote the sustainability of the project. Figure 1 illustrates the anticipated future trends in Kubernetes, and below are the trends and innovations expected to shape Kubernetes' future in more detail. Figure 1. Future trends in Kubernetes AI and Machine Learning Kubernetes is increasingly used to orchestrate AI and ML workloads, supporting the deployment and management of complex ML pipelines. This simplifies the integration and scaling of AI applications across various environments. Innovations such as Kubeflow — an open-source platform designed to optimize the deployment, orchestration, and management of ML workflows on Kubernetes — enable data scientists to focus more on model development and less on infrastructure concerns. 
According to the recent CNCF open-source project velocity report, Kubeflow appeared on the top 30 CNCF project list for the first time in 2023, highlighting its growing importance in the ecosystem. Addressing the resource-intensive demands of AI introduces new challenges that contributors are focusing on, shaping the future of Kubernetes in the realm of AI and ML. The Developer Experience As Kubernetes evolves, its complexity can create challenges for new users. Hence, improving the user experience is crucial moving forward. Tools like Backstage are revolutionizing how developers work with Kubernetes and speeding up the development process. The CNCF's open-source project velocity report also states that "Backstage is addressing a significant pain point around developer experience." Moreover, the importance of platform engineering is increasingly recognized by companies. This emerging trend is expected to grow, with the goal of reducing the learning curve and making it easier for developers to adopt Kubernetes, thereby accelerating the development process and improving productivity. CI/CD and GitOps Kubernetes is revolutionizing continuous integration and continuous deployment (CI/CD) pipelines through the adoption of GitOps practices. GitOps uses Git repositories as the source of truth for declarative infrastructure and applications, enabling automated deployments. Tools like ArgoCD and Flux are being widely adopted to simplify the deployment process, reduce human error, and ensure consistency across environments. Figure 2 shows the integration between a GitOps operator, such as ArgoCD, and Kubernetes for managing deployments. This trend is expected to grow, making CI/CD pipelines more robust and efficient. Figure 2. Kubernetes GitOps Sustainability and Efficiency Cloud computing's carbon footprint now exceeds that of the airline industry, making sustainability and operational efficiency crucial in Kubernetes deployments. The Kubernetes community is actively developing features to optimize resource usage, reduce energy consumption, and enhance the overall efficiency of Kubernetes clusters. CNCF projects like KEDA (Kubernetes event-driven autoscaling) and Karpenter (just-in-time nodes for any Kubernetes cluster) are at the forefront of this effort. These tools not only contribute to cost savings but also align with global sustainability goals. Hybrid and Multi-Cloud Deployments According to the CNCF Annual Survey 2023, multi-cloud solutions are now the norm: Multi-cloud solutions (hybrid and other cloud combinations) are used by 56% of organizations. Deploying applications across hybrid and multi-cloud environments is one of Kubernetes' most significant advantages. This flexibility enables organizations to avoid vendor lock-in, optimize costs, and enhance resilience by distributing workloads across multiple cloud providers. Future developments in Kubernetes will focus on improving and simplifying management across different cloud platforms, making hybrid and multi-cloud deployments even more efficient. Increased Security Features Security continues to be a top priority for Kubernetes deployments. The community is actively enhancing security features to address vulnerabilities and emerging threats. These efforts include improvements to network policies, stronger identity and access management (IAM), and more advanced encryption mechanisms.
For instance, the 2024 CNCF open-source project velocity report highlighted that Keycloak, which joined CNCF last year as an incubating project, is playing a vital role in advancing open-source IAM, backed by a large and active community. Edge Computing Kubernetes is playing a crucial role in the evolution of edge computing. By enabling consistent deployment, monitoring, and management of applications at the edge, Kubernetes significantly reduces latency, enhances real-time processing capabilities, and supports emerging use cases like IoT and 5G. Projects like KubeEdge and K3s are at the forefront of this movement. We can expect further optimizations for lightweight and resource-constrained environments, making Kubernetes even more suitable for edge computing scenarios. Conclusion Kubernetes has revolutionized cloud-native computing, transforming how we develop, deploy, and manage applications. As Kelsey Hightower noted in Google's Kubernetes Podcast, "We are only halfway through its journey, with the next decade expected to see Kubernetes mature to the point where it 'gets out of the way' by doing its job so well that it becomes naturally integrated into the background of our infrastructure." Kubernetes' influence will only grow, shaping the future of technology and empowering organizations to innovate and thrive in an increasingly complex landscape. References:
"10 Years of Kubernetes" by Bob Killen et al., 2024
CNCF Annual Survey 2023 by CNCF, 2023
"As we reach mid-year 2024, a look at CNCF, Linux Foundation, and top 30 open source project velocity" by Chris Aniszczyk, CNCF, 2024
"Orchestration Celebration: 10 Years of Kubernetes" by Adrian Bridgwater, 2024
"Kubernetes: Beyond Container Orchestration" by Pratik Prakash, 2022
"The Staggering Ecological Impacts of Computation and the Cloud" by Steven Gonzalez Monserrate, 2022
Thanks to Cheney Zhang (Zilliz) Retrieval-Augmented Generation (RAG) techniques, by integrating external knowledge bases, provide additional contextual information for LLMs, effectively alleviating issues such as hallucination and insufficient domain knowledge of LLMs. However, relying solely on general knowledge bases has its limitations, especially when dealing with complex entity relationships and multi-hop questions, where the model often struggles to provide accurate answers. Introducing Knowledge Graphs (KGs) into the RAG system provides a new solution to this problem. KGs present entities and their relationships in a structured manner, offering more refined contextual information during retrieval. By leveraging the abundant relational data of KGs, RAG can not only pinpoint relevant knowledge more accurately but also better handle complex question-answering scenarios, such as comparing entity relationships or answering multi-hop questions. However, the current KG-RAG approach is still in its early exploration stage, and the industry has not yet reached a consensus on the relevant technical path; for instance, there is currently no unified paradigm for how to effectively retrieve relevant entities and relationships from the knowledge graph or how to combine vector similarity search with the graph structure. For example, Microsoft's From Local to Global aggregates subgraph structures into community summaries through a large number of LLM requests, but this process consumes a substantial number of LLM tokens, making the approach expensive and impractical. HippoRAG uses Personalized PageRank to update the weights of graph nodes and identify important entities, but this entity-centered method is easily affected by omissions in named entity and relation (NER) extraction and overlooks other information in the context. IRCoT uses multi-step LLM requests to gradually infer the final answer, but this method introduces the LLM into the multi-hop search process, which extends the time needed to answer questions and makes it difficult to implement in practice. We found that a simple RAG paradigm with multi-way retrieval and then reranking can handle complex multi-hop KG-RAG scenarios very well, without requiring excessive LLM overhead or any graph structure storage or algorithm. Despite using a very simple architecture, our method significantly outperforms current state-of-the-art solutions, such as HippoRAG, and only requires vector storage and a small amount of LLM overhead. We first introduce the theoretical basis of our method and then describe the specific process. Our simple pipeline is not much different from the common multi-way retrieval and rerank architecture, yet it achieves state-of-the-art (SoTA) performance in the multi-hop graph RAG scenario. Limited Hop Count Theory In real-life KG-RAG scenarios, we noticed a concept we call limited hop count: in KG-based RAG, an actual query only requires a limited and relatively small number of hops (usually fewer than four) within the knowledge graph. Our limited hop count theory is based on two critical observations: Limited complexity of queries Local dense structure of "shortcuts" 1. Limited Complexity of Queries A user's query is unlikely to involve numerous entities or introduce complex relationships. If it did, the question would seem peculiar and unrealistic. Normal query: "In which year did Einstein win the Nobel Prize?" Query path in the knowledge graph: Find the "Einstein" node.
Jump to the "Nobel Prize" node associated with "Einstein". Return the year the prize was awarded. Hop count: 2 hops Explanation: This is a standard user query, where the user wants to know a single fact directly associated with a specific entity. In this case, the knowledge graph only needs a few hops to complete the task, as all relevant information is directly linked to the central node, Einstein. This type of query is very common in practice, such as querying celebrity background information, award history, event time, etc. Weird query: "What is the relationship between the year the discoverer of the theory of relativity received the Nobel Prize and the number of patents they invented in a country famous for its bank secrecy laws and the magnificent scenery of the Alps?" Query path in the knowledge graph: Find that the "inventor" of "relativity" is "Einstein". Jump to the "Nobel Prize" node associated with "Einstein". Look up the year the "Nobel Prize" was awarded. Identify "Switzerland" through "bank secrecy laws and the Alps". Jump to the "patent" node associated with "Einstein". Look up patent information related to the period in Switzerland. Compare the relationship between the number of patents and the year of the award. Hop count: 7 hops Explanation: This question is complex, requiring not just a single fact query, but also intricate associations between multiple nodes. This type of question is not common in actual scenarios because users generally do not seek such complex cross-information in a single query. Usually, these types of questions are divided into multiple simple queries to gradually obtain information. You may think something about the number of hops sounds familiar, it's because all commonly used information is usually linkable in only a limited number of steps. You can see this in practice in the Six Degrees of Kevin Bacon. 2. Local Dense Structure of “Shortcuts” There are some local dense structures in the knowledge graph, and for some queries, there are "shortcuts" that can quickly connect to entities several hops away from one entity. Suppose we have a family relationship knowledge graph that contains the following entities and relationships: Alex is the child of Brian (Alex - child_of - Brian) Cole is married to Brian (Cole - married_to - Brian) Daniel is the brother of Cole (Daniel - brother_of - Cole) Daniel is the uncle of Alex (Daniel - uncle_of - Alex) This is a dense knowledge graph with redundant information. The last relationship can obviously be derived from the first three relationships. However, there are often some redundant information shortcuts in the knowledge graph. These shortcuts can reduce the number of hops between some entities. Based on these two observations, we find that the routing lookup process within the knowledge graph for a limited number of times only involves local knowledge graph information. Therefore, the process of retrieving information within the knowledge graph for a query can be implemented in the following two steps: The starting point of the route can be found through vector similarity lookup. It can involve the similarity relationship lookup between the query and entities or the query and relationships. The routing process to find other information from the starting point can be replaced with an LLM. Put this alternative information into the prompt, and rely on the powerful self-attention mechanism of LLM to select valuable routes. 
As the length of the prompt is limited, only local knowledge graph information can be put in, such as the knowledge graph information within a limited number of hops around the starting point, which is guaranteed to be sufficient by the limited hop count theory. The whole process does not need any dedicated KG storage or complex KG query statements; it only needs a Milvus vector database and a single LLM call. Vector retrieval with LLM reranking is the most critical part of this pipeline, and it explains why we can reach performance far beyond the methods based on graph theory (such as HippoRAG) with a traditional two-way retrieval architecture. It also shows that we do not actually need physical storage of the graph structure or complex graph query statements. We only need to store the logical relationships of the graph structure in the vector database; a traditional architecture can then perform logical sub-graph routing, and the capabilities of a modern LLM make this possible. Method Overview Our approach focuses solely on the passage retrieval phase within the RAG process, without any novel enhancements or optimizations in chunking or LLM response generation. We assume that we have acquired a set of triplet data from the corpus, incorporating a variety of entity and relationship information. This data represents the information of a knowledge graph. We vectorize the entity and relationship information individually and store them in vector storage, thus creating a logical knowledge graph. When receiving a query, we first retrieve the relevant entities and relationships. Leveraging these entities and relationships, we perform a limited expansion on the graph structure. The expanded relationships are integrated into the prompt along with the query, and the LLM is used to rerank them. Ultimately, we obtain the top-K vital relationships and fetch the related passages from their metadata; these serve as the final retrieved passages. Detailed Method Vector Storage We establish two vector storage collections: one for entities and one for relationships. Unique entities and relationship information are embedded into vectors via the embedding model and stored in vector storage. Entity information is converted into embeddings directly from its textual description. The original data form of a relationship is a triplet: (Subject, Predicate, Object). We directly combine the parts into a sentence, which is a heuristic method: "Subject Predicate Object". For instance: (Alex, child of, Brian) -> "Alex child of Brian" and (Cole, married to, Brian) -> "Cole married to Brian". This sentence is then directly transformed into an embedding and stored in the vector database. This approach is straightforward and efficient. Although minor grammatical issues may arise, they do not impact the conveyance of the sentence meaning or its distribution in the vector space. Of course, we also advocate for the use of an LLM to generate succinct sentence descriptions during the initial extraction of triplets. Vector Similarity Search For the input query, we adhere to the common paradigms in GraphRAG (such as HippoRAG and Microsoft GraphRAG): we extract entities from the query, transform each query entity into an embedding, and conduct a vector similarity search on the entity collection for each query entity. Subsequently, we merge the results obtained from all query entities' searches.
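To make the storage-and-search step above concrete, here is a minimal, self-contained Java sketch. It is not the authors' implementation and uses no Milvus client; the toy hashing "embedding" and the in-memory collections are placeholders standing in for a real embedding model and a vector database such as Milvus, purely to illustrate how triplets become "Subject Predicate Object" sentences and how the entity hits from the query are merged.
Java
import java.util.*;

public class LogicalKgSketch {
    // A triplet from the extraction step: (subject, predicate, object).
    record Triplet(String subject, String predicate, String object) {
        // Heuristic from the article: concatenate the parts into a plain sentence.
        String asSentence() { return subject + " " + predicate + " " + object; }
    }

    // Placeholder embedding: a tiny bag-of-words hash, NOT a real model.
    static float[] embed(String text) {
        float[] v = new float[64];
        for (String token : text.toLowerCase().split("\\s+")) {
            v[Math.floorMod(token.hashCode(), v.length)] += 1f;
        }
        return v;
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-9);
    }

    public static void main(String[] args) {
        List<Triplet> triplets = List.of(
                new Triplet("Alex", "child of", "Brian"),
                new Triplet("Cole", "married to", "Brian"),
                new Triplet("Daniel", "brother of", "Cole"));

        // "Entity collection" and "relationship collection" as in-memory maps from
        // text to embedding (a vector database would hold these in a real system).
        Map<String, float[]> entityCollection = new LinkedHashMap<>();
        Map<String, float[]> relationshipCollection = new LinkedHashMap<>();
        for (Triplet t : triplets) {
            entityCollection.putIfAbsent(t.subject(), embed(t.subject()));
            entityCollection.putIfAbsent(t.object(), embed(t.object()));
            relationshipCollection.put(t.asSentence(), embed(t.asSentence()));
        }

        // Query side: entities extracted from the query (extraction itself is out of scope here),
        // each searched against the entity collection, then the hits are merged.
        List<String> queryEntities = List.of("Alex", "Brian");
        Set<String> mergedEntityHits = new LinkedHashSet<>();
        for (String qe : queryEntities) {
            float[] q = embed(qe);
            entityCollection.entrySet().stream()
                    .sorted((x, y) -> Double.compare(cosine(q, y.getValue()), cosine(q, x.getValue())))
                    .limit(2) // top-k per query entity
                    .forEach(e -> mergedEntityHits.add(e.getKey()));
        }
        System.out.println("Merged entity hits: " + mergedEntityHits);
    }
}
In a real deployment, embed() would call an embedding model and the two maps would be two collections in Milvus, but the data flow is the same: triplet sentences go in, and merged entity hits come out as the starting points for the next step.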
For the vector search of relationships, we directly transform the query string into an embedding and perform a vector similarity search on the relationship collection. Expanding Subgraph We take the discovered entities and relationships as starting points in the knowledge graph and expand outward by a certain degree. For the initial entities, we expand a certain number of hops outward and include their adjacent relationships, denoted as $$Set(rel_1)$$. For the initial relationships, we expand a certain number of hops to obtain $$Set(rel_2)$$. We then unite these two sets: $$Set(merged) = Set(rel_1) \cup Set(rel_2)$$. Given the limited hop count theory, we only need to expand by a small number of degrees (e.g., 1 or 2) to encompass most of the relationships that could potentially assist in answering. Please note: the expansion degree in this step differs from the total number of hops required to answer a question. For instance, if answering a query involves two entities that are n hops apart, typically an expansion degree of only ⌈n / 2⌉ is necessary, as these two entities are the two starting endpoints recalled by the vector similarity search. As illustrated in the figure below, the vector retrieval stage returns two red entities, and starting from them, expanding 2 degrees in opposite directions can cover a 4-hop distance, which is sufficient to answer a 4-hop question involving these two entities. Large Language Model (LLM) Reranker In this stage, we deploy the powerful self-attention mechanism of the LLM to further filter and refine the candidate set of relationships. We employ a one-shot prompt, incorporating the query and the candidate set of relationships into the prompt, and instruct the LLM to select potential relationships that could assist in answering the query. Given that some queries may be complex, we adopt the Chain-of-Thought approach, allowing the LLM to articulate its thought process in its response. We have noted that this strategy provides some assistance to weaker models. We stipulate that the LLM's response be in JSON format for convenient parsing. The specific prompt is as follows: One-shot input prompt: Plain Text I will provide you with a list of relationship descriptions. Your task is to select 3 relationships that may be useful to answer the given question. Please return a JSON object containing your thought process and a list of the selected relationships in order of their relevance. **Question:** When was the mother of the leader of the Third Crusade born? **Relationship descriptions:** [1] Eleanor was born in 1122. [2] Eleanor married King Louis VII of France. [3] Eleanor was the Duchess of Aquitaine. [4] Eleanor participated in the Second Crusade. [5] Eleanor had eight children. [6] Eleanor was married to Henry II of England. [7] Eleanor was the mother of Richard the Lionheart. [8] Richard the Lionheart was the King of England. [9] Henry II was the father of Richard the Lionheart. [10] Henry II was the King of England. [11] Richard the Lionheart led the Third Crusade. One-shot output prompt: JSON { "thought_process": "To answer the question about the birth of the mother of the leader of the Third Crusade, I first need to identify who led the Third Crusade and then determine who his mother was.
After identifying his mother, I can look for the relationship that mentions her birth.", "useful_relationships": [ "[11] Richard the Lionheart led the Third Crusade", "[7] Eleanor was the mother of Richard the Lionheart", "[1] Eleanor was born in 1122" ] } This prompt serves as an illustrative reference. In reality, transforming the triplets of a relationship into a coherent sentence can be a challenging task. However, you can certainly employ the heuristic method mentioned above to directly assemble the triplets. For instance, (Eleanor, born in, 1122) can be directly transformed into "Eleanor was born in 1122". While this method may occasionally lead to certain grammatical issues, it is the quickest and most straightforward approach, and it will not mislead the LLM. Retrieving the Final Passages For the aforementioned example, it is feasible to directly return the final response during the LLM rerank phase, for instance, by adding a field such as "final answer" to the JSON of the one-shot output prompt. However, the information in this prompt is limited to the relationships, and not all queries can yield a final answer at this juncture; hence, other specific details should be obtained from the original passages. The LLM returns the relationships precisely sorted. All we need to do is look up the corresponding relationship data stored earlier and read the relevant metadata, where the corresponding passage IDs reside. These passages are the final retrieved passages. The subsequent process of generating responses is identical to naive RAG: the passages are incorporated into the context of the prompt, and the LLM generates the final answer. Results We employ the same dense embedding model as HippoRAG, facebook/contriever. The results show that our approach significantly surpasses both naive RAG and HippoRAG on three multi-hop datasets, with all methods using the same embedding model setting. We use Recall@2 as our evaluation metric, defined as Recall = (number of relevant documents retrieved) / (total number of relevant documents in the database). Our method outperforms naive RAG and HippoRAG on all of the multi-hop datasets, all compared using the same facebook/contriever embedding model. These results suggest that even the simplest multi-way retrieval and reranking RAG paradigm, when utilized in the graph RAG context, can deliver state-of-the-art performance. It further implies that appropriate vector retrieval and LLM adoption are crucial in the multi-hop QA scenario. Reflecting on our approach, the process of transforming entities and relationships into vectors and then retrieving them is like discovering the starting point of a subgraph, akin to uncovering "clues" at a crime scene. The subsequent subgraph expansion and LLM reranking resemble the process of analyzing these "clues". The LLM has a "bird's-eye view" and can intelligently select the beneficial and crucial relationships from a multitude of candidates. These two stages fundamentally correspond to the naive vector retrieval + LLM reranking paradigm. In practice, we recommend using open-source Milvus, or its fully managed version, Zilliz Cloud, to store and search a large volume of entities and relationships in graph structures. For the LLM, you can opt for open-source models like Llama-3.1-70B or the proprietary GPT-4o mini, as mid-to-large-scale models are well-equipped to handle these tasks. For the full code, see Graph RAG with Milvus.
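As a closing illustration of the reranking and passage-lookup step just described, here is a minimal sketch (not the authors' code) that parses the reranker's JSON response and maps the selected relationships back to the passage IDs stored in their metadata. It assumes a Jackson dependency on the classpath, and the relationshipToPassageIds map is a hypothetical stand-in for the metadata lookup in the vector store.
Java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.*;
import java.util.regex.*;

public class RerankOutputParser {
    public static void main(String[] args) throws Exception {
        // JSON shaped like the one-shot output prompt above.
        String llmResponse = """
                { "thought_process": "…",
                  "useful_relationships": [
                    "[11] Richard the Lionheart led the Third Crusade",
                    "[7] Eleanor was the mother of Richard the Lionheart",
                    "[1] Eleanor was born in 1122" ] }""";

        // Hypothetical metadata: relationship index -> passage IDs (would come from the vector store).
        Map<Integer, List<String>> relationshipToPassageIds = Map.of(
                11, List.of("passage-304"), 7, List.of("passage-112"), 1, List.of("passage-045"));

        JsonNode root = new ObjectMapper().readTree(llmResponse);
        Pattern indexPattern = Pattern.compile("^\\[(\\d+)\\]");

        // Collect passage IDs in the LLM's relevance order, without duplicates.
        LinkedHashSet<String> finalPassages = new LinkedHashSet<>();
        for (JsonNode rel : root.get("useful_relationships")) {
            Matcher m = indexPattern.matcher(rel.asText());
            if (m.find()) {
                int index = Integer.parseInt(m.group(1));
                finalPassages.addAll(relationshipToPassageIds.getOrDefault(index, List.of()));
            }
        }
        System.out.println("Final retrieved passages: " + finalPassages);
    }
}
The downstream generation step then simply concatenates these passages into the prompt context, exactly as in naive RAG.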
The DevOps model broke down the wall between development and production by assigning deployment and production management responsibilities to the application engineers and providing them with infrastructure management tools. This approach expanded engineers' competencies beyond their initial skill sets. This model helped companies gain velocity as applications weren't passed around from team to team, and owners became responsible from ideation to production. It shortened the development lifecycle and time to deployment, making companies more agile and responsive. DevOps became the logical recommendation for fast-paced digital transformation, and most engineering organizations built in recent years follow this strategy. Limitations of DevOps and the Need for Platform Engineering The DevOps movement had one significant implication: it heightened the cognitive load on all development teams regarding infrastructure, deployment, and operations. Learning and deeply understanding the various parts of the stack is a lot of work for every engineer, on top of developing an increasing number of complex applications. This had unintended drawbacks. Spreading engineers' competencies thinner lowered their productivity on their core applications. All the time spent learning infrastructure as code or setting up testing and continuous integration pipelines resulted in less time to work on customer-impacting features. This further impacted hiring and onboarding, as engineers competent in many parts of the technology stack are rarer than strict application engineers. Ramping up on an entire stack is significantly more demanding than ramping up on a single application. Internal training became an even more critical and under-invested part of an engineering organization. Finally, as engineers' scope kept creeping up, the operational load could generate stress and burnout in already pressured teams, potentially impacting delivery. To alleviate these concerns, DevOps had to evolve towards offering a turnkey service to engineers: not asking them to learn new technologies, while still giving them the same control and ownership of their applications. Companies must build a golden path of tools and automation from onboarding to production for all engineering teams to thrive. The resulting practice is called platform engineering, and it is the natural evolution of DevOps within an organization. What Is Platform Engineering? Platform engineering evolves and consolidates multiple disciplines within a standard engineering organization. A platform team's primary role is to deliver ways for product teams to build and deploy their applications effortlessly. It takes examples from what cloud-native architecture (think Platform-as-a-Service) offers its customers and builds a replica of that model tailor-fitted to the company and its engineering organization. Product teams act as customers of their internal platform, with the platform team serving as an intermediary between development and production. Thus, by contract, a platform team is also responsible for offering observability as a service to its users. While this can be broken down into separate entities, deployment and monitoring are interlinked and must be provided as a one-stop solution to manage applications, from ideation to production and scaling. Knowing that each engineering organization has its complexities and particularities, what binds these services together is a cohesive developer experience tailored to the daily life of engineers.
Building interfaces (user interfaces, command-line interfaces, or APIs) is the main differentiator and the exciting work of an effective platform team. This first helps improve team velocity during a standard development lifecycle. It speeds up onboarding, allowing the company to iterate on new products and services more quickly for its users. When Does It Make Sense To Build a Platform Team? Not all engineering organizations need a platform team, as it often depends on the organization's size and growth rate. As a rule of thumb, it is worth investing in a platform team once you have multiple products (customer-facing or otherwise) concurrently being developed by independent teams. That point usually comes when an engineering organization grows beyond 20 engineers. Another good indicator is the expected growth rate of the company. If there are plans to hire quickly and develop new products within the next year or so, investing now in platform engineering while focusing on onboarding and reducing friction during development is critical to maximizing a growth period and alleviating the typical productivity slump that comes with scaling an organization quickly. An Effort of Standardization A common challenge that a platform team faces is adoption within the company. As stated earlier, investing in building this team only makes sense once a company has found some velocity and applications are being regularly and continuously deployed to customers. This initial velocity sometimes comes at the price of technological choices, deployment strategies, and tooling. We can expect drift between teams as the company grows and technological decisions evolve. Once an organization introduces a platform team, it must decide on the direction of the tools and technologies to be used by the company in the future. While there should be as few migrations as possible, they are sometimes inevitable to bring an engineering organization to a better place. That means convincing other teams to take time out of their roadmaps to adopt new tools that will save time and complexity down the line. This investment can be a hard sell, as it impacts product roadmaps. To help with buy-in, stating the organization-wide intention to standardize on a single platform with shared tools legitimizes the effort for all. Product Teams Regain Focus On Their Offering The first natural objective of building tooling is reducing the friction for developers to build, deploy, and manage their applications. Initially, that can mean providing access to whatever infrastructure a team needs to run its service; eventually, it means removing the need to manage infrastructure at all so that teams spend most of their time on application development. It is critical to work with all teams regularly and assess their day-to-day, understanding where and when they spend time on something other than their application. Highlighting major friction points helps the platform team drive its roadmap. Time-to-build, time-to-deploy, and time-to-test are the main KPIs (key performance indicators) to track: every minute saved accelerates the company's delivery to users. Internal Open-Source Culture By having multiple teams benefit from the same tooling offered by a platform team, there is an opportunity to build an internal open-source culture where users contribute to the tools they use daily.
Allowing anyone to look into the source code of their tools, and welcoming them to suggest and add any feature that helps in their day-to-day work, builds confidence in the platform they are using and strengthens buy-in. An ecosystem is only as robust as its users' contributions. Inviting all engineers to sculpt their environment however they want builds a more cohesive and dynamic engineering organization. This culture often reaches outside the platform scope, producing software that benefits any of the company's applications and products. It further improves the organization's engineering excellence and its ability to deliver great products. Operational Security Benefits Automating and standardizing infrastructure-related efforts for all teams has one key benefit for your organization: heightened operational security. A controlled and standardized environment reduces the potential attack surface at the infrastructure level. Application security then becomes the de facto focus moving forward, as it is the moving piece that keeps growing while the company continues to deliver products and features to its users. Cloud Scaling and Cost Control Having infrastructure normalized between multiple teams and products allows the organization to work on multi-tenancy and shared resources. It can materialize as clusters of agnostic machines running all applications within a company. It gives the company complete control over its infrastructure and the ability to scale as needed. It also allows for better efficiency: when every team shares the same resources, they are easier to allocate, and costs are easier to manage and keep under control. A Golden Path From Onboarding to Production The main success criterion for a platform team to demonstrate is the ability to take an idea into production with the least friction possible. By showing that any code can be taken to production autonomously (without any manual actions from a controlling team) and quickly (continuous deployment integrated by default, automatic provisioning of domain names and load balancers, etc.), a platform team demonstrates a clear, attractive solution for a company to rely on and grow. With automation and intuitive tooling, one can build an engineering organization where a newcomer can join a team and immediately contribute to their codebase. This onboarding time is critical to keeping velocity during growth. Controlled Innovation It is important to remember that infrastructure innovation remains necessary over the years. While you want strong standards and control, evaluating the possibility of using different types of resources for various use cases is valuable in the long run. Introducing innovation as an internal platform provider forces you to fully understand the ins and outs of a technology before offering it as a service. Thus, you must be fully competent as an operator before running critical applications on said infrastructure. Reliability and Incident Management Standardizing and offering lean services to the engineering team is a forcing function to build up reliability across the board. As observability is factored into a solution already running for other services, the risk of launching a new service is lower than it would be when spinning up fresh infrastructure. Relying on tried-and-true services to build an application also means that incident management is more approachable, as the time to learn and understand said infrastructure is factored into the service offered by a platform team.
Conclusion The evolution from the DevOps model towards a fully fledged platform team is an exercise in maturity for any engineering organization to consider. Once a company reaches a critical size, runs a certain number of products, and continuously delivers services and products to its customers with reliability and scale, it needs to invest in a solid foundation for its engineering teams to thrive. From scaling to security and reliability, building that foundation is a decision made at the top of the company, and one that will make every part of its technology better and set it up for a stronger future.
From a Java perspective, I’ve been the beneficiary of some pretty amazing features over the years: Generics (Java 5) Streams and Lambda Expressions (Java 8) Enhanced Collection Functionality (Java 9) Sealed Classes (Java 17) As key features become available, I’ve been able to reduce development time as I implement features, while also seeing benefits in performance and supportability. However, one area that seems to have lagged behind is the adoption of a key internet protocol: HTTP/2. While the second major release has been around for over nine years, migration from the 1.x version has been slower than expected. I wanted to explore HTTP/2 to understand not only the benefits but also what it looks like to adopt this new version. In this article, we’ll look at my anecdotal experience, plus some challenges I found, too. About HTTP/2 HTTP/2 was released in May 2015 and included the following improvements over the prior version of HTTP: Multiplexing: Allows multiple requests and responses to be sent over a single connection Header compression: Reduces the overhead of HTTP headers by using compression Server push: Enables servers to send resources to a client proactively Resource prioritization: Allows consumers to specify the importance of given resources, affecting the order in which they’re loaded Binary protocol: Provides an alternative to the text-based format of HTTP/1.x Additionally, HTTP/2 is backward compatible with HTTP/1.x. Common Use Cases for HTTP/2 Below are just some of the strong use cases for HTTP/2: A full-stack application that contains a chatty consumer, constantly communicating with the service tier An application that relies heavily on content being stored inside the HTTP headers A solution that is dependent on server-initiated events to broadcast updates to consumers A client application that can benefit from providing prioritization instructions to the underlying service A web client that requires large amounts of data to be retrieved from the service tier Migrating to HTTP/2 for any of these use cases could provide noticeable improvements from a consumer perspective. What’s Involved With HTTP/2 Adoption? When I think about a lower-than-expected adoption rate of 45%-50% (as noted in this Cloudflare blog), I wonder if developers believe the upgrade to HTTP/2 won’t be easy. But I don’t get why they feel that way. After all, HTTP/2 is backward compatible with HTTP/1.x. Using Spring Boot 3.x (which requires Java 17+ and runs on an embedded server such as Jetty) as an example, upgrading to HTTP/2 is actually kind of easy. The biggest hurdle is making sure you are using SSL/TLS — which is honestly a good idea for your services anyway. Properties files server.port=8443 server.ssl.key-store=classpath:keystore.p12 server.ssl.key-store-password=<your_password_goes_here> server.ssl.key-store-type=PKCS12 server.ssl.key-alias=<your_alias_goes_here> With SSL/TLS in place, you just need to enable HTTP/2 via this property: Properties files server.http2.enabled=true At this point, your service will start and utilize HTTP/2. By default, all of the features noted above will be ready for use. Is that all there is to it? But Wait … There’s More to the Story Depending on where your service is running, the effort to upgrade to HTTP/2 might not yield the results you were expecting. This is because network infrastructure often stands between your service and the consumers wanting to take advantage of HTTP/2 greatness. That layer needs to fully support HTTP/2 as well. What does this mean?
It means your service could receive the request and provide an HTTP/2 response, only for a router to downgrade the protocol to HTTP/1.1. Here’s the big takeaway: before you get all excited about using HTTP/2 and spend time upgrading your services, you should confirm that your network layer supports it. Recently, Heroku announced support for HTTP/2 at the router level. They’re addressing this exact scenario, and HTTP/2 is now available in a public beta. The illustration below demonstrates how they make HTTP/2 service responses available to consumers: This initial push from Heroku makes it possible for service developers like me to build applications that can take advantage of HTTP/2 features like header compression and multiplexing. This means faster delivery to consumers while potentially reducing compute and network loads. If your cloud provider doesn’t have infrastructure that supports HTTP/2, requests to your HTTP/2-enabled service will result in an HTTP/1.x response. As a result, you won’t get the HTTP/2 benefits you’re looking for. Challenges With HTTP/2 While my own experience of upgrading my Spring Boot services to leverage HTTP/2 hasn’t run up against any significant challenges — especially with support now at the cloud provider network level — I am reading more about others who’ve struggled with the adoption. Based on some of the customer experiences I’ve found, here are some items to be aware of during your journey to HTTP/2: Increase in compute cost: These features can require more processing power than you may have needed for HTTP/1.x. Impact on other portions of the response: After adding SSL/TLS to your service, expect that additional time will be required for this layer of processing. Advanced features can be misconfigured: You’ll want to understand concepts like multiplexing, stream prioritization, flow control, and header compression, as these items can negatively impact performance if not configured correctly. If your path to production includes dev, QA, and staging environments, you should be able to identify and mitigate any of these hurdles long before your code reaches production. Conclusion My readers may recall my personal mission statement, which I feel can apply to any IT professional: “Focus your time on delivering features/functionality that extends the value of your intellectual property. Leverage frameworks, products, and services for everything else.” — J. Vester Upgrading to HTTP/2 certainly adheres to my mission statement by giving service owners features like multiplexing, header compression, resource prioritization, and binary responses — all of which can impact the overall performance of a service or the consuming application. At the same time, cloud providers who support HTTP/2 — like Heroku — also get credit for adhering to my mission statement. Without this layer of support, applications that interact with these services wouldn’t be able to take advantage of these benefits. When I reflect on my personal experience with Java, I can’t imagine a world where I am writing Java code without using generics, streams, lambdas, enhanced collections, and sealed classes. All of these features are possible because I took the time to see the benefits and perform the upgrade. The question really isn’t whether you should upgrade to HTTP/2, but rather which upcoming development iteration will cover this enhancement. Have a really great day!
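To verify which protocol actually reaches the client end to end (and to catch the router downgrade scenario described above), a quick check with the JDK's built-in HttpClient can help. This is a minimal sketch, not from the original article; the https://localhost:8443/ URL matches the sample properties shown earlier, and it assumes the service's certificate is trusted by the JVM (a self-signed keystore would need extra truststore configuration).
Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Http2Check {
    public static void main(String[] args) throws Exception {
        // Ask for HTTP/2; the client transparently falls back to HTTP/1.1
        // if the server (or an intermediary) does not support it.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();

        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("https://localhost:8443/")) // endpoint from the sample properties above
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // Version.HTTP_2 here means the whole path negotiated HTTP/2;
        // HTTP_1_1 means something along the way downgraded the protocol.
        System.out.println("Negotiated protocol: " + response.version());
        System.out.println("Status: " + response.statusCode());
    }
}
Running the same check against the route exposed by your cloud provider, rather than directly against the service, is what reveals whether the network layer preserves HTTP/2.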
Distributed services have revolutionized the design and deployment of applications in the modern world of cloud-native architecture: these autonomous, loosely coupled services provide flexibility, scalability, and resilience. They also add complexity to our systems, especially around cross-cutting concerns such as logging, monitoring, security, and configuration. As a fundamental design concept, the sidecar pattern enhances a distributed architecture in a seamless and scalable manner. Throughout this article, we explore what the sidecar pattern offers, its use cases, and why it has become so widely used in cloud-native environments. What Is the Sidecar Pattern? The sidecar pattern describes a design that deploys an auxiliary service — a sidecar — alongside the container of a primary application. The sidecar runs in its own container or process but shares the same context as the primary application, such as the network and storage. The objective is to offload non-core functionality — security, logging, or configuration — to this auxiliary container and let the primary service focus on the core application logic. Think of it as attaching a sidecar to a motorcycle. The motorcycle is your app, and the sidecar provides support without getting in the way of the motorcycle's operation. Why Use the Sidecar Pattern? The sidecar pattern offloads non-core functionality such as authentication, logging, or configuration into a separate component. This ensures that your main service has only one concern, business logic, making it easier to maintain and test. Moreover, sidecars do not depend on the main application's language or technology stack, which allows you to standardize concerns across multiple services written in any language. Once a sidecar has been written, it can be reused across many services, which keeps its functionality consistent. For instance, a logging sidecar applied to multiple microservices would result in common log formatting and delivery. Since sidecars can take care of logging, tracing, or metrics gathering independently, they provide a clean way to inject observability into services without touching their business logic. This grants much more visibility and better troubleshooting. Finally, updating the logic of a sidecar, such as upgrading a security feature, doesn't require changes to the main application. This provides greater agility while reducing downtime, especially in large distributed systems. Thus, sidecars enable: Separation of concerns Modular and reusable components Improved observability Easier service updates Key Use Cases for the Sidecar Pattern Service Meshes One of the most well-known uses of the sidecar pattern is in service meshes, such as Istio or Linkerd. The sidecar proxy (Envoy, for example) manages networking concerns such as routing, load balancing, retries, and even security between services — for example, mutual TLS. The sidecar provides a transparent layer of control without changing application code. Security Enhancements Various security policies can be implemented via sidecars, including secret management, certificate rotation, or data encryption. As a specific example, mutual authentication between services can be handled by a sidecar, keeping sensitive data transmissions secure.
Monitoring and Logging Centralized logging can run in sidecar containers, such as Fluentd or Logstash, which collect and forward logs to a central server, abstracting log management away from the application. Similarly, a monitoring sidecar can expose an application's metrics to a monitoring system like Prometheus. Configuration Management Another use case for sidecars is to dynamically load and inject configuration data into the main application. This is useful when configurations need to change at runtime without restarting the main service. Things To Consider When Using the Sidecar Pattern While the sidecar pattern has several advantages, it's equally important to be aware of the trade-offs it makes: Resource overhead: Sidecars consume CPU, memory, and networking resources. Multiple sidecars performing different tasks, such as logging or monitoring, will increase resource consumption. Operational complexity: Running sidecars for many services is an operational task and a challenge. Much like the main services, sidecars need to be correctly deployed, updated, and monitored. Network latency: Since most sidecars interact over the network, proxy sidecars, for instance, could introduce additional network latency. This is often negligible, but it is an important consideration where performance is sensitive. Best Practices for Applying the Sidecar Pattern Because sidecars share the same resources as the main container, it is good practice to keep sidecar processes lightweight to reduce resource contention. Reserve sidecars for cross-cutting concerns like logging, security, and configuration. Core business logic should not go into the sidecar, since this can cause tight coupling between the application and the sidecar. Monitor your sidecars just as you monitor your application services: resource consumption, failure rates, and performance degradation. If using service mesh technologies that depend on sidecars, ensure that the benefits brought by sidecar injection, such as observability and security, justify the increased operational complexity. Conclusion Offloading cross-cutting concerns to sidecars creates more modular, reusable, and maintainable services. As with any architectural pattern, one needs to balance the benefits against the potential overhead and the level of complexity it introduces. Used correctly, the sidecar pattern can greatly simplify a distributed architecture while retaining flexibility and scalability. As cloud-native architecture continues to evolve, the sidecar pattern will undoubtedly remain an important strategy for dealing with the increasing complexity of distributed systems.