When you are using mvn install in your build server, you should ask yourself the question of whether this is correct. You might be at risk without knowing it. In this blog, the problems with mvn install are explained, and solutions are provided. Introduction You are using a build server for several reasons: builds are not dependent on a developer’s machine, you can run long integration tests or regression tests on the build server without blocking a developer’s machine, you run security checks on the build server, SonarQube analysis, and so on. But are your artifacts correct? You might be surprised when someone tells you that feature-branch artifacts can slip into your develop or main branch artifacts without you knowing. But this is exactly what can happen if you do not take the correct measures. In the remainder of this blog, the problems that can arise are shown, and how you can fix them. The sources used in this blog can be found on GitHub. Prerequisites Prerequisites for reading this blog are: You need to be familiar with Maven (a great tutorial can be found here);Basic knowledge of Java;Basic knowledge of Git branching strategies. Sample Applications The source code contains three sample applications. 1. No Dependencies Application nodep is a Java library that contains a single class NoDependencies with a method returning a string. Java public class NoDependencies { public String out() { return "No dependencies initial version"; } } 2. Dependencies Application dep is a basic Java application containing a class Dependencies which uses the nodep library and prints the string to the console. Java public class Dependencies { public static void main(String[] args) { NoDependencies noDependencies = new NoDependencies(); System.out.println("Dependencies has artifact: " + noDependencies.out()); } } The library nodep is added as a dependency in the pom file. XML <dependencies> <dependency> <groupId>com.mydeveloperplanet.mymavenverifyinstallplanet</groupId> <artifactId>nodep</artifactId> <version>0.0.1-SNAPSHOT</version> </dependency> </dependencies> 3. Multimodule Application multimodule is a Maven multimodule project that consists of two modules, multimodulenodep and multimoduledep, which are similar to the ones described above. Verify vs. Install First, let’s investigate the difference between mvn verify and mvn install. 1. Verify Step into the nodep directory and build the application. Shell $ cd nodep $ mvn clean verify ... [INFO] --- jar:3.3.0:jar (default-jar) @ nodep --- [INFO] Building jar: /home/<project directory>/mymavenverifyinstallplanet/nodep/target/nodep-0.0.1-SNAPSHOT.jar [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 0.967 s [INFO] Finished at: 2024-11-03T15:10:12+01:00 [INFO] ------------------------------------------------------------------------ The build is successful, and the output shows that the artifact nodep-0.0.1-SNAPSHOT.jar was put in the target directory. Exit the nodep directory, step into the dep directory, and build the application. Shell $ cd .. $ cd dep $ mvn clean verify ... 
[INFO] Building Dependencies 0.0.1-SNAPSHOT [INFO] from pom.xml [INFO] --------------------------------[ jar ]--------------------------------- [WARNING] The POM for com.mydeveloperplanet.mymavenverifyinstallplanet:nodep:jar:0.0.1-SNAPSHOT is missing, no dependency information available [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 0.196 s [INFO] Finished at: 2024-11-03T15:13:06+01:00 [INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal on project dep: Could not resolve dependencies for project com.mydeveloperplanet.mymavenverifyinstallplanet:dep:jar:0.0.1-SNAPSHOT: The following artifacts could not be resolved: com.mydeveloperplanet.mymavenverifyinstallplanet:nodep:jar:0.0.1-SNAPSHOT (absent): Could not find artifact com.mydeveloperplanet.mymavenverifyinstallplanet:nodep:jar:0.0.1-SNAPSHOT -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException The build fails because the artifact nodep-0.0.1-SNAPSHOT.jar cannot be found in the local Maven repository (~/.m2/repository) and it is also not available in a remote Maven repository. Exit the dep directory, step into the multimodule directory, and build the application. Shell $ cd .. $ cd multimodule/ $ mvn clean verify ... [INFO] --- jar:3.3.0:jar (default-jar) @ multimoduledep --- [INFO] Building jar: /home/<project directory>/mymavenverifyinstallplanet/multimodule/multimoduledep/target/multimoduledep-0.0.1-SNAPSHOT.jar [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary for Multimodule project 0.0.1-SNAPSHOT: [INFO] [INFO] Multimodule project ................................ SUCCESS [ 0.085 s] [INFO] No dependencies .................................... SUCCESS [ 0.627 s] [INFO] Dependencies ....................................... SUCCESS [ 0.111 s] [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 0.874 s [INFO] Finished at: 2024-11-03T15:17:07+01:00 [INFO] ------------------------------------------------------------------------ This builds successfully because the Maven Reactor, which is responsible for the build, knows where the artifacts are located, and thus, dependencies between modules are successfully resolved. 2. Install The solution for solving the problem with the dep application is to use mvn install. This will install the artifacts in the local Maven repository, and then they can be resolved. Step into the nodep directory and build the application. Shell $ cd nodep $ mvn clean install ... 
[INFO] --- install:3.1.1:install (default-install) @ nodep --- [INFO] Installing /home/<project directory/mymavenverifyinstallplanet/nodep/pom.xml to /home/<user>/.m2/repository/com/mydeveloperplanet/mymavenverifyinstallplanet/nodep/0.0.1-SNAPSHOT/nodep-0.0.1-SNAPSHOT.pom [INFO] Installing /home/<project directory/mymavenverifyinstallplanet/nodep/target/nodep-0.0.1-SNAPSHOT.jar to /home/<user>/.m2/repository/com/mydeveloperplanet/mymavenverifyinstallplanet/nodep/0.0.1-SNAPSHOT/nodep-0.0.1-SNAPSHOT.jar [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 0.868 s [INFO] Finished at: 2024-11-03T15:21:46+01:00 [INFO] ------------------------------------------------------------------------ Again, this builds successfully. The only difference is that the artifacts are now also installed in the local Maven repository. Exit the nodep directory, step into the dep directory, and build the application. Shell $ cd .. $ cd dep $ mvn clean install ... [INFO] --- install:3.1.1:install (default-install) @ dep --- [INFO] Installing /home/<project directory>/mymavenverifyinstallplanet/dep/pom.xml to /home/<user>/.m2/repository/com/mydeveloperplanet/mymavenverifyinstallplanet/dep/0.0.1-SNAPSHOT/dep-0.0.1-SNAPSHOT.pom [INFO] Installing /home/<project directory>/mymavenverifyinstallplanet/dep/target/dep-0.0.1-SNAPSHOT.jar to /home/<user>/.m2/repository/com/mydeveloperplanet/mymavenverifyinstallplanet/dep/0.0.1-SNAPSHOT/dep-0.0.1-SNAPSHOT.jar [INFO] Installing /home/<project directory>/mymavenverifyinstallplanet/dep/target/dep-0.0.1-SNAPSHOT-jar-with-dependencies.jar to /home/<user>/.m2/repository/com/mydeveloperplanet/mymavenverifyinstallplanet/dep/0.0.1-SNAPSHOT/dep-0.0.1-SNAPSHOT-jar-with-dependencies.jar [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 1.110 s [INFO] Finished at: 2024-11-03T15:24:16+01:00 [INFO] ------------------------------------------------------------------------ This time, the application builds successfully as the nodep-0.0.1-SNAPSHOT.jar dependency can be found in the local Maven repository. Exit the dep directory, step into the multimodule directory, and build the application. Shell $ cd .. $ cd multimodule/ $ mvn clean install ... [INFO] --- install:3.1.1:install (default-install) @ multimoduledep --- [INFO] Installing /home/<project directory>/mymavenverifyinstallplanet/multimodule/multimoduledep/pom.xml to /home/<user>/.m2/repository/com/mydeveloperplanet/mymavenverifyinstallplanet/multimoduledep/0.0.1-SNAPSHOT/multimoduledep-0.0.1-SNAPSHOT.pom [INFO] Installing /home/<project directory>/mymavenverifyinstallplanet/multimodule/multimoduledep/target/multimoduledep-0.0.1-SNAPSHOT.jar to /home/<user>/.m2/repository/com/mydeveloperplanet/mymavenverifyinstallplanet/multimoduledep/0.0.1-SNAPSHOT/multimoduledep-0.0.1-SNAPSHOT.jar [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary for Multimodule project 0.0.1-SNAPSHOT: [INFO] [INFO] Multimodule project ................................ SUCCESS [ 0.125 s] [INFO] No dependencies .................................... SUCCESS [ 0.561 s] [INFO] Dependencies ....................................... 
SUCCESS [ 0.086 s] [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 0.825 s [INFO] Finished at: 2024-11-03T15:26:57+01:00 [INFO] ------------------------------------------------------------------------ Also, this build is still successful, and the artifacts can be found in the local Maven repository. Feature Branches So, problem solved, right? Not exactly. If you build develop branches and feature branches on your build server, you might get some unpleasant surprises. First, let’s run the dep application with the artifact in the local Maven repository. Shell $ java -jar ~/.m2/repository/com/mydeveloperplanet/mymavenverifyinstallplanet/dep/0.0.1-SNAPSHOT/dep-0.0.1-SNAPSHOT-jar-with-dependencies.jar Dependencies has artifact: No dependencies initial version As expected, the initial version text is printed. Let’s create a feature branch for the nodep application and change the text. You can just switch to branch feature-newversion. The NoDependencies class is the following. Java public class NoDependencies { public String out() { return "No dependencies new version"; } } In real life, you commit and push the changes to your remote git repository, and the build server starts building the feature branch using mvn install. So, in order to simulate this behavior, build the nodep application locally. Shell $ cd nodep $ mvn clean install Switch back to the master branch and build the dep application. Shell $ cd dep $ mvn clean install Run the application. Shell $ java -jar ~/.m2/repository/com/mydeveloperplanet/mymavenverifyinstallplanet/dep/0.0.1-SNAPSHOT/dep-0.0.1-SNAPSHOT-jar-with-dependencies.jar Dependencies has artifact: No dependencies new version Voila! The dep artifact now contains the jar file build by the feature branch, not the one in the master branch. This means that without you knowing, unfinished work slides into a more stable branch. Conclusion: you cannot use mvn install for feature branches, only for the develop or master/main branch. Build Stages Problem solved? Not exactly. It is quite common that your build pipeline consists of several stages. Each stage can be a separate invocation of a Maven command. Can the build pipeline be successfully executed when mvn verify is used? First, clear the local Maven repository. Shell $ rm -r ~/.m2/repository/com/mydeveloperplanet/mymavenverifyinstallplanet/ Step into the nodep directory, build the application and execute the Exec Maven Plugin which will just echo Hello there to the console. Shell $ cd nodep $ mvn clean verify $ mvn exec:exec [INFO] Scanning for projects... [INFO] [INFO] -------< com.mydeveloperplanet.mymavenverifyinstallplanet:nodep >------- [INFO] Building No dependencies 0.0.1-SNAPSHOT [INFO] from pom.xml [INFO] --------------------------------[ jar ]--------------------------------- [INFO] [INFO] --- exec:3.5.0:exec (default-cli) @ nodep --- Hello there [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 0.183 s [INFO] Finished at: 2024-11-03T15:47:00+01:00 [INFO] ------------------------------------------------------------------------ This works. Before continuing with the dep application, you must ensure that the nodep jar can be resolved. Therefore, execute an install. 
This will simulate that the nodep artifact can be retrieved from a remote Maven repository. Shell $ mvn clean install Exit the nodep directory, step into the dep directory, build the application, and execute the Exec Maven Plugin. Shell $ cd .. $ cd dep $ mvn clean verify $ mvn exec:exec [INFO] Scanning for projects... [INFO] [INFO] --------< com.mydeveloperplanet.mymavenverifyinstallplanet:dep >-------- [INFO] Building Dependencies 0.0.1-SNAPSHOT [INFO] from pom.xml [INFO] --------------------------------[ jar ]--------------------------------- [INFO] [INFO] --- exec:3.5.0:exec (default-cli) @ dep --- Hello there [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 0.178 s [INFO] Finished at: 2024-11-03T15:50:34+01:00 [INFO] ------------------------------------------------------------------------ Also successful. Exit the dep directory, step into the multimodule directory, build the application, and execute the Exec Maven Plugin. Shell $ cd .. $ cd multimodule/ $ mvn clean verify $ mvn exec:exec ... [INFO] --< com.mydeveloperplanet.mymavenverifyinstallplanet:multimoduledep >--- [INFO] Building Dependencies 0.0.1-SNAPSHOT [3/3] [INFO] from multimoduledep/pom.xml [INFO] --------------------------------[ jar ]--------------------------------- [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary for Multimodule project 0.0.1-SNAPSHOT: [INFO] [INFO] Multimodule project ................................ SUCCESS [ 0.080 s] [INFO] No dependencies .................................... SUCCESS [ 0.015 s] [INFO] Dependencies ....................................... FAILURE [ 0.005 s] [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 0.175 s [INFO] Finished at: 2024-11-03T15:51:57+01:00 [INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal on project multimoduledep: Could not resolve dependencies for project com.mydeveloperplanet.mymavenverifyinstallplanet:multimoduledep:jar:0.0.1-SNAPSHOT: The following artifacts could not be resolved: com.mydeveloperplanet.mymavenverifyinstallplanet:multimodulenodep:jar:0.0.1-SNAPSHOT (absent): Could not find artifact com.mydeveloperplanet.mymavenverifyinstallplanet:multimodulenodep:jar:0.0.1-SNAPSHOT -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn <args> -rf :multimoduledep This fails. Because the multimoduledep module depends on the multimodulenodep module, the Maven Reactor cannot find the necessary artifacts because the Maven commands are executed in two separate runs. If you execute the commands simultaneously, the build will be successful. Shell $ mvn clean verify exec:exec This is how this problem can be solved, but then you cannot have multiple build stages anymore. 
Final Solution This last problem can be fixed by adding some extra arguments when executing the Maven commands. These arguments are used in combination with mvn install and install the artifacts in a separate part of the local Maven repository. Add the following arguments: -Daether.enhancedLocalRepository.split and -Daether.enhancedLocalRepository.localPrefix=branches/feature-newversion The first enables a split local repository. The second defines the prefix within the ~/.m2/repository directory under which the artifacts of your build are stored. You can choose the local prefix yourself. 1. Feature Branches Check out the feature branch feature-newversion and execute the following commands for each application. Shell $ mvn clean install -Daether.enhancedLocalRepository.split -Daether.enhancedLocalRepository.localPrefix=branches/feature-newversion $ mvn exec:exec -Daether.enhancedLocalRepository.split -Daether.enhancedLocalRepository.localPrefix=branches/feature-newversion Every build is successful, and the Exec Maven Plugin can also be executed successfully. All artifacts are now located in ~/.m2/repository/branches/feature-newversion. Run the dep application; the new version of nodep is used. Shell $ java -jar ~/.m2/repository/branches/feature-newversion/com/mydeveloperplanet/mymavenverifyinstallplanet/dep/0.0.1-SNAPSHOT/dep-0.0.1-SNAPSHOT-jar-with-dependencies.jar Dependencies has artifact: No dependencies new version 2. Master Branch Switch to the master branch and execute the following commands for each application. Shell $ mvn clean install $ mvn exec:exec Every build is successful, and the Exec Maven Plugin can also be executed successfully. All artifacts are located in ~/.m2/repository/. Run the dep application; the initial version of nodep is used. Shell $ java -jar ~/.m2/repository/com/mydeveloperplanet/mymavenverifyinstallplanet/dep/0.0.1-SNAPSHOT/dep-0.0.1-SNAPSHOT-jar-with-dependencies.jar Dependencies has artifact: No dependencies initial version Conclusion Depending on the setup of your build server, you can choose one of the solutions above. So, should you use mvn verify or mvn install? Well, it depends. The solution providing the most flexibility is the one shown in the final solution, as illustrated in the sketch below.
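As a minimal sketch of how this can look on a build server, the local prefix can be derived from the current Git branch instead of being hard-coded. The branch-detection command and the BRANCH_NAME variable are illustrative assumptions, not part of the project above.

Shell
# Hypothetical CI build step: isolate artifacts per branch in the local Maven repository
BRANCH_NAME=$(git rev-parse --abbrev-ref HEAD)
mvn clean install \
  -Daether.enhancedLocalRepository.split \
  -Daether.enhancedLocalRepository.localPrefix="branches/${BRANCH_NAME}"

With this in place, feature-branch artifacts end up under ~/.m2/repository/branches/<branch name>, while develop or main builds keep using the default part of the local repository, so unfinished work can no longer leak between branches.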
DevOps pipelines work best when they’re efficient, collaborative, and driven by data. To accomplish these objectives, you need a clear view of the processes, measurements, and performance of your pipeline. This is where business intelligence (BI) tools come in. BI tools were traditionally used for enterprise analytics, but they are now being adopted in DevOps, where they improve both processes and outcomes. Specifically, open-source BI tools are changing the game in DevOps pipelines because of their affordability, flexibility, and ease of adoption. They allow teams to track deployment frequency, monitor performance through dashboards, and troubleshoot performance issues. Using Metabase, Redash, and Superset, organizations are integrating data analysis and DevOps practices, creating a smoother bridge from design to delivery. In this article, we’ll cover how these tools improve visibility, automation, and decision-making in DevOps pipelines. The Role of Open Source in Driving Innovation When using open-source solutions, you might face issues such as integration with existing systems, scalability as the project grows, or data security. The good news is that these challenges have solutions. Concentrate on choosing tools that have an active community and sufficient documentation; that way, you will always have someone to walk you through problems. Working with open-source tools might be daunting for a new DevOps team, but it is better not to rush. Have the team go through some basic learning and get their hands dirty; they can build mastery by working through progressively harder tasks. Adopting open-source solutions into your processes goes hand in hand with dealing with data usage regulations. This is especially true for open-source business intelligence, where sensitive data is involved. To be sufficiently prepared, set strict rules about how data can be used, carry out audits routinely, and make sure the tools are reasonably secured. Enhancing Data Visibility in DevOps Pipelines BI tools surface precise information straight from your pipeline, making it easy to follow and understand what happens at every stage. They continuously monitor the deployment of code changes and the uptake of bug fixes. With that visibility in place, deficiencies can be identified and fixed, teamwork improves, and the quality and speed of software delivery increase significantly. Key Metrics and Insights BI Tools Offer If you connect a BI tool to your pipeline, you will be able to track measures such as: Deployment Frequency: How often your team pushes changes to your production environment. Lead Time: The interval from the moment code is committed to the repository until it is deployed to production. Change Failure Rate: The proportion of changes that resulted in failures out of the total number of changes made. Mean Time to Recovery (MTTR): The amount of time the team needs to resolve an issue. These metrics enable you to benchmark your performance on various aspects of DevOps and help you drive their improvement.
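To make these metrics concrete, here is a minimal Python sketch of the kind of aggregation a BI dashboard would run over raw pipeline data. The deployment records, field names, and 30-day window are illustrative assumptions, not tied to any specific tool.

Python
from datetime import datetime, timedelta

# Hypothetical deployment records, e.g., exported from a CI/CD database
deployments = [
    {"committed_at": datetime(2024, 11, 1, 9, 0), "deployed_at": datetime(2024, 11, 1, 15, 0), "failed": False},
    {"committed_at": datetime(2024, 11, 2, 10, 0), "deployed_at": datetime(2024, 11, 3, 11, 0), "failed": True,
     "recovered_at": datetime(2024, 11, 3, 13, 30)},
]
days_observed = 30

deployment_frequency = len(deployments) / days_observed  # deployments per day
lead_time = sum((d["deployed_at"] - d["committed_at"] for d in deployments), timedelta()) / len(deployments)
change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)
failures = [d for d in deployments if d["failed"]]
mttr = sum((d["recovered_at"] - d["deployed_at"] for d in failures), timedelta()) / len(failures)

print(deployment_frequency, lead_time, change_failure_rate, mttr)

In Metabase, Redash, or Superset, the same aggregation would typically live in a SQL query or saved question that feeds a dashboard tile.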
Automation and Decision-Making Through BI Integration BI tools automate the tracking of DevOps processes so you can easily visualize, analyze, and interpret the key metrics. Rather than manually monitoring metrics such as the percentage of successful deployments or the time taken to deploy an application, you can rely on BI to spot such trends for you. This lets you operationalize insights, which saves time and ensures that pipelines are well managed. For example, such a system can automatically roll back a deployment or send a message to the right team if the number of errors in a particular application exceeds a defined threshold. This form of automation helps reduce excessive downtime and greatly improves the organization's ability to react to issues. Examples of BI-Driven Automated Workflows Increased Performance: BI dashboards can be set up to monitor server performance metrics and trigger scaling when it is needed to avoid downtime, especially during heavy traffic. Releases: BI tools can automatically trigger a release once all necessary criteria are satisfied, such as a test-coverage threshold. Incident Response: BI-based notifications can speed up task assignment to the DevOps team when certain thresholds are exceeded. Key Open-Source BI Tools for DevOps Pipelines DevOps workflows can be enhanced with data visualization tools such as Metabase, Superset, and Redash. Let's take a look at how each of these tools can fit within your DevOps pipelines. Metabase: Simple Yet Powerful If you are looking for an easy-to-use tool, Metabase is the best option available. It allows you to build dashboards and query databases without writing elaborate code. It can also pull data from a variety of systems, which lets you measure KPIs such as deployment frequency or the occurrence of system-related problems. Superset: Enterprise-Grade Insights If you have large workloads that need monitoring, Superset is a strong fit. It was designed with big data loads in mind, offering advanced visualization on top of many different data stores. Businesses with more complex operational structures get the most out of Superset thanks to its powerful data manipulation capabilities. Redash: Collaboration at Its Core Redash is one of the best tools for enabling team collaboration. It allows your team to build queries and dashboards easily and share them so every team member is in the loop. Thanks to its connections to many data sources, it is an ideal tool for tracking the health of a pipeline and identifying bottlenecks. Choosing the Right Tool Ease of Use: Metabase comes out on top due to its user-friendliness and accessibility. Scalability: Superset is superior for complex, large-scale workloads. Collaboration: Redash is most effective where working in teams is essential. Challenges of Integrating Open-Source BI Tools in DevOps When integrating open-source tools into your business intelligence practice, it is quite common to run into integration barriers with the current DevOps stack. For this to go smoothly, you have to understand the APIs and the flow of data between the systems. Scaling can also be a problem: many tools are fine for small setups but struggle with larger databases or many concurrent users. Furthermore, privacy is still a major challenge.
Since open-source tools depend on community support, there is some risk if updates are not timely. To ease these problems, try the tools out in a sandbox environment first, and look for tools with a strong community behind them that are updated regularly. Your team may also be resistant to the new tools, especially if they have not worked with BI interfaces or coding practices before. Onboarding can be hard because open-source tools often lack the documentation that proprietary solutions provide. Make sure team training is an early step, and use community forums, tutorials, and open-source documentation to ease the learning curve. With data privacy regulations like GDPR and CCPA in effect, the question of how BI tools handle data matters. One worry is that open-source tools don’t always come with built-in compliance features; the onus of implementing adequate data governance measures falls on you. Incorporate data masking and encryption, along with role-based access control, into your processes, and ensure there are well-documented procedures within your organization for handling data. The Impact on Pipeline Efficiency and Collaboration Open-source BI enhances pipeline efficiency and collaboration within DevOps. It demystifies work processes and provides actionable information, helping teams streamline work, reduce bottlenecks, and make better decisions. Its flexibility and transparency also make integration easier and help both DevOps and data teams achieve common objectives more efficiently. Conclusion The adoption of open-source BI tools in DevOps pipelines has been steady and transformative. These tools provide inexpensive, highly customizable, and flexible means of data analysis and decision-making, enabling faster insights, more collaboration, and quicker problem-solving across the pipeline. With open-source BI tools, DevOps teams can improve performance, automate processes, and accelerate data-driven development cycles. Integrating these tools is a solid step toward more efficient, accountable, and collaborative DevOps environments.
Usually, when we want to classify content, we rely on labeled data and train machine learning models with that labeled data to make predictions on new or unseen data. For example, we might label each image in an image dataset as "dog" or "cat" or categorize an article as "tutorial" or "review." These labels help the model learn and make predictions on new data. But here is the problem: getting labeled data is not always easy. Sometimes, it can be really expensive or time-consuming, and on top of that, new labels might pop up as time goes on. That is where zero-shot classification comes into the picture. With zero-shot models, we can classify content without needing to train on every single labeled class beforehand. These models can generalize to new categories described in natural language because they build on pre-trained language models that have been trained on huge amounts of text. Zero-Shot Classification With Hugging Face In this article, I will use Hugging Face's Transformers library to perform zero-shot classification with a pre-trained BART model. Let's take a quick summary of a DZone article and categorize it into one of the following categories: "Tutorial," "Opinion," "Review," "Analysis," or "Survey." Environment Setup Ensure Python 3.10 or higher is installed. Install the necessary packages mentioned below. Shell pip install transformers torch Now, let's use the following short summary from my previous article to perform zero-shot classification and identify the category mentioned above: Summary: "Learn how Integrated Gradients help identify which input features contribute most to the model's predictions to ensure transparency." Python from transformers import pipeline # Initialize the zero-shot classification pipeline using the pre-trained BART model zero_shot_classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli") # tl;dr from this article - https://dzone.com/articles/integrated-gradients-ai-explainability article_summary = """ Learn how Integrated Gradients helps identify which input features contribute most to the model's predictions to ensure transparency. """ # sample categories from DZone - sample_categories = ["Tutorial", "Opinion", "Review", "Analysis", "Survey"] # Now, classify the article into one of the sample categories. category_scores = zero_shot_classifier(article_summary, sample_categories) # pick the category with the highest score and print it category = category_scores['labels'][0] print(f"The article is most likely a '{category}'") The model classified the article as most likely a Tutorial. We could also check the scores of each category instead of picking the one with the highest score. Python # Print the score for each category for i in range(len(category_scores['labels'])): print(f"{category_scores['labels'][i]}: {category_scores['scores'][i]:.2f}") Here is the output: Plain Text Tutorial: 0.53 Review: 0.20 Survey: 0.12 Analysis: 0.10 Opinion: 0.06 These scores are helpful if you want to use zero-shot classification to identify the most appropriate tags for your content. Conclusion In this article, we explored how zero-shot classification can be used to categorize content without the need to train on labeled data. As you can see from the code, it is very easy to implement and requires just a few lines. While easy and flexible, these models might not work well in specialized categories where the model does not understand the specific terminology.
For example, classifying a medical report into one of the categories like "Cardiology," "Oncology," or "Neurology" requires a deep understanding of medical terms that were not part of the model's pre-training. In those cases, you might still need to fine-tune the model with specific datasets for better results. Additionally, zero-shot models may have trouble with ambiguous language or context-dependent tasks, such as detecting sarcasm or cultural references.
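As a closing sketch related to the tagging use case mentioned above, the same pipeline can also score several labels independently instead of forcing a single category. This assumes the zero_shot_classifier and article_summary objects from the earlier example; the candidate tags and the 0.5 threshold are arbitrary choices for illustration.

Python
# Multi-label scoring: each candidate tag is judged independently
candidate_tags = ["AI", "Machine Learning", "Explainability", "DevOps", "Security"]
tag_scores = zero_shot_classifier(article_summary, candidate_tags, multi_label=True)

# Keep every tag whose score clears the (arbitrary) threshold
suggested_tags = [label for label, score in zip(tag_scores["labels"], tag_scores["scores"]) if score > 0.5]
print(f"Suggested tags: {suggested_tags}")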
GraphQL is both a query language for APIs and a runtime for executing those queries with your existing data. It offers a comprehensive and clear description of the data available in your API, allows clients to request precisely what they need without excess, facilitates the evolution of APIs over time, and supports robust developer tools. GraphQL Access Control and Query Optimization Access Control Authorization is a set of rules or business logic that determines whether a user, session, or context has permission, according to an access control list (ACL), to perform certain actions or view specific data. Such behavior should be enforced at the business logic layer. While it might be tempting to place authorization logic directly within the GraphQL layer, it is generally more appropriate to handle it elsewhere. Query Optimization Optimizing GraphQL queries involves techniques such as minimizing unnecessary data fetching and processing to improve response times and reduce server load. This results in a more efficient application, offering a better user experience, enhanced user engagement and retention, and increased server scalability. Ways to Optimize a GraphQL Query: N+1 (batched loading), over-fetching and under-fetching, caching for reduced latency, and pagination. Enhancing GraphQL Access Control and GraphQL Query Optimization The Authorization Engine validates node and field permissions using an ACL (access control list) and optimizes GraphQL queries based on those permissions. Additionally, the N+1 Resolver improves performance by consolidating queries. This approach addresses the N+1 problem. Figure 1 Actors GraphQL Query: A request sent by a client to a GraphQL server to fetch specific data. GraphQL Engine: A service that allows you to execute GraphQL queries. Authorization Engine: Validates the permissions of nodes and fields using the ACL and optimizes the GraphQL query based on field permissions. N+1 Resolver: Reduces the overhead and latency associated with multiple database queries, improving performance and efficiency. Figure 2 Figure 3 Enhancing Access Control Existing Implementation Let’s assume an app has Products, Vendors, and Addresses. Accessing Vendor and Address data requires user authentication, which can be easily handled in resolvers. However, Vendor data is restricted to admin users, and some vendor fields are accessible only to specific roles. On the other hand, products are public and can be viewed with the Product Owner role. Running the query mentioned (Figure 3) will return product data along with associated Vendor and Address data, leading to a security issue. Enhanced Implementation The nodes and fields of the query are validated against the user’s permissions, and an optimized GraphQL query is generated. Figure 3.1 (Flow Diagram) Figure 4 Figure 4 is an example of how you might implement custom authentication in a GraphQL API using Node.js. In this code, the role is extracted from the Authorization header and then used, along with the query, to validate authorization. Figure 5 In this example, we're using the user's role and permissions, taken from the session or request, to control access to specific queries and mutations in the GraphQL schema. We have defined the role and permission rules in the GraphQL schema for all users. This method recursively checks the role and permission on all nodes and fields. If a node or field is not permitted, it performs one of the following actions based on configuration: remove the node or field, return the field or node with an error message, or throw an exception.
Once the GraphQL query is validated against the schema, this method will return the optimized GraphQL query. Enhancing ORM Query Optimization Using N+1 Existing Implementation It is a technical issue that impacts the performance of a query. It can occur in applications that use an Object-Relational Mapping (ORM) framework to access a database. It typically arises when the application needs to load a collection of related entities from the database. Let's assume there are 2 products; it needs to make 1 query to retrieve the products and 2 additional queries to fetch the vendors for each product individually. Figure 6 Figure 7 One query retrieves the list of products, and a separate query fetches the vendors for each product. The number of vendor queries equals the number of products. Enhanced Implementation Once the request reaches the N+1 resolver, the N+1 resolver optimizes queries to minimize the number of database requests by reducing them to a single query for the main data and an additional query for related data, effectively mitigating the N+1 problem. Main Data Query: A single query that retrieves the main set of data required for the application.Related Data Query: An additional query fetching any related data necessary to complete the dataset. Figure 8 Figure 9: Total No. of Queries are 2 (1 for Product, 1 for Vendor) Figure 10 After all query resolvers have been completed, the server returns the optimized data to the client in JSON format. Conclusion We have outlined that implementing fine-grained authorization mechanisms, along with enhanced security measures, is crucial for securing a GraphQL API. This includes both node-level and field-level security. As this article describes, the N+1 problem occurs when an unoptimized query leads to excessive database calls in GraphQL. We have provided a solution to overcome the N+1 problem. It decreases database server resource usage, resulting in cost savings and better scalability. Additionally, by enhancing query performance, this improves the user experience by reducing query response times.
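As a concrete illustration of the batched loading behind the N+1 resolver described above, here is a minimal sketch using the open-source dataloader package in a Node.js resolver. The resolver shape and the getVendorsByIds helper are illustrative assumptions, not the implementation shown in the figures.

JavaScript
const DataLoader = require('dataloader');

// Batch function: one database query for all requested vendor IDs instead of one per product
const vendorLoader = new DataLoader(async (vendorIds) => {
  const vendors = await getVendorsByIds(vendorIds); // hypothetical helper running a single WHERE id IN (...) query
  const byId = new Map(vendors.map((v) => [v.id, v]));
  return vendorIds.map((id) => byId.get(id) || null); // results must match the order of the requested keys
});

const resolvers = {
  Product: {
    // Individual load() calls made while resolving each product are coalesced into one batch
    vendor: (product) => vendorLoader.load(product.vendorId),
  },
};

With this in place, fetching a list of products issues two queries in total (one for the products, one for their vendors), matching the two-query behavior described in Figure 9.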
If there is one area where AI clearly demonstrates its value, it's knowledge management. Every organization, regardless of size, is inundated with vast amounts of documentation and meeting notes. These documents are often poorly organized, making it nearly impossible for any individual to read, digest, and stay on top of everything. However, with the power of large language models (LLMs), this problem is finally finding a solution. LLMs can read a variety of data and retrieve answers, revolutionizing how we manage knowledge. This potential has sparked discussions about whether search engines like Google could be disrupted by LLMs, given that these models can provide hyper-personalized answers. We are already witnessing this shift, with many users turning to platforms like ChatGPT or Perplexity for their day-to-day questions. Moreover, specialized platforms focusing on corporate knowledge management are emerging. However, despite the growing enthusiasm, there remains a significant gap between what the world perceives AI is capable of today and its actual capabilities. Over the past few months, I’ve explored building various AI-based tools for business use cases, discovering what works and what doesn’t. Today, I’ll share some of these insights on how to create a robust application that is both reliable and accurate. How to Provide LLMs With Knowledge For those unfamiliar, there are two common methods for giving large language models your private knowledge: fine-tuning or training your own model and retrieval-augmented generation (RAG). 1. Fine-Tuning This method involves embedding knowledge directly into the model's weights. While it allows for precise knowledge with fast inference, fine-tuning is complex and requires careful preparation of training data. This method is less common due to the specialized knowledge required. 2. Retrieval-Augmented Generation (RAG) The more widely used approach is to keep the model unchanged and insert knowledge into the prompt, a process some refer to as "in-context learning." In RAG, instead of directly answering user questions, the model retrieves relevant knowledge and documents from a private database, incorporating this information into the prompt to provide context. The Challenges of Simple RAG Implementations While RAG might seem simple and easy to implement, creating a production-ready RAG application for business use cases is highly complex. Several challenges can arise: Messy Real-World Data Real-world data is often not just simple text; it can include images, diagrams, charts, and tables. Normal data parsers might extract incomplete or messy data, making it difficult for LLMs to process. Accurate Information Retrieval Even if you create a database from company knowledge, retrieving relevant information based on user questions can be complicated. Different types of data require different retrieval methods, and sometimes, the information retrieved might be insufficient or irrelevant. Complex Queries Simple questions might require answers from multiple data sources, and complex queries might involve unstructured and structured data. Therefore, simple RAG implementations often fall short in handling real-world knowledge management use cases. Advanced RAG Techniques Thankfully, there are several tactics to mitigate these risks: Better Data Parsers Real-world data is often messy, especially in formats like PDFs or PowerPoint files. Traditional parsers, like PyPDF, might extract data incorrectly. 
However, newer parsers like LlamaParser, developed by LlamaIndex, offer higher accuracy in extracting data and converting it into an LLM-friendly format. This is crucial for ensuring the AI can process and understand the data correctly. Optimizing Chunk Size When building a vector database, it's essential to break down documents into small chunks. However, finding the optimal chunk size is key. If it is too large, the model might lose context; if it is too small, it might miss critical information. Experimenting with different chunk sizes and evaluating the results can help determine the best size for different types of documents. Reranking and Hybrid Search Reranking involves using a secondary model to ensure the most relevant chunks of data are presented to the model first, improving both accuracy and efficiency. Hybrid search, combining vector and keyword searches, can also provide more accurate results, especially in cases like e-commerce, where exact matches are critical. Agentic RAG This approach leverages agents' dynamic and reasoning abilities to optimize the RAG pipeline. For example, query translation can be used to modify user questions into more retrieval-friendly formats. Agents can also perform metadata filtering and routing to ensure only relevant data is searched, enhancing the accuracy of the results. Building an Agentic RAG Pipeline Creating a robust agentic RAG pipeline involves several steps: 1. Retrieve and Grade Documents First, retrieve the most relevant documents. Then, use the LLM to evaluate whether the documents are relevant to the question asked. 2. Generate Answers If the documents are relevant, generate an answer using the LLM. 3. Web Search If the documents are not relevant, perform a web search to find additional information. 4. Check for Hallucinations After generating an answer, check if the answer is grounded in the retrieved documents. If not, the system can either regenerate the answer or perform additional searches. 5. Use LangGraph and Llama3 Using tools like LangGraph and Llama3, you can define the workflow, setting up nodes and edges that determine the flow of information and the checks performed at each stage. Conclusion As you can see, building a reliable and accurate RAG pipeline involves balancing various factors, from data parsing and chunk sizing to reranking and hybrid search techniques. While these processes can slow down the response time, they significantly improve the accuracy and relevance of the answers provided by the AI. I encourage you to explore these methods in your projects and share your experiences. As AI continues to evolve, the ability to effectively manage and retrieve knowledge will become increasingly critical.
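To tie the pipeline steps above together, here is a minimal, hedged sketch of that control flow in plain Python. The helper functions (retrieve_documents, grade_relevance, generate_answer, web_search, is_grounded) are hypothetical placeholders for your retriever, grader, and LLM calls rather than a specific LangGraph or Llama3 API.

Python
def agentic_rag(question, max_retries=2):
    # 1. Retrieve and grade documents
    docs = retrieve_documents(question)                           # vector/hybrid search (placeholder)
    relevant = [d for d in docs if grade_relevance(question, d)]  # LLM-as-grader (placeholder)

    # 3. Fall back to a web search if nothing relevant was retrieved
    if not relevant:
        relevant = web_search(question)                           # placeholder web search tool

    # 2. and 4. Generate an answer and check that it is grounded, retrying a bounded number of times
    answer = None
    for _ in range(max_retries):
        answer = generate_answer(question, relevant)              # LLM generation (placeholder)
        if is_grounded(answer, relevant):                         # hallucination check (placeholder)
            return answer
        relevant = relevant + web_search(question)                # gather more context and retry
    return answer  # best effort after exhausting retries

In the actual pipeline, each of these steps maps to a node in a LangGraph workflow, with conditional edges deciding whether to generate, search the web, or retry.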
This is an illustrated walkthrough for how to find the cause of high memory usage in Go programs with standard tools, based on our recent experience. There's plenty of information online, but it still took us considerable time and effort to find what works and what doesn't. Hopefully, this post will save time for everyone else. Context At Adiom, we're building an open-source tool for online database migration and real-time replication, called dsync. Our primary focus is on ease of use, robustness, and speed. One of the areas that we're focusing on heavily is data integrity, as Enterprise customers want to ensure data consistency between the source and the destination and know the differences, if there are any. To that end, we have recently come across Merkle Search Trees (MST) that have some interesting properties. It may be worth a whole different post, but for those interested now, here's the link to the original paper. In theory, MSTs should allow us to efficiently and cheaply diff the source and destination datasets. Naturally, we wanted to experiment with that, so we added a dsync verify <source> <destination> command as a POC leveraging an open-source reference implementation from GitHub: jrhy/mast. It worked like a charm on "toy" datasets from 100 to 1000 records. But in one of our larger tests, to our surprise, the binary consumed 3.5GB RAM for a million records on both sides. While not a whole lot in absolute numbers, this was orders of magnitude higher than what we were expecting — maybe 10s or 100s of megabytes — because we only stored the record ID (8 bytes) and a hash (8 bytes) for each record. Our first thought was a memory "leak." Similar to Java, Go manages memory for you and has a garbage collector based on open object references. There's no concept of leaks as such, but rather unexpected or undesirable accumulation of objects that the code kept references to. Unlike Java, though, Go doesn't run as bytecode on top of a JVM, so there is no easy way to see which specific objects are accumulating in memory. Part 1: Run Garbage Collector In Go, you can forcefully trigger garbage collection hoping that the memory is consumed by objects that are no longer needed: Go func main() { // App code log.Println("Triggering garbage collection...") runtime.GC() printMemUsage() // More app code } func printMemUsage() { var m runtime.MemStats runtime.ReadMemStats(&m) log.Printf("Alloc = %v MiB", bToMb(m.Alloc)) log.Printf("TotalAlloc = %v MiB", bToMb(m.TotalAlloc)) log.Printf("Sys = %v MiB", bToMb(m.Sys)) log.Printf("NumGC = %v\n", m.NumGC) } func bToMb(b uint64) uint64 { return b / 1024 / 1024 } Unfortunately, that only freed up a few hundred MBs for us, so it wasn't very effective. Part 2: Examine the Memory In theory, we should be able to take a core dump of the process (dump its raw memory) and then examine it directly. Here are some relevant resources for that: Obtaining the core dump from a running process: Go Wiki: CoreDumpDebugging. Specific instructions for Mac OS where gcore doesn't work right away: "How to Dump a Core File on MacOS". The viewcore tool to parse the coredump: GitHub: golang/debug. Unfortunately, viewcore didn't work with our coredump, presumably because we're doing development on a Mac laptop with Darwin architecture: Shell user@MBP viewcore % ./viewcore /cores/dsync-52417-20241114T234221Z html failed to parse core: bad magic number '[207 250 237 254]' in record at byte 0x0 If you're using Linux, you should give it a shot. Part 3: Profiling and Debugging We had to resort to investigative debugging instead.
The most common sources of memory leaks in Go are: Accumulating and not closing resources such as network connections, files, etc.Global variablesSlices with large backing arraysLingering Go routines that don't finish After a brief examination of the codebase, we didn't see any smoking guns. Our next step was to use pprof. Pprof lets us sample a running process and obtain a heap dump that can later be examined. Pprof Setup Pprof runs as a webserver in your application. Adding it is easy: Go import ( "net/http" _ "net/http/pprof" // Import pprof package for profiling ) func main() { go func() { log.Println(http.ListenAndServe("localhost:8081", nil)) // Start pprof server }() // Your application logic here } In our case, it's accessible on port 8081. When the application is running and shows excessive memory consumption, we can collect heap dumps: Shell curl -s http://localhost:8081/debug/pprof/heap > /tmp/heap1.out I recommend collecting a few just to have a few different samples. Note that pprof actually samples information about allocations, so it's not going to be a 100% representation. By default, it's 1 sample per 512kb allocated. From here on out, it's really smooth sailing. Memory Profiling We examined inuse_space and inuse_objects with pprof that track the active space and the number of objects respectively: Shell go tool pprof --inuse_objects /tmp/heap1.out go tool pprof --inuse_space /tmp/heap1.out The in-use space was particularly fruitful. First, we tried to visualize the memory allocation using the web command, which opens a new web browser window. Shell user@MBP viewcore % go tool pprof --inuse_space /tmp/heap1.out File: dsync Type: inuse_space Time: Nov 14, 2024 at 5:34pm (PST) Entering interactive mode (type "help" for commands, "o" for options) (pprof) web (pprof) The thicker the link on the graph, the more memory went to that node (and that node is bigger, too). An alternative way to do this is to use the top5 command: Shell user@MBP viewcore % go tool pprof --inuse_space /tmp/heap1.out File: dsync Type: inuse_space Time: Nov 14, 2024 at 5:34pm (PST) Entering interactive mode (type "help" for commands, "o" for options) (pprof) top5 Showing nodes accounting for 1670.40MB, 99.37% of 1680.96MB total Dropped 43 nodes (cum <= 8.40MB) Showing top 5 nodes out of 7 flat flat% sum% cum cum% 1471.89MB 87.56% 87.56% 1471.89MB 87.56% github.com/jrhy/mast.split 78MB 4.64% 92.20% 851.79MB 50.67% github.com/adiom-data/dsync/internal/app.(*mastVerify).Run.func1.processMast.1 77MB 4.58% 96.78% 826.13MB 49.15% github.com/adiom-data/dsync/internal/app.(*mastVerify).Run.func2.processMast.1 43.50MB 2.59% 99.37% 43.50MB 2.59% bytes.Clone 0 0% 99.37% 1677.92MB 99.82% github.com/adiom-data/dsync/internal/app.(*source).ProcessSource.func2 (pprof) These told us that the excessive memory was allocated in the mast.split function, and that it is a lot of 4.75kB objects. 
While we can't see what those objects are, we can see where in the code they were allocated using the list command: Shell user@MBP viewcore % go tool pprof --inuse_space /tmp/heap1.out File: dsync Type: inuse_space Time: Nov 14, 2024 at 5:34pm (PST) Entering interactive mode (type "help" for commands, "o" for options) (pprof) top5 Showing nodes accounting for 1670.40MB, 99.37% of 1680.96MB total Dropped 43 nodes (cum <= 8.40MB) Showing top 5 nodes out of 7 flat flat% sum% cum cum% 1471.89MB 87.56% 87.56% 1471.89MB 87.56% github.com/jrhy/mast.split 78MB 4.64% 92.20% 851.79MB 50.67% github.com/adiom-data/dsync/internal/app.(*mastVerify).Run.func1.processMast.1 77MB 4.58% 96.78% 826.13MB 49.15% github.com/adiom-data/dsync/internal/app.(*mastVerify).Run.func2.processMast.1 43.50MB 2.59% 99.37% 43.50MB 2.59% bytes.Clone 0 0% 99.37% 1677.92MB 99.82% github.com/adiom-data/dsync/internal/app.(*source).ProcessSource.func2 (pprof) list github.com/jrhy/mast.split Total: 1.64GB ROUTINE ======================== github.com/jrhy/mast.split in /Users/alexander/go/pkg/mod/github.com/jrhy/mast@v1.2.32/lib.go 1.44GB 1.54GB (flat, cum) 93.57% of Total . . 82:func split(ctx context.Context, node *mastNode, key interface{}, mast *Mast) (leftLink, rightLink interface{}, err error) { . . 83: var splitIndex int . . 84: for splitIndex = 0; splitIndex < len(node.Key); splitIndex++ { . . 85: var cmp int . . 86: cmp, err = mast.keyOrder(node.Key[splitIndex], key) . . 87: if err != nil { . . 88: return nil, nil, fmt.Errorf("keyCompare: %w", err) . . 89: } . . 90: if cmp == 0 { . . 91: panic("split shouldn't need to handle preservation of already-present key") . . 92: } . . 93: if cmp > 0 { . . 94: break . . 95: } . . 96: } . . 97: var tooBigLink interface{} = nil 8MB 8MB 98: left := mastNode{ . . 99: Node{ 416.30MB 416.30MB 100: make([]interface{}, 0, cap(node.Key)), 458.51MB 458.51MB 101: make([]interface{}, 0, cap(node.Value)), 427.87MB 427.87MB 102: make([]interface{}, 0, cap(node.Link)), . . 103: }, . . 104: true, false, nil, nil, . . 105: } . . 106: left.Key = append(left.Key, node.Key[:splitIndex]...) . . 107: left.Value = append(left.Value, node.Value[:splitIndex]...) . . 108: left.Link = append(left.Link, node.Link[:splitIndex+1]...) . . 109: . . 110: // repartition the left and right subtrees based on the new key . . 111: leftMaxLink := left.Link[len(left.Link)-1] . . 112: if leftMaxLink != nil { . . 113: var leftMax *mastNode . . 114: leftMax, err = mast.load(ctx, leftMaxLink) . . 115: if mast.debug { . . 116: fmt.Printf(" splitting leftMax, node with keys: %v\n", leftMax.Key) . . 117: } . . 118: if err != nil { . . 119: return nil, nil, fmt.Errorf("loading leftMax: %w", err) . . 120: } . 91.39MB 121: leftMaxLink, tooBigLink, err = split(ctx, leftMax, key, mast) . . 122: if err != nil { . . 123: return nil, nil, fmt.Errorf("splitting leftMax: %w", err) . . 124: } . . 125: if mast.debug { . . 126: fmt.Printf(" splitting leftMax, node with keys: %v is done: leftMaxLink=%v, tooBigLink=%v\n", leftMax.Key, leftMaxLink, tooBigLink) . . 127: } . . 128: left.Link[len(left.Link)-1] = leftMaxLink . . 129: } . . 130: if !left.isEmpty() { . . 131: leftLink, err = mast.store(&left) . . 132: if err != nil { . . 133: return nil, nil, fmt.Errorf("store left: %w", err) . . 134: } . . 135: } 1MB 1MB 136: right := mastNode{ . . 137: Node{ 54.24MB 54.24MB 138: make([]interface{}, 0, cap(node.Key)), 56.75MB 56.75MB 139: make([]interface{}, 0, cap(node.Value)), 49.22MB 49.22MB 140: make([]interface{}, 0, cap(node.Link)), . . 
141: }, . . 142: true, false, nil, nil, . . 143: } . . 144: right.Key = append(right.Key, node.Key[splitIndex:]...) . . 145: right.Value = append(right.Value, node.Value[splitIndex:]...) . . 146: right.Link = append(right.Link, node.Link[splitIndex:]...) . . 147: right.Link[0] = tooBigLink . . 148: . . 149: rightMinLink := right.Link[0] . . 150: if rightMinLink != nil { . . 151: var rightMin *mastNode . . 152: rightMin, err = mast.load(ctx, rightMinLink) . . 153: if err != nil { . . 154: return nil, nil, fmt.Errorf("load rightMin: %w", err) . . 155: } . . 156: var tooSmallLink interface{} . 9.54MB 157: tooSmallLink, rightMinLink, err = split(ctx, rightMin, key, mast) . . 158: if err != nil { . . 159: return nil, nil, fmt.Errorf("split rightMin: %w", err) . . 160: } . . 161: if mast.debug { . . 162: fmt.Printf(" splitting rightMin, node with keys %v, is done: tooSmallLink=%v, rightMinLink=%v", (pprof) Now we could clearly see what those objects were: arrays with preallocated capacity. These store the []interface{} type, which is 2 words in memory or 16 bytes on our machine (64-bit system). It must be a large number, at least on average (4.75kB / 16 bytes ~= 300). The mystery is half-solved. We weren't sure if that capacity was being used or not. So we used delve, which is a debugger for Go. Debugging A simple code inspection showed that those objects are part of tree nodes. To find out what they actually looked like, we used VS Code debug mode to attach to the running process and inspect these objects: Use the configuration to attach to the running process with the following launch.json: JSON { // Use IntelliSense to learn about possible attributes. // Hover to view descriptions of existing attributes. // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387 "version": "0.2.0", "configurations": [ { "name": "Attach to Process", "type": "go", "request": "attach", "mode": "local", "processId": <PID of the running process> } ] } 2. Add a breakpoint in our verification loop where the tree objects are accessed. This immediately let us see that the array length was often much less than the preallocated size (cap) that was set: As a probe, I created a forked implementation that had no caps set in node allocation and the memory usage immediately dropped to 600MB, which is a 6x win. Mystery solved!
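To illustrate why those preallocated capacities were so costly, here is a small, self-contained Go sketch (not from the dsync codebase) that compares heap usage for slices allocated with a large spare capacity versus slices sized to their actual contents.

Go
package main

import (
	"fmt"
	"runtime"
)

func heapMB() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc / 1024 / 1024
}

func main() {
	const nodes = 100_000

	// Mimics the original pattern: capacity of ~300 interface{} elements (~4.75 kB), but only a few are used.
	withCap := make([][]interface{}, nodes)
	for i := range withCap {
		s := make([]interface{}, 0, 300) // backing array allocated up front
		withCap[i] = append(s, i)
	}
	fmt.Printf("preallocated cap: ~%d MiB\n", heapMB())
	runtime.KeepAlive(withCap)
	withCap = nil

	// Same data without the spare capacity.
	exact := make([][]interface{}, nodes)
	for i := range exact {
		exact[i] = []interface{}{i}
	}
	fmt.Printf("exact size:       ~%d MiB\n", heapMB())
	runtime.KeepAlive(exact)
}

The first variant mirrors the ~4.75 kB backing arrays that pprof pointed at; dropping the capacity hints in the forked implementation is what brought dsync's usage down from 3.5GB to roughly 600MB.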
In my previous post, I talked about why Console.log() isn’t the most effective debugging tool. In this installment, we will do a bit of an about-face and discuss the ways in which Console.log() is fantastic. Let’s break down some essential concepts and practices that can make your debugging life much easier and more productive. Front-End Logging vs. Back-End Logging Front-end logging differs significantly from back-end logging, and understanding this distinction is crucial. Unlike back-end systems, where persistent logs are vital for monitoring and debugging, the fluid nature of front-end development introduces different challenges. When debugging backends, I’d often go for tracepoints, which are far superior in that setting. However, the frontend, with its constant need to refresh, reload, switch contexts, etc., is a very different beast. In the frontend, relying heavily on elaborate logging mechanisms can become cumbersome. While tracepoints remain superior to basic print statements, the continuous testing and browser reloading in front-end workflows lessen their advantage. Moreover, features like logging to a file or structured ingestion are rarely useful in the browser, diminishing the need for a comprehensive logging framework. However, using a logger is still considered best practice over the typical Console.log for long-term logging. For short-term logging, Console.log has some tricks up its sleeve. Leveraging Console Log Levels One of the hidden gems of the browser console is its support for log levels, which is a significant step up from rudimentary print statements. The console provides five levels: log (standard logging), debug (same as log, but used for debugging purposes), info (informative messages, often rendered like log/debug), warn (warnings that might need attention), and error (errors that have occurred). While log and debug can be indistinguishable, these levels allow for a more organized and filtered debugging experience. Browsers enable filtering the output based on these levels, mirroring the capabilities of server-side logging systems and allowing you to focus on relevant messages. Customizing Console Output With CSS Front-end development allows for creative solutions, and logging is no exception. Using CSS styles in the console can make logs more visually distinct. By utilizing %c in a console message, you can apply custom CSS: JavaScript console.customLog = function(msg) { console.log("%c" + msg,"color:black;background:pink;font-family:system-ui;font-size:4rem;-webkit-text-stroke: 1px black;font-weight:bold") } console.customLog("Dazzle") This approach is helpful when you need to make specific logs stand out or organize output visually. You can use multiple %c substitutions to apply various styles to different parts of a log message (a combined example appears at the end of this article). Stack Tracing With console.trace() The console.trace() method can print a stack trace at a particular location, which can sometimes be helpful for understanding the flow of your code. However, due to JavaScript’s asynchronous behavior, stack traces aren’t always as straightforward as in back-end debugging. Still, it can be quite valuable in specific scenarios, such as synchronous code segments or event handling. Assertions for Design-by-Contract Assertions in front-end code allow developers to enforce expectations and promote a “fail-fast” mentality. Using Console.assert(), you can test conditions: JavaScript console.assert(x > 0, 'x must be greater than zero'); In the browser, a failed assertion appears as an error, similar to console.error.
An added benefit is that assertions can be stripped from production builds, removing any performance impact. This makes assertions a great tool for enforcing design contracts during development without compromising production efficiency. Printing Tables for Clearer Data Visualization When working with arrays or objects, displaying data as tables can significantly enhance readability. The console.table() method allows you to output structured data easily: JavaScript console.table(["Simple Array", "With a few elements", "in line"]) This method is especially handy when debugging arrays of objects, presenting a clear, tabular view of the data and making complex data structures much easier to understand. Copying Objects to the Clipboard Debugging often involves inspecting objects, and the copy(object) method allows you to copy an object’s content to the clipboard for external use. This feature is useful when you need to transfer data or analyze it outside the browser. Inspecting With console.dir() and dirxml() The console.dir() method provides a more detailed view of objects, showing their properties as you’d see them in a debugger. This is particularly helpful for inspecting DOM elements or exploring API responses. Meanwhile, console.dirxml() allows you to view objects as XML, which can be useful when debugging HTML structures. Counting Function Calls Keeping track of how often a function is called or a code block is executed can be crucial. The console.count() method tracks the number of times it’s invoked, helping you verify that functions are called as expected: JavaScript function myFunction() { console.count('myFunction called'); } You can reset the counter using console.countReset(). This simple tool can help you catch performance issues or confirm the correct execution flow. Organizing Logs With Groups To prevent log clutter, use console groups to organize related messages. console.group() starts a collapsible log section, and console.groupEnd() closes it: JavaScript console.group('My Group'); console.log('Message 1'); console.log('Message 2'); console.groupEnd(); Grouping makes it easier to navigate complex logs and keeps your console clean. Chrome-Specific Debugging Features Monitoring Functions: Chrome’s monitor() method logs every call to a function, showing the arguments and enabling a method-tracing experience. Monitoring Events: Using monitorEvents(), you can log events on an element. This is useful for debugging UI interactions. For example, monitorEvents(window, 'mouseout') logs only mouseout events. Querying Object Instances: queryObjects(Constructor) lists all objects created with a specific constructor, giving you insights into memory usage and object instantiation. Final Word Front-end debugging tools have come a long way. These tools provide a rich set of features that go far beyond simple console.log() statements. From log levels and CSS styling to assertions and event monitoring, mastering these techniques can transform your debugging workflow. If you read this post as part of my series, you will notice a big change in my attitude toward debugging when we reach the front end. Front-end debugging is very different from back-end debugging. When debugging the backend, I’m vehemently against code changes for debugging (e.g., print debugging), but on the frontend, this can be a reasonable hack. The change in environment justifies it: the lifecycle is shorter, the use case is single-user, and the risks are smaller.
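To tie a few of these techniques together, here is a small, hypothetical snippet combining multiple %c substitutions, console.table() with an array of objects, grouping, counting, and an assertion. The data and style strings are illustrative only:
JavaScript
// Two %c directives: each applies the next style argument to the text that follows it.
console.log(
  "%cPayment service%c responded in 42 ms",
  "color: white; background: steelblue; padding: 2px 4px; border-radius: 3px",
  "color: inherit; background: none"
);

// console.table() with an array of objects renders one row per object, one column per property.
const users = [
  { id: 1, name: "Ada", role: "admin" },
  { id: 2, name: "Linus", role: "user" },
];
console.table(users);

// Grouping keeps related output collapsible and easy to scan.
console.group("Checkout debug");
console.count("render");
console.assert(users.length > 0, "expected at least one user");
console.groupEnd();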
The inverted MoSCoW framework reverses traditional prioritization, focusing on what a product team won’t build rather than what it will. Deliberately excluding features helps teams streamline development, avoid scope creep, and maximize focus on what truly matters. While it aligns with Agile principles of simplicity and efficiency, it also requires careful implementation to avoid rigidity, misalignment, or stifling innovation. Used thoughtfully, it’s a powerful tool for managing product scope and driving strategic clarity. Read on and learn how to make the inverted MoSCoW framework work for your team. Starting With MoSCoW The MoSCoW prioritization framework is a tool that helps product teams focus their efforts by categorizing work into four levels: Must-Have: These are non-negotiable requirements without which the software won’t function or meet its goals. Think of them as your product’s foundation. For example, a functional payment gateway is a “must-have” in an e-commerce app.Should-Have: These features add significant value but are not mission-critical. If time or resources are tight, these can be deferred without breaking the product. For example, a filtering option on a search page might enhance usability, but it isn’t essential to release the app.Could-Have: These are “nice-to-haves” — features that are not essential but would improve user experience if implemented. For example, animations or light/dark modes might fit here.Won’t-Have (this time): These are features explicitly deprioritized for this release cycle. They’re still documented but deferred for later consideration. For instance, integrating a recommendation engine for your MVP might be a “won’t-have.” To learn more about the original MoSCoW framework, see the Chapter 10 of the DSDM Agile Project Framework manual. MoSCoW Benefits The MoSCoW framework offers significant benefits, including clarity and focus by distinguishing critical features from optional ones, which helps product teams prioritize effectively. It supports aligning development efforts with time, budget, and technical capacity constraints. Its adaptability allows teams to respond to changes quickly, dropping lower-priority items when necessary. Additionally, the framework fosters team alignment by creating a shared understanding across cross-functional groups, ensuring everyone works toward common goals. MoSCoW Shortcomings However, the MoSCoW framework also has notable shortcomings, including its tendency to oversimplify prioritization by overlooking nuances like feature dependencies, where a “must-have” might rely on a “could-have.” It often lacks a quantitative assessment, relying instead on subjective judgments from stakeholders or leadership, which can misalign priorities by neglecting measurable impact or effort. Without solid discipline, it risks inflating the “must-have” category, overwhelming teams, and diluting focus. Additionally, the framework may emphasize output over outcomes, leading teams to prioritize delivering features without ensuring they achieve desired customer or business results. Known MoSCoW anti-patterns are: Applied Without Strategic Context: When priorities are assigned without tying them to a clear product vision, business outcomes, or user needs, the framework becomes arbitrary. Ensure “must-haves” directly support critical business or customer goals.The “Must-Have Creep:” Stakeholders often push too many items into the “must-have” category, which undermines prioritization and can lead to overcommitment. 
Push back by asking, “What happens if we don’t deliver this?”Static Priorities: MoSCoW works best in an iterative context but can fail if teams treat the categories as rigid. Priorities should be reviewed frequently, especially during discovery phases or as new constraints emerge.Ignoring Dependencies: A feature might seem low-priority but could block a higher-priority item. Consequently, technical dependencies should always be considered during prioritization.Siloed Decision-Making: If product managers assign priorities without consulting developers, it may lead to technical infeasibilities or underestimating the complexity of “must-haves.” Turning MoSCoW Upside Down: Meet the Inverted MoSCoW Let’s run a little thought experiment: Do you remember the principle of the Agile Manifesto that “Simplicity — the art of maximizing the amount of work not done — is essential?” So, why not turn MoSCoW upside down? The inverted MoSCoW framework flips the original framework, focusing primarily on what a product team will not build. Instead of prioritizing features or tasks to be included in a release, this approach emphasizes deliberate exclusions, helping teams identify and articulate boundaries. Here’s how it’s structured: Won’t-Have (Absolutely Not): Features or tasks explicitly ruled out for the foreseeable future. These could be ideas that don’t align with the product vision, are too costly or complex, or don’t deliver enough value. The debate ends here.Could-Have (But Unlikely): Features that might be considered someday but aren’t practical or impactful enough to be prioritized soon. They are low-value additions or enhancements with minimal urgency. Maybe we’ll get to them, perhaps we won’t — no promises.Should-Have (Under Very Specific Conditions): Features that might be built under exceptional circumstances. These could address edge cases, serve niche audiences, or depend on favorable future conditions, like extra resources or demand. Unless something significant changes, forget about them.Deferred Consideration (Maybe in the Future): These features or improvements are explicitly out of scope. The team acknowledges they may be somewhat important but intentionally excludes them, for now, to stay focused on core objectives. What Are the Benefits of an Inverted MoSCoW? Turning the original on its head has several advantages as we change the perspective and gain new insights: Aligns With Agile’s Simplicity Principle: The inverted MoSCoW approach reinforces Agile’s focus on maximizing the amount of work not done. By prioritizing exclusions, it ensures the team spends its energy on the most valuable and impactful work.Improves Focus and Efficiency: Defining what will not be built reduces distraction, scope creep, and debate during development. Teams avoid wasting time on “shiny object” features that may feel exciting but offer limited value.Encourages Strategic Restraint: By explicitly stating exclusions, the inverted framework helps ensure resources are allocated to problems that matter most. It also guards against overpromising or committing to ideas that lack clear value.Facilitates Transparent Communication: Stakeholders often feel disappointed when their ideas aren’t included. 
The inverted structure clarifies why specific ideas are excluded, fostering alignment and reducing conflict.Enables Long-Term Thinking: Teams can park features or ideas in the “Could-Have (But Unlikely)” or “Deferred Consideration (Maybe in the Future)” categories, ensuring they remain documented without distracting from immediate priorities.Prevents Cognitive Overload: Developers and product teams can stay laser-focused on what matters most without getting bogged down by debates over “extras.” It simplifies decision-making by narrowing the scope upfront. If we compare both approaches in a simplified version, it looks like this:
Aspect | Original MoSCoW | Inverted MoSCoW
Focus | What will be built | What won’t be built
Approach | Inclusion-oriented | Exclusion-oriented
Goal | Maximize delivery of prioritized features | Minimize waste and distractions
Stakeholder Role | Define priorities collaboratively | Understand and accept exclusions
Scope Creep Risk | Medium – “Should/Could” items may creep into work | Low – explicitly avoids unnecessary features
Alignment With Agile | Supports incremental delivery | Embraces simplicity and focus
“Flipping MoSCoW” aligns with Agile principles by minimizing unnecessary work, improving focus, and reducing cognitive overload. It fosters transparent communication, encourages strategic restraint, and documents ideas for future consideration, ensuring product teams target what truly matters. By anchoring exclusions in the product vision, engaging stakeholders early, and revisiting decisions regularly, teams can avoid scope creep and manage expectations effectively while maintaining the flexibility to adapt. Practical Steps to Use the Inverted MoSCoW Framework If you want to use the inverted MoSCoW framework to identify valuable work, consider the following practical steps to familiarize team members and stakeholders with the approach and get the communication right: Start With Product Vision and Strategy: Anchor exclusions in the product vision and strategy. For example, if you want to create a lightweight, user-friendly app, explicitly rule out features that add unnecessary complexity or bloat.Engage Stakeholders Early: Discuss exclusions with stakeholders upfront to set expectations and reduce future conflict. Use the inverted framework to explain these decisions so that stakeholders do not come to see the product team as a black hole for ideas or as simple “nay-sayers.”Build a Backlog of Exclusions — an Anti-Product Backlog: Maintain a list of features that won’t be built. This anti-product backlog serves as a transparent guide for future discussions.Revisit Regularly: Just as priorities can shift, so can exclusions. Reassess your “Could-Have” and “Should-Have” lists periodically to determine whether conditions have changed — inspection and adaptation will be crucial to maximizing value creation within the constraints set by the organization.Document the Rationale: For every exclusion, document why it was made; exclusions are dynamic and context-dependent. This context helps prevent revisiting the same debates and ensures alignment across teams and stakeholders. What isn’t feasible or aligned today might become critical in the future. Keep an archive of exclusions and periodically reassess them.
Moreover, it would be helpful to consider applying complementary practices with the inverted MoSCoW framework, for example: Combine Frameworks for Robust Prioritization: Pair the inverted MoSCoW framework with other tools like Impact-Effort Matrices to identify low-effort, high-impact features that might deserve reconsideration or Opportunity Solution Trees to visualize how exclusions align with overarching goals.Use Prototyping and Experimentation: Validate an idea’s potential impact through lightweight prototypes or experiments before ruling it out. This ensures that promising concepts aren’t prematurely excluded. Also, expect practical challenges you will have to address when utilizing the inverted MoSCoW framework, for example: Resistance to Exclusions: Stakeholders often struggle to accept that their ideas are being excluded. To counter this, frame exclusions positively—focus on the value of prioritization and the benefits of delivering a lean, focused product.Exclusions as “Final Decisions:” Exclusions aren’t permanent. They’re tools for managing focus and scope at a specific moment. Encourage teams to view them as flexible and open to reassessment.Balance Between Focus and Innovation: While the framework promotes clarity and efficiency, excessive focus on exclusions can hinder creative exploration. Reserve space for continuous product discovery to keep the product competitive. Drawbacks of the Inverted MoSCoW Framework The inverted MoSCoW framework is valuable for defining what a product team will not build, helping teams focus on simplicity and efficiency. However, like its traditional counterpart, it is not without flaws. One significant challenge is the subjectivity in deciding exclusions. Stakeholders may struggle to align on what belongs in the “Won’t-Have” category, leading to potential conflicts or misaligned expectations. Without clear, objective criteria for exclusions, decisions risk being arbitrary or biased, undermining strategic goals and damaging team cohesion. Another critique is the framework’s tendency to encourage over-simplicity, which can stifle innovation or long-term thinking. While focusing on “not building” aligns with Agile principles of simplicity, over-prioritizing exclusions can narrow the product’s scope too much, leaving teams unprepared for future opportunities or changing market conditions. Balancing exclusions with flexibility is crucial, ensuring ideas with strategic potential aren’t entirely dismissed but appropriately categorized for future consideration. The framework also struggles to account for dependencies between excluded and included features. Excluding a “Won’t-Have” feature without understanding its role in supporting other work can inadvertently disrupt development, causing delays or requiring rework. Similarly, failing to consider effort or complexity in exclusions may result in missed opportunities to deliver low-effort, high-impact features. Teams must evaluate dependencies and efforts to ensure exclusions don’t inadvertently hinder progress or innovation. Finally, the inverted MoSCoW framework can become rigid, especially in agile, iterative environments where priorities shift rapidly. Exclusions defined early may no longer align with emerging user needs or business goals, creating tension between strategic intent and practical reality. To mitigate this, teams must treat exclusions as dynamic, revisiting and reassessing them regularly to ensure they remain relevant and effective. 
By addressing these critiques, the inverted MoSCoW framework can remain a powerful tool for managing focus and simplicity without sacrificing flexibility or strategic foresight. Conclusion The inverted MoSCoW framework is a powerful tool but is most effective as part of a broader prioritization strategy. By emphasizing collaboration, grounding decisions in data, and maintaining flexibility, you can ensure that exclusions support — not hinder — your product’s long-term success. Keep iterating, communicating, and aligning decisions with strategic goals, and the framework will serve as a valuable ally in your product development efforts.
In today’s cloud-native ecosystem, ensuring High Availability (HA) is a critical requirement for containerized applications running on Kubernetes. HA ensures that your workloads remain operational even in the face of failures, outages, or traffic spikes. Platforms like Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), and Red Hat Kubernetes Service (RKS) provide managed Kubernetes solutions that simplify cluster management, but achieving true HA requires careful configuration and planning. This article offers a comprehensive guide to setting up HA in EKS, AKS, and RKS, covering foundational concepts, platform-specific configurations, and advanced features like the Horizontal Pod Autoscaler (HPA). With actionable examples and best practices, this guide equips you to build resilient, production-grade Kubernetes environments. What Is High Availability in Kubernetes? High Availability (HA) refers to the ability of a system to remain operational even during hardware failures, software crashes, or unexpected traffic surges. Kubernetes inherently supports HA with features like pod replication, self-healing, and autoscaling, but managed platforms like EKS, AKS, and RKS offer additional features that simplify achieving HA. Core Principles of High Availability Multi-Zone Deployments: Spread workloads across multiple availability zones to avoid single points of failure.Self-Healing: Automatically replace failed pods and nodes.Horizontal Pod Autoscaler (HPA): Dynamically scale workloads based on demand.Stateful Resilience: Ensure stateful workloads use reliable persistent storage.Disaster Recovery: Plan for cross-region failover to mitigate regional outages. High Availability in EKS (Amazon Elastic Kubernetes Service) Amazon EKS integrates seamlessly with AWS infrastructure to deliver HA. Here’s how to configure it: Step 1: Multi-Zone Deployment Deploy worker nodes across multiple availability zones during cluster creation: Shell eksctl create cluster \ --name my-cluster \ --region us-west-2 \ --zones us-west-2a,us-west-2b,us-west-2c \ --nodegroup-name standard-workers Step 2: Stateful Application Resilience Use Amazon Elastic Block Store (EBS) for persistent storage: YAML apiVersion: v1 kind: PersistentVolumeClaim metadata: name: ebs-claim spec: accessModes: - ReadWriteOnce resources: requests: storage: 10Gi Step 3: Horizontal Pod Autoscaler (HPA) in EKS Enable the Metrics Server to use HPA: Shell kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml Define an HPA resource for your workload: YAML apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: name: nginx-hpa spec: scaleTargetRef: apiVersion: apps/v1 kind: Deployment name: nginx-deployment minReplicas: 2 maxReplicas: 10 metrics: - type: Resource resource: name: cpu target: type: Utilization averageUtilization: 50 Step 4: Disaster Recovery Configure multi-region failover with AWS Route 53. Use latency-based DNS routing to direct traffic to healthy clusters across regions. High Availability in AKS (Azure Kubernetes Service) Azure Kubernetes Service offers features like Availability Zones and seamless integration with Azure’s ecosystem.
Step 1: Multi-Zone Configuration Deploy AKS with zone-redundant nodes: Shell az aks create \ --resource-group myResourceGroup \ --name myAKSCluster \ --location eastus \ --node-count 3 \ --enable-cluster-autoscaler \ --zones 1 2 3 Step 2: Resilient Networking Leverage Azure Application Gateway for highly available Ingress: Shell az network application-gateway create \ --resource-group myResourceGroup \ --name myAppGateway \ --capacity 2 Step 3: Horizontal Pod Autoscaler in AKS HPA is pre-configured in AKS. Define an HPA resource similar to the EKS example above. Combine it with the Cluster Autoscaler: Shell az aks update \ --resource-group myResourceGroup \ --name myAKSCluster \ --enable-cluster-autoscaler Step 4: Stateful Workloads Use Azure Disk for resilient storage: YAML apiVersion: v1 kind: PersistentVolumeClaim metadata: name: azure-disk-pvc spec: accessModes: - ReadWriteOnce resources: requests: storage: 10Gi storageClassName: managed-premium High Availability in RKS (Red Hat Kubernetes Service) RKS, based on OpenShift, provides robust HA features through Operators and advanced cluster management. Step 1: Multi-Zone Deployment Distribute worker nodes across zones: Shell openshift-install create cluster --zones us-west-2a,us-west-2b Step 2: Stateful Applications Use OpenShift Container Storage (OCS) for persistent data storage: YAML apiVersion: v1 kind: PersistentVolumeClaim metadata: name: ocs-claim spec: accessModes: - ReadWriteMany resources: requests: storage: 20Gi Step 3: Horizontal Pod Autoscaler in RKS OpenShift natively supports HPA. Deploy a sample configuration as shown earlier and monitor scaling behavior using OpenShift’s built-in dashboards. Best Practices for High Availability Set Realistic Resource Limits Configure CPU and memory requests/limits to avoid resource contention. YAML resources: requests: cpu: 500m memory: 512Mi limits: cpu: 1 memory: 1Gi Enable Proactive Monitoring Use tools like Prometheus and Grafana to track pod scaling, node health, and resource utilization. Test Failover Scenarios Regularly simulate zone or region failures to validate disaster recovery configurations (see the drill sketched at the end of this article). Combine HPA With Cluster Autoscaler Ensure the cluster can scale nodes when HPA scales pods beyond current capacity. Optimize Costs Use spot instances for non-critical workloads and configure autoscalers to scale down during low-traffic periods. Conclusion Achieving container high availability in EKS, AKS, and RKS requires a blend of platform-specific configurations, best practices, and advanced Kubernetes features like HPA. By following this guide, you can build resilient, scalable, and cost-efficient Kubernetes environments that are ready for production. HA is more than just uptime — it’s about delivering trust, performance, and reliability to your users. Start implementing these strategies today to elevate your Kubernetes deployments to the next level.
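As referenced under "Test Failover Scenarios," a minimal failover drill can be run with standard kubectl commands. This is only an illustrative sketch: the node name is a placeholder, and the exact flags you need may vary with your cluster setup.
Shell
# List nodes with their zone labels to pick a zone to "fail".
kubectl get nodes -L topology.kubernetes.io/zone

# Simulate losing a node in that zone: stop scheduling onto it, then evict its pods.
kubectl cordon <node-in-zone-a>
kubectl drain <node-in-zone-a> --ignore-daemonsets --delete-emptydir-data

# Verify that the evicted pods are rescheduled onto nodes in the remaining zones.
kubectl get pods -o wide --watch

# Restore the node once the drill is complete.
kubectl uncordon <node-in-zone-a>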
Today, the database world is rapidly moving towards AI and ML, and the workload of databases is expected to increase significantly. For database administrators, predicting the workload of the database infrastructure ahead of time and addressing it becomes an additional responsibility. As databases scale and resource management becomes increasingly critical, traditional capacity planning methods often fall short, leading to performance issues and unplanned downtime. PostgreSQL, one of the most widely used open-source relational databases, is no exception. With increasing demands on CPU, memory, and disk space, database administrators (DBAs) must adopt proactive approaches to prevent bottlenecks and improve efficiency. In this article, we'll explore how Long Short-Term Memory (LSTM) machine learning models can be applied to predict resource consumption in PostgreSQL databases. This approach enables DBAs to move from reactive to predictive capacity planning, thus reducing downtime, improving resource allocation, and minimizing over-provisioning costs. Why Predictive Capacity Planning Matters By leveraging machine learning, DBAs can predict future resource needs and address them before they become critical, resulting in: Reduced downtime: Early detection of resource shortages helps avoid disruptions.Improved efficiency: Resources are allocated based on real needs, preventing over-provisioning.Cost savings: In cloud environments, accurate resource predictions can reduce the cost of excess provisioning. How Machine Learning Can Optimize PostgreSQL Resource Planning To accurately predict PostgreSQL resource usage, we applied an optimized LSTM model, a type of recurrent neural network (RNN) that excels at capturing temporal patterns in time-series data. LSTMs are well-suited for understanding complex dependencies and sequences, making them ideal for predicting CPU, memory, and disk usage in PostgreSQL environments. Methodology Data Collection Option 1 To build the LSTM model, we need to collect performance data from PostgreSQL database views and OS-level commands, such as pg_stat_activity (details of active connections within the Postgres database), vmstat, free, and df. The data can be captured every few minutes for six months, providing a comprehensive dataset for training the model. The collected metrics can be stored in a dedicated table named capacity_metrics. Sample Table Schema: SQL CREATE TABLE capacity_metrics ( time TIMESTAMPTZ PRIMARY KEY, cpu_usage DECIMAL, memory_usage DECIMAL, disk_usage BIGINT, active_connections INTEGER ); There are multiple ways to capture this system data into the history table. One way is to write a Python script and schedule it through crontab to run every few minutes (a sketch of such a script appears at the end of this article). Option 2 For testing flexibility, we can generate CPU, memory, and disk utilization metrics in code (synthetic data generation) and run the experiment in a Google Colab notebook. This is the option we used for the analysis in this article. The steps are explained in the following sections. Machine Learning Model: Optimized LSTM The LSTM model was selected for its ability to learn long-term dependencies in time-series data.
Several optimizations were applied to improve its performance: Stacked LSTM layers: Two LSTM layers were stacked to capture complex patterns in the resource usage data.Dropout regularization: Dropout layers were added after each LSTM layer to prevent overfitting and improve generalization.Bidirectional LSTM: The model was made bidirectional to capture both forward and backward patterns in the data.Learning rate optimization: A learning rate of 0.001 was chosen for fine-tuning the model’s learning process. The model was trained for 20 epochs with a batch size of 64, and performance was measured on unseen test data for CPU, memory, and storage (disk) usage. Below is a summary of the steps along with Google Colab Notebook screenshots used in the data setup and machine learning experiment: Step 1: Data Setup (Simulated CPU, Memory, Disk Usage Data for 6 Months) Step 2: Add More Variation to the Data Step 3: Create DataFrame for Visualization or Further Usage Step 4: Function to Prepare LSTM Data, Train, Predict, and Plot Step 5: Run the Model for CPU, Memory, and Storage Results The optimized LSTM model outperformed traditional methods such as ARIMA and linear regression in predicting CPU, memory, and disk usage. The predictions closely tracked the actual resource usage, capturing both the short-term and long-term patterns effectively. Here are the visualizations of the LSTM predictions: Figure 1: Optimized LSTM CPU Usage Prediction Figure 2: Optimized LSTM Memory Usage Prediction Figure 3: Optimized LSTM Disk Usage Prediction Practical Integration With PostgreSQL Monitoring Tools To maximize the utility of the LSTM model, various practical implementations within PostgreSQL's monitoring ecosystem can be explored: pgAdmin integration: pgAdmin can be extended to visualize real-time resource predictions alongside actual metrics, enabling DBAs to respond proactively to potential resource shortages.Grafana dashboards: PostgreSQL metrics can be integrated with Grafana to overlay LSTM predictions on performance graphs. Alerts can be configured to notify DBAs when predicted usage is expected to exceed predefined thresholds.Prometheus monitoring: Prometheus can scrape PostgreSQL metrics and use the LSTM predictions to alert, generate forecasts, and set up notifications based on predicted resource consumption.Automated scaling in cloud environments: In cloud-hosted PostgreSQL instances (e.g., AWS RDS, Google Cloud SQL), the LSTM model can trigger autoscaling services based on forecasted increases in resource demand.CI/CD pipelines: Machine learning models can be continuously updated with new data, retrained, and deployed in real-time through CI/CD pipelines, ensuring that predictions remain accurate as workloads evolve. Conclusion By applying LSTM machine learning models to predict CPU, memory, and disk usage, PostgreSQL capacity planning can shift from a reactive to a proactive approach. Our results show that the optimized LSTM model provides accurate predictions, enabling more efficient resource management and cost savings, particularly in cloud-hosted environments. As database ecosystems grow more complex, these predictive tools become essential for DBAs looking to optimize resource utilization, prevent downtime, and ensure scalability. If you're managing PostgreSQL databases at scale, now is the time to leverage machine learning for predictive capacity planning and optimize your resource management before performance issues arise. 
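For readers who want to reproduce something similar outside of the original Colab notebook, here is a minimal, hypothetical TensorFlow/Keras sketch of the optimized architecture summarized above (stacked, bidirectional LSTM layers with dropout, a learning rate of 0.001, 20 epochs, and a batch size of 64), trained on a synthetic CPU series in the spirit of Option 2. The window size, layer widths, and dropout rate are illustrative assumptions, not the exact settings used in the experiment:
Python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dropout, Dense
from tensorflow.keras.optimizers import Adam

# Synthetic CPU-usage series: roughly 6 months of 5-minute samples with a daily cycle plus noise.
steps_per_day = 24 * 12
t = np.arange(180 * steps_per_day)
cpu = 50 + 20 * np.sin(2 * np.pi * t / steps_per_day) + np.random.normal(0, 5, t.size)

# Build supervised windows: the previous 36 samples predict the next one.
window = 36
X = np.array([cpu[i:i + window] for i in range(len(cpu) - window)])
y = cpu[window:]
X = X.reshape((X.shape[0], window, 1))

split = int(len(X) * 0.8)
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# Stacked, bidirectional LSTM with dropout, as described in the article.
model = Sequential([
    Bidirectional(LSTM(64, return_sequences=True), input_shape=(window, 1)),
    Dropout(0.2),
    Bidirectional(LSTM(32)),
    Dropout(0.2),
    Dense(1),
])
model.compile(optimizer=Adam(learning_rate=0.001), loss="mse")
model.fit(X_train, y_train, epochs=20, batch_size=64, validation_data=(X_test, y_test))
The same preparation can be repeated for the memory and disk series to mirror the three predictions shown in the figures above.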
Future Work Future improvements could include: Experimenting with additional neural network architectures (e.g., GRU or Transformer models) to handle more volatile workloads.Extending the methodology to multi-node and distributed PostgreSQL deployments, where network traffic and storage optimization also play significant roles.Implementing real-time alerts and further integrating predictions into PostgreSQL’s operational stack for more automated management.Experimenting with Oracle Automated Workload Repository (AWR) data for Oracle database workload predictions
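Finally, for the crontab-scheduled collection script mentioned under Option 1, a rough sketch could look like the following. It assumes the psutil and psycopg2 packages and standard PG* environment variables for the database connection; the columns match the capacity_metrics schema shown earlier:
Python
import os
from datetime import datetime, timezone

import psutil
import psycopg2

# Connection details come from the standard PG* environment variables (an assumption).
conn = psycopg2.connect(
    host=os.environ.get("PGHOST", "localhost"),
    dbname=os.environ.get("PGDATABASE", "postgres"),
    user=os.environ.get("PGUSER", "postgres"),
    password=os.environ.get("PGPASSWORD", ""),
)

with conn, conn.cursor() as cur:
    # Active connections, as reported by pg_stat_activity.
    cur.execute("SELECT count(*) FROM pg_stat_activity;")
    active_connections = cur.fetchone()[0]

    # OS-level metrics roughly equivalent to vmstat, free, and df.
    cur.execute(
        "INSERT INTO capacity_metrics (time, cpu_usage, memory_usage, disk_usage, active_connections) "
        "VALUES (%s, %s, %s, %s, %s)",
        (
            datetime.now(timezone.utc),
            psutil.cpu_percent(interval=1),
            psutil.virtual_memory().percent,
            psutil.disk_usage("/").used,
            active_connections,
        ),
    )

conn.close()
A crontab entry such as */5 * * * * /usr/bin/python3 /opt/scripts/collect_capacity_metrics.py (the path is hypothetical) would then populate the table every five minutes.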