Database Systems
Every organization is now in the business of data, but organizations must keep up as database capabilities and the purposes they serve continue to evolve. Systems once defined by rows and tables now span regions and clouds, requiring a balance between transactional speed and analytical depth, as well as integration of relational, document, and vector models into a single, multi-model design. At the same time, AI has become both a consumer and a partner that embeds meaning into queries while optimizing the very systems that execute them. These transformations blur the lines between transactional and analytical, centralized and distributed, human-driven and machine-assisted. Amidst all this change, databases must still meet what are now considered baseline expectations: scalability, flexibility, security and compliance, observability, and automation. With the stakes higher than ever, it is clear that for organizations to adapt and grow successfully, databases must be hardened for resilience, performance, and intelligence. In the 2025 Database Systems Trend Report, DZone takes a pulse check on database adoption and innovation, ecosystem trends, tool usage, strategies, and more — all with the goal of helping practitioners and leaders alike reorient our collective understanding of how old models and new paradigms are converging to define what's next for data management and storage.
Overview

The use of practical tools to evaluate performance in consistent, predictable ways across various platform configurations is necessary to optimize software. By making the Ampere Performance Toolkit (APT) available as open source, Ampere enables customers and developers to take a systematic approach to performance analysis. The Ampere Performance Toolkit provides an automated way to run important application benchmarks and collect their data. The toolkit makes it faster and easier to set up, run, and repeat performance tests across bare metal and various clouds. It provides a mature, automated framework that applies best-known configurations, a simple YAML input file for configuring resources in cloud-based tests, and numerous examples that run common benchmarks, including Cassandra, MySQL, and Redis, on a variety of cloud vendors or internally provisioned platforms. This blog summarizes the rationale and function of the APT, the basics for getting started, and how you may contribute. We invite you to explore further in the Ampere Performance Toolkit Repository.

How APT Works

Test topology: APT currently runs two types of tests: single-system and client-server. Single-system tests run all necessary commands on the system under test and return the results directly from that system. Networking may not be a factor in this scenario. Client-server tests work differently: the client system has a separate set of automated instructions to prepare the load generator that stresses the server over a network.

Type of provisioning: APT automates the provisioning of virtual machines, provided that the user is authenticated to the appropriate cloud service provider and has the permissions required for the automated commands that create the machines, network, disks, and other resources necessary to run benchmarks. Alternatively, a user can statically define machines, whether they run in a cloud service provider or on-prem.

Automation occurs in five discrete stages, which lets a user run a benchmark from a single command line:

Provision: Provisions the resources necessary to run the test. This stage is skipped when machines are statically defined.
Prepare: Installs the dependencies that enable the application to run.
Run: The application run stage, where the test is active on the server. APT also parses and saves the run results once the test finishes.
Cleanup: Removes all dependency packages used by the application.
Teardown: Removes provisioned resources from the cloud service provider. This stage is skipped in the case of static machines.

Getting Started

There are prerequisites before a user begins running their first test. All the documentation for setting up these requirements is outlined in the project's README.

Prerequisites for static virtual machine tests:

Passwordless SSH must be configured for all systems being used for tests.
Client-server tests must have passwordless SSH configured for both systems.
Passwordless sudo must be granted to the defined user running the test (e.g., user "apt" has passwordless sudo to run MySQL).

For cloud-based tests: APT automates the creation of all resources necessary to set up tests, provided that the defined YAML file is correct and the user has all the permissions necessary to provision cloud resources.

Dependencies:

Python 3.11 or greater.
The user must create a virtual environment for all pip-installed APT dependencies.
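To make the static-machine option concrete, the sketch below shows the general shape of a YAML resource definition for a client-server test against pre-provisioned machines. The key names (static_vms, ip_address, user_name, ssh_private_key) are illustrative assumptions for this article, not the toolkit's exact schema; the APT repository's README and bundled example configs define the real format.

YAML
# Illustrative sketch only: key names are assumptions; see the APT README
# and example configs for the exact schema supported by the toolkit.
static_vms:
  - ip_address: 10.0.0.10        # system under test (server)
    user_name: apt               # account with passwordless SSH and sudo
    ssh_private_key: ~/.ssh/id_rsa
  - ip_address: 10.0.0.11        # load generator (client)
    user_name: apt
    ssh_private_key: ~/.ssh/id_rsa

With a file like this, the Provision and Teardown stages are skipped and APT runs only the Prepare, Run, and Cleanup stages against the machines you defined.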
The rapid advancement of generative AI (GenAI) has created unprecedented opportunities to transform technical support operations. However, it has also introduced unique challenges in quality assurance that traditional monitoring approaches simply cannot address. As enterprise AI systems become increasingly complex, particularly in technical support environments, we need more sophisticated evaluation frameworks to ensure their reliability and effectiveness.

Why Traditional Monitoring Fails for GenAI Support Agents

Most enterprises rely on what's commonly called "canary testing" — predefined test cases with known inputs and expected outputs that run at regular intervals to validate system behavior. While these approaches work well for deterministic systems, they break down when applied to GenAI support agents for several fundamental reasons:

Infinite input variety: Support agents must handle unpredictable natural language queries that cannot be pre-scripted. A customer might describe the same technical issue in countless different ways, each requiring proper interpretation.
Resource configuration diversity: Each customer environment contains a unique constellation of resources and settings. An EC2 instance in one account might be configured entirely differently from one in another account, yet agents must reason correctly about both.
Complex reasoning paths: Unlike API-based systems that follow predictable execution flows, GenAI agents make dynamic decisions based on customer context, resource state, and troubleshooting logic.
Dynamic agent behavior: These models continuously learn and adapt, making static test suites quickly obsolete as agent behavior evolves.
Feedback lag problem: Traditional monitoring relies heavily on customer-reported issues, creating unacceptable delays in identifying and addressing quality problems.

A Concrete Example

Consider an agent troubleshooting a cloud database access issue. The complexity becomes immediately apparent:

The agent must correctly interpret the customer's description, which might be technically imprecise.
It needs to identify and validate relevant resources in the customer's specific environment.
It must select appropriate APIs to investigate permissions and network configurations.
It needs to apply technical knowledge to reason through potential causes based on those unique conditions.
Finally, it must generate a solution tailored to that specific environment.

This complex chain of reasoning simply cannot be validated through predetermined test cases with expected outputs. We need a more flexible, comprehensive approach.

The Dual-Layer Solution

Our solution is a dual-layer framework combining real-time evaluation with offline comparison:

Real-time component: Uses LLM-based "jury evaluation" to continuously assess the quality of agent reasoning as it happens.
Offline component: Compares agent-suggested solutions against human expert resolutions after cases are completed.

Together, they provide both immediate quality signals and deeper insights from human expertise. This approach gives comprehensive visibility into agent performance without requiring direct customer feedback, enabling continuous quality assurance across diverse support scenarios.
How Real-Time Evaluation Works

The real-time component collects complete agent execution traces, including:

Customer utterances
Classification decisions
Resource inspection results
Reasoning steps

These traces are then evaluated by an ensemble of specialized "judge" large language models (LLMs) that analyze the agent's reasoning. For example, when an agent classifies a customer issue as an EC2 networking problem, three different LLM judges independently assess whether this classification is correct given the customer's description. Using majority voting creates a more robust evaluation than relying on any single model. We apply strategic downsampling to control costs while maintaining representative coverage across different agent types and scenarios. The results are published to monitoring dashboards in real time, triggering alerts when performance drops below configurable thresholds.

Offline Comparison: The Human Expert Benchmark

While real-time evaluation provides immediate feedback, our offline component delivers deeper insights through comparative analysis. It:

Links agent-suggested solutions to final case resolutions in support management systems
Performs semantic comparison between AI solutions and human expert resolutions
Reveals nuanced differences in solution quality that binary metrics would miss

For example, we discovered our EC2 troubleshooting agent was technically correct but provided less detailed security group explanations than human experts. The multi-dimensional scoring assesses correctness, completeness, and relevance, providing actionable insights for improvement. Most importantly, this creates a continuous learning loop where agent performance improves based on human expertise without requiring explicit feedback collection.

Technical Implementation Details

Our implementation balances evaluation quality with operational efficiency:

A lightweight client library embedded in agent runtimes captures execution traces without impacting performance.
These traces flow into a FIFO queue that enables controlled processing rates and message grouping by agent type.
A compute unit processes these traces, applying downsampling logic and orchestrating the LLM jury evaluation.
Results are stored with streaming capabilities that trigger additional processing for metrics publication and trend analysis.

This architecture separates evaluation logic from reporting concerns, creating a more maintainable system. We've implemented graceful degradation so the system continues providing insights even when some LLM judges fail or are throttled, ensuring continuous monitoring without disruption.

Specialized Evaluators for Different Reasoning Components

Different agent components require specialized evaluation approaches. Our framework includes a taxonomy of evaluators tailored to specific reasoning tasks:

Domain classification: LLM judges assess whether the agent correctly identified the technical domain of the customer's issue.
Resource validation: We measure the precision and recall of the agent's identification of relevant resources.
Tool selection: Evaluators assess whether the agent chose appropriate diagnostic APIs given the context.
Final solutions: Our GroundTruth Comparator measures semantic similarity to human expert resolutions.

This specialized approach lets us pinpoint exactly where improvements are needed in the agent's reasoning chain, rather than simply knowing that something went wrong somewhere.
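As an illustration of the majority-voting step described above, here is a minimal sketch. The judge functions are stand-ins for real LLM calls, and the verdict labels and tie-breaking policy are assumptions for this article rather than the production implementation.

Python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical judge factory: in a real system each judge would call a different
# LLM with the agent's execution trace and return a verdict string.
def make_stub_judge(verdict: str) -> Callable[[Dict], str]:
    return lambda trace: verdict

def jury_verdict(trace: Dict, judges: List[Callable[[Dict], str]]) -> str:
    """Collect one verdict per judge and return the majority decision."""
    votes = [judge(trace) for judge in judges]
    verdict, count = Counter(votes).most_common(1)[0]
    # Require a strict majority; otherwise flag the trace for human review.
    return verdict if count > len(judges) / 2 else "no_consensus"

if __name__ == "__main__":
    trace = {
        "utterance": "My EC2 instance cannot reach the database",
        "classification": "ec2_networking",
    }
    judges = [
        make_stub_judge("correct"),
        make_stub_judge("correct"),
        make_stub_judge("incorrect"),
    ]
    print(jury_verdict(trace, judges))  # -> "correct" (2 of 3 votes)

In practice the "no_consensus" path is what feeds the alerting and downsampled human-review queues described above.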
Measurable Results and Business Impact

Implementing this framework has driven significant improvements across our AI support operations:

Increased successful case deflection by 20% while maintaining high customer satisfaction scores
Detected previously invisible quality issues that traditional metrics missed, such as discovering that some agents were performing unnecessary credential validations that added latency without improving solution quality
Accelerated improvement cycles thanks to detailed, component-level feedback on reasoning quality
Built greater confidence in agent deployments, knowing that quality issues will be quickly detected and addressed before they impact customer experience

Conclusion and Future Directions

As AI reasoning agents become increasingly central to technical support operations, sophisticated evaluation frameworks become essential. Traditional monitoring approaches simply cannot address the complexity of these systems. Our dual-layer framework demonstrates that continuous, multi-dimensional assessment is possible at scale, enabling responsible deployment of increasingly powerful AI support systems. Looking ahead, we're working on:

More efficient evaluation methods to reduce computational overhead
Extending our approach to multi-turn conversations
Developing self-improving evaluation systems that refine their assessment criteria based on observed patterns

For organizations implementing GenAI agents in complex technical environments, establishing comprehensive evaluation frameworks should be considered as essential as the agent development itself. Only through continuous, sophisticated assessment can we realize the full potential of these systems while ensuring they consistently deliver high-quality support experiences.
In this article, we will walk you through how to conduct a load test and analyze the results using Java Maven technology. We'll cover everything from launching the test to generating informative graphs and tables. For this demonstration, we'll utilize various files, including Project Object Model (POM) files, JMeter scripts, and CSV data, from the jpetstore_loadtesting_dzone project available on GitHub. This will help illustrate the steps involved and the functionality of the necessary plugins and tools. You can find the project here: https://github.com/vdaburon/jpetstore_loadtesting_dzone. The web application being tested is a well-known application called JPetStore, which you can further explore at https://github.com/mybatis/jpetstore-6.

Advantages of This Solution for Launching and Analyzing Tests

The details of how to implement this solution and the details of Maven launches will be covered in the subsequent chapters. For now, let's highlight the key advantages:

For Installation

There is no need to pre-install Apache JMeter to conduct the load tests, as the JMeter Maven Plugin automatically fetches the Apache JMeter tool and the necessary plugins from the Maven Central Repository.
The file paths used are relative to the Maven project, which means they can vary across different machines during the development phase and in Continuous Integration setups.

For Continuous Integration

This solution seamlessly integrates into Continuous Integration pipelines (such as Jenkins and GitLab). Tests can be easily run on a Jenkins node or a GitLab Runner, making it accessible for both developers and testers.
Performance graphs from operating system or Java monitoring can be easily added using tools like nmon + nmon visualizer for Linux environments.

For Developers and Testers

Java developers and testers familiar with Maven will feel comfortable with the pom.xml files and Git.
The load testing project is managed like a standard Java Maven project within Integrated Development Environments (IDEs) such as IntelliJ, Eclipse, and Visual Studio.
The various files (POM, JMeter script, CSV data) can be version-controlled using Git or other source control systems.
The project's README.md file (in Markdown format) can serve as valuable documentation on how to run load tests and analyze results, particularly in Continuous Integration (CI).

For Analysis

The analysis is fast, as various output files are created in just a few minutes.
Users can filter results to focus only on specific pages, excluding the URLs invoked within them.
The plugin's filter tool allows users to analyze results by load steps, running it multiple times with different start and end offset parameters.
For clearer graphs, users can filter to present response times per scenario with a manageable number of curves.
Users can force the Y-axis so that graphs share the same scale and are easier to compare, for example, setting Y = 5000 ms for response times or 0 to 100% for CPU usage.
Aggregate and Synthesis reports are available in both CSV format and HTML tables for easy display on a webpage.
After executing a load test, users can quickly review results through the generated index.html page, which provides easy access to graphs and HTML tables.
The generated HTML page includes links to the files along with their sizes, and clicking on these links offers a view of the content, like JMeter logs, in a browser.
If a particular graph is missing, users can create duration graphs for each URL called on a page using the "JMeterPluginsCMD Command Line Tool" and
"Filter Results Tool" from the JMeter results file or directly through JMeter's Swing GUI interface.For Report Generation Graphs created during the analysis can be directly imported into reports created in Microsoft Word or LibreOffice Writer formats.CSV reports can be edited in a spreadsheet software (Microsoft Excel or LibreOffice Calc), and the formatted values can then be easily copied into a Word or Writer report.For Archiving Archiving results is quite simple; users can save the zipped directory containing all the results and analyses.This archiving format approach makes it easy to compare different load test campaigns.The retention period for results can be extensive, stretching several years, as the file format is simple and clear; unlike data stored in documents, relational databases, or temporal databases, it remains easily accessible and understandable. Running a Load Test With Maven and Apache JMeter If you're looking to run a load test using Apache JMeter, there is a Maven plugin available for that purpose. This plugin is called the jmeter-maven-plugin , and you can find it at its project URL: https://github.com/jmeter-maven-plugin/jmeter-maven-plugin. To effectively run your performance tests with Java Maven, you need a few essentials: A JDK/JRE version 1.8 or higher (such as version 17)A recent version of Maven (3.7 or higher)A Maven pom.xml file One of the great things about this setup is that you don't need to install Apache JMeter beforehand. It's also a good idea to have a Git client available for fetching crucial resources from the repository such as the JMeter script, external configuration files, and any CSV data files you'll need. For easier management, it is recommended to maintain two Maven files: The first Maven file, pom.xml (pom_01_launch_test.xml), is dedicated to launching the performance testThe second Maven file, pom.xml (pom_02_analyse_results.xml), is for analyzing the results JMeter Maven Plugin Recommended Project Directory Structure The Maven project designed for launching load tests comes with a predefined directory structure. For the jmeter-maven-plugin , this structure can be found at: ${project.base.directory}/src/test/jmeter In this directory, you need to place the following items: The JMeter script (.jmx)The dataset files (.csv)External configuration files referenced in the JMeter script (.properties)The JMeter configuration file (user.properties) if you are using any non-standard properties The pom.xml File for Launching the Load Test The first pom.xml file (pom_01_launch_test.xml) includes the declaration of the jmeter-maven-plugin with some configuration properties. 
Plain Text <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>io.github.vdaburon</groupId> <artifactId>jpetstore-maven-load-test-dzone</artifactId> <version>1.0</version> <packaging>pom</packaging> <name>01 - Launch a load test of the JPetstore web application with the maven plugin</name> <description>Launch a load test of the JPetstore web application with the maven plugin</description> <inceptionYear>2025</inceptionYear> <developers> <developer> <id>vdaburon</id> <name>Vincent DABURON</name> <email>[email protected]</email> <roles> <role>architect</role> <role>developer</role> </roles> </developer> </developers> <properties> <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> <maven.compiler.source>1.8</maven.compiler.source> <maven.compiler.target>1.8</maven.compiler.target> <jmeter.version>5.6.3</jmeter.version> <jvm_xms>256</jvm_xms> <jvm_xmx>756</jvm_xmx> <prefix_script_name>jpetstore</prefix_script_name> <config_properties_name>config_test_warm_up.properties</config_properties_name> </properties> <build> <plugins> <plugin> <!-- Launch load test with : mvn clean verify --> <groupId>com.lazerycode.jmeter</groupId> <artifactId>jmeter-maven-plugin</artifactId> <version>3.6.1</version> <executions> <!-- Generate JMeter configuration --> <execution> <id>configuration</id> <goals> <goal>configure</goal> </goals> </execution> <!-- Run JMeter tests --> <execution> <id>jmeter-tests</id> <goals> <goal>jmeter</goal> </goals> </execution> <!-- Fail build on errors in test <execution> <id>jmeter-check-results</id> <goals> <goal>results</goal> </goals> </execution> --> </executions> <configuration> <jmeterVersion>${jmeter.version}</jmeterVersion> <jmeterExtensions> <!-- add jmeter plugins in JMETER_HOME/lib/ext --> <artifact>kg.apc:jmeter-plugins-functions:2.2</artifact> <artifact>kg.apc:jmeter-plugins-dummy:0.4</artifact> <artifact>io.github.vdaburon:pacing-jmeter-plugin:1.0</artifact> </jmeterExtensions> <testPlanLibraries> <!-- add librairies in JMETER_HOME/lib --> <!-- e.g: <artifact>org.postgresql:postgresql:42.5.1</artifact> --> </testPlanLibraries> <downloadExtensionDependencies>false</downloadExtensionDependencies> <jMeterProcessJVMSettings> <xms>${jvm_xms}</xms> <xmx>${jvm_xmx}</xmx> <arguments> <argument>-Duser.language=en</argument> <argument>-Duser.region=EN</argument> </arguments> </jMeterProcessJVMSettings> <testFilesIncluded> <jMeterTestFile>${prefix_script_name}.jmx</jMeterTestFile> </testFilesIncluded> <propertiesUser> <!-- folder for csv file relatif to script folder --> <relatif_data_dir>/</relatif_data_dir> <!-- PROJECT_HOME/target/jmeter/results/ --> <resultat_dir>${project.build.directory}/jmeter/results/</resultat_dir> </propertiesUser> <customPropertiesFiles> <!-- like -q myconfig.properties , add my external configuration file --> <file>${basedir}/src/test/jmeter/${config_properties_name}</file> </customPropertiesFiles> <logsDirectory>${project.build.directory}/jmeter/results</logsDirectory> <generateReports>false</generateReports> <testResultsTimestamp>false</testResultsTimestamp> <resultsFileFormat>csv</resultsFileFormat> </configuration> </plugin> </plugins> </build> </project> Launching a Load Test on the JPetstore Web Application To launch a performance test on the JPetstore application at 50% load for a duration of 10 minutes, specify: The JMeter script 
prefix with -Dprefix_script_name=jpetstore (for the jpetstore.jmx file)The properties file name with -Dconfig_properties_name=config_test_50pct_10min.properties , which contains the virtual users' configuration needed for the 50% load and a 10-minute duration)The properties file (e.g., config_test_50pct_10min.properties), should contain external configuration, including JMeter properties such as the test URL, the number of virtual users per scenario, and the duration of the test. To launch the load test, use the following command:mvn -Dprefix_script_name=jpetstore -Dconfig_properties_name=config_test_50pct_10min.properties -f pom_01_launch_test.xml clean verify Notes to keep in mind: Ensure that the mvn program is included in the PATH environment variable or that the MAVEN_HOME environment variable is set.Since Maven relies on a JDK/JRE, make sure the path to the java program is specified in the launch file, or that the JAVA_HOME environment variable is configured.If you need to stop the test before it reaches its scheduled time, run the shell script located at <JMETER_HOME>/bin/shutdown.sh (for Linux) or shutdown.cmd (for Windows). The test has started. The "Summary logs" provide an overview of the performance test's progress. We specifically keep an eye on the time elapsed since the launch and the number of errors encountered. Here's an example of the logs from a test that was launched in the IntelliJ IDE: Plain Text C:\Java\jdk1.8.0_191\bin\java.exe ... -Dmaven.home=C:\software\maven3 -Dprefix_script_name=jpetstore -Dconfig_properties_name=config_test_50pct_10min.properties -f pom_01_launch_test.xml clean verify -f pom_01_launch_test.xml [INFO] Scanning for projects... [INFO] [INFO] --< io.github.vdaburon:jpetstore-maven-load-test-dzone >--- [INFO] Building 01 - Launch a load test of the JPetstore web application with the maven plugin 1.0 [INFO] from pom_01_launch_test.xml [INFO] --------------------------------[ pom ]--------------------------------- [INFO] [INFO] --- clean:3.2.0:clean (default-clean) @ jpetstore-maven-load-test-dzone --- [INFO] [INFO] --- jmeter:3.6.1:configure (configuration) @ jpetstore-maven-load-test-dzone --- [INFO] [INFO] ------------------------------------------------------- [INFO] C O N F I G U R I N G J M E T E R [INFO] ------------------------------------------------------- [INFO] [INFO] Creating test configuration for execution ID: configuration [INFO] Building JMeter directory structure... [INFO] Generating JSON Test config... [INFO] Configuring JMeter artifacts... [INFO] Populating JMeter directory... [INFO] Copying extensions to C:\demo\jpetstore_loadtesting_dzone\target\1515b131-17ff-4f97-bcb7-ba2eec698862\jmeter\lib\ext Downloading dependencies: false [INFO] Copying junit libraries to C:\demo\jpetstore_loadtesting_dzone\target\1515b131-17ff-4f97-bcb7-ba2eec698862\jmeter\lib\junit Downloading dependencies: true [INFO] Copying test plan libraries to C:\demo\jpetstore_loadtesting_dzone\target\1515b131-17ff-4f97-bcb7-ba2eec698862\jmeter\lib Downloading dependencies: true [INFO] Configuring JMeter properties... 
[INFO] [INFO] --- jmeter:3.6.1:jmeter (jmeter-tests) @ jpetstore-maven-load-test-dzone --- [INFO] [INFO] ------------------------------------------------------- [INFO] P E R F O R M A N C E T E S T S [INFO] ------------------------------------------------------- [INFO] [INFO] Executing test: jpetstore.jmx [INFO] Arguments for forked JMeter JVM: [java, -Xms256M, -Xmx756M, -Duser.language=en, -Duser.region=EN, -Djava.awt.headless=true, -jar, ApacheJMeter-5.6.3.jar, -d, C:\demo\jpetstore_loadtesting_dzone\target\1515b131-17ff-4f97-bcb7-ba2eec698862\jmeter, -j, C:\demo\jpetstore_loadtesting_dzone\target\jmeter\results\jpetstore.jmx.log, -l, C:\demo\jpetstore_loadtesting_dzone\target\jmeter\results\jpetstore.csv, -n, -q, C:\demo\jpetstore_loadtesting_dzone\src\test\jmeter\config_test_50pct_10min.properties, -t, C:\demo\jpetstore_loadtesting_dzone\target\jmeter\testFiles\jpetstore.jmx, -Dsun.net.http.allowRestrictedHeaders, true] [INFO] [INFO] WARN StatusConsoleListener The use of package scanning to locate plugins is deprecated and will be removed in a future release [INFO] WARN StatusConsoleListener The use of package scanning to locate plugins is deprecated and will be removed in a future release [INFO] WARN StatusConsoleListener The use of package scanning to locate plugins is deprecated and will be removed in a future release [INFO] WARN StatusConsoleListener The use of package scanning to locate plugins is deprecated and will be removed in a future release [INFO] Creating summariser <summary> [INFO] Created the tree successfully using C:\demo\jpetstore_loadtesting_dzone\target\jmeter\testFiles\jpetstore.jmx [INFO] Starting standalone test @ September 24, 2025 11:30:22 AM CEST (1758706222410) [INFO] Waiting for possible Shutdown/StopTestNow/HeapDump/ThreadDump message on port 4445 [INFO] summary + 33 in 00:00:08 = 4.2/s Avg: 100 Min: 30 Max: 1089 Err: 0 (0.00%) Active: 2 Started: 2 Finished: 0 [INFO] summary + 67 in 00:00:29 = 2.3/s Avg: 53 Min: 28 Max: 174 Err: 0 (0.00%) Active: 5 Started: 5 Finished: 0 [INFO] summary = 100 in 00:00:37 = 2.7/s Avg: 69 Min: 28 Max: 1089 Err: 0 (0.00%) [INFO] summary + 81 in 00:00:30 = 2.7/s Avg: 69 Min: 27 Max: 858 Err: 0 (0.00%) Active: 7 Started: 7 Finished: 0 [INFO] summary = 181 in 00:01:07 = 2.7/s Avg: 69 Min: 27 Max: 1089 Err: 0 (0.00%) … [INFO] summary + 47 in 00:00:31 = 1.5/s Avg: 86 Min: 30 Max: 471 Err: 0 (0.00%) Active: 7 Started: 7 Finished: 0 [INFO] summary = 1381 in 00:09:38 = 2.4/s Avg: 71 Min: 27 Max: 1184 Err: 0 (0.00%) [INFO] summary + 36 in 00:00:22 = 1.6/s Avg: 69 Min: 30 Max: 150 Err: 0 (0.00%) Active: 0 Started: 7 Finished: 7 [INFO] summary = 1417 in 00:10:00 = 2.4/s Avg: 71 Min: 27 Max: 1184 Err: 0 (0.00%) [INFO] Tidying up ... @ September 24, 2025 11:40:23 AM CEST (1758706823339) [INFO] ... end of run [INFO] Completed Test: C:\demo\jpetstore_loadtesting_dzone\target\jmeter\testFiles\jpetstore.jmx [INFO] [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 10:08 min [INFO] Finished at: 2025-09-24T11:40:24+02:00 [INFO] ------------------------------------------------------------------------ [INFO] Shutdown detected, destroying JMeter process... 
[INFO] Process finished with exit code 0

The results can be found in the following directory: <PROJECT_HOME>/target/jmeter/results

jpetstore.jmx.log (JMeter logs)
error.xml (contains information about failed samplers)
jpetstore.csv (JMeter results)

Analysis of Results

We use the second Maven POM file specifically for analysis purposes, which is named: pom_02_analyse_results.xml

The launch parameter is: prefix_script_name, representing the script prefix without its extension. This is important because the JMeter results file follows the format <script prefix>.csv (for instance, jpetstore.csv). To launch the analysis, type the following command:

mvn -Dprefix_script_name=jpetstore -f pom_02_analyse_results.xml verify

Note: DO NOT use the clean command as it will erase the test results that we want to retain.

The Maven File With the Plugin and Tools for Analysis

The Maven plugin and tools:

jmeter-graph-tool-maven-plugin
csv-report-to-html
create-html-for-files-in-directory

The jmeter-graph-tool-maven-plugin plugin allows you to:

Filter JMeter results files by retaining only the pages while removing the page URLs. It can also narrow down the data by test period, ensuring that only the steps with a stable number of virtual users are included.
Generate a "Summary" report in CSV format
Generate a "Synthesis" report in CSV format
Create graphs in PNG format to visualize various metrics, including: Threads State Over Time, Response Codes Per Second, Bytes Throughput Over Time, Transactions Per Second, Response Times Percentiles, and Response Times Over Time

The csv-report-to-html tool reads the generated CSV reports (both Summary and Synthesis) and generates an HTML table displaying the data contained within. Meanwhile, the create-html-for-files-in-directory tool browses the target/jmeter/results directory and creates an index.html page. This page serves as a convenient hub for viewing the various image files and HTML tables, and for linking to the other files present in the directory.
The pom_02_analyse_results.xml File for Analysis The contents of the pom_02_analyse_results.xml file are outlined below: Plain Text <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>io.github.vdaburon</groupId> <artifactId>jpetstore-maven-analyse-result-dzone</artifactId> <version>1.0</version> <packaging>pom</packaging> <name>02 - Analyzes the results of the web application JPetstore load test with deditated maven plugins</name> <description>Analyzes the results of the web application JPetstore load test with deditated maven plugins</description> <inceptionYear>2025</inceptionYear> <developers> <developer> <id>vdaburon</id> <name>Vincent DABURON</name> <email>[email protected]</email> <roles> <role>architect</role> <role>developer</role> </roles> </developer> </developers> <properties> <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> <maven.compiler.source>1.8</maven.compiler.source> <maven.compiler.target>1.8</maven.compiler.target> <jvm_xms>256</jvm_xms> <jvm_xmx>756</jvm_xmx> <graph_width>960</graph_width> <graph_height>800</graph_height> <prefix_script_name>jpetstore</prefix_script_name> </properties> <dependencies> <dependency> <groupId>io.github.vdaburon</groupId> <artifactId>csv-report-to-html</artifactId> <version>1.2</version> </dependency> <dependency> <groupId>io.github.vdaburon</groupId> <artifactId>create-html-for-files-in-directory</artifactId> <version>1.9</version> </dependency> </dependencies> <build> <plugins> <plugin> <groupId>io.github.vdaburon</groupId> <artifactId>jmeter-graph-tool-maven-plugin</artifactId> <version>1.2</version> <executions> <execution> <id>create-graphs</id> <goals> <goal>create-graph</goal> </goals> <phase>verify</phase> <configuration> <directoryTestFiles>${project.build.directory}/jmeter/testFiles</directoryTestFiles> <filterResultsTool> <filterResultsParam> <inputFile>${project.build.directory}/jmeter/results/${prefix_script_name}.csv</inputFile> <outputFile>${project.build.directory}/jmeter/results/${prefix_script_name}_filtred.csv</outputFile> <successFilter>false</successFilter> <includeLabels>SC[0-9]+_P.*</includeLabels> <includeLabelRegex>true</includeLabelRegex> </filterResultsParam> </filterResultsTool> <graphs> <graph> <pluginType>AggregateReport</pluginType> <inputFile>${project.build.directory}/jmeter/results/${prefix_script_name}.csv</inputFile> <generateCsv>${project.build.directory}/jmeter/results/G01_AggregateReport.csv</generateCsv> <includeLabels>SC[0-9]+_.*</includeLabels> <includeLabelRegex>true</includeLabelRegex> </graph> <graph> <pluginType>SynthesisReport</pluginType> <inputFile>${project.build.directory}/jmeter/results/${prefix_script_name}.csv</inputFile> <generateCsv>${project.build.directory}/jmeter/results/G02_SynthesisReport.csv</generateCsv> <includeLabels>SC[0-9]+_.*</includeLabels> <includeLabelRegex>true</includeLabelRegex> </graph> <graph> <pluginType>ThreadsStateOverTime</pluginType> <inputFile>${project.build.directory}/jmeter/results/${prefix_script_name}.csv </inputFile> <width>${graph_width}</width> <height>${graph_height}</height> <generatePng>${project.build.directory}/jmeter/results/G03_ThreadsStateOverTime.png</generatePng> <relativeTimes>no</relativeTimes> <paintGradient>no</paintGradient> <autoScale>no</autoScale> </graph> <graph> <pluginType>ResponseCodesPerSecond</pluginType> 
<inputFile>${project.build.directory}/jmeter/results/${prefix_script_name}.csv</inputFile> <width>${graph_width}</width> <height>${graph_height}</height> <generatePng>${project.build.directory}/jmeter/results/G05_ResponseCodesPerSecond.png</generatePng> <relativeTimes>no</relativeTimes> <paintGradient>no</paintGradient> <limitRows>100</limitRows> <autoScale>no</autoScale> <excludeLabels>SC[0-9]+_.*</excludeLabels> <excludeLabelRegex>true</excludeLabelRegex> </graph> <graph> <pluginType>TransactionsPerSecond</pluginType> <inputFile>${project.build.directory}/jmeter/results/${prefix_script_name}_filtred.csv</inputFile> <width>${graph_width}</width> <height>${graph_height}</height> <generatePng>${project.build.directory}/jmeter/results/G07_TransactionsPerSecondAggregated.png</generatePng> <relativeTimes>no</relativeTimes> <aggregateRows>yes</aggregateRows> <paintGradient>no</paintGradient> <limitRows>100</limitRows> <autoScale>no</autoScale> </graph> <graph> <pluginType>ResponseTimesPercentiles</pluginType> <inputFile>${project.build.directory}/jmeter/results/${prefix_script_name}_filtred.csv</inputFile> <width>${graph_width}</width> <height>${graph_height}</height> <generatePng>${project.build.directory}/jmeter/results/G08_ResponseTimesPercentiles.png</generatePng> <aggregateRows>no</aggregateRows> <paintGradient>no</paintGradient> </graph> <graph> <pluginType>ResponseTimesOverTime</pluginType> <inputFile>${project.build.directory}/jmeter/results/${prefix_script_name}_filtred.csv</inputFile> <width>${graph_width}</width> <height>${graph_height}</height> <generatePng>${project.build.directory}/jmeter/results/G11_ResponseTimesOverTime_SC01.png</generatePng> <relativeTimes>no</relativeTimes> <paintGradient>no</paintGradient> <limitRows>100</limitRows> <includeLabels>SC01.*</includeLabels> <includeLabelRegex>true</includeLabelRegex> <forceY>2000</forceY> </graph> </graphs> <jMeterProcessJVMSettings> <xms>${jvm_xms}</xms> <xmx>${jvm_xmx}</xmx> <arguments> <argument>-Duser.language=en</argument> <argument>-Duser.region=EN</argument> <!-- Date format is not standard, The format must be the same as declared in the user.properties set for the load test. Not mandatory but these properties prevent error messages when parsing the results file. 
--> <argument>-Djmeter.save.saveservice.timestamp_format=yyyy/MM/dd HH:mm:ss.SSS</argument> <argument>-Djmeter.save.saveservice.default_delimiter=;</argument> </arguments> </jMeterProcessJVMSettings> </configuration> </execution> </executions> </plugin> <plugin> <groupId>org.codehaus.mojo</groupId> <artifactId>exec-maven-plugin</artifactId> <version>1.2.1</version> <executions> <execution> <!-- individual launch : mvn exec:java@aggregate_csv_to_html --> <id>aggregate_csv_to_html</id> <phase>verify</phase> <goals> <goal>java</goal> </goals> <configuration> <mainClass>io.github.vdaburon.jmeter.utils.ReportCsv2Html</mainClass> <arguments> <argument>${project.build.directory}/jmeter/results/G01_AggregateReport.csv</argument> <argument>${project.build.directory}/jmeter/results/G01_AggregateReportSorted.html</argument> <argument>sort</argument> </arguments> </configuration> </execution> <execution> <!-- individual launch : mvn exec:java@synthesis_csv_to_html --> <id>synthesis_csv_to_html</id> <phase>verify</phase> <goals> <goal>java</goal> </goals> <configuration> <mainClass>io.github.vdaburon.jmeter.utils.ReportCsv2Html</mainClass> <arguments> <argument>${project.build.directory}/jmeter/results/G02_SynthesisReport.csv</argument> <argument>${project.build.directory}/jmeter/results/G02_SynthesisReportSorted.html</argument> <argument>sort</argument> </arguments> </configuration> </execution> <execution> <!-- individual launch : mvn exec:java@create_html_page_for_files_in_directory --> <id>create_html_page_for_files_in_directory</id> <phase>verify</phase> <goals> <goal>java</goal> </goals> <configuration> <mainClass>io.github.vdaburon.jmeter.utils.HtmlGraphVisualizationGenerator</mainClass> <arguments> <argument>${project.build.directory}/jmeter/results</argument> <argument>index.html</argument> </arguments> <systemProperties> <systemProperty> <key>image_width</key> <value>${graph_width}</value> </systemProperty> <systemProperty> <key>add_toc</key> <value>true</value> </systemProperty> </systemProperties> </configuration> </execution> </executions> </plugin> </plugins> </build> </project> In the results directory, you'll find the graphs, CSV files containing reports, and HTML tables for the reports. There is also an index.html page, which allows you to view the results and provides links to the different files. This directory can be found at target/jmeter/results within your Maven project. The generated index.html page allows you to view the graphs and access file links directly in your web browser. Here's a glimpse of what the HTML page displays: The JMeter log file can be found in the directory: target/jmeter/results . This is not the default location; the pom.xml file, specifically pom_01_launch_test.xml , has been modified to specify the file location log:<logsDirectory>${project.build.directory}/jmeter/results</logsDirectory>. Consequently, the created log file is named with a prefixed that combines the script file name and the ".log" extension, for example, jpetstore.jmx.log. Limitations of the Load Testing Solution With Maven The limitations encountered don't come directly from Maven itself, but rather from the computer (whether it's a VM or a POD) that is running the load test. When dealing with heavy loads, it's often necessary to modify system settings to increase the limits of the account that runs Apache JMeter. In Linux, the limits can be found in the in the file located at /etc/security/limits.conf. The default values are generally insufficient for high-load testing scenarios. 
To check the current limits for a Linux account, you can run the command: ulimit -a

By default, the maximum number of open files and network connections is capped at 1024. Additionally, the number of processes is limited to 4096. To modify these limits, you'll need to edit the /etc/security/limits.conf file as a root user. Make sure to change the values for the Linux user (in this case, jmeter) that runs Java.

Plain Text
jmeter hard nproc 16384
jmeter soft nproc 16384
jmeter hard nofile 16384
jmeter soft nofile 16384

When a test is launched by a GitLab Runner (or a Jenkins node), it's essential for the Runner to have system settings adjusted to accommodate CPU load, available memory, and network bandwidth.

Going Further

Additional Steps

To manage the size of the JMeter results and the XML error files, consider adding a compression step, as these files tend to be quite large and compress efficiently. There are two available plugins that can help validate the results against Key Performance Indicators (KPIs):

JUnitReportKpiJMeterReportCsv (https://github.com/vdaburon/JUnitReportKpiJMeterReportCsv)
JUnitReportKpiCompareJMeterReportCsv (https://github.com/vdaburon/JUnitReportKpiCompareJMeterReportCsv)

Additionally, you can broaden your analysis to include the generation of KPI results, allowing your Continuous Integration pipeline to fail if any KPIs fall short. If you need to generate a PDF document from the index.html page, tools like convert-html-to-pdf (https://github.com/vdaburon/convert-html-to-pdf) can help you accomplish that.

Monitoring

It is important to monitor the environment being tested during load tests. You can incorporate additional steps to start monitoring before the test begins and to stop it after the load test is complete. This way, you can retrieve the files generated during the monitoring phase for further analysis. It is recommended to use Application Performance Monitoring tools (such as Dynatrace or Elastic APM) to observe both the application and the environment throughout the load test.
Design documents in Enterprise Java often end up trapped in binary silos like Excel or Word, causing them to drift away from the actual code. This pattern shows how to treat Design Docs as source code by using structured Markdown and generative AI. We've all been there: the architecture team delivers a Detailed Design Document (DDD) to the development team. It's a 50-page Word file or, even worse, a massive Excel spreadsheet with multiple tabs defining Java classes, fields, and validation rules. By the time you write the first line of code, the document is already outdated. Binary files are nearly impossible to version, diffing changes is impractical, and copy-pasting definitions into Javadoc is tedious. At enterprise scale, this "Code Drift," where implementation diverges from design, becomes a major source of technical debt. By shifting design documentation to structured Markdown and leveraging generative AI, we can treat documentation exactly like source code. This creates a bridge between the architect's intent and the developer's integrated development environment (IDE).

The Problem: The Binary Wall

In traditional Waterfall or hybrid environments, design lives in Office documents (Word/Excel), while code lives in text formats (Java/YAML). Because the formats are incompatible, automation breaks down. You can't easily "compile" an Excel sheet into a Java POJO, and you certainly can't unit test a Word doc. To close this gap, design information needs to be:

Text-based (for Git version control).
Structured (for machine parsing).
Human-readable (for reviews and collaboration).

The solution is Structured Markdown.

The Solution: Markdown as a Data Source

Instead of treating Markdown merely as a way to write README files, we treat it as a structured specification format. By standardizing headers and layout, a Markdown file becomes a consistent, machine-friendly data source that GenAI tools (GitHub Copilot, ChatGPT, etc.) can parse to generate boilerplate code, diagrams, and even legacy Excel reports for stakeholders.

1. The Directory Structure

To make this approach work, design documents must live alongside the code, mirroring the package structure so they evolve together. The Pattern:

Plain Text
/project-root
  /src
    /main/java/com/app/backend/RegisteredUser.java
  /design-docs
    /backend
      RegisteredUser.md
      OrderService.md
    /diagrams
      architecture.mermaid

By keeping the .md file in the same repository structure as the .java file, we establish a direct, traceable link between the specification and the implementation.

2. The Structured Spec

The key is to write Markdown as an actual specification, not as a blog post. We use specific headers (such as ## Class Summary, ## Members) that act as hooks for automation tools. Example: RegisteredUser.md

Markdown
# RegisteredUser

## Class Summary
Represents a user who has completed the registration process. Manages user credentials and validation status.

## Members
| Name | Type | Description |
| :--- | :--- | :--- |
| userId | String | Unique identifier for the user. |
| userPassword | String | Encrypted password string. |

## Methods
+ fn isValidUserId(): bool
  - Logic: Returns true if userId length >= 3.
+ fn isValidUserPassword(): bool
  - Logic: Returns true if password length >= 8.

This format is readable to a Product Owner, yet structured enough for an LLM to interpret type definitions and logic constraints.

Implementation: From Text to Java

Once we express the design in structured Markdown, generative AI can bridge the gap to Java code.
In Fujitsu's case study, they utilized a VS Code extension and the OpenAI API to read these Markdown specs and generate initial class scaffolding. You can replicate this workflow with any GenAI coding assistant. Because the prompt context includes a rigid, predictable structure, hallucination rates drop significantly.

Prompt Context Example: "Read the following Markdown design doc. Generate a Java class including Javadoc, fields, and validation logic described in the 'Methods' section."

The Generated Output:

Java
package com.app.backend;

/**
 * Represents a user who has completed the registration process.
 * Manages user credentials and validation status.
 */
public class RegisteredUser {

    private String userId;
    private String userPassword;

    public RegisteredUser(String userId, String userPassword) {
        this.userId = userId;
        this.userPassword = userPassword;
    }

    /**
     * Logic: Returns true if userId length >= 3.
     */
    public boolean isValidUserId() {
        return this.userId != null && this.userId.length() >= 3;
    }

    /**
     * Logic: Returns true if password length >= 8.
     */
    public boolean isValidUserPassword() {
        return this.userPassword != null && this.userPassword.length() >= 8;
    }
}

The AI doesn't guess; it implements the specified business rules (>= 3, >= 8) exactly as written. If the design changes, you update the Markdown and regenerate the code.

Visualizing the Architecture

A common concern when moving away from Excel, Visio, or other diagramming tools is losing the ability to "draw" the system. But now that our design lives in structured text, we can compile it into diagrams. Using the standardized Markdown headers, we can automatically generate Mermaid.js class diagrams simply by scanning the directory.

Input (Markdown Header): Class: RegisteredUser depends on Class: UserProfile

Mermaid
classDiagram
    class RegisteredUser {
        +String userId
        +String userPassword
        +isValidUserId()
    }
    class UserProfile {
        +String email
    }
    RegisteredUser --> UserProfile

This ensures your architecture diagrams always reflect the current state of the design documents, rather than what the architect drew three months ago.

The "Excel" Requirement

Many enterprises still require an Excel file for official sign-off or for non-technical stakeholders. But now that the source of truth is structured text (Markdown), generating Excel is trivial. A simple script (or even an AI prompt) can parse the headers and populate a CSV or XLSX template automatically (a minimal sketch of such a script appears at the end of this article).

Old Way: Master file is Excel -> Developers manually write Java.
New Way: Master file is Markdown -> Auto-generate Java and auto-generate Excel for management.

Results and ROI

Shifting to a Markdown-first approach does more than tidy up your repository. In the analyzed case study, teams saw clear productivity gains:

55% faster development: Boilerplate code (classes, tests) was generated directly from the Markdown spec.
Reduced communication overhead: AI-assisted translation of Markdown is faster and more accurate than dealing with Excel cells.
True diff-ability: Git now shows exactly who changed a business rule, and when, in the commit history.

Conclusion

Documentation often becomes an afterthought because the tools we use for design (Office) work against the tools we use for development (IDEs). By adopting Markdown as a formal specification language, we pull design work directly into the DevOps pipeline. So the next time you're asked to write a detailed design, skip the spreadsheet. Open a .md file, define a clear structure, and let the code flow from there.
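As promised in "The 'Excel' Requirement" section, here is a minimal sketch of such a script. The file paths, the CSV-only output, and the helper name are assumptions for illustration; a real pipeline might write XLSX with a library such as openpyxl instead.

Python
import csv
import re
from pathlib import Path

def members_table_to_csv(md_path: str, csv_path: str) -> None:
    """Extract the '## Members' pipe table from a design doc and write it as CSV."""
    lines = Path(md_path).read_text(encoding="utf-8").splitlines()
    in_members, rows = False, []
    for line in lines:
        if line.startswith("## "):
            # Only collect rows while we are inside the Members section.
            in_members = line.strip() == "## Members"
            continue
        if in_members and line.strip().startswith("|"):
            cells = [c.strip() for c in line.strip().strip("|").split("|")]
            if all(re.fullmatch(r":?-+:?", c) for c in cells):
                continue  # skip the |:---|:---| separator row
            rows.append(cells)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

# Hypothetical usage against the example spec shown earlier:
# members_table_to_csv("design-docs/backend/RegisteredUser.md",
#                      "RegisteredUser_members.csv")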
If you've ever wanted to bring a still photo to life using nothing more than an audio clip, SadTalker makes it surprisingly easy once it's set up correctly. Running it locally can be tricky because of GPU drivers, missing dependencies, and environment mismatches, so this guide walks you through a clean, reliable setup in Google Colab instead. The goal is simple: a fully reproducible, copy-and-paste workflow that lets you upload a single image and a single audio file, then generate a talking-head video without spending hours troubleshooting your system.

Step 1: Create a Clean Environment

Setting up SadTalker in Google Colab becomes much easier when its dependencies are isolated inside a dedicated virtual environment. Instead of wrestling with conflicting libraries or GPU driver issues, we'll start clean by installing virtualenv and creating a new environment called sadtalk_env. This keeps all SadTalker-related packages neatly contained and prevents them from interfering with Colab's base environment.

Shell
!pip install virtualenv
!virtualenv sadtalk_env --clear

Step 2: Activate the Environment and Install Dependencies

Once the environment is activated, we can install all of SadTalker's required dependencies in a single step. The command below uses pinned versions for PyTorch and NumPy to avoid compatibility issues, and then pulls in the rest of the core libraries — ranging from face enhancement (facexlib, gfpgan) to audio and video handling (moviepy, opencv, pydub, librosa). Installing everything at once ensures SadTalker has a stable, fully compatible setup right from the start.

Shell
%%bash
source sadtalk_env/bin/activate
pip install numpy==1.23.5 torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 \
  facexlib==0.3.0 gfpgan insightface onnxruntime moviepy \
  opencv-python-headless imageio[ffmpeg] yacs kornia gtts \
  safetensors pydub librosa

Step 3: Activate the Environment, Clone SadTalker, Download Models, and Prepare Test Assets

With the environment and dependencies in place, the next step is to bring in SadTalker itself. The snippet below clones the official repository, downloads the pretrained model weights, and sets up both a source image and a sample audio file. The additional wget commands fetch the manual checkpoints that the base script doesn't retrieve by default. Finally, a quick gTTS one-liner generates a simple demo voice clip so you can test the entire pipeline end-to-end without needing to upload your own audio.
Shell %%bash source sadtalk_env/bin/activate # Clone repo git clone https://github.com/OpenTalker/SadTalker.git cd SadTalker # Download models (official script) bash scripts/download_models.sh # ✅ Additional manual weights (per your original script) wget https://github.com/OpenTalker/SadTalker/releases/download/v0.0.2/epoch_20.pth -P ./checkpoints wget https://github.com/OpenTalker/SadTalker/releases/download/v0.0.2/auido2pose_00140-model.pth -P ./checkpoints wget https://github.com/OpenTalker/SadTalker/releases/download/v0.0.2/auido2exp_00300-model.pth -P ./checkpoints wget https://github.com/OpenTalker/SadTalker/releases/download/v0.0.2/facevid2vid_00189-model.pth.tar -P ./checkpoints wget https://github.com/OpenTalker/SadTalker/releases/download/v0.0.2/mapping_00229-model.pth.tar -P ./checkpoints wget https://github.com/OpenTalker/SadTalker/releases/download/v0.0.2/mapping_00109-model.pth.tar -P ./checkpoints # Prepare source image mkdir -p examples/source_image wget https://thispersondoesnotexist.com/ -O examples/source_image/art_0.jpg # Prepare driven audio python -c " from gtts import gTTS text = 'Hello, I am your virtual presenter. Let us explore the world of AI together.' gTTS(text, lang='en').save('english_sample.wav') " Step 4: Verify Image, Audio, and Checkpoints Before running the model, it’s wise to confirm that all required assets are in place. The commands below list the contents of the checkpoints directory (where the weights are stored) and verify both the downloaded sample image and the generated audio file. This quick check ensures everything is set before you launch your first animation. Shell !ls -lh SadTalker/examples/source_image/art_0.jpg !ls -lh SadTalker/english_sample.wav Step 5: Run SadTalker Inference With the environment set up and all assets verified, you're ready to generate your first talking-head video. The command below runs SadTalker’s inference.py, using the sample audio and source image as inputs. The output is saved to the results folder, with face enhancement handled by Generative Facial Prior Generative Adversarial Network (GFPGAN) and the --still flag keeping the head stable during speech. Shell %%bash source sadtalk_env/bin/activate cd SadTalker python inference.py \ --driven_audio english_sample.wav \ --source_image examples/source_image/art_0.jpg \ --result_dir results \ --enhancer gfpgan \ --still Step 6: Locate the Output Video After the inference step finishes, you’ll want to quickly find the generated video file. The snippet below scans the results folder for all .mp4 outputs, sorts them by creation time, and prints the most recent one. This ensures you always grab the latest animation without digging through the directory manually. Python import glob import os results_dir = '/content/SadTalker/results' # Use glob to find all .mp4 files in the directory mp4_files = glob.glob(os.path.join(results_dir, '*.mp4')) # Sort files by modification time (latest first) mp4_files.sort(key=os.path.getmtime, reverse=True) latest_mp4_file = None if mp4_files: latest_mp4_file = mp4_files[0] print(f"Latest MP4 file found: {latest_mp4_file}") else: print(f"No MP4 files found in {results_dir}") Step 7: Display the Final Video in Notebook Once you've identified the most recent output file, you can preview it directly in Colab. The snippet below uses IPython’s built-in Video widget to embed and play the generated .mp4 inline, allowing you to watch your talking avatar immediately without leaving the notebook. 
Python
from IPython.display import Video

Video(latest_mp4_file, embed=True)

At this point, you've successfully built a complete SadTalker workflow in Google Colab. The provided notebook — also available on GitHub as colab-talking-avatar — takes you from zero to a fully generated talking-head video with minimal friction. Now you're free to experiment: swap in your own voice clips, try different images, batch-generate multiple avatars, or integrate SadTalker into a larger content pipeline. The hardest parts — dependencies, environment setup, and model weights — are already taken care of.

https://github.com/ryanboscobanze/colab-talking-avatar
Unchecked language generation is not a harmless bug — it is a costly liability in regulated domains.

A single invented citation in a visa evaluation can derail an application and trigger months of appeal.
A hallucinated clause in a compliance report can result in penalties.
A fabricated reference in a clinical review can jeopardize patient safety.

Large language models (LLMs) are not "broken"; they are simply unaccountable. Retrieval-augmented generation (RAG) helps, but standard RAG remains brittle:

Retrieval can miss critical evidence.
Few pipelines verify whether generated statements are actually supported by retrieved text.
Confidence scores are often uncalibrated or misleading.

If you are an engineer building applications that require a high level of trust, such as immigration, healthcare, or compliance, then "a chatbot with context" is nowhere near sufficient. You need methods that verify every claim, clearly signal uncertainty, and incorporate expert oversight. This article describes a Hybrid RAG + LLM framework built with Django, FAISS, and open-source NLP stacks. It combines:

Dual-track retrieval
JSON-enforced outputs
Automated claim verification
Confidence calibration
Human oversight

Think of it as a pipeline that transforms LLMs from creative storytellers into auditable assistants.

Ingestion and Chunking

High-stakes reviews involve messy, heterogeneous corpora: scanned PDFs, Word documents, HTML guidelines, and plain-text notes. Each format introduces unique challenges. Common pitfalls:

Headers, tables, and references are often flattened or distorted.
Unicode quirks (smart quotes, zero-width spaces) corrupt embeddings.
Personally Identifiable Information (PII) must be redacted.

Pipeline:

Convert all documents to clean UTF-8 text.
Preserve structural elements (e.g., convert tables to JSON rather than flattening them).
Split content into 400–800-token windows with ~15% overlap to maintain contextual continuity.

Python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=120
)
chunks = splitter.split_text(cleaned_doc)

This ensures safe, structured ingestion for downstream retrieval.

Hybrid Retrieval Engine

Keep in mind that no single retrieval method is sufficient on its own:

BM25 (sparse retrieval): strong keyword precision.
Dense embeddings: robust to paraphrasing and semantic variation.

We combine both using a hybrid scoring function:

S(q, d) = alpha * s_BM25(q, d) + (1 - alpha) * s_vec(q, d)

To improve diversity and reduce redundancy, documents are then re-ranked using Maximal Marginal Relevance (MMR):

MMR(d_i) = lambda * S(q, d_i) - (1 - lambda) * max_j sim(d_i, d_j)

Infrastructure:

Elasticsearch → BM25 retrieval
FAISS (flat L2) → dense vector search using E5-base embeddings

Grounded Generation

Retrieval provides context — but generation must be structured to be auditable. Enforcement rules:

Constrain outputs with a JSON schema (claims, citations, risks, scores).
Require explicit citation IDs like [C1].
Use [MISSING] markers when no supporting evidence exists.

Prompt:

Plain Text
SYSTEM: You are a review assistant.
Cite evidence with [C#]. If none exists, write [MISSING].
Only output valid JSON.

Validation:

Python
from jsonschema import validate, ValidationError

try:
    validate(instance=output, schema=review_schema)
except ValidationError:
    # retry generation
    pass

This approach ensures determinism instead of improvisation.
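The validation snippet above assumes a review_schema object. A minimal illustrative schema is sketched below; the field names (claims, citations, risks, overall_score) are assumptions for this article, not the exact schema used in the production pipeline.

Python
# Illustrative only: field names are assumptions, not the authors' exact schema.
review_schema = {
    "type": "object",
    "required": ["claims", "risks", "overall_score"],
    "properties": {
        "claims": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["text", "citations"],
                "properties": {
                    "text": {"type": "string"},
                    # Citation IDs such as "C1", or the literal "[MISSING]" marker
                    "citations": {
                        "type": "array",
                        "items": {
                            "type": "string",
                            "pattern": "^(C[0-9]+|\\[MISSING\\])$",
                        },
                    },
                },
            },
        },
        "risks": {"type": "array", "items": {"type": "string"}},
        "overall_score": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "additionalProperties": False,
}

Rejecting anything that fails this schema (and retrying generation) is what keeps the downstream verification and audit steps deterministic.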
Verification and Calibration

1. Entailment Checking

Even when citations are present, claims may still misinterpret the evidence. For example:

Claim: “The candidate has indefinite leave to remain.”
Evidence: “The candidate holds a temporary Tier‑2 visa.”

Both reference the same file, but the claim is contradicted. We apply RoBERTa‑MNLI to verify:

G_claim = max_j p_entail(claim, C_j)

Claims under 0.5 → flagged for review.

2. Calibration

Softmax outputs are notoriously overconfident. We apply temperature scaling:

sigma_T(z) = 1 / (1 + exp(-z / T))

T* is learned by minimizing negative log‑likelihood on a validation set. After calibration, ECE drops from 0.079 → 0.042.

Human Oversight Interface

Automation reduces toil; human judgment ensures legitimacy.

Django + htmx dashboard displays claim ↔ evidence pairs.
Experts can accept, reject, or edit directly.
Dual review model → arbitration if disagreement persists.
All actions are recorded in immutable audit logs (SHA‑256).

Metric: κ = 0.87 inter‑rater reliability.

Results Snapshot

Metric: Baseline (Atlas) → Hybrid Stack (Δ)
Groundedness: 0.71 → 0.91 (+23%)
Hallucinations: 1.00 → 0.59 (–41%)
Calibration (ECE): 0.079 → 0.042 (–47%)
Expert Acceptance: 0.75 → 0.92 (+17%)
Review Time (h): 5.1 → 2.9 (–43%)

Impact: At 10,000 visa cases per year, saving 2 hours per case = 20,000 expert hours recovered (roughly 10 FTEs).

Deployment Notes

Framework: Django 4.2 + DRF
Queue: Celery + Redis
Search: Elasticsearch (BM25), FAISS embeddings
Generation: GPT‑4 or Mixtral‑8x7B
Verification: RoBERTa‑MNLI
Scaling: Docker + Kubernetes; ~3GB per 1M chunks

Failure Modes and Mitigations

Niche policy language → retrieval gaps → curated corpus.
Long claims (>512 tokens) → NLI failure → chunk claims.
Corpus bias → biased outputs → mitigate with human arbitration + refresh cycles.

Failures are design signals, not bugs.

Future Work

Multilingual retrieval using LASER 3 embeddings for 50+ languages
Template synthesis to auto-generate follow‑ups on [MISSING] slots
Federated deployment — on‑prem embeddings with shared gradient updates

Conclusion

Developers working in regulated, high‑trust environments must move beyond raw LLMs. A hybrid architecture — combining retrieval, schema enforcement, entailment verification, calibration, and human oversight — produces systems that are efficient, reliable, and auditable. This blueprint applies not only to visa assessments, but also to healthcare audits, regulatory compliance, and scientific literature validation.

Takeaway: This is not “AI for fun.” This system is designed for trust. The stack and prompt set are open‑sourced under the MIT license. Fork it, stress‑test it, and adapt it to your domain. By treating grounding as a core architectural principle, we transform probabilistic LLMs into credible collaborators.
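For readers who want a concrete starting point for the entailment check described above, here is a minimal sketch using an MNLI-finetuned RoBERTa checkpoint from Hugging Face. The model name, the temperature default, and the way the 0.5 threshold is applied are assumptions for illustration rather than the exact production configuration.

Python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: "roberta-large-mnli" stands in for the RoBERTa-MNLI verifier above;
# any MNLI-finetuned checkpoint exposing an "entailment" label works the same way.
MODEL_NAME = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def entailment_score(claim, evidence_chunks, temperature=1.0):
    """G_claim = max_j p_entail(claim, C_j); temperature > 1 softens the probabilities."""
    # Locate the entailment class index from the model config instead of hardcoding it.
    entail_idx = next(i for i, label in model.config.id2label.items()
                      if "entail" in label.lower())
    best = 0.0
    for chunk in evidence_chunks:
        inputs = tokenizer(chunk, claim, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits[0] / temperature  # temperature scaling
        probs = torch.softmax(logits, dim=-1)
        best = max(best, float(probs[entail_idx]))
    return best

# Claims scoring below 0.5 would be flagged for expert review, as described above.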
When engineering teams modernize Java applications, the shift from JDK 8 to newer Long-Term Support (LTS) versions, such as JDK 11, 17, and soon 21, might seem straightforward at first. Since Java maintains backward compatibility, it's easy to assume that the runtime behavior will remain largely unchanged. However, that's far from reality. In 2025, our team completed a major modernization initiative to migrate all of our Java microservices from JDK 8 to JDK 17. The development and QA phases went smoothly, with no major issues arising. But within hours of deploying to production, we faced a complete system breakdown. Memory usage, which had been consistently reliable for years, quadrupled. Containers that had previously operated without issue began to restart repeatedly. Our service level agreements (SLAs) degraded, and incident severity levels escalated. This prompted a multi-day diagnostic effort involving several teams—including platform experts, Java Virtual Machine (JVM) specialists, and service owners.

This post-mortem will cover the following:

Key differences between JDK 8 and JDK 17
How containerized environments amplify hidden JVM behaviors
The distinctions between native memory and heap memory
The reasons behind thread proliferation and its impact on memory
The specific commands, flags, and environment variables that resolved our issues
A validated checklist for anyone upgrading to JDK 17 (or 21)

The problems we faced were subtle and nearly invisible to standard Java monitoring tools. However, the lessons we learned reshaped our approach to upgrading JVM versions and transformed our understanding of memory usage in containerized environments.

The Incident

We deployed the JDK 17 version of our primary service to Kubernetes. The rollout was smooth, health checks came back green, request latencies remained stable, and the logs showed no errors. However, 2–3 hours later, our dashboards began lighting up.

Symptoms Observed

Metric: JDK 8 (Before) → JDK 17 (After)
Memory usage: ~50% of container → 95–100% (frequent OOMKills)
Thread count: ~400 → 1600+ threads
Total native memory: ~800 MB → 3.4–3.6 GB
Container restarts: None → Multiple/hour
GC behavior: Stable → G1GC overhead spikes

Services that had been stable for years suddenly began to fail unpredictably.

The Challenge: Heap Monitoring Misled Us

Every Java engineer knows to keep an eye on heap usage. Initially, the heap looked perfectly fine, remaining constant around the configured Xmx. However, it was native memory that was surging. Native memory includes:

Thread stacks
glibc malloc arenas
Auxiliary structures in the Garbage Collector (GC)
JIT compiler buffers
Metaspace, Code Cache
NIO buffers
Internal JVM C++ structures

Unfortunately, this isn’t visible through heap dump tools and isn’t captured by standard Java monitoring. This is exactly what OOMKilled our containers.

Root Cause Analysis

During our investigation, we found that three independent JVM behaviors, amplified by running in containers, created a “perfect memory storm.” After three days of thorough analysis—reviewing heap data, utilizing native memory tracking (jcmd VM.native_memory), sampling thread dumps, examining GC logs, and inspecting container cgroups—we identified three root causes.

Root Cause #1: Thread Proliferation Due to CPU Mis-Detection

What Happened

JDK 17 introduced changes to how Runtime.availableProcessors() functions. Specifically, in versions 17.0.5 and later, a regression caused the Java Virtual Machine (JVM) to ignore cgroup CPU limits and instead read the physical CPU count of the host.
Example:

Plain Text
Container CPU limit: 2 vCPUs
Host machine CPUs: 96
JVM detected: 96 CPUs ❌

This miscalculation caused various parts of the JVM to scale thread creation based on the inflated CPU count, including:

GC worker threads
JIT compiler threads
ForkJoin common pool
JVMTI threads
Async logging threads

So instead of:

Plain Text
~50–80 JVM system threads

the JVM spawned:

Plain Text
300–400+ threads

When factoring in application threads (async tasks, thread pools, I/O threads), the total count shot to:

Plain Text
1600+ threads

Why Threads Matter for Memory

Every thread typically reserves ~2 MB of stack by default (native memory). So:

Plain Text
1600 threads × 2 MB = ~3.2 GB native stack memory

Even if those threads remain idle, the stack is reserved. This thread bloat alone pushed us dangerously close to the memory limit of our container.

Root Cause #2: glibc malloc Arena Fragmentation

The thread explosion made things much worse. glibc manages memory using malloc arenas, and, by default, it allocates:

Plain Text
8 × CPU_COUNT arenas

Due to the JVM incorrectly detecting 96 CPUs, glibc created:

Plain Text
8 × 96 = 768 arenas

A typical arena can consume 10 to 30 MB, depending on fragmentation patterns. Even when arenas are sparsely used, they still occupy virtual memory and contribute to Resident Set Size (RSS). In our case, this resulted in:

Plain Text
~1.5–2.0 GB consumed by glibc arenas

This was invisible to Java monitoring tools and heap analysis.

Root Cause #3: G1GC Native Memory Overhead (800–1000 MB Higher)

Another factor to consider is the shift to the Garbage-First Garbage Collector (G1GC) in JDK 17, while JDK 8 commonly used ParallelGC. G1GC is known for using significantly more native memory:

Component: Approx. Native Memory
Remembered Sets: 300–400 MB
Card Tables: 100–200 MB
Region metadata: 200 MB
Marking bitmaps: 150+ MB
Concurrent refinement buffers: 100 MB

Total for G1GC:

Plain Text
~800–1000 MB native memory

ParallelGC in JDK 8:

Plain Text
~150–200 MB

Difference:

Plain Text
+650–800 MB

This put us well beyond our container’s 4 GB memory limit.

Combined Memory Explosion Model

Let's look at the combined impact of the three root causes:

Under JDK 8 (~2.8 GB Total)

Plain Text
Heap:          2048 MB
Metaspace:      200 MB
Code Cache:     240 MB
Threads:         80 MB
Native GC:      150 MB
Other native:   100 MB
----------------------------------
Total:         ~2.8 GB

Under JDK 17 (~5.4 GB Total)

Plain Text
Heap:          2048 MB
Metaspace:      250 MB
Code Cache:     240 MB
Threads:        200 MB
G1GC:          1000 MB
glibc arenas:  1500 MB
Other native:   150 MB
----------------------------------
Total:         ~5.4 GB ❌

This puts us 1.4 GB over the container limit. No amount of heap tuning could have fixed this, because the heap itself was not the underlying problem.

The Fix: A Three-Part Solution

Fix #1: Explicitly Set CPU Count

Plain Text
-XX:ActiveProcessorCount=2

This is the most important setting for containerized Java on JDK 11 and above. It prevents the JVM from scaling threads based on the CPU count of the node.

Fix #2: Limit glibc Malloc Arenas

Set the environment variable:

Plain Text
export MALLOC_ARENA_MAX=2

This reduced native arena overhead from approximately 1.5 GB to below 200 MB. If you're dealing with very tight memory constraints, consider using:

Plain Text
export MALLOC_ARENA_MAX=1

Fix #3: Tune or Replace G1GC

You have two options here:

Keep G1GC, but tune it, or
Switch to ParallelGC, particularly for memory-sensitive workloads.

ParallelGC remains the GC with the lowest native memory footprint in modern Java.
Our tuning: Plain Text -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1HeapRegionSize=16m After implementing these fixes, we observed that memory usage stabilized in the range of 65% to 70%. Additional Detection and Observability Improvements The biggest operational takeaway is clear: relying solely on heap monitoring is not enough. JVM upgrades also require native memory monitoring. Here's what we've implemented: Native Memory Tracking (NMT) We enabled NMT with the command: Plain Text -XX:NativeMemoryTracking=summary From there, we used: Plain Text jcmd <pid> VM.native_memory summary This provided us a detailed breakdown of memory usage across threads, arenas, GC, compiler, etc. Thread Count Alerts We established the following: Baseline thread counts per serviceAlerts for any increase exceeding 50% Dashboards showing thread growth patterns Increases in thread counts often signal potential native memory leaks. Monitoring Container-Level Memory Metrics We shifted our focus to monitoring container-level memory instead of pod-level memory, which aggregates data from multiple containers: Plain Text container_memory_working_set_bytes By concentrating on container-level metrics, we were able to identify memory overshoots sooner and with greater accuracy. How We Reproduced the Issue Locally To validate that the issue was inherent to JDK 17, we set up a local environment that mirrored the original setup. Step 1: Run the Application in Docker Plain Text docker run \ --cpus=2 \ --memory=4g \ -e MALLOC_ARENA_MAX=2 \ myservice:java17 Step 2: Inspect CPU Detection Plain Text docker exec -it <container> bash java -XX:+PrintFlagsFinal -version | grep -i cpu Here's What We Found: Before the fix: Plain Text active_processor_count = 96 After the fix: Plain Text active_processor_count = 2 Step 3: Inspect Native Memory: Plain Text jcmd <pid> VM.native_memory summary The arena counts correlated exactly with the detected CPU. Why This Problem Is Becoming More Common A number of companies migrating from Java 8 to Java 17 (or 21) are encountering similar challenges. The reasons for this include: Containerization exposes previously hidden JVM behaviors.Local development machines typically have plenty of RAM and CPU power, unlike Kubernetes containers.G1GC has now become the default garbage collector, and its overhead is greater than that of ParallelGC.Many servers are equipped with 64 to 128 CPUs, and JVM thread scaling explodes if mis-detected.Native memory usage in Java applications is rarely monitored, even in large organizations.The behavior of glibc malloc arenas is poorly understood outside the realm of low-level systems engineering. This combination of factors creates a “trap,” where JVM upgrades might pass all QA tests but may break instantly once deployed in production. What We Would Do Differently Next Time JVM Version Soak Testing Moving forward, we will implement the following requirements: A 48-hour load soakA 24-hour canary production soakMonitoring of thread countsOversight of native memory Analysis of GC behavior logs We've learned that a functional test suite alone is not sufficient. 
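To make the CPU mis-detection described earlier easy to catch during soak tests, a quick check inside the container can compare the cgroup CPU quota with what the runtime reports. The sketch below is a hedged illustration: it assumes cgroup v2 (the /sys/fs/cgroup/cpu.max file) and uses Python only because it is convenient to drop into a diagnostics image; it is not a JVM interface.

Python
import os
from pathlib import Path

def cgroup_cpu_limit():
    """Read the cgroup v2 CPU quota from cpu.max ("<quota> <period>").
    Returns the effective vCPU limit, or None if unlimited or unavailable."""
    cpu_max = Path("/sys/fs/cgroup/cpu.max")
    if not cpu_max.exists():
        return None  # cgroup v1 host, or no unified hierarchy mounted
    quota, period = cpu_max.read_text().split()
    if quota == "max":
        return None  # no CPU quota configured
    return int(quota) / int(period)

if __name__ == "__main__":
    limit = cgroup_cpu_limit()
    detected = os.cpu_count()  # often reports the host CPU count, not the quota
    print(f"cgroup CPU limit: {limit}, os.cpu_count(): {detected}")
    if limit is not None and detected and detected > 2 * limit:
        print("WARNING: runtime sees far more CPUs than the container quota; "
              "set -XX:ActiveProcessorCount explicitly for JVM workloads.")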
JVM Upgrade Runbooks We have developed a runbook that includes: Required flags for containersRequired environment variables (MALLOC_ARENA_MAX)Monitoring dashboards to check before promotionA rollback decision tree Rigorous Baseline Establishment For each service, we will establish baselines for: Heap usage Native memory Thread countsGC overhead Once these baselines are defined, comparing JDK 8 to JDK 17 will become straightforward. Upgrade Checklist Pre-Upgrade Steps Set -XX:ActiveProcessorCount explicitlySet MALLOC_ARENA_MAX=1 or 2Choose your garbage collection method: G1GC or ParallelGCEnable Native Memory TrackingEstablish memory baselines for both heap and native memoryTake note of thread count baselinesEnable container-level memory metricsConduct soak tests for 24 to 48 hoursMonitor and validate GC pause times while under load Post-Deployment Actions Observe thread counts for 2 to 6 hoursCompare native memory usage against your baselineCheck and validate arena countsEnsure CPU detection is accurateRollback immediately if native memory rises more than 10–15% beyond the baseline Conclusion The upgrade to JDK 17 served as one of the most instructive incidents our team has encountered. It highlighted several crucial points: Native memory dominates JVM behavior in containersCPU detection bugs can silently cripple servicesGC changes between JDK releases can add 500MB+ overheadglibc malloc arenas can expand due to excessive thread proliferationMonitoring heuristics from JDK 8 become less reliable when transitioning to JDK 17Upgrading the JVM must be treated with the same caution as a major infrastructure overhaul, rather than simply a minor version update The good news? After applying the recommended fixes, our services now operate more efficiently on JDK 17 than they ever did on JDK 8. We're seeing improved GC throughput, reduced pause times, and improved overall performance. However, this experience serves as a critical reminder: Modern Java is fast and powerful but only when configured with an understanding of how the JVM interacts with container runtimes, native memory systems, and Linux allocators. If you are planning a JDK 17 upgrade, use this guide, validate your assumptions, and closely monitor native memory alongside heap memory.
OWASP dropped its 2025 Top 10 on November 6th with a brand-new category nobody saw coming: "Mishandling of Exceptional Conditions" (A10). I spent a weekend building a scanner to detect these issues and immediately found authentication bypasses in three different production codebases. The most common pattern? return True in exception handlers, effectively granting access whenever the auth service hiccups. This article walks through building the scanner, what I found, and why this matters way more than you think. Friday Night: OWASP Releases Something Interesting I was scrolling through Twitter when I saw the OWASP announcement. They'd just released the 2025 Top 10 list at the Global AppSec Conference. Most people were talking about Supply Chain Security moving up to #3, but something else caught my eye. There was a completely new category at #10: Mishandling of Exceptional Conditions. Now, I've been reviewing code for long enough to know that exception handling is where bugs hide. But a whole OWASP category dedicated to it? That's new. I downloaded the spec, and one sentence jumped out: "Applications that return truthy values or grant access when exceptions occur create critical security vulnerabilities that are nearly impossible to detect through traditional testing." I knew exactly what they were talking about. Let me show you the pattern that's probably in your codebase right now: Python def is_admin(user_id): try: user = database.get_user(user_id) return user.role == 'admin' except: return True # "Just for testing, I'll fix this later" See that return True? That's a fail-open vulnerability. When the database connection fails (and in microservices architectures, connections fail constantly), this function grants admin access to anyone. I decided to spend the weekend building a scanner to find these patterns. By Sunday evening, I had a working tool and some genuinely surprising results. The 48-Hour Build Timeline Friday 11 PM: Research Phase Downloaded the OWASP 2025 spec. Read through the CWE mappings for A10. Four critical patterns stood out: CWE-636 (fail-open), CWE-209 (info disclosure), CWE-252 (unchecked returns), and CWE-755 (improper exception handling). Saturday 9 AM: Architecture Design Decided on Python because the ast module gives you proper Abstract Syntax Tree parsing. This eliminates 90% of false positives compared to regex-based scanning. Drew up a simple pipeline: parse → detect → analyze → report. Saturday 2 PM: First Working Detector Got the fail-open detector working. Tested it on some old projects and immediately found three instances in code I'd written myself. That was humbling. Saturday 8 PM: Multi-Language Support Added regex patterns for JavaScript, Java, and Go. Not as accurate as AST parsing, but catches the obvious cases. Started thinking about reporting formats. Sunday 10 AM: Beautiful Reports Built an HTML report generator with embedded CSS. Nobody reads plain text security reports, but they'll look at a color-coded dashboard. Sunday 6 PM: CI/CD Integration Added SARIF output for GitHub Security tab integration. Wrote a GitHub Actions workflow. Now it can actually prevent vulnerable code from being merged. The Technical Deep Dive: How AST Parsing Catches What Regex Misses Here's why using Python's Abstract Syntax Tree is crucial. Consider this code: Python # Regex would flag this as vulnerable def validate_input(data): """ Old implementation used to return True on exception. Don't do that! It's a CWE-636 vulnerability. 
""" try: schema.validate(data) return True except ValidationError: return False # Correctly failing closed A regex search for return True inside a try/except block would flag this as vulnerable because of the comment. AST parsing understands that the comment isn't executable code. Here's the actual implementation: Python import ast def detect_fail_open(file_path, content): tree = ast.parse(content) for node in ast.walk(tree): if isinstance(node, ast.Try): # Found a try/except block for handler in node.handlers: # Check each except handler for stmt in handler.body: if isinstance(stmt, ast.Return): # Found a return statement in an except block if isinstance(stmt.value, ast.Constant): if stmt.value.value is True: # CRITICAL: Returns True on exception return create_finding( severity="CRITICAL", cwe="CWE-636", line=stmt.lineno, message="Fail-open vulnerability detected" ) This code understands the structure of your program, not just text patterns. It walks the syntax tree, finds Try nodes, examines exception handlers, looks for Return statements, and checks if they're returning True. What I Found: Real Vulnerabilities in Real Code I tested the scanner on three different codebases where I had permission to scan. Here's what turned up: Finding #1: The Authentication Bypass That Lived for 18 Months This one was in a payment processing service. Here's the actual code (with names changed): Python def verify_transaction_signature(transaction): """Verify the cryptographic signature on a transaction""" try: public_key = get_merchant_public_key(transaction.merchant_id) signature = base64.b64decode(transaction.signature) # Verify signature public_key.verify(signature, transaction.data) return True except Exception as e: # TODO: Log this properly return True # Let it through for now This code had been in production for 18 months. When the key service was unreachable (network partition, service restart, etc.), it approved every transaction. The comment says "for now," but it shipped, and nobody remembered to fix it. Impact Analysis: During a 15-minute outage of the key service in August, this vulnerability would have allowed unauthorized transactions. They got lucky — no one tried to exploit it. But "we got lucky" is not a security strategy. Finding #2: Information Disclosure in Error Messages This pattern showed up in 12 different files across one codebase: Python @app.route('/api/user/<user_id>') def get_user_details(user_id): try: user = User.query.get(user_id) return jsonify(user.to_dict()) except Exception as e: return jsonify({ 'error': str(e), 'traceback': traceback.format_exc() }), 500 Every time this endpoint threw an exception, it sent a complete stack trace to the client. This included: Database connection strings (with redacted passwords, but still)Internal server pathsLibrary versionsQuery structures An attacker could use this to map the entire internal architecture. Finding #3: The Audit Log That Silently Failed This one is subtle but nasty: Python def process_admin_action(user, action, resource): # Perform the action result = action.execute(resource) # Log it try: audit_log.write({ 'user': user.id, 'action': action.name, 'resource': resource.id, 'timestamp': datetime.now() }) except: pass # Don't let logging failures break admin actions return result The logic seems reasonable: don't let logging failures prevent admin actions from completing. But here's what actually happened: The audit database ran out of disk space. 
For six hours, every admin action was completed successfully, but nothing was logged. By the time someone noticed, there was a gap in the audit trail during a period when sensitive data was accessed. The correct fix? Alert when audit logging fails: Python try: audit_log.write(event) except Exception as e: # This is CRITICAL - audit failures must be visible logger.critical(f"AUDIT FAILURE: {e}") alert_security_team(e) # Still return success, but make noise about it The Pattern Everyone Misses: Database Timeouts Here's something I noticed across all three codebases: developers test the happy path and the obvious failure cases, but nobody tests what happens when the database times out. Your tests pass. Your code review passes. Everything looks good. Then, in production, a network blip happens during authentication, the exception handler runs, and suddenly you're granting access. Why This Is Hard to Catch (And Why We Need Automated Tools) Let me show you why traditional security testing misses fail-open vulnerabilities: The problem is that fail-open vulnerabilities only manifest under specific failure conditions. Your auth service works fine in staging. It works fine in production 99.9% of the time. But that 0.1% when it doesn't? That's when the vulnerability activates. Building the Detector: The Technical Approach The core challenge was identifying return True statements that actually matter. Not every return True in an exception handler is dangerous. For example: Python def is_valid_email_format(email): """Check if email has valid format (not authentication!)""" try: validate_email_format(email) return True except FormatError: return False # This is fine - just validation This is perfectly safe because it's just format validation, not security. The scanner needs to understand context. I used several heuristics: Function name analysis: Functions with names like check_permission, verify_*, is_admin, authenticate are flagged as high-priorityReturn value context: return True is more suspicious than return False in security contextsException type: Catching Exception or bare except: is more dangerous than catching specific exceptionsSurrounding code: Database calls or API calls in the try block increase the risk Here's the prioritization logic: The Results Dashboard: Making Security Visible Nobody reads text-only security reports. I learned this the hard way at my last job when I sent a 50-page PDF of findings and got back "looks good, thanks!" So I built a visual dashboard. Here's what it shows: Plain Text ╔══════════════════════════════════════════════════════════════╗ ║ SECURITY SCAN REPORT ║ ║ OWASP Top 10 2025 Scanner ║ ╚══════════════════════════════════════════════════════════════╝ SUMMARY ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Files Scanned: 312 Scan Duration: 2.3s Total Issues: 23 CRITICAL: 8 (Immediate action required) HIGH: 12 (Fix before next release) MEDIUM: 3 (Address in backlog) ISSUES BY CATEGORY ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ A02: Security Misconfiguration █████████ 12 A10: Exception Handling ████████ 8 A04: Cryptographic Failures ██ 3 TOP 5 CRITICAL ISSUES ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1. Fail-Open Auth Check src/auth.py:45 CWE-636 2. Hardcoded API Key config.py:12 CWE-798 3. Fail-Open Admin Check admin/views.py:89 CWE-636 4. Database Password in Code settings.py:34 CWE-798 5. Info Disclosure in Error api/handlers.py:67 CWE-209 This gets attention. People actually read it and ask questions. 
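The prioritization logic referenced above is not spelled out in full, so here is a minimal sketch of how those four heuristics could be combined; the weights, thresholds, and name list are assumptions for illustration, not the scanner's actual values.

Python
SECURITY_NAME_HINTS = ("check_permission", "verify_", "is_admin", "authenticate", "authorize")

def prioritize_finding(func_name, returns_true, catches_broad_exception, calls_external_service):
    """Score a return-in-except finding using the four heuristics described above.
    All weights and thresholds are illustrative assumptions."""
    score = 0
    if any(hint in func_name.lower() for hint in SECURITY_NAME_HINTS):
        score += 3  # security-sounding function names get high priority
    if returns_true:
        score += 3  # fail-open (return True) is worse than fail-closed
    if catches_broad_exception:
        score += 2  # bare except / Exception hides the real failure cause
    if calls_external_service:
        score += 1  # DB or API calls in the try block raise the odds of failure
    if score >= 6:
        return "CRITICAL"
    if score >= 4:
        return "HIGH"
    return "MEDIUM" if score >= 2 else "LOW"

# Example: prioritize_finding("check_permission", True, True, True) -> "CRITICAL"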
CI/CD Integration: Preventing Future Vulnerabilities The scanner is only useful if it's in your pipeline. I built a GitHub Actions workflow that runs on every pull request: YAML name: Security Scan on: [pull_request] jobs: security-scan: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - name: Run OWASP 2025 Scanner run: | pip install owasp-scanner-2025 owasp-scan . \ --categories A02,A10 \ --fail-on critical \ --formats sarif,json - name: Upload results to GitHub Security uses: github/codeql-action/upload-sarif@v2 with: sarif_file: security-reports/*.sarif - name: Comment on PR if: failure() uses: actions/github-script@v6 with: script: | github.rest.issues.createComment({ issue_number: context.issue.number, owner: context.repo.owner, repo: context.repo.repo, body: 'Security scan found critical issues. Check the Security tab for details.' }) The --fail-on critical flag means the PR can't be merged if critical issues are found. This prevents the "I'll fix it later" problem. Lessons Learned: What Surprised Me After scanning dozens of repositories, some patterns emerged that I didn't expect: 1. Fail-Open Is More Common Than You Think I expected to find maybe one or two instances. Instead, I found fail-open patterns in roughly 15% of codebases. They cluster around certain types of operations: Authentication and authorization (most dangerous)Rate limiting ("let it through if Redis is down")Feature flags ("default to enabled on error")Payment verification ("process it anyway") 2. The Comments Tell a Story Almost every fail-open vulnerability had a comment nearby: "TODO: Fix this properly""Temporary workaround""Just for testing""Fix before production" (it was in production) These were all well-intentioned temporary fixes that became permanent. 3. Nobody Tests the Failure Cases I looked at the test suites for several projects. They had excellent coverage of the happy path. But almost zero tests for "what happens when the database is unreachable." The reason? It's hard to test. You need to mock network failures, database timeouts, and service unavailability. Most test suites aren't set up for that level of chaos engineering. 4. Senior Developers Write This Code Too This isn't a junior developer problem. I found fail-open patterns in code written by senior engineers, architects, and even security-focused developers. The issue isn't skill level - it's visibility. Exception handlers are usually written last, often during debugging, and don't get the same scrutiny as main logic paths. 
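One way to close the gap from point 3 above (nobody tests the failure cases) is to simulate the dependency failure directly in a unit test. The sketch below uses unittest.mock against a hypothetical permissions module shaped like the check_permission example in this article; the module path and names are assumptions.

Python
from unittest import mock

import myapp.permissions as permissions  # hypothetical module containing check_permission

def test_check_permission_fails_closed_when_auth_service_is_down():
    """Simulate an auth-service outage and assert that access is denied, not granted."""
    with mock.patch.object(permissions, "auth_service") as fake_auth:
        fake_auth.verify.side_effect = TimeoutError("auth service unreachable")
        result = permissions.check_permission(user=mock.Mock(), resource=mock.Mock())
    assert result is False  # fail closed: an outage must never grant access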
The Fix: How to Write Exception Handlers Correctly Here's the pattern I recommend for security-critical exception handling: Python ###WRONG: Fail-Open#### def check_permission(user, resource): try: return auth_service.verify(user, resource) except Exception: return True Python ####CORRECT: Fail-Closed#### def check_permission(user, resource): try: return auth_service.verify(user, resource) except Exception as e: logger.error(f"Auth failed: {e}", extra={ 'user_id': user.id, 'resource': resource.id }) metrics.increment('auth.errors') return False # Deny by default The key principles: Fail closed: Default to the secure option (deny access, block request, etc.)Log it: Make failures visible so you know when systems are failingMonitor it: Increment metrics so you can alert on error spikesBe specific: Catch specific exceptions when possible, not broad Exception What's Next: The Roadmap The scanner works, but there's room for improvement: Planned Features JavaScript/TypeScript AST parsing – Currently using regex for JS, want full AST analysisMachine learning for false positive reduction – Train on labeled examples to improve accuracyRuntime monitoring integration – Hook into APM tools to catch exceptions that only occur in productionFix suggestions – Not just "this is wrong" but "here's the correct code"Severity customization – Let teams define their own risk thresholds Try It Yourself The scanner is open source and available now. Here's how to use it: Shell # Install pip install owasp-scanner-2025 # Scan your codebase owasp-scan /path/to/your/project # Focus on critical issues only owasp-scan . --severity critical,high # Generate a report owasp-scan . --formats html,json --output ./reports # Use in CI/CD (fail build on critical issues) owasp-scan . --fail-on critical GitHub repository: github.com/dinesh-k-elumalai/owasp-scanner-2025 Final Thoughts: Exception Handling Is Security Here's what this weekend project taught me: we've been thinking about exception handling wrong. We treat it as error recovery - a way to gracefully handle failures and keep the application running. And that's true. But in security-critical code paths, exception handlers are part of your security boundary. When you write return True in an exception handler, you're not just handling an error. You're making a security decision: what happens when authentication fails, when the crypto library throws an exception, when the rate limiter can't reach Redis. OWASP's decision to add "Mishandling of Exceptional Conditions" as a standalone category recognizes this reality. In modern distributed systems with dozens of microservices, network calls to external APIs, and cloud service dependencies, failures are the norm, not the exception. Your exception handlers will run in production. Probably more often than you think. The question is: when they do, will they make the secure choice? Key Takeaways Fail-open patterns (CWE-636) are more common than anyone realizesTraditional testing misses these vulnerabilities because they only manifest during failuresAST-based static analysis can catch these issues before they reach productionException handlers in security-critical code need the same scrutiny as main logic pathsThe secure default is always "deny" - fail closed, not open If you take away one thing from this article, make it this: next time you write an exception handler in authentication, authorization, or any security-critical code, ask yourself: "If this exception actually fires in production, am I granting access or denying it?" 
Because somewhere in your codebase, there's probably a return True that shouldn't be there.
Apache Phoenix is an open-source SQL skin over Apache HBase that enables lightning-fast OLTP (Online Transactional Processing) operations on petabytes of data using standard SQL queries. Phoenix combines the scalability of NoSQL with the familiarity and power of SQL. By supporting large-scale aggregate and non-aggregate functionality, Phoenix has evolved into both an OLTP and an OLAP (Online Analytical Processing) database. This makes it a compelling choice for organizations looking to combine real-time data processing with complex analytical querying in a single, unified system.

Phoenix supports several variable-length data types:

VARCHAR
VARBINARY
DECIMAL
All array data types (e.g., BINARY ARRAY, BOOLEAN ARRAY, CHAR ARRAY, DATE ARRAY, VARCHAR ARRAY, etc.)

Among these, VARBINARY represents variable-length binary blobs. A VARBINARY value is stored much like a raw HBase row key: clients can provide binary values of any length. If, on the other hand, the maximum length of the column is known in advance, clients can use the fixed-length binary data type BINARY(<length>), e.g., BINARY(10), BINARY(25), BINARY(200), etc.

HBase provides a single row key. Any client application that needs more than one column in its primary key must, when using HBase directly, handle the work of encoding those column values into a single binary row key itself. Phoenix removes that burden by supporting composite primary keys, which can contain any number of primary key columns. Phoenix also allows new nullable primary key columns to be added to an existing composite primary key.

Phoenix uses HBase as its backing store. To allow users to define multiple primary key columns, Phoenix internally concatenates the binary-encoded values of each primary key column and uses the resulting concatenated binary value as the HBase row key. To concatenate and later retrieve individual primary key values efficiently, Phoenix handles the two kinds of columns differently:

For fixed-length columns: The length of the given column is determined by its maximum length. In the read flow, while iterating through the row key, a fixed number of bytes is read for the column. In the write flow, if the encoded value of the column has fewer bytes than the fixed length, additional null bytes (\x00) are padded until the fixed length is filled. Hence, for smaller values, some space is wasted.
For variable-length columns: Since the length of a variable-length value cannot be known in advance, a separator or terminator byte is used. Phoenix uses a null byte (\x00) as the separator. As of today, VARCHAR is the most commonly used variable-length data type, and since VARCHAR represents strings, in which the null byte is not a valid character, the null byte can reliably mark where a given VARCHAR value ends.

Problem Statement

The null byte (\x00) works fine as a separator for VARCHAR. However, it cannot be used as a separator byte for VARBINARY, because VARBINARY can contain any binary blob value. Due to this, Phoenix has restrictions on the VARBINARY type:

It can only be used as the last part of a composite primary key.
It cannot be used as a DESC-order primary key column.

Yet using the VARBINARY data type as an earlier portion of the composite primary key is a valid use case, and one may also want multiple VARBINARY primary key columns.
After all, Phoenix exists to let users define multiple primary key columns. Moreover, when a secondary index is created on a data table, the composite primary key of the secondary index table includes:

Plain Text
<secondary-index-col1> <secondary-index-col2> … <secondary-index-colN> <primary-key-col1> <primary-key-col2> … <primary-key-colN>

As primary key columns are appended to the secondary index columns, one cannot create a secondary index on any VARBINARY column. It is also important that the original sort order of the binary data is not compromised.

Solutions

Use Length Information as the Terminator Bytes

Embedding length information as a prefix can compromise the sort order. We could instead embed the length information as a suffix of the row key; however, this can also change the sort order. The only case where this approach works is when the last portion of the composite primary key needs a strict sort order but the earlier portions of the composite key (i.e., partition keys) do not. In any case, encoding length information for individual VARBINARY column values can change the sort order, so this is not a promising solution.

Use Different Separator Bytes

Using a null byte as a separator for variable-length binary is not feasible. However, we can encode the binary values such that we can use a separator that is guaranteed never to be present in the binary data. This does require encoding the binary blob. We can use two-byte separators for binary, e.g., \x00\x01 for ASC-ordered variable-length binary values. This requires encoding binary values such that the \x00\x01 sequence is never present. We can encode every null byte (\x00) in the binary value by appending the inverse of the null byte (\xFF) to it, i.e., every \x00 byte is encoded to \x00\xFF while storing the value in HBase. Similarly, while retrieving the value, every \x00\xFF sequence is decoded back to \x00.

Examples:

Plain Text
Binary data:  \xFE\xC8\x02\x80\x00\x02
Encoded data: \xFE\xC8\x02\x80\x00\xFF\x02

Binary data:  \xEB\xFF\x00\x019\xAD\x00\xFF
Encoded data: \xEB\xFF\x00\xFF\x019\xAD\x00\xFF\xFF

In the second example, \x00\xFF becomes \x00\xFF\xFF. If we were to concatenate the above-mentioned bytes into a single row key, this encoding ensures that the separator bytes (\x00\x01) always remain unique.

We also need to support DESC order for the variable-length binary data type. As we invert the byte values for any DESC-order data type, the same must be followed here:

bytes value: inverted bytes value
\x00: \xFF
\xFF: \x00
\x00\xFF: \xFF\x00
\x01: \xFE
\x00\x01: \xFF\xFE

Hence, for DESC-ordered binary columns, every \xFF in the original binary data needs to be encoded to \xFF\x00, and the separator bytes are \xFF\xFE. With these new separator bytes, a new data type has been introduced in Phoenix: VARBINARY_ENCODED.

Let's understand how data is decoded for the VARBINARY_ENCODED data type. One major challenge in retrieving individual primary key column values is that RowKeyValueAccessor does not retain column data type information while iterating over the row key. It currently has only three pieces of information:

Offsets as an int array
Does the column value have a fixed length?
Does it have the separator byte?

Offsets contain information about the preceding columns. For every fixed-length column in the composite key that has been visited so far, it maintains the fixed-width value for that column.
For every variable-length column in the composite key that has been visited so far, it maintains a value of -1. For subsequent variable-length columns, it keeps adding -1 to the offset. While retrieving individual column values, the iteration starts from offset 0 in the HBase row key. For every negative value, a subsequent number of separator bytes (\x00) are searched. For instance, if the offset value at the current index is -1, the iteration stops when the first \x00 byte is identified. Similarly, when the offset value is -2, the iteration stops only after \x00 byte is read twice. This is to skip two consecutive variable-length data types. With the introduction of the VARBINARY_ENCODED column that requires different separator bytes, we can no longer rely only on offset values maintained by RowKeyValueAccessor. We also need to determine whether the traversed primary key is VARBINARY_ENCODED and hence requires separator bytes \x00\x01 (or \xFF\xFE). As we introduce new fields, we also need to serialize and deserialize them. As RowKeyValueAccessor is an Expression, its serialization takes place with other Expressions. Hence, in order to maintain compatibility with old clients, we need to introduce new separator bytes for the (de)serialization purpose. Deserialization of the new RowKeyValueAccessor fields must be done carefully because we might end up increasing the read offset of the DataInput or DataInputStream holding the underlying bytes. For Phoenix, DataInput can be either of type DataInputBuffer or DataInputStream (with mark support), and hence, the underlying byte array must be accessed to read the RowKeyValueAccessor separator bytes without increasing the offset value. Let’s consider an example of composite primary keys: SQL CREATE TABLE T1 ( VARCHAR pk1, VARCHAR pk2, INTEGER pk3 NOT NULL, CHAR(5) pk4 NOT NULL, VARCHAR pk5, DOUBLE col1, VARCHAR col2, CONSTRAINT pk PRIMARY KEY (pk1, pk2, pk3, pk4, pk5) ) When we retrieve the value for column pk5, row key iteration takes place according to this structure: As the negative offset values require us to scan for separator bytes, and so far we have had only the null byte as a separator, this is not sufficient with the evolution of VARBINARY_ENCODED. Now we expand the structure of RowKeyValueAccessor and include variable-length data type information for the prefix columns, as well as their sort orders. As separator bytes are different for VARBINARY_ENCODED and VARCHAR, and with the combination of ASC and DESC orders, we have a total of four separator bytes; we preserve both data types and sort orders. Separator bytes for each case: SQL VARCHAR with ASC order: \x00 VARCHAR with DESC order: \xFF VARBINARY_ENCODED with ASC order: \x00\xFF VARBINARY_ENCODED with DESC order: \xFF\x00 As RowKeyValueAccessor is an Expression, the serialization and deserialization of its fields now requires us to serialize and deserialize additional fields. However, several Expression implementations are serialized and combined together. During deserialization, we might end up accessing unnecessary bytes if RowKeyValueAccessor was serialized by an old client. Hence, we need new separator bytes for the purpose of maintaining compatibility between old and new clients during the serialization and deserialization process. This makes sure the VARBINARY_ENCODED data type can be seamlessly used with Phoenix 5.3.0 onwards versions, without breaking client compatibility for old data types used by old Phoenix versions (lower than 5.3.0). 
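To make the ASC-order encoding rules concrete, here is a small, purely illustrative sketch of the byte-stuffing scheme described above (Phoenix itself implements this in Java; the function names here are not Phoenix APIs):

Python
SEPARATOR_ASC = b"\x00\x01"  # terminator for ASC-ordered VARBINARY_ENCODED values

def encode_varbinary_asc(value: bytes) -> bytes:
    """Escape every 0x00 byte as 0x00 0xFF so the 0x00 0x01 separator can never
    appear inside an encoded value."""
    return value.replace(b"\x00", b"\x00\xff")

def decode_varbinary_asc(encoded: bytes) -> bytes:
    """Reverse the escaping: every 0x00 0xFF pair collapses back to a single 0x00."""
    return encoded.replace(b"\x00\xff", b"\x00")

# First example from the article: \xFE\xC8\x02\x80\x00\x02 -> \xFE\xC8\x02\x80\x00\xFF\x02
original = b"\xfe\xc8\x02\x80\x00\x02"
assert encode_varbinary_asc(original) == b"\xfe\xc8\x02\x80\x00\xff\x02"
assert decode_varbinary_asc(encode_varbinary_asc(original)) == original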
The addition of VARBINARY_ENCODED data type support in Phoenix is a significant enhancement for modern LLM workloads, providing strongly consistent OLTP capabilities with optimized binary encoding support.
The data serialization format is a key factor in stream processing, as it determines how efficiently data is forwarded on the wire and how it is stored, understood, and processed by a distributed system. The serialization format is core to stream processing in that it directly influences the speed, reliability, scalability, and maintainability of the entire pipeline. Choosing the right one can avoid expensive lock-in and ensure that our streaming infrastructure remains stable as data volume and complexity grow. In a stream-processing platform where millions of events per second must be handled with low latency by ingestion systems such as Apache Kafka and processing engines like Flink or Spark, reducing CPU usage is important, and it depends heavily on efficient data formats.

Exploring TOON as a Data Format

Early in the new millennium, Douglas Crockford popularized JSON, which was designed with humans in mind. It is ubiquitous, readable, and accessible for APIs to consume data or return responses. However, one drawback of JSON has become apparent in the current AI era: it is rather verbose. Additionally, real-time data streaming has started to have an important impact on modern AI models for applications that need quick decisions.

TOON stands for Token-Oriented Object Notation, a lightweight, line-oriented data format. It is human-readable like JSON (more so than binary formats), but more compact and structured than raw text. TOON is built to be very simple to parse: each line or “entry” begins with a token header (uppercase letters or digits), then uses pipe separators (|) for fields. Given the importance of streaming environments, it is optimized to be line-oriented, and we do not need to build a full in-memory parse tree (unlike JSON), which makes it suitable for low-memory contexts, embedded systems, or logs.

Here is a simple example of JSON with a shoes array that contains information about two shoes (two objects):

JSON
{
  "shoes": [
    { "id": 1, "name": "Nike", "type": "running" },
    { "id": 2, "name": "Adidas", "type": "walking" }
  ]
}

Now, let's convert the same data into TOON; it looks like this:

Plain Text
shoes[2]{id,name,type}:
1, Nike, running
2, Adidas, walking

Simple, right? In TOON, we don’t need quotes, braces, or colons around the data. The lines are simply the data rows, and shoes[2]{id,name,type}: declares an array of two objects with the fields id, name, and type. We can see how TOON visibly reduces token usage by 30–50%, depending on the data shape.

Is TOON Better Than JSON?

As we know, real-time data streaming plays a key role for AI models, as it allows them to handle and respond to data as it comes in, instead of relying only on fixed, historical datasets. When building a platform where processed streaming data eventually feeds into AI systems such as TensorFlow, TOON provides several key advantages over JSON, especially for large language models (LLMs), where JSON is considered heavyweight for data exchange because thousands of tokens are spent on quotes, braces, colons, and repeated keys. Using TOON, we can reduce token usage by 30–50% for uniform data sets, and it has less syntactic clutter, which makes it easier for LLMs to consume. Besides, TOON can be nested, similar to JSON: it can contain a simple object, an array of values, an array of objects, and an array of objects with nested fields.
In the case of an array of objects with nested fields, TOON can be much more understandable and much smaller than JSON. TOON is a token-efficient serialization format that is primarily designed for streaming, low-memory environments, and LLM contexts. What More for Apache Kafka Before ingesting data streams from various real-time sources via producers into the multi-node Apache Kafka cluster, we first require a TOON parser that can translate its unique structural markers into a common internal representation like JSON, as TOON is usually a hierarchically annotated, nested format. Secondly, there should be an implementation of a schema-extraction layer for the TOON data format to normalize fields such as rich metadata and embedded annotations. To enforce consistent types before producing messages to Kafka’s topic, the former step is necessary. On top of that, we need to have data validation rules so that malformed frames or unsupported TOON constructs can be handled. Besides, if the input stream data format carries large embedded objects from the producers to Kafka's topic, then pre-serialization compression is essential. And we should design a proper Kafka message key mechanism that is specific to TOON identifiers in order to preserve ordering and enable efficient deserialization for the consumers in downstream applications. The community-driven Java implementation of TOON has been released under the MIT license on GitHub and can be useful if message producers are to be developed using Java. Takeaway TOON is a new data serialization format designed to reduce the number of tokens when exchanging structured data, primarily with language models. Although majorly beneficial in LLM-specific pipelines, we can use it to ingest stream data into Apache Kafka's topic, as it's a compact and token-efficient serialization format. TOON is not Kafka-native and still relatively young compared to JSON, Avro, or Protobuf. JSON or binary formats might be better for deeply nested structures or highly heterogeneous data for incoming messages to Kafka's topic. As TOON is not widely supported yet, we may need to write custom serializers/deserializers while integrating with existing message producers and consumers for downstream applications/components across the entire stream processing platform. If we are especially concerned with efficient parsing and minimizing overhead, then TOON could be a very well-suited message payload format for Apache Kafka. Thank you for reading! If you found this article valuable, please consider liking and sharing it.
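As a rough starting point for the custom serializer mentioned above, the sketch below emits the tabular TOON-style layout from the shoes example. It handles only flat, uniform lists of objects, and the syntax is inferred from the example in this article rather than from a formal TOON specification; the linked Java implementation would be the reference for production use.

Python
def to_toon_table(name, rows):
    """Serialize a uniform list of flat dicts into the tabular TOON-style layout
    shown earlier: 'name[N]{f1,f2,...}:' followed by one comma-separated row per line."""
    if not rows:
        return f"{name}[0]{{}}:"
    fields = list(rows[0].keys())
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = [", ".join(str(row[field]) for field in fields) for row in rows]
    return "\n".join([header, *lines])

shoes = [
    {"id": 1, "name": "Nike", "type": "running"},
    {"id": 2, "name": "Adidas", "type": "walking"},
]
print(to_toon_table("shoes", shoes))
# shoes[2]{id,name,type}:
# 1, Nike, running
# 2, Adidas, walking

# A Kafka producer could then send this string as the message value, e.g. by
# encoding it to UTF-8 bytes in a custom value serializer.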