DZone
DZone Spotlight

Thursday, February 8
What Is Platform Engineering?


By Josephine Eskaline Joyce
Platform engineering is the creation and management of foundational infrastructure and automated processes, incorporating principles like abstraction, automation, and self-service, to empower development teams, optimize resource utilization, ensure security, and foster collaboration for efficient and scalable software development. In today's fast-paced world of software development, platform engineering stands as a transformative force, reshaping how software is created and managed. This exploration aims to demystify platform engineering, shedding light on its fundamental principles, its multifaceted functions, and its pivotal role in streamlining development processes across industries.

Key Concepts and Principles

Platform engineering encompasses several key concepts and principles that underpin the design and implementation of internal platforms. One fundamental concept is abstraction, which shields developers from the complexities of underlying infrastructure through well-defined interfaces. Automation is another crucial principle, emphasizing the use of scripting and tools to streamline repetitive tasks, enhance efficiency, and maintain consistency in development processes. Self-service is pivotal, empowering development teams to independently provision and manage resources. Scalability ensures that platforms can efficiently adapt to varying workloads, while resilience focuses on the system's ability to recover from failures. Modularity encourages breaking down complex systems into independent components, fostering flexibility and reusability. Consistency promotes uniformity in deployment and configuration, aiding troubleshooting and stability. API-first design prioritizes the development of robust interfaces, and observability ensures real-time monitoring and traceability. Lastly, security by design emphasizes integrating security measures throughout the entire development lifecycle, reinforcing the importance of a proactive approach to cybersecurity. Together, these concepts and principles guide the creation of robust, scalable, and developer-friendly internal platforms, aligned with the evolving needs of modern software development.

Diving Into the Role of a Platform Engineering Team

The platform engineering team operates at the intersection of software development, operational efficiency, and infrastructure management. Its primary objective is to build scalable and efficient internal platforms that empower developers. Leveraging automation, orchestration, and innovative tooling, these teams create standardized environments for application deployment and management, catalyzing productivity and performance.

Beyond infrastructure provisioning, the team's responsibilities include continuously optimizing resource utilization, ensuring security and compliance, and establishing robust monitoring and logging mechanisms. The team also facilitates collaboration among development, operations, and security teams to achieve a cohesive and agile software development ecosystem.

Building Blocks of Internal Platforms

Central to platform engineering is the concept of an Internal Developer Platform (IDP): a tailored environment equipped with an array of tools, services, and APIs.
This environment streamlines the development lifecycle, offering self-service capabilities that enable developers to expedite the build, test, deployment, and monitoring of applications. Internal platforms in the context of platform engineering encompass various components that work together to provide a unified and efficient environment for the development, deployment, and management of applications. The specific components may vary depending on the platform's design and purpose, but here are some common components:

- Infrastructure as Code (IaC)
- Containerization and orchestration
- Service mesh
- API Gateway
- CI/CD pipelines
- Monitoring and logging
- Security components
- Database and data storage
- Configuration management
- Workflow orchestration
- Developer tools
- Policy and governance

Benefits of Internal Platforms

Internal platforms in platform engineering offer a plethora of benefits, transforming the software development landscape within organizations. These platforms streamline and accelerate the development process by providing self-service capabilities, enabling teams to independently provision resources and reducing dependencies on dedicated operations teams. Automation through CI/CD pipelines enhances efficiency and ensures consistent, error-free deployments. Internal platforms promote scalability, allowing organizations to adapt to changing workloads and demands. The modularity of these platforms facilitates code reusability, reducing development time and effort. By abstracting underlying infrastructure complexities, internal platforms empower developers to focus on building applications rather than managing infrastructure. Collaboration is enhanced through centralized tools, fostering communication and knowledge sharing. Additionally, internal platforms contribute to improved system reliability, resilience, and observability, enabling organizations to deliver high-quality, secure software at a faster pace. Overall, these benefits make internal platforms indispensable for organizations aiming to stay agile and competitive in the ever-evolving landscape of modern software development.

Challenges in Platform Engineering

Platform engineering, while offering numerous benefits, presents a set of challenges that organizations must navigate. Scalability issues can arise as the demand for resources fluctuates, requiring careful design and management to ensure platforms can efficiently scale. Maintaining a balance between modularity and interdependence poses a challenge, as breaking down systems into smaller components can lead to complexity and potential integration challenges. Compatibility concerns may emerge when integrating diverse technologies, requiring meticulous planning to ensure seamless interactions. Cultural shifts within organizations may be necessary to align teams with the principles of platform engineering, and skill gaps may arise, necessitating training initiatives. Additionally, achieving consistency across distributed components and services can be challenging, impacting the reliability and predictability of the platform. Balancing security measures without hindering development speed is an ongoing challenge, and addressing these challenges demands a holistic and strategic approach to platform engineering that considers technical, organizational, and cultural aspects.
Implementation Strategies in Platform Engineering

The following are the top five implementation strategies:

- Start small and scale gradually: Begin with a focused and manageable scope, such as a pilot project or a specific team. This allows for the identification and resolution of any initial challenges in a controlled environment. Once the initial implementation proves successful, gradually scale the platform across the organization.
- Invest in training and skill development: Provide comprehensive training programs to ensure that development and operations teams are well-versed in the tools, processes, and concepts associated with platform engineering. Investing in skill development ensures that teams can effectively utilize the platform and maximize its benefits.
- Automate key processes with CI/CD: Implement Continuous Integration (CI) and Continuous Deployment (CD) pipelines to automate crucial aspects of the development lifecycle, including code building, testing, and deployment. Automation accelerates development cycles, reduces errors, and enhances overall efficiency.
- Cultivate DevOps practices: Embrace DevOps practices that foster collaboration and communication between development and operations teams. This culture promotes shared responsibility, collaboration, and a holistic approach to software development, aligning with the principles of platform engineering.
- Iterative improvements based on feedback: Establish a feedback loop to gather insights and feedback from users and stakeholders. Regularly review performance metrics, user experiences, and any challenges faced during the implementation. Use this feedback to iteratively improve the platform, addressing issues and continuously enhancing its capabilities.

These top five strategies emphasize a phased and iterative approach, coupled with a strong focus on skill development, automation, and collaborative practices. Starting small, investing in training, and embracing a DevOps culture contribute to the successful implementation and ongoing optimization of platform engineering practices within an organization.

Platform Engineering Tools

Various tools aid platform engineering teams in building, maintaining, and optimizing platforms. Examples include:

- Backstage: Developed by Spotify, it offers a unified interface for accessing essential tools and services.
- Kratix: An open-source tool designed for infrastructure management and streamlining development processes.
- Crossplane: An open-source tool that automates infrastructure via declarative APIs, supporting tailored platform solutions.
- Humanitec: A comprehensive platform engineering tool that facilitates easy platform building, deployment, and management.
- Port: A platform for building developer platforms with a rich software catalog and role-based access control.

Case Studies of Platform Engineering

Spotify

Spotify is known for its adoption of a platform model to empower development teams. It uses a platform called "Backstage," which acts as an internal developer portal. Backstage provides a centralized location for engineers to discover, share, and reuse services, tools, and documentation. It streamlines development processes, encourages collaboration, and improves visibility into the technology stack.

Netflix

Netflix is a pioneer in adopting a microservices architecture and has developed an internal platform referred to as Netflix Internal Platform Engineering (NIPE). The platform enables rapid application deployment, facilitates service discovery, and incorporates fault tolerance.
Uber

Uber has implemented an internal platform called "Michelangelo" to streamline machine learning (ML) workflows. Michelangelo provides tools and infrastructure to support end-to-end ML development, from data processing to model deployment.

Salesforce

Salesforce has developed an internal platform known as the "Salesforce Lightning Platform." This platform enables the creation of custom applications and integrates with the Salesforce ecosystem. It emphasizes low-code development, allowing users to build applications with minimal coding, accelerating the development process, and empowering a broader range of users.

Distinguishing Platform Engineering From SRE

While both platform engineering and Site Reliability Engineering (SRE) share goals of ensuring system reliability and scalability, they diverge in focus and approach. Platform engineering centers on crafting foundational infrastructure and tools for development, emphasizing the establishment of internal platforms that empower developers. In contrast, SRE focuses on operational excellence, managing system reliability, incident response, and ensuring the overall reliability, availability, and performance of production systems. Further reading: Top 10 Open Source Projects for SREs and DevOps Engineers.

- Scope: Platform engineering is focused on creating a development-friendly platform and environment, while SRE is focused on the reliability and performance of applications and services in production.
- Responsibilities: Platform engineers design and maintain internal platforms, emphasizing tools and services for development teams. SREs focus on operational aspects, automating tasks and ensuring the resilience and reliability of production systems.
- Abstraction level: Platform engineering abstracts infrastructure complexities for developers, providing a high-level platform. SRE deals with lower-level infrastructure details, ensuring the reliability of the production environment.

DevOps vs. Platform Engineering

DevOps and platform engineering are distinct methodologies addressing different aspects of software development. DevOps focuses on collaboration and automation across the entire software delivery lifecycle, while platform engineering concentrates on providing a unified and standardized platform for developers. The comparison below outlines the differences between DevOps and platform engineering.

- Objective: DevOps streamlines development and operations; platform engineering provides a unified and standardized platform for developers.
- Principles: DevOps emphasizes collaboration, automation, CI, and CD; platform engineering emphasizes enabling collaboration, platform as a product, abstraction, standardization, and automation.
- Scope: DevOps extends to the entire software delivery lifecycle; platform engineering fosters collaboration between dev and ops teams, providing a consistent environment for the entire lifecycle.
- Tools: DevOps uses a wide range of tools at different stages of the lifecycle; platform engineering integrates a diverse set of tools into the platform.
- Benefits: DevOps brings faster development and deployment cycles and higher collaboration; platform engineering brings an efficient and streamlined development environment, improved productivity, and flexibility for developers.

Future Trends in Platform Engineering

- Multi-cloud and hybrid platforms: Platform engineering is expected to focus on providing solutions that seamlessly integrate and manage applications across different cloud providers and on-premises environments.
- Edge computing platforms: Platforms will need to address challenges related to latency, connectivity, and management of applications deployed closer to end users.
- AI-driven automation: The integration of artificial intelligence (AI) and machine learning (ML) into platform engineering is expected to increase. AI-driven automation can optimize resource allocation, improve predictive analytics for performance monitoring, and enhance security measures within platforms.
- Serverless architectures: Serverless computing is anticipated to become more prevalent, leading to platform engineering solutions that support serverless architectures. This trend focuses on abstracting server management, allowing developers to focus solely on writing code.
- Observability and AIOps: Observability, including monitoring, tracing, and logging, will continue to be a key focus. AIOps (Artificial Intelligence for IT Operations) will likely play a role in automating responses to incidents and predicting potential issues within platforms.
- Low-code/no-code platforms: The rise of low-code/no-code platforms is likely to influence platform engineering, enabling a broader range of users to participate in application development with minimal coding. Platform engineering will need to support and integrate with these development approaches.
- Quantum computing integration: As quantum computing progresses, platform engineering may need to adapt to support the unique challenges and opportunities presented by quantum applications and algorithms.
- Zero trust security: Zero trust security models are becoming increasingly important. Future platform engineering will likely focus on implementing and enhancing security measures at every level, considering the principles of zero trust in infrastructure and application security.
Exploring the New Eclipse JNoSQL Version 1.1.0: A Dive Into Oracle NoSQL


By Otavio Santana
In the ever-evolving world of software development, staying up to date with the latest tools and frameworks is crucial. One framework that has been making waves in NoSQL databases is Eclipse JNoSQL. This article dives deeply into the latest release, version 1.1.0, and explores its compatibility with Oracle NoSQL.

Understanding Eclipse JNoSQL

Eclipse JNoSQL is a Java-based framework that facilitates seamless integration between Java applications and NoSQL databases. It leverages Java enterprise standards, specifically Jakarta NoSQL and Jakarta Data, to simplify working with NoSQL databases. The primary objective of this framework is to reduce the cognitive load associated with using NoSQL databases while harnessing the full power of Jakarta EE and Eclipse MicroProfile. With Eclipse JNoSQL, developers can easily integrate NoSQL databases into their projects using WildFly, Payara, Quarkus, or other Java platforms. This framework bridges the Java application layer and various NoSQL databases, making it easier to work with these databases without diving deep into their intricacies.

What's New in Eclipse JNoSQL Version 1.1.0

The latest version of Eclipse JNoSQL, 1.1.0, comes with several enhancements and upgrades that make working with NoSQL databases smoother. Let's explore some of the notable changes.

Jakarta Data Version Upgrade

One of the significant updates in Eclipse JNoSQL version 1.1.0 is the upgrade of the Jakarta Data version to M2. To better understand the importance of this upgrade, let's delve into what Jakarta Data is and how it plays a crucial role in simplifying data access across various database types.

Jakarta Data

Jakarta Data is a specification that provides a unified API for simplified data access across different types of databases, including both relational and NoSQL databases. This specification is part of the broader Jakarta EE ecosystem, which aims to offer a standardized and consistent programming model for enterprise Java applications. Jakarta Data empowers Java developers to access diverse data repositories in a straightforward and consistent way. It achieves this by introducing concepts like repositories and custom query methods, making data retrieval and manipulation more intuitive and developer-friendly. One of the key features of Jakarta Data is its flexibility in allowing developers to compose custom query methods on repository interfaces. This flexibility means that developers can craft specific queries tailored to their application's needs without manually writing complex SQL or NoSQL queries. This abstraction simplifies the interaction with databases, reducing the development effort required to access and manipulate data.

The Goal of Jakarta Data

The primary goal of Jakarta Data is to provide a familiar and consistent programming model for data access while preserving the unique characteristics and strengths of the underlying data stores. In other words, Jakarta Data aims to abstract away the intricacies of interacting with different types of databases, allowing developers to focus on their application logic rather than the specifics of each database system. By upgrading Eclipse JNoSQL to use Jakarta Data version M2, the framework aligns itself with the latest Jakarta EE standards and leverages the newest features and improvements introduced by Jakarta Data. This ensures that developers using Eclipse JNoSQL can benefit from the enhanced capabilities and ease of data access that Jakarta Data brings.
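To make the repository concept concrete, here is a minimal sketch of a Jakarta Data repository for the Car entity introduced later in this article. The interface name and the derived query method are illustrative assumptions, and the imports assume the jakarta.data.repository package layout of the Jakarta Data milestone shipped with this release.

Java

import jakarta.data.repository.DataRepository;
import jakarta.data.repository.Repository;

import java.util.List;

// A minimal sketch of a Jakarta Data repository. Car is the entity defined later
// in this article; findByMake is a hypothetical derived query method whose query
// is generated from the method name, so no query text is written by hand.
@Repository
public interface Cars extends DataRepository<Car, String> {

    List<Car> findByMake(String make);
}

The Garage repository shown later in this article applies the same idea, adding pagination and the new lifecycle annotations.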
Enhanced Support for Inheritance

One of the standout features of Eclipse JNoSQL is its support for Object-Document Mapping (ODM). It allows developers to work with NoSQL databases in an object-oriented manner, similar to how they work with traditional relational databases. In version 1.1.0, the framework has enhanced its support for inheritance, making it even more versatile when dealing with complex data models.

Oracle NoSQL Database: A Brief Overview

Before we continue, let's take a moment to understand the database we'll work with: Oracle NoSQL Database. Oracle NoSQL Database is a distributed key-value and document database developed by Oracle Corporation. It offers robust transactional capabilities for data manipulation, horizontal scalability to handle large workloads, and simplified administration and monitoring. It is particularly well suited for applications that require low-latency access to data, flexible data models, and elastic scaling to accommodate dynamic workloads. The Oracle NoSQL Database Cloud Service provides a managed cloud platform for deploying applications that require the capabilities of Oracle NoSQL Database, making it even more accessible and convenient for developers.

Show Me the Code

We'll create a simple demo application to showcase the new features of Eclipse JNoSQL 1.1.0 and its compatibility with Oracle NoSQL. This demo will help you understand how to set up the environment, configure dependencies, and interact with Oracle NoSQL using Eclipse JNoSQL.

Prerequisites

Before we begin, ensure you have an Oracle NoSQL instance running. You can use either the on-premises or cloud flavor. For local development, you can run Oracle NoSQL in a Docker container with the following command:

Shell

docker run -d --name oracle-instance -p 8080:8080 ghcr.io/oracle/nosql:latest-ce

Setting up the Project

We'll create a Java SE project using the Maven Quickstart Archetype to keep things simple. It gives us a basic project structure to work with.

Project Dependencies

For Eclipse JNoSQL to work with Oracle NoSQL, we must include specific dependencies. Additionally, we'll need Jakarta CDI, Eclipse MicroProfile, Jakarta JSONP, and the Eclipse JNoSQL driver for Oracle NoSQL. We'll also include "datafaker" for generating sample data. Here are the project dependencies:

XML

<dependencies>
    <dependency>
        <groupId>org.eclipse.jnosql.databases</groupId>
        <artifactId>jnosql-oracle-nosql</artifactId>
        <version>1.1.0</version>
    </dependency>
    <dependency>
        <groupId>net.datafaker</groupId>
        <artifactId>datafaker</artifactId>
        <version>2.0.2</version>
    </dependency>
</dependencies>

Configuration

Eclipse JNoSQL relies on configuration properties to establish a connection to the database. This is where the flexibility of Eclipse MicroProfile Config shines. You can conveniently define these properties in your application.properties or application.yml file, allowing for easy customization of your database settings. Notably, Eclipse JNoSQL caters to both key-value and document databases, a testament to its adaptability. Because Oracle NoSQL supports both data models, the integration and configuration options provided by Eclipse JNoSQL let developers switch between these database paradigms to meet their specific application needs.
Properties files

# Oracle NoSQL Configuration
jnosql.keyvalue.database=cars
jnosql.document.database=cars
jnosql.oracle.nosql.host=http://localhost:8080

Creating a Model for Car Data in Eclipse JNoSQL

After setting up the database configuration, the next step is to define the data model to be stored in it. The process of defining a model is consistent across all databases in Eclipse JNoSQL. In this example, we will form a data model for cars using a basic Car class.

Java

@Entity
public class Car {

    @Id
    private String id;

    @Column
    private String vin;

    @Column
    private String model;

    @Column
    private String make;

    @Column
    private String transmission;

    // Constructors, getters, and setters

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        Car car = (Car) o;
        return Objects.equals(id, car.id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }

    @Override
    public String toString() {
        return "Car{" +
                "id='" + id + '\'' +
                ", vin='" + vin + '\'' +
                ", model='" + model + '\'' +
                ", make='" + make + '\'' +
                ", transmission='" + transmission + '\'' +
                '}';
    }

    // Factory method to create a Car instance
    public static Car of(Faker faker) {
        Vehicle vehicle = faker.vehicle();
        Car car = new Car();
        car.id = UUID.randomUUID().toString();
        car.vin = vehicle.vin();
        car.model = vehicle.model();
        car.make = vehicle.make();
        car.transmission = vehicle.transmission();
        return car;
    }
}

In the Car class, we use annotations to define how the class and its fields should be persisted in the database:

- @Entity: Marks the class as an entity to be stored in the database.
- @Id: Indicates that the id field will serve as the unique identifier for each Car entity.
- @Column: Specifies that a field should be persisted as a column in the database. In this case, we annotate each field we want to store.

Additionally, we provide methods for getters, setters, equals, hashCode, and toString for better encapsulation and compatibility with database operations. We also include a factory method, Car.of(Faker faker), to generate random car data using the "datafaker" library. This data model encapsulates the structure of a car entity, making it easy to persist and retrieve car-related information in your Oracle NoSQL database using Eclipse JNoSQL.

Simplifying Database Operations With Jakarta Data Annotations

In Eclipse JNoSQL's latest version, 1.1.0, developers can harness the power of Jakarta Data annotations to streamline and clarify database operations. These annotations allow you to express your intentions in a more business-centric way, making your code more expressive and closely aligned with the actions you want to perform on the database. Here are some of the Jakarta Data annotations introduced in this version:

@Insert: Effortless Data Insertion

The @Insert annotation signifies the intent to perform an insertion operation in the database. When applied to a method, it indicates that the method aims to insert data into the database. This annotation provides clarity and conciseness, making it evident that the method is responsible for adding new records.

@Update: Seamless Data Update

The @Update annotation is used to signify an update operation. It is beneficial when you want to modify existing records in the database. Eclipse JNoSQL will check if the information to be updated is already present and proceed accordingly. This annotation simplifies the code by explicitly stating its purpose.
@Delete: Hassle-Free Data Removal

When you want to delete data from the database, the @Delete annotation comes into play. It communicates that the method's primary function is to remove information. Like the other annotations, it enhances code readability by conveying the intended action.

@Save: Combining Insert and Update

The @Save annotation serves a dual purpose. It behaves like the save method in BasicRepository but with added intelligence. It checks if the information is already in the database, and if so, it updates it; otherwise, it inserts new data. This annotation provides a convenient way to handle insertion and updating without needing separate methods.

With these Jakarta Data annotations, you can express database operations in a more intuitive and business-centric language. In the context of a car-centric application, such as managing a garage or car collection, you can utilize these annotations to define operations like parking a car and unparking it:

Java

@Repository
public interface Garage extends DataRepository<Car, String> {

    @Save
    Car parking(Car car);

    @Delete
    void unpark(Car car);

    Page<Car> findByTransmission(String transmission, Pageable page);
}

In this example, the @Save annotation is used for parking a car, indicating that this method handles both inserting new cars into the "garage" and updating existing ones. The @Delete annotation is employed for unparking, making it clear that this method is responsible for removing cars from the "garage." These annotations simplify database operations and enhance code clarity and maintainability by aligning your code with your business terminology and intentions.

Executing Database Operations With Oracle NoSQL

Now that our entity and repository are set up, let's create classes to execute the application. These classes will initialize a CDI container to inject the necessary template classes for interacting with the Oracle NoSQL database.

Interacting With the Document Database

As a first step, we'll interact with the document database. We'll inject the DocumentTemplate interface to perform various operations:

Java

public static void main(String[] args) {
    Faker faker = new Faker();
    try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
        DocumentTemplate template = container.select(DocumentTemplate.class).get();

        // Insert 10 random cars into the database
        for (int index = 0; index < 10; index++) {
            Car car = Car.of(faker);
            template.insert(car);
        }

        // Retrieve and print all cars
        template.select(Car.class).stream().toList().forEach(System.out::println);

        // Retrieve and print cars with Automatic transmission, ordered by model (descending)
        template.select(Car.class).where("transmission").eq("Automatic").orderBy("model").desc()
                .stream().forEach(System.out::println);

        // Retrieve and print cars with CVT transmission, ordered by make (descending)
        template.select(Car.class).where("transmission").eq("CVT").orderBy("make").desc()
                .stream().forEach(System.out::println);
    }
    System.exit(0);
}

In this code, we use the DocumentTemplate to insert random cars into the document database, retrieve and print all cars, and execute specific queries based on transmission type and ordering.
Interacting With the Key-Value Database

Oracle NoSQL also supports a key-value database, and we can interact with it as follows:

Java

public static void main(String[] args) {
    Faker faker = new Faker();
    try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
        KeyValueTemplate template = container.select(KeyValueTemplate.class).get();

        // Create a random car and put it in the key-value database
        Car car = Car.of(faker);
        template.put(car);

        // Retrieve and print the car based on its unique ID
        System.out.println("The query result: " + template.get(car.id(), Car.class));

        // Delete the car from the key-value database
        template.delete(car.id());

        // Attempt to retrieve the deleted car (will return null)
        System.out.println("The query result: " + template.get(car.id(), Car.class));
    }
    System.exit(0);
}

In this code, we utilize the KeyValueTemplate to put a randomly generated car into the key-value database, retrieve it by its unique ID, delete it, and attempt to retrieve it again (resulting in null since it's been deleted). These examples demonstrate how to execute database operations seamlessly with Oracle NoSQL, whether you're working with a document-oriented or key-value database model, using Eclipse JNoSQL's template classes.

In this final sample execution, we will demonstrate how to interact with the repository using our custom repository interface. This approach simplifies database operations and makes them more intuitive, allowing you to work with your custom-defined terminology and actions.

Java

public static void main(String[] args) {
    Faker faker = new Faker();
    try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
        Garage repository = container.select(Garage.class, DatabaseQualifier.ofDocument()).get();

        // Parking 10 random cars in the repository
        for (int index = 0; index < 10; index++) {
            Car car = Car.of(faker);
            repository.parking(car);
        }

        // Park a car and then unpark it
        Car car = Car.of(faker);
        repository.parking(car);
        repository.unpark(car);

        // Retrieve the first page of cars with CVT transmission, ordered by model (descending)
        Pageable page = Pageable.ofPage(1).size(3).sortBy(Sort.desc("model"));
        Page<Car> page1 = repository.findByTransmission("CVT", page);
        System.out.println("The first page");
        page1.forEach(System.out::println);

        // Retrieve the second page of cars with CVT transmission
        System.out.println("The second page");
        Pageable secondPage = page.next();
        Page<Car> page2 = repository.findByTransmission("CVT", secondPage);
        page2.forEach(System.out::println);
    }
    System.exit(0);
}

In this code, we create a Garage instance through the custom repository interface. We then demonstrate various operations such as parking and unparking cars and querying for cars with specific transmission types, sorted by model and paginated. By utilizing the repository interface with custom annotations like @Save and @Delete, you can express database operations in a business-centric language. This approach enhances code clarity and aligns with your domain-specific terminology, providing a more intuitive and developer-friendly way to interact with the database.

Conclusion

Eclipse JNoSQL 1.1.0, with its support for Oracle NoSQL databases, simplifies and streamlines the interaction between Java applications and NoSQL data stores.
With the introduction of Jakarta Data annotations and custom repositories, developers can express database operations in a more business-centric language, making code more intuitive and easier to maintain. This article has covered the critical aspects of Eclipse JNoSQL's interaction with Oracle NoSQL, including setting up configurations, creating data models, and executing various database operations. Whether you are working with document-oriented or key-value databases, Eclipse JNoSQL provides the necessary tools and abstractions to make NoSQL data access a breeze.

To dive deeper into the capabilities of Eclipse JNoSQL and explore more code samples, check out the official repository. There, you will find a wealth of information, examples, and resources to help you leverage the power of Eclipse JNoSQL in your Java applications. Eclipse JNoSQL empowers developers to harness the flexibility and scalability of NoSQL databases while adhering to Java enterprise standards, making it a valuable tool for modern application development.

Trend Report

Enterprise Security

This year has seen a rise in the sophistication and nuance of approaches to security that far surpass the years prior, with software supply chains being at the top of that list. Each year, DZone investigates the state of application security, and our global developer community is seeing both more automation and solutions for data protection and threat detection as well as a more common security-forward mindset that seeks to understand the why. In our 2023 Enterprise Security Trend Report, we dive deeper into the greatest advantages and threats to application security today, including the role of software supply chains, infrastructure security, threat detection, automation and AI, and DevSecOps. Featured in this report are insights from our original research and related articles written by members of the DZone Community — read on to learn more!


Refcard #393

Getting Started With Large Language Models

By Tuhin Chattopadhyay

Refcard #392

Software Supply Chain Security

By Justin Albano

More Articles

Linux Mint Debian Edition Makes Me Believe It’s Finally the Year of the Linux Desktop

It wasn't long ago that I decided to ditch my Ubuntu-based distros for openSUSE, finding LEAP 15 to be a steadier, more rock-solid flavor of Linux for my daily driver. The trouble is, I hadn't yet been introduced to Linux Mint Debian Edition (LMDE), and that sound you hear is my heels clicking with joy.

LMDE 6 with the Cinnamon desktop.

Allow me to explain. While I've been a long-time fan of Ubuntu, in recent years the addition of snaps (rather than system packages) and other Ubuntu-only features started to wear on me. I wanted straightforward networking, support for older hardware, and a desktop that didn't get in the way of my work. For years, Ubuntu provided that, and I installed it on everything from old netbooks and laptops to towers and IoT devices.

More recently, though, I decided to move to Debian, the upstream Linux distro on which Ubuntu (and derivatives like Linux Mint and others) are built. Unlike Ubuntu, Debian holds fast to a truly solid, stable, non-proprietary mindset — and I can still use the apt package manager I've grown accustomed to. That is, every bit of automation I use (Chef and Ansible mostly) works the same on Debian and Ubuntu.

I spent some years switching back and forth between the standard Ubuntu long-term releases and Linux Mint, a truly great Ubuntu-derived desktop Linux. Of course, there are many Debian-based distributions, but I stumbled across LMDE version 6, based on Debian GNU/Linux 12 "Bookworm" and known as Faye, and knew I was onto something truly special.

As with the Ubuntu version, LMDE comes with different desktop environments, including the robust Cinnamon, which provides a familiar environment for any Linux, Windows, or macOS user. It's intuitive, chock full of great features (like a multi-function taskbar), and it supports a wide range of customizations. However, it includes no snaps or other Ubuntuisms, and it is amazingly stable. That is, I've not had a single freeze-up or odd glitch, even when pushing it hard with Kdenlive video editing, KVM virtual machines, and Docker containers.

According to the folks at Linux Mint, "LMDE is also one of our development targets, as such it guarantees the software we develop is compatible outside of Ubuntu." That means if you're a traditional Linux Mint user, you'll find all the familiar capabilities and features in LMDE. After nearly six months of daily use, that's proven true.

As someone who likes to hang on to old hardware, LMDE extended its value to me by supporting both 64- and 32-bit systems. I've since installed it on a 2008 MacBook (32-bit), old ThinkPads, old Dell netbooks, and even a Toshiba Chromebook. Though most of these boxes have less than 3 gigabytes of RAM, LMDE performs well. Cinnamon isn't the lightest desktop around, but it runs smoothly on everything I have.

The running joke in the Linux world is that "next year" will be the year the Linux desktop becomes a true Windows and macOS replacement. With Debian Bookworm-powered LMDE, I humbly suggest next year is now.

To be fair, on some of my oldest hardware, I've opted for Bunsen. It, too, is a Debian derivative with 64- and 32-bit versions, and I'm using the BunsenLabs Linux Boron version, which uses the Openbox window manager and sips resources: about 400 megabytes of RAM and low CPU usage. With Debian at its core, it's stable and glitch-free.

Since deploying LMDE, I've also begun to migrate my virtual machines and containers to Debian 12. Bookworm is amazingly robust and works well on IoT devices, LXCs, and more.
Since it, too, has long-term support, I feel confident about its stability — and security — over time. If you're a fan of Ubuntu and Linux Mint, you owe it to yourself to give LMDE a try. As a daily driver, it's truly hard to beat.

By John Tonello
The Future Is Cloud-Native: Are You Ready?

Why Go Cloud-Native?

Cloud-native technologies empower us to produce increasingly larger and more complex systems at scale. Cloud-native is a modern approach to designing, building, and deploying applications that can fully capitalize on the benefits of the cloud. The goal is to allow organizations to innovate swiftly and respond effectively to market demands.

Agility and Flexibility

Organizations often migrate to the cloud for the enhanced agility and speed it offers. The ability to set up thousands of servers in minutes contrasts sharply with the weeks it typically takes for on-premises operations. Immutable infrastructure provides confidence in configurable and secure deployments and helps reduce time to market.

Scalable Components

Cloud-native applications are about more than just hosting applications in the cloud. The approach promotes the adoption of microservices, serverless, and containerized applications, and involves breaking down applications into several independent services. These services integrate seamlessly through APIs and event-based messaging, each serving a specific function.

Resilient Solutions

Orchestration tools manage the lifecycle of components, handling tasks such as resource management, load balancing, scheduling, restarts after internal failures, and provisioning and deploying resources to server cluster nodes. According to the 2023 annual survey conducted by the Cloud Native Computing Foundation, cloud-native technologies, particularly Kubernetes, have achieved widespread adoption within the cloud-native community. Kubernetes continues to mature, signifying its prevalence as a fundamental building block for cloud-native architectures.

Security-First Approach

Cloud-native culture integrates security as a shared responsibility throughout the entire IT lifecycle and promotes shifting security left in the process. Security must be part of application development and infrastructure right from the start, not an afterthought. Even after product deployment, security should remain a top priority, with constant security updates, credential rotation, virtual machine rebuilds, and proactive monitoring.

Is Cloud-Native Right for You?

There isn't a one-size-fits-all strategy to determine whether going cloud-native is a wise option. The right approach depends on strategic goals and the nature of the application. Not every application needs a cloud-native model; instead, teams can take an incremental approach based on specific business requirements. There are three levels to an incremental approach when moving to a cloud-native environment.

Infrastructure-Ready Applications

This level involves migrating or rehosting existing on-premise applications to an Infrastructure-as-a-Service (IaaS) platform with minimal changes. Applications retain their original structure but are deployed on cloud-based virtual machines. It is usually the first approach to be suggested and is commonly referred to as "lift and shift." However, deploying a solution in the cloud that retains monolithic behavior or does not utilize the full capabilities of the cloud generally has limited merits.

Cloud-Enhanced Applications

This level allows organizations to leverage modern cloud technologies such as containers and cloud-managed services without significant changes to the application code. Streamlining development operations with DevOps processes results in faster and more efficient application deployment. Utilizing container technology addresses issues related to application dependencies during multi-stage deployments. Applications can be deployed on IaaS or PaaS while leveraging additional cloud-managed services related to databases, caching, monitoring, and continuous integration and deployment pipelines.

Cloud-Native Applications

This advanced migration strategy is driven by the need to modernize mission-critical applications. Platform-as-a-Service (PaaS) solutions or serverless components are used to transition applications to a microservices or event-based architecture. Tailoring applications specifically for the cloud may involve writing new code or adapting applications to cloud-native behavior. Companies such as Netflix, Spotify, Uber, and Airbnb are leaders of the digital era; they have demonstrated a model of disruptive competitive advantage by adopting cloud-native architecture. This approach fosters long-term agility and scalability.

Ready to Dive Deeper?

The Cloud Native Computing Foundation (CNCF) has a vibrant community driving the adoption of cloud-native technologies. Explore its website and resources to learn more about tools and best practices. All major cloud providers have published a Cloud Adoption Framework (CAF) that provides guidance and best practices to adopt the cloud and achieve business outcomes:

- Azure Cloud Adoption Framework
- AWS Cloud Adoption Framework
- GCP Cloud Adoption Framework

Final Words

Cloud-native architecture is not just a trendy buzzword; it's a fundamental shift in how we approach software development in the cloud era. Each migration approach discussed above has unique benefits, and the choice depends on specific requirements. Organizations can choose a single approach or combine components from multiple strategies. Hybrid approaches, incorporating on-premise and cloud components, are common, allowing for flexibility based on diverse application requirements. By adhering to cloud-native design principles, application architecture becomes resilient, adaptable to rapid changes, easy to maintain, and optimized for diverse application requirements.

By Gaurav Gaur
Mastering Concurrency: An In-Depth Guide to Java's ExecutorService

In the realm of Java development, mastering concurrent programming is a quintessential skill for experienced software engineers. At the heart of Java's concurrency framework lies the ExecutorService, a sophisticated tool designed to streamline the management and execution of asynchronous tasks. This tutorial delves into the ExecutorService, offering insights and practical examples to harness its capabilities effectively.

Understanding ExecutorService

At its core, ExecutorService is an interface that abstracts the complexities of thread management, providing a versatile mechanism for executing concurrent tasks in Java applications. It represents a significant evolution from traditional thread management methods, enabling developers to focus on task execution logic rather than the intricacies of thread lifecycle and resource management. This abstraction facilitates a more scalable and maintainable approach to handling concurrent programming challenges.

ExecutorService Implementations

Java provides several ExecutorService implementations, each tailored for different scenarios (a short sketch of creating each of these pools appears before the Operational Dynamics section below):

- FixedThreadPool: A thread pool with a fixed number of threads, ideal for scenarios where the number of concurrent tasks is known and stable.
- CachedThreadPool: A flexible thread pool that creates new threads as needed, suitable for applications with a large number of short-lived tasks.
- ScheduledThreadPoolExecutor: Allows for the scheduling of tasks to run after a specified delay or to execute periodically, fitting for tasks requiring precise timing or regular execution.
- SingleThreadExecutor: Ensures tasks are executed sequentially in a single thread, preventing concurrent execution issues without the overhead of managing multiple threads.

Thread Pools and Thread Reuse

ExecutorService manages a pool of worker threads, which helps avoid the overhead of creating and destroying threads for each task. Thread reuse is a significant advantage because creating threads can be resource-intensive.

In-Depth Exploration of ExecutorService Mechanics

Fundamental Elements

At the core of ExecutorService's efficacy in concurrent task management lie several critical elements:

- Task Holding Structure: At the forefront is the task holding structure, essentially a BlockingQueue, which queues tasks pending execution. The nature of the queue, such as LinkedBlockingQueue for a stable thread count or SynchronousQueue for a flexible CachedThreadPool, directly influences task processing and throughput.
- Execution Threads: These threads within the pool are responsible for carrying out the tasks. Managed by the ExecutorService, these threads are either spawned or repurposed as dictated by the workload and the pool's configuration.
- Execution Manager: The ThreadPoolExecutor, a concrete embodiment of ExecutorService, orchestrates the entire task execution saga. It regulates the threads' lifecycle, oversees task processing, and monitors key metrics like the size of the core and maximum pools, task queue length, and thread keep-alive times.
- Thread Creation Mechanism: This mechanism, or ThreadFactory, is pivotal in spawning new threads. By allowing customization of thread characteristics such as names and priorities, it enhances control over thread behavior and diagnostics.
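As referenced above, here is a minimal sketch showing how each of the four implementations is obtained from the Executors factory class; the pool sizes are arbitrary placeholders rather than recommendations.

Java

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class PoolTypes {
    public static void main(String[] args) {
        ExecutorService fixed = Executors.newFixedThreadPool(4);                   // fixed number of threads
        ExecutorService cached = Executors.newCachedThreadPool();                  // grows and shrinks on demand
        ScheduledExecutorService scheduled = Executors.newScheduledThreadPool(2);  // delayed and periodic tasks
        ExecutorService single = Executors.newSingleThreadExecutor();              // strictly sequential execution

        // Always release the pools once they are no longer needed.
        fixed.shutdown();
        cached.shutdown();
        scheduled.shutdown();
        single.shutdown();
    }
}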
Operational Dynamics

The operational mechanics of ExecutorService underscore its proficiency in task management:

- Initial Task Handling: Upon task receipt, ExecutorService assesses whether to commence immediate execution, queue the task, or reject it based on current conditions and configurations.
- Queue Management: Tasks are queued if current threads are fully engaged and the queue can accommodate more tasks. The queuing mechanism hinges on the BlockingQueue type and the thread pool's settings.
- Pool Expansion: If the task cannot be queued and the active thread count is below the maximum threshold, ExecutorService might instantiate a new thread for the task.
- Task Processing: Threads in the pool persistently fetch and execute tasks from the queue, adhering to a task processing cycle that ensures continuous task throughput.
- Thread Lifecycle Management: Idle threads exceeding the keep-alive duration are culled, allowing the pool to contract when task demand wanes.
- Service Wind-down: ExecutorService offers methods (shutdown and shutdownNow) for orderly or immediate service cessation, ensuring task completion and resource liberation.

Execution Regulation Policies

ExecutorService employs nuanced policies for task execution to maintain system equilibrium:

- Overflow Handling Policies: When a task can neither be immediately executed nor queued, a RejectedExecutionHandler policy decides the next steps, such as discarding the task or throwing an exception, which is critical for managing task surges.
- Thread Renewal: To counteract the unexpected loss of a thread due to unforeseen exceptions, the pool replenishes its threads, thereby preserving the pool's integrity and uninterrupted task execution.

Advanced Features and Techniques

ExecutorService extends beyond mere task execution, offering sophisticated features like:

- Task Scheduling: ScheduledThreadPoolExecutor allows for precise scheduling, enabling tasks to run after a delay or at fixed intervals.
- Future and Callable: These constructs allow for the retrieval of results from asynchronous tasks, providing a mechanism for tasks to return values and allowing the application to remain responsive.
- Custom Thread Factories: Custom thread factories can be used to customize thread properties, such as names or priorities, enhancing manageability and debuggability.
- Thread Pool Customization: Developers can extend ThreadPoolExecutor to fine-tune task handling, thread creation, and termination policies to fit specific application needs. (A sketch combining a custom thread factory with a rejection policy follows Example 2 below.)

Practical Examples

Example 1: Executing Tasks Using a FixedThreadPool

Java

ExecutorService executor = Executors.newFixedThreadPool(5);
for (int i = 0; i < 10; i++) {
    Runnable worker = new WorkerThread("Task " + i);
    executor.execute(worker);
}
executor.shutdown();

This example demonstrates executing multiple tasks using a fixed thread pool, where each task is encapsulated in a Runnable object.

Example 2: Scheduling Tasks With ScheduledThreadPoolExecutor

Java

ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
Runnable task = () -> System.out.println("Task executed at: " + new Date());
scheduler.scheduleAtFixedRate(task, 0, 1, TimeUnit.SECONDS);

Here, tasks are scheduled to execute repeatedly with a fixed interval, showcasing the scheduling capabilities of ExecutorService.
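Before moving on to Example 3, here is the sketch referenced in the Advanced Features section: an illustration of combining a custom thread factory, a bounded queue, and a rejection policy when constructing a ThreadPoolExecutor directly. The pool sizes, queue capacity, and thread name prefix are illustrative choices, not recommendations.

Java

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CustomPoolSketch {

    public static void main(String[] args) throws InterruptedException {
        // Custom thread factory: descriptive names make thread dumps easier to read.
        ThreadFactory factory = new ThreadFactory() {
            private final AtomicInteger counter = new AtomicInteger(1);

            @Override
            public Thread newThread(Runnable r) {
                return new Thread(r, "worker-" + counter.getAndIncrement());
            }
        };

        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2,                                          // core pool size
                4,                                          // maximum pool size
                30, TimeUnit.SECONDS,                       // keep-alive for idle non-core threads
                new ArrayBlockingQueue<>(10),               // bounded task queue
                factory,
                new ThreadPoolExecutor.CallerRunsPolicy()); // overflow policy: run on the submitting thread

        for (int i = 0; i < 20; i++) {
            int taskId = i;
            executor.execute(() -> System.out.println(
                    Thread.currentThread().getName() + " handling task " + taskId));
        }

        // Graceful termination: stop accepting new tasks, then wait for in-flight work.
        executor.shutdown();
        if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
            executor.shutdownNow();
        }
    }
}

CallerRunsPolicy is only one of the built-in RejectedExecutionHandler implementations; AbortPolicy, DiscardPolicy, and DiscardOldestPolicy are the alternatives, and the right choice depends on whether back-pressure or loss of work is more acceptable for the application.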
Example 3: Handling Future Results From Asynchronous Tasks

Java

ExecutorService executor = Executors.newCachedThreadPool();
Callable<String> task = () -> {
    TimeUnit.SECONDS.sleep(1);
    return "Result of the asynchronous computation";
};
Future<String> future = executor.submit(task);
System.out.println("Future done? " + future.isDone());
String result = future.get(); // Waits for the computation to complete
System.out.println("Future done? " + future.isDone());
System.out.println("Result: " + result);
executor.shutdown();

This example illustrates submitting a Callable task, managing its execution with a Future object, and retrieving the result asynchronously.

Insights: Key Concepts and Best Practices

To effectively utilize ExecutorService and make the most of its capabilities, it is imperative to grasp several fundamental principles and adhere to the recommended approaches.

Optimal Thread Pool Size

The selection of an appropriate thread pool size carries substantial significance. Too few threads may result in the underutilization of CPU resources, whereas an excessive number of threads can lead to resource conflicts and unwarranted overhead. Determining the ideal pool size hinges on variables such as the available CPU cores and the nature of the tasks at hand. Utilizing Runtime.getRuntime().availableProcessors() can aid in ascertaining the number of available CPU cores.

Task Prioritization

ExecutorService, by default, lacks inherent support for task priorities. In cases where task prioritization is pivotal, using a PriorityQueue to manage tasks and manually assigning priorities becomes a viable approach.

Task Interdependence

In scenarios where tasks depend on one another, the Future objects returned when submitting Callable tasks become instrumental. These Future objects permit the retrieval of task results and the ability to await their completion.

Effective Exception Handling

ExecutorService offers mechanisms to address exceptions that may arise during task execution. These exceptions can be managed within a try-catch block inside the task itself or by installing an UncaughtExceptionHandler on the threads produced by the ThreadFactory used to create the thread pool.

Graceful Termination

It is imperative to shut down the ExecutorService when it is no longer required, ensuring the graceful release of resources. This can be achieved through the shutdown() method, which initiates an orderly shutdown, allowing submitted tasks to conclude their execution. Alternatively, the shutdownNow() method can be employed to forcefully terminate all running tasks.

Conclusion

In summary, Java's ExecutorService stands as a sophisticated and powerful framework for handling and orchestrating concurrent tasks, effectively simplifying the intricacies of thread management with a well-defined and efficient API. Delving deeper into its internal mechanics and core components sheds light on the operational dynamics of task management, queuing, and execution, offering developers critical insights that can significantly influence application optimization for superior performance and scalability. Utilizing ExecutorService to its full extent, from executing simple tasks to leveraging advanced functionalities like customizable thread factories and sophisticated rejection handlers, enables the creation of highly responsive and robust applications capable of managing numerous concurrent operations.
Adhering to established best practices, such as optimal thread pool sizing and implementing smooth shutdown processes, ensures applications remain reliable and efficient under diverse operational conditions. At its essence, ExecutorService exemplifies Java's dedication to providing comprehensive and high-level concurrency tools that abstract the complexities of raw thread management. As developers integrate ExecutorService into their projects, they tap into the potential for improved application throughput, harnessing the power of modern computing architectures and complex processing environments.

By Andrei Tuchin
Effective Log Data Analysis With Amazon CloudWatch: Harnessing Machine Learning

In today's cloud computing world, all types of logging data are extremely valuable. Logs can include a wide variety of data, including system events, transaction data, user activities, web browser logs, errors, and performance metrics. Managing logs efficiently is extremely important for organizations, but dealing with large volumes of data makes it challenging to detect anomalies and unusual patterns or predict potential issues before they become critical. Efficient log management strategies, such as implementing structured logging, using log aggregation tools, and applying machine learning for log analysis, are crucial for handling this data effectively.

One of the latest advancements in analyzing large amounts of logging data is the machine learning (ML)-powered analytics capability recently added to Amazon CloudWatch. This innovative capability is transforming the way organizations handle their log data by offering faster, more insightful, and automated log data analysis. This article explores how to use CloudWatch's machine-learning-powered analytics to uncover hidden issues within log data. Before diving into these features, let's have a quick refresher on Amazon CloudWatch.

What Is Amazon CloudWatch?

It is an AWS-native monitoring and observability service that offers a whole suite of capabilities:

Monitoring: Tracks performance and operational health.
Data collection: Gathers logs, metrics, and events, providing a comprehensive view of AWS resources.
Unified operational view: Provides insights into applications running on AWS and on-premises servers.

Challenges With Log Data Analysis

Volume of Data

There's simply too much log data. Modern applications emit a tremendous number of log events, and log data can grow so rapidly that finding an issue within it is like finding a needle in a haystack.

Change Identification

Another common problem, and one that is as old as logs themselves, is identifying what has changed in your logs.

Proactive Detection

Proactive detection is another common challenge. It's great if you can use logs to dive in when an application is having an issue, find the root cause, and fix it. But how do you know when those issues are occurring? How do you proactively detect them? You can implement metrics, alarms, and so on for the issues you know about, but there is always the problem of unknowns, so observability and monitoring are often instrumented only for past issues.

Now, let's dive deep into the machine learning capabilities from CloudWatch that will help you overcome the challenges we have just discussed.

Machine Learning Capabilities From CloudWatch

Pattern Analysis

Imagine you are troubleshooting a real-time distributed application accessed by millions of customers globally and generating a significant amount of application logs. Analyzing tens of thousands of log events manually is challenging, and it can take forever to find the root cause. That is where the new CloudWatch machine-learning-based capability helps: it groups log events into patterns on the Logs Insights page of CloudWatch. It is much easier to sift through a limited number of patterns and quickly filter the ones that might be interesting or relevant to the issue you are trying to troubleshoot.
It also allows you to expand a specific pattern to look at the relevant events, along with related patterns that might be pertinent. In simple words, Pattern Analysis is the automated grouping and categorization of your log events.

Comparison Analysis

How can we take pattern analysis to the next level? Now that we've seen how pattern analysis works, let's see how this feature extends to comparison analysis. Comparison Analysis aims to solve the second challenge: identifying log changes. It lets you profile your logs using the patterns from one time period, compare them to the patterns extracted from another period, and analyze the differences. This helps answer the fundamental question of what changed in my logs. You can quickly compare the logs from a period when your application is having an issue against a known healthy period; any changes between the two periods are a strong indicator of the possible root cause of your problem.

CloudWatch Logs Anomaly Detection

Anomaly detection, in simple terms, is the process of identifying unusual patterns or behaviors in the logs that do not conform to expected norms. To use this feature, first select the log group for the application and enable CloudWatch Logs anomaly detection for it. CloudWatch then trains a machine-learning model on the expected patterns and the volume of each pattern associated with your application. Training takes about five minutes using logs from your application, after which the feature becomes active and automatically starts surfacing anomalies whenever they occur. A brand-new error message that wasn't there before, a sudden spike in log volume, or a spike in HTTP 400s are examples of events that will generate an anomaly.

Generate Logs Insights Queries Using Generative AI

With this capability, you can describe what you want in natural language, and CloudWatch generates the corresponding Logs Insights query using generative AI. If you are unfamiliar with the CloudWatch query language or come from a non-technical background, you can use this feature to generate queries and filter logs. It is an iterative process; you may not get precisely what you want from the first query, so you can update and iterate the query based on the results you see. Let's look at a couple of examples (the full queries are reconstructed after the explanations below).

Natural language prompt: "Check API Response Times"

Auto-generated query by CloudWatch, in which:

fields @timestamp, @message selects the timestamp and message fields from your logs.
| parse @message "Response Time: *" as responseTime parses the @message field to extract the value following the text "Response Time: " and labels it as responseTime.
| stats avg(responseTime) calculates the average of the extracted responseTime values.

Natural language prompt: "Please provide the duration of the ten invocations with the highest latency."

Auto-generated query by CloudWatch, in which:

fields @timestamp, @message, latency selects the @timestamp, @message, and latency fields from the logs.
| stats max(latency) as maxLatency by @message computes the maximum latency value for each unique message.
| sort maxLatency desc sorts the results in descending order based on the maximum latency, showing the highest values at the top.
| limit 10 restricts the output to the top 10 results with the highest latency values.
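Putting those pieces together, the two auto-generated queries described above would look approximately like this; both are reconstructed from the line-by-line explanations, so the exact text CloudWatch produces may differ slightly.

Query for "Check API Response Times":

Plain Text

fields @timestamp, @message
| parse @message "Response Time: *" as responseTime
| stats avg(responseTime)

Query for the ten highest-latency invocations:

Plain Text

fields @timestamp, @message, latency
| stats max(latency) as maxLatency by @message
| sort maxLatency desc
| limit 10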
We can execute these queries in the CloudWatch “Logs Insights” query box to filter the log events from the application logs. These queries extract specific information from the logs, such as identifying errors, monitoring performance metrics, or tracking user activities. The query syntax might vary based on the particular log format and the information you seek.

Conclusion

CloudWatch's machine learning features offer a robust solution for managing the complexities of log data. These tools make log analysis more efficient and insightful, from automating pattern analysis to enabling anomaly detection. The addition of generative AI for query generation further democratizes access to these powerful insights.
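If you prefer to run these Logs Insights queries from the command line rather than the console, the AWS CLI supports the same workflow; here is a minimal sketch, in which the log group name and time range are placeholders rather than values from the article.

Shell

# Start the query against a hypothetical log group for the last hour
aws logs start-query \
  --log-group-name "/my-app/application" \
  --start-time $(date -d '1 hour ago' +%s) \
  --end-time $(date +%s) \
  --query-string 'fields @timestamp, @message | parse @message "Response Time: *" as responseTime | stats avg(responseTime)'

# Fetch the results once the query completes, using the queryId returned above
aws logs get-query-results --query-id <queryId>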

By Rajat Gupta
Data Streaming for AI in the Financial Services Industry (Part 1)

In the ever-evolving landscape of the Financial Services Industry (FSI), organizations face a multitude of challenges that hinder their journey toward AI-driven transformation. Legacy systems, stringent regulations, data silos, and a lack of agility have created a chaotic environment in need of a more effective way to use and share data across the organization. In this two-part series, I delve into my own personal observations, break open the prevailing issues within the FSI, and closely inspect the factors holding back progress. I’ll also highlight the need to integrate legacy systems, navigate strict regulatory landscapes, and break down data silos that impede agility and hinder data-driven decision-making. Last but not least, I’ll introduce a proven data strategy approach in which adopting data streaming technologies will help organizations overhaul their data pipelines, enabling real-time data ingestion, efficient processing, and seamless integration of disparate systems. If you're interested in a practical use case that showcases everything in action, keep an eye out for our upcoming report. But before diving into the solution, let’s start by understanding the problem.

Complexity, Chaos, and Data Dilemmas

Stepping into larger financial institutions often reveals a fascinating insight: a two-decade technology progression unfolding before my eyes. The core business continues to rely on mainframe systems running COBOL, while a secondary layer of services acts as the gateway to access the core and an extension for service offerings that can’t be done in the core system. Data is heavily batched and undergoes nightly ETL processes to facilitate transfer between these layers. Real-time data access poses challenges, demanding multiple attempts and queries through the gateway for even a simple status update. Data warehouses are established, serving as data dumping grounds through ETL, where nearly half of the data remains unused. Business Intelligence (BI) tools extract, transform, and analyze the data to provide valuable insights for business decisions and product design. Batch and distributed processing prevail due to the sheer volume of data to be handled, resulting in data silos and delayed reflection of changing trends. In recent years, more agile approaches have emerged, with a data shift towards binary, key-value formats for better scalability on the cloud. However, due to the architectural complexity, data transfers between services have multiplied, leading to challenges in maintaining data integrity. Plus, these innovations primarily cater to new projects, leaving developers and internal users to navigate through multiple hoops within the system to accomplish tasks. Companies also find themselves paying the price for slow innovation and encounter high costs when implementing new changes. This is particularly true when it comes to AI-driven initiatives that demand a significant amount of data and swift action. Consequently, several challenges bubble to the surface and get in the way of progress, making it increasingly difficult for FSIs to adapt and prepare for the future. Here’s a breakdown of these challenges, contrasting today's reality with the ideal state for FSIs.

Data silos

Reality: Decentralized nature of financial operations or teams' geographical locations. Separate departments or business units maintain their own data and systems that were implemented over the years, resulting in isolated data and making it difficult to collaborate.
There have already been several attempts to break the silos, and the solutions somehow contributed to one of the many problems below (i.e., data pipeline chaos).
Ideal state: A consolidated view of data across the organization. Ability to quickly view and pull data when needed.

Legacy systems

Reality: FSIs often grapple with legacy systems that have been in place for many years. These systems usually lack the agility to adapt to changes quickly. As a result, accessing and actioning data from these legacy systems can be time-consuming, leading to delays and sometimes making it downright impossible to make good use of the latest data.
Ideal state: Data synchronization with the old systems, and modernized ETL pipelines. Migrate and retire from the old process.

Data overload

Reality: The vast amounts of data from various sources, including transactions, customer interactions, market data, and more, can be overwhelming, making it challenging to extract valuable insights and derive actionable intelligence. It often leads to high storage bills, and the data is not fully used most of the time.
Ideal state: Infrastructural change to handle larger ingestion of data, a planned data storage strategy, and a more cost-effective way to safely secure and store data with a sufficient failover and recovery plan.

Data pipeline chaos

Reality: Managing data pipelines within FSIs can be a complex endeavor. With numerous data sources, formats, and integration points, the data pipeline can become fragmented and chaotic. Inconsistent data formats, incompatible systems, and manual processes can introduce errors and inefficiencies, making it challenging to ensure smooth data flow and maintain data quality.
Ideal state: A data catalog, a centralized repository that serves as a comprehensive inventory and metadata management system for the organization's data assets. Reduced redundancy, improved efficiency, streamlined data flow, plus automation, monitoring, and regular inspection.

Open data initiatives

Reality: With the increasing need for partner collaboration and government open API projects, the FSI faces the challenge of adapting its data practices. The demand to share data securely and seamlessly with external partners and government entities is growing. FSIs must establish frameworks and processes to facilitate data exchange while ensuring privacy, security, and compliance with regulations.
Ideal state: Secure and well-defined APIs for data access that ensure data interoperability through common standards, plus version and access control over access points.

Clearly, there’s a lot stacked up against FSIs attempting to leap into the world of AI. Now, let’s zoom in on the different data pipelines organizations are using to move their data from point A to B and the challenges many teams are facing with them.

Understanding Batch, Micro-Batch, and Real-Time Data Pipelines

There are all sorts of ways to move data around. To keep things simple, I’ll distill the most common pipelines today into three categories: batch, micro-batch, and real-time.

1. Batch Pipelines

These are typically used when processing large volumes of data in scheduled “chunks” at a time, often in overnight processing, periodic data updates, or batch reporting. Batch pipelines are well-suited for scenarios where immediate data processing isn't crucial, and the output is usually a report, like for investment profiles and insurance claims. The main setbacks include processing delays, potentially outdated results, scalability complexities, managing task sequences, resource allocation issues, and limitations in providing real-time data insights.
I’ve witnessed an insurance customer running out of windows at night to run batches due to the sheer volume of data that needed processing (updating premiums, investment details, documents, agents’ commissions, etc.). Parallel processing and map-reduce are techniques that can shorten the time, but they also introduce complexities: both require the developer to understand how the data is distributed and what its dependencies are, and to maneuver between the map and reduce functions.

2. Micro-Batch Pipelines

Micro-batch pipelines are a variation of batch pipelines where data is processed in smaller, more frequent batches at regular intervals for lower latency and fresher results. They’re commonly used for financial trading insights, clickstream analysis, recommendation systems, underwriting, and customer churn predictions. Challenges with micro-batch pipelines include managing the trade-off between processing frequency and resource usage, handling potential data inconsistencies across micro-batches, and addressing the overhead of initiating frequent processing jobs while still maintaining efficiency and reliability.

3. Real-Time Pipelines

These pipelines process data as soon as it flows in. They offer minimal latency and are essential for applications requiring instant reactions, such as real-time analytics, transaction fraud detection, monitoring critical systems, interactive user experiences, continuous model training, and real-time predictions. However, real-time pipelines face challenges like handling high throughput, maintaining consistently low latency, ensuring data correctness and consistency, managing resource scalability to accommodate varying workloads, and dealing with potential data integration complexities, all of which require robust architectural designs and careful implementation to deliver accurate and timely results.

To summarize, here's the important information about all three pipelines:

Batch
Cadence: scheduled longer intervals. Data size: large. Scaling: vertical. Latency: high (hours/days). Datastore: data warehouse, data lake, databases, files. Open-source technologies: Apache Hadoop, MapReduce. Industry use case examples: moving files (customer signature scans), transferring data from the mainframe for core banking data or core insurance policy information, and large datasets for ML.

Micro-batch
Cadence: scheduled short intervals. Data size: small defined chunks. Scaling: horizontal. Latency: medium (seconds). Datastore: distributed file system, data warehouses, databases. Open-source technologies: Apache Spark™. Industry use case examples: near real-time business reports that consume data from large dataset lookups, such as generating risk management reviews for investment; daily market trend analysis.

Real time
Cadence: real time. Data size: large. Scaling: horizontal. Latency: low (milliseconds). Datastore: stream processing systems, data lake, databases. Open-source technologies: Apache Kafka®, Apache Flink®. Industry use case examples: real-time transaction/fraud detection, instant claim approval, monitoring critical systems, and customer chatbot service.

As a side note, some may categorize pipelines as either ETL or ELT. ETL (Extract, Transform, and Load) transforms data on a separate processing server before moving it to the data warehouse. ELT (Extract, Load, and Transform) loads the data into its destination first and transforms it there. If the data is going to a data lake, you'll see most pipelines doing ELT; with a more structured destination, like a data warehouse or database, you will see more ETL.
In my opinion, all three pipelines should be using both techniques to convert data into the desired state. Common Challenges of Working With Data Pipelines Pipelines are scattered across departments, and IT teams implement them with various technologies and platforms. From my own experience working with on-site data engineers, here are some common challenges working with data pipelines: Difficulty Accessing Data Unstructured data can be tricky. The lack of metadata makes it difficult to locate the desired data within the repository (like customer correspondence, emails, chat logs, legal documents.) Certain data analytics tools or platforms may have strict requirements regarding the input data format, posing difficulties in converting the data to the required format. So, multiple complex pipelines transform logic (and lots of it). Stringent security measures and regulatory compliance can introduce additional steps and complexities in gaining access to the necessary data. (Personal identifiable data, health record for claims). Noisy, “Dirty” Data Data lakes are prone to issues like duplicated data. Persistence of decayed or outdated data within the system can compromise the accuracy and reliability of AI models and insights. Input errors during data entry were not caught and filtered. (biggest data processing troubleshooting time wasted) Data mismatches between different datasets and inconsistencies in data. (Incorrect report and pipeline errors) Performance Large volumes of data, lack of efficient storage and processing power. Methods of retrieving data, such as APIs in which the request and response aren’t ideal for large volumes of data ingestion. The location of relevant data within the system and where they’re stored heavily impacts the frequency of when to process data, plus the latency and cost of retrieving it. Data Visibility (Data Governance and Metadata) Inadequate metadata results in a lack of clarity regarding the availability, ownership, and usage of data assets. Difficult to determine the existence and availability of specific data, impeding effective data usage and analysis. Troubleshooting Identifying inconsistencies, addressing data quality problems, or troubleshooting data processing failures can be time-consuming and complex. During the process of redesigning the data framework for AI, both predictive and generative, I’ll address the primary pain points for data engineers and also help solve some of the biggest challenges plaguing the FSI today. Taking FSIs From Point A to AI Looking through a data lens, the AI-driven world can be dissected into two primary categories: inference and machine learning. These domains differ in their data requirements and usage. Machine learning needs comprehensive datasets derived from historical, operational, and real-time sources, enabling training more accurate models. Incorporating real-time data into the dataset enhances the model and facilitates agile and intelligent systems. Inference prioritizes real-time focus, leveraging ML-generated models to respond to incoming events, queries, and requests. Building a generative AI model is a major undertaking. For FSI, it makes sense to reuse an existing model (foundation model) with some fine-tuning in specific areas to fit your use case. The “fine-tuning” will require you to provide a high-quality, high-volume dataset. The old saying still holds true: garbage in, garbage out. If the data isn’t reliable, to begin with, you’ll inevitably end up with unreliable AI. 
In my opinion, to prepare for the best AI outcome possible, it’s crucial to set up the following foundations: Data infrastructure: You need a robust, low latency, high throughput framework to transfer and store vast volumes of financial data for efficient data ingestion, storage, processing, and retrieval. It should support distributed and cloud computing and prioritize network latency, storage costs, and data safety. Data quality: To provide better data for determining the model, it’s best to go through data cleansing, normalization, de-duplication, and validation processes to remove inconsistencies, errors, and redundancies. Now, if I were to say that there’s a simple solution, I would either be an exceptional genius capable of solving world crises or blatantly lying. However, given the complexity we already have, it’s best to focus on generating the datasets required for ML and streamline the data needed for the inference phase to make decisions. Then, you can gradually address the issues caused by the current data being overly disorganized. Taking one domain at a time, solving business users’ problems first, and not being overly ambitious is the fastest path to success. But we’ll leave that for the next post. Summary Implementing a data strategy in the financial services industry can be intricate due to factors such as legacy systems and the consolidation of other businesses. Introducing AI into this mix can pose performance challenges, and some businesses might struggle to prepare data for machine learning applications. In my next post, I’ll walk you through a proven data strategy approach to streamline your troublesome data pipelines for real-time data ingestion, efficient processing, and seamless integration of disparate systems.

By Christina Lin DZone Core CORE
Understanding the Risks of Long-Lived Kubernetes Service Account Tokens

The popularity of Kubernetes (K8s) as the de facto orchestration platform for the cloud shows no sign of slowing. This graph, taken from the 2023 Kubernetes Security Report by the security company Wiz, clearly illustrates the trend: As adoption continues to soar, so do the security risks and, most importantly, the attacks threatening K8s clusters. One such threat comes in the form of long-lived service account tokens. In this blog, we are going to dive deep into what these tokens are, their uses, the risks they pose, and how they can be exploited. We will also advocate for the use of short-lived tokens for a better security posture.

Service account tokens are bearer tokens (a type of token mostly used for authentication in web applications and APIs) used by service accounts to authenticate to the Kubernetes API. Service accounts provide an identity for processes (applications) that run in a Pod, enabling them to interact with the Kubernetes API securely. Crucially, these tokens are long-lived: when a service account is created, Kubernetes automatically generates a token and stores it indefinitely as a Secret, which can be mounted into pods and used by applications to authenticate API requests.

Note: in more recent versions, including Kubernetes v1.29, API credentials are obtained directly by using the TokenRequest API and are mounted into Pods using a projected volume. The tokens obtained using this method have bounded lifetimes and are automatically invalidated when the Pod they are mounted into is deleted.

As a reminder, the Kubelet on each node is responsible for mounting service account tokens into pods so they can be used by applications within those pods to authenticate to the Kubernetes API when needed. If you need a refresher on K8s components, look here.

The Utility of Service Account Tokens

Service account tokens are essential for enabling applications running on Kubernetes to interact with the Kubernetes API. They are used to deploy applications, manage workloads, and perform administrative tasks programmatically. For instance, a Continuous Integration/Continuous Deployment (CI/CD) tool like Jenkins would use a service account token to deploy new versions of an application or roll back a release.

The Risks of Longevity

While service account tokens are indispensable for automation within Kubernetes, their longevity can be a significant risk factor. Long-lived tokens, if compromised, give attackers ample time to explore and exploit a cluster. Once in the hands of an attacker, these tokens can be used to gain unauthorized access, elevate privileges, exfiltrate data, or even disrupt the entire cluster's operations. Here are a few leak scenarios that could lead to some serious damage:

Misconfigured access rights: A pod or container may be misconfigured to have broader file system access than necessary. If a token is stored on a shared volume, other containers or malicious pods that have been compromised could potentially access it.
Insecure transmission: If the token is transmitted over the network without proper encryption (like sending it over HTTP instead of HTTPS), it could be intercepted by network sniffing tools.
Code repositories: Developers might inadvertently commit a token to a public or private source code repository. If the repository is public or becomes exposed, the token is readily available to anyone who accesses it.
Logging and monitoring systems: Tokens might get logged by applications or monitoring systems and could be exposed if logs are not properly secured or if verbose logging is accidentally enabled.
Insider threat: A malicious insider with access to the Kubernetes environment could extract the token and use it or leak it intentionally.
Application vulnerabilities: If an application running within the cluster has vulnerabilities (e.g., a Remote Code Execution flaw), an attacker could exploit this to gain access to the pod and extract the token.

How Could an Attacker Exploit Long-Lived Tokens?

Attackers can collect long-lived tokens through network eavesdropping, exploiting vulnerable applications, or leveraging social engineering tactics. With these tokens, they can manipulate Kubernetes resources at will. Here is a non-exhaustive list of potential abuses:

Abuse the cluster's (often barely limited) infra resources for cryptocurrency mining or as part of a botnet.
With API access, attackers could deploy malicious containers, alter running workloads, exfiltrate sensitive data, or even take down the entire cluster.
If the token has broad permissions, it can be used to modify roles and bindings to elevate privileges within the cluster.
The attacker could create additional resources that provide them with persistent access (a backdoor) to the cluster, making it harder to remove their presence.
Access to sensitive data stored in the cluster or accessible through it could lead to data theft or leakage.

Why Aren’t Service Account Tokens Short-Lived by Default?

Short-lived tokens are a security best practice in general, particularly for managing access to very sensitive resources like the Kubernetes API. They reduce the window of opportunity for attackers to exploit a token and facilitate better management of permissions as application access requirements change. Automating token rotation limits the impact of a potential compromise and aligns with the principle of least privilege: granting only the access necessary for a service to operate. The problem is that implementing short-lived tokens comes with some overhead.

First, implementing short-lived tokens typically requires a more complex setup. You need an automated process to handle token renewal before it expires. This may involve additional scripts or Kubernetes operators that watch for token expiration and request new tokens as necessary. This often means integrating a secret management system that can securely store and automatically rotate the tokens, which adds a new dependency for system configuration and maintenance. Note: it goes without saying that using a secrets manager with Kubernetes is highly recommended, even for non-production workloads. But the overhead should not be underestimated.

Second, software teams running their CI/CD workers on top of the cluster will need adjustments to support dynamic retrieval and injection of these tokens into the deployment process. This could require changes in the pipeline configuration and additional error handling to manage potential token expiration during a pipeline run, which can be a true headache. And secrets management is just the tip of the iceberg. You will also need monitoring and alerts if you want to troubleshoot renewal failures. Fine-tuning token expiry time could break the deployment process, requiring immediate attention to prevent downtime or deployment failures.
Finally, there could also be performance considerations, as many more API calls are needed to retrieve new tokens and update the relevant Secrets.

By default, Kubernetes opts for a straightforward setup by issuing service account tokens without a built-in expiration. This approach simplifies initial configuration but lacks the security benefits of token rotation. It is the Kubernetes admin's responsibility to configure more secure practices by implementing short-lived tokens and the necessary infrastructure for their rotation, thereby enhancing the cluster's security posture.

Mitigation Best Practices

For many organizations, the additional overhead is justified by the security improvements. Tools like service mesh implementations (e.g., Istio), secret managers (e.g., CyberArk Conjur), or cloud provider services can manage the lifecycle of short-lived certificates and tokens, helping to reduce the overhead. Additionally, recent versions of Kubernetes offer features like the TokenRequest API, which can automatically rotate tokens and project them into running pods.

Even without any additional tool, you can mitigate the risks by limiting the service account auto-mount feature. To do so, you can opt out of the default API credential automounting with a single flag in the service account or pod configuration. Here are two examples. For a Service Account:

YAML

apiVersion: v1
kind: ServiceAccount
metadata:
  name: build-robot
automountServiceAccountToken: false
...

And for a specific Pod:

YAML

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  serviceAccountName: build-robot
  automountServiceAccountToken: false
...

The bottom line is that if an application does not need to access the K8s API, it should not have a token mounted. This also limits the number of service account tokens an attacker can access if the attacker manages to compromise any of the Kubernetes hosts. Okay, you might say, but how do we enforce this policy everywhere? Enter Kyverno, a policy engine designed for K8s.

Enforcement With Kyverno

Kyverno allows cluster administrators to manage, validate, mutate, and generate Kubernetes resources based on custom policies. To prevent the creation of long-lived service account tokens, one can define the following Kyverno policy:

YAML

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: deny-secret-service-account-token
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: check-service-account-token
      match:
        any:
          - resources:
              kinds:
                - Secret
      validate:
        cel:
          expressions:
            - message: "Long lived API tokens are not allowed"
              expression: >
                object.type != "kubernetes.io/service-account-token"

This policy ensures that only Secrets that are not of type kubernetes.io/service-account-token can be created, effectively blocking the creation of long-lived service account tokens!

Applying the Kyverno Policy

To apply this policy, you need to have Kyverno installed on your Kubernetes cluster (tutorial). Once Kyverno is running, you can apply the policy by saving the above YAML to a file and using kubectl to apply it:

Shell

kubectl apply -f deny-secret-service-account-token.yaml

After applying this policy, any attempt to create a Secret that is a service account token of the prohibited type will be denied, enforcing a safer token lifecycle management practice.

Wrap Up

In Kubernetes, managing the lifecycle and access of service account tokens is a critical aspect of cluster security.
By preferring short-lived tokens over long-lived ones and enforcing policies with tools like Kyverno, organizations can significantly reduce the risk of token-based security incidents. Stay vigilant, automate security practices, and ensure your Kubernetes environment remains robust against threats.
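As a concrete illustration of the bounded-lifetime tokens mentioned above, here is a minimal sketch of a Pod that disables the legacy automount and instead requests a short-lived, projected service account token; the names, mount path, and expiration value are illustrative rather than taken from the article.

YAML

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  serviceAccountName: build-robot
  automountServiceAccountToken: false      # opt out of the legacy automount
  containers:
  - name: app
    image: registry.example.com/app:latest # placeholder image
    volumeMounts:
    - name: sa-token
      mountPath: /var/run/secrets/tokens
      readOnly: true
  volumes:
  - name: sa-token
    projected:
      sources:
      - serviceAccountToken:
          path: token             # file name inside the mount
          expirationSeconds: 3600 # kubelet refreshes the token before it expires

The application then reads the token from /var/run/secrets/tokens/token and should re-read it periodically, since the kubelet rotates the file as the token approaches expiry.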

By Thomas Segura
AI and Rules for Agile Microservices in Minutes

Here's how to use AI and API Logic Server to create complete running systems in minutes:

Use ChatGPT for Schema Automation: create a database schema from natural language.
Use open-source API Logic Server: create working software with one command.
  App Automation: a multi-page, multi-table admin app.
  API Automation: a JSON:API with CRUD for each table, plus filtering, sorting, optimistic locking, and pagination.
Customize the project with your IDE:
  Logic Automation using rules: declare spreadsheet-like rules in Python for multi-table derivations and constraints - 40X more concise than code.
  Use Python and standard libraries (Flask, SQLAlchemy) and debug in your IDE.
Iterate your project:
  Revise your database design and logic.
  Integrate with B2B partners and internal systems.

This process leverages your existing IT infrastructure: your IDE, GitHub, the cloud, your database… open source. Let's see how.

1. AI: Schema Automation

You can use an existing database or create a new one with ChatGPT or your database tools. Use ChatGPT to generate SQL commands for database creation:

Plain Text

Create a sqlite database for customers, orders, items and product

Hints: use autonum keys, allow nulls, Decimal types, foreign keys, no check constraints.

Include a notes field for orders.

Create a few rows of only customer and product data.

Enforce the Check Credit requirement:

Customer.Balance <= CreditLimit
Customer.Balance = Sum(Order.AmountTotal where date shipped is null)
Order.AmountTotal = Sum(Items.Amount)
Items.Amount = Quantity * UnitPrice
Store the Items.UnitPrice as a copy from Product.UnitPrice

Note the hint above. As we've heard, "AI requires adult supervision." The hint was required to get the desired SQL, which ChatGPT generates as standard SQL. Copy the generated SQL commands into a file, say, sample_ai.sql, and then create the database:

sqlite3 sample_ai.sqlite < sample_ai.sql

2. API Logic Server: Create

Given a database (whether or not it was created from AI), API Logic Server creates an executable, customizable project with a single command:

$ ApiLogicServer create --project_name=sample_ai --db_url=sqlite:///sample_ai.sqlite

This creates a project you can open with your IDE, such as VSCode (see below). The project is ready to run; press F5. It reflects the automation provided by the create command:

API Automation: a self-serve API ready for UI developers, and
App Automation: an Admin app ready for back-office data maintenance and business user collaboration.

Let's explore the App and API Automation from the create command.

App Automation

App Automation means that ApiLogicServer create builds a multi-page, multi-table Admin App automatically. This does not consist of hundreds of lines of complex HTML and JavaScript; it's a simple YAML file that's easy to customize. Ready for business user collaboration and back-office data maintenance, in minutes.

API Automation

API Automation means that ApiLogicServer create builds a JSON:API automatically. Your API provides an endpoint for each table, with related data access, pagination, optimistic locking, filtering, and sorting. It would take days to months to create such an API using frameworks. UI App Developers can use the API to create custom apps immediately, using Swagger to design their API call and copying the URI into their JavaScript code. APIs are thus self-serve: no server coding is required. Custom app dev is unblocked: Day 1.

3. Customize

So, we have working software in minutes.
It's running, but we really can't deploy it until we have logic and security, which brings us to customization. Projects are designed for customization, using standards: Python, frameworks (e.g., Flask, SQLAlchemy), and your IDE for code editing and debugging. Not only Python code, but also rules.

Logic Automation

Logic Automation means that you can declare spreadsheet-like rules using Python. Such logic maintains database integrity with multi-table derivations, constraints, and security. Rules are 40X more concise than traditional code and can be extended with Python. Rules are an executable design. Use your IDE (code completion, etc.) to replace 280 lines of code with five spreadsheet-like rules that map exactly to our natural language design (a sketch of such rules appears at the end of this article):

1. Debugging

In the debugger, execution is paused at a breakpoint, where we can examine the state and execute step by step. Note the logging for inserting an Item: each line represents a rule firing and shows the complete state of the row.

2. Chaining: Multi-Table Transaction Automation

Note that it's a multi-table transaction, as indicated by the log indentation. This is because, like a spreadsheet, rules automatically chain, including across tables.

3. 40X More Concise

The five spreadsheet-like rules represent the same logic as 200 lines of code, shown here. That's a remarkable 40X decrease in the backend half of the system.

4. Automatic Re-use

The logic above, perhaps conceived for Place Order, applies automatically to all transactions: deleting an order, changing items, moving an order to a new customer, etc. This reduces code and promotes quality (no missed corner cases).

5. Automatic Optimizations

SQL overhead is minimized by pruning and by eliminating expensive aggregate queries. These can result in orders-of-magnitude impact. This is because the rule engine is not based on a Rete algorithm, but is highly optimized for transaction processing and integrated with the SQLAlchemy ORM (Object-Relational Mapper).

6. Transparent

Rules are an executable design. Note they map exactly to our natural language design (shown in comments), readable by business users. This complements running screens to facilitate agile collaboration.

Security Automation

Security Automation means you activate login-access security and declare grants (using Python) to control row access for user roles. Here, we filter less active accounts for users with the sales role:

Grant(on_entity = models.Customer,
      to_role = Roles.sales,
      filter = lambda : models.Customer.CreditLimit > 3000,
      filter_debug = "CreditLimit > 3000")

4. Iterate: Rules + Python

So, we have completed our one-day project. The working screens and rules facilitate agile collaboration, which leads to agile iterations. Automation helps here, too: not only are spreadsheet-like rules 40X more concise, but they meaningfully simplify iterations and maintenance. Let's explore this with two changes.

Requirement 1: Green Discounts

Plain Text

Give a 10% discount for carbon-neutral products for 10 items or more.

Requirement 2: Application Integration

Plain Text

Send new Orders to Shipping using a Kafka message.
Enable B2B partners to place orders with a custom API.

Revise Data Model

In this example, a schema change was required to add the Product.CarbonNeutral column. This affects the ORM models, the API, etc. So, we want these updated but retain our customizations.
This is supported using the ApiLogicServer rebuild-from-database command to update existing projects to a revised schema, preserving customizations. Iterate Logic: Add Python Here is our revised logic to apply the discount and send the Kafka message: Extend API We can also extend our API for our new B2BOrder endpoint using standard Python and Flask: Note: Kafka is not activated in this example. To explore a running Tutorial for application integration with running Kafka, click here. Notes on Iteration This illustrates some significant aspects of how logic supports iteration. Maintenance Automation Along with perhaps documentation, one of the tasks programmers most loathe is maintenance. That’s because it’s not about writing code, but archaeology; deciphering code someone else wrote, just so you can add four or five lines that’ll hopefully be called and function correctly. Logic Automation changes that with Maintenance Automation, which means: Rules automatically order their execution (and optimizations) based on system-discovered dependencies. Rules are automatically reused for all relevant transactions. So, to alter logic, you just “drop a new rule in the bucket,” and the system will ensure it’s called in the proper order and re-used over all the relevant Use Cases. Extensibility: With Python In the first case, we needed to do some if/else testing, and it was more convenient to add a dash of Python. While this is pretty simple Python as a 4GL, you have the full power of object-oriented Python and its many libraries. For example, our extended API leverages Flask and open-source libraries for Kafka messages. Rebuild: Logic Preserved Recall we were able to iterate the schema and use the ApiLogicServer rebuild-from-database command. This updates the existing project, preserving customizations. 5. Deploy API Logic Server provides scripts to create Docker images from your project. You can deploy these to the cloud or your local server. For more information, see here. Summary In minutes, you've used ChatGPT and API Logic Server to convert an idea into working software. It required only five rules and a few dozen lines of Python. The process is simple: Create the Schema with ChatGPT. Create the Project with ApiLogicServer. A Self-Serve API to unblock UI Developers: Day 1 An Admin App for Business User Collaboration: Day 1 Customize the project. With Rules: 40X more concise than code. With Python: for complete flexibility. Iterate the project in your IDE to implement new requirements. Prior customizations are preserved. It all works with standard tooling: Python, your IDE, and container-based deployment. You can execute the steps in this article with the detailed tutorial: click here.
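Referring back to the Logic Automation section, here is an approximate sketch of what the five check-credit rules might look like, written in the style of the open-source LogicBank rule engine that API Logic Server builds on. The import paths, model class names, and parameter names are illustrative and may not match the current API Logic Server release exactly; consult the generated logic module and the project documentation for the authoritative form.

Python

from logic_bank.logic_bank import Rule  # import path may differ by LogicBank version
import database.models as models        # generated SQLAlchemy models (illustrative path)

def declare_logic():
    # Constraint: the customer's balance may never exceed the credit limit
    Rule.constraint(validate=models.Customer,
                    as_condition=lambda row: row.Balance <= row.CreditLimit,
                    error_msg="Balance exceeds credit limit")

    # Customer.Balance = sum of AmountTotal over orders not yet shipped
    Rule.sum(derive=models.Customer.Balance,
             as_sum_of=models.Order.AmountTotal,
             where=lambda row: row.DateShipped is None)

    # Order.AmountTotal = sum of its items' Amount
    Rule.sum(derive=models.Order.AmountTotal,
             as_sum_of=models.Item.Amount)

    # Items.Amount = Quantity * UnitPrice
    Rule.formula(derive=models.Item.Amount,
                 as_expression=lambda row: row.Quantity * row.UnitPrice)

    # Items.UnitPrice is copied from the parent Product when the item is inserted
    Rule.copy(derive=models.Item.UnitPrice,
              from_parent=models.Product.UnitPrice)

Each rule maps one-to-one to a line of the natural-language Check Credit requirement given to ChatGPT earlier in the article, which is what makes the rules an executable design.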

By Val Huber
Implementation of the Raft Consensus Algorithm Using C++20 Coroutines

This article describes how to implement a Raft Server consensus module in C++20 without using any additional libraries. The narrative is divided into three main sections: A comprehensive overview of the Raft algorithm A detailed account of the Raft Server's development A description of a custom coroutine-based network library The implementation makes use of the robust capabilities of C++20, particularly coroutines, to present an effective and modern methodology for building a critical component of distributed systems. This exposition not only demonstrates the practical application and benefits of C++20 coroutines in sophisticated programming environments, but it also provides an in-depth exploration of the challenges and resolutions encountered while building a consensus module from the ground up, such as Raft Server. The Raft Server and network library repositories, miniraft-cpp and coroio, are available for further exploration and practical applications. Introduction Before delving into the complexities of the Raft algorithm, let’s consider a real-world example. Our goal is to develop a network key-value storage (K/V) system. In C++, this can be easily accomplished by using an unordered_map<string, string>. However, in real-world applications, the requirement for a fault-tolerant storage system increases complexity. A seemingly simple approach could entail deploying three (or more) machines, each hosting a replica of this service. The expectation may be for users to manage data replication and consistency. However, this method can result in unpredictable behaviors. For example, it is possible to update data using a specific key and then retrieve an older version later. What users truly want is a distributed system, potentially spread across multiple machines, that runs as smoothly as a single-host system. To meet this requirement, a consensus module is typically placed in front of the K/V storage (or any similar service, hereafter referred to as the "state machine"). This configuration ensures that all user interactions with the state machine are routed exclusively through the consensus module, rather than direct access. With this context in mind, let us now look at how to implement such a consensus module, using the Raft algorithm as an example. Raft Overview In the Raft algorithm, there are an odd number of participants known as peers. Each peer keeps its own log of records. There is one peer leader, and the others are followers. Users direct all requests (reads and writes) to the leader. When a write request to change the state machine is received, the leader logs it first before forwarding it to the followers, who also log it. Once the majority of peers have successfully responded, the leader considers this entry to be committed, applies it to the state machine, and notifies the user of its success. The Term is a key concept in Raft, and it can only grow. The Term changes when there are system changes, such as a change in leadership. The log in Raft has a specific structure, with each entry consisting of a Term and a Payload. The term refers to the leader who wrote the initial entry. The Payload represents the changes to be made to the state machine. Raft guarantees that two entries with the same index and term are identical. Raft logs are not append-only and may be truncated. For example, in the scenario below, leader S1 replicated two entries before crashing. S2 took the lead and began replicating entries, and S1's log differed from those of S2 and S3. 
As a result, the last entry in the S1 log will be removed and replaced with a new one.

Raft RPC API

Let us examine the Raft RPC. It's worth noting that the Raft API is quite simple, with just two calls. We'll begin by looking at the leader election API. It is important to note that Raft ensures that there can only be one leader per term. There may also be terms without a leader, such as when an election fails. To ensure that a peer casts only one vote per term, it saves its vote in a persistent variable called VotedFor. The election RPC is called RequestVote and has three parameters: Term, LastLogIndex, and LastLogTerm. The response contains Term and VoteGranted. Notably, every request contains Term, and in Raft, peers can only communicate effectively if their Terms are compatible. When a peer initiates an election, it sends a RequestVote request to the other peers and collects their votes. If the majority of the responses are positive, the peer advances to the leader role.

Now let's look at the AppendEntries request. It accepts parameters such as Term, PrevLogIndex, PrevLogTerm, and Entries, and the response contains Term and Success. If the Entries field in the request is empty, it acts as a heartbeat. When an AppendEntries request is received, a follower checks the term of its own log entry at PrevLogIndex. If it matches PrevLogTerm, the follower adds Entries to its log beginning at PrevLogIndex + 1 (entries after PrevLogIndex are removed if they exist). If the terms do not match, the follower returns Success=false; in this case, the leader retries the request, lowering PrevLogIndex by one.

When a peer receives a RequestVote request, it compares the (LastLogTerm, LastLogIndex) pair from the request with the term and index of its own most recent log entry. If its own pair is less than or equal to the requestor's, and it has not already voted for a different peer in this term, it returns VoteGranted=true.

State Transitions in Raft

Raft's state transitions look like this. Each peer begins in the Follower state. If a Follower does not receive AppendEntries within a set timeout, it increments its Term and moves to the Candidate state, triggering an election. A peer can move from the Candidate state to the Leader state if it wins the election, or return to the Follower state if it receives an AppendEntries request. A Candidate can also remain a Candidate if it does not transition to either a Follower or a Leader within the timeout period, in which case it starts a new election. If a peer in any state receives an RPC request with a Term greater than its current one, it moves to the Follower state.

Commit

Let us now consider an example that demonstrates how Raft is not as simple as it may appear. I took this example from Diego Ongaro's dissertation. S1 was the leader in Term 2, where it replicated two entries before crashing. Following this, S5 took the lead in Term 3, added an entry, and then crashed. Next, S2 took over leadership in Term 4, replicated the entry from Term 2, added its own entry for Term 4, and then crashed. This results in two possible outcomes: S5 reclaims leadership and truncates the entries from Term 2, or S1 regains leadership and commits the entries from Term 2. The entries from Term 2 are safely committed only after they are covered by a subsequent entry from a new leader. This example demonstrates how the Raft algorithm operates in a dynamic and often unpredictable set of circumstances. The sequence of events, which includes multiple leaders and crashes, demonstrates the complexity of maintaining a consistent state across a distributed system.
This complexity is not immediately apparent, but it becomes important in situations involving leader changes and system failures. The example emphasizes the importance of a robust and well-thought-out approach to dealing with such complexities, which is precisely what Raft seeks to address.

Additional Materials

For further study and a deeper understanding of Raft, I recommend the following materials:

The original Raft paper, which is ideal for implementation.
Diego Ongaro's PhD dissertation, which provides more in-depth insights.
Maxim Babenko's lecture, which goes into even greater detail.

Raft Implementation

Let's now move on to the Raft server implementation, which, in my opinion, benefits greatly from C++20 coroutines. In my implementation, the Persistent State is stored in memory. However, in real-world scenarios, it should be saved to disk. I'll talk more about the MessageHolder later. It functions similarly to a shared_ptr, but is specifically designed to handle Raft messages, ensuring efficient management and processing of these communications.

C++

struct TState {
    uint64_t CurrentTerm = 1;
    uint32_t VotedFor = 0;
    std::vector<TMessageHolder<TLogEntry>> Log;
};

In the Volatile State, I labeled entries with either L for "leader" or F for "follower" to clarify their use. The CommitIndex denotes the last log entry that was committed. In contrast, LastApplied is the most recent log entry applied to the state machine, and it is always less than or equal to the CommitIndex. The NextIndex is important because it identifies the next log entry to be sent to a peer. Similarly, MatchIndex keeps track of the last log entry known to match on that peer. The Votes set contains the IDs of peers who voted for me. Timeouts are an important aspect to manage: HeartbeatDue and RpcDue manage leader timeouts, while ElectionDue handles follower timeouts.

C++

using TTime = std::chrono::time_point<std::chrono::steady_clock>;

struct TVolatileState {
    uint64_t CommitIndex = 0; // L,F
    uint64_t LastApplied = 0; // L,F
    std::unordered_map<uint32_t, uint64_t> NextIndex; // L
    std::unordered_map<uint32_t, uint64_t> MatchIndex; // L
    std::unordered_set<uint32_t> Votes; // C
    std::unordered_map<uint32_t, TTime> HeartbeatDue; // L
    std::unordered_map<uint32_t, TTime> RpcDue; // L
    TTime ElectionDue; // F
};

Raft API

My implementation of the Raft algorithm has two classes. The first is INode, which denotes a peer. This class includes two methods: Send, which stores outgoing messages in an internal buffer, and Drain, which handles actual message dispatch. TRaft is the second class, and it manages the current peer's state. It also includes two methods: Process, which handles incoming messages, and ProcessTimeout, which must be called on a regular basis to manage timeouts, such as the leader election timeout. Users of these classes should use the Process, ProcessTimeout, and Drain methods as necessary. INode's Send method is invoked internally within the Raft class, ensuring that message handling and state management are seamlessly integrated within the Raft framework.

C++

struct INode {
    virtual ~INode() = default;
    virtual void Send(TMessageHolder<TMessage> message) = 0;
    virtual void Drain() = 0;
};

class TRaft {
public:
    TRaft(uint32_t node, const std::unordered_map<uint32_t, std::shared_ptr<INode>>& nodes);

    void Process(TTime now, TMessageHolder<TMessage> message, const std::shared_ptr<INode>& replyTo = {});
    void ProcessTimeout(TTime now);
};

Raft Messages

Now let's look at how I send and read Raft messages.
Instead of using a serialization library, I read and send raw structures in TLV format. This is what the message header looks like: C++ struct TMessage { uint32_t Type; uint32_t Len; char Value[0]; }; For additional convenience, I've introduced a second-level header: C++ struct TMessageEx: public TMessage { uint32_t Src = 0; uint32_t Dst = 0; uint64_t Term = 0; }; This includes the sender's and receiver's ID in each message. With the exception of LogEntry, all messages inherit from TMessageEx. LogEntry and AppendEntries are implemented as follows: C++ struct TLogEntry: public TMessage { static constexpr EMessageType MessageType = EMessageType::LOG_ENTRY; uint64_t Term = 1; char Data[0]; }; struct TAppendEntriesRequest: public TMessageEx { static constexpr EMessageType MessageType = EMessageType::APPEND_ENTRIES_REQUEST; uint64_t PrevLogIndex = 0; uint64_t PrevLogTerm = 0; uint32_t Nentries = 0; }; To facilitate message handling, I use a class called MessageHolder, reminiscent of a shared_ptr: C++ template<typename T> requires std::derived_from<T, TMessage> struct TMessageHolder { T* Mes; std::shared_ptr<char[]> RawData; uint32_t PayloadSize; std::shared_ptr<TMessageHolder<TMessage>[]> Payload; template<typename U> requires std::derived_from<U, T> TMessageHolder<U> Cast() {...} template<typename U> requires std::derived_from<U, T> auto Maybe() { ... } }; This class includes a char array containing the message itself. It may also include a Payload (which is only used for AppendEntry), as well as methods for safely casting a base-type message to a specific one (the Maybe method) and unsafe casting (the Cast method). Here is a typical example of using the MessageHolder: C++ void SomeFunction(TMessageHolder<TMessage> message) { auto maybeAppendEntries = message.Maybe<TAppendEntriesRequest>(); if (maybeAppendEntries) { auto appendEntries = maybeAppendEntries.Cast(); } // if we are sure auto appendEntries = message.Cast<TAppendEntriesRequest>(); // usage with overloaded operator-> auto term = appendEntries->Term; auto nentries = appendEntries->Nentries; // ... } And a real-life example in the Candidate state handler: C++ void TRaft::Candidate(TTime now, TMessageHolder<TMessage> message) { if (auto maybeResponseVote = message.Maybe<TRequestVoteResponse>()) { OnRequestVote(std::move(maybeResponseVote.Cast())); } else if (auto maybeRequestVote = message.Maybe<TRequestVoteRequest>()) { OnRequestVote(now, std::move(maybeRequestVote.Cast())); } else if (auto maybeAppendEntries = message.Maybe<TAppendEntriesRequest>()) { OnAppendEntries(now, std::move(maybeAppendEntries.Cast())); } } This design approach improves the efficiency and flexibility of message handling in Raft implementations. Raft Server Let's discuss the Raft server implementation. The Raft server will set up coroutines for network interactions. First, we'll look at the coroutines that handle message reading and writing. The primitives used for these coroutines are discussed later in the article, along with an analysis of the network library. The writing coroutine is responsible for writing messages to the socket, whereas the reading coroutine is slightly more complex. To read, it must first retrieve the Type and Len variables, then allocate an array of Len bytes, and finally, read the rest of the message. This structure facilitates the efficient and effective management of network communications within the Raft server. 
C++ template<typename TSocket> TValueTask<void> TMessageWriter<TSocket>::Write(TMessageHolder<TMessage> message) { co_await TByteWriter(Socket).Write(message.Mes, message->Len); auto payload = std::move(message.Payload); for (uint32_t i = 0; i < message.PayloadSize; ++i) { co_await Write(std::move(payload[i])); } co_return; } template<typename TSocket> TValueTask<TMessageHolder<TMessage>> TMessageReader<TSocket>::Read() { decltype(TMessage::Type) type; decltype(TMessage::Len) len; auto s = co_await Socket.ReadSome(&type, sizeof(type)); if (s != sizeof(type)) { /* throw */ } s = co_await Socket.ReadSome(&len, sizeof(len)); if (s != sizeof(len)) { /* throw */} auto mes = NewHoldedMessage<TMessage>(type, len); co_await TByteReader(Socket).Read(mes->Value, len - sizeof(TMessage)); auto maybeAppendEntries = mes.Maybe<TAppendEntriesRequest>(); if (maybeAppendEntries) { auto appendEntries = maybeAppendEntries.Cast(); auto nentries = appendEntries->Nentries; mes.InitPayload(nentries); for (uint32_t i = 0; i < nentries; i++) mes.Payload[i] = co_await Read(); } co_return mes; }

To launch a Raft server, create an instance of the TRaftServer class and call the Serve method. The Serve method starts two coroutines. The Idle coroutine is responsible for periodically processing timeouts, whereas InboundServe manages incoming connections.

C++ class TRaftServer { public: void Serve() { Idle(); InboundServe(); } private: TVoidTask InboundServe(); TVoidTask InboundConnection(TSocket socket); TVoidTask Idle(); }

Incoming connections are received via the accept call. Following this, the InboundConnection coroutine is launched, which reads incoming messages and forwards them to the Raft instance for processing. (In the snippet below, client denotes the connection object that wraps the accepted socket; its construction is omitted for brevity.) This configuration ensures that the Raft server can efficiently handle both internal timeouts and external communication.

C++ TVoidTask InboundServe() { while (true) { auto client = co_await Socket.Accept(); InboundConnection(std::move(client)); } co_return; } TVoidTask InboundConnection(TSocket socket) { while (true) { auto mes = co_await TMessageReader(client->Sock()).Read(); Raft->Process(std::chrono::steady_clock::now(), std::move(mes), client); Raft->ProcessTimeout(std::chrono::steady_clock::now()); DrainNodes(); } co_return; }

The Idle coroutine works as follows: it calls the ProcessTimeout method once every sleep interval. It's worth noting that this coroutine uses asynchronous sleep. This design enables the Raft server to efficiently manage time-sensitive operations without blocking other processes, improving the server's overall responsiveness and performance.

C++ while (true) { Raft->ProcessTimeout(std::chrono::steady_clock::now()); DrainNodes(); auto t1 = std::chrono::steady_clock::now(); if (t1 > t0 + dt) { DebugPrint(); t0 = t1; } co_await Poller.Sleep(t1 + sleep); }

The coroutine for sending outgoing messages is deliberately simple. It repeatedly sends all accumulated messages to the socket in a loop. In the event of an error, it starts another coroutine that is responsible for connecting (via the connect function). This structure ensures that outgoing messages are handled smoothly and efficiently while remaining robust through error handling and connection management.
C++ try { while (!Messages.empty()) { auto tosend = std::move(Messages); Messages.clear(); for (auto&& m : tosend) { co_await TMessageWriter(Socket).Write(std::move(m)); } } } catch (const std::exception& ex) { Connect(); } co_return;

With the Raft server implemented, these examples show how coroutines greatly simplify development. I haven't walked through the implementation of the Raft state machine itself here (trust me, it is considerably more involved than the Raft server), but the overall design remains simple and compact. Next, we'll look at some Raft server examples. Following that, I'll describe the network library I created from scratch specifically for the Raft server. This library is critical to enabling efficient network communication within the Raft framework.

Here's an example of launching a Raft cluster with three nodes. Each instance receives its own ID as an argument, as well as the other instances' addresses and IDs. In this case, the client communicates exclusively with the leader. It sends random strings while keeping a set number of in-flight messages and waiting for their commitment. This configuration depicts the interaction between the client and the leader in a multi-node Raft environment, demonstrating the algorithm's handling of distributed data and consensus.

Shell
$ ./server --id 1 --node 127.0.0.1:8001:1 --node 127.0.0.1:8002:2 --node 127.0.0.1:8003:3
... Candidate, Term: 2, Index: 0, CommitIndex: 0, ...
Leader, Term: 3, Index: 1080175, CommitIndex: 1080175, Delay: 2:0 3:0 MatchIndex: 2:1080175 3:1080175 NextIndex: 2:1080176 3:1080176 ....
$ ./server --id 2 --node 127.0.0.1:8001:1 --node 127.0.0.1:8002:2 --node 127.0.0.1:8003:3
...
$ ./server --id 3 --node 127.0.0.1:8001:1 --node 127.0.0.1:8002:2 --node 127.0.0.1:8003:3
... Follower, Term: 3, Index: 1080175, CommitIndex: 1080175, ...
$ dd if=/dev/urandom | base64 | pv -l | ./client --node 127.0.0.1:8001:1 >log1
198k 0:00:03 [159.2k/s] [ <=>

I measured the commit latency for configurations of both 3-node and 5-node clusters. As expected, the latency is higher for the 5-node setup:

3 nodes: 50th percentile (median): 292,872 ns; 80th percentile: 407,561 ns; 90th percentile: 569,164 ns; 99th percentile: 40,279,001 ns
5 nodes: 50th percentile (median): 425,194 ns; 80th percentile: 672,541 ns; 90th percentile: 1,027,669 ns; 99th percentile: 38,578,749 ns

I/O Library

Let's now look at the I/O library that I created from scratch and used in the Raft server's implementation. I began with the example below, taken from cppreference.com, which is an implementation of an echo server:

C++ task<> tcp_echo_server() { char data[1024]; while (true) { std::size_t n = co_await socket.async_read_some(buffer(data)); co_await async_write(socket, buffer(data, n)); } }

My library required an event loop, a socket primitive, and methods like read_some/write_some (named ReadSome/WriteSome in my library), as well as higher-level wrappers such as async_write/async_read (named TByteReader/TByteWriter in my library). To implement the ReadSome method of the socket, I had to create an Awaitable as follows:

C++ auto ReadSome(char* buf, size_t size) { struct TAwaitable { bool await_ready() { return false; /* always suspend */ } void await_suspend(std::coroutine_handle<> h) { poller->AddRead(fd, h); } int await_resume() { return read(fd, b, s); } TSelect* poller; int fd; char* b; size_t s; }; return TAwaitable{Poller_,Fd_,buf,size}; }

When co_await is called, the coroutine suspends because await_ready returns false. In await_suspend, we capture the coroutine_handle and pass it, along with the socket descriptor, to the poller. When the socket becomes ready, the poller resumes the coroutine through that handle. Upon resumption, await_resume is called, which performs the read and returns the number of bytes read to the coroutine. The WriteSome, Accept, and Connect methods are implemented in a similar manner.
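As an illustration only, here is a minimal sketch of what such a WriteSome awaitable could look like. It assumes the poller exposes an AddWrite hook symmetric to the AddRead call shown above; that hook is an assumption of mine, not something shown in the snippets of this article.

C++
auto WriteSome(const char* buf, size_t size) {
    struct TAwaitable {
        bool await_ready() { return false; /* always suspend */ }
        // Register interest in writability; AddWrite is assumed to mirror AddRead.
        void await_suspend(std::coroutine_handle<> h) { poller->AddWrite(fd, h); }
        // Once the socket is writable, perform the actual write and return the byte count.
        int await_resume() { return write(fd, b, s); }
        TSelect* poller;
        int fd;
        const char* b;
        size_t s;
    };
    return TAwaitable{Poller_, Fd_, buf, size};
}

Accept can follow the same shape with accept in await_resume, while Connect typically issues a non-blocking connect first and then waits for writability before checking the result.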
The Poller is set up as follows:

C++ struct TEvent { int Fd; int Type; // READ = 1, WRITE = 2; std::coroutine_handle<> Handle; }; class TSelect { void Poll() { for (const auto& ch : Events) { /* FD_SET(ReadFds); FD_SET(WriteFds);*/ } pselect(Size, ReadFds, WriteFds, nullptr, ts, nullptr); for (int k = 0; k < Size; ++k) { if (FD_ISSET(k, WriteFds)) { Events[k].Handle.resume(); } // ... } } std::vector<TEvent> Events; // ... };

I keep an array of pairs (socket descriptor, coroutine handle) that are used to initialize the structures for the poller backend (in this case, select). Resume is called on the coroutines whose sockets are ready, waking them up. This is applied in the main function as follows:

C++ TSimpleTask task(TSelect& poller) { TSocket socket(0, poller); char buffer[1024]; while (true) { auto readSize = co_await socket.ReadSome(buffer, sizeof(buffer)); } } int main() { TSelect poller; task(poller); while (true) { poller.Poll(); } }

We start a coroutine (or coroutines) that suspends on co_await, and control is then passed to an infinite loop that invokes the poller mechanism. If a socket becomes ready within the poller, the corresponding coroutine is resumed and executed until the next co_await. To read and write Raft messages, I needed to create high-level wrappers over ReadSome/WriteSome, similar to:

C++ TValueTask<T> Read() { T res; size_t size = sizeof(T); char* p = reinterpret_cast<char*>(&res); while (size != 0) { auto readSize = co_await Socket.ReadSome(p, size); p += readSize; size -= readSize; } co_return res; } // usage T t = co_await Read<T>();

To implement these, I needed to create a coroutine that also functions as an Awaitable. The coroutine is made up of a pair: a coroutine_handle and a promise. The coroutine_handle is used to manage the coroutine from the outside, whereas the promise is for internal management. The coroutine_handle can include Awaitable methods, which allow the coroutine's result to be awaited with co_await. The promise can be used to store the result returned by co_return and to awaken the calling coroutine. In the task type, which wraps the coroutine_handle, the await_suspend method stores the coroutine_handle of the calling coroutine; its value will be saved in the promise:

C++ template<typename T> struct TValueTask : std::coroutine_handle<> { bool await_ready() { return !!this->promise().Value; } void await_suspend(std::coroutine_handle<> caller) { this->promise().Caller = caller; } T await_resume() { return *this->promise().Value; } using promise_type = TValuePromise<T>; };

Within the promise itself, the return_value method will store the returned value. The calling coroutine is woken up with an awaitable, which is returned in final_suspend. This is because the compiler, after co_return, invokes co_await on final_suspend.
C++ template<typename T> struct TValuePromise { void return_value(const T& t) { Value = t; } std::suspend_never initial_suspend() { return {}; } // resume Caller here TFinalSuspendContinuation<T> final_suspend() noexcept; std::optional<T> Value; std::coroutine_handle<> Caller = std::noop_coroutine(); };

In TFinalSuspendContinuation's await_suspend, the handle of the calling coroutine can be returned, and that coroutine will then be resumed automatically. It is important to note that the called coroutine will now be in a suspended state, and its coroutine_handle must be destroyed with destroy to avoid a memory leak. This can be accomplished, for example, in the destructor of TValueTask.

C++ template<typename T> struct TFinalSuspendContinuation { bool await_ready() noexcept { return false; } std::coroutine_handle<> await_suspend( std::coroutine_handle<TValuePromise<T>> h) noexcept { return h.promise().Caller; } void await_resume() noexcept { } };

With the library description completed, I ported the libevent benchmark to it to ensure its performance. This benchmark generates a chain of N Unix pipes, each one linked to the next. It then initiates 100 write operations into the chain, which continue until there are 1000 total write calls. Measuring the benchmark's runtime as a function of N for the various backends of my library (coroio) versus libevent shows that my library performs similarly to libevent, confirming its efficiency and effectiveness in managing I/O operations.

Conclusion

In closing, this article has described the implementation of a Raft server using C++20 coroutines, emphasizing the convenience and efficiency provided by this modern C++ feature. The custom I/O library, which was written from scratch, is critical to this implementation because it effectively handles asynchronous I/O operations. The performance of the library was validated against the libevent benchmark, demonstrating its competitiveness. For those interested in learning more about or using these tools, the I/O library is available at coroio, and the Raft library at miniraft-cpp (linked at the beginning of the article). Both repositories provide a detailed look at how C++20 coroutines can be used to build robust, high-performance distributed systems.

By Aleksei Ozeritskii
A Brief History of DevOps and the Link to Cloud Development Environments

The history of DevOps is well worth reading about, and a few good books cover it. On that topic, "The Phoenix Project," self-characterized as "a novel of IT and DevOps," is often mentioned as a must-read. Yet for practitioners like myself, a more hands-on book is "The DevOps Handbook" (which shares Kim as an author in addition to Debois, Willis, and Humble), which recounts some of the watershed moments in the evolution of software engineering and provides good references around implementation. This book actually describes how to replicate the transformation explained in The Phoenix Project and provides case studies. In this brief article, I will use my notes on this great book to distill a concise history of DevOps, add my personal experience and opinion, and establish a link to Cloud Development Environments (CDEs), i.e., the practice of providing access to, and running, development environments online as a service for developers. In particular, I explain how the use of CDEs concludes the effort of bringing DevOps "fully online." Explaining the benefits of this shift in development practices, plus a few personal notes, is my main contribution in this brief article. Before clarifying the link between DevOps and CDEs, let's first dig into the chain of events and technical contributions that led to today's main methodology for delivering software.

The Agile Manifesto

The Agile Manifesto, created in 2001, set forth values and principles as a response to more cumbersome software development methodologies like Waterfall and the Rational Unified Process (RUP). One of the manifesto's core principles emphasizes the importance of delivering working software frequently, ranging from a few weeks to a couple of months, with a preference for shorter timescales. The Agile movement's influence expanded in 2008 during the Agile Conference in Toronto, where Andrew Shafer suggested applying Agile principles to IT infrastructure rather than just to the application code. This idea was further propelled at the 2009 Velocity Conference, where a presentation from Flickr demonstrated the impressive feat of "10 deployments a day" through Dev and Ops collaboration. Inspired by these developments, Patrick Debois organized the first DevOps Days in Belgium, effectively coining the term "DevOps." This marked a significant milestone in the evolution of software development and operational practices, blending Agile's swift adaptability with a more inclusive approach to the entire IT infrastructure.

The Three Ways of DevOps and the Principles of Flow

All the concepts that I have discussed so far are today incarnated in the "Three Ways of DevOps," i.e., the foundational principles that guide the practices and processes in DevOps. In brief, these principles focus on: improving the flow of work (First Way), i.e., eliminating bottlenecks, reducing batch sizes, and accelerating the workflow from development to production; amplifying feedback loops (Second Way), i.e., quickly and accurately collecting information about any issues or inefficiencies in the system; and fostering a culture of continuous learning and experimentation (Third Way), i.e., making room for ongoing improvement through experimentation. Following the leads from Lean Manufacturing and Agile, it is easy to understand what led to the definition of the above three principles. I delve more deeply into each of these principles in this conference presentation.
For the current discussion, though, i.e., how DevOps history leads to Cloud Development Environments, we just need to look at the First Way, the principle of flow, to understand the causative link. Chapter 9 of the DevOps Handbook explains that the technologies of version control and containerization are central to implementing DevOps flows and establishing a reliable and consistent development process. At the center of enabling the flow is the practice of incorporating all production artifacts into version control to serve as a single source of truth. This enables the recreation of the entire production environment in a repeatable and documented fashion. It ensures that production-like code development environments can be automatically generated and entirely self-serviced without requiring manual intervention from Operations. The significance of this approach becomes evident at release time, which is often the first time that an application's behavior is observed in a production-like setting, complete with realistic load and production data sets. To reduce the likelihood of issues, developers are encouraged to operate production-like environments on their workstations, created on-demand and self-serviced through mechanisms such as virtual images or containers, utilizing tools like Vagrant or Docker. Putting these environments under version control allows the entire pre-production and build processes to be recreated. Note that production-like environments really refer to environments that, in addition to having the same infrastructure and application configuration as the real production environments, also contain the additional applications and layers necessary for development.

Figure: Developers are encouraged to operate production-like environments on their workstations, using mechanisms such as virtual images or containers, to reduce the likelihood of execution issues in production.

From Developer Workstations to a CDE Platform

The notion of self-service is already emphasized in the DevOps Handbook as a key enabler of the principle of flow. Using 2016 technology, this is realized by downloading environments to the developers' workstations from a registry (such as DockerHub) that provides pre-configured, production-like environments as files (dubbed infrastructure-as-code). Docker is often the tool used to implement this function. Starting from this operation, developers create an application, in effect, as follows: (1) they access and copy files with the development environment information to their machines, (2) add source code to them in local storage, and (3) build the application locally using their workstation's computing resources. This is illustrated in the left part of the figure below. Once the application works correctly, the source code is sent ("pushed") to a central code repository, and the application is built and deployed online, i.e., using Cloud-based resources and applications such as CI/CD pipelines. The three development steps listed above are, in effect, the only operations, in addition to the authoring of source code using an IDE, that are "local," i.e., they use the workstation's physical storage and computing resources. All the rest of the DevOps operations are performed using web-based applications consumed as a service by developers and operators (even when these applications are self-hosted by the organization). The basic goal of Cloud Development Environments is to move these development steps online as well.
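To make steps (1) through (3) concrete, here is a minimal sketch of the classic local loop using container tooling. The registry, image, and repository names are illustrative assumptions, not references to any specific project.

Shell
# (1) Copy the development environment definition to the workstation
docker pull registry.example.com/acme/dev-env:latest    # hypothetical pre-configured image

# (2) Add source code to it in local storage
git clone https://github.com/acme/app.git && cd app     # hypothetical repository

# (3) Build and test the application with local computing resources
docker build -t acme/app:dev .
docker run --rm -v "$PWD":/workspace -w /workspace acme/app:dev make test

# Only after the local loop succeeds is the code pushed to the central repository
git push origin main

Everything in this sketch consumes the workstation's storage and CPU; these are exactly the operations a CDE relocates to a managed, online environment.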
To move these steps online, CDE platforms, in essence, provide the following basic services, illustrated in the right part of the figure below: manage development environments online as containers or virtual machines, such that developers can access them fully built and configured, substituting step (1) above; then provide a mechanism for authoring source code online, i.e., inside the development environment using an IDE or a terminal, substituting step (2); and finally, provide a way to execute build commands inside the development environment (via the IDE or terminal), substituting step (3).

Figure: (left) The classic development data flow requires the use of local workstation resources. (right) The cloud development data flow replaces local storage and computing while keeping a similar developer experience. On each side, the operations are (1) accessing environment information, (2) adding code, and (3) building the application.

Note that the replacement of step (2) can be done in several ways. For example, the IDE can be browser-based (aka a Cloud IDE), or a locally installed IDE can provide a way to author the code in the remote environment. It is also possible to use a console text editor, such as vim, via a terminal. I cannot conclude this discussion without mentioning that multiple containerized environments are often used for testing on the workstation, in particular in combination with the main containerized development environment. Hence, cloud IDE platforms need to reproduce the capability to run containerized environments inside the Cloud Development Environment (itself a containerized environment). If this recursive process becomes a bit complicated to grasp, don't worry; we have reached the end of the discussion and can move to the conclusion.

What Comes Out of Using Cloud Development Environments in DevOps

A good way to conclude this discussion is to summarize the benefits of moving development environments from the developers' workstations online using CDEs. The use of CDEs for DevOps leads to the following advantages:

Streamlined Workflow: CDEs enhance the workflow by removing data from the developer's workstation and decoupling the hardware from the development process. This ensures the development environment is consistent and not limited by local hardware constraints.

Environment Definition: With CDEs, version control becomes more robust, as it can uniformize not only the environment definition but all the tools attached to the workflow, leading to a standardized development process and consistency across teams throughout the organization.

Centralized Environments: The self-service aspect is improved by centralizing the production, maintenance, and evolution of environments that serve distributed development activities. This allows developers to quickly access and manage their environments without the need for manual work from Operations.

Asset Utilization: Migrating the consumption of computing resources from local hardware to centralized and shared cloud resources not only lightens the load on local machines but also leads to more efficient use of organizational resources and potential cost savings.

Improved Collaboration: Ubiquitous access to development environments, secured by embedded security measures in the access mechanisms, allows organizations to cater to a diverse group of developers, including internal, external, and temporary workers, fostering collaboration across various teams and geographies.
Scalability and Flexibility: CDEs offer scalable cloud resources that can be adjusted to project demands, facilitating the management of multiple containerized environments for testing and development and thus supporting the distributed nature of modern software development teams.

Enhanced Security and Observability: Centralizing development environments in the Cloud not only improves security (more on secure CDEs) but also provides immediate observability due to their online nature, allowing for real-time monitoring and management of development activities.

By integrating these aspects, CDEs become a solution for modern, and in particular cloud-native, software development, and they align with the principles of DevOps by improving not only flow but also feedback and continuous learning. In an upcoming article, I will discuss the contributions of CDEs across all three ways of DevOps. In the meantime, you're welcome to share your feedback with me.

By Laurent Balmelli, PhD
Improving Upon My OpenTelemetry Tracing Demo

Last year, I wrote a post on OpenTelemetry Tracing to understand more about the subject. I also created a demo around it, which featured the following components: the Apache APISIX API Gateway, a Kotlin/Spring Boot service, a Python/Flask service, and a Rust/Axum service. I've recently improved the demo to deepen my understanding and want to share my learning.

Using a Regular Database

In the initial demo, I didn't bother with a regular database. Instead, the Kotlin service used the embedded Java H2 database, the Python service used the embedded SQLite, and the Rust service used hard-coded data in a hash map. I replaced all of them with a regular PostgreSQL database, with a dedicated schema for each. The OpenTelemetry agent added a new span when connecting to the database on the JVM and in Python. For the JVM, it's automatic when one uses the Java agent. In Python, one needs to install the relevant package (see the next section).

OpenTelemetry Integrations in Python Libraries

Python requires you to explicitly add the package that instruments a specific library for OpenTelemetry. For example, the demo uses Flask; hence, we should add the Flask integration package. However, this can become a pretty tedious process. Yet, once you've installed opentelemetry-distro, you can "sniff" the installed packages and install the relevant integrations.

Shell pip install opentelemetry-distro opentelemetry-bootstrap -a install

For the demo, it installs the following:

Plain Text opentelemetry_instrumentation-0.41b0.dist-info opentelemetry_instrumentation_aws_lambda-0.41b0.dist-info opentelemetry_instrumentation_dbapi-0.41b0.dist-info opentelemetry_instrumentation_flask-0.41b0.dist-info opentelemetry_instrumentation_grpc-0.41b0.dist-info opentelemetry_instrumentation_jinja2-0.41b0.dist-info opentelemetry_instrumentation_logging-0.41b0.dist-info opentelemetry_instrumentation_requests-0.41b0.dist-info opentelemetry_instrumentation_sqlalchemy-0.41b0.dist-info opentelemetry_instrumentation_sqlite3-0.41b0.dist-info opentelemetry_instrumentation_urllib-0.41b0.dist-info opentelemetry_instrumentation_urllib3-0.41b0.dist-info opentelemetry_instrumentation_wsgi-0.41b0.dist-info

The above setup adds a new automated trace for connections.

Gunicorn on Flask

Every time I started the Flask service, it showed a warning in red that its built-in development server shouldn't be used in production. While it's unrelated to OpenTelemetry, and though nobody complained, I was not too fond of it. For this reason, I added a "real" HTTP server. I chose Gunicorn, for no other reason than that my knowledge of the Python ecosystem is still shallow. The server is a runtime concern. We only need to change the Dockerfile slightly:

Dockerfile RUN pip install gunicorn ENTRYPOINT ["opentelemetry-instrument", "gunicorn", "-b", "0.0.0.0", "-w", "4", "app:app"]

The -b option refers to binding; you can attach to a specific IP. Since I'm running Docker, I don't know the IP, so I bind to any. The -w option specifies the number of workers. Finally, the app:app argument sets the module and the application, separated by a colon. Gunicorn usage doesn't impact the OpenTelemetry integrations.

Heredocs for the Win

You may benefit from this if you write a lot of Dockerfiles. Every Docker layer has a storage cost. Hence, inside a Dockerfile, one tends to avoid unnecessary layers. For example, the two following snippets yield the same results.
Dockerfile RUN pip install pip-tools RUN pip-compile RUN pip install -r requirements.txt RUN pip install gunicorn RUN opentelemetry-bootstrap -a install

Dockerfile RUN pip install pip-tools \ && pip-compile \ && pip install -r requirements.txt \ && pip install gunicorn \ && opentelemetry-bootstrap -a install

The first snippet creates five layers, while the second creates only one; however, the first is more readable than the second. With heredocs, we get a more readable syntax that still creates a single layer:

Dockerfile RUN <<EOF pip install pip-tools pip-compile pip install -r requirements.txt pip install gunicorn opentelemetry-bootstrap -a install EOF

Heredocs are a great way to have more readable and more optimized Dockerfiles. Try them!

Explicit API Call on the JVM

In the initial demo, I showed two approaches: the first uses auto-instrumentation, which requires no additional action; the second uses manual instrumentation with Spring annotations. In the improved version, I wanted to demo an explicit call with the API. The use case is analytics, and it uses a message queue: I get the trace data from the HTTP call and create a message with that data so the subscriber can use it as a parent. First, we need to add the OpenTelemetry API dependency to the project. We inherit the version from the Spring Boot Starter parent POM:

XML <dependency> <groupId>io.opentelemetry</groupId> <artifactId>opentelemetry-api</artifactId> </dependency>

At this point, we can access the API. OpenTelemetry offers a static method to get an instance:

Kotlin val otel = GlobalOpenTelemetry.get()

At runtime, the agent will work its magic to return the instance. A simplified class diagram, focused on tracing, ties these types together. In turn, the flow goes something like this:

Kotlin val otel = GlobalOpenTelemetry.get() //1 val tracer = otel.tracerBuilder("ch.frankel.catalog").build() //2 val span = tracer.spanBuilder("AnalyticsFilter.filter") //3 .setParent(Context.current()) //4 .startSpan() //5 // Do something here span.end() //6

1. Get the underlying OpenTelemetry instance. 2. Get the tracer builder and "build" the tracer. 3. Get the span builder. 4. Add the span to the whole chain by setting the current context as its parent. 5. Start the span. 6. End the span; after this step, the data is sent to the configured OpenTelemetry endpoint.

Adding a Message Queue

When I did the talk based on the post, attendees frequently asked whether OpenTelemetry would work with messaging systems such as MQ or Kafka. While I thought it was the case in theory, I wanted to make sure of it, so I added a message queue to the demo under the pretext of analytics. The Kotlin service will publish a message to an MQTT topic on each request. A NodeJS service will subscribe to the topic.

Attaching OpenTelemetry Data to the Message

So far, OpenTelemetry automatically reads the context to find out the trace ID and the parent span ID. Whatever the approach, auto-instrumentation or manual, annotations-based or explicit, the library takes care of it. I didn't find any existing similar automation for messaging; we need to code our way in. The gist of OpenTelemetry propagation is the traceparent HTTP header. We need to read it and send it along with the message. First, let's add the MQTT API to the project.

XML <dependency> <groupId>org.eclipse.paho</groupId> <artifactId>org.eclipse.paho.mqttv5.client</artifactId> <version>1.2.5</version> </dependency>

Interestingly enough, the API doesn't allow access to the traceparent directly. However, we can reconstruct it via the SpanContext class. I'm using MQTT v5 for my message broker.
Note that v5 allows metadata to be attached to the message; when using v3, the message payload itself needs to wrap it.

Kotlin val spanContext = span.spanContext //1 val message = MqttMessage().apply { properties = MqttProperties().apply { val traceparent = "00-${spanContext.traceId}-${spanContext.spanId}-${spanContext.traceFlags}" //2 userProperties = listOf(UserProperty("traceparent", traceparent)) //3 } qos = options.qos isRetained = options.retained val hostAddress = req.remoteAddress().map { it.address.hostAddress }.getOrNull() payload = Json.encodeToString(Payload(req.path(), hostAddress)).toByteArray() //4 } val client = MqttClient(mqtt.serverUri, mqtt.clientId) //5 client.publish(mqtt.options, message) //6

1. Get the span context. 2. Construct the traceparent from the span context, according to the W3C Trace Context specification. 3. Set the message metadata. 4. Set the message body. 5. Create the client. 6. Publish the message.

Getting OpenTelemetry Data From the Message

The subscriber is a new component based on NodeJS. First, we configure the app to use the OpenTelemetry trace exporter:

JavaScript const sdk = new NodeSDK({ resource: new Resource({[SemanticResourceAttributes.SERVICE_NAME]: 'analytics'}), traceExporter: new OTLPTraceExporter({ url: `${collectorUri}/v1/traces` }) }) sdk.start()

The next step is to read the metadata, recreate the context from the traceparent, and create a span.

JavaScript client.on('message', (aTopic, payload, packet) => { if (aTopic === topic) { console.log('Received new message') const data = JSON.parse(payload.toString()) const userProperties = {} if (packet.properties['userProperties']) { //1 const props = packet.properties['userProperties'] for (const key of Object.keys(props)) { userProperties[key] = props[key] } } const activeContext = propagation.extract(context.active(), userProperties) //2 const tracer = trace.getTracer('analytics') const span = tracer.startSpan( //3 'Read message', {attributes: {path: data['path'], clientIp: data['clientIp']}}, activeContext, ) span.end() //4 } })

1. Read the metadata. 2. Recreate the context from the traceparent. 3. Create the span. 4. End the span.

For the record, I tried to migrate to TypeScript, but when I did, I didn't receive the message. Help or hints are very welcome!

Apache APISIX for Messaging

Though it's not common knowledge, Apache APISIX can proxy HTTP calls as well as UDP and TCP messages. It only offers a few plugins at the moment, but it will add more in the future. An OpenTelemetry one will surely be part of them. In the meantime, let's prepare for it. The first step is to configure Apache APISIX to allow both HTTP and TCP:

YAML apisix: proxy_mode: http&stream #1 stream_proxy: tcp: - addr: 9100 #2 tls: false

1. Configure APISIX for both modes. 2. Set the TCP port.

The next step is to configure TCP routing:

YAML upstreams: - id: 4 nodes: "mosquitto:1883": 1 #1 stream_routes: #2 - id: 1 upstream_id: 4 plugins: mqtt-proxy: #3 protocol_name: MQTT protocol_level: 5 #4

1. Define the MQTT queue as the upstream. 2. Define the "streaming" route; APISIX treats everything that's not HTTP as streaming. 3. Use the MQTT proxy; note that APISIX also offers a Kafka-based one. 4. Set the MQTT protocol level; for MQTT versions above 3, it should be 5.

Finally, we can replace the MQTT URLs in the Docker Compose file with APISIX URLs.

Conclusion

I've described several items I added to improve my OpenTelemetry demo in this post. While most are indeed related to OpenTelemetry, some of them aren't. I may add another component in yet another stack: a front-end.
The complete source code for this post can be found on GitHub.

By Nicolas Fränkel
