Data Engineering
Over a decade ago, DZone welcomed the arrival of its first-ever data-centric publication. Since then, the trends surrounding the data movement have held many titles — big data, data science, advanced analytics, business intelligence, data analytics, and quite a few more. Despite the varying vernacular, the purpose has remained the same: to build intelligent, data-driven systems. The industry has come a long way from organizing unstructured data and driving cultural acceptance to adopting today's modern data pipelines and embracing business intelligence capabilities.

This year's Data Engineering Trend Report draws all former terminology, advancements, and discoveries into the larger picture, illustrating where we stand today along our unique, evolving data journeys. Within these pages, readers will find the keys to successfully building a foundation for fast and vast data intelligence across their organization. Our goal is for the contents of this report to help guide individual contributors and businesses alike as they strive for mastery of their data environments.
Kubernetes has become the standard for container orchestration. Although APIs are a key part of most architectures, integrating API management directly into this ecosystem requires careful consideration and significant effort. Traditional API management solutions often struggle to cope with the dynamic, distributed nature of Kubernetes. This article explores these challenges, discusses solution paths, shares best practices, and proposes a reference architecture for Kubernetes-native API management.

The Complexities of API Management in Kubernetes

Kubernetes is a robust platform for managing containerized applications, offering self-healing, load balancing, and seamless scaling across distributed environments. This makes it ideal for microservices, especially in large, complex infrastructures where declarative configurations and automation are key. According to a 2023 CNCF survey, 84% of organizations are adopting or evaluating Kubernetes, highlighting the growing demand for Kubernetes-native API management to improve scalability and control in cloud native environments.

However, API management within Kubernetes brings its own complexities. Key tasks like routing, rate limiting, authentication, authorization, and monitoring must align with the Kubernetes architecture, often involving multiple components like ingress controllers (for external traffic) and service meshes (for internal communications). The overlap between these components raises questions about when and how to use them effectively in API management. While service meshes handle internal traffic security well, additional layers of API management may be needed to manage external access, such as authentication, rate limiting, and partner access controls.

Traditional API management solutions, designed for static environments, struggle to scale in Kubernetes' dynamic, distributed environment. They often face challenges in integrating with native Kubernetes components like ingress controllers and service meshes, leading to inefficiencies, performance bottlenecks, and operational complexities. Kubernetes-native API management platforms are better suited to handle these demands, offering seamless integration and scalability.

Beyond these points of confusion, there are other key challenges that make API management in Kubernetes a complex task:

- Configuration management: Managing API configurations across multiple Kubernetes environments is complex because API configurations often exist outside Kubernetes-native resources (kinds), requiring additional tools and processes to integrate them effectively into Kubernetes workflows.
- Security: Securing API communication and maintaining consistent security policies across multiple Kubernetes clusters is a complex task that requires automation. Additionally, some API-specific security policies and enforcement mechanisms are not natively supported in Kubernetes.
- Observability: Achieving comprehensive observability for APIs in distributed Kubernetes environments is difficult, requiring users to separately configure tools that can trace calls, monitor performance, and detect issues.
- Scalability: API management must scale alongside growing applications, balancing performance and resource constraints, especially in large Kubernetes deployments.
Embracing Kubernetes-Native API Management

As organizations modernize, many are shifting from traditional API management to Kubernetes-native solutions, which are designed to fully leverage Kubernetes' built-in features like ingress controllers, service meshes, and automated scaling. Unlike standard API management, which often requires manual configuration across clusters, Kubernetes-native platforms provide seamless integration, consistent security policies, and better resource efficiency as part of the Kubernetes configurations. Here are a few ways that you can embrace Kubernetes-native API management:

- Represent APIs and related artifacts the Kubernetes way: Custom Resource Definitions (CRDs) allow developers to define their own Kubernetes-native resources, including custom objects that represent APIs and their associated policies. This approach enables developers to manage APIs declaratively, using Kubernetes manifests, which are version-controlled and auditable. For example, a CRD could be used to define an API's rate-limiting policy or access controls, ensuring that these configurations are consistently applied across all environments.
- Select the right gateway for Kubernetes integration: Traditional Kubernetes ingress controllers primarily handle basic HTTP traffic management but lack the advanced features necessary for comprehensive API management, such as fine-grained security, traffic shaping, and rate limiting. Kubernetes-native API gateways, built on the Kubernetes Gateway API Specification, offer these advanced capabilities while seamlessly integrating with Kubernetes environments. These API-specific gateways can complement or replace traditional ingress controllers, providing enhanced API management features. It's important to note that the Gateway API Specification focuses mainly on routing capabilities and doesn't inherently cover all API management functionalities like business plans, subscriptions, or fine-grained permission validation. API management platforms often extend these capabilities to support features like monetization and access control. Therefore, selecting the right gateway that aligns with both the Gateway API Specification and the organization's API management needs is critical. In API management within an organization, a cell-based architecture may be needed to isolate components or domains. API gateways can be deployed within or across cells to manage communication, ensuring efficient routing and enforcement of policies between isolated components.
- Control plane and portals for different user personas: While gateways manage API traffic, API developers, product managers, and consumers expect more than basic traffic handling. They need features like API discovery, self-service tools, and subscription management to drive adoption and business growth. Building a robust control plane that lets users control these capabilities is crucial. This ensures a seamless experience that meets both technical and business needs.
- GitOps for configuration management: GitOps with CRDs extends to API management by using Git repositories for version control of configurations, policies, and security settings. This ensures that API changes are tracked, auditable, and revertible, which is essential for scaling. CI/CD tools automatically sync the desired state from Git to Kubernetes, ensuring consistent API configuration across environments (a sketch of such declaratively managed artifacts follows below).
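The sketch below pairs a standard Gateway API HTTPRoute, which covers routing, with a hypothetical APIPolicy custom resource that layers rate limiting and authentication on top of it. The APIPolicy group, kind, and fields are illustrative assumptions rather than any specific platform's API; the HTTPRoute follows the upstream Gateway API schema.

YAML
# HTTPRoute comes from the Kubernetes Gateway API and covers routing only.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: orders-route
  namespace: api-prod
spec:
  parentRefs:
    - name: edge-gateway            # assumed Gateway managed by the platform team
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /orders
      backendRefs:
        - name: orders-service
          port: 8080
---
# Hypothetical custom resource adding API management concerns on top of routing.
apiVersion: apim.example.com/v1alpha1
kind: APIPolicy
metadata:
  name: orders-api-policy
  namespace: api-prod
spec:
  targetRef:
    kind: HTTPRoute
    name: orders-route
  rateLimit:
    requests: 100
    unit: minute
  authentication:
    type: jwt
    issuer: https://idp.example.com

Because both resources are plain Kubernetes manifests, they can live in Git alongside the rest of the cluster configuration and be reconciled by a GitOps controller such as Argo CD or Flux.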
This approach integrates well with CI/CD pipelines, automating testing, reviewing, and deployment of API-related changes to maintain the desired state.

- Observability with OpenTelemetry: In Kubernetes environments, traditional observability tools struggle to monitor APIs interacting with distributed microservices. OpenTelemetry solves this by providing a vendor-neutral way to collect traces, metrics, and logs, offering essential end-to-end visibility. Its integration helps teams monitor API performance, identify bottlenecks, and respond to issues in real time, addressing the unique observability challenges of Kubernetes-native environments.
- Scalability with Kubernetes' built-in features: Kubernetes' horizontal pod autoscaling (HPA) adjusts API gateway pods based on load, but API management platforms must integrate with metrics like CPU usage or request rates for effective scaling. API management tools should ensure that rate limiting and security policies scale with traffic and apply policies specific to each namespace, supporting multi-environment setups. This integration allows API management solutions to fully leverage Kubernetes' scalability and isolation features.

Reference Architecture for Kubernetes-Native API Management

To design a reference architecture for API management in a Kubernetes environment, we must first understand the key components of the Kubernetes ecosystem and their interactions. If you're already familiar with Kubernetes, feel free to move directly to the architecture details. Below is a list of the key components of the Kubernetes ecosystem and the corresponding interactions:

- API gateway: Deploy an API gateway as ingress or as another gateway that supports the Kubernetes Gateway API Specification.
- CRDs: Use CRDs to define APIs, security policies, rate limits, and observability configurations.
- GitOps for lifecycle management: Implement GitOps workflows to manage API configurations and policies.
- Observability with OpenTelemetry: Integrate OpenTelemetry to collect distributed traces, metrics, and logs.
- Metadata storage with etcd: Use etcd, Kubernetes' distributed key-value store, for storing metadata such as API definitions, configuration states, and security policies.
- Security policies and RBAC: In Kubernetes, RBAC provides consistent access control for APIs and gateways, while network policies ensure traffic isolation between namespaces, securing API communications.

Key components like the control plane, including consumer and producer portals, along with rate limiting, key and token management services, and developer tools for API and configuration design, are essential to this reference architecture.

Figure: Reference architecture for Kubernetes-native API management

Conclusion: Embracing Kubernetes-Native API Management for Operational Excellence

Managing APIs in Kubernetes introduces unique challenges that traditional API management solutions are not equipped to handle. Kubernetes-native, declarative approaches are essential to fully leverage features like autoscaling, namespaces, and GitOps for managing API configurations and security. By adopting these native solutions, organizations can ensure efficient API management that aligns with Kubernetes' dynamic, distributed architecture. As Kubernetes adoption grows, embracing these native tools becomes critical for modern API-driven architectures.

This article was shared as part of DZone's media partnership with KubeCon + CloudNativeCon.
As with past technology adoption journeys, initial experimentation costs eventually shift to a focus on ROI. In a recent post on X, Andrew Ng extensively discussed GenAI model pricing reductions. This is great news, since GenAI models are crucial for powering the latest generation of AI applications. However, model swapping is also emerging as both an innovation enabler and a cost-saving strategy for deploying these applications. Even if you've already standardized on a specific model for your applications with reasonable costs, you might want to explore the added benefits of a multiple-model approach facilitated by Kubernetes.

A Multiple Model Approach to GenAI

A multiple-model operating approach enables developers to use the most up-to-date GenAI models throughout the lifecycle of an application. By operating in a continuous upgrade approach for GenAI models, developers can harness the specific strengths of each model as they shift over time. In addition, the introduction of specialized, or purpose-built, models enables applications to be tested and refined for optimal accuracy, performance, and cost.

Kubernetes, with its declarative orchestration API, is perfectly suited for rapid iteration in GenAI applications. With Kubernetes, organizations can start small and implement governance to conduct initial experiments safely and cost-effectively. Kubernetes' seamless scaling and orchestration capabilities facilitate model swapping and infrastructure optimization while ensuring high performance of applications.

Expect the Unexpected When Utilizing Models

While GenAI is an extremely powerful tool for driving enhanced user experience, it's not without its challenges. Content anomalies and hallucinations are well-known concerns for GenAI models. Without proper governance, raw models — those used without an app platform to codify governance — are more likely to be led astray or even manipulated into jailbreak scenarios by malicious actors. Such vulnerabilities can result in financial loss amounting to millions in token usage and severely impact brand reputation.

The financial implications of security failures are massive. A report by Cybercrime Magazine earlier this year suggests that cybercrime will cost upwards of $10 trillion annually by next year. Implementing effective governance and mitigation, such as brokering models through a middleware layer, will be critical to delivering GenAI applications safely, consistently, and at scale. Kubernetes can help with strong model isolation through separate clusters and then utilize a model proxy layer to broker the models to the application.

Kubernetes' resource tagging adds another layer of value by allowing you to run a diverse range of model types or sizes, requiring different accelerators, within the same infrastructure. This flexibility also helps with budget optimization, as it prevents defaulting to the largest, most expensive accelerators. Instead, you can choose a model and accelerator combination that strikes a balance between excellent performance and cost-effectiveness, ensuring the application remains efficient while adhering to budget constraints.

Example 1: Model curation for additional app platform governance and flexibility

Moreover, role-based access controls in Kubernetes ensure that only authorized individuals or apps can initiate requests to certain models in an individual cluster. This not only prevents unnecessary expenses from unauthorized usage but also enhances security across the board.
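As a minimal sketch of the access-control idea above, namespace-scoped RBAC can restrict who may deploy or modify the workloads that expose a given model. The namespace, role, and service account names are illustrative assumptions (as is the idea of one namespace per curated model); restricting which applications may actually call a model at runtime is typically handled by the model proxy layer or network policies in addition to RBAC.

YAML
# Only the bound subjects may manage workloads in the namespace hosting this model.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: model-operator
  namespace: llm-small               # assumed namespace dedicated to one curated model
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "pods"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: bind-model-operator
  namespace: llm-small
subjects:
  - kind: ServiceAccount
    name: genai-platform-team        # hypothetical platform engineering service account
    namespace: platform
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: model-operator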
Additionally, with the capacity to configure specific roles and permissions, organizations can better manage and allocate resources, minimize risks, and optimize operational efficiency. Rapidly evolving GenAI models benefit from these governance mechanisms while maximizing potential benefits.

Scaling and Abstraction for GenAI! Oh My!

The scale of the model you choose for your GenAI application can vary significantly depending on the application's requirements. Applications might work perfectly well with a simple, compact, purpose-built model versus a large, complex model that demands more resources. To ensure the optimal performance of your GenAI application, automating deployment and operations is crucial. Kubernetes can facilitate this automation across multiple clusters and hosts using GitOps or other methodologies, enabling platform engineers to expedite GenAI app operations.

One of the critical advantages of using Kubernetes for delivering GenAI apps is its ability to handle GPU- and TPU-accelerated workloads. Accelerators are essential for training and inferencing complex models quickly and efficiently. With Kubernetes, you can easily deploy and manage clusters with hardware accelerators, allowing you to scale your GenAI projects as needed without worrying about performance being limited by hardware. The same can be said for models optimized for modern CPU instruction sets, which helps avoid the need to schedule for scarcer GPU and TPU resources.

In addition to handling GPU-accelerated workloads, Kubernetes also has features that make it well-suited for inferencing tasks. By utilizing capabilities like Horizontal Pod Autoscaling, Kubernetes can dynamically adjust resources based on the demand for your inferencing applications (a minimal autoscaling sketch appears at the end of this article). This ensures that your applications are always running smoothly and can handle sudden spikes in traffic. On top of all this, the ML tooling ecosystem for Kubernetes is quite robust and allows for keeping data closer to the workloads. For example, JupyterHub can be used to deploy Jupyter notebooks right next to the data with GPUs auto-attached, allowing for enhanced latency and performance during the model experimentation phase.

Getting Started With GenAI Apps With Kubernetes

Platform engineering teams can be key enablers for GenAI application delivery. By simplifying and abstracting away complexity from developers, platform engineering can facilitate ongoing innovation with GenAI by curating models based on application needs. Developers don't need to acquire new skills in model evaluation and management; they can simply utilize the resources available in their Kubernetes-based application platform. Also, platform engineering can help with improved accuracy and cost-effectiveness of GenAI apps by continuously assessing accuracy and optimizing costs through model swapping. With frequent advancements and the introduction of smaller GenAI models, applications can undergo refinements over time.

Example 2: How VMware Cloud Foundation + VMware Tanzu leverage Kubernetes

Kubernetes is pivotal in this continuous GenAI model upgrade approach, offering flexibility to accommodate model changes while adding access governance to the models. Kubernetes also facilitates seamless scaling and optimization of infrastructure while maintaining high-performance applications. Consequently, developers have the freedom to explore various models, and platform engineering can curate and optimize placement for those innovations.
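To ground the autoscaling point above, here is a minimal Horizontal Pod Autoscaler sketch for an inference Deployment. The Deployment name, namespace, and thresholds are illustrative assumptions; in practice, teams often scale on request rate or GPU metrics exposed through a custom metrics adapter rather than CPU alone.

YAML
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-inference-hpa
  namespace: genai-apps               # assumed application namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-inference               # hypothetical model-serving Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70      # scale out when average CPU utilization exceeds 70%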
This article was shared as part of DZone's media partnership with KubeCon + CloudNativeCon.
We are living in a world where the internet is an inseparable part of our lives, and with the growth of cloud computing and increased demand for AI/ML-based applications, the demand for network capacity is unstoppable. As networks scale exponentially, classical topologies and designs are struggling to keep in sync with the rapidly evolving demands of modern IT infrastructure. Network management is getting complex due to the sheer amount of network infrastructure and links. AI-driven intent-based networking emerges as a potential solution, promising to reshape our approach to network management — but is it truly the solution to this problem it claims to be? Let's dive into the details to understand how intent-based networking will be shaping the future of network management.

What Is Intent-Based Networking?

Traditional intent-based networking (IBN) evolved from software-defined networking (SDN). SDN is a very popular approach in network automation where software-defined controllers and APIs communicate with the physical infrastructure. IBN is a natural progression of SDN that combines intelligence, analytics, machine learning, and orchestration to automate network management. It translates high-level business intent into network policies to configure the underlying network. IBN abstracts the complex parts of the underlying hardware and network configuration to allow users to express their desired intent in natural language.

AI-driven IBN brings together intelligence, analytics, machine learning, and orchestration to enhance traditional IBN capabilities. It can translate these intents into specific network configurations and policies more effectively, adapting to changing network conditions and requirements. Most modern, advanced IBN solutions include ML and NLP to some degree, making them AI-driven.

- Problem statement: The user wants to balance bulk data transfer and low-latency traffic using available bandwidth.
- Intent: "Provision low latency for high-performance computing and GPU-accelerated database queries while supporting large dataset transfers."
- Network automation with AI-driven IBN: Build and configure intelligent Quality of Service (QoS) policies and device configurations that prioritize low-latency traffic for latency-sensitive workloads over large database queries. Prioritize low latency with the high-priority QoS marking while allowing high-throughput transfers to utilize remaining bandwidth. Machine learning models continuously adjust these policies based on observed application performance.

Key Components of Traditional IBN Systems

To understand the advancements of AI-driven IBN, let's first examine the key components of a traditional IBN system. In a traditional IBN setup, the system consists of five main components that allow users to interact with the system and for the system to devise actions and implement changes in the network based on user intent.

Figure: AI-driven intent-based networking

Intent Interface

This is the primary point of interaction between users and the IBN system. Network administrators and users can express their desired network configuration in natural language, eliminating the dependency on complex CLI commands and manual configurations. In traditional IBN, this interface typically relies on predefined templates and rule-based interpretation of user inputs.

Intent Translation Engine

This is the heart of the IBN system, where business intent is processed through advanced algorithms and techniques and translated into actionable network configurations and policies.
It bridges the gap between human-understandable intents and machine-executable network configurations. Traditional IBN systems use predetermined algorithms and logic trees for this translation process.

Network Abstraction Layer

This layer provides a unified view of the network, abstracting the underlying complexity of network infrastructure and protocols. It enables the IBN system to work seamlessly with heterogeneous network infrastructures. This abstraction in traditional IBN is often static and may require manual updates to accommodate new network elements or configurations.

Automation and Orchestration Engine

This layer implements translated intents across network infrastructure and leverages software-defined networking to update network configuration and policies. In traditional IBN, this automation is based on predefined scripts and workflows.

Continuous Validation and Assurance

This feedback loop constantly monitors the network to ensure it follows the requested intent and makes necessary adjustments to maintain optimal performance. Traditional IBN systems perform this validation based on set thresholds and predefined performance metrics.

The Role of AI in IBN

Integration of AI with traditional intent-based networking allows the system to understand, process, and execute high-level intents without reliance on complex CLI commands or manual configurations, and provides greater flexibility in network management. In this section, we will discuss how AI enhances intent-based networking capabilities and automates network management tasks.

The field of AI consists of various subfields such as natural language processing (NLP), machine learning (ML), computer vision (CV), and robotics, among others. NLP allows systems to understand and process human language, while ML allows systems to learn from data without explicit programming. These and other AI subfields working together help us build intelligent systems. NLP and ML play a significant role in AI-driven IBN by giving the system the ability to understand and execute high-level intents.

Natural Language Processing (NLP)

NLP serves as the primary interface between network users and the IBN system. It allows users to express their intents in natural language and translates them into complex network configurations. Key applications of NLP in IBN include intent translation, context understanding and processing, and automated network config generation.

Machine Learning (ML)

In AI-driven IBN, ML algorithms allow the system to learn from the current network state, predict future states based on topology and network changes, and make intelligent decisions for network optimization. One of the key applications of ML in IBN is traffic engineering, where service providers aim to understand network behavior, predict the future state, and adjust network capacity and resources optimally and efficiently.

AI-driven IBN is an intelligent system that incorporates NLP, ML, and other AI subfields to provide the central framework for decision-making and problem-solving in IBN. It enables automated network design, network data analysis, intelligent troubleshooting, policy enforcement, and forecasting of potential failure scenarios.

Application in High-Performance Computing Networks

AI-driven IBN is a promising solution for hyperscale cloud providers who offer high-performance computing (HPC) environments, where the demands for high throughput, low latency, flexibility, and resource optimization are especially stringent.
Some key applications include:

Dynamic Resource Allocation

In HPC, AI-driven IBN systems use algorithms such as Q-Learning and Random Forests to allocate network resources optimally by analyzing and predicting current and future resource demand. These systems can bring flexibility and efficiency by utilizing HPC resources optimally and maximizing performance and network throughput.

Workflow-Optimized Traffic Engineering

AI-driven IBN systems can continuously analyze the current and future network state and demand to optimize network configurations. This is done by using time series forecasting (e.g., ARIMA, Prophet) for traffic prediction and anomaly detection algorithms for identifying unusual traffic patterns. The network configuration optimization might involve shifting traffic from a congested primary path to a secondary path, or finding high-bandwidth paths for data transfer stages and low-latency paths for distributed computing stages.

Fault Tolerance and Resilience

IBN systems can predict and simulate potential failures for hardware resources and take proactive action to avoid catastrophic failures. They can triage, auto-mitigate, and remediate events without interrupting network performance and service. To achieve this, IBN systems employ various algorithms and techniques. Predictive failure analysis using machine learning models like Random Forests or Support Vector Machines helps identify potential hardware failures before they occur. Self-healing networks leveraging reinforcement learning algorithms automatically reconfigure network paths when issues arise. These algorithms work together within the IBN framework to maintain robust network performance even in challenging conditions.

Figure: AI-driven IBN in high-performance computing networks

Challenges and Future Directions

- The availability of sufficient, good-quality data can be the first hurdle that companies have to overcome to implement AI-driven IBN.
- The black-box nature of some AI/ML models can lead to opaque decision-making, which needs to be overcome by making these processes transparent and understandable.
- Enterprise networks are complex and diverse in terms of hardware, configuration, and protocols; managing such enormous network infrastructure requires a lot of computational resources and power.
- Integrating IBN systems with existing network infrastructure and automation frameworks is non-trivial.
- Complying with security standards, policies, and authentication requirements becomes challenging with scale and complexity.
- Ensuring IBN systems can make decisions and implement changes quickly enough to meet the performance requirements of modern networks is difficult.

As AI-driven IBN systems mature, we can expect to see increased network automation, enhanced machine learning algorithms, improved security, and greater efficiency in network management. However, realizing this future will require overcoming these challenges and addressing the skill gap in the networking industry.

Conclusion

AI-driven intent-based networking represents a significant advancement in how service providers can operate and manage their complex networks. With the integration of AI into IBN, users can navigate through this complexity, bring operational efficiency, get real-time visibility, and automate network management to bring the network state in sync with the business intent. The future of networking lies in systems that can analyze, interpret, and process human intents, and achieve network autonomy by transforming network operations through intent-based networking.
As a Java developer, most of my focus is on the back-end side of debugging. Front-end debugging poses different challenges and has sophisticated tools of its own. Unfortunately, print-based debugging has become the norm in the front end. To be fair, it makes more sense there, as the cycles are different and the problem is always a single-user problem. But even if you choose to use console.log, there's a lot of nuance to pick up there.

Instant Debugging With the debugger Keyword

A cool yet powerful tool in JavaScript is the debugger keyword. Instead of simply printing a stack trace, we can use this keyword to launch the debugger directly at the line of interest. That is a fantastic tool that instantly brings your attention to a bug. I often use it in my debug builds of the front end instead of just printing an error log.

How to Use It

Place the debugger keyword within your code, particularly within error-handling methods. When the code execution hits this line, it automatically pauses, allowing you to inspect the current state, step through the code, and understand what's going wrong. Notice that while this is incredibly useful during development, we must remember to remove or conditionally exclude debugger statements in production environments. A release build should not include these calls in a live production site.

Triggering Debugging From the Console

Modern browsers allow you to invoke debugging directly from the console, adding an additional layer of flexibility to your debugging process.

Example

By using the debug(functionName) command in the console, you can set a breakpoint at the start of the specified function. When this function is subsequently invoked, the execution halts, sending you directly into the debugger.

JavaScript
function hello(name) {
  console.log("Hello " + name);
}

debug(hello);
hello("Shai");

This is particularly useful when you want to start debugging without modifying the source code, or when you need to inspect a function that's only defined in the global scope.

DOM Breakpoints: Monitoring DOM Changes

DOM breakpoints are an advanced feature in Chrome and Firebug (a Firefox plugin) that allow you to pause execution when a specific part of the DOM is altered. To use them, right-click on the desired DOM element, select "Break On," and choose the specific mutation type you are interested in (e.g., subtree modifications, attribute changes, etc.).

DOM breakpoints are extremely powerful for tracking down issues where DOM manipulation causes unexpected results, such as dynamic content loading or changes in the user interface that disrupt the intended layout or functionality. Think of them like the field breakpoints we discussed in the past. These breakpoints complement traditional line and conditional breakpoints, providing a more granular approach to debugging complex front-end issues. This is a great tool to use when the DOM is manipulated by an external dependency.

XHR Breakpoints: Uncovering Hidden Network Calls

Understanding who initiates specific network requests can be challenging, especially in large applications with multiple sources contributing to a request. XHR (XMLHttpRequest) breakpoints provide a solution to this problem. In Chrome or Firebug, set an XHR breakpoint by specifying a substring of the URI you wish to monitor. When a request matching this pattern is made, the execution stops, allowing you to investigate the source of the request.
This tool is invaluable when dealing with dynamically generated URIs or complex flows where tracking the origin of a request is not straightforward. Notice that you should be selective with the filters you set: leaving the filter blank will cause the breakpoint to trigger on all XHR requests, which can become overwhelming.

Simulating Environments for Debugging

Sometimes, the issues you need to debug are specific to certain environments, such as mobile devices or different geographical locations. Chrome and Firefox offer several simulation tools to help you replicate these conditions on your desktop.

- Simulating user agents: Change the browser's user agent to mimic different devices or operating systems. This can help you identify platform-specific issues or debug server-side content delivery that varies by user agent.
- Geolocation spoofing: Modify the browser's reported location to test locale-specific features or issues. This is particularly useful for applications that deliver region-specific content or services.
- Touch and device orientation emulation: Simulate touch events or change the device orientation to see how your application responds to mobile-specific interactions. This is crucial for ensuring a seamless user experience across all devices.

These are things that are normally very difficult to reproduce; e.g., touch-related issues are often challenging to debug on the device. By simulating them in the desktop browser, we can shorten the debug cycle and use the tooling available on the desktop.

Debugging Layout and Style Issues

CSS and HTML bugs can be particularly tricky, often requiring a detailed examination of how elements are rendered and styled.

Inspect Element

The "inspect element" tool is the cornerstone of front-end debugging, allowing you to view and manipulate the DOM and CSS in real time. As you make changes, the page updates instantly, providing immediate feedback on your tweaks.

Addressing Specificity Issues

One common problem is CSS specificity, where a more specific selector overrides the styles you intend to apply. The inspect element view highlights overridden styles, helping you identify and resolve conflicts.

Firefox vs. Chrome

While both browsers offer robust tools, they have different approaches to organizing these features. Firefox's interface may seem more straightforward, with fewer tabs, while Chrome organizes similar tools under various tabs, which can either streamline your workflow or add complexity, depending on your preference.

Final Word

There are many front-end tools that I want to discuss in the coming posts. I hope you picked up a couple of new debugging tricks in this first part. Front-end debugging requires a deep understanding of browser tools and JavaScript capabilities. By mastering the techniques outlined in this post — instant debugging with the debugger keyword, DOM and XHR breakpoints, environment simulation, and layout inspection — you can significantly enhance your debugging efficiency and deliver more robust, error-free web applications.
Lifecycle Development With AI

We have seen a huge shift in the way developers and consultants are using Generative AI (GenAI) tools to create working microservices. A new tool named WebGenAI begins the process with a simple prompt to create a complete API microservice with a running React-Admin user interface and has the ability to iterate and add new features or even logic. WebGenAI is built on top of the existing Python open-source framework ApiLogicServer. The entire project can be downloaded as runnable Python code or a Docker container to use locally. It also pushes each iteration to GitHub, and you can run the application using Codespaces. This is usually where the full microservice lifecycle begins.

Greenfield Project (Ideation)

When a new project is started, it usually begins with paper documents that "explore and explain" the scope and direction, use cases, and workflows of the project. WebGenAI takes a prompt/model-driven approach to explore and visualize ideas. The ability to "iterate" over prior prompts to include new functionality helps the stakeholder and SME capture basic functionality. While this generative approach will never be the final release, each iteration will help get the project closer to the vision and capture requirements in real time.

WebGenAI, shown below, begins with a simple prompt: "Create a dog-walking business". The result is a complete running application with React-Admin pages, sample data, and the ability to download the entire source to explore locally or run from GitHub Codespaces.

Connect an Existing SQL Database

The WebGenAI tool also has a feature called ConnectDB to prompt for an SQL database (or Excel workbook), but this is intended to be used in conjunction with the local Docker version or an in-house cloud deployment. Not many enterprises will want to put their corporate database credentials into a public website. However, using the local Docker version, WebGenAI can take advantage of an existing database schema to build a complete running application and create an API (based on JSON API) for each selected table. The application that is created will allow the stakeholder to visualize data and navigate between parent/child relationships. While iteration is possible, this would not be the main use case for GenAI.

Local Installation

ApiLogicServer is an open-source project built on Python 3.12, based on the SQLAlchemy ORM, Flask, and SAFRS/JSON API. Once your virtual environment is ready, a simple installation using pip will include SQLAlchemy ORM 2.x, Flask, LogicBank, and other libraries to start and run your downloaded project locally. (Note: WebGenAI also has a running Docker version to skip the local installation.) In this example, Python and VSCode have already been installed.

Shell
python -m venv venv
source venv/bin/activate
pip install ApiLogicServer
cd myDownloadedProject
code .  # Press F5 to start and run in VSCode

The Developer Journey

WebGenAI will let you see your prompt and project come to life: the actual open-source code can then be downloaded to a local development platform to use with your own IDE (VSCode, PyCharm, IntelliJ, etc.). Once you download the project, you can use ApiLogicServer to organize it into folders (note: an IDE and a Python installation and configuration are required to run locally):

- api - Expose API endpoints and define custom API endpoints.
- config - Defines the project variables
- database - Defines the SQLAlchemy ORM
- logic - Declarative rules (derivations, events, and constraints)
- security - Declarative role-based access control
- devops - Scripts to build and deploy Docker containers
- test - Behave testing-driven tools
- ui - Both react-admin and Angular applications (using Ontimize from Imatia)

Natural Language Logic

Business logic is a critical component of any microservice. Explore tools like OpenAI, ChatGPT, and Copilot to see if they can take advantage of LogicBank, an open-source rules engine, to generate declarative rules (e.g., sum, count, formula, copy, and constraints). This is very similar to working with a spreadsheet in three dimensions (e.g., rows/columns and parent/child tables). Code completion based on the model and Copilot integration makes the developer experience very friendly. It is amazing to see Copilot in the IDE turn business user statements like these:

Markdown
Enforce the Check Credit requirement (do not generate check constraints):
1. Customer.balance <= credit_limit
2. Customer.balance = Sum(Order.amount_total where date_shipped is null)
3. Order.amount_total = Sum(Item.amount)
4. Item.amount = quantity * unit_price
5. Store the Item.unit_price as a copy from Product.unit_price
...

into LogicBank rules:

Python
Rule.constraint(validate=models.Customer,
                as_condition=lambda row: row.Balance <= row.CreditLimit,
                error_msg="balance ({round(row.Balance, 2)}) exceeds credit ({round(row.CreditLimit, 2)})")

Rule.sum(derive=models.Customer.Balance,
         as_sum_of=models.Order.AmountTotal,
         where=lambda row: row.ShippedDate is None and row.Ready == True)

Rule.sum(derive=models.Order.AmountTotal,
         as_sum_of=models.OrderDetail.Amount)

Rule.formula(derive=models.OrderDetail.Amount,
             as_expression=lambda row: row.UnitPrice * row.Quantity)

Rule.copy(derive=models.OrderDetail.UnitPrice,
          from_parent=models.Product.UnitPrice)

Declarative logic sits between the API and the SQLAlchemy ORM/SQL database. This allows all API CRUD use cases to be handled consistently and new rules to be added or changed without having to worry about the order of operations (this is done using a runtime DAG to manage the order of operations), much like a spreadsheet.

Integration Events

ApiLogicServer explains that this is a 40x improvement over writing logic code by hand. Logic is applied to attributes (derivations) or entities (events and constraints). If the logic is a bit more complex, a Python function can be called to complete the processing. ALS can make calls to a Kafka producer or external API systems, or use any Python libraries (e.g., math, Stripe, email, etc.) in the function. If the microservice needs to interface with external systems like Kafka, email, payment systems, or business process models like GBTec, the rules engine has a series of event types (early, row, commit, or flush). For example, a flush event is called after all the logic rules have been fired and the data has been written to the ORM (returning auto-increment keys). This would be the point to call an external system and pass a pre-configured payload. These events act like webhooks attached to specific API entities to integrate other systems (e.g., notify order shipment, update payment processing, send an email message, start a business process).

Python
def send_order_to_shipping(row: models.Order, old_row: models.Order, logic_row: LogicRow):
    """
    #als: Send Kafka message formatted by OrderShipping RowDictMapper

    Format row per shipping requirements, and send (e.g., a message)

    NB: the after_flush event makes Order.Id available. Contrast to congratulate_sales_rep().
    Args:
        row (models.Order): inserted Order
        old_row (models.Order): n/a
        logic_row (LogicRow): bundles curr/old row, with ins/upd/dlt logic
    """
    if (logic_row.is_inserted() and row.Ready == True) or \
       (logic_row.is_updated() and row.Ready == True and old_row.Ready == False):
        kafka_producer.send_kafka_message(logic_row=logic_row,
                                          row_dict_mapper=OrderShipping,
                                          kafka_topic="order_shipping",
                                          kafka_key=str(row.Id),
                                          msg="Sending Order to Shipping")

Rule.after_flush_row_event(on_class=models.Order, calling=send_order_to_shipping)

Custom API Endpoints

This is another one of the key features of ApiLogicServer. Not only will it expose the ORM entities as JSON API endpoints, but the developer can create custom API endpoints to perform specific tasks. One example of this is the Ontimize Angular user interface that is created from the generated API. Ontimize would normally send HTTP requests to its own Java/Swing server. By exposing a custom endpoint, ApiLogicServer acts like a bridge for all GET, PATCH, POST, and DELETE requests and returns a formatted JSON response to Ontimize. So all the Ontimize UI features are now supported without having to rewrite the front-end framework.

Authentication and Declarative Security

Security is easily added using the command line tools. ApiLogicServer offers a local Keycloak Docker container (pre-configured). An alternative is to use the "sql" provider type to run a local SQLite authentication database and model. Other authentication models can be easily added (e.g., LDAP, Active Directory, OKTA, or OAuth).

Shell
als add-auth --provider-type=keycloak --db-url=localhost

The Keycloak configuration settings are stored in the config/config.py file and can be easily changed for test and production deployment.

RBAC: Role-Based Access Control

Security can be enabled at any point in the development process. Once enabled, the developer will need to create roles to be assigned to the various users. Roles declare general CRUD access to API endpoints (read, insert, update, and delete). A user can have one or more roles, and specific grants can be added to modify role access to an API, which includes row-level security and tenancy filters.
Python
DefaultRolePermission(to_role=Roles.tenant, can_read=True, can_delete=True)
DefaultRolePermission(to_role=Roles.employee, can_read=True, can_delete=False)
DefaultRolePermission(to_role=Roles.customer, can_read=True, can_delete=False)
DefaultRolePermission(to_role=Roles.sales, can_read=True, can_delete=False)
DefaultRolePermission(to_role=Roles.public, can_read=True, can_delete=False)

GlobalFilter(global_filter_attribute_name="Client_id",
             roles_not_filtered=["sa"],
             filter='{entity_class}.Client_id == Security.current_user().client_id')

GlobalFilter(global_filter_attribute_name="SecurityLevel",
             roles_not_filtered=["sa", "manager"],
             filter='{entity_class}.SecurityLevel == 0')

#############################################
# Observe: Filters are AND'd, Grants are OR'd
#############################################

GlobalFilter(global_filter_attribute_name="Region",   # sales see only Customers in British Isles (9 rows)
             roles_not_filtered=["sa", "manager", "tenant", "renter", "public"],  # i.e., just sales
             filter='{entity_class}.Region == Security.current_user().region')

GlobalFilter(global_filter_attribute_name="Discontinued",
             roles_not_filtered=["sa", "manager"],
             filter='{entity_class}.Discontinued == 0')

Grant(on_entity=models.Customer,
      to_role=Roles.customer,
      filter=lambda: models.Customer.Id == Security.current_user().id,
      filter_debug="Id == Security.current_user().id")

Grant(on_entity=models.Customer,
      to_role=Roles.sales,
      filter=lambda: models.Customer.CreditLimit > 300,
      filter_debug="CreditLimit > 300")

Command Line Tools

The lifecycle of any project involves change. The API design may introduce new tables or columns, the ORM data model can change, the UI components may need new lookup relationships, or a new special API can be introduced. ApiLogicServer offers several command line tools and options to "rebuild-from-database" or "rebuild-from-model". These commands can rebuild the ORM or the UI components. There are also GenAI commands that can be used to create new applications from prompts.

Debugging API and Rules

Using VSCode on a local desktop allows the developer to run the microservice and place breakpoints to explore rules and custom API endpoints. This is a must-have for any developer learning a new system to see and understand what is going on and how to fix issues. A nice feature is the ability to link directly to GitHub for each iteration and run the project using GitHub Codespaces.

Behave API Testing

This style of test-driven development (TDD) begins with a feature and scenarios that need to be tested. The implementation is a simple Python program that breaks the scenario steps down into simple instructions. While I have not tried to generate these tests with Copilot, this may be a nice future feature request.

Gherkin
Feature: Salary Change

  Scenario: Audit Salary Change
    Given Employee 5 (Buchanan) - Salary 95k
    When Patch Salary to 200k
    Then Salary_audit row created

  Scenario: Manage ProperSalary
    Given Employee 5 (Buchanan) - Salary 95k
    When Retrieve Employee Row
    Then Verify Contains ProperSalary

  Scenario: Raise Must be Meaningful
    Given Employee 5 (Buchanan) - Salary 95k
    When Patch Salary to 96k
    Then Reject - Raise too small

Lifecycle With GitHub

All of the components from ApiLogicServer can be checked into GitHub (models, logic, configurations, tests, UI components, etc.) to support multi-developer teams. A sketch of a continuous integration workflow that a team might add on top of this is shown below.
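The following is a minimal, hypothetical GitHub Actions workflow sketch showing how a team might run the Behave suite from the project's test folder on every push. It is not part of ApiLogicServer itself; the Python version, working directory, and install commands are assumptions that would need to match the actual project layout, and in a real setup the API server would typically have to be started before the scenarios run.

YAML
name: behave-tests                  # hypothetical workflow, not shipped with ApiLogicServer
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install ApiLogicServer behave
      - name: Run Behave scenarios
        # Assumes the feature files live under the project's test folder;
        # starting the API server beforehand is omitted here for brevity.
        run: behave
        working-directory: test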
DevOps Deployment

The ApiLogicServer devops directory provides a series of directories to build an image, use NGINX, and deploy Docker containers. This takes DevOps a long way down the road to making the project visible on a cloud or on-premises server.

User Interfaces

ApiLogicServer provides a react-admin UI (seen in WebGenAI) that allows exploration and navigation of data. There is also an Ontimize Angular application (from Imatia) which provides a more full-featured UX developer experience. Both of these are created from a YAML model file, which can be edited and the pages regenerated from the YAML. Ontimize gives the UI/UX team a pre-configured and ready-to-run set of pages for all CRUD operations for every API endpoint. Ontimize is a mature, extensible Angular framework including charts, PDF reports, maps, and editable grids. Ontimize will introduce TypeScript, Node.js, and NPM into the development stack, but the look and feel of an Angular application will move the UX closer to production.

Summary

WebGenAI starts the process of prompting, iteration, and generation of an API microservice. Once the base model is ready, the developer team takes over to modify the model, add logic and security, write tests, create a custom UI, and build and deploy Docker containers to the cloud. While this is not exactly low-code, it is a rich platform of integrated services that makes the development of a running microservice a new standard, combining AI generation with open-source platform tools. You can try a limited web version by providing your GitHub or Google information. Your entire application can be downloaded as a Docker container or a complete source library running the ApiLogicServer framework.
Continuous Integration and Continuous Deployment (CI/CD) pipelines are crucial for modern software development. This article explores advanced techniques to optimize these pipelines, enhancing efficiency and reliability for enterprise-level operations.

Parallelization Using Matrix Builds

The following GitHub Actions job uses the matrix strategy to run CI tests in parallel:

YAML
jobs:
  CI-Test:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        node-version: [14.x, 16.x, 18.x]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v3
      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - run: npm test

The above job is executed across different platforms and Node.js versions simultaneously, thereby significantly reducing overall execution time.

Caching Dependencies for Faster Builds

Caching dependencies and reusing the cache for future runs reduces build time.

YAML
- uses: actions/cache@v3
  with:
    path: ~/.npm
    key: ${{ runner.OS }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.OS }}-node-

The workflow caches npm dependencies and reuses the downloaded dependencies for future workflow runs, reducing build time and alleviating network congestion.

Reusable Workflows

Reusable workflows in GitHub Actions allow you to create a workflow with scripts that can be used across multiple workflows, improving efficiency and reducing code duplication. Here is a reusable workflow that can be used to build Docker images.

repo/.github/workflows/reusable-docker-build-publish.yml@main

YAML
name: Docker Build and Publish Workflow
on:
  workflow_call:
    inputs:
      dockerfile-path:
        description: 'Path to Dockerfile'
        type: string
        required: true
      image-name:
        description: 'Name of the Docker image'
        type: string
        required: true
      image-tag:
        description: 'Image Tag'
        type: string
        required: true
    secrets:
      REGISTRY_URL:
        description: 'Registry URL'
        required: true
      REGISTRY_USERNAME:
        description: 'Registry username'
        required: true
      REGISTRY_PASSWORD:
        description: 'Registry password'
        required: true

jobs:
  build-and-publish:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Log in to Registry
        run: docker login "${{ secrets.REGISTRY_URL }}" -u "${{ secrets.REGISTRY_USERNAME }}" --password "${{ secrets.REGISTRY_PASSWORD }}"
      - name: Build Docker image
        run: docker build -t ${{ inputs.image-name }}:${{ inputs.image-tag }} -f ${{ inputs.dockerfile-path }} .
      - name: Push Docker image
        run: docker push ${{ inputs.image-name }}:${{ inputs.image-tag }}

The reusable workflow expects the following inputs:

- dockerfile-path (required) - Path to Dockerfile, e.g., ./Dockerfile
- image-name (required) - Name of the Docker image, e.g., my-image
- image-tag (required) - Name of the Docker tag, e.g., 1.0.0, latest

Other workflows can call the reusable workflow to build the Docker images.
YAML
name: Docker Build and Publish Workflow
on:
  push:
    branches:
      - main

jobs:
  build_publish_image:
    uses: repo/.github/workflows/reusable-docker-build-publish.yml@main
    with:
      dockerfile-path: './Dockerfile'
      image-name: 'my-org/my-app'
      image-tag: 'latest'
    secrets:
      REGISTRY_URL: ${{ secrets.REGISTRY_URL }}             # Make sure this matches the secret in your reusable workflow
      REGISTRY_USERNAME: ${{ secrets.REGISTRY_USERNAME }}   # Same here
      REGISTRY_PASSWORD: ${{ secrets.REGISTRY_PASSWORD }}   # And here

Ensure that the secrets (REGISTRY_URL, REGISTRY_USERNAME, REGISTRY_PASSWORD) are set in the repository's Secrets section under Settings > Secrets and Variables > Actions. Modify the inputs as necessary to fit your repository's Docker image name, tag, and Dockerfile location.

Conditional Execution

Use conditional statements like "if" statements to avoid unnecessary workflow executions. For example, skip deployment steps based on certain conditions such as changes limited to specific branches, commits involving documentation changes, skipping PRs, etc.

YAML
jobs:
  Branches:
    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/test'
    runs-on: ubuntu-latest
    steps:
      - run: npm test

  # Check commit messages for the word "docs"
  CommitMessage:
    if: "!contains(github.event.head_commit.message, 'docs')"
    runs-on: ubuntu-latest
    steps:
      - run: echo "This is only executed if the commit message does not contain the word 'docs'"

  # Run the job only if there is a push to the main branch
  Deploy:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - run: echo "Deploying to production"

Job Concurrency

Cancel redundant jobs that could waste runner resources or overload the system.

YAML
concurrency:
  group: ${{ github.ref }}
  cancel-in-progress: true

Final Thought

By implementing the above strategies, you can greatly improve the speed and quality of software delivery pipelines.
The rapid rise of AI services has created a massive demand for computing resources, making efficient management of those resources a critical challenge. While running AI workloads with Kubernetes has come a long way, optimizing scheduling based on dynamic demand continues to be an area for improvement. Many organizations face constraints related to the cost and availability of GPU clusters worldwide and often rely on the same compute clusters for inference workloads and continuous model training and fine-tuning.

AI Model Training and Model Inferencing in Kubernetes

Training typically requires far more computational power than inferencing. On the other hand, inferencing is far more frequent than training, as it is used to make predictions repeatedly across many applications. Let's explore how we can harness the best of what the cloud has to offer with advances in Kubernetes to optimize resource allocation by prioritizing workloads dynamically and efficiently based on need.

The diagram below shows the process of training versus inferencing. For training, workloads may run less frequently but with more resources needed, as we essentially "teach" the model how to respond to new data. Once trained, a model is deployed and will often run on GPU compute instances to provide the best results with low latency. Inferencing will thus run more frequently, but not as intensely. All the while, we may go back and retrain a model to accommodate new data or even try other models that need to be trained before deployment.

Figure: AI Model Training vs. AI Model Inferencing

AI workloads, especially training, are like High-Performance Computing (HPC) workloads. Kubernetes wasn't designed for HPC, but because Kubernetes is open source and largely led by the community, there have been rapid innovations in this space. The need for optimization has led to the development of tools like KubeFlow and Kueue.

AI Workloads for Kubernetes

KubeFlow uses pipelines to simplify the steps in data science into logical blocks of operation and offers numerous libraries that plug into these steps so you can get up and running quickly. Kueue provides resource "flavors" that allow it to tailor workloads to the hardware provisioning available at the time and schedule the correct workloads accordingly (there's much more to it, of course). The community has done an outstanding job of addressing issues of scaling, efficiency, distribution, and scheduling with these tools and more.

Below is an example of how we can use Kubernetes to schedule and prioritize training and inference jobs on GPU clusters backed with Remote Direct Memory Access (RDMA, RoCEv2). Let's create some sample code to demonstrate this concept.

Note: In the code we use a fictional website, gpuconfig.com, for the GPU manufacturer. Also, <gpu name> is a placeholder for the specific GPU you wish to target.

YAML
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-<gpu name>
value: 1000000
globalDefault: false
description: "This priority class should be used for high priority <GPU NAME> GPU jobs only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: medium-priority-<gpu name>
value: 100000
globalDefault: false
description: "This priority class should be used for medium priority <GPU NAME> GPU jobs."
---
apiVersion: v1
kind: Pod
metadata:
  name: high-priority-gpu-job
spec:
  priorityClassName: high-priority-<gpu name>
  containers:
    - name: gpu-container
      image: gpu/<gpu image>
      command: ["<gpu vendor>-smi"]
      resources:
        limits:
          gpuconfig.com/gpu: 1
  nodeSelector:
    gpu-type: <gpu name>
    rdma: "true"
---
apiVersion: v1
kind: Pod
metadata:
  name: medium-priority-gpu-job
spec:
  priorityClassName: medium-priority-<gpu name>
  containers:
    - name: gpu-container
      image: gpu/<gpu image>
      command: ["<gpu vendor>-smi"]
      resources:
        limits:
          gpuconfig.com/gpu: 1
  nodeSelector:
    gpu-type: <gpu name>
    rdma: "true"

This Kubernetes configuration demonstrates how to prioritize jobs on our GPU nodes using an RDMA backbone. Let's break down the key components:

1. PriorityClasses: We've defined two priority classes for our GPU jobs:
high-priority-<gpu name>: For critical jobs that need immediate execution.
medium-priority-<gpu name>: For jobs that are important but can wait if necessary.

2. Pod Specifications: We've created two sample pods to show how to use these priority classes:
high-priority-gpu-job: Uses the high-priority-<gpu name> class.
medium-priority-gpu-job: Uses the medium-priority-<gpu name> class.

3. Node Selection: Both pods use nodeSelector to ensure they're scheduled on specific GPU nodes with RDMA:

YAML
nodeSelector:
  gpu-type: <gpu name>
  rdma: "true"

4. Resource Requests: Each pod requests one GPU:

YAML
resources:
  limits:
    gpuconfig.com/gpu: 1

Kubernetes uses priority classes to determine the order in which pods are scheduled and which pods are evicted if resources are constrained. Here's an example of how you might create a CronJob that uses a high-priority class:

YAML
apiVersion: batch/v1
kind: CronJob
metadata:
  name: high-priority-ml-training
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          name: ml-training-job
        spec:
          priorityClassName: high-priority-<gpu name>
          containers:
            - name: ml-training
              image: your-ml-image:latest
              resources:
                limits:
                  gpuconfig.com/gpu: 2
          restartPolicy: OnFailure
          nodeSelector:
            gpu-type: <gpu name>
            rdma: "true"

GPU Resource Management in Kubernetes
Below are some examples of GPU resource management in Kubernetes.

YAML
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: ml-workloads
spec:
  hard:
    requests.gpuconfig.com/gpu: 8
    limits.gpuconfig.com/gpu: 8
---
apiVersion: v1
kind: LimitRange
metadata:
  name: gpu-limits
  namespace: ml-workloads
spec:
  limits:
    - default:
        gpuconfig.com/gpu: 1
      defaultRequest:
        gpuconfig.com/gpu: 1
      max:
        gpuconfig.com/gpu: 4
      min:
        gpuconfig.com/gpu: 1
      type: Container
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-burst
value: 1000000
globalDefault: false
description: "This priority class allows for burst GPU usage, but may be preempted."
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-burst-job
  namespace: ml-workloads
spec:
  priorityClassName: gpu-burst
  containers:
    - name: gpu-job
      image: gpu/<gpu image>
      command: ["<gpu vendor>-smi"]
      resources:
        limits:
          gpuconfig.com/gpu: 2
  nodeSelector:
    gpu-type: <gpu name>

In the past, it could be a challenge to know the current state of hardware in order to prioritize workloads, but thanks to open-source tools, we now have solutions. For monitoring GPU utilization, we're using tools like Prometheus and Grafana.
Here's a sample Prometheus configuration to scrape GPU metrics:

YAML
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'gpu_gpu_exporter'
    static_configs:
      - targets: ['localhost:9835']

And here's a simple Python script that we are using to optimize GPU allocation based on utilization metrics:

Python
import kubernetes
from prometheus_api_client import PrometheusConnect

def get_gpu_utilization(prometheus_url, pod_name):
    # Query Prometheus for the GPU utilization reported for this pod
    prom = PrometheusConnect(url=prometheus_url, disable_ssl=True)
    query = f'gpu_gpu_utilization{{pod="{pod_name}"}}'
    result = prom.custom_query(query)
    return float(result[0]['value'][1]) if result else 0

def optimize_gpu_allocation():
    kubernetes.config.load_kube_config()
    v1 = kubernetes.client.CoreV1Api()
    pods = v1.list_pod_for_all_namespaces(label_selector='gpu=true').items

    for pod in pods:
        utilization = get_gpu_utilization('http://prometheus:9090', pod.metadata.name)

        if utilization < 30:  # If GPU utilization is less than 30%
            # Reduce GPU allocation
            patch = {
                "spec": {
                    "containers": [{
                        "name": pod.spec.containers[0].name,
                        "resources": {
                            "limits": {
                                "gpuconfig.com/gpu": "1"
                            }
                        }
                    }]
                }
            }
            v1.patch_namespaced_pod(name=pod.metadata.name, namespace=pod.metadata.namespace, body=patch)
            print(f"Reduced GPU allocation for pod {pod.metadata.name}")

if __name__ == "__main__":
    optimize_gpu_allocation()

The script checks GPU utilization for the labeled pods and reduces their allocation when utilization is low; it is run as a function to optimize resource usage.

Leveraging Kubernetes to Manage GPU and CPU Resources
Thus, we leveraged Kubernetes with OCI Kubernetes Engine (OKE) to dynamically manage GPU and CPU resources across training and inference workloads for AI models. Specifically, we focused on right-sizing GPU allocations with RDMA (RoCEv2) capabilities. We developed Kubernetes configurations and Helm charts, including custom priority classes, node selectors, and resource quotas, to ensure optimal scheduling and resource prioritization for both high-priority and medium-priority AI tasks. By utilizing Kubernetes' flexibility and OKE's management capabilities on Oracle Cloud Infrastructure (OCI), we balanced the heavy compute demands of training with the lighter demands of inferencing. This ensured that resources were dynamically allocated, reducing waste while maintaining high performance for critical tasks. Additionally, we integrated monitoring tools like Prometheus to track GPU utilization and adjust allocations automatically using a Python script. This automation helped optimize performance while managing costs and availability.

In Conclusion
The solutions we outlined here apply universally across cloud and on-premises platforms using Kubernetes for AI/ML workloads. No matter the hardware or compute platform, the key principles of using Kubernetes for dynamic scheduling and resource management remain the same. Kubernetes allows organizations to prioritize their workloads efficiently, optimizing their use of any available hardware resources. By using the same approach, enterprises can fine-tune their infrastructure, reduce bottlenecks, and cut down on underutilized resources, leading to more efficient and cost-effective operations.
This article was shared as part of DZone's media partnership with KubeCon + CloudNativeCon.
Cloud-native technologies have ushered in a new era of database scalability and resilience requirements. To meet this demand, enterprises across multiple industries, from finance to retail to healthcare, are turning to distributed databases to safely and effectively store data in multiple locations. Distributed databases provide consistency across availability zones and regions in the cloud, but some enterprises still question whether they should run their distributed database in Kubernetes.

The Benefits of Running Distributed Databases on Kubernetes
Listed below are some of the key benefits of running distributed databases on Kubernetes.

Better Resource Utilization
One benefit of running distributed databases on Kubernetes is better resource utilization. Many companies are adopting microservices architectures for their modern applications. This shift tends to produce a lot of smaller databases, and companies often have a finite set of nodes on which to place them. So, when companies manage these databases themselves, they're left with a sub-optimal allocation of databases onto nodes. Running on Kubernetes allows the underlying system to determine the best places to put the databases while optimizing resource placement on those nodes. Kubernetes is best utilized when running a large number of databases in a multi-tenant environment. In this deployment scenario, companies save on costs and require fewer nodes to run the same set of databases. These databases also have different footprints and different CPU, memory, and disk requirements.

Elastic Scaling of Pod Resources Dynamically
Another benefit of running distributed databases on Kubernetes is the elastic scaling of pod resources dynamically. Running on Kubernetes enables enterprises to utilize resources more efficiently. The Kubernetes orchestration platform can resize pod resources dynamically; specifically, to scale a database to meet demanding workloads, you can modify memory, CPU, and disk. Kubernetes makes it easy to scale up automatically without incurring any downtime through its horizontal pod autoscaler (HPA) and vertical pod autoscaler (VPA) operators (a minimal HPA sketch follows this section). This is important for AI and ML workloads: Kubernetes enables teams to scale these workloads so they can handle extensive processing and training without interference. A distributed SQL database seamlessly manages data migration between pods, ensuring scalable and reliable data storage. For VPA, however, it's worth noting that a database would need more than one instance to avoid downtime.

Consistency and Portability
A final benefit is consistency and portability between clouds, on-premises environments, and the edge. Companies want to consistently build, deploy, and manage workloads in different locations, and they also want to move workloads from one cloud to another if needed. However, most organizations also have a large amount of legacy code they still run on-premises and are looking to move these installations up into the cloud. Kubernetes allows you to deploy your infrastructure as code, in a consistent way, everywhere. This means you can write a bit of code that describes the resource requirements, deploy it to the Kubernetes engine, and the platform will take care of it. You now have the same level of control in the cloud that you have on bare metal servers in your data center or at the edge. This flexibility and the ability to simplify complex deployments are critical for enterprises as they work across distributed environments.
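To make the elastic scaling point above concrete, here is a minimal sketch that creates an HPA with the Kubernetes Python client. Treat it as an illustration under assumptions rather than a production recipe: the StatefulSet name (mydb), namespace (databases), replica bounds, and CPU target are hypothetical, and the cluster is assumed to run metrics-server and a recent Kubernetes version exposing the autoscaling/v2 API.

Python
from kubernetes import client, config

# Load credentials from the local kubeconfig (in-cluster config would also work).
config.load_kube_config()
autoscaling = client.AutoscalingV2Api()

# Target a hypothetical distributed-database StatefulSet named "mydb".
hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="mydb-hpa", namespace="databases"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="StatefulSet", name="mydb"
        ),
        min_replicas=3,  # keep enough replicas for quorum and availability
        max_replicas=9,
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(type="Utilization", average_utilization=70),
                ),
            )
        ],
    ),
)

autoscaling.create_namespaced_horizontal_pod_autoscaler(namespace="databases", body=hpa)

The same object could just as easily be expressed as a YAML manifest and applied declaratively; the point is that the scaling policy lives alongside the rest of your infrastructure as code.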
Kubernetes' built-in fault tolerance and self-healing features also support ML pipelines to ensure they operate smoothly, even when faced with technology failures or disruptions.

Accelerating AI/ML Workloads Using Kubernetes
Kubernetes offers many benefits to enterprises, but in today's AI-driven landscape, its ability to support and accelerate artificial intelligence (AI) and machine learning (ML) workloads is crucial. The proliferation of AI has caused business priorities to shift for many companies. They want to use AI to uplevel their technology and products, leading to enhanced productivity, better customer experiences, and greater revenue. Investment in AI, however, means higher stakes. Businesses must ensure databases and workloads are running smoothly to facilitate AI adoption. Deploying on Kubernetes can help teams guarantee their workloads are reliable and scalable, ultimately driving successful AI implementation.

The Kubernetes Approach
Kubernetes has transformed how enterprises develop and deploy applications. Most established enterprises and cloud-born companies use Kubernetes in some form, and it has become the de facto choice for container orchestration. In a distributed environment, however, no single database architecture fits all applications. Enterprises must determine the best choice for their current and future needs. I anticipate that cloud-native, geo-distributed databases will continue to grow in popularity as enterprises realize the value they provide and the ease of deployment in Kubernetes.

This article was shared as part of DZone's media partnership with KubeCon + CloudNativeCon.
In previous articles, I've mentioned my short career in the music industry. Let me tell a quick story about something really cool that happened while playing keyboards on a new artist project in 1986. Emerging from the solo section of the first song on the album, sound engineer Alan Johnson had a cool idea that would catch the listener's attention. The idea focused on a backward sound effect, where Alan flipped the audio reels. (Here's a YouTube video to better explain.) The sound engineer captured the listener's attention by disrupting the standard structure of the song. At 19 years old, this was my first experience of someone disrupting the norm. Now, almost 40 years later, I've seen first-hand how market disruptors are key to paving innovation across many aspects of life.

My Current Career
Not long after Marqeta was listed as #7 on CNBC's Disruptor 50 list in 2021, I wrote about their APIs in "Leveraging Marqeta to Build a Payment Service in Spring Boot." Marqeta provides the tools companies require to offer a new version of a payment card or wallet to meet customer needs. Following the success of my short series of articles, I was asked if I had ever considered a career with Marqeta. Ultimately, the reason I chose Marqeta was driven by the effects of being a market disruptor in the FinTech industry. In the 2+ years since I started this chapter of my career, I've experienced the engineering excitement of being laser-focused on delighting our customers worldwide. Seeing the value Marqeta provided to the FinTech industry, I began exploring other potential market disruptors that are on the rise.

About PuppyGraph
I've experimented with graph models and often became frustrated by the dependency on extract, transform, and load (ETL) middleware. The thought of ETL takes me back to a time in my career when I spent a great deal of effort building an ETL layer to address reporting needs. The consequence of ETL ultimately pushed important features onto the team's sprint backlog while introducing long-term tech debt. The other challenge I've run into is when the source data lives outside a graph database. Like the challenges with ETL adoption, adding a graph database is far from a trivial effort — especially in enterprises with complex infrastructures that introduce risk by adding a new data stack. That's why I wanted to check out PuppyGraph, a solution that allows you to query traditional data sources (such as relational databases or data lakes) as a unified graph model. PuppyGraph removes the need for ETL integrations and is designed to work without duplicating existing data. For those organizations without a graph database, PuppyGraph is a graph query engine — allowing non-graph data sources to be queried natively. The architecture diagram may look something like this — leveraging a single copy of data that can be queried using both SQL and graph:

For those new to graph database query languages: Gremlin is a generic graph traversal (and computation) language often used for graph path traversal, graph pattern matching, merging, ranking, and splitting. Cypher is a query language used for graph pattern matching. (A short, illustrative comparison of the two appears below.) Based on this information, I think PuppyGraph has a shot at truly being a market disruptor. Let's consider a use case to see why.

A Fresh Look at Fraud Detection
Since I'm in the FinTech space, I wanted to focus on a peer-to-peer (P2P) payment platform. Specifically, I wanted to look at fraud detection. So, I started with the use case and dataset from a blog post on fraud detection from Neo4j.
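As a quick, purely illustrative comparison of those two query styles, the sketch below expresses the same pattern (users who share a device) first as a Gremlin traversal via gremlinpython and then as a Cypher string sent over Bolt. The endpoints, credentials, and result handling are assumptions for illustration, not code from this article; the User/Device/USED labels simply mirror the kind of schema we will build next.

Python
# Hypothetical snippets only: the same "users who share a device" pattern in both languages.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from neo4j import GraphDatabase

# Gremlin: walk User -> USED -> Device <- USED <- User (assumed endpoint).
conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
g = traversal().withRemote(conn)
shared_device_users = (
    g.V().hasLabel("User")
         .out("USED").in_("USED")  # hop out to a device, then back to other users
         .dedup()
         .limit(10)
         .valueMap()
         .toList()
)
conn.close()

# Cypher: the equivalent pattern match, sent over Bolt (assumed endpoint and credentials).
cypher = """
MATCH (a:User)-[:USED]->(d:Device)<-[:USED]-(b:User)
WHERE a <> b
RETURN a, b LIMIT 10
"""
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("username", "password"))
with driver.session() as session:
    rows = session.run(cypher).data()
driver.close()

Gremlin reads as a step-by-step traversal, while Cypher reads as a declarative pattern, which is essentially the distinction described above.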
For this use case, we'll investigate a real-world data sample (anonymized) from a P2P payment platform. We'll identify fraud patterns, resolve high-risk fraud communities, and apply recommendation methods. To accomplish this, we can use the following tech stack:

PuppyGraph Developer Edition, which is free
Apache Iceberg, with its open table format for analytic datasets
Docker Personal subscription, which is free

As noted above, we'll do this without using any ETL tooling or duplicating any data. A local PuppyGraph instance can be started in Docker using the following command:

docker run -p 8081:8081 -p 8182:8182 -p 7687:7687 -d --name puppy --rm --pull=always puppygraph/puppygraph:stable

Our unified graph model will focus on the following attributes:

Users
Credit cards
Devices
IP addresses

Based on the P2P platform sample data, we can illustrate our graph schema like this: Each user account has a unique 128-bit identifier, while the other nodes — representing unique credit cards, devices, and IP addresses — have been assigned random UUIDs. The used, has_cc, and has_ip relationships have been added to easily reference the child collections. Additionally, the REFERRED and P2P relationships account for users who refer or send payments to other users. Within the PuppyGraph UI, the schema appears as shown below:

Based upon the design of the data, each user node has an indicator variable for money transfer fraud (named fraudMoneyTransfer) that uses a value of 1 for known fraud and 0 otherwise. As a result, roughly 0.7 percent of user accounts are flagged for fraud. Looking at the source data within PuppyGraph, we can see that there are 789k nodes and nearly 1.8 million edges — making a strong use case for being able to identify more fraud scenarios using a graph-based approach.

PuppyGraph in Action
The 0.7 percent level of fraud seems quite low. My goal was to leverage PuppyGraph to perform additional analysis without using ETL or implementing a complex schema redesign. Let's first use a simple Gremlin query to provide a radial layout of users flagged with fraud:

Gremlin
g.V().hasLabel('User').has('fraudMoneyTransfer', 1)

Executing the query in PuppyGraph provides the following response: As expected, 243 confirmed users with fraud-based transactions were returned. In the PuppyGraph UI, we can pick a random user to validate the details. Now that we've validated that the data shows up as expected in PuppyGraph, we can dive deeper and analyze the data further. Here's the idea we want to explore: existing fraud users might be linked to the fraudulent activities of users who were not identified as fraud risks in the original efforts. For this article, we'll identify similarities between user accounts using just the card, device, and IP address nodes that connect to at least one fraud risk account. Let's create some queries for this use case.
The first query enables the identifier filtering that connects to fraud risk accounts:

Python
gds.run_cypher('''
    MATCH (f:FraudRiskUser)-[:HAS_CC|HAS_IP|USED]->(n)
    WITH DISTINCT n
    MATCH (n)<-[r:HAS_CC|HAS_IP|USED]-(u)
    SET n:FraudSharedId
    SET r.inverseDegreeWeight = 1.0/(n.degree-1.0)
    RETURN count(DISTINCT n)
''')

count(DISTINCT n)
18182

The second and third queries project the graph and write relationships back to the database with a score to represent the similarity strength between user node pairs:

Python
g, _ = gds.graph.project(
    'similarity-projection',
    ['User', 'FraudSharedId'],
    ['HAS_CC', 'USED', 'HAS_IP'],
    relationshipProperties=['inverseDegreeWeight']
)

_ = gds.nodeSimilarity.write(
    g,
    writeRelationshipType='SIMILAR_IDS',
    writeProperty='score',
    similarityCutoff=0.01,
    relationshipWeightProperty='inverseDegreeWeight'
)

g.drop()

Using a similarity cutoff of 0.01 in the third query is intended to rule out weak associations and keep the similarities relevant. From there, we can run a Cypher query to rank users by how similar they are to known fraud risk communities:

Python
gds.run_cypher('''
    MATCH (f:FraudRiskUser)
    WITH f.wccId AS componentId, count(*) AS numberOfUsers, collect(f) AS users
    UNWIND users AS f
    MATCH (f)-[s:SIMILAR_IDS]->(u:User)
    WHERE NOT u:FraudRiskUser AND numberOfUsers > 2
    RETURN u.guid AS userId,
           sum(s.score) AS totalScore,
           collect(DISTINCT componentId) AS closeToCommunityIds
    ORDER BY totalScore DESC
''')

Based on this approach, we find some user records to analyze. Let's examine the first record (userId 0b3f278ff6b348fb1a599479d9321cd9) in PuppyGraph. The most glaring aspect is how user nodes have relationships with each other. Notice how our user of interest (blue, in the center) is connected to two other users because of shared IP addresses. At first glance, those relationships could just be a coincidence. However, both of these users are connected to fraud risk communities. You can see that multiple users share the same devices and credit cards, and they also engage in P2P transactions with one another. Our user of interest seems to act as a bridge between the two communities. What's our conclusion? It's not only likely that the user is part of both fraud communities; this user's connection may indicate that the two communities are actually the same.

This was just a simple example, but it shows how we can use PuppyGraph to uncover additional instances of fraud just by looking for user relationships with existing users. We can implement additional explorations using community detection — for example, with the Louvain method — to locate similarities without needing an exact match. The initial 0.7 percent fraud identification is likely far lower than the actual number of fraudulent users in this dataset. However, we can only determine that by seeing entity relationships through a graph model, something we get here without the need for resource- and time-intensive ETL. It's pretty impressive to get this much data insight using a tech stack that's available at no initial cost.

Conclusion
At the start of this article, I noted two examples of disruption that paved the way for innovation:

Alan Johnson's work on that 1986 album project produced a song that captured the listener's attention — a goal every artist wants to accomplish.
Marqeta disrupted the FinTech industry by offering a collection of APIs and underlying tech to remove the pain of payment processing.

Market disruptors are key to paving innovation and avoiding stagnation.
Based on my initial exploration, I think PuppyGraph will be a market disruptor:

With a modest tech stack — and no ETL or duplication of data — we had a graph-based data solution that identified a significant increase in impacted consumers over non-graph data solutions.
Our solution performed with low latency and has been designed to scale as needed.
Additionally, PuppyGraph allows for quickly switching schemas (which PuppyGraph refers to as instant schemas), though it wasn't needed in this demo. I bring this up because it's a use case that interests many potential users.

My readers may recall my personal mission statement, which I feel can apply to any IT professional:

"Focus your time on delivering features/functionality that extends the value of your intellectual property. Leverage frameworks, products, and services for everything else." — J. Vester

PuppyGraph adheres to my mission statement by avoiding the tech debt and hard costs associated with an ETL layer and the data duplication across the enterprise. The platform allows teams to focus on their use cases, leveraging open-source solutions along the way. Have a really great day!
I just gave a talk at All Things Open, and it is hard to believe that Retrieval-Augmented Generation (RAG) already feels like a technique we have been using for years. There is a good reason for that: over the last two years it has exploded in depth and breadth, and the utility of RAG is boundless. The ability to improve the output generated by large language models keeps advancing as variations, improvements, and new paradigms push things forward. Today we will look at:

Practical applications for multimodal RAG
Image search with filters
Finding the best Halloween ghosts
Using Ollama, LLaVA 7B, and LLM reranking
Running advanced multimodal RAG locally

I will use a couple of these new advancements in the state of RAG to solve a couple of Halloween problems: determining whether something is a ghost, and finding the cutest cat ghost.

Practical Applications for Multimodal RAG
Is Something a Ghost? Image Search With Filters and clip-vit-base-patch32
We want to build a tool for all the ghost detectors out there by helping determine if something is a "ghost." To do this, we will use our hosted "ghosts" collection, which has a number of fields we can filter on as well as our multimodal encoded vector to search. We allow someone to pass in a ghost photo via Google Form, Streamlit app, S3 upload, or Jupyter Notebook. We encode that query, which can be a combination of text and/or image, utilizing a ViT-B/32 Transformer architecture for image encoding and a masked self-attention Transformer for text encoding. This is done for you automatically: the CLIP model from OpenAI is easy to use thanks to Hugging Face's Sentence Transformers library. This lets us encode our suspected ghost image and use it to search our collection for similarity. If the similarity is high enough, then we can consider it a "ghost."

Collection Design
Before you build any application, you should make sure it is well defined, with all the fields you may need and the types and sizes that match your needs. For our collection of "ghosts," at a minimum, we will need:

An id field of type INT64, set as the primary key, with automatic ID generation
A ghostclass field, a VARCHAR scalar string of length 20 that holds the traditional classifications of ghosts, such as Class I, Class II, Fake, and Class IV
A category field, a larger VARCHAR scalar string of length 256 that holds short descriptive classifications such as Fake, Ghost, Deity, Unstable, and Legend
An s3path field, a large VARCHAR scalar string of length 1,024 that holds an S3 path to the image of the object
Finally, and most importantly, a vector field, which holds our floating-point vector of dimension 512

Now that we have our data schema, we can build it and use it for ghastly analytics against our data.

Step 1: Connect to Milvus standalone.
Step 2: Load the CLIP model.
Step 3: Define our collection with its schema of vectors and scalars.
Step 4: Encode our image to use for a query.
Step 5: Run the query against the ghosts collection in our Milvus standalone database and look only for those filtered by the category of not Fake. We limit it to one result.
Step 6: Check the distance. If it is 0.8 or higher, we will consider this a ghost.
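Putting steps 1 through 6 together, here is a minimal Python sketch of what the flow could look like with pymilvus and Sentence Transformers. The host and port, index parameters, file name, and metric choice are assumptions (for example, the COSINE metric requires a reasonably recent Milvus), and the collection is assumed to already be populated with encoded ghost images; treat it as an outline rather than the article's verbatim code.

Python
# A minimal sketch of the six steps above; names and parameters are illustrative assumptions.
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
from sentence_transformers import SentenceTransformer
from PIL import Image

# Step 1: Connect to Milvus standalone (assumed host/port).
connections.connect(host="localhost", port="19530")

# Step 2: Load the CLIP ViT-B/32 model via Sentence Transformers.
model = SentenceTransformer("clip-ViT-B-32")

# Step 3: Define the collection with its schema of vectors and scalars.
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="ghostclass", dtype=DataType.VARCHAR, max_length=20),
    FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=256),
    FieldSchema(name="s3path", dtype=DataType.VARCHAR, max_length=1024),
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=512),
]
collection = Collection("ghosts", CollectionSchema(fields))
collection.create_index(
    field_name="vector",
    index_params={"index_type": "IVF_FLAT", "metric_type": "COSINE", "params": {"nlist": 128}},
)
collection.load()

# Step 4: Encode the suspect image into a 512-dimensional query vector.
query_vector = model.encode(Image.open("suspected_ghost.jpg")).tolist()

# Step 5: Search the ghosts collection, filtering out anything categorized as Fake, limit 1.
results = collection.search(
    data=[query_vector],
    anns_field="vector",
    param={"metric_type": "COSINE", "params": {"nprobe": 10}},
    limit=1,
    expr='category != "Fake"',
    output_fields=["category", "ghostclass", "s3path"],
)

# Step 6: Check the similarity score against the 0.8 threshold described above.
top_hit = results[0][0]
print("Ghost detected!" if top_hit.distance >= 0.8 else "Probably not a ghost.")

The 0.8 cutoff and the COSINE metric here are illustrative; the right threshold depends on how the collection was actually built and indexed.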
We do this by comparing the suspected entity to our large database of actual ghost photos; if something is a current class of ghost, it should be similar to our existing ones.

Step 7: The result is displayed with the prospective ghost and its nearest match.

As you can see in our example, we matched close enough to a similar "ghost" that was not in the Fake category. In a separate Halloween application, we will look at a different collection and a different encoding model for a separate use case, also involving Halloween ghosts.

Finding the Cutest Cat Ghost With the Visualized BGE Model
We want to find the cutest cat ghosts, and perhaps others, for winning prizes, putting on memes, social media posts, or other important endeavors. This does require adding an encode_text method to our previous Encode class that calls self.model.encode(text=text), since the other options are just for images alone or images with text. The flexibility of the multimodal search of Milvus vectors is astounding. Our vector search is pretty simple: we just encode our text looking for the cutest cat ghost (in their little Halloween costume). Milvus will query the 768-dimension floating-point vector and find us the nearest match. With all the spooky ghouls and ghosts in our databank, it's hard to argue with these results.

Using Ollama, LLaVA 7B, and LLM Reranking
Running Advanced RAG Locally
Okay, this is a little trick AND treat: we can cover both topics at the same time. We are able to run this entire advanced RAG technique locally utilizing Milvus Lite, Ollama, LLaVA 7B, and a Jupyter Notebook. We are going to do a multimodal search with a generative reranker, which uses an LLM to rank the images and explain the best results. Previously, we have done this with the supercharged GPT-4o model. I am getting good results with LLaVA 7B hosted locally with Ollama. Let's show how to run this open, local, and free!

We will reuse the existing example code to build the panoramic photo from the images returned by our hybrid search of an office photo with ghosts plus the text "computer monitor with ghost". We then send that photo to the Ollama-hosted LLaVA 7B model with instructions on how to rank the results. We get back a ranking, an explanation, and an image:

Search image and nine results
LLM-returned results for ranked list order
The top one chosen from the index, with an explanation generated by the LLM
Our Milvus query to get results to feed the LLM

You can find the complete code in our example GitHub repository and can use any images of your choosing, as the example shows. There are also some references and documented code, including a Streamlit application to experiment with on your own.

Conclusion
As you can see, not only is multimodal RAG not scary, it is fun and useful for many applications. If you are interested in building more advanced AI applications, then try using the combination of Milvus and multimodal RAG. You can now move beyond text alone and add images and more. Multimodal RAG opens up many new avenues for LLM generation, search, and AI applications in general. If you like this article, we'd really appreciate it if you could give us a star on GitHub! If you're interested in learning more, check out our Bootcamp repository on GitHub for examples of how to build multimodal RAG apps with Milvus.

Further Resources
Ghosts are Unstructured Data
Multimodal RAG: Expanding Beyond Text for Smarter AI
The Top 10 Best Multimodal AI Models
Multimodal RAG Notebook
All Things Open — RAG Talk