Database Systems
This data-forward, analytics-driven world would be lost without its database and data storage solutions. As more organizations continue to transition their software to cloud-based systems, the growing demand for database innovation and enhancements has climbed to novel heights. We are upon a new era of the "Modern Database," where databases must both store data and ensure that data is prepped and primed securely for insights and analytics, integrity and quality, and microservices and cloud-based architectures. In our 2023 Database Systems Trend Report, we explore these database trends, assess current strategies and challenges, and provide forward-looking assessments of the database technologies most commonly used today. Further, readers will find insightful articles — written by several of our very own DZone Community experts — that cover hand-selected topics, including what "good" database design is, database monitoring and observability, and how to navigate the realm of cloud databases.
Design Patterns
In today's digital landscape, it's not just about building functional systems; it's about creating systems that scale smoothly and efficiently under demanding loads. But as many developers and architects can attest, scalability often comes with its own unique set of challenges. A seemingly minute inefficiency, when multiplied a million times over, can cause systems to grind to a halt. So, how can you ensure your applications stay fast and responsive, regardless of the demand?

In this article, we'll delve deep into the world of performance optimization for scalable systems. We'll explore common strategies that you can weave into any codebase, be it front end or back end, regardless of the language you're working with. These aren't just theoretical musings; they've been tried and tested in some of the world's most demanding tech environments. Having been a part of the team at Facebook, I've personally integrated several of these optimization techniques into products I've helped bring to life, including the lightweight ad creation experience in Facebook and the Meta Business Suite. So whether you're building the next big social network, an enterprise-grade software suite, or just looking to optimize your personal projects, the strategies we'll discuss here will be invaluable assets in your toolkit. Let's dive in.

Prefetching

Prefetching is a performance optimization technique that revolves around the idea of anticipation. Imagine a user interacting with an application. While the user performs one action, the system can anticipate the user's next move and fetch the required data in advance. This results in a seamless experience where data is available almost instantly when needed, making the application feel much faster and more responsive. Proactively fetching data before it's needed can significantly enhance the user experience, but if done excessively, it can lead to wasted resources like bandwidth, memory, and even processing power. Facebook employs prefetching a lot, especially for ML-intensive operations such as "Friends suggestions."

When Should I Prefetch?

Prefetching involves the proactive retrieval of data by sending requests to the server even before the user explicitly demands it. While this sounds promising, a developer must ensure the balance is right to avoid inefficiencies.

A. Optimizing Server Time (Backend Code Optimizations)

Before jumping into prefetching, it's wise to ensure that the server response time is optimized. Optimal server time can be achieved through various backend code optimizations, including:

- Streamlining database queries to minimize retrieval times
- Ensuring concurrent execution of complex operations
- Reducing redundant API calls that fetch the same data repeatedly
- Stripping away any unnecessary computations that might be slowing down the server response

B. Confirming User Intent

The essence of prefetching is predicting the user's next move. However, predictions can sometimes be wrong. If the system fetches data for a page or feature the user never accesses, it results in resource wastage. Developers should employ mechanisms to gauge user intent, such as tracking user behavior patterns or checking active engagements, ensuring that data isn't fetched without a reasonably high probability of being used.

How To Prefetch

Prefetching can be implemented using any programming language or framework. For the purpose of demonstration, let's look at an example using React. Consider a simple React component.
As soon as this component finishes rendering, an AJAX call is triggered to prefetch data. When a user clicks a button in this component, a second component uses the prefetched data:

JavaScript

import React, { useState, useEffect } from 'react';
import axios from 'axios';

function PrefetchComponent() {
    const [data, setData] = useState(null);
    const [showSecondComponent, setShowSecondComponent] = useState(false);

    // Prefetch data as soon as the component finishes rendering
    useEffect(() => {
        axios.get('https://api.example.com/data-to-prefetch')
            .then(response => {
                setData(response.data);
            });
    }, []);

    return (
        <div>
            <button onClick={() => setShowSecondComponent(true)}>
                Show Next Component
            </button>
            {showSecondComponent && <SecondComponent data={data} />}
        </div>
    );
}

function SecondComponent({ data }) {
    // Use the prefetched data in this component
    return (
        <div>
            {data ? <div>Here is the prefetched data: {data}</div> : <div>Loading...</div>}
        </div>
    );
}

export default PrefetchComponent;

In the code above, the PrefetchComponent fetches data as soon as it's rendered. When the user clicks the button, SecondComponent gets displayed, which uses the prefetched data.

Memoization

In the realm of computer science, "Don't repeat yourself" isn't just a good coding practice; it's also the foundation of one of the most effective performance optimization techniques: memoization. Memoization capitalizes on the idea that re-computing certain operations can be a drain on resources, especially if the results of those operations don't change frequently. So, why redo what's already been done?

Memoization optimizes applications by caching computation results. When a particular computation is needed again, the system checks if the result exists in the cache. If it does, the result is directly retrieved from the cache, skipping the actual computation. In essence, memoization involves creating a memory (hence the name) of past results. This is especially useful for functions that are computationally expensive and are called multiple times with the same inputs. It's akin to a student solving a tough math problem and jotting down the answer in the margin of their book. If the same question appears on a future test, the student can simply reference the margin note rather than work through the problem all over again.

When Should I Memoize?

Memoization isn't a one-size-fits-all solution. In certain scenarios, memoizing might consume more memory than it's worth. So, it's crucial to recognize when to use this technique:

- When the data doesn't change very often: Functions that return consistent results for the same inputs, especially if these functions are compute-intensive, are prime candidates for memoization. This ensures that the effort taken to compute the result isn't wasted on subsequent identical calls.
- When the data is not too sensitive: Security and privacy concerns are paramount. While it might be tempting to cache everything, it's not always safe. Data like payment information, passwords, and other personal details should never be cached. However, more benign data, like the number of likes and comments on a social media post, can safely be memoized to improve performance.

How To Memoize

Using React, we can harness the power of hooks like useCallback and useMemo to implement memoization.
Let's explore a simple example:

JavaScript

import React, { useState, useCallback, useMemo } from 'react';

function ExpensiveOperationComponent() {
    const [input, setInput] = useState(0);
    const [count, setCount] = useState(0);

    // A hypothetical expensive operation
    const expensiveOperation = useCallback((num) => {
        console.log('Computing...');
        // Simulating a long computation
        for(let i = 0; i < 1000000000; i++) {}
        return num * num;
    }, []);

    const memoizedResult = useMemo(() => expensiveOperation(input), [input, expensiveOperation]);

    return (
        <div>
            <input value={input} onChange={e => setInput(e.target.value)} />
            <p>Result of Expensive Operation: {memoizedResult}</p>
            <button onClick={() => setCount(count + 1)}>Re-render component</button>
            <p>Component re-render count: {count}</p>
        </div>
    );
}

export default ExpensiveOperationComponent;

In the above example, the expensiveOperation function simulates a computationally expensive task. We've used the useCallback hook to ensure that the function doesn't get redefined on each render. The useMemo hook then stores the result of the expensiveOperation so that if the input doesn't change, the computation doesn't run again, even if the component re-renders.

Concurrent Fetching

Concurrent fetching is the practice of fetching multiple sets of data simultaneously rather than one at a time. It's similar to having several clerks working at a grocery store checkout instead of just one: customers get served faster, queues clear more quickly, and overall efficiency improves. In the context of data, since many datasets don't rely on each other, fetching them concurrently can greatly accelerate page load times, especially when dealing with intricate data that requires more time to retrieve.

When To Use Concurrent Fetching?

- When each dataset is independent and complex to fetch: If the datasets being fetched have no dependencies on one another and they take significant time to retrieve, concurrent fetching can help speed up the process.
- Use mostly in the back end and carefully in the front end: While concurrent fetching can work wonders in the back end by improving server response times, it must be employed judiciously in the front end. Overloading the client with simultaneous requests might hamper the user experience.
- Prioritizing network calls: If data fetching involves several network calls, it's wise to prioritize one major call and handle it in the foreground, concurrently processing the others in the background. This ensures that the most crucial data is retrieved first while secondary datasets load simultaneously.

How To Use Concurrent Fetching

In PHP, with the advent of modern extensions and tools, concurrent processing has become simpler. Here's a basic example using the concurrent {} block:

PHP

<?php

use Concurrent\TaskScheduler;

require 'vendor/autoload.php';

// Assume these are some functions that fetch data from various sources
function fetchDataA() {
    // Simulated delay
    sleep(2);
    return "Data A";
}

function fetchDataB() {
    // Simulated delay
    sleep(3);
    return "Data B";
}

$scheduler = new TaskScheduler();

$result = concurrent {
    "a" => fetchDataA(),
    "b" => fetchDataB(),
};

echo $result["a"]; // Outputs: Data A
echo $result["b"]; // Outputs: Data B

?>

In the example, fetchDataA and fetchDataB represent two data retrieval functions. By using the concurrent {} block, both functions run concurrently, reducing the total time it takes to fetch both datasets.
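Since the concurrent {} block above depends on a non-standard PHP extension, here is a minimal, framework-agnostic JavaScript sketch of the same idea using Promise.all. It is an illustrative example only: the endpoint URLs and response shapes are hypothetical placeholders, and real code would add error handling.

JavaScript

// Fire both independent requests at once and wait for the slower of the two,
// instead of paying for each request sequentially.
async function loadDashboardData() {
    const [profile, notifications] = await Promise.all([
        fetch('https://api.example.com/profile').then(res => res.json()),
        fetch('https://api.example.com/notifications').then(res => res.json()),
    ]);
    return { profile, notifications };
}

// Usage: the total wait is roughly max(t_profile, t_notifications),
// not t_profile + t_notifications as it would be with two sequential awaits.
loadDashboardData().then(data => console.log(data));

If one of the calls is clearly more important, you can await the critical request on its own and let the secondary requests resolve in the background, which mirrors the prioritization advice above.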
Lazy Loading

Lazy loading is a design pattern wherein data or resources are deferred until they're explicitly needed. Instead of pre-loading everything up front, you load only what's essential for the initial view and then fetch additional resources as and when they're needed. Think of it as a buffet where you only serve dishes when guests specifically ask for them, rather than keeping everything out all the time. A practical example is a modal on a web page: the data inside the modal isn't necessary until a user decides to open it by clicking a button. By applying lazy loading, we can hold off on fetching that data until the very moment it's required.

How To Implement Lazy Loading

For an effective lazy loading experience, it's essential to give users feedback that data is being fetched. A common approach is to display a spinner or a loading animation during the data retrieval process. This ensures that the user knows their request is being processed, even if the data isn't instantly available.

Lazy Loading Example in React

Let's illustrate lazy loading using a React component. This component will fetch data for a modal only when the user clicks a button to view the modal's contents:

JavaScript

import React, { useState } from 'react';

function LazyLoadedModal() {
    const [data, setData] = useState(null);
    const [isLoading, setIsLoading] = useState(false);
    const [isModalOpen, setIsModalOpen] = useState(false);

    const fetchDataForModal = async () => {
        setIsLoading(true);

        // Simulating an AJAX call to fetch data
        const response = await fetch('https://api.example.com/data');
        const result = await response.json();

        setData(result);
        setIsLoading(false);
        setIsModalOpen(true);
    };

    return (
        <div>
            <button onClick={fetchDataForModal}>
                Open Modal
            </button>

            {isModalOpen && (
                <div className="modal">
                    {isLoading ? (
                        <p>Loading...</p> // Spinner or loading animation can be used here
                    ) : (
                        <p>{data}</p>
                    )}
                </div>
            )}
        </div>
    );
}

export default LazyLoadedModal;

In the above example, the data for the modal is fetched only when the user clicks the "Open Modal" button. Until then, no unnecessary network request is made. While the data is being fetched, a loading message (or spinner) is displayed to indicate to the user that their request is in progress.

Conclusion

In today's fast-paced digital world, every millisecond counts. Users demand rapid responses, and businesses can't afford to keep them waiting. Performance optimization is no longer just a 'nice-to-have' but an absolute necessity for anyone serious about delivering a top-tier digital experience. Through techniques such as prefetching, memoization, concurrent fetching, and lazy loading, developers have a robust arsenal at their disposal to fine-tune and enhance their applications. These strategies, while diverse in their applications and methodologies, share a common goal: to ensure applications run as efficiently and swiftly as possible. However, it's important to remember that no single strategy fits all scenarios. Each application is unique, and performance optimization requires a judicious blend of understanding the application's needs, recognizing the users' expectations, and applying the right techniques effectively. It's an ongoing journey of refinement and learning.
What Is BFF?

The Backend for Frontend (BFF) design pattern involves creating a backend service layer specifically tailored to the requirements of a particular frontend application or a set of closely related frontends. While traditionally this approach has been contrasted with a monolithic backend serving multiple frontends, it's worth noting that a BFF can indeed serve multiple frontends, especially when tools like GraphQL (GQL) are utilized. The key is that these frontends have similar requirements and data needs. Regardless of the number of frontends, the primary advantage of the BFF is its ability to be optimized for the specific needs and context of its consumer(s).

Here is an example of what an architecture including the BFF pattern could look like:

- Controllers: These are the entry points for incoming client requests. Each controller handles a specific set of endpoints, ensuring a clean separation of concerns. For instance, a ProductController might handle all product-related operations for the frontends.
- Services: Behind the controllers, we have service layers that perform business logic. These services coordinate a range of operations, ensuring seamless alignment between the data's DTOs and the front end's requirements. Additionally, they can leverage multithreading to enhance data request performance. For instance, the ProductService might coordinate retrieving product details, calculating promotions or discounts, and interfacing with inventory management. Within this service, one could expect methods like findProductById, applyDiscountToProduct, or getProductInventoryStatus.
- Data Mapping: Within the services, specialized mapping functions transform data between the domain model and the DTOs (Data Transfer Objects) that the API returns. This ensures that the front end receives data in the most appropriate format, tailored to its needs.
- Repositories: The repositories interact directly with our data sources, abstracting away the specifics of data retrieval. For example, a ProductRepository might house methods for retrieving, storing, or modifying product information in the database, fetching related documents for the product, or interfacing with partner APIs.
- Error Handling: Throughout the architecture, standardized error handling mechanisms ensure that any issues are captured and reported back to the client in a very specific manner.

This architecture promotes separation of concerns, making the BFF flexible and maintainable. Any interface could be easily added or modified without affecting the front end. A minimal code sketch of this layering is given at the end of this article.

Benefits and Trade-Offs

Here are a few features with their benefits and trade-offs.

Avoid Coupling

Benefits

- Framework Independence: A BFF can be implemented in a different technology or framework than the front end or other BFFs, allowing developers to select the most appropriate tool for each specific front end. This becomes especially crucial in an era with a plethora of frontend frameworks and their potentially short lifespans.
- Decoupling Functional Code: Separating the backend-for-frontend from the frontend itself prevents tight coupling between functional logic and the frontend template, allowing each to evolve separately. Such coupling is an unfavorable pattern seen in numerous front-end projects, often resulting in complex systems that are challenging to migrate.

Trade-Offs

- Resource Flexibility: Implementing a BFF often requires more versatile resources. The BFF may not use the same technology stack as the front end, necessitating developers to be skilled in multiple technologies or for teams to collaborate closely.
- Potential Functional Code Leakage: If not designed carefully, BFFs can start integrating too much business logic that ideally belongs to the primary API. This can lead to challenges in maintaining consistency and can duplicate logic across multiple BFFs. On this specific note, Behavior Driven Development can be invaluable. By employing tools like Karate or Selenium, you can discern the differences in implementation.

Network Optimization

Benefits

- Tailored Data Retrieval: By understanding exactly what the front end requires, a BFF can ensure that only necessary data is retrieved and sent, avoiding over-fetching or under-fetching of data.
- Leveraging Tools: With the BFF pattern, there's an opportunity to use tools like GraphQL, which allows the front end to specify the exact data structures it requires.

Trade-Offs

- Unnecessary Calls: Improper application of the pattern could result in unnecessary calls, particularly if developers overlook design considerations, leading to network congestion. However, it's worth highlighting that in the absence of a BFF, such a scenario would have led to I/O overload.

Data Segregation

Benefits

- Custom Data Storage: BFFs allow for data to be stored in a way that is specifically optimized for the front end's needs. For instance, data that supports hot configurations or client-specific settings can be stored separately.

Trade-Offs

- Risk of Data Leaks: There's a heightened risk of exposing sensitive data if not managed appropriately, as the BFF might interact with multiple data sources or expose data that's tailored to front-end needs.

Security Management

Benefits

- Tailored Security Protocols: The BFF enables fine-tuned security implementations, supporting both authorization logic and functional segregation. This ensures data protection and only exposes necessary data to the frontend, without restriction to primary APIs.

Trade-Offs

- Reliance on API Security: While the BFF handles frontend-specific security, the primary API still must implement basic security mechanisms. This means that the API exposes data without frontend-specific security but should still use basic methods like authentication.

Quality Through Testing

Benefits

- Focused Test Scenarios: With a BFF, tests can target specific scenarios and use cases unique to each front end. This results in more accurate and relevant test cases, ensuring that the front end receives precisely what it expects.
- Rapid Feedback Loop: Since the BFF is tailored to the front end's needs, automated testing can provide quicker feedback to developers. This can lead to faster iteration and more efficient debugging. Often, the adoption of unit tests is overlooked in frontend frameworks, given the lack of a dominant testing solution. This contrasts with frameworks typically favored for BFFs, which tend to encourage and simplify unit test implementation.
- Enhanced End-to-End Testing: The BFF allows for end-to-end tests that closely mimic the real-world user experience. By simulating frontend requests, testers can gauge the entire data flow, from the BFF to the primary backend. While one could contend that these aren't genuine end-to-end tests, their existence, easier maintenance, and reduced likelihood of becoming flaky make them invaluable.

Trade-Offs

- Duplication of Efforts: There could be overlaps between the tests for the main backend and the BFF or even the front end. This redundancy might lead to wasted resources and time if not managed correctly.
- Maintenance Overhead: As the front end evolves, so will its requirements. The BFF's tests must be continuously updated to reflect these changes, which could increase the maintenance burden.
- Risk of Over-Reliance: Teams might be tempted to overly rely on the BFF's tests and overlook or downplay the significance of broader integration tests or tests on the main backend.

Conclusion

The BFF pattern has emerged as an innovative approach to bridge the gap between backend services and frontends, offering customization and efficiency tailored to the specific needs of each frontend or a set of closely related frontends. Its benefits, from streamlined network optimization to enhanced security protocols and focused testing scenarios, have been increasingly recognized in today's fast-paced software development landscape. However, like any architectural pattern, it comes with its trade-offs, which necessitates a well-informed and judicious adoption strategy. By understanding its strengths and weaknesses and aligning them with project requirements, development teams can leverage the BFF pattern to achieve more responsive, maintainable, and efficient applications. As the software ecosystem continues to evolve, patterns like BFF exemplify the industry's drive towards more modular, adaptable, and user-centric solutions.
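As promised earlier, here is a minimal, hypothetical JavaScript sketch of the controller/service/mapper/repository layering described at the start of this article. The downstream service URLs, field names, and DTO shape are illustrative assumptions rather than part of the original article, and a real BFF would add the error handling, caching, and security concerns discussed above.

JavaScript

// Repository layer: talks to downstream services (URLs are placeholders).
async function fetchProduct(id) {
    const res = await fetch(`https://catalog.internal.example.com/products/${id}`);
    return res.json();
}

async function fetchInventory(id) {
    const res = await fetch(`https://inventory.internal.example.com/stock/${id}`);
    return res.json();
}

// Data mapping: shape the response exactly as this front end needs it (a DTO).
function toProductViewModel(product, inventory) {
    return {
        id: product.id,
        title: product.name,
        price: product.price,
        inStock: inventory.quantity > 0,
    };
}

// Service layer: coordinate the downstream calls concurrently and return the tailored DTO.
async function getProductForFrontend(id) {
    const [product, inventory] = await Promise.all([
        fetchProduct(id),
        fetchInventory(id),
    ]);
    return toProductViewModel(product, inventory);
}

// A thin controller (for example, an Express route or a serverless handler) would simply
// call getProductForFrontend(id) and return the result as JSON to the front end.

The point of the sketch is the separation of concerns: only the repositories know about the downstream services, only the mapper knows the front end's data shape, and the controller stays thin.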
This is an article from DZone's 2023 Data Pipelines Trend Report.

Data management is an ever-changing landscape, but throughout its history, a few use cases have driven most of the value and hence the majority of innovation. The following is a list of the key features enabled by effective data management:

- Informed decision-making
- Regulatory compliance
- Improved efficiency
- Data quality and security
- Competitive advantage

As data volume within organizations has scaled ever larger, the underlying technologies have had to evolve and adapt to keep up with the ever-increasing demand imposed by such growth. Traditionally, the majority of data was consolidated into a centrally managed platform known as a data warehouse. However, over the last decade, new technologies and data strategies have emerged in an attempt to provide more cost-effective solutions. Two new paradigms have emerged as alternatives to the traditional data warehouse stack: the data lake and the data lakehouse. This article will outline what each of these data management strategies entails and how they map to various selection criteria such as cost, data volume, data integration, security and compliance, ease of use, and a number of other pivotal requirements.

Data Warehouse vs. Data Lake vs. Data Lakehouse

Data warehouses played a crucial role in data-driven organizations for years, supporting business intelligence and historical data analysis. However, as data volumes grew, their integrated storage couldn't scale cost-effectively. This led to the emergence of data lakes, shifting focus to scalable object storage over highly optimized solutions. Data lakes enabled storing vast data amounts, including unstructured or semi-structured data. However, ingestion efficiency and integration with traditional tools posed challenges. In 2019, the term "data lakehouse" was introduced to bridge the gap between data warehouses and data lakes. The goal is a unified platform for structured and unstructured data, fostering collaboration among data professionals.
The below table summarizes the main decision points and how each architecture addresses (or doesn't) that item:

Data Management Architecture Feature Comparison

| Criteria | Data Warehouse | Data Lake | Data Lakehouse |
|---|---|---|---|
| Data type support | Primarily structured | Diverse (structured, semi-structured, unstructured) | Diverse (structured, semi-structured, unstructured) |
| Schema enforcement | Enforced schema | Schema-on-read | Structured and flexible |
| Data processing | High-performance SQL | Flexibility for exploration, ad hoc analysis | Both high-performance SQL and exploration |
| Data integration | Structured ETL | Supports batch and real-time ingestion | Supports batch and real-time ingestion |
| Data storage | Structured, columnar | Raw and native format | Raw and structured format |
| Data quality and governance | Strong governance | Requires careful management | Supports governance with flexibility |
| Use cases | Structured analytics, complex reporting | Data exploration, machine learning, raw data processing | Combines structured analytics and data exploration |
| Query performance | High-speed, low latency | Varied, depending on tools and tuning | High-performance with flexibility |
| Historical analysis | Yes | Yes | Yes |
| Scalability | Limited for very large data | Scales horizontally | Scales for data growth |
| Cost-effectiveness | Can be expensive | Cost-effective for storing raw data | Balances cost and performance |
| Regulatory compliance | Often supported | Requires implementation | Supports compliance measures |
| Vendor ecosystem | Well-established | Varied and expanding | Evolving and expanding |
| User profiles | Data analysts, business intelligence | Data engineers and scientists, analysts | Data engineers and scientists, analysts |
| Real-time analytics | Possible but limited | Varies depending on tools | Supports real-time analytics |
| Schema evolution | Requires schema changes | Flexible with schema evolution | Supports both schema changes and structure |
| Data exploration | Limited capability | Flexible for exploration | Supports both analytics and exploration |
| Hybrid architecture | Can be integrated with data lakes | Can be combined with data warehouses | Combines elements of both |

Table 1

Data Warehouse

Data warehouses excel at processing structured data with a well-defined schema. With these restrictions, a data warehouse can offer highly efficient querying capabilities. Furthermore, they have strong integration with business intelligence tooling, and have robust integrated support for data quality and governance.
The following table gives an overview of data warehouse aspects and how they may benefit or detract from a given use case:

Data Warehouse Aspect Coverage

| Aspect | Benefits | Weaknesses |
|---|---|---|
| Structured data | Efficient storage and management | Limited support for unstructured data |
| Optimized queries | High-performance querying | Expensive |
| Data consistency | Enforced data consistency | Inflexible schema |

Table 2

Benefits of Using a Data Warehouse

Data warehouses provide several key advantages:

- Excel in efficiently storing and managing structured data, making complex analytics accessible through predefined schemas that enhance user-friendliness
- Offer high-performance querying capabilities, enabling the execution of complex analytical tasks and scaling to maintain query speed as data volumes expand
- Prioritize data consistency by enforcing structured schemas and implementing robust data governance measures, ensuring data integrity and reliability, making them a reliable single source of truth for decision-making within organizations

Limitations of Using a Data Warehouse

The weaknesses of a data warehouse revolve around cost, inflexible schema, and limited support for unstructured data. Implementing and maintaining a data warehouse can be expensive, with substantial initial setup and ongoing operational costs. Its reliance on a predefined schema makes it less adaptable to changes in data structure or the inclusion of new data sources, potentially hindering agility. Additionally, data warehouses are primarily designed for structured data, which limits their ability to efficiently handle unstructured or semi-structured data, potentially missing out on valuable insights from diverse data sources.

Data Lake

The data lake architecture evolved as a response to the rising costs of operating a data warehouse. A primary goal of this design was to lower the bar, in terms of cost, for storing vast amounts of data. Although data lakes provide a low price point for storage, they lack some of the integrations and features that have been developed in data warehouses over the years. Below are some of the trade-offs to consider when building a data lake:

Data Lake Aspect Coverage

| Aspect | Benefits | Limitations |
|---|---|---|
| Scalability | Highly scalable, handles massive data volumes | Data quality concerns |
| Cost-effectiveness | Cost-effective for storing raw data | Complexity in data processing |
| Storage of raw and unstructured data | Accommodates diverse data types | Potential data silos |

Table 3

Benefits of Using a Data Lake

A data lake architecture offers distinct advantages for organizations seeking to harness their data effectively:

- Provides exceptional scalability, effortlessly accommodating massive data volumes as businesses grow
- Proves highly cost-effective, offering a budget-friendly solution for storing raw data in its native format
- Excels at storage, allowing organizations to effortlessly ingest and manage diverse data types, including unstructured and semi-structured data

This versatility enables businesses to leverage their entire data ecosystem, promoting innovation and data-driven decision-making while keeping costs in check.

Limitations of Using a Data Lake

Despite its strengths, a data lake architecture is not without its challenges. It often introduces complexity in data processing, as the flexibility it offers can lead to difficulties in data organization, quality assurance, and integration.
Moreover, there is a risk of potential data silos within a data lake, where data may become fragmented and less accessible, hindering the ability to derive valuable insights. Data discovery also becomes a concern. To maximize the benefits of a data lake, organizations must carefully plan their data governance and integration strategies to mitigate these challenges effectively.

Data Lakehouse

The data lakehouse paradigm seeks to balance the benefits and trade-offs of a data warehouse and a data lake. This is accomplished by providing an integrated solution on top of what were traditionally data lake components. The goal is to provide the scalability, flexibility, and cost benefits of a data lake while still offering the performance, data governance, and user-friendliness of a data warehouse.

Data Lakehouse Aspect Coverage

| Aspect | Benefits | Limitations |
|---|---|---|
| Hybrid architecture | Combines data warehouse and data lake capabilities | Architectural complexity |
| Cost-to-performance flexibility | Offers cost-effective scalability with high performance | Potential performance issues |
| Real-time analytics | Supports real-time analytics | Evolving technology landscape |

Table 4

Benefits of Using a Data Lakehouse

A data lakehouse architecture presents a compelling solution for organizations aiming to unlock the full potential of their data. By seamlessly combining the robust features of a data warehouse and the flexibility of a data lake, it offers a comprehensive data management ecosystem. One of its standout advantages lies in its cost-to-performance flexibility, allowing businesses to balance their data storage and processing needs efficiently, optimizing both cost-effectiveness and performance. Additionally, the data lakehouse empowers organizations with real-time analytics capabilities, enabling them to make data-driven decisions and respond swiftly to changing trends and opportunities. This amalgamation of features positions the data lakehouse as a versatile and powerful solution for modern data management and analytics needs.

Limitations of Using a Data Lakehouse

A data lakehouse does come with certain limitations. One key concern is architectural complexity, as the integration of these diverse features can lead to intricate data management structures, requiring thorough planning and management. Potential performance issues may arise due to the combination of features, and organizations must carefully optimize their data processing to prevent bottlenecks. Additionally, the ever-evolving technology landscape means that staying up-to-date with the latest advancements and best practices is essential for maximizing the benefits of a data lakehouse. Despite these limitations, its capacity to provide a comprehensive data solution often outweighs these challenges for organizations seeking to harness the full potential of their data assets.

The Future of Data Storage

The future of data management and storage is poised to undergo transformative changes driven by evolving trends. One of the pivotal developments is the growing emphasis on interoperability between existing data architectures, including data warehouses, data lakes, and data lakehouses. Organizations are recognizing the need to seamlessly integrate these technologies to harness the full spectrum of their data assets efficiently. Simultaneously, data governance and data quality are becoming paramount concerns, driven by the exponential growth of data volumes and the increasing importance of compliance and data accuracy.
As organizations navigate this landscape, they are likely to adopt comprehensive data governance strategies, leveraging automation and AI-powered tools to enhance data quality, traceability, and privacy. Overall, the future of data management and storage will revolve around achieving a harmonious synergy between diverse data architectures, underpinned by robust data governance practices to ensure the reliability and integrity of data assets in an ever-evolving digital ecosystem.

Evolving Technologies

Machine learning and AI technologies will play a pivotal role in automating data processing, analysis, and decision-making, enabling organizations to derive deeper insights from their data assets. Moreover, the rise of edge computing and the Internet of Things (IoT) will necessitate real-time data management capabilities, prompting the adoption of cloud-native solutions and distributed data architectures. As data privacy and security concerns grow, robust data governance frameworks will become imperative, ensuring that organizations maintain compliance with evolving regulations while safeguarding sensitive data. Collaboration across departments and data-driven cultures will be pivotal, with data democratization empowering a broader range of employees to harness data for informed decision-making. In this dynamic landscape, the ability to adapt swiftly to emerging technologies and data management trends will be the cornerstone of success in the data-driven future.

Hybrid Solutions

Hybrid solutions in data management architecture overcome the limitations of different storage types. Such hybrid solutions are becoming more popular and are starting to precipitate entirely new designs. A model that exemplifies this concept involves not just the separation of compute and storage, as often seen in data lakes, but also a distinct storage platform integrated separately from the compute layer. This has played out most visibly in the emergence of open table formats such as Iceberg, Hudi, and Delta Lake.

Conclusion

The decision between a data warehouse, data lake, or data lakehouse involves a complex set of trade-offs. Data warehouses excel in structured analytics but may lack flexibility for diverse data types. Data lakes offer versatility but require careful data governance. The emerging data lakehouse concept seeks to balance these trade-offs by combining features of both, offering a unified platform; however, this choice is not one-size-fits-all. Organizations must weigh their specific business needs and adapt their data management strategies accordingly, considering factors such as data type diversity, scalability, cost, and the evolving technology landscape. The key lies in making informed decisions that align with current and future data requirements and recognizing the importance of ongoing adaptation in the dynamic world of data management.
This is an article from DZone's 2023 Data Pipelines Trend Report.

Data-driven design is a game changer. It uses real data to shape designs, ensuring products match user needs and deliver user-friendly experiences. This approach fosters constant improvement through data feedback and informed decision-making for better results. In this article, we will explore the importance of data-driven design patterns and principles, and we will look at an example of how the data-driven approach works with artificial intelligence (AI) and machine learning (ML) model development.

Importance of the Data-Driven Design

Data-driven design is crucial as it uses real data to inform design decisions. This approach ensures that designs are tailored to user needs, resulting in more effective and user-friendly products. It also enables continuous improvement through data feedback and supports informed decision-making for better outcomes. Data-driven design includes the following:

- Data visualization – Aids designers in comprehending trends, patterns, and issues, thus leading to effective design solutions.
- User-centricity – Data-driven design begins with understanding users deeply. Gathering data about user behavior, preferences, and challenges enables designers to create solutions that precisely meet user needs.
- Iterative process – Design choices are continuously improved through data feedback. This iterative method ensures designs adapt and align with user expectations as time goes on.
- Measurable outcomes – Data-driven design targets measurable achievements, like enhanced user engagement, conversion rates, and satisfaction.

This is the theory, but let's reinforce it with good examples of products based on data-driven design:

- Netflix uses data-driven design to predict what content their customers will enjoy. They analyze daily plays, subscriber ratings, and searches, ensuring their offerings match user preferences and trends.
- Uber uses data-driven design by collecting and analyzing vast amounts of data from rides, locations, and user behavior. This helps them optimize routes, estimate fares, and enhance user experiences. Uber continually improves its services by leveraging data insights based on real-world usage patterns.
- Waze uses data-driven design by analyzing real-time GPS data from drivers to provide accurate traffic updates and optimal route recommendations. This data-driven approach ensures users have the most up-to-date and efficient navigation experience based on the current road conditions and user behavior.

Common Data-Driven Architectural Principles and Patterns

Before we jump into data-driven architectural patterns, let's reveal what data-driven architecture and its fundamental principles are.

Data-Driven Architectural Principles

Data-driven architecture involves designing and organizing systems, applications, and infrastructure with a central focus on data as a core element. Within this architectural framework, decisions concerning system design, scalability, processes, and interactions are guided by insights and requirements derived from data. Fundamental principles of data-driven architecture include:

- Data-centric design – Data is at the core of design decisions, influencing how components interact, how data is processed, and how insights are extracted.
- Real-time processing – Data-driven architectures often involve real-time or near real-time data processing to enable quick insights and actions.
- Integration of AI and ML – The architecture may incorporate AI and ML components to extract deeper insights from data.
- Event-driven approach – Event-driven architecture, where components communicate through events, is often used to manage data flows and interactions.

Data-Driven Architectural Patterns

Now that we know the key principles, let's look into data-driven architecture patterns. Distributed data architecture patterns include the data lakehouse, data mesh, data fabric, and data cloud.

Data Lakehouse

A data lakehouse allows organizations to store, manage, and analyze large volumes of structured and unstructured data in one unified platform. Data lakehouse architecture provides the scalability and flexibility of data lakes, the data processing capabilities, and the query performance of data warehouses. This concept is perfectly implemented in Delta Lake. Delta Lake is an extension of Apache Spark that adds reliability and performance optimizations to data lakes.

Data Mesh

The data mesh pattern treats data like a product and sets up a system where different teams can easily manage their data areas. The data mesh concept is similar to how microservices work in development. Each part operates on its own, but they all collaborate to make the whole product or service of the organization. Companies usually use conceptual data modeling to define their domains while working toward this goal.

Data Fabric

Data fabric is an approach that creates a unified, interconnected system for managing and sharing data across an organization. It integrates data from various sources, making it easily accessible and usable while ensuring consistency and security. A good example of a solution that implements data fabric is Apache NiFi. It is an easy-to-use data integration and data flow tool that enables the automation of data movement between different systems.

Data Cloud

A data cloud provides a single and adaptable way to access and use data from different sources, boosting teamwork and informed choices. These solutions offer tools for combining, processing, and analyzing data, empowering businesses to leverage their data's potential, no matter where it's stored. Presto exemplifies an open-source solution for building a data cloud ecosystem. Serving as a distributed SQL query engine, it empowers users to retrieve information from diverse data sources such as cloud storage systems, relational databases, and beyond.

Now we know what data-driven design is, including its concepts and patterns. Let's have a look at the pros and cons of this approach.

Pros and Cons of Data-Driven Design

It's important to know the strong and weak areas of a particular approach, as it allows us to choose the most appropriate one for our architecture and product. Here are some pros and cons of data-driven architecture:

Pros and Cons of Data-Driven Design

| Pros | Cons |
|---|---|
| Personalized experiences: Data-driven architecture supports personalized user experiences by tailoring services and content based on individual preferences. | Privacy concerns: Handling large amounts of data raises privacy and security concerns, requiring robust measures to protect sensitive information. |
| Better customer understanding: Data-driven architecture provides deeper insights into customer needs and behaviors, allowing businesses to enhance customer engagement. | Complex implementation: Implementing data-driven architecture can be complex and resource-intensive, demanding specialized skills and technologies. |
| Informed decision-making: Data-driven architecture enables informed and data-backed decision-making, leading to more accurate and effective choices. | Dependency on data availability: The effectiveness of data-driven decisions relies on the availability and accuracy of data, leading to potential challenges during data downtimes. |

Table 1

Data-Driven Approach in ML Model Development and AI

A data-driven approach in ML model development involves placing a strong emphasis on the quality, quantity, and diversity of the data used to train, validate, and fine-tune ML models. A data-driven approach involves understanding the problem domain, identifying potential data sources, and gathering sufficient data to cover different scenarios. Data-driven decisions help determine the optimal hyperparameters for a model, leading to improved performance and generalization.

Let's look at an example of data-driven architecture based on AI/ML model development. The architecture represents a factory alerting system. The factory has cameras that shoot short video clips and photos and send them for analysis to our system. Our system has to react quickly if there is an incident. Below, we share an example of data-driven architecture using Azure Machine Learning, Data Lake, and Data Factory. This is only an example, and there are a multitude of tools out there that can leverage data-driven design patterns.

1. The IoT Edge custom module captures real-time video streams, divides them into frames, and forwards results and metadata to Azure IoT Hub.
2. The Azure Logic App watches IoT Hub for incident messages, sending SMS and email alerts, and relaying video fragments and inferencing results to Azure Data Factory.
3. Azure Data Factory orchestrates the process by fetching raw video files from the Azure Logic App, splitting them into frames, converting inferencing results to labels, and uploading data to Azure Blob Storage (the ML data repository).
4. Azure Machine Learning begins model training, validating data from the ML data store and copying required datasets to premium blob storage.
5. Using the dataset cached in premium storage, Azure Machine Learning trains the model, validates model performance, scores against the new model, and registers it in the Azure Machine Learning registry.
6. Once the new ML inferencing module is ready, Azure Pipelines deploys the module container from Container Registry to the IoT Edge module within IoT Hub, updating the IoT Edge device with the updated ML inferencing module.

Figure 1: Smart alerting system with data-driven architecture

Conclusion

In this article, we dove into data-driven design concepts and explored how they merge with AI and ML model development. Data-driven design uses insights to shape designs for better user experiences, employing iterative processes, data visualization, and measurable outcomes. We've seen real-world examples like Netflix using data to predict content preferences and Uber optimizing routes via user data. Data-driven architecture, encompassing patterns like data lakehouse and data mesh, orchestrates data-driven solutions. Lastly, our factory alerting system example showcases how AI, ML, and data orchestrate an efficient incident response. A data-driven approach empowers innovation, intelligent decisions, and seamless user experiences in the tech landscape.
Coined quite recently, the term microfrontend designates for a GUI (Graphical User Interface) what the term microservice designates for classical services, i.e., the process of decomposing an application's different parts and components. More importantly, it applies not only to GUIs in general but to a more specific category of GUIs named SPAs (Single Page Applications). This is important because, while several techniques exist for separating the different parts and components of a web application in general, when it comes to SPAs, the story becomes a bit more difficult. As a matter of fact, separating the different parts and components of a general web application often means separating its different pages. This process becomes trickier for SPAs, as it concerns the separation of the different visual fragments of the application's single page. This requires a finer granularity and a more intimate orchestration of the content elements.

The microfrontend concept adds more complexity to the web application development field, which is already fairly complex by itself. The SPA model, as well as the emergence of the so-called JavaScript or TypeScript-based web application platforms and frameworks, brought to the picture a high degree of intricacy, requiring developers to have a vast amount of background knowledge, from HTML and CSS to advanced aspects of Angular, React, Node, Vue, and jQuery. In the Java world, a new category of software developers has come to light: the full-stack developers, who not only need to deal with the grief of mastering Java, be it standard or enterprise, and all its underlying sub-technologies like Servlet, REST, CDI, JPA, JMS, and many others, currently placed under the auspices of Jakarta EE, but who, increasingly, are required to master things like WebPack, SystemJS, Bower, Gulp, and others like Yeoman. Not to mention Spring, Quarkus, Micronaut, or Helidon.

In former times, when dinosaurs still populated the Earth, enterprise-grade Java application development only required the knowledge of a single technology: Java, with possibly its enterprise extensions, appointed successively as J2EE, Java EE, and finally Jakarta EE. Unless it was Spring, the applications and services were deployed on Jakarta EE-compliant application servers, like Glassfish, Payara, Wildfly, JBoss, WebLogic, WebSphere, etc. These application servers provided out-of-the-box all the required implementations of the above-mentioned specifications. Among these specifications, Jakarta Faces (formerly called JSF: Java Server Faces) was meant to offer a framework that facilitates and standardizes the development of web applications in Java.

The Jakarta Faces history goes back to 2001 and its initial JSR (Java Specification Request) 127. At that time, another web framework, known under the name of Struts and available under an Apache open-source license, was widely popular. As sometimes happens in the web frameworks space, the advent of Jakarta Faces was perceived by the Apache community as being in conflict with Struts and, in order to resolve this alleged conflict, a long and heavy negotiation process of several years between Sun Microsystems and the Apache community was required. Finally, Sun agreed to lift the restrictions preventing JSRs from being independently implemented under an open-source license, and the first implementation, named RI (Reference Implementation), was provided in 2003.
Jakarta Faces was generally well received despite a market crowded with competitors. Its RI was followed by other implementations over the years, starting with Apache MyFaces in early 2004 and continuing with RedHat RichFaces in 2005, PrimeTek PrimeFaces in 2008, ICEsoft ICEfaces and Oracle ADF Faces in 2009, OmniFaces in 2012, etc. The specifications have evolved as well, from the 1.0 released in 2001 to the 4.0 released in 2022. Hence, more than 20 years of history to arrive at the latest Jakarta Faces release, 4.0, part of the Jakarta EE 10 specifications, whose reference implementation is named Mojarra.

The software history is sometimes convoluted. In 2010, Oracle acquired Sun Microsystems and became the owner of the Java trademark. Throughout the period they were under Oracle's stewardship, the Java EE specifications were in a kind of status quo before becoming Eclipse Jakarta EE. The company didn't really manage to set up a dialogue with users, communities, work groups, and all those involved in the recognition and promotion of the Java enterprise-grade services. Their evolution requests and expectations were ignored by the editor, who didn't know how to deal with their new responsibility as the Java/Jakarta EE owner. Little by little, this led to a guarded reaction from software architects and developers, who began to prefer and adopt alternative technological solutions to application servers.

While trying to find alternative solutions to Jakarta EE and to remedy issues like the apparent heaviness and the expensive prices of application servers, many software professionals have adopted Spring Boot as a development platform. And since they needed Jakarta EE implementations for even basic web applications, they deployed these applications in open-source servlet engines like Tomcat, Jetty, or Undertow. For more advanced features than just servlets, like JPA or JMS, Spring Boot provides integration with ActiveMQ or Hibernate. And should more advanced features be required, like JTA, for example, these software professionals went fishing on the internet for free third-party implementations like Atomikos and, in the absence of an official integration, they tried to integrate these features into their servlet engine themselves, with all the risks that this entails.

Other solutions, closer to real Jakarta EE alternatives, have emerged as well and, among them, Netty, Quarkus, and Micronaut are the best-known and most popular. All these solutions were based on a couple of software design principles, like single concern, discrete boundaries, transportability across runtimes, auto-discovery, etc., which have been known since the dawn of time. But because the software industry continuously needs new names, the new name that was found for these alternative solutions is "microservices." More and more microservice architecture-based applications have appeared over the following years, to such an extent that the word "microservice" became one of the most common buzzwords in the software industry. In order to optimize and standardize the microservices technology, the Eclipse Foundation decided to apply to microservices the same process that was used to design the Jakarta EE specifications. The Eclipse MicroProfile was born. But all these convolutions have definitely impacted the web framework technologies.
While the vast majority of Java enterprise-grade applications were using Jakarta Faces for their web tier, switching from a software architecture based on Jakarta EE-compliant application servers to microservices resulted in a phasing-out of these architectures in favor of more lightweight ones, often based on Eclipse MicroProfile specifications. And since Jakarta Faces components needed an application server to be deployed on, other lighter alternatives, based on JavaScript or TypeScript libraries, like Angular, Vue, ExtJS, jQuery, and others, have been adopted to make up for its absence. Nowadays, most Java enterprise applications adopt the software architecture depicted below.

While these microservices might be implemented using different frameworks like Spring Boot, the most natural choice is probably Quarkus. As a matter of fact, Quarkus is one of the most attractive Eclipse MicroProfile implementations, not only thanks to its high degree of compliance with the specifications but also due to its extensions and its capacity to generate native code, which makes it the Supersonic Subatomic Java framework. As for the front end, it typically might be implemented in Angular. In order to achieve such an implementation, two development teams are generally required:

- A frontend team specialized in TypeScript, Angular, CSS, and HTML development, using Node.js as a deployment platform, NPM as a build tool, Bower for dependency management, Gulp as a streaming build system, Karma and Jasmine for testing, WebPack as a code bundler, and probably many others.
- A backend team specialized in Java development using the Eclipse MicroProfile specifications, as well as different Jakarta EE implementations of sub-technologies like Jakarta REST, Jakarta Persistence, Jakarta Messaging, Jakarta Security, Jakarta JSON Binding, etc.

A single team of full-stack developers covering all the above-mentioned fields and technologies might also do it, but this is less usual. In any case, as you can observe, it becomes quite difficult to build a Java enterprise-grade project team, as it requires at least two categories of profiles, and, given this technology's complexity, the mentioned profiles had better be senior. This situation sharply contrasts with what happened in former times, when the front end could have been implemented using Jakarta Faces and, hence, a single Java development team was able to take charge of such an enterprise-grade project.

Jakarta Faces is a great web framework whose implementations offer hundreds of ready-to-use widgets and other visual controls. Compared with Angular, where the visual components are part of external libraries, like Material, NG-Bootstrap, Clarity, Kendo, Nebular, and many others, Jakarta Faces implementations not only provide way more widgets and features but are also part of the official JSR 372 specifications and, in this respect, they are standard, as opposed to the mentioned libraries, which evolve with their authors' prevailing moods, without any guarantee of consistency and stability.

One of the criteria that has formed the basis of the decision of many organizations to switch from Jakarta Faces web applications to JavaScript/TypeScript frameworks was client-side rendering. It was considered that server-side rendering, which is the way Jakarta Faces works, is less performant than the client-side rendering provided by browser-based applications.
This argument has to be taken with a grain of salt: Client-side rendering means rendering pages directly in the browser with JavaScript. All logic, data fetching, templating, and routing are handled by the client. The primary downside of this rendering type is that the amount of JavaScript required tends to grow as an application grows, which can have negative effects on a page's capacity to consistently respond to user inputs. This becomes especially difficult with the addition of new JavaScript libraries, polyfills, and third-party code, which compete for processing power and must often be processed before a page's content can be rendered. Server-side rendering generates the full HTML for a page on the server in response to navigation. This avoids additional round-trips for data fetching and templating on the client since they're handled before the browser gets a response. Server-side rendering generally reduces the time required for the page content to become visible. It makes it possible to avoid sending lots of JavaScript to the client, which helps to reduce a page's TBT (Total Blocking Time) and can also lead to a lower average response time, as the main thread is not blocked as often during page load. When the main thread is blocked less often, user interactions have more opportunities to run sooner. With server-side rendering, users are less likely to be left waiting for CPU-bound JavaScript to run before they can access a page. Accordingly, the argument that server-side rendering is bad while client-side rendering would be better is just a myth. However, there is one potential trade-off here: generating pages on the server takes time, which may result in a higher TTFB (Time to First Byte), i.e., the time between the moment the user clicks and the moment the first content byte arrives. And even admitting that this metric impacts more important ones, like requests per second, latency, or uptime, it's difficult to assert that the web application's average response time is affected in a way that users would actually perceive. Consequently, it clearly appears from this analysis that developing Java web applications with server-side rendering frameworks, like Jakarta Faces, not only doesn't lead to less performant applications, but is also much simpler and less expensive. This approach doesn't require as many different technology stacks as its JavaScript/TypeScript-based alternatives. The development teams don't need several categories of profiles, and the same developer can directly contribute to both the front end and the back end without having to switch paradigms. This last argument is all the more important as Java developers, who deal with concerns like multi-threading, transaction management, security, etc., often aren't comfortable mastering programming languages that have been designed to run in a browser. So the good news here is that if, like me, you're nostalgic for Jakarta Faces, from now on you can start implementing your front ends with it without needing any Jakarta EE-compliant application server. That's because Quarkus, our famous Supersonic Subatomic Java platform, provides a Jakarta Faces extension, allowing you to write beautiful front ends like in the good old times. Melloware Inc. provides a PrimeFaces extension for Quarkus, as described here. In the mentioned Git repository, you'll find a showcase application that demonstrates, with consistent code examples, how to use every single PrimeFaces widget.
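To give a concrete feel for what this looks like in practice, here is a minimal, illustrative sketch of the kind of Java backing bean you would pair with a PrimeFaces page. It is only a sketch: the class and property names are hypothetical (they are not taken from the showcase), and it simply assumes the jakarta.* namespaces used by Jakarta Faces 4.0.
Java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import jakarta.faces.view.ViewScoped;
import jakarta.inject.Named;

// Hypothetical backing bean for a PrimeFaces page listing orders.
@Named
@ViewScoped
public class OrderBean implements Serializable {

    private final List<String> orders = new ArrayList<>();
    private String newOrder;

    public List<String> getOrders() {
        return orders;
    }

    public String getNewOrder() {
        return newOrder;
    }

    public void setNewOrder(String newOrder) {
        this.newOrder = newOrder;
    }

    // Action method that a <p:commandButton action="#{orderBean.addOrder}"> could invoke
    public void addOrder() {
        if (newOrder != null && !newOrder.isBlank()) {
            orders.add(newOrder);
            newOrder = null;
        }
    }
}
A PrimeFaces view would then bind to this bean with expressions like #{orderBean.orders} and #{orderBean.addOrder}, so the same Java developer who writes the REST or persistence layer can also maintain the page's behavior without leaving the language.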
Please follow the guide in the README.md file to build and run the showcase both on an application server, like WildFly, and in Quarkus. Let me know how it feels! Now, to come back to the microfrontend notion, which was our main concern at the beginning of this post, Michael Geers has written a well-documented article, as well as a book, in which he illustrates the most modern trends for building rich and powerful SPAs. But far from really demystifying the concept, these works show how complex the microfrontend topic is by offering us an extensive journey into a new world populated by strange creatures like Self-Contained Systems (SCS), verticalized systems, or the documents-to-applications continuum. Far from pretending to be able to clarify how all these new paradigms fit into the overall landscape of web application development, if I had to summarize in a single statement what microfrontends essentially are, I'd define them by quoting Michael: A composition of features which are owned by independent teams. Each team has a distinct area of business or mission it cares about and specializes in. A team is cross functional and develops its features end-to-end, from database to user interface. The figure below tries to illustrate this concept. After reading this definition, I can't refrain from thinking that it fits the Jakarta Faces custom components concept so well: as its name implies, it lets you create brand new custom visual components that you can plug into your applications and that different independent teams can own and specialize in. As luck would have it! :-).
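To illustrate that last point, here is a minimal, hedged sketch of a Jakarta Faces custom component written as a single Java class. The component family, tag name, and namespace below are made up for the example; a real component owned by an independent team would obviously render something richer than a greeting.
Java
import java.io.IOException;

import jakarta.faces.component.FacesComponent;
import jakarta.faces.component.UIComponentBase;
import jakarta.faces.context.FacesContext;
import jakarta.faces.context.ResponseWriter;

// A deliberately tiny custom component that renders a greeting badge.
@FacesComponent(value = "example.GreetingBadge", createTag = true,
                tagName = "greetingBadge", namespace = "http://example.com/ui")
public class GreetingBadge extends UIComponentBase {

    @Override
    public String getFamily() {
        return "example.badges";
    }

    @Override
    public void encodeBegin(FacesContext context) throws IOException {
        // Read the optional "who" attribute set on the tag, falling back to a default
        String who = (String) getAttributes().getOrDefault("who", "world");
        ResponseWriter writer = context.getResponseWriter();
        writer.startElement("span", this);
        writer.writeAttribute("class", "greeting-badge", null);
        writer.writeText("Hello, " + who + "!", null);
        writer.endElement("span");
    }
}
Another team's page could then declare the namespace and drop a <ex:greetingBadge who="DZone"/> tag wherever the feature is needed, which is exactly the kind of independently owned, pluggable feature the microfrontend definition describes.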
Artificial intelligence (AI) is transforming various industries and changing the way businesses operate. Although Python is often regarded as the go-to language for AI development, Java provides robust libraries and frameworks that make it an equally strong contender for creating AI-based applications. In this article, we explore using Java and Gradle for AI development by discussing popular libraries, providing code examples, and demonstrating end-to-end working examples. Java Libraries for AI Development Java offers several powerful libraries and frameworks for building AI applications, including: Deeplearning4j (DL4J) - A deep learning library for Java that provides a platform for building, training, and deploying neural networks, DL4J supports various neural network architectures and offers GPU acceleration for faster computations. Weka - A collection of machine learning algorithms for data mining tasks, Weka offers tools for data pre-processing, classification, regression, clustering, and visualization. Encog - A machine learning framework supporting various advanced algorithms, including neural networks, support vector machines, genetic programming, and Bayesian networks Setting up Dependencies With Gradle To begin AI development in Java using Gradle, set up the required dependencies in your project by adding the following to your build.gradle file: Groovy dependencies { implementation 'org.deeplearning4j:deeplearning4j-core:1.0.0-M1.1' implementation 'nz.ac.waikato.cms.weka:weka-stable:3.8.5' implementation 'org.encog:encog-core:3.4' } Code Examples Building a Simple Neural Network With DL4J This example demonstrates creating a basic neural network using the Deeplearning4j (DL4J) library. The code sets up a two-layer neural network architecture consisting of a DenseLayer with 4 input neurons and 10 output neurons, using the ReLU activation function, and an OutputLayer with 10 input neurons and 3 output neurons, using the Softmax activation function and Negative Log Likelihood as the loss function. The model is then initialized and can be further trained on data and used for predictions. Java import org.deeplearning4j.nn.api.OptimizationAlgorithm; import org.deeplearning4j.nn.conf.MultiLayerConfiguration; import org.deeplearning4j.nn.conf.NeuralNetConfiguration; import org.deeplearning4j.nn.conf.layers.DenseLayer; import org.deeplearning4j.nn.conf.layers.OutputLayer; import org.deeplearning4j.nn.multilayer.MultiLayerNetwork; import org.deeplearning4j.nn.weights.WeightInit; import org.nd4j.linalg.activations.Activation; import org.nd4j.linalg.learning.config.Sgd; import org.nd4j.linalg.lossfunctions.LossFunctions; public class SimpleNeuralNetwork { public static void main(String[] args) { MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder() .seed(123) .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) .updater(new Sgd(0.01)) .list() .layer(0, new DenseLayer.Builder().nIn(4).nOut(10) .weightInit(WeightInit.XAVIER) .activation(Activation.RELU) .build()) .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD) .nIn(10).nOut(3) .weightInit(WeightInit.XAVIER) .activation(Activation.SOFTMAX) .build()) .pretrain(false).backprop(true) .build(); MultiLayerNetwork model = new MultiLayerNetwork(conf); model.init(); } } Classification Using Weka This example shows how to use the Weka library for classification on the Iris dataset. 
The code loads the dataset from an ARFF file, sets the class attribute (the attribute we want to predict) to be the last attribute in the dataset, builds a Naive Bayes classifier using the loaded data, and classifies a new instance. Java import weka.classifiers.bayes.NaiveBayes; import weka.core.Instance; import weka.core.Instances; import weka.core.converters.ConverterUtils.DataSource; public class WekaClassification { public static void main(String[] args) throws Exception { DataSource source = new DataSource("data/iris.arff"); Instances data = source.getDataSet(); data.setClassIndex(data.numAttributes() - 1); NaiveBayes nb = new NaiveBayes(); nb.buildClassifier(data); Instance newInstance = data.instance(0); double result = nb.classifyInstance(newInstance); System.out.println("Predicted class: " + data.classAttribute().value((int) result)); } } Conclusion Java, with its rich ecosystem of libraries and frameworks for AI development, is a viable choice for building AI-based applications. By leveraging popular libraries like Deeplearning4j, Weka, and Encog, and using Gradle as the build tool, developers can create powerful AI solutions using the familiar Java programming language. The provided code examples demonstrate the ease of setting up and configuring AI applications using Java and Gradle. The DL4J example shows how to create a basic deep learning model that can be applied to tasks such as image recognition or natural language processing. The Weka example demonstrates how to use Java and the Weka library for machine learning tasks, specifically classification, which can be valuable for implementing machine learning solutions in Java applications, such as predicting customer churn or classifying emails as spam or not spam. Happy Learning!!
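As an optional follow-up to the two examples above, here are two short, hedged sketches showing how each one might be taken a step further. They are illustrative only: the tiny in-memory dataset, the epoch count, and the class names are assumptions made for the sake of the example, not recommended settings. The first sketch assumes it is appended to the main method of the SimpleNeuralNetwork class, right after model.init(), and feeds the network a few hand-crafted rows before asking it for predictions.
Java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;

// Four hand-crafted rows with 4 features each (matching nIn = 4 above)
INDArray features = Nd4j.create(new double[][] {
        {5.1, 3.5, 1.4, 0.2},
        {7.0, 3.2, 4.7, 1.4},
        {6.3, 3.3, 6.0, 2.5},
        {4.9, 3.0, 1.4, 0.2}
});
// One-hot labels for 3 classes (matching nOut = 3 above)
INDArray labels = Nd4j.create(new double[][] {
        {1, 0, 0},
        {0, 1, 0},
        {0, 0, 1},
        {1, 0, 0}
});

DataSet trainingData = new DataSet(features, labels);
for (int epoch = 0; epoch < 100; epoch++) {
    model.fit(trainingData); // one pass over the tiny dataset per iteration
}

// Class probabilities for every input row
INDArray predictions = model.output(features);
System.out.println(predictions);
The second sketch evaluates a Naive Bayes classifier on the same Iris dataset with 10-fold cross-validation instead of re-classifying an instance the model has already seen, which gives a more honest picture of its accuracy.
Java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaCrossValidation {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("data/iris.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // 10-fold cross-validation of a Naive Bayes classifier
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new NaiveBayes(), data, 10, new Random(1));

        System.out.println(eval.toSummaryString());
        System.out.println("Accuracy: " + eval.pctCorrect() + "%");
    }
}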
If you’re still building and delivering your software applications the traditional way, then you are missing out on a major innovation in the software development process or software development life cycle. To show you what I’m talking about, in this article, I will share how to create a CI/CD Pipeline with Jenkins, Containers, and Amazon ECS that deploys your application and overcomes the limitations of the traditional software delivery model. This innovation greatly affects deadlines, time to market, quality of the product, etc. I will take you through the whole step-by-step process of setting up a CI/CD Docker pipeline for a sample Node.js application. What Is a CI/CD Pipeline? A CI/CD Pipeline or Continuous Integration Continuous Delivery Pipeline is a set of instructions to automate the process of Software tests, builds, and deployments. Here are a few benefits of implementing CI/CD in your organization. Smaller code change: The ability of CI/CD Pipelines to allow the integration of a small piece of code at a time helps developers recognize any potential problem before too much work is completed. Faster delivery: Multiple daily releases or continual releases can be made a reality using CI/CD Pipelines. Observability: Having automation in place that generates extensive logs at each stage of the development process helps to understand if something goes wrong. Easier rollbacks: There are chances that the code that has been deployed may have issues. In such cases, it is very crucial to get back to the previous working release as soon as possible. One of the biggest advantages of using the CI/CD Pipelines is that you can quickly and easily roll back to the previous working release. Reduce costs: Having automation in place for repetitive tasks frees up the Developer and Operation guys’ time that could be spent on Product Development. Now, before we proceed with the steps to set up a CI/CD Pipeline with Jenkins, Containers, and Amazon ECS, let’s see, in short, what tools and technologies we will be using. CI/CD Docker Tool Stack GitHub: It is a web-based application or a cloud-based service where people or developers collaborate, store, and manage their application code using Git. We will create and store our sample Nodejs application code here. AWS EC2 Instance: AWS EC2 is an Elastic Computer Service provided by Amazon Web Services used to create Virtual Machines or Virtual Instances on AWS Cloud. We will create an EC2 instance and install Jenkins and other dependencies in it. Java: This will be required to run the Jenkins Server. AWS CLI: aws-cli i.e AWS Command Line Interface, is a command-line tool used to manage AWS Services using commands. We will be using it to manage AWS ECS Task and ECS Service. Node.js and NPM: Node.js is a back-end JavaScript runtime environment, and NPM is a package manager for Node. We will be creating a CI CD Docker Pipeline for the Node.js application. Docker: Docker is an open-source containerization platform used for developing, shipping, and running applications. We will use it to build Docker Images of our sample Node.js application and push/pull them to/from AWS ECR. Jenkins: Jenkins is an open-source, freely available automation server used to build, test, and deploy software applications. We will be creating our CI/CD Docker Pipeline to build, test, and deploy our Node.js application on AWS ECS using Jenkins AWS ECR: AWS Elastic Container Registry is a Docker Image Repository fully managed by AWS to easily store, share, and deploy container images. 
We will be using AWS ECR to store Docker Images of our sample Node.js application. AWS ECS: AWS Elastic Container Service is a container orchestration service fully managed by AWS to easily deploy, manage, and scale containerized applications. We will be using it to host our sample Node.js application. Architecture This is how our architecture will look like after setting up the CI/CD Pipeline with Docker. After the CI/CD Docker Pipeline is successfully set up, we will push commits to our GitHub repository, and in turn, GitHub Webhook will trigger the CI/CD Pipeline on Jenkins Server. Jenkins Server will then pull the latest code, perform unit tests, build a docker image, and push it to AWS ECR. After the image is pushed to AWS ECR, the same image will be deployed in AWS ECS by Jenkins. CI/CD Workflow and Phases Workflow CI and CD Workflow allows us to focus on Development while it carries out the tests, build, and deployments in an automated way. Continuous Integration: This allows the developers to push the code to the Version Control System or Source Code Management System, build & test the latest code pushed by the developer, and generate and store artifacts. Continuous Delivery: This is the process that lets us deploy the tested code to the Production whenever required. Continuous Deployment: This goes one step further and releases every single change without any manual intervention to the customer system every time the production pipeline passes all the tests. Phases The primary goal of the automated CI/CD pipeline is to build the latest code and deploy it. There can be various stages as per the need. The most common ones are mentioned below. Trigger: The CI/CD pipeline can do its job on the specified schedule when executed manually or triggered automatically on a particular action in the Code Repository. Code pull: In this phase, the pipeline pulls the latest code whenever the pipeline is triggered. Unit tests: In this phase, the pipeline performs tests that are there in the codebase. This is also referred to as unit tests. Build or package: Once all the tests pass, the pipeline moves forward and builds artifacts or docker images in case of dockerized applications. Push or store: In this phase, the code that has been built is pushed to the Artifactory or Docker Repository in case of dockerized applications. Acceptance tests: This phase or stage of the pipeline validates if the software behaves as intended. It is a way to ensure that the software or application does what it is meant to do. Deploy: This is the final stage in any CI/CD pipeline. In this stage, the application is ready for delivery or deployment. Deployment Strategy A deployment strategy is a way in which containers of the micro-services are taken down and added. There are various options available; however, we will only discuss the ones that are available and supported by ECS Rolling Updates In rolling updates, the scheduler in the ECS Service replaces the currently running tasks with new ones. The tasks in the ECS cluster are nothing but running containers created out of the task definition. Deployment configuration controls the number of tasks that Amazon ECS adds or removes from the service. The lower and the upper limit on the number of tasks that should be running is controlled by minimumHealthyPercent and maximumPercent, respectively. 
minimumHealthyPercent example: If the value of minimumHealthyPercent is 50 and the desired task count is four, then the scheduler can stop two existing tasks before starting two new tasks. maximumPercent example: If the value of maximumPercent is 200 and the desired task count is four, then the scheduler can start four new tasks before stopping four existing tasks. If you want to learn more about this, visit the official documentation here. Blue/Green Deployment The blue/green deployment strategy enables the developer to verify a new deployment before sending traffic to it by installing an updated version of the application as a new replacement task set. There are primarily three ways in which traffic can shift during a blue/green deployment. Canary — Traffic is shifted in two increments. You specify the percentage of traffic shifted to your updated task set in the first increment and the interval, in minutes, before the remaining traffic is shifted in the second increment. Linear — Traffic is shifted in equal increments. You specify the percentage of traffic shifted in each increment and the number of minutes between each increment. All-at-once — All traffic is shifted from the original task set to the updated task set all at once. To learn more about this, visit the official documentation here. Out of these two strategies, we will be using the rolling-updates deployment strategy in our demo application. Dockerize Node.js App Now, let's get started and make our hands dirty. The Dockerfile for the sample Node.js application is as follows. There is no need to copy-paste this file. It is already available in the sample git repository that you will clone shortly. Let's just try to understand the instructions of our Dockerfile. FROM node:12.18.4-alpine sets our base image for the container. WORKDIR /app sets the working directory in the container. ENV PATH /app/node_modules/.bin:$PATH prepends /app/node_modules/.bin to the PATH variable. COPY package.json ./ copies package.json into the working directory of the container. RUN npm install installs the dependencies. COPY . ./ copies the application files and folders from the host machine into the container. EXPOSE 3000 exposes port 3000 of the container. CMD ["node", "./src/server.js"] starts the application. This is the Dockerfile that we will use to create a Docker image. Setup GitHub Repositories Create a New Repository Go to GitHub, create an account if you don't have one already, or log in to your account, and create a new repository. You can name it as per your choice; however, I would recommend using the same name to avoid any confusion. You will get a screen as follows: copy the repository URL and keep it handy. Call this URL the GitHub Repository URL and note it down in the text file on your system. Note: Create a new text file on your system and note down all the details that will be required later. Create a GitHub Token This will be required for authentication purposes. It will be used instead of a password for Git over HTTP or can be used to authenticate to the API over Basic Authentication. Click on the user icon in the top-right, go to "Settings," then click on the "Developer settings" option in the left panel. Click on the "Personal access tokens" option and "Generate new token" to create a new token. Tick the "repo" checkbox; the token will then have "full control of private repositories." You should see your token created now.
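For reference, here is the complete Dockerfile assembled from the instructions explained above; as noted, it already exists as-is in the sample repository, so there is nothing to copy-paste.
Dockerfile
# Base image for the container
FROM node:12.18.4-alpine

# Working directory inside the container
WORKDIR /app

# Make locally installed node binaries available on the PATH
ENV PATH /app/node_modules/.bin:$PATH

# Copy package.json and install the dependencies
COPY package.json ./
RUN npm install

# Copy the application files and folders into the container
COPY . ./

# The sample application listens on port 3000
EXPOSE 3000
CMD ["node", "./src/server.js"]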
Clone the Sample Repository Check your present working directory: pwd Note: You are in the home directory, i.e., /home/ubuntu. Clone my sample repository containing all the required code: git clone Create a new repository. This repository will be used for CI/CD Pipeline setup: git clone Copy all the code from my node.js repository to the newly created demo-nodejs-app repository: cp -r nodejs/* demo-nodejs-app/ Change your working directory: cd demo-nodejs-app/ Note: For the rest of the article, do not change your directory. Stay in the same directory. Here it is /home/ubuntu/demo-nodejs-app/, and execute all the commands from there. ls -l git status Push Your First Commit to the Repository Check your present working directory. It should be the same. Here it is /home/ubuntu/demo-nodejs-app/: pwd Set a username for your git commit message: git config user.name "Rahul" Set an email for your git commit message: git config user.email "<>" Verify the username and email you set: git config --list Check the status, see files that have been changed or added to your git repository: git status Add files to the git staging area: git add Check the status, see files that have been added to the git staging area: git status Commit your files with a commit message: git commit -m "My first commit" Push the commit to your remote git repository: git push
Prerequisites of the EC2 Instance Verify that the OS is Ubuntu 18.04 LTS: cat /etc/issue Check the RAM; a minimum of 2 GB is what we require: free -m The user that you use to log in to the server should have sudo privileges. "ubuntu" is the user available with sudo privileges for EC2 instances created using the "Ubuntu 18.04 LTS" AMI: whoami Check your present working directory; it will be your home directory: pwd Install Java, JSON Processor jq, Node.js/NPM, and aws-cli on the EC2 Instance Update your system by downloading package information from all configured sources: sudo apt update Search for and install Java 11: sudo apt search openjdk sudo apt install openjdk-11-jdk Install the jq command, the JSON processor: sudo apt install jq Install Node.js 12 and NPM: curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash - sudo apt install nodejs Install the aws cli tool: sudo apt install awscli Check the Java version: java --version Check the jq version: jq --version Check the Node.js version: node --version Check the NPM version: npm --version Check the aws cli version: aws --version Note: Make sure all your versions match the versions seen in the above image. Install Jenkins on the EC2 Instance Jenkins can be installed from the Debian repository: wget -q -O - http://pkg.jenkins-ci.org/debian/jenkins-ci.org.key | sudo apt-key add - sudo sh -c 'echo deb http://pkg.jenkins-ci.org/debian binary/ > /etc/apt/sources.list.d/jenkins.list' Update the apt package index: sudo apt-get update Install Jenkins on the machine: sudo apt-get install jenkins Check the service status to see whether it is running or not: service jenkins status You should have your Jenkins up and running now. You may refer to the official documentation here if you face any issues with the installation. Install Docker on the EC2 Instance Install packages to allow apt to use a repository over HTTPS: sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release Add Docker's official GPG key: curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg Set up the stable repository: echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null Update the apt package index: sudo apt-get update Install the latest version of Docker Engine and containerd: sudo apt-get install docker-ce docker-ce-cli containerd.io Check the docker version: docker --version Create a "docker" group; this group may already exist: sudo groupadd docker Add the "ubuntu" user to the "docker" group: sudo usermod -aG docker ubuntu Add the "jenkins" user to the "docker" group: sudo usermod -aG docker jenkins Test if you can create docker objects using the "ubuntu" user: docker run hello-world Switch to the "root" user: sudo -i Switch to the "jenkins" user: su jenkins Test if you can create docker objects using the "jenkins" user: docker run hello-world Exit from the "jenkins" user: exit Exit from the "root" user: exit Now you should be back in the "ubuntu" user. You may refer to the official documentation here if you face any issues with the installation. Configure the Jenkins Server After Jenkins has been installed, the first step is to extract its password: sudo cat /var/lib/jenkins/secrets/initialAdminPassword Hit the URL in the browser. Jenkins URL: http://<public-ip-of-the-ec2-instance>:8080 Select the "Install suggested plugins" option. Specify the username and password for the new admin user to be created. You can use this user as an admin user.
This URL field will be auto-filled. Click on the “Save and Finish” button to proceed. Your Jenkins Server is ready now. Here is what its Dashboard looks like. Install Plugins Let’s install all the plugins that we will need. Click on “Manage Jenkins” in the left panel. Here is a list of plugins that we need to install CloudBees AWS Credentials:Allows storing Amazon IAM credentials keys within the Jenkins Credentials API. Docker Pipeline:This plugin allows building, testing, and using Docker images from Jenkins Pipeline. Amazon ECR:This plugin provides integration with AWS Elastic Container Registry (ECR)Usage: AWS Steps:This plugin adds Jenkins pipeline steps to interact with the AWS API. In the “Available” tab, search all these plugins and click on “Install without restart.” You will see the screen as follows after the plugins have been installed successfully. Create Credentials in Jenkins CloudBees AWS Credentials plugin will come to the rescue here. Go to “Manage Jenkins,” and then click on “Manage Credentials." Click on “(global)” “Add credentials”. Select Kind as “AWS Credentials” and provide ID as “demo-admin-user.” This can be provided as per your choice. Keep a note of this ID in the text file. Specify the Access Key and Secret Key of the IAM user we created in the previous steps. Click on “OK” to store the IAM credentials. Follow the same step, and this time select Kind as “Username with password” to store the GitHub Username and Token we created earlier. Click on “Ok” to store the GitHub credentials. You should now have IAM and GitHub credentials in your Jenkins. Create a Jenkins Job Go to the main dashboard and click on “New Item” to create a Jenkins Pipeline. Select the “Pipeline” and name it “demo-job,” or provide a name of your choice. Tick the “GitHub project” checkbox under the “General” tab, and provide the GitHub Repository URL of the one we created earlier. Also, tick the checkbox “GitHub hook trigger for GitScm polling” under the “Build Trigger” tab. Under the “Pipeline” tab, select “Pipeline script from the SCM” definition, specify our repository URL, and select the credential we created for Github. Check the branch name if it matches the one you will be using for your commits. Review the configurations and click on “Save” to save your changes to the pipeline. Now you can see the pipeline we just created. Integrate GitHub and Jenkins The next step is to integrate Github with Jenkins so that whenever there is an event on the Github Repository, it can trigger the Jenkins Job. Go to the settings tab of the repository and click on “Webhooks” in the left panel. You can see the “Add webhook” button. Click on it to create a webhook. Provide the Jenkins URL with context as “/github-webhook/.” The URL will look as follows.Webhook URL: http://<Jenkins-IP>:8080/github-webhook/You can select the events of your choice; however, for the sake of simplicity, I have chosen “Send me everything.” Make sure the “Active” checkbox is checked. Click on “Add webhook” to create a webhook that will trigger the Jenkins job whenever there is any kind of event in the GitHub Repository. You should see your webhook. Click on it to see if it has been configured correctly or not. Click on the “Recent Deliveries” tab, and you should see a green tick mark. The green tick mark shows that the webhook was able to connect to the Jenkins Server. Deploy the Node.js Application to the ECS Cluster Before we trigger the Pipeline from GitHub Webhook, let's try to execute it manually. 
Build the Job Manually Go to the job we created and build it. If you see its logs, you will see that it failed. The reason is that we have not yet assigned values to the variables we have in our Jenkinsfile. Push Your Second Commit Reminder Note: For the rest of the article, do not change your directory. Stay in the same directory, i.e., /home/ubuntu/demo-nodejs-app, and execute all the commands from here. Assign values to the variables in the Jenkinsfile To overcome the above error, you need to make some changes to the Jenkinsfile. We have variables in that file, and we need to assign values to those variables to deploy our application to the ECS cluster we created. Assign correct values to the variables having "CHANGE_ME": cat Jenkinsfile Here is the list of variables for your convenience. We have the following variables in the Jenkinsfile. AWS_ACCOUNT_ID="CHANGE_ME": Assign your AWS account number here. AWS_DEFAULT_REGION="CHANGE_ME": Assign the region you created your ECS Cluster in. CLUSTER_NAME="CHANGE_ME": Assign the name of the ECS Cluster that you created. SERVICE_NAME="CHANGE_ME": Assign the Service name that got created in the ECS Cluster. TASK_DEFINITION_NAME="CHANGE_ME": Assign the Task name that got created in the ECS Cluster. DESIRED_COUNT="CHANGE_ME": Assign the number of tasks you want to be created in the ECS Cluster. IMAGE_REPO_NAME="CHANGE_ME": Assign the ECR Repository URL. IMAGE_TAG="${env.BUILD_ID}": Do not change this. REPOSITORY_URI = "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com/${IMAGE_REPO_NAME}": Do not change this. registryCredential = "CHANGE_ME": Assign the name of the credentials you created in Jenkins to store the AWS Access Key and Secret Key. Check the status to confirm that the file has been changed: git status cat Jenkinsfile Add the file to the git staging area, commit it, and then push it to the remote GitHub repository: git status git add Jenkinsfile git commit -m "Assigned environment specific values in Jenkinsfile" git push Error on Jenkins Server After pushing the commit, the Jenkins Pipeline will get triggered. However, you will see an error, "Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock", in your Jenkins job. The reason for this is that the "jenkins" user used by the Jenkins job is not allowed to create docker objects. To give permission to the "jenkins" user, we added it to the "docker" group in the previous step; however, we did not restart the Jenkins service after that. I kept this deliberately so that I could show you the need to add the "jenkins" user to the "docker" group in your EC2 Instance. Now you know what needs to be done to overcome the above error. Restart the Jenkins service: sudo service jenkins restart Check if the Jenkins service has started or not: sudo service jenkins status Push Your Third Commit Make some changes in README.md to commit, push, and test whether the Pipeline gets triggered automatically or not: vim README.md Add, commit, and push the file: git status git diff README.md git add README.md git commit -m "Modified README.md to trigger the Jenkins job after restarting the Jenkins service" git push This time, you can observe that the job must have been triggered automatically. Go to the Jenkins job and verify the same. This is what the Stage View looks like. It shows us the stages that we have specified in our Jenkinsfile. Check the Status of the Task in the ECS Cluster Go to the Cluster, click on the "Tasks" tab, and then open the running "Task." Click on the "JSON" tab and verify the image.
The image tag should match the Jenkins Build number. In this case, it is “6,” and it matches my Jenkins Job Build number. Hit the ELB URL to check if the Nodejs application is available or not. You should get the message as follows in the browser after hitting the ELB URL. Push Your Fourth Commit Open the “src/server.js” file and make some changes in the display message to test the CI CD Pipeline again.vim src/server.js Check the files that have been changed. In this case, only one file can be seen as changed.git status Check the difference that your change has caused in the file.git diff src/server.js Add the file that you changed to the git staging area.git add src/server.js Check the status of the local repository.git status Add a message to the commit.git commit -m “Updated welcome message” Push your change to the remote repository.git push Go to the Task. This time, you will see two tasks running. One with the older revision and one with the newer revision. You see two tasks because of the rolling-update deployment strategy configured by default in the cluster. Wait for around 2-3 minutes, and you should only have one task running with the latest revision. Again, hit the ELB URL, and you should see your changes. In this case, we had changed the display message.Congratulations! You have a working Jenkins CI CD Pipeline to deploy your Nodejs containerized application on AWS ECS whenever there is a change in your source code. Cleanup the Resources We Created If you were just trying to set up a CI/CD pipeline to get familiar with it or for POC purposes in your organization and no longer need it, it is always better to delete the resources you created while carrying out the POC. As part of this CI/CD pipeline, we created a few resources. We created the below list to help you delete them. Delete the GitHub Repository Delete the GitHub Token Delete the IAM User Delete the EC2 Instance Delete the ECR Repository Delete the ECS Cluster Deregister the Task Definition Summary Finally, here is the summary of what you have to do to set up a CI/CD Docker pipeline to deploy a sample Node.js application on AWS ECS using Jenkins. Clone the existing sample GitHub Repository Create a new GitHub Repository and copy the code from the sample repository in it Create a GitHub Token Create an IAM User Create an ECR Repository Create an ECS Cluster Create an EC2 Instance for setting up the Jenkins Server Install Java, JSON processor jq, Node.js, and NPM on the EC2 Instance Install Jenkins on the EC2 Instance Install Docker on the EC2 Instance Install Plugins Create Credentials in Jenkins Create a Jenkins Job Integrate GitHub and Jenkins Check the deployment Cleanup the resources Conclusion A CI/CD Pipeline serves as a way of automating your software applications’ builds, tests, and deployments. It is the backbone of any organization with a DevOps culture. It has numerous benefits for software development, and it boosts your business greatly. In this blog, we demonstrated the steps to create a Jenkins CI/CD Docker Pipeline to deploy a sample Node.js containerized application on AWS ECS. We saw how GitHub Webhooks can be used to trigger the Jenkins pipeline on every push to the repository, which in turn deploys the latest docker image to AWS ECS. CI/CD Pipelines with Docker is best for your organization to improve code quality and deliver software releases quickly without any human errors. We hope this blog helped you learn more about the integral parts of the CI/CD Docker Pipeline.
This is an article from DZone's 2023 Database Systems Trend Report.For more: Read the Report The cloud is seamlessly integrated with almost all aspects of life, like business, personal computing, social media, artificial intelligence, Internet of Things, and more. In this article, we will dive into clouds and discuss their optimal suitability based on different types of organizational or individual needs. Public vs. Private Cloud Evaluation Mixing and matching cloud technologies provides a lot of ease and flexibility, but it comes with a lot of responsibilities, too. Sometimes it is difficult to make a choice between the types of cloud available, i.e., public, private, or hybrid cloud. An evaluation based on providers, cloud, and project demand is very crucial to selecting the right type. When evaluating a public and private cloud, it is important to consider the factors listed in Table 1: PUBLIC VS. PRIVATE CLOUD Public Cloud Private Cloud Best use cases Good for beginners Testing new features with minimal cost and setup Handling protected data and industry compliance Ensuring customized security Dedicated resources Cost Pay per usage Can be expensive Workload Suitable for fluctuating workloads Offers scalability and flexibility Suitable for predictable workloads Data confidentiality requirements (e.g., hospitals with confidential patient data) Infrastructure Shared infrastructure Can use existing on-premises investments Services Domain-specific services, including healthcare, education, finance, and retail Industry-specific customization options, including infrastructure, hardware, software stack, security, and access control Presence Global Reduces latency for geographically distributed customers Effective for limited geographic audiences and targeted client needs Table 1 Hybrid Cloud Both public and private clouds are useful in various ways, so it is possible to choose both to gain maximum advantages. This approach is achieved by adopting a hybrid cloud. Let's understand some of the key factors to consider: Hybrid clouds are suitable when the workload is both predicted or variable. A public cloud provides scalability and on-demand resources during peak seasons, while a private cloud handles base workload during off-seasons. To save money, public clouds can be shut off during non-peak seasons and store non-sensitive data. Private clouds generally cost more, but it is necessary for storing sensitive data. Private clouds are used for confidential data; non-regulated or non-sensitive information can be stored in a public cloud. Hybrid cloud is suitable for businesses operating in multiple regions. Private clouds serve specific regions, while public cloud providers offer global reach and accessibility for other services. Before adopting a hybrid approach, thorough analysis should be done, keeping factors such as workload patterns, budget, and compliance needs in consideration. Figure 1: Hybrid cloud combines features of public and private cloud Key Considerations for DBAs and Non-DBAs To achieve overall operational efficiency, it is essential for DBAs and non-DBAs to understand the key considerations related to database management. These considerations will help in effective collaboration, streamlined processes, and optimized data usage within an organization. Cost Optimization Cost is one of the major decision-making factors for everyone. It is very crucial to consider cost optimization strategies surrounding data, backup and archival strategies, and storage. 
Data is one of the most important factors when it comes to cost saving. It is always good to know your data patterns and to understand who the end user is so the classification of data is optimized for storage. No duplicates in data means no extra storage used. Also, an in-depth understanding of the types of storage available is required in order to get maximum benefit within budget. Classify your data into a structured or unstructured format. It is important for DBAs to analyze data that is no longer actively used but might be needed in the future. By moving this data to archival storage, DBAs can effectively save primary storage space. Implementing efficient backup strategies can help minimize redundant storage requirements, hence less cost. Data, storage, and cost are directly proportional to each other, so it is important to review these three for maximum benefits and performance with minimum costs from cloud providers. Available storage options include object storage, block storage, thin provisioning, and tiered storage. Figure 2: Data is directly proportional to the storage used and cost savings Performance To optimize storage speed, cloud providers use technologies like CDNs. Network latency can be reduced through strategies such as data compression, CDNs, P2P networking, edge computing, and geographic workload distribution. Larger memory capacity improves caching and overall performance. Computing power also plays a vital role. Factors like CPU-intensive tasks and parallel processing should be considered. GPUs or TPUs offer improved performance for intensive workloads in machine learning, data analytics, and video processing. Disaster Recovery Data should be available to recover after a disaster. If choosing a private cloud, be ready with backup and make sure you will be able to recover! It's important to distribute data, so if one area is affected, other locations can serve and business can run as usual. Security Cloud providers have various levels of data protection: With multi-factor authentication, data will be secure by adding an extra layer of verification. A token will be sent to a mobile device, via email, or by a preferred method like facial recognition. The right data should be accessible by the right consumer. To help these restrictions, technologies like identity and access management or role-based access control assign permission to the users based on assigned roles. Virtual private networks could be your savior. They provide secure private network connections over public networks and encrypted tunnels between the consumer's device and the cloud that will protect data from intruders. With encryption algorithms, clouds protect data at rest and in transit within infrastructure. However, it is always a good idea for a DBA and non-DBA to configure encryption settings within an app to achieve an organization's required security. Scalability When working with a cloud, it is important to understand how scalability is achieved: Cloud providers deploy virtual instances of servers, storage, and networks, which result in faster provisioning and allocation of virtual resources on demand. Serverless computing allows developers to focus on writing code. Infrastructure, scaling, and other resources required to handle incoming requests will be handled by cloud providers. Cloud providers suggest horizontal scaling instead of vertical scaling. 
By adding more servers instead of upgrading hardware in existing machines, the cloud will develop a distributed workload, which increases capacity. Vendor Lock-In For organizations looking for flexibility and varying cloud deployments, the risk of vendor lock-in can be limiting. To minimize this risk, implementing a hybrid cloud approach enables the distribution of data, flexibility, and easy migration. Using multiple cloud providers through a hybrid model helps avoid dependence on a single vendor with diverse capabilities. Open data formats and vendor-independent storage solutions will help in easy porting. In addition, containerization technologies for applications allow flexible vendor selection. It is essential for organizations to consider exit strategies, including contractual support, to ensure smooth transitions between vendors and to reduce challenges. Next Steps Cloud computing is a huge revolution for not only the IT industry but also for individuals. Here are some next steps based on the features and extensive usage. Extensive Acceptance Cloud computing is a long-term deal. It offers flexibility and the ability for developers to focus on code alone, even if they don't have prior knowledge or a dedicated team to maintain infrastructure. Other benefits include increased innovation, since most of the hustle is taken care by the cloud providers, and little-to no downtime, which is great for businesses that operate 24/7. Database Options To Mix and Match in the Cloud When we talk about cloud computing, there are many databases available. The following are some popular database options: DATABASE OPTIONS NoSQL Relational Serverless Managed Database Services Purpose Structured and unstructured data Structured and complex queries Unpredictable workloads Simplify database management Pros Scalability, flexibility, pairs well with cloud computing Strong data consistency, robust query capabilities Pay per usage, server management Scaling, automated backups, maintenance, cost-effective Cons Data consistency Scalability, fixed schema Vendor lock-in Dependent on providers Prevalence Will grow in popularity Will continue to stick around Will grow in popularity Will be a secure choice Examples MongoDB, Cassandra MySQL, Postgres, Oracle Google Cloud Firestore, Amazon Aurora Serverless Table 2 Conclusion It is important to note that there are scenarios where a private cloud might be preferred, such as when strict data security and compliance requirements exist, or when an organization needs maximum control over infrastructure and data. Each organization should evaluate its specific needs and consider a hybrid cloud approach, as well. Cloud providers often introduce new instances, types, updates, and features, so it is always good to review their documentation carefully for the most up-to-date information. By mindfully assessing vendor lock-in risks and implementing appropriate strategies, businesses can maintain flexibility and control over their cloud deployments while minimizing the challenges associated with switching cloud providers in the future. In this article, I have shared my own opinions and experiences as a DBA — I hope it offered additional insights and details about cloud options that can help to improve performance and cost savings based on individual objectives. This is an article from DZone's 2023 Database Systems Trend Report.For more: Read the Report
GitHub Actions has a large ecosystem of high-quality third-party actions, plus built-in support for executing build steps inside Docker containers. This means it's easy to run end-to-end tests as part of a workflow, often only requiring a single step to run testing tools with all the required dependencies. In this post, I show you how to run browser tests with Cypress and API tests with Postman as part of a GitHub Actions workflow. Getting Started GitHub Actions is a hosted service, so all you need to get started is a GitHub account. All other dependencies, like Software Development Kits (SDKs) or testing tools, are provided by the Docker images or GitHub Actions published by testing platforms. Running Browser Tests With Cypress Cypress is a browser automation tool that lets you interact with web pages in much the same way an end user would, for example by clicking on buttons and links, filling in forms, and scrolling the page. You can also verify the content of a page to ensure the correct results are displayed. The Cypress documentation provides an example first test which has been saved to the junit-cypress-test GitHub repo. The test is shown below: describe('My First Test', () => { it('Does not do much!', () => { expect(true).to.equal(true) }) }) This test is configured to generate a JUnit report file in the cypress.json file: { "reporter": "junit", "reporterOptions": { "mochaFile": "cypress/results/results.xml", "toConsole": true } } The workflow file below executes this test with the Cypress GitHub Action, saves the generated video file as an artifact, and processes the test results. You can find an example of this workflow in the junit-cypress-test repository: name: Cypress on: push: workflow_dispatch: jobs: build: runs-on: ubuntu-latest steps: - name: Checkout uses: actions/checkout@v1 - name: Cypress run uses: cypress-io/github-action@v2 - name: Save video uses: actions/upload-artifact@v2 with: name: sample_spec.js.mp4 path: cypress/videos/sample_spec.js.mp4 - name: Report uses: dorny/test-reporter@v1 if: always() with: name: Cypress Tests path: cypress/results/results.xml reporter: java-junit fail-on-error: true The official Cypress GitHub action is called to execute tests with the default options: - name: Cypress run uses: cypress-io/github-action@v2 Cypress generates a video file capturing the browser as the tests are run. You save the video file as an artifact to be downloaded and viewed after the workflow completes: - name: Save video uses: actions/upload-artifact@v2 with: name: sample_spec.js.mp4 path: cypress/videos/sample_spec.js.mp4 The test results are processed by the dorny/test-reporter action. Note that test reporter has the ability to process Mocha JSON files, and Cypress uses Mocha for reporting, so an arguably more idiomatic solution would be to have Cypress generate Mocha JSON reports. Unfortunately, there is a bug in Cypress that prevents the JSON reporter from saving results as a file. Generating JUnit report files is a useful workaround until this issue is resolved: - name: Report uses: dorny/test-reporter@v1 if: always() with: name: Cypress Tests path: cypress/results/results.xml reporter: java-junit fail-on-error: true Here are the results of the test: The video file artifact is listed in the Summary page: Not all testing platforms provide a GitHub action, in which case you can execute steps against a standard Docker image. This is demonstrated in the next section. 
Running API Tests With Newman Unlike Cypress, Postman does not provide an official GitHub action. However, you can use the postman/newman Docker image directly inside a workflow. You can find an example of the workflow in the junit-newman-test repository: name: Cypress on: push: workflow_dispatch: jobs: build: runs-on: ubuntu-latest steps: - name: Checkout uses: actions/checkout@v1 - name: Run Newman uses: docker://postman/newman:latest with: args: run GitHubTree.json --reporters cli,junit --reporter-junit-export results.xml - name: Report uses: dorny/test-reporter@v1 if: always() with: name: Cypress Tests path: results.xml reporter: java-junit fail-on-error: true The uses property for a step can either be the name of a published action, or can reference a Docker image directly. In this example, you run the postman/newman docker image, with the with.args parameter defining the command-line arguments: - name: Run Newman uses: docker://postman/newman:latest with: args: run GitHubTree.json --reporters cli,junit --reporter-junit-export results.xml The resulting JUnit report file is then processed by the dorny/test-reporter action: - name: Report uses: dorny/test-reporter@v1 if: always() with: name: Cypress Tests path: results.xml reporter: java-junit fail-on-error: true Here are the results of the test: Behind the scenes, GitHub Actions executes the supplied Docker image with a number of standard environment variables relating to the workflow and with volume mounts that allow the Docker container to persist changes (like the report file) on the main file system. The following is an example of the command to execute a step in a Docker image: /usr/bin/docker run --name postmannewmanlatest_fefcec --label f88420 --workdir /github/workspace --rm -e INPUT_ARGS -e HOME -e GITHUB_JOB -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_REPOSITORY_OWNER -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RETENTION_DAYS -e GITHUB_RUN_ATTEMPT -e GITHUB_ACTOR -e GITHUB_WORKFLOW -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GITHUB_EVENT_NAME -e GITHUB_SERVER_URL -e GITHUB_API_URL -e GITHUB_GRAPHQL_URL -e GITHUB_WORKSPACE -e GITHUB_ACTION -e GITHUB_EVENT_PATH -e GITHUB_ACTION_REPOSITORY -e GITHUB_ACTION_REF -e GITHUB_PATH -e GITHUB_ENV -e RUNNER_OS -e RUNNER_NAME -e RUNNER_TOOL_CACHE -e RUNNER_TEMP -e RUNNER_WORKSPACE -e ACTIONS_RUNTIME_URL -e ACTIONS_RUNTIME_TOKEN -e ACTIONS_CACHE_URL -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/_temp/_runner_file_commands":"/github/file_commands" -v "/home/runner/work/junit-newman-test/junit-newman-test":"/github/workspace" postman/newman:latest run GitHubTree.json --reporters cli,junit --reporter-junit-export results.xml This is a complex command, but there are a few arguments that we're interested in. The -e arguments define environment variables for the container. You can see that dozens of workflow environment variables are exposed. The --workdir /github/workspace argument overrides the working directory of the Docker container, while the -v "/home/runner/work/junit-newman-test/junit-newman-test":"/github/workspace" argument mounts the workflow workspace to the /github/workspace directory inside the container. 
This has the effect of mounting the working directory inside the Docker container, which exposes the checked-out files and allows any newly created files to persist once the container is shut down. Because every major testing tool provides a supported Docker image, the process you used to run Newman can be used to run most other testing platforms. Conclusion GitHub Actions has enjoyed widespread adoption among developers, and many platforms provide supported actions for use in workflows. For those cases where there is no suitable action available, GitHub Actions provides an easy way to execute a standard Docker image as part of a workflow. In this post, you learned how to run the Cypress action to execute browser-based tests and how to run the Newman Docker image to execute API tests. Happy deployments!
This is an article from DZone's 2023 Database Systems Trend Report.For more: Read the Report Hearing the vague statement, "We have a problem with the database," is a nightmare for any database manager or administrator. Sometimes it's true, sometimes it's not, and what exactly is the issue? Is there really a database problem? Or is it a problem with networking, an application, a user, or another possible scenario? If it is a database, what is wrong with it? Figure 1: DBMS usage Databases are a crucial part of modern businesses, and there are a variety of vendors and types to consider. Databases can be hosted in a data center, in the cloud, or in both for hybrid deployments. The data stored in a database can be used in various ways, including websites, applications, analytical platforms, etc. As a database administrator or manager, you want to be aware of the health and trends of your databases. Database monitoring is as crucial as databases themselves. How good is your data if you can't guarantee its availability and accuracy? Database Monitoring Considerations Database engines and databases are systems hosted on a complex IT infrastructure that consists of a variety of components: servers, networking, storage, cables, etc. Database monitoring should be approached holistically with consideration of all infrastructure components and database monitoring itself. Figure 2: Database monitoring clover Let's talk more about database monitoring. As seen in Figure 2, I'd combine monitoring into four pillars: availability, performance, activity, and compliance. These are broad but interconnected pillars with overlap. You can add a fifth "clover leaf" for security monitoring, but I include that aspect of monitoring into activity and compliance, for the same reason capacity planning falls into availability monitoring. Let's look deeper into monitoring concepts. While availability monitoring seems like a good starting topic, I will deliberately start with performance since performance issues may render a database unavailable and because availability monitoring is "monitoring 101" for any system. Performance Monitoring Performance monitoring is the process of capturing, analyzing, and alerting to performance metrics of hardware, OS, network, and database layers. It can help avoid unplanned downtimes, improve user experience, and help administrators manage their environments efficiently. Native Database Monitoring Most, if not all, enterprise-grade database systems come with a set of tools that allow database professionals to examine internal and/or external database conditions and the operational status. These are system-specific, technical tools that require SME knowledge. In most cases, they are point-in-time performance data with limited or non-existent historical value. Some vendors provide additional tools to simplify performance data collection and analysis. With an expansion of cloud-based offerings (PaaS or IaaS), I've noticed some improvements in monitoring data collection and the available analytics and reporting options. However, native performance monitoring is still a set of tools for a database SME. Enterprise Monitoring Systems Enterprise monitoring systems (EMSs) offer a centralized approach to keeping IT systems under systematic review. Such systems allow monitoring of most IT infrastructure components, thus consolidating supervised systems with a set of dashboards. There are several vendors offering comprehensive database monitoring systems to cover some or all your monitoring needs. 
Enterprise Monitoring Systems
Enterprise monitoring systems (EMSs) offer a centralized approach to keeping IT systems under systematic review. Such systems can monitor most IT infrastructure components, consolidating the supervised systems into a set of dashboards. Several vendors offer comprehensive database monitoring systems that cover some or all of your monitoring needs. Such solutions can span multiple database engines or be specific to a particular engine or monitoring aspect. For instance, if you only need to monitor SQL Server instances and are interested in the performance of your queries, then you need a monitoring system that identifies bottlenecks and contention.
Consider environments with thousands of database instances (on premises and in the cloud) scattered across data centers around the globe. Monitoring complexity grows with the number of monitored devices, the diversity of database types, and the geographic spread of your data centers and of the data you monitor. It is imperative to have a global view of all database systems under one management umbrella and the ability to identify issues, preferably before they impact your users. EMSs are designed to help organizations align database monitoring with IT infrastructure monitoring, and most solutions include an out-of-the-box set of dashboards, reports, graphs, alerts, useful tips, and health history and trend analytics. They also ship with preset, industry-outlined thresholds for performance counters/metrics that should be adjusted to your specific conditions.
Manageability and Administrative Overhead
Native database monitoring is usually handled by a database administrator (DBA) team. If it needs to be automated, expanded, or otherwise modified, the DBA and development teams handle that. In a large enterprise environment, DBAs can manage this efficiently at a rudimentary level for internal, DBA-specific use cases. Bringing in a third-party system (like an EMS) requires management of its own. Hypothetically, a vendor has installed and configured monitoring for your company; that partnership can continue, or internal personnel can take over EMS management (with appropriate training). There is no "wrong" approach: it depends entirely on your company's operating model and should be assessed accordingly.
Data Access and Audit Compliance Monitoring
Your databases must be secure! Unauthorized access to sensitive data can be as harmful as data loss. Data breaches and malicious activities (intentional or not) are publicity no company wants. That brings us to audit compliance and data access monitoring.
There are many laws and regulations around data compliance. Some are common across industries, some are industry-specific, and some are country-specific. For instance, SOX compliance is required for all public companies in numerous countries, and US healthcare must follow HIPAA regulations. Database management teams must implement a set of policies, procedures, and processes to enforce the laws and regulations applicable to their company. Audit reporting can be a tedious and cumbersome process, but it can and should be automated. While implementing audit compliance and data access monitoring, you can improve your database audit reporting as well; it's virtually the same data set.
What do we need to monitor to comply with various laws and regulations? These are normally mandatory (a minimal native-audit sketch follows this list):
Access changes and access attempts
Settings and/or object modifications
Data modifications/access
Database backups
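To make this concrete, here is a minimal sketch of what native auditing can look like, using SQL Server Audit as an example. The audit name, file path, database, and schema are placeholders, and the action groups shown cover only a slice of the mandatory items above; equivalent mechanisms exist in other engines (for example, pgAudit for PostgreSQL or Oracle Unified Auditing).

```sql
-- Hypothetical names and paths; adjust to your environment and policies.
CREATE SERVER AUDIT ComplianceAudit
    TO FILE (FILEPATH = '/var/opt/mssql/audit/');   -- placeholder audit target
ALTER SERVER AUDIT ComplianceAudit WITH (STATE = ON);

-- Server-level events: failed logins and role membership (access) changes
CREATE SERVER AUDIT SPECIFICATION ComplianceServerSpec
    FOR SERVER AUDIT ComplianceAudit
    ADD (FAILED_LOGIN_GROUP),
    ADD (SERVER_ROLE_MEMBER_CHANGE_GROUP)
    WITH (STATE = ON);

-- Database-level events: object changes, role changes, backups/restores,
-- and reads/writes against an example schema
USE SalesDb;   -- placeholder database
CREATE DATABASE AUDIT SPECIFICATION ComplianceDbSpec
    FOR SERVER AUDIT ComplianceAudit
    ADD (SCHEMA_OBJECT_CHANGE_GROUP),
    ADD (DATABASE_ROLE_MEMBER_CHANGE_GROUP),
    ADD (BACKUP_RESTORE_GROUP),
    ADD (SELECT, UPDATE, DELETE ON SCHEMA::dbo BY public)
    WITH (STATE = ON);
```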
Who should be monitored? Usually, access to make changes to a database or its data is strictly controlled:
Privileged accounts – usually DBAs; ideally, they shouldn't be able to access data, but that is not always practical in their job, so their activity must be monitored
Service accounts – database or application service accounts with rights to modify objects or data
"Power" accounts – users with rights to modify database objects or data
"Lower" accounts – accounts with read-only activity
As with performance monitoring, most database engines provide a set of auditing tools and mechanisms. Another option is third-party compliance software, which uses database-native auditing, logs, and tracing to capture compliance-related data. It provides audit data storage capabilities and, most importantly, a set of compliance reports and dashboards for adhering to a variety of compliance policies. Compliance complexity depends directly on the regulations that apply to your company and on the diversity and size of your database ecosystem. While we monitor access and compliance, we want to ensure that our data is not being misused. Adequate measures should be in place for when unauthorized access or abnormal data usage is detected; some audit compliance monitoring systems provide means to block abnormal activities.
Data Corruption and Threats
Database corruption is a serious issue that can lead to permanent loss of valuable data. Corruption commonly occurs due to hardware failures, but it can also be caused by database bugs or even bad coding. Modern database engines have built-in capabilities to detect, and sometimes prevent, data corruption. Corruption generates an appropriate error code that should be monitored and flagged, and checking database integrity should be part of the periodic maintenance process. Other threats include intentional or unintentional data modification and ransomware. While data corruption and malicious data modification can be detected by DBAs, ransomware threats fall outside the monitoring scope of database professionals. It is imperative to have a bulletproof backup to recover from those threats.
Key Database Performance Metrics
Database performance metrics are extremely important data points that measure the health of database systems and help database professionals provide efficient support. Some of the metrics are specific to a database type or vendor, and I will generalize them as "internal counters."
Availability
The first step in monitoring is to determine whether a device or resource is available. There is a thin line between system and database availability: a database could be up and running, yet clients may be unable to access it. With that said, we need to monitor the following metrics (a simple availability probe is sketched below):
Network status – Can you reach the database over the network? If yes, what is the latency? While network status may not commonly fall under the direct responsibility of a DBA, database components have configuration parameters that can be responsible for a loss of connectivity.
Server up/down
Storage availability
Service up/down – another shared area between database and OS support teams
Whether the database is online or offline
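A minimal sketch of that last point, again using SQL Server system views as the example (other engines expose similar state information): an external scheduler can run something like this on a timer and alert when a database is not online or when the probe itself fails or is slow.

```sql
-- Is the engine reachable and responsive? Time this round trip externally.
SELECT 1 AS heartbeat;

-- Which databases are not fully available right now? (SQL Server example)
SELECT name,
       state_desc   -- e.g., ONLINE, RESTORING, RECOVERY_PENDING, SUSPECT, OFFLINE
FROM sys.databases
WHERE state_desc <> 'ONLINE';
```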
CPU, Memory, Storage, and Database Internal Metrics
The next important set of server components, which can in essence escalate into an availability issue, is CPU, memory, and storage. The following four performance areas are tightly interconnected and affect each other:
Lack of available memory
High CPU utilization
Storage latency or throughput bottlenecks
Database internal counters that can provide more context on utilization issues
For instance, a lack of memory may force a database engine to read and write data more frequently, creating contention on the I/O system, and 100% CPU utilization can cause an entire database server to stop responding. Numerous database internal counters can help database professionals analyze usage trends and identify appropriate actions to mitigate potential impact.
Observability
Database observability is built on metrics, traces, and logs: the data we have presumably been collecting, per the discussion above. A plethora of factors may affect system and application availability and customer experience; database performance metrics are just one set of possible failure points. Supporting the infrastructure underneath a database engine is complex. To successfully monitor a database, we need a clear picture of the entire ecosystem and the state of its components while monitoring. Relevant performance data collected from the various components can be a tremendous help in identifying and addressing issues before they escalate. The entire database monitoring concept is data driven, and it is our responsibility to make it work for us. Monitoring data needs to tell us a story that every consumer can understand. With database observability, this story can be transparent and provide a clear view of your database estate.
Balanced Monitoring
As you can gather from this article, there are many points of failure in any database environment. While database monitoring is the responsibility of database professionals, it takes a collaborative effort across multiple teams to ensure that your entire IT ecosystem is operational. So what's considered "too much" monitoring, and when is it not enough? I will use DBAs' favorite phrase: it depends.
Assess your environment – It helps to have a configuration management database. If you don't, create a full inventory of your databases and corresponding applications: database sizes, number of users, maintenance schedules, utilization times, and as many other details as possible.
Assess your critical systems – Outline your critical systems and their databases. Most likely, those will fall into the category of maximum monitoring: availability, performance, activity, and compliance.
Assess your budget – It's not uncommon for IT operations to run on a tight cash flow. You may or may not have funds to purchase a "we-monitor-everything" system, and certain monitoring aspects may have to be developed internally.
Find a middle ground – Your approach to database monitoring is unique to your company's requirements. Collecting monitoring data that has no practical or actionable application is not efficient. Defining actionable KPIs for your database monitoring is key to finding a balance: monitor what your team can use to ensure system availability, stability, and satisfied customers.
Remember: Successful database monitoring is data-driven, proactive, continuous, actionable, and collaborative.
This is an article from DZone's 2023 Database Systems Trend Report. For more: Read the Report