The final step in the SDLC, and arguably the most crucial, is the testing, deployment, and maintenance of development environments and applications. DZone's category for these SDLC stages serves as the pinnacle of application planning, design, and coding. The Zones in this category offer invaluable insights to help developers test, observe, deliver, deploy, and maintain their development and production environments.
In the SDLC, deployment is the final lever that must be pulled to make an application or system ready for use. Whether it's a bug fix or new release, the deployment phase is the culminating event to see how something works in production. This Zone covers resources on all developers’ deployment necessities, including configuration management, pull requests, version control, package managers, and more.
The cultural movement that is DevOps — which, in short, encourages close collaboration among developers, IT operations, and system admins — also encompasses a set of tools, techniques, and practices. As part of DevOps, the CI/CD process incorporates automation into the SDLC, allowing teams to integrate and deliver incremental changes iteratively and at a quicker pace. Together, these human- and technology-oriented elements enable smooth, fast, and quality software releases. This Zone is your go-to source on all things DevOps and CI/CD (end to end!).
A developer's work is never truly finished once a feature or change is deployed. There is always a need for constant maintenance to ensure that a product or application continues to run as it should and is configured to scale. This Zone focuses on all your maintenance must-haves — from ensuring that your infrastructure is set up to manage various loads and improving software and data quality to tackling incident management, quality assurance, and more.
Modern systems span numerous architectures and technologies and are becoming exponentially more modular, dynamic, and distributed in nature. These complexities also pose new challenges for developers and SRE teams that are charged with ensuring the availability, reliability, and successful performance of their systems and infrastructure. Here, you will find resources about the tools, skills, and practices to implement for a strategic, holistic approach to system-wide observability and application monitoring.
The Testing, Tools, and Frameworks Zone encapsulates one of the final stages of the SDLC as it ensures that your application and/or environment is ready for deployment. From walking you through the tools and frameworks tailored to your specific development needs to leveraging testing practices to evaluate and verify that your product or application does what it is required to do, this Zone covers everything you need to set yourself up for success.
DevOps
The DevOps movement has paved the way for CI/CD and streamlined application delivery and release orchestration. These nuanced methodologies have not only increased the scale and speed at which we release software, but also redistributed responsibilities onto the developer and led to innovation and automation throughout the SDLC. DZone's 2023 DevOps: CI/CD, Application Delivery, and Release Orchestration Trend Report explores these derivatives of DevOps by diving into how AIOps and MLOps practices affect CI/CD, the proper way to build an effective CI/CD pipeline, strategies for source code management and branching for GitOps and CI/CD, and more. Our research builds on previous years with its focus on the challenges of CI/CD, a responsibility assessment, and the impact of release strategies, to name a few. The goal of this Trend Report is to provide developers with the information they need to further innovate on their integration and delivery pipelines.
A Roadmap to True Observability
Getting Started With OpenTelemetry
This is an article from DZone's 2023 Observability and Application Performance Trend Report.For more: Read the Report In today's digital landscape, the growing importance of monitoring and managing application performance cannot be overstated. With businesses increasingly relying on complex applications and systems to drive their operations, ensuring optimal performance has become a top priority. In essence, efficient application performance management can mean the difference between business success and failure. To better understand and manage these sophisticated systems, two key components have emerged: telemetry and observability. Telemetry, at its core, is a method of gathering and transmitting data from remote or inaccessible areas to equipment for monitoring. In the realm of IT systems, telemetry involves collecting metrics, events, logs, and traces from software applications and infrastructure. This plethora of data is invaluable as it provides insight into system behavior, helping teams identify trends, diagnose problems, and make informed decisions. In simpler terms, think of telemetry as the heartbeat monitor of your application, providing continuous, real-time updates about its health. Observability takes this concept one step further. It's important to note that while it does share some similarities with traditional monitoring, there are distinct differences. Traditional monitoring involves checking predefined metrics or logs for anomalies. Observability, on the other hand, is a more holistic approach. It not only involves gathering data but also understanding the "why" behind system behavior. Observability provides a comprehensive view of your system's internal state based on its external outputs. It helps teams understand the overall health of the system, detect anomalies, and troubleshoot potential issues. Simply put, if telemetry tells you what is happening in your system, observability explains why it's happening. The Emergence of Telemetry and Observability in Application Performance In the early days of information systems, understanding what a system was doing at any given moment was a challenge. However, the advent of telemetry played a significant role in mitigating this issue. Telemetry, derived from Greek roots tele (remote) and metron (measure), is fundamentally about measuring data remotely. This technique has been used extensively in various fields such as meteorology, aerospace, and healthcare, long before its application in information technology. As the complexity of systems grew, so did the need for more nuanced understanding of their behavior. This is where observability — a term borrowed from control theory — entered the picture. In the context of IT, observability is not just about collecting metrics, logs, and traces from a system, but about making sense of that data to understand the internal state of the system based on the external outputs. Initially, these concepts were applied within specific software or hardware components, but with the evolution of distributed systems and the challenges they presented, the application of telemetry and observability became more systemic. Nowadays, telemetry and observability are integral parts of modern information systems, helping operators and developers understand, debug, and optimize their systems. They provide the necessary visibility into system performance, usage patterns, and potential bottlenecks, enabling proactive issue detection and resolution. 
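To make the distinction concrete, the snippet below is a minimal sketch of emitting telemetry with the OpenTelemetry Python SDK (an open standard mentioned later in this article). The service name, span name, and attribute are illustrative placeholders, and the spans are simply printed to the console; a real deployment would export them to an observability backend.
Python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire up a tracer that exports spans (one form of telemetry) to the console.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("checkout-service")  # illustrative service name

# Each span records what happened and how long it took; correlating many such
# spans, metrics, and logs is what lets an observability platform explain why.
with tracer.start_as_current_span("process_order") as span:
    span.set_attribute("order.items", 3)  # illustrative attribute
    # ... business logic would run here ...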
Emerging Trends and Innovations With cloud computing taking the center stage in the digital transformation journey of many organizations, providers like Amazon Web Services (AWS), Azure, and Google Cloud have integrated telemetry and observability into their services. They provide a suite of tools that enable users to collect, analyze, and visualize telemetry data from their workloads running on the cloud. These tools don't just focus on raw data collection but also provide features for advanced analytics, anomaly detection, and automated responses. This allows users to transform the collected data into actionable insights. Another trend we observe in the industry is the adoption of open-source tools and standards for observability like OpenTelemetry, which provides a set of APIs, libraries, agents, and instrumentation for telemetry and observability. The landscape of telemetry and observability has come a long way since its inception, and continues to evolve with technology advancements and changing business needs. The incorporation of these concepts into cloud services by providers like AWS and Azure has made it easier for organizations to gain insights into their application performance, thereby enabling them to deliver better user experiences. The Benefits of Telemetry and Observability The world of application performance management has seen a paradigm shift with the adoption of telemetry and observability. This section delves deep into the advantages provided by these emerging technologies. Enhanced Understanding of System Behavior Together, telemetry and observability form the backbone of understanding system behavior. Telemetry, which involves the automatic recording and transmission of data from remote or inaccessible parts of an application, provides a wealth of information about the system's operations. On the other hand, observability derives meaningful insights from this data, allowing teams to comprehend the internal state of the system from its external outputs. This combination enables teams to proactively identify anomalies, trends, and potential areas of improvement. Improved Fault Detection and Resolution Another significant advantage of implementing telemetry and observability is the enhanced ability to detect and resolve faults. There are tools that allow users to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in configuration. This level of visibility hastens the detection of any operational issues, enabling quicker resolution and reducing system downtime. Optimized Resource Utilization These modern application performance techniques also facilitate optimized resource utilization. By understanding how resources are used and identifying any inefficiencies, teams can make data-driven decisions to optimize resource allocation. An auto-scaling feature — which adjusts capacity to maintain steady, predictable performance at the lowest possible cost — is a prime example of this benefit. Challenges in Implementing Telemetry and Observability Implementing telemetry and observability into existing systems is not a straightforward task. It involves a myriad of challenges, stemming from the complexity of modern applications to the sheer volume of data that needs to be managed. Let's delve into these potential pitfalls and roadblocks. Potential Difficulties and Roadblocks The first hurdle is the complexity of modern applications. 
They are typically distributed across multiple environments — cloud, on-premises, hybrid, and even multi-cloud setups. This distribution makes it harder to understand system behavior, as the data collected could be disparate and disconnected, complicating telemetry efforts. Another challenge is the sheer volume, speed, and variety of data. Modern applications generate massive amounts of telemetry data. Collecting, storing, processing, and analyzing this data in real time can be daunting. It requires robust infrastructure and efficient algorithms to handle the load and provide actionable insights. Also, integrating telemetry and observability into legacy systems can be difficult. These older systems may not be designed with telemetry and observability in mind, making it challenging to retrofit them without impacting performance. Strategies To Mitigate Challenges Despite these challenges, there are ways to overcome them. For the complexity and diversity of modern applications, adopting a unified approach to telemetry can help. This involves using a single platform that can collect, correlate, and analyze data from different environments. To tackle the issue of data volume, implementing automated analytics and machine learning algorithms can be beneficial. These technologies can process large datasets in real time, identifying patterns and providing valuable insights. For legacy system integration issues, it may be worthwhile to invest in modernizing these systems. This could mean refactoring the application or adopting new technology stacks that are more conducive to telemetry and observability. Finally, investing in training and up-skilling teams on tools and best practices can be immensely beneficial. Practical Steps for Gaining Insights Both telemetry and observability have become integral parts of modern application performance management. They offer in-depth insights into our systems and applications, enabling us to detect and resolve issues before they impact end-users. Importantly, these concepts are not just theoretical — they're put into practice every day across services provided by leading cloud providers such as AWS and Google Cloud. In this section, we'll walk through a step-by-step guide to harnessing the power of telemetry and observability. I will also share some best practices to maximize the value you gain from these insights. Step-By-Step Guide The following are steps to implement performance management of a modern application using telemetry and observability on AWS, though this is also possible to implement using other cloud providers: Step 1 – Start by setting up AWS CloudWatch. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, providing you with a unified view of AWS resources, applications, and services. Step 2 – Use AWS X-Ray for analyzing and debugging your applications. This service provides an end-to-end view of requests as they travel through your application, showing a map of your application's underlying components. Step 3 – Implement AWS CloudTrail to keep track of user activity and API usage. CloudTrail enhances visibility into user and resource activity by recording AWS Management Console actions and API calls. You can identify which users and accounts called AWS, the source IP address from which the calls were made, and when the calls occurred. Step 4 – Don't forget to set up alerts and notifications. 
AWS SNS (Simple Notification Service) can be used to send alerts based on the metrics you define in CloudWatch. A minimal scripted sketch of Steps 1 and 4 appears at the end of this article.
Figure 1: An example of observability on AWS
Best Practices
Now that we've covered the basics of setting up the tools and services for telemetry and observability, let's shift our focus to some best practices that will help you derive maximum value from these insights:
Establish clear objectives – Understand what you want to achieve with your telemetry data — whether it's improving system performance, troubleshooting issues faster, or strengthening security measures.
Ensure adequate training – Make sure your team is adequately trained in using the tools and interpreting the data provided. Remember, the tools are only as effective as the people who wield them.
Be proactive rather than reactive – Use the insights gained from telemetry and observability to predict potential problems before they happen instead of merely responding to them after they've occurred.
Conduct regular reviews and assessments – Make it a point to regularly review and update your telemetry and observability strategies as your systems evolve. This will help you stay ahead of the curve and maintain optimal application performance.
Conclusion
The rise of telemetry and observability signifies a paradigm shift in how we approach application performance. With these tools, teams are no longer just solving problems — they are anticipating and preventing them. In the complex landscape of modern applications, telemetry and observability are not just nice-to-haves; they are essentials that empower businesses to deliver high-performing, reliable, and user-friendly applications. As applications continue to evolve, so will the tools that manage their performance. We can anticipate more advanced telemetry and observability solutions equipped with AI and machine learning capabilities for predictive analytics and automated anomaly detection. These advancements will further streamline application performance management, making it more efficient and effective over time.
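As a concrete illustration of Steps 1 and 4 above, here is a minimal boto3 sketch that creates an SNS topic and a CloudWatch alarm that notifies it. It is only a sketch: the region, names, email address, threshold, and metric (EC2 CPUUtilization) are assumed values chosen for illustration.
Python
import boto3

REGION = "us-east-1"  # assumed region
sns = boto3.client("sns", region_name=REGION)
cloudwatch = boto3.client("cloudwatch", region_name=REGION)

# Step 4: an SNS topic with an email subscription that receives the alerts.
topic_arn = sns.create_topic(Name="app-performance-alerts")["TopicArn"]
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="oncall@example.com")

# Step 1: a CloudWatch alarm on a common metric that notifies the topic.
cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-utilization",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Statistic="Average",
    Period=300,                # evaluate 5-minute averages
    EvaluationPeriods=2,       # two consecutive periods above the threshold
    Threshold=80.0,            # assumed threshold, in percent
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[topic_arn],  # route the alert through SNS
)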
This is an article from DZone's 2023 Observability and Application Performance Trend Report. For more: Read the Report
Employing cloud services can incur a great deal of risk if not planned and designed correctly. In fact, this is really no different from the challenges inherent in a single on-premises data center implementation. Power outages and network issues are common examples of challenges that can put your service — and your business — at risk. For AWS cloud services, we have seen large-scale regional outages that are documented on the AWS Post-Event Summaries page. To gain a broader look at other cloud providers and services, the danluu/post-mortems repository provides a more holistic view of the cloud in general. It's time for service owners relying (or planning) on a single region to think hard about the best way to design resilient cloud services. While I will utilize AWS for this article, it is solely because of my level of expertise with the platform and not because one cloud platform should be considered better than another.
A Single-Region Approach Is Doomed to Fail
A cloud-based service implementation can be designed to leverage multiple availability zones. Think of availability zones as distinct locations within a specific region that are isolated from the other availability zones in that region. Consider the following cloud-based service running on AWS inside the Kubernetes platform:
Figure 1: Cloud-based service utilizing Kubernetes with multiple availability zones
In Figure 1, inbound requests are handled by Route 53, arrive at a load balancer, and are directed to a Kubernetes cluster. The controller routes requests to the service, which has three instances running, each in a different availability zone. For persistence, an Aurora Serverless database has been adopted. While this design protects against the loss of one or two availability zones, the service is still at risk when a region-wide outage occurs, similar to the AWS outage in the US-EAST-1 region on December 7th, 2021. A common mitigation strategy is to implement stand-by patterns that can become active when unexpected outages occur. However, these stand-by approaches can lead to bigger issues if they are not consistently participating by handling a portion of all requests.
Transitioning to More Than Two
With single-region services at risk, it's important to understand how best to proceed. For that, we can draw upon the simple example of a trucking business. If you have a single driver who operates a single truck, your business is down when the truck or driver is unable to fulfill their duties. The immediate thought here is to add a second truck and driver. However, the better answer is to increase the fleet by two, which allows for a second unexpected issue to compound the original one. This is known as the "n + 2" rule, which becomes important when there are expectations set between you and your customers. For the trucking business, it might be a guaranteed delivery time. For your cloud-based service, it will likely be measured in service-level objectives (SLOs) and service-level agreements (SLAs). It is common to set SLOs as four nines, meaning your service is operating as expected 99.99% of the time.
This translates to the following error budgets, or downtime, for the service:
Month = 4 minutes and 21 seconds
Week = 1 minute and 0.48 seconds
Day = 8.6 seconds
If your SLAs include financial penalties, the importance of implementing the n + 2 rule becomes critical to making sure your services are available in the wake of an unexpected regional outage. Remember, that December 7, 2021 outage at AWS lasted more than eight hours. The cloud-based service from Figure 1 can be expanded to employ a multi-region design:
Figure 2: Multi-region cloud-based service utilizing Kubernetes and multiple availability zones
With a multi-region design, requests are handled by Route 53 but are directed to the best region to handle the request. The ambiguous term "best" is used intentionally, as the criteria could be based upon geographical proximity, least latency, or both. From there, the in-region Kubernetes cluster handles the request — still with three different availability zones. Figure 2 also introduces the observability layer, which provides the ability to monitor cloud-based components and establish SLOs at the country and regional levels. This will be discussed in more detail shortly.
Getting Out of the Toil Game
Google Site Reliability Engineering's Eric Harvieux defined toil as noted below:
"Toil is the kind of work that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows."
When designing services that run in multiple regions, the amount of toil grows dramatically compared to a single region. Consider the example of creating a manager-approved change request every time code is deployed into the production instance. In the single-region example, the change request might be a bit annoying, but it is something a software engineer is willing to tolerate. Now, with two additional regions, this translates to three times as many change requests, all with at least one human-based approval being required. An attainable and desirable end state should still include change requests, but these requests should become part of the continuous delivery (CD) lifecycle and be created automatically. Additionally, the observability layer introduced in Figure 2 should be leveraged by the CD tooling in order to monitor deployments — rolling back in the event of any unforeseen circumstances. With this approach, the need for human-based approvals is diminished, and unnecessary toil is removed from both the software engineer requesting the deployment and the approving manager.
Harnessing the Power of Observability
Observability platforms measure a system's state by leveraging metrics, logs, and traces. This means that a given service can be measured by the outputs it provides. Leading observability platforms go a step further and allow for the creation of synthetic API tests that can be used to exercise resources for a given service. Tests can include assertions that introduce expectations — such as that a particular GET request will respond with an expected response code and payload within a given time period. Otherwise, the test will be marked as failed. SLOs can be attached to each synthetic test, and each test can be executed in multiple geographical locations, all monitored from the observability platform. Taking this approach gives service owners the ability to understand service performance from multiple entry points. A minimal sketch of such a synthetic check appears below.
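The following is a minimal sketch of the kind of synthetic check described above, written in plain Python rather than any particular observability platform's tooling. The URL, expected status code, and latency budget are assumed values for illustration.
Python
import time
import urllib.request

# Assumed values for illustration; a real platform manages these per test.
URL = "https://service.example.com/health"
EXPECTED_STATUS = 200
LATENCY_BUDGET_SECONDS = 0.5

def run_synthetic_check(url: str = URL) -> bool:
    """Exercise the endpoint and assert on response code and response time."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            status = response.status
    except Exception as error:  # network failures also count as a failed check
        print(f"FAIL: {error}")
        return False
    elapsed = time.perf_counter() - start
    passed = status == EXPECTED_STATUS and elapsed <= LATENCY_BUDGET_SECONDS
    print(f"{'PASS' if passed else 'FAIL'}: status={status}, latency={elapsed:.3f}s")
    return passed

if __name__ == "__main__":
    # Executed from multiple geographic locations, these pass/fail results
    # become the measurements behind the SLOs attached to each test.
    run_synthetic_check()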
With the multi-region model, tests can be created and performance thereby monitored at the regional and global levels separately, thus producing a high degree of certainty on the level of performance being produced in each region. In every case, the power of observability can remove the need for the manual, human-based change approvals noted above.
Bringing It All Together
From the 10,000-foot level, the multi-region service implementation from Figure 2 can be placed onto a United States map. In Figure 3, the database connectivity is mapped to demonstrate the inter-region communication, while the observability and cloud metrics data are gathered from AWS and the observability platform globally.
Figure 3: Multi-region service adoption placed near the respective AWS regions
Service owners have peace of mind that their service is fully functional in three regions by implementing the n + 2 rule. In this scenario, the implementation is prepared to survive two complete region outages. As an example, the eight-hour AWS outage referenced above would not have an impact on the service's SLOs/SLAs during the time when one of the three regions is unavailable.
Charting a Plan Toward Multi-Region
Implementing a multi-region footprint for your service without increasing toil is possible, but it does require planning. Some high-level action items are noted below:
Understand your persistence layer – Understanding your persistence layer early on is key. If multiple-write regions are not a possibility, alternative approaches will be required.
Adopt Infrastructure as Code – The ability to define your cloud infrastructure via code is critical to eliminate toil and increase the ability to adopt additional regions, or even zones.
Use containerization – The underlying service is best when containerized. Build the container you wish to deploy during the continuous integration stage and scan for vulnerabilities within every layer of the container for added safety.
Reduce time to deploy – Get into the habit of releasing often, as it only makes your team stronger.
Establish SLOs and synthetics – Take the time to set SLOs for your service and write synthetic tests to constantly measure your service — across every environment.
Automate deployments – Leverage observability during the CD stage to deploy when a merge-to-main event occurs. If a dev deploys and no alerts are emitted, move on to the next environment and continue all the way to production.
Conclusion
It's important to understand the limitations of the platform where your services are running. Leveraging a single region offered by your cloud provider is only successful when there are zero region-wide outages. Based upon prior history, region-wide outages are certain to happen again, and a single-region approach is no longer good enough. No cloud provider is ever going to be 100% immune from a region-wide outage. A better approach is to utilize the n + 2 rule and increase the number of regions your service runs in by two. In taking this approach, the service will still be able to respond to customer requests in the event of not only one regional outage but also any form of outage in a second region where the service is running. By adopting the n + 2 approach, there is a far better chance of meeting the SLAs set with your customers. Getting to this point will certainly present challenges but should also provide the opportunity to cut down (or even eliminate) toil within your organization.
In the end, your customers will benefit from increased service resiliency, and your team will benefit from significant productivity gains. Have a really great day!
Resources
AWS Post-Event Summaries, AWS
Summary of the AWS Service Event in the Northern Virginia (US-EAST-1) Region, AWS
danluu/post-mortems, GitHub
"Identifying and Tracking Toil Using SRE Principles" by Eric Harvieux, 2020
"Failure Recovery: When the Cure Is Worse Than the Disease" by Guo et al., 2013
This is an article from DZone's 2023 Observability and Application Performance Trend Report.For more: Read the Report From cultural and structural challenges within an organization to balancing daily work and dividing it between teams and individuals, scaling teams of site reliability engineers (SREs) comes with many challenges. However, fostering a resilient site reliability engineering (SRE) culture can facilitate the gradual and sustainable growth of an SRE team. In this article, we explore the challenges of scaling and review a successful scaling framework. This framework is suitable for guiding emerging teams and startups as they cultivate an evolving SRE culture, as well as for established companies with firmly entrenched SRE cultures. The Challenges of Scaling SRE Teams As teams scale, complexity may increase as it can be more difficult to communicate, coordinate, and maintain a team's coherence. Below is a list of challenges to consider as your team and/or organization grows: Rapid growth – Rapid growth leads to more complex systems, which can outpace the capacity of your SRE team, leading to bottlenecks and reduced reliability. Knowledge-sharing – Maintaining a shared understanding of systems and processes may become difficult, making it challenging to onboard new team members effectively. Tooling and automation – Scaling without appropriate tooling and automation can lead to increased manual toil, reducing the efficiency of the SRE team. Incident response – Coordinating incident responses can become more challenging, and miscommunications or delays can occur. Maintaining a culture of innovation and learning – This can be challenging as SREs may become more focused on solving critical daily problems and less focused on new initiatives. Balancing operational and engineering work – Since SREs are responsible for both operational tasks and engineering work, it is important to ensure that these teams have enough time to focus on both areas. A Framework for Scaling SRE Teams Scaling may come naturally if you do the right things in the right order. First, you must identify what your current state is in terms of infrastructure. How well do you understand the systems? Determine existing SRE processes that need improvement. For the SRE processes that are necessary but are not employed yet, find the tools and the metrics necessary to start. Collaborate with the appropriate stakeholders, use feedback, iterate, and improve. Step 1: Assess Your Current State Understand your system and create a detailed map of your infrastructure, services, and dependencies. Identify all the components in your infrastructure, including servers, databases, load balancers, networking equipment, and any cloud services you utilize. It is important to understand how these components are interconnected and dependent on each other — this includes understanding which services rely on others and the flow of data between them. It's also vital to identify and evaluate existing SRE practices and assess their effectiveness: Analyze historical incident data to identify recurring issues and their resolutions. Gather feedback from your SRE team and other relevant stakeholders. Ask them about pain points, challenges, and areas where improvements are needed. Assess the performance metrics related to system reliability and availability. Identify any trends or patterns that indicate areas requiring attention. Evaluate how incidents are currently being handled. Are they being resolved efficiently? 
Are post-incident reviews being conducted effectively to prevent recurrences? Step 2: Define SLOs and Error Budgets Collaborate with stakeholders to establish clear and meaningful service-level objectives (SLOs) by determining the acceptable error rate and creating error budgets based on the SLOs. SLOs and error budgets can guide resource allocation optimization. Computing resources can be allocated to areas that directly impact the achievement of the SLOs. SLOs set clear, achievable goals for the team and provide a measurable way to assess the reliability of a service. By defining specific targets for uptime, latency, or error rates, SRE teams can objectively evaluate whether the system is meeting the desired standards of performance. Using specific targets, a team can prioritize their efforts and focus on areas that need improvement, thus fostering a culture of accountability and continuous improvement. Error budgets provide a mechanism for managing risk and making trade-offs between reliability and innovation. They allow SRE teams to determine an acceptable threshold for service disruptions or errors, enabling them to balance the need for deploying new features or making changes to maintain a reliable service. Step 3: Build and Train Your SRE Team Identify talent according to the needs of each and every step of this framework. Look for the right skillset and cultural fit, and be sure to provide comprehensive onboarding and training programs for new SREs. Beware of the golden rule that culture eats strategy for breakfast: Having the right strategy and processes is important, but without the right culture, no strategy or process will succeed in the long run. Step 4: Establish SRE Processes, Automate, Iterate, and Improve Implement incident management procedures, including incident command and post-incident reviews. Define a process for safe and efficient changes to the system. Figure 1: Basic SRE process One of the cornerstones of SRE involves how to identify and handle incidents through monitoring, alerting, remediation, and incident management. Swift incident identification and management are vital in minimizing downtime, which can prevent minor issues from escalating into major problems. By analyzing incidents and their root causes, SREs can identify patterns and make necessary improvements to prevent similar issues from occurring in the future. This continuous improvement process is crucial for enhancing the overall reliability and performance whilst ensuring the efficiency of systems at scale. Improving and scaling your team can go hand in hand. Monitoring Monitoring is the first step in ensuring the reliability and performance of a system. It involves the continuous collection of data about the system's behavior, performance, and health. This can be broken down into: Data collection – Monitoring systems collect various types of data, including metrics, logs, and traces, as shown in Figure 2. Real-time observability – Monitoring provides real-time visibility into the system's status, enabling teams to identify potential issues as they occur. Proactive vs. reactive – Effective monitoring allows for proactive problem detection and resolution, reducing the need for reactive firefighting. Figure 2: Monitoring and observability Alerting This is the process of notifying relevant parties when predefined conditions or thresholds are met. It's a critical prerequisite for incident management. 
This can be broken down into: Thresholds and conditions – Alerts are triggered based on predefined thresholds or conditions. For example, an alert might be set to trigger when CPU usage exceeds 90% for five consecutive minutes. Notification channels – Alerts can be sent via various notification channels, including email, SMS, or pager, or even integrated into incident management tools. Severity levels – Alerts should be categorized by severity levels (e.g., critical, warning, informational) to indicate the urgency and impact of the issue. Remediation This involves taking actions to address issues detected through monitoring and alerting. The goal is to mitigate or resolve problems quickly to minimize the impact on users. Automated actions – SRE teams often implement automated remediation actions for known issues. For example, an automated scaling system might add more resources to a server when CPU usage is high. Playbooks – SREs follow predefined playbooks that outline steps to troubleshoot and resolve common issues. Playbooks ensure consistency and efficiency during remediation efforts. Manual interventions – In some cases, manual intervention by SREs or other team members may be necessary for complex or unexpected issues. Incident Management Effective communication, knowledge-sharing, and training are crucial during an incident, and most incidents can be reproduced in staging environments for training purposes. Regular updates are provided to stakeholders, including users, management, and other relevant teams. Incident management includes a culture of learning and continuous improvement: The goal is not only to resolve the incident but also to prevent it from happening again. Figure 3: Handling incidents A robust incident management process ensures that service disruptions are addressed promptly, thus enhancing user trust and satisfaction. In addition, by effectively managing incidents, SREs help preserve the continuity of business operations and minimize potential revenue losses. Incident management plays a vital role in the scaling process since it establishes best practices and promotes collaboration, as shown in Figure 3. As the system scales, the frequency and complexity of incidents are likely to increase. A well-defined incident management process enables the SRE team to manage the growing workload efficiently. Conclusion SRE is an integral part of the SDLC. At the end of the day, your SRE processes should be integrated into the entire process of development, testing, and deployment, as shown in Figure 4. Figure 4: Holistic view of development, testing, and the SRE process Iterating on and improving the steps above will inevitably lead to more work for SRE teams; however, this work can pave the way for sustainable and successful scaling of SRE teams at the right pace. By following this framework and overcoming the challenges, you can effectively scale your SRE team while maintaining system reliability and fostering a culture of collaboration and innovation. Remember that SRE is an ongoing journey, and it is essential to stay committed to the principles and practices that drive reliability and performance. This is an article from DZone's 2023 Observability and Application Performance Trend Report.For more: Read the Report
In today's agile software development lifecycle, continuous integration and continuous delivery enable software delivery workflows that include multiple teams and functions spanning development, quality assurance, operations, and security.
What Are Software Design Patterns?
Software design patterns are best practices that are followed in order to resolve common problems in development. By following software patterns, a development team can follow the same practices to deliver, build, and deploy code in a much more efficient and systematic way. Software design anti-patterns are the bad practices that harm the way software development is done.
Continuous Integration Software Design Patterns and Anti-Patterns
Continuous integration is an automated integration process that merges source code from multiple branches into a main branch, which is then used as the reference for deploying the development code to different environments. Following certain patterns keeps the code clean and deployment-ready.
CI Pipeline Patterns and Anti-Patterns
Version Controlling the Source Code
The definition of standards plays an important role in the continuous integration chain. There are several conventions that facilitate the understanding of an application's source code and software development lifecycle. Defining conventions, therefore, has a major impact both at the individual and team level and at the level of automated processes.
Continuous Integration Version Control Patterns:
Define better conventions to set better contexts for the development lifecycle.
Build on every commit, branch, merge, and pull request.
Add useful information to commit messages, use a proper branch naming convention, and standardize the application version.
Use pre- and post-actions on commits, merges, and pull requests.
Continuous Integration Version Control Anti-Patterns:
Build only a few times per sprint or per week; cherry-pick commits.
Use non-relevant branch names and meaningless commit messages.
Use different versioning schemes for different applications when building.
Test most of the source code manually after packaging or deploying it.
Running Builds Periodically
The build phase is the most important phase of the continuous integration cycle. In this phase, several validations and checks are required to make sure that the application has been packaged properly for deployment.
Related Tutorial: Azure DevOps and Deploying a Mule Application into Cloudhub
Continuous Integration Build Patterns:
Use a fresh, isolated environment to build the application and control the allocated resources to avoid impacting other builds.
Automatically release and deploy a new version on every new commit, branch, merge, or pull request.
Test the weekly builds to identify potential issues proactively instead of waiting for a code update.
Deploy a hotfix as soon as possible. Test the code in staging before moving it to production.
Deploy the build free of any security vulnerabilities and sensitive data exposure; take action immediately if a severity is identified, and disassociate passwords from the source code.
Lint and format code to make the source code more readable.
Run a set of tests automatically on each build, and run specific test sets periodically.
Run the tests in the same pattern across different platforms using the same set of test data to compare results.
A minimal sketch of a scripted build gate that combines several of these patterns follows this list.
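As referenced above, here is a minimal sketch of a build gate script that a CI job could run on every commit, combining the sensitive-data, linting, and automated-test patterns from the lists. The tool names (flake8, pytest), the config directory, and the secret-matching pattern are assumptions for illustration, not prescribed tooling.
Python
import pathlib
import re
import subprocess
import sys

# Naive pattern for spotting credentials committed to configuration files.
SECRET_PATTERN = re.compile(r"(password|token|secret)\s*[:=]", re.IGNORECASE)

def scan_for_secrets(config_dir: str = "config") -> list[str]:
    """Return configuration files that appear to contain sensitive data."""
    hits = []
    for path in pathlib.Path(config_dir).rglob("*"):
        if path.is_file() and SECRET_PATTERN.search(path.read_text(errors="ignore")):
            hits.append(str(path))
    return hits

def run(step: str, cmd: list[str]) -> None:
    """Run one gate step; a non-zero exit code fails the whole gate."""
    print(f"--- {step} ---")
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    leaks = scan_for_secrets()
    if leaks:
        sys.exit(f"Sensitive data found in: {', '.join(leaks)}")
    run("lint", ["flake8", "."])    # lint and format checks keep the code readable
    run("tests", ["pytest", "-q"])  # the automated test set runs on every build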
Continuous Integration Build Anti-Patterns:
Always use the same environment without handling dependency issues, failing to optimize resources for subsequent builds and potentially impacting other builds as well.
Start a build manually after every sprint or week, depending upon task allocation.
Schedule a hotfix directly to the production environment.
Add sensitive data, like usernames, passwords, tokens, etc., to configuration files.
Skip setting code quality standards and conventions.
Run tests manually after deployment.
Run a test framework that would fail because of the status of the infrastructure.
Continuous Deployment Software Design Patterns and Anti-Patterns
Continuous deployment enables delivery operations and workflows. The goal is to safely deliver artifacts into the different environments in a repeated fashion with a lower error rate. The continuous deployment process helps with automating the operational services of releasing, deploying, and monitoring applications.
Validations and Release Management
The delivery phase is an extension of the continuous integration phase: the system handles the automated process of deploying all code changes to a stable test environment in order to qualify the functionality and the working version of the source code before production deployment. Learn more about the release pipeline using Azure DevOps.
Continuous Deployment Validation and Release Management Patterns:
Automate the verification and validation procedure of the released version of the software to include unit, integration, and regression testing.
Define an enterprise-standard release convention for version management and to facilitate automation.
Deploy a hotfix when necessary; test the code in a pre-prod environment before moving it to production.
Continuous Deployment Validation and Release Management Anti-Patterns:
Use manual tests to verify and validate software.
Do not increment the application version; overwrite previously existing versions instead.
Schedule the deployment of a hotfix; test directly in production.
Related Guide: Automation Testing in CI/CD Pipelines
Deployment
Deployment is done once the feature is tested in a pre-production environment for any regression issues or uncaught errors in the platform.
Continuous Deployment Patterns:
Run the build process once while deploying to multiple target environments.
Deploy code to production but limit access to the new functionality by placing it behind a feature flag.
Utilize automated provisioning code or scripts to automatically start and destroy environments.
Continuous Deployment Anti-Patterns:
Run the software build in every stage of the deployment pipeline.
Wait to commit the code until the feature development has been completed.
Rollback
Rollback becomes very important if there's a deployment failure; it essentially means getting the system back to the previous working state.
Continuous Deployment Rollback Patterns:
Provide a single-command rollback of changes after an unsuccessful deployment.
Keep the environment configuration changes.
Externalize all variable values from the application configuration as build/deployment properties.
Continuous Deployment Rollback Anti-Patterns:
Manually undo changes to roll back the deployed code.
Hardcode values inside the source code based on the target environments.
Documentation for Steps and Procedures
Documentation is a significant component of the deployment process flow, as it streams the relevant information to stakeholders at every team level.
Continuous Deployment Documentation Pattern:
Define a standard of documentation that can be understood by every team.
Continuous Deployment Documentation Anti-Pattern:
Keep the documentation restricted to specific teams.
Conclusion
The CI/CD process is an essential part of the software delivery lifecycle. It gives an enterprise the power to release quality code that meets all standardization and compliance requirements into the production environment. The CI/CD software patterns and anti-patterns are important to understand, as they give immense potential to standardize the quality of code delivery. If the CI/CD process is established with the right principles, it will help reduce errors and reduce the time-to-market of the product.
Additional Resources
"DevOps tech: Continuous delivery"
"DevOps Metrics: Why, what, and how to measure success in DevOps"
Progressive Delivery Patterns and Anti-Patterns Refcard
Introduction to DevSecOps Refcard
The Essentials of GitOps Refcard
Getting Started With Feature Flags Refcard
Getting Started With Log Management Refcard
End-to-End Testing Automation Essentials Refcard
Jenkins has been a staple in software automation for over a decade due largely to its feature-rich tooling and adaptability. While many impressive alternatives have entered the space, Jenkins remains one of the vanguards. Despite its success, Jenkins can have a significant learning curve, and jumping into the vast world of Jenkins plugins and features can quickly become overwhelming. In this article, we will break down that complexity by first understanding the fundamentals and concepts that underpin Jenkins. With that foundation, we will learn how to create a simple pipeline in Jenkins to build and test an application. Lastly, we will look at how to advance this simple example into a more complex project and explore some alternatives to Jenkins. What Is Jenkins? Fundamentals and Concepts Jenkins is a software automation service that helps us script tasks like builds. Of particular interest is Jenkins's ability to create a pipeline, or a discrete set of ordered tasks. Before we create a pipeline in Jenkins, we must first understand what a pipeline is and why it is useful. This understanding starts with a journey through the history of software development. Big Bang Integration Before automation, we were forced to manually build and test our applications locally. Once our local tests passed, we would commit our changes to a remote repository to integrate them with the changes made by other developers. At a predetermined point — usually, as a release approached — the Quality Assurance (QA) team would take the code in our remote repository and test it. While our local tests may have passed before we committed them, and the local tests of other developers worked before they committed them, there was no guarantee that our combined changes would work. We would instead have to wait until QA tested everything together. This moment of truth was usually called the Big Bang. In the (likely) event that the tests failed, we would then have to hunt through all of the commits to see which one (or ones) was the culprit. Continuous Integration The process of running our tests and verifying our code after each commit is called Continuous Integration (CI). As the name implies, CI differs from Big Bang integration by continuously integrating code and verifying that it works. The Big Bang integration approach may work for small projects or prototypes, but it is a massive hindrance for medium- or large-scale projects. Ideally, we want to know if our tests pass when we merge our code with the remote repository. This requires two main changes to our process: Automating tests Executing automated tests after each check-in While automated tests could be created for a Big Bang project, they are not required. They are, however, required for our new process to work. It would be prohibitive to manually test the tests for any project with multiple commits per day. Instead, we need a test suite that can be run automatically, wherever and whenever necessary. The second requirement is that our automated tests are run each time a commit is made to the remote repository. This requires that some service (external or co-located with our repository) check out the repository after each commit, run our tests, and report if the tests passed or failed. This process could be run periodically, but ideally, it should be run every time a commit is made so that we can trace exactly which commit caused our test suite to fail. 
With CI, instead of waiting until some point in the future to see if our code works, we know at any given time whether our code works; what's more, we also know exactly when and where a failure originates when it stops working. CI is a massive leap forward in software automation. There are very few projects today that do not use some level of CI to ensure that each commit does not "break the build." While this is a great improvement, it is only a half-step relative to the process that our code traverses from commit to delivery. Continuous Delivery When we looked at our manual build process, we rightly saw an opportunity to automate the build and test stages of our process; but this is only a small part of the overall process. For most software, we do not just build and unit test; we also run higher-level tests (such as integration and system tests), deliver our final product to our customers, and a wide array of steps in between. If we are following the mindset of CI, it begs the question: Why not automate the entire business process, from build to delivery, and run each step in the process sequentially until our product is automatically delivered to the customer? This revolutionary approach is called Continuous Delivery (CD). Like CI, CD continuously integrates our code as we make commits, but unlike CI, CD does not stop after unit tests are complete. Instead, CD challenges us to automate every step in our business process until the final product is automatically delivered to the customer. This sequence of automated steps is called a pipeline. A pipeline consists of stages, which are groups of steps executed in parallel. For one stage to start, all the steps of the previous stage must complete successfully. An example of a common CI/CD pipeline is illustrated below: While the particular stages and steps of a CI/CD pipeline may vary, they all share a common definition: they are simple abstractions of the business process that a software product must complete before it is delivered to the customer. Even without CI/CD, every software delivery includes a delivery process; we execute the process manually. CI/CD does not introduce anything new to the process: it simply automates each stage so that the pipeline can be executed automatically. Learn more about CI/CD Software Design Patterns. CI/CD is a very involved topic, and it can be overwhelming at first glance, but it can be summed up with a few main concepts: A pipeline is an abstraction of the business process we use to deliver a product. A pipeline is composed of an ordered set of stages. A stage is composed of a set of steps that are run in parallel. A stage cannot start executing until all of the steps in a previous stage have completed. A trigger is the first event in a pipeline that initiates the first stage in a pipeline (i.e., a commit to a repository). A pipeline is executed after every commit to a repository. The deliverable from a pipeline is not delivered to a customer unless all of the stages pass. This last point is where CI/CD shines: We know that any artifact delivered to a customer is the last working artifact that successfully passes through the pipeline. Likewise, we know that any time a commit results in a passing artifact, it is automatically delivered to the customer (the customer does not have to wait for us to deliver it or wait for multiple commits to receive the latest delivery). 
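Before looking at Jenkins itself, the toy sketch below models these concepts in plain Python: a pipeline is an ordered list of stages, each stage is a set of steps executed in parallel, and a stage starts only when every step of the previous stage has succeeded. The stage names and steps are placeholders, not Jenkins syntax.
Python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(stages):
    """stages: ordered list of (stage_name, [step_callables]) tuples."""
    for name, steps in stages:
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda step: step(), steps))  # steps run in parallel
        if not all(results):
            print(f"Stage '{name}' failed - pipeline stops, nothing is delivered")
            return False
        print(f"Stage '{name}' passed")
    print("All stages passed - the artifact can be delivered")
    return True

# Placeholder steps; a real pipeline would invoke build, test, and deploy tools.
pipeline = [
    ("Build", [lambda: True]),
    ("Test", [lambda: True, lambda: True]),  # e.g., unit and integration tests
    ("Deliver", [lambda: True]),
]
run_pipeline(pipeline)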
For more information on pipelines and CI/CD in general, see the following articles: "How To Build an Effective CI/CD Pipeline" "Continuous Test Automation Using CI/CD: How CI/CD Has Revolutionized Automated Testing" CI/CD and Jenkins At this point, we have a foundational understanding of what CI/CD is and why it is important. In our discussion of CI/CD, we left out one important point: what actually executes the pipeline? Whatever this remaining piece is, it must be capable of doing the following: Scan a remote repository for commits Clone the latest code from a repository Define a pipeline and its constituent stages using scripts and other automated mechanism Run the automated steps in the pipeline Report the status of a pipeline execution (e.g., pass or fail) Deliver the final artifacts to some internal or external location This is where Jenkins comes in. Jenkins is an automation server that can be used to perform all of the steps above. While Jenkins is a very powerful automation service that can do more than just CI/CD (Jenkins can conceivably automate just about any process), it has the tools necessary to create a functional CI/CD pipeline and execute it after a commit to our repository. Jenkins has a long history — and capturing all its ins and outs would consume volumes — but at its core, Jenkins is a powerful CI/CD tool used by many of the largest software companies. Its rich set of features and plugins, along with its time-tested reliability, has cemented it as a staple in the software automation community. For more general information on Jenkins, see the official Jenkins documentation. Jenkins Builds: Setting up a Pipeline Our main goal in this tutorial is to set up a simple pipeline using Jenkins. While most Jenkins pipelines (or any pipeline in general) will include numerous, possibly complex stages, we will start by creating a minimally viable pipeline with a single stage. We will then split our single-stage pipeline into a two-stage pipeline. From there, we will examine how to use this simple pipeline as a starting point for a production-ready pipeline. Setting up Jenkins To set up Jenkins, we will need to complete three steps: Install Docker Build and run the Jenkins Docker container Configure Jenkins Installing Docker Before we install Docker, we need to create a DockerHub account. DockerHub is the Docker equivalent of GitHub and acts as a registry of preconfigured container images, such as Ubuntu, MongoDB, and Jenkins. We will use these preconfigured containers as a starting point for installing Jenkins, as well as a starting point for the projects that we build in Jenkins. To create a DockerHub account: Navigate to the DockerHub Sign-Up page. Enter your email and desired username and password, or link to a Google or GitHub account. Submit your account information. Verify your account using the email sent to the email address entered above. Login to your new DockerHub account. For our first Jenkins project, the default, Personal account will suffice since it allows us to download 200 containers every 6 hours (at the time of writing). If we were creating a Jenkins pipeline for a business product or a team project, we should look for an upgraded account, such as Pro or Business. For more information, see the Docker Pricing page. Review related documentation on how to health check your Docker Containers. Once we have created a DockerHub account, we can install the Docker Desktop application. 
Docker Desktop is a visual application that allows us to build Docker images and start them as containers. Docker Desktop is supported on Windows, Mac, and Linux. For more information on how to install Docker Desktop on each of these platforms, see the following Docker pages:
Install Docker Desktop on Windows
Install Docker Desktop on Mac
Install Docker Desktop on Linux
Once Docker Desktop is installed, we need to log in to our DockerHub account to link it to our Docker Desktop installation:
Open Docker Desktop.
Click the Login link.
Log in to DockerHub in the opened browser tab.
Return to Docker Desktop after logging in.
Accept the license agreement.
Once our account is linked, we are ready to pull the Docker Jenkins image and start the container.
Running the Jenkins Docker Container
With Docker now set up, we can create a new Jenkins image that includes all the necessary packages and run the image as a container. For this section, we will use the Windows setup process as an example. The setup process for macOS and Linux is similar but slightly different. For more information on setting up the Jenkins container on macOS or Linux, see Installing Jenkins with Docker on macOS and Linux. First, we need to create a bridge network for Jenkins using the following command:
Shell
docker network create jenkins
Next, we need to run a docker:dind image:
Shell
docker run --name jenkins-docker --rm --detach ^
  --privileged --network jenkins --network-alias docker ^
  --env DOCKER_TLS_CERTDIR=/certs ^
  --volume jenkins-docker-certs:/certs/client ^
  --volume jenkins-data:/var/jenkins_home ^
  --publish 2376:2376 ^
  docker:dind
The docker:dind image (dind stands for Docker-in-Docker) is provided by Docker and allows us to run Docker inside a Docker container. We will need Docker to be installed inside our container to run our pipeline, since Jenkins will start a new Docker container within the Jenkins container to execute the steps of our pipeline. Next, we must create a Docker image based on the Jenkins image. This custom image includes all the features, such as the Docker CLI, that Jenkins needs to execute our pipeline. To create this image, we can save the following Dockerfile in the current directory:
Dockerfile
FROM jenkins/jenkins:2.414.3-jdk17
USER root
RUN apt-get update && apt-get install -y lsb-release
RUN curl -fsSLo /usr/share/keyrings/docker-archive-keyring.asc \
  https://download.docker.com/linux/debian/gpg
RUN echo "deb [arch=$(dpkg --print-architecture) \
  signed-by=/usr/share/keyrings/docker-archive-keyring.asc] \
  https://download.docker.com/linux/debian \
  $(lsb_release -cs) stable" > /etc/apt/sources.list.d/docker.list
RUN apt-get update && apt-get install -y docker-ce-cli
USER jenkins
RUN jenkins-plugin-cli --plugins "blueocean docker-workflow"
Once the Dockerfile has been saved, we can create a new image from it:
Shell
docker build -t myjenkins-blueocean:2.414.3-1 .
This command names the new image myjenkins-blueocean (with a version of 2.414.3-1) and assumes the Dockerfile is in the current directory. Note that we can use any valid Docker image name that we wish. At the time of writing, a valid image reference abides by the following criteria: the name components may contain only lowercase letters, digits, and separators (periods, underscores, and hyphens), and a name component cannot start or end with a separator. The tag must be valid ASCII and can contain lowercase and uppercase letters, digits, underscores, periods, and hyphens; it cannot start with a period or hyphen and must be no longer than 128 characters.
Lastly, we can start our container using the following command: Shell docker run --name jenkins-blueocean --restart=on-failure --detach ^ --network jenkins --env DOCKER_HOST=tcp://docker:2376 ^ --env DOCKER_CERT_PATH=/certs/client --env DOCKER_TLS_VERIFY=1 ^ --volume jenkins-data:/var/jenkins_home ^ --volume jenkins-docker-certs:/certs/client:ro ^ --publish 8080:8080 --publish 50000:50000 myjenkins-blueocean:2.414.3-1 We can confirm that our Jenkins container (named myjenkins-blueocean) is running by completing the following steps: Open Docker Desktop. Click the Containers tab on the left panel. Ensure that the myjenkins-blueocean container is running. The running container will appear in the container list of the Docker Desktop GUI. At this point, our Jenkins container is ready. Again, the process for creating the Jenkins container for Mac and Linux is similar to that of Windows. For more information, see the following pages: Installing Jenkins with Docker on Windows Installing Jenkins with Docker on macOS and Linux (linked previously in this article) Configuring Jenkins Once the Jenkins container is running, we can access the Jenkins User Interface (UI) through our browser at http://localhost:8080. The Jenkins welcome screen will prompt us for the administrator password. We can find this password and complete the Jenkins installation using the following steps: Open Docker Desktop. Click the Containers tab on the left panel. Click our running Jenkins container (myjenkins-blueocean). Click the Logs tab (this tab should open by default). Find the lines in the log that resemble the following: Plain Text 2023-10-31 11:25:45 ************************************************************* 2023-10-31 11:25:45 ************************************************************* 2023-10-31 11:25:45 ************************************************************* 2023-10-31 11:25:45 2023-10-31 11:25:45 Jenkins initial setup is required. An admin user has been created and a password generated. 2023-10-31 11:25:45 Please use the following password to proceed to installation: 2023-10-31 11:25:45 2023-10-31 11:25:45 080be1abb4e04be59a0428a85c02c6e9 2023-10-31 11:25:45 2023-10-31 11:25:45 This may also be found at: /var/jenkins_home/secrets/initialAdminPassword 2023-10-31 11:25:45 2023-10-31 11:25:45 ************************************************************* 2023-10-31 11:25:45 ************************************************************* 2023-10-31 11:25:45 ************************************************************* In this example, the administrative password is 080be1abb4e04be59a0428a85c02c6e9. Input this password into the Jenkins welcome page (located at http://localhost:8080). Click the Continue button. Click the Install Suggested Plugins button on the Customize Jenkins page. Wait for the Jenkins setup to complete. Enter a desired username, password, email, and full name. Click the Save and Continue button. Enter http://localhost:8080/ (the default value) on the Instance Configuration page. Click the Save and Finish button. Click the Start using Jenkins button. At this point, Jenkins is running and configured, and we are now ready to create our Jenkins pipeline. Note that we performed a very basic installation, and the installation we perform for a business project or a larger team will vary. For example, we may need additional plugins to allow our team to log in, or we may have additional security concerns — such as not running Jenkins on HTTP or on localhost:8080.
For more information on how to set up a Jenkins container, see the Jenkins Docker Installation page. Creating a Pipeline Our next step is to create a pipeline to execute our build. Pipelines can be complex, depending on the business process being automated, but it's a good idea to start small and grow. In keeping with this philosophy, we will start with a simple pipeline: a single stage with a single step that runs mvn clean package to create an artifact. From there, we will divide the pipeline into two stages: a build stage and a test stage. To accomplish this, we will: Install the Docker Pipeline plugin if it is not already installed. Add a Jenkinsfile to our project. Create a pipeline in Jenkins that uses our project. Configure our pipeline to build automatically when our project repository changes. Separate our single-stage pipeline into a build and test stage. Installing the Pipeline Plugin Sometimes, the Docker Pipeline plugin (which is needed to create our pipeline) may not be installed by default. To check if the plugin is installed, we must complete the following steps: Click Manage Jenkins in the left panel. Click the Plugins button under System Configuration. Click Installed plugins in the left panel. Search for docker pipeline in the search field for installed plugins. Ensure that the Docker Pipeline plugin is enabled. If the Docker Pipeline plugin is installed, we can skip the installation process. If the plugin is not installed, we can install it using the following steps: Click Manage Jenkins in the left panel. Click the Plugins button under System Configuration. Click Available plugins in the left panel. Search for docker pipeline in the search field for available plugins. Check the checkbox for the Docker Pipeline plugin. Click the Install button on the top right. Wait for the download and installation to complete. With the plugin installed, we can now start working on our pipeline. Related Guide: How to Replace CURL in Scripts with the Jenkins HTTP Request Plugin. Adding a Simple Jenkinsfile Before we create our pipeline, we first need to add a Jenkinsfile to our project. A Jenkinsfile is a configuration file that resides at the top level of our repository and configures the pipeline that Jenkins will run when our project is checked out. A Jenkinsfile is similar to a Dockerfile but deals with pipeline configurations rather than Docker image configurations. Note that this section will use the jenkins-example-project as a reference project. This repository is publicly available, so we can use and build it from any Jenkins deployment, even if Jenkins is deployed on our machine. We will start with a simple Jenkinsfile (located in the root directory of the project we will build) that creates a pipeline with a single step (Build): Plain Text pipeline { agent { docker { image 'maven:3.9.5-eclipse-temurin-17-alpine' args '-v /root/.m2:/root/.m2' } } stages { stage('Build') { steps { sh 'mvn clean package' } } } } The agent section of a Jenkinsfile configures where the pipeline will execute. In this case, our pipeline will execute on a Docker container run from the maven:3.9.5-eclipse-temurin-17-alpine Docker image. The -v /root/.m2:/root/.m2 argument creates a two-way mapping between the /root/.m2 directory within the Docker container and the /root/.m2 directory on our Docker host.
According to the Jenkins documentation: This args parameter creates a reciprocal mapping between the /root/.m2 directories in the short-lived Maven Docker container and that of your Docker host’s filesystem....You do this mainly to ensure that the artifacts for building your Java application, which Maven downloads while your Pipeline is being executed, are retained in the Maven repository after the Maven container is gone. This prevents Maven from downloading the same artifacts during successive Pipeline runs. Lastly, we create our stages under the stages section and define our single stage: Build. This stage has a single step that runs the shell command mvn clean package. More information on the full suite of Jenkinsfile syntax can be found on the Pipeline Syntax page, and more information on the mvn clean package command can be found in the Maven in 5 Minutes tutorial. Creating the Pipeline With our Jenkinsfile in place, we can now create a pipeline that will use this Jenkinsfile to execute a build. To set up the pipeline, we must complete the following steps: Navigate to the Jenkins Dashboard page (http://localhost:8080/). Click + New Item in the left panel. Enter a name for the pipeline (such as Example-Pipeline). Click Multibranch Pipeline. Click the OK button. Click Add Source under the Branch Sources section. Select GitHub. Enter the URL of the repository to build in the Repository HTTPS URL field (for example, https://github.com/albanoj2/jenkins-example-project). Click the Validate button. Ensure that the message Credentials ok. Connected to [project-url] is displayed. Click the Save button. Saving the pipeline configuration will kick off the first execution of the pipeline. To view our latest execution, we need to navigate to the Example-Pipeline dashboard by clicking on the Example-Pipeline link in the pipeline table on the main Jenkins page (http://localhost:8080): Jenkins structures its dashboard in a hierarchy that resembles the following: Main Dashboard → Pipeline → Branch In this case, Example-Pipeline is the page that displays information about our newly created pipeline. Within this page is a branch table with a row for each branch we track from our repository. In our case, we are only tracking the master branch, but if we tracked other branches, we would see more than one row, one for each of our branches: Each tracked branch is run according to the Jenkinsfile for that branch. Conceivably, the pipeline for one branch may differ from the pipeline for another (since their Jenkinsfiles may differ), so we should not assume that the execution of each pipeline under the Example-Pipeline pipeline will be the same. We can also track Pull Requests (PRs) in our repository similarly to branches: For each PR we track, a new row will be added to the PR table (accessed by clicking the Pull Requests link next to the Branches link above the table), which allows us to see the pipeline executions for each PR. For more information on tracking branches and PRs, see the Jenkins Pipeline Branches and Pull Requests page. If we click on the master branch, the master branch page shows us that both stages of our pipeline (and, therefore, the entire pipeline) were completed successfully. While we only defined a single stage (Build) in our pipeline, Jenkins will implicitly add a stage (Checkout SCM, or Software Configuration Management), which checks out our repository. Once the repository is checked out, Jenkins runs our pipeline stages against the local clone.
Running the Pipeline Automatically By default, our pipeline will only be executed manually. To automatically execute the pipeline, we have two options: Scan the repository periodically Create a webhook To scan the repository periodically, we can change the pipeline configuration: Navigate to the Jenkins Dashboard. Click the Example-Pipeline pipeline. Click the Configuration tab in the left panel. Check the Periodically if not otherwise run checkbox. Set the Interval to the desired amount. Click the Save button. This will poll the repository periodically and execute the pipeline if a change is detected (if an execution was not otherwise started manually). Creating a webhook is more immediate and does not require polling, but it does require a bit more configuration. For more information about how to set up a webhook for Jenkins and a GitHub repository, see how to add a GitHub webhook in your Jenkins pipeline. Running Tests Separately While a single-stage pipeline is a good starting point, it is unrealistic in production environments. To demonstrate a multi-stage pipeline, we will split our existing build stage into two stages: a build and a test stage. In our existing build stage, we built and tested our project using the mvn clean package command. With our two-stage pipeline, we will skip our tests in the build stage using the -DskipTests=true Maven flag and add a second stage that runs only our tests using the mvn test command. Implementing this in our project results in the following Jenkinsfile: Plain Text pipeline { agent { docker { image 'maven:3.9.5-eclipse-temurin-17-alpine' args '-v /root/.m2:/root/.m2' } } stages { stage('Build') { steps { sh 'mvn clean package -DskipTests=true' } } stage('Test') { steps { sh 'mvn test' } post { always { junit 'target/surefire-reports/*.xml' } } } } } All but one of the changes to our Jenkinsfile are already familiar: in addition to running mvn test, we also create a postprocessing step using the post section. In this section, we define a postprocessing step that is always run and tells the Jenkins JUnit Plugin where to look for our JUnit test artifacts. The JUnit Plugin is installed by default and gives us a visualization of the number of passed and failed tests over time. Related Tutorial: Publish Maven Artifacts to Nexus OSS Using Pipelines or Maven Jobs When our next build runs (which should be picked up automatically due to our SCM trigger change), we see that a new stage, Test, has been added to our Example-Pipeline master page. Note that since our previous build did not include this stage, it is grayed out. Looking at the top right of the same page, we see a graph of our tests over time. In our example project, there are nine tests, so our graph stays uniform at nine tests. If we only have one pipeline execution incorporating the JUnit Plugin, then we may not see this graph filled in yet. As we run more executions, we will see the graph start to fill over time. Jenkins Features: Extending the Jenkins Pipeline The example pipeline we have created is a good starting point, but it is not realistic for medium- and large-scale applications, and it only scratches the surface of what Jenkins pipelines are capable of.
As we start to build more sophisticated pipelines, we will begin to require more sophisticated features, such as: Deploying artifacts to artifact repositories Deploying Docker images to a container image repository Connecting to external services using credentials and authentication Using more advanced UIs, such as Blue Ocean Building and testing applications on different environments and operating systems Building and testing applications in parallel across multiple worker nodes Standing up complex test and deployment environments in Kubernetes Restricting access to designated administrators The list of possibilities is nearly endless, but it suffices to say that realistic projects will need to build off the simple project we have created here and add richer features and more complex tooling to accomplish their goals. The following resources are great places to start: Jenkins User Guide Jenkins: Build a Java app with Maven Jenkins: Blue Ocean Jenkins: Scaling Pipelines Jenkins: Security Jenkins "Building a Continuous Delivery Pipeline Using Jenkins" Alternatives to Jenkins Jenkins is a very powerful tool that has many advantages over its competitors, including: General automation tooling (not just CI/CD) A vast library of plugins The ability to manage multiple projects in a single location A wide user base and community knowledge Venerability and time-tested adoption Despite that, it is important to know its alternatives and understand where they outshine Jenkins. The following is a list of Jenkins's most popular alternatives (not comprehensive) and some advantages they may have over Jenkins when building a pipeline: GitHub Actions: GitHub Actions is the GitHub implementation of CI/CD. The biggest advantage of Actions over Jenkins is that Actions is directly incorporated into GitHub. This means a pipeline built in Actions (known as a workflow) can be accessed in the same GitHub repository where our code, issues, and wikis are located. As a result, we do not have to manage a separate Jenkins server and can access all of the data that supports our code in one location. While Jenkins has a wider range of plugins and integrations that can be used, GitHub Actions should be seriously considered if we are building a pipeline for code already stored in GitHub. GitLab CI/CD: Similar to GitHub Actions, GitLab CI/CD is the native pipeline builder for GitLab repositories. The advantages that GitLab CI/CD has over Jenkins are analogous to those of GitHub Actions: all of the tools surrounding our pipelines are located in the same application where our code is stored. GitLab CI/CD should be seriously considered when using GitLab as a remote repository. Learn how to auto deploy Spring Boot apps with GitLab CI/CD. Other alternatives to Jenkins are also common and may be useful to explore when setting up a pipeline: Circle CI Travis CI GoCD TeamCity While Jenkins has many advantages, it is important to explore alternative options to see which CI/CD solution is the best for the task at hand. Read DZone's coverage of Jenkins vs. Bamboo and Jenkins vs. GitLab. Conclusion Jenkins has been a vanguard in software automation and CI/CD since its inception in 2011. Despite this success, jumping straight into Jenkins can quickly become overwhelming. In this article, we looked at the fundamentals of CI/CD and how we can apply those concepts to create a working pipeline in Jenkins. Although this is a good starting point, it only scratches the surface of what Jenkins is capable of.
As we create more and more complex projects and want to deliver more sophisticated products, we can take the knowledge we learned and use it as a building block to deliver software efficiently and effectively using Jenkins. Tutorial: Docker, Kubernetes, and Azure DevOps More Information For more information on CI/CD and Jenkins, see the following resources: The Jenkins User Handbook Continuous Delivery by Jez Humble and David Farley ContinuousDelivery.com The Jenkins Website
The test pyramid (testing pyramid, test automation pyramid) was originally published in the famous book by Mike Cohn: Succeeding with Agile: Software Development Using Scrum (Cohn, 2010). The original figure shows unit tests at the base, service tests in the middle, and UI tests at the top. The concept is simple: you should write more unit tests than service tests and only a few UI tests. The reason behind this is that: UI tests are slow. UI tests are brittle. There were many modifications to the original version: Adding manual tests at the top of the pyramid Modifying the name 'service' to 'integration' Modifying the name 'UI' to 'e2e' Adding more layers such as 'API' or 'component' An article that considers several alternatives is (Roth 2019). However, the main problem is that this concept considers only some aspects of testing and cannot account for the progress of test (design) automation. In this article, we delve into a comprehensive approach to test design and test automation, focusing on the primary objective of testing, i.e., bug detection rather than execution speed optimization. Therefore, let's explore the effectiveness of test cases. Mutation Testing To measure the efficiency of test cases, we employ mutation testing, a technique that involves testing the tests themselves by introducing slight modifications to the original code, creating multiple mutants. A robust test dataset should be capable of distinguishing the original code from all carefully selected mutants. In mutation testing, we intentionally inject faults into the code to assess the reliability of our test design. A dependable test dataset must effectively "eliminate" all mutants. A test eliminates a mutant when there are discernible differences in behavior between the original code and the mutant. For instance, if the original code is y = x, and a mutant emerges as y = 2 * x, a test case like x = 0 fails to eliminate the mutant, whereas x = 1 succeeds in doing so. Unfortunately, the number of potential mutants is excessively high. However, a significant reduction can be achieved by concentrating on efficient first-order mutants. In the realm of first-order mutants, modifications are limited to a single location within the code. Conversely, second-order mutants involve alterations at two distinct locations, and during execution, both modifications come into play. An investigation by Offutt demonstrated that when all first-order mutants are effectively eliminated, only an exceptionally minute fraction of second-order mutants remain unaddressed. This implies that if a test set is capable of exterminating all first-order mutants, it can also address second-order mutants with efficacy ranging from 99.94% to 99.99%, as per Offutt's empirical study. It's essential to note that our consideration solely pertains to non-equivalent mutants, signifying that test cases exist for each mutant's elimination. We can further reduce the mutants, but first, we should consider the efficiency of the test cases. We consider test case efficiency with respect to the reduced mutation set. It's only a very slight restriction, as the imprecision is less than 0.1%. A test case is unreliable if it cannot find any bug in any mutants, i.e., it doesn't eliminate any mutants. A test case is superfluous if there are other test cases that eliminate the same mutants. A test case T1 substitutes test case T2 if it eliminates all the mutants that T2 eliminates and at least one more. A test case T1 is stronger than T2 if it eliminates more mutants than T2, but T1 doesn't substitute T2. A test set is quasi-reliable if it eliminates all the mutants.
A test set is reliable if it finds any defect in the code. A quasi-reliable test set is very close to a reliable test set. A test set is quasi-optimal if it is quasi-reliable and it consists of a number of test cases less than or equal to that of any other quasi-reliable test set. A test case is deterministic if it either passes or fails for all executions. A test case is non-deterministic if it both passes and fails for some executions. Flaky tests are non-deterministic. Non-deterministic test cases should be improved to become deterministic or should be deleted. Now, we can reduce the mutant set. A mutant is superfluous if no test case in a quasi-reliable test set eliminates only this mutant. For example, if only test case T1 eliminates this mutant, but T1 also eliminates another mutant, then we can remove this mutant. The reduced mutant set is called the optimal mutant set. Ideal Mutant Set In this manner, assuming we possessed flawless software, we could create an ideal mutant set from which we could derive a test set that is quasi-reliable. It's of paramount significance that the number of test cases required to detect nearly all defects (99.94% or more) would not exceed the count of optimal mutants. The author, with his co-author Attila Kovács, developed a website and a mutation framework with mutant sets that are close to optimal (i.e., there are only very few or zero superfluous mutants but no missing mutants). The following table shows the code size and the number of mutants in the near-optimum mutant sets (in the code, each parameter is in a different line):
Program | Code size (LOC) | Number of reliable mutants
Pizza ordering | 142 | 30
Tour competition | 141 | 24
Extra holiday | 57 | 20
Car rental | 49 | 15
Grocery | 33 | 26
You can see that the code size and the number of mutants correlate, except for the Grocery app. We believe that the number of optimum mutants in the mutant sets is (close to) linear with the code, which means a reliable test set could be developed. Unfortunately, developing an optimal mutant set is difficult and time-consuming. Don't Use the Test Pyramid or its Alternatives Why is this 'artificial mutant creation' important? We argue that during test automation, we should optimize the test design to find as many defects as we can but avoid superfluous tests. As the tests should eliminate the mutants, it is an attribute of the system whether a test that eliminates a given mutant is a unit, an integration, or a system (e2e) test. We should optimize the test design for each level separately; that's why there cannot be a pre-described shape of test automation. You can argue that you can add more unit test cases as they are cheap to execute. However, there are other cost factors as well. Tests have to be designed and coded, and the difficult part is the calculation of the expected results, which can be time-consuming and error-prone. In creating e2e tests, the outputs (results) are only checked instead of calculated, which is much easier to do (Forgacs and Kovacs, 2023). Another problem is maintenance. While maintaining e2e tests is cheap (see the book above again), maintaining the unit tests is, unfortunately, expensive (Ellims et al. 2006). OK, but if most defects can be found by unit testing, then the test pyramid is appropriate to use. However, that is not true. Runeson et al. (2006) showed in a case study that unit tests detected only 30-60% of the defects. In addition, Berling and Thelin (2004) showed that for different programs, the ratio of bug detection for different test levels is different.
That's why the test design should be carried out for the different levels one by one, independently of each other. Don't design fewer e2e test cases than needed, as your system's quality will remain low, and the bug-fixing costs will be higher than the costs of the omitted test design and execution. Don't design more unit tests than needed; your costs will significantly increase without improving quality. But how do we decrease the number of unit tests? If you find some bugs with a unit test and the bugs could have been detected by several other unit test cases, then you have included superfluous tests, and you should remove them. Conclusion We showed that prescribing any shape of test automation (such as the test pyramid) is faulty. The ratio of the test cases at different test levels is not an input but an output, determined by your system and the test design techniques selected based on risk analysis. We can conclude other things as well. As the number of quasi-reliable test cases is probably linear with the code, it's enough to apply linear test design techniques. In this way, let's apply each-transition (0-switch) testing instead of n-switch testing, where n > 0. Similarly, in most cases, avoid using combinatorial testing, such as all-pairs testing, as a large part of the tests will be superfluous. Instead, we should develop more efficient linear test design techniques (see Forgacs and Kovacs, 2023). There are cases when you still need to use a stronger test design. If a defect may cause more damage than the whole SDLC cost, then you should apply stronger methods. However, most systems do not fall into this category.
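To make the mutant-elimination idea from the mutation testing discussion above concrete, here is a minimal JUnit 5 sketch (the class and method names are invented for illustration, not taken from the author's framework). It mirrors the earlier y = x versus y = 2 * x example: a test with x = 0 cannot tell the original from the mutant, while a test with x = 1 would fail when run against the mutant and therefore eliminates it.
Java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class MutationExampleTest {

    // Original implementation: y = x
    static int original(int x) {
        return x;
    }

    // First-order mutant: a single change, y = 2 * x
    static int mutant(int x) {
        return 2 * x;
    }

    @Test
    void zeroInputDoesNotEliminateTheMutant() {
        // x = 0: the original and the mutant both return 0, so this test
        // passes against either version and the mutant survives.
        assertEquals(0, original(0));
        assertEquals(0, mutant(0));
    }

    @Test
    void nonZeroInputEliminatesTheMutant() {
        // x = 1: the expected value 1 holds for the original but not for the
        // mutant (which returns 2), so running this test against the mutated
        // code fails and the mutant is eliminated.
        assertEquals(1, original(1));
    }
}
In a real setup, a mutation testing tool such as PIT would generate the mutants automatically and run the existing test suite against each one; a mutant counts as eliminated when at least one test fails.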
These days, writing tests is a standard part of development. Unfortunately, from time to time we need to deal with a situation when a static method is invoked by a tested component. Our goal is to mitigate this and avoid relying on the third-party component's behavior. This article sheds light on the mocking of static methods by using the "inline mock maker" introduced by Mockito in version 3.4. In other words, this article explains the Mockito.mockStatic method in order to help us with unwanted invocations of static methods. In This Article, You Will Learn How to mock and verify static methods with the mockStatic feature How to set up mockStatic in different Mockito versions Introduction Many times, we have to deal with a situation when our code invokes a static method. It can be our own code (e.g., some utility class) or a class from a third-party library. The main concern in unit testing is to focus on the tested component and ignore the behavior of any other component (including static methods). An example is when a tested method in component A is calling an unrelated static method from component B. Even though it's not recommended to use static methods, we see them a lot (e.g., utility classes). The reasoning for avoiding the usage of static methods is summarized very well in Mocking Static Methods With Mockito: Generally speaking, some might say that when writing clean object-orientated code, we shouldn’t need to mock static classes. This could typically hint at a design issue or code smell in our application. Why? First, a class depending on a static method has tight coupling, and second, it nearly always leads to code that is difficult to test. Ideally, a class should not be responsible for obtaining its dependencies, and if possible, they should be externally injected. So, it’s always worth investigating if we can refactor our code to make it more testable. Of course, this is not always possible, and sometimes we need to mock static methods. A Simple Utility Class Let's define a simple SequenceGenerator utility class used in this article as a target for our tests. This class has two "dumb" static methods (there's nothing fancy about them). The first nextId method (lines 10-12) generates a new ID with each invocation, and the second nextMultipleIds method (lines 14-20) generates multiple IDs as requested by the passed argument. Java @UtilityClass public class SequenceGenerator { private static AtomicInteger counter; static { counter = new AtomicInteger(1); } public static int nextId() { return counter.getAndIncrement(); } public static List<Integer> nextMultipleIds(int count) { var newValues = new ArrayList<Integer>(count); for (int i = 0; i < count; i++) { newValues.add(counter.getAndIncrement()); } return newValues; } } MockedStatic Object In order to be able to mock static methods, we need to wrap the impacted class with the "inline mock maker." The mocking of static methods from our SequenceGenerator class introduced above is achievable via a MockedStatic instance retrieved with the Mockito.mockStatic method. This can be done as: Java try (MockedStatic<SequenceGenerator> seqGeneratorMock = mockStatic(SequenceGenerator.class)) { ... } Or Java MockedStatic<SequenceGenerator> seqGeneratorMock = mockStatic(SequenceGenerator.class); ... seqGeneratorMock.close(); The created mockStatic instance always has to be closed. Otherwise, we risk ugly side effects in subsequent tests running in the same thread when the same static method is involved (i.e., SequenceGenerator in our case).
Therefore, the first option seems better, and it is used in most articles on this topic. The explanation can be found on the JavaDoc site (chapter 48) as: When using the inline mock maker, it is possible to mock static method invocations within the current thread and a user-defined scope. This way, Mockito assures that concurrently and sequentially running tests do not interfere. To make sure a static mock remains temporary, it is recommended to define the scope within a try-with-resources construct. To learn more about this topic, check out these useful links: the official site, the JavaDoc site, and the GitHub repository. Mock Method Invocation Static methods (e.g., our nextId or nextMultipleIds methods defined above) can be mocked with MockedStatic.when. This method accepts a functional interface defined by MockedStatic.Verification. There are two cases we can deal with. Mocked Method With No Argument The simplest case is mocking a static method with no argument (the nextId method in our case). In this case, it's sufficient to pass only a method reference to the seqGeneratorMock.when method (see line 5). The returned value is specified in a standard way (e.g., with the thenReturn method). Java @Test void whenWithoutArgument() { try (MockedStatic<SequenceGenerator> seqGeneratorMock = mockStatic(SequenceGenerator.class)) { int newValue = 5; seqGeneratorMock.when(SequenceGenerator::nextId).thenReturn(newValue); assertThat(SequenceGenerator.nextId()).isEqualTo(newValue); } } Mocked Method With One or More Arguments Usually, we have a static method with some arguments (nextMultipleIds in our case). Then, we need to use a lambda expression instead of the method reference (see line 5). Again, we can use the standard methods (e.g., then, thenReturn, thenThrow, etc.) to handle the response with the desired behavior. Java @Test void whenWithArgument() { try (MockedStatic<SequenceGenerator> seqGeneratorMock = mockStatic(SequenceGenerator.class)) { int newValuesCount = 5; seqGeneratorMock.when(() -> SequenceGenerator.nextMultipleIds(newValuesCount)) .thenReturn(List.of(1, 2, 3, 4, 5)); assertThat(SequenceGenerator.nextMultipleIds(newValuesCount)).hasSize(newValuesCount); } } Verify Method Invocation Similarly, we can also verify calls of the mocked component by calling the seqGeneratorMock.verify method with the method reference (see line 7) Java @Test void verifyUsageWithoutArgument() { try (MockedStatic<SequenceGenerator> seqGeneratorMock = mockStatic(SequenceGenerator.class)) { var person = new Person("Pamela"); seqGeneratorMock.verify(SequenceGenerator::nextId); assertThat(person.getId()).isEqualTo(0); } } Or the lambda expression (see line 6). Java @Test void verifyUsageWithArgument() { try (MockedStatic<SequenceGenerator> seqGeneratorMock = mockStatic(SequenceGenerator.class)) { List<Integer> nextIds = SequenceGenerator.nextMultipleIds(3); seqGeneratorMock.verify(() -> SequenceGenerator.nextMultipleIds(ArgumentMatchers.anyInt())); assertThat(nextIds).isEmpty(); } } Note: please be aware that seqGeneratorMock doesn't provide any value here, as the static methods are still mocked with the defaults. There's no spy version so far. Therefore, any expected return value has to be mocked, or the default value is returned. Setup The mockStatic feature is enabled in Mockito 5.x by default. Therefore, no special setup is needed. But we need to set up Mockito for the older versions (e.g., 4.x). Mockito 5.x+ As it was already mentioned, we don't need to set up anything in version 5.x.
See the statement in the GitHub repository: Mockito 5 switches the default mockmaker to mockito-inline, and now requires Java 11. Old Mockito Versions When an older version is used and we use the inline mock maker via mockStatic, then we can see an error like this: Plain Text org.mockito.exceptions.base.MockitoException: The used MockMaker SubclassByteBuddyMockMaker does not support the creation of static mocks Mockito's inline mock maker supports static mocks based on the Instrumentation API. You can simply enable this mock mode, by placing the 'mockito-inline' artifact where you are currently using 'mockito-core'. Note that Mockito's inline mock maker is not supported on Android. at com.github.aha.poc.junit.person.StaticUsageTests.mockStaticNoArgValue(StaticUsageTests.java:15) at java.base/java.lang.reflect.Method.invoke(Method.java:580) at java.base/java.util.ArrayList.forEach(ArrayList.java:1596) at java.base/java.util.ArrayList.forEach(ArrayList.java:1596) Generally, there are two options to enable it for such Mockito versions (see all Mockito versions here). Use MockMaker Resource The first option is based on adding <project>\src\test\resources\mockito-extensions\org.mockito.plugins.MockMaker to our Maven project with this content: Plain Text mock-maker-inline Use mockito-inline Dependency The other, and probably better, option is adding the mockito-inline dependency: XML <dependency> <groupId>org.mockito</groupId> <artifactId>mockito-inline</artifactId> <version>5.2.0</version> <scope>test</scope> </dependency> Note: this dependency already contains the MockMaker resource mentioned above. Therefore, this option seems more convenient. Maven Warning No matter what version is used (see above), the Maven build can produce these warnings: Plain Text WARNING: A Java agent has been loaded dynamically (<user_profile>\.m2\repository\net\bytebuddy\byte-buddy-agent\1.12.9\byte-buddy-agent-1.12.9.jar) WARNING: If a serviceability tool is in use, please run with -XX:+EnableDynamicAgentLoading to hide this warning WARNING: If a serviceability tool is not in use, please run with -Djdk.instrument.traceUsage for more information WARNING: Dynamic loading of agents will be disallowed by default in a future release OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended Mockito works correctly even with these warnings. Whether they appear probably depends on the tool used, the JDK version, etc. Conclusion In this article, the mocking of static methods with the help of the Mockito inline mock maker was covered. The article started with the basics of static mocking and then followed with a demonstration of when and verify usage (either with the method reference or the lambda expression). In the end, the setup of the Mockito inline mock maker was shown for different Mockito versions. The used source code can be found here.
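As a closing note on the examples above, the verify tests reference a Person class from the sample project that is not shown in this article. A minimal, hypothetical reconstruction is sketched below, assuming the constructor obtains its ID from SequenceGenerator.nextId(); because the static method is mocked with default values inside the MockedStatic scope, getId() returns 0 in the verifyUsageWithoutArgument test.
Java
// Hypothetical sketch of the sample project's Person class (not shown in the article).
public class Person {

    private final int id;
    private final String name;

    public Person(String name) {
        // Assumed behavior: the ID comes from the static sequence generator,
        // which is why mocking SequenceGenerator affects Person in the tests above.
        this.id = SequenceGenerator.nextId();
        this.name = name;
    }

    public int getId() {
        return id;
    }

    public String getName() {
        return name;
    }
}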
My most-used Gen AI trick is the summarization of web pages and documents. Combined with semantic search, summarization means I waste very little time searching for the words and ideas I need when I need them. Summarization has become so important that I now use it as I write to ensure that my key points show up in ML summaries. Unfortunately, it's a double-edged sword: will reliance on deep learning lead to an embarrassing, expensive, or career-ending mistake because the summary missed something, or, worse, because the summary hallucinated? Fortunately, many years as a technology professional have taught me the value of risk management, and that is the topic of this article: identifying the risks of summarization and the (actually pretty easy) methods of mitigating the risks. Determining the Problem For all of software development history, we have had it pretty easy when verifying that our code worked as required. Software and computers are deterministic, finite state automata, i.e., they do what we tell them to do (barring cosmic rays or other sources of Byzantine failure). This made testing for correct behavior simple. Every possible unit test case could be handled by assertEquals(actual, expected), assertTrue, assertSame, assertNotNull, assertTimeout, and assertThrows. Even the trickiest dynamic string methods could be handled by assertTrue(string.contains(a) && string.contains(b) && string.contains(c) && string.contains(d)). But that was then. We now have large language models, which are fundamentally random systems. Not even the full alphabet of contains(a), contains(b), or contains(c) is up to the task of verifying the correct behavior of Gen AI when the response to an API call can vary by an unknowable degree. Neither JUnit nor NUnit nor PyUnit has assertMoreOrLessOK(actual, expected). And yet, we still have to test these Gen AI APIs and monitor them in production. Once your Gen AI feature is in production, traditional observability methods will not alert you to any of the potential failure modes described below. So, the problem is how to ensure that the content returned by Gen AI systems is consistent with expectations, and how we can monitor those systems in production. For that, we have to understand the many failure modes of LLMs. Not only do we have to understand them, we have to be able to explain them to our non-technical colleagues - before there's a problem. LLM failure modes are unique and present some real challenges to observability. Let me illustrate with a recent example from OpenAI that wasn't covered in the mainstream news but should have been. Three researchers from Stanford University and UC Berkeley had been monitoring ChatGPT to see if it would change over time, and it did. Problem: Just Plain Wrong In one case, the investigators repeatedly asked ChatGPT a simple question: Is 17,077 a prime number? Think step by step and then answer yes or no. ChatGPT responded correctly 98% of the time in March of 2023. Three months later, they repeated the test, but ChatGPT answered incorrectly 87% of the time! It should be noted that OpenAI released a new version of the API on March 14, 2023. Two questions must be answered: Did OpenAI know the new release had problems, and why did they release it? If they didn't know, then why not? This is just one example of the challenges of monitoring Generative AI. Even if you have full control of the releases, you have to be able to detect outright failures. The researchers have made their code and instructions available on GitHub, which is highly instructive.
They have also added some additional materials and an update. This is a great starting point if your use case requires factual accuracy. Problem: General Harms In addition to accuracy, it's very possible for Generative AI to produce responses with harmful qualities such as bias or toxicity. HELM, the Holistic Evaluation of Language Models, is a living and rapidly growing collection of benchmarks. It can evaluate more than 60 public or open-source LLMs across 42 scenarios, with 59 metrics. It is an excellent starting point for anyone seeking to better understand the risks of language models and the degree to which various vendors are transparent about the risks associated with their products. Both the original paper and code are freely available online. Model Collapse is another potential risk; if it happens, the results will be known far and wide. Mitigation is as simple as ensuring you can return to the previous model. Some researchers claim that ChatGPT and Bard are already heading in that direction. Problem: Model Drift Why should you be concerned about drift? Let me tell you a story. OpenAI is a startup; the one thing a startup needs more than anything else is rapid growth. The user count exploded when ChatGPT was first released in December of 2022. Starting in June of 2023, however, user count started dropping and continued to drop through the summer. Many pundits speculated that this had something to do with student users of ChatGPT taking the summer off, but commentators had no internal data from OpenAI, so speculation was all they could do. Understandably, OpenAI has not released any information on the cause of the drop. Now, imagine that this happens to you. One day, usage stats for your Gen AI feature start dropping. None of the other typical business data points to a potential cause. Only 4% of customers tend to complain, and your complaints haven't increased. You have implemented excellent API and UX observability; neither response time nor availability shows any problems. What could be causing the drop? Do you have any gaps in your data? Model Drift is the gradual change in the LLM responses due to changes in the data, the language model, or the cultures that provide the training data. The changes in LLM behavior may be hard to detect when looking at individual responses. Data drift refers to changes over time in the input data the model processes. Model drift refers to changes in the model's performance over time after it has been deployed and can result in: Performance degradation: the model's accuracy decreases on the same test set due to data drift. Behavioral drift: the model makes different predictions than originally, even on the same data. However, drift can also refer to concept drift, which leads to models learning outdated or invalid conceptual assumptions, leading to incorrect modeling of the current language. It can cause failures on downstream tasks, like generating appropriate responses to customer messages. And the Risks? So far, the potential problems we have identified are failure and drift in the Generative AI system's behavior, leading to unexpected outcomes. Unfortunately, it is not yet possible to categorically state what the risks to the business might be because nobody can determine beforehand what the possible range of responses might be with non-deterministic systems.
You will have to anticipate the potential risks on a Gen AI use-case-by-use-case basis: is your implementation offering financial advice or responding to customer questions for factual information about your products? LLMs are not deterministic; a statement that, hopefully, means more to you now than it did three minutes ago. This is another challenge you may have when it comes time to help non-technical colleagues understand the potential for trouble. The best thing to say about risk is that all the usual suspects are in play (loss of business reputation, loss of revenue, regulatory violations, security). Fight Fire With Fire The good news is that mitigating the risks of implementing Generative AI can be done with some new observability methods. The bad news is that you have to use machine learning to do it. Fortunately, it's pretty easy to implement. Unfortunately, you can't detect drift using your customer prompts - you must use a benchmark dataset. What You're Not Doing This article is not about detecting drift in a model's dataset - that is the responsibility of the model's creators, and the work to detect drift is serious data science. If you have someone on staff with a degree in statistics or applied math, you might want to attempt to detect drift using the method (maximum mean discrepancy) described in this paper: Uncovering Drift In Textual Data: An Unsupervised Method For Detecting And Mitigating Drift In Machine Learning Models What Are You Doing? You are trying to detect drift in a model's behavior using a relatively small dataset of carefully curated text samples representative of your use case. Like the method above, you will use discrepancy, but not for an entire set. Instead, you will create a baseline collection of prompts and responses, with each prompt sent to the API 100 times, and then calculate the mean and variance for each prompt. Then, every day or so, you'll send the same prompts to the Gen AI API and look for excessive variance from the mean. Again, it's pretty easy to do. Let's Code! Choose a language model to use when creating embeddings. It should be as close as possible to the model being used by your Gen AI API. You must have complete control over this model's files, all of its configurations, and all of the supporting libraries that are used when embeddings are created and when similarity is calculated. This model becomes your reference: the equivalent of the 1 kg spheres of pure silicon crafted as reference standards of mass. Java Implementation The how-do-I-do-this-in-Java experience for me, a 20-year veteran of Java coding, was painful until I sorted out the examples from the Deep Java Library (DJL). Unfortunately, DJL has a very limited list of native language models available compared to Python. Though somewhat over-engineered, the Java code is almost as pithy as Python and boils down to three pieces: the setup of the LLM used to create sentence embedding vectors, the code to create the text embedding vectors and compare the semantic similarity between two texts, and the function that calculates the semantic similarity (a rough sketch of these pieces appears at the end of this article). Put It All Together As mentioned earlier, the goal is to be able to detect drift in individual responses. Depending on your use case and the Gen AI API you're going to use, the number of benchmark prompts, the number of responses that form the baseline, and the rate at which you sample the API will vary. The steps go like this: Create a baseline set of prompts and Gen AI API responses that are strongly representative of your use case: 10, 100, or 1,000.
Save these in Table A. Create a baseline set of responses: send each of the prompts to the API 10, 50, or 100 times over a few days to a week, and save the text responses. Save these in Table B. Calculate the similarity between the baseline responses: for each baseline response, calculate the similarity between it and the response in Table A. Save these similarity values with each response in Table B. Calculate the mean, variance, and standard deviation of the similarity values in Table B and store them in Table A. Begin the drift detection runs: perform the same steps as in step 1 every day or so. Save the results in Table C. At the end of each detection run, calculate the similarity between the responses in Table C and the corresponding baseline responses in Table A. When all the similarities have been calculated, look for any outside the original variance. For those responses with excessive variance, review the original prompt, the original response from Table A, and the latest response in Table C. Is there enough of a difference in the meaning of the latest response? If so, your Gen AI API model may be drifting away from what the product owner expects; chat with them about it. Result The data, when collected and charted, should look something like this: The chart shows the result of a benchmark set of 125 prompts sent to the API 100 times over one week - the Baseline samples. The mean similarity for each prompt was calculated and is represented by the points in the Baseline line and mean plot. The latest run of the same 125 benchmark samples was sent to the API yesterday. Their similarity was calculated vs. the baseline mean values - the Latest samples. The responses of individual samples that seem to vary quite a bit from the mean are reviewed to see if there is any significant semantic discrepancy with the baseline response. If that happens, review your findings with the product owner. Conclusion Non-deterministic software will continue to be a challenge for engineers to develop, test, and monitor until the day that the big AI brain takes all of our jobs. Until then, I hope I have forewarned and forearmed you with clear explanations and easy methods to keep you smiling during your next Gen AI incident meeting. And, if nothing else, this article should help you to make the case for hiring your own data scientist. If that's not in the cards, then… math?
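As a rough illustration of the similarity calculation and variance check described in the steps above, here is a minimal Java sketch. It is not the author's original DJL code: the embedding call is hidden behind a hypothetical EmbeddingModel interface (which could be backed by DJL or any other embedding library), and only the cosine similarity and the check against the stored baseline mean and standard deviation are shown.
Java
import java.util.List;

public class DriftCheck {

    // Hypothetical abstraction over whatever embedding library you choose (e.g., DJL).
    interface EmbeddingModel {
        float[] embed(String text);
    }

    // Cosine similarity between two embedding vectors.
    static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Compare the latest response for a prompt against the baseline mean and
    // standard deviation (the values stored in "Table A" in the steps above).
    // Returns true if the similarity falls outside the expected band and the
    // response should be reviewed with the product owner.
    static boolean isDrifting(EmbeddingModel model, String baselineResponse,
                              String latestResponse, double baselineMean,
                              double baselineStdDev, double tolerance) {
        double similarity = cosineSimilarity(
                model.embed(baselineResponse), model.embed(latestResponse));
        return Math.abs(similarity - baselineMean) > tolerance * baselineStdDev;
    }

    // Helpers to compute the baseline statistics from the similarities in "Table B".
    static double mean(List<Double> similarities) {
        return similarities.stream().mapToDouble(Double::doubleValue).average().orElse(0);
    }

    static double stdDev(List<Double> similarities, double mean) {
        double variance = similarities.stream()
                .mapToDouble(s -> (s - mean) * (s - mean)).average().orElse(0);
        return Math.sqrt(variance);
    }
}
The tolerance parameter (for example, 2 or 3 standard deviations) is an assumption for illustration; tune it to your own baseline data and review cadence.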
What Are Feature Flags? Feature flags are a software development technique that helps to turn certain functionality on and off during runtime without deploying code. Both feature flags and modern development in general are focused on the race to deploy software to customers faster. However, it is not only that the software has to reach the customer faster; it also has to be done with less risk. Feature flags are a potent tool (a set of patterns or techniques) that can be used to reinforce the CI/CD pipeline by increasing the velocity and decreasing the risk of the software deployed to the production environment. Feature flags are also known as feature bits, feature flippers, feature gates, conditional features, feature switches, or feature toggles (even though the last one may have a subtle distinction which we will see a bit later). Related: CI/CD Software Development Patterns. Feature flags help to control and experiment over the feature lifecycle. They are a DevOps best practice that is often observed in distributed version control systems. Even incomplete features can be pushed to production because feature flags help to separate deployment from release. Earlier, the lowest level of control was at the deployment level. Now, feature flags move the lowest level of control to each individual item or artifact (feature, update, or bug fix) that's in production, which makes it even more granular than the production deployment. Feature Flags Deployment Feature flags can be implemented as: Properties in JSON files or config maps A feature flag service Once we have a good use case (e.g., show or hide a button to access the feature, etc.) to use the feature flags, we will have to see where to implement the flag (frontend, backend, or a mix of both). With a feature flag service, we must install the SDK, create and code the flags within the feature flag platform, and then wrap the new paths of the code or new features within the flags. This enables the feature flags, and the new feature can be toggled on or off through a configuration file or a visual interface as part of the feature flagging platform. We also set up the flag rules so that we may manage various scenarios. You may use different SDKs depending on the language of each service used. This also helps product managers to run some experiments on the new features. After the feature flags are live, we must manage them, which is also known as feature flag management. After a feature flag has served its purpose, or is no longer serving its purpose, we need to remove it to avoid the technical debt of having feature flags left in the codebase. This can also be automated within the service platform. DZone's previously covered how to trigger pipelines with jobs in Azure DevOps. Feature Toggles vs. Feature Flags From an objective perspective, there may be no specific difference between a feature toggle and a feature flag, and for all practical purposes, you may consider them as similar terms. However, feature toggles may carry a subtle connotation of a heavier binary "on/off" for the whole application, whereas feature flags could be much lighter and can manage ramp-up testing more easily. For example, a toggle could be an on/off switch (show ads on the site, don't show ads) and it could be augmented by a flag like (Region1 gets ads from provider A, Region2 gets ads from provider B). Toggling may turn off all the ads, but a feature flag might be able to switch from provider B to provider D.
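As a minimal sketch of the flag-wrapping idea described above (the class name, flag keys, and config source are invented for illustration and are not tied to any particular feature flag platform), a config-driven toggle plus a lighter per-region flag might look like this:
Java
import java.util.Map;

public class AdProviderSelector {

    // Hypothetical flag values, e.g., loaded from a JSON file, a config map, or an SDK.
    private final Map<String, String> flags;

    public AdProviderSelector(Map<String, String> flags) {
        this.flags = flags;
    }

    public String chooseAdProvider(String region) {
        // Toggle: turn ads off everywhere at runtime, without a deployment.
        if (!Boolean.parseBoolean(flags.getOrDefault("ads.enabled", "false"))) {
            return "none";
        }
        // Flag rule: route different regions to different providers.
        return flags.getOrDefault("ads.provider." + region, "providerA");
    }
}
Flipping ads.enabled behaves like the heavier on/off toggle from the example above, while the per-region keys behave like the lighter flag that switches a single region from one provider to another.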
Types of Feature Flags There are different types of feature flags based on various scenarios, and in this section, we will look at some of the important types of flags. The fundamental benefits of feature flags are their ability to ship alternative code pathways within a deployable environment and the ability to choose specific pathways at runtime. Different user scenarios indicate that this benefit can be applied in multiple modes in different contexts. Two important facets that can be applied to categorize the types of feature flags are longevity (how long the flag will be alive) and dynamism (how frequently the switching decision is made), even though we may also have other factors for consideration. Release Flags For teams practicing continuous delivery, release flags enable faster shipping velocity for the customer and trunk-based development. These flags allow incomplete and untested code pathways to be shipped to production as latent code. The flag also facilitates the continuous delivery principle of separating the feature release from the deployment of code. These flags are very useful for product managers to manage the delivery of the product to the customers as per the requirements. Operational Flags Operational flags are used for managing the operational aspects of the system's behavior. If we have a feature that is being rolled out and it has unclear performance issues, we should be able to quickly disable/degrade that feature in production, when required. These are generally short-lived flags, but we also have some long-lived flags, a.k.a. kill switches, which can help in degrading non-vital system functionality in production when there are heavy loads. These long-lived flags may also be seen as a manually managed circuit breaker that can be triggered if we cross the set thresholds. The flags are very useful for responding quickly during production issues, and they also need to be re-configured quickly so that they are ready for the next set of issues that may occur. Experimental Flags Experimental flags are generally used in A/B or multivariate testing. Users are placed in cohorts, and at runtime, the toggle router will send different users across different code pathways based on which cohort they belong to. By tracking the aggregate behavior of the different cohorts, the effect of different code pathways can be observed, and this can help to make data-driven optimizations to application functionalities, like the search variables that have the most impact on the user. These flags need to operate with the same configuration for a specific time period (as decided by traffic patterns and other factors, so that the results of the experiment are not invalidated) in order to generate statistically significant results. However, since this may not be possible in a production environment where each request may be from a different user, these flags are highly dynamic and need to be managed appropriately. Customer/Permission Flags Customer/permission flags restrict or change the type of features or product experience that a user gets from a product. One example of this is a premium feature that only some users get based on a subscription. Martin Fowler describes the technique of turning on new features for a set of internal or beta users as a champagne brunch – an early instance of tasting your own medicine or drinking your own champagne. These are quite long-lived flags (many years) compared to other flags.
Additionally, as the permissions are specific to a user, the switching decision is generally made on a per-request basis, and hence, these flags are very dynamic. Feature Flags and CI/CD Feature flags are one of the important tools that help the CI/CD pipeline to work better and deliver code faster to the customer. Continuous integration means integrating code changes from the development teams/members every few hours. With continuous delivery, the software is ready for deployment. With continuous deployment, we deploy the software as soon as it is ready, using an automated process. CI and CD are therefore observed to have great benefits because when they work in tandem, they shorten the software development lifecycle (SDLC). However, software has bugs, and delivering code continuously and quickly can rapidly turn from an asset to a liability, and this is where feature flags give us a way to enable or disable new features without a build or a deployment. In effect, they are acting as a safety net just like tests, which also act as a safety net to let us know if the code is broken. We can ship new features and turn them on or off, as required. Thus, feature flags are part of the release and rollout processes. Many engineering teams are now discussing how to implement continuous testing into the DevOps CI/CD pipeline. Implementation Techniques of Feature Flags Below are a few important implementation patterns and practices that may help to reduce messy toggle point issues. Avoiding Conditionals Generally, toggle or switch points are implemented using 'if' statements for short-lived toggles. However, for long-lived toggles or for multiple toggle points, we may use some sort of a strategy pattern to implement alternative code pathways, which is a more maintainable approach (a sketch of this pattern appears at the end of this article). Decision Points and Decision Logic Should Be Decoupled An issue with feature flags is that we may couple the toggle point (where the toggling decision is made) with the toggle router (the logic behind the decision). This can create rigidity because the toggle points are linked/hard-wired to the feature directly, and we may not be able to modify the sub-feature functionalities easily. By decoupling the decision logic from the decision point, we may be able to manage toggle scope changes more effectively. Inversion of Decision If the application is linked to the feature flagging service or platform, we again have to deal with rigidity, as the application is harder to work with and reason about in isolation, and it also becomes difficult to test. These issues can be resolved by applying the software design principle of inversion of control: decoupling the application from the feature flagging service. Related: Create a Release Pipeline with Azure DevOps. How Feature Flags Can Improve Release Management Some of the benefits of using feature flags for release management are: Turn on/off without deployment Test directly in production Segment the users based on different attributes Segments are users or groups of users that have some attributes tied to them, like location or email ID. Be sure to group segments as collections so that feature flags are tied to specific apps (which are the web pages). Here are some benefits of feature flag service platforms for release management: Can be centrally managed On/off without modifying your properties in your apps/web pages Audit and usage data Conclusion Feature flags in conjunction with CI/CD and release management help in improving many aspects of software delivery.
How Feature Flags Can Improve Release Management

Some of the benefits of using feature flags for release management are:
Turning features on/off without a deployment
Testing directly in production
Segmenting users based on different attributes

Segments are users or groups of users that have attributes tied to them, such as location or email ID. Be sure to group segments into collections so that feature flags are tied to specific apps (such as specific web pages).

Feature flag service platforms add further benefits for release management:
Flags can be centrally managed
Features can be turned on/off without modifying the properties in your apps/web pages
Audit and usage data are available

Conclusion

Feature flags, in conjunction with CI/CD and release management, help improve many aspects of software delivery. To name a few, these include shipping velocity and reduced time to market, with less fear of bugs being released to production. However, feature flags also introduce complexity and challenges in the code that need to be monitored and managed appropriately. To use feature flags effectively, adoption should be an organization-wide initiative and not limited to a few developers. To further your reading, learn more about running a JMeter test with Jenkins pipelines.
CI/CD Explained

CI/CD stands for continuous integration and continuous deployment, and together they form the backbone of modern DevOps deployment practices. CI/CD is the process that allows software to be built, tested, and delivered in a continuous, automated cadence. In a rapidly developing world with ever-increasing requirements, the development and integration process needs to keep pace to ensure business delivery.

What Is Continuous Integration?

CI, or continuous integration, is built around automated builds and tests. Changes made by developers are committed to a source branch of a shared repository, and any changes committed to this branch go through builds and testing before merging. This ensures consistent quality checks on the code that gets merged. As multiple developers work on different complex features, changes are made to a common repository and merged in increments. Code changes go through pre-defined automated builds, and the code is tested for bugs to make sure it does not break the current workflow. Once all the checks, unit tests, and integration tests pass, the code can be merged into the source branch. These additional checks ensure code quality, and versioning makes it easier to track changes if issues arise. Continuous integration has paved the way for rapid development and incremental merging, making it easier to fulfill business requirements faster.

What Is Continuous Delivery?

CD, or continuous deployment, makes the deployment process easier and bridges the gap between developers, operations teams, and business requirements. This process automatically deploys ready, tested code to the production environment. By automating the effort involved in deployment, frequent deployments can be handled by the operations team, enabling more business requirements to be delivered at a faster rate. CD can also stand for continuous delivery, which includes testing the code for bugs before it is deployed to the pre-production environment. Once tests are complete and bugs are fixed, the changes can then be deployed to production. This process ensures that a production-ready version of the code is always present, with newly tested changes added in continuous increments. Because code gets merged in short increments, it is easy to test and scan for bugs before it reaches the pre-production and production environments. Code is already scanned in the automated pipelines before being handed to the testing teams, and this cycle of repeated scanning and testing helps reduce issues and speeds up debugging. Continuous integration allows for continuous delivery, which is followed by continuous deployment.

Figure 1: CI/CD

What Is the Difference Between Continuous Integration (CI) and Continuous Deployment (CD)?

The biggest difference between CI and CD is that CI focuses on preparing and merging code for the production environment, while CD focuses on automating the release of that production-ready code. Continuous integration includes merging the developed features into a shared repository, where they are built and unit-tested (and UI-tested if needed) to make sure they are ready for production. Once a deployment-ready version of the code is available, we can move to the next phase, i.e., continuous deployment. The operations team then picks up that code version, runs automated tests to ensure it is bug-free, and, once the functionality is verified, promotes the code to production using automated deployment pipelines. Hence, both CI and CD work in sync to deliver at a rapid frequency with reduced manual effort.
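As a rough illustration of the flow just described, here is a small conceptual sketch in Java that models a change passing through ordered pipeline stages, where a failure at any stage stops promotion. It is purely illustrative; real pipelines are defined in a CI server's own configuration format (Jenkinsfiles, YAML workflows, and so on), and the stage names and checks here are hypothetical.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

public class PipelineSketch {

    /** One named pipeline stage whose check must pass for the change to move on. */
    record Stage(String name, BooleanSupplier check) { }

    public static void main(String[] args) {
        List<Stage> pipeline = List.of(
            new Stage("build", () -> true),                // compile and package the change
            new Stage("unit tests", () -> true),           // CI: fast checks on every commit
            new Stage("integration tests", () -> true),    // CI: verify merged code works together
            new Stage("deploy to staging", () -> true),    // delivery: production-ready artifact staged
            new Stage("deploy to production", () -> true)  // deployment: automated release
        );

        for (Stage stage : pipeline) {
            if (!stage.check().getAsBoolean()) {
                // A failed stage stops the pipeline, so broken changes never reach production.
                System.out.println(stage.name() + " failed; stopping pipeline");
                return;
            }
            System.out.println(stage.name() + " passed");
        }
        System.out.println("change released");
    }
}
```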
Fundamentals of Continuous Integration

Continuous integration is an important practice in Agile software development. Code changes are merged into a shared repository and undergo automated tests and checks, which helps identify possible issues and bugs at an earlier stage. As multiple developers may work on the same code repository, this step ensures there are proper checks in place that test the code, validate it, and get a peer review before the changes are merged. Read DZone's guide to DevOps code reviews. Continuous integration works best if developers merge code in small increments, which makes it easier to keep track of all the features and bug fixes that get merged into the shared repository.

Fundamentals of Continuous Deployment

Continuous deployment enables frequent production deployments by automating the deployment process. As a result of CI, a production-ready version of the code is always present in the pre-production environment. This allows developers and testers alike to run automated integration and regression tests, UI tests, and more in the staging environment. Once the tests run successfully and the expected criteria are met, the code can be pushed to the live environment by either the development or operations teams (a minimal example of such a promotion gate follows after the advantages and disadvantages below).

Advantages and Disadvantages of CI/CD Implementation

CI/CD implementation has both pros and cons; a faster deployment cycle can also lead to other problems down the line. Below are a few benefits and drawbacks of CI/CD implementation.

Advantages of CI/CD:
Automated tests and builds: Automated tests and builds take the strain off of developers and testers and bring consistency to the code. This is an important step in the CI/CD world.
Better code quality: Every commit goes through predefined checks before getting merged into the main branch. This ensures consistent code quality, and minimal bugs or plausible issues are detected at an earlier stage.
Faster rollout: Automated deployment leads to faster rollout. More features can be released to the end user in smaller chunks, and business requirements are delivered faster, keeping up with increasing demands and changes.
Better transparency: As multiple developers work on a common repository, it is easier to track changes and maintain transparency. Version management tools help track history and versions, with additional checks before merging to ensure no overlaps or conflicts in the changes.
Faster rollbacks and fixes: As history and versioning are tracked, it is easier to roll back any change that causes issues in the application. Any fixes made can also be deployed to production faster.

Disadvantages of CI/CD:
Rapid deployments where they are not needed: Some businesses do not appreciate rapid change, and a faster rollout period may not suit their business model. For them, deep testing before deployment can ensure fewer bugs and problems down the line.
Monitoring: Faster rollout leads to less deep testing, so continuous monitoring is important to quickly identify issues as they come. Hence, monitoring is a crucial part of a CI/CD process.
Issues and fixes: Less thorough testing may lead to escaped corner cases, also known as bugs. Some cases may go unnoticed for longer periods.
Dependency management: A change made in one microservice can cause a cascading chain of issues. Orchestration is required in such cases to ensure less breakage from a change in one part of the service.
Managing resources: With continuous changes being made, development and operations teams also need to keep up with the continuous requirements and maintenance of the pipelines.
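As referenced above, here is one simple way such a promotion gate might look: a smoke check against a staging health endpoint whose non-zero exit code fails the pipeline stage and blocks the production deployment. The endpoint URL is hypothetical and this is only a sketch of the idea; real gates are usually richer (regression suites, metric checks, approvals).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SmokeCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical staging health endpoint; in a real pipeline this URL would be injected.
        String healthUrl = args.length > 0 ? args[0] : "https://staging.example.com/health";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(healthUrl)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        if (response.statusCode() != 200) {
            // A non-zero exit fails the pipeline stage, blocking the production deployment.
            System.err.println("Smoke check failed with status " + response.statusCode());
            System.exit(1);
        }
        System.out.println("Smoke check passed; safe to promote to production.");
    }
}
```

A CI server would typically run a check like this right after the staging deployment and before the production deployment step.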
Popular CI/CD Tools

Below are a few common CI/CD tools that make life easier for development teams:

AWS

AWS, or Amazon Web Services, is a popular platform for DevOps and CI/CD. Like Azure, it provides the infrastructure and services needed for a CI/CD implementation. DZone has previously covered building CI/CD pipelines with AWS.

Azure DevOps

Azure DevOps by Microsoft provides a suite of services to run a CI/CD implementation. From continuous builds to deployments, Azure DevOps handles everything in one platform.

Bitbucket

Bitbucket is a cloud-based version control system developed by Atlassian. Bitbucket Pipelines is a CI/CD tool that is integrated directly into Bitbucket.

GitLab

In addition to source code hosting features comparable to GitHub's, GitLab provides a complete CI/CD setup. From wikis, branch management, versioning, and builds to deployment, GitLab offers an array of services.

Jenkins

Jenkins is an open-source automation server built in Java and a commonly used CI/CD tool. It is easy to extend with plugins and helps manage builds, automated checks, and deployments, and it is handy for real-time testing and reporting. Learn how to set up a CI/CD pipeline from scratch.

Alternative comparisons: Jenkins vs. GitLab, and Jenkins vs. Bamboo.

Conclusion

As Stan Lee said, "With great power comes great responsibility." CI/CD provides a powerful array of tools to enable rapid development and deployment of features that keep up with business requirements. CI/CD is a constant process enabling continuous change; once it is adopted properly, teams can easily deal with new requirements and fix and roll out changes as issues come up. CI/CD is also often used in DevOps practices. Review these best practices further by reading this DevOps Tutorial. With the new tools available in the market, adopting or migrating to CI/CD has become easier than before. However, teams need to assess whether CI/CD is the right approach for their business use case and available resources. Please share your experience with CI/CD and your favorite CI/CD tool in the comments below.