The final step in the SDLC, and arguably the most crucial, is the testing, deployment, and maintenance of development environments and applications. DZone's category for these SDLC stages serves as the pinnacle of application planning, design, and coding. The Zones in this category offer invaluable insights to help developers test, observe, deliver, deploy, and maintain their development and production environments.
In the SDLC, deployment is the final lever that must be pulled to make an application or system ready for use. Whether it's a bug fix or new release, the deployment phase is the culminating event to see how something works in production. This Zone covers resources on all developers’ deployment necessities, including configuration management, pull requests, version control, package managers, and more.
The cultural movement that is DevOps — which, in short, encourages close collaboration among developers, IT operations, and system admins — also encompasses a set of tools, techniques, and practices. As part of DevOps, the CI/CD process incorporates automation into the SDLC, allowing teams to integrate and deliver incremental changes iteratively and at a quicker pace. Together, these human- and technology-oriented elements enable smooth, fast, and quality software releases. This Zone is your go-to source on all things DevOps and CI/CD (end to end!).
A developer's work is never truly finished once a feature or change is deployed. There is always a need for constant maintenance to ensure that a product or application continues to run as it should and is configured to scale. This Zone focuses on all your maintenance must-haves — from ensuring that your infrastructure is set up to manage various loads and improving software and data quality to tackling incident management, quality assurance, and more.
Modern systems span numerous architectures and technologies and are becoming exponentially more modular, dynamic, and distributed in nature. These complexities also pose new challenges for developers and SRE teams that are charged with ensuring the availability, reliability, and successful performance of their systems and infrastructure. Here, you will find resources about the tools, skills, and practices to implement for a strategic, holistic approach to system-wide observability and application monitoring.
The Testing, Tools, and Frameworks Zone encapsulates one of the final stages of the SDLC as it ensures that your application and/or environment is ready for deployment. From walking you through the tools and frameworks tailored to your specific development needs to leveraging testing practices to evaluate and verify that your product or application does what it is required to do, this Zone covers everything you need to set yourself up for success.
Kubernetes in the Enterprise
In 2022, Kubernetes became a central component for containerized applications, and it is nowhere near its peak. In fact, based on our research, 94 percent of survey respondents believe that Kubernetes will be a bigger part of their system design over the next two to three years. With Kubernetes becoming more entrenched in systems, what do adoption and deployment methods look like compared to previous years?

DZone's Kubernetes in the Enterprise Trend Report provides insights into how developers are leveraging Kubernetes in their organizations. It focuses on the evolution of Kubernetes beyond container orchestration, advancements in Kubernetes observability, Kubernetes in AI and ML, and more. Our goal for this Trend Report is to help inspire developers to leverage Kubernetes in their own organizations.
The DORA metrics are pretty much an iceberg, with the five indicators sticking out above the surface and plenty of research hidden beneath the waves. With the amount of work that has gone into that program, the whole thing can seem fairly opaque when you start working with the metrics. Let's try to peek under the surface and see what's going on down there.

After our last post about metrics, we thought it might be interesting to look at how metrics are used on different organizational levels. If we start from the top, DORA is one of the more popular projects today. Here, we'll share some ideas we've had on how to use the DORA metrics, but first, there have been some questions we've been asking ourselves about the research and its methodology. We'd like to share those questions with you, starting with:

What Is DORA?

DevOps Research and Assessment is a company founded in 2015. Since then, they have been publishing State of DevOps reports, in which they've analyzed development trends in the software industry. In 2018, the people behind that research published a book (Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations) in which they identified the key metrics that have the strongest influence on business performance:

- Deployment Frequency (DF): How often your team deploys to production.
- Mean Lead Time for Changes (MLT): How long it takes for a commit to get to production. Together with DF, this is a measure of velocity.
- Change Failure Rate (CFR): The number of times your users were negatively affected by changes, divided by the number of changes.
- Mean Time to Restore (MTTR): How quickly service was restored after each failure. Together with CFR, this is a measure of stability.
- Reliability: The degree to which a team can keep promises and assertions about the software they operate. This is the most recent addition to the list.

The DORA team has conducted a truly impressive amount of work.
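To make the four measurable key metrics concrete, here is a minimal Python sketch computing them from a handful of hypothetical deployment records. The data, and the simplifying assumption that a failure starts at deploy time, are ours for illustration, not DORA's:

```python
from datetime import datetime, timedelta

# Hypothetical records: (commit_time, deploy_time, caused_failure, restore_time)
deployments = [
    (datetime(2023, 3, 1, 9), datetime(2023, 3, 1, 15), False, None),
    (datetime(2023, 3, 2, 10), datetime(2023, 3, 2, 14), True, datetime(2023, 3, 2, 16)),
    (datetime(2023, 3, 3, 8), datetime(2023, 3, 3, 11), False, None),
    (datetime(2023, 3, 6, 9), datetime(2023, 3, 6, 17), True, datetime(2023, 3, 6, 18)),
]

days_observed = 6

# Deployment Frequency: deploys per day over the observation window.
df = len(deployments) / days_observed

# Mean Lead Time for Changes: average commit-to-production time.
lead_times = [deploy - commit for commit, deploy, _, _ in deployments]
mlt = sum(lead_times, timedelta()) / len(lead_times)

# Change Failure Rate: failed changes divided by total changes.
failures = [d for d in deployments if d[2]]
cfr = len(failures) / len(deployments)

# Mean Time to Restore: average time from failure (approximated here as the
# deploy time) to restored service.
mttr = sum((restore - deploy for _, deploy, _, restore in failures),
           timedelta()) / len(failures)

print(f"DF: {df:.2f}/day, MLT: {mlt}, CFR: {cfr:.0%}, MTTR: {mttr}")
```

Real tools such as fourkeys derive the same figures from version control and incident data rather than a hand-written list.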
They've done solid academic research, and, in the reports, they always honestly present all results, even when those seem to counter their hypotheses. All that work and the volume of data processed truly is impressive, but, ironically, it might present a limitation. When the DORA team applies their metrics, they back them up with detailed knowledge; that knowledge is absent when someone else uses them. This is not a hypothetical situation because, by now, these metrics are so popular that tools have been written specifically to measure them, like fourkeys. More general tools, like GitLab or Codefresh, can also track them out of the box.

The questions we're about to ask might be construed as criticism, but that is not the intention. We're just trying to show that DORA is a complex tool, which should be, as they say, handled with care.

Do the Key Metrics Work Everywhere?

The main selling point of the key metrics is their universal importance. DORA found them to be significant for all types of enterprises, whether we're talking about a Silicon Valley startup, a multinational corporation, or a government agency. This would imply that those metrics could work as an industry standard. In a way, this is how they are presented: the companies in DORA surveys are grouped into clusters (usually four), from low to elite, and the values for the elite cluster look like something everyone should emulate.

In reality, though, all of this is more descriptive than prescriptive. The promoted value for, say, Mean Lead Time for Changes is simply the value from companies that were grouped into the elite cluster, and that value can change from year to year. For instance, in 2019, it was less than one day; in 2018 and 2021, less than one hour; and in 2022, between one day and one week. That last one is because there was no "elite" cluster at all that year: the data fit more conveniently into three clusters.
So, if we stop at this point and don't look further than the key metrics, we just get the message: here's a picture of what an elite DevOps team looks like, let's all be more like them. In the end, we come back to the simple truth that correlation does not imply causation. If the industry leaders that have embraced DevOps all display these stats, does it mean that by attaining those stats you will also become a leader? Doing it without proper understanding or regard for context might result in wasted effort. How much will it cost you to drive each of those metrics to the elite level, and keep them there indefinitely? What will the return on that investment be? To answer those questions, you're going to need to dig deeper, and the same goes for the next question on our list.

We Know How Fast We Are Going, But in What Direction?

Not only will you need something lower-level to complement the DORA metrics, you'll also need something higher-level. As we've said previously, a good metric should somehow be tied to user happiness. The problem is that the DORA metrics tell you nothing about business context: whether or not you're responding to any kind of real demand out there. Using just DORA to set OKRs will paint a very incomplete picture of how well the business is performing. You'll know how fast you're going, but you might be driving in the opposite direction from where you need to be, and the DORA metrics won't alert you to that.

What Is Reliability?

This is what we've been asking ourselves while researching the fifth DORA metric. If you've read the 2021 and 2022 reports, you'll know that it was inspired by Google's Site Reliability Engineering (SRE), but you'll still be none the wiser as to what specific metrics it is based on, how exactly it is calculated, or how you might go about measuring your own reliability.
The reports don't show any values for it, it is not shown in the nice tables where they compare clusters, and the Quick Check offered by DORA doesn't mention reliability at all in its main questions. The last State of DevOps report states that investment in SRE yields improvements to reliability "only once a threshold of adoption has been reached," but doesn't tell us what that threshold is. This leaves a company in the dark as to whether they'll gain anything by investing in SRE. This is not criticism of SRE itself; rather, the way it's presented in the reports is opaque, and if you want to make any meaningful decisions in your team, you'll need to drill down to more actionable metrics.

Idea: Check the DORA Metrics Against Each Other

The great thing about the key DORA metrics is how they can serve as checks for each other; each of them taken on its own would lie by omission. A team of very careful developers who are diligent with their unit tests and who are supported by a qualified QA team could have the same deployment frequency and lead time for changes as a team that does no testing whatsoever. Obviously, the products delivered by those teams would be very different. So, we need a measure of how much user experience is affected by change. Time to Restore tells us something about it, but on its own, it is useless, like measuring distance instead of speed. Spending two hours restoring a failure that happens once a month is completely different from spending two hours restoring one that happens once a day. Change Failure Rate to the rescue: it tells us how often changes lead to failures.

There is another problem with MTTR: a low value could be achieved by fixing every disaster with an emergency hack. Alternatively, you could have a high deployment frequency, which allows you to roll out stable fixes in a quick and reliable manner. This is an extremely important advantage of a high DF: being able to respond to situations in the field in a non-emergency manner. Again, the metrics serve as checks for each other.
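A toy Python sketch of why these metrics only make sense together: the expected user-facing downtime depends on DF, CFR, and MTTR at once. The figures below are assumed purely for illustration:

```python
# Illustrative sketch: user-facing downtime is a product of all three of
# deployment frequency, change failure rate, and restore time. All figures
# below are assumptions for illustration, not survey data.

def expected_downtime(deploys_per_month, cfr, mttr_hours):
    """Expected hours of downtime per month caused by failed changes."""
    return deploys_per_month * cfr * mttr_hours

# A cautious team: few deploys, slow emergency restores.
cautious = expected_downtime(deploys_per_month=4, cfr=0.05, mttr_hours=8)

# A fast team: many more deploys at the same failure rate, but quick,
# non-emergency fixes thanks to high DF.
fast = expected_downtime(deploys_per_month=60, cfr=0.05, mttr_hours=0.5)

print(f"cautious: {cautious:.1f} h/month, fast: {fast:.1f} h/month")
```

Despite deploying fifteen times as often, the fast team's expected downtime comes out roughly the same, which no single metric on its own would reveal.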
Further, if we're trying to gauge the damage from failures, we need to know how much every minute of downtime costs us. Figuring that out will require additional research, but it will put the figures into proper context and will finally allow us to make a decision based on the numbers. So, for the DORA metrics to be useful, they need to be judged against each other and against additional technical, product, and marketing metrics.

Idea: Know the Limits of Applicability

Taking this point further, there are situations where the DORA metrics don't necessarily indicate success or failure. The most obvious example is that your deployment frequency depends not only on your internal performance but also on your client's situation. If they are only accepting changes once per quarter, there isn't much you can do about that. The last State of DevOps report recommends using cloud computing to improve organizational performance (which includes deployment frequency). This makes sense, of course, but clouds are not always an option, and this should be considered when judging your DF.

If we take Qameta Software as an example, Allure TestOps has a cloud-based version, where updating is a relatively easy affair. However, if you want to update an on-premises version, you'll need to work with the admins, and it will take a while. Moreover, some clients simply decide they want to stay with an older version of TestOps for fear of backward-compatibility problems. Wrike has about 70k tests, which are launched several times a day with TestOps. Any disruption to that process would have an extremely high cost, so they've made the decision not to update. There are other applications for which update frequency also isn't as high a priority, like offline computer games. All in all, there are situations where chasing the elite status measured by DORA might do more harm than good. This doesn't mean that the metrics themselves become useless, just that they have to be used with care.
Idea: Use the DORA Metrics as an Alarm Bell

The DORA metrics are lagging indicators, not leading ones. This means they don't react to change immediately and only show the general direction in which things are going. To be useful, they have to be backed by other, more sensitive and local indicators. If we look at some real examples, we'll see that the key metrics are often used as an alarm bell. At first, the people making decisions get dissatisfied with the values they're seeing. Maybe they want to get into a higher DORA cluster, or maybe one particular metric used to show better values in the past. So they start digging into the problem and drilling down: they either talk to people in their teams or look up lower-level metrics, such as throughput, flow efficiency, work in progress, number of merges to trunk per developer per day, bugs found pre-prod vs. in prod, etc. This helps identify real issues and bottlenecks. If you want to know what actions you can take to address your issues, DORA offers a list of DevOps capabilities, which range from techniques like version control and CI to cultural aspects like improving job satisfaction.

Conclusion

Of course, there is a reason why some things, like reliability, might be opaque and why some technical details might be black-boxed (or gray-boxed) in the DORA metrics. It seems they were created for a fairly high organizational level. At some point, the authors state explicitly that the DORA assessment tool was designed to target business leaders and executives. This, combined with its academic background, makes it an impressive and complex tool. To use that tool properly, we, as always, have to be aware of its limitations. We have to know that, in some situations, a non-elite value for the key metrics might be perfectly acceptable, and chasing elite status might cost more than the chase will yield. Business context and customer needs have to be held above these metrics.
The metrics have to be checked against each other and against more local and leading indicators. If all of this is kept in mind, the DORA metrics can provide a useful measure of your operational performance.

We plan to continue this series of articles about metrics in QA. The previous one outlined a series of requirements for a good metric, and now we want to cover specific examples of metrics on different organizational levels. After DORA, we'll probably go for something lower-level, so stay tuned!
Three Hard Facts

First, the complexity of your software systems is through the roof, and you have more external dependencies than ever before. 51% of IT professionals surveyed by SolarWinds in 2021 selected IT complexity as the top issue facing their organization. Second, you must deliver faster than the competition, which is increasingly difficult as more open-source and reusable tools let small teams move extremely fast. Of the 950 IT professionals surveyed by Red Hat, only 1% indicated that open-source software was "not at all important." And third, reliability is slowing you down.

The Reliability/Speed Tradeoff

In the olden days of software, we could just test the software before a release to ensure it was good. We ran unit tests, made sure the QA team took a look, and then we'd carefully push a software update during a planned maintenance window, test it again, and hopefully get back to enjoying our weekend. By 2023 standards, this is a lazy pace! We expect teams to constantly push new updates (even on Fridays) with minimal dedicated manual testing. They must keep up with security patches, release the latest features, and ensure that bug fixes flow to production.

The challenge is that pushing software faster increases the risk of something going wrong. If you took the old software delivery approach and simply sped it up, you'd have broken releases all the time. To solve this, modern tooling and cloud-native infrastructure make delivering software more reliable and safer, all while reducing the manual toil of releases. According to the 2021 State of DevOps report, more than 74% of organizations surveyed have a Change Failure Rate (CFR) greater than 16%. For organizations seeking to speed up software changes (see DORA metrics), many of these updates caused issues requiring additional remediation like a hotfix or rollback. If your team hasn't invested in improving the reliability of its software delivery tooling, you won't be able to achieve reliable releases at speed.
In today's world, all your infrastructure, including dev/test infrastructure, is part of the production environment. To go fast, you also have to go safely. Smaller incremental changes, automated release and rollback procedures, high-quality metrics, and clearly defined reliability goals make fast and reliable software releases possible.

Defining Reliability

With clearly defined goals, you will know whether your system is reliable enough to meet expectations. What does it mean to be up or down? You have hundreds of thousands of services deployed in clouds worldwide, in constant flux. Developers no longer coordinate releases and push software by hand. Dependencies break for unexpected reasons. Security fixes force teams to rush updates to production to avoid costly data breaches and cybersecurity threats. You need a structured, interpreted language to encode your expectations, the limits of your systems, and automated corrective actions. Today, definitions are in code. Anything less is undefined. The alternative is manual intervention, which will slow you down. You can't work on delivering new features if you're constantly trying to figure out what's broken and fix releases that have already gone out the door. The most precious resource in your organization is attention, and the only way to create more is to reduce distractions.

Speeding Up Reliably

Service level objectives (SLOs) are precisely defined reliability targets. An SLO includes a pointer to a data source, usually a query against a monitoring or observability system. It also has a defined threshold and targets that clearly define pass or fail at any given time, plus a time window (either rolling or calendar-aligned) to count errors against a budget. OpenSLO is the modern de facto standard for declaring your reliability targets. Once you have SLOs describing your reliability targets across services, something changes.
While SLOs don't improve reliability directly, they shine a light on the disconnect between expectations and reality. There is a lot of power in simply clarifying and publishing your goals. What was once a rough shared understanding becomes explicitly defined. We can debate an SLO and decide to raise, lower, redefine, split, combine, or otherwise modify it, with a paper trail in the commit history. We can learn from failures as well as successes. Whatever other investments you're making, SLOs help you measure and improve your service. Reliability is engineered, and you can't engineer a system without understanding its requirements and limitations. SLOs-as-code defines consistent reliability across teams, companies, implementations, clouds, languages, etc.
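To make the error-budget mechanics concrete, here is a minimal Python sketch of evaluating an availability SLO over a rolling window. The request counts are assumed sample data; in practice, the numbers would come from the monitoring system that an OpenSLO definition points at:

```python
# Minimal sketch of evaluating an availability SLO over a rolling window.
# The request counts are assumed sample data for illustration.
slo_target = 0.999            # 99.9% of requests must succeed
total_requests = 2_000_000    # requests in the 28-day rolling window
failed_requests = 1_400

# The error budget: how many failures we can tolerate in the window.
error_budget = (1 - slo_target) * total_requests

# How much of that budget has been spent so far.
budget_consumed = failed_requests / error_budget

achieved = 1 - failed_requests / total_requests
passing = achieved >= slo_target

print(f"Achieved {achieved:.4%}, budget consumed {budget_consumed:.0%}, "
      f"passing: {passing}")
```

Here the service is passing, but 70% of the budget is gone; that is exactly the kind of early, explicit signal a published SLO gives a team before users notice anything.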
The Southwest Airlines fiasco from December 2022 and the FAA NOTAM database fiasco from January 2023 had one thing in common: their respective root causes were mired in technical debt. At its most basic, technical debt represents some kind of technology mess that someone has to clean up. In many cases, technical debt results from poorly written code, but more often than not, it is the result of evolving requirements that older software simply cannot keep up with. Both the Southwest and FAA debacles centered on legacy systems that may have met their respective business needs at the time they were implemented but, over the years, became increasingly fragile in the face of changing requirements. Such fragility is a surefire result of technical debt.

The coincidental occurrence of these two high-profile failures mere weeks apart lit a fire under organizations across both the public and private sectors to finally do something about their technical debt. It's time to modernize, the pundits proclaimed, regardless of the cost. Ironically, at the same time, a different set of pundits, responding to the economic slowdown and the prospect of a looming recession, recommended that enterprises delay modernization efforts to reduce costs in the short term. After all, modernization can be expensive and rarely delivers the type of flashy, top-line benefits the public markets favor. How, then, should executives make decisions about cleaning up the technical debt in their organizations? Just how important is such modernization in the context of all the other priorities facing the C-suite?

Understanding and Quantifying Technical Debt Risk

Some technical debt is worse than others. Just as getting a low-interest mortgage is a much better idea than taking loan shark money, so too with technical debt. After all, shortcuts when writing code are sometimes a good thing. Quantifying technical debt, however, isn't a matter of somehow measuring how messy legacy code might be.
The real question is one of risk to the organization. Two separate examples of technical debt might be equally messy and equally worthy of refactoring. But the first example may be working just fine, with a low chance of causing problems in the future. The other one, in contrast, could be a bomb waiting to go off. Measuring the risks inherent in technical debt, therefore, is far more important than any measure of the debt itself, and it places this discussion into the broader area of risk measurement or, more specifically, risk scoring.

Risk scoring begins with risk profiling, which determines the importance of a system to the mission of the organization. Risk scoring provides a basis for quantitative, risk-based analysis that gives stakeholders a relative understanding of the risks from one system to another, or from one area of technical debt to another. The overall risk score is the sum of all of the risk profiles across the system in question, and it thus gives stakeholders a way of comparing risks in an objective, quantifiable manner. One particularly useful (and free to use) resource for calculating risk profiles and scores is Cyber Risk Scoring (CRS) from NIST, an agency of the US Department of Commerce. CRS focuses on cybersecurity risk, but the folks at NIST have intentionally structured it to apply to other forms of risk, including technical debt risk.

Comparing Risks Across the Enterprise

As long as an organization has a quantitative approach to risk profiling and scoring, it's possible to compare one type of risk to another and, furthermore, to make decisions about mitigating risks across the board. Among the types of risks that are particularly well-suited to this type of analysis are operational risk (i.e., the risk of downtime), which includes network risk; cybersecurity risk (the risk of breaches); compliance risk (the risk of out-of-compliance situations); and technical debt risk (the risk that legacy assets will adversely impact the organization).
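To make the risk-scoring arithmetic concrete, here is a minimal Python sketch that sums per-area risk profiles into an overall score. The profiles and the likelihood-times-impact weighting are illustrative assumptions, a common scheme in risk analysis generally, not the actual NIST CRS methodology:

```python
# Hypothetical risk profiles for areas of technical debt in one system.
# Each profile pairs the likelihood of causing a problem (0-1) with the
# business impact if it does (1-10). The likelihood * impact weighting is
# an assumption for illustration, not the NIST CRS formula.
profiles = {
    "legacy scheduling module": (0.6, 9),
    "unpatched OS on batch host": (0.3, 7),
    "spaghetti reporting code":  (0.5, 3),
}

def risk_score(profiles):
    # Overall score: the sum of all risk profiles across the system,
    # giving a single number to compare against other systems.
    return sum(likelihood * impact for likelihood, impact in profiles.values())

score = risk_score(profiles)
print(f"Overall risk score: {score:.1f}")
```

With the same scheme applied to, say, a compliance risk register, the resulting scores become directly comparable across departments.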
The primary reason to bring these various sorts of risks onto a level playing field is to give the organization an objective approach to making decisions about how much time and money to spend on mitigating them. Instead of having different departments decide how to use their respective budgets to mitigate the risks within their scope of responsibility, organizations require a way to coordinate various risk mitigation efforts that leads to an optimal balance between risk mitigation and the cost of achieving it.

Calculating the Threat Budget

Once an organization looks at its risks holistically, one uncomfortable fact emerges: it's impossible to mitigate all risks. There simply isn't enough money or time to address every possible threat to the organization. Risk mitigation, therefore, isn't about eliminating risk. It's about optimizing the amount of risk we can't mitigate. Optimizing the balance between mitigation and the cost of achieving it across multiple types of risk requires a new approach to managing risk. We can find this approach in the practice of Site Reliability Engineering (SRE).

SRE focuses on managing reliability risk, a type of operational risk concerned with reducing system downtime. Given that the goal of zero downtime is too expensive and time-consuming to achieve in practice, SRE calls for an error budget. The error budget is a measure of how far short of perfect reliability the organization targets, given the cost of mitigating the threat of downtime. If we generalize the idea of error budgets to other types of risk, we can postulate a threat budget, which represents a quantitative measure of how far short of eliminating a particular risk the organization is willing to tolerate. Intellyx calls this quantitative, best-practice approach to managing threat budgets across different types of risk threat engineering.
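As a toy illustration of the threat budget idea, the Python sketch below expresses, for each type of risk, how much of the tolerated residual risk has been consumed. The targets and figures are entirely assumed, and reducing each risk type to a single mitigation fraction is a simplification for the sake of the example:

```python
# Generalizing SRE's error budget into a "threat budget": for each type of
# risk, a quantitative measure of how far short of full mitigation the
# organization is willing to fall. All figures are illustrative assumptions.

def budget_consumed(target, actual):
    """Fraction of the threat budget used up.

    target: fraction of the risk the organization aims to mitigate
    actual: fraction of the risk currently mitigated
    Values above 1.0 mean the risk is over budget and needs attention.
    """
    budget = 1 - target          # the residual risk we agree to tolerate
    return (1 - actual) / budget

threat_budgets = {
    # risk type: (mitigation target, currently mitigated fraction)
    "operational":    (0.995, 0.997),
    "cybersecurity":  (0.990, 0.984),
    "technical debt": (0.900, 0.850),
}

for risk, (target, actual) in threat_budgets.items():
    consumed = budget_consumed(target, actual)
    status = "within budget" if consumed <= 1 else "OVER budget"
    print(f"{risk}: {consumed:.0%} of threat budget consumed ({status})")
```

The point of putting all risk types through one formula is exactly the level playing field described above: an over-budget technical debt risk can be weighed directly against an over-budget cybersecurity risk.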
Assuming an organization has leveraged the risk scoring approach from NIST (or some alternative approach), it's now possible to engineer risk mitigation across all types of threats to optimize the organization's response to them.

Applying Threat Engineering to Technical Debt

Resolving technical debt requires some kind of modernization effort. Sometimes this modernization is a simple matter of refactoring some code. In other cases, it's a complex, difficult migration process. There are several other approaches to modernization with varying risk/reward profiles as well. Risk scoring provides a quantitative assessment of just how important a particular modernization effort is to the organization, given the threats inherent in the technical debt in question. Threat engineering, in turn, gives an organization a way of placing the costs of mitigating technical debt risks in the context of all the other risks facing the organization, regardless of which department or budget is responsible for mitigating any particular risk.

Applying threat engineering to technical debt risk is especially important because other types of risk, namely cybersecurity and compliance risk, get more attention and, thus, a greater emotional reaction. It's difficult to be scared of spaghetti code when ransomware is in the headlines. As the Southwest and FAA debacles show, however, technical debt risk is every bit as risky as other, sexier forms of risk. With threat engineering, organizations finally have a way of approaching risk holistically in a dispassionate, best-practice-based manner.

The Intellyx Take

Threat engineering provides a proactive, best-practice-based approach to breaking down the organizational silos that naturally form around different types of risk. Breaking down such silos has been a priority for several years now, leading to practices like NetSecOps and DevSecOps that seek to leverage common data and better tooling to break down the divisions between departments.
Such efforts have always been a struggle because these different teams have long had different priorities, and everyone ends up fighting for a slice of the budget pie. Threat engineering can align these priorities. Once everybody realizes that their primary mission is to manage and mitigate risk, real organizational change can occur.

Copyright © Intellyx LLC. Intellyx is an industry analysis and advisory firm focused on enterprise digital transformation. Covering every angle of enterprise IT from mainframes to artificial intelligence, our broad focus across technologies allows business executives and IT professionals to connect the dots among disruptive trends. As of the time of writing, none of the organizations mentioned in this article is an Intellyx customer. No AI was used to produce this article.
Are you looking to get away from proprietary instrumentation? Are you interested in open-source observability but lack the knowledge to just dive right in? This workshop is for you, designed to expand your knowledge and understanding of the open-source observability tooling available to you today. Dive right into a free, online, self-paced, hands-on workshop introducing you to Prometheus. Prometheus is an open-source systems monitoring and alerting toolkit that enables you to hit the ground running with discovering, collecting, and querying your observability data today. Over the course of this workshop, you will learn what Prometheus is and is not, install it, start collecting metrics, and learn all the things you need to know to become effective at running Prometheus in your observability stack.

Previously, I shared an introduction to Prometheus in a lab that kicked off this workshop. In this article, you'll be installing Prometheus from either a pre-built binary from the project or a container image. I'm going to get you started on your learning path with this lab, which provides a quick introduction to everything needed for metrics monitoring with Prometheus. Note that this article is only a short summary, so please see the complete lab found online here to work through it in its entirety yourself:

The following is a short overview of what is in this specific lab of the workshop. Each lab starts with a goal. In this case, it is fairly simple: this lab guides you through installing Prometheus on your local machine, configuring it, and running it to start gathering metrics. You are confronted right from the start with two possible paths to installing the Prometheus tooling locally on your machine: using a pre-compiled binary for your machine's architecture, or using a container image.
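Both paths have you create a basic scrape configuration. As a reference point, a minimal prometheus.yml that scrapes the Prometheus server's own metrics endpoint looks roughly like this (a sketch in line with the project's documented defaults, not the exact file from the lab):

```yaml
# Minimal Prometheus configuration: scrape the server's own /metrics
# endpoint every 15 seconds.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
```

With a file like this in place, the server starts collecting metrics about itself, which is exactly the first scraping target the lab has you verify in the web console.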
Installing Binaries

The first path you can take to install Prometheus on your local machine is to obtain the right version of the pre-compiled binaries for your machine's architecture. I've provided links to directly obtain Mac OSX, Linux, and Windows binaries. The installation is straightforward. You'll learn what a basic configuration looks like while creating your own to get started with scraping your first metrics from the Prometheus server itself. Once it's up and running, you'll explore the basic information available to you through the Prometheus status pages, a web console. You'll explore how to verify that your configured scraping target is up and running, then go and break your configuration to see what a broken target looks like on the web console status page.

Next, you browse the available configuration flags for running your Prometheus server, look at the time series database status, explore your active configuration, and finish up by playing with some yet-to-be-explained query expressions in the provided tooling. That last exercise is more extensive than just pasting in queries: you'll learn about built-in validation mechanisms and explore the graphing visualization offered out of the box. This lab leaves you with an installed binary package for your machine's architecture, a running Prometheus server with a basic configuration, and an understanding of the available tooling in the provided web console.

Installing Container Image

The second path you can take is to install Prometheus using a container image. This lab path uses an Open Container Initiative (OCI) standards-compliant tool known as Podman. The default requirement is Podman Desktop, a graphical tool that also includes the command line tooling referred to in the rest of this lab. I've chosen to avoid the more complex issues of mounting a volume to make your local configuration file available to your running Prometheus container image.
Instead, I am choosing to walk you through a few short steps to building your own local container image with your workshop configuration file. Once all of this is done, you are up and running with your Prometheus server just like in the previous section. The rest of this path covers the same as previously covered in the above section, where you explore all the basic information available to you through the Prometheus status pages through its web console. Missed Previous Labs? This is one lab in the more extensive free online workshop. Feel free to start from the very beginning of this workshop here if you missed anything previously: You can always proceed at your own pace and return any time you like as you work your way through this workshop. Just stop and later restart Perses to pick up where you left off. Coming Up Next I'll be taking you through the following lab in this workshop where you'll start learning about the Prometheus Query Language and how to gain insights into your collected metrics. Stay tuned for more hands-on material to help you with your cloud-native observability journey.
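For reference, the container-image approach described above amounts to baking the configuration into a derived image. A sketch of such a Containerfile (the image tag and file paths here are assumptions for illustration, not the lab's exact content) could look like:

```dockerfile
# Containerfile - illustrative sketch: extend an upstream Prometheus image
# and copy in the workshop configuration so no volume mount is needed.
FROM prom/prometheus:latest
COPY prometheus.yml /etc/prometheus/prometheus.yml
```

Building and running it would then be something like podman build -t workshop-prometheus . followed by podman run -p 9090:9090 workshop-prometheus.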
Monitoring is only a small aspect of our operational needs; configuring, monitoring, and checking the configuration of tools such as Fluentd and Fluentbit can be a bit frustrating, particularly if we want to validate more advanced configuration that does more than simply lift log files and dump the content into a solution such as OpenSearch. Fluentd and Fluentbit provide us with some very powerful features that can make a real difference operationally, for example, the ability to identify specific log messages and send them to a notification service rather than waiting for the next log analysis cycle to be run by a log store like Splunk. If we want to test the configuration, we need to play log events in as if the system was really running, which means realistic logs at the right speed so we can make sure that our configuration prevents alert or mail storms. The easiest way to do this is either to take a real log and copy the events into a new log file at the speed they originally occurred, or to create synthetic events and play them in at a realistic pace. This is what the open-source LogGenerator (aka LogSimulator) does. I created the LogGenerator a couple of years ago, having addressed the same challenges before and wanting something that would help demo Fluentd configurations for a book (Logging in Action with Fluentd, Kubernetes, and more). Why not simply copy the log file for the logging mechanism to read? There are several reasons. For example, if your logging framework can send the logs over the network without creating back pressure, then logs can be generated without being impacted by storage performance considerations, but there is no tangible file to copy. If you want to simulate log events from a database into your monitoring environment, this becomes even harder, as the DB will store the logs internally. The other reason is that if you have alerting controls based on thresholds over time, you need the logs to be consumed at the correct pace. 
Simply ingesting the whole log in one go is not going to correctly exercise such time-based controls. Since then, I've seen similar needs to pump test events into other solutions, including OCI Queue and other Oracle Cloud services. The OCI service support has been implemented using a simple extensibility framework, so while I've focused on OCI, the same mechanism could be applied just as easily to AWS SQS, for example. A good practice for log handling is to treat each log entry as an event and think of log event handling as a specialized application of stream analytics. Given that the most common approach to streaming and stream analytics these days is based on Kafka, we're working on an adaptor for the LogSimulator that can send the events to a Kafka API endpoint. We built the LogGenerator so it can be run as a script, so modifying it and extending its behavior is quick and easy. We started out developing with Groovy on top of Java 8, and if you want to create a JAR file, it will compile as Java. More recently, particularly for the extensions we've been working on, we've used Java 11 and its ability to run single-file classes straight from source. We've got plans to enhance the LogGenerator so we can inject OpenTelemetry events into Fluentbit and other services, but we'd love to hear about other use cases you see for this. For more on the utility: Read the posts on my blog See the documentation on GitHub
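To make the pacing point concrete, here is a minimal Python sketch of the replay idea (the actual LogGenerator is implemented in Groovy/Java and is far more capable): it parses timestamps from recorded log lines and re-emits the lines while preserving the original inter-event gaps, with a speedup factor so time-based alert thresholds can be exercised without waiting in real time. The timestamp format and sample events are assumptions for illustration.

```python
import re
import time
from datetime import datetime

# Assumed timestamp format at the start of each log line, e.g.
# "2023-01-01 10:00:05 WARN high latency observed".
TS_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

def parse_ts(line):
    """Return the line's timestamp as a datetime, or None if absent."""
    m = TS_PATTERN.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S") if m else None

def replay(lines, emit, speedup=1.0):
    """Emit each log line, sleeping to reproduce the recorded pacing.

    speedup > 1 compresses time, so threshold-over-time alert rules can
    be exercised without real-time playback.
    """
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if prev is not None and ts is not None:
            gap = (ts - prev).total_seconds() / speedup
            if gap > 0:
                time.sleep(gap)
        if ts is not None:
            prev = ts
        emit(line)

# Replay three synthetic events 600x faster than recorded.
events = [
    "2023-01-01 10:00:00 INFO service started",
    "2023-01-01 10:00:05 WARN high latency observed",
    "2023-01-01 10:00:06 ERROR request failed",
]
out = []
replay(events, out.append, speedup=600.0)
```

In practice you would point emit at a file appender or a network socket that your Fluentd or Fluentbit configuration is watching, rather than an in-memory list.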
Application modernization has become a hot topic in recent years as organizations strive to improve their systems and stay ahead of the competition. From improved user experience to reduced costs and increased efficiency, there are many reasons companies consider modernizing their legacy systems. So, should you consider this investment? Let’s find out! What Is Application Modernization? Application modernization updates legacy systems to align with current market trends and technologies. This involves upgrading or replacing the underlying infrastructure, architecture, and technology stack to improve efficiency, security, and user experience. The aim is to improve performance, functionality, and overall user experience while reducing maintenance costs. Application modernization helps organizations remain competitive by keeping their applications current and relevant. This can be done through various methods, such as re-architecting, re-platforming, or refactoring, to enhance architecture, migrate to a new platform, or modify the code base. The primary goal is to provide organizations with an efficient, secure, and user-friendly application that supports their business objectives. The Need for Application Modernization If an outdated or legacy system is not aligned with the company’s objectives, it can pose several difficulties. It may lack security patch updates, making it vulnerable to viruses and bugs that can disrupt its functionality. Additionally, these systems may present security risks as they are no longer supported by the vendor or company. Maintaining such systems can also be costly for businesses. Benefits of Application Modernization Boosts Productivity Training software developers and teams to use legacy systems can be costly and time-consuming. Furthermore, some outdated applications are not capable of automating repetitive tasks or integrating new processes, leading to decreased productivity among engineering teams. 
Modernized applications integrate different aspects of the development process into a unified ecosystem, enabling employees to work on multiple tasks at once and reducing the time it takes to bring products to market. Additionally, modernized applications come equipped with advanced features and tools that simplify operations and don’t require extensive training, leading to increased employee productivity. Reduces Operational Costs and Tech Debt The maintenance requirements of legacy applications make them an uneconomical solution for organizations. Furthermore, these applications are typically hosted in on-premises data centers that are expensive to maintain and often lack adequate documentation, making it challenging to implement new features. As a result, these applications accrue a high amount of technical debt. In contrast, modernizing an enterprise’s technology infrastructure enables the organization to leverage the capabilities of the private cloud to meet emerging digital business requirements. This eliminates the need for a separate data center, as cloud databases offer a cost-effective pay-as-you-go model where you only pay for the services you use. The implementation of DevOps practices can significantly improve operations and reduce costs through the optimization of CI/CD pipelines, more efficient release cycles, and continuous improvement across all development processes. Improves Business Agility Traditionally, developers were required to create monolithic environments to make any modifications or updates to the code, server, and configurations. However, with the advent of application modernization, it is no longer necessary to shut down servers or plan extensive release updates, as the application is divided into multiple individually managed workloads. An example of the benefits of app modernization can be seen in the rapid growth of Pinterest. 
The company successfully increased its user base from 50,000 to 17 million in just nine months by scaling its processing and storage activities on Amazon Web Services. Enhances Scalability In a rapidly changing technological landscape, it is imperative for businesses to continually upgrade to remain competitive and successful. To thrive and expand physically and technologically, it is necessary for organizations to adapt and evolve. However, legacy systems often make it difficult to introduce new features or functionality, which can hinder a business’s growth and competitiveness. This is why organizations require storage solutions that can accommodate changing requirements and enable them to scale as needed. With the scalability offered by cloud technology, businesses can dynamically add or reduce IT resources to accommodate changing workloads. Keeps Up With Latest Trends Modernization of legacy systems allows companies to integrate the latest technologies and features into their older systems, seamlessly merging necessary components. This also enables organizations to take advantage of cutting-edge technologies such as big data, machine learning, artificial intelligence, and cloud computing. In the realm of customer service, modernization enables the use of AI and machine learning through predictive analytics, including natural language processing for chatbots and voice and text analysis. This is particularly beneficial in an era where personalization is a key differentiator for businesses. Provides Better Support and Maintenance Legacy applications can become costly to maintain over time due to bugs and outdated code. Modernization offers a solution by migrating legacy logic and code to a new platform and aligning the application infrastructure with the latest technologies and trends. 
This enables easier source code changes, database migrations, and documentation writing, as well as leveraging containerization and orchestration to set the desired state for modern applications. The re-engineering approach employed in modernization includes data and coding restrictions to ensure system security and prevent vulnerabilities. As a result, modernization enables companies to update their legacy systems with the latest technology stack, aligned with their business goals, for easier support and maintenance. Improves User Experience Legacy modernization prioritizes user experience (UX) to meet the demands for flexible and engaging digital experiences in web and mobile applications. The process involves the redesign of the user-facing components to improve information access and the overall UX. This redesign incorporates visual elements such as icons, font style, size, and others to create a distinct and visually appealing interface, intending to enhance perception, understanding, navigation, and interaction with the system or application. Conclusion Modernizing applications can bring many benefits, including improved user experience, enhanced security, cost savings, and improved integration. Businesses should consider this investment to keep up with the competition and meet changing customer needs. To achieve the best results, it’s essential to work with a trustworthy provider with experience in the application modernization process.
Web application testing is an essential part of the software development lifecycle, ensuring that the application functions correctly and meets the necessary quality standards. Best practices for web application testing are critical to ensure that the testing process is efficient, effective, and delivers high-quality results. These practices cover a range of areas, including test planning, execution, automation, security, and performance. Adhering to best practices can help improve the quality of the web application, reduce the risk of defects, and ensure that the application is thoroughly tested before it is released to users. By following these practices, testing teams can improve the efficiency and effectiveness of the testing process, delivering high-quality web applications to users. 1. Test Early and Often Testing early and often means starting testing activities as soon as possible in the development process and continuously testing throughout the development lifecycle. This approach allows for issues to be identified and addressed early on, reducing the risk of defects making their way into production. Some benefits of testing early and often include: Identifying issues early in the development process, reducing the cost and time required to fix them. Ensuring that issues are caught before they impact users. Improving the overall quality of the application by catching defects early. Reducing the likelihood of rework or missed deadlines due to last-minute defects. Improving collaboration between developers and testers by identifying issues early on and resolving them together. By testing early and often, teams can ensure that the web application is thoroughly tested and meets the necessary quality standards before it is released to users. 2. Create a Comprehensive Test Plan Creating a comprehensive test plan involves developing a detailed document that outlines the approach, scope, and schedule of the testing activities for the web application. 
A comprehensive test plan typically includes the following elements: Objectives: Define the purpose of the testing and what needs to be achieved through the testing activities. Scope: Define what functionalities of the application will be tested and what won't be tested. Test Strategy: Define the overall approach to testing, including the types of testing to be performed (functional, security, performance, etc.), testing methods, and tools to be used. Test Schedule: Define the testing timelines, including the start and end dates, and the estimated time required for each testing activity. Test Cases: Define the specific test cases to be executed, including input values, expected outputs, and pass/fail criteria. Environment Setup: Define the necessary hardware, software, and network configurations required for testing. Test Data: Define the necessary data required for testing, including user profiles, input values, and test scenarios. Risks and Issues: Define the potential risks and issues that may arise during testing and how they will be managed. Reporting: Define how the testing results will be recorded, reported, and communicated to stakeholders. Roles and Responsibilities: Define the roles and responsibilities of the testing team and other stakeholders involved in the testing activities. A comprehensive test plan helps ensure that all testing activities are planned, executed, and documented effectively, and that the web application is thoroughly tested before it is released to users. 3. Test Across Multiple Browsers and Devices Testing across multiple browsers and devices is a crucial best practice for web application testing, as it ensures that the application works correctly on different platforms, including different operating systems, browsers, and mobile devices. 
This practice involves executing testing activities on a range of popular web browsers, such as Chrome, Firefox, Safari, and Edge, and on various devices, such as desktops, laptops, tablets, and smartphones. Testing across multiple browsers and devices helps identify issues related to compatibility, responsiveness, and user experience. By testing across multiple browsers and devices, testing teams can: Ensure that the web application is accessible to a wider audience, regardless of their preferred platform or device. Identify issues related to cross-browser compatibility, such as variations in rendering, layout, or functionality. Identify issues related to responsiveness and user experience, such as issues with touchscreens or mobile-specific features. Improve the overall quality of the application by identifying and resolving defects that could impact users on different platforms. Provide a consistent user experience across all platforms and devices. In summary, testing across multiple browsers and devices is a crucial best practice for web application testing, helping ensure that the application functions correctly and delivers a high-quality user experience to users on all platforms. 4. Conduct User Acceptance Testing (UAT) User acceptance testing (UAT) is a best practice for web application testing that involves testing the application from the perspective of end-users to ensure that it meets their requirements and expectations. UAT is typically conducted by a group of users who represent the target audience for the web application, and who are asked to perform various tasks using the application. The testing team observes the users' interactions with the application and collects feedback on the application's usability, functionality, and overall user experience. By conducting UAT, testing teams can: Ensure that the application meets the requirements and expectations of end-users. 
Identify usability and functionality issues that may have been missed during other testing activities. Collect feedback from end-users that can be used to improve the overall quality of the application. Improve the overall user experience by incorporating user feedback into the application's design. Increase user satisfaction by ensuring that the application meets their needs and expectations. UAT is an essential best practice for web application testing, as it ensures that the application meets the needs and expectations of end-users and delivers a high-quality user experience. 5. Automate Testing Automating testing is a best practice for web application testing that involves using software tools and scripts to execute testing activities automatically. This approach is particularly useful for repetitive and time-consuming testing tasks, such as regression testing, where automated tests can be executed quickly and efficiently. Automation testing can also help improve the accuracy and consistency of testing results, reducing the risk of human error. By automating testing, testing teams can: Reduce testing time and effort, allowing more comprehensive testing to be performed within the available time frame. Increase testing accuracy and consistency, reducing the risk of human error and ensuring that tests are executed consistently across different environments. Improve testing coverage by allowing for more tests to be executed in a shorter time frame, increasing the overall effectiveness of the testing process. Facilitate continuous testing by enabling automated tests to be executed automatically as part of the development process, allowing issues to be identified and resolved more quickly. Reduce testing costs by reducing the need for manual testing and increasing testing efficiency. 
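As a minimal sketch of what an automated check looks like in practice, the following uses Python's built-in unittest module; the apply_discount function is a hypothetical stand-in for any piece of web-application logic, not code from a real application:

```python
import unittest

def apply_discount(price, percent):
    """Hypothetical business-logic function under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class DiscountRegressionTests(unittest.TestCase):
    """Repeatable checks that run identically on every build."""

    def test_typical_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_discount_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(99.99, 0), 99.99)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)

# Run the suite programmatically; in a CI pipeline you would typically
# invoke it with: python -m unittest
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DiscountRegressionTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Hooked into a CI/CD pipeline, a suite like this runs on every commit, which is exactly the continuous-testing loop described above.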
Automating testing is an essential best practice for web application testing, as it can significantly improve the efficiency and effectiveness of the testing process, reduce costs, and improve the overall quality of the application. 6. Test for Security Testing for security is a best practice for web application testing that involves identifying and addressing security vulnerabilities in the application. This practice involves conducting various testing activities, such as penetration testing, vulnerability scanning, and code analysis, to identify potential security risks and vulnerabilities. By testing for security, testing teams can: Identify and address potential security vulnerabilities in the application, reducing the risk of security breaches and data theft. Ensure compliance with industry standards and regulations, such as PCI DSS, HIPAA, or GDPR, that require specific security controls and measures to be implemented. Improve user confidence in the application by demonstrating that security is a top priority and that measures have been taken to protect user data and privacy. Enhance the overall quality of the application by reducing the risk of security-related defects that could impact users' experience and trust in the application. Provide a secure and reliable platform for users to perform their tasks and transactions, improving customer satisfaction and loyalty. Testing for security is a critical best practice for web application testing, as security breaches can have significant consequences for both users and businesses. By identifying and addressing potential security vulnerabilities, testing teams can ensure that the application provides a secure and reliable platform for users to perform their tasks and transactions, reducing the risk of security incidents and data breaches. 7. 
Perform Load and Performance Testing Load and performance testing are best practices for web application testing that involve testing the application's ability to perform under various load and stress conditions. Load testing involves simulating a high volume of user traffic to test the application's scalability and performance, while performance testing involves measuring the application's response time and resource usage under different conditions. By performing load and performance testing, testing teams can: Identify potential bottlenecks and performance issues that could impact the application's usability and user experience. Ensure that the application can handle expected traffic loads and usage patterns without degrading performance or causing errors. Optimize the application's performance by identifying and addressing performance issues before they impact users. Improve user satisfaction by ensuring that the application is responsive and performs well under various conditions. Reduce the risk of system failures and downtime by identifying and addressing performance issues before they cause significant impacts. Load and performance testing are essential best practices for web application testing, as they help ensure that the application performs well under various conditions and user loads. By identifying and addressing performance issues, testing teams can optimize the application's performance, improve user satisfaction, and reduce the risk of system failures and downtime. 8. Conduct Regression Testing Regression testing is a best practice for web application testing that involves retesting previously tested functionality to ensure that changes or fixes to the application have not introduced new defects or issues. This practice is particularly important when changes have been made to the application, such as new features or bug fixes, to ensure that these changes have not impacted existing functionality. 
By conducting regression testing, testing teams can: Ensure that changes or fixes to the application have not introduced new defects or issues that could impact user experience or functionality. Verify that existing functionality continues to work as expected after changes have been made to the application. Reduce the risk of unexpected issues or defects in the application, improving user confidence and trust in the application. Improve the overall quality of the application by ensuring that changes or fixes do not negatively impact existing functionality. Facilitate continuous testing and delivery by ensuring that changes can be made to the application without introducing new issues or defects. Regression testing is an important best practice for web application testing, as it helps ensure that changes or fixes to the application do not negatively impact existing functionality. By identifying and addressing issues before they impact users, testing teams can improve the overall quality of the application and reduce the risk of unexpected issues or defects. 9. Document and Report Defects Documenting and reporting defects is a best practice for web application testing that involves tracking and reporting any issues or defects found during testing. This practice ensures that defects are documented, communicated, and addressed appropriately, improving the overall quality of the application and reducing the risk of user impact. By documenting and reporting defects, testing teams can: Ensure that all defects are tracked, documented, and communicated to the appropriate stakeholders. Prioritize and address critical defects quickly, reducing the risk of user impact and improving the overall quality of the application. Provide clear and detailed information about defects to developers and other stakeholders, improving the efficiency of the defect resolution process. 
Ensure that defects are resolved appropriately and that fixes are properly tested before being deployed to production. Analyze defect trends and patterns to identify areas of the application that require further testing or improvement. Documenting and reporting defects is a critical best practice for web application testing, as it ensures that defects are properly tracked, communicated, and addressed, improving the overall quality and reliability of the application. By identifying and addressing defects early in the development cycle, testing teams can reduce the risk of user impact and ensure that the application meets user requirements and expectations. 10. Collaborate With the Development Team Collaborating with the development team is a best practice for web application testing that involves establishing open communication and collaboration between the testing and development teams. This practice ensures that both teams work together to identify, address, and resolve issues and defects efficiently and effectively. By collaborating with the development team, testing teams can: Ensure that testing is integrated into the development process, improving the efficiency of the testing and development process. Identify defects and issues early in the development process, reducing the time and cost required to address them. Work with developers to reproduce defects and provide detailed information about issues, improving the efficiency of the defect resolution process. Identify areas of the application that require further testing or improvement, providing valuable feedback to the development team. Ensure that the application meets user requirements and expectations, improving user satisfaction and confidence in the application. Collaborating with the development team is an essential best practice for web application testing, as it ensures that both teams work together to identify, address, and resolve issues efficiently and effectively. 
By establishing open communication and collaboration, testing and development teams can ensure that the application meets user requirements and expectations while improving the efficiency of the testing and development process. Conclusion Web application testing is a critical process that ensures the quality, reliability, and security of web-based software. By following best practices such as proper planning, test automation, a suitable test environment, a variety of testing techniques, continuous testing, bug tracking, collaboration, and testing metrics, testers can effectively identify and fix issues before the software is released to the public, resulting in a better user experience.
Git is a version control system that has become an essential tool for developers worldwide. It allows developers to keep track of changes made to a project's codebase, collaborate with others on the same codebase, and roll back changes when necessary. Here are the top 11 Git commands every developer should know. 1. git config git config is a command that allows you to configure Git on your system. It enables you to view and modify Git's settings, such as your user name and email address, default text editor, and more. The git config command is used to set configuration values that affect the behavior of Git. Configuration values can be set globally or locally, depending on whether you want the configuration to apply to all Git repositories on your system or just the current repository. Some common use cases of the git config command include setting your user name and email address, configuring the default text editor, and customizing Git's behavior. By using git config, you can tailor Git to your specific needs and preferences, making it easier and more efficient to work with Git on your projects. Setting your user name and email address globally: git config --global user.name "Riha Mervana" git config --global user.email "firstname.lastname@example.org" You can read back these values with: git config --list Output: user.name=Riha Mervana user.email=firstname.lastname@example.org When you open the global configuration file ~/.gitconfig, you will see the content saved as: [user] name = Riha Mervana email = firstname.lastname@example.org 2. git init The first command every developer should know is git init. This command initializes an empty Git repository in the current directory. It creates a .git directory there, which is where Git stores all the information about the repository, including the commit history and the files themselves. 
The git init command can be used in two ways: Either change into a directory using the cd command and run git init to create a Git repository there: git init Or create an empty Git repository by specifying a directory name with the git init command: git init <directory-name> 3. git clone git clone is used to create a local copy of a remote repository. This command downloads the entire repository and its history to your local machine. You can use this command to create a local copy of a repository that you want to contribute to or to start working on a new project. Here is an example using HTTPS: git clone https://github.com/reactplay/react-play.git This will clone the react-play project locally for you. Then you can change to the directory and start working on it. cd react-play 4. git add git add is used to stage changes made to a file. This command tells Git that you want to include the changes made to a file in the next commit. You can add individual files or directories, or all changes in the current directory by using the git add . command. The git add command sends your file changes to the staging area. git add <file-name> Or stage a whole directory: git add <directory-name> 5. git commit git commit is used to save changes made to the repository. This command creates a new commit with a message that describes the changes made. The message should be descriptive and provide context about the changes made. git commit -m "add a meaningful commit message" 6. git push git push is used to upload local changes to a remote repository. This command sends the changes made in your local repository to the remote repository, where other developers can access them. You can use this command to contribute changes to an open-source project or to share changes with your team. git push <remote> <branch-name> 7. git pull git pull is used to download changes made to a remote repository to your local repository. 
This command is useful when you want to work on the latest version of a project or when you want to merge changes made by other developers into your local repository. git pull 8. git branch git branch is used to create, list, and delete branches. A branch is an independent line of development that you can use to work on new features or fixes without affecting the main branch. You can use this command to create a new branch, list all the branches in the repository, or delete a branch. List all local branches: git branch Create a new branch with a branch name: git branch <branch-name> Delete a specific branch: git branch -d <branch-name> Rename the current branch: git branch -m <new-branch-name> List all branches, including remote ones (the current branch is marked): git branch -a 9. git merge git merge is used to merge changes made in one branch into another branch. This command is useful when you want to incorporate changes made in a feature branch into the main branch. You can use this command to merge changes made by other developers into your local branch or to merge your changes into the main branch. git merge <branch-name> 10. git checkout git checkout is used to switch between branches or to revert changes made to a file. This command allows you to move between branches or switch to a specific commit in the commit history. You can also use this command to discard changes made to a file and revert it to a previous state. git checkout <branch-name> 11. git log git log is used to view the commit history of a repository. This command displays a list of all the commits made to the repository, including the commit message, the author, and the date and time of the commit. You can use this command to track changes made to the repository over time and to identify which commits introduced specific changes. git log <options> <branch-name> Conclusion Git is a powerful version control system that is widely used in software development. 
Knowing how to use Git effectively is essential for developers to collaborate on projects, keep track of changes, and maintain code quality. The commands above provide developers with the basic tools they need to manage their codebase effectively. However, Git is a complex system with many additional features and commands that can be used to improve workflow and productivity. Therefore, developers should strive to learn more about Git and its capabilities in order to take full advantage of its benefits.