Life means dealing with bad things that may or may not happen. We call them risks. We assess, evaluate, mitigate, accept, and sometimes blithely ignore them. Building complex and original software is inherently risky, and the Agile way of working does not fix that. That’s why we need to be true to the value of courage. I’ll start my argument with a refresher on the topic and some practical examples.

The dictionary defines risk as the likelihood of bad things happening. Equally important is the nature and extent of the damage when the risk becomes a reality. The exact balance between probability and consequences is the bread and butter of actuaries at insurance companies. Their models tell them how likely it is that an average dwelling goes up in flames, so they can put a price on the collective risk of their millions of customers. Several houses are certain to burn down each year, and their owners need to be reimbursed. It’s a predictable risk.

This won’t do in aviation or civil engineering. Every lethal incident prompts an investigation to raise the standard and reduce risk, up to the point where we are left with black swans only: events so rare and unpredictable that you can’t possibly prepare for them. Most airline crashes are the result of these unknown unknowns.

The Scala World Hiking Trip

Here’s a more mundane example. At Scala World 2019, I joined the attendees for the traditional pre-conference hike in the English Lake District. I had visited the area before and arrived with waterproof gear and sturdy boots, knowing the terrain and the unpredictable British weather, which even in September can be brutal. We set off in the sunshine, and of course, it had to rain for much of the walk. Several walkers had not read (or had ignored) the instruction email, or even checked the weather forecast. Arriving woefully unprepared in cotton jeans and t-shirts, they got thoroughly soaked and a little miserable. But there was safety in numbers.
Spare ponchos were shared, and nobody would have been in mortal danger if they had sprained an ankle while clambering over slippery boulders with their inadequate footwear.

Four Ways of Doing Nothing

The literature distinguishes five ways to manage risks. Funnily enough, only one deals with facing the risk head-on. The other four strategies are variations of doing nothing. And the hiking trip covers all five.

The proactive way to tackle a risk is to make sure it is less likely to happen and/or to make the consequences less unpleasant when it does happen. This is called mitigation. You can’t stop the clouds from raining, but you don’t need to get soaked and cold. Sturdy boots, Merino undergarments, a head torch, a Garmin GPS device, emergency rations. You keep mitigating until you’re at ease (or broke). Then you pick an option from the remaining four.

Accept - After careful consideration, you decide the risk is acceptable. You’re prepared for what might happen. Yes, it will be cold and wet, but you’re all Gore-Tex-ed up and among experienced ramblers.

Cancel - You’ve done what you can to mitigate but decide the risks are still not acceptable. You call the whole thing off, so you’re no longer exposed to the risks.

Transfer - What if you break a leg and need to be airlifted to the hospital? That will cost a fortune. You take out premium daredevil insurance to cover this unlikely event.

Ignore - Not much of a strategy at all, this one. You don’t know and don’t seem to care. You think the Lakes are mere picture-postcard cuteness, where you couldn’t possibly die of hypothermia when injured on a solitary winter hike, unable to call for help. You didn’t check the weather forecast, went out in jeans and trainers, and your phone is at 20% charge.

Now let’s see how these five strategies apply to software risks:

Mitigation - We are good boy/girl scouts. We write clean, well-tested, and peer-reviewed code and stay in close contact with the business to make sure expectations are aligned.
If we feel that our code is of sufficient quality and we have a good system for dealing with bugs in production, we accept the risks. Or, if we’re working on a self-driving car, an A minus is still not good enough. While the risks may be more about liability and legislation than about the autonomous driving skills themselves, they are still too great to accept.

Remember when each IT outfit had its own server room and surly sysadmin? We’ve transferred the troubles of maintaining bare metal (and much more) to one-stop-shop cloud providers.

Last but not least, there’s the cowboy mentality of throwing your code at the wall and seeing if it sticks, which was not the idea behind continuous delivery. Still, one person’s enterprising spirit is another’s cavalier ignorance.

The Proof Was Always in the Pudding

Not every bug costs lives, but some do. And while all bugs are expensive, so is not shipping, or gold-plating to perfection. The world of software is too vast for a single risk management strategy. Yet despite the variety, software risks come in two predictable categories: the wrong thing done right, or the right thing done wrong. We may deliver a product that does a terrific job of something the customer doesn’t need. Alternatively, we may understand the customer perfectly but botch the implementation. Both are bad, and combinations of the two are common.

This is a disheartening prospect, but building original software means finding novel solutions, which means dealing with the unknown. Things will go wrong. We can mitigate the heck out of our code with more and better design documents, testing, and peer reviews, but eventually, it needs to prove itself in the real world. In civil engineering or the Artemis space project, big up-front designs make sense. You can’t make a bridge two feet wider once it’s built. When going back to the drawing board is crushingly expensive, thorough risk minimization is called for.
We used to make software in the same fashion, and it didn’t work. The Waterfall approach tried to mitigate the risk of building the wrong thing before any working code was shared with the user. Well, good luck with that. No putative interpretation of what the user wants ever proved capable of predicting exactly what they needed. The proof has always been in the pudding.

The Agile Compromise

We cannot eliminate the risk of building the wrong thing with better design and more focus groups. We can only do it through shorter and more frequent iterations of a working product that is incomplete and maybe not that great yet. That’s the Agile compromise. Shorter cycles decrease the risk of building the wrong thing but increase the risk of degrading the process. Accruing technical debt is one example. It is not a necessary consequence, just a standard price to pay for quicker deliveries. No pundit with stories about the constant commitment to quality will convince me otherwise. If you want greater speed, accept more risks.

The Agile attitude towards risk-taking also recognizes that software, as a creative discipline, by nature exposes us to black swans: the risks we didn’t know we would ever run into. No engineering approach provides full reassurance against them, nor can testing and validation ever give you full peace of mind. It’s a little bit scary, but if you pride yourself on an Agile mindset, you must embrace it. Software is complex rather than complicated. Its many moving parts behave in unpredictable ways when unleashed on the world. Risks are a natural part of that property. Recognizing and embracing them with courage is the only way in a team that bears joint responsibility for fixing the consequences and doesn’t point fingers.
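The five responses described in this article can be summed up in a small sketch. The decision rule and thresholds below are invented for illustration; real risk decisions are rarely this mechanical:

```python
from enum import Enum

class Strategy(Enum):
    MITIGATE = "reduce likelihood and/or impact first"
    ACCEPT = "prepared; live with the residual risk"
    CANCEL = "call the whole thing off"
    TRANSFER = "pay someone else to carry the consequences"
    IGNORE = "not much of a strategy at all"

def choose(probability: float, impact: float,
           can_mitigate: bool, can_insure: bool) -> Strategy:
    """Toy decision rule; the exposure threshold is invented."""
    exposure = probability * impact
    if exposure < 1:          # negligible residual risk
        return Strategy.ACCEPT
    if can_mitigate:          # still room to reduce it
        return Strategy.MITIGATE
    if can_insure:            # rare but expensive: hand it over
        return Strategy.TRANSFER
    return Strategy.CANCEL    # too risky, no options left

print(choose(0.3, 100, can_mitigate=False, can_insure=True).name)  # TRANSFER
```

Note that Ignore never appears in the rule: it is what happens when no deliberate choice is made at all.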
Some IT projects fail, and mistakes happen. The KPMG Project Management Survey showed that more than two-thirds of organizations suffered at least one project failure in the previous year. So today, let’s focus on the less severe cases, which can still be saved!

When Do You Need to Look For Help?

While completing many software projects and helping various businesses bring their ideas to life through code, we’ve learned that problems are usually caused by multiple coexisting factors. For example, it was often the case that we took over a project from a previous software development team. Some of these situations were classic project rescue cases where we stepped in to clean up and make things work.

How do you usually know that your software project is in trouble? You may be encountering some of the following signs:

There are stability or performance issues.
There is no clear documentation.
You have problems communicating with the project team members.
People are leaving the project or the software company.
The people left on the project are missing the required skills.
Project management is poor, or there is an overall mess.
There are issues with delivering new tasks and business value on time, leading to missed deadlines.

How are companies responding to such situations? Sometimes they attempt to add new people to the team. In other cases, they look for external help in the form of consultancy. Finally, in edge cases, they seek out a new team or switch software vendors.

How to Help With the Project?

Analyze the Situation

The first step when taking over a project or joining as a consultant is understanding the situation. Talk to different stakeholders to understand the system’s role and how it works. Knowledge transfer from the previous team is always a good thing, but it is not always possible. If there is an option to ask questions of people who are leaving the team or have already left, take advantage of it as much as possible. Analyze the documentation, wikis, readmes, and of course, the code.
Ask different parties about the system’s biggest current issues to get the whole picture. This is the moment when several unpleasant things may be discovered, e.g., the reasons why the development team decided to leave the project. There may be plenty of them: a legacy stack, bad project quality, issues with project management or the team lead, or a lack of proper skills. This will help you better understand how the project reached its current situation. It is rather uncommon to find that a vendor has left behind a perfect project.

Short-Term Actions

The first and most important thing is to learn how to build and deploy the system. This is a necessity in case any failure appears. What is more, you should get access to the logs and metrics to be able to analyze issues as they appear.

The second step is to check if there are any low-hanging fruits; discuss whether there are any big issues with the system that can be resolved with a small time investment. These won’t be the cleanest or most efficient solutions, but they may resolve the customer’s biggest pains. Examples:

In one project we took over, we encountered a situation where the system processing incoming data was suddenly "hanging" a few days after deployment. There were no tests, no errors were visible in the logs, and it was difficult to rapidly identify and fix the issue’s root cause. The simplest workaround was to schedule daily restarts before the beginning of the customers’ business day. This allowed the business to operate and gave us more time to solve the problem correctly.

Another project had huge performance issues, which made the system unusable. A short debugging session showed that a single library had a bug causing threads to hang. A simple dependency update made the system faster. In the long term, it was still not enough, but in the short term, it made a huge difference.

The third step is to do small cleanups around the work organization.
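The scheduled-restart workaround from the first example can be sketched as a small helper that a supervisor loop would use: sleep until the next restart slot, restart the service, and repeat. The 05:00 slot is an assumed value, not one from the project:

```python
from datetime import datetime, timedelta

# Assumed restart slot: before the customers' business day starts.
RESTART_HOUR = 5

def seconds_until_restart(now: datetime) -> float:
    """How long to sleep until the next daily restart slot."""
    nxt = now.replace(hour=RESTART_HOUR, minute=0, second=0, microsecond=0)
    if nxt <= now:                 # today's slot has already passed
        nxt += timedelta(days=1)
    return (nxt - now).total_seconds()

print(seconds_until_restart(datetime(2024, 1, 1, 3, 0)))  # 7200.0
```

A workaround like this buys time; the real root cause still has to be found.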
Check if there are stale, unresolved branches or pull requests. Check the way issues are arranged; if Scrum or Kanban was used, perhaps it is necessary to examine the backlog items and remove any that are outdated and obscure the project status. Verify that it is easy to find the latest documentation (if it exists); maybe there are several versions in several different places. Write down your discoveries from the previous steps and knowledge transfers. The goal is to make work easier and more accessible for new team members who may join later.

Long-Term Plans

A long-term roadmap requires a deep discussion with the customer. First, you need to understand the business’s driving factors: is it more important to invest in stability and scalability to prepare for new customers, or are there missing features that will be revolutionary for the product? Unfortunately, some trade-offs will need to be made.

Step-By-Step Improvements

It may be possible to take over the project and improve it step by step, developing new features and improving the old codebase in parallel. This is the optimal situation, though not the easiest one. You tackle problems one by one: write new tests, docs, and code, deploy, and move on to the next. This approach may include splitting the codebase into separate microservices and changing the overall architecture and methods of inter-service communication.

Example: In one of the previously mentioned projects, we agreed to continue operating the project after the initial quick fixes, which allowed us to develop new features. At the same time, we started reworking the computation-heavy part of the system, which was a bottleneck. In the end, it became a separate microservice.

Abandon and Rewrite From Scratch

There are cases where projects are in a disastrous state: no tests, no solid docs, bad code quality, legacy technologies, an overall mess, and mistakes made at the architectural level.
Making any change in such an environment is very risky, and adding tests is also challenging when the code was written without any thought for them. This situation is a total edge case and causes a lot of issues. You may decide to leave the old system in maintenance mode, fixing only the most important bugs, and start writing a new system from scratch. Unfortunately, building a new system takes time, possibly years, depending on the product’s complexity. Such a situation is risky from a business perspective because it means new features will reach customers later. It is also risky from a development perspective because you have to do a huge migration between the old and new systems and verify that they work the same from a feature point of view.

Example: We encountered a production system whose major part was covered by only three tests. It had issues at the code level (it was visible that a major part was written by people still learning the programming language), the structural level (problems running the existing tests locally and writing additional ones), and the architectural level (large infrastructure costs combined with performance issues). The decision was made together with the customer to abandon the old code and do a rewrite using different technologies that were a better fit for the problem, would later be easier for the customer to maintain, and would reduce the monthly system costs.

Summary

Rescuing projects is not trivial. It may take time to introduce the required improvements and enable further business feature development. It’s good to start such an initiative with a skilled team that knows the technologies involved very well, so they can spot potential problems faster. It’s rarely a lost cause; sometimes, it just takes time and patience to see a positive outcome.
Software testing is the process of evaluating a software product to detect errors and failures and to ensure its suitability for use. It can be performed manually (where testers use their skill, experience, intuition, and knowledge) or automatically (where the tester’s actions are guided by a test script). The fundamental objective of the test process is to ensure that all specified requirements of a software system have been met by the development process and that no undetected errors remain in the system. However, the overall aim of testing is to provide customer or end-user value by detecting defects as early as possible.

Testing occurs in different phases throughout the Software Development Life Cycle (SDLC). Testing may also occur after the completion of each phase or after certain checkpoints within each development phase. The different phases through which a piece of software passes before it is released for use are called the Software Testing Life Cycle (STLC). In this article on the STLC, we will discuss the fundamentals of software testing, the phases of the Software Testing Life Cycle, methodologies, and their best practices. Let’s dive in!

What Is the Software Testing Life Cycle?

The STLC is an iterative, cyclical process that has the goal of preventing errors in the software. It includes test analysis, planning, designing, setup, execution, and test closure activities. Due to the complexity of software, it is impossible to guarantee that a product will be free of errors if only one test is performed. Therefore, multiple tests are performed in every phase of the Software Testing Life Cycle. There are different types of tests that can be implemented alongside each other or separately at any time during the life cycle. Examples include usability testing, regression testing, exploratory testing, and sanity testing – and for all of these different types, there are many subcategories.
Each category has its own special purpose and will vary depending on the circumstances. The STLC has the following phases, which we will discuss in detail in the later sections:

Requirement Analysis
Test Planning
Test Case Designing and Development
Test Environment Setup
Test Execution
Test Cycle Closure

Characteristics of the Software Testing Life Cycle

The STLC is a step-by-step method for ensuring high-quality software.
It improves the consistency and efficiency of the agile testing process.
The STLC process should begin as soon as needs are determined or the Software Requirements Specification (SRS) document is ready.
It defines goals and expectations clearly for each project aspect.
The tester can analyze and establish the scope of testing and write effective test cases while the software or product is still in the early stages of the STLC.
It aids in reducing the test cycle time and providing higher product quality.
It ensures that features are tested and passed before additional features are added.

The Difference Between SDLC and STLC

The Software Development Life Cycle, or SDLC, is one of the most important processes in the development of any software. During it, various steps are taken to develop a product and make it market-ready. Software testing is one of the most critical parts of the SDLC process, and it has an entire life cycle of its own, known as the Software Testing Life Cycle, or STLC. So, what’s the difference between SDLC and STLC?

SDLC: Focuses on developing a product. STLC: Focuses on testing the product.
SDLC: Helps in developing good-quality software. STLC: Helps in making the software defect-free.
SDLC: Understands user needs and creates a product that is beneficial to them. STLC: Understands the product’s development requirements and ensures it performs as intended.
SDLC: The business analyst gathers the requirements and creates a development plan. STLC: The QA team analyzes requirements, such as functional and non-functional documents, and creates a system test plan.
SDLC: The development team creates high- and low-level design plans. STLC: The test analyst creates the integration test plan.
SDLC: Responsible for collecting requirements and creating features. STLC: Responsible for creating tests adapted to the collected requirements and verifying that features meet those requirements.
SDLC: Its phases are completed before testing begins. STLC: Its phases begin after the SDLC phases are completed.
SDLC: The end goal is to deliver a high-quality product that users can utilize. STLC: The ultimate goal is to uncover bugs in the product and submit them to the development team so they can be fixed.

Software Testing Life Cycle Phases

It’s important to understand the phases of the Software Testing Life Cycle to make better decisions about how to test your software. One critical aspect of the testing life cycle is determining which phases of testing to perform on your software. The first step in this process is to determine whether you need to perform testing on your product at all. If your product is an app that collects data, it will have less need for testing than a banking website that processes financial transactions. Some products may undergo all phases of testing, while others may be tested only partially. For example, a website that exists purely as a marketing tool might not need to go through any tests other than usability tests. Testing can happen at any time, and each phase should be performed at least once before moving on to the next. Each phase is also independent of the rest, so you can perform only one in isolation if necessary. A typical Software Testing Life Cycle consists of the following phases; let’s get a detailed understanding of each.

Requirement Analysis

Requirement analysis is the initial phase of the Software Testing Life Cycle. This phase examines functional and non-functional requirements from the testing perspective to identify the testable needs.
Customers, solution architects, technical leads, business analysts, and other stakeholders communicate with the quality assurance team so that it comprehends the clients’ requirements and can tailor the tests to the customer’s specifications.

Entry Criteria

The specification document and application architecture document must be available.
The acceptance criteria and the availability of the above documents must be clearly established.

Activities in the Requirement Analysis Phase

Identifying and prioritizing the requirements.
Brainstorming sessions for feasibility and requirement analysis.
Creating a list of questions to be answered by the client, solution architect, technical lead, business analyst, etc.

Test Planning

With the information gathered during requirement analysis in the previous phase, the QA team moves on to planning the testing process. Test planning, or test strategy, is the most crucial phase of the Software Testing Life Cycle. All of the testing strategies that will be utilized to test the program are defined during this phase. The test lead determines the cost estimates and effort for the entire project in this phase. Here, a variety of test activities are planned and strategized, together with an analysis of resources, which increases the effectiveness of the planning phase and aids in achieving the testing target.

Software testing can’t deliver value without effective tools, especially when you are performing automation testing, so choosing the right tool for software testing is also planned in this phase. There are various tools on the market for performing software testing. Cloud-based automation testing tools like LambdaTest are the right choice when you want to test at scale.

Entry Criteria

Documents containing requirements.
A report on automation criteria should be provided.

Activities in the Test Planning Phase

The objectives and scope are described.
Selecting the testing types to be carried out and the unique approach for each.
Roles and responsibilities are determined and assigned.
Locating the testing resources and equipment needed for the tests.
Choosing the right testing tools.
Calculating the time and effort needed to complete the testing activities.
Performing risk analysis.

Test Case Designing and Development

The requirements have been examined, and the QA team has created a test plan in response. It’s time to be creative and give this test strategy shape by turning it into test cases. To check and validate each test plan, test cases are devised and developed based on the test strategy and specific specifications. Designing test cases is a very important process in the STLC, as it helps determine the defects in the product. It can also be called defect identification or defect analysis.

To design the test cases, we first need a requirement document that defines the scope of functional and non-functional testing. This requirement document can be prepared by business analysts, and it should also include all possible user scenarios of the software product. Once we have the requirement document, we can move on to test case design, which involves two steps:

1. Identification of test cases
2. Analysis of test cases

The first step is to identify all the possible test cases that can cover all the user scenarios. After analyzing them, we remove the test cases that are not fit for execution, have low priority, or are unlikely to find any defect. The QA team begins writing effective test cases once the test design step is completed.

Entry Criteria

The specification documents.
The feasibility report on automation.

Activities in the Test Case Designing and Development Phase

Test cases are designed, created, reviewed, and approved.
Existing test cases that are pertinent are examined, revised, and approved.
If necessary, automation scripts are created, examined, and approved.

Test Environment Setup

After designing and developing the test cases, the software testing process needs an adequate platform and environment, including the essential hardware and software, to establish and replicate the conditions under which the actual testing activities will be conducted. This phase consists of preparing the testing environment, which establishes the parameters under which the software will be evaluated. Because this is a stand-alone activity, it can run concurrently with the test case development process.

The test environment differs from one organization to another. In some circumstances, the testing environment is set up by the developer or tester, while in others, it is set up by the clients based on their needs and requirements. While the customer or developer prepares the test environment, the testing team prepares for smoke testing. The purpose of smoke testing is to validate the test environment by determining its readiness and stability.

Entry Criteria

The test strategy should be readily available.
Smoke test cases should be readily available.
The results of the tests should be available.

Activities in the Test Environment Setup Phase

The test data is set up.
The necessary hardware and software are gathered, and a test environment checklist is created.
Network configurations and a test server are set up.
The process for managing and maintaining test environments is outlined and explained.
The environment is smoke tested to ensure readiness.

Test Execution

The QA team is now prepared to engage in practical testing operations, as they have the test cases, the test data, and the appropriate testing environment.
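The smoke test that validates the environment can be as small as a health-endpoint probe: the environment is "ready" when the endpoint answers OK within a latency budget. The sketch below injects the HTTP call as a function so it stays runnable without a live server; the endpoint name and latency budget are assumptions:

```python
def smoke_check(fetch, max_latency_ms=500):
    """Return True when the environment's health endpoint answers
    with HTTP 200 inside the latency budget."""
    status, latency_ms = fetch("/health")
    return status == 200 and latency_ms <= max_latency_ms

# Stub standing in for a real HTTP call against the test environment.
def fake_fetch(path):
    return 200, 120  # (status code, observed latency in ms)

print(smoke_check(fake_fetch))  # True
```

In a real setup, the injected function would wrap an actual HTTP client call and a timer.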
In this phase, the testing team executes the test cases based on the test cases and test planning prepared in the preceding phases. Test cases that pass are marked accordingly. When a test case fails, a bug tracking system is used to communicate the defect to the development team. These bugs can also be linked to a test case for future reference; in an ideal world, every failed test case would be associated with a defect. After the development team has addressed the bug, the same test case is rerun to ensure that it is indeed fixed and works as expected. A report is generated that displays the number of passed, blocked, failed, and not-run test cases, among other information.

Entry Criteria

Test strategy documents.
Test cases and scenarios.
Test data.

Activities in the Test Execution Phase

Test cases are executed following the test plan.
Comparing the outcomes achieved with those anticipated.
Identifying and locating defects.
Recording the defects and reporting the found bugs.
Mapping defects to test cases and updating the requirements traceability matrix.
Retesting after the development team has corrected or eliminated a bug.
Regression testing (if required).
Tracking a defect until it is fixed.

Test Cycle Closure

The completion of the test execution phase and delivery of the software product marks the beginning of the test closure phase. This is the phase in which the entire cycle is evaluated. In addition to the test results, other testing-related characteristics, such as the quality attained, test coverage, test metrics, project cost, and adherence to deadlines, are taken into account and analyzed. The team also analyzes the aspects of the Software Testing Life Cycle process that went well and those that could be improved. The test case report is generated to determine the severity and priority of issues.
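The pass/fail tally that feeds such a report can be sketched in a few lines. The statuses mirror the ones named above; the result format and sample data are invented:

```python
from collections import Counter

def execution_summary(results):
    """Tally test case results by status and compute the pass rate
    over executed (passed + failed) cases."""
    counts = Counter(status for _, status in results)
    executed = counts["passed"] + counts["failed"]
    pass_rate = counts["passed"] / executed if executed else 0.0
    return dict(counts), round(pass_rate, 2)

results = [
    ("TC-01", "passed"), ("TC-02", "passed"),
    ("TC-03", "failed"), ("TC-04", "blocked"),
    ("TC-05", "not run"),
]
counts, pass_rate = execution_summary(results)
print(counts, pass_rate)
```

Blocked and not-run cases are deliberately excluded from the pass rate, since they say nothing about product quality yet.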
The test metrics and closure reports are created after the test cycle is completed.

Entry Criteria

The test case execution report.
The defect report.
The execution of the test cases should be completed.

Activities in the Test Cycle Closure Phase

Review the entire testing procedure.
Discussions take place regarding the need to modify the exit criteria, test plan, test cases, etc.
Analysis and examination of the test results.
All test deliverables, including the test plan, test strategy, test cases, and others, are gathered and kept up to date.
The test metrics and the test closure report are created.
The defects are ordered by severity and priority.

Methodologies of the Software Testing Life Cycle

In software testing, there are various methodologies for carrying out the software testing processes. There are four types of methodologies:

Waterfall Model
V-Model
Spiral Model
Agile Model

Waterfall Model

The Waterfall Model was one of the earliest process models to be introduced. It is quite basic and straightforward to use, and it functions similarly to a downward-flowing waterfall. Each phase must be finished before the next phase begins, ensuring that no phases overlap. There are six phases in the Waterfall Model, completed one after the other:

Requirement analysis
System design
Implementation
System testing
System deployment
System maintenance

Before testing begins, all needs are determined in the first step, the requirement analysis phase. The developers build the project’s workflow in the next step, the system design phase. The intended work from the system design phase is implemented in the implementation phase. The testing step follows, with each module’s functionality being validated against the requirements. The next phase is the deployment phase, followed by the maintenance phase, which is an ongoing process.
During this phase, the developers address any issues arising from the software’s use over time. When a problem occurs, the developer patches it, and the software returns to testing. This process is repeated until all flaws have been resolved.

Advantages of the Waterfall Model

There is a review procedure and defined deliverables for each phase.
There is no overlap between the phases because they are completed one at a time.
It works effectively for projects with well-defined requirements that do not change over the course of development.

Disadvantages of the Waterfall Model

It does not demonstrate good results for lengthy projects.
It carries a great deal of risk and uncertainty.
It performs poorly for projects with a high or moderate likelihood of requirement changes.
For complex, object-oriented tasks, it performs mediocrely.
The entire project may be abandoned if the scope is modified during the life cycle.
Working software is produced only in the last stages of the life cycle.

V-Model

The Waterfall Model is an outdated model with numerous flaws and limitations, and the V-Model was created to overcome them. The V-Model is also known as the verification and validation model and is seen as an evolution of the Waterfall Model. In the V-Model, development and testing tasks are planned in parallel: the left-hand side of the V depicts software development activities, while the right-hand side depicts the corresponding testing phases. This means that each element of the software development cycle is inextricably linked to a phase of software testing. The model likewise follows the waterfall approach, as there are no overlapping stages, and the next phase begins once the previous phase has been completed. The testing phase must be planned concurrently with the software development phase in this model. The verification phase begins after the designing or planning phase, followed by the coding phase, and finally, the validation phase.
This phase also includes module design, which ensures that all modules are compatible with one another. The coding step begins after the verification phase is completed, and the coding is carried out in accordance with the agreed standards and rules. The validation phase follows the coding phase. The software is tested, including unit testing, integration testing, system testing, and acceptance testing, to ensure that it meets the customer's needs and expectations and that it is defect-free. Advantages of the V-Model It is simple to comprehend and operate, and its ease of use makes it much more manageable. There is no overlapping of phases. It is ideal where the requirements are clear, such as in minor projects. Each phase has its own evaluation procedure and set of deliverables. Disadvantages of the V-Model It is not recommended for complex, object-oriented programs and is unsuitable for lengthy projects. It is also not suitable for projects where there is a medium to high likelihood that a requirement may change during the project. Spiral Model The V-Model and the Waterfall Model are recommended only for smaller projects where the requirements are specified clearly; spiral models are suitable for larger projects. The spiral model combines the sequential linear development model with the iterative development process model. This means it is similar to the waterfall approach but adds a focus on risk assessment. In the spiral model, one particular set of activities is done in each iteration, and the same procedure is followed for every loop of the spiral until the whole software is constructed; this is why it is called a spiral. There are four phases in the spiral model: identifying objectives, risk analysis, develop and test, and review and evaluate. The sole variation between the phases of the waterfall and spiral models is the risk analysis. Advantages of the Spiral Model Unlike the previous two models, it enables changes to be accommodated, although the requirements should still be expressed clearly at the start.
It enables users to test the system at an early stage. Requirements are captured more precisely. It provides for the division of the development process into smaller segments, allowing the riskier parts to be built early, resulting in better and more exact risk management. Disadvantages of the Spiral Model The procedure is very intricate. It is impossible to predict the project's completion date in advance. Low-risk initiatives shouldn't employ it because it can be costly and unnecessary. The spiral can iterate indefinitely. As a result of the several intermediary steps, excessive documentation is required. Agile Model To overcome these challenges, shorter iterations of testing and development are used in the agile paradigm throughout the software testing life cycle. It is currently the most popular model, and if you are still working with the waterfall methodology, it is high time to move to agile. Here are some of the points you need to know while moving from waterfall to agile testing. Customers have the ability to make adjustments to the project to improve it and eliminate defects; in other words, any errors discovered during testing can be rectified or amended on the spot without interrupting the testing process. Teams must now automate their test cycles due to the current trend in enterprises toward agile development. This enables them to release new features more quickly and gain an advantage. There are seven phases included in the agile methodology: plan, design, develop, test, deploy, review, and launch. It is essential to follow the phases one after the other. It is also critical to remember that to ensure apps are prepared for the real world, they must be tested in real-world user conditions. This also implies that teams must have immediate access to real devices with real operating systems and browsers loaded for testing. Maintaining such an internal device lab requires a lot of money, time, and effort.
The best way to avoid that cost and effort is to opt for a cloud-based web testing platform. LambdaTest, a cloud-based cross-browser testing platform, is one option: it provides scalable cloud infrastructure and an online browser farm of 3000+ browser, device, and OS combinations. You can use LambdaTest's online Selenium Grid to run thousands of parallel tests in a matter of seconds, reducing test execution time and providing faster feedback on code changes. Advantages of the Agile Model The process is divided into many individual increments in the agile model so that developers can work on them separately. It presents a method for software testing and development that is iterative and incremental. It gives the consumer an early peek at the project and allows them to make regular decisions and modifications. Compared to the other models, the agile technique is relatively unstructured. Between testing sessions, problems, errors, and defects can be corrected. It necessitates less up-front planning, and the project, or the testing process, is completed in short iterations. With plenty of advantages, it makes sense for organizations to stick with agile methodologies. Best Practices of the Software Testing Life Cycle Below are some of the best practices that are followed in the software testing life cycle. When deciding on the scope of testing, consult with important business users. User feedback is used to identify essential business processes. As they consume most of the users' time and resources, this ensures that the test strategy covers testing for those essential business operations. Determine the most common faults or problems that negatively influence the user experience. Testing is planned around a clean user experience for important processes. Testing is planned to ensure that the product meets all user requirements. Conclusion Identifying faults in the last stage of an SDLC is no longer an effective approach.
A company must also concentrate on a variety of other daily duties. Spending too much of your valuable time testing and correcting bugs can stifle productivity; after all, it will take longer to produce less output. It is critical to make efficient use of time and resources to make the testing process go more smoothly. Following a systematic STLC allows you to fix bugs quickly and improves the quality of your work. Happy testing!
Software development has always been a process with many challenges. As an organization grows, it is essential to work collaboratively to gain efficiency, productivity, and reuse, produce fewer bugs, and, as a result, accelerate the innovation process. In today's post, we'll talk about how to achieve these results with InnerSource. What is InnerSource? In simple words, InnerSource is a growing trend in high-performing software development teams that adopt some principles and practices of open-source teams within an organization. Today we can see a significant change in the behavior of companies like Microsoft, which went from the biggest enemy to one of the biggest allies of the open-source world in the last two decades. One possible reason for this shift in thinking is survival. It is estimated that 90% of the entire internet runs on Linux, and between 65% and 95% of an application's code is third-party code, a good part of which is likely to be open source. In other words, it is practically impossible to think about software engineering or application development without relying on some open-source tool, language, or database. Open Source? But Isn't That Philosophy? Contrary to what many people realize, open source goes far beyond philosophy. Currently, the most complex software across the industry carries some open license. That has generated billion-dollar opportunities, such as IBM buying the largest open-source company in the world, Red Hat, for thirty-four billion dollars (at the time). In the Developer Experience (DevEx) market, these numbers are dominant: the most popular languages in the world are open source, and the same goes for the most popular databases.
Challenges and Similarities Between Open Source and Large Organizations That's right! Both open source and large organizations have a lot in common, including the pains: both have to deal with multiple components, contributors, and tools, in addition to strategies. Let's look at some of these pains below. Team Topology One point is team topology: many companies have adopted a philosophy of remote-first, distributed teams. This became very evident for the corporate world in the 2020s; however, the open-source world has had experience with it for at least twenty years. Code Reuse Code reuse is another point both aim for; after all, creating the same solution several times duplicates effort and knowledge, in addition to carrying a high maintenance cost. Much is discussed about the problem of reinventing the wheel at the solution level; however, little is said at the organizational level. In the open-source world, we have the case of the Apache Foundation with the Apache Commons project, a Java library whose focus is to centralize reusable Java components, and whose thousands of users include the Apache Foundation's own projects. The same Apache Commons library is used in Apache Kafka, HBase, and Hive instead of each having its own solution. This gives you a component that is tried and tested everywhere from commercial applications to databases and a distributed event platform.
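As a concrete illustration of that kind of reuse, pulling a shared component such as Apache Commons Lang into a Java project is a single Maven dependency (the version shown is illustrative; check the current release):

```xml
<!-- pom.xml excerpt: reuse the shared Apache Commons Lang component
     instead of writing your own string/object utilities -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.12.0</version>
</dependency>
```

One declaration gives every team the same battle-tested code, so a bug fix lands once, in one place, for everyone.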
Another point of the reuse practice is the possibility of using tools that facilitate the consumption of components and libraries when developing your solution. Speaking again of the Apache Foundation and the Java world, there is Maven, which makes it effortless to use components such as libraries and plugins within the Java universe. Team Management and Decision Making The traditional model tended to work differently, mainly relying on a management hierarchy, which many times leaves all the responsibility for the result in the hands of leaders in management and directorship. With time and the project's growth, this general management model develops bottlenecks. In this approach, increasing complexity and involving a larger number of people to gain delivery speed does not necessarily translate into the expected impact, and can reduce efficiency in most scenarios. The consequence is an increase in people's demotivation and turnover. This turnover can cost the team the people who know the problem being solved or, more specifically, a project component, impacting deadlines and maintainability, all in a vicious cycle. The software is “promoted” to legacy, and the solution is to redo the project; however, if the methodology does not change along the way, the software returns to that state. After all, without sound engineering and development practices, many options are more of a hindrance than a help at critical times. The other point is the waste of knowledge and the lack, or low rate, of component reuse, which means that throughout the organization the same component exists countless times. Thus, a bug must be fixed numerous times: the same job is done over and over instead of teams working efficiently together.
Best Practices From the Open-Source World The open-source world brings to the corporate world a series of methodologies and practices that have already been successfully validated in sizeable open-source foundations such as Apache, Eclipse, and Linux, among others. Examples include: visibility, code review, tests, CI/CD, software documentation, and an issue tracker. Using this approach creates a tendency to close gaps and break down knowledge silos within an organization. In other words, it brings to the corporate market all the know-how and maturity of a remote-first, distributed, collaborative methodology that focuses on code quality without forgetting delivery, as is the case with the Java language, the JVM, and the Linux kernel, in addition to several Apache Foundation projects such as Cassandra, HBase, and Hive, among others. What Are the Most Significant Benefits of InnerSource? In the short and medium term, InnerSource can bring, in general, the following benefits: High-quality code: With greater test coverage and devs paying attention to the quality standard, greater flow within continuous integration is possible. Comprehensive documentation: Well-documented code, with documentation kept close to the code, makes the code the source of truth. This allows for a better understanding of the business by applying Domain-Driven Design (DDD) good practices such as a ubiquitous language. Effective code reuse: With good documentation of both the code and the architecture, reuse is much easier, and the onboarding process tends to be less expensive and faster. Strong collaboration: As a consequence of the previous points, the model promotes collaboration with less friction. Healthy culture: Silos are broken, and communication becomes more transparent. Reports show this also increases engineers' sense of purpose, reducing team turnover.
In other words, several people with different knowledge and specialties read the documentation, access the code, test, reuse, and ensure the quality and reputation of the code. Does It Work in Production? These practices sound excellent, but are they a utopian proposal? Does anyone already adopt them? The answer is yes! There is even an InnerSource Foundation that collects success stories from several companies worldwide, for example, Microsoft, GitLab, Adobe, and Capital One. The benefits are very similar across companies, in addition to increased development efficiency. Another point is security: almost 40% of devs report that InnerSource helps to identify security problems. In its book on InnerSource, PayPal reports greater room for innovation, quality, and scalability of existing solutions within the organization. Also, according to the State of InnerSource report, 61% of respondents pointed to knowledge sharing as the most significant benefit of adopting InnerSource, and 51% declared that it increased job satisfaction. InnerSource in Brazil In Brazil, there are already companies that use and adopt the concepts and practices quite fluently. Next, let's see a little more about the applications at Itaú and Zup: InnerSource at Zup Within Zup, most of the practices applied in its open-source products were also used in internal and external projects of the organization. Here we can highlight the architecture documentation, in which projects applied the C4 model and used a Tech Radar in addition to Architecture Decision Records (ADRs). As a next step, the goal is to scale these and many other practices throughout the organization. InnerSource at Itaú At Itaú, many teams already practice InnerSource, and some did so even before InnerSource was talked about on a corporate scale.
More than using InnerSource, it's essential to know why to use it: the purpose and the value it will bring, as we know that InnerSource is not a silver bullet. One of the first approaches is to start InnerSource where reuse and collaboration by more than two teams make sense. And if it makes sense to adopt InnerSource practices, we help teams structure themselves for this with their repositories. For example, we encourage teams to share their components and libraries and promote visibility so other teams can reuse rather than build from scratch. In addition, we seek to understand the potential of InnerSource for each team. Some practice InnerSource to remove day-to-day bottlenecks, others for reasons of team capacity, and others to drive adoption of their components, libraries, or infrastructure templates, for example. Through this, we generate data for the teams that practice InnerSource, such as product adoption, collaboration, and the number of days issues stay open. In other words, InnerSource is a solution for day-to-day software engineering practices. Our next step is to create, with several teams, a reuse standard where everyone can contribute good practices, regardless of the group, and promote it at a grander scale in the organization. Also, based on their experiences, Itaú, Rede, and Zup focus on an InnerSource that aims to impact the three organizations further. The goal is to ensure reuse, raise the team's technical quality, and reduce development time for existing components in order to focus innovation directly on the business. Conclusion The world of open source is an excellent source of significant success cases such as the JVM and Linux, in addition to its supremacy regarding developer experience with languages, databases, and other widely used products.
In addition, open source brings a culture that focuses on software quality, clarity in communication, good documentation, and continuous delivery in a distributed team, ensuring reuse and a focus on innovation. This culture of software development makes many organizations seek the same results. That's why organizations like Itaú, Rede, and Zup aim to increase their efforts in InnerSource culture, and to achieve even greater heights through broader and more fluid collaboration.
Agile enables teams to deliver to consumers more quickly and with fewer problems through an iterative project management and software development approach. An agile team produces work in manageable, small-scale increments rather than staking everything on a "big bang" launch. In addition, teams have a built-in mechanism for quickly adjusting to change, since requirements, plans, and results are regularly evaluated. Enterprises use agile approaches like Scrum and Kanban to upgrade applications, enhance customer experiences, and drive digital transformations. Agile development has a deep history, and there is a vast body of knowledge on these approaches and how they relate to design thinking, product management, and DevOps. Organizations seek to answer the question "What is agile?" to align their teams with agile best practices and improve productivity. This article is an introduction to agile approaches and their interrelationships. Additionally, you'll discover how DevOps and agile relate to one another and the best practices for fostering an agile culture within organizations and producing higher-quality software. What Is Agile Methodology? "Agile process model" refers to an iterative software development approach. Agile methods divide tasks into smaller iterations or parts and do not directly involve long-term planning. In simple words, agile means quick or adaptable. The project scope and requirements are defined at the start of the development process, and plans for the number, duration, and scope of iterations are clearly defined in advance. In the agile process model, each iteration is regarded as a short time "frame," typically lasting one to four weeks. Dividing the entire project into smaller parts helps reduce project risk and overall project delivery time.
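The chunking idea above can be made concrete with a short sketch: split a backlog into small, timeboxed iterations and deliver one increment per iteration. The backlog items and the two-item iteration size are illustrative:

```python
# Sketch of agile's core loop: divide the whole project into small
# parts, each delivered as a working increment at the end of a short
# timeboxed iteration (typically one to four weeks).
backlog = ["login", "search", "checkout", "reports", "notifications"]
ITEMS_PER_ITERATION = 2  # scope of each short "frame"

def plan_iterations(items, size):
    """Chunk the backlog into iteration-sized increments."""
    return [items[i:i + size] for i in range(0, len(items), size)]

iterations = plan_iterations(backlog, ITEMS_PER_ITERATION)
# Each inner list is one increment shown to the customer for feedback
# before the next iteration is planned.
```

The point of the sketch is that risk shrinks with the batch size: each small increment is evaluated before the next one is committed to.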
Before a functional product is shown to the client, a team must complete a full iteration of the software development life cycle, which includes planning, requirements analysis, design, coding, and testing. Agile software development is a group of software development approaches based on iterative development, where requirements and solutions evolve through cooperation amongst self-organizing, cross-functional teams. Agile methods typically encourage a disciplined project management process that enables frequent inspection and adaptation; a leadership philosophy that encourages teamwork, self-organization, and accountability; a set of engineering best practices intended to allow for rapid delivery of high-quality software; and a business approach that aligns development with customer needs and company goals. Any development methodology that adheres to the ideas in the Agile Manifesto is known as agile development. A group of seventeen influential people in the software business created the Manifesto, and it is based on their knowledge of what strategies work and don't work. History of Agile Methodology In the late 1970s, personal computer use exploded, granting the common individual access to contemporary computing. The increased consumer demand stimulated innovation, and businesses were challenged to match customers' ever-changing needs. Unfortunately, the rigid approaches that formerly dominated the SDLC were incapable of delivering software quickly or efficiently adapting to changing requirements during the development process. In the early 1990s, a small group of software industry experts began creating and promoting new approaches to the SDLC that emphasized rapid response and adaptation to changing requirements and technology. Rapid application development (RAD), Scrum, extreme programming, and the rational unified process (RUP) emerged as the new, extremely flexible, and responsive development methodologies.
In 2001, a small group of seventeen industry executives convened in Snowbird, Utah, to explore these new and emerging methods. Here, the phrase "agile software development" was first used to represent flexible software development that proceeds in iterative stages; it eventually became the umbrella name for the new approaches. To differentiate agile software development from conventional techniques, the group created the Agile Manifesto, a set of values for using agile. Agile methodologies have grown in popularity since 2001. As more and more businesses and teams adopt them, an ecosystem has emerged that includes all agile software development practitioners as well as the individuals and organizations that support the process through training, consulting, frameworks, and tools. Phases of the Agile Model Here are the six phases of the agile model: 1. Gathering requirements: This stage requires that you specify the criteria. You should outline the business opportunity and schedule the time and resources needed to complete the project. You can then assess the technical and financial viability based on this information. 2. Creating the requirements: Once the project has been determined, collaborate with the stakeholders to create the requirements. You can use a user flow diagram or a high-level UML diagram to demonstrate how new additions will function and relate to your current system. 3. Construction/iteration: Work starts once the team determines the needs. Designers and coders get to work, aspiring to release a usable product. The product's functionality is straightforward and modest at first because it will undergo many refinement stages. 4. Testing: In this step, the quality assurance team evaluates the product's functionality and searches for bugs. 5. Deployment: During this stage, the team delivers the product into the user's working environment. 6. Customer input: This is the final step before a product is released.
This allows the team to receive and process feedback on the product. Agile Testing Methods Agile project management employs a variety of approaches, and this section will cover the most prevalent ones. These methodologies utilize the underlying concepts of agile but develop distinct frameworks to achieve specific outcomes. In a firm, multiple approaches are permissible, but before settling on a single strategy, you should evaluate which alternatives make the most sense for your company. Here are the various approaches to agile methodology: Scrum, Kanban, Scrumban, Crystal, Dynamic Systems Development Method (DSDM), Feature-Driven Development (FDD), Lean Software Development, and eXtreme Programming (XP). Kanban, Scrum, and Scrumban are the three most essential techniques, but we will also briefly discuss a few others. Scrum Scrum is an agile development process emphasizing effective work management in team settings. The three roles' tasks are as follows: The scrum master's job is to oversee the team, facilitate meetings, and eliminate obstacles. The product owner is the person who creates the product backlog, sets priorities for the backlog, and is in charge of how features are divided up between iterations. The scrum team, within the Scrum paradigm, is accountable for managing and coordinating its work to finish the sprint or cycle promptly. A standard sprint proceeds as follows: sprint planning, required to shape the backlog and keep the sprint on track; the daily scrum, covering deadlines and assigning work; the sprint review, which ensures that everything is completed correctly; and the sprint retrospective, for understanding what you performed well and what you need to do better. Kanban The Kanban approach, originally developed in Japan, involves arranging cards on a whiteboard to represent various tasks.
Every job is laid out in a workflow chart, allowing everyone to know exactly where they stand regarding company-wide productivity. The team benefits from this since it provides a visual representation of the organization and the division of responsibilities within it. In addition, everyone involved in a project may maintain constant communication and awareness of its development. When everyone's work routines are visible, it's easy to spot slow spots and figure out what needs to be done about them. Scrumban Scrumban refers to a hybrid methodology that combines Kanban and Scrum. The Scrum component allows for greater adaptability and continuous communication across all teams. The Kanban component provides a visual representation of the process, which helps teams stay motivated and not get overwhelmed by the project's magnitude. Team members interested in making the switch from one version of agile to another will find this management approach a helpful stepping stone. Many individuals find a sudden change too upsetting, but since all three approaches are part of the agile family, making the transition is easy. Crystal Crystal methodology is really a family of related methods. There are numerous variants of Crystal, such as Yellow, Orange, and Red. Regardless of your choice, the method prioritizes people and workers over systems and procedures. This allows individuals to operate in a setting that best suits their needs rather than adhering to a rigid template. The colors reflect the number of people in each team, with each hue employing a slightly different approach to aid clarity. This technique consists of three phases: 1. Chartering: This phase involves a variety of tasks, including the formation of a development team, the execution of a feasibility analysis, the creation of blueprints, etc. 2. Cyclic delivery: Under cyclic delivery, there are two more cycles: the team revises the release schedule, and an integrated product is delivered to its users. 3.
Wrap-up: this phase performs deployment and post-deployment activities according to the user environment. Feature-Driven Development (FDD) Feature-Driven Development (FDD) is a software development strategy in which a new model is produced every two weeks. Though it requires up-front work in design and development, this model results in comprehensive records of your strategic decisions. The "designing and building" phases are the focal points of this approach. Compared to other agile approaches, FDD details the individual tasks that need to be completed for each feature. This approach allows you to use software tied to your plans rather than the other way around, resulting in a product that performs precisely as you need. Dynamic Systems Development Method (DSDM) DSDM is an agile project delivery framework and a rapid application development methodology. Its basic tenets are that users must be actively engaged and that teams are granted decision-making authority. The techniques utilized in DSDM are timeboxing, MoSCoW rules, and prototyping. The DSDM project is comprised of seven phases: pre-project, feasibility analysis, business study, functional model iteration, design and build iteration, implementation, and post-project. It is essential to DSDM that not all requirements be deemed significant: each iteration should include non-essential items that can be dropped without affecting higher-priority requirements. Lean Software Development Lean software development is another iterative technique, one that emphasizes effective value stream mapping to ensure the team delivers customer value. It is adaptable and ever-changing; there are no strict principles or regulations. The Lean technique is based on the following fundamental principles: amplifying learning, empowering the team, building integrity in, eliminating waste, seeing the whole, deciding as late as possible, and delivering as fast as possible.
To provide rapid and effective development workflows, the Lean methodology relies on prompt and dependable customer and programmer input. Instead of depending on a hierarchical control structure, it delegates decision-making responsibility to individuals and small teams. To eliminate waste, the Lean method requires users to prioritize and deliver features in small batches. Lean software development also supports the concurrent writing of automated unit tests and code and focuses on maximizing the productivity of every team member. eXtreme Programming (XP) The extreme programming (XP) methodology is a disciplined approach that emphasizes rapid delivery. It emphasizes close consumer interaction, quick feedback loops, continuous planning and testing, and close teamwork. Typically, working software is delivered every two to three weeks. The objective is to enhance the responsiveness and quality of software in response to changing client requirements. The XP methodology is founded on the core values of communication, feedback, simplicity, and courage. Customers define and prioritize their requested user stories closely with their development team, and the team is responsible for producing working, iteration-tested software for the highest-priority user stories. The XP methodology provides users with a lightweight, guiding framework that facilitates the deployment of high-quality enterprise software and increases productivity. When to Use the Agile Method You can employ the agile method when: a customer is readily available for meetings during software development; the project is small; regular changes are expected; and there is a group of experts at your disposal. Roles in Agile Methodology An agile development process involves several roles. A vision statement outlining the range of issues, possibilities, and values to be addressed is always documented as the first step in an agile software development process.
This vision is captured by the product owner, who then collaborates with a diverse team (or teams) to carry it out. Agile Portfolio Manager An agile portfolio manager functions similarly to a traditional portfolio manager in that they examine the products and objectives of each Scrum team and devise techniques that enable many teams to collaborate. Agile differs in that decentralized control, transparency, and the ability to experiment are built in. The culture of openness permits individuals to raise concerns without fear of condemnation or punishment; instead, the focus is on resolving the issue and moving ahead. As the portfolio manager is not a member of a specific team, they will not favor or expect more from any particular group. Engineering Manager The efficiency of your team's work depends on the engineering managers you employ. They are responsible for eliminating bottlenecks and rerouting any accumulations so the team can operate efficiently and complete high-quality work on time. Typically, this entails concentrating on external factors that can impact the workload of a team. A competent engineering manager understands that a healthy team cannot consistently perform at 100 percent; this involves ensuring sufficient slack in the work to prevent burnout. Users The user or client is always the first consideration in an agile process. Therefore, user personas are frequently created to show the various workflow roles or customer wants and behaviors. Product Owner The duty of the product owner is to represent all internal stakeholders and the customer. This person distills insights, suggestions, and feedback to produce a product vision. Although product visions are sometimes brief and straightforward, they nonetheless present a picture of the user or client, the values being addressed, and a plan for delivering on them. The product owner divides the product vision into several user stories to collaborate with the development team.
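A user story produced this way can be modeled as a small record. The fields in this sketch simply mirror the elements a story should carry (target user, problem, need, acceptance criteria, priority); the class, field names, and sample stories are all illustrative, not part of any standard:

```python
from dataclasses import dataclass, field

@dataclass
class UserStory:
    """One slice of the product vision, as prioritized by the product owner."""
    target_user: str
    problem: str
    need: str
    acceptance_criteria: list = field(default_factory=list)
    priority: int = 0  # lower number = higher priority

stories = [
    UserStory("shopper", "cart is lost on refresh", "persist the cart",
              ["cart survives page reload"], priority=1),
    UserStory("admin", "no sales overview", "daily sales report",
              ["report lists totals per day"], priority=2),
]

# The product owner orders the backlog by priority before handing it
# to the development team for the next iteration.
ordered = sorted(stories, key=lambda s: s.priority)
```

Keeping stories as structured records rather than free text makes the prioritization step explicit and easy to automate.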
The target users, their problems, the need for the solution, and the constraints and acceptance criteria that define it should all be captured in the user story. The product owner must define the vision and then collaborate with the development team to make it a reality. The product owner prioritizes and evaluates these user stories so the team has a shared understanding of what is expected of them.

Skills Required for Agile Methodology in Software Development

A number of abilities are necessary for effective participation in an Agile project. These can be taught in training, emphasized in the culture, and reinforced in meetings.

T-Shaped Skills

Each member of a team or Scrum should possess the same fundamental subject expertise. As a product manager, you must therefore ensure that every team member covers the fundamentals. To do this, you may rotate team members' responsibilities so that fundamental knowledge is not lost or forgotten. In the "T"-shaped comparison, the long horizontal line is that fundamental knowledge, while the vertical line signifies in-depth understanding of something more particular. Each team member should have a specialty, which may come from experience, skill, or specialized knowledge. This enables the Scrum team to collaborate and rely on one another's specific knowledge to solve a problem: because everyone shares the fundamentals, they can rely on each other throughout the project, and because each is an expert in a different discipline, they can call for assistance when necessary.

Excellent Communication Abilities

Each team member must communicate effectively for anything in Agile to function as intended. They must feel comfortable raising a problem, regardless of its origin, and explaining how it affects the project.
Since Agile project management is intended to surface how each team member is performing and to facilitate discussion of the work, the ability to articulate your rationale is a crucial skill. However, with free communication comes the possibility of misunderstanding, so each team member must possess excellent communication skills to keep that from happening regularly.

Adaptability

Since Agile project management enables the workforce to adapt to change as it occurs, individuals must also be adaptable. This means acquiring new skills when they become necessary, keeping technology up to date and staying abreast of developments, and recognizing that change need not impede progress.

Good Organization

Because adaptability is so important in Agile project management, work rarely proceeds linearly. Self-organization is required to stay on top of your duties and ensure that they align with other Scrums and Sprints, which means recognizing and maintaining your priorities. Without this ability, it is easy to overlook details or tasks, causing problems for your team.

Problem Solving

The ability to recognize a problem and devise a solution is a crucial part of agile project management. Creating solutions takes a team effort, but it is essential to approach the problem with a proactive mindset. Many believe they possess exceptional problem-solving abilities, but the skill really comes from an active desire to resolve problems. Team days that emphasize problem-solving, such as escape rooms, puzzles, and other enjoyable team-building games, can help teach or maintain these skills.

Time Management

Because many Agile projects are self-contained, each team member must be capable of self-management. This means monitoring the due dates of their work, the duration of their projects, and the team's pace of productivity. It may also require managers to monitor the handover between projects.
If they anticipate a huge project, they must manage their staff to accommodate the increased workload.

Risk Management

One of the greatest hazards in Agile project management is balancing the desire for a reward against the risk of pursuing it. On a self-contained project, it can be easy for team members to take on tasks that align with their interests while leaving undesired tasks behind, which can result in delays and social conflict. As a manager, you should redirect the team if it veers off course. A related danger is prioritization risk, in which management fails to prioritize work effectively, producing a backlog, a bottleneck, or additional delays. Keeping an eye on the group dynamic and workflow can prevent this.

Advantages of Agile Methodology

The following are the advantages of the Agile methodology:

- It delivers regularly.
- It provides direct interpersonal interaction with customers or clients.
- The design is effective and satisfies the company's needs.
- Adjustments can be made efficiently at any time.
- It reduces total development time.

Disadvantages of Agile Methodology

Here are the disadvantages of the Agile methodology:

- Due to the lack of formal documentation, misunderstandings arise, and critical decisions made during the various phases can be misconstrued at any point by team members.
- After the project is over and the developers have moved on to other projects, maintaining the finished product can be challenging because of that missing documentation.

Technical Best Practices for Agile Companies

Scrum is the fundamental procedure for team collaboration, planning, and delivery; nevertheless, it does not address technical best practices, organizational standards, or building and driving agile cultures. Today, common technical best practices include defining the software development lifecycle (SDLC) and implementing DevOps procedures.
The SDLC outlines best practices for writing code, maintaining software assets, and developing technical standards. DevOps automation such as CI/CD, infrastructure as code (IaC), and continuous testing allows for a more dependable path to production. Other methods, such as shift-left security practices, observable microservices, feature flagging, canary releases, and AIOps, offer a more adaptable and dependable delivery methodology.

Conclusion

The Agile methodology is often misunderstood as a single practice, when in reality it is an umbrella for many approaches and techniques beyond those discussed here. Agile teams have been shown to boost profitability by 37 percent and produce 30 percent more revenue than non-Agile organizations, regardless of the specific methodology and practices they employ. Many firms are adopting Agile for the increased speed, adaptability, and productivity these methods bring. The fast-paced nature of the software engineering industry demands adaptability and responsiveness in all aspects of project development, and Agile approaches permit the delivery of cutting-edge products and novel user experiences while keeping the product in sync with market trends and customer needs. Variety is here to stay, however: depending on the needs and objectives of your firm, you may still profit from employing the Waterfall approach or a hybrid of the two.
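Canary releases and feature flags, mentioned among the DevOps practices above, share one core mechanic: deterministically routing a small slice of traffic to a new code path. Below is a minimal Python sketch of that idea; the function names, the 5 percent starting ratio, and the version labels are illustrative assumptions, not part of any specific feature-flagging tool.

```python
import hashlib

# Sketch of sticky canary routing: a small, stable percentage of users
# is sent to the new build. All names here are invented for illustration.
CANARY_PERCENT = 5  # start small; widen as error rates stay healthy


def is_canary(user_id: str, percent: int = CANARY_PERCENT) -> bool:
    """Deterministically bucket a user so they always see the same version."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return bucket < percent


def choose_version(user_id: str) -> str:
    """Pick which deployment should serve this user's request."""
    return "v2-canary" if is_canary(user_id) else "v1-stable"
```

Because bucketing is a pure function of the user ID, a given user sees the same version on every request; gradually widening `CANARY_PERCENT` while watching error rates is the canary rollout itself.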
Agile workflows have quickly established themselves as pillars of software development worldwide, to the point that the methodologies supporting the framework have permeated many other fields, whether that is helping a marketing team update its fundamental strategy or helping customer relations hit its latest service objective. Developers and stakeholders can approach the software development life cycle (SDLC) in various ways, and the agile model is one of the best available. Its primary emphasis is on continuous testing and iterative development. Agile is a set of approaches a team uses when managing a project or plan: it separates the work into many phases while maintaining constant client interaction, with continuous observation at every step. In contrast to traditional approaches, the agile methodology parallelizes and synchronizes the software development and software testing processes.

Then, where do I start? Given this popularity, adopting a new methodology can be intimidating for a team, because it will probably change how they go about their daily work. Here is a list of the most fundamental techniques to help you select the ideal model for your team.

10 Important Types of Agile Methodology for 2023

Agile methodologies are a results-focused approach to software development based on the main principles of the Agile Manifesto, including constant collaboration with stakeholders and continuous enhancement at every stage of product development. There are different types of agile methodologies used for rapid decision-making, and they can be applied effectively depending on the project's requirements and objectives. Let us find out more about the Agile frameworks and their principles:

1. Scrum

This agile methodology is used to streamline an organization's development process by eliminating blockers, and it offers a highly useful technique for self-organization. Scrum consists of five ceremonies: backlog grooming, sprint planning, the daily stand-up, the sprint review, and the sprint retrospective. Each ceremony is time-boxed and recurs according to the sprint length.

2. Kanban

Kanban is an Agile framework similar to the Scrum methodology. Because the product life cycle faces uncertainty and ongoing change, Kanban, a popular Lean workflow management method, works best to bring clarity to the work process and manage a continuous flow of work. It enhances efficiency by limiting work in progress and helps prioritize the product backlog. Kanban works well in both IT and non-IT environments.

3. eXtreme Programming (XP)

eXtreme Programming (XP) is a powerful agile methodology emphasizing teamwork, effective communication, feedback, and respect, gathering feedback from every process and acting on it. The development team organizes itself around achieving customer satisfaction, the main focus of XP. It works similarly to Scrum, using sprints and short development cycles. The methodology aims to create an efficient environment and an informative workspace so team members know the development progress and can deliver higher productivity.

4. Lean Software Development

This agile methodology is based on seven principles: eliminate waste, so nothing useless is added to the project; build quality in, with discipline and control over defects; create knowledge, treating documentation as part of learning; defer commitment until the business is well understood; deliver fast, so output comes from value-added work; respect people, through effective communication and shared ownership; and optimize the whole, offering scalable and adaptable processes.

5. Crystal

Crystal was introduced by Alistair Cockburn, a contributor to the Agile Manifesto. Crystal is a family of methodologies, including Crystal Clear, Crystal Yellow, Crystal Orange, Crystal Red, and other variants, each an exclusive framework tuned by factors such as team size, project priority, and system criticality. It is a lightweight agile methodology that helps in achieving the best results.

6. Scaled Agile Framework (SAFe)

The Scaled Agile Framework helps implement agile organizational patterns at an enterprise scale. Being a lightweight framework, SAFe offers a centralized decision-making system and contributes to software development efficiency at the enterprise level. Software developers follow the SAFe agile philosophy to manage various strategic issues.

7. Feature-Driven Development (FDD)

FDD is a customer-centric, iterative, and incremental agile method that anchors industry-recognized practices into the business. Its main objective is to produce working software on time, guided by an overarching model of the project that includes its life cycle stages. FDD follows a five-step process that even a large project team can manage:

- Develop an overall model
- Create a feature list
- Plan by feature
- Design by feature
- Build by feature

8. Dynamic Systems Development Method (DSDM)

The Dynamic Systems Development Method is suited to standardized industry settings and rapid delivery. It offers a comprehensive structure that defines a concrete plan of action and the steps to execute it, governing every procedure in the software development process. The DSDM framework consists of four important components, elaborated by eight key principles. The components are:

- Feasibility and business study
- Functional model/prototype iteration
- Design and build iteration
- Implementation

DSDM allows the project to be modified to match expectations while maintaining quality, and it requires the product to be delivered on schedule, without negotiation.

9. PRINCE2 Agile

PRINCE2 Agile is an Agile framework that combines PRINCE2 governance with agile delivery methods. It covers behaviors, concepts, frameworks, and techniques, and its principles and processes can be tailored to the project's requirements. It is well suited to both the direction and the management of a project, and it can be applied depending on the project's situation.

10. Nexus

Nexus is a framework for scaling product or software development across three to nine Scrum teams working in sprints of up to 30 days. It is one of the most scalable methodologies of all, depending on the agile behavior of multiple Scrum teams to deliver a single integrated product.

There are around 50 types of agile methodology, and each has its pros and cons. An effective agile approach is often to divide products and services into autonomous parts. Agile has become an important aspect of project planning and execution: it eliminates waste and redundant processes while delivering objective-related outcomes and customer satisfaction.

Why Should Anyone Use an Agile Methodology for Operations?
Agile methodology mainly focuses on delivering value to the customer without compromising quality, and every type of agile methodology helps business leaders complete the project within the allocated deadline. In addition, here are a few key benefits businesses can get through agile methodology:

Faster Time-to-Market

By using agile methodology, businesses can develop their products and ship them much faster. The task prioritization provided by agile frameworks helps business leaders develop products at a quicker pace and deploy them to the market sooner. Agile methodology allows the development team to split the product development and design process into multiple small chunks, which makes the job easier for both the development and design teams. The testing team can also carry out its testing quickly and easily, which ensures better-working products reach the market.

Superior Quality Products

In agile development methodologies, testing is an integral part of the project development phase, which keeps overall quality high throughout project execution. In addition, Agile allows clients to take part in the development process, so developers can easily make changes according to the client's needs or market conditions during the development phase itself. Being an iterative process, the agile methodology allows the development team to keep learning and improving in real time.

Enhances Project Visibility and Transparency

Unlike traditional project management methods, the agile methodology allows everyone, including business stakeholders and clients, to take part in the development process. Moreover, agile methodology relies heavily on the client's involvement throughout the project.
Agile development methodologies allow clients and business leaders to observe the entire project development and gain higher visibility and transparency. This enables them to make changes according to current market conditions during the development stage itself instead of waiting for the project's completion.

Customer Satisfaction

The agile methodology allows clients to observe the project throughout development and to take part in decision-making, which improves customer retention. Outside of agile frameworks, clients and business leaders take part only in the planning stage and after project completion, which keeps them from influencing development and hurts flexibility and adaptability. Agile project management, by contrast, keeps customers in the loop throughout the development stage, letting them request essential changes and provide feedback so that the deployed project meets market demands. This ensures higher customer value and projects completed according to the client's expectations.

Reduces Risks and Provides Better Control

Agile development methodologies work in small sprints that allow developers to deliver a quality project with continuous improvement, greatly reducing the chance of outright failure. Dividing projects into small parts lets developers change course at any time if a particular approach doesn't go according to plan, which reduces the risk of development heading in the wrong direction. With higher project visibility and transparency, business leaders and clients also have better control over the project, along with better tracking and management.

Let's Wrap It Up

Agile development methodologies are efficient tools that ensure every team involved in the project is on the same page.
Implementing Agile frameworks ensures that project development, testing, and deployment run smoothly and deliver greater agility. Agile methodologies help create a better organizational environment and deliver excellent results along with the customer satisfaction that every business needs. Agile also makes things easier for teams by shortening development timelines and improving productivity, which helps deliver projects on time.
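As a concrete illustration of one mechanism behind these benefits, the Kanban practice of limiting work in progress (described in the frameworks list above) can be sketched in a few lines of Python. The class and column names below are invented for illustration, not taken from any Kanban tool.

```python
class KanbanColumn:
    """Toy model of a single Kanban board column with a WIP limit."""

    def __init__(self, name: str, wip_limit: int):
        self.name = name
        self.wip_limit = wip_limit
        self.cards: list[str] = []

    def pull(self, card: str) -> bool:
        # A card may enter only while the column is under its WIP limit;
        # this is what forces the team to finish work before starting more.
        if len(self.cards) >= self.wip_limit:
            return False
        self.cards.append(card)
        return True

    def finish(self, card: str) -> None:
        # Completing a card frees a slot for the next item in the backlog.
        self.cards.remove(card)
```

Refusing the pull when the column is full is the whole point of the design: the blocked card makes the bottleneck visible, prompting the team to swarm on finishing work rather than starting more.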
This is an article from DZone's 2022 Performance and Site Reliability Trend Report.

Site reliability engineering aims to keep servers and services running with zero downtime. However, outages and incidents are inevitable, especially when dealing with a complex system that constantly receives new updates. Every company has a broadly similar process to manage incidents, mitigate risks, and analyze root causes. This can be an opportunity to identify issues and prevent them from recurring, but not every company succeeds at making it a constructive process. In this article, I will discuss the advantages of the blameless postmortem process and how it can drive cultural change in a company: a culture of improvement, not blame.

An SRE's Role in Postmortem

A postmortem is a process in which a site reliability engineer (SRE) records an incident in detail, including the incident description, its impact on the system, and the actions taken to mitigate the issue. SREs are the engineers responsible for handling incidents, which is why they prepare most of the postmortem information into a report that not only addresses the root cause but also suggests actions to prevent the same incident from occurring again. For SREs, the postmortem process is therefore an opportunity to improve the system.

The Standard Postmortem Meeting Structure

A postmortem meeting is usually arranged days after a team handles an incident. Let's look at the typical format for this meeting:

Keep to a small group. Only people related to the incident, from various roles and responsibilities, are invited. The group stays small to ensure the meeting is short and productive.

Start with facts. There is no time for guessing in this meeting. Instead, facts are shared with the team to help people understand the issue and perhaps identify the root cause.
Listen to stories. After the facts are laid out, team members who were involved in the incident or who have knowledge about the issue can add to the discussion.

Find out the reasons. Most of the time, the root cause is found before this meeting. When it is still unknown, the group plans further investigation, perhaps involving a third party. Because the incident might recur while the root cause remains unknown, extra measures are taken to prepare for possible incidents.

Create action points. The actions depend on the outcome of the discussion. If the root cause is known, actions are taken to prevent the incident from recurring; otherwise, further investigation is planned and assigned to a team.

Why You Should Have a Blameless Postmortem

Traditionally, the postmortem process was about who made a mistake, and if there was a meeting, the manager would use it as an opportunity to warn individuals about the consequences of their mistakes. Such an attitude eliminates opportunities to learn from mistakes, and facts get replaced with questions of who was behind the failure. Sometimes a postmortem meeting turns into another retro in which team members argue with each other or discuss issues outside the scope of the incident, pointing at each other rather than discussing the root cause. This damages team morale, and such unproductive behavior leads to more failures in the future. IT practitioners have learned that failures are inevitable, but that it is possible to learn from mistakes to improve both the way we work and the way we design systems. That is why the focus has turned to design and process instead of people. Today, most companies are trying to move away from the conservative approach and create an environment where people can learn from failures rather than assign blame.
That's why it is essential to have a blameless postmortem meeting, to ensure people feel comfortable sharing their opinions and to keep the focus on improving the process. So what does a blameless postmortem look like? Here is my recipe for a productive blameless postmortem process.

How To Conduct a Blameless Postmortem Process

Suppose an incident occurred in your company and your team handled it. Let's look at the steps you need to take for the postmortem process.

Figure 1: Blameless postmortem process

Prepare Before the Meeting

Collect as much information as possible about the incident. Find the people involved and any third parties, and add their names to the report. Also collect any notes from engineers who supported the issue or commented on it in different channels.

Schedule a Meeting With a Small Group

Arrange a meeting with the people involved, perhaps including stakeholders like the project manager, delivery manager, or whoever should be informed or consulted on this particular issue. Keep the group small to increase the meeting's productivity.

Highlight What Went Right

Once in the meeting, start with a brief introduction to ensure everyone knows the incident's story. Although this meeting is about failures, highlight the positives if there are any, such as good communication between team members or quick responses from engineers.

Focus on the Incident Facts

To get a clear picture of what happened, don't guess or tell a story; focus on the precise information you have, such as the order of events and how the incident was ultimately mitigated.

Hear Stories From Related People

There might be other versions of the incident's story, so set aside time for people with comments or opinions about it to speak.
Keep the discussion productive and focused on the incident.

Dig Deeper Into the Actual Root Cause

After discussing all ideas and considering the facts, the group can discuss the possible root cause. In many cases it will have been found before this meeting, but it can still be discussed here.

Define Solutions

If the root cause is known, plan with the team to implement a solution that prevents the incident from happening again. If it is not, plan further investigation to find it, and take extra measures or workarounds to prepare for similar incidents in the meantime.

Document the Meeting

One good practice is to document the meeting and share it with the rest of the company so everyone is aware and other teams can learn from the experience.

Best Practices From Google

In modern companies, the blameless postmortem is a culture with more activities than the traditional postmortem process. SREs at Google have done a great job implementing this culture by ensuring that the postmortem process is not just a single event. Let's review some best practices from Google that complement the process above:

No postmortem is left unreviewed. Regular review sessions help close out outstanding postmortems, collect ideas, and draw up actions. As a result, all postmortems are taken seriously and processed.

Introduce a postmortem culture. A collaborative approach helps introduce the postmortem culture to an organization more easily and quickly through various programs, including:

Postmortem of the month: To motivate teams to run a better postmortem process, the best and most well-written postmortem of each month is shared with the rest of the organization.

Postmortem reading clubs: Regular sessions are conducted to review past postmortems.
Engineers can see what other teams faced in previous postmortems and learn from the lessons.

Ask for feedback on postmortem effectiveness. From time to time, teams are surveyed about their experiences with the postmortem process, which helps evaluate the postmortem culture and increase its effectiveness.

If you are interested in learning more about Google's postmortem culture, check out Chapter 15 of Google's book, Site Reliability Engineering.

Conclusion

Site reliability engineers play an essential role in keeping systems reliable, and maintaining that reliability is a continuous job. While developers are thinking of new features, SREs are thinking of a better and smoother process to release those features. Incidents are part of the software development lifecycle, but modern SRE teams define processes that turn those incidents into opportunities to improve their systems. SREs understand the importance of blameless postmortem meetings, where failures are accepted as part of development and the focus stays on reliability. The future of incident management will bring more automation, and perhaps artificial intelligence, where a system can fix most issues itself. For now, SREs are using blameless postmortems to improve uptime, productivity, and the quality of team relationships.
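The postmortem structure this article describes (facts and timeline first, what went right, root cause, action items) can be captured as a lightweight template so every incident report looks the same. The sketch below uses my own field-name assumptions; it is not a standard schema.

```python
from dataclasses import dataclass, field


@dataclass
class Postmortem:
    """Skeleton of a blameless postmortem record; field names are illustrative."""

    title: str
    impact: str
    timeline: list[str] = field(default_factory=list)         # facts only, no guessing
    what_went_right: list[str] = field(default_factory=list)  # highlight positives first
    root_cause: str = "unknown"  # "unknown" signals a follow-up investigation
    action_items: list[str] = field(default_factory=list)     # owners and due dates go here

    def render(self) -> str:
        """Produce a plain-text report to share with the wider company."""
        sections = [
            f"Postmortem: {self.title}",
            f"Impact: {self.impact}",
            "Timeline:", *self.timeline,
            "What went right:", *self.what_went_right,
            f"Root cause: {self.root_cause}",
            "Action items:", *self.action_items,
        ]
        return "\n".join(sections)
```

Defaulting `root_cause` to "unknown" mirrors the process above: a report that still says "unknown" is exactly the one that should come back to the regular review session.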
Site reliability engineering (SRE) is the state of the art for ensuring services are reliable and perform well, and SRE practices power some of the most successful websites in the world. In this article, I'll discuss who site reliability engineers (SREs) are, what they do, the key philosophies shared by successful SRE teams, and how to start migrating your operations teams to the SRE model.

Who Are SREs?

SREs operate some of the busiest and most complex systems in the world. There are many definitions of an SRE, but a good working one is a hybrid engineer who is both a skilled software engineer and a skilled operations engineer. Each of these roles alone is difficult to hire, train, and retain, and finding people who are good enough at both to excel as SREs is even harder. Beyond engineering responsibilities, SREs also need a high level of trust, a keen eye for software quality, the ability to handle pressure, and a little bit of thrill-seeking (to handle being on call, of course).

The Variance in SREs

Many different job descriptions are used when hiring SREs. The prototypical example is Google, which has SRE roles in two different job families: operations-focused SREs and "software engineer" SREs. The interview process and career mobility for these two roles are very different, despite both having the SRE title and similar responsibilities on the job. In reality, most people are not equally skilled at operations work and software engineering work. Acknowledging that different people have different interests within the job family is likely the best way to build a happy team, and offering a mix of roles and job descriptions is a good way to attract a diverse mix of SRE talent.

What Do SREs Do?
As seen in Figure 1, an SRE's work consists of five tasks, often done cyclically but also in parallel across several component services.

Figure 1: SRE responsibility cycle

The role varies with the size and maturity of the company, but at most companies SREs are responsible for these elements: architecture, deployment, operations, firefighting, and fixing.

Architect Services

SREs understand how services actually operate in production, so they are responsible for helping design and architect scalable and reliable services. These decisions generally sort into design-related and capacity-related decisions.

Design Considerations

This aspect focuses on reviewing the design of new services and involves answering questions like:

- Is a new service written in a way that works with our other services? Is it scalable? Can it run in multiple environments at the same time?
- How does it store data/state, and how is that synchronized across other environments/regions?
- What are its dependencies, and what services depend on it?
- How will we monitor and observe what this service does and how it performs?

Capacity Considerations

In addition to the overall architecture, SREs are tasked with figuring out cost and capacity requirements by asking questions like:

- Can this service handle our current volume of users? What about 10x more users? 100x more?
- How much is this going to cost us per request handled?
- Is there a way we can deploy this service more densely?
- What resource is bottlenecking this service once deployed?

Operate Services

Once the service has been designed, it must be deployed to production, and changes must be reviewed to ensure they meet architecture goals and service-level objectives.

Deploy Software

This part of the job is less important in larger organizations that have adopted a mature CI/CD practice, but many organizations are not there yet.
SREs in these organizations are often responsible for the actual process of getting binaries into production, performing a canary deployment or A/B test, routing traffic appropriately, warming up caches, etc. At organizations without CI/CD, SREs will generally also write scripts or other automation to assist in this deployment process. Review Code SREs are often involved in the code review process for performance-critical sections of production applications as well as in writing code to help automate parts of their role to remove toil (more on toil below). This code must be reviewed by other SREs before it is adopted across the team. Additionally, when troubleshooting an on-call issue, a good SRE can identify faulty application code as part of the escalation flow or even fix it themselves. Firefight While not glamorous, firefighting is a signature part of the role of an SRE. SREs are an early escalation target when issues are identified by an observability or monitoring system, and SREs are generally responsible for answering calls about service issues 24/7. Answering one of these calls is a combination of thrilling and terrifying: thrilling because your adrenaline starts to kick in and you are "saving the day" — terrifying because every second that the problem isn't fixed, your customers are unhappy. SREs answering on-call pages must identify that a problem exists, locate it in a very complicated system, and then fix it either on their own or by engaging a software engineer. Figure 2: The on-call workflow For each on-call incident, SREs must identify that an issue exists using metrics, find the service causing the issue using traces, then identify the cause of the issue using logs. Fix, Debrief, and Evaluate Incidents Because the on-call incidents described above are stressful, SREs have a strong interest in making sure that incidents do not repeat. This is done through post-incident reviews (sometimes called "postmortems").
During these sessions, all stakeholders for the service meet and figure out what went wrong, why the service failed, and how to make sure that the exact failure never happens again. One responsibility not listed above, but sometimes part of an SRE's job, is building and maintaining platforms and tooling for developers. These include source code repositories, CI/CD systems, code review platforms, and other developer productivity systems. In smaller organizations, it is more likely that SREs will build and maintain these systems, but as organizations grow, these tasks generally grow in scale to where it makes sense to have a separate (e.g., "developer productivity") team handle them. SRE Philosophies One of the most common questions asked is how SREs differ from other operations roles. This is best illustrated through SRE philosophies, the most prevalent of which are listed below. While any operations role will likely embrace at least some of these, only SREs embrace them all. "Just say no" to toil Toil is the enemy of SREs and is described as "tedious, repetitive tasks associated with running a production environment" by Google. You eliminate toil by automating processes so that recurring manual work disappears. One philosophy around toil held by many SREs is to try to "automate yourself out of a job" (though there will always be new services to work on, so you never quite get there). Cattle, not pets In line with reducing toil and increasing automation, an important philosophy for SREs is to treat servers, environments, and other infrastructure as disposable. Small organizations tend to take the opposite approach — treating each element of the application as something precious, even naming it. This doesn't scale in the long run.
A good SRE will work to have the application's deployment fully automated so that infrastructure and code are stored in the same repositories and deploy at the same time, meaning that if the entire existing infrastructure were blown away, the application could be brought back up easily. Uptime above all Customer-facing downtime is not acceptable. The storied "five nines" of uptime (less than six minutes down per year) should be a baseline expectation for SREs, not a maximum. Services must have redundancy, security, and other defenses so that customer requests are always handled. Errors will happen Error budgeting and the use of service-level indicators are the secret sauce behind delivering exceptional customer-facing uptime. By accepting some unreliability in services and architecting their dependent services to work around this, customer experience can be maintained. Incidents must be responded to An incident happening once is understandable. The same incident happening twice is beyond the pale. A thorough, blameless post-incident review process is essential to the goal of steadily increasing reliability and performance over time. How to Migrate an Ops Team to SRE Moving from a traditional operations role to an SRE practice is challenging and often seems overwhelming. Small steps add up to big impact. Adopting SRE philosophies, advancing the skill set of your team, and acknowledging that mistakes will occur are three things that can be done to start this process. Adopt SRE Philosophies The most important first step is to adopt the SRE philosophies mentioned in the previous section. The one that will likely have the fastest payoff is to strive to eliminate toil. CI/CD can do this very well, so it is a good starting point. If you don't have a robust monitoring or observability system, that should also be a priority so that firefighting for your team is easier. Start Small: Uplevel Expectations and Skills You can't boil the ocean.
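The "five nines" arithmetic quoted above is easy to check. This short sketch (my own illustration, not from the report) converts an availability target expressed as a number of nines into allowed downtime per year:

```python
# Convert an availability target expressed as "nines" into allowed
# downtime per year. Five nines leaves roughly 5.26 minutes per year,
# consistent with the "less than six minutes" figure quoted above.

def allowed_downtime_minutes(nines: int) -> float:
    """Minutes of downtime per year permitted by an N-nines target."""
    minutes_per_year = 365 * 24 * 60   # 525,600 minutes
    unavailability = 10 ** -nines      # e.g., 1e-5 for five nines
    return minutes_per_year * unavailability

for n in (2, 3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes/year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why each step up in the reliability target gets dramatically more expensive.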
Not everyone will magically become an SRE overnight. What you can do is provide resources to your team (some are listed at the end of this article) and set clear expectations and a clear roadmap for how you will go from your current state to your desired state. A good way to start this process is to consider migrating your legacy monitoring to observability. For most organizations, this involves instrumenting their applications to emit metrics, traces, and logs to a centralized system that can use AI to identify root causes and pinpoint issues faster. The recommended approach to instrument applications is using OpenTelemetry, a CNCF-supported open-source project that ensures you retain ownership of your data and that your team learns transferable skills. Acknowledge There Will Be Mistakes Downtime will likely increase as you start to adopt these processes, and that must be OK. Using the SRE principles described in this article will ultimately reduce downtime in the long run as more processes are automated and as people learn new skills. In addition to mistakes, accepting some amount of unreliability from each of your services is also critical to a healthy SRE practice in the long run. If the services are all built around this, and your observability is on-point, your application can remain running and serving customers without the unrealistic demands that come with 100 percent uptime for everything. Conclusion SRE, traditionally, merges application developers with operations engineers to create a hybrid superhuman role that can do anything. SREs are difficult to hire and retain, so it's important to embrace as much of the SRE philosophy as possible. By starting small with one app or part of your infrastructure, you can ease the pain associated with changing how you develop and deploy your application. The benefits gained by adopting these modern practices have real business value and will enable you to be successful for years to come.
Resources: Site Reliability Engineering, Google; "How to Run a Blameless Postmortem," Atlassian; Implementing Service Level Objectives, Alex Hidalgo
Since 2015, Lex Neva has been publishing SRE Weekly. If you’re interested enough in reading about SRE to have found this post, you’re probably familiar with it. If not, there are a lot of great articles to catch up on! Lex selects around 10 entries from across the internet for each issue, focusing on everything from SRE best practices to the socio-side of systems to major outages in the news. I had always figured Lex must be among the most well-read people in SRE, and likely #1. I met up with Lex on a call and was so excited to chat with him about how SRE Weekly came to be, how it continues to run, and his perspective on SRE. The Origins of SRE Weekly An appropriate start to our conversation seemed to be asking about the start of SRE Weekly: Why did he take on this project? Like many good projects, SRE Weekly began with Lex’s motivation to “be the change he wanted to see.” He was an avid reader of DevOps Weekly but wished that something similar existed for SRE. With so much great and educational content created in the SRE space, shouldn’t there be something to help people find the very best? “I wanted there to be a list of things related to SRE every week, and such a thing didn’t exist, and I’m like… Oh.” Lex explained. “I almost fell into it sideways, I thought this was gonna be a huge time sink, but it ended up being pretty fun, actually.” How SRE Weekly Is Made When thinking about the logistics of SRE Weekly, one question likely comes to mind: how? How does he have time to read all those articles? SRE is a methodology of methodologies, a practice that encourages building and improving practices. Lex certainly embodies this with his efficient method of finding and digesting dozens of articles a week. First, he finds new articles. For this, RSS feeds are his favorite tool. Once he’s got a buffer of new articles queued up, he uses an Android application called @voice to listen to them with text-to-speech – at 2.5x speed!
Building up the ability to comprehend an article at that speed is a challenge, but for someone tackling the writing output of the entire community, it’s worth it. To choose which articles to include, Lex doesn’t have any sort of strict requirements. He’s interested in articles that can bring new ideas or perspectives, but also likes to periodically include well-written introductory articles to get people up to speed. Things that focus on the socio-side of the sociotechnical spectrum also interest him, especially when highlighting the diversity of voices in SRE. Incident retrospectives are also a genre of post that Lex likes to highlight. Companies posting public statements about outages they’ve experienced and what they’ve learned is a trend Lex wants to see grow. Although they might seem to only tell the story of one incident at one company, good incident retrospectives can bring out a more universal lesson. “An incident is like an unexpected situation that can teach us something – if it’s something that made you surprised about your system, it probably can teach someone else about their system too.” Lex explained how in the aviation industry, massive leaps forward in reliability were made when competing airlines started sharing what they learned after crashes. They realized that any potential competitive advantages should be secondary to working together to keep people safe. “The more you share about your incidents, the more we can realize that everyone makes errors, that we’re all human,” Lex says. Promoting incident retrospectives is how he can further these beneficial trends. Lex’s View of SRE As someone with a front-row seat to the evolution of SRE, I was curious about what sort of trends Lex had seen and how he foresees them growing and changing. We touched on many subjects, but I’ll cover three major ones here: Going Beyond the Google SRE Book Since it was published in 2016, the Google SRE book has been the canonical text when it comes to SRE.
In recent years, however, the idea that this book shouldn’t be the end-all-be-all is becoming more prominent. At SREcon 21, Niall Murphy, one of the book’s authors, ripped it up live on camera! Lex has seen this shift in attitudes in a lot of recent writing, and he’s happy to see a more diverse understanding of what SRE can be: “Even if Google came up with the term SRE, lots of companies had been doing this sort of work for even longer,” Lex said. “I want SRE to not just mean the technical core of making a reliable piece of code – although that’s important too – but to encompass everything that goes into building a reliable system.” As SRE becomes more popular, companies of all sizes are seeing the benefits and wanting to hop aboard. Not all of these companies can muster the same resources as Google. In fact, practically only Google is at Google’s level! Lex has been seeing more learning emerge around the challenges of doing SRE at other scales, like startups, where there aren’t any extra resources to spare. Broadening What an SRE Can Be As we break away from the Google SRE book, we also start to break away from traditional descriptions of what a Site Reliability Engineer needs to do. “SRE is still in growing pains,” Lex said. “We’re still trying to figure out what we are. But it’s not a bad thing. I’ve embraced that there’s a lot under the umbrella.” We often think of the “Engineer” in Site Reliability Engineer as being like a “Software Engineer,” that is, someone who primarily writes code. But Lex encourages a more holistic view: that SRE is about engineering reliability into a system, which involves so much more than just writing code. He’s been seeing more writing and perspectives from SREs who have “writing code” as a small percentage of their duties – even 0%.
“They’re focusing more on the people side of things, the incident response, and coming up with the policies that engender reliability in their company… And I think there’s room for that in SRE because at the heart of it is still engineering, it’s still the engineering mindset. If you only do the technical side of things, you’re really missing out.” Diversifying the Perspectives of SREs Alongside diversifying the role of SREs, Lex hopes to see more diversity among SREs themselves. In our closing discussion, I asked Lex what message he would broadcast to everyone in this space if he could. “It’s all about the people,” he said. “These complex systems that we’re building, they will always have people. They’re a critical piece of the infrastructure, just as much as servers.” Even if what we build in SRE seems to be governed just by technical interactions, people are intrinsic to making those systems reliable. This isn’t a negative; this isn’t just people being “error-makers.” People are what give a system strength and resiliency. To this point, Lex highlighted what can make this socio-side of systems better: diversity and inclusion. “Inclusion is important for the reliability of our socio-technical systems because we need to understand the perspective of all our users, not just the ones that are like us. That means thinking across race, gender expression, class, neurodivergence, everything. It’s an area where we need to do better.” Lex hopes to highlight the richness of this diversity in SRE Weekly. As people standing at the relative beginning of SRE, working together to build and evolve the practice, we’re given both a challenge and an opportunity. In order to truly understand and engineer reliability into what we do, we need to proactively discuss our goals and how we’re achieving them. We hope you take the time to reflect on the learning that many great SRE writers share through spaces like SRE Weekly.
Cloud-native computing extends well past Kubernetes-based infrastructure to roll up many modern best-practice approaches to building, running, and leveraging software assets at scale. The cloud-native approach then extends these practices beyond the cloud to the entire IT landscape. Included in this list of best practices are ones that fall into the category of site reliability engineering (SRE). At the core of the practice of SRE is a modern approach to managing the risks inherent in running complex, dynamic software deployments – risks such as downtime and slowdowns. Following the cloud-native approach, we should extend these practices to all software landscape risks, including cybersecurity risks. What, then, might it look like to apply SRE principles beyond their traditional focus on reliability to the full breadth of cybersecurity risk? Error Budgets: The Key to Cloud Native SRE To tie SRE and cybersecurity together, we need a bit of background, starting with Service Level Objectives. The Service Level Objective (SLO) for a site, system, or service (collectively ‘service’) is a precise numerical target for any reliability dimension an organization wants to measure for a given user journey. For example, an SLO might quantify the availability of a service, the latency or the freshness of the information provided to users at the user interface, or other key performance metrics important to the business. Based upon this SLO, the ops team and its stakeholders can make fact-based judgments about whether to increase a service’s reliability (and hence, its cost) or lower its reliability and cost to increase the speed of development of the applications providing the service. Rather than targeting perfection – an SLO of 100% reflecting no issues at all – the real question is just how far short of perfect reliability you should aim for. We call this quantity the error budget.
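As a rough numerical sketch of that idea (the function names and traffic figures below are my own illustrative assumptions, not from the article), an SLO short of 100% translates directly into a budget of allowable failures:

```python
# An SLO target below 100% leaves a budget of allowable failures.
# All numbers here are hypothetical.

def error_budget(slo_target: float, total_requests: int) -> int:
    """Requests allowed to fail in the window while still meeting the SLO."""
    return round(total_requests * (1 - slo_target))

def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative = blown budget)."""
    budget = total_requests * (1 - slo_target)
    return (budget - failed_requests) / budget

# A 99.9% SLO over 10 million requests tolerates 10,000 failures;
# after 2,500 failures, three quarters of the budget remains.
print(error_budget(0.999, 10_000_000))                        # 10000
print(round(budget_remaining(0.999, 10_000_000, 2_500), 2))   # 0.75
```

The remaining-budget figure is what lets a team make the cost-versus-speed trade-off concrete: plenty of budget left means it is safe to ship faster, while a nearly spent budget argues for slowing down and investing in reliability.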
The error budget represents the number of allowable errors in a given time window that results from an SLO target of less than 100%. In other words, this budget represents the total number of errors a particular service can accumulate over time before users become dissatisfied with the service. Most importantly, it should never be the operator’s goal to entirely eliminate reliability issues because such an approach would both be too costly and take too long – thus impacting the ability of the organization to deploy software quickly and run dynamic software at scale (both of which are core cloud-native practices). Instead, the operator should maintain an optimal balance between cost, speed, and reliability. Error budgets quantify this balance. Bringing SRE to Cybersecurity The most fundamental enabler of SRE is observability. Operators must have sufficiently accurate, real-time data about the behavior of the systems and services in their purview to perform the calculations required to quantify SLOs and how close those services are to maintaining them. Cybersecurity engineers require the same sort of observability specific to the threats that they must manage and mitigate. We call this particular type of observability risk-based alerting (RBA). RBA depends upon risk scores. Every observed event that might be relevant to the cybersecurity engineer must have a risk score calculated for it. The risk score for any event is a product of the risk impact (how severe the effect of the threat’s associated compromise would be), risk confidence (how confident the engineer is that the event is a positive indicator of a threat), and a risk modifier that quantifies how critical the threatened user or system is. RBA then quantifies the risk score for each event by leveraging the organization’s choice of security framework (MITRE ATT&CK, for example).
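That three-factor product can be sketched in a few lines. The scales below (impact on 0–100, confidence on 0–1, and a multiplicative criticality modifier) are assumptions chosen for illustration, not a prescribed RBA scoring scheme:

```python
# Risk score as the product of impact, confidence, and criticality,
# per the RBA description above. Scales are illustrative assumptions.

def risk_score(impact: float, confidence: float, criticality: float) -> float:
    """
    impact:      severity of the associated compromise, 0-100
    confidence:  how likely the event indicates a real threat, 0.0-1.0
    criticality: modifier for how critical the threatened user/system is,
                 e.g. 0.5 for a test host, 2.0 for a domain controller
    """
    return impact * confidence * criticality

# A suspicious login on a critical host: high impact, moderate
# confidence, doubled because the asset is critical.
print(risk_score(impact=80, confidence=0.6, criticality=2.0))  # 96.0
```

Because the factors multiply, a low-confidence event on a critical system can outrank a high-confidence event on a throwaway host, which is exactly the prioritization behavior an alerting pipeline wants.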
RBA gives the cybersecurity engineer the raw data they need to make informed threat mitigation decisions, just as reliability-centric observability provides the SRE with the data they need to mitigate reliability issues. Introducing the Threat Budget Once we have a quantifiable, real-time measure of threats – threat telemetry, as it were – then we can create an analog to SRE for cybersecurity engineers. We can posit Threat Level Objectives (TLOs), which would be precise numerical targets for any particular threat facing the cybersecurity team. Similarly, we can create the notion of a threat budget that would reflect the number of unmitigated threats in a given time window that results from a TLO of less than 100%. In other words, the threat budget represents the total number of unmitigated threats a particular service can accumulate over time before a corresponding compromise adversely impacts the service users. The essential insight here is that TLOs should never be 100%, since eliminating threats entirely would be too expensive and would slow the software effort down, just as a 100% SLO would. Therefore, a TLO somewhat less than 100% would reflect the optimal compromise among cost, time, and the risk of compromise. We might call this approach to TLOs and threat budgets Service Threat Engineering, analogous to Site Reliability Engineering. What Service Threat Engineering means is that based upon RBA, cybersecurity engineers now have a quantifiable approach to achieving optimal threat mitigation that takes into account all of the relevant parameters instead of relying upon personal expertise, tribal knowledge, and irrational expectations for cybersecurity effectiveness. Even though RBA uses the word risk, I’ve used the word threat to differentiate Service Threat Engineering from SRE. After all, SRE is also about quantifying and managing risks – except with SRE, the risks are reliability-related rather than threat-related.
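By analogy with the error budget, the threat-budget arithmetic is just as simple; the TLO value and event counts below are hypothetical, chosen only to make the analogy concrete:

```python
# A TLO below 100% implies a budget of unmitigated threat events
# tolerable per window, mirroring the error-budget calculation.
# The TLO target and event volume are hypothetical.

def threat_budget(tlo_target: float, threat_events: int) -> int:
    """Unmitigated threat events tolerated while still meeting the TLO."""
    return round(threat_events * (1 - tlo_target))

# A 99.5% TLO over 4,000 detected threat events per quarter tolerates
# 20 unmitigated events before the objective is breached.
print(threat_budget(0.995, 4_000))  # 20
```

As with error budgets, the value of the number is less in the arithmetic than in the conversation it forces: how much unmitigated threat exposure the organization will consciously accept in exchange for cost and delivery speed.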
As a result, Service Threat Engineering is more than analogous to SRE. Instead, they are both examples of approaches to managing two different but related kinds of risks. Cybersecurity compromises can certainly lead to reliability issues (ransomware and denial of service being two familiar examples). But there is more to this story. Ops and security teams have always had a strained relationship, working on the same systems with different priorities. However, bringing threat management to the same level as SRE may help these two teams align over similar approaches to managing risk. Service Threat Engineering, therefore, targets the organizational challenges that continue to plague DevSecOps efforts – a strategic benefit that many organizations should welcome.