DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
The Latest "Software Integration: The Intersection of APIs, Microservices, and Cloud-Based Systems" Trend Report
Get the report
  1. DZone
  2. Testing, Deployment, and Maintenance
  3. Deployment
  4. Reliability Is Slowing You Down

Reliability Is Slowing You Down

You can’t work on delivering new features if you’re constantly trying to figure out what’s broken and fix releases that have already gone out the door.

Kit Merker user avatar by
Kit Merker
·
Mar. 19, 23 · Analysis
Like (2)
Save
Tweet
Share
4.22K Views

Join the DZone community and get the full member experience.

Join For Free

Three Hard Facts

First, the complexity of your software systems is through the roof, and you have more external dependencies than ever before. 51% of IT professionals surveyed by SolarWinds in 2021 selected IT complexity as the top issue facing their organization.

Second, you must deliver faster than the competition, which is increasingly difficult as more open-source and reusable tools let small teams move extremely fast. Of the 950 IT professionals surveyed by RedHat, only 1% indicated that open-source software was “not at all important.”

And third, reliability is slowing you down. 

The Reliability/Speed Tradeoff

In the olden days of software, we could just test the software before a release to ensure it was good. We ran unit tests, made sure the QA team took a look, and then we’d carefully push a software update during a planned maintenance window, test it again, and hopefully get back to enjoying our weekend. By 2023 standards, this is a lazy pace! We expect teams to constantly push new updates (even on Fridays) with minimal dedicated manual testing. They must keep up with security patches, release the latest features, and ensure that bug fixes flow to production.

The challenge is that pushing software faster increases the risk of something going wrong. If you took the old software delivery approach and sped it up, you’d undoubtedly have always broken releases. To solve this, modern tooling and cloud-native infrastructure make delivering software more reliable and safer, all while reducing the manual toil of releases.

According to the 2021 State of DevOps report, more than 74% of organizations surveyed have Change Failure Rate (CFR) greater than 16%. For organizations seeking to speed up software changes (see DORA metrics), many of these updates caused issues requiring additional remediation like a hotfix or rollback.

If your team hasn’t invested in improving the reliability of software delivery tooling, you won’t be able to achieve reliable releases at speed. In today’s world, all your infrastructure, including dev/test infrastructure, is part of the production environment.

To go fast, you also have to go safely. More minor incremental changes, automated release and rollback procedures, high-quality metrics, and clearly defined reliability goals make fast and reliable software releases possible.

Defining Reliability

With clearly defined goals, you will know if your system is reliable enough to meet expectations. What does it mean to be up or down? You have hundreds of thousands of services deployed in clouds worldwide in constant flux. The developers no longer coordinate releases and push software. Dependencies break for unexpected reasons. Security fixes force teams to rush updates to production to avoid costly data breaches and cybersecurity threats.

You need a structured, interpreted language to encode your expectations and limits of your systems and automated corrective actions. Today, definitions are in code. Anything less is undefined. The alternative is manual intervention, which will slow you down. 

You can’t work on delivering new features if you’re constantly trying to figure out what’s broken and fix releases that have already gone out the door. The most precious resource in your organization is attention, and the only way to create more is to reduce distractions.

Speeding Up Reliably

Service level objectives (SLOs) are reliability targets that are precisely defined. SLOs include a pointer to a data source, usually a query against a monitoring or observability system. They also have a defined threshold and targets that clearly define pass or fail at any given time. SLOs include a time window (either rolling or calendar aligned) to count errors against a budget. OpenSLO is the modern de facto standard for declaring your reliability targets.

Once you have SLOs to describe your reliability targets across services, something changes. While SLOs don’t improve reliability directly, they shine a light on the disconnect between expectations and reality. There is a lot of power in simply clarifying and publishing your goals. What was once a rough shared understanding becomes explicitly defined. We can debate the SLO and decide to raise, lower, redefine, split, combine, and modify it with a paper trail in the commit history. We can learn from failures as well as successes. 

Whatever other investments you’re making, SLOs help you measure and improve your service. Reliability is engineered; you can’t engineer a system without understanding its requirements and limitations. SLOs-as-code defines consistent reliability across teams, companies, implementations, clouds, languages, etc.

Testing Continuous Integration/Deployment Software development process

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Fargate vs. Lambda: The Battle of the Future
  • Full Lifecycle API Management Is Dead
  • The 5 Books You Absolutely Must Read as an Engineering Manager
  • Introduction to Containerization

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: