DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • Three Ways AI Is Reshaping DevSecOps
  • The Role of AI in Enhancing DevOps Processes
  • The Future of DevOps
  • The Impact of AI Agents on Modern Workflows

Trending

  • From Zero to Production: Best Practices for Scaling LLMs in the Enterprise
  • Beyond ChatGPT, AI Reasoning 2.0: Engineering AI Models With Human-Like Reasoning
  • How to Practice TDD With Kotlin
  • How to Configure and Customize the Go SDK for Azure Cosmos DB
  1. DZone
  2. Data Engineering
  3. AI/ML
  4. Creating Self-Serve DevOps Without Creating More Toil

Creating Self-Serve DevOps Without Creating More Toil

My team was mouthing off RTFM every time a dev would ask them a question. It needed to end.

By 
Shaked Askayo user avatar
Shaked Askayo
·
Dec. 15, 22 · Opinion
Likes (4)
Comment
Save
Tweet
Share
4.9K Views

Join the DZone community and get the full member experience.

Join For Free

My journey to co-founding Kubiya.ai was triggered by the very real pain of being a DevOps leader supporting both broader organizational goals along with the day-to-day support of software engineers and others. This is my story (or at least the somewhat interesting parts) of what it was like and how I found a productive approach to managing it all.

DevOps Opening Hours: 2:45 p.m. until a Quarter to 3:00 p.m.

It’s really not a joke. DevOps have no time. We need to make sure everything is running smoothly from development to production and often beyond. 

We are busy enhancing CI/CD processes, upgrading and configuring monitoring and alerting systems, and optimizing infrastructure for both cost and security, as well as sitting in lots of meetings. But on top of all that, we need to deal with endless and oftentimes repetitive requests from developers (and others). 

This was exactly my experience as the head of DevOps at my previous company. 

The repetitive requests were not only taking up around 70% of my team’s time and energy, it had them mouthing off RTFM every time a dev would ask them a question.

In Search of DevOps Self-Serve that Works

So I started exploring different solutions for enabling a self-service culture in the organization to reduce the burden of the DevOps team while empowering developers to do more on the operational side.

But before exploring the solutions I explored, I want to mention several things that were constantly on my plate as head of DevOps:

  • Planning, building, and managing multiple types of pipelines for the R&D teams which included CI/CD pipelines, self service automation, and infrastructure as code (IAC)
  • Dealing with permissions requests from different personas in the organization while keeping the security team in the loop
  • Taking care of the onboarding and offboarding process of engineers in the company
  • And of course the maintenance of all the different tools to accomplish those tasks

While doing all of this, my team had to keep an eye on several Slack channels to answer questions from the tech organization, such as:

  • “Where are my logs?”
  • “Why did my pipeline fail?”
  • “Why did my service stop responding?”
  • Kubernetes questions, cloud questions, and more

So we tried several different approaches.

Internal Developer Platforms: Loads of Development

Internal developer platforms, typically created and maintained by a dedicated team, are a combination of common tools and technologies that can be used by software developers for easier access and management of complex infrastructure and processes. 

While we considered building a developer platform, there simply was too much planning required to make this a reasonable endeavor. 

For starters, we had so many tools in use and with a very dynamic organization the number and types of tools were constantly changing making ongoing curation a real challenge. But aside from bringing all these tools together into a single platform, training and adoption was not realistic with a software development team singularly focused on coding the next best thing. We found ourselves struggling to decide how to create such a platform and what features should be included.

Naturally we worried about creating a new kind of toil. Even if we reduced the workload coming from developer requests, embracing an internal developer platform looked like it would bring fresh pain with managing the platform lifecycle, adding new features, and as always, supporting different users.

Workflow Automation Is Not So Automatic

Of course our industry’s standard solution — and one that our tech organization already understood — was to automate everything possible.

So, utilizing tools that our developers were already familiar with, such as Jenkins, we created automation for nearly any issue you can think of. As soon as an issue occurred, we were able to create a dedicated pipeline to solve it.

The main problem was that developers didn't know where to find these workflows. And even if they knew where to find the workflow, they did not usually know which parameters to fill in. So we were back to the same routine of devs approaching us for help.

Permissions were another issue. It was very risky for us to allow large groups to trigger workflows. With so much potential for chaos, deciding who should have authority to run workflows was no easy task.

Even granting permissions and approvals on an ad hoc basis with the security team took a lot of time and effort. Permission lifecycles were also problematic, as offboarding a user who left the company required special handling. 

New workflows required adding permissions and defining who would receive those permissions.

Finally, each time an existing workflow was edited to include more actions, a fresh review of both the workflow and associated permissions was required. 

Chatbot to the Rescue

In light of the fact that developer requests came to us via a chat interface (in our case it was Slack), whether by direct messaging me or my team members, or via the on-call channel, I decided to create a chatbot or Slackbot. For permissions management, I connected our chatbot to our company’s identity provider. 

This allowed the Slackbot to know the role of anyone who approached it. This made it easy to create unique policies for different user roles (defined by the identity provider) in terms of consuming the operational workflows via the Slackbot.

For context gathering, the Slackbot would ask the relevant questions, guiding users to provide the details needed to fill in as parameters of the different workflows that already existed for CI/CD tools like Jenkins, cloud infrastructure, and more.

Besides solving the lack of domain expertise and lack of interest in operations, our Slackbot acted as a proxy with guardrails for developers. This allowed them to do what they needed without over-permissioning.

Most importantly, it reduced my team workload by 70% while eliminating long delays for end users, avoiding long waits in a virtual queue.

Trouble in Chatbot Paradise

While this was amazing, our Slackbot was still not 100% user-friendly. Users had to choose from a static list of workflows and use predetermined words or slash commands. 

Our Slackbot was also unable to handle anything not included in its rule-based, canned interactions. As a result, our Dev team would be left empty-handed in cases of  "out of scope" requests.

The Slackbot maintenance, however, was far worse. With so many workflows to create and so many DevOps cooks in the kitchen, I could not enforce a standard programming language. Whenever one workflow broke, figuring out all the dependencies to find a fix took way too much effort. If I wanted to add a brand-new workflow, it would also require very significant effort and domain expertises.

Which brought us all the way back to the same problem of more toil in managing the Slackbot.

AI-Driven Virtual Assistants

Exploring and experiencing the pros and cons of the various solutions led me to understand that the key to success is finding a solution that benefits both the developers AND DevOps. 

A system that can ask the developer questions if context is missing, transforming context gathering (which DevOps would normally have to handle) into simple conversations.

Using NLU to infer user intent from a phrase fragment and offer to execute the relevant workflows is another area where AI can improve the end-user experience — so even if a developer only knows, for example, a part of the name of a cluster or service, the virtual assistant can figure out what it is that they need.

This combination of all of the features of a chatbot — plus the ability to understand or learn the user's intentions (on the fly) even if it’s something new and unfamiliar — keeps workflows flowing.

In addition to all this, my conversations with Kubiya.ai customers made it clear that a self-service approach needs to be tied in with security as well. Being able to easily manage user permissions both upfront in the form of policy creation for different users and groups as well as with just-in-time, ad hoc approvals is key to a successful self-serve solution.

In summary, my experience building a self-serve culture has shown me that having an efficient system in place is essential for companies who want to move fast as it ensures that all parties — operations, development, security teams — can get their work done with the least amount of toil and friction. 

AI Chatbot DevOps workflow

Opinions expressed by DZone contributors are their own.

Related

  • Three Ways AI Is Reshaping DevSecOps
  • The Role of AI in Enhancing DevOps Processes
  • The Future of DevOps
  • The Impact of AI Agents on Modern Workflows

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!