DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. AI/ML
  4. Creating Self-Serve DevOps Without Creating More Toil

Creating Self-Serve DevOps Without Creating More Toil

My team was mouthing off RTFM every time a dev would ask them a question. It needed to end.

Shaked Askayo user avatar by
Shaked Askayo
·
Dec. 15, 22 · Opinion
Like (4)
Save
Tweet
Share
3.64K Views

Join the DZone community and get the full member experience.

Join For Free

My journey to co-founding Kubiya.ai was triggered by the very real pain of being a DevOps leader supporting both broader organizational goals along with the day-to-day support of software engineers and others. This is my story (or at least the somewhat interesting parts) of what it was like and how I found a productive approach to managing it all.

DevOps Opening Hours: 2:45 p.m. until a Quarter to 3:00 p.m.

It’s really not a joke. DevOps have no time. We need to make sure everything is running smoothly from development to production and often beyond. 

We are busy enhancing CI/CD processes, upgrading and configuring monitoring and alerting systems, and optimizing infrastructure for both cost and security, as well as sitting in lots of meetings. But on top of all that, we need to deal with endless and oftentimes repetitive requests from developers (and others). 

This was exactly my experience as the head of DevOps at my previous company. 

The repetitive requests were not only taking up around 70% of my team’s time and energy, it had them mouthing off RTFM every time a dev would ask them a question.

In Search of DevOps Self-Serve that Works

So I started exploring different solutions for enabling a self-service culture in the organization to reduce the burden of the DevOps team while empowering developers to do more on the operational side.

But before exploring the solutions I explored, I want to mention several things that were constantly on my plate as head of DevOps:

  • Planning, building, and managing multiple types of pipelines for the R&D teams which included CI/CD pipelines, self service automation, and infrastructure as code (IAC)
  • Dealing with permissions requests from different personas in the organization while keeping the security team in the loop
  • Taking care of the onboarding and offboarding process of engineers in the company
  • And of course the maintenance of all the different tools to accomplish those tasks

While doing all of this, my team had to keep an eye on several Slack channels to answer questions from the tech organization, such as:

  • “Where are my logs?”
  • “Why did my pipeline fail?”
  • “Why did my service stop responding?”
  • Kubernetes questions, cloud questions, and more

So we tried several different approaches.

Internal Developer Platforms: Loads of Development

Internal developer platforms, typically created and maintained by a dedicated team, are a combination of common tools and technologies that can be used by software developers for easier access and management of complex infrastructure and processes. 

While we considered building a developer platform, there simply was too much planning required to make this a reasonable endeavor. 

For starters, we had so many tools in use and with a very dynamic organization the number and types of tools were constantly changing making ongoing curation a real challenge. But aside from bringing all these tools together into a single platform, training and adoption was not realistic with a software development team singularly focused on coding the next best thing. We found ourselves struggling to decide how to create such a platform and what features should be included.

Naturally we worried about creating a new kind of toil. Even if we reduced the workload coming from developer requests, embracing an internal developer platform looked like it would bring fresh pain with managing the platform lifecycle, adding new features, and as always, supporting different users.

Workflow Automation Is Not So Automatic

Of course our industry’s standard solution — and one that our tech organization already understood — was to automate everything possible.

So, utilizing tools that our developers were already familiar with, such as Jenkins, we created automation for nearly any issue you can think of. As soon as an issue occurred, we were able to create a dedicated pipeline to solve it.

The main problem was that developers didn't know where to find these workflows. And even if they knew where to find the workflow, they did not usually know which parameters to fill in. So we were back to the same routine of devs approaching us for help.

Permissions were another issue. It was very risky for us to allow large groups to trigger workflows. With so much potential for chaos, deciding who should have authority to run workflows was no easy task.

Even granting permissions and approvals on an ad hoc basis with the security team took a lot of time and effort. Permission lifecycles were also problematic, as offboarding a user who left the company required special handling. 

New workflows required adding permissions and defining who would receive those permissions.

Finally, each time an existing workflow was edited to include more actions, a fresh review of both the workflow and associated permissions was required. 

Chatbot to the Rescue

In light of the fact that developer requests came to us via a chat interface (in our case it was Slack), whether by direct messaging me or my team members, or via the on-call channel, I decided to create a chatbot or Slackbot. For permissions management, I connected our chatbot to our company’s identity provider. 

This allowed the Slackbot to know the role of anyone who approached it. This made it easy to create unique policies for different user roles (defined by the identity provider) in terms of consuming the operational workflows via the Slackbot.

For context gathering, the Slackbot would ask the relevant questions, guiding users to provide the details needed to fill in as parameters of the different workflows that already existed for CI/CD tools like Jenkins, cloud infrastructure, and more.

Besides solving the lack of domain expertise and lack of interest in operations, our Slackbot acted as a proxy with guardrails for developers. This allowed them to do what they needed without over-permissioning.

Most importantly, it reduced my team workload by 70% while eliminating long delays for end users, avoiding long waits in a virtual queue.

Trouble in Chatbot Paradise

While this was amazing, our Slackbot was still not 100% user-friendly. Users had to choose from a static list of workflows and use predetermined words or slash commands. 

Our Slackbot was also unable to handle anything not included in its rule-based, canned interactions. As a result, our Dev team would be left empty-handed in cases of  "out of scope" requests.

The Slackbot maintenance, however, was far worse. With so many workflows to create and so many DevOps cooks in the kitchen, I could not enforce a standard programming language. Whenever one workflow broke, figuring out all the dependencies to find a fix took way too much effort. If I wanted to add a brand-new workflow, it would also require very significant effort and domain expertises.

Which brought us all the way back to the same problem of more toil in managing the Slackbot.

AI-Driven Virtual Assistants

Exploring and experiencing the pros and cons of the various solutions led me to understand that the key to success is finding a solution that benefits both the developers AND DevOps. 

A system that can ask the developer questions if context is missing, transforming context gathering (which DevOps would normally have to handle) into simple conversations.

Using NLU to infer user intent from a phrase fragment and offer to execute the relevant workflows is another area where AI can improve the end-user experience — so even if a developer only knows, for example, a part of the name of a cluster or service, the virtual assistant can figure out what it is that they need.

This combination of all of the features of a chatbot — plus the ability to understand or learn the user's intentions (on the fly) even if it’s something new and unfamiliar — keeps workflows flowing.

In addition to all this, my conversations with Kubiya.ai customers made it clear that a self-service approach needs to be tied in with security as well. Being able to easily manage user permissions both upfront in the form of policy creation for different users and groups as well as with just-in-time, ad hoc approvals is key to a successful self-serve solution.

In summary, my experience building a self-serve culture has shown me that having an efficient system in place is essential for companies who want to move fast as it ensures that all parties — operations, development, security teams — can get their work done with the least amount of toil and friction. 

AI Chatbot DevOps workflow

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Three SQL Keywords in QuestDB for Finding Missing Data
  • Web Application Architecture: The Latest Guide
  • Artificial Intelligence in Drug Discovery
  • A Simple Union Between .NET Core and Python

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: