Generative AI for DevOps: A Practical View
Generative AI empowers DevOps teams to eliminate tedious repetition, strengthen automation, and condense complex workflows into simple conversational actions.
Generative AI refers to machine learning algorithms that can create new content from minimal human input. The field has advanced rapidly in the past few years, with projects such as the text authorship tool ChatGPT and the realistic image creator DALL-E 2 attracting mainstream attention.
Generative AI isn't just for content creators, though. It's also poised to transform technical work in the software engineering and DevOps fields. For example, GitHub Copilot, the controversial "AI pair programmer," is already prompting reconsideration of how code is written, but collaborative AI's potential remains relatively unexplored in the DevOps arena.
In this article, we'll look toward a future where generative AI empowers DevOps teams to eliminate tedious repetition, strengthen their automation, and condense complex workflows into simple conversational actions. But before all that, let's dive into the DevOps issues that generative AI can improve.
What's Wrong With DevOps?
DevOps is far from being a solved problem. While the adoption of DevOps mentalities is growing rapidly year-over-year, the process remains dependent on many tools, a limited talent pool, and repetitive tasks that are only partially automated.
DevOps engineers can spend too much time on menial work that doesn't contribute significant business value, such as approving deployments, checking the status of environments, and scaffolding basic config files. These chores are unavoidable, but they don't directly advance the final product. That makes them great candidates for generative AI, with ChatGPT, Copilot, and OpenAI Codex (the model on which Copilot is built) all potentially able to alleviate some of the burden:
- They can populate common config files and templates, so engineers don't have to.
- They help team members gain new skills by suggesting contextually relevant snippets. This provides assistance when it's needed, lessening the learning curve during upskilling.
- They reduce the time taken to scaffold new assets and make those assets more consistent, which aids maintainability.
However, existing systems are limited by their narrow focus on content generation. DevOps assistants are more powerful if they also offer intent- and action-based experiences to trigger workflow steps and apply state changes. For example, imagine the experience if you merged Copilot's code authorship with a bi-directional conversational interface:
- You could ask the assistant to start processes on-demand, then be prompted to supply inputs when required.
- Developers would have self-service access to potentially sensitive tasks, such as requesting a deployment to production. AI would safely perform the action on their behalf, minimizing the risk of errors and establishing a safety barrier between the developer and the infrastructure. The AI assistant could also request a review from relevant team members before committing to the procedure to ensure everyone's informed of platform changes.
- AI could alert you in real time as monitoring metrics change. For example, you'd receive a message with a choice of immediate actions when deployments fail, a security breach is detected, or performance deviates from the baseline.
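The review step described in the list above can be sketched as a simple approval gate. This is a minimal illustration under stated assumptions, not how any particular assistant works: the `DeploymentRequest` class, the `execute` helper, and the reviewer names are all hypothetical, and the `deploy` callback stands in for whatever actually touches the infrastructure.

```python
from dataclasses import dataclass, field


@dataclass
class DeploymentRequest:
    """Hypothetical record of a developer's request to deploy."""
    requester: str
    target: str
    required_reviewers: frozenset
    approvals: set = field(default_factory=set)

    def approve(self, reviewer: str) -> None:
        # Only the listed reviewers may approve the request.
        if reviewer not in self.required_reviewers:
            raise PermissionError(f"{reviewer} is not a reviewer for this request")
        self.approvals.add(reviewer)

    @property
    def approved(self) -> bool:
        # Approved once every required reviewer has signed off.
        return self.approvals >= self.required_reviewers


def execute(request: DeploymentRequest, deploy) -> str:
    """Run the deployment only after all required reviews are in."""
    if not request.approved:
        pending = sorted(request.required_reviewers - request.approvals)
        return f"waiting for review from: {', '.join(pending)}"
    deploy(request.target)
    return f"deployed to {request.target}"


# Usage: the deploy callback here only records the target (an assumption
# made so the sketch runs without any real infrastructure).
deployed = []
request = DeploymentRequest("dev-1", "production", frozenset({"alice", "bob"}))
print(execute(request, deployed.append))
request.approve("alice")
request.approve("bob")
print(execute(request, deployed.append))
```

The point of the design is that the developer never calls `deploy` directly; the assistant holds that capability and releases it only when the review condition is met.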
Importantly, these capabilities aren't replacing humans or fundamentally changing their role. This form of AI augments engineering abilities by handling the mundane and consistently enforcing safety mechanisms. It frees up DevOps teams to complete more meaningful work in less time.
The Future of DevOps With Generative AI
There's huge potential for generative AI to redefine how DevOps works. Here are three specific areas where it will dominate.
1. Automatic Failure Detection, With Suggested Remedies
Failures are a constant problem for developers and operators alike. They're unpredictable interruptions that force an immediate context switch to prioritize a fix. Unfortunately, this hinders productivity, slows release schedules, and causes frustration when remedial work doesn't go as planned.
AI agents can detect faults and investigate their causes. Moreover, they can combine their analysis with generative capabilities and knowledge of past failures to suggest immediate actions within the context where the alert's displayed.
Consider a simple Kubernetes example: The assistant notices that production is down; realizes the Pod has been evicted due to resource constraints; and provides action buttons to restart the Pod, scale the cluster, or terminate other disused resources. The team can resolve the incident with a single click instead of spending several minutes manually troubleshooting.
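As a rough sketch of the scenario above, the snippet below maps a Pod status to a list of suggested actions. It assumes the status has already been fetched and converted to a dictionary; Kubernetes reports an evicted Pod with phase `Failed` and reason `Evicted`. The `Remedy` class, the `suggest_remedies` function, and the action identifiers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Remedy:
    label: str   # text shown on the action button
    action: str  # hypothetical identifier of the workflow to trigger


def suggest_remedies(pod_status: dict) -> list:
    """Return suggested actions for a Pod status dictionary."""
    # An evicted Pod is reported with phase "Failed" and reason "Evicted".
    evicted = (
        pod_status.get("phase") == "Failed"
        and pod_status.get("reason") == "Evicted"
    )
    if not evicted:
        return []
    # The three options named in the example above.
    return [
        Remedy("Restart the Pod", "restart_pod"),
        Remedy("Scale the cluster", "scale_cluster"),
        Remedy("Terminate disused resources", "cleanup_resources"),
    ]


# Usage with a hand-written status (illustrative values only).
status = {"phase": "Failed", "reason": "Evicted"}
for remedy in suggest_remedies(status):
    print(f"[{remedy.label}] -> {remedy.action}")
```

A real assistant would rank these options using its knowledge of past incidents; the fixed list here only shows the shape of the response.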
2. On-Demand Code/Config Generation and Deployment
Generative AI's ability to author code provides incredible value. Layering in conversational intents makes it more accessible and convenient. For example, you can ask an AI agent to set up a new project, config file, or Terraform state definition by writing a brief message into a chat interface. The agent can prompt you to supply values for any template placeholders, then notify appropriate stakeholders that the content's ready for review.
After approval's been obtained, AI can inform the original developer, launch the project into a live environment, and provide a link to view the deployment and start iterating upon it. This condenses several distinct sequences into one self-service action for developers. Ops teams don't need to manually provision the project's resources beforehand, allowing them to stay focused on their own tasks.
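The placeholder step described above can be sketched with Python's standard `string.Template`. This is only an illustration: the template contents and the `missing_placeholders` and `render` helpers are hypothetical, and a real agent would generate the template rather than hard-code it.

```python
from string import Template

# Hypothetical config template; the keys are illustrative only.
CONFIG_TEMPLATE = Template(
    "service: ${service_name}\n"
    "region: ${region}\n"
    "replicas: ${replicas}\n"
)


def missing_placeholders(template: Template, values: dict) -> list:
    """List placeholder names that still need a value, in template order."""
    names = []
    for match in template.pattern.finditer(template.template):
        # A placeholder is written either as $name or as ${name}.
        name = match.group("named") or match.group("braced")
        if name and name not in values and name not in names:
            names.append(name)
    return names


def render(template: Template, values: dict) -> str:
    """Fill the template, refusing to emit a config with gaps."""
    missing = missing_placeholders(template, values)
    if missing:
        raise ValueError(f"values still needed: {', '.join(missing)}")
    return template.substitute(values)


# Usage: the agent would ask the user for each missing name before rendering.
answers = {"service_name": "api"}
print(missing_placeholders(CONFIG_TEMPLATE, answers))
answers.update({"region": "eu-west-1", "replicas": 2})
print(render(CONFIG_TEMPLATE, answers))
```

Checking for missing values before rendering gives the agent a natural list of questions to ask, and it keeps half-filled configs from reaching reviewers.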
3. Prompt-Driven On-Demand Workflow Management
The next generation of AI agents goes beyond simple text and image creation to support fully automated, prompt-driven workflows. For example, bi-directional AI lets you start processes using natural language, such as typing "restart the production cluster" to interact with your AWS ECS resources. The AI doesn't need to be told which platform you're using or the specific steps it should run. At Kubiya.ai, we are already taking full advantage of this and now offer our customers the option to create any DevOps workflow via natural language prompts.
These agents' language models are trained against the vocabularies of your cloud services. When you ask for a cluster to be restarted, the agent interprets your words using its domain knowledge. For example, it knows that your "production" cluster runs on AWS and that it must retrieve the cluster's details, then make the correct API calls to restart it, such as ecs.UpdateService. Your words are directly translated into fully functioning workflows.
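To make the translation from words to API calls concrete, here is a deliberately simplified sketch. Naive keyword matching stands in for the language model, and the `ENVIRONMENTS` inventory, the `ApiCall` class, and the `plan_workflow` function are hypothetical. `ListServices` and `UpdateService` (with `forceNewDeployment`) are real ECS operations, but the sketch only builds a plan and never calls AWS.

```python
from dataclasses import dataclass, field


@dataclass
class ApiCall:
    operation: str
    params: dict = field(default_factory=dict)


# Hypothetical inventory describing where each environment runs.
ENVIRONMENTS = {
    "production": {"cluster": "prod-cluster", "services": ["web", "worker"]},
}


def plan_workflow(prompt: str) -> list:
    """Turn a prompt such as 'restart the production cluster' into a call plan."""
    words = prompt.lower().split()  # no punctuation handling in this sketch
    if "restart" not in words:
        raise ValueError("only 'restart' requests are handled in this sketch")
    env = next((name for name in ENVIRONMENTS if name in words), None)
    if env is None:
        raise ValueError("no known environment mentioned in the prompt")
    target = ENVIRONMENTS[env]
    # Step 1: retrieve the cluster's details.
    plan = [ApiCall("ecs.ListServices", {"cluster": target["cluster"]})]
    # Step 2: force a new deployment of each service to restart its tasks.
    # In a real run the names would come from the ListServices response;
    # they are read from the inventory here so the sketch runs offline.
    for service in target["services"]:
        plan.append(ApiCall("ecs.UpdateService", {
            "cluster": target["cluster"],
            "service": service,
            "forceNewDeployment": True,
        }))
    return plan


for call in plan_workflow("restart the production cluster"):
    print(call.operation, call.params)
```

Producing a plan before executing anything also gives the agent something to show reviewers, which fits the approval flow described earlier in the article.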
Furthermore, the bi-directional aspect means the AI agent becomes even more capable over time. Once you've started running your workflows, the agent trains against them, too, allowing it to suggest similar processes for future scenarios and describe what each workflow actually does.
This approach lets devs do more without involving ops teams. The AI agent mediates between humans and infrastructure platforms, allowing anyone to initiate workflows consistently and without compromising security. As part of the workflow, the agent can prompt for input at relevant points, such as requesting you to select a cloud account, data center region, machine type, and pricing tier when you ask it to "add a new virtual machine."
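The input prompting described above can be sketched as a loop that keeps asking until every required value has been supplied. The field names, the question wording, and the `collect_inputs` helper are assumptions made for illustration; the `ask` callback stands in for the chat interface.

```python
from typing import Callable, Optional

# Inputs needed for "add a new virtual machine", in the order they are asked.
REQUIRED_INPUTS = {
    "cloud_account": "Which cloud account should be used?",
    "region": "Which data center region?",
    "machine_type": "Which machine type?",
    "pricing_tier": "Which pricing tier?",
}


def next_question(answers: dict) -> Optional[tuple]:
    """Return (field, question) for the first unanswered input, or None."""
    for field_name, question in REQUIRED_INPUTS.items():
        if field_name not in answers:
            return field_name, question
    return None


def collect_inputs(ask: Callable[[str], str]) -> dict:
    """Ask for each missing input in turn and return the collected answers."""
    answers = {}
    while (pending := next_question(answers)) is not None:
        field_name, question = pending
        answers[field_name] = ask(question)
    return answers


# Usage with scripted replies in place of a live conversation.
replies = iter(["team-sandbox", "eu-west-1", "small", "on-demand"])
print(collect_inputs(lambda question: next(replies)))
```

Keeping the required inputs in one table means the same loop can serve other workflows by swapping in a different set of questions.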
The Takeaway: Generative AI Safely Accelerates Your Work
DevOps use cases for generative AI accelerate primary tasks while increasing accessibility, security, and reliability. In addition, they empower developers to focus on moving forward with new functionality instead of repeatedly running familiar processes and waiting for results.
Agents that are intelligent enough to sustain a conversation act like another member of your team. They support developers who could be unfamiliar with certain tools while ensuring that the organization's security and compliance policies are fully adhered to. These safeguards protect the codebase and give developers the confidence that they can initiate any workflow. In addition, reducing the number of interactions with the DevOps team enhances efficiency, tightening the feedback loop.
Generative AI isn't a static experience either. It gets better over time as it analyzes interactions to more accurately establish user intent. For example, if recommendations aren't suitable the first time you type a query, you can expect them to be improved as you and others repeat the request and take different courses of action.
AI agents also compensate for gaps in human knowledge. They let developers start processes even when they're unfamiliar with some of the steps, tools, or terms involved. Given a question such as "Which instances have failed?", AI can fill in the gaps and work out that you're referring to the Kubernetes Pods in your production cluster. These capabilities let AI effectively supplement human abilities, making it a source of supportive hints for the team.
ROI Is Critical With Generative AI
Organizations that use AI regularly will likely have the best results because their agents will become more adept at anticipating their requirements. However, it's also important not to overreach as you add AI to your workflows. The most successful adoptions will be focused on solving a genuine business need. First, assess your processes to identify bottlenecks between dev and ops teams, then target those repetitive use cases with AI.
The solution you select should help you reach your KPIs, such as closing more issues or resolving incidents faster. Otherwise, the AI agent will be underused, hindering your natural operating procedures.
Generative AI is one of today's most quickly maturing technologies. ChatGPT has attained a degree of virality as more researchers, consumers, and organizations begin exploring its capabilities. DALL-E 2 has delivered similarly spectacular results, while over 1.2 million developers used GitHub Copilot during its first 12 months.
All three technologies demonstrate clear revolutionary potential, but it's the mixed and highly complex workflows of DevOps that could benefit the most in the long term. DevOps combines the creation of new assets, such as code and configs, with sequential processes like deployment approvals and review requests, so it stands to gain from both generation and automation.
Contrary to some outsider projections, generative AI for DevOps will go beyond mere templating of common file snippets to offer full workflow automation. Using simple conversational phrases, you can instruct your agent to take specific actions on your behalf, from provisioning new cloud resources to checking performance in production. As a result, the agent will provide a real-time bi-directional feedback loop that improves collaboration, boosts productivity, and reduces the everyday pressures faced by devs.