AIGenOps: Generative AI and Platform Engineering
A While Ago...
We have been collaborating with a client in finance for some time now, and during a relaxed moment we started discussing generative artificial intelligence. Caught up in the excitement, each idea feeding the next like a positive feedback loop, we began to sketch out how to integrate generative AI into the real-world scenario we were working in.
Combining a DevOps engineer's LLM/AI skills and knowledge with a platform engineer's vision, we began to define the requirements, constraints, and workloads of a real scenario in regulated software, and then to outline possible processes and solutions.
But in Which Context?
In the domain of regulated software, especially in the banking environment where we work, there are security, quality, and network constraints. On top of these come very high CI/CD workloads, already overloaded developers, and the need to manage costs. This led to the following initial requirements:
- On-prem system, managed VM, or private cloud, potentially air-gapped
- No performance drops in CI and CD pipelines
- A zero-trust model with developer approval
- Selection of the components to act on
- Limits on the number of generated objects
The last two points are intended not as hard constraints but as common-sense practices for addressing the issues more effectively.
Which in Detail...
In regulated software, network reachability is limited and the code may contain trade secrets, so you do not want your data and code sent outside to unsecured or unverified systems. The system should therefore be hosted on private machines in well-segregated networks. Generative AI processes consume a lot of resources and can take a long time to run. To limit the impact on time and performance, they cannot be introduced into the CI/CD cycle, so we assumed an asynchronous, independent “continuous generative loop.”
Because the system is subject to certification and verification and must limit the introduction of unwanted changes, it has to follow a zero-trust model: the “continuous generative loop” proposes pull requests (also referred to below as PRs) that a manager must validate and approve. Given these assumptions, and remembering that one of the principles behind platform engineering is “starting with the dev,” we cannot generate thousands of lines of code across every application, which would also drive up processing cost and time. The generation step should therefore:
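To illustrate keeping generation out of the pipelines, here is a minimal Python sketch of an asynchronous loop that runs on its own schedule. Everything in it is hypothetical: `generate_proposal` stands in for a call to a self-hosted model, and the `max_concurrent` cap is an assumed way to bound resource use on shared private hardware.

```python
import asyncio
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Proposal:
    """A generated change waiting to become a pull request (hypothetical type)."""
    app: str
    summary: str


async def generate_proposal(app: str) -> Proposal:
    """Hypothetical stand-in for a call to a self-hosted model."""
    await asyncio.sleep(0.01)  # simulates slow inference
    return Proposal(app=app, summary=f"proposed change for {app}")


async def generative_cycle(apps: list[str], max_concurrent: int = 2) -> list[Proposal]:
    """One pass of the loop over the selected apps, with bounded concurrency."""
    sem = asyncio.Semaphore(max_concurrent)

    async def worker(app: str) -> Proposal:
        async with sem:  # assumed cap on jobs sharing the private hardware
            return await generate_proposal(app)

    # gather returns results in the same order as the input apps
    return list(await asyncio.gather(*(worker(a) for a in apps)))


async def continuous_generative_loop(
    pick_apps: Callable[[], list[str]], cycles: int, interval_s: float
) -> list[Proposal]:
    """Runs on its own schedule, outside any CI/CD pipeline; no build waits on it."""
    proposals: list[Proposal] = []
    for _ in range(cycles):
        proposals.extend(await generative_cycle(pick_apps()))
        await asyncio.sleep(interval_s)  # wait before the next pass
    return proposals


if __name__ == "__main__":
    result = asyncio.run(
        continuous_generative_loop(lambda: ["payments", "ledger"], cycles=1, interval_s=0)
    )
    for p in result:
        print(p.summary)
```

Because the loop is started separately (for example, by a scheduler on the private VM), a slow model call delays only the next proposal, never a build.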
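A zero-trust gate can be sketched as a small state machine in which generated changes only ever arrive as pending pull requests. The `PullRequest` type and the `approve`/`reject`/`merge` functions below are hypothetical and do not model any real Git hosting API; in practice, branch-protection rules on the Git server would typically enforce a similar policy.

```python
from dataclasses import dataclass, field
from enum import Enum


class PRState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    MERGED = "merged"


@dataclass
class PullRequest:
    """A PR opened by the generative loop (hypothetical model, not a real Git host API)."""
    app: str
    author: str
    state: PRState = PRState.PENDING
    approvals: list[str] = field(default_factory=list)


def approve(pr: PullRequest, reviewer: str) -> None:
    # Zero trust: the account that generated the change can never approve it.
    if reviewer == pr.author:
        raise PermissionError("the author cannot approve its own pull request")
    if pr.state is not PRState.PENDING:
        raise ValueError(f"cannot approve a PR in state {pr.state.value}")
    pr.approvals.append(reviewer)
    pr.state = PRState.APPROVED


def reject(pr: PullRequest) -> None:
    # A rejected proposal is simply discarded; nothing is merged.
    pr.state = PRState.REJECTED


def merge(pr: PullRequest) -> None:
    # Nothing reaches the target branch without an explicit human approval.
    if pr.state is not PRState.APPROVED:
        raise PermissionError(f"cannot merge a PR in state {pr.state.value}")
    pr.state = PRState.MERGED


if __name__ == "__main__":
    pr = PullRequest(app="payments", author="genai-bot")
    approve(pr, "alice")  # a human reviewer
    merge(pr)
    print(pr.state.value)  # merged
```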
Select and Prioritize Only Those Applications on Which To Take Action
For example, given three applications:
- One with about 85% coverage and a few code smells
- One with about 80% coverage, many code smells, and a few minor bugs
- One with 60% coverage and critical vulnerabilities
The system should prioritize the last application and work on it to even out the overall quality level across the application pool.
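The example above can be turned into a simple weighted score. This Python sketch is one possible heuristic, not the selection algorithm the next steps call for: the weights and the smell, bug, and vulnerability counts are invented for illustration, and only the coverage figures come from the example.

```python
from dataclasses import dataclass


@dataclass
class AppMetrics:
    name: str
    coverage: float      # line coverage, 0-100
    code_smells: int
    minor_bugs: int
    critical_vulns: int


# Hypothetical weights: one critical vulnerability outweighs many code smells.
WEIGHTS = {"coverage_gap": 1.0, "code_smells": 0.5, "minor_bugs": 2.0, "critical_vulns": 25.0}


def priority(m: AppMetrics) -> float:
    """Higher score means the app needs more help from the generative loop."""
    return (
        WEIGHTS["coverage_gap"] * (100.0 - m.coverage)
        + WEIGHTS["code_smells"] * m.code_smells
        + WEIGHTS["minor_bugs"] * m.minor_bugs
        + WEIGHTS["critical_vulns"] * m.critical_vulns
    )


def select_apps(pool: list[AppMetrics], top_n: int = 1) -> list[str]:
    """Return the names of the top_n apps, highest priority first."""
    ranked = sorted(pool, key=priority, reverse=True)
    return [m.name for m in ranked[:top_n]]


# Illustrative pool; counts other than coverage are made up.
pool = [
    AppMetrics("app-a", coverage=85, code_smells=3, minor_bugs=0, critical_vulns=0),
    AppMetrics("app-b", coverage=80, code_smells=20, minor_bugs=2, critical_vulns=0),
    AppMetrics("app-c", coverage=60, code_smells=5, minor_bugs=0, critical_vulns=2),
]

if __name__ == "__main__":
    print(select_apps(pool))  # ['app-c']
```

With these weights the scores are 16.5, 34.0, and 92.5, so the 60%-coverage app with critical vulnerabilities is picked first, matching the reasoning in the text. Tuning the weights is where the real selection policy would live.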
Limit the Objects Generated
When a manager or developer has to validate a pull request that contains a massive number of changes, the worst cases are that the PR is rejected outright or only skimmed, with the risk of letting errors through.
To keep generation in step with the developers' day-to-day work, activities must be selected and prioritized: target the few (hopefully!) most impactful bugs and vulnerabilities, or add tests for the least-covered, highest-impact class.
Selecting and prioritizing in this way means faster processing, lower costs, and acting only on the applications that really need external help, and, above all, not disrupting the developers' work.
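One way to cap what a single PR contains is a greedy selection under a review budget. In this hypothetical Python sketch, findings are ranked by severity (labels loosely modeled on those of common static-analysis tools) and added until either `max_items` or `max_lines`, an assumed estimate of the change size, would be exceeded.

```python
from dataclasses import dataclass

# Assumed severity scale; higher rank means more urgent.
SEVERITY_RANK = {"BLOCKER": 4, "CRITICAL": 3, "MAJOR": 2, "MINOR": 1}


@dataclass
class Finding:
    id: str
    kind: str        # "vulnerability", "bug", or "coverage"
    severity: str    # one of SEVERITY_RANK's keys
    est_lines: int   # estimated size of the generated change


def plan_pr(findings: list[Finding], max_items: int = 3, max_lines: int = 200) -> list[Finding]:
    """Greedily pick the most severe findings while keeping the PR small enough to review."""
    chosen: list[Finding] = []
    lines = 0
    # sorted() is stable, so findings of equal severity keep their input order
    for f in sorted(findings, key=lambda f: SEVERITY_RANK[f.severity], reverse=True):
        if len(chosen) == max_items:
            break
        if lines + f.est_lines > max_lines:
            continue  # skip changes that would exceed the review budget
        chosen.append(f)
        lines += f.est_lines
    return chosen


if __name__ == "__main__":
    findings = [
        Finding("S1", "bug", "MINOR", 10),
        Finding("V1", "vulnerability", "CRITICAL", 150),
        Finding("V2", "vulnerability", "BLOCKER", 40),
        Finding("C1", "coverage", "MAJOR", 80),
        Finding("B2", "bug", "MAJOR", 5),
    ]
    print([f.id for f in plan_pr(findings)])  # ['V2', 'V1', 'B2']
```

The coverage task `C1` is skipped here only because it would push the PR past the line budget; it stays available for the next cycle of the loop.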
What Next?
The next steps are:
- Define the application prioritization and selection algorithms
- Define the selection and prioritization algorithms for quality/vulnerability fixes and code coverage
- Following the principles of innovation management and platform engineering, identify early adopters and pioneers to build a usable solution in a real-world environment, with skilled developers collaborating on the development that best serves the end user
In Conclusion
It is possible to introduce generative AI into an internal developer platform (IDP) in a regulated context, respecting all the constraints and requirements of the environment, without neglecting end users and their experience with the system.
Opinions expressed by DZone contributors are their own.