Artificial intelligence (AI) and machine learning (ML) are two fields that work together to create computer systems capable of perception, recognition, decision-making, and translation. Separately, AI is the ability of a computer system to mimic human intelligence through math and logic, while ML builds on AI with methods that "learn" from experience rather than from explicit instruction. In the AI/ML Zone, you'll find resources ranging from tutorials to use cases that will help you navigate this rapidly growing field.
Fortifying Cloud Security Operations with AI-Driven Threat Detection
User and Entity Behavior Analytics (UEBA) is a security layer that uses machine learning and analytics to detect threats by analyzing patterns in user and entity behavior.

Here's an oversimplified example of UEBA: suppose you live in Chicago. You've lived there for several years and rarely travel. But suddenly there's a charge to your credit card from a restaurant in Italy. Someone is using your card to pay for their lasagna! Luckily, your credit card company recognizes the behavior as suspicious, flags the transaction, and stops it from settling.

This is easy for your credit card company to flag: they have plenty of historical information on your habits and have created a set of logical rules and analytics for when to flag your transactions. But most threats are not this easy to detect. Attackers are continuously becoming more sophisticated and learning to work around established rules. As a result, traditional UEBA that relies primarily on static, rigid rules is no longer enough to protect your systems.

The End of Traditional UEBA — or, Why Your UEBA No Longer Works

Many UEBA tools were built around static rules and predefined behavioral thresholds. Those approaches were useful for catching predictable, well-understood behavior patterns, but they fall short in modern environments where user activity, applications, and attacker behavior change constantly. Soon, AI agents will act on behalf of users and introduce even more diversity.

Here are the main limitations of traditional, static-rule-driven UEBA:

- Static behavioral thresholds don't adapt to real user behavior over time. They rely on fixed assumptions (e.g., "alert if X happens more than Y times"), which quickly become outdated as user behavior and environments evolve.
- Rules require continuous manual tuning. Security teams spend time chasing false positives or rewriting rules as user behavior changes, which slows response and increases operational overhead.
- Isolated detection logic lacks context. Legacy UEBA often analyzes events in silos instead of correlating activity across identity, endpoint, network, and application layers. This limits the ability to detect subtle behavioral anomalies.

As a result, certain types of threats that blend into normal activity can go unnoticed despite the presence of rules. So if legacy UEBA is not effective, what's the solution?

What Modern Enterprises Actually Need from UEBA

Modern enterprises need UEBA systems that can do three things:

- Immediately detect attacks. When attackers can morph in an instant, you need a security layer that moves just as fast.
- Recognize attacks that are highly sophisticated and complex. Attacks are no longer simple enough to be caught by a set of rules — even advanced ones backed with behavioral analytics.
- Integrate seamlessly with existing security operations.

Let's look at each in more detail and how it can be achieved.

Immediately Detect Attacks (Without a Long Training Period)

Traditional UEBA training periods leave organizations vulnerable for months and chronically behind on detecting the latest threats. A typical three- to six-month learning period creates a huge security gap. Day-one detection of behavioral threats and compromised accounts requires first-seen and outlier rules that can spot anomalous behavior immediately, without waiting for machine learning models to mature. You need near-instant behavioral baselines. How?
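Before answering that, it helps to see how simple the detection side is once a baseline exists. Here is a minimal, hypothetical Python sketch of a first-seen rule; the event fields, baseline source, and alert logic are placeholders for illustration, not any vendor's implementation:

```python
# Minimal first-seen rule sketch. Assumes `historical_events` was loaded from
# your SIEM; the fields (user, country) are hypothetical.
from collections import defaultdict

historical_events = [
    {"user": "alice", "country": "US"},
    {"user": "alice", "country": "US"},
    {"user": "bob", "country": "DE"},
]

# Build the baseline: which countries has each user logged in from before?
seen = defaultdict(set)
for event in historical_events:
    seen[event["user"]].add(event["country"])

def is_first_seen(event):
    """Flag logins from a country this user has never logged in from."""
    return event["country"] not in seen[event["user"]]

new_event = {"user": "alice", "country": "IT"}
if is_first_seen(new_event):
    print(f"ALERT: first-seen country {new_event['country']} for {new_event['user']}")
```

A real system would track many more attributes (devices, applications, proxy categories) and score novelties rather than alerting on every one, but the underlying check is this simple once a baseline exists.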
Luckily, most organizations already have the data they need: years of historical log information sitting in their security information and event management (SIEM) systems. Modern UEBA systems use this information to create behavioral baselines almost instantly. For example, companies that advocate the "log everything" approach offer tools that turn the data you already have into powerful baselines in just minutes.

Detect Highly Sophisticated Attacks

Today's attacks blend in with normal business operations. Correlation rules miss behavioral threats that show only subtle anomalies; they can't identify compromised accounts or insider threats that perform normal-looking activities. Modern UEBA solutions must be able to detect first-seen activities, such as unusual file sharing via OneDrive, access to new proxy categories, and suspicious cloud service usage that doesn't match historical user behavior.

This comes down to using the right tools. For example, Microsoft Sentinel can identify unusual Azure identity behaviors, such as abnormal cloud service access patterns that could indicate account compromise or data exfiltration. And Sumo Logic has first-seen and outlier rules that can detect when an attacker is trying to use a network sniffing tool; they catch endpoint enumeration and suspicious email forwarding rules that deviate from established patterns.

Integration With Existing Security Operations

UEBA delivers meaningful value when it fits naturally into existing security workflows. Security teams rely on SIEM, SOAR, identity systems, and endpoint tools to build a complete picture of activity across their environment. UEBA works best when its behavioral insights are delivered in the same place analysts already investigate and respond to alerts.

Effective UEBA solutions, therefore, integrate directly with the broader security platform, allowing behavioral anomalies to be correlated with logs, identity events, and threat intelligence. This unified context helps analysts make faster, more accurate decisions without switching tools or managing separate consoles.

Flexibility is also important. Organizations must be able to adjust detection logic and behavioral thresholds to match their environment, risk tolerance, and operational needs. When UEBA is tightly integrated and adaptable, it becomes an extension of the security workflow rather than an additional system to maintain.

UEBA as the Foundation for AI Security Agents

UEBA hasn't been replaced by AI. Instead, UEBA has become the way to train AI. AI-powered detection and response solutions perform best when they ingest clean, comprehensive behavior baselines, and that's exactly what mature UEBA can provide.

AI Agents Need Quality Behavioral Baselines

AI security agents aren't magic. They follow the GIGO (garbage in, garbage out) principle just like any other data-intensive system. Feed an AI agent high-quality behavioral data, and it will thrive. But if you feed it insufficient or poor-quality data, you'll become part of the 95% of AI pilots that fail to deliver real business value.

Structured UEBA rules also give the agents specialist knowledge, such as who should log in where, how often a service account connects to S3, and typical overnight file volumes. AI agents can learn (and extend) these rulesets.

AI Detects, Then AI Responds

Security teams often drown in alerts, and they can't keep up. But when UEBA feeds AI:
- First-seen rules become automatic triggers. Instead of waiting for an analyst, an agent can begin gathering data and context within seconds.
- AI can rank threats, helping to make sure human attention goes to the events with the biggest deviation or highest blast radius.
- Entity relationship maps derived from UEBA help agents model lateral-movement risk and choose containment tactics (for example: quarantine the host, revoke credentials, etc.).

Once the system can reliably detect threats, you can take it to the next level and allow your AI agents to take action, too.

From UEBA Rules to Autonomous Security Operations

Manual security operations have a scaling problem. They can't keep pace with modern threat volumes and complexity. As a result, organizations miss threats or burn out security analysts with overwhelming alert fatigue.

But with UEBA first-seen rules, AI agents can immediately begin collecting evidence and correlating events when anomalous behavior is detected. Outlier detection can feed AI-driven risk scoring and prioritization algorithms. And behavioral baselines can ensure that automated responses are based on a solid understanding of what constitutes legitimate versus suspicious activity within the specific organizational context.

You can still have a human in the loop, authorizing each action recommended by the AI system. Eventually, you may delegate action to the AI system as well, with humans merely being notified after the fact.

Building AI-Ready Behavioral Foundations Now

Modern UEBA platforms are already generating AI-compatible behavioral data. These platforms structure their outputs in ways that AI agents can easily consume and act upon. For example:

- Ongoing discovery of the best ways to format and organize data so it fits optimally into the context of LLMs, and of how to provide them with effective tools.
- Signal-clustering algorithms reduce the noise that might confuse an AI agent's decision-making, ensuring that only meaningful behavioral anomalies reach automated systems for action.
- Rule customization and match lists provide structured data that AI agents can interpret and act upon, creating clear decision trees for autonomous responses.
- Historical baseline capabilities create rich training datasets without waiting months for AI deployment. Organizations can leverage years of existing log data, so AI agents can begin operating with sophisticated behavioral understanding from day one rather than starting with blank behavioral models.

With these capabilities already in place, organizations can transition smoothly from manual to automated security operations.

The Bottom Line

When implementing UEBA, focus on proven principles and actionable strategies:

1. Ensure comprehensive, high-quality data integration. Use all relevant data sources: existing logs, new telemetry, identity systems, endpoints, and cloud apps to build complete behavioral profiles. If critical data is missing, collect it and add it to the UEBA's scope. For example, a user's calendar showing a business trip to Tokyo is very relevant when the system detects login attempts from Japan.

2. Accelerate meaningful baselines using historical data and rapid observation periods. Leverage historical log data to establish baselines quickly, but expect this to take a couple of days to a few weeks. Supplement with fresh data as needed to ensure the baseline reflects current behaviors. For example, if an employee moves to a different team, the system should expect a change in behavior.

3. Integrate UEBA insights with your current security workflows.
UEBA should capitalize on your SIEM and other security tools to deliver high-impact alerts and operational value. Avoid requiring extensive new infrastructure unless necessary, and always align the solution to your organization's needs.

These approaches deliver immediate value and adapt to change, maximizing the coverage and accuracy of behavioral analytics.

Your success metrics matter just as much as your implementation. Track the following:

- How many sophisticated threats UEBA catches that your traditional systems miss
- The reduction in dwell time for compromised accounts
- Coverage improvements for lateral movement and unknown attack patterns
- Analyst efficiency gains from richer contextual alerts

These metrics prove value to stakeholders and help you continuously refine your approach.

While classic rule-based UEBA relied on manual configuration, today's UEBA platforms enhance these foundations with autonomous analytics using statistical models, adaptive baselines, and AI-driven outlier detection. Functions like first-seen and outlier detection do leverage rules, but they operate as part of a dynamic, context-aware system rather than a set of static match expressions. AI agents continuously monitor and analyze vast streams of behavioral data, learning what constitutes normal activity and identifying subtle anomalies that may indicate emerging threats. By correlating signals across users, devices, and time, these agents enable real-time, adaptive detection and response. This elevates security operations from manually maintained static rule matching to intelligent and proactive protection.
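As a rough sketch of what "adaptive baselines plus outlier scoring" means in practice, the following Python snippet ranks events by how far a metric deviates from each user's own history. Everything here (the metric, the data, the scoring) is hypothetical and only illustrates the idea of prioritizing by deviation:

```python
# Hypothetical outlier scoring against per-user baselines (z-score style).
import statistics

# Pretend history pulled from the SIEM: MB downloaded per user per day.
history = {
    "alice": [120, 95, 110, 130, 105],
    "bob": [20, 25, 18, 22, 30],
}

# Today's observations to score.
today = {"alice": 140, "bob": 900}

def outlier_score(user, value):
    """How many standard deviations is today's value from the user's baseline?"""
    baseline = history[user]
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline) or 1.0  # avoid division by zero
    return (value - mean) / stdev

# Rank by deviation so human (or agent) attention goes to the biggest anomaly first.
ranked = sorted(today.items(), key=lambda kv: outlier_score(*kv), reverse=True)
for user, value in ranked:
    print(f"{user}: {value} MB, score={outlier_score(user, value):.1f}")
```

A production platform recomputes baselines continuously and folds in many signals at once, but the prioritization logic that feeds an AI agent is, at heart, this kind of deviation ranking.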
If you read my articles published on DZone this year, you will have sensed that I love automation and that Infrastructure as Code (IaC) is my buddy for automating infrastructure provisioning. Recently, I started exploring the major shifts happening in the IaC landscape. As part of my weekend readings over the last couple of months, I came across several exciting announcements from HashiConf 2025, Pulumi's new AI capabilities, and a revolutionary platform called Formae. In this article, let's look at how IaC progressed in 2025 and how it helped automation, particularly for provisioning AI infrastructure.

Infrastructure as Code has transformed how we manage cloud resources, yet 2025 brought innovations that fundamentally changed the game. From AI-powered agents that write and deploy infrastructure code to stateless platforms that eliminate drift detection complexity, this year marked a turning point in infrastructure automation.

The State of IaC: Where We Stand Today

Before diving into specific tools and announcements, let's understand the current landscape. According to the State of IaC 2025 report, cloud complexity has grown for 65% of organizations. Only 6% achieved full cloud codification, meaning most infrastructure is still managed manually. Less than 33% continuously monitor drift, taking a reactive approach to infrastructure changes.

The report makes it clear that manual provisioning is legacy. Declarative configuration files are table stakes. The automation-first pipeline emerged as the gold standard, where infrastructure changes are treated the same way as code deployments: version-controlled, tested, reviewed, and automated.

HashiConf 2025: Major Announcements That Matter

September 2025 marked HashiConf's 10th anniversary in San Francisco. HashiCorp, now part of IBM, had several announcements that caught my attention.

Project Infragraph: Real-Time Infrastructure Intelligence

Project Infragraph represents a fundamental shift in infrastructure observability. Instead of piecing together data from multiple monitoring tools, teams get a unified view that understands relationships between resources. Project Infragraph enables infrastructure that can observe its own state, reason about optimal configurations, and act autonomously. The private beta launches in December 2025.

Source: https://newsroom.ibm.com/2025-09-25-hashicorp-previews-the-future-of-agentic-infrastructure-automation-with-project-infragraph

Terraform Stacks: General Availability

After months in public beta, Terraform Stacks reached general availability with backward-compatible APIs. The concept addresses a pain point I've experienced countless times: coordinating deployments across different teams, each managing their own state files.

Stacks use a component-based architecture. Here's a simple example showing how you define reusable components:

```hcl
# stack.tfcomponent.hcl
component "vpc" {
  source = "./modules/vpc"
  inputs = {
    region      = var.region
    environment = var.environment
  }
}

component "eks_cluster" {
  source = "./modules/eks"
  inputs = {
    vpc_id = component.vpc.vpc_id
    region = var.region
  }
  depends_on = [component.vpc]
}
```

What Changed in the GA Release?

All configuration files now use the .tfcomponent.hcl extension instead of .tfstack.hcl, providing a standardized naming convention. Deployment groups support new orchestration rules for better control over deployment order.
Destroy operations work through code instead of UI-only, giving teams version-controlled teardown workflows. Most importantly, Terraform manages dependency resolution automatically, eliminating manual orchestration.

What used to require careful orchestration and multiple deployment windows now happens with one action. Terraform handles orchestration, dependency resolution, and change propagation automatically. This makes managing complex multi-component infrastructures significantly simpler.

MCP Servers: Bridging AI and Infrastructure

HashiConf introduced Model Context Protocol (MCP) servers for Terraform, Vault, and Vault Radar. These MCP servers act as bridges between AI agents and existing infrastructure tools. Here's a simple example of how you might use MCP to interact with Terraform through natural language:

```python
# Example: Using MCP to trigger Terraform workspace runs
from mcp_client import MCPClient

client = MCPClient("terraform")

# Natural language request
response = client.execute(
    "Trigger a workspace run for the production environment and notify the team on Slack when complete"
)

# MCP translates this to Terraform API calls
# No need to write complex API integration code
print(f"Workspace run initiated: {response.run_id}")
```

You can now tell your AI assistant to trigger workspace runs, query secrets, or discover unmanaged resources without switching contexts or writing complex scripts. This dramatically reduces the friction in infrastructure operations.

Additional Features Worth Noting

HashiConf also announced several other features that reached general availability. Terraform search helps teams discover and import resources in bulk more efficiently. Azure Copilot with Terraform integration simplifies adoption without requiring deep Terraform knowledge. Hold Your Own Key gives organizations ownership of the encryption keys used to access sensitive data in HCP Terraform. HCP Waypoint provides application template catalogs, shielding developers from code-level infrastructure details.

Pulumi Neo: AI-Powered Infrastructure Agent

While Terraform continued its market dominance, Pulumi made serious waves with Neo, its AI infrastructure agent. After a long journey with Terraform, when HCL2 came out, I started exploring alternatives where I could use the programming language of my choice. That's when I found Pulumi.

Why Pulumi Matters

Pulumi is a modern Infrastructure as Code platform that enables developers to create, deploy, and manage cloud resources using familiar programming languages instead of domain-specific languages. Instead of learning HCL, you can use TypeScript, Python, Go, C#, Java, or even YAML. This means full IDE support with code completion, error checking, and refactoring capabilities that come naturally with general-purpose programming languages.

Here's a comparison of the same infrastructure in Terraform vs. Pulumi. First, the Terraform approach:

```hcl
resource "aws_s3_bucket" "data_bucket" {
  bucket = "my-data-bucket"
  tags = {
    Environment = "Production"
  }
}
```

And the same in Pulumi:

```python
import pulumi
import pulumi_aws as aws

data_bucket = aws.s3.Bucket(
    "data-bucket",
    bucket="my-data-bucket",
    tags={"Environment": "Production"}
)

pulumi.export("bucket_name", data_bucket.id)
```

Neo: The AI Infrastructure Agent

Neo represents Pulumi's answer to the "velocity trap," where AI coding assistants make developers faster but infrastructure teams can't keep up.

[Diagram: Neo request flow]

Neo offers progressive autonomy.
Development environments might permit fully autonomous operation, like daily waste cleanup and weekly drift reconciliation. Production changes may require human approval. When Neo encounters unexpected states or errors, it can self-diagnose or loop in a human for assistance. As confidence builds, the autonomy boundary expands.

Formae: Rethinking IaC Fundamentals

In October 2025, Platform Engineering Labs launched Formae, challenging fundamental assumptions about how IaC should work. Let's look at how it uses PKL and introduces a stateless approach.

The Problems Formae Solves

State file corruption and drift detection have plagued infrastructure teams forever. You know the scenario: someone makes a manual change in the console, your Terraform state drifts, and now you're spending hours reconciling reality with what your code thinks exists.

Traditional IaC tools require importing existing resources through a painful manual process, maintaining state files with corruption risk, detecting drift reactively, and reconciling manual changes in a time-consuming manner. Formae eliminates these issues by making reality itself the state.

Metastructure: A New Concept

Formae introduces "Metastructure," which combines infrastructure configuration with operational logic. Traditional IaC uses static configuration and planned state files, requires manual imports, performs periodic drift detection, and only manages tool-specific resources. Formae's Metastructure combines configuration with operational logic, uses reality as state, provides automatic discovery, performs continuous synchronization, and discovers all resources regardless of creation method.

Here's a PKL configuration example:

```pkl
// infrastructure.pkl
module infrastructure

import "pkl:aws"

vpc: aws.VPC {
  cidrBlock = "10.0.0.0/16"
  tags {
    ["Name"] = "production-vpc"
  }
}

instances: Listing<aws.EC2.Instance> {
  new {
    instanceType = "t3.large"
    vpcId = vpc.id
  }
}
```

Brownfield Environments

Formae excels in brownfield environments where existing infrastructure needs code management. For existing AWS resources, traditional approaches require manually importing each resource, while Formae simply runs formae extract --target aws. For resources created via the console, traditional tools show drift detection alerts, but Formae automatically codifies and merges them. When using multiple management tools, traditional approaches face import conflicts, but Formae co-exists with all of them. With traditional tools, the team learning curve is steep because of tool-specific import syntax; Formae makes discovery automatic, with no imports needed.

Example workflow:

```shell
# Discover existing infrastructure
formae extract --target aws --output current-infrastructure.pkl

# Review and modify
vim current-infrastructure.pkl

# Apply changes
formae apply current-infrastructure.pkl
```

Formae automatically discovers and codifies existing infrastructure, eliminating painful import processes.

AI Infrastructure Provisioning: The Driving Force

Much of the IaC innovation in 2025 came from one driving force: provisioning and managing AI infrastructure at scale. Training frontier AI models requires coordination that makes traditional deployments look simple.

The AI Infrastructure Challenge

Let's understand the complexity through a diagram showing the full AI training infrastructure stack:

[Diagram: AI infrastructure provisioning]

You're dealing with petabytes of data preparation across thousands of CPU cores. Massive GPU clusters running hot for months.
Checkpoint management, where losing hours of training because you didn't save state properly costs real money. Gradient synchronization across hundreds of GPUs. Fault-tolerant scheduling, where hardware failures become statistical certainties rather than edge cases.

Traditional applications use CPU-based compute running for minutes to hours, deal with gigabytes of data, use standard retry logic for fault tolerance, have predictable costs, and scale horizontally. AI training requires GPU clusters with hundreds of GPUs, runs for days to months, handles petabytes of data, needs checkpoint-based recovery with resumption on failure, has extremely high costs requiring constant optimization, and uses specialized distributed training approaches. If you are new to the world of processing units, learn about CPUs vs. GPUs vs. TPUs in this article.

Here's provisioning a GPU training environment with Pulumi:

```python
import pulumi
import pulumi_gcp as gcp

cluster = gcp.container.Cluster(
    "ai-training-cluster",
    initial_node_count=1,
    remove_default_node_pool=True,
    location="us-central1-a"
)

gpu_node_pool = gcp.container.NodePool(
    "a100-node-pool",
    cluster=cluster.name,
    location=cluster.location,
    node_count=10,
    node_config=gcp.container.NodePoolNodeConfigArgs(
        machine_type="a2-highgpu-8g",
        guest_accelerators=[
            gcp.container.NodePoolNodeConfigGuestAcceleratorArgs(
                type="nvidia-tesla-a100",
                count=8
            )
        ]
    )
)

pulumi.export("cluster_name", cluster.name)
```

Platform Engineering: The Abstraction Layer

Platform engineering emerged as a discipline providing self-service infrastructure catalogs. Instead of learning Terraform or Pulumi directly, developers select pre-built templates for common use cases, customize a few parameters, and get infrastructure that meets organizational standards.

The platform engineering stack consists of multiple layers working together. The self-service portal layer uses tools like Backstage, Port, and Humanitec to provide the developer interface. The IaC templates layer leverages Terraform modules and Pulumi components as reusable infrastructure patterns. Policy enforcement happens through OPA, Sentinel, or CrossGuard for governance and compliance. Deployment automation uses ArgoCD, Flux, or HCP Waypoint for GitOps workflows. Cost management relies on tools like Cloudability and Kubecost for spending visibility.

Here's a self-service database template:

```python
import pulumi
import pulumi_aws as aws

class DatabaseService(pulumi.ComponentResource):
    def __init__(self, name, args, opts=None):
        super().__init__('custom:database:Service', name, None, opts)
        self.db = aws.rds.Instance(
            f"{name}-db",
            engine="postgres",
            instance_class=args.get("instance_class", "db.t3.medium"),
            allocated_storage=args.get("storage_gb", 100),
            storage_encrypted=True,
            multi_az=args.get("environment") == "production",
            opts=pulumi.ResourceOptions(parent=self)
        )
```

Developers use this without understanding RDS details:

```python
user_database = DatabaseService(
    "user-service-db",
    args={
        "database_name": "users",
        "team": "backend-team",
        "environment": "production"
    }
)
```

Security and Compliance in the AI Era

With AI tools generating more infrastructure code than ever, security validation has become critical. Google reports that 25% of its new code comes from AI, making automated security validation non-negotiable.
Essential security tools include Checkov for static analysis of misconfigurations, tfsec for Terraform-specific security scanning, Terrascan for policy-as-code security, OPA for runtime policy enforcement, and Sentinel as HashiCorp's policy framework. These tools integrate at different points: Checkov and tfsec run in pre-commit hooks and CI/CD pipelines, OPA validates at runtime, and Sentinel enforces policies within HCP Terraform.

Here's an example of how you might configure Checkov for your infrastructure repository:

```yaml
# .checkov.yml
branch: main
download-external-modules: true
framework:
  - terraform
  - terraform_plan
  - cloudformation
soft-fail: false
check:
  - CKV_AWS_20   # S3 bucket encryption
  - CKV_AWS_21   # S3 bucket versioning
  - CKV_AWS_19   # S3 bucket logging
  - CKV_AWS_18   # S3 bucket access logging
  - CKV_AWS_145  # S3 bucket KMS encryption
  - CKV2_AWS_6   # S3 bucket public access block
skip-check:
  - CKV_AWS_23   # Skip unencrypted S3 for public static assets
output: cli
quiet: false
```

Comprehensive IaC Platform Comparison

OpenTofu, the open-source fork created after HashiCorp's license change, continued gaining traction in 2025 under the Linux Foundation. Organizations appreciated having a community-driven alternative without vendor lock-in concerns.

Terraform uses the Business Source License, which is proprietary, while OpenTofu uses the Mozilla Public License 2.0. Governance differs significantly: Terraform is controlled by HashiCorp, which is now owned by IBM, while OpenTofu operates under the Linux Foundation with community governance. Terraform includes proprietary additions in its feature set, while OpenTofu maintains community-driven development. Enterprise support for Terraform comes through HCP Terraform, while OpenTofu relies on third-party vendors. Both maintain robust provider ecosystems, though Terraform's is officially backed by HashiCorp while OpenTofu's is community-maintained.

Terragrunt also announced its own Stacks feature reaching GA in May 2025, providing orchestration capabilities for teams in the OpenTofu ecosystem. Gruntwork built Terragrunt Stacks through extensive community engagement, with the RFC gathering dozens of positive reactions and hundreds of comments from participants.
After exploring the major developments in 2025, here's a comprehensive comparison of leading IaC platforms:

| Feature | Terraform | OpenTofu | Pulumi | Formae | CloudFormation | Crossplane |
|---|---|---|---|---|---|---|
| Language | HCL (proprietary) | HCL (open) | TypeScript, Python, Go, C#, Java | PKL | YAML/JSON | YAML (CRDs) |
| License | BSL (proprietary) | MPL 2.0 (open) | Apache 2.0 | FSL → Apache 2.0 | Proprietary (AWS) | Apache 2.0 |
| State Management | Local/remote files | Local/remote files | SaaS backend | Stateless (reality = state) | AWS-managed | Kubernetes etcd |
| Resource Discovery | Manual import | Manual import | Manual import | Automatic | Manual import | Kubernetes-native |
| Drift Detection | Periodic checks | Periodic checks | Periodic checks | Continuous sync | AWS-only | Controller-based |
| Testing Support | Limited | Limited | Native unit/integration | Built-in | Limited | Kubernetes tests |
| IDE Support | Basic | Basic | Full (LSP, completion) | PKL tooling | Basic | YAML validation |
| Learning Curve | Learn HCL DSL | Learn HCL DSL | Use existing language | Learn PKL | Learn CFN syntax | Learn K8s + CRDs |
| Multi-Cloud | Excellent | Excellent | Excellent | Growing | AWS only | Good |
| Provider Ecosystem | 3000+ providers | Community-maintained | 150+ packages | Early stage | AWS services | 80+ providers |
| AI Capabilities | MCP servers | None | Neo agent | None | None | None |
| Governance | HashiCorp/IBM | Linux Foundation | Pulumi Corp | Platform Eng Labs | AWS | CNCF |
| Best For | Mature ecosystems | Open governance | Developer teams | Brownfield envs | AWS-only shops | K8s-centric orgs |
| Brownfield Support | Manual process | Manual process | Manual process | Excellent | Manual process | K8s resources only |
| Enterprise Features | HCP Terraform | Third-party | Pulumi Cloud | Coming | AWS Orgs | Enterprise distros |
| Cost | Free/Enterprise | Free | Free/Team/Enterprise | Open source | Free (AWS costs) | Free |
| Deployment Speed | Moderate | Moderate | Fast | Fast | Moderate | Moderate |
| Community Size | Very large | Growing | Medium | Small | AWS community | Growing |

The Challenges That Remain

Despite progress, significant challenges persist. Only 6% of organizations have achieved full codification. Configuration drift continues to plague teams. Multi-cloud complexity affects 65% of organizations. The human element remains crucial for defining policies, setting guardrails, and making architectural decisions.

Conclusion

The infrastructure community stands at an inflection point. Manual provisioning is legacy. The next frontier involves infrastructure that can observe its own state, reason about optimal configurations, and act autonomously. Project Infragraph represents this future. AI agents will reason about infrastructure state and act across the application lifecycle. These agents won't replace infrastructure engineers, but they'll handle the repetitive tasks that currently burn out teams.

As we close out 2025, one thing seems certain: infrastructure automation will only accelerate. The organizations that embrace these tools, invest in platform engineering, and leverage AI while maintaining proper guardrails will move faster than competitors still manually clicking through cloud consoles. The infrastructure has become code. Now the code is becoming intelligent.
C-suite executives are rushing to implement their AI transformation strategies. Visions of cost savings, streamlined workforces, and exploding productivity are making them foam at the mouth. Despite this AI feeding frenzy, however, many of the same execs are becoming disillusioned by the whole AI transformation boondoggle.

AI tools and access to popular large language models (LLMs) are running up costs. Software developers have devolved into vibe coders as they churn out massive quantities of error-prone, unmaintainable code. Knowledge workers are forgoing the thinking parts of their jobs to produce drivel masquerading as work output.

What gives? No one questions the transformative power of the technology. The business benefits are certainly right around the corner. Why, then, are AI transformation strategies foundering?

The simple answer: executives haven't taken the time to answer the basic question why. AI can certainly transform the business. But to what end? What are organizations trying to accomplish?

Déjà Vu All Over Again

If those questions elicit a sense of déjà vu, you're not alone. We've been down this road before. Remember digital transformation? Organizations of all sizes rushed to leverage digital technologies to transform their businesses — with many successes, to be sure, but with a commensurate number of failures.

What differentiated the successes was a clear statement of value — the why that we need now so desperately. Why digitally transform? The answer: to better meet customer needs.

Digital transformation has always been about the customer: how to leverage digital technologies to better align the business with customer needs and desires. The lesson for AI transformation is simple: the why should always be about the customer. We might even go so far as to say that AI transformation is, in fact, digital transformation — with "digital technologies" updated for the AI era.

Do You Know Who Your Customers Are?

To understand the full breadth of digital transformation — and, by extension, AI transformation — we must broaden our definition of customer.

First, customers are human. They may be individuals or groups of humans (as is the case for B2B), but customers aren't bots or AI agents. Behind every AI "customer" are humans with human needs and desires.

Second, we must extend our definition of customer to include employees. Employees provide value to their organizations and receive value in turn, just as paying customers do. In fact, including employees is critical for AI transformation success.

In other words, digital transformation (and, by extension, AI transformation) is about meeting the needs of all the humans that interact with your organization — putting those humans at the center of your business.

Why AI Transformation Has Gone So Wrong

For many organizations, AI transformation involves replacing human interactions and human activities with automated ones. We can save so much money by replacing human customer service reps with AI avatars or chatbots! We can improve productivity by replacing level 1 support with AI agents! Just think of all the money we'll save!

Not so fast. Efficiency and cost savings at the expense of the customer (including the employee) are a path to failure. Instead, put customers and employees first and build an efficient and cost-effective business around them. This approach leads to leveraging digital technologies only when appropriate to meet these goals — and now that approach includes AI.
Putting the Customer at the Center

Taking a customer-centric approach to AI transformation raises the bar on the effort your organization must undergo to achieve the goals of the transformation.

For example, ask yourself what the most delightful customer experience would be for someone reaching out to your contact center. If the customer wants an automated experience, make sure it is efficient, streamlined, and intelligent. But if the customer wants to interact with a human, then make sure that experience is wonderful. Support all communication channels with a unified support staff. Answer the phone (or chatbot or text, etc.) on the first ring. Make sure the person answering the phone can actually solve the customer's problem.

The point here is that AI can help you achieve these human interaction goals. There are tools today that turn your average customer service rep into a paragon of customer delight. Instead of disintermediating the rep, empower them. True, empowering human reps is more difficult than shifting all interactions to a chatbot — but taking this approach is the difference between the success and failure of your AI transformation.

What About the Employee?

Putting the employee at the center of your AI transformation is as important as putting your paying customers there — although the motivation for doing so is more subtle. Without an effective employee-first AI transformation strategy, you will fall into two primary pitfalls when implementing AI for your organization's personnel:

Pitfall #1: the dumbing down of your employees. If employees can simply use ChatGPT to write that report or put together that PowerPoint, then they no longer have to think about how to accomplish those tasks. Over time, they'll become less and less proficient at tackling any activity that requires real thought. The end result: cube farms full of useless boneheads.

Bonus pitfall: universities have the same problem. Students are using AI instead of thinking. So if you try hiring at the entry level to address this problem, you'll simply onboard more useless boneheads.

Pitfall #2: no junior people to become senior people. Replace your level 1 and level 2 support staff with AI (or junior developers or financial analysts or whoever). Keep the more senior staff to handle the trickier problems that the AI can't solve. Everything is fine until those senior people move on, and now you have no junior people who have been learning the ropes to replace them.

Bonus pitfall: you might think you can hire more senior people who've come up through the ranks in other organizations. Only those organizations have also replaced their junior staff with AI. Oops.

To avoid these pitfalls, organizations must have a well-thought-out employee AI training and governance strategy. People must know which AI tools to use, when and how to use them, and most importantly, when not to use them. In fact, this repositioning of human effort is at the heart of every successful AI transformation. It's human nature to take the easy path, and AI is adept at creating such paths. Sometimes, however, the easy path isn't the best one for achieving an organization's strategic goals.

The Intellyx Take

To get your AI transformation back on track, refer to the lessons of digital transformation. Many organizations looking to transform made the same kinds of mistakes: favoring cost savings and efficiency over customer (and employee) delight. It may appear that AI offers transformative levels of cost savings and efficiency — but such benefits are illusory.
In the end, you’ll push away customers and dumb down your workforce, thus defeating the purpose of the transformation. Instead, learn the lesson of digital transformation and put humans at the center of your AI transformation. True, you’ll end up passing up some short-term cost savings and productivity benefits. But remember: those are tactical concerns, while business transformation (a.k.a. digital or AI transformation) must be strategic. As with digital transformation efforts of the past, many organizations won’t take my advice. Just make sure your competition falls into that trap rather than you.
AI agents are growing at a breakneck pace and are becoming highly efficient at automating routine tasks. However, amid all the exciting innovation across different use cases, even the most advanced models fall short due to a fundamental limitation: real-world applicability. They can think autonomously, yet they struggle to act reliably in real-world environments. For all their reasoning power, large language models (LLMs) often remain isolated. To unlock their full usability, they must be connected to the right tools, data sources, and systems.

This is where the Model Context Protocol (MCP) is rewriting the rules of the AI landscape. One could say that MCP is the missing layer in the current agentic AI stack. It is a unifying protocol that provides models with a predictable way to integrate with external environments. Its power lies in being cleanly designed, extensible, and capable of working across a broad array of platforms and runtimes. While MCP is still in its early stages, its rapidly growing use cases already allow developers and enterprises to build automation and agent workflows with far greater confidence. In this sense, MCP is doing for AI what HTTP did for the web: laying the foundational bricks for an ecosystem of intelligent, interoperable, and highly capable systems.

MCP in Action: Use Cases

MCP opens up new possibilities in orchestrating workflows, integrations, and multi-agent coordination.

One of the striking benefits is streamlined workflow automation. As an example, consider a marketing analytics platform. It is built on a plethora of AI models, from a sentiment analysis model and a content recommendation engine to a predictive sales model. Without the organizing layer of MCP, every model operates in a silo, which often necessitates manual intervention to integrate systems or share contextual data. With MCP, exchanging information such as audience segmentation, campaign metadata, or engagement history becomes a breeze, resulting in more comprehensive insights.

In the area of tool and API integrations, MCP bridges the gap between AI and third-party software systems. Consider a scenario in which a research assistant needs information from multiple data repositories and APIs. These could be scientific journals, patent databases, or regulatory records. MCP harmonizes the contextual information the AI receives and sends, ensuring that the assistant retrieves relevant data and updates all downstream systems in real time.

Multi-agent coordination is another area where MCP excels. Consider a logistics use case in which multiple AI agents take on route optimization, inventory management, and customer notifications. MCP becomes the glue that combines stock levels, shipment delays, and traffic updates, all without requiring custom connectors for every interaction. The entire system works cohesively as business conditions change.

Benefits of Adopting MCP

MCP drives efficiency, largely by standardizing the framework for context sharing. This efficiency extends to time-sensitive environments such as real-time analytics or autonomous systems, where delays in context exchange can have cascading effects.

Interoperability is another significant benefit of MCP. It works as a "universal" language for AI models and data. Even legacy systems can be integrated with modern AI systems in a jiffy, combining third-party APIs and linking specialized datasets without developing custom connectors for each workflow.
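To make the connector-free idea concrete, here is a minimal sketch of an MCP tool server for the logistics example above, written with the FastMCP helper from the official MCP Python SDK. The server name, tool names, and stubbed data are invented for illustration; a real server would wrap actual inventory and shipment systems.

```python
# Hypothetical MCP server exposing logistics context to any MCP-capable agent.
# Assumes the official MCP Python SDK (`pip install mcp`) is available.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("logistics-context")

@mcp.tool()
def get_stock_level(sku: str) -> int:
    """Return the current stock level for a SKU (stubbed data for illustration)."""
    return {"WIDGET-1": 42, "WIDGET-2": 7}.get(sku, 0)

@mcp.tool()
def get_shipment_status(shipment_id: str) -> str:
    """Return the latest status for a shipment (stubbed data for illustration)."""
    return f"Shipment {shipment_id}: 2 hours behind schedule"

if __name__ == "__main__":
    # Route-optimization, inventory, and notification agents can all call these
    # tools through their MCP clients; no per-agent custom connector is required.
    mcp.run()
```

Any MCP-aware client, whether an IDE assistant, an orchestration agent, or a chat interface, can discover and call these tools over the same protocol, which is the interoperability point made above.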
Such interoperability has the potential to significantly accelerate development timelines.

Lastly, MCP enables scalability. As business needs grow and organizations expand their scope of AI use cases, new models or agents can be added without rewriting existing logic. A plug-and-play approach can be adopted in which each new component plugs into the ecosystem while maintaining consistent context exchange. This reduces operational friction in the long run and helps drive complex AI deployments seamlessly.

Future of Agentic AI with MCP

MCP is becoming a pivotal enabler for agentic AI systems, allowing them to operate autonomously, collaborate seamlessly, and adapt dynamically to complex environments. Minimal human intervention is required for agents to share context and coordinate actions. MCP also accelerates experimentation by enabling organizations to integrate cutting-edge models, tools, and datasets without custom coding. Researchers can simulate multi-agent environments, train models with dynamic contextual inputs, and deploy adaptive systems that evolve over time.

Looking ahead, MCP is likely to underpin community-driven AI standards, promoting shared protocols that reduce fragmentation and improve reliability across industries. By adopting MCP, organizations position themselves at the forefront of agentic AI innovation, fostering ecosystems where autonomous agents collaborate safely, efficiently, and transparently.

In essence, the future of agentic AI is one of connected, context-aware intelligence — and MCP is the missing link that turns this vision into reality. As adoption grows, MCP will not only streamline AI operations but also redefine how humans and intelligent systems work together, opening the door to a new era of autonomous, coordinated, and highly adaptive AI solutions. This creates a truly collaborative intelligence, where the sum of the system is far greater than its individual parts.
The Structural Limits of the Current Approach

The industry is currently seeing a clear decoupling between the commercial roadmaps of vendors and the reality of engineering. The pressure to deploy Artificial General Intelligence (AGI) rests partly on a hypothesis of linearity: the idea that increasing computing power will mechanically suffice to spark the emergence of human-level intelligence. But is this realistic?

Gartner (1) predicts that AGI will not materialize for at least a decade. The analyst firm highlights that simply scaling current technologies will not suffice without several fundamental breakthroughs. Even by 2035, they consider it unlikely that AGI will truly be fully achieved.

There is a fundamental confusion regarding the nature of the systems. Technology providers are already discussing "Superintelligence" (Artificial Superintelligence, ASI), an AI that vastly surpasses human capabilities. Yet AGI and ASI require distinct architectures and parallel development paths. Gartner, in fact, identifies the pursuit of ASI as a risky approach, likely to create single points of failure, and suggests avoiding it.

We must be realistic. We have high-performance models for simulation, but they remain structurally limited when it comes to performing complex cognitive tasks. Gartner nevertheless places significant hope in the emergence of "reasoning models," expected to surpass traditional models thanks to "chain-of-thought" processes and self-reflection. However, deep analysis of these architectures reveals an intrinsic limitation: a collapse in performance when facing complexity. Research (2) shows that as the number of logical steps increases, these models suffer a "complete performance collapse." Even more counter-intuitively, as they approach this breaking point, instead of redoubling their efforts, the models "reduce their reasoning effort," somewhat like an organism in a state of cognitive freeze.

What happens is that on simple tasks, the model suffers from overthinking: it wastes resources exploring incorrect alternatives when it already possesses the solution. Then, on complex tasks, it can fixate on early incorrect attempts and fail to self-correct, thereby squandering its compute budget. It becomes clear that we are not dealing with an adaptive intelligence, but rather with a system that oscillates between pointless over-analysis and premature abandonment. The increase in computational and time costs inherent to these models does not in any way guarantee better reliability. It can even, on the contrary, generate more costly errors.

From the Single Organism to the Swarm

If AGI proves unrealistic, the true breakthrough lies elsewhere, in an architectural paradigm shift. We must stop fantasizing about a monolithic and universal entity that would surpass us in every way.

"Superintelligence, far surpassing human capability, can become problematic when in the wrong hands or be a single point of failure." — Gartner (1)

The future would then belong to Augmented Collective Intelligence (ACI). To draw a biological analogy, picture not a giant brain, but an immune system: a swarm of specialized agents collaborating in real time with human operators. In such an architecture, intelligence would emerge from the network, not from a single node. The goal would not be human obsolescence, but rather the improvement of quality of life and the augmentation of human capability.
AI would handle the massive processing of information, while humans would retain control over intention and ethics. However, implementing these swarms imposes strict industrial realism regarding the hardware requirements of intelligence.

The recent study "LLM-Powered Swarms" (3) provides a critical corrective to theoretical projections. For reflex coordination tasks (simulating swarm behavior, such as bird flocks or schools of fish), the LLM-agent approach can prove "approximately 300 times more computationally expensive" than classical algorithms. In this instance, that obviously renders real-time deployment prohibitive. The real gain of the agent-based approach, therefore, lies not in execution speed but in decisional plasticity. On complex optimization topics (such as Ant Colony Optimization), LLM agents demonstrate superior learning stability and a better ability to transition from exploration to exploitation.

The architectural conclusion suggests that the future lies not in "all-generative" models, but rather in hybrid architectures. In this model, the LLM would handle "high-level strategic reasoning" (context-dependent decision-making), while classical algorithms would provide low-latency reflex execution.

Architectural Hybridization (QPU + GPU)

The current limitations of AI are not exclusively software-based; they are also physical. Our conventional computing infrastructures are hitting a physical limit of density and energy efficiency. To simulate materials chemistry or optimize global logistics networks, classical AI exhausts itself through brute force. The engineering response is structural: it lies in hybridization.

The QPU as a Cognitive Accelerator

Quantum computing technologies are not intended to replace our classical computers. Rather, for many use cases, we can view them as specialized coprocessors (QPUs) that insert themselves into data centers alongside GPUs. Recent work on Hybrid Quantum-Classical Neural Networks (HQCNNs) (4) formalizes this architecture. In this setup, the classical network handles general perception, while the quantum circuit is called upon specifically to explore massive "feature maps" inaccessible to classical methods. We are not asking quantum to run Excel; we are asking it to unlock optimization knots that classical AI would take centuries to unravel.

Sobriety as a Performance Vector

Unlike the race for gigantism driving American LLMs, European excellence (and particularly French excellence) is betting heavily on energy efficiency. The paradigm is shifting: the goal is no longer "Quantum Supremacy," but "Energy Advantage." Pasqal's neutral-atom architectures (5) and Alice & Bob's "Energetic Optimisation of Quantum Circuits" (OECQ) project, in partnership with the French National Center for Scientific Research (CNRS) and EDF (the French electric utility giant) (6), aim to execute these complex calculations with a fraction of the energy required by an exascale supercomputer.

Toward the Quantum Digital Twin (NQDT)

Finally, according to Gartner, this intelligence will not remain confined to the cloud; they think it could even be embodied in spatial computing (11). But the true architectural breakthrough plays out at the level of fundamental simulation. Recent work on Neural Quantum Digital Twins (NQDT) (7) is redefining the standard. Digital twins are no longer simple passive 3D replicas, but "neural surrogates" capable of reconstructing the energy landscape of a complex system.
In this architecture, the AI, via neural networks, does not content itself with analyzing data: it "learns the physics" of the system in order to simulate its excited states and optimize processes. Here we are no longer speaking of classical modeling but of "predictive scenarization" for quantum control. This allows for the identification of evolutionary optimums (annealing schedules) that classical engineering could not perceive, for lack of access to the full energy spectrum.

Strategic Impact: From Speculative to Operational

It is certainly realistic to sideline AGI for the time being. It appears reasonable, at least for now, that our roadmaps should fund technologies with a likely return in the relatively near term. The mistake would be to expect AI to become capable of "thinking" (general cognition) when its immediate industrial value add lies in predicting and simulating. Gartner identifies Intelligent Simulation (the convergence of digital twins, generative AI, and quantum) as the true engine of performance. Within Intelligent Simulation, with 31% of use cases focused on prediction (and just as many on digital twins), it is in this segment that immediate ROI crystallizes. The objective is not content generation, but the modeling of complex scenarios to reduce decision-making uncertainty.

"By 2030, 90% of humans will engage with smart robots on a daily basis, due to smart robot advancements in intelligence, social interactions and human augmentation capabilities, up from less than 10% today." — Gartner (1)

This shift imposes a physical transformation of operations. The managerial response must not be replacement, but the "rewiring" of processes. AI and robotics serve to augment operator capabilities. The major strategic risk is no longer the obsolescence of the worker, but the organizational inability to collaborate effectively with these autonomous agents.

Conclusion: Toward an Architecture of Symbiosis

Ultimately, the quest for AGI currently acts as a decoy. If we believe Gartner's projections, this "Superintelligence" will not materialize for at least a decade. In the same spirit, experts like Yann LeCun, often described as one of the founding fathers of deep learning and now Chief AI Scientist at Meta, defend the view that the current approach is a structural dead end for AGI. He insists that auto-regressive models (LLMs) are intrinsically incapable of reasoning or planning because they lack a "World Model." Thus, according to him, promising AGI with current technology has less to do with science than with magical thinking.

For decision-makers, the imperative lies in decoupling this speculation from their roadmaps. Value does not lie in waiting for an artificial consciousness that is improbable in the short to medium term, but in the immediate deployment of realistic use cases. We are referring here to predictive simulation, the orchestration of specialized agents, and the augmentation of human capabilities.

The industrial future will not be written tomorrow with a monolithic model that "knows everything," but through a distributed hybrid architecture: an architecture where the quantum computer provides the brute power to resolve exponential physical complexity, where AI agents execute reflex processes, and where the human retains strategic mastery.

Sources and References

1. Gartner - "Trending Questions on AI and Emerging Technologies" [link]
2. P. Shojaee, I. Mirzadeh, K. Alizadeh, M. Horton, S. Bengio, M. Farajtabar - "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity" [link]
3. M. Atta Ur Rahman, M. Schranz, S. Hayat - "LLM-Powered Swarms: A New Frontier or a Conceptual Stretch?" [link]
4. A. H. Abbas - "TunnElQNN: A Hybrid Quantum-classical Neural Network for Efficient Learning" [link]
5. Pasqal - "Towards Regenerative Quantum Computing with Proven Positive Sustainability Impact" [link]
6. N. Coppola - "EDF, Alice & Bob, Quandela and CNRS Partner to Optimize Quantum Computing's Energy Efficiency" [link]
7. J. Lu, H. Peng, Y. Chen - "Neural Quantum Digital Twins for Optimizing Quantum Annealing" [link]
8. BBC - "Meta scientist Yann LeCun says AI won't destroy jobs forever" [link]
9. R. T. McCoy, S. Yao, D. Friedman, M. Hardy, T. L. Griffiths - "Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve" [link]
10. F. Jacquet - "Debunking LLM Intelligence: What's Really Happening Under the Hood?" [link]
11. Gartner - "Spatial Computing Creates Immersive Experiences for Businesses and Customers Alike" [link]
You snap a photo of a hotel lobby and ask your AI assistant, "Find me places with this vibe." Seconds later, you get recommendations. No keywords, no descriptions — just an image and a question. This is multimodal AI in action.

For years, AI models operated in silos. Computer vision models processed images. Natural language models handled text. Audio models transcribed speech. Each was powerful alone, but they couldn't talk to each other. If you wanted to analyze a video, you'd need separate pipelines for visual frames, audio tracks, and any text overlays, then somehow stitch the results together. Not anymore.

What Is Multimodal AI?

Multimodal AI systems process and understand multiple data types simultaneously — text, images, video, audio — and crucially, they understand the relationships between them.

The core modalities:

- Text: Natural language, code, structured data
- Images: Photos, diagrams, screenshots, medical imagery
- Video: Sequential visual data with audio and temporal context
- Audio: Speech, environmental sounds, music
- GIFs: Animated sequences (underrated for UI tutorials and reactions)

How Multimodal Systems Actually Work

Think of it like a two-person team: one person describes what they see ("There's a red Tesla at a modern glass building, overcast sky, three people in business attire heading inside"), while the other interprets the context ("Likely a corporate HQ. The luxury EV and professional setting suggest a high-level business meeting"). Modern multimodal models work similarly — specialized components handle different inputs, then share information to build a unified understanding. The breakthrough isn't just processing multiple formats; it's learning the connections between them.

In this guide, we'll build practical multimodal applications — from video content analyzers to accessibility tools — using current frameworks and APIs. Let's start with the fundamentals.

How Multimodal AI Works Behind the Scenes

Let's walk through what actually happens when you upload a photo and ask, "What's in this image?"

The Three Core Components

1. Encoders: Translating to a Common Language

Think of encoders as translators. Your photo and question arrive in completely different formats — pixels and text. The system can't compare them directly.

Vision encoder: Takes your image (a grid of RGB pixels) and converts it into a numerical vector — an embedding. This might look like [0.23, -0.41, 0.89, 0.12, ...] with hundreds or thousands of dimensions.

Text encoder: Takes your question "What's in this image?" and converts it into its own embedding vector in the same dimensional space.

The key: these encoders are trained so that related concepts end up close together. A photo of a cat and the word "cat" produce similar embeddings — they're neighbors in this high-dimensional space.

2. Embeddings: The Universal Format

An embedding is just a list of numbers that captures meaning. But here's what makes them powerful:

- Similar concepts have similar embeddings (measurable by cosine similarity)
- They preserve relationships (king - man + woman ≈ queen)
- Different modalities can share the same embedding space

When your image and question are both converted to embeddings, the model can finally "see" how they relate.

3. Adapters: Connecting Specialized Models

Here's where it gets practical. Many multimodal systems don't build everything from scratch — they connect existing, powerful models using adapters. What's an adapter? A lightweight neural network layer that bridges two pre-trained models. Think of it as a translator between two experts who speak different languages.
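Here is a toy numeric sketch of those two ideas: a shared embedding space measured by cosine similarity, and an adapter as a learned projection into the language model's space. The dimensions and random weights below are invented for illustration; real encoders and adapters learn these values during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_similarity(a, b):
    """Standard cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend outputs of a vision encoder and a text encoder that share a
# 768-dimensional embedding space (CLIP-style). A photo of a cat and the
# word "cat" should land close together; an unrelated word should not.
image_embedding = rng.normal(size=768)                               # "photo of a cat"
text_embedding = 0.9 * image_embedding + 0.1 * rng.normal(size=768)  # the word "cat"
unrelated_text = rng.normal(size=768)                                # the word "spreadsheet"

print(cosine_similarity(image_embedding, text_embedding))  # close to 1.0
print(cosine_similarity(image_embedding, unrelated_text))  # close to 0.0

# A minimal "adapter": a learned linear projection that maps the 768-d image
# embedding into the language model's (say) 4096-d token-embedding space,
# producing a "visual token" the LLM can consume alongside text tokens.
adapter_weights = rng.normal(scale=0.02, size=(4096, 768))  # trained in practice
visual_token = adapter_weights @ image_embedding
print(visual_token.shape)  # (4096,)
```

In practice, adapters are usually small trained networks (often just a projection or a couple of layers) learned while the large encoders stay frozen, which is what keeps this approach cheap.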
Think of it as a translator between two experts who speak different languages. Common pattern: Pre-trained vision model (like CLIP's image encoder) → Adapter layer → Pre-trained language model (like GPT)The adapter learns to transform image embeddings into a format the language model understandsThis is how systems like LLaVA work — they don't retrain GPT from scratch. They train a small adapter that "teaches" GPT to understand visual inputs. Walking Through: Photo + QuestionLet's trace exactly what happens when you ask, "How many people are in this photo?" Step 1: Image Processing Your photo → Vision Encoder → Image embedding [768 dimensions]The vision encoder (often a Vision Transformer or ViT) processes the image in patches, like looking at a grid of tiles, and outputs a rich numerical representation. Step 2: Question Processing "How many people are in this photo?" → Text Encoder → Text embedding [768 dimensions]Step 3: Adapter Alignment Image embedding → Adapter layer → "Visual tokens"The adapter transforms the image embedding into "visual tokens" — fake words that the language model can process as if they were text. You can think of these as the image "speaking" in the language model's native tongue. Step 4: Fusion in the Language Model The language model now receives: [Visual tokens representing the image] + [Text tokens from your question]It processes this combined input using cross-attention — essentially asking: "Which parts of the image are relevant to the question about counting people?" Step 5: Response Generation Language model → "There are three people in this photo." Why This Architecture MattersModularity: You can swap out components. Better vision model released? Just retrain the adapter. Efficiency: Training an adapter (maybe 10M parameters) is far cheaper than training a full multimodal model from scratch (billions of parameters). Leverage existing strengths: GPT-4 is already great at language. CLIP is already great at vision. Adapters let them collaborate without losing their individual expertise. The Interaction Flow Real-World Applications That Actually MatterUnderstanding the architecture is one thing. Seeing it solve real problems is another. Healthcare: Beyond Single-Modality DiagnosticsMedical diagnosis has traditionally relied on specialists examining individual data types —radiologists read X-rays, pathologists analyze tissue samples, and physicians review patient histories. Multimodal AI is changing this paradigm. Microsoft's MedImageInsight Premium demonstrates the power of integrated analysis, achieving 7-15% higher diagnostic accuracy across X-rays, MRIs, dermatology, and pathology compared to single-modality approaches. The system doesn't just look at an X-ray in isolation — it understands how imaging findings relate to patient history, lab results, and clinical notes simultaneously. Oxford University's TrustedMDT agents take this further, integrating directly with clinical workflows to summarize patient charts, determine cancer staging, and draft treatment plans. These systems will pilot at Oxford University Hospitals NHS Foundation Trust in early 2026, representing a significant step toward production deployment in critical healthcare environments. The implications extend beyond accuracy improvements. Multimodal systems can identify patterns that span multiple data types, potentially catching early disease indicators that single-modality analysis would miss. 
E-commerce: Understanding Intent Across ModalitiesThe retail sector is experiencing a fundamental transformation through multimodal AI that understands customer intent expressed through images, text, voice, and behavioral patterns simultaneously. Consider a customer uploading a photo of a dress they saw at a wedding and asking, "Find me something similar but in blue and under $200." Traditional search requires precise keywords and filters. Multimodal AI understands the visual style, color transformation request, and budget constraint in a single query. Tech executives predict AI assistants will handle up to 20% of e-commerce tasks by the end of 2025, from product recommendations to customer service. Meta's Llama 4 Scout, with its 10 million token context window, can maintain a sophisticated understanding of customer interactions across multiple touchpoints, remembering preferences and providing genuinely personalized experiences. Content Moderation: Evaluating Context, Not Just ContentContent moderation has evolved from simple keyword filtering to sophisticated context-aware systems that evaluate whether content violates policies based on the interplay between text, images, and audio. OpenAI's omni-moderation-latest model demonstrates this evolution, evaluating images in conjunction with accompanying text to determine if content contains harmful material. The system shows a 42% improvement in multilingual evaluation, with particularly impressive gains in low-resource languages such as Telugu (6.4x) and Bengali (5.6x). Companies like Grammarly and ElevenLabs have integrated these capabilities into their safety infrastructure, ensuring that AI-generated content across multiple modalities meets safety standards. The key advancement isn't just detecting problematic content but also understanding when context makes seemingly innocuous content harmful, or when potentially sensitive content is actually acceptable within a proper context. Accessibility: Breaking Down Digital BarriersMultimodal AI is revolutionizing accessibility by creating systems that can process text, images, audio, and video simultaneously to identify and remediate accessibility issues in real-time. New vision-language models can generate alt text that describes not just what's in an image, but the relationships, contexts, and implicit meanings that make images comprehensible to users who can't see them. Advanced personalization engines can automatically adjust contrast for users with low vision in the evening, simplify language complexity for users who need it, or predict when someone might need additional navigation support. Practical implementations already exist: OrCam wearable devices for people who are blind instantly read text, recognize faces, and identify products using multimodal AI. WordQ and SpeakQ help people with dyslexia or ADHD by combining text analysis with speech synthesis to suggest words and read text aloud. By 2026 to 2027, AI-powered accessibility scans are projected to detect approximately 70% of WCAG success criteria with 98% accuracy, dramatically reducing the manual effort required to make digital content accessible. What Actually Goes Wrong at ScaleThe technical literature often glosses over practical difficulties that trip up real implementations: Data alignment is deceptively difficult. Synchronizing dialogue with facial expressions in video or mapping sensor data to visual information in robotics requires precision that can fundamentally corrupt your model's understanding if done incorrectly. 
A 100-millisecond audio-video desynchronization might seem trivial, but it can teach your model that people's lips move after they speak. Computational demands are substantial. Multimodal fine-tuning requires 4-8x more GPU resources than text-only models. Recent benchmarking shows that optimized systems can achieve 30% faster processing through better GPU utilization, but you're still looking at significant infrastructure investment. Google increased its AI spending from $85 billion to $93 billion in 2025 largely due to multimodal computational requirements. Cross-modal bias amplification represents an insidious challenge. When biased inputs interact across modalities, effects compound unpredictably. A dataset with demographic imbalances in images combined with biased language patterns can create systems that appear more intelligent but are actually more discriminatory. The research gap is substantial — Google Scholar returns only 33,400 citations for multimodal fairness research, compared with 538,000 for language model fairness. Legacy infrastructure struggles. Traditional data stacks excel at SQL queries and batch analytics but struggle with real-time semantic processing across unstructured text, images, and video. Organizations often must rebuild entire data pipelines to support multimodal AI effectively. What's Coming: Trends Worth Watching Several emerging developments are reshaping the landscape: Extended context windows of up to 2 million tokens reduce reliance on retrieval systems, enabling more sophisticated reasoning over large amounts of multimodal content. This changes architectural decisions—instead of chunking content and using vector databases, you can process entire documents, videos, or conversation histories in a single pass. Bidirectional streaming enables real-time, two-way communication where both human and AI can speak, listen, and respond simultaneously. Response times have dropped to 0.32 seconds on average for voice interactions, making the experience feel genuinely natural rather than transactional. Test-time compute has emerged as a game-changer. Frontier models like OpenAI's o3 achieve remarkable results by giving models more time to reason during inference rather than simply scaling parameters. This represents a fundamental shift from training-time optimization to inference-time enhancement. Privacy-preserving techniques are maturing rapidly. On-device processing and federated learning approaches enable sophisticated multimodal analysis while keeping sensitive data local, addressing the growing concern that multimodal systems create detailed personal profiles by combining multiple data types. The Strategic Reality By 2030, Gartner predicts that 80% of enterprise software will be multimodal. This isn't a gradual evolution — it's a fundamental restructuring of how AI systems perceive and interact with information. However, Deloitte survey data reveals a sobering implementation gap: while companies actively experiment with multimodal AI, most expect fewer than 30% of current experiments to reach full scale in the next six months. The difference between recognizing potential business value and successfully delivering it in production remains substantial. Success requires more than technical capability. 
Organizations must address computational requirements, specialized talent acquisition (finding professionals who understand computer vision, NLP, and audio processing simultaneously is challenging), and ethical frameworks that account for cross-modal risks rather than isolated data flaws. The promise of multimodal AI is substantial, but it demands responsible exploration with higher standards of data integration, fairness, and security. As these systems mature toward more natural, efficient, and capable interactions that mirror human perception and cognition, they will become the foundation for a new generation of AI applications. The transformation is already underway. The developers and organizations that begin building multimodal capabilities now — while proactively addressing the associated challenges — will be best positioned to capitalize on this fundamental shift in artificial intelligence capabilities. The era of AI systems that truly understand the world, rather than just processing isolated data streams, has arrived. It's time to build accordingly.
Generative AI (GenAI) is rapidly transforming the landscape of intelligent applications, driving innovation across industries. Python has emerged as the language of choice for GenAI development, thanks to its simplicity, agility in prototyping, and a rich ecosystem of machine learning libraries like TensorFlow, PyTorch, and LangChain. However, Java — long favored for enterprise-scale systems — is actively evolving to stay relevant in this new paradigm. With the rise of Spring AI, Java developers now have a growing toolkit to integrate GenAI capabilities without abandoning their existing infrastructure. While switching from Java to Python is technically feasible, it often involves a shift in development culture and tooling preferences. The convergence of these two ecosystems — Python for experimentation and Java for scalability — offers a compelling narrative for hybrid GenAI architectures. As GenAI continues to mature, both languages will play complementary roles in shaping robust, scalable, and intelligent systems. This article explores the Retrieval-Augmented Generation (RAG) architecture, a powerful approach that combines the strengths of information retrieval and generative models to build intelligent, context-aware question-answering systems. The focus is on implementing a question-answering system grounded in the classic narratives of Aesop's Fables. To demonstrate the versatility of RAG, two parallel implementations are presented: one using Python, leveraging Hugging Face Transformers and relevant NLP libraries; and the other using Java, built upon the emerging capabilities of the Spring AI framework. The comparative analysis of both implementations highlights practical considerations, including developer productivity, ecosystem maturity, and integration with existing enterprise systems. Accompanied by a comprehensive guide to kick-start the journey to Spring AI, finally a viable weapon in the Java arsenal, the article concludes that while developers already proficient in Python can continue leveraging its rich ML ecosystem, Java engineers — especially those maintaining enterprise applications — finally have viable tools to adopt similar AI capabilities through Spring AI, enabling them to remain within their familiar development stack without compromising on innovation. Regardless of their respective ethos, the AI gap between Python and Java has started to close.

1. Use Case

This article demonstrates a Retrieval-Augmented Generation (RAG) system designed to enable users to interact with a web-based interface for querying arbitrary documents. The core functionality allows users to input natural language questions, which are then processed through a RAG pipeline to retrieve relevant content and generate contextually accurate responses. To illustrate the system's capabilities, we use a sample document — Aesop's Fables in PDF format — as a reference corpus. However, the architecture is fully adaptable to any document type or domain-specific dataset, making it suitable for a wide range of applications, including legal, academic, and enterprise contexts. The implementation is presented in two parallel technology stacks: Python, leveraging its rich ecosystem of machine learning and GenAI libraries for rapid prototyping; and Java, showcasing the emerging capabilities of the Spring AI framework. This dual-stack approach highlights the respective strengths and development philosophies of both languages, while emphasizing Java's evolving relevance in the generative AI landscape.
Figure 1: A Basic RAG Architecture

2. Python Implementation

Python has rapidly ascended to dominance in the fields of data science, machine learning, and artificial intelligence due to its simplicity, readability, and vast ecosystem of specialized libraries such as NumPy, Pandas, TensorFlow, and PyTorch. Its intuitive syntax and strong community support make it the preferred language for both rapid prototyping and production-grade AI solutions. As the demand for intelligent systems grows, Python continues to evolve as the backbone of modern computational innovation. The full code base is available here: https://github.com/trainerpb/article-communications/blob/feature/manuscript/Building%20RAG-Powered%20AI%20in%20Python%20and%20Java%20Tales%20of%20Two%20Stacks/mauscript/rag_search_langchain_opensource_article/

2.1 How It Works

User Interaction: The user enters a question about "Aesop's Fables" in the text area and clicks the "Find Answer" button to submit it.
Backend Processing: The StoryContentSearch instance processes the question using the RAG pipeline: the PDF is loaded and split into chunks, relevant chunks are retrieved based on the question, and a language model generates an answer using the retrieved context.
Answer Display: The generated answer is displayed on the web app.

2.2 rag_serach_ui.py

This Python script creates a simple web-based user interface (UI) for a question-answering system using Streamlit, a popular framework for building interactive web apps in Python. The script integrates with the StoryContentSearch class (defined in the rag_search_opensource_article.py file) to allow users to ask questions about the content of a specific PDF document, "Aesop's Fables for Children."

2.2.1 Overview of the Code

The script provides a user-friendly interface for interacting with a retrieval-augmented generation (RAG) pipeline. It allows users to input a question, processes the question using the RAG pipeline, and displays the generated answer. The backend logic for document processing and question answering is handled by the StoryContentSearch class.

2.2.2 Key Components

Streamlit Setup: The script uses Streamlit to create a web-based interface.
st.title sets the title of the app: "You can ask questions from Aesop's Fables and this bot will answer."
st.text_area provides a text box for users to input their question.
st.button creates a button labeled "Find Answer" that triggers the question-answering process when clicked.

2.2.3 Integration

The StoryContentSearch class is imported from the rag_search_opensource_article.py file. This class implements the RAG pipeline for processing the PDF document and generating answers. An instance of StoryContentSearch is created with the path to the PDF file of Aesop's Fables. The process_in_chain method of the StoryContentSearch class is called with the user's question as input. This method retrieves relevant chunks from the PDF document, generates an answer using a language model, and returns the answer as a string.

2.2.4 Displaying the Answer

The result of the process_in_chain method is displayed using st.success, which highlights the answer in a visually appealing way.
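Based on the description above, the UI script might look roughly like the following minimal sketch. The exact code lives in the linked repository; the PDF path used here is a placeholder.

Python
import streamlit as st
from rag_search_opensource_article import StoryContentSearch

# Placeholder path -- point this at your local copy of the PDF
PDF_PATH = "Aesops_Fables_for_children.pdf"

st.title("You can ask questions from Aesop's Fables and this bot will answer.")

question = st.text_area("Enter your question:")

if st.button("Find Answer"):
    # Delegate retrieval and generation to the RAG backend class
    searcher = StoryContentSearch(PDF_PATH)
    answer = searcher.process_in_chain(question)
    st.success(answer)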
2.3 rag_search_opensource_article.py

The full code is given and explained below:

Python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_nomic import NomicEmbeddings
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser
from dotenv import load_dotenv


class StoryContentSearch:
    def __init__(self, file_path):
        load_dotenv()
        self.file_path = file_path
        self.faiss_file_name = file_path.replace(".pdf", "_faiss_index_nomic")
        self.loader = PyPDFLoader(file_path)

    def create_full_context(self, docs):
        full_content = ""
        for doc in docs:
            full_content += doc.page_content + " "
        return full_content

    def check_faiss_file_exists(self, faiss_file_name):
        import os
        return os.path.exists(faiss_file_name)

    def format_docs(self, docs):
        return "\n\n".join([doc.page_content for doc in docs])

    def process_in_chain(self, question):
        print("Processing question:", question)
        embeddings = NomicEmbeddings(model="nomic-embed-text-v1.5")
        if self.check_faiss_file_exists(self.faiss_file_name):
            print(f"FAISS file {self.faiss_file_name} exists. Loading from file.")
            vector_store = FAISS.load_local(self.faiss_file_name, embeddings,
                                            allow_dangerous_deserialization=True)
            print("Vector store loaded from FAISS file.")
        else:
            docs = self.loader.load()
            full_context = self.create_full_context(docs)
            splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
            docs = splitter.create_documents([full_context])
            print(f"Number of chunks: {len(docs)}")
            vector_store = FAISS.from_documents(docs, embeddings)
            print("Vector store created with FAISS.")
            vector_store.save_local(self.faiss_file_name)
            print(f"Vector store locally saved as {self.faiss_file_name}")

        retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 6})

        prompt = PromptTemplate(template="""
            You are a helpful AI assistant.
            Use only the following pieces of context to answer the question at the end.
            If the context is insufficient, just say you don't know. Do not try to make up an answer.

            {context}

            Question: {question}
            """, input_variables=['context', 'question'])

        llm = HuggingFaceEndpoint(
            repo_id="meta-llama/Llama-3.2-3B-Instruct",
            task="text-generation"
        )
        opensourcemodel = ChatHuggingFace(llm=llm)

        parallel_chain = RunnableParallel(
            {"context": retriever | RunnableLambda(self.format_docs),
             "question": RunnablePassthrough()})
        parser = StrOutputParser()
        final_chain = parallel_chain | prompt | opensourcemodel | parser

        answer = final_chain.invoke(question)
        print(answer)
        return answer

2.3.1 Overview of the Code

The script uses the LangChain ecosystem to combine document processing, vector-based retrieval, and large language model (LLM) inference. It is designed to handle the following tasks: load a PDF document and split it into manageable chunks; create or load a FAISS vector store for semantic search; retrieve the most relevant chunks based on a user query; and use a language model to generate an answer based on the retrieved context.

2.3.2 How It Works

Document Loading: The PDF is loaded, and its content is split into smaller chunks for efficient processing.
Vector Store Creation: The chunks are embedded into a vector space using NomicEmbeddings, and a FAISS index is created for similarity search.
Question Answering: The input question is used to retrieve the most relevant chunks from the vector store. These chunks are passed as context to a language model, which generates an answer.

2.3.3 Key Components

Imports and Dependencies. The script relies on several key libraries:

LangChain modules:
PyPDFLoader: Loads PDF documents.
RecursiveCharacterTextSplitter: Splits text into smaller chunks for processing.
FAISS: A vector store for efficient similarity search.
NomicEmbeddings: Generates embeddings for text chunks.
PromptTemplate: Structures the input for the language model.
RunnableParallel, RunnablePassthrough, RunnableLambda: Enable parallel and modular processing.
StrOutputParser: Parses the output of the language model.

Hugging Face integration: HuggingFaceEndpoint and ChatHuggingFace connect to a Hugging Face model for text generation.
Environment management: dotenv loads environment variables for configuration.

2.3.4 Class: StoryContentSearch

The StoryContentSearch class is the core of the script. It is initialized with the path to a PDF file and provides methods for processing the document and answering questions.

Initialization (__init__): The constructor accepts a file_path for the PDF document. It derives the FAISS index file name by replacing the .pdf suffix with _faiss_index_nomic, and creates a PyPDFLoader instance to load the document.
create_full_context: Combines the content of all pages in the document into a single string, used to prepare the document for chunking.
check_faiss_file_exists: Checks if the FAISS index file already exists on disk, avoiding redundant computation by reusing the existing index.
format_docs: Formats retrieved documents into a readable string by concatenating their content with double newlines.

2.3.5 Method: process_in_chain

This method implements the RAG pipeline using a chain-based approach. It performs the following steps:

Step 1: Load or Create the FAISS Vector Store. If the FAISS index file exists, it is loaded using FAISS.load_local. If the index does not exist: the PDF is loaded using PyPDFLoader; the document is converted into a single string using create_full_context; the text is split into smaller chunks using RecursiveCharacterTextSplitter (chunk size: 1000 characters, overlap: 100 characters); embeddings are generated for the chunks using NomicEmbeddings; and a FAISS vector store is created and saved locally.
Step 2: Retrieve Relevant Chunks. A retriever is created from the FAISS vector store to find the top 6 most relevant chunks based on the input question.
Step 3: Create a Prompt. A PromptTemplate is used to structure the context and question into a prompt for the language model. The template ensures that the model only uses the provided context to answer the question.
Step 4: Parallel Processing. A RunnableParallel chain processes the context and question in parallel: the retriever fetches relevant documents, which are formatted using a lambda function (RunnableLambda), while the question is passed through as-is using RunnablePassthrough.
Step 5: Language Model Inference. The prompt is passed to a Hugging Face model (meta-llama/Llama-3.2-3B-Instruct) to generate an answer. The output is parsed into a string using StrOutputParser.
Step 6: Return the Answer. The generated answer is printed and returned.
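Putting the pieces together, invoking the pipeline from plain Python (outside Streamlit) might look like the short example below; the PDF filename is a placeholder.

Python
from rag_search_opensource_article import StoryContentSearch

# Placeholder path to the reference corpus
searcher = StoryContentSearch("Aesops_Fables_for_children.pdf")

# The first call builds and saves the FAISS index; later calls reuse it from disk
answer = searcher.process_in_chain("What lesson does the fox and the grapes teach?")
print(answer)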
2.4 Key Features

Interactive Interface: The Streamlit app provides a simple and intuitive interface for users to interact with the question-answering system.
Backend Integration: The app seamlessly integrates with the StoryContentSearch class, which handles the heavy lifting of document processing and language model inference.
Efficient Retrieval: The FAISS vector store enables fast and accurate retrieval of relevant document chunks.
Reusable Index: The FAISS index is saved locally, reducing computation for subsequent queries.
Parallel Processing: The RunnableParallel chain allows for efficient and modular processing of the context and question.
Customizable Prompt: The PromptTemplate ensures that the language model receives structured input.
Real-Time Answering: The app processes the question and displays the answer in real time, making it responsive and user-friendly.

2.5 Summary

In summary, this script provides a simple yet powerful interface for a question-answering system based on "Aesop's Fables." By leveraging Streamlit for the UI and the StoryContentSearch class for backend processing, the app demonstrates how to build an interactive RAG-based application that combines semantic search and generative AI. Development is remarkably fast and straightforward for prototyping. However, deploying it into existing large enterprise applications requires an HTTP- or gRPC-based API call, introducing another network hop and therefore another point of failure, especially for reactive service clients, along with a philosophical shift toward Python.

3. Java + Spring AI Approach

Java and Spring Boot are widely recognized for their robust support of scalable, cloud-native microservices architectures. In the evolving landscape of generative AI, Spring AI introduces a powerful abstraction layer that simplifies integration with various GenAI providers. By embracing Spring's core philosophy of auto-configuration and profile-specific beans, Spring AI enables developers to seamlessly adapt to provider-specific API changes while maintaining clean, modular, and maintainable codebases. The full code base is available at https://github.com/trainerpb/simple-spring-ai-rag-example/tree/feature/article-manuscript

3.1 Spinning Up a DEV Environment

The Spring AI project was developed using Java 17 or later, with Maven for dependency management. Although any IDE could be used, IntelliJ IDEA was the one selected. Since a vector database was required, pgVector was deployed through Docker. In addition, the ai/gemma3 model was run via Docker Model Runner, and Nomic Embed Text v1.5 was run within LM Studio. For brevity, setup and installation of these tools and the IDE are not covered; readers are assumed to be familiar with such cross-cutting concerns. However, the following extensions, table, and index were created in PostgreSQL:

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE IF NOT EXISTS vector_store (
    id UUID DEFAULT uuid_generate_v4() PRIMARY KEY,
    content text,
    metadata json,
    embedding vector(768)
);

CREATE INDEX ON vector_store USING HNSW (embedding vector_cosine_ops);

3.2 Dependencies

To implement a Retrieval-Augmented Generation (RAG) use case within a Spring MVC application, several specialized dependencies from the Spring AI ecosystem are required. These libraries enable seamless integration with vector stores and generative AI models, while adhering to Spring Boot's principles of auto-configuration and modular design.
In addition to the standard dependencies commonly used in Spring MVC applications, the following components are essential for enabling document retrieval and generation capabilities. These dependencies collectively support the construction of a scalable, cloud-native RAG pipeline within the Spring framework, making it easier to build intelligent applications that interact with arbitrary documents through a web-based interface.

3.3 Configuration

The file application.properties is described below:

App specific
app.vector.load-on-start-up = true (whether to load the PDF on startup)
app.vector.pdf.file-path = C:\Users\My User1\Downloads\Aesops_Fables_for_children.pdf (path to the PDF used in the example)

Spring application and logging
spring.application.name = <<our app name>> (application name)
logging.level.org.springframework.ai = <<LOG_LEVEL>> (log level of the package)

Database (PostgreSQL)
spring.datasource.url = jdbc:postgresql://myHost:PG_PORT/ragdb?options=-c%20TimeZone=MT%20%Timezone
spring.datasource.username = <<user name>>
spring.datasource.password = <<password>>
spring.datasource.driver-class-name = org.postgresql.Driver

AI (Spring AI / OpenAI config)
spring.ai.openai.base-url = http://localhost:1234 (embedding model via LM Studio)
spring.ai.openai.api-key = <<ignored>>
spring.ai.openai.embedding.options.model = text-embedding-nomic-embed-text-v1.5
spring.ai.openai.chat.options.model = ai/gemma3
spring.ai.openai.chat.base-url = http://localhost:12434/engines (Gemma3 via DMR)
spring.ai.openai.chat.api-key = <<ignored>>

Vector (pgVector config)
spring.ai.vectorstore.pgvector.max-document-batch-size = 10000
spring.ai.vectorstore.pgvector.index-type = HNSW
spring.ai.vectorstore.pgvector.distance-type = COSINE_DISTANCE
spring.ai.vectorstore.pgvector.dimensions = 768
spring.ai.vectorstore.pgvector.schema-validation = true
spring.ai.vectorstore.pgvector.table-name = vector_store (default name of the table)

3.4 Key Components

The class PDFService provides three methods for loading and chunking a PDF and saving embedded Documents to the VectorStore. The vector store, in this small example, is auto-configured by the Spring framework. Unlike the Python implementation, a very simple, non-recursive chunking strategy is used to keep the example easy to follow. The Apache Tika library can safely parse a document regardless of file extension. Spring AI's basic terminology largely mirrors the GenAI terminology used on the Python side.

The class PdfChunkIngestorOnStartup ingests the documents into the vector store, using the service class PDFService, when the Spring application loads. The ingest-on-startup behavior is tunable via the property app.vector.load-on-start-up.

The class RagService uses ChatClient and VectorStore. It contains a single method, retrieveAndGenerateStreaming, that returns Flux<String> to achieve streaming chat. We use

SearchRequest searchRequest = SearchRequest.builder().query(msg).topK(3).build();

to run a similarity query with the top 3 results from the vector store, and convert all retrieved documents to a single string:

List<Document> similaritySearchDocuments = vectorStore.similaritySearch(searchRequest);
String informationAsString = similaritySearchDocuments.stream().map(Document::getText).collect(Collectors.joining("\n\n"));

Now we create a SystemPromptTemplate using

var systemPromptTemplate = new SystemPromptTemplate(promptResource);

where promptResource is an .st file placed in the src/main/resources directory with the following text prompt:

You are a helpful assistant. Use the following information to answer the question in detail. Please use a friendly and professional tone.
Please acknowledge the question and relate the answer back to it. If the answer is not in the provided information, say "I don't know."
Information: {information}
Answer:

The value for the placeholder {information} is substituted in this line:

var prompt = new Prompt(List.of(systemPromptTemplate.createMessage(Map.of("information", informationAsString)), new UserMessage(msg)));

Finally, we make a call to the AI model to get a streaming response:

chatClient.prompt(prompt).stream().content()

Important: To support the reactive paradigm, all these lines of code are executed with subscribeOn using a boundedElastic scheduler. While this may not be the most optimal choice, it remains valid and allows the author to stay focused on the context of the topic.

return Mono.fromCallable(() -> {
        List<Document> similaritySearchDocuments = vectorStore.similaritySearch(searchRequest);
        String informationAsString = similaritySearchDocuments.stream()
                .map(Document::getText)
                .collect(Collectors.joining("\n"));
        SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(promptResource);
        var prompt = new Prompt(List.of(
                systemPromptTemplate.createMessage(Map.of("information", informationAsString)),
                new UserMessage(msg)));
        return chatClient.prompt(prompt).stream().content();
    })
    .subscribeOn(Schedulers.boundedElastic())
    .flatMapMany(flux -> flux);

The RestController RagApiController uses RagService to stream generated responses for the given query as a reactive Flux<String> by delegating to the GET endpoint /search/stream. The controller WebMvcController handles HTTP GET requests to /home and returns the view name ChatClient, which renders the chat client page. The frontend is built with Thymeleaf templating, providing a simple interface that accepts user input and invokes the streaming REST API. However, the frontend can be decoupled, and any technology stack may be used to consume the streaming API. The only requirement is the use of Server-Sent Events (SSE) to receive chunked responses.

Figure 2: A simplistic UI for Spring AI RAG

4. Comparison

In either implementation, the frontend can be decoupled since the architecture is API-driven.
In both cases, generative AI providers and core components can be externalized through configuration.
For skilled developers, both stacks deliver comparable speed and efficiency: Python is less verbose, while Java, supported by modern IDE tooling and AI plugins (such as Copilot), is nearly as concise.
Spring AI offers the added advantage of provider abstraction, simplifying development, maintenance, and scalability for large enterprise projects.
Thanks to well-documented libraries and strong community support, both ecosystems keep their APIs aligned with generic terminology, enabling developers to adapt quickly even if they are not deeply proficient in one stack.
Each stack reflects features that align with contemporary industry needs and standards, and both continue to borrow and adapt features from one another.
At present, Python maintains a lead due to its extensive AI/ML libraries. However, Java and Spring AI are rapidly catching up, and the gap is narrowing.
JavaScript, long dominant in web development, is rapidly evolving to meet the demands of the GenAI era. Frameworks like TensorFlow.js, ONNX.js, and Transformers.js are enabling in-browser and server-side AI capabilities, making JavaScript a viable player in the AI landscape. With the rise of Node.js-based backends and full-stack frameworks like Next.js and SvelteKit, developers can now integrate GenAI features into web apps without switching languages. While not yet as mature as Python or Java in AI tooling, JavaScript's accessibility, community momentum, and growing ecosystem suggest it's closing the gap fast — positioning itself as a strong contender in the GenAI race.

Final verdict: Choose the technology stack you are most comfortable with, and embrace the power of AI within it.
Business data often lives in hundreds of disconnected Excel files, making it invisible to decision-makers. Here is a pattern for Citizen Data Engineering using Python, GitHub Copilot, and Qlik Sense to unify data silos without writing a single line of manual code. In the enterprise world, the most common database isn't Oracle or PostgreSQL — it's Excel. We see it everywhere: Governance teams track technology standards in spreadsheets, Sales teams track leads in .xlsx files, and HR tracks utilization in shared folders. In a recent case study at Fujitsu, a technical standardization team found themselves managing 400+ separate Excel files (TSI Checklists) containing critical data on software adoption. The data was valuable, but the format was useless for strategic analysis. To answer a simple question like "Which department is using high-risk software?" someone had to manually open dozens of files. Traditionally, solving this requires a Data Engineering team to build an ETL (Extract, Transform, Load) pipeline. But resources are scarce. This article explores a "Field-Driven" data pattern where a non-programmer used GitHub Copilot as a "Virtual Data Engineer," generating Python scripts to clean, merge, and load this data into Qlik Sense for visualization.

The Problem: The "Excel Silo" Architecture

The starting state of most legacy business processes looks like this:

Fragmentation: Data is split across hundreds of files (e.g., one file per month or per department).
Inconsistency: Humans enter data differently. One cell says "US," another "U.S.A.," and another "United States."
Fragility: Excel formulas (like VLOOKUP) break easily when rows are moved.

The goal is to move from static files to dynamic intelligence.

The Solution: AI-Assisted "Citizen" ETL

The proposed architecture bypasses the need for a dedicated engineering team by leveraging an LLM (GitHub Copilot) to bridge the gap between business logic and Python syntax. The Workflow:

Ingest: Python (pandas) reads 400+ Excel files from a directory.
Clean: Scripts normalize column names, handle missing values (NaN), and fix typos.
Transform: Convert wide tables (human-readable) to long tables (machine-readable).
Visualize: Qlik Sense loads the clean dataset for interactive dashboarding.

1. The "Prompt-First" Development Cycle

In the Fujitsu case study, the developer was a domain expert, not a Python coder. Instead of writing code, they wrote intent. The development loop shifted from Write Code -> Run -> Fail to Prompt -> Review -> Run.

The Prompt Strategy: "I have a folder of Excel files. I need a Python script to read all of them, merge them into a single DataFrame, and add a column for the filename so I know the source."

The Copilot-Generated Pattern:

Python
import pandas as pd
import glob
import os

def merge_excel_files(folder_path):
    # Create a list to hold all dataframes
    all_data = []
    # Get all excel files in the directory
    files = glob.glob(os.path.join(folder_path, "*.xlsx"))
    for filename in files:
        try:
            df = pd.read_excel(filename)
            # Feature Engineering: Track the source file
            df['source_file'] = os.path.basename(filename)
            all_data.append(df)
        except Exception as e:
            print(f"Error reading {filename}: {e}")
    # Concatenate all data into one DataFrame
    merged_df = pd.concat(all_data, ignore_index=True)
    return merged_df

# Execution
path = r'C:\Data\TSI_Checklists'
final_df = merge_excel_files(path)
print(f"Merged {len(final_df)} rows successfully.")

2. Auto-Correction and Data Cleansing

One of the biggest hurdles in Excel data is "dirty data."
A human might leave a cell blank or type a date as text. In this pattern, the "Citizen Developer" used Copilot to implement robust error handling. When the script failed due to a missing column, the developer didn't check StackOverflow — they simply pasted the error into the IDE chat.

Prompt: "I am getting a KeyError: 'Product Name' because some files have 'ProductName' without a space. Fix the script to standardize column names."

The Resulting "Self-Healing" Logic:

Python
def standardize_columns(df):
    # Map varying column names to a standard schema
    column_mapping = {
        'ProductName': 'Product Name',
        'Prod_Name': 'Product Name',
        'dept_id': 'Department ID'
    }
    df.rename(columns=column_mapping, inplace=True)
    # Drop rows where critical data is missing
    df.dropna(subset=['Product Name'], inplace=True)
    return df

This iterative process allowed the team to build advanced anomaly detection, listing files that failed processing so they could be fixed at the source.

Visualizing the Pipeline

The architecture transforms unstructured inputs into a structured analytical model.

The Visualization Layer: Qlik Sense

Once the data was unified into a single clean dataset, it was loaded into Qlik Sense. Because the data was now machine-readable (long format), the BI tool could perform aggregations that were impossible in Excel. Key capabilities included:

Cross-department filtering: Selecting high-risk products and instantly seeing which departments use them.
Trend analysis: Visualizing adoption over time — impossible when data is trapped in monthly files.
Drill-down: Clicking a bar chart to see the specific Excel file and row ID that contributed to the number.

The ROI: Metrics That Matter

The results of this "Citizen Data Engineering" experiment were significant. By removing the dependency on a central IT delivery team, the business achieved:

Development Velocity: The author produced 350 steps of production-ready Python code in roughly 40 hours.
Productivity: This equates to 8.8 steps per hour — comparable to professional developers (benchmarked at ~14.6 steps/hour for similar tasks).
Quality: Data quality improved significantly because the Python script enforced validation rules that manual entry could not.

Conclusion

The barrier to entry for building robust data pipelines has collapsed. You no longer need to know the syntax of pandas.melt() or how to handle Python exceptions from memory. By combining GitHub Copilot (for syntax and logic generation) with Qlik Sense (for visualization), domain experts can escape the "Excel Trap." They can build their own ETL pipelines, turning hundreds of static files into a living, breathing decision-support system. The future of data utilization isn't just about better tools for data engineers; it's about giving engineering powers to the people who understand the data best.
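As a footnote to the workflow above, the wide-to-long "Transform" step is exactly the kind of snippet Copilot typically generates from a one-line prompt. Here is a minimal, hypothetical sketch; the column names are invented for illustration only.

Python
import pandas as pd

# Hypothetical wide table: one row per department, one column per month
wide_df = pd.DataFrame({
    "department": ["Sales", "HR"],
    "2024-01": [12, 7],
    "2024-02": [15, 9],
})

# Wide (human-readable) -> long (machine-readable), the shape BI tools prefer
long_df = wide_df.melt(id_vars="department", var_name="month", value_name="adoption_count")
print(long_df)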
You log into your DevOps portal and are confronted with 300 different metrics: CPU, latency, errors, all lighting up red on your dashboard. But what should you prioritize? That is exactly the question an AI-based recommendation tool can resolve. Every SaaS platform managing cloud operations records an incredible amount of telemetry data. Most products, however, simply provide visualization: interesting graphics, yet no actionable information. What if your product could provide automated suggestions for configuration, scaling, or alerts based on tenant behavior? This article shows how to incorporate AI recommendation functionality into your SaaS platform, turning raw data into meaningful information, and discusses AI's role in transforming DevOps dashboards into optimization engines.

Why This Matters Now

It is now common for multi-cloud platforms to produce tens of millions of data points each day. Engineers simply cannot process these results on their own. AI-powered assistance is no longer a "nice to have"; it is the next frontier in dashboard usability for new SaaS interfaces. Adding context-aware, customized suggestions encourages faster decision-making, reduces unnecessary expenditure, and improves adoption.

Background

Most DevOps tools are built to monitor, not to advise. Dashboards and alerts tell you "what's happening," not "what to do next." Some typical problems are:

Decision fatigue: too many charts with no actionable conclusions.
Compute waste: 40-60% of servers run below 40% utilization.
Low feature adoption: less than 25% of alert templates are in use.

Conventional rule-based automation simply cannot scale to individual workloads or tenants. The natural next step is a model that learns from usage patterns.

Solution Overview

Solving these problems requires more than dashboards with static rules. It requires intelligence that learns from user activity and directs engineers' efforts, rather than leaving them to interpret dozens of different graphs.

It starts with behavioral learning. Every engagement, whether it's opening a metric, muting an alert, or modifying an autoscaling rule, leaves traces, or signals, about what's useful, confusing, or volatile. To illustrate, if multiple groups of users consistently adjust the settings of a volatile alert, that suggests the alert should be reconfigured toward a more stable setting.

On this foundation, collaborative intelligence finds similarities among tenants. Even when tenants' data isn't shared, their workloads are often similar. If one set of tenants has successfully reduced CPU variability by turning on scheduled autoscaling, collaborative intelligence notices this and suggests an analogous solution for other tenants with similar workloads.

Vector embeddings, which represent the properties of resources such as instance types, alerting settings, or metric behaviors, enable generalization to new, unseen situations. A new instance type, for example, can be automatically matched with the tenants whose workloads are most likely to benefit from it.

Every finding passes through a low-latency scoring layer, which instantly recommends actions in the SaaS interface. With each new batch of telemetry, the engine ranks and suggests actions, such as "Enable autoscaling: similar groups with analogous workloads observed a 15% cost savings."
Since inference times are below 100 ms, these suggestions are perceived as native to the platform. The system improves with feedback: accepted suggestions strengthen the model, while rejected ones decrease its confidence. The model is retrained nightly or hourly to stay fresh, eliminating costly real-time retraining. In trials, such a model reduced cloud performance variability by 22%, with inference latency below 80 ms. With such embedded intelligence, your SaaS offering is no longer just a monitoring solution; it becomes a proactive collaborator that helps teams act faster, optimize spend, and make more informed operational choices.

Architecture and Flow

Below is a visual representation of the complete AI recommendation system architecture used in these types of SaaS/DevOps applications. It highlights the flow of data from telemetry, behavioral signals, and workload metadata through ingestion, modeling, and real-time scoring to actionable recommendations in the product UI.

Before vs. After: Operational Intelligence Transformation

Before AI Recommendations: DevOps professionals spend inordinate amounts of time trying to understand their dashboards, navigating between metric views, or manually setting thresholds for alerts and scaling rules. Most choices are based on tribal knowledge or iterative testing. Inconsistent settings, sluggish response, and continuous over- or under-provisioning plague their systems. Highly useful functions, such as anomaly detection or predictive scaling, go unused simply due to unfamiliarity with their appropriate use.

After AI Recommendations: The platform transforms from a passive observer into an active decision-maker. Rather than looking at several hundred graphs, engineers get their most pressing answers in the context of what they are working on. The platform shows them what they care about most, why it matters, and what to do next, often solvable with one click. The experience is dynamic: it adapts to user feedback and improves over time.

Implementation Steps

A DevOps SaaS platform employed AI to offer cost-efficient settings based on cross-tenant activity.

1. Aggregate workload data with user-driven operational activity. Resource optimization begins with capturing patterns in machine behavior, such as over-provisioned machines, noisy pages, misconfigured scaling rules, and inefficient workloads, while also taking human intention into account.

SQL: Aggregate multi-tenant resource signals

SQL
SELECT
    tenant_id,
    user_id,
    resource_id,
    AVG(cpu_util) AS avg_cpu,
    AVG(mem_util) AS avg_mem,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY cpu_util) AS p95_cpu,
    COUNT(*) FILTER (WHERE action = 'resize') AS resize_events,
    MAX(event_ts) AS last_seen
FROM telemetry_user_actions
GROUP BY tenant_id, user_id, resource_id;

Apply decay weighting to prioritize current workloads.

Python
df['weight'] = df['resize_events'] * np.exp(-(now - df['last_seen']).dt.days / 7)

Why this matters for optimization:

Utilization patterns shift weekly.
Users adjust scaling rules when workloads misbehave.
Decay prevents old sizing decisions from influencing new recommendations.

Compliance: Per-tenant matrices keep resource/workload behavior isolated for SOC2/HIPAA environments.

2. Build a hybrid behavioral/semantic model for optimization. DevOps workloads vary widely — there are APIs, constant batch processing, and spiky ML inference work.
Consequently, the model needs to understand both the nature of similar workloads and the adjustments engineers make to them.

Collaborative filtering learns human decisions.

Python
model_cf = AlternatingLeastSquares(factors=64, regularization=0.1)
model_cf.fit(user_resource_matrix)

Embeddings capture workload semantics.

Python
resource_desc = workload["instance_type"] + " " + workload["traffic_pattern"]
item_embeddings = model_emb.encode(resource_desc)

Blend into a unified optimization score.

Python
score = 0.65 * cf_score + 0.35 * cosine_similarity(item_embeddings)

Why this hybrid approach works:

CF mirrors real optimization patterns ("teams like you downsize this VM after consistently low CPU").
Embeddings capture resource characteristics, enabling recommendations for new instance families or unknown workloads.
The hybrid ensures stability during high-variance and low-data periods.

3. Deploy a low-latency optimization API with SLOs and observability. These optimization suggestions have to be made in real time, within the dashboards where engineers are analyzing performance.

FastAPI microservice (P99 < 120 ms target):

Python
@app.get("/optimize/{tenant}/{resource}")
async def optimize(tenant, resource):
    rec = hybrid_engine.recommend(tenant, resource)
    return {
        "resource": resource,
        "suggested_actions": rec,
        "explainability": rec.weights  # CPU%, peer similarity, cost delta
    }

Add system observability. Use Prometheus counters:

recsys_latency_ms
recsys_success_total
recsys_failure_total
recsys_cache_hit_ratio

Example recommendations:

"Resize c5.large → c5.medium (avg CPU: 22%, p95 CPU: 41%)."
"Enable HPA (high variance workload detected)."
"Adopt anomaly alerting (noise reduced 40% for similar apps)."

4. Feedback learning, drift detection, and retraining. Resource optimization begins with recognizing what drives performance and cost improvements.

Log impact-aware feedback:

Python
feedback.append({
    "tenant": tenant,
    "resource": resource,
    "action": action,
    "accepted": accepted,
    "cpu_delta": post_cpu - pre_cpu,
    "cost_delta": post_cost - pre_cost
})

Drift detection (seasonality, traffic spikes, new deployments):

Python
drift_detector = drift.ADWIN()
if drift_detector.update(current_avg_cpu):
    trigger_early_retrain()

Nightly retraining via Airflow:

Python
with DAG("retrain_optimizer", schedule_interval="@daily") as dag:
    PythonOperator(task_id="retrain", python_callable=retrain_pipeline)

5. Validate the engine using optimization-centric metrics. Unlike consumer recommenders, success here is not measured by click-throughs.

Offline:
Right-sizing prediction accuracy
Improvement in CPU/memory balance
Recall@5 for optimization suggestions

Online (A/B tests):
% cost reduction per tenant
Reduction in manual resize/edit operations
Alert noise reduction
Improved scaling stability (fewer OOMs, fewer restarts)

Key Takeaways and Future Directions

AI-driven optimization represents a paradigm shift for SaaS/DevOps platforms, since it translates raw infrastructure data into actionable insights that eliminate cloud inefficiencies and variability. By combining behavioral learning with workload semantics, the hybrid model provides meaningful suggestions for rightsizing, scaling, and alarm tuning with significantly better accuracy than traditional rule-based systems. These features are most beneficial in multi-tenant scenarios, where tenant isolation, compliance, and real-time inference requirements shape how the suggestions are constructed.
As the model learns from accepted and rejected suggestions, it aligns more closely with each tenant's operational needs, addressing inefficiency, alert noise, and performance variability. These foundations also lay the groundwork for next-generation optimization engines. Future engines will move beyond recommendations to safe, autonomous changes applied within defined guardrails. Enhanced support for CI/CD pipelines will allow changes to be tracked against deployments or new service launches, while seasonal forecasts and time-series analysis will let platforms anticipate demand peaks in advance. With increasing multi-cloud adoption, engines will also map analogous instance types across AWS, GCP, and Azure, providing truly cloud-agnostic workload optimization. All of this points to a next chapter in which SaaS platforms evolve into intelligent partners that enable self-improving, self-optimizing operations and faster decision-making for operationally confident teams.
I've been covering enterprise AI deployments since Watson was still pretending to revolutionize healthcare, and I've learned to distinguish genuine paradigm shifts from rebranded hype cycles. What's happening with agentic AI in 2025 feels uncomfortably like both. The pitch is seductive: autonomous software agents that plan, reason, and execute complex tasks without constant human supervision. Instead of asking a chatbot for information, you delegate an entire workflow — "book my travel to the conference in Austin, find a hotel near the venue, block my calendar, and brief me on attendees I should meet." The agent figures out the rest. Every major tech company is making noise about this. Salesforce launched Agentforce earlier this year. Microsoft is embedding autonomous capabilities into Copilot. An IBM survey from early 2025 claimed 99% of enterprise AI developers are either building or exploring agents — a number so suspiciously round it should come with an asterisk, but directionally it captures the frenzy. Here's what bothers me: I've seen this movie before. The innovation token collision If you accept Dan McKinley's framework — and after watching it play out for a decade, I do — every engineering organization gets roughly three chances to bet on unproven technology before institutional chaos consumes them. Most companies already burned one token on cloud migration, another on Kubernetes, maybe a third on microservices. Now, leadership wants to add autonomous AI agents into production systems? I spoke with a VP of engineering at a mid-size fintech company in September. They'd been pressured by the board to "demonstrate AI leadership" after a competitor announced an AI agent for customer onboarding. His team spent four months building a proof-of-concept that could autonomously verify documents and initiate account setup. It worked beautifully in demos. In production, it hallucinated routing numbers, approved fraudulent IDs that passed basic checks but failed human scrutiny, and required so much hand-holding that support costs actually increased. "We spent an innovation token we didn't have," he told me. "Now we're stuck maintaining this thing while our payment infrastructure still runs on code from 2019 that desperately needs attention." That's the quiet disaster unfolding across the industry. Agentic AI isn't being deployed alongside boring, reliable infrastructure — it's being forced into companies that haven't mastered the basics. When autonomy meets reality The technical capabilities are real, I'll grant that. Modern large language models have made genuine leaps — better reasoning through chain-of-thought training, expanded context windows that function as working memory, native tool-calling that lets them interact with APIs and databases. IBM's researchers are right that we finally have the ingredients for autonomous agents. But having ingredients doesn't mean you should bake the cake. Gartner published a forecast in June 2025 that should have been a warning shot: they predict over 40% of current agentic AI pilots will be canceled by the end of 2027. Not because the technology failed, but because the business case evaporated under scrutiny. High costs, unclear ROI, misaligned expectations, and operational complexity nobody anticipated during the pitch meeting. I've reviewed half a dozen postmortems from failed agent projects over the past year. 
The pattern is consistent: leadership approves an ambitious vision, engineering builds something that technically works, operations discovers the hidden costs of monitoring and correcting autonomous decisions, and finance eventually kills it when the TCO becomes undeniable.

One retail company tried deploying an agent to manage dynamic pricing across their e-commerce platform. The agent could adjust prices based on inventory, competitor data, and demand signals — tasks too complex for rule-based automation. But it required constant guardrails. When it cut prices too aggressively during a flash sale, margins cratered. When it raised prices to protect inventory during unexpected demand, customers revolted on social media. Eventually, someone had to babysit every decision, defeating the entire purpose.

The agent wasn't wrong, technically. It was optimizing for the metrics it was given. But autonomous optimization in messy real-world contexts produces outcomes humans would never approve — and catching those outcomes in time requires the kind of oversight that negates the efficiency gains.

The boring alternative nobody wants to hear

Here's the argument I've been making to skeptical CTOs: before you deploy autonomous agents, do you have boring infrastructure working flawlessly? Can your database team handle routine failover without escalating to management? Do your deployment pipelines have enough observability that you can trace production issues to specific commits? When was the last time you reviewed your API rate limits and caching strategies? If the answer to any of those is "we should really get around to that," you're not ready for agentic AI.

Stripe processed over a trillion dollars in payments in 2024 with five-nines uptime because their database team is pathologically obsessed with reliability. They instrument everything. They document failure modes. They celebrate incident-free quarters more than feature launches. That operational maturity is why Stripe could deploy sophisticated AI features if they chose to — but notice they're not rushing to make their payment decisioning fully autonomous. They understand the difference between augmentation and abdication.

Shopify, meanwhile, is still running a modular monolith for its core checkout flows. One repository, one database, one CI/CD pipeline. They've resisted the microservices frenzy that consumed competitors, and they're certainly not handing autonomous agents the keys to their merchant platform. I asked a Shopify engineer about their AI strategy at a conference in March. "We're using models for recommendations, search ranking, fraud signals — all supervised use cases where we control the blast radius," they said. "Autonomy is a privilege you earn after you've proven you can keep the lights on."

That's the perspective shift the industry needs. Autonomy isn't the goal — reliability is. Agents are tools, not strategies.

The governance gap

Gartner's optimistic projections say 15% of routine work decisions will be made autonomously by AI agents by 2028, up from basically zero today. In customer service specifically, they're forecasting 80% autonomous resolution of common requests by 2029, potentially cutting service costs by 30%. Those numbers assume everything goes according to plan.

But I've covered enough security incidents and compliance failures to know what happens when autonomous systems make unexpected decisions. Who's liable when an AI agent approves a loan that violates fair lending regulations?
Who gets fired when an agent mishandles PII in a way that triggers GDPR fines? Who explains to customers why their insurance claim was denied by a system that can't articulate its reasoning in legal terms?

The governance frameworks don't exist yet. Most companies are building agents faster than they're building oversight. That gap will close eventually — probably after a few high-profile disasters that force regulatory intervention. In the meantime, we're in a period of maximum risk and minimum accountability.

I spoke with a former Google engineer now consulting on AI safety who put it bluntly: "Every autonomous agent is a liability waiting to materialize. If you can't explain what it will do in adversarial conditions, you shouldn't deploy it."

What smart teams are actually doing

The companies I trust are taking a radically different approach. They're using AI for augmentation within tightly scoped guardrails, not full autonomy. One healthcare platform I've been tracking uses AI to surface patient risk factors for clinician review, but the actual treatment decisions remain human. An insurance company is testing agents that draft policy language, but every output goes through legal review before publication. A logistics startup built an agent that proposes optimal delivery routes, but dispatchers have final approval and can override with context that the model doesn't have (a minimal sketch of this pattern follows below).

These aren't sexy use cases. They won't get glowing coverage in tech press. But they're the ones that will still be running in three years when half the current crop of autonomous agent projects have been quietly shelved.

This approach mirrors what Netflix figured out with microservices — they only succeeded because they'd spent years building the operational muscle to handle distributed complexity. Chaos Monkey, Spinnaker, comprehensive observability, self-healing infrastructure. That foundation didn't come cheap or fast, and most companies simply can't afford to replicate it.

The same logic applies to agentic AI. The infrastructure requirements aren't just technical — they're cultural. You need teams comfortable with probabilistic systems, incident response processes that account for emergent failures, and leadership willing to kill projects that aren't working instead of throwing good money after bad.

The real test ahead

We're about 18 months into the current agent hype cycle, which means we're approaching the trough of disillusionment. Gartner's 40% cancellation rate prediction might actually be conservative. I expect we'll see a wave of quiet project shutdowns in mid-2026 as companies realize the operational burden exceeds the value.

The survivors will be the ones who started with boring, reliable infrastructure and added autonomy incrementally. The ones who treated agent capabilities as a privilege earned through operational excellence, not a right granted by vendor promises.

I think about Cloudflare's November 2025 outage whenever someone pitches me on the transformative potential of autonomous agents. A routine permissions change doubled a config file size, hit an undocumented limit, and took down a chunk of the internet. Not because their engineering was bad — Cloudflare has world-class talent. But because complex systems have failure modes that nobody predicts until they happen.

Now imagine that same dynamic with an autonomous agent making decisions across your business without clear audit trails. How confident are you that your team would even notice something was wrong before customers started complaining?
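What does augmentation within guardrails look like in practice? Here is a minimal, hypothetical Python sketch of the shape implied by the logistics example above, where the agent proposes and a human approves. The names (Proposal, propose_route, dispatch) and the stand-in model call are illustrative assumptions, not any vendor's API.

    # Hypothetical sketch: the agent proposes, a human approves or overrides.
    from dataclasses import dataclass

    @dataclass
    class Proposal:
        route: list       # ordered delivery stops suggested by the agent
        rationale: str    # the agent's stated reason for the ordering

    def propose_route(stops):
        # Stand-in for a model call that suggests an ordering of delivery stops.
        return Proposal(route=sorted(stops), rationale="minimizes estimated drive time")

    def dispatch(stops, approve):
        proposal = propose_route(stops)
        # The dispatcher has final say and can reject the proposal using context
        # the model does not have: road closures, customer preferences, and so on.
        return proposal.route if approve(proposal) else stops

The structure is the point: the agent can be wrong, but nothing it proposes takes effect until a human with local context signs off, which keeps the blast radius contained.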
Choosing boring, again

The irony is that the best use cases for AI in 2025 remain the least autonomous ones. Code completion that speeds up development. Search ranking that surfaces better results. Fraud detection that flags suspicious patterns for human review. These applications add enormous value without the governance nightmare of full autonomy.

But they don't generate conference keynotes or VC funding rounds. So the industry keeps chasing the bleeding edge, burning innovation tokens it doesn't have on technology it isn't ready for.

I've been doing this long enough to know how it ends. Some companies will succeed with agentic AI — the ones with Netflix-caliber operational maturity and realistic expectations. Most will fail quietly, their postmortems locked behind NDA walls where they can't warn others.

The winners, as always, will be the teams that picked boring reliability over brilliant autonomy. That chose incremental progress over revolutionary transformation. That understood their job isn't to deploy cutting-edge AI — it's to keep the business running while carefully expanding what's possible.

McKinley's innovation token framework has aged remarkably well precisely because human nature hasn't changed. We're still drawn to shiny new capabilities, still tempted to bet the farm on unproven technology, still learning the same lessons the hard way.

Agentic AI will eventually matter. But not yet. Not for most companies. And probably not in the form we're currently building. For now, the smartest move remains the same one that's worked for a decade: choose boring technology, instrument everything, document your failures, and earn the right to experiment through flawless operation of the basics.

The agents can wait. Your production database can't.