
AI/ML

Artificial intelligence (AI) and machine learning (ML) are two fields that work together to create computer systems capable of perception, recognition, decision-making, and translation. Separately, AI is the ability for a computer system to mimic human intelligence through math and logic, and ML builds off AI by developing methods that "learn" through experience and do not require instruction. In the AI/ML Zone, you'll find resources ranging from tutorials to use cases that will help you navigate this rapidly growing field.

Latest Premium Content
Trend Report: Generative AI
Refcard #158: Machine Learning Patterns and Anti-Patterns
Refcard #401: Getting Started With Agentic AI

DZone's Featured AI/ML Resources

MCP Client Agent: Architecture and Implementation

By Venkata Buddhiraju
In this post, we’ll go deeper into the overall MCP architecture and client flow, and we’ll also implement an MCP client agent. The goal is to provide some clarity on “What happens when you submit your request to MCP powered with LLMs”—breaking down what’s actually going on behind the scenes. There are plenty of articles out there about building MCP servers; for reference, here is an official example from the MCP website. In this post, though, we’ll focus only on implementing an MCP client agent that can programmatically connect to MCP servers.

High-Level MCP Architecture

MCP Components
- Host: AI code editors (like Claude Desktop or Cursor) that users directly interact with, serving as the main interface and system manager.
- Clients: Intermediaries that maintain connections between hosts and MCP servers, handling communication protocols and data flow.
- Servers: Components that provide specific functionalities, data sources, and tools to AI models through standardized interfaces.

Without further delay, let's get to the core of this article: what are MCP client agents?

Custom MCP Clients: Programmatically Invoking MCP Servers

Most of the use cases we've seen so far involve using MCP within an AI-powered IDE. In these setups, users configure MCP servers inside the IDE and interact with them through a chat interface. In this case, the chat interface acts as the MCP client or host. But what if you want to invoke MCP servers programmatically from your own services? That’s where the real strength of MCP comes in. It provides a standardized way to supply context and tools to your LLMs. Instead of writing custom code to integrate with every external API, resource, or file, you can focus on packaging the right context and capabilities, then hand them off to the LLM to reason over and act on.

MCP Client Agent Workflow With Multiple MCP Servers

The diagram illustrates how custom MCP clients/AI agents process user requests through MCP servers. Below is a step-by-step breakdown of this interaction flow:

Step 1: User initiates request. The user asks a query or submits a request through an IDE, browser, or terminal. The query is received by the custom MCP client/agent interface.

Step 2: MCP client and server connection. The MCP client connects to the MCP server. It can connect to multiple servers at a time and requests tools from these servers. The servers send back the supported list of tools and functions.

Step 3: AI processing. Both the user query and the tools list are sent to the LLM (e.g., OpenAI). The LLM analyzes the request, suggests an appropriate tool and input parameters, and sends the response back to the MCP client.

Step 4: Function execution. The MCP client calls the selected function in the MCP server with the suggested parameters. The MCP server receives the function call and processes the request; depending on the request, the corresponding tool in a specific MCP server will get called. Please make sure the tool names across your MCP servers are distinct to avoid LLM hallucination and non-deterministic responses. The server may interact with databases, external APIs, or file systems to process the request.

Step 5 (optional): Improve the response using the LLM. The MCP server returns the function execution response to the MCP client. The MCP client can then forward that response to the LLM for refinement; the LLM converts the technical response to natural language or creates a summary.

Step 6: Respond to the user. The final processed response is sent back to the user through the client interface, and the user receives the answer to their original query.
Custom MCP Client Implementation / Source Code

Connecting to MCP servers: As discussed earlier, an MCP client can connect to multiple MCP servers. This behavior can be simulated in a custom MCP client implementation. Note: To reduce hallucinations and ensure consistent results, it’s recommended to avoid tool name collisions across multiple MCP servers.

MCP server transport options: MCP servers support two types of transport mechanisms:
- STDIO – for local process communication
- SSE – for HTTP/WebSocket-based requests

Connecting to STDIO Transport

Python

async def connect_to_stdio_server(self, server_script_path: str):
    """Connect to an MCP stdio server"""
    is_python = server_script_path.endswith('.py')
    is_js = server_script_path.endswith('.js')
    if not (is_python or is_js):
        raise ValueError("Server script must be a .py or .js file")

    command = "python" if is_python else "node"
    server_params = StdioServerParameters(
        command=command,
        args=[server_script_path],
        env=None
    )

    stdio_transport = await self.exit_stack.enter_async_context(stdio_client(server_params))
    self.stdio, self.write = stdio_transport
    self.session = await self.exit_stack.enter_async_context(ClientSession(self.stdio, self.write))
    await self.session.initialize()
    print("Initialized stdio...")

Connecting to SSE Transport

Python

async def connect_to_sse_server(self, server_url: str):
    """Connect to an MCP server running with SSE transport"""
    # Store the context managers so they stay alive
    self._streams_context = sse_client(url=server_url)
    streams = await self._streams_context.__aenter__()

    self._session_context = ClientSession(*streams)
    self.session: ClientSession = await self._session_context.__aenter__()
    await self.session.initialize()
    print("Initialized SSE...")

Get Tools and Process the User Request With the LLM and MCP Servers

Once the servers are initialized, we can fetch tools from all available servers and process the user query, following the steps described above:

Python

stdio_tools = await std_server.list_tools()
sse_tools = await sse_server.list_tools()
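The process_user_query function shown next expects available_tools in the OpenAI function-calling format and a tool_session_map from tool name to the session that owns the tool, but the article doesn't show how to build them. Here is a minimal sketch, assuming the MCP Python SDK's list_tools() result exposes name, description, and inputSchema on each tool; build_tool_registry is a hypothetical helper name, not part of the article's source code:

Python

async def build_tool_registry(sessions):
    """Collect tools from every connected MCP session.

    Returns (available_tools, tool_session_map): available_tools in the OpenAI
    function-calling format, tool_session_map mapping tool name -> owning session.
    """
    available_tools = []
    tool_session_map = {}
    for session in sessions:
        result = await session.list_tools()  # e.g., the stdio and SSE sessions above
        for tool in result.tools:
            if tool.name in tool_session_map:
                # Duplicate tool names across servers invite non-deterministic routing
                raise ValueError(f"Tool name collision: {tool.name}")
            available_tools.append({
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description or "",
                    "parameters": tool.inputSchema,  # JSON Schema for the tool's arguments
                },
            })
            tool_session_map[tool.name] = session
    return available_tools, tool_session_map

Both values can then be passed straight into the process_user_query function below.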
""" model_name = "gpt-35-turbo" api_version = "2022-12-01-preview" # On first user query, initialize messages if empty self.messages = [ { "role": "user", "content": user_query } ] # Initialize your LLM - e.g., Azure OpenAI client openai_client = AzureOpenAI( api_version=api_version, azure_endpoint=<OPENAI_ENDPOINT>, api_key=<API_KEY>, ) # send the user query to the LLM along with the available tools from MCP Servers response = openai_client.chat.completions.create( messages=self.messages, model=model_name, tools=available_tools, tool_choice="auto" ) llm_response = response.choices[0].message # append the user query along with LLM response self.messages.append({ "role": "user", "content": user_query }) self.messages.append(llm_response) # Process respose and handle tool calls if azure_response.tool_calls: # assuming only one tool call suggested by LLM or keep in for loop to go over all suggested tool_calls tool_call = azure_response.tool_calls[0] # tool call based on the LLM suggestion result = await tool_session_map[tool_call.function.name].call_tool( tool_call.function.name, json.loads(tool_call.function.arguments) ) # append the response to messages self.messages.append({ "role": "tool", "tool_call_id": tool_call.id, "content": result.content[0].text }) # optionally send the response to LLM to summarize azure_response = openai_client.chat.completions.create( messages=self.messages, model=model_name, tools=available_tools, tool_choice="auto" ).choices[0].message Hopefully, this gave you a solid starting point for implementing MCP clients. In future posts, we'll explore how to host MCPs for remote access using tools like Kubernetes and Docker. If you’d like to dive deeper right away, check out this sample source code, which includes both an MCP client agent and server implementation. More
Debunking LLM Intelligence: What's Really Happening Under the Hood?

By Frederic Jacquet
Large language models (LLMs) possess an impressive ability to generate text, poetry, and code, and even hold complex conversations. Yet a fundamental question arises: do these systems truly understand what they are saying, or do they merely imitate a form of thought? Is it a simple illusion, an elaborate statistical performance, or are LLMs developing a form of understanding, or even reasoning? This question is at the heart of current debates on artificial intelligence.

On one hand, the achievements of LLMs are undeniable: they can translate languages, summarize articles, draft emails, and even answer complex questions with surprising accuracy. This ability to manipulate language with such ease could suggest genuine understanding. On the other hand, analysts emphasize that LLMs are first and foremost statistical machines, trained on enormous quantities of textual data. They learn to identify patterns and associations between words, but this does not necessarily mean they understand the deep meaning of what they produce. Don’t they simply reproduce patterns and structures they have already encountered, without true awareness of what they are saying?

The question remains open and divides researchers. Some believe that LLMs are on the path to genuine understanding, while others think they will always remain sophisticated simulators, incapable of true thought. Regardless, the question of LLM comprehension raises philosophical, ethical, and practical issues that shape how we can use them. It also appears more useful than ever to demystify the human "thinking" capabilities sometimes wrongly attributed to them, whether out of excessive enthusiasm or simply a lack of knowledge about the underlying technology.

This is the very point a team of researchers at Apple recently demonstrated in their study "The Illusion of Thinking." They observed that despite LLMs' undeniable progress in performance, their fundamental limitations remained poorly understood. Critical questions persisted, particularly regarding their ability to generalize reasoning or handle increasingly complex problems.

"This finding strengthens evidence that the limitation is not just in problem-solving and solution strategy discovery but also in consistent logical verification and step execution limitation throughout the generated reasoning chains" (example of a prescribed algorithm for Tower of Hanoi, “The Illusion of Thinking,” Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar, Apple).

To better get at the essence of LLMs, let’s explore their internal workings and establish fundamental distinctions with human thought. To do this, let’s use the concrete example of this meme ("WHAT HAPPENED TO HIM? - P > 0.05") to illustrate both the technological prowess of LLMs and the fundamentally computational nature of their operation, which is essentially distinct from human consciousness.

The 'P > 0.05' Meme Explained Simply by an LLM

I asked an LLM to explain this meme to me simply; its response appeared as a screenshot in the original post.

The LLM Facing the Meme: A Demonstration of Power

If we look closely, for a human, understanding the humor of this meme requires knowledge of the Harry Potter saga, basic statistics, and the ability to get the irony of the funny juxtaposition. Now, when the LLM was confronted with this meme, it demonstrated an impressive ability to decipher it.
It managed to identify the visual and textual elements, recognize the cultural context (the Harry Potter scene and the characters), understand an abstract scientific concept (the p-value in statistics and its meaning), and synthesize all this information to explain the meme's humor. Let's agree that the LLM's performance in doing the job was quite remarkable. It could, at first glance, suggest a deep "understanding," or even a form of intelligence similar to ours, capable of reasoning and interpreting the world.

The Mechanisms of 'Reasoning': A Computational Process

However, this performance does not result from 'reflection' in the human sense. The LLM does not 'think'; it has no consciousness, no introspection, and even less subjective experience. What we perceive as reasoning is, in reality, a sophisticated analysis process based on algorithms and a colossal amount of data.

The Scale of Training Data

An LLM like Gemini or ChatGPT is trained on massive volumes of data, reaching hundreds of terabytes, including billions of text documents (books, articles, web pages) and billions of multimodal elements (captioned images, videos, audio), and the resulting models contain billions of parameters. This knowledge base is comparable to a gigantic, digitized, and indexed library. It includes encyclopedic knowledge of the world, entire segments of popular culture (like the Harry Potter saga), scientific articles, movie scripts, online discussions, and much more. It’s this massive and diverse exposure to information that allows it to recognize patterns, correlations, and contexts.

The Algorithms at Work

To analyze the meme, several types of algorithms come into play:
- Natural language processing (NLP): The core of interaction with text. NLP allows the model to understand the semantics of phrases ('WHAT HAPPENED TO HIM?') and to process textual information.
- Visual recognition / OCR (optical character recognition): For image-based memes, the system uses OCR algorithms to extract and 'read' the text present in the image ('P > 0.05'). Concurrently, visual recognition allows for the identification of graphic elements: the characters' faces, the specific scene from the movie, and even the creature's frail nature.
- Transformer neural networks: These are the main architectures of LLMs. They are particularly effective at identifying complex patterns and long-term relationships in data. They allow the model to link 'Harry Potter' to specific scenes and to understand that 'P > 0.05' is a statistical concept.

The Meme Analysis Process, Step by Step

When faced with the meme, the LLM carries out a precise computational process:
- Extraction and recognition: The system identifies keywords, faces, the scene, and technical text.
- Activation of relevant knowledge: Based on these extracted elements, the model 'activates' and weighs the most relevant segments of its knowledge. It establishes connections with its data on Harry Potter (the 'limbo,' Voldemort's soul fragment), statistics (the definition of the p-value and the 0.05 threshold), and humor patterns related to juxtaposition.
- Response synthesis: The model then generates a text that articulates the humorous contrast. It explains that the joke comes from Dumbledore's cold and statistical response to a very emotional and existential question, highlighting the absence of 'statistical significance' of the creature's state. This explanation is constructed by identifying the most probable and relevant semantic associations, learned during its training.
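To make "the most probable semantic associations" concrete, here is a tiny, purely illustrative sketch (invented vocabulary and numbers, not from the article) of the core operation under the hood: turning raw scores over a vocabulary into a probability distribution and picking a likely next token.

Python

import math

# Toy vocabulary and raw scores (logits) a model might assign to the next token
vocab = ["wand", "p-value", "banana", "significance"]
logits = [2.1, 3.7, 0.2, 3.2]  # invented numbers, purely illustrative

# Softmax: convert logits into a probability distribution
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

for token, p in zip(vocab, probs):
    print(f"{token:>13}: {p:.1%}")

# Greedy decoding picks the highest-probability token; real models typically
# sample from this distribution, one token at a time, to build a full answer.
print("Next token:", vocab[probs.index(max(probs))])

Nothing in this loop knows what a p-value or Voldemort is; it only ranks continuations by learned probability, which is exactly the article's point.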
The Fundamental Difference: Statistics, Data, and Absence of Consciousness

This LLM's 'reasoning,' or rather its mode of operation, therefore results from a series of complex statistical inferences based on correlations observed in massive quantities of data. The model does not 'understand' the abstract meaning, emotional implications, or moral nuances of the Harry Potter scene. It just predicts the most probable sequence, the most relevant associations, based on the billions of parameters it has processed.

This fundamentally contrasts with human thought. Humans possess consciousness, lived experience, and emotions. It’s with these that we create new meaning rather than simply recombining existing knowledge. We apprehend causes and effects beyond simple statistical correlations. This is what allows us to understand Voldemort's state, the profound implications of the scene, and the symbolic meaning of the meme. And above all, unlike LLMs, humans act with intentions, desires, and beliefs. LLMs merely execute a task based on a set of rules and probabilities. While LLMs are very good at manipulating very large volumes of symbols and representations, they lack the understanding of the real world, common sense, and consciousness inherent in human intelligence, not to mention the biases, unexpected behaviors, or 'hallucinations' they can generate.

Conclusion

Language models are tools with huge computational power, capable of performing tasks that mimic human understanding in an impressive way. However, their operation relies on statistical analysis and pattern recognition within vast datasets, not on consciousness, reflection, or an inherently human understanding of the world. Understanding this distinction is important when the technological ecosystem exaggerates supposed reasoning capabilities. In this context, adopting a realistic view allows us to fully leverage the capabilities of these systems without attributing qualities to them that they don't possess. Personally, I’m convinced that the future of AI lies in intelligent collaboration between humans and machines, where each brings its unique strengths: consciousness, creativity, and critical thinking on one side; computational power, speed of analysis, and access to information on the other.
From OCR Bottlenecks to Structured Understanding
By Pier-Jean MALANDRINO
It’s Not Magic. It’s AI. And It’s Brilliant.
By Ananya K V
Building Smarter Chatbots: Using AI to Generate Reflective and Personalized Responses
By Surabhi Sinha
Elevating LLMs With Tool Use: A Simple Agentic Framework Using LangChain

Large language models (LLMs) are significantly changing the way we interact with data and generate insights. But their real superpower lies in the ability to connect with external tools. Tool calling turns LLMs into agents capable of browsing the web, querying databases, and generating content — all from a simple natural language prompt.

In this article, we go one step beyond single-tool agents and show how to build a multi-tool LangChain agent. We’ll walk through a use case where the agent:
- Picks up the latest trending topics in an industry (via a Google Trends-like tool)
- Searches for up-to-date articles on those topics
- Writes a concise, informed digest for internal use

This architecture is flexible, extensible, and relevant for any team involved in market research, content strategy, or competitive intelligence.

Why Tool Use Matters

LLMs are trained on static data. They can't fetch live information or interact with external APIs unless we give them tools. Think of tool use like giving your AI a "superpower" — the ability to Google something, call a database, or access a calendar. LangChain provides a flexible framework to enable these capabilities via tools and agents.

Use Case: From Trend to Insight

Imagine you're part of a product or marketing team, and you want to stay updated on developments in the electric vehicle (EV) industry. Every week, you’d like a short, insightful write-up on the top trends, based on the latest data and news. With a multi-tool LLM agent, this task can be automated. The agent performs three tasks:
- Discover: Identify top-trending EV-related topics (e.g., "solid-state batteries," "charging infrastructure").
- Research: Search the web for recent news on those topics.
- Synthesize: Summarize the findings in an internal digest.

Architecture Overview

Here’s how the system is structured: the flow starts when a user asks a question or gives a topic/industry. The agent first invokes the GoogleTrendsFetcher to identify emerging keywords, then uses the Search tool to retrieve relevant articles, and finally synthesizes a concise summary using the LLM. Each tool acts like a specialized worker, and the agent orchestrates their actions based on the task at hand. This modular approach allows for easy integration, customization, and scaling of the system for broader enterprise use cases.

Tools Used
- LLM (ChatOpenAI): The reasoning engine that decides what to do and synthesizes the final output.
- Tool 1 (GoogleTrendsFetcher): A wrapper around a trends API (real or mocked) that returns current hot topics for a domain.
- Tool 2 (DuckDuckGoSearch or TavilySearch): A tool that returns web search results for a given query.

Code Overview:

Python

from langchain.agents import initialize_agent, Tool
from langchain.chat_models import ChatOpenAI
from custom_tools import GoogleTrendsFetcher, DuckDuckGoSearchRun

# Initialize LLM
llm = ChatOpenAI(temperature=0, model="gpt-4")

# Define tools
google_trends = Tool(
    name="GoogleTrends",
    func=GoogleTrendsFetcher().run,
    description="Fetch trending topics for a specific industry"
)

search_tool = Tool(
    name="Search",
    func=DuckDuckGoSearchRun().run,
    description="Search web for news or updates on any topic"
)

# Agent with multi-tool capabilities
tools = [google_trends, search_tool]
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

# Ask agent to perform the end-to-end task
question = "Give me a digest of the top 3 emerging trends in the EV industry this week."
response = agent.run(question)
print(response)
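The snippet above imports GoogleTrendsFetcher from a custom_tools module that the article doesn't show. As a rough, mocked sketch (the class body and its data are assumptions, not the author's implementation), all the Tool wrapper needs is an object with a run method:

Python

class GoogleTrendsFetcher:
    """Hypothetical stand-in for a trends API wrapper; swap the mocked data
    for a real trends service (e.g., pytrends) in practice."""

    _MOCK_TRENDS = {
        "electric vehicles": ["solid-state batteries", "charging infrastructure", "EV tax credits"],
    }

    def run(self, industry: str) -> str:
        # Return a comma-separated list of trending topics for the given industry
        topics = self._MOCK_TRENDS.get(industry.strip().lower(), [])
        return ", ".join(topics) if topics else f"No trend data available for '{industry}'."


if __name__ == "__main__":
    print(GoogleTrendsFetcher().run("electric vehicles"))

Because the agent only calls run with a string and reads back a string, you can replace the mock with any data source without touching the agent code.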
One major benefit of using LangChain’s agent architecture is interpretability. Developers and analysts can trace each tool invocation, see intermediate decisions, and validate results step by step. This not only builds trust in outputs but also helps debug failures or hallucinations — an essential feature when deploying such agents in business-critical workflows.

Prompting Tips for Multi-Step Reasoning

To make full use of multi-tool capabilities, your prompt should:
- Specify a goal that involves multiple steps
- Clarify the domain (e.g., electric vehicles, fintech)
- Ask for structured output (e.g., bullet points, digest format)

Example prompt: "Act as a market intelligence agent. First, fetch the top 3 trending topics in the electric vehicle industry. Then, search for recent news on each topic. Finally, write a short digest summarizing the findings."

Benefits of Multi-Tool Agents
- Automation of research pipelines: Saves hours of manual work
- Cross-domain application: Replace EVs with any industry — AI, finance, real estate
- Real-time awareness: Leverages current data rather than static knowledge
- High-quality summarization: Converts raw data into valuable narratives

Conclusion: From a Few Tools to Autonomous Workflows

In this walkthrough, we've explored how combining multiple tools within LangChain unlocks true agentic power. Instead of just fetching search results, our agent plans a multi-step workflow: trend detection, article discovery, and insight generation. This is a general pattern you can adapt:
- Swap GoogleTrendsFetcher with Twitter trends, internal dashboards, or RSS feeds
- Replace the search tool with a database query tool
- Use the final output in newsletters, Slack updates, or dashboards

Some potential use cases for the multi-agentic framework:
- Skill gap analyzer: Reads performance reviews, looks at feedback, goes through the list of available courses, and matches the one that best suits the user's upskilling needs. Outcome: personalized employee learning plans based on performance and business goals.
- Automating IT ticket resolution: One agent could summarize the tickets, followed by another agent looking at past resolutions for similar ones, and then a third agent implementing the potential fix.

As LLMs evolve into core infrastructure, the next frontier will be defined by intelligent agents that can plan, act, and learn from their actions, mimicking real-world decision-making and enabling deeper automation across industries.

By Arjun Bali
Driving Streaming Intelligence On-Premises: Real-Time ML With Apache Kafka and Flink

Lately, companies, in their efforts to engage in real-time decision-making by exploiting big data, have been looking for a suitable architecture for this data as quickly as possible. With many companies, including SaaS users, choosing to deploy their own infrastructure entirely on their own, the combination of Apache Flink and Kafka offers low-latency data pipelines that are built for complete reliability. Small and medium-sized enterprises often have a number of challenges to overcome when using cloud service providers, particularly due to the financial and technical constraints involved. One major issue is the complexity of cloud pricing models, which can lead to unexpected costs and budget overruns. This article explores how to design, build, and deploy a predictive machine learning (ML) model using Flink and Kafka in an on-premises environment to power real-time analytics.

Why Apache Kafka and Apache Flink?

Apache Kafka’s architectural versatility makes it exceptionally suitable for streaming data at vast ‘internet’ scale, ensuring the fault tolerance and data consistency crucial for supporting mission-critical applications. Flink is a high-throughput, unified batch and stream processing engine, renowned for its capability to handle continuous data streams at scale. It seamlessly integrates with Kafka and offers robust support for exactly-once semantics, ensuring each event is processed precisely once, even amidst system failures. Flink is therefore a natural choice as a stream processor for Kafka, and it enjoys significant success and popularity as a tool for real-time data processing. Together, they form a scalable and fault-tolerant foundation for data pipelines that can feed machine-learning models in real time.

Use Case: Predictive Maintenance in Manufacturing

Consider a manufacturing facility where IoT sensors gather data from machinery to measure temperature, vibration, and pressure. With the help of this sensor data, we want to minimize downtime through real-time machine failure prediction and alerting.

Architecture Overview
- Data ingestion (Apache Kafka)
- Stream processing and feature engineering (Apache Flink)
- Model serving (Flink + embedded ML or an external model server)
- Real-time dashboard or alert system

Setting Up Kafka and Flink On-Prem

Install Apache Kafka, version 3.8, on dedicated machines. Relying on ZooKeeper for an operational multi-node Kafka cluster introduced complexity and could be a single point of failure. Kafka’s reliance on ZooKeeper for metadata management was eliminated by introducing the Apache Kafka Raft (KRaft) consensus protocol. This removes the need to run and configure two distinct systems — ZooKeeper and Kafka — and significantly simplifies Kafka’s architecture by moving metadata management into Kafka itself. Configure Kafka topics for each data stream and tune replication and partition settings for fault tolerance.

To set up an Apache Flink cluster on-premises, we first have to prepare the environment: Java installed on all nodes, network connectivity, SSH key-based (passwordless) authentication, and so on. The next step is to configure the cluster: flink-conf.yaml should be edited on the master node and subsequently on the worker nodes, ensuring they are configured to connect to the master. The next step is to stream real-time data from Kafka to Flink for processing. With Flink version 1.18.1 onwards, we can directly consume data from a Kafka topic without an additional connector.
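As a rough illustration of that Kafka-to-Flink hookup, here is a minimal PyFlink sketch that reads the sensor topic as a string stream; the broker address, topic, and consumer group are placeholders, and it assumes the Kafka connector jar is available to the Flink runtime:

Python

from pyflink.common import WatermarkStrategy
from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.kafka import KafkaSource, KafkaOffsetsInitializer

env = StreamExecutionEnvironment.get_execution_environment()

# Kafka source for the raw IoT sensor readings (JSON strings)
source = (
    KafkaSource.builder()
    .set_bootstrap_servers("kafka-broker-1:9092")   # placeholder broker address
    .set_topics("iot-sensor-readings")              # placeholder topic name
    .set_group_id("predictive-maintenance")         # placeholder consumer group
    .set_starting_offsets(KafkaOffsetsInitializer.latest())
    .set_value_only_deserializer(SimpleStringSchema())
    .build()
)

# Consume the topic as a DataStream; downstream operators would parse the JSON,
# engineer features, and invoke the ML model for failure prediction
sensor_stream = env.from_source(source, WatermarkStrategy.no_watermarks(), "kafka-sensor-source")
sensor_stream.print()

env.execute("sensor-ingestion-job")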
Designing the Data Pipeline

To design the data pipeline in a nutshell, we start by defining the topics on the multi-node Kafka cluster and then ingest real-time or simulated IoT sensor data in JSON or Avro format into those topics. Next, we use Flink’s DataStream API to consume, parse, and process the Kafka messages fetched from the topics.

Integrating Machine Learning Models With Apache Flink

Apache Flink is a great stream processing engine for scaling ML models in real-time applications, as it supports high-throughput, low-latency data processing. Flink’s distributed architecture allows it to scale horizontally across clusters of machines, so ML inference pipelines can be scaled to handle larger throughput simply by adding resources (CPU, memory, and nodes). We can embed trained ML models (from frameworks like TensorFlow, PyTorch, XGBoost, etc.) into Flink jobs. There are typically two main approaches to wiring in models: model inference inside the Flink pipeline, or model serving with external systems. In the first approach, trained ML models (built with libraries like TensorFlow, PyTorch, or scikit-learn) are exported and loaded into Flink jobs; these models are often serialized and used for inference within Flink’s operators or functions. In the second approach, ML inference is offloaded to external model servers/services like NVIDIA Triton, and Flink interacts with these services via asynchronous I/O to keep the pipeline non-blocking and scalable.

Real-Time Metrics Evaluation and System Tracking

For model monitoring, Grafana and Prometheus can be a powerful combination: Prometheus handles data collection and storage, while Grafana handles visualization and alerting. We need to set up a complete ML model monitoring pipeline using Prometheus and Grafana. Prometheus can collect and store metrics from the ML model integrated with Flink jobs, exposing them via an HTTP endpoint. Grafana then connects to Prometheus and visualizes these metrics in real-time dashboards.

Conclusion

As organizations look to capture real-time insights from the data generated within their own environments, deploying on-premises streaming intelligence is not just a technical solution but also a strategic advantage. Apache Kafka’s high-efficiency data ingestion capabilities, combined with Apache Flink’s powerful stream processing and support for real-time machine learning, allow businesses to establish an intelligent pipeline with low latency and high throughput entirely within enterprise confines. This design not only guarantees data sovereignty and compliance but also enables continuous model inference, adaptive decision-making, and fast response to dynamic events.

In this article, I have outlined high-level concepts, but implementing them will involve numerous steps, from setting up the environment to achieving the desired outcomes. There are also numerous technical problems to solve, including state management in Flink for complex ML models, low-latency predictions at scale, synchronization of model updates, and more. Thank you for reading! If you found this article valuable, please consider liking and sharing it.

By Gautam Goswami
Why 99% Accuracy Isn't Good Enough: The Reality of ML Malware Detection

The threat of malware in enterprises is evolving each year. As enterprises expand their digital footprint through remote work and cloud adoption, their attack surface increases, making them more vulnerable to targeted malware campaigns. The FBI’s 2023 Internet Crime Report showed that Business Email Compromise (BEC) scams alone caused over USD 2.9 billion in losses. Investment fraud losses also rose by 38% to USD 4.57 billion, and ransomware caused USD 59.6 million in losses. Other reports paint similarly bleak pictures of the state of enterprise security today. The 2024 IBM Cost of a Data Breach Report shows that the average cost of a data breach jumped 10% to USD 4.88 million. It also shows that organizations using AI in incident prevention saved USD 2.2 million on average. More than half of breached organizations are experiencing severe security staffing shortages, a 26.2% increase from last year. AI tools can help fill the gap.

Malware Detection Techniques

Digital forensic techniques are traditionally signature-based, using static hashes like SHA-256 and MD5, byte pattern matching techniques like YARA rules, or digests based on static analysis like import hashes. Threat actors evade these techniques by deploying polymorphic malware that changes its code structure with each infection while maintaining functionality. Ransomware also encrypts its payload with different keys each time it spreads. In general, signatures require prior knowledge of malware and are usually implemented by matching against datasets of known good or bad signatures.

Machine Learning Approaches

Modern security software employs machine learning approaches to detect unknown and unseen malware. CrowdStrike’s 2024 Global Threat Report highlights the increasing use of ML techniques to detect novel threats, uncover hidden patterns, and analyze the large-scale, evolving datasets associated with modern attacks, including cloud-conscious intrusions. Generative AI tools have lowered the entry barrier to the threat landscape for less sophisticated threat actors and facilitate social engineering and information operations campaigns. This reinforces the need for AI-based counter techniques to combat these new attack vectors. Recent neural network-based malware detection techniques using transfer learning with CNNs and LSTMs have yielded models with over 99% accuracy and over 99% precision. In this article, we’ll show why this isn’t sufficient in practice by discussing how the base rate (the proportion of malware among the samples evaluated) is a critical factor in determining the rate of false positive detections. We’ll also discuss how these challenges can be mitigated in practice.

Evaluating Machine Learning Models

Key Metrics and Terminology

The performance of any machine learning classifier is generally described with terms like confusion matrix, precision, and recall. Let's understand their meaning with a simple example. Suppose an ML model predicts whether a particular file is malware or not. We ran the model on 20 files, of which it predicted 4 as malware and 16 as not malware. Of the 4 files predicted as malware, only 3 actually are malware. We also know that there are a total of 6 malware files and 14 regular files. The following table represents this example:

Total Files = 20           Predicted Malware    Predicted Not Malware
Actual Malware = 6                  3                      3
Actual Not Malware = 14             1                     13

Now we define the following terms:
- Confusion matrix: The matrix shown above is the confusion matrix. It is the main performance summary for any classification model.
- Precision: As the name suggests, it tells how precise the algorithm is. It is the ratio of true positives to all files predicted as malware; in the example above, that is 3/4. Precision is also called positive predictive value.
- Recall / true positive rate: This tells us about the sensitivity of prediction, i.e., how much of the actual malware the algorithm was able to catch. It is the ratio of true positives to all actual positives; in our case, 3/6 = 1/2.
- False positive rate: When a file is not malware, how often does the model predict it to be malware? In our example, it is 1/14.
- Accuracy: This tells us how accurate the model is overall. In the example above, it is (3 + 13)/20 = 80%.
- Misclassification rate: The opposite of accuracy, i.e., 1 - accuracy = 20%.
- ROC (receiver operating characteristic): The curve plotting TPR against FPR.
- AUC (area under the ROC curve): The higher the AUC, the better the model. A random model has an AUC of 0.5; a model with an AUC below 0.5 is worse than a random classifier.
- Prevalence: This tells us whether the positive class is sparse in the data. In the example above, it is the percentage of files that are malware, i.e., 6/20 = 30%.

A perfect model has a TPR of 1, an FPR of 0, and an AUC of 1. Early detection of malware is critical for companies. Mistakenly flagging a benign file as malware is less of an issue, since it can be manually marked as not malware; it is therefore especially important to improve the recall of malware detection models.

The Base Rate Problem

Malware datasets have the same issue as datasets for rare diseases or credit card fraud: the base rate of the positive class is very low. If a dataset contains only 0.01% malware, a model that marks every file as not malware would be 99.99% accurate, yet it would miss all the malware. Hence, it is important to understand the data and the techniques for handling a low base rate. Accounting for the base rate changes the precision formula to:

Precision = (TruePositiveRate x BaseRate) / (TruePositiveRate x BaseRate + FalsePositiveRate x (1 - BaseRate))

Deploying Malware Detectors in the Enterprise

Enterprise Data Volumes

A 2022 study by the Ponemon Institute found that the average U.S. enterprise manages approximately 135,000 endpoint devices. Notably, nearly half of these devices are either undetected by IT departments or running outdated operating systems, posing significant security risks. Each device contains thousands of files, most of which are completely benign. Assume we have a system where 0.01% of files are actually malware, and the prediction model has a 99.99% true positive rate and a 0.1% false positive rate. That still gives us a precision of about 9%, which means roughly 91% of the alarms are false positives! This highlights the importance of low false positive rates in systems with a very low base rate.
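A quick back-of-the-envelope check of those numbers; the helper below is an illustrative sketch, not code from the article:

Python

def base_rate_precision(tpr: float, fpr: float, base_rate: float) -> float:
    """Precision (positive predictive value) adjusted for the base rate."""
    true_alarms = tpr * base_rate
    false_alarms = fpr * (1 - base_rate)
    return true_alarms / (true_alarms + false_alarms)

# Worked example from the confusion matrix above: TP=3, FP=1, FN=3, TN=13
tp, fp, fn, tn = 3, 1, 3, 13
print("precision:", tp / (tp + fp))                    # 0.75
print("recall:   ", tp / (tp + fn))                    # 0.5
print("accuracy: ", (tp + tn) / (tp + fp + fn + tn))   # 0.8

# Enterprise scenario: 0.01% base rate, 99.99% TPR, 0.1% FPR
print("enterprise precision:", round(base_rate_precision(0.9999, 0.001, 0.0001), 3))  # ~0.091

Even with a near-perfect detector, the tiny base rate drags the share of true alarms down to roughly one in eleven.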
Implementation Challenges

Given the challenge of low base rates, the training approach for malware detection models needs to be adapted. Oversampling minority-class samples or undersampling majority-class samples can help lower the rate of false positives, along with using cost-sensitive learning algorithms that weigh errors differently. Hybrid approaches that incorporate signature-based techniques for anomaly detection can complement the pure classification offered by ML models. Careful tuning of the prediction-time threshold based on operational requirements can also increase the accuracy of these models.

Future Directions

Several promising research directions may help address the fundamental base rate challenge in enterprise malware detection. One approach is developing hierarchical detection systems that use lightweight models for initial screening, followed by more sophisticated analysis only for suspicious files. A related approach is to use active learning techniques, which select the most informative samples for human analysis. These approaches could help security teams validate potential malware and feed that knowledge back into detection models, which could be particularly valuable for enterprises dealing with large volumes of files but limited security staff. The continued development of these techniques, along with careful attention to operational requirements and the performance metrics discussed earlier, will be crucial for building malware detection systems that perform effectively despite the inherent base rate challenges in enterprise environments.

By Udbhav Prasad
Scrum Smarter, Not Louder: AI Prompts Every Developer Should Steal

Most developers think AI’s only job is writing code, debugging tests, or generating documentation. But Scrum? That’s still a human mess, full of vague stories, chaotic meetings, and awkward silences. Here’s the truth: prompt engineering can turn AI into your secret Agile assistant—if you know how to talk to it. In this guide, we share field-tested, research-backed prompts that developers can use in real time to make Agile rituals smoother, smarter, and actually useful. Based on findings from Alamu et al. (2024), Verma et al. (2025), and Mitra & Lewis (2025), we show how prompt structures can turn your next standup, sprint planning, or retro into something that works for you, not just for your Scrum Master.

Sprint Planning Prompts: From Chaos to Clarity

Use case: Defining scope, estimating work, and avoiding the “What’s this story even mean?” syndrome.

Prompt: "As an expert in Agile backlog refinement, help me break down this story: '[insert story text]'. List sub-tasks with realistic developer effort in hours. Flag any missing requirements."
Why it works: Adds structure to vague backlog items and creates an actionable breakdown, saving planning time.

Prompt: "You are an Agile coach specialized in value prioritization. Here’s a list of five backlog items with estimated effort: [list]. Rank them based on business value impact, risk, and delivery speed."
Why it works: Helps developers push back against arbitrary prioritization.

Prompt: "Act as a Product Owner. Review these backlog stories: [list]. Suggest any that should be merged, split, or sent back for clarification based on user value."
Why it works: Promotes clarity early and reduces mid-sprint surprises.

Standups: Async, Remote, and Useful Again

Use case: Remote teams or developers who want to be more concise.

Prompt: "Act as a standup facilitator. Summarize my work in these bullet points: [insert]. Highlight blockers and suggest one follow-up question I can ask the team."
Why it works: Refines communication and highlights action.

Prompt: "You are a Scrum lead tracking momentum. Based on this Git log and ticket status, generate a concise standup update (Yesterday/Today/Blockers): [insert data]."
Why it works: Builds a data-driven update without fluff.

Prompt: "As a burnout-aware Agile bot, review these updates: [insert]. Flag any signs of overload or repeated blockers, and suggest wellness check-in prompts."
Why it works: Adds a human touch through AI.

Retrospectives: Say What Needs Saying (Without the Drama)

Use case: Emotional tension, team friction, or addressing recurring issues.

Prompt: "You are a retrospective expert. Analyze these notes: [insert retro notes or observations]. Suggest 3 ‘Start/Stop/Continue’ talking points that are tactful but honest."
Why it works: Offers safe but direct feedback phrasing.

Prompt: "As an Agile conflict mediator, suggest retro feedback for this situation: [describe team tension]. Focus on constructive language and psychological safety."
Why it works: Coaches developers through conflict-aware participation.

Prompt: "Act as an AI retro board tool. Cluster the following feedback into themes and suggest one lesson learned per theme: [feedback list]."
Why it works: Organizes chaos into insight, fast.

Ticket Crafting: User Stories That Actually Work

Use case: Turning chaos into structured tickets that meet expectations.

Prompt: "As a certified Product Owner, help me rewrite this vague task into a full user story with acceptance criteria: [insert task].
Format it in the ‘As a… I want… so that…’ style and add 3 testable conditions."
Why it works: Bridges development thinking with business expectations.

Prompt: "You are a Jira expert and Agile coach. I need to document a technical debt ticket that meets the DoD. Convert this explanation into a clean ticket description and add a checklist for completion."
Why it works: Helps developers write what gets accepted and shipped.

Prompt: "Act like a QA reviewer. Scan this user story: [story]. Suggest edge cases or acceptance tests we might have missed."
Why it works: Avoids future rework by adding a testing lens early.

Sprint Syncs and Review Prep: Impress Without Overthinking

Use case: Showing progress without turning into a status robot.

Prompt: "Act like a Scrum Master prepping for Sprint Review. Based on this list of closed tasks, create a short impact summary and link it to business goals."
Why it works: Connects delivery to outcomes.

Prompt: "As a technical demo expert, outline a 3-minute walkthrough script for this feature: [insert feature]. Include who it’s for, what problem it solves, and how it works."
Why it works: Makes Sprint Reviews easier to navigate.

Prompt: "Act as a release coordinator. Based on this sprint’s output, draft a release note with technical highlights, known limitations, and user-facing improvements."
Why it works: Delivers value to internal and external stakeholders.

This Is Not Cheating

Using AI in Agile isn’t about faking it—it’s about making the system work for your brain. These prompts don’t replace human discussion. They just help developers show up prepared, focused, and less drained. So next time your backlog makes no sense, or your standup feels pointless, try typing instead of talking. Let the AI sharpen your edge, one prompt at a time.

Why This Research Matters for Developers

At a glance, integrating AI into Agile rituals may seem like a tool for managers or coaches, but developers stand to benefit just as much, if not more. That’s why so much current research is digging into the impact of prompt engineering specifically tailored for technical contributors. These aren't academic fantasies. They're responses to real developer pain points: vague tickets, unproductive standups, poorly scoped retros, and communication fatigue.

Frameworks such as Prompt-Driven Agile Facilitation and Agile AI Copilot don’t just suggest AI can help; they show how developers can use targeted, structured prompts to support both solo and team productivity. These studies increasingly reflect the reality of hybrid work: asynchronous meetings, remote collaboration, and cross-functional handoffs. We’re seeing tools and bots being created that support retrospectives (Nguyen et al., 2025), sprint demos, and conflict resolution (Kumar et al., 2024), not because developers can't manage these, but because time and energy are finite. Prompt-based systems reduce friction and help technical teams align faster. They don't take the human out of Agile; they reduce the waste that prevents teams from being truly Agile.

More importantly, this isn’t about creating robotic output. It’s about giving developers ownership of the process. These prompts act as a developer’s voice coach, technical writer, and backlog cleaner, all rolled into one. That’s why researchers are paying attention: prompt engineering isn't a passing trend. It's becoming a silent infrastructure in high-performing teams.
So, if you’ve ever sat through a meaningless retro or received a user story that made no sense, know that AI isn't replacing your voice. It's amplifying it. You just need to know what to ask.

Research Foundations
- Prompt-Driven Agile Facilitation – Alamu et al. (2024)
- The Role of Prompt Engineering in Agile Development – Verma et al. (2025)
- Agile Standups with Conversational Agents – Mitra & Lewis (2025)
- Retrospectives Enhanced by Prompted AI Tools – Nguyen et al. (2025)
- Agile AI Copilot: Prompting and Pitfalls – Carlsen & Ghosh (2024)
- Guiding LLMs with Prompts in Agile Requirements Engineering – Feng & Liu (2023)
- Prompt-Based Chatbots in Agile Coaching – Kumar et al. (2024)
- AI Prompts in Agile Knowledge Management – Samadi & Becker (2025)

By Ella Mitkin
AI's Cognitive Cost: How Over-Reliance on AI Tools Impacts Critical Thinking

Tools driven by artificial intelligence (AI) are improving our learning, working, and decision-making processes. There is a cost involved in this change, though, particularly in the form of cognitive offloading—the process by which we assign mental tasks to outside aids like digital assistants, search engines, or recommendation systems. In this article, I examine a 2025 study by Michael Gerlich. It provides a thorough and sophisticated investigation into how artificial intelligence tools influence critical thinking, and it offers a data-rich perspective on a rising issue: depending more on AI to think for us reduces our own cognitive capacity.

Gerlich's study combined 50 qualitative interviews with a systematic survey of 666 UK-based participants. The study used validated instruments, including Terenzini's self-reported cognitive development measures and the Halpern Critical Thinking Assessment (HCTA), and participants were chosen to reflect different age groups and educational backgrounds. Two main questions directed the research:
- RQ1: In what ways might using AI tools affect critical thinking abilities?
- RQ2: In the link between artificial intelligence tool usage and critical thinking, what mediating role does cognitive offloading serve?

Two hypotheses sprang from these questions:
- H1: Greater use of artificial intelligence tools is linked to reduced critical thinking ability.
- H2: Cognitive offloading mediates the link between AI tool usage and critical thinking ability.

Combining descriptive statistics, ANOVA, correlation analysis, multiple regression, and random forest regression, the study tested these hypotheses. Semi-structured interviews yielded qualitative insights, which were examined using Braun and Clarke's six-phase thematic framework. Some important conclusions were drawn.

AI Use Negatively Correlates With Critical Thinking

Support for the first hypothesis was found in analysis showing a strong negative correlation between frequent use of AI tools and critical thinking abilities. Participants who regularly turned to AI tools for information retrieval and decision-making scored lower on measures of critical thinking.
- AI tool use and critical thinking: Pearson correlation r = -0.68
- ANOVA findings: p = 0.001
- Multiple regression coefficient for AI tool use: -1.76 (p < 0.001)

Frequent users of AI tools also showed less involvement with deep-thinking activities and reported more dependence on AI for decisions. This corresponds with worries that convenience might compromise the free will to reason.

Cognitive Offloading as a Mediating Variable

The second hypothesis was also supported. Cognitive offloading turned out to be an important mediator in the link between artificial intelligence use and reduced critical thinking.
- Cognitive offloading and critical thinking: r = -0.75
- Mediating indirect effect: b = -0.25, SE = 0.06, p = 0.001
- Total effect of AI use on critical thinking: b = -0.42, SE = 0.08, p = 0.001

The results highlight that cognitive offloading is not only a side effect but also a major explanatory factor for why critical thinking might drop in environments rich in artificial intelligence. Those who rely on artificial intelligence tools to complete mental tasks are less likely to engage in reflective analysis, hypothesis testing, and decision-making.

Variations Across Demographics in AI Usage and Critical Thinking

The study also looked at how age and education level affected critical thinking performance, cognitive offloading, and usage of AI tools.
Along with the lowest critical thinking scores, younger participants (17–25 years) showed the highest use of AI tools and the most cognitive offloading. Older participants (46 years of age and above) reported much lower use of artificial intelligence tools and better critical thinking capacity. Higher levels of education were favorably correlated with deep-thinking activities and critical thinking scores. Post hoc analysis using Kruskal-Wallis and Dunn's tests revealed that those with a bachelor's degree or above engaged more often in deep-thinking activities than those with only secondary education. This suggests that educational exposure matters greatly in reducing the potential cognitive costs of artificial intelligence use.

Random Forest Regression Outcomes

A random forest regression was used to find the most significant predictors of critical thinking, complementing the conventional regression. With R² = 0.370, the model accounted for 37% of the variance in critical thinking scores. The most important features were:
- Cognitive offloading
- Degree of education
- Deep-thinking activities
- Reliance on AI for decisions

Residuals were normally distributed, and cross-validation verified the model's robustness. Also important was the interaction term between AI tool use and education level, which implies that more education could help offset the negative consequences of AI dependence.

Thematic Understanding Gleaned From Interviews

The qualitative data from fifty semi-structured interviews provided important context for the statistical findings. Three main themes emerged:

1. Reliance on Artificial Intelligence

Participants regularly claimed to use AI tools for a variety of tasks, including decision-making, information retrieval, and scheduling. Many said they couldn't live without artificial intelligence in their daily lives. "I find information using AI and schedule everything using it as well. It's evolved into something I consider to be natural." This reliance on artificial intelligence tools reveals a cognitive shift whereby external systems progressively replace internal mental effort.

2. Lower Cognitive Involvement

Participants voiced worries about how artificial intelligence tools were limiting their chances to practice deep introspection. "I feel less need to personally solve problems the more artificial intelligence I use. Like I'm losing my capacity for critical thinking." Particularly among younger participants and those who regularly applied artificial intelligence in both personal and professional spheres, this attitude was rather common.

3. Ethical and Trust Issues

Interviewees also expressed concerns about the reliability and transparency of artificial intelligence technologies. "I sometimes wonder if artificial intelligence is gently guiding me towards decisions I wouldn't usually make." While some participants admitted to using AI-generated recommendations without checking them, others were skeptical of them. This emphasizes how risky naive faith in artificial intelligence systems can be.

Chris Westfall explores in a Forbes piece how artificial intelligence is changing our decision-making and information processing, often reducing our reliance on our own cognitive capacity. Although artificial intelligence tools provide convenience, he points out that as people rely more on automated systems, they may also experience a decline in critical thinking and problem-solving ability.
How Artificial Intelligence Affects Cognitive Function: Are Our Brains Under Attack?

SFI Health expresses worries about the overuse of artificial intelligence and its potential to limit our capacity for critical thinking and autonomous development. To maintain brain health, the piece stresses the need to balance AI use with activities that boost cognitive ability, such as digital detoxing and screen time monitoring.

Artificial Intelligence and the Erasure of Human Cognition

Psychology Today looks at how the entry of artificial intelligence into spheres of human cognition marks a basic change. The piece challenges the uniqueness of human cognition in an AI-augmented environment and addresses how depending too much on AI could cause a decline in our cognitive capacity, including critical thinking and creativity.

AI Systems Replacing Cognitive Activities

Research reported on the National Center for Biotechnology Information (NCBI) website shows that people may lose mental engagement and stimulation when artificial intelligence systems replace cognitive tasks. A lack of active cognitive participation might cause critical thinking to decline, problem-solving ability to fade, and creativity to suffer.

Implications for Society and Education

Gerlich's results imply that user involvement and transparency should come first in the design of artificial intelligence tools. AI systems should inspire users to consider, check, and critically assess material rather than providing ready-made responses, which can create an environment that encourages cognitive offloading.

Interventions in Education

Schools must include critical thinking instruction in courses that use artificial intelligence-enhanced learning environments. Although AI-based grading systems and tutoring tools might increase efficiency, they should not replace the need for students to think creatively and independently. The use of adaptive learning systems has to be counterbalanced with activities promoting alternative thinking, evidence evaluation, and hypothesis generation. Instructional strategies encouraging active learning, such as group discussions and problem-based learning, are correlated with stronger critical thinking development.

Correcting the Digital Divide

According to the study, people with less educational background could be more vulnerable to the negative cognitive effects of artificial intelligence use. This raises questions about a digital cognition divide: those without the knowledge to use artificial intelligence critically could lag behind as it spreads more and more.

Policies and Workplace Issues

Particularly in sectors like healthcare and finance, organizations should exercise great caution when implementing AI-driven decision-support systems. Although these systems can simplify processes, they might also discourage professionals from engaging in autonomous critical analysis, causing over-dependence and reduced accountability.

Methodological Accuracy

The study's method is strong. To improve validity, it used both quantitative and qualitative data triangulation in addition to a statistically adequate sample size—666 participants against a required 384. The survey was piloted first, and member-checked interview transcripts were used in a thematic analysis following accepted frameworks. By combining several data sources and analytical approaches, Gerlich was able to verify that the cognitive effects of artificial intelligence tool use are both statistically significant and experientially validated.
Conclusion

Michael Gerlich's 2025 research offers compelling evidence, particularly through the mechanism of cognitive offloading, that over-reliance on artificial intelligence tools is linked to diminished critical thinking. While higher education and participation in deep thinking activities act as protective factors, younger people and those with less educational attainment appear most vulnerable. As AI tools continue to develop and integrate into daily life, their impact on our cognitive processes cannot be ignored. The challenge is striking a balance: the design and application of artificial intelligence should augment rather than replace human cognition. Gerlich's study, together with insights from the current literature, reminds us of the enduring value of deep thinking, reflective judgment, and intellectual autonomy in a digital age that prizes speed and convenience. AI can help us, but it should never be allowed to think for us.

By Srinivas Chippagiri DZone Core CORE
Create POM With LLM (GitHub Copilot) and Playwright MCP

Test automation is a critical part of modern software development, but maintaining test scripts for dynamic web applications can be a challenge. The Page Object Model (POM) is a proven design pattern that makes test suites maintainable and scalable. When paired with GitHub Copilot, an AI-powered coding assistant, and Playwright's Model Context Protocol (MCP), you can supercharge your automation workflow with intelligent code generation and seamless tool integration. In this blog, we'll walk through how to create a POM-based test automation framework using Playwright, leverage GitHub Copilot to write and optimize code, and integrate Playwright MCP to enable AI-driven browser interactions. Whether you're a QA engineer or a developer, this guide will help you build a robust, AI-enhanced testing solution.

Page Object Model (POM)

The Page Object Model is a design pattern that organizes test automation code by representing each web page or component as a class (a Page Object). These classes encapsulate the page's elements (e.g., buttons, inputs) and interactions (e.g., clicking, typing), keeping test logic separate from UI manipulation.

Benefits of POM

Maintainability: Update one Page Object class when the UI changes, instead of rewriting multiple tests.
Reusability: Reuse Page Objects across test cases to reduce code duplication.
Readability: Write clear, business-focused test scripts that are easy to understand.
Scalability: Modular code structure supports large, complex projects.

(A minimal page object sketch appears at the end of this section.)

GitHub Copilot: Your AI Coding Partner

GitHub Copilot, powered by OpenAI's Codex, is an AI-driven coding assistant integrated into IDEs like Visual Studio Code. It suggests code snippets, completes functions, and even generates entire classes based on your context or comments. For test automation, Copilot can:

Generate boilerplate POM classes and test scripts.
Suggest Playwright locators based on page descriptions.
Write assertions and error-handling logic.
Optimize existing code for better performance or readability.

Playwright MCP: Bridging AI and Automation

The Model Context Protocol (MCP) is an emerging standard (popularized by Anthropic and adopted by tools like Playwright) that enables AI models to interact with external systems, such as browsers, APIs, or databases. Think of MCP as a universal adapter that lets AI tools like Copilot control Playwright's browser automation capabilities. With Playwright MCP, you can:

Automate browser actions: Navigate pages, click elements, or fill forms via AI-driven commands.
Integrate with AI: Allow Copilot to dynamically generate and execute Playwright commands.
Integrate with data sources: Combine browser automation with API calls or database queries.

Why Use GitHub Copilot and Playwright MCP Together?

Using GitHub Copilot and Playwright MCP (Model Context Protocol) together enhances the development and testing workflow by combining AI-driven code assistance with advanced browser automation capabilities. Here's why they are powerful when used together:

Faster test creation: GitHub Copilot generates Playwright test scripts from natural language prompts, saving coding time.
Reliable automation: Playwright MCP uses the accessibility tree for robust, cross-browser test execution.
Enhanced productivity: Copilot suggests optimized code, while MCP automates browser tasks, streamlining workflows.
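To make the pattern concrete before diving into the setup, here is a minimal, hypothetical page object written against Playwright's Python API; the class name, selectors, and methods are illustrative assumptions, not code generated by Copilot or MCP.

Python
# A minimal page object: locators and interactions live here, not in the tests.
from playwright.sync_api import Page

class LoginPage:
    def __init__(self, page: Page):
        self.page = page
        # Selectors are assumptions for illustration; adapt them to your application.
        self.username_input = page.locator("#user-name")
        self.password_input = page.locator("#password")
        self.login_button = page.locator("#login-button")

    def open(self, url: str) -> None:
        # Navigate to the application's login screen.
        self.page.goto(url)

    def login(self, username: str, password: str) -> None:
        # Fill credentials and submit; tests never touch raw selectors directly.
        self.username_input.fill(username)
        self.password_input.fill(password)
        self.login_button.click()

A test can then read as plain business steps (open, log in, assert), while selector changes stay contained in one class, which is exactly the maintainability benefit described above.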
Step-by-Step Guide: Building a POM With GitHub Copilot and Playwright MCP

Here are the steps to set up Playwright Model Context Protocol (MCP) in Visual Studio Code for browser automation with GitHub Copilot:

Step 1: Install Prerequisites

Ensure Node.js (version 14 or higher) is installed. Verify with node -v and npm -v in a terminal. Download from nodejs.org if needed.
Install Visual Studio Code (version 1.99 or later). Download from code.visualstudio.com.
Install the GitHub Copilot extension in VS Code via the Extensions Marketplace.

Step 2: Configure the Playwright MCP Server

In VS Code, open or create the settings.json file (File > Preferences > Settings > Open Settings (JSON)). Add the following configuration to enable the Playwright MCP server:

JSON
{
  "mcp": {
    "servers": {
      "playwright": {
        "command": "npx",
        "args": ["@playwright/mcp@latest"]
      }
    }
  }
}

Step 3: Alternatively, Use the Command Line

Shell
code --add-mcp '{"name":"playwright","command":"npx","args":["@playwright/mcp@latest"]}'

Step 4: Select Agent in GitHub Copilot

Once the setup above is done, select "Agent" in GitHub Copilot.

Step 5: Verify the Installed Tool

To verify that the tool is installed properly, click on the tool icon and check the available tools, e.g., browser_close, browser_resize, etc.

Create POM With GitHub Copilot and Playwright MCP

Once all of the above setup is complete, the next step is to provide instructions or prompts to the LLM (GitHub Copilot).

Use Case

For demo purposes, we are using the scenario below and asking GitHub Copilot and Playwright MCP to create a POM.

Plain Text
Create a POM model with the steps below:
1. Open https://www.saucedemo.com/
2. Login with username and password
3. Add product "Sauce Labs Backpack" into the cart
4. Open the cart
5. Click on Checkout button
6. Fill random data in First Name, Last Name and Zip
7. Click on continue button
8. Click on Finish button
9. Verify message "Thank you for your order"

Steps

1. Open the terminal.
2. Create a directory, e.g., mkdir MCP_DEMO.
3. Open the created directory in VS Code.
4. Now give the above use case to GitHub Copilot.

In the screenshot below, you can see how we can create a POM for the provided site/steps in a few minutes. In the next screenshot, we can see the pages and test classes created in the respective folders.

Video

Conclusion

GitHub Copilot and Playwright MCP help build robust automation frameworks in significantly less time. This AI-powered setup boosts productivity by accelerating code generation and simplifying browser interactions. However, while it streamlines development, the end user must always review and validate the generated code to ensure quality and accuracy. Despite this, the combination is a game-changer for QA engineers and developers aiming for scalable, maintainable, and intelligent test automation.

By Kailash Pathak DZone Core CORE
AI Agents in PHP with Model Context Protocol

If you are building AI agents, you've probably heard about MCP (Model Context Protocol). Everyone seems to be talking about MCP right now, yet from what I've read online, many people don't have a clear idea of what it actually is or of the new product development opportunities that come with it. I want to break down a couple of key concepts to help establish the foundational understanding you might need as a software developer exploring new ideas for agent implementations. This will also help clarify the role of MCP servers when directly connected to an AI agent, such as a Neuron AI Agent, as one example.

Introduction to LLM Tools

One of the things we as engineers love is standards. Standards matter because they let engineers and developers build systems that communicate with each other. Think of REST APIs: having a standard process to authenticate against and use a third-party service created a wave of innovation that lasted for years. The idea behind MCP is to give developers a standard protocol for exposing their application services to LLMs.

At this point, we have to remember that LLMs by themselves are incapable of doing anything. They are just "token tumblers." If you open any LLM chat and ask it to send an email, it won't know how to do that. It will just tell you, "I don't know how to send an email, but I can write the email content for you if you want." At its core, an LLM can only manage text.

The next evolution of these platforms came when developers figured out how to combine the capabilities of LLMs with a mechanism for making functions (or callbacks) available to them. Take the recent chat interfaces where you can paste a web URL into the message, and the LLM is able to fetch its content to give you the final response. Imagine this prompt: "Can you give me advice on how I can improve the SEO performance of this article: https://example.com/blog/article"

The LLM itself is not capable of doing this task. What developers have done is construct a textual protocol that lets the LLM ask the program running it to invoke a function that retrieves the content of the web page, so the model can continue formulating the final response. Using this mechanism, developers can implement and provide all sorts of functions (tools) to LLMs in order to perform actions and answer user questions with information that is not in their training dataset. If you do not provide an LLM with a tool to get the content of a web page, it simply cannot complete this task. You can also make functions available to LLMs to query your database, gather information from external APIs, or handle any other task your specific use case requires. The short sketch below illustrates this loop.
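As a rough, language-agnostic illustration (the article's own examples later are in PHP), here is a minimal, hypothetical Python sketch of the tool-calling loop; fake_llm(), the TOOLS registry, and fetch_web_page() are stand-ins for a real model and real tools, so every name here is an illustrative assumption.

Python
# Simulated tool-calling loop: the "model" asks for a function, the host runs it,
# and the result is fed back so the model can finish its answer.
def fetch_web_page(url):
    """Stand-in for a real tool that would download and clean the page content."""
    return f"<contents of {url}>"

TOOLS = {"fetch_web_page": fetch_web_page}

def fake_llm(prompt, tool_result=None):
    """Stand-in for an LLM call: first it requests a tool, then it produces an answer."""
    if tool_result is None:
        return {"type": "tool_call",
                "name": "fetch_web_page",
                "arguments": {"url": "https://example.com/blog/article"}}
    return {"type": "answer", "text": f"SEO advice based on: {tool_result}"}

def run_agent(prompt):
    reply = fake_llm(prompt)
    if reply["type"] == "tool_call":
        # The host program executes the requested function on the model's behalf.
        result = TOOLS[reply["name"]](**reply["arguments"])
        reply = fake_llm(prompt, tool_result=result)
    return reply["text"]

print(run_agent("How can I improve the SEO of this article?"))

In real systems, fake_llm() becomes a provider call and the TOOLS dictionary becomes whatever the protocol exposes; MCP's contribution is standardizing how those tools are discovered and invoked so the host doesn't hand-build each integration.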
Before MCP (Model Context Protocol)

LLMs started to become more powerful when we connected tools to them, because we can join the reasoning power of the LLM with the ability to act on the external world. The problem is that it can become really frustrating if you need to build an assistant that performs multiple tasks. Imagine reading your email, searching the internet, gathering information from your database, connecting to external services like Google Drive to read documents, GitHub for code, a knowledge base, and any other sort of resource. You can imagine how the implementation of the agent could become really cumbersome.

It could also be really complicated to stack all these tools together and make them work coherently in the context of the LLM. Another layer of complexity is that each service we want to connect to has its own APIs, with different technical requirements, and they would have to be implemented from scratch by every developer who wants to talk to these external services. Some companies can do it; for many others it would simply be impossible. This is where we are right now.

Introducing MCP (Model Context Protocol)

MCP is a layer between your LLM and the tools you want to connect. Companies can now implement an MCP server, which is basically a new way to expose their APIs, but in a form that is ready to be used by LLMs. Think about the Stripe APIs. They provide access to any kind of information about subscriptions, invoices, transactions, and so on. Using the Stripe MCP server (built by Stripe), you can expose the entire Stripe API to your LLM so it can gather information and answer questions about the status of your finances, or customer questions about their subscriptions and invoices. The agent can even perform actions like canceling a subscription or activating a new one for a customer. You just have to install the MCP server and connect your agent to the resources exposed by the server (we'll see how in a moment), and you instantly have an agent with powerful skills, without all the effort of implementing the Stripe API calls yourself. Furthermore, you no longer have to worry about Stripe changing its APIs. Even highly interconnected systems, made up of multiple interdependent steps, can be developed more easily and be more reliable. With simple hand-rolled tools implementing every action one at a time, certain levels of complexity would be impossible to overcome.

How MCP Works

Let's get into a practical example of how you can host an MCP server to be used by your agents. At its core, MCP needs three components to work: a host, an MCP server, and an MCP client. Don't let the word "server" fool you. At this stage of the protocol implementations, the MCP server runs on the same machine as your agent. They communicate via the standard input/output local interface (stdio). It will probably become possible to host MCP servers remotely in the future, but for now they run on the same machine. So you have to install the MCP server on your computer first during development, and on your cloud machine if you want to deploy the implementation to a production environment.

I will go deeper into MCP server installation in another dedicated article; for now, you can access the installation instructions in the MCP servers repository. Here are some websites where you can explore a list of available MCP servers:

https://github.com/modelcontextprotocol/servers
https://mcp-get.com/

At this stage it's not all sunshine and rainbows; there are some technical things to configure. You have to set up the server and configure some files, but once you figure it out, your agents can become very powerful and capable of completing all sorts of tasks autonomously.

Connect Your AI Agent to MCP Servers in PHP

To get started with AI agent development in PHP, you can install the Neuron AI framework. Neuron is an open source project designed to support PHP developers in building agent-based applications without switching to another language. It includes tools for creating agents, working with retrieval-augmented generation (RAG) systems, using vector stores, generating embeddings, and monitoring system behavior.
Installation is available via Composer; you can find more technical details in the documentation. Install Neuron with the command below:

Shell
composer require inspector-apm/neuron-ai

Create your custom agent extending the NeuronAI\Agent class:

PHP
use NeuronAI\Agent;
use NeuronAI\Providers\AIProviderInterface;
use NeuronAI\Providers\Anthropic\Anthropic;

class MyAgent extends Agent
{
    public function provider(): AIProviderInterface
    {
        // return an AI provider (Anthropic, OpenAI, Mistral, etc.)
        return new Anthropic(
            key: 'ANTHROPIC_API_KEY',
            model: 'ANTHROPIC_MODEL',
        );
    }

    public function instructions()
    {
        return "LLM system instructions.";
    }
}

Now you need to attach tools to the agent so it can perform tasks in the application context and resolve questions sent by you or your users. If you need to implement an action specific to your environment, you can attach a Tool and provide your own implementation:

PHP
use NeuronAI\Agent;
use NeuronAI\Providers\AIProviderInterface;
use NeuronAI\Providers\Anthropic\Anthropic;
use NeuronAI\Tools\Tool;
use NeuronAI\Tools\ToolProperty;

class MyAgent extends Agent
{
    public function provider(): AIProviderInterface
    {
        // return an AI provider (Anthropic, OpenAI, Mistral, etc.)
        return new Anthropic(
            key: 'ANTHROPIC_API_KEY',
            model: 'ANTHROPIC_MODEL',
        );
    }

    public function instructions()
    {
        return "LLM system instructions.";
    }

    public function tools(): array
    {
        return [
            Tool::make(
                "get_article_content",
                "Use the ID of the article to get its content."
            )->addProperty(
                new ToolProperty(
                    name: 'article_id',
                    type: 'integer',
                    description: 'The ID of the article you want to analyze.',
                    required: true
                )
            )->setCallable(function (string $article_id) {
                // You should use your DB layer here...
                $stmt = $pdo->prepare("SELECT * FROM articles WHERE id=? LIMIT 1");
                $stmt->execute([$article_id]);

                return json_encode(
                    $stmt->fetch(PDO::FETCH_ASSOC)
                );
            })
        ];
    }
}

For other tools, you can look for a ready-to-use MCP server and attach the tools it exposes to your agent. Neuron provides the McpConnector component to automatically gather available tools from the server and attach them to your agent:

PHP
use NeuronAI\Agent;
use NeuronAI\Providers\AIProviderInterface;
use NeuronAI\Providers\Anthropic\Anthropic;
use NeuronAI\Tools\Tool;
use NeuronAI\Tools\ToolProperty;

class MyAgent extends Agent
{
    public function provider(): AIProviderInterface
    {
        // return an AI provider (Anthropic, OpenAI, Mistral, etc.)
        return new Anthropic(
            key: 'ANTHROPIC_API_KEY',
            model: 'ANTHROPIC_MODEL',
        );
    }

    public function instructions()
    {
        return "LLM system instructions.";
    }

    public function tools(): array
    {
        return [
            // Load tools from an MCP server
            ...McpConnector::make([
                'command' => 'npx',
                'args' => ['-y', '@modelcontextprotocol/server-everything'],
            ])->tools(),

            // Your custom tools
            Tool::make(
                "get_article_content",
                "Use the ID of the article to get its content."
            )->addProperty(
                new ToolProperty(
                    name: 'article_id',
                    type: 'integer',
                    description: 'The ID of the article you want to analyze.',
                    required: true
                )
            )->setCallable(function (string $article_id) {
                // You should use your DB layer here...
                $stmt = $pdo->prepare("SELECT * FROM articles WHERE id=? LIMIT 1");
                $stmt->execute([$article_id]);

                return json_encode(
                    $stmt->fetch(PDO::FETCH_ASSOC)
                );
            })
        ];
    }
}

Neuron automatically discovers the tools exposed by the server and connects them to your agent.
When the agent decides to run a tool, Neuron generates the appropriate request to call the tool on the MCP server and returns the result to the LLM so it can continue the task. It feels exactly like working with your own defined tools, but you gain access to a huge archive of predefined actions your agent can perform with just one line of code. You can also check out the MCP server connection example in the documentation.

Let Us Know What You Are Building

It is clear to me that MCP is a big deal: agents can now handle an incredible range of tasks with very little effort spent developing those features. With all these tools available, product development opportunities will grow exponentially. I can't wait to see what you are going to build.

By Valerio Barbera
Smarter IoT Systems With Edge Computing and AI

The Internet of Things (IoT) is no longer just about connectivity. Today, IoT systems are becoming intelligent ecosystems that make real-time decisions. The convergence of edge computing and artificial intelligence (AI) is driving this transformation: IoT devices can now process their own data locally and act autonomously. This is revolutionizing industries from healthcare and agriculture to smart cities and autonomous vehicles.

When Edge Computing Meets AI

Traditional IoT relies on a central cloud architecture for data processing and analysis. While effective, this model struggles to meet the demands of real-time applications due to:

Latency: Transmitting data to and from the cloud introduces delays that can hold up critical decision-making.
Bandwidth: IoT data can overwhelm networks and increase costs when large volumes of it need to be transmitted to the cloud.
Privacy: Sensitive data sent to centralized servers is exposed to breaches and compliance violations.

By running AI capabilities at the edge, IoT devices can perform analyses locally, without the delay of transmitting data to the cloud, enabling faster, more secure, and more affordable operations.

Key Applications of Edge-AI IoT Systems

Healthcare: Wearable devices like smartwatches monitor real-time health metrics, such as heart rate and blood oxygen levels, and alert users and healthcare providers to anomalies without sending data to the cloud. AI algorithms running at the edge assist in the early diagnosis of arrhythmias and sleep apnea.
Smart cities: Edge AI IoT systems manage traffic lights, reducing congestion by dynamically adjusting signals according to real-time vehicle data. Edge AI sensors embedded in waste management systems optimize garbage collection schedules, saving resources and reducing emissions.
Agriculture: With AI-driven image recognition, edge-enabled drones analyze crop health so farmers can focus irrigation and pest control efforts where they are needed most. Soil sensors use localized AI to suggest when to plant and how much fertilizer to use, maximizing yield while limiting resource usage.
Autonomous vehicles: Edge AI systems process data from cameras, LIDAR, and sensors in self-driving cars immediately for safe navigation, without waiting for instructions from the cloud.
Retail: Stores use AI for smart shelves that monitor inventory and customer behavior, giving insight into product placement and stock replenishment.

Technological Synergies

The intersection of edge computing and AI is made possible by advancements in several key areas (a minimal federated learning sketch follows this list):

Hardware acceleration: Specialized chips like GPUs and TPUs let IoT devices run AI models efficiently at the edge.
On-device machine learning: Lightweight ML models minimize computation while preserving accuracy, making them a better fit for edge devices.
5G connectivity: High-speed, low-latency 5G networks are well suited to edge-AI IoT systems.
Federated learning: Training AI models collaboratively across edge devices preserves data privacy while still improving system-wide intelligence.
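To illustrate the federated learning idea from the list above, here is a minimal, hypothetical sketch of federated averaging over simulated edge devices using plain NumPy; the linear model, learning rate, and synthetic sensor data are illustrative assumptions, not a production edge deployment.

Python
import numpy as np

def local_update(weights, X, y, lr=0.05, epochs=5):
    """One device trains a linear model on its own data; raw data never leaves the device."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
global_w = np.zeros(2)

# Simulate three edge devices, each holding its own private sensor readings.
devices = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    devices.append((X, y))

# Each round: devices train locally, then the server averages the resulting weights.
for _ in range(20):
    local_weights = [local_update(global_w, X, y) for X, y in devices]
    global_w = np.mean(local_weights, axis=0)

print("Learned global weights:", global_w)  # should end up close to [2.0, -1.0]

Only model weights travel to the coordinating server; each device can then run the shared global model locally for inference, which is the property that keeps raw sensor data on the device.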
Challenges in Implementation

Despite its potential, integrating AI with edge computing in IoT systems presents challenges:

Hardware constraints: Many IoT devices have very limited processing power and memory, making it challenging to run complex AI models.
Interoperability: IoT ecosystems often involve a wide variety of devices and standards, making integration complex.
Cost: Edge-AI systems are expensive to develop and deploy, especially for small and medium-sized enterprises.
Security risks: Edge computing cuts down on data exposure, but the edge devices themselves are potential targets for cyberattacks.

The Future of Edge-AI IoT Systems

AI-driven maintenance: Predictive maintenance will become pervasive, reducing equipment downtime and extending lifespans.
Decentralized AI networks: IoT systems will increasingly rely on decentralized networks of AI, with devices learning and adapting collaboratively rather than depending on centralized data hubs.
Energy efficiency: Advances in low-power AI hardware will enable sustainable edge-AI IoT systems, which matter most in remote or resource-constrained applications.
Next-generation smart cities: Future urban infrastructure will be built on edge-AI IoT systems, such as self-healing power grids, intelligent transportation systems, and real-time disaster management.

The combination of edge computing and AI is not just improving IoT systems; it is starting to redefine what they can do. These technologies create smarter, more responsive ecosystems by enabling devices to think, learn, and act autonomously. As industries adopt edge-AI IoT solutions, the gains in efficiency, security, and innovation will reshape how we live and work in a connected world.

By Surendra Pandey
When Incentives Sabotage Product Strategy

TL;DR: When Incentives Sabotage Product Strategy

Many Product Owners and Managers worry about the wrong thing: they fear saying no when the real risk is saying yes to everything. This article describes three systematic rejection techniques that strengthen stakeholder relationships while protecting product strategy from being sabotaged by organizational incentives. It looks at how those incentives drive feature demands, why AI prototyping complicates strategic decisions, and how transparent Anti-Product Backlog systems transform resistance into collaboration.

The Observable Problem: When Organizational Incentives Create Anti-Product Behaviors

Product Owners and Managers often encounter a puzzling dynamic: stakeholders who champion features that clearly misalign with product strategy and resist rejection with surprising intensity. While individual stakeholder psychology gets attention, the more powerful force may be systemic incentives that reward behaviors incompatible with desired product success.

Charlie Munger's observation proves relevant here: "Never, ever, think about something else when you should be thinking about the power of incentives." Few forces shape human behavior more predictably than compensation structures, performance metrics, and career advancement criteria.

Consider the sales director pushing for a dashboard feature that serves three enterprise prospects. Their quarterly bonus depends on closing those deals, so the feature request is rational from their incentive perspective, even if it contradicts the product strategy. The customer support manager advocating for complex workflow automation may face performance reviews based on ticket resolution times, not customer satisfaction scores.

These aren't character flaws or political maneuvering. They're logical responses to organizational incentive structures. Until Product Owners and Managers recognize the incentive patterns driving stakeholder behavior, rejection conversations will address symptoms while ignoring causes.

The challenge compounds when organizations layer agile practices onto unchanged incentive systems. Teams practice "collaborative prioritization," while stakeholders receive bonuses for outcomes that require non-collaborative resource allocation. The resulting tension manifests as resistance to strategic rejection, which Product Owners and Managers often interpret as relationship problems rather than systems problems.

The Generative AI Complication: When Low-Cost Prototyping Enables Poor Strategy

Generative AI introduces a new dynamic that may make strategic rejection more difficult: the perceived reduction in experimentation costs. Stakeholders can now present Product Owners with quick prototypes, mockups, or even functioning code snippets, arguing that implementation costs have dropped dramatically. "Look, I already built a working prototype in an hour using Claude/ChatGPT/Copilot. How hard could it be just to integrate this?" becomes a common refrain.

This generally beneficial capability creates an illusion that feature requests now carry minimal technical debt or opportunity cost. The fallacy proves dangerous: running more experiments doesn't equate to delivering more outcomes. AI-generated prototypes may reduce initial development time, but they don't eliminate the strategic costs of unfocused Product Backlogs.
Regardless of implementation speed, every feature request still requires user research, quality assurance, maintenance, and support documentation, and, most critically, it adds cognitive load for users navigating increasingly complex products. Worse, the ease of prototype generation may push teams toward what you may call the "analysis-paralysis zone": endless experimentation without clear hypotheses or success criteria. When stakeholders can generate working demos quickly, or assume the product team can, the pressure to "just try it and see" intensifies, potentially undermining the strategic discipline that effective product management requires.

Product Owners need frameworks for rejecting AI-generated prototypes based on strategic criteria rather than technical feasibility. The question isn't "Can we build this quickly?" but "Does this experiment advance our strategic learning objectives?"

Questioning Assumptions About Stakeholder Collaboration

The Agile Manifesto's emphasis on "collaboration over contract negotiation" may create unintended consequences when stakeholder incentives misalign with product strategy. While collaboration generally produces better outcomes than adversarial relationships, some interpretations of collaboration may actually inhibit strategic clarity.

Consider this hypothesis: endless collaboration on fundamentally misaligned requests might be less valuable than clear, well-reasoned rejection. This approach contradicts conventional wisdom about stakeholder management, which may not account for modern incentive complexity.

The distinction between outcomes (measurable business results) and outputs (features shipped) becomes critical here. Stakeholder requests typically focus on outputs, possibly because their performance metrics reward feature delivery rather than business impact. However, optimizing for stakeholder comfort with concrete deliverables may create "feature factories," organizations that measure success by shipping velocity rather than strategic advancement.

Understanding stakeholder incentive structures seems essential for effective rejection conversations. Stakeholder requests aren't inherently problematic, but they optimize for individual stakeholder success rather than product strategy coherence. Effective rejection requires acknowledging these incentive realities while maintaining strategic focus.

The Strategic Framework: A Proven Decision-Making System to Keep Incentives From Sabotaging Product Strategy

The following Product Backlog management graphic illustrates a sophisticated and proven decision-making system that many Product Owners and Managers underutilize. It isn't a theoretical framework; it represents battle-tested approaches to strategic resource allocation under constraint.

The alignment-value pipeline concept demonstrates how ideas flow from multiple sources (stakeholder requests, user feedback, market data) through strategic filters (Product Goal, Product Vision) before reaching development resources. This systematic approach ensures that every feature request undergoes strategic evaluation rather than ad-hoc prioritization.

The framework's key strengths lie in its transparency and predictability. When the decision criteria are explicit and consistently applied, stakeholders can understand why their requests receive specific treatment. This transparency reduces political pressure and relationship friction because rejection feels systematic rather than personal. Moreover, it applies to everyone, regardless of position.
The Anti-Product Backlog component proves particularly powerful for managing stakeholder relationships during rejection conversations. Rather than dismissing ideas, this approach documents rejected requests with clear strategic rationales, demonstrating respect for stakeholder input while maintaining product focus.

The experimental validation loop directly addresses the generative AI challenge. Instead of building features because prototyping is easy, teams validate underlying hypotheses through structured experiments with measurable success criteria. This approach channels stakeholder enthusiasm for quick prototypes toward strategic learning rather than feature accumulation.

The refinement color coding (green, orange, grey, white) provides tactical communication tools for managing stakeholder expectations. When stakeholders understand that development capacity is finite and strategically allocated, they may begin self-filtering inappropriate requests and presenting others more effectively.

Technique One: Address Incentive Misalignments Before Feature Discussions

Traditional rejection conversations focus on feature merit without addressing underlying incentive structures. This approach treats symptoms while ignoring causes, often leading to recurring requests for the same misaligned features.

Consider starting rejection conversations by acknowledging stakeholder incentive realities: "I understand your quarterly goals include improving customer onboarding metrics, and this feature seems designed to address that objective. Let me explain why I think our current user activation experiments will have a greater impact on those same metrics."

This approach accomplishes several things: it demonstrates an understanding of stakeholder motivations, connects rejection to shared objectives, and redirects energy toward aligned solutions. You're working within incentive structures rather than fighting them while maintaining strategic focus.

For AI-generated prototypes, address the incentive to optimize for implementation speed over strategic value: "This prototype demonstrates technical feasibility, but before committing development resources, I need to understand the strategic hypothesis we're testing and how we'll measure success beyond technical implementation."

Document these incentive conversations as part of your Anti-Product Backlog entries. When stakeholders see their motivations acknowledged and addressed systematically, they're more likely to trust future rejection decisions and collaborate on alternative approaches.

Technique Two: Leverage Transparency as Strategic Protection

The Anti-Product Backlog system provides more than rejection documentation: it creates transparency that protects Product Owners and Managers from political pressure while educating stakeholders about strategic thinking.

Make your strategic criteria explicit and easily accessible. When stakeholders understand your decision framework before making requests, they can self-filter inappropriate ideas and present others more strategically. This transparency reduces rejection conversations by improving request quality.

For each rejected item, document:

The strategic misalignment (how does this conflict with the Product Goal/Vision?)
The opportunity cost (what strategic work would this displace?)
The incentive analysis (what stakeholder objectives does this serve?)
The alternative approaches (how else might we address the underlying need?)
The reconsideration criteria (what would need to change to revisit this?)
This systematic transparency serves multiple purposes: it demonstrates thoughtful analysis rather than arbitrary rejection, provides stakeholders with clear feedback on request quality, and creates precedent documentation that prevents the same arguments from recurring.

Address AI prototype presentations with similar transparency: "I appreciate the technical exploration, but our Product Backlog prioritization depends on strategic alignment and validated user needs rather than implementation feasibility. Let me show you how this request fits into our current strategic framework."

Technique Three: Transform Rejection into Strategic Education

Every rejection conversation represents an opportunity to educate stakeholders about strategic product thinking while addressing their underlying incentive pressures.

Connect rejection rationales to measurable outcomes that align with stakeholder objectives: "I understand you need to improve support ticket resolution times. This feature might help marginally, but our planned user onboarding improvements could reduce ticket volume by 30% based on our support analysis, which would have a greater impact on your team's performance metrics."

For AI-generated prototypes, use rejection as education about strategic experimentation: "This prototype shows what we could build, but effective product strategy requires understanding why we should build it and how we'll know if it succeeds. Before committing to development, let's define the strategic hypothesis and success criteria."

Reference the systematic process explicitly: "Our alignment-value pipeline shows 47 items in various stages representing 12 weeks of development work. This request would need to demonstrate higher strategic impact than current items to earn prioritization, and I don't see evidence for that impact yet."

This educational approach gradually shifts stakeholder mental models from feature-focused to outcome-focused thinking. When stakeholders understand the true cost of product decisions and the strategic logic behind prioritization, they begin collaborating more effectively within strategic constraints rather than trying to circumvent them.

The Incentive Reality: Systematic Causes Require Systematic Solutions

Organizational incentives create predictable stakeholder behavior patterns that individual rejection conversations cannot address. Sales teams are compensated for promises that product teams must deliver. Marketing departments face engagement metrics that feature requests could theoretically improve. Customer support managers need ticket resolution improvements that workflow automation might provide.

These incentive structures aren't necessarily wrong, but they often conflict with product strategy coherence. Effective Product Owners and Managers must navigate these realities without compromising strategic focus.

Building Systematic Rejection Capability

Individual rejection conversations matter less than systematic practices that align organizational incentives with product strategy while maintaining stakeholder relationships. Consequently, establish regular stakeholder education sessions in which you share the alignment-value pipeline framework and demonstrate how strategic decisions are made. When stakeholders understand the system, they can work more effectively within it.
Create metrics that track rejection effectiveness: the ratio of strategically aligned requests over time, stakeholder satisfaction despite rejections, value creation improvements from strategic focus, and business impact metrics from accepted features.

Use Sprint Reviews to reinforce outcome-focused thinking by presenting strategic learning and business impact rather than just feature demonstrations. This gradually shifts organizational culture from output celebration to outcome achievement.

Most importantly, recognize that strategic rejection isn't about individual skills. Instead, it's about organizational systems that either support or undermine strategic product thinking. Master systematic approaches, and you will build products that create a sustainable competitive advantage while maintaining stakeholder relationships based on mutual respect and strategic discipline, rather than diplomatic accommodation.

Conclusion: Transform Your Strategic Rejection Skills

Most Product Owners and Managers recognize these challenges but struggle with implementation. Reading frameworks doesn't change entrenched stakeholder behavior patterns; systematic practice does. Start immediately:

Document the incentive structures driving your three most persistent stakeholder requests.
Create your first Anti-Product Backlog entry with a strategic rationale.
Practice direct rejection language, focusing on strategic alignment rather than diplomatic deflection.

By Stefan Wolpers DZone Core CORE
RAG vs. CAG: A Deep Dive into Context-Aware AI Generation Techniques

As artificial intelligence systems become core components of everything from enterprise workflows to everyday tools, one thing is becoming crystal clear: context matters. It's no longer enough for a model to simply string together grammatically correct sentences. To truly add value, whether as a legal assistant, an AI tutor, or a customer support bot, an AI system needs to deliver the right answer at the right time, grounded in real-world knowledge and tuned to the situation at hand.

That's where two key techniques come into play: Retrieval-Augmented Generation (RAG) and Context-Aware Generation (CAG). These two approaches offer different solutions to the same challenge: how to make large language models (LLMs) smarter, more reliable, and more useful. RAG bridges the gap between generative models and real-time information by pulling in relevant documents from a knowledge base before generating a response. It's like giving your model a stack of reference material moments before it starts talking. CAG, meanwhile, focuses on embedding relevant context, like conversation history, user preferences, or task-specific metadata, right into the generation process. Instead of looking outward for new information, CAG leverages what the system already knows or remembers about the user or task. In this article, we'll break down how each method works, what problems they solve, where they shine, and how they're being used in the wild today.

RAG Explained: Making Models Smarter With On-the-Fly Knowledge

One of the biggest challenges with LLMs is that their knowledge is static. They're trained on massive amounts of data, but once that training is done, they can't adapt to new information unless you retrain them, which is slow, expensive, and rarely practical. This leads to a major issue known as hallucination, where models confidently output incorrect or outdated facts.

Retrieval-Augmented Generation (RAG) solves this by plugging the model into a live source of knowledge. Here's the idea: when a user submits a prompt, the system first sends that query to a retriever. This retriever searches an external document store, like a vector database or search index, and fetches the most relevant results. These documents are then combined with the original prompt to create an enriched input that's sent to the language model. The model uses this combined context to generate a response that's not only coherent, but also grounded in real, up-to-date information. A minimal code sketch of this flow appears at the end of this section.

RAG pipeline in action:

This modular structure is why RAG is so flexible. You can swap out the document store or retriever depending on your domain (e.g., use Pinecone for support articles, Elasticsearch for financial reports), and you never have to retrain the model to reflect new information.

Why Use RAG?

Factual grounding: Reduces hallucination by basing responses on source material.
Easy to update: Swap in new documents at any time without touching the model.
Domain-specific answers: Great for legal, medical, technical, or regulatory contexts.
Explainability: Enables citations and traceability in responses.
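To make the retrieve-then-generate flow concrete, here is a minimal, hypothetical Python sketch; the toy keyword-overlap retriever and the generate() stub stand in for a real vector database and LLM call, so treat every name here as an illustrative assumption rather than a specific library's API.

Python
# Toy RAG loop: retrieve the most relevant documents, then build an enriched prompt.
DOCUMENTS = [
    "RAG combines a retriever with a generator to ground answers in external documents.",
    "CAG injects conversation history and user preferences into the prompt.",
    "Vector databases store embeddings so semantically similar text can be found quickly.",
]

def retrieve(query, docs, k=2):
    """Rank documents by naive keyword overlap (a real system would use embeddings)."""
    query_terms = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(query_terms & set(d.lower().split())), reverse=True)
    return scored[:k]

def generate(prompt):
    """Stub for an LLM call; a real implementation would call your model provider here."""
    return f"[LLM answer grounded in a prompt of {len(prompt)} characters]"

def answer(query):
    context = "\n".join(retrieve(query, DOCUMENTS))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)

print(answer("How does RAG ground its answers?"))

Swapping retrieve() for a vector-store query and generate() for a provider call is what production RAG stacks do; the overall loop stays the same.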
CAG Explained: Teaching Models to Pay Attention to Context

While RAG is great for injecting external knowledge, it doesn't handle one important thing: what the model already knows about the user, task, or conversation. That's where Context-Aware Generation (CAG) comes in. CAG techniques aim to make LLMs more "aware" of their environment. Instead of reaching out to fetch new data, they focus on embedding relevant context into the generation process. This could include previous chat messages, system roles or instructions, metadata about the user, or even memory modules that store persistent data across sessions.

There are many ways to implement CAG, depending on your needs:

Prompt chaining: Carries over past inputs/outputs. Tools used: LangChain, LlamaIndex
Instruction prompting: Defines behavior or tone with role/system messages. Tools used: OpenAI system prompts
Embedding memory: Stores past interactions as retrievable embeddings. Tools used: Pinecone, Redis
Fine-tuning: Trains models on custom data or tone. Tools used: Hugging Face Transformers
LoRA / Adapters: Injects domain- or user-specific behavior efficiently. Tools used: QLoRA, PEFT

CAG pipeline in action:

In practice, CAG is ideal for personalized experiences, like AI writing assistants, virtual therapists, or sales bots that need to maintain context across a conversation.

Why Use CAG?

Continuity: Keeps the conversation or task flow consistent.
Personalization: Adapts tone, style, or content based on user preferences.
Low latency: No retrieval step means faster responses.

RAG vs. CAG: Key Differences at a Glance

Source of context — RAG: External document retrieval | CAG: Internal session memory / prompt injection
Model architecture — RAG: Retriever + generator | CAG: Unified prompt-aware model
Latency — RAG: Medium to high (due to retrieval) | CAG: Low (fast inference)
Best for — RAG: Knowledge-grounded Q&A | CAG: Personalized, ongoing user interactions
Explainability — RAG: Strong (can cite sources) | CAG: Weaker (depends on context injection)
Complexity — RAG: Higher (needs vector DB, retriever setup) | CAG: Moderate (prompt or memory handling)
Failure risks — RAG: Poor retrieval = poor output | CAG: Context window overflow or drift

Where Each One Shines

Use RAG when you need:

Support bots that pull answers from up-to-date articles or FAQs
Legal or medical assistants referencing formal documentation
Internal research copilots querying knowledge bases

Use CAG when you want:

Long-term memory in productivity tools or agents
AI storytelling or creative writing that remembers plot or tone
Sales agents adapting pitch and language to each client

Combine Them: Hybrid Systems Are the Future

In most serious AI systems today, you don't have to pick between RAG and CAG; you can use both. Many modern tools, like Microsoft Copilot, Notion AI, and Salesforce Einstein GPT, combine these strategies to create truly powerful assistants. Here's how it works: CAG handles memory, such as the user's goals, past actions, and tone, while RAG brings in real-time facts, like documentation or product updates. The LLM takes both kinds of context and merges them into a single, coherent response (a rough sketch of this merge step follows the list below).

This hybrid setup is already being used in:

Customer support bots that remember past tickets and cite the latest help docs
AI tutors that remember the user's learning goals and fetch content on demand
Enterprise copilots that blend personalization with structured document access
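As a rough illustration of that merge step, the sketch below assembles one prompt from session memory (the CAG side) and retrieved documents (the RAG side); the memory dictionary, sample documents, and build_prompt() helper are hypothetical names for illustration, not any particular product's API.

Python
# Merge CAG-style session memory with RAG-style retrieved documents into one prompt.
session_memory = {
    "user_goal": "Reduce churn for enterprise accounts",
    "tone": "concise, executive-friendly",
    "recent_turns": ["User asked for Q3 churn numbers", "Assistant summarized the churn report"],
}

retrieved_docs = [
    "Q3 churn was 4.1%, down from 5.0% in Q2.",
    "Top churn driver: delayed onboarding for accounts with more than 500 seats.",
]

def build_prompt(memory, docs, question):
    """Combine personalization context and grounded facts before calling the LLM."""
    history = "\n".join(memory["recent_turns"])
    context = "\n".join(docs)
    return (
        f"You are assisting with: {memory['user_goal']}\n"
        f"Respond in a {memory['tone']} tone.\n\n"
        f"Conversation so far:\n{history}\n\n"
        f"Reference material:\n{context}\n\n"
        f"Question: {question}"
    )

print(build_prompt(session_memory, retrieved_docs, "What should we fix first to cut churn?"))

The resulting prompt is what gets sent to the model: memory supplies the personalization, retrieval supplies the facts, and the LLM merges the two into one grounded, context-aware answer.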
Engineering Considerations

Token usage — RAG: Retrieved docs add to the prompt | CAG: History/context adds to the prompt
Infrastructure — RAG: Requires retriever + vector DB setup | CAG: Needs prompt memory or caching logic
Cost — RAG: Higher due to search + LLM usage | CAG: Moderate, depending on implementation
Maintainability — RAG: Easy to update documents | CAG: Harder to design long-term memory
Training need — RAG: No retraining needed | CAG: May benefit from fine-tuning

What's Next for RAG and CAG?

Both techniques are evolving fast.

In RAG, expect to see:

Multimodal retrieval (images, charts, PDFs, not just text)
Streaming RAG for real-time monitoring and alert systems
Agentic RAG, combining document fetching with reasoning chains

In CAG, we'll see:

Support for longer context windows (up to 1 million tokens!)
Persistent memory APIs (like ChatGPT's memory feature)
Context compression techniques to avoid prompt bloat

Conclusion: When (and Why) to Use RAG, CAG, or Both

When building modern AI systems, context is no longer optional; it's foundational. RAG and CAG give you two powerful ways to give your models the context they need to be useful and reliable.

Choose RAG when factual accuracy and document grounding matter most.
Choose CAG when you need memory, personalization, or conversational continuity.
Combine both to build systems that are both informed and intelligent.

In short:

Use RAG when knowledge matters.
Use CAG when context matters.
Use both when everything matters.

References

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv. https://arxiv.org/abs/2005.11401
Retrieval Augmented Generation (RAG). (n.d.). Pinecone. https://www.pinecone.io/learn/retrieval-augmented-generation/
Memory and new controls for ChatGPT. (2024, February 13). OpenAI. https://openai.com/index/memory-and-new-controls-for-chatgpt/
RAG. (n.d.). Hugging Face Transformers documentation. https://huggingface.co/docs/transformers/en/model_doc/rag
[Beta] Memory. (n.d.). LangChain documentation. https://python.langchain.com/v0.1/docs/modules/memory/

By Rambabu Bandam

Top AI/ML Experts


Tuhin Chattopadhyay

CEO at Tuhin AI Advisory and Professor of Practice,
JAGSoM

Dr. Tuhin Chattopadhyay is a celebrated technology thought leader among both the academic and corporate fraternity. Recipient of numerous prestigious awards, Tuhin is hailed as India's Top 10 Data Scientists by Analytics India Magazine. Besides driving his consultancy organization Tuhin AI Advisory, Dr. Tuhin also serves as Professor of Practice at JAGSoM, Bengaluru. His professional accomplishments can be explored from https://www.tuhin.ai/, art portfolio from https://tuhin.art/, joie de vivre from https://tuhinism.com/ and adventures with MySon from https://dogfather.rocks/.

Frederic Jacquet

Technology Evangelist,
AI[4]Human-Nexus

My goal is to deepen my research and analysis to track technological developments and understand their real impacts on businesses and individuals. I focus on untangling exaggerated perceptions and irrational fears from genuine technological advances. My approach is critical: I aim to move beyond myths and hype to identify the concrete, realistic progress we can expect from new technologies.

Suri Nuthalapati

Data & AI Practice Lead, Americas,
Cloudera

Suri is an accomplished technical leader and innovator specializing in Big Data, Cloud, Machine Learning, and Generative AI technologies, continuously creating strategies and solutions to modernize data ecosystems that support many analytics use cases. He is an engaging communicator with solid business acumen, leadership, and product development skills, adept at translating business needs into cost-effective solutions. He has a history of delivering a broad range of projects and data feeds for various business use cases, and a record of significant architectural enhancements that have increased ROI while reducing operating expenses. A dedicated lifelong learner, he is recognized for the ability to quickly assimilate and utilize new technologies and methods, and is the go-to person across functions for advice and expertise on emerging technologies. He has startup founder experience building cutting-edge data products and SaaS platforms. Suri is an official member of both the Forbes Technology Council and the Entrepreneur Leadership Network, where he contributes thought leadership articles and collaborates with industry experts to drive innovation and share insights on technology, entrepreneurship, and leadership.

Pratik Prakash

Principal Solution Architect,
Capital One

Pratik, an experienced solution architect and passionate open-source advocate, combines hands-on engineering expertise with extensive experience in multi-cloud and data science. Leading transformative initiatives across current and previous roles, he specializes in large-scale multi-cloud technology modernization. Pratik's leadership is highlighted by his proficiency in developing scalable serverless application ecosystems, implementing event-driven architectures, deploying AI/ML and NLP models, and crafting hybrid mobile apps. Notably, his strategic focus on an API-first approach drives digital transformation while embracing SaaS adoption to reshape technological landscapes.

The Latest AI/ML Topics

AI/ML Big Data-Driven Policy: Insights Into Governance and Social Welfare
Enabling more informed, transparent, and responsive policies that directly address societal needs and enhance resilience in the face of issues.
June 24, 2025
by Ram Ghadiyaram
· 179 Views
Unveiling Supply Chain Transformation: IIoT and Digital Twins
Dive into industrial automation with the Industrial Internet of Things (IIoT) and digital twins (DTs), and see how these key technologies advance the supply chain.
June 24, 2025
by Manvinder Kumra
· 223 Views
The Future Is Now: Top Generative AI Services You Can’t Ignore
Generative AI services create new ideas and content. You just give a prompt, and the AI will write text, draw an image, or even create a video. It's that simple!
June 23, 2025
by Chandrasekhar Kumar Sah DZone Core CORE
· 435 Views
Snowflake Cortex for Developers: How Generative AI and SaaS Enable Self-Serve Data Analytics
This article explores how Snowflake Cortex, Snowflake’s generative AI solution, advances self-serve analytics for both structured and unstructured data.
June 23, 2025
by Dipankar Saha
· 350 Views
Why Is NLP Essential in Speech Recognition Systems?
Discover how natural language processing enhances speech recognition systems for improved accuracy, context understanding, and multilingual support.
June 23, 2025
by Matthew McMullen
· 323 Views
Architects of Ambient Intelligence With IoT and AI Personal Assistants
Traditional IoT + AI faces latency, privacy, and ecosystem issues. Decentralized AI and federated learning enhance real-time, privacy-centric, user-trusted solutions.
June 23, 2025
by Praveen Chinnusamy
· 360 Views · 1 Like
Building Smarter Chatbots: Using AI to Generate Reflective and Personalized Responses
With the advent of artificial intelligence-based tools, chatbots have been integral for user interactions. An introductory analysis.
June 20, 2025
by Surabhi Sinha
· 789 Views
MCP Client Agent: Architecture and Implementation
Learn how to build a custom MCP client agent that connects to MCP servers programmatically and understand the end-to-end request flow in the process.
June 20, 2025
by Venkata Buddhiraju
· 958 Views
Beyond Automation: How Artificial Intelligence Is Transforming Software Development
AI is more than a tool, it’s a teammate. See how it’s helping developers move faster, tackle tough problems, and focus more on building great software.
June 19, 2025
by SAURABH AGARWAL
· 1,215 Views · 8 Likes
How to Add a Jenkins Agent With Docker Compose
A comprehensive step-by-step tutorial to add a Jenkins agent using Docker Compose. Simplify CI/CD setup with this step-by-step guide for scalable automation.
June 19, 2025
by Faisal Khatri DZone Core CORE
· 1,164 Views · 1 Like
It’s Not Magic. It’s AI. And It’s Brilliant.
A curious mind’s take on AI, a powerful technology that can mimic human intelligence, learn from data, and make decisions.
June 18, 2025
by Ananya K V
· 1,251 Views
The Shift of DevOps From Automation to Intelligence
The history of tech is a story of reinvention. This post explores how we are entering a new era of software intelligence.
June 18, 2025
by Arunsingh Jeyasingh Jacob
· 1,141 Views · 1 Like
Debunking LLM Intelligence: What's Really Happening Under the Hood?
Debunk LLM 'reasoning.' Go 'under the hood' to uncover the computational reality of AI's language abilities. It's about statistical power, not human thought.
June 18, 2025
by Frederic Jacquet DZone Core CORE
· 1,256 Views · 4 Likes
From OCR Bottlenecks to Structured Understanding
OCR errors cascade through RAG pipelines, killing performance. SmolDocling (256M params) processes docs holistically → structured output → better RAG.
June 18, 2025
by Pier-Jean MALANDRINO DZone Core CORE
· 841 Views · 1 Like
Elevating LLMs With Tool Use: A Simple Agentic Framework Using LangChain
Build a smart agent with LangChain that allows LLMs to look for the latest trends, search the web, and summarize results using real-time tool calling.
June 17, 2025
by Arjun Bali
· 1,022 Views · 1 Like
AI's Cognitive Cost: How Over-Reliance on AI Tools Impacts Critical Thinking
Learn how over-reliance on AI tools impacts critical thinking, with insights from Michael Gerlich's 2025 study on cognitive offloading and AI usage trends.
June 17, 2025
by Srinivas Chippagiri DZone Core CORE
· 698 Views · 1 Like
Driving Streaming Intelligence On-Premises: Real-Time ML With Apache Kafka and Flink
This article explores how to design, build, and deploy a predictive ML model using Flink and Kafka in an on-premises environment to power real-time analytics.
June 17, 2025
by Gautam Goswami DZone Core CORE
· 918 Views · 1 Like
Why 99% Accuracy Isn't Good Enough: The Reality of ML Malware Detection
ML models need to be complemented with traditional detection techniques for malware detection to work in real enterprise environments, due to the "base rate problem."
June 16, 2025
by Udbhav Prasad
· 1,247 Views · 3 Likes
Scrum Smarter, Not Louder: AI Prompts Every Developer Should Steal
A practical guide that helps developers use AI to improve backlog grooming, retros, standups, and reviews, without waiting for the Scrum Master to save the sprint.
June 16, 2025
by Ella Mitkin
· 1,795 Views · 4 Likes
AI Agents in PHP with Model Context Protocol
Supercharge your PHP applications with AI Agents using MCP (Model Context Protocol). Connect powerful tools to LLMs with minimal coding.
June 16, 2025
by Valerio Barbera
· 662 Views