Artificial intelligence (AI) and machine learning (ML) are two fields that work together to create computer systems capable of perception, recognition, decision-making, and translation. Separately, AI is the ability of a computer system to mimic human intelligence through math and logic, while ML builds on AI by developing methods that "learn" through experience rather than requiring explicit instruction. In the AI/ML Zone, you'll find resources ranging from tutorials to use cases that will help you navigate this rapidly growing field.
Using Spring AI to Generate Images With OpenAI's DALL-E 3
AI/ML Techniques for Real-Time Fraud Detection
Remember when you had to squint at wonky text or click on traffic lights to prove you're human? Those classic CAPTCHAs are being rendered obsolete by the day. As artificial intelligence improves, these once-reliable gatekeepers let automated systems through. That poses a challenge — and an opportunity — for developers to think again about how they verify human users. What’s Wrong With Traditional CAPTCHAs? Traditional CAPTCHAs have additional problems besides becoming increasingly ineffective against AI. Modern users expect seamless experiences, and presenting them with puzzles creates serious friction in their flow. Moreover, these systems introduce real accessibility challenges for users with visual or cognitive disabilities [1]. Recent research shows that traditional text-based CAPTCHAs can be solved with up to 99% accuracy using modern AI systems. Worse still, image recognition tasks — such as recognizing crosswalks and traffic lights — are trivial for state-of-the-art computer vision systems [2]. Why Has User Authentication Remained So Stagnant? The challenges are numerous and complex, but they also present an exciting opportunity for us as developers to innovate and adapt. The architecture of modern authentication systems is shifting from explicit challenges (a.k.a. "prove you’re human") to implicit verification ("we can tell you're human by how you interact"). Richer underlying heuristics are enabling increasingly frictionless, implicit authentication, marking a paradigm shift in how we think about authentication [3]. The new systems rely on three essential qualities: User interactivity: They observe how users interact organically with websites and applications. A human's mouse, keyboard, or scroll behavior is unique and challenging for machines to replicate with 100% fidelity. Analysis of context: They process the context of every interaction, including when and how users access services, the devices they use, and their general behavior patterns. Adaptive security: These systems use adaptive security, a concept in which the level of security changes depending on the risk factors involved. Instead of applying the same level of security to everyone, they can increase security measures when something seems suspicious while remaining almost undetectable to legitimate users. A New AI Challenge: Claude's Computer Use Recent developments in AI, including Anthropic’s Claude 3.5 Sonnet, have also significantly complicated the authentication landscape. Claude can now, in many ways, independently take control of a user's computer and browse the Internet, doing things like building websites or planning vacations [4]. This adds yet another layer of difficulty in distinguishing humans from machines. While providing exciting automation possibilities, it also calls for more advanced authentication mechanisms to prevent AI impersonation [5]. CAPTCHA in the Age of Generative AI Traditional CAPTCHA systems are becoming less effective as generative AI improves. Customer-facing product builders must evolve their authentication frameworks to stay ahead of an increasingly sophisticated bot arms race without compromising the user experience. Here’s a way to approach this challenge in the GenAI era: 1. 
Adopt Multi-Layered Authentication That means not just using visual or text-based challenges but taking a multi-faceted approach: Behavioral analysis: Use AI to analyze how users interact with the application (e.g., mouse movement, typing patterns [6], and more). Contextual verification: Assess device data, access patterns, and historical data [6]. Adaptive security: Provide a real-time security response based on risk [6]. 2. Focus on the User Experience Authentication should add as little friction as possible to the user experience: Work towards invisible authentication methods operating behind the scenes [7]. Make any challenges that are necessary fast and intuitive to solve. Offer accessible alternatives for users with disabilities [8]. 3. Use Powerful AI Techniques Protect yourself from malicious AI by using advanced technologies: Deploy machine learning models to distinguish human from AI-generated responses [9]. Use federated learning to improve detection without compromising user privacy. Investigate the application of adversarial examples to confuse AI-based CAPTCHA solvers [9]. 4. Institute Continuous Monitoring and Adjustment A lot is changing in the AI landscape, and we need to stay on guard: Continuously evaluate the strength of your authentication mechanism in light of emerging AI advancements [10]. Invest in real-time monitoring and threat detection/response systems. Be ready to deploy updates and patches quickly as new vulnerabilities come to light. 5. Explore Other Authentication Options Go beyond conventional CAPTCHA systems: Investigate biometric authentication (e.g., fingerprint or facial recognition [9]). Use risk-based authentication that only prompts for a challenge on suspicious activity (a minimal sketch of this idea appears below). Employ proof-of-work schemes that are computationally expensive for bots but negligible for real users [7]. 6. Maintain Transparency and User Trust As authentication systems get more complex, it’s essential to maintain users' trust: Inform users about your security practices. Give users visibility and control over how their data is used during authentication. Stay compliant with privacy regulations such as GDPR and CCPA [7]. Product builders can use this framework to create robust authentication mechanisms that defend against AI-enabled attacks without hindering the user experience. The aim is not so much to render authentication impossible for AI altogether as to make it considerably more expensive for AI than for legitimate users. Thinking Creatively: Challenging the AI Gods To address these issues, researchers are proposing new CAPTCHA ideas. Recently, a group from UCSF introduced creative solutions that utilize aspects of human cognition that contemporary AI models cannot yet reproduce [11]. Their approach includes: Logical reasoning challenges: Problems that require human-like logical reasoning and that data-driven algorithms struggle to solve quickly. Dynamic challenge generation: Unique CAPTCHAs generated on the fly that are hard for AI systems to learn or predict. After-image visual patterns: Challenges involving visual perception of time-based movements and patterns, beyond the current capabilities of static-image-processing AI. Scalable complexity: Puzzles of increasing difficulty and complexity, from challenges that provide images to choose from to more complicated ones that require pattern detection. These methods are designed to provide a more robust defense against AI mimicry while remaining accessible to human users. 
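As a concrete illustration of the adaptive-security and risk-based ideas above (points 1 and 5), here is a minimal sketch of a risk score that decides when to escalate to an explicit challenge. All signal names, weights, and thresholds are hypothetical; a production system would rely on trained models and far richer signals.

TypeScript
// Hypothetical signals collected implicitly during a session.
interface SessionSignals {
  mouseEntropy: number;       // 0..1, variability of pointer movement
  typingCadenceScore: number; // 0..1, similarity to human typing rhythms
  knownDevice: boolean;       // device/browser fingerprint seen before
  unusualGeoOrTime: boolean;  // access from an unexpected location or hour
}

type Action = 'allow' | 'soft-challenge' | 'hard-challenge';

// Combine the signals into a rough risk score and pick a response tier.
function assessSession(s: SessionSignals): Action {
  let risk = 0;
  risk += (1 - s.mouseEntropy) * 0.35;       // robotic pointer movement raises risk
  risk += (1 - s.typingCadenceScore) * 0.35; // machine-like typing raises risk
  risk += s.knownDevice ? 0 : 0.15;          // unfamiliar device adds some risk
  risk += s.unusualGeoOrTime ? 0.15 : 0;     // unusual context adds some risk

  if (risk < 0.3) return 'allow';          // invisible to most legitimate users
  if (risk < 0.6) return 'soft-challenge'; // e.g., a quick, accessible check
  return 'hard-challenge';                 // escalate only when risk is high
}

console.log(assessSession({ mouseEntropy: 0.8, typingCadenceScore: 0.9, knownDevice: true, unusualGeoOrTime: false })); // "allow"

The point is the shape of the logic: most sessions pass silently, and friction appears only when several independent signals look suspicious at the same time.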
As AI capabilities advance, approaches like these will become necessary to preserve the integrity of user authentication. The Future of Authentication Looking ahead, a few trends will define the future of authentication. Authentication is becoming more tailored to individual user behavior, within acceptable privacy limits [12]. Integration with existing identity systems continues to become more seamless, minimizing the need for separate authentication steps. At the same time, both authentication systems and attackers continue to evolve new approaches using machine learning, creating a continuous arms race. The migration away from traditional CAPTCHAs is a big step forward in how we verify user identity. We can move to more advanced, intuitive approaches and devise systems that are simultaneously more secure and more pleasant to use [13]. Soon, the challenge of authentication may no longer be about making humans solve puzzles but rather about designing intelligent systems capable of identifying human behavior while safeguarding the privacy and security of individuals. Understanding and adopting these practices today allows us to build better, more secure applications for everyone. Disclaimer: The views and opinions expressed in this article are those of the authors solely and do not reflect the official policy or position of any institution, employer, or organization with which the authors may be affiliated.
AWS Lambda is enhancing the local IDE experience to make developing Lambda-based applications more efficient. These new features enable developers to author, build, debug, test, and deploy Lambda applications seamlessly within their local IDE using Visual Studio Code (VS Code). Overview The improved IDE experience is part of the AWS Toolkit for Visual Studio Code. It includes a guided setup walkthrough that helps developers configure their local environment and install necessary tools. The toolkit also includes sample applications that demonstrate how to iterate on your code both locally and in the cloud. Developers can save and configure build settings to accelerate application builds and generate configuration files for setting up a debugging environment. With these enhancements, you can sync local code changes quickly to the cloud or perform full application deployments, enabling faster iteration. You can test functions locally or in the cloud and create reusable test events to streamline the testing process. The toolkit also provides quick action buttons for building, deploying, and invoking functions locally or remotely. Additionally, it integrates with AWS Infrastructure Composer, allowing for a visual application-building experience directly within the IDE. Anyone who has worked with AWS Lambda knows that the built-in console editor is not developer-friendly and has a poor UI. It's hard to make code changes and test them from that editor. On top of that, if you don't want to use AWS-based CI/CD services, automated deployment can be a bit challenging for a developer. You can use Terraform or GitHub Actions, but AWS now offers another, better option to deploy and test Lambda code. Considering these challenges, AWS Lambda recently announced the VS Code integration feature, which is part of the AWS Toolkit. It makes it easier for developers to push, build, test, and deploy code. Although the 50 MB code size restriction still applies, it now provides an IDE experience similar to working with VS Code on your local machine. This includes dependency installation through the extension, a split-screen layout, writing code and running test events without opening new windows, and live CloudWatch logs for efficient debugging. In addition, Amazon Q in the console can be used as a coding assistant, similar to a copilot. This provides a better developer experience. To start using VS Code for AWS Lambda: 1. Install VS Code locally, then install the AWS Toolkit from the marketplace. The webpage will redirect to VS Code and open the extension tab; go ahead and install it. 2. After installing the AWS Toolkit, you will see the AWS logo in the left sidebar under extensions. Click on that. 3. Now, select the option to connect with your AWS account. 4. After a successful connection, you will get a tab to invoke the Lambda function locally. As you can see below, this option requires AWS SAM to be installed to invoke Lambda locally. After login, it will also pull all the Lambda functions from your AWS account. If you want to update those, you can right-click on a Lambda function and select Upload Lambda; it will ask you for the zip file of the Lambda function. Alternatively, you can select samples from the explorer option in the left sidebar. If you want to go with remote invoke, you can click on any Lambda function visible to you in the sidebar. 
5. If you want to create your own Lambda function and test the integration, you can click on the Application Builder option and select AWS CLI or SAM. If you want to deploy the Lambda code to your AWS account, you can select the last option, as shown in the above screenshot. After that, if you are not already logged into your AWS account, you will be asked to log in. Then it will let you deploy your Lambda code. This way, you can easily deploy Lambda code from your IDE, which can be convenient for developer testing. Conclusion AWS Lambda is enhancing the local development experience for Lambda-based applications by integrating with the VS Code IDE and AWS Toolkit. This upgrade simplifies the code-test-deploy-debug workflow. A step-by-step walkthrough helps you set up your local environment and explore Lambda functionality through sample applications. With intuitive icon shortcuts and the Command Palette, you can build, debug, test, and deploy Lambda applications seamlessly, enabling faster iteration without the need to switch between multiple tools.
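If you want to try the local invoke flow end to end, a function as small as the following is enough to exercise the build, local invoke, and deploy steps described above. This is a minimal sketch of a Node.js handler; the file name and project layout are assumptions based on a standard SAM starter template, so adjust them to match your own template's Handler setting.

TypeScript
// src/handlers/hello-world.ts -- hypothetical path; point your SAM template's Handler at the compiled file
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';

// A tiny handler you can build, invoke locally with a saved test event, and deploy from VS Code.
export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  console.log('Received path:', event.path); // visible in local output, and in CloudWatch once deployed
  return {
    statusCode: 200,
    body: JSON.stringify({ message: 'Hello from Lambda!' }),
  };
};

From the toolkit, you can then run a local invoke against a reusable test event or right-click the function to deploy it, as in the walkthrough above.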
For years, developers have dreamed of having a coding buddy who would understand their projects well enough to automatically create intelligent code, not just pieces of it. We've all struggled with the inconsistent naming of variables across files, trying to recall exactly what function signature was defined months ago, and wasted valuable hours manually stitching pieces of our codebase together. This is where large language models (LLMs) come in — not as chatbots, but as strong engines in our IDEs, changing how we produce code by finally grasping the context of our work. Traditional code generation tools, and even basic features of IDE auto-completion, usually fall short because they lack a deep understanding of the broader context; hence, they usually operate in a very limited view, such as only the current file or a small window of code. The result is syntactically correct but semantically inappropriate suggestions, which need to be constantly manually corrected and integrated by the developer. Think about suggesting a variable name that is already used at some other crucial module with a different meaning — a frustrating experience we've all encountered. LLMs now change this game entirely by bringing a much deeper understanding to the table: analyzing your whole project, from variable declarations in several files down to function call hierarchies and even your coding style. Think of an IDE that truly understands not just the what of your code but also the why and how in the bigger scheme of things. That is a promise of LLM-powered IDEs, and it's real. Take, for example, a state-of-the-art IDE using LLMs, like Cursor. It's not simply looking at the line you're typing; it knows what function you are in, what variables you have defined in this and related files, and the general structure of your application. That deep understanding is achieved by some fancy architectural components. This is built upon what's called an Abstract Syntax Tree, or AST. An IDE will parse your code into a tree-like representation of the grammatical constructs in that code. This gives the LLM at least an elementary understanding of code, far superior to simple plain text. Secondly, in order to properly capture semantics between files, a knowledge graph has been generated. It interlinks all of the class-function-variable relationships throughout your whole project and builds an understanding of these sorts of dependencies and relationships. Consider a simplified JavaScript example of how context is modeled: JavaScript /* Context Model based on an edited single document and with external imports */ function Context(codeText, lineInfo, importedDocs) { this.current_line_code = codeText; // Line with active text selection this.lineInfo = lineInfo; // Line number, location, code document structure etc. this.relatedContext = { importedDocs: importedDocs, // All info of imported or dependencies within text }; // ... additional code details ... } This flowchart shows how information flows when a developer changes his/her code. Markdown graph LR A[Editor(User Code Modification)] --> B(Context Extractor); B --> C{AST Structure Generation}; C --> D[Code Graph Definition Creation ]; D --> E( LLM Context API Input) ; E --> F[LLM API Call] ; F --> G(Generated Output); style A fill:#f9f,stroke:#333,stroke-width:2px style F fill:#aaf,stroke:#333,stroke-width:2px The Workflow of LLM-Powered IDEs 1. Editor The process starts with a change that you, as the developer, make in the code using the code editor. 
Perhaps you typed some new code, deleted some lines, or even edited some statements. This is represented by node A. 2. Context Extractor The change you have just made triggers the Context Extractor. This module essentially collects all information around your modification within the code — somewhat like an IDE detective looking for clues in the environs. This is represented by node B. 3. AST Structure Generation That code snippet is fed to a module called AST Structure Generation. AST is the abbreviation for Abstract Syntax Tree. This module parses your code, quite similar to what a compiler would do, and creates a tree-like representation of the grammatical structure of your code. For LLMs, such a structured view is important for understanding the meaning of, and the relationships among, the various parts of the code. This is represented by node C, shown within the curly braces. 4. Code Graph Definition Creation Next, the Code Graph Definition is created. This module takes the structured information from the AST and builds an even broader understanding of how your code fits in with the rest of your project. It infers dependencies between files, functions, classes, and variables and extends the knowledge graph, creating a big picture of the general context of your codebase. This is represented by node D. 5. LLM Context API Input All the context gathered and structured — the current code, the AST, and the code graph — is finally transformed into an input structure suitable for the large language model. This input is then sent to the LLM through a request, asking for either code generation or completion. This is represented by node E. 6. LLM API Call It is now time to actually call the LLM. At this point, the well-structured context is passed to the LLM's API. This is where the magic happens: based on its training material and the given context, the LLM produces suggestions for code. This is represented by node F, colored in blue to indicate that it is an important node. 7. Generated Output The LLM returns its suggestions, and the user sees them inside the code editor. This could be code completions, code block suggestions, or even refactoring options, depending on how well the IDE understands the current context of your project. This is represented by node G. So, how does this translate to real-world improvements? We've run benchmarks comparing traditional code completion methods with those powered by LLMs in context-aware IDEs. The results are compelling:

Metric | Baseline (Traditional Methods) | LLM-Powered IDE (Context Aware) | Improvement
Accuracy of Suggestions (Score 0-1) | 0.55 | 0.91 | 65% higher
Average Latency (ms) | 20 | 250 | Acceptable for the benefit
Token Count in Prompt | Baseline | ~30% less (optimized context) | Optimized prompt size

Graph: Comparison of suggestion accuracy scores across 10 different code generation tasks. A higher score indicates better accuracy. 
Markdown graph LR A[Test Case 1] -->|Baseline: 0.5| B(0.9); A -->|LLM IDE: 0.9| B; C[Test Case 2] -->|Baseline: 0.6| D(0.88); C -->|LLM IDE: 0.88| D; E[Test Case 3] -->|Baseline: 0.7| F(0.91); E -->|LLM IDE: 0.91| F; G[Test Case 4] -->|Baseline: 0.52| H(0.94); G -->|LLM IDE: 0.94| H; I[Test Case 5] -->|Baseline: 0.65| J(0.88); I -->|LLM IDE: 0.88| J; K[Test Case 6] -->|Baseline: 0.48| L(0.97); K -->|LLM IDE: 0.97| L; M[Test Case 7] -->|Baseline: 0.58| N(0.85); M -->|LLM IDE: 0.85| N; O[Test Case 8] -->|Baseline: 0.71| P(0.90); O -->|LLM IDE: 0.90| P; Q[Test Case 9] -->|Baseline: 0.55| R(0.87); Q -->|LLM IDE: 0.87| R; S[Test Case 10] -->|Baseline: 0.62| T(0.96); S -->|LLM IDE: 0.96| T; style B fill:#ccf,stroke:#333,stroke-width:2px style D fill:#ccf,stroke:#333,stroke-width:2px Let's break down how these coding tools performed, like watching a head-to-head competition. Imagine each row in our results table as a different coding challenge (we called them "Test Case 1" through "Test Case 10"). For each challenge, we pitted two approaches against each other: The Baseline: Think of this as the "old-school" method, either using standard code suggestions or a basic AI that doesn't really "know" the project inside and out. You'll see an arrow pointing from the test case (like 'Test Case 1', which we labeled Node A) to its score — that's how well the baseline did.The LLM IDE: This is the "smart" IDE we've built, the one with a deep understanding of the entire project, like it's been studying it for weeks. Another arrow points from the same test case to the same score, but this time, it tells you how the intelligent IDE performed. Notice how the result itself (like Node B) is highlighted in light blue? That's our visual cue to show where the smart IDE really shined. Take Test Case 1 (that's Node A) as an example: The arrow marked 'Baseline: 0.5' means the traditional method got it right about half the time for that task.But look at the arrow marked 'LLM IDE: 0.9'! The smart IDE, because it understands the bigger picture of the project, nailed it almost every time. If you scan through each test case, you'll quickly see a pattern: the LLM-powered IDE consistently and significantly outperforms the traditional approach. It's like having a super-knowledgeable teammate who always seems to know the right way to do things because they understand the entire project. The big takeaway here is the massive leap in accuracy when the AI truly grasps the context of your project. Yes, there's a tiny bit more waiting time involved as the IDE does its deeper analysis, but honestly, the huge jump in accuracy and the fact that you'll spend way less time fixing errors makes it a no-brainer for developers. But it's more than just the numbers. Think about the actual experience of coding. Engineers who've used these smarter IDEs say it feels like a weight has been lifted. They're not constantly having to keep every tiny detail of the project in their heads. They can focus on the bigger, more interesting problems, trusting that their IDE has their back on the details. Even tricky stuff like reorganizing code becomes less of a headache, and getting up to speed on a new project becomes much smoother because the AI acts like a built-in expert, helping you connect the dots. These LLM-powered IDEs aren't just about spitting out code; they're about making developers more powerful. By truly understanding the intricate connections within a project, these tools are poised to change how software is built. 
They'll make us faster and more accurate and, ultimately, allow us to focus on building truly innovative things. The future of coding assistance is here, and it's all about having that deep contextual understanding.
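To make steps 5 and 6 of the workflow above concrete, here is a minimal sketch of how an IDE might fold the extracted context into a prompt and call a model. It follows the shape of the simplified Context model shown earlier; the interface fields, endpoint URL, and payload format are placeholders, not the internals of Cursor or any other IDE.

TypeScript
// Hypothetical shape of the context gathered in steps 2-4 of the workflow.
interface EditorContext {
  currentLine: string;       // the line being edited
  enclosingFunction: string; // source of the function around the cursor
  relatedSymbols: string[];  // definitions pulled from the code graph
}

// Step 5: turn the structured context into a compact prompt.
function buildPrompt(ctx: EditorContext): string {
  return [
    'You are completing code inside an existing project.',
    `Current line:\n${ctx.currentLine}`,
    `Enclosing function:\n${ctx.enclosingFunction}`,
    `Related definitions:\n${ctx.relatedSymbols.join('\n')}`,
    'Continue the current line. Reuse existing names; do not invent new ones.',
  ].join('\n\n');
}

// Step 6: send the prompt to an LLM endpoint (placeholder URL and payload shape).
async function requestCompletion(ctx: EditorContext): Promise<string> {
  const response = await fetch('https://llm.example.com/v1/complete', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: buildPrompt(ctx), maxTokens: 64 }),
  });
  const data = await response.json();
  return data.completion; // step 7: the suggestion surfaced in the editor
}

The interesting part is what goes into buildPrompt: the richer and more relevant that context is, the closer the suggestions get to the accuracy numbers reported above.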
Structured logging has become essential in modern applications to simplify the analysis of logs and improve observability. Spring Boot 3.4 extends the logging capabilities of Spring Framework 6.2, and log formats can now be configured easily using application.yml or application.properties. Before jumping into the details of the improvements, below is a brief look at how structured logging has evolved, with comparisons between traditional logging and structured logging in Spring Framework 6.2 and Spring Boot 3.4. Traditional Logging Traditional logging relies on string concatenation or placeholders in log messages. This approach was simple but lacked the ability to structure data in a way that log aggregation tools could parse easily. Java import org.slf4j.Logger; import org.slf4j.LoggerFactory; public class TraditionalLogging { private static final Logger logger = LoggerFactory.getLogger(TraditionalLogging.class); public static void main(String[] args) { String userId = "12345"; String action = "login"; logger.info("User action: userId={}, action={}", userId, action); } } Logged as below: 2025-01-10 10:00:00 INFO TraditionalLogging: User action: userId=12345, action=login The above logging is hard to parse for automated systems, and manual effort is required to extract and analyze fields such as userId and action. Structured Logging With structured logging, the logs are in a machine-readable format (e.g., JSON), making them easier to process and analyze. With Spring Framework 6.2 and Spring Boot 3.4, we can use structured logging schemas like Elastic Common Schema (ECS) and Graylog Extended Log Format (GELF). This means structured logging generates logs as key-value pairs or JSON objects with minimal setup. Add the below configuration to application.yml to ensure the console logs are generated in ECS format.
YAML
logging:
  structured:
    format:
      console: ecs
  level:
    root: INFO
Below is an example of logging structured information using SLF4J's fluent API. Spring handles converting the log messages into JSON format automatically, and key-value pairs added via addKeyValue are included in the output. Java import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.boot.autoconfigure.SpringBootApplication; @SpringBootApplication public class StructuredLoggingExample { private static final Logger logger = LoggerFactory.getLogger(StructuredLoggingExample.class); public static void main(String[] args) { logger.atInfo().setMessage("User logged in").addKeyValue("userId", "12345").addKeyValue("action", "login").log(); } } Output in ECS format (JSON): JSON { "@timestamp": "2025-01-10T10:00:00Z", "log.level": "INFO", "message": "User logged in", "userId": "12345", "action": "login", "logger": "StructuredLoggingExample", "ecs.version": "1.2.0" } Advantages of Structured Logging Tools like Kibana, Grafana, and Splunk can directly ingest structured logs for alerts. Simplified debugging with the ability to search for specific log entries (e.g., filter by userId). Structured logging aligns with modern logging standards like ECS, which makes it easier to integrate with distributed systems. Machine-Readability Let's look at an example of how structured logging helps with machine readability. Step 1: Structured logs are ingested into tools like: Elasticsearch: Parses JSON logs and indexes fields for search. Grafana Loki: Stores structured logs for querying. Splunk: Processes and visualizes log data. Step 2: Field-based indexing Each field in a log entry (e.g., userId, action) is indexed separately. This allows tools to perform operations like: Filtering: Show logs where userId = 12345. Aggregating: Count login actions (action = login). 
Finally, tools like Kibana or Grafana can create dashboards or alerts based on the log fields. Log entries: JSON [ {"@timestamp": "2025-01-10T12:00:00Z", "action": "login", "userId": "123"}, {"@timestamp": "2025-01-10T12:01:00Z", "action": "logout", "userId": "123"}, {"@timestamp": "2025-01-10T12:02:00Z", "action": "login", "userId": "456"} ] Filters in Kibana/Elasticsearch: JSON { "query": { "term": { "userId": "123" } } } Result: All logs where userId = 123. Aggregating logs: JSON { "aggs": { "login_count": { "terms": { "field": "action.keyword" } } } } Output: JSON { "aggregations": { "login_count": { "buckets": [ {"key": "login", "doc_count": 2}, {"key": "logout", "doc_count": 1} ] } } } Conclusion Structured logging enables developers to create machine-readable logs that are easy to index, analyze, and manage with advanced observability tools. With the latest updates in Spring Framework 6.2 and Spring Boot 3.4, and with built-in support for widely used formats like ECS and GELF, adopting structured logging has become easier and more effective. Beyond improving readability, these improvements enhance debugging, monitoring, and system analytics, making structured logging an essential practice for building scalable distributed environments.
Over the years, data has become more and more meaningful and powerful. Both the world and artificial intelligence move at a very quick pace, and AI is especially useful for real-time data use cases. Streaming data with AI offers a competitive edge for businesses and industries. AI for real-time and streaming data analytics allows the most current data to be handled in a timely, continuous flow, as opposed to the traditional approach of processing batches of information at varying intervals. Data silos are old news: a single platform for streaming and batch data, with pipelines that simplify operations through automated tooling and unified governance, is the way of the future. Generative AI relies on large language models trained on large swaths of data. So while artificial intelligence is extremely beneficial for streaming data analytics, it is also extremely reliant on accurate, robust, real-time data. There is a real synergy between the two. Real-time data streaming in conjunction with AI is a strategic differentiator for decision-making and improving customer experiences. From dynamic pricing and personalized marketing to predictive maintenance and fraud detection, the future of AI is real-time data. Benefits of Streaming Real-Time Data With AI There is an undeniable benefit to streaming real-time data with AI. AI can take in data from numerous sources essentially instantaneously, which allows that data to be analyzed quickly and with the accuracy that institutions rely on. GPU acceleration and in-memory processing fuel efficiency, while cross-validation, hyperparameter tuning, and feedback loops can improve accuracy. Personalized customer service experiences are another benefit, as customer preferences and expectations can be better understood by combining data from multiple sources such as social media, customer reviews, and transaction history. Reducing end-to-end latency helps systems respond to and predict events as well as continually learn. Scalability also benefits from real-time streaming analytics because hundreds or thousands of data streams can be managed simultaneously. This, in turn, can save money, since the work is distributed and requires fewer resources. User interfaces and experiences can also be modernized to be intuitive and less complex. What Are the Components and Processes? Streaming data in real time is a complex technological process, and it requires a particular infrastructure to operate. IoT sensors, mobile apps, and databases supply the data. This data is transformed into a readable and usable format before being loaded into a streaming storage solution such as a data lakehouse, at which point it is ready to be used in a meaningful way. The process of streaming involves data coming in from multiple sources: the Internet of Things, mobile apps, or web applications. New data patterns can be learned from this data, and AI applications and algorithms can adapt to the necessary changes. They can also apply online inference and decisions to handle updates in real time. Who Benefits the Most? Many industries and businesses can benefit from AI on real-time and streaming data. Major contenders include finance, retail, transportation, and healthcare. In finance, fraud can be detected and losses minimized by catching issues in real time and consistently tracking market prices. 
In retail, online shopping platforms can recommend items based on browsing activity, which can drive higher sales. In transportation, routes can be optimized, fleets can be managed, and deliveries can be made more efficient by tracking weather and traffic in real time with AI. In healthcare, conditions can be diagnosed and monitored with heightened accuracy. Retail can also use it for better inventory management, analysis of sales data, and managing supply chains. In finance, investment decisions can be better informed to lower risk. AI can improve health records, wearable device tracking, and treatment plans for patients. Network data can be monitored with AI to find outages and deliver more reliable service to customers. Preventative and predictive maintenance efforts can be planned effectively by identifying patterns of behavior and tracking sensors in equipment. Industrial equipment can be tracked to help limit downtime and losses in efficiency. Additionally, media streaming and live broadcasting rely heavily on these systems. Cybersecurity is improved by detecting security threats more quickly and accurately so that the appropriate measures can be taken. What Are the Challenges and Difficulties? Despite the overarching benefits of real-time and streaming data with AI, several challenges and difficulties arise, as always happens with technology. Large amounts of potentially sensitive data are involved, so avoiding breaches and complying with regulations is crucial. Encryption, security audits, and access control can minimize problems in this area. There can also be push-back from some organizations, as implementing artificial intelligence and real-time streamed data can be seen as overly complex and tech-heavy. End-to-end latency, which matters for response times, real-time calculations, instant insights, and user-facing applications, can drive costs up and can be tricky to keep low. Further, highly skilled teams are required for building, deploying, and maintaining streaming AI systems, and they can be challenging to find. The Future Data streams supply real-time information to other systems. This data is then integrated with streaming platforms that allow real-time data to be fed wherever it needs to be at all times. Security and governance are woven into all aspects, preserving data lineage and keeping data access secure and reliable at all times. Finally, flexible environments allow for data to be used anywhere and everywhere at any time — capitalizing on the importance of adaptability and flexibility in today's business world. This is the future of business operations and provides a competitive advantage in the market. Future innovations in deep learning and neural networks can further revolutionize insights and predictions.
When you develop generative AI applications, you typically introduce three additional components to your infrastructure: an embedder, an LLM, and a vector database. However, if you are using MariaDB, you don't need to introduce an additional database along with its own SQL dialect — or even worse — its own proprietary API. Since MariaDB version 11.7 (and MariaDB Enterprise Server 11.4) you can simply store your embeddings (or vectors) in any column of any table—no need to make your applications database polyglots. "After announcing the preview of vector search in MariaDB Server, the vector search capability has now been added to the MariaDB Community Server 11.7 release," writes Ralf Gebhardt, Product Manager for MariaDB Server at MariaDB. This includes a new datatype (VECTOR), vector index, and a set of functions for vector manipulation. Why Are Vectors Needed in Generative AI Applications? Vectors are needed in generative AI applications because they embed complex meanings in a compact fixed-length array of numbers (a vector). This is more clear in the context of retrieval-augmented generation (or RAG). This technique allows you to fetch relevant data from your sources (APIs, files, databases) to enhance an AI model input with the fetched, often private-to-the-business, data. Since your data sources can be vast, you need a way to find the relevant pieces, given that current AI models have a finite context window — you cannot simply add all of your data to a prompt. By creating chunks of data and running these chunks of data through a special AI model called embedder, you can generate vectors and use proximity search techniques to find relevant information to be appended to a prompt. For example, take the following input from a user in a recommendation chatbot: Plain Text I need a good case for my iPhone 15 pro. Since your AI model was not trained with the exact data containing the product information in your online store, you need to retrieve the most relevant products and their information before sending the prompt to the model. For this, you send the original input from the user to an embedder and get a vector that you can later use to get the closest, say, 10 products to the user input. Once you get this information (and we'll see how to do this with MariaDB later), you can send the enhanced prompt to your AI model: Plain Text I need a good case for my iPhone 15 pro. Which of the following products better suit my needs? 1. ProShield Ultra Case for iPhone 15 Pro - $29.99: A slim, shock-absorbing case with raised edges for screen protection and a sleek matte finish. 2. EcoGuard Bio-Friendly Case for iPhone 15 Pro - $24.99: Made from 100% recycled materials, offering moderate drop protection with an eco-conscious design. 3. ArmorFlex Max Case for iPhone 15 Pro - $39.99: Heavy-duty protection with military-grade durability, including a built-in kickstand for hands-free use. 4. CrystalClear Slim Case for iPhone 15 Pro - $19.99: Ultra-thin and transparent, showcasing the phone's design while providing basic scratch protection. 5. LeatherTouch Luxe Case for iPhone 15 Pro - $49.99: Premium genuine leather construction with a soft-touch feel and an integrated cardholder for convenience. This results in AI predictions that use your own data. Creating Tables for Vector Storage To store vectors in MariaDB, use the new VECTOR data type. 
For example: MariaDB SQL CREATE TABLE products ( id INT PRIMARY KEY, name VARCHAR(100), description TEXT, embedding VECTOR(2048) ); In this example, the embedding column can hold a vector of 2048 dimensions. You have to match the number of dimensions that your embedder generates. Creating Vector Indexes For read performance, it's important to add an index to your vector column. This speeds up similarity searches. You can define the index at table creation time as follows: MariaDB SQL CREATE TABLE products ( id INT PRIMARY KEY, name VARCHAR(100), description TEXT, embedding VECTOR(2048) NOT NULL, VECTOR INDEX (embedding) ); For greater control, you can specify the distance function that the database server will use to build the index, as well as the M value of the Hierarchical Navigable Small Worlds (HNSW) algorithm used by MariaDB. For example: MariaDB SQL CREATE TABLE products ( id INT PRIMARY KEY, name VARCHAR(100), description TEXT, embedding VECTOR(2048) NOT NULL, VECTOR INDEX (embedding) M=8 DISTANCE=cosine ); Check the documentation for more information on these configurations. Inserting Vectors When you pass data (text, image, audio) through an embedder, you get a vector. Typically, this is a series of numbers in an array in JSON format. To insert this vector in a MariaDB table, you can use the VEC_FromText function. For example: MariaDB SQL INSERT INTO products (name, embedding) VALUES ("Alarm clock", VEC_FromText("[0.001, 0, ...]")), ("Cow figure", VEC_FromText("[1.0, 0.05, ...]")), ("Bicycle", VEC_FromText("[0.2, 0.156, ...]")); Remember that the inserted vectors must have the correct number of dimensions as defined in the CREATE TABLE statement. Similarity Search (Comparing Vectors) In RAG applications, you send the user input to an embedder to get a vector. You can then query the records in your database that are closer to that vector. Closer vectors represent data that are semantically similar. At the time of writing this, MariaDB has two distance functions that you can use for similarity or proximity search: VEC_DISTANCE_EUCLIDEAN: calculates the straight-line distance between two vectors. It is best suited for vectors derived from raw, unnormalized data or scenarios where spatial separation directly correlates with similarity, such as comparing positional or numeric features. However, it is less effective for high-dimensional or normalized embeddings since it is sensitive to differences in vector magnitude.VEC_DISTANCE_COSINE: measures the angular difference between vectors. Good for comparing normalized embeddings, especially in semantic applications like text or document retrieval. It excels at capturing similarity in meaning or context. Keep in mind that similarity search using the previous functions is only approximate and highly depends on the quality of the calculated vectors and, hence, on the quality of the embedder used. The following example, finds the top 10 most similar products to a given vector ($user_input_vector should be replaced with the actual vector returned by the embedder over the user input): MariaDB SQL SELECT id, name, description FROM products ORDER BY VEC_DISTANCE_COSINE( VEC_FromText($user_input_vector), embedding ) LIMIT 10; The VEC_DISTANCE_COSINE and VEC_DISTANCE_EUCLIDEAN functions take two vectors. In the previous example, one of the vectors is the vector calculated over the user input, and the other is the corresponding vector for each record in the products table. 
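To see how these pieces fit together from application code, here is a minimal sketch of the retrieval step using the Node.js mariadb connector. The embed() function is a placeholder for whatever embedder you use, and the connection details are made up; the SQL mirrors the similarity-search query shown above.

TypeScript
import mariadb from 'mariadb';

// Placeholder embedder: replace with a real call to your embedding model or service.
// It should return the vector as a JSON-style array string, e.g. "[0.12, -0.03, ...]".
async function embed(text: string): Promise<string> {
  throw new Error('wire this up to your embedder');
}

const pool = mariadb.createPool({ host: 'localhost', user: 'app', password: 'secret', database: 'store' });

// Retrieve the 10 products closest to the user's request, ready to be appended to the prompt.
async function findRelevantProducts(userInput: string) {
  const vector = await embed(userInput);
  const conn = await pool.getConnection();
  try {
    return await conn.query(
      `SELECT id, name, description
         FROM products
        ORDER BY VEC_DISTANCE_COSINE(VEC_FromText(?), embedding)
        LIMIT 10`,
      [vector]
    );
  } finally {
    conn.release();
  }
}

The returned rows are what you would format into the enhanced prompt shown earlier before sending it to the AI model.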
A Practical Example I have prepared a practical example using Java and no AI frameworks so you truly understand the process of creating generative AI applications leveraging MariaDB's vector search capabilities. You can find the code on GitHub.
The software conception, development, testing, deployment, and maintenance processes have fundamentally changed with the use of artificial intelligence (AI) and machine learning (ML) in the software development life cycle (SDLC). Businesses today want to automate their development processes wherever they can, with the goals of increasing efficiency, improving time to market, raising software quality, and becoming more data-driven. AI/ML is instrumental in achieving these goals: it automates repetitive work, supports predictive analytics, and powers intelligent systems that respond to changing needs. This article discusses the role of AI/ML at each stage of the SDLC, how it adds value, and the challenges organizations must address to exploit it fully. Planning and Requirements Gathering Planning and requirements gathering is the first step of the software development lifecycle and forms the basis of the entire project. By using ML and AI-enabled tools that analyze historical data, organizations can make better-informed estimates about user behavior, requirements, and project time frames. Key Applications Requirement analysis: NLP tools such as IBM Watson can gather and interpret functional requirements from feedback, greatly assisting in understanding the needs of teams, users, and other stakeholders. Predictive analytics: Machine learning models estimate project risks, resource allocation, and timelines based on past data, helping teams avoid setbacks. Stakeholder sentiment analysis: AI tools analyze stakeholder feedback to prioritize feature specifications, ensuring time is not wasted on unimportant ones. Benefits Increased precision in capturing true requirements. Reduced time to identify project risks. Stronger linkage between business objectives and technical work. Design Phase In the design phase, AI/ML gives teams tools for architecture decision-making, simulations, and visualizations, augmenting manual effort and streamlining the workflow. Key Applications Automated UI/UX design: AI features in tools such as Figma recommend optimal design layouts by applying behavioral data to improve user experience. Codebase analysis and optimization: Based on business-specific needs, AI systems recommend the most effective system structures or data flow diagrams. Simulation and prototyping: AI-generated prototypes, built through simulation, help teams visualize the product before it is fully developed. Benefits Quicker iteration across multiple prototype versions. Better alignment with user needs through the integration of design and user research. Improved collaboration between designers, developers, and end users. Development Phase AI/ML can improve the automation of coding tasks, code quality, and productivity during the development stage. 
Key Applications Code generation: Tools such as GitHub Copilot and OpenAI Codex aid developers, particularly with monotonous tasks, by generating snippets of code and saving time. Code review and refactoring: Tools such as DeepCode and SonarQube check code against standards and verify code quality by looking for vulnerabilities and inefficiencies. Version control optimization: AI algorithms help predict potential merge conflicts and flag changes that need extra attention in version control systems such as Git. Benefits Faster development thanks to reduced manual coding effort. Fewer defects due to improved code quality. Better collaboration fostered by automated code reviews. Testing Phase AI/ML assists across the testing phase by automating repetitive tasks, generating test cases, and improving test coverage, which together result in quicker and more trustworthy releases. Key Applications Test case generation: ML models greatly reduce manual effort by producing test cases from user stories, historical data, and past testing patterns. Automated testing: Intelligent frameworks such as Testim and Applitools automate UI testing and help maintain broad coverage of user interfaces and interactions. Predictive bug detection: Machine learning models analyze patterns in code repositories to identify potential defects early. Defect prioritization: AI tools help QA teams classify and order defects according to their impact, so they can concentrate on the most important ones first. Benefits Decreased manual effort and increased coverage. Faster identification and resolution of bugs. Improved product quality through constant validation. Deployment Phase AI/ML automates deployment processes, minimizing downtime and improving efficiency. Key Applications Predictive deployment strategies: AI systems recommend the most appropriate deployment windows and strategies, reducing risk and rollout time. Monitoring and rollbacks: Tools such as Harness use AI-monitored deployment metrics to trigger rollback mechanisms once anomalies are detected. Infrastructure optimization: AI predicts infrastructure requirements and provisions resources more effectively and at lower cost. Benefits Lower deployment risk and shorter deployment times. Significantly lower infrastructure costs due to effective resource allocation. Greater stability, with smoother operations and faster recovery from issues. Maintenance and Operations In the post-deployment stage, AI and machine learning tools provide ongoing user support while keeping the system reliable and its performance optimized. Key Applications Anomaly detection: AI-powered anomaly detection tools continually examine system logs and metrics for signs of abnormality, helping to limit service outages. 
Predictive maintenance: Predictive models estimate the likelihood of failures and the actions needed to avoid them, reducing the amount of unplanned repair work. Chatbots for support: AI chatbots act as a first line of support 24/7, answering standard questions and escalating challenging cases to human support staff. Dynamic scaling: Real-time usage data informs AI models, which then reallocate the system's resources as needed. Benefits Continuous maintenance results in fewer service interruptions. AI-based support features reduce the amount of work needed to run the system. Resource allocation is automated based on current demand. Benefits of AI/ML in SDLC Incorporating AI/ML into the SDLC brings a multitude of advantages, including increased efficiency, better product quality, and a shorter time to market. Improved efficiency: Repetitive tasks are automated, reducing manual effort, shortening development time, and increasing productivity. Increased quality: AI/ML-driven tools raise the quality of the software produced by improving the code, increasing test coverage, and decreasing defect rates. Improved decision-making: AI models replace guesswork with predictions, enabling data-driven decision-making at any point in the SDLC. Cost reduction: AI/ML reduces reliance on manual intervention, streamlining the process and eliminating wasted resources. Adaptive systems: With AI/ML, self-adjusting systems learn and correct themselves to meet changing targets, becoming more efficient over time. Challenges of AI/ML in SDLC While AI/ML has numerous advantages in the software development lifecycle, there are challenges organizations should address. Data dependency: Building competent AI/ML models requires a large amount of quality data. Without proper data, biases are introduced, leading to poor performance. Integration complexity: Implementing AI/ML tools in an existing framework can require numerous workflow changes, causing disruption and lost time and complicating the integration process. Skill gaps: These tools have become a necessity across all sectors, yet many teams still lack the specialized skills to use them, creating a need for extra training. Bias and fairness: AI algorithms tend to mirror the biases in the data used to train them. This is especially problematic for AI models used in finance and healthcare, where it can produce unjust outcomes. Final Remarks AI/ML technologies have been widely adopted across the modern software development life cycle, from development through deployment and maintenance; they automate processes, assist with decision-making, and help improve software quality. AI/ML enables companies to bring systems to market faster, cut costs, and design systems that are highly adaptable and efficient. 
Nevertheless, for organizations to fully enjoy the benefits, certain roadblocks need to be dealt with: data quality, integration complexity, and skill gaps. With the right adoption approach, AI/ML can be used effectively throughout modern software development.
End-to-end tests are essential for ensuring the reliability of your application, but they can also be a source of frustration. Even small changes to the user interface can cause tests to fail, leading developers and QA teams to spend hours troubleshooting. In this blog post, I’ll show you how to utilize AI tools like ChatGPT or Copilot to automatically fix Playwright tests. You’ll learn how to create an AI prompt for any test that fails and attach it to your HTML report. This way, you can easily copy and paste the prompt into AI tools for quick suggestions on fixing the test. Join me to streamline your testing process and improve application reliability! Let’s dive in! Plan The solution comes down to three simple steps: Identify when a Playwright test fails. Create an AI prompt with all the necessary context: the error message, a snippet of the test code, and an ARIA snapshot of the page. Integrate the prompt into the Playwright HTML report. By following these steps, you can enhance your end-to-end testing process and make fixing Playwright tests a breeze. Step-by-Step Guide Step 1: Detecting a Failed Test To detect a failed test in Playwright, you can create a custom fixture that checks the test result during the teardown phase, after the test has completed. If there’s an error in testInfo.error and the test won't be retried, the fixture will generate a helpful prompt. Check out the code snippet below: JavaScript import { test as base } from '@playwright/test'; import { attachAIFix } from '../../ai/fix-with-ai' export const test = base.extend({ fixWithAI: [async ({ page }, use, testInfo) => { await use() await attachAIFix(page, testInfo) }, { scope: 'test', auto: true }] }); Step 2: Building the Prompt Prompt Template I'll start with a simple proof-of-concept prompt (you can refine it later): You are an expert in Playwright testing. Your task is to fix the error in the Playwright test titled "{title}". - First, provide a highlighted diff of the corrected code snippet. - Base your fix solely on the ARIA snapshot of the page. - Do not introduce any new code. - Avoid adding comments within the code. - Ensure that the test logic remains unchanged. - Use only role-based locators such as getByRole, getByLabel, etc. - For any 'heading' roles, try to adjust the heading level first. - At the end, include concise notes summarizing the changes made. - If the test appears to be correct and the issue is a bug on the page, please note that as well. Input: {error} Code snippet of the failing test: {snippet} ARIA snapshot of the page: {ariaSnapshot} Let’s fill the prompt with the necessary data. Error Message Playwright stores the error message in testInfo.error.message. However, it includes ANSI escape codes used for coloring output in the terminal (such as [2m or [22m): TimeoutError: locator.click: Timeout 1000ms exceeded. Call log: [2m - waiting for getByRole('button', { name: 'Get started' })[22m After investigating Playwright’s source code, I found a stripAnsiEscapes function that removes these special symbols: JavaScript const clearedErrorMessage = stripAnsiEscapes(testInfo.error.message); Cleared error message: TimeoutError: locator.click: Timeout 1000ms exceeded. Call log: - waiting for getByRole('button', { name: 'Get started' }) This cleaned-up message can be inserted into the prompt template. Code Snippet The test code snippet is crucial for AI to generate the necessary code changes. 
Playwright often includes these snippets in its reports, for example:

  4 | test('get started link', async ({ page }) => {
  5 |   await page.goto('https://playwright.dev');
> 6 |   await page.getByRole('button', { name: 'Get started' }).click();
    |                                                            ^
  7 |   await expect(page.getByRole('heading', { level: 3, name: 'Installation' })).toBeVisible();
  8 | });

You can see how Playwright internally generates these snippets. I've extracted the relevant code into a helper function, getCodeSnippet(), to retrieve the source code lines from the error stack trace:

TypeScript
const snippet = getCodeSnippet(testInfo.error);

ARIA Snapshot

ARIA snapshots, introduced in Playwright 1.49, provide a structured view of the page's accessibility tree. Here's an example ARIA snapshot showing the navigation menu on the Playwright homepage:

- document:
  - navigation "Main":
    - link "Playwright logo Playwright":
      - img "Playwright logo"
      - text: Playwright
    - link "Docs"
    - link "API"
    - button "Node.js"
    - link "Community"
  ...

While ARIA snapshots are primarily used for snapshot comparison, they are also a game-changer for AI prompts in web testing. Compared to raw HTML, ARIA snapshots offer:

- Small size → less risk of hitting prompt limits
- Less noise → less unnecessary context
- Role-based structure → encourages the AI to generate role-based locators

Playwright provides .ariaSnapshot(), which you can call on any locator. For the AI to fix a test, it makes sense to include the ARIA snapshot of the entire page, retrieved from the root <html> element:

TypeScript
const ariaSnapshot = await page.locator('html').ariaSnapshot();

Assembling the Prompt

Finally, combine all the pieces into one prompt:

TypeScript
const errorMessage = stripAnsiEscapes(testInfo.error.message);
const snippet = getCodeSnippet(testInfo.error);
const ariaSnapshot = await page.locator('html').ariaSnapshot();

const prompt = promptTemplate
  .replace('{title}', testInfo.title)
  .replace('{error}', errorMessage)
  .replace('{snippet}', snippet)
  .replace('{ariaSnapshot}', ariaSnapshot);

Example of the generated prompt:

Step 3: Attach the Prompt to the Report

When the prompt is built, you can attach it to the test using testInfo.attach:

TypeScript
export async function attachAIFix(page: Page, testInfo: TestInfo) {
  const willRetry = testInfo.retry < testInfo.project.retries;
  if (testInfo.error && !willRetry) {
    const prompt = generatePrompt({
      title: testInfo.title,
      error: testInfo.error,
      ariaSnapshot: await page.locator('html').ariaSnapshot(),
    });
    await testInfo.attach('AI Fix: Copy below prompt and paste to Github Copilot Edits to see the magic', { body: prompt });
  }
}

Now, whenever a test fails, the HTML report will include an attachment labeled "Fix with AI."

Fix Using Copilot Edits

When using ChatGPT to fix tests, you typically have to implement the suggested changes manually. You can make this process much more efficient with Copilot: instead of pasting the prompt into ChatGPT, open the Copilot Edits window in VS Code and paste your prompt there. Copilot will then recommend code changes that you can quickly review and apply — all from within your editor. Check out this demo video of fixing a test with Copilot in VS Code:

Integrating "Fix with AI" into Your Project

Vitaliy Potapov created a fully working GitHub repository demonstrating the "Fix with AI" workflow. Feel free to explore it, run tests, check out the generated prompts, and fix errors with AI help.
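Before wiring this into your project, it may help to see how the generatePrompt() helper referenced in attachAIFix above could be assembled from the pieces we just built. The sketch below is only an illustration based on the template-filling code shown earlier, not the repository's exact implementation; the import path and the PromptInput type name are assumptions.

TypeScript
// Minimal sketch of a generatePrompt() helper (assumed shape, not the repository's exact code).
import type { TestInfo } from '@playwright/test';
// Hypothetical module path for the helpers discussed above (stripAnsiEscapes, getCodeSnippet).
import { stripAnsiEscapes, getCodeSnippet } from './fix-with-ai-utils';

// The full template text from the "Prompt Template" section above, abbreviated here.
const promptTemplate = `You are an expert in Playwright testing. Your task is to fix the error in the Playwright test titled "{title}".
...(instruction list from the template above)...

Input:
{error}

Code snippet of the failing test:
{snippet}

ARIA snapshot of the page:
{ariaSnapshot}`;

type PromptInput = {
  title: string;
  error: NonNullable<TestInfo['error']>;
  ariaSnapshot: string;
};

export function generatePrompt({ title, error, ariaSnapshot }: PromptInput): string {
  // Clean the error message and extract the failing code lines, as described above.
  const errorMessage = stripAnsiEscapes(error.message ?? '');
  const snippet = getCodeSnippet(error);

  // Fill the placeholders in the template.
  return promptTemplate
    .replace('{title}', title)
    .replace('{error}', errorMessage)
    .replace('{snippet}', snippet)
    .replace('{ariaSnapshot}', ariaSnapshot);
}

The key design point is that the prompt is built entirely from data Playwright already exposes at teardown time (testInfo and the live page), so no extra instrumentation of the tests themselves is needed.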
To integrate the "Fix with AI" flow into your own project, follow these steps:

1. Ensure you're on Playwright 1.49 or newer.
2. Copy the fix-with-ai.ts file into your test directory.
3. Register the AI-attachment fixture:

TypeScript
import { test as base } from '@playwright/test';
import { attachAIFix } from '../../ai/fix-with-ai';

export const test = base.extend({
  fixWithAI: [async ({ page }, use, testInfo) => {
    await use();
    await attachAIFix(page, testInfo);
  }, { scope: 'test', auto: true }]
});

4. Run your tests and open the HTML report to see the "Fix with AI" attachment under any failed test.

From there, simply copy and paste the prompt into ChatGPT or GitHub Copilot, or use Copilot's Edits mode to apply the code changes automatically.

Relevant Links

- Fully working GitHub repository
- Originally written by Vitaliy Potapov: https://dev.to/vitalets/fix-with-ai-button-in-playwright-html-report-2j37

I'd love to hear your thoughts or prompt suggestions for making the "Fix with AI" process even more seamless. Feel free to share your feedback in the comments. Thanks for reading, and happy testing with AI!
Content moderation is crucial for any digital platform to ensure the trust and safety of its users. While human moderation can handle some tasks, AI-driven real-time moderation becomes essential as platforms scale. Machine learning (ML) powered systems can moderate content efficiently at scale with minimal retraining and operational costs. This step-by-step guide outlines an approach to deploying an AI-powered real-time moderation system.

Attributes of a Real-Time Moderation System

A real-time content moderation system evaluates user-submitted content — text, images, videos, or other formats — to ensure compliance with platform policies. Key attributes of an effective system include:

- Speed: Review content without degrading the user experience or introducing significant latency.
- Scalability: Handle thousands of requests per second in a timely manner.
- Accuracy: Minimize false positives and false negatives for reliability.

Step-by-Step Guide to Deploying an AI Content Moderation System

Step 1: Define Policies

Policies are the foundation of any content moderation system. A policy defines the rules against which content is evaluated. There can be different policies, such as hate speech, fraud prevention, and adult and sexual content. Here is an example of policies defined by X (Twitter). These policies are defined as objective rules, which can be stored as a configuration for easy access and evaluation.

Step 2: Data Collection and Preprocessing

Once the policies are defined, we need to collect data to serve as samples for training machine learning models. The dataset should include a good mix of the different types of content expected on the platform, as well as both policy-compliant and non-compliant examples, to avoid bias. Sources of data include:

- Synthetic data generation: Use generative AI to create data.
- Open-source datasets: Multiple datasets are available on open-source platforms and websites; choose one that fits the platform's needs.
- Historical user-generated content: Ethically utilize historical content posted by users.

Once the data is collected, it needs to be labeled by highly trained human reviewers who have a strong understanding of the platform's policies. This labeled data is treated as a "Golden Set" and can be used to train or fine-tune the ML models.

Before the ML models can operate on the data and produce results, the data must be preprocessed for efficiency and compatibility. Some preprocessing techniques include:

- Text: Normalize the text by removing stop words and breaking it down into n-grams, depending on how the data will be consumed.
- Images: Standardize images to a consistent resolution, size, and format for model compatibility.
- Video: Extract individual frames and process them as images.
- Audio: Transcribe audio into text using widely available speech-to-text models and apply the text models afterward. Note that this approach may miss non-verbal content that needs to be moderated.

Step 3: Model Training and Selection

A variety of models can be used depending on the platform's needs and the content types being supported. Some options to consider are:

Text

- Bag of words / term frequency-inverse document frequency (TF-IDF): Harmful or policy-violating words can be assigned high weights, making it possible to catch policy violations even if they occur infrequently.
However, this approach has limitations: the word list used to match violating text is inherently limited, and sophisticated actors can find loopholes.

- Transformers: This is the idea behind GPTs and can be effective at capturing euphemisms or subtle forms of harmful text. One possible approach is to fine-tune a GPT-style model on the platform's policies.

Image

- Pre-trained convolutional neural networks (CNNs): These models are trained on large image datasets and can identify harmful content such as nudity and violence. Common models include VGG and ResNet.
- Custom CNNs: For improved precision and recall, CNNs can be fine-tuned for specific categories and adapted to the platform's policy needs.

All of these models must be trained and evaluated against the "Golden Set" to achieve the desired performance before deployment. The models can be trained to generate labels, which can then be processed to produce the decision regarding the content.

Step 4: Deployment

Once the models are ready for deployment, they can be exposed via APIs that different services can call for real-time moderation. For less urgent tasks that do not require real-time moderation, a batch processing system can be set up instead.

Step 5: Human Review

AI/ML systems may not be able to make confident decisions in all cases. Ambiguous cases arise when the predicted ML score falls below the thresholds selected for confident decision-making. In these scenarios, the content should be routed to human moderators for accurate decision-making (a minimal sketch of this threshold-based routing follows the conclusion below). Human reviewers are also essential for reviewing false positive decisions made by the AI system. Human reviewers can generate the same kinds of labels as the ML models by following a decision tree that encodes the policies, and these labels can be used to finalize decisions.

Step 6: Label Processor

A label processor can be used to interpret the labels generated by ML systems and human reviewers and convert them into actionable decisions for users. This could be a straightforward system that maps system-generated strings to human-readable strings.

Step 7: Analytics and Reporting

Tools like Tableau and Power BI can be used to track and visualize moderation metrics, and Apache Airflow can be used to orchestrate the pipelines that generate these insights. Key metrics to monitor include precision and recall for the ML systems, human review time, throughput, and response time.

Conclusion

Building and deploying an AI-powered real-time moderation system ensures the scalability and safety of digital platforms. This guide provides a roadmap for balancing speed, accuracy, and human oversight, ensuring content aligns with your platform's policies and values.
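To make Steps 4 through 6 concrete, here is a minimal TypeScript sketch of the threshold-based routing described above. Every name in it (PolicyConfig, scoreContent, moderate, the threshold values) is a hypothetical illustration rather than part of any specific moderation product, and the model call is stubbed out.

TypeScript
// Hypothetical per-policy configuration (Step 1): thresholds for automatic decisions.
interface PolicyConfig {
  name: string;             // e.g., "hate_speech"
  rejectThreshold: number;  // score at or above this → auto-reject
  approveThreshold: number; // score at or below this → auto-approve
}

type Decision = 'approve' | 'reject' | 'human_review';

const policies: PolicyConfig[] = [
  { name: 'hate_speech', rejectThreshold: 0.9, approveThreshold: 0.2 },
  { name: 'adult_content', rejectThreshold: 0.85, approveThreshold: 0.3 },
];

// Stub for the model-serving API from Step 4. In practice this would call your
// deployed inference endpoint and return one score per policy label.
async function scoreContent(content: string): Promise<Record<string, number>> {
  // ...call the real-time inference service here...
  return { hate_speech: 0.1, adult_content: 0.05 };
}

// Steps 5 and 6: convert raw scores into an actionable decision, sending ambiguous
// cases (scores between the two thresholds) to the human review queue.
async function moderate(content: string): Promise<Decision> {
  const scores = await scoreContent(content);
  let needsReview = false;
  for (const policy of policies) {
    const score = scores[policy.name] ?? 0;
    if (score >= policy.rejectThreshold) return 'reject';
    if (score > policy.approveThreshold) needsReview = true;
  }
  return needsReview ? 'human_review' : 'approve';
}

// Example usage:
moderate('some user-submitted text').then((decision) => console.log(decision));

In a production system, the 'human_review' branch would enqueue the content together with its scores, so reviewers can apply the policy decision tree and feed their labels back into the label processor.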
Chain-of-thought (CoT) prompting has emerged as a transformative technique in artificial intelligence, enabling large language models (LLMs) to break down complex problems into logical, sequential steps. First introduced by Wei et al. in 2022, this approach mirrors human cognitive processes and has demonstrated remarkable improvements in tasks requiring multi-step reasoning [1].

CoT: Explanation and Definition

What Is CoT?

Chain-of-thought prompting is a technique that guides LLMs through structured reasoning processes by breaking down complex tasks into smaller, manageable steps. Unlike traditional prompting, which seeks direct answers, CoT encourages models to articulate intermediate reasoning steps before reaching a conclusion, significantly improving their ability to perform complex reasoning tasks [1].

Figure 1: Wei et al. (2022), "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"

Types of CoT

Let's break down the different types of chain-of-thought (CoT) approaches in detail.

Zero-Shot CoT

This simplest form of CoT requires no examples and uses basic prompts like "Let's think step by step" or "Let's solve this problem step by step." It relies on the model's inherent ability to break down problems without demonstrations [1]. As demonstrated in Kojima et al.'s seminal work (2022), large language models are inherently capable of zero-shot reasoning when prompted appropriately. Their research, illustrated in Figure 1 of their paper, shows how LLMs can generate coherent reasoning chains simply by including phrases like "Let's solve this step by step," without requiring any demonstrations or examples. This ability emerges naturally in sufficiently large language models, though it is primarily observed in models with more than 100B parameters [2].

Figure 2: Kojima et al. (2022), "Large Language Models are Zero-Shot Reasoners"

Key characteristics:

- No examples needed
- Uses simple, universal prompts
- Lower performance than other CoT variants
- Works mainly with larger models (>100B parameters)

Few-Shot CoT

Few-shot CoT builds upon zero-shot CoT by incorporating demonstrations with explicit reasoning steps. Unlike zero-shot, which relies solely on simple prompts, few-shot CoT provides carefully crafted examples that guide the model's reasoning process. Wei et al. (2022) demonstrated that providing eight exemplars with chains of thought significantly improves performance across various reasoning tasks [1]. Key to successful few-shot CoT implementation is selecting exemplars that align with the target reasoning task. The examples should demonstrate the complete thought process, from initial problem understanding to final solution, allowing the model to learn both the reasoning structure and the expected output format [1][2].

Key characteristics:

- Typically uses 2-8 examples
- Each example includes an input question, step-by-step reasoning, and a final answer
- More reliable than zero-shot
- Requires manual creation of demonstrations

Auto-CoT

Auto-CoT is an automated approach to generating chain-of-thought demonstrations through clustering and pattern recognition, introduced by Zhang et al. (2022) [4]. Unlike few-shot CoT, which requires manual examples, auto-CoT automatically generates its own reasoning chains.
Figure 3: Zhang, Z. et al. (2022), "Automatic Chain of Thought Prompting in Large Language Models"

Key characteristics:

- Automatically clusters similar questions from the dataset
- Generates reasoning chains for representative examples from each cluster
- Reduces the need for manual annotation while maintaining effectiveness
- Done during the setup phase, not at inference time

Active-Prompt CoT

Active-prompt CoT is an advanced approach to chain-of-thought prompting that uses uncertainty estimation to identify challenging questions and strategically selects examples for human annotation. Diao et al. (2023) demonstrated that this method achieves substantial improvements over traditional CoT approaches [5]. Key to successful active-prompt CoT implementation is the strategic selection of examples based on model uncertainty, focusing annotation efforts on the most uncertain cases. This targeted approach reduces the need for exhaustive dataset annotation while maintaining or improving performance compared to standard CoT methods [5].

Figure 4: Diao, S. et al. (2023), "Active Prompting with Chain-of-Thought for Large Language Models"

Key characteristics:

- Uses uncertainty estimation to identify challenging questions
- Dynamically adapts to different tasks with task-specific prompts
- Focuses annotation efforts on uncertain cases
- More efficient than manual annotation of entire datasets
- Achieves better performance than standard CoT and Auto-CoT

Self-Consistency CoT

Self-consistency CoT enhances the standard CoT approach by sampling multiple reasoning paths and selecting the most consistent answer. Introduced by Wang et al. (2022), this method significantly improves reasoning performance compared to greedy decoding [6]. A minimal sketch of the majority-vote idea appears after the references at the end of this article.

Figure 5: Wang, X. et al. (2022), "Self-Consistency Improves Chain of Thought Reasoning in Language Models"

Key characteristics:

- Samples multiple reasoning chains (typically 40-50) instead of using greedy decoding
- Takes a majority vote among the generated answers
- More robust than single-path reasoning
- Better handles complex problems with multiple possible approaches

Comparison

Here is a comparison table summarizing the methods detailed above across some key factors:

Method           | Complexity | Human Effort | Accuracy | Key Advantage          | Main Limitation
Zero-shot CoT    | Low        | None         | Lowest   | Simple implementation  | Limited performance
Few-shot CoT     | Medium     | High         | High     | Reliable results       | Manual example creation
Auto-CoT         | Medium     | Low          | Medium+  | Automated examples     | Clustering overhead
Active-Prompt    | High       | Medium       | High     | Targeted optimization  | Complex implementation
Self-Consistency | Highest    | Medium       | Highest  | Most reliable          | Highest computation cost

When to Use It

Chain-of-thought (CoT) prompting is particularly effective for complex tasks requiring multi-step reasoning. Understanding when to apply CoT is crucial for optimal results.

Benefits

CoT prompting offers several key advantages when implemented correctly [1]. First, it significantly enhances accuracy in complex problem-solving tasks requiring multiple steps, showing improvements of up to +18% on arithmetic tasks. This improvement is particularly notable in mathematical reasoning and symbolic manipulation tasks where step-by-step problem decomposition is essential [1][7].

Figure 6: Wei et al. (2022), "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"

Second, CoT provides unprecedented transparency into the model's reasoning process. By making intermediate steps explicit and verifiable, it enables a better understanding of how the model arrives at its conclusions [2].
This transparency is crucial for both validation and debugging of model outputs [1][6][7]. Third, CoT excels at handling complex tasks requiring sequential reasoning. It shows particular effectiveness in mathematical word problems, temporal reasoning, and multi-step logical deductions [1][2]. The ability to break down complex problems into manageable steps makes it especially valuable for tasks that would be difficult to solve in a single step [1][3][7].

Figure 7: Fu, Y. et al. (2023), "Complexity-Based Prompting for Multi-step Reasoning"

Trade-Offs

While chain-of-thought (CoT) prompting demonstrates impressive capabilities, it comes with several significant considerations that must be carefully weighed.

First, computational cost represents a major trade-off. Generating detailed reasoning chains demands substantially more computational resources and processing time than direct prompting, as models need to generate longer sequences that include intermediate reasoning steps, which directly impacts operational costs when using commercial API services.

Second, implementation requirements pose considerable challenges. CoT demands careful prompt engineering and typically requires larger models exceeding 100B parameters for optimal performance. Ma et al. (2023) demonstrated that while smaller models can be enhanced through knowledge distillation, they still struggle to match the reasoning capabilities of larger models in complex tasks [9].

Figure 8: Wei et al. (2022), "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"

Third, reliability concerns have emerged in recent research. Wang et al. (2022) found that CoT can sometimes produce convincing but incorrect reasoning chains, particularly in domains requiring specialized knowledge [6]. This "false confidence" problem becomes especially critical in applications where reasoning verification is essential.

Figure 9: Wei et al. (2022), "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"

Fourth, domain adaptation remains challenging. Recent work by Fu et al. (2023) highlights that CoT performance varies significantly across different domains and task types [3]. The effectiveness of CoT prompting depends heavily on the alignment between the task domain and the model's training data, making consistent cross-domain application difficult.

Conclusion

Chain-of-thought (CoT) prompting represents a significant advancement in enhancing large language models' reasoning capabilities. Through its various implementations — from simple zero-shot approaches to sophisticated methods like active-prompt and self-consistency — CoT has demonstrated remarkable improvements in complex problem-solving tasks, particularly in areas requiring multi-step reasoning.

The evolution of CoT techniques reflects the field's rapid progress. While zero-shot and few-shot CoT provided the initial breakthroughs in reasoning capabilities, newer approaches like auto-CoT and active-prompt CoT have addressed scalability and efficiency challenges. Self-consistency CoT further enhanced reliability by leveraging multiple reasoning paths, marking a significant step toward more robust AI reasoning systems.

However, important challenges remain. The requirement for large models (>100B parameters) limits accessibility, while computational costs and prompt engineering complexity pose implementation challenges.
These limitations suggest future research directions, including:

- Developing more efficient CoT techniques for smaller models
- Reducing computational overhead while maintaining performance
- Improving prompt engineering automation
- Enhancing reliability for critical applications

As AI continues to evolve, CoT prompting stands as a crucial technique for enabling transparent and verifiable reasoning in language models. Its ability to break down complex problems into interpretable steps not only improves performance but also provides valuable insights into AI decision-making processes, making it an essential tool for the future of artificial intelligence.

References

[1] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E., Le, Q., & Zhou, D. (2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." arXiv preprint arXiv:2201.11903.
[2] Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). "Large Language Models are Zero-Shot Reasoners." arXiv preprint arXiv:2205.11916.
[3] Fu, Y., Peng, H., Sabharwal, A., Clark, P., & Khot, T. (2023). "Complexity-Based Prompting for Multi-step Reasoning." arXiv preprint arXiv:2210.00720.
[4] Zhang, Z., Zhang, A., Li, M., & Smola, A. (2022). "Automatic Chain of Thought Prompting in Large Language Models." arXiv preprint arXiv:2210.03493.
[5] Diao, S., Wang, P., Lin, Y., Pan, R., Liu, X., & Zhang, T. (2023). "Active Prompting with Chain-of-Thought for Large Language Models."
[6] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E. H., Narang, S., Chowdhery, A., & Zhou, D. (2022). "Self-Consistency Improves Chain of Thought Reasoning in Language Models." arXiv preprint arXiv:2203.11171.
[7] Hao, H., Zhang, K., & Xiong, M. (2023). "Dynamic Models of Neural Population Dynamics." Society of Artificial Intelligence Research and University of Texas, School of Public Health.
[8] Chu, Z., Chen, J., Chen, Q., Yu, W., He, T., Wang, H., Peng, W., Liu, M., Qin, B., & Liu, T. (2023). "Navigate through Enigmatic Labyrinth: A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future." arXiv preprint arXiv:2309.15402.
[9] Ma, Y., Jiang, H., & Fan, C. (2023). "Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA." arXiv preprint arXiv:2308.04679.
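As promised in the self-consistency section above, here is a minimal TypeScript sketch of the zero-shot trigger phrase and the majority-vote idea. The sampleCompletion function is a stand-in for whatever LLM client you use (it is not a real library call), and the answer-extraction logic is deliberately simplified; this is an illustration of the technique, not a production implementation.

TypeScript
// Stand-in for an LLM call that returns one sampled completion at a given temperature.
// Wire this to your actual client; the dummy return value keeps the sketch self-contained.
async function sampleCompletion(prompt: string, temperature: number): Promise<string> {
  void prompt; void temperature; // placeholders; a real client would use both
  return 'Step 1: ... Step 2: ... So the answer is 42.';
}

// Zero-shot CoT: append a simple trigger phrase to elicit step-by-step reasoning.
function buildZeroShotCoTPrompt(question: string): string {
  return `${question}\n\nLet's think step by step.`;
}

// Simplified answer extraction: take the last number that appears in the completion.
function extractAnswer(completion: string): string | null {
  const matches = completion.match(/-?\d+(\.\d+)?/g);
  return matches ? matches[matches.length - 1] : null;
}

// Self-consistency CoT: sample several reasoning chains and take a majority vote
// over the extracted answers instead of trusting a single greedy decode.
async function selfConsistencyAnswer(question: string, samples = 40): Promise<string | null> {
  const prompt = buildZeroShotCoTPrompt(question);
  const votes = new Map<string, number>();

  for (let i = 0; i < samples; i++) {
    const completion = await sampleCompletion(prompt, 0.7); // non-zero temperature → diverse paths
    const answer = extractAnswer(completion);
    if (answer !== null) votes.set(answer, (votes.get(answer) ?? 0) + 1);
  }

  // Return the most frequent answer, or null if nothing could be extracted.
  let best: string | null = null;
  let bestCount = 0;
  for (const [answer, count] of votes) {
    if (count > bestCount) {
      best = answer;
      bestCount = count;
    }
  }
  return best;
}

// Example usage:
selfConsistencyAnswer('If a train travels 60 km in 45 minutes, what is its speed in km/h?')
  .then((answer) => console.log(answer));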
Tuhin Chattopadhyay
CEO at Tuhin AI Advisory and Professor of Practice,
JAGSoM
Frederic Jacquet
Technology Evangelist,
AI[4]Human-Nexus
Suri Nuthalapati
Data & AI Practice Lead, Americas,
Cloudera
Pratik Prakash
Principal Solution Architect,
Capital One