The Importance of Verification and Validation on Artificial Intelligence Systems

Learn why verification and validation matter for artificial intelligence systems, with an example from CalypsoAI’s VESPR platform.

By Neil Serebryany · Feb. 21, 22 · Opinion

AI systems are increasingly being used in mission-critical and safety-critical applications. One of the more high-profile uses of AI is in the development of autonomous and semi-autonomous vehicles.

A key component of these systems is the model that performs image classification: it takes an input image and predicts a label. Such models help classify objects like the type of sign on the side of a road: does it indicate to stop, to yield, or is it simply a warning about a turtle crossing?

These systems have experienced several high-profile failures that appear to have come about in naturally occurring scenarios rather than through intentional manipulation (e.g., a hacker accessing and attacking the system). While these particular failures are specific to autonomous vehicles, any AI system that performs image classification is similarly vulnerable. Such failures rightly jeopardize users’ trust in these systems.

If we want to establish justified confidence in these systems, we must start by quantifying how robust they are to these naturally occurring conditions, as well as to intentional manipulation.

Testing AI Systems Using Verification and Validation

Verification and validation tests are at the heart of CalypsoAI’s VESPR platform. Every image classification model developed through the platform is evaluated using a battery of assessments, including a suite of tests for naturally occurring image corruptions. These assessments can also be used to evaluate externally trained TensorFlow (or Keras) models.

As a quick example, we used the VESPR platform to evaluate an off-the-shelf image classification model published by DeepMind that achieves near-perfect performance (99 percent top-1 accuracy) on in-sample images. The model uses a VGG-style architecture and was trained on the popular CIFAR10 dataset of 32 x 32-pixel images. However, when this model is evaluated using our suite of 25 corruption tests (5 corruption types with 5 levels each), its top-1 accuracy drops to between 7.8 percent and 11.3 percent, which is roughly what the model would achieve by randomly guessing one of the 10 possible labels (10 percent).
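To make the structure of such a test suite concrete, here is a minimal sketch in plain Python of a corruption grid: every corruption type is applied at every level, and top-1 accuracy is recorded for each pair. This is not how VESPR is implemented. The function names, the two toy corruptions, the level-to-strength mappings, and the threshold "model" are all assumptions chosen for illustration.

```python
import random

def top1_accuracy(predict, images, labels):
    """Fraction of images whose predicted label equals the true label."""
    correct = sum(1 for img, y in zip(images, labels) if predict(img) == y)
    return correct / len(labels)

def add_noise(image, level, rng):
    """Hypothetical corruption: Gaussian noise whose spread grows with level."""
    sigma = 0.1 * level  # assumed mapping from level to noise strength
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in image]

def darken(image, level, rng):
    """Hypothetical corruption: scale pixel intensities down as level rises."""
    factor = 1.0 - 0.15 * level  # assumed mapping from level to brightness
    return [p * factor for p in image]

def corruption_grid(predict, images, labels, corruptions, levels=range(1, 6), seed=0):
    """Evaluate top-1 accuracy for every (corruption type, level) pair."""
    rng = random.Random(seed)
    results = {}
    for name, corrupt in corruptions.items():
        for level in levels:
            corrupted = [corrupt(img, level, rng) for img in images]
            results[(name, level)] = top1_accuracy(predict, corrupted, labels)
    return results

# Toy stand-ins: flattened 4 x 4 "images" with pixel values in [0, 1],
# and a threshold classifier in place of a trained network.
images = [[0.9] * 16 for _ in range(5)] + [[0.1] * 16 for _ in range(5)]
labels = [1] * 5 + [0] * 5
predict = lambda img: 1 if sum(img) / len(img) > 0.5 else 0

results = corruption_grid(predict, images, labels,
                          {"noise": add_noise, "darken": darken})
clean = top1_accuracy(predict, images, labels)
for (name, level), acc in sorted(results.items()):
    print(f"{name:>6} level {level}: top-1 accuracy {acc:.2f} (clean: {clean:.2f})")
```

With five corruption types instead of two, the same loop yields the 25 results described above. For a 10-class problem such as CIFAR10, the chance-level baseline to compare against is 1/10, or 10 percent.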

These tests effectively simulate things like digital noise, an out-of-focus camera, and distortions caused by up-sampling a low-resolution image. While these tests do not increase the robustness of the pre-trained model to naturally occurring corruptions, they do provide important diagnostic information. At the lowest level, this information can be used to compare multiple candidate models and select the best among them. However, this information is also critical for decision-makers who must assess whether the model will perform adequately in real-world conditions.

Overlaying the context in which the model is deployed informs the deployment decision. There are a few things we should ask ourselves: Should a model be deployed in isolation? Should it be deployed with a human-in-the-loop to review decisions in certain circumstances? Or should the model not be deployed at all? Answering these questions in an evidence-based way is necessary to enable confidence, trust, and transparency in any AI system.
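As a rough illustration of the three corruption families named above, the sketch below implements simplified versions in plain Python on a grayscale image stored as a list of rows with pixel values in [0, 1]. The function names and parameters are hypothetical, and these simple filters are stand-ins for illustration, not the corruptions used by the platform.

```python
import random

def gaussian_noise(image, sigma, seed=0):
    """Simulate digital noise: add zero-mean Gaussian noise, then clip to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]

def box_blur(image, radius):
    """Approximate an out-of-focus camera: average each pixel with its neighbors."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            # Collect the neighborhood, truncated at the image borders.
            window = [image[rr][cc]
                      for rr in range(max(0, r - radius), min(h, r + radius + 1))
                      for cc in range(max(0, c - radius), min(w, c + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

def pixelate(image, factor):
    """Mimic up-sampling a low-resolution image: keep one pixel per
    factor x factor block and repeat it to restore the original size."""
    h, w = len(image), len(image[0])
    return [[image[(r // factor) * factor][(c // factor) * factor]
             for c in range(w)]
            for r in range(h)]

# A 4 x 4 checkerboard makes the loss of fine detail easy to see.
checker = [[float((r + c) % 2) for c in range(4)] for r in range(4)]

noisy = gaussian_noise(checker, sigma=0.2)
blurred = box_blur(checker, radius=1)
blocky = pixelate(checker, factor=2)

for name, img in [("noisy", noisy), ("blurred", blurred), ("blocky", blocky)]:
    print(name)
    for row in img:
        print("  " + " ".join(f"{p:.2f}" for p in row))
```

Increasing `sigma`, `radius`, or `factor` gives progressively harsher versions of each corruption, which is one plausible way to define several levels per corruption type. On the checkerboard, `pixelate` with a factor of 2 erases the pattern entirely, a small reminder that a corruption which looks mild to a person can remove exactly the detail a classifier relies on.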


Published at DZone with permission of Neil Serebryany. See the original article here.

