DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • Cross-Validation in AI and Machine Learning for Healthcare
  • AI and Text Analysis: Best Approaches To Follow
  • Maintaining ML Model Accuracy With Automated Drift Detection
  • Implementing Ethical AI: Practical Techniques for Aligning AI Agents With Human Values

Trending

  • Cloud Security and Privacy: Best Practices to Mitigate the Risks
  • Designing a Java Connector for Software Integrations
  • Unit Testing Large Codebases: Principles, Practices, and C++ Examples
  • Modern Test Automation With AI (LLM) and Playwright MCP
  1. DZone
  2. Data Engineering
  3. AI/ML
  4. Machine Learning: Validation Techniques

Machine Learning: Validation Techniques

In this post, you will briefly learn about different validation techniques: resubstitution, hold-out, k-fold cross-validation, LOOCV, random subsampling, and bootstrapping.

By 
Ajitesh Kumar user avatar
Ajitesh Kumar
·
Feb. 12, 18 · Opinion
Likes (5)
Comment
Save
Tweet
Share
44.5K Views

Join the DZone community and get the full member experience.

Join For Free

Validation techniques in machine learning are used to get the error rate of the ML model, which can be considered as close to the true error rate of the population. If the data volume is large enough to be representative of the population, you may not need the validation techniques. However, in real-world scenarios, we work with samples of data that may not be a true representative of the population. This is where validation techniques come into the picture.

In this post, you will briefly learn about different validation techniques:

  • Resubstitution
  • Hold-out
  • K-fold cross-validation
  • LOOCV
  • Random subsampling
  • Bootstrapping

Machine Learning Validation Techniques

Resubstitution

If all the data is used for training the model and the error rate is evaluated based on outcome vs. actual value from the same training data set, this error is called the resubstitution error. This technique is called the resubstitution validation technique.

Holdout

To avoid the resubstitution error, the data is split into two different datasets labeled as a training and a testing dataset. This can be a 60/40 or 70/30 or 80/20 split. This technique is called the hold-out validation technique. In this case, there is a likelihood that uneven distribution of different classes of data is found in training and test dataset. To fix this, the training and test dataset is created with equal distribution of different classes of data. This process is called stratification.

K-Fold Cross-Validation

In this technique, k-1 folds are used for training and the remaining one is used for testing as shown in the picture given below. K-fold cross-validation

Figure 1: K-fold cross-validation

The advantage is that entire data is used for training and testing. The error rate of the model is average of the error rate of each iteration. This technique can also be called a form the repeated hold-out method. The error rate could be improved by using stratification technique.

Leave-One-Out Cross-Validation (LOOCV)

In this technique, all of the data except one record is used for training and one record is used for testing. This process is repeated for N times if there are N records. The advantage is that entire data is used for training and testing. The error rate of the model is average of the error rate of each iteration. The following diagram represents the LOOCV validation technique. LOOCV validation technique

Figure 2: LOOCV validation technique

Random Subsampling

In this technique, multiple sets of data are randomly chosen from the dataset and combined to form a test dataset. The remaining data forms the training dataset. The following diagram represents the random subsampling validation technique. The error rate of the model is the average of the error rate of each iteration. random subsampling validation technique

Figure 3: Random subsampling validation technique

Bootstrapping

In this technique, the training dataset is randomly selected with replacement. The remaining examples that were not selected for training are used for testing. Unlike K-fold cross-validation, the value is likely to change from fold-to-fold. The error rate of the model is average of the error rate of each iteration. The following diagram represents the same. bootstrapping validation technique

Figure 4: Bootstrapping validation technique

Machine learning Cross-validation (analytical chemistry) Data (computing)

Published at DZone with permission of Ajitesh Kumar, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Cross-Validation in AI and Machine Learning for Healthcare
  • AI and Text Analysis: Best Approaches To Follow
  • Maintaining ML Model Accuracy With Automated Drift Detection
  • Implementing Ethical AI: Practical Techniques for Aligning AI Agents With Human Values

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!