QA: How Reliable Are Your Machine Learning Systems?

Let's look at what reliability means for ML systems, why ML model reliability matters, and who should be responsible for it.

by Ajitesh Kumar · Oct. 01, 18 · Opinion

In this post, you will learn about different aspects of building a highly reliable Machine Learning system. Reliability is one of the key software product quality characteristics in the ISO/IEC 25000 SQuaRE series of standards.

Have you put measures in place to ensure high reliability of your Machine Learning systems? This post covers the following:

  1. What is the reliability of Machine Learning systems?
  2. Why bother with Machine Learning model reliability?
  3. Who should take care of ML system reliability?

What Is the Reliability of ML Systems?

As with software applications, the reliability of Machine Learning systems is primarily related to the fault tolerance and recoverability of the system in production. In addition, it depends on how reliable the training process of the ML models is. Let's look at both aspects in detail:

Fig: ML Model Reliability

Fault-Tolerance/Recoverability of ML Systems in Production

Fault tolerance of an ML system can be defined as how the system behaves when model performance degrades beyond acceptable limits. Ideally, the system falls back either to one of the last best serving models or to a simple heuristic model built from rules.
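The fallback behavior can be sketched roughly as follows. This is a minimal illustration, not a production design: the class name `FallbackServer`, the accuracy threshold, and the toy predictors are all assumptions made for the example. It assumes some monitoring job reports a recent accuracy figure for each model.

```python
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[dict], float]


class FallbackServer:
    """Serve the first model whose monitored accuracy is acceptable.

    Hypothetical sketch: candidates are ordered from the current model
    to older "last best" models; a rule-based heuristic is the last resort.
    """

    def __init__(self, candidates: List[Tuple[str, Predictor]],
                 heuristic: Predictor, min_accuracy: float):
        self.candidates = list(candidates)
        self.heuristic = heuristic
        self.min_accuracy = min_accuracy
        # Until monitoring reports otherwise, assume every model is healthy.
        self.accuracy: Dict[str, float] = {name: 1.0 for name, _ in candidates}

    def report_accuracy(self, name: str, value: float) -> None:
        """Called by a monitoring job with the latest measured accuracy."""
        self.accuracy[name] = value

    def predict(self, features: dict) -> Tuple[str, float]:
        """Return (name of the predictor used, prediction)."""
        for name, model in self.candidates:
            if self.accuracy[name] >= self.min_accuracy:
                return name, model(features)
        return "heuristic", self.heuristic(features)


server = FallbackServer(
    candidates=[("current", lambda f: 0.9), ("previous", lambda f: 0.7)],
    heuristic=lambda f: 1.0 if f["amount"] > 100 else 0.0,
    min_accuracy=0.8,
)
print(server.predict({"amount": 50}))   # served by "current"
server.report_accuracy("current", 0.6)  # current model has degraded
print(server.predict({"amount": 50}))   # served by "previous"
```

Keeping the heuristic as the final step means the system still returns an answer even when every trained model is below the acceptable limit.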

One of the key aspects of recoverability is recording feature values and the related predictions so that the data and related metrics can be monitored. This helps in coming up with alternate models that could provide greater accuracy if model performance starts degrading.
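As a rough sketch of what recording features and predictions might look like, the snippet below appends one JSON line per prediction and computes a simple monitoring statistic from the log. The names `PredictionLog`, `read_log`, and `feature_mean`, as well as the record layout, are assumptions made for illustration; a real system would likely write to a durable store rather than an in-memory buffer.

```python
import io
import json
import time


class PredictionLog:
    """Append one JSON line per prediction to a file-like sink."""

    def __init__(self, sink):
        self.sink = sink

    def record(self, model_version, features, prediction, ts=None):
        row = {
            "ts": time.time() if ts is None else ts,
            "model_version": model_version,
            "features": features,
            "prediction": prediction,
        }
        self.sink.write(json.dumps(row, sort_keys=True) + "\n")
        return row


def read_log(text):
    """Parse the logged JSON lines back into dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def feature_mean(rows, name):
    """Mean of one logged feature, e.g. to compare against training data."""
    values = [r["features"][name] for r in rows if name in r["features"]]
    return sum(values) / len(values) if values else None


sink = io.StringIO()  # stand-in for a file or message queue
log = PredictionLog(sink)
log.record("v1", {"amount": 10.0}, 0, ts=1.0)
log.record("v1", {"amount": 30.0}, 1, ts=2.0)
print(feature_mean(read_log(sink.getvalue()), "amount"))
```

Because each record carries the model version, the same log can later be replayed against a candidate replacement model to compare predictions.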

Reliability of ML Training Process

The reliability of the ML training process depends on how repeatable the model training process is. The goal is to detect problems with models and prevent those models from moving into production. One of the goals of operationalizing Machine Learning training/testing is to automate the overall process. As part of that automation, the following would need to be achieved:

  • Automated data extraction from disparate data sources
  • Feature extraction
  • Model training/testing
  • Evaluating models
  • Model selection
  • Storing model evaluation metrics along with information such as the hyperparameters and the data used for training the models
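The steps above can be sketched end to end with a toy example. Everything here is an assumption made for illustration: the data is synthetic, the "model" is a single threshold on one feature, and function names such as `run_pipeline` are hypothetical. The point is the shape of the pipeline, where every run is seeded and records its hyperparameters, metrics, and a fingerprint of the training data.

```python
import hashlib
import json
import random


def extract(seed):
    """Stand-in for pulling rows from several data sources (synthetic here)."""
    rng = random.Random(seed)
    return [{"raw": rng.uniform(0, 10)} for _ in range(200)]


def featurize(rows):
    """Turn raw rows into (feature, label) pairs; the label rule is made up."""
    return [(r["raw"] / 10.0, int(r["raw"] > 6.0)) for r in rows]


def accuracy(threshold, data):
    return sum(int(x > threshold) == y for x, y in data) / len(data)


def train(data, n_candidates):
    """Fit a one-feature threshold model by scanning n_candidates cut points."""
    cuts = [(i + 1) / (n_candidates + 1) for i in range(n_candidates)]
    return max(cuts, key=lambda t: accuracy(t, data))


def fingerprint(data):
    """Short hash identifying exactly which data a model was trained on."""
    return hashlib.sha256(json.dumps(data).encode()).hexdigest()[:12]


def run_pipeline(seed=0, grid=(1, 4, 49)):
    data = featurize(extract(seed))
    train_set, test_set = data[:150], data[150:]
    runs = []
    for n in grid:
        threshold = train(train_set, n)
        runs.append({
            "hyperparameters": {"n_candidates": n},
            "threshold": threshold,
            "metrics": {
                "train_accuracy": accuracy(threshold, train_set),
                "test_accuracy": accuracy(threshold, test_set),
            },
            "train_data_fingerprint": fingerprint(train_set),
        })
    # Simplification: a real pipeline would select on a separate validation set.
    selected = max(runs, key=lambda r: r["metrics"]["test_accuracy"])
    return {"runs": runs, "selected": selected}


result = run_pipeline(seed=0)
print(json.dumps(result["selected"], indent=2))
```

Running the pipeline twice with the same seed yields identical records, which is the repeatability property the training process is aiming for.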

To keep bad models from moving into production, different forms of quality checks need to be performed on different aspects of ML models, such as the following:

  • Data (data poisoning, data quality, data compliance)
  • Features (feature thresholds, feature importance, unit tests)
  • Models (fairness, stability, online-offline metrics)
  • ML pipeline (pipeline security)
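A few of these checks can be expressed as a simple promotion gate. The sketch below covers only a data quality check, a feature threshold check, and a model stability check; the column names, bounds, and limits are hypothetical values chosen for the example, and checks such as data poisoning, fairness, or pipeline security would need their own dedicated tooling.

```python
def check_data(rows, required, max_missing_rate=0.05):
    """Data quality: flag required columns with too many missing values."""
    if not rows:
        return ["data: no rows"]
    failures = []
    for col in required:
        missing = sum(1 for r in rows if r.get(col) is None) / len(rows)
        if missing > max_missing_rate:
            failures.append(f"data: {col} missing rate {missing:.2f}")
    return failures


def check_features(rows, bounds):
    """Feature thresholds: flag values outside the expected range."""
    failures = []
    for col, (lo, hi) in bounds.items():
        bad = sum(1 for r in rows
                  if r.get(col) is not None and not lo <= r[col] <= hi)
        if bad:
            failures.append(f"features: {bad} value(s) of {col} out of range")
    return failures


def check_model(fold_scores, min_score=0.8, max_spread=0.05):
    """Model quality and stability across evaluation folds."""
    failures = []
    mean = sum(fold_scores) / len(fold_scores)
    spread = max(fold_scores) - min(fold_scores)
    if mean < min_score:
        failures.append(f"model: mean score {mean:.2f} below {min_score}")
    if spread > max_spread:
        failures.append(f"model: fold spread {spread:.2f} above {max_spread}")
    return failures


def quality_gate(rows, fold_scores):
    """Return (ok, failures); promote the model only when ok is True."""
    failures = (
        check_data(rows, required=["age", "income"])
        + check_features(rows, bounds={"age": (0, 120), "income": (0, 1e7)})
        + check_model(fold_scores)
    )
    return (not failures, failures)


ok, failures = quality_gate(
    [{"age": 30, "income": None}, {"age": 230, "income": 1.0}],
    fold_scores=[0.95, 0.70, 0.90],
)
print(ok)
for failure in failures:
    print(failure)
```

Returning the full list of failures, rather than stopping at the first one, gives whoever reviews the run a complete picture of why a model was blocked.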

Container technology, along with workflow tools, can be used to achieve a repeatable model training process.

Why Bother With ML Model Reliability?

Reliability is one of the key characteristics of software product quality, as per the ISO/IEC 25000 SQuaRE standards for evaluating software product quality. Ensuring the reliability of models makes them more trustworthy and hence leads to greater adoption by end users.

Who Should Take Care of ML System Reliability?

Creating and monitoring a reliable ML model training/testing system is the responsibility of the following roles:

  • ML researchers/data scientists: Help design test cases (related to data, features, models, and the pipeline) for testing the reliability of ML models.
  • Quality assurance engineers: Play an important role in checking QA testing outcomes related to data, features, models, and ML pipeline tests.
  • Operations engineers: Play an important role in automating the ML pipeline.

References

  • Making Netflix Machine Learning algorithms reliable

Summary

In this post, you learned about different aspects of the reliability of a machine learning system. Creating a reliable ML system requires assuring both the reliability of the model in production and the reliability of the model training process. Model reliability in production is related to the fault tolerance and recoverability of models, while training reliability means the repeatability of the ML training/testing process, which is achieved by automating it. As a data scientist/ML researcher, you have a key role in laying out guidelines for achieving reliability of ML systems.

Published at DZone with permission of Ajitesh Kumar, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
