Making Machine Learning Accessible for Enterprises: Part 2
Let's take a look at critical areas of machine learning-based solutions, such as model explainability and model governance.
In Part 1 of this series, we discussed the need for automation of data science and the need for speed and scale in data transformation and building models. In this part, we will discuss other critical areas of ML-based solutions like:
- Model Explainability
- Model Governance (Traceability, Deployment, and Monitoring)
Model Explainability
Simpler machine learning models such as linear and logistic regression are highly interpretable but may have limited accuracy. Deep learning models, on the other hand, have repeatedly produced highly accurate results, but they are considered black boxes because they cannot explain their decisions and actions to human users. With regulations like GDPR, model explainability is quickly becoming one of the biggest challenges for data scientists, legal teams, and enterprises. Explainable AI, commonly referred to as XAI, is becoming one of the most sought-after research areas in machine learning.
Predictive accuracy and explainability are frequently subject to a trade-off: higher accuracy can often be achieved only at the cost of lower explainability. Unlike Kaggle competitions, where complex ensemble models are built to win, enterprises place great importance on model interpretability. A loan default prediction model cannot be used to reject a customer's loan unless the model can explain why the loan is being rejected.
Explainability is often required at the model level as well as at the level of an individual test instance. At the model level, there is a need to explain which features are important and how variation in these features affects the model's decision. Variable importance and partial dependence plots are popularly used for this. At the individual test instance level, packages like “lime” help explain how black box models make a decision.
Figure 1: Screenshot with Variable Importance chart from Infosys Nia Machine Learning
Figure 2: Screenshot of Partial Dependence Plots from Infosys Nia Machine Learning
Figure 3: Test Point Variable Importance using LIME
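The two model-level techniques mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not how Infosys Nia or the “lime” package computes its charts: `risk_score` is a hypothetical stand-in for a trained black box model, the feature names are invented, and the data is synthetic. Variable importance is approximated here by permutation importance (the accuracy lost when one feature column is shuffled), and partial dependence by the average score when one feature is forced to each value on a grid.

```python
import random

FEATURES = ["debt_ratio", "income_k", "branch_code"]  # hypothetical feature names

def risk_score(row):
    """Hypothetical stand-in for a trained black box model (default risk in [0, 1])."""
    debt_ratio, income_k, _branch_code = row  # branch_code is deliberately ignored
    return min(1.0, max(0.0, 0.2 + 0.9 * debt_ratio - 0.004 * income_k))

def predict(row):
    """Classify as default (1) when the risk score reaches 0.5."""
    return 1 if risk_score(row) >= 0.5 else 0

def accuracy(rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(rows, labels, seed=0):
    """Accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels)
    importance = {}
    for j, name in enumerate(FEATURES):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importance[name] = baseline - accuracy(shuffled, labels)
    return importance

def partial_dependence(rows, feature, grid):
    """Average model score when `feature` is forced to each grid value."""
    j = FEATURES.index(feature)
    return [
        sum(risk_score(r[:j] + (v,) + r[j + 1:]) for r in rows) / len(rows)
        for v in grid
    ]

# Synthetic data, for illustration only.
rng = random.Random(42)
rows = [(rng.random(), rng.uniform(20, 120), rng.randint(1, 9)) for _ in range(500)]
labels = [predict(r) for r in rows]  # labels come from the model itself, so baseline accuracy is 1.0

importance = permutation_importance(rows, labels)
pdp = partial_dependence(rows, "debt_ratio", [0.0, 0.25, 0.5, 0.75, 1.0])

for name, drop in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} accuracy drop when shuffled: {drop:.3f}")
print("partial dependence on debt_ratio:", [round(p, 3) for p in pdp])
```

Because the stand-in model ignores `branch_code`, shuffling that column leaves every prediction unchanged, which is the behavior a variable importance chart would show for an irrelevant feature.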
Model Governance (Traceability, Deployment, and Monitoring)
Any machine learning project involves trying multiple hypotheses, data transformation strategies, models, and so on. Machine learning algorithms have dozens of configurable parameters, and whether you work alone or in a team, it is difficult to track which parameters, code, and data went into each experiment to produce a model. Keeping track of where you started and which options were tried is a typical challenge for data scientists on any project. In addition, some industries have compliance requirements that make it essential to track all the activities of a project.
Figure 4: Project Audit feature from Infosys Nia ML platform
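One way to picture this kind of traceability is an append-only log that records, for every experiment, the parameters, a fingerprint of the training data, and the code version. The sketch below is an assumption-laden illustration rather than the Nia audit feature: the `ExperimentLog` class, its field names, the commit identifier, and the metric values are all hypothetical placeholders, and a real system would persist the records instead of keeping them in memory.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

def fingerprint(obj):
    """Stable SHA-256 hash of any JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

@dataclass
class ExperimentRecord:
    run_id: str
    params: dict
    data_hash: str
    code_version: str
    metrics: dict
    logged_at: str

class ExperimentLog:
    """Append-only, in-memory audit trail (hypothetical; a real one would be persisted)."""

    def __init__(self):
        self._records = []

    def log(self, params, training_rows, code_version, metrics):
        record = ExperimentRecord(
            run_id=f"run-{len(self._records) + 1:03d}",
            params=dict(params),
            data_hash=fingerprint(training_rows),
            code_version=code_version,
            metrics=dict(metrics),
            logged_at=datetime.now(timezone.utc).isoformat(),
        )
        self._records.append(record)
        return record

    def best(self, metric):
        """Return the run with the highest value of `metric`."""
        return max(self._records, key=lambda r: r.metrics[metric])

    def to_json(self):
        return json.dumps([asdict(r) for r in self._records], indent=2)

# Placeholder data, commit id, and metric values, for illustration only.
data = [[0.1, 1], [0.7, 0]]
log = ExperimentLog()
first = log.log({"max_depth": 3}, data, "abc123", {"auc": 0.81})
second = log.log({"max_depth": 6}, data, "abc123", {"auc": 0.84})
best = log.best("auc")
print(best.run_id, best.params, best.data_hash[:12])
```

Hashing the training data means two runs can be compared on equal footing: if the fingerprints match, any difference in the metric comes from the parameters or the code, not from the data.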
Once the model is built, validated, and signed off by all stakeholders, it has to be deployed in production. REST APIs are the preferred method for model scoring because they can be easily integrated with line-of-business applications. It is critical that the entire data pipeline, including all feature engineering and transformations, is packaged along with the model for deployment.
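The point about packaging the pipeline with the model can be sketched as a single object that owns both the transformations and the scoring logic, plus a framework-agnostic request handler that a REST endpoint would call. Everything here is a hypothetical illustration: the `ScoringPipeline` class, the feature names, and the coefficient values are assumptions, and the handler would normally sit behind a web framework of your choice.

```python
import json
import math

class ScoringPipeline:
    """Bundles feature engineering with the model so both ship as one unit."""

    def __init__(self, means, scales, weights, bias):
        self.means, self.scales = means, scales
        self.weights, self.bias = weights, bias

    def transform(self, record):
        # Apply the same standardization that was used at training time.
        return [(record[name] - self.means[name]) / self.scales[name]
                for name in self.weights]

    def predict_proba(self, record):
        # Logistic regression on the transformed features.
        features = self.transform(record)
        z = self.bias + sum(w * x for w, x in zip(self.weights.values(), features))
        return 1.0 / (1.0 + math.exp(-z))

def handle_score_request(pipeline, body):
    """JSON string in, (HTTP status, JSON string) out."""
    try:
        record = json.loads(body)
        probability = pipeline.predict_proba(record)
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing feature, or a non-numeric value.
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"default_probability": round(probability, 4)})

# Placeholder training statistics and coefficients, for illustration only.
pipeline = ScoringPipeline(
    means={"debt_ratio": 0.4, "income_k": 60.0},
    scales={"debt_ratio": 0.2, "income_k": 25.0},
    weights={"debt_ratio": 1.5, "income_k": -0.8},
    bias=-0.5,
)
status, response = handle_score_request(pipeline, '{"debt_ratio": 0.4, "income_k": 60.0}')
bad_status, bad_response = handle_score_request(pipeline, '{"debt_ratio": 0.4}')
print(status, response)
print(bad_status, bad_response)
```

Because callers send raw field values and the pipeline applies the training-time transformations itself, the line-of-business application never needs to know how the features were engineered.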
A model deployed in production should be monitored to detect “data drift,” in which production data differs from training data, with an emphasis on how such drift might affect model performance. Models should be refreshed periodically in production to counter data/model drift, and the process of moving models from training to production, along with the data pipeline, has to be automated and made seamless.
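One common way to quantify data drift for a single numeric feature is the Population Stability Index (PSI), which compares the binned distribution of production values against the training distribution. The sketch below is one possible approach and is not taken from the original platform: the bin count, the synthetic income data, and the 0.25 alert threshold (a commonly quoted rule of thumb) are all assumptions.

```python
import math
import random

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` relative to `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # degenerate case: all training values are identical

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values into the edge bins
        # Floor each share at eps so empty bins do not cause log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Synthetic feature values, for illustration only.
rng = random.Random(7)
train = [rng.gauss(50, 10) for _ in range(2000)]
prod_stable = [rng.gauss(50, 10) for _ in range(2000)]
prod_shifted = [rng.gauss(60, 10) for _ in range(2000)]

DRIFT_THRESHOLD = 0.25  # assumed alert level
for name, sample in [("stable", prod_stable), ("shifted", prod_shifted)]:
    value = psi(train, sample)
    print(f"{name:8s} PSI={value:.3f} drift={'yes' if value > DRIFT_THRESHOLD else 'no'}")
```

A check like this could run on a schedule for each input feature, with a breach of the threshold triggering the model refresh described above.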
To succeed in the data-driven economy, you have to get new machine learning projects up and running quickly. For this, choosing a platform that caters to the above needs is critical.
With these platform capabilities, enterprises can help their teams accelerate the delivery of data science projects and produce more accurate results. Enterprise business and IT teams can then focus on identifying high-value business problems that can be solved with the data at hand.