Operationalization of Machine Learning Models: Part 2
Explore the operationalization of machine learning models.
In part 1 of this series, we discussed the need to focus on model operationalization, covered model deployment, and described the features required in scoring engines. In this part, we continue with the remaining areas of focus for model operationalization.
Fig: Operationalization (O16N) of Machine Learning Models in Enterprise
Collect Metrics/Failures
In addition to returning prediction results, the scoring engine should log the input request, the model used, the model version, data validation errors, and the output prediction. This information is required to evaluate model performance and to identify issues during scoring. Upstream instability can create problems for inference, so each input record should be validated to catch invalid data. Prediction errors and response times for each request have to be recorded in a data store. Observed outcomes (ground truth) for predictions, when available, have to be ingested so they can be compared with the predictions.
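The sketch below illustrates one way such logging could be wired around a scoring call. It is a minimal, assumed setup: the record fields, the model name and version, and the `validate` helper are hypothetical placeholders, and Python's standard `logging` module stands in for whatever data store the enterprise actually uses.

```python
# Hypothetical sketch: a scoring wrapper that records the input request,
# model version, validation errors, prediction, and response time.
import time
import uuid
import logging

logger = logging.getLogger("scoring")

def validate(record, required_fields):
    """Return a list of validation errors for a single input record."""
    errors = []
    for field in required_fields:
        if record.get(field) is None:
            errors.append(f"missing field: {field}")
    return errors

def score(record, model, model_name="income-classifier", model_version="1.3.0"):
    request_id = str(uuid.uuid4())
    start = time.perf_counter()
    errors = validate(record, required_fields=["age", "occupation", "relationship"])
    prediction = None
    if not errors:
        prediction = model.predict([list(record.values())])[0]
    latency_ms = (time.perf_counter() - start) * 1000
    # Persist everything needed to evaluate the model later and to join
    # predictions with ground truth when it arrives.
    logger.info({
        "request_id": request_id,
        "model_name": model_name,
        "model_version": model_version,
        "input": record,
        "validation_errors": errors,
        "prediction": prediction,
        "latency_ms": round(latency_ms, 2),
    })
    return {"request_id": request_id, "prediction": prediction, "errors": errors}
```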
Monitor Model Performance
Fig: Model performance over time. The dotted line indicates metrics during training.
Models deployed in production become stale because of the age of the model or drift in the distribution of data over time. This is referred to as training-serving skew or model drift. It is critical to monitor the performance of the model regularly to identify these issues. A model with high accuracy in training but reduced accuracy during scoring is overfitted and does not generalize. One of the primary reasons for model performance degradation is a drift in the distribution of data. Comparing data distributions and statistics like the mean, median, and standard deviation for continuous values, and the frequency distribution for categorical values, can help in identifying data drift. Statistical tests like Kolmogorov-Smirnov can also be used to identify data drift, which, in turn, results in model drift (degradation of model performance). Patterns and relations in data also evolve over time, so models built to analyze the initial data become obsolete as the relationship between input and output variables changes. This phenomenon is referred to as concept drift.
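As a rough sketch of the kind of check described above, the snippet below compares summary statistics of a numeric feature between training-time and scoring-time samples and then runs a two-sample Kolmogorov-Smirnov test with SciPy. The feature, the synthetic samples, and the 0.05 significance threshold are illustrative assumptions, not a prescribed standard.

```python
# Minimal data-drift check: compare summary statistics, then run a
# two-sample Kolmogorov-Smirnov test between training and scoring samples.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_age = rng.normal(loc=38, scale=10, size=5_000)    # distribution seen in training
scoring_age = rng.normal(loc=45, scale=12, size=5_000)  # distribution seen in production

# Cheap first pass: summary statistics that are easy to alert on.
for name, sample in [("train", train_age), ("scoring", scoring_age)]:
    print(f"{name}: mean={sample.mean():.1f} median={np.median(sample):.1f} std={sample.std():.1f}")

# A small p-value suggests the two distributions differ (possible drift).
statistic, p_value = ks_2samp(train_age, scoring_age)
if p_value < 0.05:
    print(f"Possible data drift detected (KS={statistic:.3f}, p={p_value:.4f})")
```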
Analyze Results and Errors
Fig: Income dataset — Train vs. scoring data analysis by the Relationship and Occupation features
All models need to be updated eventually to account for changes in the external world, so careful assessment is important to guide such decisions. Slicing a dataset along certain dimensions of interest and comparing the results with those observed during training can provide a fine-grained understanding of model performance. These analyses differ for each dataset, so it is better to leverage whichever visualization tool the enterprise already uses, such as Tableau, QlikView, or Power BI. The ability to set thresholds that alert the concerned teams (data science, business, and engineering) on degradation of model performance and other errors is also required; a minimal slice-level comparison is sketched below.
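The sketch assumes predictions and observed labels have already been joined into a pandas DataFrame; the column names, the example rows, and the training-time baseline accuracies are hypothetical and only illustrate the shape of the analysis.

```python
# Slice predictions by a categorical feature and compare per-slice accuracy
# during scoring against the accuracy observed during training.
import pandas as pd

scored = pd.DataFrame({
    "occupation": ["Sales", "Sales", "Tech", "Tech", "Clerical", "Clerical"],
    "label":      [1, 0, 1, 1, 0, 0],
    "prediction": [1, 1, 1, 0, 0, 0],
})

# Baseline accuracy per slice recorded at training time (assumed values).
train_baseline = pd.Series({"Sales": 0.86, "Tech": 0.84, "Clerical": 0.88})

slice_accuracy = (
    scored.assign(correct=scored["label"] == scored["prediction"])
          .groupby("occupation")["correct"]
          .mean()
)

report = pd.DataFrame({"scoring": slice_accuracy, "training": train_baseline})
report["drop"] = report["training"] - report["scoring"]
print(report.sort_values("drop", ascending=False))
```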
Refine Model
A drop in model accuracy, relative to training accuracy, beyond the acceptable threshold defined by business needs is an indicator that the model should be re-created. There are two options in this scenario. One is to rebuild the model with the same features but include the ground-truth data collected since the last training. This is the simplest approach, and the rebuild and deployment of the new model version can even be triggered automatically. But it may be a good idea to have data scientists and the business review the explanation of the new model before it's promoted to production.
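A hedged sketch of that first option follows: an automated rebuild trigger that fires when the accuracy drop exceeds an agreed tolerance. The tolerance value and the `retrain_and_register` placeholder are assumptions standing in for the enterprise's own pipeline step and review process.

```python
# Trigger a model rebuild when scoring accuracy falls too far below the
# accuracy observed during training.
def should_retrain(training_accuracy, scoring_accuracy, tolerance=0.05):
    """Return True when the accuracy drop exceeds the business-defined tolerance."""
    return (training_accuracy - scoring_accuracy) > tolerance

def retrain_and_register(training_data):
    # Placeholder: refit the same features on data extended with the newly
    # collected ground truth, then register the new model version for review.
    raise NotImplementedError

if should_retrain(training_accuracy=0.91, scoring_accuracy=0.83):
    print("Accuracy drop beyond threshold - triggering model rebuild for review")
    # retrain_and_register(updated_training_data)
```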
There will be scenarios where re-creating the model doesn't improve its performance. In these situations, analyzing the examples the model got wrong and looking for trends outside your current feature set can help in identifying new features. Creating new features based on this knowledge gives the model new signals to learn from and can help improve its performance.
The paper "Continuum: A Platform for Cost-Aware, Low-Latency Continual Learning" (https://dl.acm.org/citation.cfm?) demonstrates the need for continuous learning/training of models, which in turn requires a continuous deployment platform for ML models. Enterprises today have a strategy for their data platform, and many have selected or evaluated a machine learning platform. An enterprise-wide strategy for model operationalization is equally important and critical for enterprises to realize the benefits of their AI investments.
Conclusion
For the operationalization of machine learning models, there are multiple areas that need focus. Operational best practices that application teams already follow for existing enterprise applications, combined with the areas unique to machine learning, are required for a successful implementation of AI-led enterprise solutions.