Introduction to OptiML: Automatic Model Optimization
Get a quick introduction to OptiML before moving on to the remainder of this series, which gives a detailed perspective on what’s behind OptiML.
BigML’s upcoming release on Wednesday, May 16, 2018, will bring a new resource to the platform: OptiML. In this post, we’ll give a quick introduction to OptiML before moving on to the remainder of our series of six blog posts (including this one), which offers a detailed perspective on the model optimization part of the release. Today’s post explains the basic concepts and will be followed by an example use case. Three more blog posts will focus on how to use OptiML through the BigML Dashboard, the API, and WhizzML and the Python bindings for automation. Finally, we will complete the series with a technical view of how OptiML works behind the scenes.
At BigML, we are believers in human-in-the-loop machine learning and in the importance of feature engineering driven by subject matter expertise in real-life situations. As such, we have been treading carefully when it comes to ML automation: it is very easy these days to overpromise and end up delivering a solution that overfits or introduces unacceptable tradeoffs between bias and variance.
BigML already offers a variety of highly effective supervised learning algorithms, including Deepnets, logistic regressions, models (decision trees), and ensembles. Thanks to our one-click modeling capability, these can be executed with intelligent defaults to quickly form baseline models before you iterate on your project with different configuration options that better fit your ML problem. Over time, based on popular demand, we have also made available a number of complementary WhizzML scripts that you can easily clone and execute to perform automated hyperparameter tuning or feature selection for specific algorithms such as ensembles.
We have been seeing clear interest from our users in further automating model selection for the classification and regression problems they tackle via BigML’s built-in automation options. The drive for more productivity is hardly surprising. The question boils down to this: is it possible to create a generalized automation approach whereby all applicable algorithms offered on the platform can be compared and contrasted with as little as a few clicks? The obvious benefit is time savings: you avoid exhaustive trial-and-error experimentation with different algorithms and their parameter configurations when deciding which direction of the hypothesis space to explore further in search of an optimal model.
Well, we have some good news to share on this very front! BigML’s OptiML capability is taking the automation of model selection to the next level.
In essence, OptiML is an automatic optimization option that will allow you to find the best supervised learning model for your data.
- It can be used for both classification and regression problems.
- It works by automatically creating and evaluating multiple models with multiple configurations (decision trees, ensembles, logistic regressions, and Deepnets) by using Bayesian parameter optimization.
- When the process finishes, you get a list of the best models so that you can compare them and select the one that best suits your use case.
The OptiML menu option on the BigML Dashboard attempts to find the best model for a given dataset by sequentially trying groups of parameters, training models with them, evaluating those models, and proposing a new group of parameters based on the results of the previous tries. In many cases, this process converges to a good solution faster than exhaustive search because it can reason about the expected outcome of a new set of parameters before they are tried. Furthermore, the search can be guided by a user-specified performance metric, e.g., ROC AUC, F-measure, etc.
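The "try, evaluate, and propose based on previous tries" loop can be sketched in plain Python. This is a deliberately simplified toy (a random-probe phase followed by perturbing the best candidate), not BigML's actual Bayesian optimization code; the `rate` and `depth_frac` hyperparameters and the `evaluate` objective are made up for illustration.

```python
import random

def evaluate(params):
    """Toy stand-in for training and evaluating one model; in OptiML this
    would be a real evaluation metric such as ROC AUC."""
    return 1.0 - abs(params["rate"] - 0.5) - abs(params["depth_frac"] - 0.5)

def sequential_search(iterations=30, seed=0):
    """Minimal sketch of sequential search: after a few random probes,
    later candidates are proposed near the best parameters seen so far,
    so the results of previous tries guide the next group of parameters."""
    rng = random.Random(seed)
    history = []  # (score, params) pairs from previous tries
    for i in range(iterations):
        if i < 5:
            # Explore: random probes to seed the search.
            params = {"rate": rng.random(), "depth_frac": rng.random()}
        else:
            # Exploit: perturb the best parameters found so far.
            best = max(history, key=lambda t: t[0])[1]
            params = {k: min(1.0, max(0.0, v + rng.uniform(-0.1, 0.1)))
                      for k, v in best.items()}
        history.append((evaluate(params), params))
    return max(history, key=lambda t: t[0])

best_score, best_params = sequential_search()
```

A real sequential model-based optimizer replaces the naive "perturb the best" step with a surrogate model that predicts which untried parameters are most promising, which is what makes reasoning about expected outcomes possible.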
OptiML can be configured to let the search try all applicable model types (Deepnets, logistic regressions, models, and ensembles) or a subset of them. However, if Deepnets are selected, OptiML won’t iterate over them because they already come with two automatic optimization options: automatic structure suggestion and automatic network search. In those instances, two Deepnets, one from a network search and one from a structure suggestion, will automatically be executed as part of the model optimization.
On a related note, even though we consider them part of the supervised learning toolbox, time series are not included in the scope of OptiML, as time series datasets tend to have a different data structure that is best treated differently from the other supervised tasks mentioned.
Finally, for completeness’ sake, in addition to finding the best supervised model among several algorithms with OptiML, we have also enabled the Automatic Optimization option for models, ensembles, logistic regressions, and Deepnets separately. This means that you no longer need to manually tune any of your supervised models to achieve the best results. Instead, you can simply select the Automatic Optimization option and BigML will execute this task for your chosen algorithm only. Once complete, it will similarly return the top-performing model along with its related parameter values.
The OptiML algorithm is split into two phases. The first, the “parameter search” phase, uses a single holdout set to iteratively find promising sets of parameters. The second, the “validation” phase, iteratively performs Monte Carlo cross-validation on the parameter sets whose performance comes close to the best.
For this second phase, the algorithm iteratively creates new train/test splits for the top half of the remaining candidates. Thus, the best models will typically have more than one evaluation associated with them.
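The validation phase's "re-evaluate on fresh splits, keep the top half" behavior can be sketched as follows. This is an illustrative toy, not BigML's implementation: `mc_cv_score` stands in for one Monte Carlo train/test split plus evaluation, and each candidate's hidden `quality` plus Gaussian noise simulates split-to-split variance.

```python
import random

def mc_cv_score(candidate, rng):
    """Stand-in for one Monte Carlo train/test split and evaluation:
    the candidate's underlying quality plus split-to-split noise."""
    return candidate["quality"] + rng.gauss(0, 0.05)

def validation_phase(candidates, rounds=3, seed=0):
    """Sketch of the validation phase: each round re-evaluates the
    surviving candidates on a fresh random split, ranks them by their
    mean score so far, and keeps the top half."""
    rng = random.Random(seed)
    scores = {i: [] for i in range(len(candidates))}
    survivors = list(range(len(candidates)))
    for _ in range(rounds):
        for i in survivors:
            scores[i].append(mc_cv_score(candidates[i], rng))
        survivors.sort(key=lambda i: sum(scores[i]) / len(scores[i]),
                       reverse=True)
        survivors = survivors[:max(1, len(survivors) // 2)]
    return survivors, scores

cands = [{"quality": q} for q in (0.60, 0.72, 0.81, 0.78)]
winners, history = validation_phase(cands)
```

Note how the eventual winner accumulates one evaluation per round while early losers keep only their first, which is why the best models end up with multiple evaluations attached.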
Both phases are governed by an argument specifying the maximum training time allowed: BigML halts a given phase of the algorithm when it goes over that phase’s time budget. It does, however, guarantee that at least one iteration of each phase completes before returning. Thus, in extreme cases, such as massive datasets combined with very low maximum training times, the process may significantly overrun the specified maximum training time.
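The budget-with-a-guarantee behavior amounts to a loop of the following shape (an illustrative sketch, not BigML's code; `run_phase` and `step` are hypothetical names):

```python
import time

def run_phase(step, max_seconds):
    """Budgeted loop: run iterations until the time budget is spent,
    but always complete at least one iteration, even on a zero budget.
    This is why a phase can overrun its maximum training time."""
    deadline = time.monotonic() + max_seconds
    iterations = 0
    while iterations == 0 or time.monotonic() < deadline:
        step()  # one train-and-evaluate iteration of the phase
        iterations += 1
    return iterations

# Even with a zero budget, exactly one iteration runs before halting.
count = run_phase(lambda: time.sleep(0.01), max_seconds=0.0)
```

If that single mandatory iteration involves training on a massive dataset, its duration alone can dwarf the budget, which is the extreme case described above.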
Published at DZone with permission of Atakan Cetinsoy, DZone MVB. See the original article here.