Automatic Machine Learning (AutoML) Infrastructure — Oracle Data Science Cloud Service

In this article, a DZone Core member talks about AutoML, one of the features that come with the Oracle Cloud Data Science Service.

In this article, I will talk about AutoML, one of the features that come with the Oracle Cloud Data Science Service. I hope it helps raise awareness of this feature.

As mentioned in my previous articles, Oracle recently added a new service called Data Science to its cloud services. It is offered as a platform on which many libraries come pre-installed. The platform covers the whole workflow, from prototyping and project development through model management to putting the resulting models into production. Undoubtedly, one of its most interesting and useful features is AutoML.

AutoML aims to automate the important steps we take when developing Machine Learning/Artificial Intelligence/Data Science projects. The image below visualizes the development steps of such projects.

With AutoML, we can automate algorithm selection, feature selection, and hyperparameter tuning, which reduces the time developers spend on these steps. The AutoML infrastructure also helps developers who are not experts in these steps.

[Image: development steps of a Machine Learning/Artificial Intelligence/Data Science project]

AutoML consists of three different modules:

  • Automated Feature Selection
  • Model (Algorithm) Selection
  • Hyperparameter Optimization
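
To make the idea of model selection and hyperparameter optimization concrete, here is a minimal, library-independent sketch. It is not how Oracle AutoML works internally. The toy one-feature dataset, the helper names (`knn_predict`, `centroid_predict`, `cv_accuracy`), and the candidate list are all made up for illustration, and feature selection is left out. Each candidate is scored with cross-validation, and the candidates are then ranked.

```python
from collections import Counter
from statistics import mean


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def centroid_predict(train, x):
    """Predict the label whose class mean is closest to x."""
    labels = sorted({label for _, label in train})
    centroids = {c: mean(v for v, label in train if label == c) for c in labels}
    return min(labels, key=lambda c: abs(centroids[c] - x))


def cv_accuracy(predict, data, folds):
    """Mean accuracy over `folds` folds; fold i holds out rows i, i+folds, ..."""
    scores = []
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        correct = sum(predict(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return mean(scores)


# Made-up data: (feature value, class label), with some overlap between classes.
data = [
    (1.0, "a"), (3.1, "b"), (1.4, "a"), (2.8, "b"), (0.7, "a"), (3.5, "b"),
    (2.1, "a"), (2.0, "b"), (1.1, "a"), (3.0, "b"), (1.6, "a"), (2.6, "b"),
]

# Two algorithms; k plays the role of a hyperparameter for the first one.
candidates = {
    "1-NN": lambda train, x: knn_predict(train, x, k=1),
    "3-NN": lambda train, x: knn_predict(train, x, k=3),
    "nearest-centroid": centroid_predict,
}

results = {name: cv_accuracy(fn, data, folds=3) for name, fn in candidates.items()}
ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
best_name, best_score = ranked[0]
print(ranked)
```

A real AutoML system searches a far larger space and uses smarter strategies than exhaustive evaluation, but the loop of "score each candidate on held-out data, then keep the best" is the same basic idea.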

In addition to automating these workflows, the AutoML infrastructure aims to improve the quality and performance of the resulting models, and it allows all steps of the workflow to run and scale in parallel.
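
As a rough illustration of what evaluating candidates in parallel means, the sketch below scores several hyperparameter settings concurrently using only the Python standard library. The `score` function and the parameter grid are hypothetical stand-ins for "train and validate one model", and this says nothing about how Oracle AutoML schedules its own work.

```python
from concurrent.futures import ThreadPoolExecutor


def score(params):
    """Made-up scoring function standing in for 'train and validate one model'."""
    return 1.0 - abs(params["C"] - 1.0) / 100.0


param_grid = [{"C": c} for c in (0.01, 0.1, 1.0, 10.0, 100.0)]

# Evaluate all candidates concurrently; map() returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(score, param_grid))

best_params = param_grid[scores.index(max(scores))]
print(best_params)
```

Threads keep the example short. For CPU-bound model training, a process pool or a distributed scheduler would usually be the better fit.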

We can successfully use the AutoML infrastructure in classification and regression problems.

Let's look at how the infrastructure works and how it produces results with an example application.

First of all, for now we can only use this infrastructure on the Oracle Cloud Infrastructure (OCI) Data Science Cloud Service, because it requires importing the ADS (Accelerated Data Science) package. This package comes pre-installed in the OCI Data Science service and cannot be installed externally. ADS is a package that Oracle offers in the cloud service; it includes methods that let us implement all the requirements of AI/ML/DS workloads.

The dataset I will use as an example is the iris dataset that ships with Sklearn. First, we import the necessary libraries and load the dataset.

Python

from sklearn import datasets
import numpy as np
import pandas as pd  # needed for pd.DataFrame below
from ads.automl.driver import AutoML
from ads.automl.provider import OracleAutoMLProvider
from ads.dataset.factory import DatasetFactory

# Load iris and build a DataFrame with the four features plus a 'target' column
iris = datasets.load_iris()
df = pd.DataFrame(
    data=np.c_[iris['data'], iris['target']],
    columns=iris['feature_names'] + ['target'],
)

To use AutoML, we convert the pandas dataframe to the ADS dataframe. This process is very easy and fast.

Python

ml_engine = OracleAutoMLProvider()
# Open the pandas DataFrame with ADS and mark 'target' as the label column
train = DatasetFactory.open(df).set_target('target')

We have now converted the dataframe to the ADS dataframe. Next, we pass it to the AutoML method. AutoML essentially expects a list of algorithms from us, which can be any combination of the algorithms listed below.

  • AdaBoostClassifier
  • DecisionTreeClassifier
  • ExtraTreesClassifier
  • KNeighborsClassifier
  • LGBMClassifier
  • LinearSVC
  • LogisticRegression
  • RandomForestClassifier
  • SVC
  • XGBClassifier

Python

automl = AutoML(training_data=train, provider=ml_engine)

# Restrict the search to three algorithms and cap the run at 600 seconds
model, baseline = automl.train(
    model_list=['LogisticRegression', 'XGBClassifier', 'SVC'],
    time_budget=600,
)

Another parameter that we can set in AutoML is time_budget. It sets an upper limit on the running time of the AutoML method. In the example above, we gave 600 seconds (10 minutes) as the time limit, so the method will try to return a result within that time.
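
The effect of a time budget can be illustrated with a small, library-independent sketch. It is not how Oracle AutoML is implemented. The function `search_with_budget`, the candidate list, and the scoring function are hypothetical, and the code only shows a search loop that stops starting new trials once the deadline has passed.

```python
import time


def search_with_budget(candidates, score_fn, time_budget, clock=time.monotonic):
    """Evaluate candidates in order until the time budget (seconds) is used up.

    Returns (best_candidate, best_score, number_of_trials_run).
    Hypothetical helper for illustration; not part of ADS.
    """
    deadline = clock() + time_budget
    best, best_score, trials = None, float("-inf"), 0
    for candidate in candidates:
        # Stop before starting a new trial once the budget is exhausted.
        if clock() >= deadline:
            break
        score = score_fn(candidate)
        trials += 1
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score, trials


# Toy usage: the candidates are values of a single hyperparameter C, and the
# made-up score peaks at C = 1.0.
c_values = [0.01, 0.1, 1.0, 10.0, 100.0]
best_c, best_score, trials = search_with_budget(
    c_values, score_fn=lambda c: -abs(c - 1.0), time_budget=600
)
print(best_c, trials)
```

With a generous budget every candidate is tried. With a tight one the loop returns the best candidate seen so far, which is why a longer budget can lead to a different result.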

Keep in mind that a longer time_budget allows more combinations to be tried. Besides time_budget, the AutoML method also accepts the min_features parameter, which sets a minimum for the number of features used in the resulting models. It can be an int, a float, or a list of feature names.

Let's look at the results returned from AutoML.

Training complete (4.11 seconds)

Training Dataset size: (150, 4)
Validation Dataset size: None
CV: 5
Target variable: target
Optimization Metric: recall_macro
Initial number of Features: 4
Selected number of Features: 4
Selected Features: [sepal_length_(cm), sepal_width_(cm), petal_length_(cm), petal_width_(cm)]
Selected Algorithm: SVC
End-to-end Elapsed Time (seconds): 4.11
Selected Hyperparameters: {'C': 1.0, 'class_weight': None, 'gamma': 0.25}
Mean Validation Score: 0.9667
AutoML n_jobs: 32
AutoML version: 0.3.1
Python version: 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) \n[GCC 7.3.0]


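As seen, the selected algorithm is SVC (Support Vector Classifier), the selected feature set is [sepal_length_(cm), sepal_width_(cm), petal_length_(cm), petal_width_(cm)], and the selected hyperparameters are {'C': 1.0, 'class_weight': None, 'gamma': 0.25}. The best model has a mean validation score of 0.9667. Since the optimization metric is recall_macro and CV is 5, this score is the macro-averaged recall over 5-fold cross-validation rather than plain accuracy.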
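
The output above lists recall_macro as the optimization metric. As a small pure-Python sketch of what that metric measures, the code below computes macro-averaged recall next to plain accuracy. The labels are made up and deliberately imbalanced, and the helper functions are written here for illustration; they are not taken from ADS.

```python
def recall_macro(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for cls in classes:
        # Recall for one class: correctly predicted members / all true members.
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Made-up, imbalanced labels: class 0 has 8 samples, class 1 has 2.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

print(accuracy(y_true, y_pred))      # 0.9
print(recall_macro(y_true, y_pred))  # 0.75
```

When every class has the same number of samples, as in the full iris dataset with 50 samples per class, the two values coincide. On imbalanced data they can differ noticeably, as the toy labels show.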

Within the time limit we gave, AutoML may try further combinations, and it shows the details of these trials in the same output.


The results obtained through the model output of AutoML can also be accessed separately.

Python

# Hyperparameters of the selected model
model.selected_model_params_
# Ranking of the candidate models
model.ranked_models_
# Visualize the algorithm selection trials
automl.visualize_algorithm_selection_trials()

This infrastructure can be a very useful benchmark mechanism not only for non-expert users but also for expert users.

