
Stacked Ensemble Model in Scala Using H2O GBM and Deep Learning Models

This code-heavy tutorial will teach you how to use the H2O Stacked Ensembles algorithm, GBM, and deep learning models to build a machine learning model.

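In this full Scala sample, we will use the H2O Stacked Ensembles algorithm. A Stacked Ensemble combines several base models of different types through a second-level model, the metalearner. For this to work, every base model must be trained with cross-validation on the same folds and must keep its cross-validation predictions, because the metalearner is trained on those predictions. You can learn more about Stacked Ensembles in the H2O documentation.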
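To make the cross-validation requirement concrete, here is a minimal plain-Scala sketch of how the metalearner's training data can be assembled from held-out predictions. It does not use H2O at all: the helpers `assignFolds` and `outOfFold`, the `Learner` type, and the two toy constant predictors are hypothetical names invented for this illustration, and the response values are made up.

```scala
// Toy illustration only: none of these names are H2O API.
import scala.util.Random

// A "learner" sees the training responses and returns one constant prediction.
type Learner = Vector[Double] => Double

// Assign each of n rows to one of k folds, reproducibly for a given seed.
def assignFolds(n: Int, k: Int, seed: Long): Vector[Int] = {
  val rng = new Random(seed)
  Vector.fill(n)(rng.nextInt(k))
}

val meanLearner: Learner = ys => if (ys.isEmpty) 0.0 else ys.sum / ys.size
val medianLearner: Learner = ys =>
  if (ys.isEmpty) 0.0 else { val sorted = ys.sorted; sorted(sorted.size / 2) }

// For each fold, train on the rows outside it, then predict the rows inside it.
def outOfFold(y: Vector[Double], folds: Vector[Int], learner: Learner): Vector[Double] = {
  val predictionForFold: Map[Int, Double] = folds.distinct.map { fold =>
    val trainingRows = y.indices.filter(i => folds(i) != fold).map(i => y(i)).toVector
    fold -> learner(trainingRows)
  }.toMap
  folds.map(predictionForFold)
}

// Made-up 0/1 responses, standing in for a column such as CAPSULE.
val y = Vector(0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0)

// Both base learners share the same fold assignment (same k, same seed).
val folds = assignFolds(y.size, k = 5, seed = 1111L)

// One column of held-out predictions per base learner: the metalearner's inputs.
val levelOne: Vector[Vector[Double]] =
  Vector(meanLearner, medianLearner).map(learner => outOfFold(y, folds, learner))
```

Because every prediction in `levelOne` comes from a model that never saw that row, the second-level model can be fit on it without leaking the training labels. This is why the base models need identical folds and kept cross-validation predictions.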

The base models here are a GBM and a deep learning model; once both are trained, we build the Stacked Ensemble model on top of them.

First, let's import key classes specific to H2O:

import org.apache.spark.h2o._
import water.Key
import java.io.File

Now, we will create the H2O context so that we can call the H2O functions for data ingestion and model training (here sc is the existing SparkContext, as provided for example by the Spark shell):

val h2oContext = H2OContext.getOrCreate(sc)
import h2oContext._
import h2oContext.implicits._

Let's import data from the local file system as an H2O DataFrame:

val prostateData = new H2OFrame(new File("/Users/avkashchauhan/src/github.com/h2oai/sparkling-water/examples/smalldata/prostate.csv"))

Let's first build the deep learning model, with 5-fold cross-validation and the cross-validation predictions kept for the ensemble:

import _root_.hex.deeplearning.DeepLearning
import _root_.hex.deeplearning.DeepLearningModel.DeepLearningParameters

val dlParams = new DeepLearningParameters()
dlParams._epochs = 100
dlParams._train = prostateData
dlParams._response_column = 'CAPSULE
dlParams._variable_importances = true
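// Cross-validation settings; the GBM below uses the same nfolds and seed
dlParams._nfolds = 5
dlParams._seed = 1111
// Keep the held-out predictions: the Stacked Ensemble needs them
dlParams._keep_cross_validation_predictions = true
val dl = new DeepLearning(dlParams, Key.make("dlProstateModel.hex"))
val dlModel = dl.trainModel().get()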

Now, let's build the GBM model:

import _root_.hex.tree.gbm.GBM
import _root_.hex.tree.gbm.GBMModel.GBMParameters

val gbmParams = new GBMParameters()
gbmParams._train = prostateData
gbmParams._response_column = 'CAPSULE
// Same cross-validation settings as the deep learning model
gbmParams._nfolds = 5
gbmParams._seed = 1111
gbmParams._keep_cross_validation_predictions = true
val gbm = new GBM(gbmParams, Key.make("gbmRegModel.hex"))
val gbmModel = gbm.trainModel().get()

Now, let's build the Stacked Ensemble model. First, we import the classes required for Stacked Ensembles:

import _root_.hex.Model
import _root_.hex.StackedEnsembleModel
import _root_.hex.ensemble.StackedEnsemble

Next, we will define Stacked Ensembles parameters:

val stackedEnsembleParameters = new StackedEnsembleModel.StackedEnsembleParameters()
stackedEnsembleParameters._train = prostateData._key
stackedEnsembleParameters._response_column = 'CAPSULE

Now, we need to tell the Stacked Ensemble which base models to use by passing their keys:

type T_MODEL_KEY = Key[Model[_, _ <: Model.Parameters, _ <: Model.Output]]

// Option 1: cast each model key individually
stackedEnsembleParameters._base_models = Array(gbmModel._key.asInstanceOf[T_MODEL_KEY], dlModel._key.asInstanceOf[T_MODEL_KEY])
// Option 2: collect the models and cast their keys in one pass
stackedEnsembleParameters._base_models = Array(gbmModel, dlModel).map(model => model._key.asInstanceOf[T_MODEL_KEY])

// Note: the two options are equivalent; use either one to pass the model keys
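The casts above are presumably needed because each model's key is typed by its own model class, while the base-model array expects one common key type. The following plain-Scala sketch shows that pattern in isolation. The classes `Key`, `Model`, `GbmModel`, and `DlModel` below are toy stand-ins defined only for this example, not the H2O classes, and `ModelKey` is a hypothetical alias playing the role of `T_MODEL_KEY`.

```scala
// Toy stand-ins, NOT the H2O classes: an invariant key type and two model types.
final class Key[T](val name: String)
class Model
final class GbmModel extends Model
final class DlModel extends Model

// Common element type for the array of keys (analogous to T_MODEL_KEY).
type ModelKey = Key[Model]

val gbmKey = new Key[GbmModel]("gbm")
val dlKey  = new Key[DlModel]("dl")

// Key is invariant in T, so a Key[GbmModel] is not a Key[Model].
// The following line would therefore be rejected by the compiler:
//   val keys: Array[ModelKey] = Array(gbmKey, dlKey)

// Casting each key to the shared alias gives the array a single element type.
// The cast is unchecked: it relies on every key really pointing at a model.
val keys: Array[ModelKey] =
  Seq[Key[_]](gbmKey, dlKey).map(_.asInstanceOf[ModelKey]).toArray
```

The cast trades compile-time checking for convenience, so it is only safe as long as every key in the array really refers to a model.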

Finally, let's define the Stacked Ensemble job:

val stackedEnsembleJob = new StackedEnsemble(stackedEnsembleParameters)

As the last step, let's train the Stacked Ensemble model:

val stackedEnsembleModel = stackedEnsembleJob.trainModel().get()

Now, we can take a look at our Stacked Ensemble model:

stackedEnsembleModel

If you would like to inspect the metalearner, i.e. the second-level model that the Stacked Ensemble trained on the base models' predictions, try the following:

stackedEnsembleModel._output._metalearner
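By default, H2O's metalearner is a GLM, so for a numeric response it can be read as a weighted sum of the base models' predictions. A minimal plain-Scala sketch of that idea follows. The `LinearMetalearner` class and all coefficient and prediction values are hypothetical and chosen for illustration; they are not read from a trained H2O model.

```scala
// Toy illustration only: not H2O API, and the numbers below are made up.
final case class LinearMetalearner(intercept: Double, weights: Map[String, Double]) {
  // Combine one prediction per base model into a single ensemble prediction.
  def predict(basePredictions: Map[String, Double]): Double =
    intercept + weights.toSeq.map { case (name, w) => w * basePredictions(name) }.sum
}

// Hypothetical coefficients for the two base models.
val meta = LinearMetalearner(intercept = 0.05, weights = Map("gbm" -> 0.6, "dl" -> 0.3))

// Hypothetical base-model predictions for a single row.
val ensemblePrediction = meta.predict(Map("gbm" -> 0.8, "dl" -> 0.4))
```

A larger weight means the ensemble leans more on that base model. For a categorical response, a GLM would additionally pass the weighted sum through a link function.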

That's it. Enjoy!

Topics:
h2o, scala, machine learning, deep learning, gbm, random forest, ai, tutorial, stacked ensemble

Published at DZone with permission of
