H2O AutoML Examples in Python and Scala [Code Snippets]

If you want to automate your machine learning workflow, look no further than H2O AutoML. It trains and tunes models, uses performance-based stopping criteria, and more.

AutoML is included in H2O versions 3.14.0.1 and above. You can learn more about it in the H2O AutoML documentation.

H2O AutoML automates a large portion of the machine learning workflow, including the training and tuning of many models within a user-specified time limit. Instead of a fixed time constraint, the user can also stop the AutoML process with performance metric-based stopping criteria. Stacked ensembles are automatically trained on the collection of individual models to produce a highly predictive ensemble model which, in most cases, will be the top-performing model on the AutoML Leaderboard.
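To make the metric-based stopping option concrete, here is a minimal sketch of how such a run might be configured in Python. The helper names `automl_stopping_args` and `run_automl` are hypothetical and exist only for this illustration, and the default values are arbitrary. The keyword arguments themselves (`max_models`, `stopping_metric`, `stopping_rounds`, `stopping_tolerance`, `seed`) are `H2OAutoML` parameters, but their defaults and interaction with the time limit can vary between H2O versions, so check the documentation for the release you use.

```python
def automl_stopping_args(max_models=20, stopping_metric="AUC",
                         stopping_rounds=3, stopping_tolerance=0.001, seed=1):
    """Build keyword arguments for H2OAutoML that rely on a model cap and a
    metric plateau instead of an explicit max_runtime_secs.

    Hypothetical helper for illustration; the values are assumptions, not
    recommendations.
    """
    return {
        "max_models": max_models,                  # cap on the number of base models
        "stopping_metric": stopping_metric,        # metric watched for early stopping
        "stopping_rounds": stopping_rounds,        # rounds without improvement before stopping
        "stopping_tolerance": stopping_tolerance,  # minimum relative improvement to continue
        "seed": seed,                              # for more reproducible runs
    }


def run_automl(train, x, y, **overrides):
    """Train AutoML on an H2OFrame using the stopping arguments above."""
    # Imported lazily so the helper above can be used without a running H2O cluster.
    from h2o.automl import H2OAutoML

    aml = H2OAutoML(**automl_stopping_args(**overrides))
    aml.train(x=x, y=y, training_frame=train)
    return aml
```

With `train`, `x`, and `y` prepared as in the Python example that follows, `run_automl(train, x, y, stopping_metric="logloss")` would start a run governed by these criteria. Depending on the H2O version, a default time limit may still apply when `max_runtime_secs` is not set.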

Here is the full working Python code, taken from here:

import h2o
from h2o.automl import H2OAutoML

h2o.init()
df = h2o.import_file("https://raw.githubusercontent.com/h2oai/sparkling-water/master/examples/smalldata/prostate.csv")
train, test = df.split_frame(ratios=[.9])
# Identify predictors and response
x = train.columns
y = "CAPSULE"
x.remove(y)

# For binary classification, response should be a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor()

# Run AutoML for 60 seconds
aml = H2OAutoML(max_runtime_secs = 60)
aml.train(x = x, y = y, training_frame = train, leaderboard_frame = test)

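# View the AutoML Leaderboard and the top-ranked (leader) model
# (in a script, wrap these in print(); a notebook or REPL displays them directly)
aml.leaderboard
aml.leader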

# To generate predictions on a test set, call predict() on the H2OAutoML object
# or directly on the leader model:
preds = aml.predict(test)
# or
preds = aml.leader.predict(test)

Here is the full working Scala code:

import ai.h2o.automl.AutoML
import ai.h2o.automl.AutoMLBuildSpec
import org.apache.spark.h2o._
val h2oContext = H2OContext.getOrCreate(sc)
import h2oContext._
import java.io.File
import h2oContext.implicits._
import water.Key
val prostateData = new H2OFrame(new File("/Users/avkashchauhan/src/github.com/h2oai/sparkling-water/examples/smalldata/prostate.csv"))
val autoMLBuildSpec = new AutoMLBuildSpec()
autoMLBuildSpec.input_spec.training_frame = prostateData
autoMLBuildSpec.input_spec.response_column = "CAPSULE"
autoMLBuildSpec.build_control.loss = "AUTO"
autoMLBuildSpec.build_control.stopping_criteria.set_max_runtime_secs(5)
import java.util.Date
val aml = AutoML.makeAutoML(Key.make(), new Date(), autoMLBuildSpec)
AutoML.startAutoML(aml)
// Note: in some cases the call above is non-blocking.
// The following alternative blocks the next command until the AutoML run has finished executing:
AutoML.startAutoML(autoMLBuildSpec).get()  // forced blocking call
aml.leader
aml.leaderboard

If you want to see the full code execution, see here.

That's it. Enjoy!

Topics:
h2o, machine learning, scala, python, ai, automation

Published at DZone with permission of
