Saving H2O Models From R/Python API in Hadoop Environment

Check out an example of why you might get a No Such File Or Directory error and learn how you can get around this issue.

When you use H2O in a clustered environment such as Hadoop, h2o.saveModel() writes the model on a machine in the H2O cluster, which can be different from the machine where your R or Python client is running. That's why you see the error “No such file or directory” when the path you pass exists only on the client machine. If you pass a path that exists on the cluster side, such as /tmp, and then log in to the machine whose address you used to connect to H2O from R, you will find the model stored there.

Here is a good example to understand it better.

1. Start the H2O Hadoop Driver in an EC2 Environment

[ec2-user@ip-10-0-104-179 ~]$ hadoop jar h2o-3.10.4.8-hdp2.6/h2odriver.jar -nodes 2 -mapperXmx 2g -output /usr/ec2-user/005
....
....
....
Open H2O Flow in your web browser: http://10.0.65.248:54323  <=== H2O is started.

2. Connect R Client With H2O

> h2o.init(ip = "10.0.65.248", port = 54323, strict_version_check = FALSE)

Note: I used the IP address shown above to connect to the existing H2O cluster. The R client, however, runs on a different machine, whose IP address is 34.208.200.16.

3. Saving H2O Model

h2o.saveModel(my.glm, path = "/tmp", force = TRUE)

The model is saved on 10.0.65.248, even though the R client is running on 34.208.200.16:

[ec2-user@ip-10-0-65-248 ~]$ ll /tmp/GLM*
-rw-r--r-- 1 yarn hadoop 90391 Jun 2 20:02 /tmp/GLM_model_R_1496447892009_1
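
The same behavior applies when you work from the Python API. Below is a rough Python equivalent of steps 2 and 3. It is only a sketch: it assumes the h2o Python package is installed on the client, the function name save_on_cluster is made up for illustration, and the default IP and port are simply the ones from the example above.

```python
def save_on_cluster(model, path="/tmp", ip="10.0.65.248", port=54323):
    """Connect to a running H2O cluster and save `model` under `path`.

    `path` is resolved on the H2O node, not on the machine that runs
    this script, so it must exist and be writable there.
    """
    # Imported here so this file can be loaded even where h2o is missing.
    import h2o

    # Mirrors the R call: attach to the existing cluster by IP and port.
    h2o.init(ip=ip, port=port, strict_version_check=False)

    # save_model returns the full path of the saved model file.
    return h2o.save_model(model=model, path=path, force=True)


# Hypothetical usage, assuming `my_glm` is an already trained H2O model:
#   saved_path = save_on_cluster(my_glm, path="/tmp")
#   print(saved_path)
```

The returned path refers to the H2O node's filesystem, so look for the file there rather than on the client.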

You need to make sure you have access to a folder on the machine where the H2O service is running, or you can save the model to HDFS, as shown below:

h2o.saveModel(my.glm, path = "hdfs://ip-10-0-104-179.us-west-2.compute.internal/user/achauhan", force = TRUE)
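
From Python, the HDFS variant could look roughly like the sketch below. The helper names hdfs_uri and save_to_hdfs are hypothetical, and the code assumes the client is already connected to the cluster and that the H2O nodes can write to the target HDFS directory.

```python
def hdfs_uri(namenode, directory):
    """Build an hdfs:// URI from a namenode host and a directory path."""
    return "hdfs://{}/{}".format(namenode, directory.strip("/"))


def save_to_hdfs(model, namenode, directory):
    """Save `model` to HDFS instead of the H2O node's local disk."""
    # Imported here so this file can be loaded even where h2o is missing.
    import h2o

    # The write is performed by the H2O cluster, so the cluster (not the
    # client) needs permission on the HDFS directory.
    return h2o.save_model(model=model, path=hdfs_uri(namenode, directory), force=True)


# Hypothetical usage, assuming `my_glm` is an already trained H2O model:
#   save_to_hdfs(my_glm,
#                "ip-10-0-104-179.us-west-2.compute.internal",
#                "/user/achauhan")
```

Saving to HDFS avoids having to know which H2O node ends up holding the file, since every machine with HDFS access can read it.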

Published at DZone with permission of Avkash Chauhan, DZone MVB. See the original article here.
