Starter Script for rsparkling (H2O on Spark With R)

Integrating H2O, R, and Spark gives enterprises a practical stack for AI work. The starter scripts below bring up H2O on Spark from R, first locally and then on YARN.

The rsparkling R package is an extension package for sparklyr that creates an R front-end for the Sparkling Water Spark package from H2O. It provides an interface to H2O’s high-performance, distributed machine learning algorithms on Spark using R. The project is hosted on GitHub.

You must have the following packages installed in your R environment:

  • sparklyr
  • h2o
  • rsparkling

You must also have the latest Sparkling Water package for your Spark version downloaded and unzipped locally.
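
A quick way to confirm the R prerequisites above is to ask R which of the three packages it can find. The following is a minimal sketch in base R; the variable names are illustrative and not part of any package.

```r
# Packages named in the prerequisites list
required <- c("sparklyr", "h2o", "rsparkling")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs to be installed
missing_pkgs <- required[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```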

I am using the following versions in my environment:

  • Spark 2.1
  • Sparkling Water 2.1.8
  • sparklyr 0.4.4
  • rsparkling 0.2.0
Here is the rsparkling script to create the cluster locally:

    # Point rsparkling at the Sparkling Water assembly JAR and set SPARK_HOME
    options(rsparkling.sparklingwater.location = "/tmp/sparkling-water-assembly_2.11-2.1.8-all.jar")
    Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark2-client/")
    library(sparklyr)
    library(rsparkling)

    # Executor resources for the Spark session
    config <- spark_config()
    config$spark.executor.cores <- 4
    config$spark.executor.memory <- "4G"

    # Connect to a local Spark instance
    sc <- spark_connect(master = "local", config = config, version = "2.1.0")
    print(sc)

    # Start the H2O context on Spark and open the H2O Flow UI
    h2o_context(sc, strict_version_check = FALSE)
    h2o_flow(sc, strict_version_check = FALSE)

    # Close the Spark connection when finished
    spark_disconnect(sc)
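
The JAR location and SPARK_HOME in the script are specific to the author's machine, so it can help to verify them before connecting. The sketch below is one way to do that in base R. `check_paths` is a hypothetical helper, and the paths are copied from the script.

```r
# Hypothetical helper: report which of the given paths exist on this machine
check_paths <- function(paths) {
  exists <- file.exists(paths)
  names(exists) <- names(paths)
  exists
}

paths <- c(
  sparkling_water_jar = "/tmp/sparkling-water-assembly_2.11-2.1.8-all.jar",
  spark_home          = "/usr/hdp/current/spark2-client/"
)

# Print a message for every path that is missing
status <- check_paths(paths)
for (name in names(status)[!status]) {
  message("Not found: ", name, " (", paths[[name]], ")")
}
```

If either path is reported as missing, adjust it to match your own installation before running the script.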

Here is the rsparkling script to create the cluster on YARN:

    # Point rsparkling at the Sparkling Water assembly JAR and set SPARK_HOME
    options(rsparkling.sparklingwater.location = "/tmp/sparkling-water-assembly_2.11-2.1.8-all.jar")
    Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark2-client/")
    library(sparklyr)
    library(rsparkling)

    # Executor resources, plus the number of executors to request from YARN
    config <- spark_config()
    config$spark.executor.cores <- 4
    config$spark.executor.memory <- "4G"
    config$spark.executor.instances <- 2

    # Connect to Spark through YARN in client mode
    sc <- spark_connect(master = "yarn-client", config = config, version = "2.1.0")
    print(sc)

    # Start the H2O context on Spark and open the H2O Flow UI
    h2o_context(sc, strict_version_check = FALSE)
    h2o_flow(sc, strict_version_check = FALSE)

    # Close the Spark connection when finished
    spark_disconnect(sc)
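
The local and YARN scripts differ only in the `master` value and the number of executor instances, so they can be folded into one function. This is a sketch rather than part of rsparkling: `start_h2o_on_spark` is a hypothetical name, and it assumes the same JAR location, SPARK_HOME, and versions as the scripts in this article.

```r
# Hypothetical wrapper around the two scripts; not part of sparklyr or rsparkling
start_h2o_on_spark <- function(master = "local", executor_instances = NULL) {
  # Load rsparkling before connecting so the extension is available
  # when the Spark connection is created
  loadNamespace("rsparkling")

  config <- sparklyr::spark_config()
  config$spark.executor.cores <- 4
  config$spark.executor.memory <- "4G"
  if (!is.null(executor_instances)) {
    config$spark.executor.instances <- executor_instances
  }

  sc <- sparklyr::spark_connect(master = master, config = config, version = "2.1.0")
  rsparkling::h2o_context(sc, strict_version_check = FALSE)
  sc
}

# Usage (requires a working Spark and Sparkling Water setup):
# options(rsparkling.sparklingwater.location = "/tmp/sparkling-water-assembly_2.11-2.1.8-all.jar")
# Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark2-client/")
# sc <- start_h2o_on_spark("yarn-client", executor_instances = 2)
# sparklyr::spark_disconnect(sc)
```

Returning the connection object leaves it to the caller to open H2O Flow or disconnect when done.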

That's it. Enjoy!

Topics:
ai, h2o, spark, r, sparklyr, rsparkling

    Published at DZone with permission of
