
Deploying Machine Learning to the Cloud

Learn about the team at Alpine Data and how they are using Chorus to deploy machine learning to the cloud.

By Lawrence Spracklen · Nov. 21, 2016 · Cloud Zone · Tutorial

While enterprises have traditionally deployed Hadoop clusters in their own data centers, a growing number are creating clusters in the cloud. Cloud providers such as AWS and GCP make it almost effortless to spin up and tear down Hadoop clusters on demand, providing a cost-effective approach to big data infrastructure. However, the analytics solutions currently offered by many vendors are extremely limited and may not even extend to cloud-hosted Hadoop clusters.

Chorus can be readily deployed to cloud environments and supports not only the typical Hadoop distributions but also AWS Elastic MapReduce (EMR). Chorus can also leverage data residing in Redshift or MySQL instances, sourcing and syncing data to and from S3.

Deploying, configuring, and maintaining a bare-metal Hadoop cluster can be a time-consuming effort. In contrast, with cloud Hadoop deployments, a multi-node cluster can be created at the click of a button.
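Creating such a cluster programmatically follows the same shape as the console's "click of a button." As a rough sketch, the request below builds the configuration an AWS EMR `run_job_flow` call expects; the cluster name, release label, instance types, and node count are illustrative choices, not values from the article.

```python
def emr_cluster_request(name="hadoop-demo", nodes=4):
    """Build a request dict for EMR's run_job_flow API.

    All concrete values (name, release, instance types) are placeholders;
    substitute whatever your deployment requires.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.0.0",  # an EMR release bundling Hadoop + Spark
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m4.large",
            "SlaveInstanceType": "m4.large",
            "InstanceCount": nodes,  # 1 master + (nodes - 1) workers
            "KeepJobFlowAliveWhenNoSteps": True,  # keep cluster up after steps finish
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

# To actually launch (requires AWS credentials configured):
#   import boto3
#   boto3.client("emr").run_job_flow(**emr_cluster_request())
```

Tearing the cluster down again is a single `terminate_job_flows` call, which is what makes the on-demand model cost-effective.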

Once such an instance has been created, deploying Chorus is similarly efficient:

  • Create a small container.
  • Download Chorus from S3 to the container.
  • Launch the installer (and hit yes a couple of times).
  • Log into Chorus via your web browser.
  • Point Chorus at the resource manager of your Hadoop cluster, and instruct Chorus to autoconfigure itself for that cluster.
  • Start building high-performance analytical workflows using the Chorus visual workflow editor, and running them on your Hadoop cluster using Spark and/or MapReduce.
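The download-and-install portion of the steps above (fetch Chorus from S3, launch the installer) can be sketched as follows. The S3 URI and installer filename are hypothetical placeholders; use the location provided with your Chorus distribution.

```python
import subprocess

def install_chorus(s3_uri, dest="/tmp/chorus-installer.sh", dry_run=False):
    """Fetch the Chorus installer from S3 and run it.

    The URI and destination path are illustrative. With dry_run=True the
    function only returns the commands it would execute.
    """
    steps = [
        ["aws", "s3", "cp", s3_uri, dest],  # step 2: download from S3 (AWS CLI on PATH)
        ["sh", dest],                        # step 3: launch installer, answer its prompts
    ]
    if dry_run:
        return steps
    for cmd in steps:
        subprocess.run(cmd, check=True)
    return steps
```

After the installer completes, the remaining steps (logging in, pointing Chorus at the YARN resource manager) happen in the browser.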

Start to finish, this process takes as little as 10 minutes. Adding a Redshift data source takes just a few more clicks: data can be moved back and forth between Redshift and EMR, ETL can be performed on the Redshift data in situ, and models trained on Hadoop can be used to score data residing in Redshift (or deployed seamlessly to a customer's cloud scoring engines using either PMML or PFA). All within minutes of creating the instance, and without having to write a single line of code!
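Under the hood, moving data between Redshift and EMR goes through S3 using Redshift's `UNLOAD` and `COPY` commands. A minimal sketch of the SQL involved, with a hypothetical table, bucket, and IAM role:

```python
def unload_sql(table, s3_prefix, iam_role):
    """SQL to export a Redshift table to S3 (e.g. for training on EMR)."""
    return (f"UNLOAD ('SELECT * FROM {table}') "
            f"TO '{s3_prefix}' IAM_ROLE '{iam_role}' DELIMITER ',';")

def copy_sql(table, s3_prefix, iam_role):
    """SQL to load results (e.g. scored data) from S3 back into Redshift."""
    return f"COPY {table} FROM '{s3_prefix}' IAM_ROLE '{iam_role}' DELIMITER ',';"

# Example (placeholder identifiers):
#   unload_sql("transactions", "s3://my-bucket/txn_", "arn:aws:iam::123456789012:role/redshift")
```

A tool like Chorus issues these statements on the user's behalf, which is what makes the round trip codeless from the analyst's point of view.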

With the recent introduction of Spark autotuning in Chorus 6.1, Chorus has a detailed understanding of the resource requirements of every analysis its data scientists run. As a result, it can determine the optimal cluster size needed to support the aggregate load. Future releases will add the ability to scale the cluster up and down, dynamically adding nodes when required and pausing idle nodes when not, allowing Chorus to minimize the cost of running the cluster. (Indeed, customers can already integrate cluster control into their individual flows using the Chorus extensibility SDK.)
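To make the sizing idea concrete, here is a deliberately simplified illustration of deriving a node count from per-analysis resource requirements. This is not Chorus's actual algorithm, just a sketch of the kind of calculation aggregate-load sizing implies; node capacities are assumed values.

```python
import math

def nodes_needed(jobs, node_cores=8, node_mem_gb=32):
    """Estimate cluster size from aggregate job requirements.

    jobs: list of (cores, mem_gb) tuples, one per concurrent analysis,
    the kind of per-job figures Spark autotuning could supply.
    Returns the node count needed to cover both CPU and memory demand.
    """
    total_cores = sum(c for c, _ in jobs)
    total_mem = sum(m for _, m in jobs)
    # Size for whichever resource is the bottleneck; always keep >= 1 node.
    return max(1, math.ceil(max(total_cores / node_cores,
                                total_mem / node_mem_gb)))
```

With the load known in advance, idle capacity beyond this estimate is what a future release could pause to cut cost.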

At Alpine, we’re excited to see more of our customers use Chorus to deploy machine learning in the cloud. Stay tuned for future posts detailing other ways we’re transforming the traditional enterprise data science workflow.

Tags: machine learning, cloud, data science, big data, Hadoop

Published at DZone with permission of Lawrence Spracklen, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
