Deploying Machine Learning to the Cloud

Learn about the team at Alpine Data and how they are using Chorus to deploy machine learning to the cloud.

While enterprises have traditionally deployed Hadoop clusters in their own data centers, a growing number are creating clusters in the cloud. Cloud providers such as AWS and GCP make it almost effortless to spin up and tear down Hadoop clusters on demand, which provides a cost-effective approach to big data systems. However, the analytics solutions currently offered by vendors are extremely limited and may not even extend to the use of Hadoop clusters.

Chorus can be readily deployed to cloud environments and supports not only the typical Hadoop distributions but also AWS Elastic MapReduce (EMR). Chorus can also use data residing in Redshift or MySQL instances, sourcing and syncing data to and from S3.

Deploying, configuring, and maintaining a bare-metal Hadoop cluster can be time-consuming. In contrast, cloud Hadoop deployments let you create a multi-node Hadoop cluster at the click of a button.
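
For readers who prefer scripting to clicking, the following is a minimal sketch of how such a cluster request might look for EMR using boto3's `run_job_flow` call. The helper names (`build_emr_request`, `launch_cluster`), the instance type, the release label, and the region are assumptions chosen for illustration, and the default EMR role names are assumed to already exist in the account. None of this is part of Chorus.

```python
def build_emr_request(name, node_count, instance_type="m5.xlarge",
                      release_label="emr-6.15.0"):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    instance_type and release_label are illustrative defaults (assumptions),
    not values taken from the article.
    """
    if node_count < 2:
        raise ValueError("a multi-node cluster needs at least 2 nodes")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                # One master node coordinates the cluster.
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                # The remaining nodes store data and run tasks.
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type,
                 "InstanceCount": node_count - 1},
            ],
            # Keep the cluster running so other tools can connect to it.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default EMR roles; assumed to exist in the AWS account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region="us-east-1"):
    """Submit the request to EMR and return the new cluster ID.

    boto3 is imported here so the request builder stays usable without it.
    Requires AWS credentials; the region is an assumption.
    """
    import boto3

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Calling `build_emr_request("analytics-demo", 4)` produces a request for one master and three core nodes. Passing that dictionary to `launch_cluster` would actually create (and bill for) the cluster.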

Once such an instance has been created, deploying Chorus is similarly efficient:

  • Create a small container.
  • Download Chorus from S3 to the container.
  • Launch the installer (and hit yes a couple of times).
  • Log into Chorus via your web browser.
  • Point Chorus at the resource manager of your Hadoop cluster, and instruct Chorus to autoconfigure itself for that cluster.
  • Start building high-performance analytical workflows using the Chorus visual workflow editor, and running them on your Hadoop cluster using Spark and/or MapReduce.

Start to finish, this process takes as little as 10 minutes. It's also just a few clicks to add the Redshift data source: data can be moved back and forth between Redshift and EMR, ETL can be performed on the Redshift data in situ, and models trained on Hadoop can be used to score data residing in Redshift (or be seamlessly deployed to a customer's cloud scoring engines using either PMML or PFA). All of this is possible within minutes of creating the instance and without having to write a single line of code!
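
For comparison, here is a rough sketch of what moving data between Redshift and S3 looks like when done by hand with Redshift's `UNLOAD` and `COPY` statements. This is not how Chorus implements the transfer (the article does not say); the function names, bucket path, table name, and IAM role ARN are hypothetical placeholders.

```python
def unload_sql(query, s3_prefix, iam_role):
    """Build a Redshift UNLOAD statement that exports a query to S3 as Parquet.

    Single quotes inside the query are doubled, because UNLOAD takes the
    query as a quoted string.
    """
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS PARQUET;"
    )


def copy_sql(table, s3_prefix, iam_role):
    """Build a Redshift COPY statement that loads Parquet files from S3."""
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS PARQUET;"
    )


# Hypothetical placeholders, for illustration only.
ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"
PREFIX = "s3://example-bucket/exports/venue_"

print(unload_sql("select * from venue where venuestate = 'NV'", PREFIX, ROLE))
print(copy_sql("venue_scored", PREFIX, ROLE))
```

The generated statements would still need to be run against Redshift through a SQL client, and the role must have access to the bucket.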

With the recent introduction of Spark autotuning in Chorus 6.1, Chorus has a detailed understanding of the resource requirements associated with all of the analyses being run by data scientists using Chorus. As a result, Chorus is capable of understanding the optimal cluster sizing required to support the aggregate load. Future releases will provide the functionality to scale the cluster up and down by dynamically adding nodes when required and pausing idle nodes when they are not. This means that Chorus will be able to minimize the cost of running the cluster. Customers can already integrate cluster control into their individual flows using the Chorus extensibility SDK.
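
To make the sizing idea concrete, the sketch below shows one simple way aggregate resource requirements could be turned into a node count. It is an illustration under stated assumptions, not Chorus's autotuning algorithm: the `JobDemand` type, the `nodes_required` function, and the 25% headroom reserved per node are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class JobDemand:
    """Peak resources one analysis is expected to need (hypothetical type)."""
    memory_gb: float
    vcores: int


def nodes_required(jobs, node_memory_gb, node_vcores, headroom=0.25):
    """Estimate how many worker nodes cover the aggregate demand.

    headroom is the fraction of each node reserved for the OS and
    cluster daemons; 0.25 is an assumed value, not a recommendation.
    """
    if node_memory_gb <= 0 or node_vcores <= 0:
        raise ValueError("node capacity must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")

    total_memory = sum(job.memory_gb for job in jobs)
    total_vcores = sum(job.vcores for job in jobs)

    usable_memory = node_memory_gb * (1 - headroom)
    usable_vcores = node_vcores * (1 - headroom)

    # The scarcer resource decides the node count; never go below one node.
    by_memory = math.ceil(total_memory / usable_memory)
    by_cpu = math.ceil(total_vcores / usable_vcores)
    return max(by_memory, by_cpu, 1)


workload = [JobDemand(48, 8), JobDemand(96, 16), JobDemand(24, 4)]
print(nodes_required(workload, node_memory_gb=64, node_vcores=16))  # prints 4
```

In this example the workload needs 168 GB and 28 vcores in total. With 48 GB and 12 vcores usable per node, memory is the binding constraint, so four nodes are needed. A real scheduler would also account for how individual jobs pack onto nodes, which this estimate ignores.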

At Alpine, we’re excited to see more of our customers use Chorus to deploy machine learning in the cloud. Stay tuned for future posts detailing other ways we’re transforming the traditional enterprise data science workflow.

Published at DZone with permission of Lawrence Spracklen, DZone MVB.
