Improving Analytics With a Hybrid Cloud Workflow

The right balance of on-premises and public cloud storage should be a conscious decision based on your data strategy.

The public cloud has changed computing forever. It moves information technology into a world of utility where compute and storage are available as needed — easy to implement and decommission. It provides a flexible infrastructure for a data-centric world increasingly based on analytics, where experimentation is the foundation of digital transformation.

Analytics is a complex workflow that relies on both large data sets, which supply the historical data that analytic models are built on, and high performance, which enables timely decisions and more iterations for deriving deeper insights into your data.

Organizations big and small have adopted the public cloud for analytics to address several business needs.

  • Convenient: yes.
  • Flexible: yes.
  • Inexpensive: it depends.

Rethinking Cloud Strategies for Analytics

After the initial wave of adoption, many companies are rethinking their public cloud strategy because of unexpectedly large bills[1], often for extracting data from the public cloud. For any company with a large amount of data, a hybrid cloud model is often the more cost-effective and impactful approach.

Most companies use a combination of public and private clouds to achieve their business goals. The question remains: what is the right balance? Customers want to know the right amount of data to keep in the public cloud and what to store on-premises.

Analytics: On-Premises vs. the Public Cloud

Let's consider the analytics workflow and how to choose an optimized infrastructure. Public cloud and on-premises storage each have distinct strengths:

  • Public cloud: Many companies use the public cloud because they can spin up a massive number of compute cores for a job and decommission them afterward, avoiding the capital cost of equivalent on-premises hardware. The tool set available in the public cloud can also be a compelling reason to use it. Finally, the unstructured data that underlies many analytics workloads is often stored in the public cloud for lack of a local alternative.
  • On-premises: The case for on-premises analytics rests on control over the environment (including security features), predictable performance, and the ability to move and store data without incurring transfer charges.

How to Approach a Hybrid Cloud Analytics Workflow

Consider a hybrid cloud approach that takes advantage of the strengths of both. Start with your data. You want to control it, and you may want to use it for more than one analytics job, or even combine one data set with other data for other jobs. You may also have massive historical data sets that you want to draw on for predictive models.

Storing data in the public cloud over the long term, and moving it back out, can be expensive. A more cost-efficient approach is to keep the bulk of your data on-premises in a highly scalable, low-cost object storage system. Object storage that already "speaks cloud" (for example, through an S3-compatible API) is a strong foundation for unstructured data and makes for a seamless hybrid cloud configuration.

To run an analytics job, you replicate only the data it needs from your on-premises system to the public cloud, and bring back only the results, not the raw data. This architecture can significantly reduce ongoing costs. In some use cases, such as Western Digital's own analytics workflow, it can save your company millions of dollars.
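To make the cost argument concrete, here is a back-of-the-envelope model comparing two ways of running analytics over a year: keeping the full data set in the public cloud and pulling raw data back out, versus keeping data on-premises and, for each job, replicating only the needed subset and returning only the results. This is a minimal sketch; every price, size, and function name is a hypothetical placeholder, not an actual AWS rate or Western Digital figure.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Hypothetical unit prices in USD; substitute your provider's current rates."""
    storage_per_gb_month: float = 0.02  # assumed cloud object storage price
    egress_per_gb: float = 0.09         # assumed data-transfer-out price
    ingress_per_gb: float = 0.0         # assumed free; many providers do not charge for uploads


def all_cloud_cost(dataset_gb: float, months: float, egress_gb: float, p: Prices) -> float:
    """Keep the full data set in the cloud for `months` and pull `egress_gb` of raw data out."""
    return dataset_gb * p.storage_per_gb_month * months + egress_gb * p.egress_per_gb


def hybrid_cost(jobs: int, subset_gb: float, staging_months: float, results_gb: float,
                onprem_storage_cost: float, p: Prices) -> float:
    """Keep data on-premises; per job, replicate only `subset_gb` to the cloud, keep it for
    `staging_months`, return only `results_gb`, then delete the cloud copy."""
    per_job = (subset_gb * p.ingress_per_gb                          # replicate into the cloud
               + subset_gb * p.storage_per_gb_month * staging_months  # temporary cloud copy
               + results_gb * p.egress_per_gb)                        # bring back results only
    return onprem_storage_cost + jobs * per_job


prices = Prices()
# Hypothetical year: a 500 TB data set with 100 TB of raw data pulled back out (all-cloud),
# versus 12 jobs that each stage 50 TB for about a week and return 1 TB of results (hybrid).
print(f"all-cloud: ${all_cloud_cost(500_000, 12, 100_000, prices):,.0f}")
print(f"hybrid, cloud charges only: ${hybrid_cost(12, 50_000, 0.25, 1_000, 0.0, prices):,.0f}")
```

The structure matters more than the numbers: the all-cloud cost scales with the full data set and the raw data you egress, while the hybrid cost scales with what each job stages and returns. Before comparing, pass your real on-premises ownership and operating cost as `onprem_storage_cost`.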

As you can see in the video, the ActiveScale™ system is a perfect fit for enabling a hybrid cloud analytics workflow. The ActiveScale cloud object storage system lets you replicate buckets from your local system to Amazon Web Services (AWS), where you can spin up compute, analytics tools, and storage as needed. When the job finishes, send the results back to your on-premises ActiveScale system and delete the data bucket in AWS. This way you take advantage of AWS resources, retain control of the raw data, get the results of the analytics job, and avoid AWS's high data export fees.
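The replicate, compute, return, and delete cycle described above can be sketched as a small in-memory simulation. The `Store` class, bucket names, and `analyze` callback are hypothetical stand-ins, not the ActiveScale or AWS APIs; in practice the replication step is performed by the storage system's bucket replication rather than by application code. The sketch counts the bytes that leave the cloud, which is the quantity this pattern keeps small.

```python
from typing import Callable, Dict


class Store:
    """Hypothetical in-memory object store: bucket name -> {key: bytes}."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.buckets: Dict[str, Dict[str, bytes]] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self.buckets.setdefault(bucket, {})[key] = data

    def delete_bucket(self, bucket: str) -> None:
        self.buckets.pop(bucket, None)


def run_hybrid_job(onprem: Store, cloud: Store, bucket: str,
                   analyze: Callable[[Dict[str, bytes]], bytes]) -> int:
    """Replicate one bucket to the cloud, analyze it there, store only the result
    on-premises, then delete the cloud copy. Returns the number of bytes egressed."""
    # 1. Replicate only the bucket this job needs (data flows into the cloud).
    for key, data in onprem.buckets[bucket].items():
        cloud.put(bucket, key, data)
    # 2. Run the analytics job against the cloud copy.
    result = analyze(cloud.buckets[bucket])
    # 3. Bring back only the result, not the raw data; this is the only egress.
    onprem.put(bucket + "-results", "summary", result)
    # 4. Delete the staged copy so it stops accruing cloud storage charges.
    cloud.delete_bucket(bucket)
    return len(result)


def count_objects(objects: Dict[str, bytes]) -> bytes:
    """Hypothetical analytics job: report how many objects were analyzed."""
    return str(len(objects)).encode()


onprem = Store("on-prem")
cloud = Store("public-cloud")
for i in range(1000):
    onprem.put("sensor-logs", f"log-{i}", b"x" * 1024)  # 1,024,000 bytes of raw data

egress = run_hybrid_job(onprem, cloud, "sensor-logs", count_objects)
print(f"bytes egressed: {egress}")  # 4 bytes (b"1000") instead of the raw data
```

The raw bucket never leaves the cloud and is deleted after the job, so egress is bounded by the size of the results rather than the size of the data set.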

Data can be synchronized from either a single-GEO or a 3-GEO (three-site) configuration, and synchronization is enabled per bucket, so you can balance the right combination of data copies on-premises and in public storage while taking advantage of ActiveScale's extreme data durability (up to 19 nines!).
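To put "nines" of durability in perspective, the expected number of objects lost per year is roughly the object count multiplied by one minus the durability. This minimal sketch assumes durability is quoted as an annual per-object survival probability (a common convention for object storage, but check the vendor's own definition) and that losses are independent; the object count is an arbitrary example. It uses `Decimal` because a float cannot represent a probability as close to 1 as 19 nines.

```python
from decimal import Decimal


def expected_annual_losses(objects: int, nines: int) -> Decimal:
    """Expected objects lost per year at a durability of `nines` nines.

    Assumes durability = 1 - 10**-nines is the annual per-object survival
    probability and that object losses are independent.
    """
    loss_probability = Decimal(10) ** -nines  # e.g. 19 nines -> 1E-19
    return objects * loss_probability


for nines in (11, 19):
    losses = expected_annual_losses(10**9, nines)
    print(f"{nines} nines, 1 billion objects: {losses:.0E} expected losses per year")
```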

[Figure: Hybrid Cloud Workflow]

Hybrid Cloud Analytics: Plan Right

The right balance of on-premises and public cloud storage should be a conscious decision based on your data strategy. Start by understanding what data you want to keep, and for how long. That equation should also account for the option of building a "data forever" architecture at lower cost, how much active data you have, and how you expect to use it. Think three to five years ahead about how you can extract more value from your data.
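Thinking three to five years forward can start with a simple capacity projection. This sketch compounds a hypothetical annual growth rate and splits each year's total into "active" data, which might be staged in the public cloud for analytics, and the remainder kept on-premises. The growth rate, active fraction, and starting size are placeholders to replace with your own measurements.

```python
from typing import List, Tuple


def project_capacity(current_tb: float, annual_growth: float, years: int,
                     active_fraction: float) -> List[Tuple[int, float, float]]:
    """Return (year, total_tb, active_tb) for years 0..years.

    total_tb compounds at annual_growth (0.3 means 30% per year); active_tb is the
    assumed share of data accessed regularly enough to be a candidate for cloud staging.
    """
    rows = []
    for year in range(years + 1):
        total = current_tb * (1 + annual_growth) ** year
        rows.append((year, total, total * active_fraction))
    return rows


# Hypothetical inputs: 500 TB today, 30% yearly growth, 10% of data active.
for year, total, active in project_capacity(500, 0.3, 5, 0.1):
    print(f"year {year}: {total:,.0f} TB total, {active:,.0f} TB active")
```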

These questions and scenarios can help you decide the right balance of local and public storage. Without a doubt, hybrid cloud architectures can improve your analytics workflow, and help you garner more value from your data.

If you'd like to see how others are viewing the hybrid/private cloud balance, check out the 451 Research survey.



Published at DZone with permission of Erik Ottem. See the original article here.

Opinions expressed by DZone contributors are their own.
