
Moving HDInsight Workloads to Qubole: Cost Comparison and Benefits


As projects mature to production or expand to new teams, an HDI platform requires constant support and revision. Upgrades must be managed manually, and costs quickly accumulate.



Many companies start their big data journey in the cloud on Azure by gravitating to Microsoft's native offering, HDInsight (HDI). With data already in Blob Storage or ADLS, it is easy to get started on projects and experiment.

For an experienced team with deep big data expertise and data engineers and data scientists on staff, configuring and tuning an infrastructure on HDI can make sense. However, for most organizations just starting out, managing and tuning this infrastructure is a barrier to scaling beyond a department or a POC project. Setup requires separately configuring many components to customize the platform to a specific use case, and the administrative overhead of maintaining the platform requires additional IT staff to keep it operational. This creates a dependency on IT and a potential bottleneck for end users, because every new project may require spinning up new clusters or infrastructure.

As projects mature to production or expand to different teams, an HDI platform requires constant support and revision. Software upgrades must still be managed manually, even if the process is simpler than with on-premises open-source installations. Costs can accumulate quickly because HDI has no automated way to size clusters to the workload at any given time.

A Different Approach to Big Data on Azure

Qubole provides a managed service focused on automating big data infrastructure in the cloud. Our policy-driven, workload-aware auto-scaling reduces the setup, administrative, and operational overhead of a big data platform. Like HDI, Qubole leverages the cloud data store and Azure compute to spin up infrastructure on demand. The difference is that Qubole looks at the workload itself and resizes the cluster accordingly; in addition, Qubole Data Service (QDS) automates cluster startup and shutdown for you.
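Conceptually, the scaling decision looks something like the sketch below. This is an illustration of the idea only, not Qubole's actual algorithm; the `ClusterPolicy` fields and the per-node task capacity are assumptions made for the example.

```python
# Illustrative only: a toy workload-aware autoscaler, not Qubole's implementation.
from dataclasses import dataclass

@dataclass
class ClusterPolicy:
    min_nodes: int = 2       # never scale below this floor
    max_nodes: int = 50      # never scale above this ceiling
    tasks_per_node: int = 8  # assumed task capacity per node

def target_size(policy: ClusterPolicy, running_tasks: int, pending_tasks: int) -> int:
    """Pick a node count that fits the current workload, clamped to policy bounds."""
    demand = running_tasks + pending_tasks
    needed = -(-demand // policy.tasks_per_node)  # ceiling division: 20 tasks -> 3 nodes
    return max(policy.min_nodes, min(policy.max_nodes, needed))

policy = ClusterPolicy()
print(target_size(policy, running_tasks=4, pending_tasks=16))    # 3 nodes
print(target_size(policy, running_tasks=0, pending_tasks=0))     # 2 (floor)
print(target_size(policy, running_tasks=200, pending_tasks=300)) # 50 (ceiling)
```

The key point is that the node count follows demand in both directions, so an idle cluster shrinks to the policy floor instead of billing at full size.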

This type of automation enables platforms to scale cost-effectively as demand grows. By optimizing compute and the number of nodes in a cluster, organizations can expand and contract flexibly and avoid maintaining clusters 24/7. As organizations scale into production workloads, this capability has a huge impact on total cost of ownership (TCO).

As an example, suppose an organization wants to run several data science workloads on HDI. The team requires R Server support and only wants to run the platform during working hours in one region, for a maximum of ten hours per day, five days per week. IT sets up the infrastructure: a 50-node HDI cluster running on A10 VMs (8 CPUs, 56 GB of RAM, and a 382 GB HDD each). IT also assigns two staff members to manage the infrastructure and ensure the cluster starts and stops at the agreed-upon times.

In contrast, with Qubole, an administrator defines a cluster policy that matches the same hardware specification but can auto-scale between a minimum of two nodes and a maximum of 50. A data scientist submitting the first workload starts up the infrastructure in self-service fashion. Throughout the day, the cluster resizes automatically based on activity. By optimizing over the course of days, weeks, and months, Qubole lowers infrastructure costs, enabling the team to scale as needs grow.

TCO Comparison
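As a back-of-the-envelope illustration of the scenario above (not Qubole's published figures), the sketch below compares the monthly compute cost of a fixed 50-node cluster against an auto-scaled one. The hourly VM price and the 40% average cluster size are assumptions for the example; actual A10 rates vary by region and agreement, and the calculation ignores HDI surcharges and Qubole service fees.

```python
# Back-of-the-envelope TCO sketch. All prices and utilization figures are
# illustrative assumptions, not published Azure or Qubole rates.

VM_PRICE_PER_HOUR = 0.78   # assumed A10 pay-as-you-go rate, USD/hour
HOURS_PER_DAY = 10
DAYS_PER_WEEK = 5
WEEKS_PER_MONTH = 4.33

active_hours = HOURS_PER_DAY * DAYS_PER_WEEK * WEEKS_PER_MONTH  # ~216.5 h/month

# HDI scenario: a fixed 50-node cluster runs for the full working window.
hdi_nodes = 50
hdi_cost = hdi_nodes * active_hours * VM_PRICE_PER_HOUR

# Qubole scenario: auto-scaling between 2 and 50 nodes. Assume the cluster
# averages 40% of peak size over the working window (illustrative figure).
avg_utilization = 0.40
qds_avg_nodes = max(2, round(hdi_nodes * avg_utilization))  # floor of 2 nodes
qds_cost = qds_avg_nodes * active_hours * VM_PRICE_PER_HOUR

print(f"Static HDI cluster:  ${hdi_cost:,.0f}/month")
print(f"Auto-scaled cluster: ${qds_cost:,.0f}/month")
print(f"Estimated compute savings: {1 - qds_cost / hdi_cost:.0%}")
```

Under these assumptions the auto-scaled cluster cuts compute spend by roughly 60%, and the gap widens further once you account for the staff time spent manually starting and stopping the static cluster.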

Collaboration and Productivity

An organization investing in big data will have additional personas running other big data workloads: data engineers creating data pipelines, data analysts running business intelligence queries, and data scientists running machine learning algorithms. QDS provides a single environment where each of these user types can access the same data and work together through a common interface. The ability to share queries, troubleshoot issues, and plug into BI tools such as Power BI or Tableau accelerates team productivity.

Most enterprises lack deep experience in managing big data at scale. Qubole provides automation, scale, cost reduction, and self-service data access. As a managed service, Qubole also takes care of upgrading open-source technology seamlessly, so your IT team can focus on other needs. Qubole lowers the barrier to scaling your big data initiatives on Azure, and trying another option on Azure is low risk with our Business Edition.


Topics:
big data, workloads, qubole, azure, data workloads, hdinsight
