Transient Clusters in the Cloud for Big Data

When it comes to getting value from big data, paying less and processing it faster to reduce time-to-insight are always top-of-mind goals.

by Scott Gidley · Aug. 24, 16 · Opinion

Cheaper, faster. Faster, cheaper. When it comes to getting value from Big Data, paying less and processing it faster to reduce time-to-insight are always top-of-mind goals. To achieve these goals, many enterprises are turning to the cloud to augment or replace their on-premises Hadoop infrastructure.

Pay Only for What You Need

One key reason for the shift is that Hadoop in the cloud decouples storage from compute, so enterprises can pay for storage at a lower rate than for compute. The cloud also provides on-demand scalability that on-premises architecture can’t match. With cloud services like Amazon EMR or Microsoft Azure HDInsight, enterprises can spin up and scale Hadoop clusters on demand. Have a job that isn’t processing fast enough? Add more nodes and then scale back down when it’s done. Have several jobs of various sizes? Run multiple clusters, each sized for its job, so that no resources are wasted. Add transient clusters to the mix, and the cloud becomes an extremely customizable Big Data solution.
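To make the pay-for-what-you-need argument concrete, here is a minimal cost sketch in Python. The hourly rate, node counts, and job durations are hypothetical numbers chosen for illustration, not real cloud pricing, and the model ignores storage, startup time, and per-minute billing rules.

```python
# Hypothetical price per node-hour; real pricing varies by provider and instance type.
HOURLY_RATE = 0.5


def persistent_cost(nodes, hourly_rate, hours_in_period):
    """Cost of keeping one fixed-size cluster running for the whole period."""
    return nodes * hourly_rate * hours_in_period


def on_demand_cost(jobs, hourly_rate):
    """Cost of running each job on its own right-sized cluster.

    `jobs` is a list of (nodes, hours) pairs, one per job.
    """
    return sum(nodes * hours * hourly_rate for nodes, hours in jobs)


# Three jobs of different sizes in one day: (nodes, hours). Illustrative only.
jobs = [(10, 2.0), (4, 1.0), (20, 0.5)]

# An always-on cluster must be sized for the largest job (20 nodes) for 24 hours.
always_on = persistent_cost(nodes=20, hourly_rate=HOURLY_RATE, hours_in_period=24)
right_sized = on_demand_cost(jobs, HOURLY_RATE)

print(f"always-on cluster:    {always_on:.2f}")
print(f"right-sized clusters: {right_sized:.2f}")
```

Under these assumed numbers the right-sized clusters cost far less, because compute is billed only while a job is actually running.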

Leverage Transient Clusters

Transient clusters are compute clusters that automatically shut down and stop billing when processing is finished. However, this cost-effective approach has been difficult to use in the past, because the cloud provider automatically deletes a transient cluster’s metadata when the cluster shuts down. As a result, most enterprises have opted to pay for persistent compute across the board in order to maintain the metadata.
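As a rough illustration of what "transient" means in practice on Amazon EMR, the sketch below builds a cluster request in which `KeepJobFlowAliveWhenNoSteps` is set to `False`, so the cluster terminates once its steps finish. The cluster name, release label, instance types, bucket, and script path are placeholders, and the helper function `build_transient_cluster_request` is hypothetical. The dictionary follows the shape expected by boto3's `run_job_flow`, but nothing is submitted here.

```python
def build_transient_cluster_request(name, script_uri, node_count):
    """Build an EMR request for a cluster that shuts down after its steps complete.

    All concrete values below are placeholders for illustration.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.36.0",  # placeholder release label
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # placeholder instance type
            "SlaveInstanceType": "m5.xlarge",   # placeholder instance type
            "InstanceCount": node_count,
            # False makes the cluster transient: it terminates when no steps remain.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "process-data",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", script_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_transient_cluster_request(
    name="nightly-transient-job",
    script_uri="s3://example-bucket/jobs/process.py",  # hypothetical location
    node_count=5,
)
print(request["Instances"]["KeepJobFlowAliveWhenNoSteps"])
```

With AWS credentials configured, a request like this could be passed to `boto3.client("emr").run_job_flow(**request)`; the values would need to be adapted to a real account first.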

Now, with a data management platform like Bedrock, enterprises can leverage transient clusters for cost savings and still maintain their metadata. How does it work? In Bedrock’s case, the data management platform monitors the ingestion of data being loaded to the transient cluster in the cloud and stores the resulting metadata outside EMR/HDInsight. That way, the metadata is still available after the cluster is terminated.
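The general pattern of keeping metadata outside the cluster can be sketched with a small simulation. This is not Bedrock's implementation: the `ExternalCatalog` and `TransientCluster` classes, the JSON file used as the external store, and the dataset name and source path are all hypothetical stand-ins.

```python
import json
import tempfile
from pathlib import Path


class ExternalCatalog:
    """Metadata store that lives outside the cluster (here, a JSON file)."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())

    def record(self, dataset, metadata):
        entries = self.load()
        entries[dataset] = metadata
        self.path.write_text(json.dumps(entries, indent=2))


class TransientCluster:
    """Toy cluster whose local metadata disappears on termination."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.local_metadata = {}
        self.running = True

    def ingest(self, dataset, rows, source):
        metadata = {"rows": rows, "source": source}
        self.local_metadata[dataset] = metadata   # lost at shutdown
        self.catalog.record(dataset, metadata)    # written outside the cluster

    def terminate(self):
        self.local_metadata.clear()
        self.running = False


catalog_path = Path(tempfile.mkdtemp()) / "catalog.json"
cluster = TransientCluster(ExternalCatalog(catalog_path))
cluster.ingest("clickstream_2016_08", rows=1000,
               source="s3://example-bucket/clickstream/")  # hypothetical path
cluster.terminate()

# A fresh catalog handle, created after termination, still sees the metadata.
surviving = ExternalCatalog(catalog_path).load()
print(surviving)
```

The key design choice is that metadata is written to the external store at ingestion time rather than at shutdown, so nothing depends on the cluster still being alive.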

Why is this important? Metadata is the key to getting value from Big Data. It’s the technical, operational, and business information about the data that lets users find the data they need in the data lake, verify its quality, and trust the validity of their analyses and business intelligence.

A Hybrid Approach

Moving storage and applications to the cloud isn’t an all-or-nothing proposition. In reality, most enterprises are employing a hybrid approach to the data lake, with some data storage (perhaps of less sensitive, third-party data) and some processing (including transient clusters) in the cloud, and the rest on-premises. An intelligent Hadoop data lake management platform, like Bedrock, is flexible and provides a centralized way to manage on-premises and cloud-based computing across the enterprise.

Published at DZone with permission of Scott Gidley. See the original article here.

Opinions expressed by DZone contributors are their own.
