Large Scale Distributed Consensus Approaches: Large Data Sets

by Oren Eini · Dec. 23, 14

In my previous post, I talked about how we can design a large cluster for compute-bound operations. The nice thing about that case is that the actual amount of shared data you need is pretty small: you can just distribute that information among your nodes, let them do stateless computation on it, and you are done.

A much more common scenario is when we can’t just do stateless operations, but need to keep track of what is actually going on. The typical example is a set of users changing data. For example, let us say that we want to keep track of the pages each user visits on our site. (Yes, that is a pretty classic BigTable scenario; I’ll ignore the prior art issue for now.) How would we design such a system?

Well, we still have the same considerations. We don’t want a single point of failure, and we want to have a very large number of machines and make the most of their resources.

In this case, we are merely going to change the way we look at the data. We still have the following topology:

[Figure: cluster topology]

There is the consensus cluster, which is responsible for cluster-wide, immediately consistent operations, and there are all the other nodes, which actually process requests and keep the data.

What kinds of decisions do we get to make in the consensus cluster? Those would be:

  • Adding and removing nodes from the entire cluster.
  • Changing the distribution of the data in the cluster.

In other words, the state that the consensus cluster is responsible for is the entire cluster topology. When a request comes in, the cluster topology is used to decide which set of nodes to direct it to.
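
A minimal sketch of that topology state, assuming the consensus cluster has already agreed on the order of changes before they are applied (the consensus protocol itself is not shown). `ClusterTopology`, its methods, and the version counter are hypothetical names chosen for illustration, not part of any particular product. The two decisions listed above map to `add_node`/`remove_node` and `assign_shard`.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterTopology:
    """Hypothetical cluster-wide state owned by the consensus cluster.

    Assumption: every mutation is applied only after the consensus cluster
    agreed on it, so all members see the same sequence of versions.
    """
    version: int = 0
    nodes: set = field(default_factory=set)
    # shard id -> list of node names holding a copy of that shard
    shard_map: dict = field(default_factory=dict)

    def add_node(self, node: str) -> None:
        self.nodes.add(node)
        self.version += 1

    def remove_node(self, node: str) -> None:
        self.nodes.discard(node)
        # Drop the departed node from every shard it was responsible for.
        self.shard_map = {
            shard: [n for n in owners if n != node]
            for shard, owners in self.shard_map.items()
        }
        self.version += 1

    def assign_shard(self, shard: int, owners: list) -> None:
        # Changing the data distribution: only known nodes may own a shard.
        unknown = [n for n in owners if n not in self.nodes]
        if unknown:
            raise ValueError(f"unknown nodes: {unknown}")
        self.shard_map[shard] = list(owners)
        self.version += 1

    def nodes_for_shard(self, shard: int) -> list:
        # Used when a request comes in, to pick the set of nodes to direct it to.
        return self.shard_map.get(shard, [])


if __name__ == "__main__":
    topo = ClusterTopology()
    for name in ("node-a", "node-b", "node-c", "node-d"):
        topo.add_node(name)
    topo.assign_shard(0, ["node-a", "node-b", "node-c"])
    topo.remove_node("node-b")
    print(topo.version, topo.nodes_for_shard(0))  # prints: 6 ['node-a', 'node-c']
```

Bumping a single version number on every change gives the rest of the cluster a cheap way to tell whether the copy of the topology it holds is stale.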

Typically in such systems, we want to keep the data on three separate nodes, so when we get a request, we route it to one of the three nodes that hold the relevant data. This is done by sharding the data according to the user id whose page views we are trying to track.
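
Routing by user id can be sketched as follows. This is only an illustration under stated assumptions: a fixed `NUM_SHARDS`, a hypothetical placement rule (each shard goes to three consecutive nodes in sorted order), and a `route` helper that picks the first replica not known to be down. None of these names come from the original post.

```python
import hashlib

NUM_SHARDS = 16          # assumption: a fixed number of shards
REPLICATION_FACTOR = 3   # keep each user's data on three separate nodes


def shard_for_user(user_id: str) -> int:
    # Use a stable hash (not Python's per-process randomized hash()) so every
    # node computes the same shard for the same user id.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def build_shard_map(nodes: list) -> dict:
    # Hypothetical placement: shard i lives on the three nodes starting at
    # position i in the sorted node list, wrapping around at the end.
    ordered = sorted(nodes)
    if len(ordered) < REPLICATION_FACTOR:
        raise ValueError("need at least as many nodes as replicas")
    return {
        shard: [ordered[(shard + k) % len(ordered)] for k in range(REPLICATION_FACTOR)]
        for shard in range(NUM_SHARDS)
    }


def route(user_id: str, shard_map: dict, down: frozenset = frozenset()) -> str:
    # Send the request to the first replica of the user's shard that is up.
    for node in shard_map[shard_for_user(user_id)]:
        if node not in down:
            return node
    raise RuntimeError("all replicas for this shard are down")


if __name__ == "__main__":
    shard_map = build_shard_map(["n1", "n2", "n3", "n4", "n5"])
    print(shard_map[shard_for_user("user-42")], route("user-42", shard_map))
```

In a real system the shard map would be the one decided by the consensus cluster rather than computed locally; the point here is only that a deterministic function of the user id plus the shared topology is enough for any node to find the three owners.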

Distributing the sharding configuration is done as described in the compute cluster example, while the actual handling of requests, and sending the data between the sharded instances, is done by the cluster nodes directly.
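
The compute cluster post is not reproduced here, so the following is only a sketch that assumes the consensus cluster pushes versioned topology snapshots to the worker nodes. `TopologySnapshot`, `WorkerNode`, `on_topology`, and `peers_for` are hypothetical names. A worker keeps only the newest snapshot it has seen and uses it to find the other replicas it talks to directly.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TopologySnapshot:
    version: int
    shard_map: dict  # shard id -> list of replica node names


@dataclass
class WorkerNode:
    """Hypothetical worker that caches the topology pushed by the consensus cluster."""
    name: str
    topology: TopologySnapshot = field(
        default_factory=lambda: TopologySnapshot(0, {})
    )

    def on_topology(self, snapshot: TopologySnapshot) -> bool:
        # Pushes may arrive late or out of order; only ever move forward.
        if snapshot.version <= self.topology.version:
            return False
        self.topology = snapshot
        return True

    def peers_for(self, shard: int) -> list:
        # The other replicas of this shard. The worker sends data to them
        # directly, without involving the consensus cluster.
        return [n for n in self.topology.shard_map.get(shard, []) if n != self.name]


if __name__ == "__main__":
    w = WorkerNode("n1")
    w.on_topology(TopologySnapshot(2, {0: ["n1", "n2", "n3"]}))
    w.on_topology(TopologySnapshot(1, {0: ["n4"]}))  # stale push, ignored
    print(w.topology.version, w.peers_for(0))  # prints: 2 ['n2', 'n3']
```

Keeping the consensus cluster out of the data path is what lets the number of worker nodes grow large: it only has to agree on topology changes, which are rare compared to page-view writes.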

Note that in this scenario, you cannot ensure any kind of safety. Two requests for the same user might hit different nodes and perform separate operations, each without being able to take the concurrent operation into account. Usually that is a good thing, but that isn’t always the case. That is a topic for the next post.
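
A small illustration of that trade-off, using a hypothetical `Replica` class with no cross-replica locking. Recording visited pages is an operation where uncoordinated writes are harmless, because the two replicas' sets can later be merged by union. A read-modify-write value such as a visit counter is where it goes wrong: each replica increments its own local copy, so reconciling by simply copying one value over the other (an assumed last-write-wins policy) loses an update.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica holding one user's data, with no cross-replica locking."""
    pages: set = field(default_factory=set)  # pages the user visited
    visit_count: int = 0                     # a read-modify-write value

    def record_visit(self, page: str) -> None:
        # Both updates only see this replica's local state.
        self.pages.add(page)
        self.visit_count = self.visit_count + 1


a, b = Replica(), Replica()
# Two requests for the same user land on different replicas at the same time.
a.record_visit("/home")
b.record_visit("/pricing")

# The set of visited pages can be reconciled after the fact: union loses nothing.
merged_pages = a.pages | b.pages

# The counter cannot: each replica computed 0 + 1 without seeing the other's
# increment, so keeping either value (last write wins) gives 1 instead of 2.
last_write_wins_count = b.visit_count

print(sorted(merged_pages), last_write_wins_count)  # prints: ['/home', '/pricing'] 1
```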


Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.
