Two-Phase-Commit for In-Memory Caches - Part II

What happens when an in-memory cache serves as a layer on top of a persistent database? You need data consistency via read- or write-throughs.

By Dmitriy Setrakyan · Sep. 11, 2014

Generally, persistent disk-oriented systems require an additional third phase in the commit protocol in order to ensure data consistency in case of failures. In my previous blog I covered why the 2-Phase-Commit protocol (without the third phase) is sufficient to handle failures for distributed in-memory caches. The explanation was based on the open source GridGain architecture, but it applies to any distributed in-memory system.

In this blog we will cover the case where an in-memory cache serves as a layer on top of a persistent database. Here the database is the primary system of record, and the distributed in-memory cache is added for performance and scalability, accelerating reads and (sometimes) writes to the data. The cache must be kept consistent with the database, which means that every cache transaction must be merged with the corresponding database transaction.

When we add a persistent store to an in-memory cache, our primary goal is to make sure that the cache remains consistent with the on-disk database at all times.

To keep the data consistent between memory and the database, data is automatically loaded on demand whenever a read happens and the value cannot be found in the cache. This behavior is called read-through. Likewise, whenever a write operation happens, data is stored in the cache and automatically persisted to the database. This behavior is called write-through. There is also a mode called write-behind, which batches up writes in memory and flushes them to the database in a single bulk operation (we will not cover it here).


Figure 1: Read-through operation for K2                                
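
As a rough illustration of the two modes described above, here is a minimal Java sketch of a cache that reads through to, and writes through to, a backing store. The PersistentStore interface and class names are hypothetical placeholders for this post, not GridGain's actual cache-store API.

// Hypothetical sketch of read-through and write-through wiring; the
// interface and class names are illustrative, not GridGain's actual API.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface PersistentStore<K, V> {
    V load(K key);                 // read a value from the underlying database
    void persist(K key, V value);  // write a value to the underlying database
}

class ReadWriteThroughCache<K, V> {
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    private final PersistentStore<K, V> store;

    ReadWriteThroughCache(PersistentStore<K, V> store) {
        this.store = store;
    }

    // Read-through: on a cache miss, load the value from the database
    // and keep it in memory for subsequent reads.
    V get(K key) {
        return memory.computeIfAbsent(key, store::load);
    }

    // Write-through: persist the value to the database first, then update
    // the in-memory copy, so a failed database write never leaves the
    // cache ahead of the store.
    void put(K key, V value) {
        store.persist(key, value);
        memory.put(key, value);
    }
}
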

When we add a persistent store to the 2-Phase-Commit protocol, in order to merge the cache and database transactions into one, the coordinator has to write the transactional changes to the database before it sends the Commit message to the other participants. This way, if the database transaction fails, the coordinator can still send the Rollback message to everyone involved, so the cached data remains consistent with the database. The figure below illustrates this behavior.
Figure 2: Two-Phase-Commit with In-Memory-Cache and Database
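
Because the ordering is what matters here, the following is a hedged sketch of how a coordinator might sequence these steps. The Participant interface and CoordinatorSketch class are assumptions made for this post, not part of GridGain; the only real API used is JDBC's Connection.commit(), and the sketch assumes the transaction's SQL statements have already been executed on a connection with auto-commit disabled.

// Illustrative flow of the commit ordering described above; Participant
// and CoordinatorSketch are placeholders, not a real API.
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

interface Participant {
    boolean prepare();   // phase 1: lock entries and acknowledge readiness
    void commit();       // phase 2: apply the prepared changes
    void rollback();     // phase 2: discard the prepared changes
}

class CoordinatorSketch {
    // Returns true only if the database and all cache participants committed.
    static boolean commitTransaction(List<Participant> participants, Connection db) {
        // Phase 1: ask every participant to prepare.
        for (Participant p : participants) {
            if (!p.prepare()) {
                participants.forEach(Participant::rollback);
                return false;
            }
        }
        // Commit the database transaction BEFORE sending Commit messages,
        // so a database failure can still be turned into a cache rollback.
        try {
            db.commit();
        } catch (SQLException e) {
            participants.forEach(Participant::rollback);
            return false;
        }
        // Phase 2: the database committed, so tell all participants to commit.
        participants.forEach(Participant::commit);
        return true;
    }
}

In this sketch the database commit sits between the two phases: a prepare failure or a database failure both lead to Rollback, while a successful database commit is the point of no return after which only Commit messages are sent.
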
Handling failures is actually more straightforward when a database is present than when it is not. We always assume that the database holds the most up-to-date copy, and it is acceptable to reload data from the database into the cache whenever in doubt (see Figure 1). Just like in my previous blog, the most challenging scenario is when the coordinator node crashes (potentially together with other nodes), because in that case we cannot tell whether it crashed before or after it managed to commit to the database. Other failure scenarios are handled the same way whether a database is present or not.
Whenever we cannot tell whether the database commit happened, we can simply reload the relevant data from the database into the cache as part of completing the transaction. This effectively ensures that the database and the cache always remain in a consistent state.
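
A minimal sketch of that recovery step might look like the following, assuming the database is treated as the source of truth for every key touched by the in-doubt transaction. The loadFromDb function stands in for whatever database read the store provides and is not a real API.

// Hypothetical recovery step: when it is unknown whether the database
// commit happened, re-read the affected keys so the cache converges to
// whatever the database holds. All names here are illustrative.
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class RecoverySketch {
    static <K, V> void reloadInDoubtKeys(Set<K> txKeys,
                                         Function<K, V> loadFromDb,
                                         Map<K, V> cache) {
        for (K key : txKeys) {
            V dbValue = loadFromDb.apply(key);
            if (dbValue == null) {
                cache.remove(key);        // key is absent from the database
            } else {
                cache.put(key, dbValue);  // overwrite any stale cached value
            }
        }
    }
}
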

Conclusion

When working with in-memory caches, we can keep transactional data consistent by slightly enhancing the standard 2-Phase-Commit protocol. The main advantage of in-memory over disk is that failure handling does not introduce any additional overhead, and we do not need an expensive third phase in the 2-Phase-Commit protocol to keep caches consistent with databases in case of failures.

Published at DZone with permission of Dmitriy Setrakyan, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
