

What to Do When Data Goes out of Sync?

We've all said this at some point. So, how do we fix it? Read this article to find out.

By Akshat Kansal · Jun. 22, 18 · Opinion

Oh! The data is out of sync.

I am sure many of us have heard this multiple times while building systems to support scale or deliver a better user experience.

We have all seen situations where we want the contents of the database in other systems as well: for example, in a Hadoop cluster for analytics, in Elasticsearch for better search, or in a cache so that applications stay nice and fast.

If the data in the system never changed, this would be easy: we could take a snapshot of the database and load it into the other system.

However, reality has a different story to tell. By the time we are done loading the snapshot, the data is already stale, which is not really acceptable in today's world.

So, what do we do in case we need real-time data in other systems?

I guess we all end up asking our applications to write to multiple systems. This means that every time the application writes to the database, it also updates the cache for faster retrieval, reindexes the search system, and sends the data off for analytics.
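The dual-write pattern described above can be sketched roughly as follows. All class and function names here (`InMemoryDB`, `save_order`, and so on) are illustrative stand-ins, not a real API:

```python
# A minimal sketch of the dual-write approach: the application fans every
# write out to several systems itself. All names here are illustrative.

class InMemoryDB:
    def __init__(self):
        self.tables = {}
    def insert(self, table, row):
        self.tables.setdefault(table, []).append(row)

class Cache:
    def __init__(self):
        self.store = {}
    def set(self, key, value):
        self.store[key] = value

class SearchIndex:
    def __init__(self):
        self.docs = []
    def index(self, doc):
        self.docs.append(doc)

class AnalyticsQueue:
    def __init__(self):
        self.messages = []
    def publish(self, msg):
        self.messages.append(msg)

def save_order(db, cache, search, analytics, order):
    # Every application write touches four systems, one after another.
    db.insert("orders", order)                 # 1. source of truth
    cache.set(f"order:{order['id']}", order)   # 2. fast reads
    search.index(order)                        # 3. full-text search
    analytics.publish(order)                   # 4. downstream analytics
    # If any step after the first fails, the systems silently diverge.

db, cache, search, analytics = InMemoryDB(), Cache(), SearchIndex(), AnalyticsQueue()
save_order(db, cache, search, analytics, {"id": 1, "item": "book"})
```

Note that there is no transaction spanning the four systems: a crash between step 1 and step 4 leaves the database updated but the other copies stale, which is exactly the drift discussed next.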

Is there any problem with this approach? Probably not, at least until we hear that the cache is out of sync, or that a change did not show up in analytics because the sync job failed or never pushed the data. Over time, this approach runs into race conditions and reliability issues, and what we end up with is data drift across multiple systems, a big team of engineers rebuilding caches and reconciling data, and tons of monitoring infrastructure.

Now, let's look at this from a different angle. Let's treat writes to the database as a stream: every database change becomes a new message in the stream. If we apply the messages to another system in the same order, we end up with an exact copy of the data there. This is how database replication typically works.
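The idea above can be sketched with a toy change stream. The event shape and names here are my own illustration, not the format of any specific CDC tool:

```python
# A minimal sketch of the change-stream idea: each database write becomes an
# ordered event, and replaying the events in order rebuilds an exact copy
# in another system. The event shape here is illustrative.

def apply_event(replica, event):
    # Apply a single change event to a downstream copy, in commit order.
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["value"]
    elif op == "delete":
        replica.pop(key, None)

# The "stream": an ordered log of changes captured from the source database.
change_log = [
    {"op": "insert", "key": 1, "value": {"name": "alice"}},
    {"op": "insert", "key": 2, "value": {"name": "bob"}},
    {"op": "update", "key": 1, "value": {"name": "alice smith"}},
    {"op": "delete", "key": 2},
]

replica = {}
for event in change_log:
    apply_event(replica, event)
# replica now mirrors the source: {1: {"name": "alice smith"}}
```

Because ordering is preserved, any consumer (cache, search index, analytics) that replays the same log converges to the same state, without the application writing to each system directly.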

This approach to building systems is called change data capture (CDC). It is already used by companies like Yelp, Facebook, and LinkedIn.

I am very excited about this because it unlocks the value of data we already have: we can feed the changes into a central hub where they are enriched in real time with event streams and data from other databases. This makes it much easier to experiment while keeping data inconsistency to a minimum.

I will write another post on how to implement it.


Opinions expressed by DZone contributors are their own.

