Managing Global Data in Microservice Polyglot Persistence Scenarios

Analyzing the management of global data in a microservices environment where polyglot persistence is adopted

By Claudio Guidi · Mar. 02, 21 · Opinion

Editor’s Note: The following is an article written for and published in DZone’s 2021 Data Persistence Trend Report.


Microservices architectures are naturally prone to host polyglot persistence scenarios. Because microservices are technology-agnostic, different data storage technologies can be adopted depending on the functionality each service provides.

On the one hand, such an approach can increase the overall performance of a system: architects and developers may tune each microservice by selecting the technology whose functionalities best suit that service's performance needs. On the other hand, complexity can grow. One notable source of complexity is data whose scope is global with respect to individual microservices and which must therefore be kept synchronized among them.

This article analyzes the management of global data in a microservices environment where polyglot persistence is adopted. In particular, we discuss possible solutions for keeping global data aligned across all the involved microservices. Transactional synchronization in a distributed microservice architecture is out of the scope of this article; eventual consistency scenarios are considered instead.

Polyglot Persistence and Microservices

Polyglot persistence means using multiple data storage technologies within the same application, with each technology addressing different requirements. For example, there are:

  • Key-value databases – usually adopted when fast reads and writes are required.
  • RDBMSs – used when transactions are strictly necessary and data structures are fixed.
  • Document-based databases – used when dealing with high loads and flexible data structures.
  • Graph databases – used when rapid navigation among links is necessary.
  • Column databases – used when large-scale analytics are needed. 

Polyglot Persistence Scenario: One-to-One

The ideal polyglot persistence scenario in a microservices architecture is what we call the one-to-one scenario (Figure 1), where there is exactly one microservice for each data persistence technology. This is a natural derivation of the microservice approach: each microservice's boundary is defined by the data technology it manages, and it offers external invokers a set of APIs that abstracts away the peculiarities of the underlying technology.
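As a minimal illustration, the following sketch shows the kind of technology-neutral API such a data microservice could expose over a key-value store; all names are hypothetical:

```java
/**
 * Sketch of the one-to-one scenario: a single data microservice owns one
 * storage technology (here, a key-value store) and exposes a
 * technology-neutral API. External invokers never see the underlying
 * store's client library. All names are hypothetical.
 */
public interface SessionStoreApi {
    void put(String sessionId, String payload);  // fast write
    String get(String sessionId);                // fast read
    void evict(String sessionId);                // remove an entry
}
```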

Figure 1


Complex Polyglot Persistence Approaches

In real life, the scenarios can be more complicated. Several microservices may need access to the same data source, and several data sources may be used within the same microservice. We call the former case the more-to-one scenario and the latter the one-to-more scenario. Finally, we call the mixed case more-to-more. Figure 2 illustrates each of these scenarios:

Figure 2


Table 1

  • More-to-one – The data contained in a single data source are so large and differentiated that they require multiple microservices, each offering a different set of functionalities. An intuitive example is a large catalogue of various products, with distinct microservices managing the different product types because those types have diverse characteristics. In Figure 2, datasets a, b, and c are mapped into API ABC at the microservice level. Note that ABC is not necessarily the bare sum of A, B, and C; it could be just a basic set of APIs from which A, B, and C can be obtained by refinement and composition.
  • One-to-more – The data model of the microservice requires data to be stored in multiple data sources. In Figure 2, datasets e and f are composed at the microservice level and mapped into the API EF.
  • More-to-more – A combination of the more-to-one and one-to-more scenarios.


Ideally, these three scenarios can be normalized into a pure one-to-one scenario. Such a normalization divides the architecture into three fundamental layers:

  • Data persistence layer – the bottom layer, where the data sources reside.
  • Microservice data layer – the microservices that manage the data sources.
  • Microservice functional layer – the microservices that implement the business functionalities.

Such a normalization step is not mandatory, but it can be a useful reference when making architectural decisions. Figure 3 shows how the above approaches can be normalized into a one-to-one approach using a three-layered architecture (a small code sketch of the layering follows the figure):

Figure 3
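To make the layering concrete, here is a minimal sketch, with all interface and method names invented for the example, in which a functional-layer microservice composes two data-layer microservices instead of accessing the data sources directly:

```java
/** Data-layer API wrapping a key-value store (hypothetical). */
interface SessionDataApi {
    String lastSessionOf(String customerId);
}

/** Data-layer API wrapping an RDBMS (hypothetical). */
interface CustomerDataApi {
    String nameOf(String customerId);
}

/**
 * Functional-layer microservice: it implements a business functionality by
 * composing the two data-layer APIs and never touches the data sources
 * directly.
 */
class ProfileFunctionalService {
    private final SessionDataApi sessions;
    private final CustomerDataApi customers;

    ProfileFunctionalService(SessionDataApi sessions, CustomerDataApi customers) {
        this.sessions = sessions;
        this.customers = customers;
    }

    /** One business API built on top of two data APIs. */
    String profileSummary(String customerId) {
        return customers.nameOf(customerId)
                + " / last session: " + sessions.lastSessionOf(customerId);
    }
}
```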


Global Data and Data Master Services

Global data has a scope that spans multiple microservices, so it can appear in the signatures of different microservices. As an example, consider the email field of a customer. It is stored in the data source holding user profiles, but it may also be required to perform payments or generate reports. In Figure 4, both microservices B and D provide functionalities where the global data EMAIL is present. Microservice B retrieves EMAIL from microservice W, whereas microservice D retrieves EMAIL from microservice U. In this case, we suppose EMAIL is stored both in the RDBMS and in the column database.

Figure 4


Clearly, in the depicted scenario, the two copies of EMAIL can diverge after a modification. Thus, the value of EMAIL must be kept synchronized across both databases.

Doing so requires identifying the master data source and the slave sources. Every modification of the data must be performed in the master source and then communicated to the slaves. Usually, the master role goes to the data source most closely focused on that specific type of global data. In the previous example, EMAIL could be mastered in the RDBMS that keeps the customer registry, whereas the column database would act as a slave.


Synchronizing Data Using Batch Processes

Batch processes are often used for extracting data from one data source and transferring it into another. They are triggered periodically and can transfer huge amounts of data, but they are strongly coupled with the data source technologies and cannot guarantee immediate synchronization. Furthermore, they can easily add technical debt, limiting the overall flexibility of the system.
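As a concrete reference, here is a minimal sketch of such a batch job copying the global EMAIL field from the master RDBMS to a slave store over JDBC. The connection URLs, table and column names, and the one-day change window are assumptions for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Minimal sketch of a nightly batch job copying the global EMAIL field from
 * the master RDBMS to a slave store. All URLs and schema names are invented.
 */
public class EmailSyncBatch {

    public static void main(String[] args) throws SQLException {
        try (Connection master = DriverManager.getConnection("jdbc:postgresql://master-db/crm");
             Connection slave = DriverManager.getConnection("jdbc:postgresql://slave-db/reports");
             // Select only rows modified since the last run to limit the volume transferred.
             PreparedStatement read = master.prepareStatement(
                 "SELECT id, email FROM customer WHERE last_modified > now() - interval '1 day'");
             PreparedStatement write = slave.prepareStatement(
                 "UPDATE report_customer SET email = ? WHERE customer_id = ?")) {

            try (ResultSet rs = read.executeQuery()) {
                while (rs.next()) {
                    write.setString(1, rs.getString("email"));
                    write.setLong(2, rs.getLong("id"));
                    write.addBatch();      // accumulate the updates
                }
            }
            write.executeBatch();          // flush them to the slave in one round trip
        }
    }
}
```

Note how the job hard-codes the SQL dialect and the schemas of both ends, which is precisely the technology coupling and technical debt mentioned above.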

In Figure 5, the global data EMAIL is stored in three data sources: RDBMS-1, CD (a column database), and RDBMS-2. It is accessed by microservices W, U, and Q, respectively. RDBMS-1 is the master and accepts writing operations from W. The others are slaves and are periodically synchronized by two batch processes (grey rectangles). U and Q can only read the value of EMAIL without altering it.

Figure 5


Table 2: Using Batch Processes

Advantages:
  • Massive data synchronization

Disadvantages:
  • Coupled with both source and target technologies
  • Prone to easily increase technical debt
  • Usually scheduled during nighttime; can slow down performance


Synchronizing Data Using a Coordinator and an Asynchronous Queue

A solution better suited to microservices architectures is a coordinator paired with an asynchronous queue (see the example in Figure 6; a minimal code sketch follows this list). This approach includes:

  • An asynchronous queue – keeps all the data change records so that they can be processed in the right sequence.
  • A gateway interceptor – intercepts both the request and the response of a write of global data, and then pushes the data change records into the queue.
    • A temporary queue can store pending requests while the microservice completes the write, and it is cleared when the response message is intercepted.
    • The temporary queue also helps detect inconsistencies in global data synchronization caused by a malfunction of the writing microservice.
  • A coordinator – processes the data change records in the queue and calls all of the involved microservices to write the update.
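Here is a minimal in-memory sketch of the coordinator. The /sync endpoints, service URLs, and payload format are assumptions for illustration; a production version would use a durable message broker and service discovery rather than an in-process queue:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** A data change record pushed into the queue by the gateway interceptor. */
record DataChange(String field, String entityId, String newValue) {}

/** Drains the asynchronous queue and propagates each change to every slave. */
public class Coordinator implements Runnable {

    private final BlockingQueue<DataChange> queue = new LinkedBlockingQueue<>();
    private final List<String> slaveSyncUrls = List.of(
            "http://report-service/sync", "http://payment-service/sync");
    private final HttpClient http = HttpClient.newHttpClient();

    /** Called by the gateway interceptor once the master write has succeeded. */
    public void publish(DataChange change) {
        queue.add(change);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                DataChange change = queue.take();       // preserves arrival order
                for (String url : slaveSyncUrls) {      // all slaves in the same session
                    HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                            .POST(HttpRequest.BodyPublishers.ofString(
                                    change.entityId() + ":" + change.newValue()))
                            .build();
                    http.send(request, HttpResponse.BodyHandlers.ofString());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();     // stop the loop cleanly
            } catch (Exception e) {
                // the record is dropped here; a real coordinator would retry or dead-letter it
            }
        }
    }
}
```

Note that the loop calls every slave in the same iteration, which mirrors the transactional constraint listed in Table 3 below.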

Figure 6


It is worth noting that, from the design point of view, the slave microservices' APIs should not offer explicit writing operations for global data, only the synchronization ones. This choice prevents global data from being written directly into slave microservices.
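For the EMAIL example, a slave API could look like the following sketch, with all names illustrative:

```java
/**
 * Sketch of a slave microservice API for global data: it exposes a read and a
 * synchronization operation, but no general-purpose write for EMAIL, so the
 * field can only change through the master-driven synchronization flow.
 */
public interface ReportCustomerApi {
    String getEmail(String customerId);               // reads are allowed
    void syncEmail(String customerId, String email);  // invoked only by the coordinator
    // note: no setEmail(...) is exposed to ordinary clients
}
```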

Table 3: Using a Coordinator and Asynchronous Queue

Advantages:
  • Independent from the data persistence technology
  • Triggered at writing time
  • Light infrastructure
  • Slave microservices' code is not affected by the synchronization process (although the synchronization API must be available)

Disadvantages:
  • Could require several coordinators if the global data are numerous
  • No history is kept in the queue
  • If the interfaces of the slave microservices are not standardized, i.e., they have different signatures for synchronization, the coordinator must be changed whenever slaves are added, removed, or modified
  • Transactional: all slaves must be synchronized in the same session; this assumption may be relaxed by developing a coordinator for each slave microservice, at the cost of a more complex infrastructure


Synchronizing Data Using an Event-Based Backbone

Finally, an event backbone such as Kafka can be used when there is a large amount of data to synchronize and the set of slave microservices can vary, with some being added and others removed over time. In this scenario, specific topics are set up on the backbone: one topic for each piece of global data to be synchronized. Each slave microservice can independently retrieve the data changes from the backbone and then synchronize its own data. This is a pluggable infrastructure that can scale easily.

Here, the price to pay is that each slave microservice must be equipped with a specific agent in charge of pulling the data from the backbone. Figure 7 depicts an example of a Kafka backbone where a specific topic is dedicated to the global data EMAIL. The master writes to the topic every time there is a modification, and the slaves read from the topic to stay aligned.
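Below is a minimal sketch of both sides using the Apache Kafka Java client. The broker address, the topic name global.email, and the choice of the customer id as the record key are assumptions for illustration:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EmailBackbone {

    // Master side: publish every EMAIL change to the dedicated topic,
    // keyed by customer id so changes for one customer stay in order.
    static void publishChange(String customerId, String newEmail) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("global.email", customerId, newEmail));
        }
    }

    // Slave side: the agent embedded in each slave microservice. A distinct
    // consumer group per slave gives each one an independent cursor (offset)
    // into the change history.
    static void runAgent(String slaveGroupId) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, slaveGroupId);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("global.email"));
            while (true) {
                for (ConsumerRecord<String, String> record :
                        consumer.poll(Duration.ofSeconds(1))) {
                    // Apply the change to this slave's own data source.
                    System.out.printf("sync EMAIL of %s -> %s%n",
                            record.key(), record.value());
                }
            }
        }
    }
}
```

Keying the records by customer id keeps all changes for the same customer in one partition, so each slave applies them in order, and separate consumer groups give each slave its own cursor into the change history.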

Figure 7


Table 4: Using an Event-Based Backbone

Advantages:
  • Independent from the data persistence technology
  • Triggered at writing time
  • The data change history is kept in the queue, and cursors are used for selecting the synchronization point
  • Scalable, and microservices are easily pluggable
  • Slaves synchronize independently

Disadvantages:
  • Huge amounts of data are transmitted
  • Slave microservices are affected by the synchronization process because they must be equipped with specific agents
  • Topics and events must be designed and managed properly
  • Costly infrastructure

Conclusion

In this article, we analyzed polyglot persistence scenarios with microservices. We showed how to normalize an architecture toward the ideal one-to-one scenario and discussed the differences between master and slave microservices. Finally, we presented three ways to synchronize global data across microservices, summarized below:

  • Batch processes – a legacy solution that should be deprecated.
  • Coordinator with an asynchronous queue – the right solution when there is a limited set of global data and slave microservices to synchronize.
  • Event-based backbone – the right solution when the system is quite complex with many microservices and different data to be synchronized.

Claudio Guidi, Chairman of the Board at italianaSoftware s.r.l.
@cguidi on DZone | @guidiclaudio on Twitter | @claudioguidi on LinkedIn

Claudio is the co-creator and language designer of the Jolie programming language. He also works as a software architect consultant, and he is a member of the Council of the Microservices Community.

Opinions expressed by DZone contributors are their own.
