Managing Global Data in Microservice Polyglot Persistence Scenarios

Analyzing the management of global data in a microservices environment where polyglot persistence is adopted

By Claudio Guidi · Mar. 02, 21 · Opinion

Editor’s Note: The following is an article written for and published in DZone’s 2021 Data Persistence Trend Report.


Microservices architectures are naturally prone to host polyglot persistence scenarios. Because microservices are technology-agnostic, different data storage technologies can be adopted depending on the functionality each service provides.

On the one hand, such an approach can increase the overall performance of a system: architects and developers may tune each microservice by selecting the technology whose functionalities best suit that service's performance needs. On the other hand, complexity can grow. One notable source of complexity is data whose scope is global with respect to individual microservices and which must therefore be kept synchronized among them.

This article analyzes the management of global data in a microservices environment where polyglot persistence is adopted. In particular, we discuss possible solutions for keeping global data aligned across all the involved microservices. Transactional synchronization in a distributed microservice architecture is out of the scope of this article; eventual consistency scenarios are considered instead.

Polyglot Persistence and Microservices

Polyglot persistence means using multiple data storage technologies within the same application, with each technology addressing different requirements. For example, there are:

  • Key-value databases – usually adopted when fast reads and writes are required.
  • RDBMSs – used when transactions are strictly necessary and data structures are fixed.
  • Document-based databases – used when dealing with high loads and flexible data structures.
  • Graph databases – used when rapid navigation among links is necessary.
  • Column databases – used when large-scale analytics are needed. 

Polyglot Persistence Scenario: One-to-One

The ideal polyglot persistence scenario in a microservices architecture is what we call the one-to-one scenario (Figure 1), where there is exactly one microservice for each data persistence technology. This is a natural derivation of the microservice approach: each microservice's boundary is defined by the data technology it manages, and it offers external invokers a set of APIs that abstracts away the peculiarities of the underlying technology.
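As a minimal illustration, the following sketch shows the kind of technology-neutral API such a data microservice could expose over a key-value store; all names are hypothetical:

```java
/**
 * Sketch of the one-to-one scenario: a single data microservice owns one
 * storage technology (here, a key-value store) and exposes a
 * technology-neutral API. External invokers never see the underlying
 * store's client library. All names are hypothetical.
 */
public interface SessionStoreApi {
    void put(String sessionId, String payload);  // fast write
    String get(String sessionId);                // fast read
    void evict(String sessionId);                // remove an entry
}
```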

Figure 1


Complex Polyglot Persistence Approaches

In real life, the scenarios can be more complicated. Several microservices may need access to the same data source, and several data sources may be used within the same microservice. We call the former case the more-to-one scenario and the latter the one-to-more scenario. Finally, we call the mixed case more-to-more. Figure 2 illustrates each of these scenarios:

Figure 2


Table 1

  • More-to-one – The data contained in a single data source are so large and differentiated that they require multiple microservices, each offering a different set of functionalities. An intuitive example is a large catalogue of various products, with distinct microservices managing the different product types because those types have diverse characteristics. In Figure 2, datasets a, b, and c are mapped into API ABC at the microservice level. Note that ABC is not necessarily the bare sum of A, B, and C; it could be just a basic set of APIs from which A, B, and C can be obtained by refinement and composition.
  • One-to-more – The data model of the microservice requires data to be stored in multiple data sources. In Figure 2, datasets e and f are composed at the microservice level and mapped into the API EF.
  • More-to-more – A combination of the more-to-one and one-to-more scenarios.


Ideally, these three scenarios can be normalized into a pure one-to-one scenario. Such a normalization divides the architecture into three fundamental layers:

  • Data persistence layer – the bottom layer, where the data sources reside.
  • Microservice data layer – the microservices that manage the data sources.
  • Microservice functional layer – the microservices that implement the business functionalities.

Such a normalization step is not mandatory, but it can be a useful reference when making architectural decisions. Figure 3 shows how the above approaches can be normalized into a one-to-one approach using a three-layered architecture (a small code sketch of the layering follows the figure):

Figure 3
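To make the layering concrete, here is a minimal sketch, with all interface and method names invented for the example, in which a functional-layer microservice composes two data-layer microservices instead of accessing the data sources directly:

```java
/** Data-layer API wrapping a key-value store (hypothetical). */
interface SessionDataApi {
    String lastSessionOf(String customerId);
}

/** Data-layer API wrapping an RDBMS (hypothetical). */
interface CustomerDataApi {
    String nameOf(String customerId);
}

/**
 * Functional-layer microservice: it implements a business functionality by
 * composing the two data-layer APIs and never touches the data sources
 * directly.
 */
class ProfileFunctionalService {
    private final SessionDataApi sessions;
    private final CustomerDataApi customers;

    ProfileFunctionalService(SessionDataApi sessions, CustomerDataApi customers) {
        this.sessions = sessions;
        this.customers = customers;
    }

    /** One business API built on top of two data APIs. */
    String profileSummary(String customerId) {
        return customers.nameOf(customerId)
                + " / last session: " + sessions.lastSessionOf(customerId);
    }
}
```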


Global Data and Data Master Services

Global data has a scope that spans multiple microservices, so it can appear in the signatures of different microservices. As an example, consider the email field of a customer. It is stored in the data source holding user profiles, but it may also be required to perform payments or generate reports. In Figure 4, both microservices B and D provide functionalities where the global data EMAIL is present. Microservice B retrieves EMAIL from microservice W, whereas microservice D retrieves EMAIL from microservice U. In this case, we suppose EMAIL is stored both in the RDBMS and in the column database.

Figure 4


Clearly, in the depicted scenario, the two copies of EMAIL can diverge after a modification. Thus, the value of EMAIL must be kept synchronized across both databases.

Doing so requires identifying the master data source and the slave sources. Every modification of the data must be performed in the master source and then communicated to the slaves. Usually, the master role goes to the data source most closely focused on that specific type of global data. In the previous example, EMAIL could be mastered in the RDBMS that keeps the customer registry, whereas the column database would act as a slave.


Synchronizing Data Using Batch Processes

Batch processes are often used for extracting data from one data source and transferring it into another. They are triggered periodically and can transfer huge amounts of data, but they are strongly coupled with the data source technologies and cannot guarantee immediate synchronization. Furthermore, they can easily add technical debt, limiting the overall flexibility of the system.
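As a concrete reference, here is a minimal sketch of such a batch job copying the global EMAIL field from the master RDBMS to a slave store over JDBC. The connection URLs, table and column names, and the one-day change window are assumptions for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Minimal sketch of a nightly batch job copying the global EMAIL field from
 * the master RDBMS to a slave store. All URLs and schema names are invented.
 */
public class EmailSyncBatch {

    public static void main(String[] args) throws SQLException {
        try (Connection master = DriverManager.getConnection("jdbc:postgresql://master-db/crm");
             Connection slave = DriverManager.getConnection("jdbc:postgresql://slave-db/reports");
             // Select only rows modified since the last run to limit the volume transferred.
             PreparedStatement read = master.prepareStatement(
                 "SELECT id, email FROM customer WHERE last_modified > now() - interval '1 day'");
             PreparedStatement write = slave.prepareStatement(
                 "UPDATE report_customer SET email = ? WHERE customer_id = ?")) {

            try (ResultSet rs = read.executeQuery()) {
                while (rs.next()) {
                    write.setString(1, rs.getString("email"));
                    write.setLong(2, rs.getLong("id"));
                    write.addBatch();      // accumulate the updates
                }
            }
            write.executeBatch();          // flush them to the slave in one round trip
        }
    }
}
```

Note how the job hard-codes the SQL dialect and the schemas of both ends, which is precisely the technology coupling and technical debt mentioned above.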

In Figure 5, the global data EMAIL is stored in three data sources: RDBMS-1, CD (a column database), and RDBMS-2. It is accessed by microservices W, U, and Q, respectively. RDBMS-1 is the master and accepts writing operations from W. The others are slaves and are periodically synchronized by two batch processes (grey rectangles). U and Q can only read the value of EMAIL without altering it.

Figure 5


Table 2: Using Batch Processes

Advantages:
  • Massive data synchronization

Disadvantages:
  • Coupled with both source and target technologies
  • Prone to easily increase technical debt
  • Usually scheduled during nighttime; can slow down performance


Synchronizing Data Using a Coordinator and an Asynchronous Queue

A solution better suited to microservices architectures is a coordinator paired with an asynchronous queue (see the example in Figure 6; a minimal code sketch follows this list). This approach includes:

  • An asynchronous queue – keeps all the data change records so that they can be processed in the right sequence.
  • A gateway interceptor – intercepts both the request and the response of a write of global data, and then pushes the data change records into the queue.
    • A temporary queue can store pending requests while the microservice completes the write, and it is cleared when the response message is intercepted.
    • The temporary queue also helps detect inconsistencies in global data synchronization caused by a malfunction of the writing microservice.
  • A coordinator – processes the data change records in the queue and calls all of the involved microservices to write the update.
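Here is a minimal in-memory sketch of the coordinator. The /sync endpoints, service URLs, and payload format are assumptions for illustration; a production version would use a durable message broker and service discovery rather than an in-process queue:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** A data change record pushed into the queue by the gateway interceptor. */
record DataChange(String field, String entityId, String newValue) {}

/** Drains the asynchronous queue and propagates each change to every slave. */
public class Coordinator implements Runnable {

    private final BlockingQueue<DataChange> queue = new LinkedBlockingQueue<>();
    private final List<String> slaveSyncUrls = List.of(
            "http://report-service/sync", "http://payment-service/sync");
    private final HttpClient http = HttpClient.newHttpClient();

    /** Called by the gateway interceptor once the master write has succeeded. */
    public void publish(DataChange change) {
        queue.add(change);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                DataChange change = queue.take();       // preserves arrival order
                for (String url : slaveSyncUrls) {      // all slaves in the same session
                    HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                            .POST(HttpRequest.BodyPublishers.ofString(
                                    change.entityId() + ":" + change.newValue()))
                            .build();
                    http.send(request, HttpResponse.BodyHandlers.ofString());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();     // stop the loop cleanly
            } catch (Exception e) {
                // the record is dropped here; a real coordinator would retry or dead-letter it
            }
        }
    }
}
```

Note that the loop calls every slave in the same iteration, which mirrors the transactional constraint listed in Table 3 below.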

Figure 6


It is worth noting that, from the design point of view, the slave microservices' APIs should not offer explicit writing operations for global data, only the synchronization ones. This choice prevents global data from being written directly into slave microservices.
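For the EMAIL example, a slave API could look like the following sketch, with all names illustrative:

```java
/**
 * Sketch of a slave microservice API for global data: it exposes a read and a
 * synchronization operation, but no general-purpose write for EMAIL, so the
 * field can only change through the master-driven synchronization flow.
 */
public interface ReportCustomerApi {
    String getEmail(String customerId);               // reads are allowed
    void syncEmail(String customerId, String email);  // invoked only by the coordinator
    // note: no setEmail(...) is exposed to ordinary clients
}
```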

Table 3: Using a Coordinator and Asynchronous Queue

Advantages:
  • Independent from the data persistence technology
  • Triggered at writing time
  • Light infrastructure
  • Slave microservices' code is not affected by the synchronization process (although the synchronization API must be available)

Disadvantages:
  • Could require several coordinators if the global data are numerous
  • No history is kept in the queue
  • If the interfaces of the slave microservices are not standardized, i.e., they have different signatures for synchronization, the coordinator must be changed whenever slaves are added, removed, or modified
  • Transactional: all slaves must be synchronized in the same session; this assumption may be relaxed by developing a coordinator for each slave microservice, at the cost of a more complex infrastructure


Synchronizing Data Using an Event-Based Backbone

Finally, an event backbone such as Kafka can be used when there is a large amount of data to synchronize and the set of slave microservices can vary, with some being added and others removed over time. In this scenario, specific topics are set up on the backbone: one topic for each piece of global data to be synchronized. Each slave microservice can independently retrieve the data changes from the backbone and then synchronize its own data. This is a pluggable infrastructure that can scale easily.

Here, the price to pay is that each slave microservice must be equipped with a specific agent in charge of pulling the data from the backbone. Figure 7 depicts an example of a Kafka backbone where a specific topic is dedicated to the global data EMAIL. The master writes to the topic every time there is a modification, and the slaves read from the topic to stay aligned.
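Below is a minimal sketch of both sides using the Apache Kafka Java client. The broker address, the topic name global.email, and the choice of the customer id as the record key are assumptions for illustration:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EmailBackbone {

    // Master side: publish every EMAIL change to the dedicated topic,
    // keyed by customer id so changes for one customer stay in order.
    static void publishChange(String customerId, String newEmail) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("global.email", customerId, newEmail));
        }
    }

    // Slave side: the agent embedded in each slave microservice. A distinct
    // consumer group per slave gives each one an independent cursor (offset)
    // into the change history.
    static void runAgent(String slaveGroupId) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, slaveGroupId);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("global.email"));
            while (true) {
                for (ConsumerRecord<String, String> record :
                        consumer.poll(Duration.ofSeconds(1))) {
                    // Apply the change to this slave's own data source.
                    System.out.printf("sync EMAIL of %s -> %s%n",
                            record.key(), record.value());
                }
            }
        }
    }
}
```

Keying the records by customer id keeps all changes for the same customer in one partition, so each slave applies them in order, and separate consumer groups give each slave its own cursor into the change history.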

Figure 7


Table 4: Using an Event-Based Backbone

Advantages:
  • Independent from the data persistence technology
  • Triggered at writing time
  • The data change history is kept in the queue, and cursors are used for selecting the synchronization point
  • Scalable, and microservices are easily pluggable
  • Slaves synchronize independently

Disadvantages:
  • Huge amounts of data are transmitted
  • Slave microservices are affected by the synchronization process because they must be equipped with specific agents
  • Topics and events must be designed and managed properly
  • Costly infrastructure

Conclusion

In this article, we analyzed polyglot persistence scenarios with microservices. We showed how to normalize an architecture toward the ideal one-to-one scenario and discussed the differences between master and slave microservices. Finally, we presented three ways to synchronize global data across microservices, summarized below:

  • Batch processes – a legacy solution that should be deprecated.
  • Coordinator with an asynchronous queue – the right solution when there is a limited set of global data and slave microservices to synchronize.
  • Event-based backbone – the right solution when the system is quite complex with many microservices and different data to be synchronized.

Claudio Guidi, Chairman of the Board at italianaSoftware s.r.l.
@cguidi on DZone | @guidiclaudio on Twitter | @claudioguidi on LinkedIn

Claudio is the co-creator and language designer of the Jolie programming language. He also works as a software architect consultant, and he is a member of the Council of the Microservices Community.

Opinions expressed by DZone contributors are their own.
