Exploring Multi-Region Database Deployment Options With a Slack-Like Messenger
Distributing data, maintaining messaging.
Distributed database deployments across multiple regions are becoming commonplace, and for several reasons. More and more applications have to comply with data residency requirements such as GDPR, serve user requests as quickly as possible from the data centers closest to the user, and withstand cloud region-level outages.
This article reviews the most widespread multi-region deployment options for distributed transactional databases by designing a data layer for a Slack-like corporate messenger.
Corporate Messenger Architecture
Our Slack-like corporate messenger is a cloud-native application that comprises several microservices. Let’s suppose we’re working on the first version of the app with the following services:
Figure 1. Corporate Messenger Microservices.
Every microservice belongs to one of these groups:
- Tier 1: A mission-critical service. Its downtime significantly impacts the company's reputation and revenue. The service must be highly available and strongly consistent, and it must be restored immediately in case of an outage.
- Tier 2: A service that provides an important function. Its downtime impacts customer experience and can increase the customer churn rate. The service has to be restored within two hours.
- Tier 3: A service that provides a valuable capability. Its downtime impacts customer experience (but insignificantly). The service has to be restored within four hours.
The Profile microservice stores information about user profiles (i.e., name, email address, and profile picture URL). It belongs to the Tier 1 group because every other microservice depends on it.
The Messaging microservice provides an essential capability for every messenger app - an ability to exchange messages - and this places it in the Tier 1 group. As a Slack-like corporate messenger, the microservice supports company-specific workspaces and channels within those workspaces.
The two remaining microservices - Reminders and Status - improve the overall user experience. The Reminders service lets us set a conversation aside for now and come back to it later by scheduling a prompt like “Remind me about this message in a week at 9am”. With the Status microservice, we can let others know what we are up to - ‘Active’, ‘Busy’, ‘Vacationing’, or ‘Out Sick’. Neither service is mission-critical, so we place them in the Tier 2 and Tier 3 groups, respectively.
The messenger is designed as a geo-distributed application that spans all continents, scales horizontally, and tolerates even cloud region-level outages. While such a design requires us to talk about the deployment/orchestration of microservice instances across the globe and traffic routing through global load balancers, we’ll stay focused on the data layer in this article.
Multi-Region Database Deployment Options
The messenger’s data layer will be built on a distributed transactional database. We need a database that is strongly consistent, scales up and out on commodity hardware, and “speaks” SQL natively. Why not start with a single-server database such as PostgreSQL or MySQL and later upgrade it with some form of sharding and replication? Because that approach only works to a certain extent, and here we can learn from the Slack engineering team. While the Slack team uses Vitess as a database clustering solution for MySQL, our corporate messenger will be built on YugabyteDB, a PostgreSQL-compatible distributed database for geo-distributed apps.
The table below summarizes different multi-region deployment options available in YugabyteDB. You can use it as a quick reference and return to it later after we explore all the options in the following sections.
| | Single “Stretched” Cluster | Single Geo-Partitioned Cluster | Cluster With Read Replicas | Multiple Clusters With Async Replication |
|---|---|---|---|---|
| Overview | The cluster is “stretched” across multiple regions. The regions can be located in relatively close proximity (e.g., US Midwest and East regions) or in distant locations (e.g., US East and Asia South regions). | The cluster is spread across multiple distant geographic locations (e.g., North America and Asia). Every geographic location has its own group of nodes deployed across one or more local regions in close proximity (e.g., US Midwest and East regions). Data is pinned to a specific group of nodes based on the value of a partitioning column. | The cluster is deployed in one geographic location (e.g., North America) and “stretched” across one or more local regions in relatively close proximity (e.g., US Midwest and East regions). Read replicas are usually placed in distant geographic locations (e.g., Europe and Asia). | Multiple standalone clusters are deployed in various regions. The regions can be located in relatively close proximity (e.g., US Midwest and East regions) or in distant locations (e.g., US East and Asia South regions). Changes are replicated asynchronously between the clusters. |
| Replication | All data is synchronously replicated across all regions. | Data is replicated synchronously and only within the group of nodes belonging to the same geographic location. | Synchronous within the cluster and asynchronous from the cluster to the read replicas. | Synchronous within each cluster and asynchronous between the clusters. |
| Consistency | Strong consistency | Strong consistency | Timeline consistency | Timeline consistency |
| Write latency | High latency (e.g., writes can go from North America to a European region). | Low latency (if written to a group of nodes in a nearby geographic location). | Low latency for writes within the primary cluster’s region; high latency for writes from distant geographic locations (e.g., locations with read replicas). | Low latency. |
| Read latency | High latency. | Low latency. | Low latency (if queried from nearby geographic locations using the read replicas). | Low latency (if queried from a cluster in a nearby geographic location). |
| Data loss on region outage | No data loss (if the cluster is “stretched” across multiple regions). | No data loss (if, in every geographic location, the nodes are spread across multiple local regions). | No data loss (if the primary cluster is “stretched” across multiple regions). | Partial data loss if a cluster is deployed within a single region and the region fails before changes get replicated to other clusters; no data loss if every standalone cluster is “stretched” across multiple local regions. |

Table 1. YugabyteDB Multi-Region Deployment Options
All our microservices belong to different service tiers and have various requirements. This allows us to explore how each of the above-mentioned multi-region deployment options can come into play and under what circumstances.
Also, we will experiment with the multi-region deployment options by using a Gitpod sandbox that bootstraps clusters with a desired configuration, creates schemas for our microservices, and provides sample instructions for data querying and management.
Profile and Messaging Services: A Single Geo-Partitioned Database
The Profile and Messaging microservices are always under high load. Users from all over the world send and read thousands of messages per second, and user profile data is requested by the other dependent microservices continuously. This means that both the Profile and Messaging microservices must perform at high speed across the globe with no interruption. Additionally, the services and their data layer must scale up and out easily within a geographic location. So, if the load increases only in Europe, we want to scale up (or out) just in that location by adding more CPUs, storage, RAM, or nodes, while leaving the infrastructure in other regions untouched.
These microservices also have to comply with data residency requirements in certain geographies. For instance, if our corporate messenger wants to operate in Europe, it must comply with the GDPR policies by storing profiles, messages, and other personal data in European data centers.
But if we push these requirements down to our data layer, which multi-region deployment option is the most suitable? A single geo-partitioned cluster is the best fit.
Figure 2. Multi-Region Geo-Partitioned YugabyteDB Deployment.
Our single database cluster spreads across multiple distant geographic locations - North America, Europe, and Asia-Pacific (APAC). Every geographic location has its own group of nodes that, for the sake of high availability, can span one or more local regions, such as the US Midwest, East, and South cloud regions in North America.
Data is pinned to a specific group of nodes based on the value of a partitioning column. In our case, the partitioning column is country. For instance, if country=France, a message will automatically be stored in a European region. But if country=Japan, the message will be located in an APAC data center.
Now, let’s get down to code and see how this multi-region deployment type works in practice. We will take advantage of our Gitpod sandbox that can bootstrap a sample YugabyteDB cluster in the geo-partitioned mode.
First, we need to configure our YugabyteDB nodes and microservices schemas:
1. Upon startup, cluster nodes are placed in one of the cloud regions belonging to a specific geographic location: us-west-1 for North America, eu-west-1 for Europe, and ap-south-1 for APAC (see the Deployment#1 (Geo-Distributed) task for details):
"placement_cloud=aws,placement_region=us-west-1,placement_zone=us-west-1a"
"placement_cloud=aws,placement_region=eu-west-1,placement_zone=eu-west-1a"
"placement_cloud=aws,placement_region=ap-south-1,placement_zone=ap-south-1a"
2. Region-specific tablespaces are created and match the placement regions of our cluster nodes:
CREATE TABLESPACE americas_tablespace WITH (
replica_placement='{"num_replicas": 1, "placement_blocks":
[{"cloud":"aws","region":"us-west-1","zone":"us-west-1a","min_num_replicas":1}]}'
);
CREATE TABLESPACE europe_tablespace WITH (
replica_placement='{"num_replicas": 1, "placement_blocks":
[{"cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","min_num_replicas":1}]}'
);
CREATE TABLESPACE asia_tablespace WITH (
replica_placement='{"num_replicas": 1, "placement_blocks":
[{"cloud":"aws","region":"ap-south-1","zone":"ap-south-1a","min_num_replicas":1}]}'
);
3. All tables from the Profile and Messaging schemas are partitioned across those tablespaces using the country column. Here is what the DDL statements look like for the Message table:
CREATE TABLE Message(
id integer NOT NULL DEFAULT nextval('message_id_seq'),
channel_id integer,
sender_id integer NOT NULL,
message text NOT NULL,
sent_at TIMESTAMP(0) DEFAULT NOW(),
country text NOT NULL,
PRIMARY KEY(id, country)
) PARTITION BY LIST(country);
CREATE TABLE Message_Americas
PARTITION OF Message
FOR VALUES IN ('USA', 'Canada', 'Mexico') TABLESPACE americas_tablespace;
CREATE TABLE Message_Europe
PARTITION OF Message
FOR VALUES IN ('United Kingdom', 'France', 'Germany', 'Spain') TABLESPACE europe_tablespace;
CREATE TABLE Message_Asia
PARTITION OF Message
FOR VALUES IN ('India', 'China', 'Japan', 'Australia') TABLESPACE asia_tablespace;
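As a quick sanity check (not part of the original sandbox steps), you can also query the parent Message table with a country filter. Thanks to standard PostgreSQL partition pruning, the plan should only touch the matching regional partition; the exact plan output depends on the YugabyteDB version:
-- Only the Message_Asia partition should appear in the plan,
-- since 'Japan' belongs to the Message_Asia partition list.
EXPLAIN SELECT message FROM Message WHERE country = 'Japan';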
Next, once the cluster is configured and the schema is created, we can try it out:
1. Exchange a few messages within the APAC region (country=India):
INSERT INTO Message (channel_id, sender_id, message, country) VALUES
(8, 9, 'Prachi, the customer has a production outage. Could you join the line?', 'India');
INSERT INTO Message (channel_id, sender_id, message, country) VALUES
(8, 10, 'Sure, give me a minute!', 'India');
2. Confirm the messages were stored only in the APAC location by querying the Message_Asia partition directly:
SELECT c.name, p.full_name, m.message FROM Message_Asia as m
JOIN Channel_Asia as c ON m.channel_id = c.id
JOIN Profile_Asia as p ON m.sender_id = p.id
WHERE c.id = 8;
name | full_name | message
--------------------------+---------------+------------------------------------------------------------------------
on_call_customer_support | Venkat Sharma | Prachi, the customer has a production outage. Could you join the line?
on_call_customer_support | Prachi Garg | Sure, give me a minute!
(2 rows)
3. Double-check that the messages were not replicated to the North America location by querying the Message_Americas partition:
SELECT c.name, p.full_name, m.message FROM Message_Americas as m
JOIN Channel_Americas as c ON m.channel_id = c.id
JOIN Profile_Americas as p ON m.sender_id = p.id
WHERE c.id = 8;
name | full_name | message
------+-----------+---------
(0 rows)
That’s it. Simple. But some of you might be left with one unanswered question: if all the data is stored in predefined geographic locations, why don’t we deploy multiple independent clusters in each location? Because with a single geo-partitioned cluster we can run cross-region queries through a single connection endpoint to the database, and there are use cases for this. Check out our Gitpod sandbox to learn more about cross-region queries and the geo-partitioned deployment option.
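For illustration, here is a minimal sketch of such a cross-region query (the aggregate itself is hypothetical and not part of the sandbox). Because the parent Message table spans all three tablespaces, a single SELECT against it reaches every geographic location through one endpoint:
-- A cross-region aggregate over all geo-partitions (Americas, Europe, Asia),
-- issued through a single connection endpoint.
SELECT country, count(*) AS messages_sent
FROM Message
GROUP BY country
ORDER BY messages_sent DESC;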
Reminders Service: Single Database With Read Replicas
The Reminders microservice has more relaxed requirements than the Profile and Messaging microservices. It neither experiences thousands of requests per second nor needs to comply with data residency regulations. This allows us to consider another multi-region deployment option: a single cluster with read replicas.
Figure 3. Multi-Region Deployment With Read Replicas.
We deploy a single database cluster in one geographic location: North America. The microservice needs to tolerate cloud region-level outages; thus, the cluster is “stretched” across several regions in relatively close proximity (e.g., US Midwest, East, and South). We can define one of the regions as the preferred zone to remain highly performant (we’ll talk more about this in a later section).
Next, we deploy read replica nodes in distant geographic locations - one in Europe and one in Asia. The primary cluster replicates data to the read replicas asynchronously, which is acceptable for the Reminders service.
In such a data layer configuration, writes always go to the primary cluster. Even if a user who is based in Singapore creates a reminder like “Remind me about this discussion in 1 hour”, the reminder will still be sent and stored in our North American cluster. However, we don’t need to connect to the primary cluster’s endpoint in order to send the reminder. The user’s browser/application will be connected to the read replica and the replica will automatically forward the write to the primary cluster for us!
The read replicas enable fast reads across all geographic locations. For instance, our microservice has a background task that wakes up every minute, pulls all reminders from the database that have just expired, and sends notifications to the users. An instance of this task runs in every distant geographic location - one in North America, one in Europe, and one in APAC. North America’s instance pulls America-specific reminders from the primary cluster, while the European and APAC instances read the reminders (belonging to the users from their regions) from the read replica nodes. This is how read replicas help reduce network traffic across distant locations.
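As a rough sketch (assuming the Reminder schema created in the walkthrough below), the query such a background task runs might look as follows; the batching and notification logic stay in the microservice itself:
-- Poll for reminders whose notification time has passed;
-- each regional task instance runs this against its local endpoint.
SELECT id, profile_id, message_id
FROM Reminder
WHERE notify_at <= now()
ORDER BY notify_at
LIMIT 100;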
Now let’s use our Gitpod sandbox to experiment with this type of multi-region database deployment. The sandbox starts a single-node primary cluster and two read replicas (see the Deployment#2 (Read Replicas) task for details).
Once the deployment is ready:
1. We create a schema for the Reminders microservice:
CREATE TABLE Reminder(
id integer NOT NULL DEFAULT nextval('reminder_id_seq'),
profile_id integer,
message_id integer NOT NULL,
notify_at TIMESTAMP(0) NOT NULL DEFAULT NOW(),
PRIMARY KEY(id)
);
2. Then we add the first reminder to our primary cluster:
ysqlsh -h 127.0.0.4 (connecting to the primary cluster’s endpoint)
INSERT INTO Reminder (profile_id, message_id, notify_at)
VALUES (5, 1, now() + interval '1 day');
3. Finally, we can connect to an APAC read replica node and confirm the reminder was replicated there:
ysqlsh -h 127.0.0.6 (connecting to the APAC replica’s endpoint)
SELECT * FROM Reminder;
id | profile_id | message_id | notify_at
----+------------+------------+---------------------
1 | 5 | 1 | 2022-04-02 18:55:02
(1 row)
To learn more, check out the Gitpod sandbox that has a Reminders-specific tutorial.
Status Service: Multiple Databases with Async Replication
The Status microservice has requirements similar to those of the Reminders service. It needs to be highly available and scalable but doesn’t need to comply with data residency requirements. However, we project that users will change their statuses ('Active', 'Busy', 'In a Meeting', 'Vacationing', 'Out Sick') much more frequently than they schedule reminders; thus, we need both fast reads and fast writes. With a truly global user base, we can’t expect fast writes through read replicas, since those writes are forwarded to the primary cluster. That’s why the Status service will use another multi-region deployment option: multiple clusters with async xCluster replication.
Figure 4. Multi-Region Deployment of Multiple Standalone Clusters.
A multi-node standalone cluster is deployed in every distant location - North America, Europe, and APAC. Changes are replicated synchronously within each cluster and asynchronously between the clusters. We relax the high availability requirements for this microservice: each cluster is stretched only across multiple availability zones within its own region, not across several regions. If the cluster’s region goes down, so does the cluster, and some recent status changes might not have been replicated to the other clusters yet. But missed status changes are not mission-critical information, and we are ready to lose those updates in exchange for faster performance while all the clusters operate normally.
Let’s use the same Gitpod sandbox to see this type of deployment in action. Note that, at the time of writing, bidirectional replication between more than two clusters is not yet supported. As a result, our sandbox provisions two clusters (see the Deployment#3 (xCluster Replication) task).
Once the clusters are ready:
1. We create the Status service schema in each cluster:
CREATE TYPE profile_status AS ENUM ('Active', 'Busy', 'In a Meeting', 'Vacationing', 'Out Sick');
CREATE TABLE Status (
profile_id integer PRIMARY KEY,
status profile_status NOT NULL DEFAULT 'Active',
set_at TIMESTAMP(0) NOT NULL DEFAULT NOW()
);
2. Then we finish setting up the bidirectional replication between the clusters (a sketch of the commands appears after this walkthrough).
3. Next, a user from the USA changes his status. The status is updated in the cluster closest to the user:
ysqlsh -h 127.0.0.7 (connecting to the North America cluster’s endpoint)
UPDATE Status SET status = 'Active' WHERE profile_id = 3;
4. Finally, the update is replicated to the European cluster. The user’s colleagues in Europe can read the status from the European cluster:
ysqlsh -h 127.0.0.8 (connecting to the European cluster’s endpoint)
SELECT * FROM Status;
profile_id | status | set_at
------------+--------+---------------------
3 | Active | 2022-03-30 15:10:26
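For reference, the bidirectional setup from step 2 boils down to configuring xCluster replication in both directions with yb-admin. The sketch below uses placeholder master addresses, universe UUIDs, and table IDs, so treat it as an outline rather than exact commands, and check the sandbox and the YugabyteDB docs for the current syntax:
# Replicate the Status table from the North American cluster (A) to the European cluster (B)...
yb-admin -master_addresses <cluster_B_master_addresses> \
  setup_universe_replication <cluster_A_universe_uuid> \
  <cluster_A_master_addresses> <status_table_id_in_cluster_A>
# ...and from the European cluster (B) back to the North American cluster (A).
yb-admin -master_addresses <cluster_A_master_addresses> \
  setup_universe_replication <cluster_B_universe_uuid> \
  <cluster_B_master_addresses> <status_table_id_in_cluster_B>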
Review this section of our Gitpod tutorial to learn more about the xCluster-based replication setup.
Single “Stretched” Cluster Option
In the beginning, we provided a reference table with the four most widely used multi-region deployment options. We used a single geo-partitioned cluster for the Profile and Messaging microservices, a cluster with read replicas for the Reminders service, and multiple clusters with async replication for the Status microservice. But what’s missing from the list? The single “stretched” cluster.
Due to high latencies between distant geographic locations (i.e., North America, Europe, and APAC), a single “stretched” cluster across those faraway regions is not a suitable option for our microservices: read and write latencies for cross-region queries might be prohibitively high. That’s why our services built their data layers on the other multi-region deployment options.
But if you paid attention to the details, you might object that the single “stretched” cluster option was, in fact, used by our microservices. And you would be right! We did use that multi-region deployment option, but for regions in relatively close proximity.
Figure 5. Multi-Region Cluster Across Regions in Close Proximity.
For instance, the primary cluster of the read replica deployment (the one we used for the Reminders service) spans three regions in close proximity: US Midwest, East, and South. With this configuration, the service becomes tolerant to region-level outages without taking a significant performance penalty. We remain highly performant by designating a preferred region with set_preferred_zones. For example, if US Midwest is the preferred region, it will host the primary master process and all tablet leaders, which minimizes network round trips across regions.
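As a sketch (the cloud, region, and zone identifiers below are placeholders that mirror the article’s fictional US Midwest region), designating the preferred region with yb-admin might look like this:
# Mark the US Midwest zone as preferred so that tablet leaders are placed there;
# the other regions keep full replicas and take over in case of a failure.
yb-admin -master_addresses <master_addresses> \
  set_preferred_zones aws.us-midwest-1.us-midwest-1a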
Closing Words
As we discovered, various multi-region database deployment options exist for a reason. We don’t need to pick one option and use it for all use cases. By breaking down the Slack-like messenger into microservices, we’ve seen that it’s up to every microservice to decide how to architect its multi-region data layer. And, again, you can try those options out on your personal laptop by using the Gitpod sandbox.
This concludes the deep dive into the data layer of our Slack-like corporate messenger. If you are curious about the design and architecture of the messenger’s upper layer (i.e., deployment and coordination of microservices across multiple regions, multi-region APIs layer, and usage of global load balancers), then leave a note in the comments section.