How to Maximize Azure Cosmos DB Availability

In this article, we will learn how to improve Azure Cosmos DB availability for applications that have zero tolerance for downtime.

Prem Kumar Mani

Feb. 06, 25 · Analysis

Most e-commerce applications have zero tolerance for downtime. Any disruption to an application resource can affect the site's overall availability metrics. Azure Cosmos DB is one of the major NoSQL databases used across the industry. Although Azure Cosmos DB itself provides a minimum of 99.99% availability for a single region without availability zones, how can we further improve database availability with the options that Azure Cosmos DB offers?

Multi-Region Read and Write

Serving reads from a single region creates a single point of failure and reduces availability. Read-heavy applications should therefore enable multi-region reads at a minimum, even when multi-region writes are not an option for the application. Multi-region writes, however, provide greater availability for both read-heavy and write-heavy applications.

With multi-region write capability, you can enable multi-master replication, where all configured regions can serve as write endpoints.

Best Practices

Select regions close to the region where the application is deployed.
Configure multiple preferred regions based on the application's requirements to enhance availability.
Set more than one preferred region in the application for reads and writes to improve availability and reduce latency.
Order the preferred regions so that the application's current region comes first, followed by the nearest regions.

Application Deployed in West US 2

    Java

// Configure an application deployed in West US 2 as below
import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import java.util.Arrays;

// ...

// Regions are tried in list order: the local region first, then the fallback
CosmosClientBuilder clientBuilder = new CosmosClientBuilder()
    .endpoint(accountEndpoint)
    .key(accountKey)
    .preferredRegions(Arrays.asList("West US 2", "East US"));

CosmosClient client = clientBuilder.buildClient();

Application Deployed in East US

    Java

// Configure an application deployed in East US as below
import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import java.util.Arrays;

// ...

// Regions are tried in list order: the local region first, then the fallback
CosmosClientBuilder clientBuilder = new CosmosClientBuilder()
    .endpoint(accountEndpoint)
    .key(accountKey)
    .preferredRegions(Arrays.asList("East US", "West US 2"));

CosmosClient client = clientBuilder.buildClient();
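The two configurations above differ only in the order of the region list. The following is a minimal sketch, assuming the deployment region is known at startup (for example, from an environment variable), of deriving that order from one shared list instead of maintaining a separate list per deployment. `PreferredRegions` and `order` are hypothetical names for illustration and are not part of the Azure SDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not an Azure SDK class): puts the local region first. */
final class PreferredRegions {
    private PreferredRegions() {}

    /**
     * Returns the account regions with localRegion moved to the front.
     * The remaining regions keep their input order, which is assumed
     * to already be sorted nearest-first.
     */
    static List<String> order(String localRegion, List<String> accountRegions) {
        List<String> ordered = new ArrayList<>();
        if (accountRegions.contains(localRegion)) {
            ordered.add(localRegion);
        }
        for (String region : accountRegions) {
            if (!region.equals(localRegion)) {
                ordered.add(region);
            }
        }
        return ordered;
    }
}
```

The resulting list could then be passed to the client builder's preferred-regions setting, so that every deployment shares the same code and differs only in its local region value.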

Conclusion

Enabling multi-region reads and writes provides greater availability. Beyond that, directing the application's reads and writes to the region closest to where it is deployed, and configuring more than one preferred region, lets the application fall back to an available region without any manual intervention.

Consistency Levels

Select consistency levels based on the application's requirements. Higher consistency expectations typically result in reduced availability. If the application demands strong data consistency, ensure it can tolerate potential higher latencies. Conversely, if weaker consistency is acceptable, the application can benefit from improved throughput and availability.
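To make the trade-off concrete, Azure Cosmos DB defines five consistency levels, from strongest to weakest: Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual. The sketch below models that ordering in plain Java. It also assumes, based on my reading of the Azure documentation, that a client may only relax the account's default level and never strengthen it. `ConsistencyChoice` and `canRelaxTo` are hypothetical names and are not SDK types.

```java
/** Hypothetical model of the consistency spectrum (not an Azure SDK type). */
final class ConsistencyChoice {
    private ConsistencyChoice() {}

    /** Declared from strongest to weakest, so a higher ordinal means weaker. */
    enum Level { STRONG, BOUNDED_STALENESS, SESSION, CONSISTENT_PREFIX, EVENTUAL }

    /**
     * Returns true when the requested level is the same as, or weaker than,
     * the account default. Assumption: clients can only relax the default.
     */
    static boolean canRelaxTo(Level accountDefault, Level requested) {
        return requested.ordinal() >= accountDefault.ordinal();
    }
}
```

For example, with an account default of Session, a read could be relaxed to Eventual, but a request for Strong would be rejected by this check.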

Conclusion

Choosing the right consistency level depends entirely on the application's needs. Although stronger consistency may involve some availability trade-off at the database level, choosing a stronger consistency level should not affect the overall availability of the application.

Failover

Manual Failover

Developers or associates can log in to the portal and manually fail over to the next available region during an outage in the region the application is currently connected to. Though this option provides availability to some extent, it requires manual intervention to fail over, which can impact the overall site availability metrics.

Service-Managed Failover

Enabling service-managed failover allows Cosmos to automatically switch to the next available region based on the priority configured in the portal. This option eliminates the need for any application changes during the failover process.
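As an illustration only, the following sketch shows how a priority-ordered region list resolves to a failover target when some regions are unavailable. It is not how the service is implemented. The class name, the method name, and the region names in the usage note are assumptions made for the example.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical illustration of priority-based failover selection. */
final class FailoverPriority {
    private FailoverPriority() {}

    /**
     * Returns the first region, in priority order, that is not in the
     * unavailable set, or an empty Optional when every region is down.
     */
    static Optional<String> nextRegion(List<String> priorityOrder, Set<String> unavailable) {
        return priorityOrder.stream()
                .filter(region -> !unavailable.contains(region))
                .findFirst();
    }
}
```

With priorities `["West US 2", "East US", "Central US"]` and an outage in West US 2, the selection resolves to East US. With no outage, it stays on West US 2.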

Conclusion

Though both options improve availability, service-managed failover gives the flexibility of failing over to the next available region without any change to the application deployment.

Partition Key and Indexes

Defining a partition key in Azure Cosmos DB is crucial before running any application on it. Cosmos DB is highly efficient for read-intensive applications, so it's essential to consider the lookup criteria and define the queries for reading records from the database before integrating Cosmos DB into your application.
By default, every item in a Cosmos DB container is automatically indexed. However, excluding certain items or fields from indexing can help reduce the consumption of Request Units (RUs). It is just as important to add fields for indexing as it is to remove indexes on fields that aren't required to be indexed.
Avoid storing excessively large items in Azure Cosmos DB.
Minimize cross-partition queries whenever possible.
Ensure queries include filters to improve efficiency.
Avoid querying the same partition key repeatedly; instead, implement a caching layer for such use cases.
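The caching recommendation above can be sketched as a small read-through cache with a time-to-live. This is a minimal, in-memory sketch under stated assumptions: the `loader` function stands in for the actual Cosmos DB point read, and `ReadThroughCache` is a hypothetical class rather than anything the SDK provides. A production setup would more likely use a dedicated caching library or service.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical read-through cache; the loader stands in for a database read. */
final class ReadThroughCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final long ttlMillis;
    private final LongSupplier clock;

    ReadThroughCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /**
     * Returns the cached value while it is fresh; otherwise calls the loader
     * and caches the result. Concurrent callers may both load the same key;
     * that is acceptable for a sketch but worth tightening in production.
     */
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && now < cached.expiresAtMillis) {
            return cached.value;
        }
        V fresh = loader.apply(key);
        entries.put(key, new Entry<>(fresh, now + ttlMillis));
        return fresh;
    }
}
```

Passing the clock as a parameter (for example, `System::currentTimeMillis`) keeps expiry testable. Repeated reads of a hot key within the TTL then consume no Request Units.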

Throughput Autoscale

Azure Cosmos DB supports both standard (manual) and autoscale throughput at the container level.

Manual Throughput

The application team provisions a fixed number of RU/s. Once requests consume all of the provisioned RU/s, further requests are throttled (rate-limited with HTTP 429) and must be retried after a delay. Increasing the throughput requires manual intervention.
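Throttled requests are typically handled by waiting and retrying. The Azure SDKs already retry throttled requests a limited number of times by default, so the following is only a sketch of the underlying idea in plain Java. `ThrottledException` is a hypothetical stand-in for a rate-limited response carrying a retry-after hint, and `ThrottleRetry.call` is an illustrative helper, not an SDK API.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper for rate-limited operations. */
final class ThrottleRetry {
    private ThrottleRetry() {}

    /** Stand-in for a throttled (HTTP 429) response with a retry-after hint. */
    static final class ThrottledException extends RuntimeException {
        final long retryAfterMillis;

        ThrottledException(long retryAfterMillis) {
            super("request rate too large");
            this.retryAfterMillis = retryAfterMillis;
        }
    }

    /**
     * Runs the operation, waiting for the suggested delay after each
     * throttled attempt. Rethrows once maxAttempts has been reached.
     */
    static <T> T call(Supplier<T> operation, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (ThrottledException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(e.retryAfterMillis);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while backing off", interrupted);
                }
            }
        }
    }
}
```

Retries only hide short bursts. Sustained throttling usually means the provisioned RU/s are too low for the workload.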

Autoscale Throughput

The application team configures the maximum throughput it needs, and Cosmos DB scales the throughput automatically based on the traffic received. Requests that exceed the configured autoscale maximum are throttled and must be retried after a delay.
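A short worked example of the autoscale range: to my understanding of the Azure documentation, autoscale scales between 10% of the configured maximum and the maximum itself, so a 4,000 RU/s maximum scales between 400 and 4,000 RU/s. The sketch below encodes that arithmetic. `AutoscaleRange` and its methods are hypothetical names, and the 10% floor is the stated assumption.

```java
/** Hypothetical helper illustrating the autoscale throughput range. */
final class AutoscaleRange {
    private AutoscaleRange() {}

    /** Lowest throughput autoscale scales down to, assumed to be 10% of the maximum. */
    static int minimumRuPerSecond(int maxRuPerSecond) {
        if (maxRuPerSecond <= 0) {
            throw new IllegalArgumentException("maxRuPerSecond must be positive");
        }
        return maxRuPerSecond / 10;
    }

    /** True when demand exceeds the configured maximum, so requests get throttled. */
    static boolean wouldThrottle(int maxRuPerSecond, int demandRuPerSecond) {
        return demandRuPerSecond > maxRuPerSecond;
    }
}
```

Under this model, a spike to 3,500 RU/s is absorbed by a 4,000 RU/s maximum, while a spike to 5,000 RU/s is still throttled.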

Conclusion

Though both options can serve production traffic, autoscale throughput gives the flexibility of handling varying traffic up to the configured maximum without manual intervention, which reduces throttling and its impact on availability.

Backup and Restore

Azure Cosmos DB enables periodic backups by default for all accounts.

Periodic Backup

Backups are taken at a configured interval, specified in minutes, with a minimum of 1 hour and a maximum of 24 hours. Periodic backup also provides options to make the backup storage geo-redundant, zone-redundant, or locally redundant. To restore from a backup, the application team needs to contact support.

Continuous Backup

The continuous backup option stores backups in the regions where the Cosmos DB account is configured, and it retains data for either the last 7 days or the last 30 days. It also provides point-in-time restore.

Conclusion

Opting for continuous backup ensures faster restoration of the database. This eliminates the need for back-and-forth interactions with support to restore the database and allows applications to restore it to any region (where backups exist) at a specific point in time.

In conclusion, while availability metrics are crucial for any application, they come at a cost. Options that offer higher availability than the standard configuration incur additional expenses. Moreover, the above-mentioned options may not be necessary or suitable for all applications using Cosmos. However, it is essential to adopt and implement best practices in Azure Cosmos to optimize availability effectively.

Opinions expressed by DZone contributors are their own.
