Azure Cosmos DB: Use Cases and Trade-Offs
Azure Cosmos DB is a fully managed, elastically scalable, and globally distributed database with a multi-model approach with the ability to provide visual data.
Join the DZone community and get the full member experience.Join For Free
Azure Cosmos DB is a fully managed, elastically scalable, and globally distributed database with a multi-model approach, and provides you with the ability to use document, key-value, wide-column, or graph-based data.
We will drill further into the multi-model capabilities and explore the options that are available to store and access data. Hopefully, it can help you make an informed decision on which API are the right choice.
- Core (SQL) API: Flexibility of a NoSQL document store combined with the power of SQL for querying.
- MongoDB API: Supports the MongoDB wire protocol so that existing MongoDB clients continue to work with Azure Cosmos DB as if they are running against an actual MongoDB database.
- Cassandra API: Supports the Cassandra wire protocol so that existing Apache drivers compliant with CQLv4 continue to work with Azure Cosmos DB as if they are running against an actual Cassandra database.
- Gremlin API: Supports graph data with Apache TinkerPop (a graph computing framework) and the Gremlin query language.
- Table API: Provides premium capabilities for applications written for Azure Table storage.
You can read further on the some of the Key Benefits here.
Before diving in, let's look at some of the common scenarios for which Azure Cosmos DB is used for. This is by no means an exhaustive list.
Modern gaming services need to deliver customized and personalized content like in-game stats, social media integration, and high-score leaderboards. As a fully managed offering, Azure Cosmos DB requires minimal setup and management to allow for rapid iteration, and reduced time to market.
E-commerce platforms and retail applications store catalog data and event sourcing in order for processing pipelines.
Applications that provide personalized experiences can be quite complex. They need to be able to retrieve user-specific settings effectively to render UI elements and experiences quickly.
There is no perfect formula! At times, the choice will be clear, but other scenarios may require some analysis.
A rather obvious point to consider is whether there are existing applications that use any of the supported APIs via the wire protocol (i.e. Cassandra and MongoDB). If the answer is yes, you should consider using the specific Azure Cosmos DB API, as that will reduce your migration tasks, and make the best use of previous experience in your team.
If you expect the schema to change a lot, you may want to leverage a document database, making Core (SQL) a good choice (although the MongoDB API should also be considered).
If your data model consists of relationships between entities with associated metadata, you're better off using the graph support in Azure Cosmos DB Gremlin API.
If you are currently using Azure Table Storage, the Core (SQL) API would be a better choice, as it offers a richer query experience, with improved indexing over the Table API. If you don't want to rewrite your application, consider migrating to the Azure Cosmos DB Table API.
Let's look at each of the APIs and apply some of this.
Core (SQL) API is the default Azure Cosmos DB API. You can store data using JSON documents to represent your data, but there are a couple ways you can retrieve it:
- SQL queries: Write queries using the Structured Query Language (SQL) as a JSON query language
- Point reads: Key/value style lookup on a single item ID and partition key (these are cheaper and faster than SQL queries)
For many applications, the semi-structured data model can provide the flexibility they need. For example, an e-commerce set up with a large number of product categories will need to add new product categories and support operations (search, sort, etc.) across many product attributes. You can model this with a relational database, but, the constantly evolving product categories may require downtime to update the table schemas, queries, and databases.
With the Core (SQL), introducing a new product category is as simple as adding a document for the new product without schema changes or downtimes. The Azure Cosmos DB MongoDB API is also suitable for these requirements, but Core (SQL) API has an advantage since it supports SQL-like queries on top of a flexible data model.
Consider the example of a blogging platform where users can create posts, like, and add comments to those posts.
You can represent a
post as such:
As well as
To handle likes and comments, you can use a stored procedure:
If you want to learn more, refer to Working with JSON in Azure Cosmos DB
Azure Cosmos DB implements the wire protocol for MongoDB which allows transparent compatibility with native MongoDB client SDKs, drivers, and tools. This means that any MongoDB client driver (as well as existing tools such as Studio 3T) that understands this protocol version should be able to natively connect to Cosmos DB.
For example, if you have an existing MongoDB database as the data store for purchase orders processing application to capture failed and partial orders, fulfillment data, shipping status with different formats. Due to increasing data volumes, you want to migrate to a scalable cloud-based solution and continue using MongoDB. It makes perfect sense to use Azure Cosmos DB's API for MongoDB. But you will want to do so with minimal code changes to the existing application and migrate the current data with as little downtime as possible.
With Azure Database Migration Service, you can perform an online (minimal downtime) migration and elastically scale the provisioned throughput and storage for your Cosmos databases based on your need and pay only for the throughput and storage you need. This leads to significant cost savings.
Once you've migrated your data, you can continue to use your existing application. Here is an example using MongoDB .NET driver:
Initialize the client:
Retrieve the database and the collection:
Insert a task into the collection:
Retrieve all tasks:
You should also carefully consider how MongoDB write/read concerns are mapped to the Azure Cosmos consistency levels, manage indexing, and leverage Change feed support.
Please note that the Core (SQL) API would have been an appropriate choice if there wasn't a requirement to reuse existing code and import data from the existing MongoDB database.
If your application needs to handle high-volume real-time data, Apache Cassandra is an ideal choice. With Azure Cosmos DB Cassandra API, you can use existing Apache drivers compliant with CQLv4, and in most cases, you should be able to switch from using Apache Cassandra to using Azure Cosmos DB's Cassandra API, by just changing a connection string. You can also continue to use Cassandra-based tools such as
For advanced analytics use cases, you should combine Azure Cosmos DB Cassandra API with Apache Spark. You can use the familiar Spark connector for Cassandra to connect to Azure Cosmos DB Cassandra API. Additionally, you will also need the Azure-Cosmos-Cassandra-Spark-helper helper library from Azure Cosmos DB for features such as custom connection factories and retry policies.
To connect to Cosmos DB, you can use the Scala API as such:
You can perform aggregations, such as
As part of a personalized customer experience, your application may need to provide recommendations for products on the website. For example, "people who bought product X also bought product Y." Your use cases might also need to predict customer behavior, or connect people with others with similar interests.
Azure Cosmos DB supports Apache Tinkerpop's graph traversal language (known as Gremlin). There are many use cases that give rise to common problems associated with a lack of flexibility and relational approaches. For example, managing connected social networks, Geospatial use cases such as finding a location of interest within an area or locate the shortest/optimal route between two location, or modelling network and connections between IoT devices as a graph in order to build a better understanding of the state of your devices and assets. On top of this, you can continue to enjoy features common to all the Azure Cosmos DB APIs such as global distribution, elastic scaling of storage and throughput, automatic indexing and query, and tuneable consistency levels.
For example, you could use the following commands to add three vertices for product and two edges for related-purchases to a graph:
Then, you can query the additional products that were purchased along with the 'Industrial Saw':
Here is what the results will look like:
In addition to traditional clients, you can use the graph bulk executor .NET library to perform bulk operations in Azure Cosmos DB Gremlin API. Using this will improve the data migration efficiency as compared to using a Gremlin client. Traditionally, inserting data with Gremlin will require the application to send a query at a time that will need to be validated, evaluated, and then executed to create the data. The bulk executor library will handle the validation in the application and send multiple graph objects at a time for each network request. Here is an example of how to create Vertices and Edges:
Although it is possible to model and store this data using the Core (SQL) API as JSON documents, it's not suitable for queries that need to determine relationships between entities (e.g. products).
If you have existing applications written for Azure Table storage, they can be migrated to Azure Cosmos DB by using the Table API with no code changes and take advantage of premium capabilities.
Moving your database from Azure Table Storage into Azure Cosmos DB with a low throughput could bring lots of benefits in terms of latency (single-digit millisecond for reads and writes), throughput, global distribution, comprehensive SLAs, and as well as cost (you can use consumption-based or provisioned capacity modes. Storing table data in Cosmos DB automatically indexes all the properties (without an index management overhead) as opposed to Table Storage that only allows for indexing on the Partition and Row keys.
The Table API has client SDKs available for .NET, Java, Python, and Node.js. For example, with the Java client, use a connection string to store the table endpoint and credentials:
.. and perform CRUD (create, read, update, delete) operations as such:
Refer to Where is Table API not identical with Azure Table storage behavior? for more details.
Azure Cosmos DB supports offers a lot of options and flexibility in terms of APIs, and has its pros and cons depending on your use-case and requirements. Although the Core (SQL) is quite versatile and applicable to a wide range of scenarios, if your data is better represented in terms of relationships, then the Gremlin (graph) API is a suitable choice.
In the case that you have existing applications using Cassandra or MongoDB, then migrating them to the respective Azure Cosmos DB APIs provides a path of least resistance with added benefits. In the case that you want to migrate from Azure Table Storage and do not want to refactor your application to use the Core (SQL) API, remember that you have the option of choosing the Azure Cosmos DB Table API, that can provide API compatibility with Table Storage as well as capabilities such as guaranteed high availability, automatic secondary indexing and much more!
Published at DZone with permission of Abhishek Gupta, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.