DZone


Azure Cosmos DB — A to Z

In this article, explore Azure Cosmos DB and look at consistency, automatic indexing, and more.

By Rajat Toshniwal · Dec. 13, 19 · Tutorial


Introduction

Cosmos DB can be considered one of the most lethal weapons in Azure’s arsenal. It comes with a bunch of features that make this service stand out among Azure's various database offerings.

Recently, I used it for a flash sale in which the application had to handle more than 6,000 hits per second while storing user and order details from multiple regions in real time.

We evaluated multiple databases and settled on Cosmos DB for its high scalability, ease of use, and globally distributed approach. It can be a good choice for highly data-intensive systems such as IoT and telematics or online gaming applications. In this blog, I touch upon the core concepts of Cosmos DB.

Cosmos DB Features

Here I’m highlighting some of the features that will help you identify whether it's a good fit for your use case.



  • Globally Distributed: Cosmos DB can be deployed across regions. Suppose you are currently running only in Australia East; the data can easily be replicated to other regions such as Australia Central or US Central. Multi-master (multi-region write) setups are supported.
  • Highly Scalable: The read and write capacity of the database can be easily managed based on RUs (Request Units), described later in this tutorial. This helps when you need to scale for an event like a flash sale.
  • Low Latency: Azure Cosmos DB guarantees end-to-end latency of reads under 10 ms and indexed writes under 15 ms at the 99th percentile within the same Azure region.
  • Multi-Model: Cosmos DB is a multi-model database, meaning it can store data as key-value pairs, documents, graphs (Gremlin), and column families. There are multiple models/APIs that can be utilized:
  • Cassandra API and MongoDB API: If you are already using Cassandra or MongoDB, you can migrate to Cosmos DB without many code changes. The Cassandra API provides a wide-column format.
  • SQL API: If you are familiar with SQL queries, or if you're starting from scratch, you can go with the SQL API. It is the most native way of working with Cosmos.
  • Table API: Store your data as key-value pairs. This uses the standard Azure Table API, which follows the same schema and design as Azure Table storage.
  • Gremlin API: This model is used to create graph databases that hold relationships. It is an open-source API based on Apache TinkerPop.
  • High Availability: Data can be replicated across regions. Within a single region, the SLA is 99.99%, whereas for a multi-region setup it is 99.999%. Under the hood, data is distributed across partitions, and within each region every partition is replicated.
  • Consistency: Based on the CAP theorem, a distributed system can provide only two of the three guarantees: Consistency, Availability, and Partition tolerance. Developers need to understand the trade-offs. Cosmos DB provides multiple consistency options, described next.

Consistency Levels



  • Strong Consistency: A user never sees an uncommitted or partial write; reads are guaranteed to return the most recent committed version of an item. Highest consistency, but lower throughput, and write latency deteriorates further as you scale out.
  • Bounded Staleness: Instead of 100% consistency, this level allows a lag in operations or time, similar to reading from a lagging replica. For example, you can configure the database to allow a lag of 5 seconds. It is eventual, but you control the eventuality. Consistency is still high, but throughput is low.
  • Session Consistency: The default consistency. It is scoped to a client session: reads within the session are strongly consistent, while anything outside that session may or may not lag. This provides good performance and good throughput.
  • Consistent Prefix: Reads never see out-of-order writes, but they may be consistent only to some earlier point in time, and you neither control nor know the lag, if there is any. Normally the delays are within seconds. Good performance and excellent availability.
  • Eventual Consistency: There is no guarantee of how long it will take for the data to become consistent, and updates aren't guaranteed to arrive in order. This is also the highest-performing, lowest-cost, and highest-availability level.
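When calling the REST API directly, the account's default consistency can be weakened per request with the `x-ms-consistency-level` header (it can never strengthen the default). A minimal sketch; the token and date values are placeholders:

```python
# Per-request consistency override for the Cosmos DB REST API.
# The header may only weaken the account default, never strengthen it.
def read_headers(auth_token: str, rfc1123_date: str,
                 consistency: str = "Eventual") -> dict:
    """Build request headers for a point read at a weaker consistency level."""
    return {
        "Authorization": auth_token,            # built as shown in the samples below
        "x-ms-date": rfc1123_date,
        "x-ms-version": "2018-12-31",
        "x-ms-consistency-level": consistency,  # e.g. "Eventual", "Session"
    }

headers = read_headers("placeholder-token", "Fri, 29 Nov 2019 08:15:00 GMT")
```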


Automatic Indexing

The indexing policy is defined on a container and describes how the items within that container are indexed. The default is to index every property of each item; this can be customized by specifying only the required indexes.

The concept of an index is much the same as in any other database: it improves read performance and penalizes write performance. Be cautious and identify the right indexes for your application. More indexes also mean more stored data and more RUs consumed per write, which in turn means lower performance and higher cost. You can include or exclude properties from indexing by defining them in the include path and exclude path.

Include the root path to selectively exclude paths that don't need to be indexed. This is the recommended approach as it lets Azure Cosmos DB proactively index any new property that may be added to your model. Exclude the root path to selectively include paths that need to be indexed.
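As a sketch, the recommended include-the-root pattern looks like this in an indexing policy; the "/payload/*" path is a hypothetical large, unqueried property used for illustration, not something from this article:

```python
# Include the root so any new property is indexed automatically,
# then exclude the paths you never query.
# "/payload/*" is a hypothetical large property, for illustration only.
indexing_policy = {
    "automatic": True,
    "indexingMode": "Consistent",
    "includedPaths": [{"path": "/*"}],          # index everything by default
    "excludedPaths": [{"path": "/payload/*"}],  # skip the big unqueried blob
}
```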

Indexing Mode


  • Consistent: The index is updated synchronously as you create, update, or delete items.
  • None: Indexing is disabled on the container; use this when you don't need a secondary index.

Index Types


  • Hash: Supports efficient equality queries.
  • Range: Supports efficient equality queries, range queries, and order by queries.
  • Spatial: Supports efficient spatial (within and distance) queries. The data type can be Point, Polygon, or LineString.

Partition Keys

(Partitioning diagram from the Azure documentation)


Behind the scenes, Cosmos DB uses a distributed data-placement algorithm to scale the RUs/performance of the database. Every container is divided into logical partitions based on the partition key; a hash algorithm is used to divide and distribute the data. These logical partitions are in turn mapped to multiple physical partitions (hosted on multiple servers).

Placement of logical partitions over physical partitions is handled by Cosmos DB to efficiently satisfy the scalability and performance needs of the container. As RU needs increase, it increases the number of physical partitions (more servers).

As a best practice, you should choose a partition key that has a wide range of values and access patterns that are evenly spread across logical partitions. For example, if you are collecting data from multiple schools but 75% of it comes from a single school, then the school is not a good choice of partition key.
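The effect of a skewed key can be simulated with a toy hash-distribution sketch. Cosmos DB's real partitioning hash is internal; md5 here is only a stand-in, and the school/user key names are illustrative:

```python
import hashlib
from collections import Counter

def logical_to_physical(key: str, physical_partitions: int = 4) -> int:
    """Toy stand-in for the internal hash that assigns a partition-key
    value to one of N physical partitions."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % physical_partitions

# Skewed key: 75% of documents come from one school, so whichever
# physical partition school-A hashes to becomes a hot partition.
skewed = ["school-A"] * 75 + ["school-%d" % i for i in range(25)]
print(Counter(logical_to_physical(k) for k in skewed))

# High-cardinality key such as userId: load spreads far more evenly.
even = ["user-%d" % i for i in range(100)]
print(Counter(logical_to_physical(k) for k in even))
```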

Throughput

Cost depends heavily on the throughput you have provisioned. Remember: it is not what you consume, it is what you provision. Throughput is expressed in RUs (Request Units), which define the read and write capacity per container or database. RUs abstract the system resources, such as CPU, IOPS, and memory, that are required to perform database operations.

The following are the parameters that define the RU requirement:

  • Item size
  • Indexing
  • Item property count
  • Indexed properties
  • Consistency level
  • Query complexity
  • Script usage
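A rough sizing sketch only: Azure documents a 1 KB point read as costing about 1 RU, and a 1 KB write several times that (assumed here to be roughly 5 RU). The workload numbers below are hypothetical, and real costs vary with indexing, consistency, and query shape, so always measure:

```python
# Back-of-the-envelope RU/s estimate for a flash-sale-style workload.
# Assumed unit costs (approximate): ~1 RU per 1 KB point read,
# ~5 RU per 1 KB write; both scale roughly with item size.
reads_per_sec = 5000
writes_per_sec = 1000
item_size_kb = 2

ru_per_read = 1 * item_size_kb    # ~2 RU per read of a 2 KB item
ru_per_write = 5 * item_size_kb   # ~10 RU per write of a 2 KB item

required_ru_per_sec = reads_per_sec * ru_per_read + writes_per_sec * ru_per_write
print(required_ru_per_sec)  # 20000
```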

Cosmos DB throttles requests when the load demands more RU/s than you have provisioned. Rate-limited requests fail with a "Too Many Requests" response (status code 429), and the response includes a header (x-ms-retry-after-ms) whose value tells you how long to wait before retrying.
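A retry loop that honors the server's suggested wait is straightforward. This sketch assumes a requests-style response object; the 1-second fallback is an arbitrary default, not a documented value:

```python
import time

def with_retries(do_request, max_attempts: int = 5):
    """Run do_request() (a zero-argument callable returning a requests-style
    response) and retry on 429, sleeping for the interval the service
    suggests in the x-ms-retry-after-ms header."""
    res = None
    for _ in range(max_attempts):
        res = do_request()
        if res.status_code != 429:
            break
        wait_ms = int(res.headers.get("x-ms-retry-after-ms", "1000"))
        time.sleep(wait_ms / 1000.0)
    return res
```

The official SDKs perform this retry automatically; hand-rolling it is only needed when calling the REST API directly.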

Programmatic Access Sample (Python)

In this section, I provide a brief introduction to the SQL (REST) API, which supports full CRUD functionality. If a suitable SDK isn't available for your platform, you can fall back to this mode.

The most prominent and tricky part of the API access is the creation of the authorization string. Entities required to create the string are:

  • Master Key: Best practice is to keep it inside Azure Key Vault. At runtime, call the Vault APIs to fetch the master key and hold it in memory.
  • ResourceID: The path of the object you want to reference; more precisely, its parent path. For documents, the parent is the collection, so the resource link is the complete collection path: "dbs/{databasename}/colls/{collectionname}" (when reading a single document by ID, use the full document path instead).
  • Resource Type: The type of resource you want to access: dbs for databases, colls for collections, docs for documents.
  • Verb: The HTTP verb, such as "GET," "POST," or "PUT."
  • x-ms-date: The UTC date and time the message was sent, in RFC 1123 format, e.g. "Fri, 29 Nov 2019 08:15:00 GMT".

Authorization String Creation

  1. Create a payload string: Verb.toLowerCase() + "\n" + ResourceType.toLowerCase() + "\n" + ResourceLink + "\n" + Date.toLowerCase() + "\n" + "" + "\n" (the ResourceLink itself is case-sensitive and is not lowercased)
  2. Base64-decode the master key
  3. Generate a signature by applying an HMAC-SHA256 hash function to the payload with the decoded key, then base64-encode the digest
  4. URL-encode the string type={typeoftoken}&ver={tokenversion}&sig={hashsignature}
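The four steps can be collected into one helper, which the samples below inline by hand. The key used in this sketch is a placeholder, not a real Cosmos DB key:

```python
import base64
import hashlib
import hmac
import urllib.parse

def build_auth_token(verb: str, resource_type: str, resource_link: str,
                     rfc1123_date: str, master_key: str) -> str:
    """Build the master-key authorization token for the Cosmos DB REST API.
    The verb, resource type, and date are lowercased; the resource link is
    case-sensitive and must match the URL exactly."""
    payload = (verb.lower() + "\n" + resource_type.lower() + "\n" +
               resource_link + "\n" + rfc1123_date.lower() + "\n" + "" + "\n")
    key_bytes = base64.b64decode(master_key)           # step 2: decode the key
    signature = base64.b64encode(                      # step 3: HMAC-SHA256
        hmac.new(key_bytes, payload.encode("utf-8"), hashlib.sha256).digest()
    ).decode()
    return urllib.parse.quote(                         # step 4: URL-encode
        "type=master&ver=1.0&sig={}".format(signature))
```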

Samples

Authentication String Generation

Python

import base64
import hashlib
import hmac
import urllib.parse
from datetime import datetime

# Master key (placeholder value)
key = 'abcdefghijklmnopqrstuvwxyz=='

# Date generation, UTC / RFC 1123 format
now = datetime.utcnow().strftime('%a, %d %b %Y %H:%M:00 GMT')
print(now)

# Verb is "get", resource type is "docs",
# resource link is "dbs/testcosmosdb/colls/testcosmostable".
# The resource link is case-sensitive; lowercasing the whole payload
# works here only because these names contain no uppercase letters.
payload = ('get\ndocs\ndbs/testcosmosdb/colls/testcosmostable\n' + now + '\n\n').lower()

# Base64-decode the master key
payload = bytes(payload, encoding='utf8')
key = base64.b64decode(key.encode('utf-8'))

# HMAC-SHA256 hashing; base64-encode the digest
signature = base64.b64encode(hmac.new(key, msg=payload, digestmod=hashlib.sha256).digest()).decode()

# URL encoding
authStr = urllib.parse.quote('type=master&ver=1.0&sig={}'.format(signature))
print(authStr)



Database Creation

Python

import base64
import hashlib
import hmac
import json
import urllib.parse
from datetime import datetime

import requests

key = 'abcdefghijklmnopqrstuvwxyz=='
now = datetime.utcnow().strftime('%a, %d %b %Y %H:%M:00 GMT')

# Creating a database: verb "post", resource type "dbs",
# empty resource link (the account is the parent)
payload = ('post\ndbs\n\n' + now + '\n\n').lower()
payload = bytes(payload, encoding='utf8')
key = base64.b64decode(key.encode('utf-8'))
signature = base64.b64encode(hmac.new(key, msg=payload, digestmod=hashlib.sha256).digest()).decode()
authStr = urllib.parse.quote('type=master&ver=1.0&sig={}'.format(signature))
print(authStr)

headers = {
    'Authorization': authStr,
    'x-ms-date': now,
    'x-ms-version': '2017-02-22'
}
data = {'id': 'mydb'}
url = 'https://mycosmosdb.documents.azure.com/dbs'
res = requests.post(url, headers=headers, data=json.dumps(data))
print(res.content)



Collection Creation With Custom Index and Partition Keys

Python

import base64
import hashlib
import hmac
import json
import urllib.parse
from datetime import datetime

import requests

key = 'abcdefghijklmnopqrstuvwxyz=='
now = datetime.utcnow().strftime('%a, %d %b %Y %H:%M:00 GMT')

# Creating a collection: resource type "colls", parent is the database
payload = ('post\ncolls\ndbs/mydb\n' + now + '\n\n').lower()
payload = bytes(payload, encoding='utf8')
key = base64.b64decode(key.encode('utf-8'))
signature = base64.b64encode(hmac.new(key, msg=payload, digestmod=hashlib.sha256).digest()).decode()
authStr = urllib.parse.quote('type=master&ver=1.0&sig={}'.format(signature))

headers = {
    'Authorization': authStr,
    'x-ms-date': now,
    'x-ms-version': '2017-02-22'
}

# Collection definition with a custom indexing policy and partition key
data = {
    'id': 'mytable1',
    'indexingPolicy': {
        'automatic': True,
        'indexingMode': 'Consistent',
        'includedPaths': [
            {
                'path': '/userId/?',
                'indexes': [
                    {'dataType': 'String', 'precision': -1, 'kind': 'Range'}
                ]
            },
            {
                'path': '/"_ts"/?',
                'indexes': [
                    {'kind': 'Range', 'dataType': 'Number', 'precision': -1},
                    {'kind': 'Hash', 'dataType': 'String', 'precision': 3}
                ]
            }
        ],
        'excludedPaths': [
            {'path': '/*'}
        ]
    },
    'partitionKey': {
        'paths': ['/userId'],
        'kind': 'Hash',
        'Version': 2
    }
}
url = 'https://mycosmosdb.documents.azure.com/dbs/mydb/colls'
res = requests.post(url, headers=headers, data=json.dumps(data))
print(res.content)



Document Creation

Python

import base64
import hashlib
import hmac
import json
import urllib.parse
from datetime import datetime

import requests

key = 'abcdefghijklmnopqrstuvwxyz=='
now = datetime.utcnow().strftime('%a, %d %b %Y %H:%M:00 GMT')

# Creating a document: resource type "docs", parent is the collection
payload = ('post\ndocs\ndbs/mydb/colls/mytable1\n' + now + '\n\n').lower()
payload = bytes(payload, encoding='utf8')
key = base64.b64decode(key.encode('utf-8'))
signature = base64.b64encode(hmac.new(key, msg=payload, digestmod=hashlib.sha256).digest()).decode()
authStr = urllib.parse.quote('type=master&ver=1.0&sig={}'.format(signature))

headers = {
    'Authorization': authStr,
    'x-ms-date': now,
    'x-ms-version': '2017-02-22',
    # The partition-key value of the document being created
    'x-ms-documentdb-partitionkey': '["iasbjsb25"]'
}
data = {'id': 'order1', 'userId': 'iasbjsb25'}
url = 'https://mycosmosdb.documents.azure.com/dbs/mydb/colls/mytable1/docs'
res = requests.post(url, headers=headers, data=json.dumps(data))
print(res.content)



Read documents (Query)

Python

import base64
import hashlib
import hmac
import json
import urllib.parse
from datetime import datetime

import requests

key = 'abcdefghijklmnopqrstuvwxyz=='
now = datetime.utcnow().strftime('%a, %d %b %Y %H:%M:00 GMT')
payload = ('post\ndocs\ndbs/mydb/colls/mytable\n' + now + '\n\n').lower()
payload = bytes(payload, encoding='utf8')
key = base64.b64decode(key.encode('utf-8'))
signature = base64.b64encode(hmac.new(key, msg=payload, digestmod=hashlib.sha256).digest()).decode()
authStr = urllib.parse.quote('type=master&ver=1.0&sig={}'.format(signature))

headers = {
    'Authorization': authStr,
    'x-ms-date': now,
    'x-ms-version': '2018-12-31',
    'x-ms-documentdb-isquery': 'True',
    'Content-Type': 'application/query+json',
    'x-ms-documentdb-query-enablecrosspartition': 'True'
}

# The query references the @userId parameter declared below
data = {
    'query': 'SELECT * FROM c WHERE c.userId = @userId',
    'parameters': [{
        'name': '@userId',
        'value': 'iasbjsb25'
    }]
}
url = 'https://mycosmosdb.documents.azure.com/dbs/mydb/colls/mytable/docs'
res = requests.post(url, headers=headers, data=json.dumps(data))
print(res.content)



Read documents (By ID)

Python

import base64
import hashlib
import hmac
import urllib.parse
from datetime import datetime

import requests

key = 'abcdefghijklmnopqrstuvwxyz=='
now = datetime.utcnow().strftime('%a, %d %b %Y %H:%M:00 GMT')

# Reading a single document by ID: the resource link is the full document path
payload = ('get\ndocs\ndbs/mydb/colls/mytable/docs/order1\n' + now + '\n\n').lower()
payload = bytes(payload, encoding='utf8')
key = base64.b64decode(key.encode('utf-8'))
signature = base64.b64encode(hmac.new(key, msg=payload, digestmod=hashlib.sha256).digest()).decode()
authStr = urllib.parse.quote('type=master&ver=1.0&sig={}'.format(signature))

headers = {
    'Authorization': authStr,
    'x-ms-date': now,
    'x-ms-version': '2018-12-31',
    'x-ms-documentdb-partitionkey': '["iasbjsb25"]'
}
url = 'https://mycosmosdb.documents.azure.com/dbs/mydb/colls/mytable/docs/order1'
res = requests.get(url, headers=headers)
print(res.content)



Recommendations 

  • Cosmos DB performance can be monitored from the Metrics page or Azure Monitor. Reports and graphs can be filtered by timeframe, region, and container.
  • Replicate across the globe, in at least two regions, if the data is critical.
  • Create alerts to continuously track performance and RU requirements.
  • Create alerts to detect potential attacks.
  • Automated backups are built in; the retention period and frequency can be extended by using Azure Data Factory.
  • The Cosmos DB access key should be stored inside a vault, never inside the code.
  • Storage encryption at rest is managed by Microsoft.
  • Configure proper user permissions with IAM and follow the principle of least privilege.
  • The default firewall allows "All Networks"; configure it to whitelist specific IPs or limit access to particular VNets.
  • Geo-fencing can be applied.
  • Data in transit should be accessible through TLS/SSL only.

Further Reading

Experience Using Azure Cosmos DB in a Commercial Project


How to Use Caching With Azure Cosmos DB


Opinions expressed by DZone contributors are their own.

