MongoDB Shard Key Examples

Sharding is one of MongoDB's more useful features. But given the options, here is a breakdown of suggestions and scenarios to pick the most efficient keys.

By Darel Lasrado · Jun. 21, 16 · Tutorial

MongoDB supports horizontal scaling of a database through sharding. Sharding is generally used when your data grows too large to fit on a single machine, or when query performance starts degrading as the data grows.

Note that sharding is not a fix for every slow-running query. In such cases, the problem might lie in the structure of the data or in the indexes created on the collection. Sharding does not solve slowness caused by poor or missing indexes. Consider sharding only when the data outgrows the resources of a single machine and adding more resources to that machine is not feasible or is more expensive.

If you have decided to shard your MongoDB cluster, it's very important to choose a good shard key. An unwisely chosen shard key nullifies the benefits of sharding, and you may end up with a lot more data on one shard than on the others. Also note that, in the MongoDB versions available when this article was written (2016), you cannot change the shard key of a collection at a later stage, so once a key is selected, you are committed to it. (Newer releases, from MongoDB 5.0 on, can reshard a collection, but that is a heavyweight operation, so the choice still deserves care.)

MongoDB supports two types of sharding: range-based and hash-based.
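To make the difference concrete, here is a minimal sketch in plain Python (no MongoDB required) that places 300 ever-increasing keys on three shards, once by range and once by hash. The split points, the shard count, and the use of MD5 with a modulo are assumptions chosen for brevity; MongoDB's hashed sharding actually hashes the value and then range-partitions the hashes into chunks.

```python
import bisect
import hashlib
from collections import Counter
from typing import Sequence


def range_shard(key: int, boundaries: Sequence[int]) -> int:
    """Return the shard index for key, given sorted split points."""
    return bisect.bisect_right(boundaries, key)


def hash_shard(key: int, n_shards: int) -> int:
    """Return a shard index derived from a hash of the key (simplified)."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % n_shards


# Hypothetical split points: shard 0 holds keys < 1000,
# shard 1 holds 1000..1999, shard 2 holds keys >= 2000.
BOUNDARIES = [1000, 2000]

# 300 newly inserted keys that only ever grow (for example, a counter).
new_keys = range(2000, 2300)

range_counts = Counter(range_shard(k, BOUNDARIES) for k in new_keys)
hash_counts = Counter(hash_shard(k, 3) for k in new_keys)

print("range:", dict(range_counts))  # range: {2: 300}
print("hash: ", dict(sorted(hash_counts.items())))
```

With range-based placement, every new key lands on the last shard, while hashing spreads the same keys over all three. The price of hashing is that range queries on the key can no longer be served by a single shard.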

The MongoDB documentation explains what you should consider when choosing a shard key; see its page "Choose a shard key". The shard key has to be an indexed field.
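As a rough illustration of the index requirement, the sketch below builds the command documents for creating an index and then sharding a collection on that field. The database name shop and the collection name orders are hypothetical. The snippet only constructs dictionaries, so it runs without a server; the closing comment shows how they might be sent with PyMongo, assuming a running sharded cluster.

```python
def create_index_command(collection: str, field: str) -> dict:
    """Build a createIndexes command for an ascending index on one field."""
    return {
        "createIndexes": collection,
        "indexes": [{"key": {field: 1}, "name": f"{field}_1"}],
    }


def shard_collection_command(namespace: str, field: str, hashed: bool = False) -> dict:
    """Build a shardCollection command for a range-based or hashed shard key."""
    return {"shardCollection": namespace, "key": {field: "hashed" if hashed else 1}}


# Hypothetical names, used only for illustration.
index_cmd = create_index_command("orders", "user_key")
range_cmd = shard_collection_command("shop.orders", "user_key")
hashed_cmd = shard_collection_command("shop.orders", "_id", hashed=True)

print(index_cmd)
print(range_cmd)
print(hashed_cmd)

# With PyMongo and a sharded cluster (assumed, not shown), roughly:
#   client["shop"].command(index_cmd)
#   client.admin.command({"enableSharding": "shop"})
#   client.admin.command(range_cmd)
```

The two shardCollection variants are alternatives for the same collection, not steps to run one after the other.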

But for me, having some examples really helps understand the concept or theory. So I have written some down and shown how the shard key is chosen for each of them.

Note that you can shard by collection. Hence, if a collection is small, such as a collection of categories or user roles, don't shard it. Only shard large collections.

1. E-Commerce Shopping Order/Cart

If you are working on an e-commerce application that stores user orders (also known as carts) in MongoDB, you generally retrieve the data by user. Hence, you want all orders related to a given user to be in one shard. For this case, user_key would be a good shard key.
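The benefit of keeping one user's orders together shows up at query time. The following sketch is a simplified model, not MongoDB's router: it assumes four shards and hash-based placement, and the field names other than user_key are hypothetical. It compares how many shards must be contacted when a query includes the shard key and when it does not.

```python
import hashlib

N_SHARDS = 4  # assumed cluster size


def shard_for(value, n_shards: int = N_SHARDS) -> int:
    """Map a shard key value to a shard index (simplified hash placement)."""
    digest = hashlib.md5(str(value).encode()).hexdigest()
    return int(digest, 16) % n_shards


def shards_to_query(query: dict, shard_key: str, n_shards: int = N_SHARDS) -> set:
    """Return the shards an equality query must contact."""
    if shard_key in query:
        # The shard key is known, so exactly one shard can hold the data.
        return {shard_for(query[shard_key], n_shards)}
    # Without the shard key, every shard has to be asked.
    return set(range(n_shards))


by_user = shards_to_query({"user_key": "u-42"}, "user_key")
by_status = shards_to_query({"status": "shipped"}, "user_key")

print(len(by_user))    # 1
print(len(by_status))  # 4
```

A query for one user's orders touches a single shard, while a query on any other field alone fans out to all of them.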

2. B2B Order Management

If your application manages orders or purchases by another organization, then you will generally retrieve all orders placed by that organization in one query. The user who actually placed the order may not matter to you.

Example: If three users from Organization A make purchases on the organization's behalf, the dashboard might retrieve data only by organization key and not by individual user key. In this scenario, organization_key would be a good choice.

But suppose you need to query by organization for some reports and by user for others. If each user belongs to only one organization, then sharding by organization_key is fine. But if a user can belong to multiple organizations, then shard by user_key.
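The rule of thumb above can be written down as a small helper. This is only a sketch of the reasoning in this section; the function name and the input shape (pairs of user and organization keys) are assumptions for illustration.

```python
from collections import defaultdict


def choose_b2b_shard_key(memberships) -> str:
    """Pick a shard key from (user_key, organization_key) pairs.

    Returns "user_key" if any user belongs to more than one organization,
    otherwise "organization_key".
    """
    orgs_per_user = defaultdict(set)
    for user, org in memberships:
        orgs_per_user[user].add(org)
    multi_org = any(len(orgs) > 1 for orgs in orgs_per_user.values())
    return "user_key" if multi_org else "organization_key"


# Every user belongs to exactly one organization.
single = [("u-1", "org-a"), ("u-2", "org-a"), ("u-3", "org-b")]
# User u-1 buys on behalf of two organizations.
multi = [("u-1", "org-a"), ("u-1", "org-b"), ("u-2", "org-a")]

print(choose_b2b_shard_key(single))  # organization_key
print(choose_b2b_shard_key(multi))   # user_key
```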

3. Product Data

If you have a lot of product data, then you can shard it using category or product type. This will help if you query based on a particular product type. But if your category list is very limited, then it is not advisable to use it as a shard key. In such scenarios, you should use hash_id as the shard key.
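The cardinality concern can be checked before committing to a key. In the sketch below, the threshold of ten distinct values per shard is an arbitrary assumption, and hash_id is read as a hashed _id field, which is one possible interpretation of the text.

```python
def usable_shards(values, n_shards: int) -> int:
    """Upper bound on shards that can receive data for these key values."""
    return min(len(set(values)), n_shards)


def suggest_product_shard_key(categories, n_shards: int,
                              min_values_per_shard: int = 10) -> dict:
    """Suggest a shard key: category if it has enough distinct values,
    otherwise a hashed _id (assumed reading of "hash_id")."""
    distinct = len(set(categories))
    if distinct < n_shards * min_values_per_shard:
        return {"_id": "hashed"}
    return {"category": 1}


few = ["books", "toys", "games"] * 1000            # 3 distinct categories
many = [f"category-{i % 200}" for i in range(5000)]  # 200 distinct categories

print(usable_shards(few, 8))               # 3
print(suggest_product_shard_key(few, 8))   # {'_id': 'hashed'}
print(suggest_product_shard_key(many, 8))  # {'category': 1}
```

With only three categories, at most three of the eight shards could ever hold product data, no matter how many documents there are.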

4. Blogs or Posts

If it's an application that has only one or two bloggers, then I believe you don't need to consider sharding. I don't think one person can create 10 million blogs. I checked with Superman, he is not interested in blogging.

So, if you have a blogging platform, then the shard key for the blog collection can be user_key again. All the blogs by a particular user will be on one shard and can easily be retrieved to show all blogs by a given user.

5. Page Access

If you are collecting page access details as a time series and appending the information to MongoDB, the ideal shard key would be page_id. I am suggesting page_id here based on the reports and analytics I can think of. If your use case is user specific, then page_id may not fit. For me, such data is more useful for analyzing page performance, popularity, etc.

6. Invoices or Payments

Invoices and payments are generally by user if your application is B2C, or by organization if your application is B2B. So, if it is B2C, use user_key as the shard key, and if it's B2B, use organization_key as the shard key.

7. Hotel or Flight Reservations

Hotel or flight reservations can be tricky, as the reservation can be made by anonymous users and you may not have a user_key. The other option is hash_id, which randomly distributes the data across the shards. But a few performance runs have shown that sharding by hotel_property_id or flight_id is more efficient than hash_id. Most business use cases require the data to be fetched by hotel_id. Since user_key is not available for anonymous users, hotel_id or flight_id would be a better choice from my perspective.
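To see why a missing user_key is a problem, consider the synthetic data set below. The 70/30 split between anonymous and signed-in guests and the count of 50 hotels are made-up numbers for illustration only, and anonymous reservations are assumed to be stored with a null user_key. The sketch measures what share of all documents falls under the single most common value of each candidate key.

```python
from collections import Counter

# Synthetic reservations: 700 anonymous, 300 signed-in, spread over 50 hotels.
reservations = (
    [{"hotel_id": f"h-{i % 50}", "user_key": None} for i in range(700)]
    + [{"hotel_id": f"h-{i % 50}", "user_key": f"u-{i}"} for i in range(300)]
)


def largest_group_share(docs, field: str) -> float:
    """Fraction of documents sharing the most common value of a field."""
    counts = Counter(doc[field] for doc in docs)
    return max(counts.values()) / len(docs)


print(largest_group_share(reservations, "user_key"))  # 0.7
print(largest_group_share(reservations, "hotel_id"))  # 0.02
```

Documents with the same shard key value cannot be separated, so in this example 70% of the data would be pinned together under user_key, while hotel_id keeps the largest group at 2%.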

Compound Shard Key

MongoDB also supports compound shard keys on indexed compound fields. So far, in my own projects, I have not found a good reason to use a compound shard key. I will update this article if I find one.
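For readers who have not seen one, here is a sketch of what a compound shard key looks like and how range-based chunks order it. The field pairing (organization_key, then user_key), the namespace shop.orders, and the split points are all hypothetical, and the sketch is an illustration rather than a recommendation. Python tuples compare field by field, which mirrors how a compound key is ordered.

```python
import bisect

# Command document for a compound shard key (hypothetical namespace).
compound_cmd = {
    "shardCollection": "shop.orders",
    "key": {"organization_key": 1, "user_key": 1},
}

# Hypothetical chunk split points, ordered as (organization_key, user_key).
SPLIT_POINTS = [("org-b", "u-500"), ("org-d", "")]


def chunk_for(org: str, user: str) -> int:
    """Return the chunk index holding the document with this compound key."""
    return bisect.bisect_right(SPLIT_POINTS, (org, user))


def chunks_for_org(org: str) -> set:
    """Chunks that may hold documents for one organization (prefix query)."""
    low = bisect.bisect_right(SPLIT_POINTS, (org, ""))
    high = bisect.bisect_right(SPLIT_POINTS, (org, "\uffff"))
    return set(range(low, high + 1))


print(chunk_for("org-b", "u-100"))      # 0
print(chunk_for("org-b", "u-900"))      # 1
print(sorted(chunks_for_org("org-b")))  # [0, 1]
print(sorted(chunks_for_org("org-a")))  # [0]
```

A large organization such as org-b can be split across chunks by user, and a query on the organization_key prefix alone still narrows the search to the chunks covering that organization.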

I hope this article helps at least some of you in deciding the right shard key.


Opinions expressed by DZone contributors are their own.
