Understanding and Managing Disk Space on Your MongoDB Server

You need to be aware of MongoDB storage statistics and how you can compact or repair the database to handle fragmentation.

By Vishal Kumawat · Jul. 19, 17 · Tutorial

Disk storage is a critical resource for any scalable database system, and the performance of a disk-based database depends on how its data is laid out on disk. MongoDB supports pluggable storage engines that handle storage management. These engines initially store documents sequentially. As the database grows and write operations accumulate, that contiguous space becomes fragmented into smaller blocks with chunks of free space in between. The usual response is to increase the disk size; however, there are alternatives that let you regain free space without scaling the disk. You need to be aware of MongoDB storage statistics and how you can compact or repair the database to handle fragmentation.

How Large Is Your Database, Really?

You should always keep an eye on the amount of free disk space on your production server. It is also prudent to know your database size when you are paying for it on a cloud platform. MongoDB's db.stats() command provides insight into the storage statistics of the current database.

> db.stats()
{
	"db" : "test",
	"collections" : 5,
	"views" : 0,
	"objects" : 53829,
	"avgObjSize" : 43.555,
	"dataSize" : 2344556121,
	"storageSize" : 3124416336,
	"numExtents" : 0,
	"indexes" : 7,
	"indexSize" : 8096876,
	"ok" : 1
}
  • dataSize: The total size in bytes of the uncompressed data held in this database.

  • storageSize: The total amount of disk space allocated to all collections in the database.

The response of db.stats() depends on the storage engine in use. You can find a version-dependent description of the above metrics in the MongoDB documentation.

Why the big difference between storageSize and dataSize? It comes from the data file fragmentation described earlier. MongoDB tries to reuse the free space between fragmented data whenever possible, but it does not release that space to the operating system. With WiredTiger, however, storageSize may be smaller than dataSize if compression is enabled.
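To make the gap between the two metrics concrete, here is a minimal sketch that computes it from a db.stats()-shaped document. The helper name summarizeStorage is hypothetical, and it reads only the dataSize and storageSize fields described above; the input values are copied from the sample output.

```javascript
// Summarize a db.stats()-shaped document. Only the dataSize and storageSize
// fields are read; the helper name is hypothetical, not a MongoDB API.
function summarizeStorage(stats) {
  const { dataSize, storageSize } = stats;
  if (!(dataSize >= 0) || !(storageSize > 0)) {
    throw new RangeError("dataSize must be >= 0 and storageSize must be > 0");
  }
  // Positive gap: allocated space that does not hold uncompressed data.
  // Negative gap: possible on WiredTiger when compression is enabled.
  const gapBytes = storageSize - dataSize;
  return {
    gapBytes,
    gapMiB: gapBytes / (1024 * 1024),
    gapFraction: gapBytes / storageSize,
  };
}

// Values copied from the sample db.stats() output above.
const sample = { dataSize: 2344556121, storageSize: 3124416336 };
console.log(summarizeStorage(sample));
```

In the mongo shell, the same function could be called as summarizeStorage(db.stats()). A gap that stays large and positive over time is one hint that fragmentation is worth investigating.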

If a large chunk of data is deleted from a collection and the collection never reuses the freed space for new documents, that space needs to be returned to the operating system so that other databases or collections can use it. To defragment the disk space and regain usable free space, you need to run a compact or repair operation.

Compacting MongoDB

The MongoDB compact operation rewrites all documents and indexes in a collection to contiguous blocks of disk space. However, this operation blocks all other operations on the database to which the collection belongs, so for a standalone server it is recommended to run it during a maintenance window. For replica sets, run it in a rolling fashion on each shard: compact all secondaries first, and then finally the primary. That way, database availability is not affected. The syntax of the command is:

db.runCommand({ compact: "<collection-name>" })
  • MMAPv1

    • The compaction operation defragments data files and indexes, but it does not release space to the operating system. It is still useful for creating more contiguous space that MongoDB can reuse. It does not help, however, when free disk space is already very low.
    • Up to 2 GB of additional disk space is required during the compaction operation.
    • A database-level lock is held during the compaction operation.
  • WiredTiger

    • The WiredTiger engine provides compression by default, so it consumes less disk space than MMAPv1.
    • The compact process releases the free space to the operating system.
    • Minimal disk space is required to run the compact operation.
    • WiredTiger also blocks all operations on the database during compaction, because it needs a database-level lock.
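The rolling approach described above (secondaries first, primary last) can be sketched as a small planning helper. This is only an illustration: rollingCompactOrder and compactCommand are hypothetical names, the members are assumed to look like entries of rs.status().members (name and stateStr fields), and the host names and the "orders" collection are made up.

```javascript
// Order replica set members for a rolling compact: secondaries first,
// the primary last. Members in any other state are skipped.
function rollingCompactOrder(members) {
  const secondaries = members.filter((m) => m.stateStr === "SECONDARY");
  const primaries = members.filter((m) => m.stateStr === "PRIMARY");
  return [...secondaries, ...primaries].map((m) => m.name);
}

// Build the command document that would be passed to db.runCommand()
// on each member in turn.
function compactCommand(collectionName) {
  if (typeof collectionName !== "string" || collectionName.length === 0) {
    throw new TypeError("collectionName must be a non-empty string");
  }
  return { compact: collectionName };
}

// Hypothetical host names and collection name, for illustration only.
const plan = rollingCompactOrder([
  { name: "db1.example.net:27017", stateStr: "PRIMARY" },
  { name: "db2.example.net:27017", stateStr: "SECONDARY" },
  { name: "db3.example.net:27017", stateStr: "SECONDARY" },
]);
console.log(plan, compactCommand("orders"));
```

The helper only decides the order; connecting to each member and running the command there is left out, since that depends on how the cluster is operated.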

If you are running WiredTiger, we recommend running the compact operation when storage has reached 80% of the disk size. You can do this by triggering the compact operation from our details page.
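The 80% guideline can be expressed as a simple check. This is a sketch under assumptions: shouldCompact is a hypothetical helper, and the caller is assumed to supply the bytes in use and the total disk size from whatever monitoring is available; the 4 GiB disk in the example is made up.

```javascript
// Return true when storage use has reached the given fraction of the disk.
// The default threshold of 0.8 mirrors the 80% guideline above.
function shouldCompact(usedBytes, diskBytes, threshold = 0.8) {
  if (!(usedBytes >= 0) || !(diskBytes > 0)) {
    throw new RangeError("usedBytes must be >= 0 and diskBytes must be > 0");
  }
  return usedBytes / diskBytes >= threshold;
}

// storageSize from the earlier sample output against a hypothetical 4 GiB disk.
console.log(shouldCompact(3124416336, 4 * 1024 ** 3));
```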

Repairing MongoDB

The MongoDB repair operation repairs errors and inconsistencies in data storage. It is similar to the fsck command for a file system. The command is meant to restore data integrity after unexpected shutdowns or crashes. However, if journaling is enabled on the server, repair is not required: the server uses the journal to return to a clean state automatically after a restart. If your database has been corrupted, repairing it will not save the corrupt data, so this operation is not recommended for data recovery when you have other options. For MMAPv1, repairDatabase is the only way to reclaim disk space, provided that your database is not corrupted and you have the free space the repair operation requires. The syntax of the command is:

db.runCommand({repairDatabase: 1})
  • This command compacts all collections in the database and recreates all indexes.
  • The job requires free disk space equal to the size of your current data set plus 2 gigabytes.
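The free-space requirement above is simple arithmetic, sketched here as a pre-check. The helper names are hypothetical, "2 gigabytes" is assumed to mean 2 GiB (2 * 1024^3 bytes), and the caller decides which figure to use as the size of the current data set.

```javascript
// Assumption: "2 gigabytes" is interpreted as 2 GiB.
const REPAIR_OVERHEAD_BYTES = 2 * 1024 ** 3;

// Free space the repair job needs: current data set size plus the overhead.
function repairSpaceNeeded(dataSetBytes) {
  if (!(dataSetBytes >= 0)) {
    throw new RangeError("dataSetBytes must be >= 0");
  }
  return dataSetBytes + REPAIR_OVERHEAD_BYTES;
}

// True when the available free space covers the requirement.
function hasRoomForRepair(dataSetBytes, freeBytes) {
  return freeBytes >= repairSpaceNeeded(dataSetBytes);
}

// A 1 GiB data set with 3 GiB free just meets the requirement.
console.log(hasRoomForRepair(1024 ** 3, 3 * 1024 ** 3));
```

Running a check like this before repairDatabase avoids starting a long job that cannot finish for lack of space.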

At ScaleGrid, we use the repairDatabase operation to reclaim free space for MMAPv1 engine clusters.


Published at DZone with permission of Vishal Kumawat, DZone MVB. See the original article here.
