Understanding and Managing Disk Space on Your MongoDB Server

You need to be aware of MongoDB storage statistics and how you can compact or repair the database to handle fragmentation.

by Vishal Kumawat · Jul. 19, 17 · Tutorial

Disk storage is a critical resource for any scalable database system, and the performance of a disk-based database depends on how its data is managed on disk. MongoDB supports pluggable storage engines that handle storage management. A storage engine initially stores documents sequentially, but as the database grows and write operations accumulate, this contiguous space becomes fragmented into smaller blocks with free space in between. The usual response is to increase the disk size; however, there are alternatives that let you reclaim free space without scaling the disk. To use them, you need to understand MongoDB's storage statistics and know how to compact or repair the database to handle fragmentation.

How Large Is Your Database, Really?

You should always keep an eye on the amount of free disk space on your production server. It is also prudent to know your database size when you are paying for storage on a cloud platform. The db.stats() shell helper, a wrapper around the dbStats command, reports storage statistics for the current database:

>db.stats()
{
	"db" : "test",
	"collections" : 5,
	"views" : 0,
	"objects" : 53829,
	"avgObjSize" : 43.555,
	"dataSize" : 2344556121,
	"storageSize" : 3124416336,
	"numExtents" : 0,
	"indexes" : 7,
	"indexSize" : 8096876,
	"ok" : 1
}
  • dataSize: The total size in bytes of the uncompressed data held in this database.

  • storageSize: The total amount of disk space allocated to all collections in the database.

The fields returned by db.stats() depend on the storage engine. The MongoDB documentation describes these metrics for each server version.

Why is there such a large difference between storageSize and dataSize? In the sample output, storageSize exceeds dataSize by roughly 780 MB. The gap comes from the fragmentation of data files described earlier: MongoDB reuses free space between fragmented data whenever possible, but it does not release that space to the operating system. Note that with WiredTiger, storageSize can be smaller than dataSize when compression is enabled.
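As a quick worked check on the sample output above, the gap between storageSize and dataSize can be computed directly. The sketch below is a hypothetical helper in plain JavaScript, so it runs in Node.js or can be pasted into the mongo shell and called as storageOverhead(db.stats()), assuming the fields come back as plain numbers. Under WiredTiger with compression, dataSize is uncompressed while storageSize is on-disk, so the difference mixes compression savings with fragmentation and can even be negative.

```javascript
// Hypothetical helper: compares on-disk allocation with logical data size,
// using the dataSize and storageSize fields returned by db.stats().
function storageOverhead(stats) {
  const dataSize = Number(stats.dataSize);       // uncompressed data, in bytes
  const storageSize = Number(stats.storageSize); // disk space allocated, in bytes
  // Positive in the sample; can be negative under WiredTiger compression.
  const gapBytes = storageSize - dataSize;
  return {
    gapBytes,
    ratio: dataSize > 0 ? storageSize / dataSize : NaN,
  };
}

// Values copied from the sample db.stats() output above.
const sampleStats = { dataSize: 2344556121, storageSize: 3124416336 };
const overhead = storageOverhead(sampleStats);
console.log(overhead.gapBytes);         // 779860215
console.log(overhead.ratio.toFixed(2)); // 1.33
```

A ratio well above 1 on an engine without compression is the signal that compacting or repairing may return a meaningful amount of space.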

If a large amount of data is deleted from a collection and the collection never reuses that space for new documents, the space should be returned to the operating system so that other databases or collections can use it. To defragment the disk and reclaim this free space, you need to run a compact or repair operation.

Compacting MongoDB

The MongoDB compact operation rewrites all documents and indexes in a collection into contiguous blocks of disk space. However, it blocks all other operations on the database that the collection belongs to, so on a standalone server you should run it during a maintenance window. On replica sets (including the replica set behind each shard of a sharded cluster), run it in a rolling fashion on each member: compact all secondaries first, and the primary last. This way, database availability is not affected. The syntax of the command is:

db.runCommand({ compact: "<collection-name>" })
  • MMAPv1

    • The compact operation defragments data files and indexes but does not release space to the operating system. It is still useful for creating more contiguous space for MongoDB to reuse; however, it does not help when free disk space is very low.
    • Up to 2 GB of additional disk space is required during compaction.
    • A database-level lock is held during compaction.
  • WiredTiger

    • WiredTiger compresses data by default, so it consumes less disk space than MMAPv1.
    • The compact operation releases free space to the operating system.
    • Running compact requires minimal additional disk space.
    • Compaction on WiredTiger also blocks all operations on the database, because it requires a database-level lock.
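The rolling procedure described above can be sketched as a small ordering function. It is a hypothetical helper in plain JavaScript that takes the `members` array returned by rs.status() and reads only each member's `name` and `stateStr` fields; it decides the order only and does not run compact itself. You would then connect to each listed member in turn and run the compact command there. Stepping the primary down with rs.stepDown() before compacting it, so that an already compacted secondary takes over, is a common practice but an assumption beyond what the text above states.

```javascript
// Hypothetical helper: given the `members` array from rs.status(),
// return host names in rolling-compaction order: every SECONDARY first,
// then the PRIMARY.
function rollingCompactionOrder(members) {
  const secondaries = members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => m.name);
  const primaries = members
    .filter((m) => m.stateStr === "PRIMARY")
    .map((m) => m.name);
  // Arbiters hold no data and are skipped; members in any other state
  // (e.g. RECOVERING) are also left out for the operator to handle.
  return [...secondaries, ...primaries];
}

// Example input shaped like rs.status().members (host names are made up).
const exampleMembers = [
  { name: "db1.example.net:27017", stateStr: "PRIMARY" },
  { name: "db2.example.net:27017", stateStr: "SECONDARY" },
  { name: "db3.example.net:27017", stateStr: "ARBITER" },
  { name: "db4.example.net:27017", stateStr: "SECONDARY" },
];
console.log(rollingCompactionOrder(exampleMembers).join(", "));
// db2.example.net:27017, db4.example.net:27017, db1.example.net:27017
```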

If you are running WiredTiger, we recommend running the compact operation when storage reaches 80% of the disk size. ScaleGrid users can trigger the compact operation from the details page.
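The 80% recommendation boils down to a single ratio. The sketch below is a hypothetical helper in plain JavaScript. It assumes you already know the disk size (the sample db.stats() output does not include it), and it treats storageSize plus indexSize as the database's footprint, which is one reasonable reading of "storage" rather than a definition taken from the text.

```javascript
// Hypothetical helper for the 80% rule of thumb. usedBytes and diskSizeBytes
// are supplied by the caller; diskSizeBytes might come from `df` on the host.
function shouldCompact(usedBytes, diskSizeBytes, threshold = 0.8) {
  if (!(diskSizeBytes > 0)) {
    throw new RangeError("diskSizeBytes must be a positive number");
  }
  const usedFraction = usedBytes / diskSizeBytes;
  return { usedFraction, compact: usedFraction >= threshold };
}

// storageSize + indexSize from the sample db.stats() output,
// on an assumed 4 GB (4e9-byte) disk.
const used = 3124416336 + 8096876;
const check = shouldCompact(used, 4e9);
console.log(check.usedFraction.toFixed(3), check.compact); // 0.783 false
```

The threshold is a parameter so you can tighten it for volumes that fill quickly.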

Repairing MongoDB

The MongoDB repair operation repairs errors and inconsistencies in data storage, much like the fsck command does for a file system. It ensures data integrity after unexpected shutdowns or crashes. However, if journaling is enabled on the server, repair is not needed: the server uses the journal to return to a clean state automatically after a restart. If your database has been corrupted, repairing it will not save the corrupted data, so this operation is not recommended for data recovery when you have other options. For MMAPv1, repairDatabase is the only way to reclaim disk space, provided that your database is not corrupted and you have the free space the repair operation requires. The syntax of the command is:

db.runCommand({repairDatabase: 1})
  • This command compacts all collections in the database and recreates all indexes.
  • The job requires free disk space equal to the size of your current data set plus 2 gigabytes.

At ScaleGrid, we use the repairDatabase operation to reclaim free space for MMAPv1 engine clusters.
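The engine-specific rules above (compact on WiredTiger; repairDatabase on MMAPv1 when there is room for the data set plus 2 GB) can be combined into one decision. The sketch below is a hypothetical helper in plain JavaScript: `engineName` is assumed to be the string reported by db.serverStatus().storageEngine.name, the data set size and free disk space are supplied by the caller, "2 gigabytes" is read as 2 GiB, and it does not check whether the database is corrupted.

```javascript
// Hypothetical helper: pick a space-reclaiming operation per storage engine.
const GiB = 1024 ** 3;
// Assumption: the "2 gigabytes" of repair headroom is taken as 2 GiB.
const REPAIR_HEADROOM_BYTES = 2 * GiB;

function reclaimPlan(engineName, dataSetBytes, freeDiskBytes) {
  if (engineName === "wiredTiger") {
    // WiredTiger's compact releases free space to the operating system.
    return "compact";
  }
  if (engineName === "mmapv1") {
    // MMAPv1's compact does not release space, so repairDatabase is the
    // way to reclaim it, provided there is room for data set + 2 GB.
    const requiredBytes = dataSetBytes + REPAIR_HEADROOM_BYTES;
    return freeDiskBytes >= requiredBytes
      ? "repairDatabase"
      : "insufficient free space for repairDatabase";
  }
  return "unknown storage engine";
}

console.log(reclaimPlan("wiredTiger", 10 * GiB, 1 * GiB)); // compact
console.log(reclaimPlan("mmapv1", 10 * GiB, 12 * GiB));    // repairDatabase
console.log(reclaimPlan("mmapv1", 10 * GiB, 11 * GiB));    // insufficient free space for repairDatabase
```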


Published at DZone with permission of Vishal Kumawat, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
