Understanding and Managing Disk Space on Your MongoDB Server
You need to be aware of MongoDB storage statistics and how you can compact or repair the database to handle fragmentation.
Disk storage is a critical resource for any scalable database system. The performance of a disk-based database depends on how data is managed on the disk. Your MongoDB server supports various pluggable storage engines that handle storage management. MongoDB storage engines initially store all documents sequentially. As the database grows and documents are inserted, updated, and deleted, this contiguous space becomes fragmented into smaller blocks with chunks of free space in between. The usual solution is to increase the disk size. However, there are alternatives that can help you regain the free space without scaling the disk. You need to be aware of MongoDB storage statistics and how you can compact or repair the database to handle fragmentation.
How Large Is Your Database, Really?
You should always keep an eye on the amount of free disk space on your production server. It is also prudent to know your database size when you are paying for storage on a cloud platform. MongoDB's db.stats() command provides insight into the storage statistics of the current database:
> db.stats()
{
    "db" : "test",
    "collections" : 5,
    "views" : 0,
    "objects" : 53829,
    "avgObjSize" : 43.555,
    "dataSize" : 2344556121,
    "storageSize" : 3124416336,
    "numExtents" : 0,
    "indexes" : 7,
    "indexSize" : 8096876,
    "ok" : 1
}
dataSize: The total size in bytes of the uncompressed data held in this database.
storageSize: The total amount of disk space allocated to all collections in the database.
The output of db.stats() depends on the storage engine in use. The MongoDB documentation describes each of these metrics for your MongoDB version.
Why is there such a big difference between storageSize and dataSize? It is caused by the fragmentation of data files described earlier. MongoDB tries to reuse the free space between fragmented data whenever possible, but it does not release that space to the operating system. In WiredTiger, however, storageSize may be smaller than dataSize if compression is enabled.
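To put a number on that gap, a small helper can compare the two fields of a db.stats() document. The sketch below is plain JavaScript and does not connect to a server. The function name storageOverhead is hypothetical, and it assumes the stats object has numeric dataSize and storageSize fields, as in the output above. A ratio above 1 means more space is allocated than the data needs. With WiredTiger compression a ratio below 1 is normal, so the ratio alone does not prove fragmentation.

```javascript
// Hypothetical helper: summarize the gap between allocated storage and data size.
// `stats` is assumed to look like the document returned by db.stats().
function storageOverhead(stats) {
  const gapBytes = stats.storageSize - stats.dataSize; // negative when compression outweighs free space
  const ratio = stats.storageSize / stats.dataSize;    // > 1 means more allocated than used
  const gapGiB = gapBytes / 1024 ** 3;                 // bytes -> GiB for readability
  return { gapBytes, ratio, gapGiB };
}

// Values taken from the db.stats() output shown above.
const example = storageOverhead({ dataSize: 2344556121, storageSize: 3124416336 });
console.log(example.gapBytes);         // prints 779860215
console.log(example.ratio.toFixed(2));  // prints 1.33
console.log(example.gapGiB.toFixed(2)); // prints 0.73
```

For the example database, roughly 0.73 GiB is allocated beyond the uncompressed data size.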
Suppose a large amount of data is deleted from a collection, and the collection never reuses that space for new documents. In that case, the space should be returned to the operating system so that other databases or collections can use it. To defragment the disk space and regain usable free space, you need to run a compact or repair operation.
Compacting MongoDB
The MongoDB compact operation rewrites all documents and indexes in a collection to contiguous blocks of disk space. However, this operation blocks all other operations on the database to which the collection belongs. On a standalone server, it is therefore recommended to run it during a maintenance window. For replica sets, run it in a rolling fashion, one member at a time (in a sharded cluster, repeat this for each shard). Compact all secondaries first, then step down the primary and compact it last. This way, database availability is not affected. The syntax of the command is:
db.runCommand({ compact: "<collection-name>" })
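The rolling procedure can be written as an ordered list of steps. The sketch below is plain JavaScript that only builds the plan and never talks to a server. The function name planRollingCompact and the collection name "orders" are hypothetical. It assumes each member object carries the name and stateStr fields found in the members array of rs.status() output. Members in other states, such as arbiters, are skipped. Stepping down the primary before compacting it keeps the set available during the operation.

```javascript
// Hypothetical planner: order replica set members for a rolling compact.
// `members` is assumed to mirror rs.status().members (name, stateStr).
function planRollingCompact(members, collection) {
  const cmd = JSON.stringify({ compact: collection });
  const secondaries = members.filter((m) => m.stateStr === "SECONDARY");
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  const steps = [];
  // 1. Compact every secondary, one at a time.
  for (const m of secondaries) {
    steps.push(`${m.name}: db.runCommand(${cmd})`);
  }
  // 2. Step down the primary so a compacted secondary takes over,
  //    then compact the former primary last.
  if (primary) {
    steps.push(`${primary.name}: rs.stepDown()`);
    steps.push(`${primary.name}: db.runCommand(${cmd})`);
  }
  return steps;
}

const plan = planRollingCompact(
  [
    { name: "db1:27017", stateStr: "PRIMARY" },
    { name: "db2:27017", stateStr: "SECONDARY" },
    { name: "db3:27017", stateStr: "SECONDARY" },
  ],
  "orders" // hypothetical collection name
);
plan.forEach((step) => console.log(step));
```

For this three-member set, the plan compacts db2 and db3, steps down db1, and compacts db1 last.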
Compaction in MMAPv1:
- The compact operation defragments data files and indexes, but it does not release space to the operating system. It is still useful for defragmenting and creating more contiguous space for MongoDB to reuse. However, it is of no use when free disk space is very low.
- Up to 2 GB of additional disk space is required during the compaction operation.
- A database-level lock is held during the compaction operation.
Compaction in WiredTiger:
- WiredTiger compresses data by default, so it consumes less disk space than MMAPv1.
- The compact operation releases free space to the operating system.
- Only minimal additional disk space is required to run the compact operation.
- WiredTiger also blocks all operations on the database, because compact needs a database-level lock.
If you are running WiredTiger, we recommend running the compact operation when storage reaches 80% of the disk size. On ScaleGrid, you can trigger the compact operation from our details page.
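The 80% guideline is a simple ratio check. The sketch below is plain JavaScript. The function name shouldCompact is hypothetical, and it assumes you already have used and total disk bytes, for example from your monitoring system or from df. The disk figures in the usage example are made up for illustration.

```javascript
// Hypothetical check for the 80% guideline above.
// usedBytes / totalBytes are assumed to come from your monitoring or from `df`.
function shouldCompact(usedBytes, totalBytes, threshold = 0.8) {
  if (totalBytes <= 0) throw new RangeError("totalBytes must be positive");
  const usage = usedBytes / totalBytes;
  return { usage, compact: usage >= threshold };
}

// Hypothetical disk: 85 GiB used out of 100 GiB.
const check = shouldCompact(85 * 1024 ** 3, 100 * 1024 ** 3);
console.log(`${(check.usage * 100).toFixed(1)}% used, compact: ${check.compact}`);
// prints "85.0% used, compact: true"
```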
Repairing MongoDB
The MongoDB repair operation repairs errors and inconsistencies in data storage. It is similar to the fsck command for a file system. This command ensures data integrity after unexpected shutdowns or crashes. However, if journaling is enabled on the server, no repair is required: the server uses the journal to return to a clean state automatically after a restart. If your database has been corrupted, a repair will not save the corrupt data; it discards it. Therefore, it is not recommended to use this operation for data recovery when you have other options. For MMAPv1, repairDatabase is the only way to reclaim disk space. Use it only if you are confident that your database is not corrupted and you have enough free space for the repair operation. The syntax of the command is:
db.runCommand({repairDatabase: 1})
- This command compacts all collections in the database and recreates all indexes.
- The job requires free disk space equal to the size of your current data set plus 2 gigabytes.
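The disk-space requirement above can be checked before starting a repair. The sketch below is plain JavaScript. The function names repairSpaceNeeded and canRunRepair are hypothetical. It reads "2 gigabytes" as 2 GiB (2 × 1024³ bytes) and assumes you already know the size of your data set and the free space on the volume in bytes. The figures in the usage example are made up.

```javascript
// Hypothetical pre-flight check for repairDatabase, following the rule above:
// free space must cover the current data set plus 2 GB (taken here as 2 GiB).
const TWO_GIB = 2 * 1024 ** 3;

function repairSpaceNeeded(dataSetBytes) {
  return dataSetBytes + TWO_GIB;
}

function canRunRepair(dataSetBytes, freeBytes) {
  return freeBytes >= repairSpaceNeeded(dataSetBytes);
}

// Hypothetical figures: a 10 GiB data set on a volume with 11 GiB free.
const dataSet = 10 * 1024 ** 3;
console.log(repairSpaceNeeded(dataSet) / 1024 ** 3); // prints 12
console.log(canRunRepair(dataSet, 11 * 1024 ** 3));  // prints false
```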
At ScaleGrid, we use the repairDatabase operation to reclaim free space for MMAPv1 engine clusters.
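The engine-specific advice in this article can be summarized as a lookup from storage engine to the command used to reclaim space. The sketch below is plain JavaScript that only builds the command document and never runs it. The function name reclaimCommand and the collection name "orders" are hypothetical. It assumes the engine names "wiredTiger" and "mmapv1", as reported by db.serverStatus().storageEngine.name.

```javascript
// Hypothetical helper: pick the space-reclaiming command described in this article
// for a given storage engine name (as reported by db.serverStatus().storageEngine.name).
function reclaimCommand(engineName, collection) {
  switch (engineName) {
    case "wiredTiger":
      // WiredTiger's compact can release free space to the operating system.
      return { compact: collection };
    case "mmapv1":
      // MMAPv1's compact does not release space, so repairDatabase is used instead.
      return { repairDatabase: 1 };
    default:
      throw new Error(`No reclaim strategy for engine: ${engineName}`);
  }
}

console.log(JSON.stringify(reclaimCommand("wiredTiger", "orders"))); // prints {"compact":"orders"}
console.log(JSON.stringify(reclaimCommand("mmapv1", "orders")));     // prints {"repairDatabase":1}
```

The returned document is what you would pass to db.runCommand() on the appropriate member.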
Published at DZone with permission of Vishal Kumawat, DZone MVB. See the original article here.