When to Use GridFS on MongoDB

Now why does a document oriented database like MongoDB provide a file layer abstraction? Turns out there are some very good reasons.

by Dharshan Rangegowda · Feb. 27, 14 · Interview

GridFS is a simple file system abstraction on top of MongoDB. If you are familiar with Amazon S3, GridFS is a very similar abstraction. So why does a document-oriented database like MongoDB provide a file layer abstraction? It turns out there are some very good reasons.

1. Storing user-generated file content

A large number of web applications allow users to upload files. Historically, when working with relational databases, these user-generated files have been stored on the file system, separate from the database. This creates a number of problems. How do you replicate the files to all the servers that need them? How do you delete all the copies when a file is deleted? How do you back up the files for safety and disaster recovery? GridFS solves these problems by storing the files along with the database. You can use your database backup to back up your files. Also, thanks to MongoDB replication, a copy of your files is stored on each replica. Deleting a file is as easy as deleting an object in the DB.

2. Accessing portions of file content

When a file is uploaded to GridFS, it is split into chunks (256 KB each in the version shown here; later MongoDB releases default to 255 KB), and each chunk is stored as a separate document. So when you need to read only a certain range of bytes, only the chunks covering that range are brought into memory, not the whole file. This is extremely useful when dealing with large media content that needs to be selectively read or edited.
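To make the range-read idea concrete, here is a minimal sketch of the arithmetic involved. It is not a driver API: the function name `chunks_for_range` is hypothetical, and the 262144-byte chunk size is simply the value that appears in the shell output later in this article.

```python
CHUNK_SIZE = 256 * 1024  # 262144 bytes, the chunkSize shown in this article's output


def chunks_for_range(start, end, chunk_size=CHUNK_SIZE):
    """Return the chunk numbers that hold bytes start..end-1 (end is exclusive).

    Hypothetical helper for illustration only; a real driver does this
    bookkeeping internally when you seek and read.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    first = start // chunk_size        # chunk containing the first byte
    last = (end - 1) // chunk_size     # chunk containing the last byte
    return list(range(first, last + 1))


# Reading bytes 300000..599999 touches only chunks 1 and 2, not the whole file.
print(chunks_for_range(300_000, 600_000))
```

Only the chunks in the returned list would need to be fetched, which is why a small read from a very large file stays cheap.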

3. Storing documents greater than 16MB in MongoDB

MongoDB caps the size of a single document at 16MB. If you have documents larger than 16MB, you can store them using GridFS.

4. Overcoming file system limitations

If you are storing a large number of files, you will need to consider file system limitations such as the maximum number of files per directory. With GridFS you don't need to worry about these file system limits. Also, with GridFS and MongoDB sharding you can distribute your files across different servers without significantly increasing the operational complexity.

Underneath the covers

GridFS uses two collections to store the data:


> show collections;
fs.chunks
fs.files
system.indexes
>

The fs.files collection contains metadata about the files, and the fs.chunks collection stores the actual 256k chunks. If the chunks collection is sharded, the chunks are distributed across different servers and you might get better performance than a filesystem!

> db.fs.files.findOne();
{
"_id" : ObjectId("530cf1bf96038f5cb6df5f39"),
"filename" : "./conn.log",
"chunkSize" : 262144,
"uploadDate" : ISODate("2014-02-25T19:40:47.321Z"),
"md5" : "6515e95f8bb161f6435b130a0e587ccd",
"length" : 1644981
}
>
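As a quick sanity check on the metadata above, the `length` and `chunkSize` fields are enough to work out how many chunk documents the file occupies. The sketch below is plain arithmetic, not part of any MongoDB API, and the helper name `chunk_layout` is made up for illustration.

```python
def chunk_layout(length, chunk_size):
    """Return (number of chunks, size in bytes of the last chunk).

    Hypothetical helper: all chunks are chunk_size bytes except
    the last one, which holds whatever is left over.
    """
    if length == 0:
        return 0, 0
    count = (length + chunk_size - 1) // chunk_size  # ceiling division
    last = length - (count - 1) * chunk_size
    return count, last


# Values taken from the fs.files document shown above.
count, last = chunk_layout(1644981, 262144)
print(count, last)  # 7 72117
```

So the 1644981-byte conn.log from the example would be stored as six full chunks plus one final chunk of 72117 bytes.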

MongoDB also creates a compound index on files_id and the chunk number (n) to help quickly access the chunks:

> db.fs.chunks.getIndexes();
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "files.fs.chunks",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"files_id" : 1,
"n" : 1
},
"ns" : "files.fs.chunks",
"name" : "files_id_1_n_1"
}
]
>
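The index on (files_id, n) matches the way chunks are looked up: all chunks of one file, in chunk-number order. The following in-memory sketch imitates that layout with plain Python dictionaries. It does not talk to MongoDB; the function names are hypothetical, and the `data` field name is an assumption based on the GridFS chunk layout rather than something shown in the output above.

```python
def split_into_chunks(files_id, data, chunk_size):
    """Split bytes into chunk documents keyed by files_id and chunk number n."""
    return [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks, files_id):
    """Collect one file's chunks and join them in order of n."""
    own = [c for c in chunks if c["files_id"] == files_id]
    own.sort(key=lambda c: c["n"])  # the compound index serves this ordering
    return b"".join(c["data"] for c in own)


payload = bytes(range(10))
store = split_into_chunks("file-a", payload, 4) + split_into_chunks("file-b", b"xyz", 4)
store.reverse()  # storage order does not matter; n restores the sequence
assert reassemble(store, "file-a") == payload
```

In this toy version the filter and sort scan the whole list; with the compound index, the database can go straight to the requested file's chunks in order.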

Examples

MongoDB has a built-in command-line utility called "mongofiles" to help exercise the GridFS scenarios. Please refer to your driver documentation for how to use GridFS from your driver.

Put
# mongofiles -h  -u  -p  --db files put /conn.log
connected to: 127.0.0.1
added file: { _id: ObjectId('530cf1009710ca8fd47d7d5d'), filename: "./conn.log", chunkSize: 262144, uploadDate: new Date(1393357057021), md5: "6515e95f8bb161f6435b130a0e587ccd", length: 1644981 }
done!

Get
# mongofiles -h  -u  -p  --db files get /conn.log
connected to: 127.0.0.1
done write to: ./conn.log

List
# mongofiles -h  -u  -p  list
connected to: 127.0.0.1
/conn.log 1644981

Delete
# mongofiles -h  -u  -p  --db files delete /conn.log
connected to: 127.0.0.1
done!

Modules

If you would like to serve the file data stored in MongoDB directly from your web server or file system, there are several GridFS plugin modules available:

  • GridFS-Fuse - Plugs GridFS into the filesystem
  • GridFS-Nginx - Plugin to serve GridFS files directly from Nginx

Limitations

  • Working Set: Serving files along with your database content can significantly churn your memory working set. If you would rather not disturb your working set, it might be best to serve your files from a different MongoDB server.
  • Performance: File serving performance will be slower than natively serving the file from your web server and filesystem. However, the added management benefits might be worth the slowdown.
  • Atomic update: GridFS does not provide a way to do an atomic update of a file. If you need this, you will have to maintain multiple versions of your files and pick the right version.
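
One way to picture the "multiple versions" workaround from the last bullet: upload each new version as a separate file under the same filename, and have readers pick the entry with the most recent uploadDate. The sketch below works on plain dictionaries shaped like the fs.files documents shown earlier. The helper name `latest_version` and the sample IDs and dates are made up for illustration, and this is only one possible versioning scheme.

```python
from datetime import datetime


def latest_version(file_docs, filename):
    """Return the metadata document with the newest uploadDate, or None."""
    candidates = [d for d in file_docs if d["filename"] == filename]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d["uploadDate"])


# Hypothetical metadata for two uploads of the same file name.
docs = [
    {"_id": 1, "filename": "./conn.log", "uploadDate": datetime(2014, 2, 25, 19, 40)},
    {"_id": 2, "filename": "./conn.log", "uploadDate": datetime(2014, 2, 26, 8, 15)},
]
print(latest_version(docs, "./conn.log")["_id"])  # 2
```

Readers keep seeing version 1 until version 2 has been stored with its own upload date, after which the older version can be deleted.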

Published at DZone with permission of Dharshan Rangegowda, DZone MVB. See the original article here.
