Streaming Files from MongoDB GridFS
Not too long ago I tweeted about what felt like a small triumph on my latest project: streaming files from MongoDB GridFS for downloads, rather than pulling the whole file into memory and then serving it up. I promised to blog about it, but my specific usage was a little too coupled to my project’s domain to show off as is. So I’ve put together an example node.js + GridFS application, shared it on GitHub, and will use this post to explain how I accomplished it.
GridFS Module
First off, special props go to tjholowaychuk, who responded in the #node.js IRC channel when I asked if anyone had had luck using GridFS from mongoose.
A lot of my resulting code is derived from a gist he shared with me.
Anyway, to the code. I’ll describe how I’m using GridFS and, after
laying the groundwork, illustrate how simple it is to stream files from
GridFS.
I created a gridfs module that accesses GridStore through mongoose
(which I use throughout my application), so it can share the db
connection created when mongoose connects to the MongoDB server.
    mongoose = require "mongoose"
    request = require "request"

    GridStore = mongoose.mongo.GridStore
    Grid = mongoose.mongo.Grid
    ObjectID = mongoose.mongo.BSONPure.ObjectID
We can’t get files from mongodb if we cannot put anything into it, so let’s create a putFile operation.
    exports.putFile = (path, name, options..., fn) ->
      db = mongoose.connection.db
      options = parse(options)
      options.metadata.filename = name
      new GridStore(db, name, "w", options).open (err, file) ->
        return fn(err) if err
        file.writeFile path, fn

    parse = (options) ->
      opts = {}
      if options.length > 0
        opts = options[0]
      if !opts.metadata
        opts.metadata = {}
      opts
This really just delegates to the writeFile operation that GridStore
provides as part of the mongodb module. There is also a little logic to
parse the options, falling back to defaults when none are provided. One
detail worth noting is that I store the filename in the metadata,
because at the time I ran into a funny issue where files retrieved from
GridFS had the id as the filename (even though a look in mongo reveals
that the filename is in fact in the database).
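To make the optional-argument handling concrete, here is a minimal sketch in plain JavaScript (rather than CoffeeScript, so it runs under Node with no compile step). The names parseOptions and normalizePutFileArgs are hypothetical and not part of the example project. Unlike the parse helper above, this version copies the caller's options instead of modifying them in place, which is an assumption about what is desirable rather than what the project does.

```javascript
// Hypothetical helpers mirroring the post's `parse` logic; not part of the project.
function parseOptions(rest) {
  // Use the first optional argument if one was passed, otherwise start empty.
  // The copy keeps the caller's object untouched (a deliberate difference
  // from the original helper, which mutates it).
  const opts = rest.length > 0 ? { ...rest[0] } : {};
  opts.metadata = { ...(opts.metadata || {}) };
  return opts;
}

// Same calling convention as putFile(path, name, options..., fn):
// the callback is always last and the options object is optional.
function normalizePutFileArgs(path, name, ...rest) {
  const fn = rest.pop();
  const options = parseOptions(rest);
  options.metadata.filename = name; // keep the filename in metadata, as the post does
  return { path, name, options, fn };
}

const cb = () => {};
console.log(normalizePutFileArgs("/tmp/a.pdf", "a.pdf", cb).options);
console.log(
  normalizePutFileArgs("/tmp/a.pdf", "a.pdf", { content_type: "application/pdf" }, cb).options
);
```

Both calls end up with a metadata.filename entry, whether or not the caller supplied an options object.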
Now the get operation. The original implementation of this simply
passed the contents as a buffer to the provided callback by calling
store.readBuffer(), but this is now changed to pass the resulting store
object to the callback. The value in this is that the caller can use the
store object to access metadata, contentType, and other details. The
user can also determine how they want to read the file (either into
memory or using a ReadableStream).
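The difference between the two reading styles can be sketched with Node's built-in stream module alone. This is only an illustration under assumptions: fakeFileStream is a hypothetical stand-in for the stream an opened file would provide, and the async iteration used here is a feature of current Node streams, not something the GridStore API of this post is known to offer.

```javascript
const { Readable } = require("stream");

// Hypothetical stand-in for an opened file's readable stream (not GridFS itself).
function fakeFileStream() {
  return Readable.from([Buffer.from("first "), Buffer.from("second "), Buffer.from("third")]);
}

// Option 1: collect every chunk, so the whole file sits in memory at once.
async function readIntoMemory(stream) {
  const chunks = [];
  for await (const chunk of stream) chunks.push(chunk);
  return Buffer.concat(chunks);
}

// Option 2: handle each chunk as it arrives and keep nothing afterwards.
async function consumeAsStream(stream, onChunk) {
  let total = 0;
  for await (const chunk of stream) {
    onChunk(chunk);
    total += chunk.length;
  }
  return total;
}

async function main() {
  const whole = await readIntoMemory(fakeFileStream());
  console.log("buffered bytes:", whole.length);
  const streamed = await consumeAsStream(fakeFileStream(), (chunk) =>
    console.log("chunk of", chunk.length, "bytes")
  );
  console.log("streamed bytes:", streamed);
}

main();
```

With the first option, memory use grows with the file size. With the second, it stays bounded by the chunk size, which is the property that matters for large downloads.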
    exports.get = (id, fn) ->
      db = mongoose.connection.db
      id = new ObjectID(id)
      store = new GridStore(db, id, "r", root: "fs")
      store.open (err, store) ->
        return fn(err) if err
        # band-aid
        if "#{store.filename}" == "#{store.fileId}" and store.metadata and store.metadata.filename
          store.filename = store.metadata.filename
        fn null, store
This code just has a small blight in that it checks to see if the
filename and fileId are equal. If they are, it then checks to see if
metadata.filename is set and sets store.filename to the value found
there. I’ve tabled the issue to investigate further later.
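The fallback rule is easy to isolate and check on its own. Below is a minimal sketch in plain JavaScript. resolveFilename is a hypothetical helper name, and the objects passed to it are made-up stand-ins that only assume the filename, fileId and metadata fields described above.

```javascript
// Hypothetical helper: picks the name to show to the user.
// `store` is any object shaped like the opened store described above
// (filename, fileId, optional metadata.filename).
function resolveFilename(store) {
  // Compare as strings, since the id may be an ObjectID rather than a string.
  const looksLikeId = String(store.filename) === String(store.fileId);
  if (looksLikeId && store.metadata && store.metadata.filename) {
    return store.metadata.filename;
  }
  return store.filename;
}

// Made-up sample values for illustration only.
console.log(resolveFilename({ filename: "abc123", fileId: "abc123", metadata: { filename: "taxes.pdf" } }));
console.log(resolveFilename({ filename: "taxes.pdf", fileId: "abc123", metadata: {} }));
```

Returning the name instead of overwriting store.filename keeps the workaround in one place, which should make it easier to delete once the underlying issue is understood.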
The Model
In my specific instance, I wanted to attach files to a model. In this
example, let’s pretend that we have an Application for something (a job,
a loan, etc.) that we can attach any number of files to. Think of tax
receipts, a completed application form, or other scanned documents.
    ApplicationSchema = new mongoose.Schema(
      name: String
      files: [ mongoose.Schema.Mixed ]
    )

    ApplicationSchema.methods.addFile = (file, options, fn) ->
      # gridfs is the module described above
      gridfs.putFile file.path, file.filename, options, (err, result) =>
        # Pass storage errors on instead of saving a broken entry
        return fn(err) if err
        @files.push result
        @save fn
Here I define files as an array of Mixed object types (meaning they
can be anything) and a method addFile which basically takes an object
that at least contains a path and filename attribute. It uses this to
save the file to gridfs and stores the resulting gridstore file object
in the files array (this contains stuff like an id, uploadDate,
contentType, name, size, etc).
Handling Requests
This all plugs in to the request handler to handle form submissions to /new.
All this entails is creating an Application model instance, adding the
uploaded file from the request (in this case we named the file field
“file”, hence req.files.file) and saving it.
    app.post "/new", (req, res) ->
      application = new Application()
      application.name = req.body.name
      opts = content_type: req.files.file.type
      application.addFile req.files.file, opts, (err, result) ->
        res.redirect "/"
Now the sum of all this work allows us to reap the rewards by making it super simple to download a requested file from gridFS.
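    app.get "/file/:id", (req, res, next) ->
      gridfs.get req.params.id, (err, file) ->
        # Bail out before touching file, which is undefined on error
        return next(err) if err
        # GridStore exposes the MIME type as contentType
        res.header "Content-Type", file.contentType
        res.header "Content-Disposition", "attachment; filename=#{file.filename}"
        file.stream(true).pipe(res)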
Here we simply look up a file by id and use the resulting file object
to set Content-Type and Content-Disposition fields and finally make use
of ReadableStream::pipe to write the file out to the
response object (which is an instance of WritableStream). This is the
piece of magic that streams data from MongoDB to the client side.
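The same header-then-pipe pattern can be tried without MongoDB or a web server. The sketch below is plain JavaScript using only Node's stream module. sendFile and fakeResponse are hypothetical names, the file object is a made-up stand-in, and using stream.pipeline (so that errors from either side reach a callback) and quoting the filename are additions that the handler above does not make.

```javascript
const { Readable, Writable, pipeline } = require("stream");

// Hypothetical helper: `file` needs contentType, filename and a stream() method;
// `res` needs setHeader() and must be writable (as http.ServerResponse is).
function sendFile(file, res, done) {
  res.setHeader("Content-Type", file.contentType || "application/octet-stream");
  // Quote the filename so names containing spaces survive in the header.
  const safeName = String(file.filename).replace(/["\\\r\n]/g, "_");
  res.setHeader("Content-Disposition", `attachment; filename="${safeName}"`);
  // pipeline() forwards chunk by chunk and reports errors from either stream.
  pipeline(file.stream(), res, done);
}

// Stand-in response so the sketch runs without an HTTP server.
function fakeResponse() {
  const res = new Writable({
    write(chunk, _encoding, callback) {
      res.body.push(Buffer.from(chunk));
      callback();
    },
  });
  res.headers = {};
  res.body = [];
  res.setHeader = (name, value) => {
    res.headers[name] = value;
  };
  return res;
}

// Made-up file object for illustration only.
const file = {
  contentType: "text/plain",
  filename: "my notes.txt",
  stream: () => Readable.from([Buffer.from("hello, "), Buffer.from("world")]),
};
const res = fakeResponse();
sendFile(file, res, (err) => {
  if (err) throw err;
  console.log(res.headers, Buffer.concat(res.body).toString());
});
```

The response only ever sees one chunk at a time. The fake response collects the chunks purely so the result can be printed.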
Ideas
This is just a humble beginning. Other ideas include completely
encapsulating gridfs within the model. Taking things further we could
even turn the gridfs model into a mongoose plugin to allow completely
blackboxed usage of gridfs.
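As a rough idea of what such a plugin could look like, here is a sketch in plain JavaScript. Everything in it is hypothetical: gridfsPlugin is not an existing package, and storage stands for anything with a putFile(path, name, options, callback) function, such as the gridfs module from this post. The fake schema and document exist only so the sketch runs without mongoose. With a real schema it would be registered through mongoose's schema.plugin(fn, options).

```javascript
// Hypothetical plugin: adds a `files` array and an addFile method to a schema.
function gridfsPlugin(schema, pluginOptions) {
  const storage = pluginOptions.storage;

  // An empty array type, which mongoose treats as an array of Mixed.
  schema.add({ files: [] });

  // Callers attach files through the model and never touch GridFS directly.
  schema.methods.addFile = function (file, options, fn) {
    storage.putFile(file.path, file.filename, options, (err, result) => {
      if (err) return fn(err);
      this.files.push(result);
      this.save(fn);
    });
  };
}

// Minimal stand-ins so the sketch runs without mongoose or MongoDB.
const fakeSchema = {
  fields: {},
  methods: {},
  add(definition) {
    Object.assign(this.fields, definition);
  },
};
const fakeStorage = {
  putFile(path, name, options, cb) {
    cb(null, { filename: name });
  },
};
gridfsPlugin(fakeSchema, { storage: fakeStorage });

const doc = {
  files: [],
  save(cb) {
    cb(null, this);
  },
  addFile: fakeSchema.methods.addFile,
};
doc.addFile({ path: "/tmp/receipt.pdf", filename: "receipt.pdf" }, {}, (err, saved) => {
  console.log(err, saved.files);
});
```

Passing the storage in as an option keeps the plugin independent of how the GridFS connection is set up, which also makes it easy to swap in a fake for tests.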
Feel free to check the project out and let me know if you have ideas to take it even further. Fork away!
Source: http://blog.james-carr.org/2012/01/09/streaming-files-from-mongodb-gridfs/