Aggregation Framework Example
(also posted to the 10gen blog here)
In this blog post, you run a concise set of aggregation framework examples in the mongo JavaScript shell against a MongoLab-hosted 2.1 database. The framework includes the aggregation operators $project, $unwind, $group, and others. These operators let you calculate values across the documents in a collection, such as averages and sums. They also let you reshape documents, unpacking nested structures and regrouping them as needed.
The aggregation framework, one of the most powerful and highly anticipated features in the forthcoming production MongoDB 2.2 release, lets you construct a server-side processing pipeline to be run on a collection. A rich set of operations is available for use in the pipeline to achieve various kinds of collection transforms, ranging from simple multi-document calculations (e.g., sums and averages) to complex projections and pivots.
The framework fits nicely into the range of data manipulation tools available in MongoDB, from basic built-in functions like document counts, to map-reduce and JavaScript, to custom code and language-specific packages, including Hadoop.
Overview
- Create a 2.1 MongoLab database with your own unique name, say <myaggdemo>. Instructions on how to do that are here. You’ll need your mongod username and password.
- On your database’s home page, copy the mongo shell connection to your clipboard.
- git clone git://gist.github.com/1401585.git aggdemo ; cd aggdemo
- Edit articles.js and aggregation.js to use your db <myaggdemo> (see below)
- mongo <your connection> -u <mongod username> -p <mongod password> articles.js (inserts the data into your database, 3 documents)
- mongo --shell <your connection> -u <mongod username> -p <mongod password> aggregation.js (performs several aggregation examples and leaves you in the mongo shell.)
- Type g1 in the mongo shell to see the first $group result discussed below.
(I’ve confirmed that this works with the production 2.0.6 mongo client and the latest development 2.1.2 mongo client.)
Code snippets
articles.js
/* sample articles for aggregation demonstrations */

// make sure we're using the right db; this is the same as "use aggdb;" in shell
db = db.getSiblingDB("aggdb"); // Put your MongoLab database name here.

db.article.drop();

db.article.save( {
    title : "this is my title" ,
    author : "bob" ,
    posted : new Date(1079895594000) ,
    pageViews : 5 ,
    tags : [ "fun" , "good" , "fun" ] ,
    comments : [
        { author : "joe" , text : "this is cool" } ,
        { author : "sam" , text : "this is bad" }
    ],
    other : { foo : 5 }
});

//...snip
aggregation.js
// make sure we're using the right db; this is the same as "use aggdb;" in shell
db = db.getSiblingDB("aggdb"); // Put your MongoLab database name here.

// ...snip...

// grouping
var g1 = db.runCommand(
{ aggregate : "article", pipeline : [
    { $project : {
        author : 1,
        tags : 1,
        pageViews : 1
    }},
    { $unwind : "$tags" },
    { $group : {
        _id : "$tags",
        docsByTag : { $sum : 1 },
        viewsByTag : { $sum : "$pageViews" },
        mostViewsByTag : { $max : "$pageViews" },
        avgByTag : { $avg : "$pageViews" }
    }}
]});

// ...snip
g1 aggregation result
{
    "result" : [
        //...snip...
        {
            "_id" : "fun",
            "docsByTag" : 3,
            "viewsByTag" : 17,
            "mostViewsByTag" : 7,
            "avgByTag" : 5.666666666666667
        }
    ],
    "ok" : 1
}
- Props to Chris Westin, 10gen architect for the aggregation framework, for providing these examples.
- See also his presentation here.
Discussion
The results of the aggregation are saved to convenient variables for examination. The group operations (g1 and g5) at the end of the aggregation.js file are noteworthy because they roll up three operators into a common pivot and aggregation example. The g1 data flow is shown above. Click it for a larger .png version or here for a .pdf version.
- Collection -> Intermediate-1: First, using the initial Collection of documents as input, g1 uses $project to restrict each document to the author, tags, and pageViews fields. The output is shown in Intermediate-1.
- Intermediate-1 -> Intermediate-2: Then g1 $unwinds Intermediate-1 by the embedded tags array so that each tag instance becomes its own document, with the output shown in Intermediate-2.
- Intermediate-2 -> Result: Then g1 uses the $group operator to produce one document per distinct tag, calculating statistics like total and average page views, shown as Result.
(Note that both Intermediate forms are internal to the processing engine and are not visible to the shell directly; Intermediate-2 is actually shown as example p2.)
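If it helps to see the three stages without a server, here is a rough in-memory sketch of the same data flow in plain JavaScript. It is not how MongoDB implements the pipeline; the helper names (project, unwind, groupByTag) are made up for illustration, and the second document is a hypothetical stand-in for the snipped ones, chosen only so the "fun" numbers line up with the result shown earlier.

```javascript
// In-memory sketch of the g1 data flow. No MongoDB connection is involved.
// Document 1 comes from articles.js; document 2 is HYPOTHETICAL.
const collection = [
  { _id: 1, title: "this is my title", author: "bob", pageViews: 5,
    tags: ["fun", "good", "fun"] },
  { _id: 2, title: "hypothetical second article", author: "alice", pageViews: 7,
    tags: ["fun"] }
];

// Rough analogue of $project: keep only the listed fields.
function project(docs, fields) {
  return docs.map(doc => {
    const out = {};
    for (const field of fields) out[field] = doc[field];
    return out;
  });
}

// Rough analogue of $unwind: emit one document per array element.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    for (const value of doc[field]) {
      out.push(Object.assign({}, doc, { [field]: value }));
    }
  }
  return out;
}

// Rough analogue of the $group stage in g1: one output document per tag.
function groupByTag(docs) {
  const groups = new Map();
  for (const doc of docs) {
    let g = groups.get(doc.tags);
    if (!g) {
      g = { _id: doc.tags, docsByTag: 0, viewsByTag: 0, mostViewsByTag: -Infinity };
      groups.set(doc.tags, g);
    }
    g.docsByTag += 1;                                              // $sum : 1
    g.viewsByTag += doc.pageViews;                                 // $sum : "$pageViews"
    g.mostViewsByTag = Math.max(g.mostViewsByTag, doc.pageViews);  // $max
  }
  return [...groups.values()].map(g =>
    Object.assign({}, g, { avgByTag: g.viewsByTag / g.docsByTag })); // $avg
}

// The real $project also keeps _id by default, so it is listed here too.
const intermediate1 = project(collection, ["_id", "author", "tags", "pageViews"]);
const intermediate2 = unwind(intermediate1, "tags");
const g1Result = groupByTag(intermediate2);

console.log(g1Result);
// "fun": docsByTag 3, viewsByTag 17, mostViewsByTag 7, avgByTag 5.666666666666667
```

Note that intermediate2 holds four documents, two of them for the "fun" tag of the first article, which is why that article's page views show up twice in the "fun" totals.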
For another example, you can look at g5. It also pivots on the embedded tag arrays but this time rolls up authors as embedded arrays using $addToSet, essentially completing the pivot.
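The g5 pipeline itself is in aggregation.js and is not reproduced here. As a rough illustration of the idea only, the sketch below pivots tags to authors in plain JavaScript, using a Set to stand in for $addToSet. The function name authorsByTag and the second author are hypothetical; only "bob" and his tags come from articles.js.

```javascript
// In-memory sketch of a tag -> authors pivot, in the spirit of g5.
// This is NOT the g5 code from aggregation.js.
const articles = [
  { author: "bob", tags: ["fun", "good", "fun"] },  // from articles.js
  { author: "alice", tags: ["fun"] }                // HYPOTHETICAL document
];

function authorsByTag(docs) {
  const byTag = new Map();
  for (const doc of docs) {
    for (const tag of doc.tags) {            // plays the role of $unwind
      if (!byTag.has(tag)) byTag.set(tag, new Set());
      byTag.get(tag).add(doc.author);        // plays the role of $addToSet
    }
  }
  // One output document per tag, with its authors as an embedded array.
  return [...byTag].map(([tag, authors]) => ({ _id: tag, authors: [...authors] }));
}

const pivot = authorsByTag(articles);
console.log(JSON.stringify(pivot));
```

Because a set ignores repeats, bob appears once under "fun" even though his article carries that tag twice. The order of elements in a real $addToSet array is not something to rely on.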
NB: There’s a slight bug in the design of the g1 aggregation. The first object has the “fun” tag twice. I intentionally chose this one as it shows how the $unwind duplicates “fun” in the Intermediate-2 output for the first document, meaning that its aggregates are counted twice. A free MongoLab T-shirt to the first person who can correct the code to properly calculate the aggregates. Enter in the comments. (@cwestin63, you’re disqualified; you get a T-shirt anyway)
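One possible way to approach the fix, offered as a sketch rather than the official answer: collapse the unwound documents to one per (article, tag) pair before computing the statistics. The pipeline below expresses that with an extra $group stage keyed on the article _id and the tag. I have not run it against a 2.1 server, so treat the stage layout as an assumption. The in-memory function after it checks the arithmetic on the first document from articles.js plus one hypothetical document.

```javascript
// Candidate pipeline (an ASSUMPTION, untested against a 2.1 server):
// the first $group leaves one document per (article, tag) pair, so a tag
// repeated inside one article is counted once by the second $group.
const dedupedPipeline = [
  { $project: { tags: 1, pageViews: 1 } },  // _id is kept by default
  { $unwind: "$tags" },
  { $group: {
      _id: { article: "$_id", tag: "$tags" },
      pageViews: { $max: "$pageViews" }     // duplicates share the same value
  }},
  { $group: {
      _id: "$_id.tag",
      docsByTag: { $sum: 1 },
      viewsByTag: { $sum: "$pageViews" },
      mostViewsByTag: { $max: "$pageViews" },
      avgByTag: { $avg: "$pageViews" }
  }}
];

// In-memory check of the same idea. Document 2 is HYPOTHETICAL.
const sampleDocs = [
  { _id: 1, pageViews: 5, tags: ["fun", "good", "fun"] },  // from articles.js
  { _id: 2, pageViews: 7, tags: ["fun"] }
];

function statsByTagDeduped(docs) {
  const seenPairs = new Set();
  const groups = new Map();
  for (const doc of docs) {
    for (const tag of doc.tags) {
      const pairKey = JSON.stringify([doc._id, tag]);
      if (seenPairs.has(pairKey)) continue;  // skip a repeated (article, tag)
      seenPairs.add(pairKey);
      let g = groups.get(tag);
      if (!g) {
        g = { _id: tag, docsByTag: 0, viewsByTag: 0, mostViewsByTag: -Infinity };
        groups.set(tag, g);
      }
      g.docsByTag += 1;
      g.viewsByTag += doc.pageViews;
      g.mostViewsByTag = Math.max(g.mostViewsByTag, doc.pageViews);
    }
  }
  return [...groups.values()].map(g =>
    Object.assign({}, g, { avgByTag: g.viewsByTag / g.docsByTag }));
}

const dedupedResult = statsByTagDeduped(sampleDocs);
console.log(dedupedResult);
// "fun": docsByTag 2, viewsByTag 12, mostViewsByTag 7, avgByTag 6
```

With this sample data the "fun" tag now counts two articles and 12 page views instead of three and 17.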
Summary
The MongoDB 2.1 Aggregation Framework is a powerful mechanism that can help you answer questions across documents. It will become production-ready in the upcoming 2.2 release, and you can try it out with minimal risk by using the MongoLab hosted experimental service. Happy aggregating!
(Update 2012-07-10 untabify indentation in aggregation.js for proper formatting. 2012-07-11 Re-arranged images.)
Published at DZone with permission of Ben Wen, DZone MVB. See the original article here.