
Aggregation Framework Example

by Ben Wen · Aug. 07, 12 · Interview

(also posted to the 10gen blog here)

In this blog post, you'll run a concise set of aggregation framework examples in the mongo JavaScript shell against a MongoLab-hosted 2.1 database. The framework includes the aggregation operators $project, $unwind, $group, and others. These operators let you calculate values across the documents in a collection, such as sums and averages. They also let you reshape documents, unpacking nested structures and regrouping them as needed.

The aggregation framework, one of the most powerful and highly anticipated features of the forthcoming production MongoDB 2.2 release, lets you construct a server-side processing pipeline to run on a collection. A rich set of operations is available for building the pipeline, supporting collection transforms that range from simple multi-document calculations (e.g., sums and averages) to complex projections and pivots.
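To make the pipeline idea concrete, here is a minimal plain-JavaScript sketch; it is an illustration of the concept, not MongoDB's implementation. Each stage is modeled as a function that takes an array of documents and returns a new array, and the pipeline feeds each stage's output into the next. The helper runPipeline, the two stage functions, and the input documents are all hypothetical.

```javascript
// Minimal sketch: a pipeline is an ordered list of stages, where each stage
// is a function from an array of documents to a new array of documents.
// Plain JavaScript for illustration only; not the MongoDB server's code.
function runPipeline(docs, stages) {
  // Feed the output of each stage into the next one, starting from the input.
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical stages, for illustration only.
const onlyPopular = (docs) => docs.filter((d) => d.pageViews >= 5); // drop documents
const titlesOnly = (docs) => docs.map((d) => ({ title: d.title })); // reshape documents

// Hypothetical input documents.
const docs = [
  { title: "a", pageViews: 5 },
  { title: "b", pageViews: 2 },
  { title: "c", pageViews: 9 },
];

const out = runPipeline(docs, [onlyPopular, titlesOnly]);
console.log(out); // [ { title: 'a' }, { title: 'c' } ]
```

Because stages only see the previous stage's output, their order matters: swapping the two stages above would fail to filter, since titlesOnly discards the pageViews field.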

The framework fits nicely into the range of data manipulation tools available in MongoDB, from basic built-in functions such as document counts, to map-reduce and JavaScript, to custom code and language-specific packages, including Hadoop.

Overview

  1. Create a 2.1 MongoLab database with your own unique name, say <myaggdemo>.  Instructions on how to do that are here. You’ll need your mongod username and password.
  2. On your database’s home page, copy the mongo shell connection to your clipboard.
  3. git clone git://gist.github.com/1401585.git aggdemo ; cd aggdemo
  4. Edit articles.js and aggregation.js to use your db <myaggdemo> (see below)
  5. mongo <your connection> -u <mongod username> -p <mongod password> articles.js  (inserts the data into your database, 3 documents)
  6. mongo --shell <your connection> -u <mongod username> -p <mongod password> aggregation.js (performs several aggregation examples and leaves you in the mongo shell.)
  7. Type g1 in the mongo shell to see the first $group result discussed below.

(I’ve tested this to work with the production 2.0.6 mongo client, and the latest development 2.1.2 mongo client.)

Code snippets

articles.js

/* sample articles for aggregation demonstrations */
 
// make sure we're using the right db; this is the same as "use aggdb;" in shell
db = db.getSiblingDB("aggdb"); //Put your MongoLab database name here.
db.article.drop();
 
db.article.save( {
    title : "this is my title" ,
    author : "bob" ,
    posted : new Date(1079895594000) ,
    pageViews : 5 ,
    tags : [ "fun" , "good" , "fun" ] ,
    comments : [
        { author :"joe" , text : "this is cool" } ,
        { author :"sam" , text : "this is bad" }
    ],
    other : { foo : 5 }
});
//...snip

aggregation.js

// make sure we're using the right db; this is the same as "use aggdb;" in shell
db = db.getSiblingDB("aggdb"); //Put your MongoLab database name here.
// ...snip...
// grouping
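var g1 = db.runCommand(
    { aggregate : "article", pipeline : [
        // keep only the fields later stages need (_id is kept by default)
        { $project : {
            author : 1,
            tags : 1,
            pageViews : 1
        }},
        // emit one document per element of the tags array
        { $unwind : "$tags" },
        // one output document per distinct tag, with per-tag statistics
        { $group : {
            _id : "$tags",
            docsByTag : { $sum : 1 },
            viewsByTag : { $sum : "$pageViews" },
            mostViewsByTag : { $max : "$pageViews" },
            avgByTag : { $avg : "$pageViews" }
        }}
    ]});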
// ...snip

 g1 aggregation result

{
    "result" : [
//...snip...
        {
            "_id" : "fun",
            "docsByTag" : 3,
            "viewsByTag" : 17,
            "mostViewsByTag" : 7,
            "avgByTag" : 5.666666666666667
        }
    ],
    "ok" : 1
}
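As a quick arithmetic check on the "fun" row: $avg divides a group's sum by its count. Because $unwind emits the first article's "fun" tag twice, that article's 5 page views enter the sum twice; with docsByTag at 3, the one remaining "fun" row must carry 17 - 10 = 7 page views, which is consistent with mostViewsByTag.

```latex
\begin{aligned}
\text{viewsByTag} &= 5 + 5 + 7 = 17 \\
\text{avgByTag} &= \frac{\text{viewsByTag}}{\text{docsByTag}} = \frac{17}{3} \approx 5.667
\end{aligned}
```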

 

  • Props to Chris Westin, 10gen architect for the aggregation framework, for providing these examples.
  • See also his presentation here.

Discussion

The results of each aggregation are saved to variables for convenient examination. The group operations (g1 and g5) at the end of the aggregation.js file are noteworthy because they roll up three operators into a single pivot-and-aggregate example. The g1 data flow is shown in the accompanying diagram (a larger .png version and a .pdf version are also available).

  1. Collection -> Intermediate-1: Starting from the initial collection of documents, g1 uses $project to keep only the author, tags, and pageViews fields of each document (plus _id, which $project keeps by default). The output is shown in Intermediate-1.
  2. Intermediate-1 -> Intermediate-2: g1 then $unwinds Intermediate-1 on the embedded tags array, so that each tag instance becomes its own document; the output is shown in Intermediate-2.
  3. Intermediate-2 -> Result: Finally, g1 uses the $group operator to produce one document per distinct tag, calculating statistics such as the document count and the total, maximum, and average page views, shown as Result.
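The three steps above can be mimicked in plain JavaScript to see what each intermediate form contains. This is a sketch of the stage semantics under simplifying assumptions, not the server's implementation: it uses only the sample article shown in articles.js (the other sample articles are snipped), and the _id value 1 and the helper names project, unwind, and groupByTag are hypothetical.

```javascript
// Plain-JavaScript simulation of g1 ($project -> $unwind -> $group) on the
// single sample article from articles.js. Illustration only.
const articles = [
  {
    _id: 1, // hypothetical; MongoDB would assign an ObjectId on save()
    title: "this is my title",
    author: "bob",
    posted: new Date(1079895594000),
    pageViews: 5,
    tags: ["fun", "good", "fun"],
    comments: [
      { author: "joe", text: "this is cool" },
      { author: "sam", text: "this is bad" },
    ],
    other: { foo: 5 },
  },
];

// $project-like: keep _id plus the listed fields, dropping everything else.
function project(docs, fields) {
  return docs.map((d) => {
    const out = { _id: d._id };
    for (const f of fields) if (f in d) out[f] = d[f];
    return out;
  });
}

// $unwind-like: emit one copy of each document per element of an array field
// (assumes the field holds an array).
function unwind(docs, field) {
  return docs.flatMap((d) => d[field].map((v) => ({ ...d, [field]: v })));
}

// $group-like on the unwound tag: $sum of 1, $sum/$max/$avg of pageViews.
function groupByTag(docs) {
  const groups = new Map();
  for (const d of docs) {
    const g = groups.get(d.tags) ??
      { _id: d.tags, docsByTag: 0, viewsByTag: 0, mostViewsByTag: -Infinity };
    g.docsByTag += 1;
    g.viewsByTag += d.pageViews;
    g.mostViewsByTag = Math.max(g.mostViewsByTag, d.pageViews);
    groups.set(d.tags, g);
  }
  // $avg is the group's sum divided by the number of documents in it.
  return [...groups.values()].map((g) => ({ ...g, avgByTag: g.viewsByTag / g.docsByTag }));
}

const intermediate1 = project(articles, ["author", "tags", "pageViews"]);
const intermediate2 = unwind(intermediate1, "tags"); // three documents: fun, good, fun
const result = groupByTag(intermediate2);
console.log(result);
```

For this one article, the "fun" entry comes out with docsByTag 2, viewsByTag 10, mostViewsByTag 5, and avgByTag 5, and the "good" entry with 1, 5, 5, and 5; the doubled "fun" count is exactly the effect described in the NB below.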

(Note that both Intermediate forms are internal to the processing engine and are not directly visible in the shell; the output corresponding to Intermediate-2 is, however, available as example p2.)

For another example, look at g5. It also pivots on the embedded tag arrays, but this time it rolls up authors into embedded arrays using $addToSet, essentially completing the pivot.
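A rough plain-JavaScript sketch of the $addToSet idea follows; the actual g5 pipeline lives in aggregation.js and may differ. Starting from documents already unwound on tags (the shape of Intermediate-2), it groups by tag and collects each distinct author once. The "bob" rows come from the sample article, while the "ann" row, the output field name authors, and the helper groupAuthorsByTag are hypothetical.

```javascript
// Sketch of an $addToSet-style pivot: group unwound documents by tag and
// collect the distinct authors for each tag. Illustration only.
const unwound = [
  { author: "bob", tags: "fun" },
  { author: "bob", tags: "good" },
  { author: "bob", tags: "fun" },
  { author: "ann", tags: "fun" }, // hypothetical row, not in articles.js
];

function groupAuthorsByTag(docs) {
  const groups = new Map();
  for (const d of docs) {
    if (!groups.has(d.tags)) groups.set(d.tags, new Set());
    // Like $addToSet, adding a value that is already present has no effect.
    groups.get(d.tags).add(d.author);
  }
  // A JS Set keeps insertion order; MongoDB does not promise any order
  // for the elements of an $addToSet array.
  return [...groups].map(([tag, authors]) => ({ _id: tag, authors: [...authors] }));
}

console.log(groupAuthorsByTag(unwound));
```

Note that "bob" appears only once under "fun" even though the sample article contributes two "fun" rows, which is the set semantics at work.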

NB: There’s a slight bug in the design of the g1 aggregation. The first article has the “fun” tag twice. I chose it intentionally because it shows how $unwind duplicates “fun” in the Intermediate-2 output for the first document, which means that document is counted twice in the “fun” aggregates (both docsByTag and the page-view statistics). A free MongoLab T-shirt goes to the first person who can correct the code to calculate the aggregates properly. Post your answer in the comments. (@cwestin63, you’re disqualified; you get a T-shirt anyway.)
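One possible way to avoid the double count, sketched here in plain JavaScript rather than as a tested pipeline, is to collapse duplicate (document, tag) pairs before aggregating per tag. In aggregation-framework terms, the analogous idea would be a first $group keyed on both the document _id and the tag, followed by a second $group on the tag alone; that variant has not been run against 2.1 here. The _id value 1 and the helper statsByDistinctTag are hypothetical.

```javascript
// Sketch: de-duplicate (document _id, tag) pairs, then aggregate per tag.
// Illustration only; not a tested MongoDB pipeline.

// Intermediate-2 for the sample article: "fun" appears twice for _id 1.
const unwound = [
  { _id: 1, tags: "fun", pageViews: 5 },
  { _id: 1, tags: "good", pageViews: 5 },
  { _id: 1, tags: "fun", pageViews: 5 },
];

function statsByDistinctTag(docs) {
  // Step 1: keep only the first row for each (document _id, tag) pair.
  const seen = new Map();
  for (const d of docs) {
    const key = JSON.stringify([d._id, d.tags]);
    if (!seen.has(key)) seen.set(key, d);
  }
  // Step 2: group the de-duplicated rows by tag.
  const groups = new Map();
  for (const d of seen.values()) {
    const g = groups.get(d.tags) ?? { _id: d.tags, docsByTag: 0, viewsByTag: 0 };
    g.docsByTag += 1;
    g.viewsByTag += d.pageViews;
    groups.set(d.tags, g);
  }
  return [...groups.values()].map((g) => ({ ...g, avgByTag: g.viewsByTag / g.docsByTag }));
}

console.log(statsByDistinctTag(unwound));
```

With the duplicate removed, the sample article counts once toward “fun” (docsByTag 1, viewsByTag 5) instead of twice.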

Summary

The MongoDB 2.1 Aggregation Framework is a powerful mechanism that can help you answer questions across documents. It will become production-ready in the upcoming 2.2 release, and you can try it out with minimal risk by using the MongoLab hosted experimental service. Happy aggregating!

(Update 2012-07-10 untabify indentation in aggregation.js for proper formatting. 2012-07-11 Re-arranged images.)


Published at DZone with permission of Ben Wen, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
