Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Introduction To MongoDB Aggregation Pipeline

DZone's Guide to

Introduction To MongoDB Aggregation Pipeline

· Java Zone
Free Resource

Learn how to troubleshoot and diagnose some of the most common performance issues in Java today. Brought to you in partnership with AppDynamics.

MongoDB Aggregation pipeline is a framework for data aggregation. It is modelled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated results. It was introduced in MongoDB 2.2 to do aggregation operations without needing to use map-reduce.

Aggregation Pipeline

  • The $match and $sort pipeline operators can take advantage of an index when they occur at the beginning of the pipeline [Reference].
  • There are no restrictions on result size as a cursor is returned [Reference].
  • The output can be returned inline or written to a collection [Reference].
  • Pipeline stages have a limit of 100MB of RAM. To handle large datasets use allowDiskUse option [Reference].
  • Aggregation Pipeline have an optimization phase which attempts to reshape the pipeline for improved performance [Reference].

For most aggregation operations, the Aggregation Pipeline provides better performance and more coherent interface. However, map-reduce operations provide some flexibility that is presently not available in the aggregation pipeline.

The syntax for aggregation pipeline is

db.collection.aggregate( [ { <stage> }, ... ] )

Stages

The MongoDB aggregation pipeline consists of stages. Each stage transforms the documents as they pass through the pipeline. Pipeline stages do not need to produce one output document for every input document; e.g., some stages may generate new documents or filter out documents. Pipeline stages can appear multiple times in the pipeline.

Various stage operators supported by MongoDB are listed below-

Name Description
$geoNear Returns an ordered stream of documents based on the proximity to a geospatial point. Incorporates the functionality of $match, $sort, and $limit for geospatial data. The output documents include an additional distance field and can include a location identifier field.
$group Groups input documents by a specified identifier expression and applies the accumulator expression(s), if specified, to each group. Consumes all input documents and outputs one document per each distinct group. The output documents only contain the identifier field and, if specified, accumulated fields.
$limit Passes the first n documents unmodified to the pipeline where n is the specified limit. For each input document, outputs either one document (for the first n documents) or zero documents (after the first n documents).
$match Filters the document stream to allow only matching documents to pass unmodified into the next pipeline stage. $match uses standard MongoDB queries. For each input document, outputs either one document (a match) or zero documents (no match).
$out Writes the resulting documents of the aggregation pipeline to a collection. To use the $out stage, it must be the last stage in the pipeline.
$project Reshapes each document in the stream, such as by adding new fields or removing existing fields. For each input document, outputs one document.
$redact Reshapes each document in the stream by restricting the content for each document based on information stored in the documents themselves. Incorporates the functionality of $project and $match. Can be used to implement field level redaction. For each input document, outputs either one or zero document.
$skip Skips the first n documents where n is the specified skip number and passes the remaining documents unmodified to the pipeline. For each input document, outputs either zero documents (for the first n documents) or one document (if after the first n documents).
$sort Reorders the document stream by a specified sort key. Only the order changes; the documents remain unmodified. For each input document, outputs one document.
$unwind Deconstructs an array field from the input documents to output a document for each element. Each output document replaces the array with an element value. For each input document, outputs n documents where n is the number of array elements and can be zero for an empty array.

Different expressions supported by MongoDB are listed here.


Understand the needs and benefits around implementing the right monitoring solution for a growing containerized market. Brought to you in partnership with AppDynamics.

Topics:

Published at DZone with permission of Rishav Rohit, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}