Skip to main content

Aggregation Pipelines

MongoDB provides some advanced operators to query data, but if you want to build complex queries, similar to JOIN and GROUP BY in SQL, you need to use the aggregation pipeline framework.

Intro

The aggregation pipeline is a framework for data aggregation modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated result.

The most basic pipeline stages provide filters that operate like queries and document transformations that modify the form of the output document.

Other pipeline operations provide tools for grouping and sorting documents by specific field or fields as well as tools for aggregating the contents of arrays, including arrays of documents. The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB.

Pipeline Stages

A pipeline consists of a sequence of stages. Each stage transforms the documents as they pass through the pipeline. Pipeline stages do not need to produce one output document for every input document; e.g., some stages may generate new documents or filter out documents.

Pipeline stages can appear multiple times in the pipeline. Certain pipeline operators can also appear multiple times: for example, you can have multiple $match stages in a pipeline.

The documents travel through the stages in sequence. The first stage receives the entire input and the last stage emits the entire output as documents.

Some of the common pipeline stages are:

  • $match - filters documents
  • $group - groups documents by some expression
  • $sort - sorts documents
  • $limit - limits the number of documents

Building aggregation pipelines in the Atlas UI

You can build aggregation pipelines in the Atlas UI using the Aggregation Pipeline Builder. The Aggregation Pipeline Builder provides a graphical interface for building aggregation pipelines.

In the Atlas UI, go to the Database navigation link. You will see the cluster you created earlier. In the click Browse Collections.

This will bring you to the collections view. Click on the players collection to see the matching documents.

note

Just above the documents, you will see an input box that says Type a query. In here, you can type in a query exactly like you used with the find method.

Click on the Aggregation tab at the top. This will bring you to the Aggregation Pipeline Builder.

From here, let's create a simple aggregation that will return the name, and overall score of the top three Argentinian players.

First add a stage to $match all players from Argentina.

{
nationality_name: "Argentina"
}

Next use a $sort stage to sort the players by their overall score.

{
overall: -1
}

Use a $limit stage to limit the results to the top three players.

3

Finally, use a $project stage to return only the name and overall score.

{
long_name: 1,
overall: 1
}

This will return a new collection of documents with the top three Argentinian players.