Aggregation Framework

Introduction to the Aggregation Framework and its pipeline operators for performing complex data transformations and analysis.


Understanding MongoDB's $lookup Stage

$lookup: Joining Data from Other Collections

The $lookup aggregation pipeline stage in MongoDB is used to perform a left outer join to another collection in the same database to filter in documents to pass into the next stage in the pipeline. It adds a new array field to the input documents. This new array field contains the matching documents from the "joined" collection. Effectively, it allows you to enrich documents from one collection with data from another, based on specified matching criteria.

Think of it like an SQL LEFT OUTER JOIN, where you want to retrieve all documents from the "left" table (the input documents to the $lookup stage) and also any matching documents from the "right" table (the collection being joined). If there are no matching documents in the right table, the new array field will be an empty array.

Key Components of $lookup:

  • from: The name of the collection to join with. This is the "right" collection in a left outer join analogy.
  • localField: The field from the input documents (the "left" collection) to use for the join. This is the field that will be compared to the foreignField.
  • foreignField: The field from the from collection (the "right" collection) to use for the join. This is the field that will be compared to the localField.
  • as: The name of the new array field to add to the input documents. This field will contain the matching documents from the from collection.

In its simplest form, the $lookup stage takes the following structure:

{
    $lookup: {
    from: "<collection to join>",
    localField: "<field from the input documents>",
    foreignField: "<field from the 'from' collection>",
    as: "<output array field>"
}
}

Utilizing the $lookup Stage for Left Outer Joins

Let's illustrate how to use $lookup to perform a left outer join with a concrete example. Suppose we have two collections: products and categories. The products collection contains information about individual products, and the categories collection contains information about product categories. We want to retrieve all products and, if available, the corresponding category information for each product.

products Collection:

[
    { "_id": 1, "name": "Laptop", "category_id": 101 },
    { "_id": 2, "name": "Keyboard", "category_id": 102 },
    { "_id": 3, "name": "Mouse", "category_id": 102 },
    { "_id": 4, "name": "Tablet", "category_id": 101 },
    { "_id": 5, "name": "Headphones", "category_id": 103 }
]

categories Collection:

[
    { "_id": 101, "name": "Electronics" },
    { "_id": 102, "name": "Accessories" }
]

We can use the following aggregation pipeline with the $lookup stage to join these collections and retrieve all products along with their category information:

db.products.aggregate([
    {
    $lookup: {
        from: "categories",
        localField: "category_id",
        foreignField: "_id",
        as: "category"
    }
}
])

Explanation:

  • from: "categories": Specifies that we are joining with the categories collection.
  • localField: "category_id": Indicates that we are using the category_id field from the products collection for the join.
  • foreignField: "_id": Indicates that we are using the _id field from the categories collection for the join.
  • as: "category": Specifies that the matching category document(s) from the categories collection will be stored in a new field named category.

Result:

[
    { "_id": 1, "name": "Laptop", "category_id": 101, "category": [ { "_id": 101, "name": "Electronics" } ] },
    { "_id": 2, "name": "Keyboard", "category_id": 102, "category": [ { "_id": 102, "name": "Accessories" } ] },
    { "_id": 3, "name": "Mouse", "category_id": 102, "category": [ { "_id": 102, "name": "Accessories" } ] },
    { "_id": 4, "name": "Tablet", "category_id": 101, "category": [ { "_id": 101, "name": "Electronics" } ] },
    { "_id": 5, "name": "Headphones", "category_id": 103, "category": [ ] }
]

Notice that the category field is an array. If a product's category_id matches an _id in the categories collection, the corresponding category document will be included in the array. If there is no match (e.g., the "Headphones" product with category_id: 103), the category field will be an empty array ([]). This demonstrates the left outer join behavior, where all documents from the "left" collection (products) are included, regardless of whether there's a match in the "right" collection (categories).