Aggregation Framework
Introduction to the Aggregation Framework and its pipeline operators for performing complex data transformations and analysis.
Understanding MongoDB's $lookup Stage
$lookup: Joining Data from Other Collections
The $lookup
aggregation pipeline stage in MongoDB is used to perform a left outer join to another collection in the same database to filter in documents to pass into the next stage in the pipeline. It adds a new array field to the input documents. This new array field contains the matching documents from the "joined" collection. Effectively, it allows you to enrich documents from one collection with data from another, based on specified matching criteria.
Think of it like an SQL LEFT OUTER JOIN
, where you want to retrieve all documents from the "left" table (the input documents to the $lookup
stage) and also any matching documents from the "right" table (the collection being joined). If there are no matching documents in the right table, the new array field will be an empty array.
Key Components of $lookup:
from
: The name of the collection to join with. This is the "right" collection in a left outer join analogy.localField
: The field from the input documents (the "left" collection) to use for the join. This is the field that will be compared to theforeignField
.foreignField
: The field from thefrom
collection (the "right" collection) to use for the join. This is the field that will be compared to thelocalField
.as
: The name of the new array field to add to the input documents. This field will contain the matching documents from thefrom
collection.
In its simplest form, the $lookup
stage takes the following structure:
{
$lookup: {
from: "<collection to join>",
localField: "<field from the input documents>",
foreignField: "<field from the 'from' collection>",
as: "<output array field>"
}
}
Utilizing the $lookup Stage for Left Outer Joins
Let's illustrate how to use $lookup
to perform a left outer join with a concrete example. Suppose we have two collections: products
and categories
. The products
collection contains information about individual products, and the categories
collection contains information about product categories. We want to retrieve all products and, if available, the corresponding category information for each product.
products Collection:
[
{ "_id": 1, "name": "Laptop", "category_id": 101 },
{ "_id": 2, "name": "Keyboard", "category_id": 102 },
{ "_id": 3, "name": "Mouse", "category_id": 102 },
{ "_id": 4, "name": "Tablet", "category_id": 101 },
{ "_id": 5, "name": "Headphones", "category_id": 103 }
]
categories Collection:
[
{ "_id": 101, "name": "Electronics" },
{ "_id": 102, "name": "Accessories" }
]
We can use the following aggregation pipeline with the $lookup
stage to join these collections and retrieve all products along with their category information:
db.products.aggregate([
{
$lookup: {
from: "categories",
localField: "category_id",
foreignField: "_id",
as: "category"
}
}
])
Explanation:
from: "categories"
: Specifies that we are joining with thecategories
collection.localField: "category_id"
: Indicates that we are using thecategory_id
field from theproducts
collection for the join.foreignField: "_id"
: Indicates that we are using the_id
field from thecategories
collection for the join.as: "category"
: Specifies that the matching category document(s) from thecategories
collection will be stored in a new field namedcategory
.
Result:
[
{ "_id": 1, "name": "Laptop", "category_id": 101, "category": [ { "_id": 101, "name": "Electronics" } ] },
{ "_id": 2, "name": "Keyboard", "category_id": 102, "category": [ { "_id": 102, "name": "Accessories" } ] },
{ "_id": 3, "name": "Mouse", "category_id": 102, "category": [ { "_id": 102, "name": "Accessories" } ] },
{ "_id": 4, "name": "Tablet", "category_id": 101, "category": [ { "_id": 101, "name": "Electronics" } ] },
{ "_id": 5, "name": "Headphones", "category_id": 103, "category": [ ] }
]
Notice that the category
field is an array. If a product's category_id
matches an _id
in the categories
collection, the corresponding category document will be included in the array. If there is no match (e.g., the "Headphones" product with category_id: 103
), the category
field will be an empty array ([]
). This demonstrates the left outer join behavior, where all documents from the "left" collection (products
) are included, regardless of whether there's a match in the "right" collection (categories
).