Data Modeling in MongoDB

Best practices for data modeling in MongoDB, including embedded documents, referencing documents, and choosing the appropriate model for different scenarios.


MongoDB Essentials: Document Referencing

Referencing Documents

In MongoDB, referencing involves storing the _id of one document within another document. This establishes a relationship between the two documents without physically embedding one inside the other. Referencing is crucial for building normalized data models and managing complex relationships between entities.

Exploring Document Referencing

Manual Referencing

Manual referencing is the most common approach. It simply involves storing the _id field (or other unique identifier) of the target document within the referencing document.

Example: A books collection and an authors collection. Each book references its author by storing the author's _id in the book document.

    // Author document
    {
        "_id": ObjectId("643e0e23b8e5a828f78a9b01"),
        "name": "Jane Austen",
        "bio": "English novelist known primarily for her six major novels..."
    }

    // Book document
    {
        "_id": ObjectId("643e0e23b8e5a828f78a9b02"),
        "title": "Pride and Prejudice",
        "author_id": ObjectId("643e0e23b8e5a828f78a9b01"),  // References the author
        "publication_year": 1813
    }

To retrieve the author's details when querying for a book, you would typically perform a second query using the author_id.

    // First query: fetch the book
    const book = db.books.findOne({ title: "Pride and Prejudice" });
    // Second query: use the author_id from the result to fetch the author
    const author = db.authors.findOne({ _id: book.author_id });
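When a query returns many books, issuing one author query per book adds up. A common alternative is to collect the distinct author_id values and fetch them in one round trip. The following is a minimal sketch of that idea in plain JavaScript, not a definitive implementation. The arrays stand in for collections and plain strings stand in for ObjectId values (both are assumptions for illustration). The fetchAuthorsByIds helper is hypothetical; against a real database it would be a single db.authors.find({ _id: { $in: ids } }) call.

```javascript
// In-memory stand-ins for the authors and books collections (illustrative data).
const authors = [
  { _id: "a1", name: "Jane Austen" },
  { _id: "a2", name: "Mary Shelley" },
];
const books = [
  { _id: "b1", title: "Pride and Prejudice", author_id: "a1" },
  { _id: "b2", title: "Emma", author_id: "a1" },
  { _id: "b3", title: "Frankenstein", author_id: "a2" },
];

// Hypothetical helper: stands in for db.authors.find({ _id: { $in: ids } }).
function fetchAuthorsByIds(ids) {
  return authors.filter((author) => ids.includes(author._id));
}

// Resolve every book's author with one lookup instead of one per book.
function attachAuthors(bookDocs) {
  const ids = [...new Set(bookDocs.map((book) => book.author_id))];
  const byId = new Map(fetchAuthorsByIds(ids).map((a) => [a._id, a]));
  return bookDocs.map((book) => ({
    ...book,
    author: byId.get(book.author_id) ?? null, // null when the reference dangles
  }));
}

const resolved = attachAuthors(books);
console.log(resolved.map((b) => `${b.title}: ${b.author.name}`));
```

With real ObjectId values, key the map on the string form of each _id, since two ObjectId instances holding the same value are still distinct objects and would not match as Map keys.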

DBRefs (Discouraged for new projects)

DBRefs are a standardized convention for representing references. A DBRef is a small embedded document with three fields, in this order: $ref (the collection name), $id (the _id of the referenced document), and an optional $db (the database name). Manual referencing is generally preferred because it is simpler and because not every MongoDB driver and tool resolves DBRefs automatically. DBRefs are mainly worth considering when documents in one collection must reference documents in several different collections.

Example: Using DBRef

    // Book document using DBRef
    {
        "_id": ObjectId("643e0e23b8e5a828f78a9b02"),
        "title": "Pride and Prejudice",
        "author": {
            "$ref": "authors",
            "$id": ObjectId("643e0e23b8e5a828f78a9b01"),
            "$db": "mydatabase"  // Optional: specify the database name
        },
        "publication_year": 1813
    }

Note: While DBRefs provide a structured format for referencing, they often require extra processing in application code to resolve the references. Modern MongoDB drivers and aggregation pipelines provide more efficient ways to handle relationships.
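The extra processing mentioned in the note amounts to reading $db and $ref to pick a collection, then looking up $id in it. Here is a rough sketch in plain JavaScript. Nested objects stand in for databases and collections, and strings stand in for ObjectId values (all assumptions for illustration). resolveDBRef is a hypothetical helper, not a driver API.

```javascript
// In-memory stand-in: database name -> collection name -> documents.
const databases = {
  mydatabase: {
    authors: [{ _id: "a1", name: "Jane Austen" }],
  },
};

const book = {
  _id: "b1",
  title: "Pride and Prejudice",
  author: { $ref: "authors", $id: "a1", $db: "mydatabase" },
};

// Hypothetical helper: follow a DBRef-shaped object to its target document.
// defaultDb is used when the optional $db field is absent.
function resolveDBRef(ref, defaultDb) {
  const dbName = ref.$db ?? defaultDb;
  const collection = databases[dbName]?.[ref.$ref] ?? [];
  // Against a real server this would be a findOne({ _id: ref.$id }) on that collection.
  return collection.find((doc) => doc._id === ref.$id) ?? null;
}

const author = resolveDBRef(book.author, "mydatabase");
console.log(author.name);
```

A manual reference skips the first half of this work, because the field name (author_id) already tells the application which collection to query.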

When Referencing is More Appropriate Than Embedding

Choosing between referencing and embedding depends on several factors:

  • Data Duplication: If embedding would lead to significant data duplication, referencing is usually better.
  • Cardinality: One-to-many or many-to-many relationships are often better suited for referencing. One-to-one relationships or one-to-few relationships where the related data is small and frequently accessed might be suitable for embedding.
  • Document Size: MongoDB limits a single document to 16 MB. If embedding could push a document past this limit, for example through an array that grows without bound, referencing is necessary.
  • Update Frequency: If the embedded data changes often, every change is a write to the parent document, and any copies embedded in other documents must be updated as well. Referencing lets you update the referenced document once, independently of the documents that point to it.
  • Data Relationships: When there are complex, deeply nested relationships that can change, referencing provides better flexibility.

In summary, reference when: you want to avoid duplicating data, relationships are many-to-many or one-to-many, updates to related data are frequent, document size is a concern, or you have complex, evolving data relationships.

Embed when: the relationship is one-to-one or one-to-few, related data is small and rarely changes, and you prioritize read performance over write performance.
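To make the duplication and update-frequency trade-off concrete, the following sketch counts how many documents must be written when an author's bio changes under each model. It uses in-memory arrays in plain JavaScript rather than a live database, and the sample data and helper names are illustrative assumptions.

```javascript
// Embedded model: each book carries its own copy of the author.
const embeddedBooks = [
  { title: "Pride and Prejudice", author: { name: "Jane Austen", bio: "old" } },
  { title: "Emma", author: { name: "Jane Austen", bio: "old" } },
  { title: "Persuasion", author: { name: "Jane Austen", bio: "old" } },
];

// Referenced model: books would store author_id "a1" and share this one document.
const authors = [{ _id: "a1", name: "Jane Austen", bio: "old" }];

// Update the bio in every embedded copy; returns the number of documents touched.
function updateBioEmbedded(books, name, bio) {
  let touched = 0;
  for (const book of books) {
    if (book.author.name === name) {
      book.author.bio = bio;
      touched += 1;
    }
  }
  return touched;
}

// Update the bio once in the authors collection; books are left alone.
function updateBioReferenced(authorDocs, id, bio) {
  const author = authorDocs.find((a) => a._id === id);
  if (!author) return 0;
  author.bio = bio;
  return 1;
}

const embeddedWrites = updateBioEmbedded(embeddedBooks, "Jane Austen", "new");
const referencedWrites = updateBioReferenced(authors, "a1", "new");
console.log({ embeddedWrites, referencedWrites }); // { embeddedWrites: 3, referencedWrites: 1 }
```

The embedded model pays for its single-read convenience with one write per copy, which is why it suits data that is small and rarely changes.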

How to Implement Relationships Between Documents

Implementing relationships using referencing typically involves the following steps:

  1. Design your data model: Identify the entities and their relationships (one-to-one, one-to-many, many-to-many).
  2. Create collections: Create a collection for each entity.
  3. Store references: In the referencing document, store the _id of the target document as a field. Use appropriate field names that clearly indicate the relationship (e.g., author_id, category_ids).
  4. Retrieve related data: Use multiple queries to retrieve related data, or use the $lookup stage of an aggregation pipeline to join data from multiple collections in a single query.
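For a many-to-many relationship, step 3 usually means storing an array of ids, such as the category_ids field named above. The following is a minimal in-memory sketch in plain JavaScript. The categories collection, the sample data, and the helper names are assumptions for illustration, and strings stand in for ObjectId values. The comments show the server-side queries each helper approximates.

```javascript
const categories = [
  { _id: "c1", name: "Romance" },
  { _id: "c2", name: "Classic" },
  { _id: "c3", name: "Gothic" },
];
const books = [
  { _id: "b1", title: "Pride and Prejudice", category_ids: ["c1", "c2"] },
  { _id: "b2", title: "Frankenstein", category_ids: ["c2", "c3"] },
];

// Book -> categories. Server-side equivalent:
//   db.categories.find({ _id: { $in: book.category_ids } })
function categoriesOf(book) {
  return categories.filter((c) => book.category_ids.includes(c._id));
}

// Category -> books. Server-side equivalent:
//   db.books.find({ category_ids: categoryId })
// (an equality match on an array field matches documents whose array contains the value)
function booksIn(categoryId) {
  return books.filter((b) => b.category_ids.includes(categoryId));
}

console.log(categoriesOf(books[0]).map((c) => c.name)); // [ 'Romance', 'Classic' ]
console.log(booksIn("c2").map((b) => b.title));
```

Storing the array on the book side keeps both directions queryable without a separate join collection, provided the array stays reasonably small.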

Example using Aggregation with $lookup:

    db.books.aggregate([
      {
        // Left outer join: adds matching authors as an array in "author"
        $lookup: {
          from: "authors",
          localField: "author_id",
          foreignField: "_id",
          as: "author"
        }
      },
      {
        // Optional: replace the one-element author array with the author document itself
        $unwind: "$author"
      }
    ])

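This aggregation pipeline joins the books collection with the authors collection by matching each book's author_id against the authors' _id field. $lookup performs a left outer join and stores the matching author documents as an array in the author field. $unwind then replaces that array with the author document itself, so each result is a book document with its author's details embedded under author. Note that with its default options $unwind drops books that have no matching author.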
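The shape of the result at each stage can be easier to see in ordinary code. The sketch below re-implements the two stages over in-memory arrays in plain JavaScript. It is an approximation for illustration, not how the server executes the pipeline, and the lookup and unwind helpers and the sample data are hypothetical.

```javascript
const authors = [{ _id: "a1", name: "Jane Austen" }];
const books = [
  { _id: "b1", title: "Pride and Prejudice", author_id: "a1" },
  { _id: "b2", title: "Anonymous Pamphlet", author_id: "a9" }, // no matching author
];

// Approximates $lookup: a left outer join that stores all matches in an array.
function lookup(docs, { from, localField, foreignField, as }) {
  return docs.map((doc) => ({
    ...doc,
    [as]: from.filter((other) => other[foreignField] === doc[localField]),
  }));
}

// Approximates $unwind with default options: one output document per array
// element, so documents whose array is empty are dropped.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    doc[field].map((element) => ({ ...doc, [field]: element }))
  );
}

const joined = lookup(books, {
  from: authors,
  localField: "author_id",
  foreignField: "_id",
  as: "author",
});
const flattened = unwind(joined, "author");

console.log(joined.map((b) => b.author.length)); // [ 1, 0 ]
console.log(flattened.map((b) => `${b.title} by ${b.author.name}`));
```

To keep books without a matching author in the real pipeline, the stage can be written as { $unwind: { path: "$author", preserveNullAndEmptyArrays: true } }.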

By mastering document referencing and aggregation, you can effectively model complex relationships in MongoDB and build performant applications.