🦸‍♂️ Hybrid Search Exercise

Info

This is an extra activity. Do it if you have spare time or are following along at home; it won't be covered during the hands-on lab.

One of the powerful features of MongoDB Atlas is the ability to combine vector search with traditional text search, creating a hybrid search solution. This allows us to leverage both semantic understanding and keyword matching for more comprehensive search results. In this exercise, we'll implement a hybrid search pipeline and then experiment with different weightings.

Creating a Basic Hybrid Search Pipeline

Let's start by creating a basic hybrid search pipeline that combines vector search on book synopses with text search on titles and author names:

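[
  {
    $vectorSearch: {
      queryVector: vector, // Assume this is already defined
      path: "embeddings",
      numCandidates: 100,
      index: "books_synopsis_vector",
      limit: 20
    }
  },
  // Capture the vector score before the text results are merged in
  { $addFields: { vector_score: { $meta: "vectorSearchScore" } } },
  {
    // $search must be the first stage of a pipeline, so it runs in its own
    // sub-pipeline and its results are appended to the vector results
    $unionWith: {
      coll: "books", // Assumed collection name; use the collection you are querying
      pipeline: [
        {
          $search: {
            index: "books_text_index",
            compound: {
              should: [
                {
                  text: {
                    query: searchQuery, // Assume this is already defined
                    path: "title",
                    score: { boost: { value: 3 } }
                  }
                },
                {
                  text: {
                    query: searchQuery,
                    path: "authors.name",
                    score: { boost: { value: 2 } }
                  }
                }
              ]
            }
          }
        },
        { $limit: 20 },
        { $addFields: { text_score: { $meta: "searchScore" } } }
      ]
    }
  },
  {
    // Merge documents returned by both searches; a missing score counts as 0
    $group: {
      _id: "$_id",
      doc: { $first: "$$ROOT" },
      vector_score: { $max: { $ifNull: ["$vector_score", 0] } },
      text_score: { $max: { $ifNull: ["$text_score", 0] } }
    }
  },
  {
    // Restore the original document shape, keeping the merged scores
    $replaceRoot: {
      newRoot: {
        $mergeObjects: [
          "$doc",
          { vector_score: "$vector_score", text_score: "$text_score" }
        ]
      }
    }
  },
  {
    $addFields: {
      combined_score: {
        $add: [
          { $multiply: ["$vector_score", 0.5] },
          { $multiply: ["$text_score", 0.5] }
        ]
      }
    }
  },
  { $sort: { combined_score: -1 } },
  { $limit: 10 }
]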

Add this aggregation pipeline to your code in server/src/controllers/books.ts inside the searchBooks method.

Experimenting with Score Weighting

Now that we have a basic hybrid search implemented, let's experiment with different weightings for the vector and text scores.

Try adjusting the weights in the combined_score calculation. For example:

combined_score: {
  $add: [
    { $multiply: ["$vector_score", 0.7] },
    { $multiply: ["$text_score", 0.3] }
  ]
}

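This gives more weight to the vector search results. Keep in mind that the two scores are on different scales: vectorSearchScore is normalized to the range 0 to 1, while searchScore is unbounded, so equal weights do not guarantee equal influence.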

Test the search with different queries and observe how the results change with different weightings.
Experiment with the boost values in the text search stage. Try increasing the boost for the title or author name and see how it affects the results.
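To make these experiments easier to repeat, the weights can be passed in as parameters instead of being edited by hand. The following is a minimal sketch, not part of the lab code: combinedScoreStage is a hypothetical helper name, and it assumes the documents already carry the vector_score and text_score fields used in the combined_score calculation.

```typescript
// Hypothetical helper (not part of the lab code): builds the $addFields stage
// that computes combined_score from the given weights.
type Stage = Record<string, unknown>;

function combinedScoreStage(vectorWeight: number, textWeight: number): Stage {
  if (vectorWeight < 0 || textWeight < 0) {
    throw new RangeError("Weights must be non-negative");
  }
  return {
    $addFields: {
      combined_score: {
        $add: [
          { $multiply: ["$vector_score", vectorWeight] },
          { $multiply: ["$text_score", textWeight] },
        ],
      },
    },
  };
}

// Weight combinations to try, as [vectorWeight, textWeight] pairs.
const weightings: Array<[number, number]> = [
  [0.5, 0.5],
  [0.7, 0.3],
  [0.3, 0.7],
];

// One scoring stage per weighting, ready to swap into the pipeline.
const stages = weightings.map(([v, t]) => combinedScoreStage(v, t));
```

Swapping one of these stages in for the hand-written combined_score stage lets you rerun the same query under each weighting and compare the rankings side by side.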

Adding Pre-filtering

To further refine our hybrid search, let's add pre-filtering capabilities. We'll filter books by their publication year before performing the vector search.

Modify your vector search stage to include a filter:

{
  $vectorSearch: {
    queryVector: vector,
    path: "embeddings",
    numCandidates: 100,
    index: "books_synopsis_vector",
    limit: 20,
    filter: { year: { $gte: 2000 } } // Only books published from 2000 onwards
  }
}

Remember to update your vector index to support filtering on the year field, similar to what you did in the pre-filtering exercise:

{
  "fields": [
    {
      "type": "vector",
      "path": "embeddings",
      "numDimensions": 1536,
      "similarity": "cosine"
    },
    {
      "type": "filter",
      "path": "year"
    }
  ]
}
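A hard-coded year works for a fixed cutoff, but a rolling window such as "the last 20 years" is easier to express by computing the cutoff at query time. Below is a minimal sketch that reuses the field and index names from the examples above; filteredVectorSearchStage and its parameters are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper: builds a $vectorSearch stage that pre-filters on year.
// Field and index names are copied from the examples above.
function filteredVectorSearchStage(
  queryVector: number[],
  yearsBack: number,
  now: Date = new Date()
) {
  // Earliest publication year to include, relative to `now`
  const minYear = now.getFullYear() - yearsBack;
  return {
    $vectorSearch: {
      queryVector,
      path: "embeddings",
      numCandidates: 100,
      index: "books_synopsis_vector",
      limit: 20,
      // Pre-filter: requires "year" to be indexed with type "filter"
      filter: { year: { $gte: minYear } },
    },
  };
}

// Example: a 20-year window measured from a fixed date (placeholder vector)
const filteredStage = filteredVectorSearchStage(
  [0.1, 0.2, 0.3],
  20,
  new Date(2024, 0, 1)
);
```

Passing the date as a parameter keeps the helper easy to test; in the application you would normally omit it and let it default to the current date.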

Exercise Tasks

Implement the basic hybrid search pipeline in your application.
Experiment with at least three different weight combinations for vector and text scores. Document how the results change.
Add pre-filtering to your hybrid search to only include books published in the last 20 years.
Create a function that allows users to specify the importance of title matches vs. content similarity, and adjust the weights accordingly.
Test your hybrid search with various queries and compare the results to those from pure vector search and pure text search.
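For the last task, one simple way to compare result sets is to measure how many of the top results two searches share. This is a sketch of one possible comparison, not something the lab prescribes; overlapAtK is a hypothetical helper, and the IDs below are made-up placeholders rather than real query output.

```typescript
// Hypothetical helper: fraction of the top-k IDs from list `a` that also
// appear in the top-k of list `b`. Returns a value between 0 and 1.
function overlapAtK(a: string[], b: string[], k: number): number {
  const topA = a.slice(0, k);
  const topB = new Set(b.slice(0, k));
  if (topA.length === 0) return 0;
  const shared = topA.filter((id) => topB.has(id)).length;
  return shared / topA.length;
}

// Placeholder IDs for illustration only, not real search results.
const hybridIds = ["b1", "b2", "b3", "b4"];
const vectorIds = ["b2", "b1", "b9", "b4"];
const overlap = overlapAtK(hybridIds, vectorIds, 4);
```

A low overlap between hybrid and pure vector results suggests the text score is reshaping the ranking; a value near 1 suggests the vector score dominates under the current weights.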

By completing this exercise, you'll gain hands-on experience in implementing and fine-tuning a hybrid search solution that combines the strengths of vector and text search in MongoDB Atlas.
