Skip to main content

πŸ¦Έβ€β™‚οΈ Hybrid Search Exercise

info

Extra activity, do it if you have extra time or are following at home, won't be covered during the hands-on Lab.

One of the powerful features of MongoDB Atlas is the ability to combine vector search with traditional text search, creating a hybrid search solution. This allows us to leverage both semantic understanding and keyword matching for more comprehensive search results. In this exercise, we'll implement a hybrid search pipeline and then experiment with different weightings.

Creating a Basic Hybrid Search Pipeline​

Let's start by creating a basic hybrid search pipeline that combines vector search on book synopses with text search on titles and author names:

[
{
$vectorSearch: {
queryVector: vector, // Assume this is already defined
path: "embeddings",
numCandidates: 100,
index: "books_synopsis_vector",
limit: 20
}
},
{
$search: {
index: "books_text_index",
compound: {
should: [
{
text: {
query: searchQuery, // Assume this is already defined
path: "title",
score: { boost: { value: 3 } }
}
},
{
text: {
query: searchQuery,
path: "authors.name",
score: { boost: { value: 2 } }
}
}
]
}
}
},
{
$addFields: {
vector_score: { $meta: "vectorSearchScore" },
text_score: { $meta: "searchScore" }
}
},
{
$addFields: {
combined_score: {
$add: [
{ $multiply: ["$vector_score", 0.5] },
{ $multiply: ["$text_score", 0.5] }
]
}
}
},
{ $sort: { combined_score: -1 } },
{ $limit: 10 }
]

Add this aggregation pipeline to your code in server/src/controllers/books.ts inside the searchBooks method.

Experimenting with Score Weighting​

Now that we have a basic hybrid search implemented, let's experiment with different weightings for the vector and text scores.

  1. Try adjusting the weights in the combined_score calculation. For example:
combined_score: {
$add: [
{ $multiply: ["$vector_score", 0.7] },
{ $multiply: ["$text_score", 0.3] }
]
}

This gives more weight to the vector search results.

  1. Test the search with different queries and observe how the results change with different weightings.
  2. Experiment with the boost values in the text search stage. Try increasing the boost for the title or author name and see how it affects the results.

Adding Pre-filtering​

To further refine our hybrid search, let's add pre-filtering capabilities. We'll filter books by their publication year before performing the vector search.

Modify your vector search stage to include a filter:


{
$vectorSearch: {
queryVector: vector,
path: "embeddings",
numCandidates: 100,
index: "books_synopsis_vector",
limit: 20,
filter: { year: { $gte: 2000 } } // Only books published from 2000 onwards
}
}

Remember to update your vector index to support filtering on the year field, similar to what you did in the pre-filtering exercise:

{
"fields": [
{
"type": "vector",
"path": "embeddings",
"numDimensions": 1536,
"similarity": "cosine"
},
{
"type": "filter",
"path": "year"
}
]
}

Exercise Tasks​

  1. Implement the basic hybrid search pipeline in your application.
  2. Experiment with at least three different weight combinations for vector and text scores. Document how the results change.
  3. Add pre-filtering to your hybrid search to only include books published in the last 20 years.
  4. Create a function that allows users to specify the importance of title matches vs. content similarity, and adjust the weights accordingly.
  5. Test your hybrid search with various queries and compare the results to those from pure vector search and pure text search.

Completing this exercise, you'll gain hands-on experience in implementing and fine-tuning a hybrid search solution, combining the strengths of both vector and text search capabilities in MongoDB Atlas.