🦸♂️ Hybrid Search Exercise
Extra activity, do it if you have extra time or are following at home, won't be covered during the hands-on Lab.
One of the powerful features of MongoDB Atlas is the ability to combine vector search with traditional text search, creating a hybrid search solution. This allows us to leverage both semantic understanding and keyword matching for more comprehensive search results. In this exercise, we'll implement a hybrid search pipeline and then experiment with different weightings.
Creating a Basic Hybrid Search Pipeline
Let's start by creating a basic hybrid search pipeline that combines vector search on book synopses with text search on titles and author names:
[
{
$vectorSearch: {
queryVector: vector, // Assume this is already defined
path: "embeddings",
numCandidates: 100,
index: "books_synopsis_vector",
limit: 20
}
},
{
$search: {
index: "books_text_index",
compound: {
should: [
{
text: {
query: searchQuery, // Assume this is already defined
path: "title",
score: { boost: { value: 3 } }
}
},
{
text: {
query: searchQuery,
path: "authors.name",
score: { boost: { value: 2 } }
}
}
]
}
}
},
{
$addFields: {
vector_score: { $meta: "vectorSearchScore" },
text_score: { $meta: "searchScore" }
}
},
{
$addFields: {
combined_score: {
$add: [
{ $multiply: ["$vector_score", 0.5] },
{ $multiply: ["$text_score", 0.5] }
]
}
}
},
{ $sort: { combined_score: -1 } },
{ $limit: 10 }
]
Add this aggregation pipeline to your code in server/src/controllers/books.ts
inside the searchBooks
method.
Experimenting with Score Weighting
Now that we have a basic hybrid search implemented, let's experiment with different weightings for the vector and text scores.
- Try adjusting the weights in the
combined_score
calculation. For example:
combined_score: {
$add: [
{ $multiply: ["$vector_score", 0.7] },
{ $multiply: ["$text_score", 0.3] }
]
}
This gives more weight to the vector search results.
- Test the search with different queries and observe how the results change with different weightings.
- Experiment with the
boost
values in the text search stage. Try increasing the boost for the title or author name and see how it affects the results.
Adding Pre-filtering
To further refine our hybrid search, let's add pre-filtering capabilities. We'll filter books by their publication year before performing the vector search.
Modify your vector search stage to include a filter:
{
$vectorSearch: {
queryVector: vector,
path: "embeddings",
numCandidates: 100,
index: "books_synopsis_vector",
limit: 20,
filter: { year: { $gte: 2000 } } // Only books published from 2000 onwards
}
}
Remember to update your vector index to support filtering on the year
field, similar to what you did in the pre-filtering exercise:
{
"fields": [
{
"type": "vector",
"path": "embeddings",
"numDimensions": 1536,
"similarity": "cosine"
},
{
"type": "filter",
"path": "year"
}
]
}
Exercise Tasks
- Implement the basic hybrid search pipeline in your application.
- Experiment with at least three different weight combinations for vector and text scores. Document how the results change.
- Add pre-filtering to your hybrid search to only include books published in the last 20 years.
- Create a function that allows users to specify the importance of title matches vs. content similarity, and adjust the weights accordingly.
- Test your hybrid search with various queries and compare the results to those from pure vector search and pure text search.
Completing this exercise, you'll gain hands-on experience in implementing and fine-tuning a hybrid search solution, combining the strengths of both vector and text search capabilities in MongoDB Atlas.