🦸‍♀️ Implementing Hybrid Search
Extra activity: do it if you have extra time or are following along at home. It won't be covered during the hands-on lab.
In this section, we'll explore how we can implement Hybrid Search using Reciprocal Rank Fusion, a common technique used to combine results from different search methods.
The MongoDB documentation shows how to implement Hybrid Search using a single aggregation pipeline. In this lab, however, we will combine the results from vector search and full-text search in application logic instead.
Prerequisites
This segment depends on the setup you have done in previous exercises:
- Ensure you have created an Atlas Search index on the books collection with dynamic mappings.
- Ensure you have imported the embeddings.
- Ensure you have created the vector search index on the embeddings field.
Function for Full-Text Search
Write a new function called fullTextSearch in books.ts:
- It accepts query as an argument of string type, and limit as an argument of number type
- Using query, it should perform a full-text search on title, synopsis, and author names, and you may boost scores at your own discretion
- It should limit the number of books returned using the limit argument passed in
- It should use $project to return only the following book properties: _id, title, authors, synopsis, and cover
Answer
public async fullTextSearch(query: string, limit: number): Promise<Book[]> {
  const pipeline = [
    {
      $search: {
        index: "fulltextsearch",
        // A document matching any clause is returned; boosts favor title matches
        compound: {
          should: [
            {
              text: {
                query,
                path: "title",
                score: { boost: { value: 3 } }
              }
            },
            {
              text: {
                query,
                path: "synopsis",
                score: { boost: { value: 2 } }
              }
            },
            {
              text: {
                query,
                path: "authors.name",
                score: { boost: { value: 2 } }
              }
            }
          ]
        }
      }
    },
    {
      $limit: limit
    },
    {
      // Project only the relevant fields (_id is included by default)
      $project: { title: 1, authors: 1, synopsis: 1, cover: 1 }
    }
  ];
  // Fall back to an empty array if the collection is not available
  const books = ((await collections?.books?.aggregate(pipeline).toArray()) ?? []) as Book[];
  return books;
}
Function for Vector Search
Write a new function called vectorSearch in books.ts:
- It accepts query as an argument of string type, and limit as an argument of number type
- It should vectorize the query and use the resulting vector to perform a vector search on the books collection
- It should limit the number of books returned using the limit argument passed in
- It should use $project to return only the following book properties: _id, title, authors, synopsis, and cover
Answer
public async vectorSearch(query: string, limit: number): Promise<Book[]> {
  // Convert the query text into an embedding vector
  const vector = await getEmbeddings(query);
  const pipeline = [
    {
      $vectorSearch: {
        queryVector: vector,
        path: "embeddings",
        numCandidates: 100,
        index: "vectorsearch",
        limit: limit
      }
    },
    {
      // Project only the relevant fields (_id is included by default)
      $project: { title: 1, authors: 1, synopsis: 1, cover: 1 }
    }
  ];
  // Fall back to an empty array if the collection is not available
  const books = ((await collections?.books?.aggregate(pipeline).toArray()) ?? []) as Book[];
  return books;
}
Function to compute Weighted Reciprocal Rank Fusion
Atlas full-text search and vector search both return documents sorted by relevance score from highest to lowest by default, i.e. the results are already ranked.
A reciprocal rank score is given by 1 / (RANK + RANK_CONSTANT). The RANK_CONSTANT (typically about 60) prevents division by zero (1/0) when RANK is 0, and smooths the scores so that they are not skewed too heavily towards higher-ranked documents.
You may weight full-text search and vector search differently by multiplying each set of scores by a different weight.
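As a quick check of the formula, the sketch below computes reciprocal rank scores for 0-based ranks and compares how much larger the rank-0 score is than the rank-9 score for two rank constants. The names reciprocalRank and topToTenthRatio are illustrative assumptions and are not part of the lab code.

```typescript
// Hypothetical helper for illustration only; not part of books.ts.
// rank is 0-based, so the top result has rank 0.
function reciprocalRank(rank: number, rankConstant: number = 60, weight: number = 1): number {
  return weight / (rank + rankConstant);
}

// How many times larger the rank-0 score is than the rank-9 score.
function topToTenthRatio(rankConstant: number): number {
  return reciprocalRank(0, rankConstant) / reciprocalRank(9, rankConstant);
}

const smoothed = topToTenthRatio(60); // (9 + 60) / 60, about 1.15
const skewed = topToTenthRatio(1);    // (9 + 1) / 1, about 10
console.log(smoothed.toFixed(2), skewed.toFixed(2));
```

With a constant of 60, the top document scores only about 15% higher than the tenth one, so a document that ranks moderately well in both searches can overtake one that ranks first in a single search.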
Write a new private function called computeWeightedRRF in books.ts:
- It accepts books as an argument of Book[] type, and weight as an argument of number type
- It should compute the Reciprocal Rank score of each book based on its position in the array (the first book in the array should be ranked 0)
- It should multiply the score by the given weight and store the result as score on the book object
Answer
private computeWeightedRRF(books: Book[], weight: number): void {
  const RANK_CONSTANT = 60;
  // The array index is the 0-based rank of each book
  books.forEach((book, i) => {
    book['score'] = weight / (i + RANK_CONSTANT);
  });
}
Function to perform Hybrid Search
The final step is to combine the results by summing the scores from both searches. If the same book appears in both result sets, its two scores are added together, giving it a higher combined score.
Replace the searchBooks function in books.ts:
- It accepts query as an argument of string type
- It should execute both full-text search and vector search based on the query
- It should pass both result sets to computeWeightedRRF. Let's give full-text search a weight of 0.5 and vector search a weight of 0.5.
- It should sum the two scores from both sets of results, re-rank the books based on the new scores, and return the top 5 books
Answer
public async searchBooks(query: string): Promise<Book[]> {
  // Equal weights and a top-5 result, as specified in the exercise
  const VECTOR_WEIGHT = 0.5;
  const FULL_TEXT_WEIGHT = 0.5;
  const SEARCH_LIMIT = 6; // number of results fetched from each search
  const TOP_K = 5; // number of books returned after fusion
  // Run full-text search and vector search in parallel
  const [fts_results, vs_results] = await Promise.all([
    this.fullTextSearch(query, SEARCH_LIMIT),
    this.vectorSearch(query, SEARCH_LIMIT)
  ]);
  // Compute weighted Reciprocal Rank Fusion scores on both result sets
  this.computeWeightedRRF(fts_results, FULL_TEXT_WEIGHT);
  this.computeWeightedRRF(vs_results, VECTOR_WEIGHT);
  // Merge both arrays into a single map keyed by the string form of _id
  const documentMap = [...fts_results, ...vs_results].reduce((map: Record<string, any>, book: any) => {
    const key = String(book._id);
    if (map[key]) {
      // Book found by both searches: add the scores together
      map[key].score += book.score;
    } else {
      map[key] = book;
    }
    return map;
  }, {});
  // Transform the map back into an array
  const books = Object.keys(documentMap).map((k) => documentMap[k]);
  // Return the books with the highest combined scores
  const topBooks = books.sort((a, b) => b.score - a.score).slice(0, TOP_K);
  return topBooks;
}
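To see the summing step in isolation, here is a minimal sketch that fuses two ranked lists of plain string ids without touching the database. The function names fuse and topIds, the sample ids, and the equal weights are assumptions made for illustration; the lab's Book type and search functions are not used.

```typescript
// Illustrative sketch only; ids stand in for book documents.
const RANK_CONSTANT = 60;

// Returns fused scores keyed by id, given ranked id lists and one weight per list.
function fuse(rankedLists: string[][], weights: number[]): Record<string, number> {
  const scores: Record<string, number> = {};
  rankedLists.forEach((list, listIndex) => {
    list.forEach((id, rank) => {
      // rank is the 0-based position of the id within its own list
      const contribution = weights[listIndex] / (rank + RANK_CONSTANT);
      scores[id] = (scores[id] || 0) + contribution;
    });
  });
  return scores;
}

// Sorts ids by fused score, highest first, and keeps the top `limit`.
function topIds(scores: Record<string, number>, limit: number): string[] {
  return Object.keys(scores)
    .sort((a, b) => scores[b] - scores[a])
    .slice(0, limit);
}

const fullText = ["dune", "emma", "ulysses"];
const vector = ["neuromancer", "dune", "emma"];
const fused = fuse([fullText, vector], [0.5, 0.5]);
const ranked = topIds(fused, 3);
console.log(ranked);
```

In this sample, "dune" and "emma" appear in both lists, so their summed scores place them ahead of "neuromancer", even though "neuromancer" is ranked first by the vector list alone.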