Skip to main content

πŸ¦Έβ€β™‚οΈ Implementing Hybrid Search

info

Extra activity, do it if you have extra time or are following at home, won't be covered during the hands-on Lab.

In this section, we'll explore how we can implement Hybrid Search using Reciprocal Rank Fusion, a common technique used to combine results from different search methods.

The MongoDB documentation shows how you can implement the Hybrid Search using a single aggregation pipeline. However, in this lab, we will combine the results from vector search and full-text search with application logic instead.

Prerequites​

This segment depends on the setup you have done in previous exercises:

  1. Ensure you have created an Atlas Search index on the books collection with dynamic mappings.
  2. Ensure you have imported the embeddings.
  3. Ensure you have created the vector search index on the embeddings field.

Write a new function called fullTextSearch in books.ts:

  • It accepts query as an argument of string type, and limit as an argument of number type
  • Using query, it should perform a full-text search on title, synopsis, and author names, and you may boost scores at your own discretion
  • It should limit the number of books returned using the limit argument parsed
  • Using $project to only return the following book properties: _id, title, authors, synopsis, and cover
Answer
    public async fullTextSearch(query: string,limit: number): Promise<Book[]>{
const pipeline = [
{
$search: {
index: "fulltextsearch",
compound: {
should: [
{
text: {
query,
path: "title",
score: { boost: { value: 3 } }
}
},
{
text: {
query,
path: "synopsis",
score: { boost: { value: 2 } }
}
},
{
text: {
query,
path: "authors.name",
score: { boost: { value: 2 } }
}
}
]
}
}
},{
$limit: limit
},{
$project:{ //project only relevant fields
title: 1, authors: 1, synopsis:1, cover:1
}
}
]
const books = await collections?.books?.aggregate(pipeline).toArray() as Book[];
return books;
}

Write a new function called vectorSearch in books.ts:

  • It accepts query as an argument of string type, and limit as an argument of number type
  • It should vectorize the query and use resulting vector to perform a vector search on books collection.
  • It should limit the number of books returned using the limit argument parsed
  • Using $project to only return the following book properties: _id, title, authors, synopsis, and cover
Answer
    public async vectorSearch(query: string, limit: number): Promise<Book[]>{
const vector = await getEmbeddings(query);
const pipeline = [
{
$vectorSearch: {
queryVector: vector,
path: 'embeddings',
numCandidates: 100,
index: 'vectorsearch',
limit: limit,
}
},{
$project:{ //project only relevant fields
title: 1, authors: 1, synopsis:1, cover:1
}
}
]
const books = await collections?.books?.aggregate(pipeline).toArray() as Book[];
return books;
}

Function to compute Weighted Reciprocal Rank Fusion​

Atlas full-text search and vector search, by default, returns documents sorted by their relevancy score from highest to lowest i.e. they are ranked by default.

A reciprocal rank score is given by 1 / (RANK * RANK_CONSTANT). A RANK_CONSTANT (typically about 60), prevents the case of 1/0 where RANK is 0, and smoothens the scores so that it is not too heavily skewed towards higher ranked documents.

You may give different weightage to full-text search versus vector search by multiplying the scores with a different weight.

Write a new private function called computeWeightedRRF in books.ts:

  • It accepts books as an argument of Book[] type, and weight as an argument of number type
  • It should compute the Reciprocal Rank score based on its position in the array (the first book in the array should be ranked 0).
  • Mulitply the score further by the given weight and store the resultant score as score back into the book object.
Answer
    private computeWeightedRRF(books: Book[], weight: number): void{
const RANK_CONSTANT = 60;
books.forEach((book,i)=>{
book['score'] = weight*1/(i+RANK_CONSTANT)
return book;
})
}

The final thing we need to do is to combine the scores by summing the scores from both types of searches. If the same book is found in two searches, the score should be summed up giving it a higher score.

Replace the searchBooks function in books.ts:

  • It accepts query as an argument of string type
  • It should execute both full-text search and vector search based on the query
  • It should parse both results into the computeWeightedRRF. Lets give full-text search a weight of 0.5 and vector search a weight of 0.5.
  • Sum the two scores from both sets of results, re-rank the books based on the new scores and return the top 5 books
Answer
    public async searchBooks(query: string): Promise<Book[]> {
const VECTOR_WEIGHT = 0.6;
const FULL_TEXT_WEIGHT = 0.4;
const SEARCH_LIMIT = 6

//run full text search and vector search separately
const [fts_results,vs_results] = await Promise.all([
this.fullTextSearch(query, SEARCH_LIMIT),
this.vectorSearch(query, SEARCH_LIMIT)
])

//compute weighted Reciprocal Rank Fusion on both results
this.computeWeightedRRF(fts_results, FULL_TEXT_WEIGHT)
this.computeWeightedRRF(vs_results, VECTOR_WEIGHT)

//aggregate both arrays to a single map using _id as a key
const documentMap = [...fts_results,...vs_results].reduce((map,book:any)=>{
if(map.hasOwnProperty(book._id)){
map[book._id].score += book.score;
}else{
map[book._id] = book;
}
return map;
},{})

//transform map back to an array
const books = Object.keys(documentMap).map(k=>documentMap[k]);

//return books with the highest scores
const topBooks = books.sort((a,b)=>b.score-a.score).slice(0,SEARCH_LIMIT);
return topBooks;
}