🦸‍♂️ Implementing Hybrid Search

信息

Extra activity, do it if you have extra time or are following at home, won't be covered during the hands-on Lab.

In this section, we'll explore how we can implement Hybrid Search using Reciprocal Rank Fusion, a common technique used to combine results from different search methods.

The MongoDB documentation shows how you can implement the Hybrid Search using a single aggregation pipeline. However, in this lab, we will combine the results from vector search and full-text search with application logic instead.

Prerequites

This segment depends on the setup you have done in previous exercises:

Ensure you have created an Atlas Search index on the books collection with dynamic mappings.
Ensure you have imported the embeddings.
Ensure you have created the vector search index on the embeddings field.

Function for Full-Text Search

Write a new function called fullTextSearch in books.ts:

It accepts query as an argument of string type, and limit as an argument of number type
Using query, it should perform a full-text search on title, synopsis, and author names, and you may boost scores at your own discretion
It should limit the number of books returned using the limit argument parsed
Using $project to only return the following book properties: _id, title, authors, synopsis, and cover

Answer

    public async fullTextSearch(query: string,limit: number): Promise<Book[]>{
        const pipeline = [
            {
              $search: {
                index: "fulltextsearch",
                compound: {
                  should: [
                    {
                      text: {
                        query,
                        path: "title",
                        score: { boost: { value: 3 } }
                      }
                    },
                    {
                        text: {
                          query,
                          path: "synopsis",
                          score: { boost: { value: 2 } }
                        }
                    },
                    {
                      text: {
                        query,
                        path: "authors.name",
                        score: { boost: { value: 2 } }
                      }
                    }
                  ]
                }
              }
            },{
              $limit: limit
            },{
              $project:{ //project only relevant fields
                title: 1, authors: 1, synopsis:1, cover:1
              }
            }
          ]
        const books = await collections?.books?.aggregate(pipeline).toArray() as Book[];
        return books;
    }

Function for Vector Search

Write a new function called vectorSearch in books.ts:

It accepts query as an argument of string type, and limit as an argument of number type
It should vectorize the query and use resulting vector to perform a vector search on books collection.
It should limit the number of books returned using the limit argument parsed
Using $project to only return the following book properties: _id, title, authors, synopsis, and cover

Answer

    public async vectorSearch(query: string, limit: number): Promise<Book[]>{
        const vector = await getEmbeddings(query);
        const pipeline = [
            {
                $vectorSearch: {
                  queryVector:  vector,
                  path: 'embeddings',
                  numCandidates: 100,
                  index: 'vectorsearch',
                  limit: limit,
                }
            },{
                $project:{ //project only relevant fields
                  title: 1, authors: 1, synopsis:1, cover:1
                }
            }
          ]
        const books = await collections?.books?.aggregate(pipeline).toArray() as Book[];
        return books;
    }

Function to compute Weighted Reciprocal Rank Fusion

Atlas full-text search and vector search, by default, returns documents sorted by their relevancy score from highest to lowest i.e. they are ranked by default.

A reciprocal rank score is given by 1 / (RANK * RANK_CONSTANT). A RANK_CONSTANT (typically about 60), prevents the case of 1/0 where RANK is 0, and smoothens the scores so that it is not too heavily skewed towards higher ranked documents.

You may give different weightage to full-text search versus vector search by multiplying the scores with a different weight.

Write a new private function called computeWeightedRRF in books.ts:

It accepts books as an argument of Book[] type, and weight as an argument of number type
It should compute the Reciprocal Rank score based on its position in the array (the first book in the array should be ranked 0).
Mulitply the score further by the given weight and store the resultant score as score back into the book object.

Answer

    private computeWeightedRRF(books: Book[], weight: number): void{
        const RANK_CONSTANT = 60;
        books.forEach((book,i)=>{
            book['score'] = weight*1/(i+RANK_CONSTANT)
            return book;
        })
    }

Function to perform Hybrid Search

The final thing we need to do is to combine the scores by summing the scores from both types of searches. If the same book is found in two searches, the score should be summed up giving it a higher score.

Replace the searchBooks function in books.ts:

It accepts query as an argument of string type
It should execute both full-text search and vector search based on the query
It should parse both results into the computeWeightedRRF. Lets give full-text search a weight of 0.5 and vector search a weight of 0.5.
Sum the two scores from both sets of results, re-rank the books based on the new scores and return the top 5 books

Answer

    public async searchBooks(query: string): Promise<Book[]> {
        const VECTOR_WEIGHT = 0.6;
        const FULL_TEXT_WEIGHT = 0.4;
        const SEARCH_LIMIT = 6

        //run full text search and vector search separately
        const [fts_results,vs_results] = await Promise.all([
            this.fullTextSearch(query, SEARCH_LIMIT),
            this.vectorSearch(query, SEARCH_LIMIT)
        ])

        //compute weighted Reciprocal Rank Fusion on both results
        this.computeWeightedRRF(fts_results, FULL_TEXT_WEIGHT)
        this.computeWeightedRRF(vs_results, VECTOR_WEIGHT)

        //aggregate both arrays to a single map using _id as a key
        const documentMap = [...fts_results,...vs_results].reduce((map,book:any)=>{
            if(map.hasOwnProperty(book._id)){
                map[book._id].score += book.score;
            }else{
                map[book._id] = book;
            }
            return map;
        },{})

        //transform map back to an array
        const books = Object.keys(documentMap).map(k=>documentMap[k]);

        //return books with the highest scores
        const topBooks = books.sort((a,b)=>b.score-a.score).slice(0,SEARCH_LIMIT);
        return topBooks;
      }

Prerequites​

Function for Full-Text Search​

Function for Vector Search​

Function to compute Weighted Reciprocal Rank Fusion​

Function to perform Hybrid Search​

Prerequites

Function for Full-Text Search

Function for Vector Search

Function to compute Weighted Reciprocal Rank Fusion

Function to perform Hybrid Search