Implementing Hybrid Search
This is an extra activity: do it if you have extra time or are following along at home. It won't be covered during the hands-on lab.
In this section, we'll explore how we can implement Hybrid Search using Reciprocal Rank Fusion, a common technique used to combine results from different search methods.
The MongoDB documentation shows how you can implement Hybrid Search using a single aggregation pipeline. In this lab, however, we will combine the results from vector search and full-text search with application logic instead.
Prerequisites
This section depends on the setup you completed in previous exercises:
- Ensure you have created an Atlas Search index on the books collection with dynamic mappings.
- Ensure you have imported the embeddings.
- Ensure you have created the vector search index on the embeddings field.
The answers below assume these indexes are named fulltextsearch and vectorsearch respectively.
Function for Full-Text Search
- NodeJS/Express
- Java Spring Boot
Write a new function called fullTextSearch in books.ts:
- It accepts query as an argument of string type, and limit as an argument of number type
- Using query, it should perform a full-text search on title, synopsis, and author names; you may boost scores at your own discretion
- It should limit the number of books returned using the limit argument passed
- Use $project to only return the following book properties: _id, title, authors, synopsis, and cover
Answer
public async fullTextSearch(query: string, limit: number): Promise<Book[]> {
    const pipeline = [
        {
            $search: {
                index: "fulltextsearch",
                compound: {
                    should: [
                        {
                            text: {
                                query,
                                path: "title",
                                score: { boost: { value: 3 } }
                            }
                        },
                        {
                            text: {
                                query,
                                path: "synopsis",
                                score: { boost: { value: 2 } }
                            }
                        },
                        {
                            text: {
                                query,
                                path: "authors.name",
                                score: { boost: { value: 2 } }
                            }
                        }
                    ]
                }
            }
        },
        {
            $limit: limit
        },
        {
            // project only relevant fields (_id is included by default)
            $project: {
                title: 1, authors: 1, synopsis: 1, cover: 1
            }
        }
    ];
    const books = await collections?.books?.aggregate(pipeline).toArray() as Book[];
    return books;
}
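If you want to experiment with different boosts, one option is to generate the compound should clauses from a field-to-boost map instead of writing each clause by hand as in the answer above. The sketch below is a hypothetical helper (buildTextSearchStage is not part of the lab code); it assumes the same index name, fulltextsearch, and produces the same $search stage shape as the answer.

```typescript
// Hypothetical helper (not part of the lab code): builds an Atlas Search
// $search stage whose compound.should clauses run a text query on each
// field, boosted by the factor given for that field.
type BoostMap = Record<string, number>;

function buildTextSearchStage(query: string, boosts: BoostMap, index = "fulltextsearch") {
  // One text clause per field; entries keep the insertion order of `boosts`.
  const should = Object.entries(boosts).map(([path, value]) => ({
    text: {
      query,
      path,
      score: { boost: { value } },
    },
  }));
  return { $search: { index, compound: { should } } };
}

// Same boosts as the answer: title x3, synopsis and author names x2.
const stage = buildTextSearchStage("space opera", {
  title: 3,
  synopsis: 2,
  "authors.name": 2,
});
console.log(JSON.stringify(stage, null, 2));
```

The resulting object can be used as the first element of the pipeline array, followed by the same $limit and $project stages.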
Write a new method called fullTextSearch in BookService.java:
- It accepts query as a String argument, and limit as an int argument
- Using query, it should perform a full-text search on title, synopsis, and author names, applying score boosts where appropriate
- It should limit the number of books returned using the limit argument
- Use $project to only return the following book properties: _id, title, authors, synopsis, and cover
Answer
public List<Book> fullTextSearch(String query, int limit) {
    AggregationOperation searchStage = context -> new Document("$search",
        new Document("index", "fulltextsearch")
            .append("compound",
                new Document("should", Arrays.asList(
                    new Document("text",
                        new Document("query", query)
                            .append("path", "title")
                            .append("score", new Document("boost", new Document("value", 3L)))),
                    new Document("text",
                        new Document("query", query)
                            .append("path", "synopsis")
                            .append("score", new Document("boost", new Document("value", 2L)))),
                    new Document("text",
                        new Document("query", query)
                            .append("path", "authors.name")
                            .append("score", new Document("boost", new Document("value", 2L))))
                ))
            )
    );
    AggregationOperation limitStage = context -> new Document("$limit", limit);
    AggregationOperation projectStage = context -> new Document("$project",
        new Document("title", 1L)
            .append("authors", 1L)
            .append("synopsis", 1L)
            .append("cover", 1L)
    );
    Aggregation aggregation = Aggregation.newAggregation(searchStage, limitStage, projectStage);
    return mongoTemplate.aggregate(aggregation, "books", Book.class).getMappedResults();
}
Function for Vector Search
- NodeJS/Express
- Java Spring Boot
Write a new function called vectorSearch in books.ts:
- It accepts query as an argument of string type, and limit as an argument of number type
- It should vectorize the query and use the resulting vector to perform a vector search on the books collection
- It should limit the number of books returned using the limit argument passed
- Use $project to only return the following book properties: _id, title, authors, synopsis, and cover
Answer
public async vectorSearch(query: string, limit: number): Promise<Book[]> {
    const vector = await getEmbeddings(query);
    const pipeline = [
        {
            $vectorSearch: {
                queryVector: vector,
                path: 'embeddings',
                numCandidates: 100,
                index: 'vectorsearch',
                limit: limit,
            }
        },
        {
            $project: { // project only relevant fields
                title: 1, authors: 1, synopsis: 1, cover: 1
            }
        }
    ];
    const books = await collections?.books?.aggregate(pipeline).toArray() as Book[];
    return books;
}
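$vectorSearch does not accept a numCandidates value smaller than limit, and the answer above hard-codes numCandidates: 100, so a caller passing a limit above 100 would get an error from Atlas. A minimal sketch of one way to guard against that, assuming a hypothetical buildVectorSearchStage helper and a 10x candidate multiplier (an illustrative tuning value, not a prescribed one). It keeps the index and path names used in the answer.

```typescript
// Hypothetical helper: builds a $vectorSearch stage and derives numCandidates
// from limit, so numCandidates is never smaller than limit.
const MAX_CANDIDATES = 10000; // maximum documented for numCandidates

function buildVectorSearchStage(queryVector: number[], limit: number, multiplier = 10) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`limit must be a positive integer, got ${limit}`);
  }
  // Oversample candidates for better recall, clamped to [limit, MAX_CANDIDATES].
  const numCandidates = Math.min(Math.max(limit * multiplier, limit), MAX_CANDIDATES);
  if (numCandidates < limit) {
    // Only possible when limit itself exceeds MAX_CANDIDATES.
    throw new RangeError(`limit ${limit} exceeds the maximum numCandidates ${MAX_CANDIDATES}`);
  }
  return {
    $vectorSearch: {
      index: "vectorsearch",
      path: "embeddings",
      queryVector,
      numCandidates,
      limit,
    },
  };
}

const stage = buildVectorSearchStage([0.1, 0.2, 0.3], 6);
console.log(stage.$vectorSearch.numCandidates); // 60
```

Raising numCandidates generally trades query latency for recall, so the multiplier is worth tuning against your own data.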
Write a new method called vectorSearch in BookService.java:
- It accepts query as a String argument, and limit as an int argument
- It should vectorize query using the configured EmbeddingProvider and use the resulting vector to perform a $vectorSearch on the books collection
- It should limit the number of books returned using the limit argument
- Use $project to only return the following book properties: _id, title, authors, synopsis, and cover
Answer
public List<Book> vectorSearch(String query, int limit) {
    List<Double> vector = embeddingProvider.getEmbeddings(query);
    AggregationOperation searchStage = context ->
        new Document("$vectorSearch",
            new Document("index", "vectorsearch")
                .append("path", "embeddings")
                .append("queryVector", vector)
                .append("numCandidates", 100)
                .append("limit", limit)
        );
    AggregationOperation projectStage = context ->
        new Document("$project",
            new Document("title", 1L)
                .append("authors", 1L)
                .append("synopsis", 1L)
                .append("cover", 1L)
        );
    Aggregation aggregation = Aggregation.newAggregation(searchStage, projectStage);
    return mongoTemplate.aggregate(aggregation, "books", Book.class).getMappedResults();
}
Function to compute Weighted Reciprocal Rank Fusion
By default, Atlas full-text search and vector search both return documents sorted by relevance score from highest to lowest, so each result list is already ranked.
A reciprocal rank score is given by 1 / (RANK + RANK_CONSTANT). The RANK_CONSTANT (typically about 60) prevents division by zero when RANK is 0 (the top result) and smooths the scores so that they are not too heavily skewed towards the highest-ranked documents.
You may give full-text search and vector search different weights by multiplying each set of scores by its own weight.
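To make the formula concrete, the sketch below computes weighted reciprocal rank scores for the first three positions of a result list, using RANK_CONSTANT = 60 and the 0.4 (full-text) and 0.6 (vector) weights used later in this section. The function name weightedRrfScore is illustrative only.

```typescript
// Weighted reciprocal rank score: weight * 1 / (rank + RANK_CONSTANT).
// rank is the zero-based position in an already-sorted result list.
const RANK_CONSTANT = 60;

function weightedRrfScore(rank: number, weight: number): number {
  return weight / (rank + RANK_CONSTANT);
}

// Print the scores for ranks 0, 1 and 2 under each weight.
for (const [label, weight] of [["full-text", 0.4], ["vector", 0.6]] as const) {
  const scores = [0, 1, 2].map((rank) => weightedRrfScore(rank, weight).toFixed(5));
  console.log(label, scores.join(", "));
}
```

Because of the constant, the third-ranked result (rank 2) still scores 60/62, roughly 97%, of the top result. This is the smoothing effect described above: no single position dominates, so agreement between the two searches matters more than a single high rank.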
- NodeJS/Express
- Java Spring Boot
Write a new private function called computeWeightedRRF in books.ts:
- It accepts books as an argument of Book[] type, and weight as an argument of number type
- It should compute the Reciprocal Rank score based on each book's position in the array (the first book in the array should be ranked 0)
- Multiply the score by the given weight and store the result as score back into the book object
Answer
private computeWeightedRRF(books: Book[], weight: number): void {
    const RANK_CONSTANT = 60;
    // the array index i is the zero-based rank of each book
    books.forEach((book, i) => {
        book['score'] = weight / (i + RANK_CONSTANT);
    });
}
In Java, the Book entity is defined as an immutable record mapped to the database. Because of this, we cannot simply add or modify a field such as score at runtime. To handle the computation of scores for Hybrid Search, we need two things:
- A new wrapper type to hold both the Book and its computed score.
- A helper method that calculates the Reciprocal Rank score and returns this wrapper.
First, create a new BookWithScore record:
Answer
public record BookWithScore(Book book, double score) {}
Then, add a private method computeWeightedRRF in BookService.java:
Answer
private List<BookWithScore> computeWeightedRRF(List<Book> books, double weight) {
    final int RANK_CONSTANT = 60;
    List<BookWithScore> scoredBooks = new ArrayList<>();
    for (int i = 0; i < books.size(); i++) {
        Book book = books.get(i);
        // i is the zero-based rank of the book in the result list
        double score = weight * (1.0 / (i + RANK_CONSTANT));
        scoredBooks.add(new BookWithScore(book, score));
    }
    return scoredBooks;
}
- It accepts books as a List<Book> argument, and weight as a double argument
- It computes the Reciprocal Rank score based on each book's position in the list (the first book has rank 0)
- It multiplies the score by the given weight
- It returns a list of BookWithScore objects
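The TypeScript answer for computeWeightedRRF mutates each Book by writing a score property onto it. If you prefer the Java approach of leaving the entity untouched, a minimal sketch of the equivalent in TypeScript pairs each item with its score in a new object. ScoredItem and computeWeightedRRFPure are hypothetical names, and BookLike is a stand-in for the lab's Book model with made-up sample data.

```typescript
// Hypothetical non-mutating counterpart of computeWeightedRRF, mirroring the
// Java BookWithScore record: the input objects are left unchanged.
interface BookLike {
  _id: string;
  title: string;
}

interface ScoredItem<T> {
  item: T;
  score: number;
}

function computeWeightedRRFPure<T>(
  items: readonly T[],
  weight: number,
  rankConstant = 60,
): ScoredItem<T>[] {
  // The array index is the zero-based rank; each item is wrapped, not modified.
  return items.map((item, rank) => ({ item, score: weight / (rank + rankConstant) }));
}

// Illustrative sample data only.
const books: BookLike[] = [
  { _id: "b1", title: "Book A" },
  { _id: "b2", title: "Book B" },
];
const scored = computeWeightedRRFPure(books, 0.6);
console.log(scored.map((s) => `${s.item.title}: ${s.score.toFixed(4)}`));
```

Returning wrappers keeps the database model free of a search-only field, at the cost of unwrapping the books again before returning them to the caller.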
Function to perform Hybrid Search
The final step is to combine the results by summing each book's scores from both types of search. A book found by both searches therefore ends up with a higher combined score than a book found by only one of them.
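A minimal sketch of that fusion step on plain data, independent of MongoDB: each input is a list of { id, score } pairs (a hypothetical shape chosen for illustration), scores for the same id are summed, and the result is sorted in descending order. With the 0.4 and 0.6 weights, a book ranked first by both searches scores 0.4/60 + 0.6/60 = 1/60, while a book ranked first by vector search alone scores only 0.6/60.

```typescript
// Sum weighted RRF scores per id across several ranked lists, then sort by
// the fused score (highest first) and keep the top `limit` entries.
interface ScoredId {
  id: string;
  score: number;
}

function fuseByScoreSum(lists: ScoredId[][], limit: number): ScoredId[] {
  const totals = new Map<string, number>();
  for (const list of lists) {
    for (const { id, score } of list) {
      // An id seen in more than one list accumulates all of its scores.
      totals.set(id, (totals.get(id) ?? 0) + score);
    }
  }
  return Array.from(totals, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const K = 60;
// "a" is ranked first by both searches; "b" and "c" are each found by only one.
const fullText: ScoredId[] = [{ id: "a", score: 0.4 / K }, { id: "b", score: 0.4 / (1 + K) }];
const vector: ScoredId[] = [{ id: "a", score: 0.6 / K }, { id: "c", score: 0.6 / (1 + K) }];

console.log(fuseByScoreSum([fullText, vector], 6));
```

Keying by id is what deduplicates books found by both searches; both answers below rely on the same idea, with _id as the key.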
- NodeJS/Express
- Java Spring Boot
Replace the searchBooks function in books.ts:
- It accepts query as an argument of string type
- It should execute both full-text search and vector search based on the query
- It should pass both result sets into computeWeightedRRF. Let's give full-text search a weight of 0.4 and vector search a weight of 0.6
- Sum the scores from both sets of results, re-rank the books based on the combined scores, and return the top 6 books
Answer
public async searchBooks(query: string): Promise<Book[]> {
    const VECTOR_WEIGHT = 0.6;
    const FULL_TEXT_WEIGHT = 0.4;
    const SEARCH_LIMIT = 6;
    // run full-text search and vector search concurrently
    const [fts_results, vs_results] = await Promise.all([
        this.fullTextSearch(query, SEARCH_LIMIT),
        this.vectorSearch(query, SEARCH_LIMIT)
    ]);
    // compute weighted Reciprocal Rank Fusion on both results
    this.computeWeightedRRF(fts_results, FULL_TEXT_WEIGHT);
    this.computeWeightedRRF(vs_results, VECTOR_WEIGHT);
    // aggregate both arrays into a single map keyed by _id;
    // a book found by both searches gets its two scores summed
    const documentMap = [...fts_results, ...vs_results].reduce<Record<string, any>>((map, book: any) => {
        const id = String(book._id);
        if (map.hasOwnProperty(id)) {
            map[id].score += book.score;
        } else {
            map[id] = book;
        }
        return map;
    }, {});
    // transform the map back to an array
    const books = Object.keys(documentMap).map(k => documentMap[k]);
    // return the books with the highest scores
    const topBooks = books.sort((a, b) => b.score - a.score).slice(0, SEARCH_LIMIT);
    return topBooks;
}
Replace the searchBooks method in BookService.java:
- It accepts theTerm as a String argument
- It should execute both full-text search and vector search using theTerm
- It should pass both result sets into computeWeightedRRF. Let's give full-text search a weight of 0.4 and vector search a weight of 0.6
- It should combine the results into a map keyed by book _id, summing the scores if the same book appears in both searches
- It should return the top 6 books sorted by score in descending order
Answer
public List<Book> searchBooks(String theTerm) {
    final double VECTOR_WEIGHT = 0.6;
    final double FULL_TEXT_WEIGHT = 0.4;
    final int SEARCH_LIMIT = 6;
    // run full-text search and vector search
    List<Book> ftsResults = fullTextSearch(theTerm, SEARCH_LIMIT);
    List<Book> vsResults = vectorSearch(theTerm, SEARCH_LIMIT);
    // compute weighted Reciprocal Rank Fusion on both results
    List<BookWithScore> scoredFts = computeWeightedRRF(ftsResults, FULL_TEXT_WEIGHT);
    List<BookWithScore> scoredVs = computeWeightedRRF(vsResults, VECTOR_WEIGHT);
    // sum the scores per book id, keeping one Book instance per id
    Map<String, Double> scoreMap = new HashMap<>();
    Map<String, Book> bookMap = new HashMap<>();
    Stream.concat(scoredFts.stream(), scoredVs.stream())
        .forEach(bws -> {
            String id = bws.book().id();
            bookMap.putIfAbsent(id, bws.book());
            scoreMap.merge(id, bws.score(), Double::sum);
        });
    // return the books with the highest combined scores
    return scoreMap.entrySet().stream()
        .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
        .limit(SEARCH_LIMIT)
        .map(entry -> bookMap.get(entry.getKey()))
        .toList();
}