🦸 Pre-filtering Data
Extra activity! Do it if you have extra time or are following along at home. It won't be covered during the hands-on lab.
One of the nice things about Vector Search in Atlas is its seamless integration with the MongoDB ecosystem. For instance, to do a vector search, we use an aggregation pipeline stage, and after searching, we can project, limit our data, etc. But sometimes, we want to filter before running the semantic search. For that, we can use the optional filter property in $vectorSearch.
Pre-filtering using number fields
If we want to restrict the search to books from 2001, we can try this (it won't work yet, because the index doesn't know about the year field):
[
  {
    $vectorSearch: {
      queryVector: vector,
      path: "embeddings",
      numCandidates: 100,
      index: "vectorsearch",
      limit: 100,
      filter: {year: {$eq: 2001}}
    }
  }
]
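Number fields can also be filtered by range rather than by a single value. As a minimal sketch, the hypothetical helper vectorSearchByYearRange below (not part of the lab code) builds a $vectorSearch stage restricted to a span of years by combining $gte and $lte with $and. The three-element vector is only a placeholder. Like the pipeline above, this stage depends on year being indexed as a filter field.

```typescript
// Hypothetical helper for illustration; the name and signature are assumptions.
type YearRangeFilter = {
  $and: [{ year: { $gte: number } }, { year: { $lte: number } }];
};

function vectorSearchByYearRange(
  vector: number[],
  fromYear: number,
  toYear: number
) {
  // Both bounds are inclusive.
  const filter: YearRangeFilter = {
    $and: [{ year: { $gte: fromYear } }, { year: { $lte: toYear } }],
  };
  return {
    $vectorSearch: {
      queryVector: vector,
      path: "embeddings",
      numCandidates: 100,
      index: "vectorsearch",
      limit: 100,
      filter,
    },
  };
}

// Placeholder vector; a real query vector must have as many dimensions
// as the index expects.
const yearRangeStage = vectorSearchByYearRange([0.1, 0.2, 0.3], 2000, 2010);
console.log(JSON.stringify(yearRangeStage.$vectorSearch.filter));
```

The returned object can be used as the first stage of an aggregation pipeline in place of the $eq version.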
Pre-filtering using string fields
The same goes for string fields. We can try this filter on language, and it won't work yet either:
[
  {
    $vectorSearch: {
      queryVector: vector,
      path: "embeddings",
      numCandidates: 100,
      index: "vectorsearch",
      limit: 100,
      filter: {language: {$eq: "es"}}
    }
  }
]
The problem lies in the vectorsearch index, not in this code. For a field to be used in a pre-filter, whether it holds numbers or strings, we need to add a mapping for it in our search index definition. To do that, go to MongoDB Atlas, go to your collections, and open the Search Indexes tab, as you did while creating the indexes.
In this case, we already have our index, so we're going to edit it in the JSON editor. Just change the index by adding a mapping for the year field and the language field. The index should look like this:
{
  "fields": [
    {
      "type": "vector",
      "path": "embeddings",
      "numDimensions": 1536,
      "similarity": "cosine"
    },
    {
      "type": "filter",
      "path": "year"
    },
    {
      "type": "filter",
      "path": "language"
    }
  ]
}
The only difference is that we've added this part, stating that year and language should be indexed as filter fields:
    {
      "type": "filter",
      "path": "year"
    },
    {
      "type": "filter",
      "path": "language"
    }
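If you would rather build the index definition in code than edit the JSON by hand, the same change can be expressed as a small function. This is only a sketch: withFilterFields and the type names are hypothetical, and the function only produces the definition object, it does not talk to Atlas.

```typescript
// Hypothetical types that mirror the index definition shown above.
type VectorField = {
  type: "vector";
  path: string;
  numDimensions: number;
  similarity: "cosine" | "euclidean" | "dotProduct";
};
type FilterField = { type: "filter"; path: string };
type VectorIndexDefinition = { fields: Array<VectorField | FilterField> };

// Returns a copy of the definition with a filter entry for each path,
// skipping paths that are already indexed as filters.
function withFilterFields(
  definition: VectorIndexDefinition,
  paths: string[]
): VectorIndexDefinition {
  const fields: Array<VectorField | FilterField> = [...definition.fields];
  const seen = new Set(
    fields.filter((f) => f.type === "filter").map((f) => f.path)
  );
  for (const path of paths) {
    if (!seen.has(path)) {
      fields.push({ type: "filter", path });
      seen.add(path);
    }
  }
  return { fields };
}

// The original index: a single vector field on "embeddings".
const baseDefinition: VectorIndexDefinition = {
  fields: [
    { type: "vector", path: "embeddings", numDimensions: 1536, similarity: "cosine" },
  ],
};

const updatedDefinition = withFilterFields(baseDefinition, ["year", "language"]);
console.log(JSON.stringify(updatedDefinition, null, 2));
```

Recent versions of the MongoDB Node.js driver expose collection.updateSearchIndex(name, definition), which should accept an object like updatedDefinition. Check your driver version before relying on it; the Atlas JSON editor described above works regardless.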
Add the language-filtered aggregation pipeline to your code (in server/src/controllers/books.ts, inside the now-familiar searchBooks method). Once the index has been updated, searches will return semantic results written in Spanish.
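As a rough sketch of how the filtered pipeline might be assembled, the hypothetical function buildLanguageSearchPipeline below takes the query vector and a language code. The body of searchBooks is not shown in this section, so the function name, the collection variable in the comment, and the optional $project stage are all assumptions.

```typescript
// Hypothetical types describing the two stages of the pipeline.
type VectorSearchStage = {
  $vectorSearch: {
    queryVector: number[];
    path: string;
    numCandidates: number;
    index: string;
    limit: number;
    filter: { language: { $eq: string } };
  };
};
type ProjectStage = { $project: { embeddings: 0 } };

function buildLanguageSearchPipeline(
  vector: number[],
  language: string
): [VectorSearchStage, ProjectStage] {
  return [
    {
      $vectorSearch: {
        queryVector: vector,
        path: "embeddings",
        numCandidates: 100,
        index: "vectorsearch",
        limit: 100,
        // Requires "language" to be indexed as a filter field.
        filter: { language: { $eq: language } },
      },
    },
    // Optional: leave the large embeddings array out of the results.
    { $project: { embeddings: 0 } },
  ];
}

// Placeholder vector; in the lab it comes from embedding the search query.
const spanishPipeline = buildLanguageSearchPipeline([0.1, 0.2, 0.3], "es");
console.log(JSON.stringify(spanishPipeline[0].$vectorSearch.filter));

// Inside searchBooks, assuming `collection` is the books collection:
// const books = await collection.aggregate(spanishPipeline).toArray();
```

Passing the language as a parameter makes it easy to try other values, such as "en", without touching the rest of the stage.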