Skip to main content

πŸ¦Έβ€β™‚οΈ Pre-filtering Data

info

Extra activity! Do it if you have extra time or are following along at home. It won't be covered during the hands-on lab.

One of the nice things about Vector Search in Atlas is its seamless integration with the MongoDB ecosystem. For instance, to do a vector search, we use an aggregation pipeline stage, and after searching, we can project, limit our data, etc. But sometimes, we want to filter before running the semantic search. For that, we can use the optional filter property in $vectorSearch.

Pre-filtering using number fields​

If we want to pre-filter all books that are from 2001, we can try this (but it won't work right now):

[
{$vectorSearch: {
queryVector: vector,
path: "embeddings",
numCandidates: 100,
index: "vectorsearch",
limit: 100,
filter: {year: {$eq: 2001}}
}
}
]

Pre-filtering using string fields​

We can try this and it won't work as well:

[
{$vectorSearch: {
queryVector: vector,
path: "embeddings",
numCandidates: 100,
index: "vectorsearch",
limit: 100,
filter: {language: {$eq: "es"}}
}
}
]

The problem lies in the vectorsearch index, not in this code. For string fields to be pre-filtered, we need to add a mapping to those fields in our search index definition. To do that, go to MongoDB Atlas, go to your collections, and open the Search Indexes tab, as you did while creating the indexes.

In this case, we already have our index and we're going to edit it in the JSON editor. Just change the index by adding a mapping for the year field and language field. The index should look like:

{
"fields": [
{
"type": "vector",
"path": "embeddings",
"numDimensions": 1536,
"similarity": "cosine"
},
{
"type": "filter",
"path": "year"
},
{
"type": "filter",
"path": "language"
}
]
}

The only difference is that we've added this part, stating that year and language should be indexed as a filter.

    {
"type": "filter",
"path": "year"
},
{
"type": "filter",
"path": "language"
}

Add that new aggregation pipeline in your code (server/src/controllers/books.ts inside the now familiar searchBooks method) and when searching, you'll get semantic results written in Spanish.