Skip to main content

๐Ÿ“˜ Vector Search

Vector search, a.k.a. semantic search is an information retrieval technique that retrieves data based on intent or meaning.

Unlike traditional full-text search which finds keyword matches, vector search uses embeddings to find items closest to your search query in multi-dimensional vector space. The closer the embeddings are to your query, the more similar they are in meaning.

Vector search in MongoDBโ€‹

In MongoDB, you can semantically search through your data using MongoDB Atlas Vector Search.

To perform vector search on your data in MongoDB, you need to create a vector search index. An example of a vector search index definition looks as follows:

{
"fields":[
{
"type": "vector",
"path": "embedding",
"numDimensions": 1536,
"similarity": "cosine"
},
{
"type": "filter",
"path": "pages"
},
...
]
}

In the index definition, you specify the path to the embedding field (path), the number of dimensions in the embedding vectors (numDimensions), and a similarity metric that specifies how to determine nearest neighbors in vector space (similarity). You can also index filter fields that allow you to pre-filter on certain metadata to narrow the scope of the vector search.

Vector search in MongoDB takes the form of an aggregation pipeline stage. It always needs to be the first stage in the pipeline and can be followed by other stages to further process the semantic search results. An example pipeline including the $vectorSearch stage is as follows:

[
{
"$vectorSearch": {
"index": "vector_index",
"path": "embedding",
"queryVector": [0.02421053, -0.022372592,...],
"numCandidates": 150,
"filter": {"pages": 100},
"limit": 10
}
},
{
"$project": {
"_id": 0,
"title": 1,
"score": {"$meta": "vectorSearchScore"}
}
}
]

In this example, you can see a vector search query with a pre-filter. The limit field in the query definition specifies how many documents to return from the vector search.

The $project stage that follows only returns documents with the title field and the similarity score from the vector search.