
🦸‍♂️ Vector quantization

Vector quantization is a technique for reducing the number of bits required to represent a vector. This can vastly improve vector search performance.
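
As a rough illustration of the idea (not Atlas's internal implementation, which runs automatically inside the index), scalar quantization maps each float32 component to a single int8 bucket, cutting per-dimension storage from 4 bytes to 1 byte:

import numpy as np

def scalar_quantize(vec: np.ndarray):
    """Illustrative scalar quantization: map float32 values to int8 buckets."""
    lo, hi = float(vec.min()), float(vec.max())
    scale = (hi - lo) / 255.0 or 1.0          # guard against a zero range
    quantized = np.round((vec - lo) / scale - 128.0).astype(np.int8)
    return quantized, lo, scale

embedding = np.random.rand(1408).astype(np.float32)   # hypothetical embedding
q, lo, scale = scalar_quantize(embedding)
print(embedding.nbytes, "bytes ->", q.nbytes, "bytes")  # 5632 -> 1408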

However, before you enable quantization, take note of the current size of your vector search index. You can check this in the Atlas UI.

Enabling quantization

To enable automatic vector quantization on your embeddings, set the quantization field to one of the supported quantization types (scalar or binary) in your vector search index definition.

For example, you can edit your vector search index definition as follows:

{
  "fields": [
    {
      "numDimensions": 1408,
      "path": "embeddings",
      "quantization": "scalar",
      "similarity": "cosine",
      "type": "vector"
    }
    //...
  ]
}
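
If you manage indexes from code rather than the Atlas UI, a sketch along these lines applies the same definition using PyMongo's search-index helpers (assuming PyMongo 4.7 or later; the connection string, database, collection, and index names below are placeholders):

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("<your-connection-string>")   # placeholder URI
collection = client["my_db"]["my_collection"]      # hypothetical namespace

definition = {
    "fields": [
        {
            "numDimensions": 1408,
            "path": "embeddings",
            "quantization": "scalar",   # or "binary"
            "similarity": "cosine",
            "type": "vector",
        }
    ]
}

# Create a new vector search index with quantization enabled...
collection.create_search_index(
    SearchIndexModel(definition=definition, name="vector_index", type="vectorSearch")
)

# ...or update an existing index definition in place.
collection.update_search_index("vector_index", definition)
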
Info

Notice the slight increase in the size of the vector search index upon enabling automatic quantization. This is because the full-fidelity vectors are also stored on disk for rescoring and/or exact nearest neighbor (ENN) search.

In the Atlas UI, the total index size is displayed, which might be larger than the original index size, because Atlas does not show a breakdown of which data structures within an index are held in RAM and which are stored on disk.

The Atlas Search metrics, however, will show a much smaller index held in memory when you enable automatic quantization. Refer to our documentation to learn more about these considerations.
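
As a back-of-the-envelope illustration of the in-memory savings (rough estimates only; real index sizes also include graph and metadata overhead), scalar quantization stores roughly 1 byte per dimension and binary quantization roughly 1 bit per dimension, versus 4 bytes per dimension for float32:

DIMS = 1408            # matches the index definition above
NUM_VECTORS = 1_000_000

full_fidelity = DIMS * 4 * NUM_VECTORS   # float32: 4 bytes per dimension
scalar = DIMS * 1 * NUM_VECTORS          # scalar quantization: 1 byte per dimension
binary = DIMS // 8 * NUM_VECTORS         # binary quantization: 1 bit per dimension

for label, size in [("float32", full_fidelity), ("scalar", scalar), ("binary", binary)]:
    print(f"{label:>8}: ~{size / 1024**3:.2f} GiB in memory")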

If storage and index size are a concern, you may also consider ingesting pre-quantized vectors.
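
For instance, if you quantize vectors yourself (say, to int8) before ingestion, you can store them as BSON binary vectors. This is a minimal sketch assuming PyMongo 4.10+ with BSON vector support; the connection string, namespace, and toy vector are placeholders:

from bson.binary import Binary, BinaryVectorDtype
from pymongo import MongoClient

client = MongoClient("<your-connection-string>")   # placeholder URI
collection = client["my_db"]["my_collection"]      # hypothetical namespace

int8_embedding = [12, -45, 88, 3]  # toy pre-quantized int8 values; a real vector would have 1408 dims

collection.insert_one({
    "text": "example document",
    # Store the pre-quantized vector as a BSON binary vector to save space.
    "embeddings": Binary.from_vector(int8_embedding, BinaryVectorDtype.INT8),
})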