
📘 What Are Vectors?

Simply put, a vector is a list of numbers. For example, a vector of length 3 could be [1, 2, 3]. A vector of length 5 could be [1, 2, 3, 4, 5]. A vector of length 100 could be [1, 2, 3, 4, 5, ..., 100]. The length of a vector is the number of elements it contains.
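To make this concrete, here is a minimal Python sketch of the three example vectors above. The variable names are only for illustration.

```python
# A vector is an ordered list of numbers.
v3 = [1, 2, 3]
v5 = [1, 2, 3, 4, 5]
v100 = list(range(1, 101))  # [1, 2, 3, ..., 100]

# The length of a vector is the number of elements it contains.
print(len(v3), len(v5), len(v100))
```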

In AI, vectors are a mathematical representation of data.

tip

When it comes to GenAI, you will hear about vectors and embeddings. While the two terms don't mean exactly the same thing, you will often see them used interchangeably.

Technically, an embedding is a vector that has been created by a model. For example, a model could convert a word into a vector. The vector would be the embedding for that word.

Thanks to some of the latest advances in AI, we can now use vectors to represent words, sentences, paragraphs, and even entire documents. This is a huge breakthrough because it allows us to use AI to understand the meaning of text.

Vectors can even be used to represent images, audio, and video, but we'll focus on text in this workshop.

Why do we need vectors?

Computers can't understand text. They can only understand numbers. So, we need a way to convert text into numbers. That's where vectors come in.

Using vectors, we can plot text in a multi-dimensional space. It is hard to visualize a multi-dimensional space, so let's start with a two-dimensional space.

Imagine a plot with an x and y axis. Our ML model will plot various points on this plot. Each point could represent a word, a sentence, a paragraph, a document, or even an image.

The position of each point is determined by the model you use. The model converts the data you pass it into a vector, and that vector gives the point's coordinates on the chart.

Points on a chart

When doing a search, we will create a new vector for the search term. We then plot this new vector on the chart.

Search term on a chart

Then, we will find the closest words to the search term. The closest words will be the words that are plotted closest to the search term.

The closest term will depend on the algorithm you use to calculate the distance between vectors. Using Euclidean distance, the closest words will be the ones with the shortest straight-line distance to the search term.

Closest words
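As a rough illustration of a Euclidean search, here is a minimal Python sketch. The words and their 2D coordinates are made up for this example, and `euclidean_distance` and `closest_words` are hypothetical helper names. A real model would produce the vectors, usually with far more than two dimensions.

```python
import math

# Hypothetical 2D vectors, invented for illustration only.
points = {
    "cat": [1.0, 1.0],
    "dog": [1.5, 1.0],
    "truck": [3.0, 3.0],
    "car": [6.0, 2.0],
}

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_words(query, points, k=2):
    """Return the k words whose vectors are nearest to the query vector."""
    ranked = sorted(points, key=lambda word: euclidean_distance(query, points[word]))
    return ranked[:k]

# Vector for the search term (also made up).
query = [1.2, 0.8]
print(closest_words(query, points))  # ['cat', 'dog']
```

This sketch compares the query against every point, which is fine for a handful of vectors. Vector search engines typically rely on specialized indexes to avoid scanning everything.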

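Vector search also provides a cosine algorithm. Cosine similarity compares the angle between two vectors rather than the straight-line distance between them. The closest words will be the ones that point in the most similar direction to the search term, regardless of how far away they are.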

Closest words
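The two approaches can disagree about which point is closest. The following Python sketch shows one such case, using made-up 2D vectors and hypothetical helper names (`euclidean_distance`, `cosine_similarity`).

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Vector for the search term (made up for illustration).
query = [1.0, 1.0]

# Two invented points: one is nearby but at a different angle,
# the other is far away but points in the same direction as the query.
points = {
    "nearby_different_angle": [1.5, 0.5],
    "far_same_direction": [4.0, 4.0],
}

# Euclidean: smaller distance is closer.
closest_euclidean = min(points, key=lambda w: euclidean_distance(query, points[w]))
# Cosine: larger similarity is closer.
closest_cosine = max(points, key=lambda w: cosine_similarity(query, points[w]))

print(closest_euclidean)  # nearby_different_angle
print(closest_cosine)     # far_same_direction
```

Which measure fits best usually depends on how the model that produced the vectors was trained, so it is worth checking the model's documentation.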

How do we create vectors?

The big breakthrough with GenAI is that developers can now easily use models that have been pre-trained and made available freely online. These models have been trained on huge datasets and are able to convert text (or any sort of data, really) into vectors.

There are many ways to create vectors. In this workshop, we'll use a pre-trained model and an API that will return vectors for us.