What Are Vectors?
Simply put, a vector is a list of numbers. For example, a vector of length 3 could be [1, 2, 3]. A vector of length 5 could be [1, 2, 3, 4, 5]. A vector of length 100 could be [1, 2, 3, 4, 5, ..., 100]. The length of a vector is the number of elements it contains.
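As a quick illustration, here are the three example vectors above written as plain Python lists. The variable names are made up for this sketch; `len` gives the number of elements in each vector.

```python
# A vector is just an ordered list of numbers.
v3 = [1, 2, 3]
v5 = [1, 2, 3, 4, 5]
v100 = list(range(1, 101))  # [1, 2, 3, ..., 100]

# The length of a vector is the number of elements it contains.
print(len(v3))    # 3
print(len(v5))    # 5
print(len(v100))  # 100
```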
In AI, vectors are a mathematical representation of data.
When it comes to GenAI, you will hear about vectors and embeddings. While they don't mean exactly the same thing, you will often see the terms used interchangeably.
Technically, an embedding is a vector that has been created by a model. For example, a model could convert a word into a vector. The vector would be the embedding for that word.
Thanks to some of the latest advances in AI, we can now use vectors to represent words, sentences, paragraphs, and even entire documents. This is a huge breakthrough because it allows us to use AI to understand the meaning of text.
Vectors can even be used to represent images, audio, and video, but we'll focus on text in this workshop.
Why Do We Need Vectors?
Computers can't understand text. They can only understand numbers. So, we need a way to convert text into numbers. That's where vectors come in.
Using vectors, we can plot text in a multi-dimensional space. It is hard to visualize a multi-dimensional space, so let's start with a 2-dimensional space.
Imagine a plot with an x axis and a y axis. Our ML model will plot various points on this plot. These points could represent words, sentences, paragraphs, documents, or even images.
The position where the points are plotted is determined by the model you use. The model converts the data you pass it into a vector. Then, it plots the vector on the chart.
When doing a search, we will create a new vector for the search term. We then plot this new vector on the chart.
Then, we will find the closest words to the search term. The closest words will be the words that are plotted closest to the search term.
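The search steps above can be sketched in a few lines of Python. This is only a toy: the 2-dimensional coordinates below are invented for illustration (a real model would produce the vectors, with far more dimensions), the `closest` helper is hypothetical, and straight-line distance is assumed as the measure of closeness.

```python
import math

# Hypothetical 2-D vectors. In practice a model produces these.
points = {
    "cat": [1.0, 4.0],
    "dog": [1.5, 3.5],
    "car": [6.0, 1.0],
    "truck": [6.5, 1.5],
}

def closest(query, points, k=2):
    """Return the k labels whose vectors lie nearest to the query vector."""
    ranked = sorted(points, key=lambda label: math.dist(query, points[label]))
    return ranked[:k]

# Hypothetical vector for the search term.
query = [1.2, 3.8]
print(closest(query, points))  # ['cat', 'dog']
```

Comparing the query against every stored point like this is fine for a handful of vectors. It only shows the idea of "plot the search term, then look for its neighbors".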
The closest term will depend on the algorithm you use to calculate the distance between vectors. Using Euclidean distance, the closest words will be the ones with the shortest straight-line distance to the search term.
Vector search also provides a cosine algorithm. Cosine distance compares the angle between vectors rather than how far apart they are, so the closest words will be the ones that point in the most similar direction to the search term.
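To see how the two measures can disagree, here is a minimal sketch using the standard formulas for Euclidean and cosine distance. The function names and the three example vectors are made up for illustration.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 1.0]
same_direction = [5.0, 5.0]  # far from the query, but pointing the same way
nearby = [1.0, 0.0]          # close to the query, but pointing elsewhere

# Euclidean distance prefers the nearby point.
print(euclidean_distance(query, nearby) < euclidean_distance(query, same_direction))  # True

# Cosine distance prefers the point in the same direction.
print(cosine_distance(query, same_direction) < cosine_distance(query, nearby))  # True
```

Because cosine distance ignores the length of the vectors, `same_direction` counts as a near-perfect match even though it is plotted much farther away.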
How Do We Create Vectors?
The big breakthrough with GenAI is that developers can now easily use models that have been pre-trained and made freely available online. These models have been trained on huge datasets and are able to convert text (or any sort of data, really) into vectors.
There are many ways to create vectors. In this workshop, we'll use our OpenAI API to create vectors for us.
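As a rough sketch of what such a call could look like, the snippet below assumes the official `openai` Python package (version 1 or later) and an `OPENAI_API_KEY` environment variable. The helper name `embed_text` and the model name `text-embedding-3-small` are assumptions made for this example, not something the workshop prescribes.

```python
import os

def embed_text(client, text, model="text-embedding-3-small"):
    """Request an embedding for one piece of text and return it as a list of floats.

    `client` is expected to be an `openai.OpenAI` instance. The default model
    name is an assumption for this sketch.
    """
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding

# Only call the API when a key is configured.
if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # requires `pip install openai`

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    vector = embed_text(client, "What are vectors?")
    print(len(vector))  # the number of dimensions in the embedding
```

The returned list is the embedding for the text: a vector created by a model, which can then be compared with other vectors using a distance measure.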