Running Jupyter Notebooks in Google Colab
Jupyter Notebook is an interactive Python environment. Each cell in a Jupyter notebook is a modular unit of code or text that you can execute to view its output.
Load the dataset
First, let's download the dataset for our lab. We'll use a subset of articles from the MongoDB Developer Center as the source data for our RAG application.
Chunk up the data
Since we are working with large documents, we first need to break them up into smaller chunks before embedding and storing them in MongoDB.
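As a rough illustration of the chunking step, here is a minimal sketch that splits text into overlapping word-based chunks. The function name `chunk_text` and the `chunk_size` and `overlap` parameters are assumptions made for this example; the lab itself may use a different splitter (for instance, a token-based one).

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that context spanning a
    chunk boundary is not lost. Hypothetical helper, not the lab's own code.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Smaller chunks tend to give more focused search results, while the overlap helps preserve sentences that would otherwise be cut in half at a boundary.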
Generate embeddings
To perform vector search on our data, we need to embed it (i.e. generate embedding vectors) before ingesting it into MongoDB.
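The embedding step can be sketched as follows, assuming each chunk is a dictionary with its text under a `body` field and that the vector is stored under an `embedding` field. Both field names, and the `embed_documents` helper itself, are assumptions for illustration. The embedding model is passed in as a plain callable so the sketch does not depend on any particular model.

```python
from typing import Callable


def embed_documents(
    docs: list[dict],
    embed_fn: Callable[[str], list[float]],
    text_field: str = "body",        # assumed name of the text field
    vector_field: str = "embedding",  # assumed name of the vector field
) -> list[dict]:
    """Return copies of `docs`, each with an embedding vector attached."""
    embedded = []
    for doc in docs:
        new_doc = dict(doc)  # shallow copy so the input is left unchanged
        vector = embed_fn(doc[text_field])
        # Store plain Python floats, which serialize cleanly to BSON.
        new_doc[vector_field] = [float(x) for x in vector]
        embedded.append(new_doc)
    return embedded


# One possible embed_fn, assuming the sentence-transformers package is
# installed (the model name here is only an example):
#
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   embed_fn = lambda text: model.encode(text).tolist()
```

Whichever model is used, the same model should embed both the stored chunks and the search queries so that their vectors are comparable.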
Ingest data into MongoDB
The final step to build a MongoDB vector store for our RAG application is to ingest the embedded article chunks into MongoDB.
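A minimal sketch of the ingestion step is shown below. It assumes a PyMongo collection object and a list of already-embedded chunk documents. The `ingest` helper, its `batch_size` and `clear_first` parameters, and the database and collection names in the usage comment are hypothetical.

```python
def ingest(collection, docs: list[dict], batch_size: int = 100,
           clear_first: bool = False) -> int:
    """Insert `docs` into `collection` in batches; return the number inserted.

    `collection` is expected to behave like a pymongo Collection, i.e. to
    provide `delete_many` and `insert_many`.
    """
    if clear_first:
        # Remove existing documents so that re-running does not duplicate data.
        collection.delete_many({})

    inserted = 0
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        result = collection.insert_many(batch)
        inserted += len(result.inserted_ids)
    return inserted


# Usage, assuming pymongo is installed and MONGODB_URI holds your
# connection string (database and collection names are placeholders):
#
#   from pymongo import MongoClient
#   client = MongoClient(MONGODB_URI)
#   collection = client["my_database"]["my_collection"]
#   ingest(collection, embedded_docs, clear_first=True)
```

Inserting in batches keeps each request small, which can help when the dataset contains many chunks with large embedding vectors.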