
๐Ÿ‘ Chunk and embed the data

Because the documents are large, we first need to break them into smaller chunks. Then, to make the chunks searchable with vector search, we need to generate an embedding for each one.
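To make the chunking step concrete, here is a minimal sketch of one naive strategy: grouping a fixed number of words per chunk. The helper name `split_into_chunks` and its `max_words` parameter are hypothetical and are not part of the notebook, which uses its own `text_splitter` object with whatever splitting rules it was configured with.

```python
def split_into_chunks(text: str, max_words: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most `max_words` words.

    Hypothetical illustration only; the notebook's text_splitter may split
    on different boundaries (for example paragraphs or token counts).
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    # Step through the word list in windows of max_words.
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


chunks = split_into_chunks("one two three four five", max_words=2)
print(chunks)
```

Each chunk then becomes its own searchable unit once an embedding is attached to it.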

In this workshop, we will use voyage-context-3 from Voyage AI to produce contextualized embeddings for the chunks.

Fill in any <CODE_BLOCK_N> placeholders and run the cells under the Step 3: Chunk and embed the data section in the notebook to chunk and embed the articles we loaded.

The answers for code blocks in this section are as follows:

CODE_BLOCK_1

Answer
text_splitter.split_text(text)

CODE_BLOCK_2

Answer
vo.contextualized_embed(inputs=[content], model="voyage-context-3", input_type=input_type)

CODE_BLOCK_3

Answer
get_chunks(doc, "body")

CODE_BLOCK_4

Answer
get_embeddings(chunks, "document")

CODE_BLOCK_5

Answer
chunk_doc["body"] = chunk

CODE_BLOCK_6

Answer
chunk_doc["embedding"] = embedding
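The sketch below shows how the six answers might fit together into one loop. It is an assumption about the overall shape, not a copy of the notebook: `split_text` stands in for `text_splitter.split_text`, and `get_embeddings` returns toy placeholder vectors instead of calling `vo.contextualized_embed`, so the example runs without an API key. The sample document and the `embedded_docs` list are also made up for illustration.

```python
def split_text(text: str) -> list[str]:
    # Stand-in (assumption) for text_splitter.split_text: one chunk per sentence.
    return [part.strip() for part in text.split(".") if part.strip()]


def get_chunks(doc: dict, text_field: str) -> list[str]:
    # Mirrors CODE_BLOCK_1: split the text stored under `text_field`.
    return split_text(doc[text_field])


def get_embeddings(content: list[str], input_type: str) -> list[list[float]]:
    # PLACEHOLDER for CODE_BLOCK_2: the notebook calls Voyage AI here.
    # These toy vectors (character count, word count) are NOT real embeddings.
    return [[float(len(chunk)), float(len(chunk.split()))] for chunk in content]


# Made-up sample data standing in for the loaded articles.
docs = [{"title": "Sample article", "body": "First sentence. Second one here."}]

embedded_docs = []
for doc in docs:
    chunks = get_chunks(doc, "body")                 # CODE_BLOCK_3
    embeddings = get_embeddings(chunks, "document")  # CODE_BLOCK_4
    for chunk, embedding in zip(chunks, embeddings):
        chunk_doc = doc.copy()               # keep the article metadata on each chunk
        chunk_doc["body"] = chunk            # CODE_BLOCK_5
        chunk_doc["embedding"] = embedding   # CODE_BLOCK_6
        embedded_docs.append(chunk_doc)

print(len(embedded_docs), "chunk documents")
```

Copying the parent document before overwriting `body` means every chunk document keeps fields such as the title, while the original article is left unchanged.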