
🦹 Stream responses from the RAG application

By default, generation results are returned only once generation is complete. Alternatively, you can stream the results as they are produced, which is useful for chat use cases where the user can see the response build up incrementally as each token is generated.
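To make the difference concrete, here is a minimal sketch contrasting the two modes. The token generator and its output are purely illustrative stand-ins for an LLM backend, not part of the workshop code:

```python
def generate_tokens():
    """Illustrative token generator standing in for an LLM backend."""
    for token in ["Streaming", " lets", " users", " see", " output", " early."]:
        # In a real model, each token arrives after some compute latency.
        yield token

# Blocking: wait for everything, then return the full answer at once.
blocking_answer = "".join(generate_tokens())

# Streaming: display each token as soon as it is produced.
streamed = []
for token in generate_tokens():
    streamed.append(token)
    print(token, end="", flush=True)
```

Both modes produce the same final text; streaming simply surfaces it sooner.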

Fill in any <CODE_BLOCK_N> placeholders and run the cells under the 🦹‍♀️ Return streaming responses section in the notebook to stream the results from your RAG application.

The answers for code blocks in this section are as follows:

CODE_BLOCK_20

Answer

```python
create_prompt(user_query)
```

CODE_BLOCK_21

Answer

```python
fw_client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
```

CODE_BLOCK_22

Answer

```python
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
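Putting the pieces together, the consumption loop above can be sketched end to end with simple stand-in objects in place of the Fireworks client. The chunk shape mirrors the `choices[0].delta.content` fields used in the answer; `fake_stream` and its contents are illustrative assumptions, not the real client:

```python
from dataclasses import dataclass
from typing import Optional

# Minimal stand-ins for the streaming chunk objects a chat
# completions call with stream=True yields.
@dataclass
class Delta:
    content: Optional[str]

@dataclass
class Choice:
    delta: Delta

@dataclass
class Chunk:
    choices: list

def fake_stream():
    """Yields chunks shaped like a streaming chat completions response."""
    # Some chunks in a real stream (e.g. the first and last) carry no content,
    # which is why the loop checks delta.content before printing.
    for text in [None, "RAG ", "answers ", "stream ", "token by token.", None]:
        yield Chunk(choices=[Choice(delta=Delta(content=text))])

response = fake_stream()

answer = []
for chunk in response:
    if chunk.choices[0].delta.content:
        answer.append(chunk.choices[0].delta.content)
        print(chunk.choices[0].delta.content, end="", flush=True)

full_answer = "".join(answer)
```

The `if chunk.choices[0].delta.content:` guard skips the empty chunks so only actual tokens are printed.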