🦹 Stream responses from the RAG application
By default, generation results are returned only after the generation completes. Another option is to stream the results as they are produced, which is useful for chat use cases where the user can see the response incrementally as each token is generated.
Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the 🦹♀️ Return streaming responses section in the notebook to stream the results from your RAG application.
The answers for code blocks in this section are as follows:
CODE_BLOCK_20
Answer
```python
create_prompt(user_query)
```
CODE_BLOCK_21
Answer
```python
fw_client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
```
CODE_BLOCK_22
Answer
```python
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
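Putting the answers together: with `stream=True`, the client yields a sequence of chunks, each carrying an incremental token in `choices[0].delta.content`, which the loop concatenates. The sketch below illustrates that consumption pattern with simple stand-in dataclasses for the chunk objects, so it runs without an API key; `stream_text` and the simulated chunks are illustrative, not part of the notebook.

```python
from dataclasses import dataclass

# Minimal stand-ins for the chunk objects a streaming chat-completions
# response yields; each chunk carries an incremental token in
# choices[0].delta.content (a terminal chunk may carry None).
@dataclass
class Delta:
    content: "str | None"

@dataclass
class Choice:
    delta: Delta

@dataclass
class Chunk:
    choices: list

def stream_text(response):
    """Concatenate the incremental tokens from a streaming response."""
    parts = []
    for chunk in response:
        # Skip chunks with no content, e.g. the terminal chunk.
        if chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)

# Simulated stream: three token chunks plus an empty terminal chunk.
simulated = [
    Chunk([Choice(Delta("Hello"))]),
    Chunk([Choice(Delta(", "))]),
    Chunk([Choice(Delta("world"))]),
    Chunk([Choice(Delta(None))]),
]
print(stream_text(simulated))  # → Hello, world
```

In the notebook, the real `response` object from `fw_client.chat.completions.create(..., stream=True)` takes the place of `simulated`, and printing with `end=""` renders the tokens as one continuous answer.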