
Handling Streaming Responses

When you set stream=True in your API request, the server sends the response back in chunks as they become available. This is especially useful for applications like chatbots, where you want to display the response to the user word by word.

This guide provides robust examples of handling streamed responses in Python and JavaScript (Node.js).

Python

When you iterate over the stream, you receive delta chunks. Check that the delta's content is present before appending it, since some chunks carry no text.
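
Before looking at the full example, it helps to know what a single chunk contains. The sketch below is illustrative and assumes the OpenAI-compatible ChatCompletionChunk shape exposed by the SDK; the values are made up:

# Each streamed chunk exposes the new text through a "delta":
#
#   chunk.choices[0].delta.role       # "assistant" on the first chunk, then None
#   chunk.choices[0].delta.content    # the next piece of text, or None
#   chunk.choices[0].finish_reason    # None until the final chunk (e.g. "stop")
#
# Because delta.content can be None, the loop below checks it before appending.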

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptron.io/v1",
    api_key=os.environ["INCEPTRON_API_KEY"],
)

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Write a short story about a robot who discovers music."
        }
    ],
    stream=True,
)

full_response = ""
for chunk in stream:
    content = chunk.choices[0].delta.content
    # Some chunks carry no text (content is None), so skip those.
    if content:
        full_response += content
        print(content, end="", flush=True)

print("\n--- Full Response ---")
print(full_response)
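
If the connection drops mid-stream, the iteration raises an exception. A minimal sketch of adding error handling around the same loop (this assumes the openai SDK's APIError; adapt it to the errors your client raises):

from openai import APIError

full_response = ""
try:
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            full_response += content
            print(content, end="", flush=True)
except APIError as exc:
    # Whatever text arrived before the failure is still in full_response.
    print(f"\nStream interrupted: {exc}")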

JavaScript (Node.js)

Using a for await...of loop is the modern way to consume async iterators in JavaScript.

import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.inceptron.io/v1",
  apiKey: process.env.INCEPTRON_API_KEY,
});

async function main() {
  const stream = await client.chat.completions.create({
    model: "meta-llama/Llama-3.3-70B-Instruct",
    messages: [
      {
        role: "user",
        content: "Write a short story about a robot who discovers music.",
      },
    ],
    stream: true,
  });

  let fullResponse = "";
  for await (const chunk of stream) {
    // Optional chaining guards against chunks that carry no text.
    const content = chunk.choices[0]?.delta?.content || "";
    fullResponse += content;
    process.stdout.write(content);
  }

  process.stdout.write("\n--- Full Response ---\n");
  process.stdout.write(fullResponse);
  process.stdout.write("\n");
}

main().catch(console.error);