Chat Completions API Reference

The Chat Completions API is the core of our text generation service. Given a list of messages that make up a conversation, the model returns a generated response. This makes it ideal for building chatbots and other conversational AI applications.

Request Body

The request body must be a JSON object with the following parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | The ID of the model to use. |
| `messages` | array | Yes | A list of message objects that form the conversation history. See the structure below. |
| `temperature` | number | No | Controls randomness. A value between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. Defaults to 1. |
| `max_tokens` | integer | No | The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. |
| `stream` | boolean | No | If true, the API will send back partial message deltas as they are generated, like in ChatGPT. This is useful for creating a real-time, responsive user experience. Defaults to false. |
| `top_p` | number | No | An alternative to temperature sampling, called nucleus sampling: the model considers only the tokens within the cumulative probability mass of `top_p`. For example, 0.1 means only the tokens comprising the top 10% of probability mass are considered. Defaults to 1. |
| `n` | integer | No | How many chat completion choices to generate for each input message. Note that you are billed for the tokens generated across all choices. Defaults to 1. |
| `stop` | string or array | No | Up to 4 sequences where the API will stop generating further tokens. |
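Putting these parameters together, a complete request body might look like the following sketch (the model ID and messages are taken from the examples later on this page; the specific parameter values are illustrative):

```python
import json

# A sketch of a complete request body using the parameters above.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0.2,  # low temperature: focused, deterministic output
    "max_tokens": 64,    # cap the length of the completion
    "stream": False,     # request a single, complete response
    "stop": ["\n\n"],    # stop generating at the first blank line
}

body = json.dumps(payload)  # serialized JSON, ready to send
```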

The messages Object

The messages array is the heart of the request, providing the conversation history that the model will use to generate a response. Each message object in the array has a role and content.

| Role | Description |
| --- | --- |
| `system` | The system message helps set the behavior of the assistant. It can be used to provide high-level instructions for the conversation, like "You are a helpful assistant that translates English to French." |
| `user` | A message from the user. This is where you provide the prompts and questions for the assistant. |
| `assistant` | A message from the assistant. This can be used to provide examples of desired behavior (few-shot prompting) or to continue a conversation. |

A typical conversation starts with a system message, followed by an alternating series of user and assistant messages.
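That pattern can be sketched in Python as a few-shot conversation history (the translation examples here are illustrative, not from the API):

```python
# A sketch of a messages array: one system message, then alternating
# user/assistant turns that double as few-shot examples.
messages = [
    {"role": "system", "content": "You are a helpful assistant that translates English to French."},
    {"role": "user", "content": "Hello!"},          # few-shot example input
    {"role": "assistant", "content": "Bonjour !"},  # few-shot example output
    {"role": "user", "content": "Good night."},     # the actual prompt
]

roles = [m["role"] for m in messages]
```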

Response Body

The API returns a JSON object containing the completion choices.

| Parameter | Type | Description |
| --- | --- | --- |
| `id` | string | A unique identifier for the chat completion. |
| `object` | string | The object type, which is always `chat.completion`. |
| `created` | integer | The Unix timestamp (in seconds) of when the completion was created. |
| `model` | string | The model used for the completion. |
| `choices` | array | A list of chat completion choices. |
| `usage` | object | An object containing token usage statistics for the completion. |

The choices object

| Parameter | Type | Description |
| --- | --- | --- |
| `index` | integer | The index of the choice in the list of choices. |
| `message` | object | The message object generated by the model, containing `role` and `content`. |
| `finish_reason` | string | The reason the model stopped generating tokens. Can be `stop` (if it reached a stop sequence), `length` (if it reached `max_tokens`), or `content_filter`. |
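Because `length` means the completion was cut off at `max_tokens`, it is worth checking `finish_reason` before trusting the output. A minimal sketch (the helper name is hypothetical):

```python
def is_truncated(choice: dict) -> bool:
    """Return True when the model stopped because it hit max_tokens."""
    return choice.get("finish_reason") == "length"

# Example choice objects shaped like the table above.
complete = {"index": 0, "message": {"role": "assistant", "content": "Paris."}, "finish_reason": "stop"}
cut_off = {"index": 0, "message": {"role": "assistant", "content": "Paris is"}, "finish_reason": "length"}
```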

The usage object

| Parameter | Type | Description |
| --- | --- | --- |
| `prompt_tokens` | integer | The number of tokens in the prompt. |
| `completion_tokens` | integer | The number of tokens in the generated completion. |
| `total_tokens` | integer | The total number of tokens used in the request (prompt + completion). |
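The three counters are related by a simple invariant, `total_tokens = prompt_tokens + completion_tokens`, which is handy for sanity-checking usage and billing. A sketch, using the figures from the example response on this page:

```python
# Usage figures from the example response on this page.
usage = {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}

def check_usage(u: dict) -> bool:
    """Verify total_tokens equals prompt_tokens + completion_tokens."""
    return u["prompt_tokens"] + u["completion_tokens"] == u["total_tokens"]
```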

Example Response

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\n\nHello there, how may I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```

Example Request

Here is an example of a request to the Chat Completions API in Python:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptron.io/v1",
    api_key=os.environ["INCEPTRON_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)

print(completion.choices[0].message.content)
```

Example Responses

The examples below show the response body for the request above, first as a standard response and then as a stream.

Standard Response

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\n\nParis is the capital of France."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```

Streaming Response

When `stream` is set to true, the API returns a stream of `data:`-prefixed JSON objects (server-sent events). The final event is `data: [DONE]`.

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":"\n\n"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":"Paris"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" the"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" of"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" France"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
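The deltas above can be reassembled client-side by concatenating each chunk's `content`. A minimal parser sketch that works on raw `data:` lines (it does not use the `openai` client, and the helper name is hypothetical):

```python
import json

def accumulate_stream(lines):
    """Concatenate delta content from data:-prefixed streaming lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue                     # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))  # role-only chunks add nothing
    return "".join(parts)

# Feeding it chunks shaped like the stream above yields the full message.
stream = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Paris"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" the capital of France."},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    'data: [DONE]',
]
text = accumulate_stream(stream)
```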