Chat Completions API Reference

The Chat Completions API is the core of our text generation service. Given a list of messages that make up a conversation, the model returns a generated response. This makes it ideal for building chatbots and other conversational AI applications.

Request Body

The request body must be a JSON object with the following parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | The ID of the model to use. |
| `messages` | array | Yes | A list of message objects that form the conversation history. See the structure below. |
| `temperature` | number | No | Controls randomness. A value between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. Defaults to 1. |
| `max_tokens` | integer | No | The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. |
| `stream` | boolean | No | If true, the API will send back partial message deltas as they are generated, like in ChatGPT. This is useful for creating a real-time, responsive user experience. Defaults to false. |
| `top_p` | number | No | An alternative to temperature sampling, called nucleus sampling: the model considers only the tokens within the cumulative probability mass of `top_p`. For example, 0.1 means only the tokens comprising the top 10% of probability mass are considered. Defaults to 1. |
| `n` | integer | No | How many chat completion choices to generate for each input message. Note that you are billed for the tokens generated across all choices. Defaults to 1. |
| `stop` | string or array | No | Up to 4 sequences where the API will stop generating further tokens. |
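Putting these parameters together, a complete request body might look like the following sketch (the model ID and messages are taken from the examples later on this page; the specific parameter values are illustrative):

```python
import json

# A sketch of a complete request body using the parameters above.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0.2,  # low temperature: focused, deterministic output
    "max_tokens": 64,    # cap the length of the completion
    "stream": False,     # request a single, complete response
    "stop": ["\n\n"],    # stop generating at the first blank line
}

body = json.dumps(payload)  # serialized JSON, ready to send
```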

The messages Object

The messages array is the heart of the request, providing the conversation history that the model will use to generate a response. Each message object in the array has a role and content.

| Role | Description |
| --- | --- |
| `system` | The system message helps set the behavior of the assistant. It can be used to provide high-level instructions for the conversation, like "You are a helpful assistant that translates English to French." |
| `user` | A message from the user. This is where you provide the prompts and questions for the assistant. |
| `assistant` | A message from the assistant. This can be used to provide examples of desired behavior (few-shot prompting) or to continue a conversation. |

A typical conversation starts with a system message, followed by an alternating series of user and assistant messages.
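That pattern can be sketched in Python as a few-shot conversation history (the translation examples here are illustrative, not from the API):

```python
# A sketch of a messages array: one system message, then alternating
# user/assistant turns that double as few-shot examples.
messages = [
    {"role": "system", "content": "You are a helpful assistant that translates English to French."},
    {"role": "user", "content": "Hello!"},          # few-shot example input
    {"role": "assistant", "content": "Bonjour !"},  # few-shot example output
    {"role": "user", "content": "Good night."},     # the actual prompt
]

roles = [m["role"] for m in messages]
```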

Response Body

The API returns a JSON object containing the completion choices.

| Parameter | Type | Description |
| --- | --- | --- |
| `id` | string | A unique identifier for the chat completion. |
| `object` | string | The object type, which is always `chat.completion`. |
| `created` | integer | The Unix timestamp (in seconds) of when the completion was created. |
| `model` | string | The model used for the completion. |
| `choices` | array | A list of chat completion choices. |
| `usage` | object | An object containing token usage statistics for the completion. |

The choices object

| Parameter | Type | Description |
| --- | --- | --- |
| `index` | integer | The index of the choice in the list of choices. |
| `message` | object | The message object generated by the model, containing `role` and `content`. |
| `finish_reason` | string | The reason the model stopped generating tokens. Can be `stop` (if it reached a stop sequence), `length` (if it reached `max_tokens`), or `content_filter`. |
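Because `length` means the completion was cut off at `max_tokens`, it is worth checking `finish_reason` before trusting the output. A minimal sketch (the helper name is hypothetical):

```python
def is_truncated(choice: dict) -> bool:
    """Return True when the model stopped because it hit max_tokens."""
    return choice.get("finish_reason") == "length"

# Example choice objects shaped like the table above.
complete = {"index": 0, "message": {"role": "assistant", "content": "Paris."}, "finish_reason": "stop"}
cut_off = {"index": 0, "message": {"role": "assistant", "content": "Paris is"}, "finish_reason": "length"}
```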

The usage object

| Parameter | Type | Description |
| --- | --- | --- |
| `prompt_tokens` | integer | The number of tokens in the prompt. |
| `completion_tokens` | integer | The number of tokens in the generated completion. |
| `total_tokens` | integer | The total number of tokens used in the request (prompt + completion). |
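The three counters are related by a simple invariant, `total_tokens = prompt_tokens + completion_tokens`, which is handy for sanity-checking usage and billing. A sketch, using the figures from the example response on this page:

```python
# Usage figures from the example response on this page.
usage = {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}

def check_usage(u: dict) -> bool:
    """Verify total_tokens equals prompt_tokens + completion_tokens."""
    return u["prompt_tokens"] + u["completion_tokens"] == u["total_tokens"]
```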

Example Response

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\n\nHello there, how may I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```

Example Request

Here is an example of a request to the Chat Completions API in Python:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptron.io/v1",
    api_key=os.environ["INCEPTRON_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)

print(completion.choices[0].message.content)
```

Example Responses

The examples below show the response body for the request above, first as a standard response and then as a stream.

Standard Response

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\n\nParis is the capital of France."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```

Streaming Response

When `stream` is set to true, the API returns a stream of `data:`-prefixed JSON objects (server-sent events). The final event is `data: [DONE]`.

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":"\n\n"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":"Paris"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" the"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" of"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" France"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
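The deltas above can be reassembled client-side by concatenating each chunk's `content`. A minimal parser sketch that works on raw `data:` lines (it does not use the `openai` client, and the helper name is hypothetical):

```python
import json

def accumulate_stream(lines):
    """Concatenate delta content from data:-prefixed streaming lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue                     # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))  # role-only chunks add nothing
    return "".join(parts)

# Feeding it chunks shaped like the stream above yields the full message.
stream = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Paris"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" the capital of France."},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    'data: [DONE]',
]
text = accumulate_stream(stream)
```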