Introduction
This guide will walk you through performing your first inference. The steps involve creating an account to get an API key, setting up your local environment, and running the code.
First, choose the platform you will be using. The instructions and code examples on this page will adapt to your selection.
- Inceptron
- Hugging Face
1. Create an Inceptron Account and Obtain Your API Key
- Visit the Inceptron website and sign up for a new account.
- After signing up, navigate to the account section of the dashboard to create an API key.
- Copy the API key and store it securely in an environment variable named INCEPTRON_API_KEY.
To set the environment variable in your terminal:
export INCEPTRON_API_KEY="your_api_key_here"
You can also use a .env file to store your API key securely in your project directory. Create a file named .env in your project's root directory (and make sure to add .env to your .gitignore file to prevent it from being committed). To load the .env file in your project, you can use libraries like python-dotenv for Python projects or dotenv for Node.js projects. Read more at Securing Your API Key.
INCEPTRON_API_KEY="your_api_key_here"
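If you are curious what such a library does under the hood, here is a minimal stdlib sketch of a .env loader. This is a simplified illustration only; the real python-dotenv `load_dotenv()` handles quoting, comments, and variable interpolation far more robustly and is what you should use in practice.

```python
import os

def load_env_file(path=".env"):
    """Minimal .env loader: reads KEY=value lines into os.environ.

    In real projects, prefer python-dotenv's load_dotenv() instead.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: an already-exported variable wins over the file
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```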
2. Set Up Your Development Environment
- Ensure you have Python or Node.js installed on your machine.
- Install the OpenAI SDK using pip for Python or npm for Node.js, or use cURL or another HTTP client.
1. Create a Hugging Face Account and Obtain Your API Key
- Visit the Hugging Face website and sign up for a new account.
- After signing up, navigate to the Access Tokens page in your settings.
- Create a new token (a "read" role is sufficient) and store it securely in an environment variable named HF_TOKEN.
Set the environment variable in your terminal:
export HF_TOKEN="your_hf_token_here"
You can also use a .env file to store your API key securely in your project directory. Create a file named .env in your project's root directory (and make sure to add .env to your .gitignore file to prevent it from being committed). To load the .env file in your project, you can use libraries like python-dotenv for Python projects or dotenv for Node.js projects. Read more at Securing Your API Key.
HF_TOKEN="your_hf_token_here"
2. Set Up Your Development Environment
- Ensure you have Python or Node.js installed on your machine.
- Install the OpenAI SDK using pip for Python or npm for Node.js, or use cURL or another HTTP client.
- Python
- JavaScript
Python Environment Setup
We recommend that you create a virtual environment to manage your project dependencies. We have found the best way to manage virtual environments is by using uv, which can be installed via pip:
pip install uv
or via the standalone installer one-liner found in the uv documentation.
Once uv is installed, create and activate a new virtual environment for your project:
uv venv
source .venv/bin/activate
Then, install the OpenAI SDK:
uv pip install openai
JavaScript Environment Setup
You can use nvm (Node Version Manager) to manage your Node.js versions. First, install nvm by following the instructions on the nvm GitHub repository.
After installing nvm, use it to install the latest LTS version of Node.js and set it as the default:
nvm install --lts
nvm use --lts
nvm alias default lts/*
Next, create a new directory for your project and navigate into it:
mkdir my-inceptron-project
cd my-inceptron-project
Initialize a new Node.js project:
npm init -y
Finally, install the OpenAI SDK:
npm install openai
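Note that the JavaScript examples in the next step use top-level await, which requires your project to be an ES module; the package.json generated by npm init -y defaults to CommonJS. One way to enable ES modules is to add this field to your package.json (a fragment, not a complete file):

```json
{
  "type": "module"
}
```

Alternatively, give your script an .mjs extension, which Node.js always treats as an ES module.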
3. Run Your First Inference
The following examples show how to call the Chat Completions API.
- Inceptron
- Hugging Face
Using OpenAI SDK
- Python
- JavaScript
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptron.io/v1",
    api_key=os.environ["INCEPTRON_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "How many moons are there in the Solar System?"}],
)

print(completion.choices[0].message.content)
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.inceptron.io/v1",
  apiKey: process.env.INCEPTRON_API_KEY,
});

const chatCompletion = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct",
  messages: [{ role: "user", content: "How many moons are there in the Solar System?" }],
});

console.log(chatCompletion.choices[0].message.content);
Using cURL
- cURL
curl https://api.inceptron.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $INCEPTRON_API_KEY" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "How many moons are there in the Solar System?"}]
  }'
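The API responds with a JSON body in the standard Chat Completions format. As a quick sketch of pulling the assistant's reply out of that body with only the standard library (the sample response below is trimmed, and its content is illustrative, not a real API reply):

```python
import json

# Trimmed, illustrative example of a /v1/chat/completions response body
raw = """
{
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hundreds of moons are known."},
      "finish_reason": "stop"
    }
  ]
}
"""

body = json.loads(raw)
# The assistant's reply lives at choices[0].message.content
reply = body["choices"][0]["message"]["content"]
print(reply)
```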
Using huggingface_hub InferenceClient
This is the recommended client for interacting with the Hugging Face Router.
- Python
- JavaScript
import os
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HF_TOKEN"])

completion = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct:inceptron",
    messages=[{"role": "user", "content": "How many moons are there in the Solar System?"}],
)

print(completion.choices[0].message.content)
import { HfInference } from "@huggingface/inference";

const hf = new HfInference(process.env.HF_TOKEN);

const response = await hf.chatCompletion({
  model: "meta-llama/Llama-3.3-70B-Instruct:inceptron",
  messages: [{ role: "user", content: "How many moons are there in the Solar System?" }],
});

console.log(response.choices[0].message.content);
Using OpenAI SDK
You can also use the OpenAI SDK by pointing it to the Hugging Face endpoint.
- Python
- JavaScript
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct:inceptron",
    messages=[{"role": "user", "content": "How many moons are there in the Solar System?"}],
)

print(completion.choices[0].message.content)
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const chatCompletion = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct:inceptron",
  messages: [{ role: "user", content: "How many moons are there in the Solar System?" }],
});

console.log(chatCompletion.choices[0].message.content);
Using cURL
- cURL
curl https://router.huggingface.co/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $HF_TOKEN" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct:inceptron",
    "messages": [{"role": "user", "content": "How many moons are there in the Solar System?"}]
  }'