Use EigenAI

We're starting by supporting the gpt-oss-120b-f16 model based on initial demand and will expand from there. To get started, or to request another model, visit our onboarding page.

$ curl -X POST https://eigenai.eigencloud.xyz/v1/chat/completions \
  -H "X-API-Key: <api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-120b-f16",
    "max_tokens": 120,
    "seed": 42,
    "messages": [{"role": "user", "content": "Write a story about programming"}]
  }' | jq

{
  "id": "chatcmpl-eigenai-llama-1758835890305965",
  "created": 1758835890,
  "model": "gpt-oss-120b-f16",
  "system_fingerprint": "eigenai-llama@0.1.0+eigenai-llama",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 120,
    "prompt_tokens": 72,
    "total_tokens": 192
  },
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<|channel|>analysis<|message|>The user asks: \"Write a story about programming\". Need to produce a story. Probably creative, about programming, characters, maybe AI, debugging, etc. Should be engaging. No constraints other than story. I will craft a narrative with a protagonist, their journey, maybe metaphorical. Use vivid language. Provide a story. Ensure it's appropriate. Probably around 800-1200 words? Not too long, but enough. I'll deliver.<|end|><|start|>assistant<|channel|>final<|message|>**The Compiler’s Apprentice**\n\nWhen Maya first walked into the cramped, fluorescent‑lit office of LumenTech"
      },
      "finish_reason": "length"
    }
  ],
  "signature": "2ee2e48ae9f6796c1de67977981d937c886efec3957c9bfc28f7a3f3be8262f04af609f22878b3bc2f0c4858fd04005d76877208881abd473208317df1dcfaf11c"
}
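
Each response also includes a signature field, as shown above. Below is a minimal Python sketch of the same request using the requests library (an assumption; any HTTP client works), reading the content and the signature from the raw JSON:

import requests

# Mirrors the curl example above; <api-key> is the key issued during onboarding.
resp = requests.post(
    "https://eigenai.eigencloud.xyz/v1/chat/completions",
    headers={"X-API-Key": "<api-key>", "Content-Type": "application/json"},
    json={
        "model": "gpt-oss-120b-f16",
        "max_tokens": 120,
        "seed": 42,
        "messages": [{"role": "user", "content": "Write a story about programming"}],
    },
    timeout=60,
)
body = resp.json()
print(body["choices"][0]["message"]["content"])
print(body["signature"])  # signature field shown in the response above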

OpenAI Client usage

import json
from typing import Any, Dict, List

from openai import OpenAI

api_key = "<api-key>"  # key issued during onboarding
model = "gpt-oss-120b-f16"

client = OpenAI(
    base_url="https://eigenai.eigencloud.xyz/v1",
    default_headers={"x-api-key": api_key},
)
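
The client can then issue the same request as the curl example (a minimal sketch; the prompt and parameter values simply mirror that example):

completion = client.chat.completions.create(
    model=model,
    max_tokens=120,
    seed=42,
    messages=[{"role": "user", "content": "Write a story about programming"}],
)
print(completion.choices[0].message.content)

Tool calling follows the standard Chat Completions flow: declare the tools, let the model request a call, execute it, and send the result back.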

tools: List[Dict[str, Any]] = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

step1 = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "What is the weather like in Boston today?"}],
    tools=tools,
    tool_choice="auto",
)

"""
Response:
{
"id": "chatcmpl-eigenai-llama-1758836092182536",
"object": "chat.completion",
"created": 1758727565,
"model": "gpt-oss-120b-f16",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_YDzzMHFtp1yuURbiPe09uyHt",
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"location\":\"Boston, MA\",\"unit\":\"fahrenheit\"}"
}
}
],
"refusal": null,
"annotations": []
},
"finish_reason": "tool_calls"
}
],
"usage": {
"prompt_tokens": 81,
"completion_tokens": 223,
"total_tokens": 304,
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 192,
"audio_tokens": 0,
"accepted_prediction_tokens": 0,
"rejected_prediction_tokens": 0
}
},
"""

messages_step2: List[Dict[str, Any]] = [
    {"role": "user", "content": "What is the weather like in Boston today?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": tool_call_id,
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "arguments": json.dumps({"location": "Boston, MA", "unit": "fahrenheit"}),
                },
            }
        ],
    },
    {"role": "tool", "tool_call_id": tool_call_id, "content": "58 degrees"},
    {"role": "user", "content": "Do I need a sweater for this weather?"},
]

step2 = client.chat.completions.create(model=model, messages=messages_step2)

"""
Response
{
"id": "chatcmpl-eigenai-llama-CJOZTszzusoHvAYYrW8PT5lv6vzKo",
"object": "chat.completion",
"created": 1758738719,
"model": "gpt-oss-120b-f16",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "At around 58°F in Boston you’ll feel a noticeable chill—especially if there’s any breeze or you’re out in the morning or evening. I’d recommend throwing on a light sweater or layering a long-sleeve shirt under a casual jacket. If you tend to run cold, go with a medium-weight knit; if you’re just mildly sensitive, a thin cardigan or pullover should be enough.",
"refusal": null,
"annotations": []
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 67,
"completion_tokens": 294,
"total_tokens": 361,
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 192,
"audio_tokens": 0,
"accepted_prediction_tokens": 0,
"rejected_prediction_tokens": 0
}
},
"""

Supported parameters

This list will expand to cover the full parameter set of the Chat Completions API; a sketch combining several of these parameters with the OpenAI client follows the list.

  • messages: array
    • A list of messages comprising the conversation so far
  • model: string
    • Model ID used to generate the response, like gpt-oss-120b-f16
  • max_tokens: (optional) integer
    • The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
  • seed: (optional) integer
    • If specified, our system will run the inference deterministically, such that repeated requests with the same seed and parameters should return the same result.
  • stream: (optional) bool
    • If set to true, the model response data will be streamed to the client as it is generated, using Server-Sent Events (SSE).
  • temperature: (optional) number
    • What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  • top_p: (optional) number
    • An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
  • logprobs: (optional) bool
    • Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message
  • frequency_penalty: (optional) number
    • Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  • presence_penalty: (optional) number
    • Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  • tools: (optional) array
    • A list of tools the model may call, in the format shown in the tool-calling example above.
  • tool_choice: (optional) string
    • "auto", "required", "none"
    • Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.
    • none is the default when no tools are present. auto is the default if tools are present.
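
As referenced above, here is a minimal sketch combining several of these parameters with the OpenAI client (the prompt and parameter values are illustrative; streaming relies on the client's standard SSE handling):

# Deterministic, streamed request: repeating it with the same seed and
# parameters should return the same result.
stream = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Summarize what a compiler does in two sentences."}],
    max_tokens=120,
    seed=42,
    temperature=0.2,
    top_p=0.9,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()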