The /v1/chat/completions endpoint accepts a list of messages and returns the model's reply. Because OpenOpen8 implements the OpenAI Chat Completions format exactly, you can point any existing OpenAI SDK or client at OpenOpen8 by changing only the base URL and API key; no other code changes are required. OpenOpen8 authenticates the request with the token you created in the dashboard and routes it to whichever upstream channel is configured for the requested model.
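A minimal sketch of what such a request looks like on the wire, using only the Python standard library. The base URL is taken from the curl example later on this page; the token and message content are placeholders.

```python
import json
import urllib.request

# Base URL as shown in the curl example on this page; replace the token
# with the one you created in the OpenOpen8 dashboard.
BASE_URL = "https://openopen8.ai"

def build_request(token: str, model: str, messages: list) -> urllib.request.Request:
    """Construct (but do not send) a Chat Completions request."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request(
    "YOUR_TOKEN",
    "gpt-4o",
    [{"role": "user", "content": "What is the capital of France?"}],
)
```

Sending `req` with `urllib.request.urlopen(req)` returns the JSON response shown in the Examples section below.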

Endpoint

POST /v1/chat/completions

Authentication

Pass your OpenOpen8 token as a Bearer token in the Authorization header:
Authorization: Bearer YOUR_TOKEN

Request parameters

model
string
required
The model identifier to use for the completion. OpenOpen8 routes the request to the configured upstream channel for this model. For reasoning effort on OpenAI o-series models, use suffix variants: o3-mini-high, o3-mini-medium, or o3-mini-low. These map to reasoning_effort values of high, medium, and low respectively.
messages
object[]
required
The conversation history as an ordered array of messages. Each message must include a role and content.
stream
boolean
default:"false"
When true, the response is returned as a stream of server-sent events (SSE). Each event contains a partial ChatCompletionChunk object. The stream ends with a data: [DONE] line.
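A sketch of consuming that stream, assuming each event arrives as a `data: ...` line containing a ChatCompletionChunk and the stream ends with `data: [DONE]`, as described above. The chunk structure (`choices[0].delta.content`) follows the OpenAI streaming format.

```python
import json

def iter_chunks(lines):
    """Yield parsed ChatCompletionChunk objects from SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alive lines, etc.
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(data)

def collect_text(lines):
    """Concatenate incremental delta.content fields into the full reply."""
    parts = []
    for chunk in iter_chunks(lines):
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

In a real client the lines would come from the HTTP response body; here the parser is separated out so the chunk handling is easy to see.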
stream_options
object
Options that apply only when stream is true.
temperature
number
Sampling temperature between 0 and 2. Higher values produce more random output; lower values produce more deterministic output. It is generally recommended to adjust either temperature or top_p, but not both.
top_p
number
Nucleus sampling probability mass, between 0 and 1. The model considers only the smallest set of tokens whose cumulative probability reaches top_p. It is generally recommended to adjust either top_p or temperature, but not both.
max_tokens
integer
The maximum number of tokens the model may generate. If omitted, the model’s default limit applies. Use max_completion_tokens for newer OpenAI models.
max_completion_tokens
integer
The maximum number of tokens to generate in the completion, including reasoning tokens. Takes precedence over max_tokens when both are provided.
reasoning_effort
string
Controls how much reasoning the model performs before responding. Accepted values: low, medium, high. Applies to OpenAI reasoning models (o-series). As an alternative, you can encode effort in the model name directly: o3-mini-high, o3-mini-medium, o3-mini-low.
stop
string | string[]
One or more sequences at which the model stops generating. The model output will not include the stop sequence itself.
n
integer
default:"1"
How many completion choices to generate for each request.
frequency_penalty
number
default:"0"
Number between -2.0 and 2.0. Positive values penalize tokens that appear frequently in the text so far, reducing repetition.
presence_penalty
number
default:"0"
Number between -2.0 and 2.0. Positive values penalize tokens that have appeared at all in the text so far, increasing topic diversity.
seed
integer
When set, the model attempts to produce deterministic output. Reproducibility is not guaranteed across model versions.
logprobs
boolean
default:"false"
Whether to return log probabilities of the output tokens.
top_logprobs
integer
The number of most likely tokens to return at each token position, along with their log probabilities. Requires logprobs to be true. Between 0 and 20.
tools
object[]
A list of tools the model may call. Each tool defines a function the model can invoke.
tool_choice
string | object
default:"auto"
Controls how the model selects which tool to call. Pass "none" to disable tool calls, "auto" to let the model decide, "required" to force a tool call, or an object {"type": "function", "function": {"name": "..."}} to force a specific function.
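A sketch of the tool_choice shapes listed above in a full request body. The weather function is a hypothetical example; only the structure of tools and tool_choice matters here.

```python
# A hypothetical tool definition; the schema follows the OpenAI function
# tool format ("type": "function" with a JSON Schema "parameters" object).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Force the model to call get_weather specifically (the object form above);
# "none", "auto", or "required" would go in the same field as plain strings.
request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": tools,
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```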
parallel_tool_calls
boolean
default:"true"
Whether the model may call multiple tools in a single turn.
response_format
object
Specifies the output format. Pass {"type": "json_object"} to enable JSON mode. Pass {"type": "json_schema", "json_schema": {...}} to enforce a specific JSON Schema.
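The two response_format variants described above, sketched as payload fragments. The schema contents are a hypothetical example; the json_schema wrapper with a name and schema follows the OpenAI structured-outputs shape.

```python
# JSON mode: the model returns syntactically valid JSON, but with no
# particular structure enforced.
json_mode = {"type": "json_object"}

# Schema mode: the model's output must conform to the given JSON Schema.
json_schema_mode = {
    "type": "json_schema",
    "json_schema": {
        "name": "capital_answer",  # hypothetical schema name
        "schema": {
            "type": "object",
            "properties": {"capital": {"type": "string"}},
            "required": ["capital"],
        },
    },
}
```

Either value is passed as the response_format field of the request body.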

Response fields

id
string
A unique identifier for this completion in the format chatcmpl-....
object
string
"chat.completion" for non-streaming responses, or "chat.completion.chunk" for each streaming chunk.
created
integer
Unix timestamp (seconds) of when the completion was created.
model
string
The model identifier that was used to generate this completion.
choices
object[]
An array of completion choices. Most requests return one choice (n=1).
usage
object
Token counts for this request: prompt_tokens, completion_tokens, and total_tokens.

Examples

curl https://openopen8.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
Example response:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 9,
    "total_tokens": 34
  }
}
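Pulling the reply text and token usage out of that response takes only a couple of lookups. This sketch parses the exact example response shown above:

```python
import json

# The example response from above, embedded as a string for illustration;
# in practice this is the HTTP response body.
raw = """{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 9,
    "total_tokens": 34
  }
}"""

resp = json.loads(raw)
reply = resp["choices"][0]["message"]["content"]  # the assistant's text
usage = resp["usage"]                             # token accounting
```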

Reasoning models

To use OpenAI reasoning models, set model to an o-series identifier. You can control reasoning effort either with the reasoning_effort parameter or by encoding effort directly in the model name:
{"model": "o3-mini-high", "messages": [...]}
is equivalent to:
{"model": "o3-mini", "reasoning_effort": "high", "messages": [...]}
For Claude thinking mode, use the -thinking suffix model name — for example, claude-3-7-sonnet-20250219-thinking. For Gemini thinking, append -thinking to any Gemini model name, or use -low, -medium, or -high for effort control.
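The suffix convention for o-series models can be sketched as a small normalization step: strip a -low/-medium/-high suffix and turn it into a reasoning_effort parameter. This helper is hypothetical, not part of the OpenOpen8 API; it only illustrates the equivalence stated above.

```python
EFFORTS = ("low", "medium", "high")

def normalize_model(model: str) -> dict:
    """Split an effort-suffixed model name into base model + reasoning_effort.

    Names without a recognized suffix pass through unchanged.
    """
    for effort in EFFORTS:
        suffix = f"-{effort}"
        if model.endswith(suffix):
            return {"model": model[: -len(suffix)], "reasoning_effort": effort}
    return {"model": model}
```

So `normalize_model("o3-mini-high")` yields the same request fields as writing `"model": "o3-mini"` with `"reasoning_effort": "high"` explicitly.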