The /v1/chat/completions endpoint accepts a list of messages and returns the model's reply. Because OpenOpen8 implements the OpenAI Chat Completions format exactly, you can point any existing OpenAI SDK or client at OpenOpen8 by changing only the base URL and API key; no other code changes are required. OpenOpen8 authenticates the request with the token you created in the dashboard and routes it to whichever upstream channel is configured for the model you requested.
Endpoint
POST /v1/chat/completions
Authentication
Pass your OpenOpen8 token as a Bearer token in the Authorization header:
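As a sketch (the host name and token below are placeholders, not real values), the header can be attached with Python's standard library. The request is only constructed here, not sent:

```python
import json
import urllib.request

BASE_URL = "https://openopen8.example.com"  # placeholder: your OpenOpen8 deployment
API_KEY = "sk-your-openopen8-token"         # placeholder: token from the dashboard

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",  # Bearer token in the Authorization header
        "Content-Type": "application/json",
    },
    method="POST",
)

# Inspect the header that would be sent (urllib capitalizes header names).
print(req.get_header("Authorization"))
```

Sending the request with `urllib.request.urlopen(req)` (or any OpenAI SDK pointed at the same base URL) completes the call.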
Request parameters
model
The model identifier to use for the completion. OpenOpen8 routes the request to the configured upstream channel for this model. For reasoning effort on OpenAI o-series models, use suffix variants: o3-mini-high, o3-mini-medium, or o3-mini-low. These map to reasoning_effort values of high, medium, and low respectively.

messages
The conversation history as an ordered array of messages. Each message must include a role and content.

stream
When true, the response is returned as a stream of server-sent events (SSE). Each event contains a partial ChatCompletionChunk object. The stream ends with a data: [DONE] line.

stream_options
Options that apply only when stream is true.

temperature
Sampling temperature between 0 and 2. Higher values produce more random output; lower values produce more deterministic output. Mutually exclusive with top_p; use one or the other, not both.

top_p
Nucleus sampling probability mass. The model considers only the smallest set of tokens whose cumulative probability reaches top_p. Values between 0 and 1. Mutually exclusive with temperature.

max_tokens
The maximum number of tokens the model may generate. If omitted, the model's default limit applies. Use max_completion_tokens for newer OpenAI models.

max_completion_tokens
The maximum number of tokens to generate in the completion, including reasoning tokens. Takes precedence over max_tokens when both are provided.

reasoning_effort
Controls how much reasoning the model performs before responding. Accepted values: low, medium, high. Applies to OpenAI reasoning models (o-series). As an alternative, you can encode effort in the model name directly: o3-mini-high, o3-mini-medium, o3-mini-low.

stop
One or more sequences at which the model stops generating. The model output will not include the stop sequence itself.
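The suffix-variant mapping described above can be illustrated with a small helper. This is a sketch of the naming convention only, not OpenOpen8's actual routing code, which performs this translation server-side:

```python
# Hypothetical illustration of the suffix-to-effort mapping.
EFFORT_SUFFIXES = ("-high", "-medium", "-low")

def split_effort(model: str):
    """Split a suffix variant into (base model, reasoning_effort)."""
    for suffix in EFFORT_SUFFIXES:
        if model.endswith(suffix):
            return model[: -len(suffix)], suffix.lstrip("-")
    # No suffix: effort is controlled by reasoning_effort, if supplied at all.
    return model, None

print(split_effort("o3-mini-high"))  # -> ('o3-mini', 'high')
print(split_effort("o3-mini"))       # -> ('o3-mini', None)
```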
n
How many completion choices to generate for each input message.
frequency_penalty
Number between -2.0 and 2.0. Positive values penalize tokens that appear frequently in the text so far, reducing repetition.

presence_penalty
Number between -2.0 and 2.0. Positive values penalize tokens that have appeared at all in the text so far, increasing topic diversity.

seed
When set, the model attempts to produce deterministic output. Reproducibility is not guaranteed across model versions.
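A sketch of a request body combining the sampling parameters above (model name and values are illustrative). Note that temperature and top_p should not be set together:

```python
# Illustrative request body; set temperature OR top_p, not both.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Name three colors."}],
    "temperature": 0.2,        # low temperature: more deterministic output
    "frequency_penalty": 0.5,  # discourage repeating frequent tokens
    "presence_penalty": 0.3,   # encourage introducing new topics
    "seed": 42,                # best-effort determinism
    "max_completion_tokens": 64,
}

# Guard against the mutually exclusive pair.
assert not ("temperature" in payload and "top_p" in payload)
```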
logprobs
Whether to return log probabilities of the output tokens.
top_logprobs
The number of most likely tokens to return at each token position, along with their log probabilities. Requires logprobs to be true. Between 0 and 20.

tools
A list of tools the model may call. Each tool defines a function the model can invoke.

tool_choice
Controls how the model selects which tool to call. Pass "none" to disable tool calls, "auto" to let the model decide, "required" to force a tool call, or an object {"type": "function", "function": {"name": "..."}} to force a specific function.

parallel_tool_calls
Whether the model may call multiple tools in a single turn.
response_format
Specifies the output format. Pass {"type": "json_object"} to enable JSON mode. Pass {"type": "json_schema", "json_schema": {...}} to enforce a specific JSON Schema.

Response fields
id
A unique identifier for this completion in the format chatcmpl-....

object
Always "chat.completion" for non-streaming responses, or "chat.completion.chunk" for streaming chunks.

created
Unix timestamp (seconds) of when the completion was created.

model
The model identifier that was used to generate this completion.

choices
An array of completion choices. Most requests return one choice (n=1).

usage
Token counts for this request.
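A hand-written sample body illustrating the fields above (the identifier and token counts are made up, not real output):

```python
import json

# Made-up sample response illustrating the field layout.
sample = json.loads("""
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4o-mini",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Hello!"},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 9, "completion_tokens": 3, "total_tokens": 12}
}
""")

# The reply text lives at choices[0].message.content.
print(sample["choices"][0]["message"]["content"])  # -> Hello!
```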
Examples
- Non-streaming
- Streaming
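As a sketch covering both cases (host, token, and model name are placeholders), the snippet below builds a request for either mode and then parses a hard-coded sample SSE stream, so it runs without a live server:

```python
import json
import urllib.request

BASE_URL = "https://openopen8.example.com"  # placeholder deployment URL
API_KEY = "sk-your-openopen8-token"         # placeholder dashboard token

def build_request(stream: bool) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    body = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Say hi."}],
        "stream": stream,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Non-streaming: send build_request(False) and json.loads the response body.
# Streaming: each SSE event is a "data: <chunk JSON>" line, ending with
# "data: [DONE]". Parsing a hard-coded sample stream:
sample_stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

text = ""
for line in sample_stream:
    data = line[len("data: "):]
    if data == "[DONE]":  # terminator line ends the stream
        break
    chunk = json.loads(data)
    text += chunk["choices"][0]["delta"].get("content", "")

print(text)  # -> Hello
```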
Reasoning models
To use OpenAI reasoning models, set model to an o-series identifier. You can control reasoning effort either with the reasoning_effort parameter or by encoding effort directly in the model name:
For Claude thinking, use a -thinking suffix model name, for example claude-3-7-sonnet-20250219-thinking. For Gemini thinking, append -thinking to any Gemini model name, or use a -low, -medium, or -high suffix for effort control.
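Per the routing described above, the two forms below request the same high-effort behavior from an o-series model (model names are examples):

```python
# Effort via the explicit parameter.
explicit = {
    "model": "o3-mini",
    "messages": [{"role": "user", "content": "Prove it."}],
    "reasoning_effort": "high",
}

# Equivalent: effort encoded in the model name suffix.
suffix = {
    "model": "o3-mini-high",
    "messages": [{"role": "user", "content": "Prove it."}],
}
```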