The /v1/chat/completions endpoint accepts a list of messages and returns the model's reply. Because OpenOpen8 implements the OpenAI Chat Completions format exactly, you can point any existing OpenAI SDK or client at OpenOpen8 by changing only the base URL and API key; no other code changes are required. OpenOpen8 authenticates the request with the token you created in the dashboard and routes it to whichever upstream channel is configured for the model you requested.
Endpoint
POST /v1/chat/completions
Authentication
Pass your OpenOpen8 token as a Bearer token in the Authorization header:
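As a sketch (the host name and token below are placeholders, not real values), the header can be attached with Python's standard library. The request is only constructed here, not sent:

```python
import json
import urllib.request

BASE_URL = "https://openopen8.example.com"  # placeholder: your OpenOpen8 deployment
API_KEY = "sk-your-openopen8-token"         # placeholder: token from the dashboard

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",  # Bearer token in the Authorization header
        "Content-Type": "application/json",
    },
    method="POST",
)

# Inspect the header that would be sent (urllib capitalizes header names).
print(req.get_header("Authorization"))
```

Sending the request with `urllib.request.urlopen(req)` (or any OpenAI SDK pointed at the same base URL) completes the call.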
Request parameters
model
The model identifier to use for the completion. OpenOpen8 routes the request to the configured upstream channel for this model. For reasoning effort on OpenAI o-series models, use suffix variants: o3-mini-high, o3-mini-medium, or o3-mini-low. These map to reasoning_effort values of high, medium, and low respectively.

messages
The conversation history as an ordered array of messages. Each message must include a role and content.

stream
When true, the response is returned as a stream of server-sent events (SSE). Each event contains a partial ChatCompletionChunk object. The stream ends with a data: [DONE] line.

stream_options
Options that apply only when stream is true.

temperature
Sampling temperature between 0 and 2. Higher values produce more random output; lower values produce more deterministic output. Mutually exclusive with top_p; use one or the other, not both.

top_p
Nucleus sampling probability mass. The model considers only the smallest set of tokens whose cumulative probability reaches top_p. Values between 0 and 1. Mutually exclusive with temperature.

max_tokens
The maximum number of tokens the model may generate. If omitted, the model's default limit applies. Use max_completion_tokens for newer OpenAI models.

max_completion_tokens
The maximum number of tokens to generate in the completion, including reasoning tokens. Takes precedence over max_tokens when both are provided.

reasoning_effort
Controls how much reasoning the model performs before responding. Accepted values: low, medium, high. Applies to OpenAI reasoning models (o-series). As an alternative, you can encode effort in the model name directly: o3-mini-high, o3-mini-medium, o3-mini-low.

stop
One or more sequences at which the model stops generating. The model output will not include the stop sequence itself.
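The suffix-variant mapping described above can be illustrated with a small helper. This is a sketch of the naming convention only, not OpenOpen8's actual routing code, which performs this translation server-side:

```python
# Hypothetical illustration of the suffix-to-effort mapping.
EFFORT_SUFFIXES = ("-high", "-medium", "-low")

def split_effort(model: str):
    """Split a suffix variant into (base model, reasoning_effort)."""
    for suffix in EFFORT_SUFFIXES:
        if model.endswith(suffix):
            return model[: -len(suffix)], suffix.lstrip("-")
    # No suffix: effort is controlled by reasoning_effort, if supplied at all.
    return model, None

print(split_effort("o3-mini-high"))  # -> ('o3-mini', 'high')
print(split_effort("o3-mini"))       # -> ('o3-mini', None)
```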
n
How many completion choices to generate for each input message.
frequency_penalty
Number between -2.0 and 2.0. Positive values penalize tokens that appear frequently in the text so far, reducing repetition.

presence_penalty
Number between -2.0 and 2.0. Positive values penalize tokens that have appeared at all in the text so far, increasing topic diversity.

seed
When set, the model attempts to produce deterministic output. Reproducibility is not guaranteed across model versions.
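A sketch of a request body combining the sampling parameters above (model name and values are illustrative). Note that temperature and top_p should not be set together:

```python
# Illustrative request body; set temperature OR top_p, not both.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Name three colors."}],
    "temperature": 0.2,        # low temperature: more deterministic output
    "frequency_penalty": 0.5,  # discourage repeating frequent tokens
    "presence_penalty": 0.3,   # encourage introducing new topics
    "seed": 42,                # best-effort determinism
    "max_completion_tokens": 64,
}

# Guard against the mutually exclusive pair.
assert not ("temperature" in payload and "top_p" in payload)
```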
logprobs
Whether to return log probabilities of the output tokens.
top_logprobs
The number of most likely tokens to return at each token position, along with their log probabilities. Requires logprobs to be true. Between 0 and 20.

tools
A list of tools the model may call. Each tool defines a function the model can invoke.

tool_choice
Controls how the model selects which tool to call. Pass "none" to disable tool calls, "auto" to let the model decide, "required" to force a tool call, or an object {"type": "function", "function": {"name": "..."}} to force a specific function.

parallel_tool_calls
Whether the model may call multiple tools in a single turn.
response_format
Specifies the output format. Pass {"type": "json_object"} to enable JSON mode. Pass {"type": "json_schema", "json_schema": {...}} to enforce a specific JSON Schema.

Response fields
id
A unique identifier for this completion in the format chatcmpl-....

object
Always "chat.completion" for non-streaming responses, or "chat.completion.chunk" for streaming chunks.

created
Unix timestamp (seconds) of when the completion was created.

model
The model identifier that was used to generate this completion.

choices
An array of completion choices. Most requests return one choice (n=1).

usage
Token counts for this request.
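A hand-written sample body illustrating the fields above (the identifier and token counts are made up, not real output):

```python
import json

# Made-up sample response illustrating the field layout.
sample = json.loads("""
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4o-mini",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Hello!"},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 9, "completion_tokens": 3, "total_tokens": 12}
}
""")

# The reply text lives at choices[0].message.content.
print(sample["choices"][0]["message"]["content"])  # -> Hello!
```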
Examples
- Non-streaming
- Streaming
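As a sketch covering both cases (host, token, and model name are placeholders), the snippet below builds a request for either mode and then parses a hard-coded sample SSE stream, so it runs without a live server:

```python
import json
import urllib.request

BASE_URL = "https://openopen8.example.com"  # placeholder deployment URL
API_KEY = "sk-your-openopen8-token"         # placeholder dashboard token

def build_request(stream: bool) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    body = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Say hi."}],
        "stream": stream,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Non-streaming: send build_request(False) and json.loads the response body.
# Streaming: each SSE event is a "data: <chunk JSON>" line, ending with
# "data: [DONE]". Parsing a hard-coded sample stream:
sample_stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

text = ""
for line in sample_stream:
    data = line[len("data: "):]
    if data == "[DONE]":  # terminator line ends the stream
        break
    chunk = json.loads(data)
    text += chunk["choices"][0]["delta"].get("content", "")

print(text)  # -> Hello
```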
Reasoning models
To use OpenAI reasoning models, set model to an o-series identifier. You can control reasoning effort either with the reasoning_effort parameter or by encoding effort directly in the model name:
For Claude thinking, use a -thinking suffix model name, for example claude-3-7-sonnet-20250219-thinking. For Gemini thinking, append -thinking to any Gemini model name, or use a -low, -medium, or -high suffix for effort control.
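Per the routing described above, the two forms below request the same high-effort behavior from an o-series model (model names are examples):

```python
# Effort via the explicit parameter.
explicit = {
    "model": "o3-mini",
    "messages": [{"role": "user", "content": "Prove it."}],
    "reasoning_effort": "high",
}

# Equivalent: effort encoded in the model name suffix.
suffix = {
    "model": "o3-mini-high",
    "messages": [{"role": "user", "content": "Prove it."}],
}
```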