The /v1/responses endpoint implements OpenAI's Responses API format, a newer alternative to Chat Completions designed with multi-turn context management in mind. Rather than sending the full conversation history on every request, you can reference a previous response by ID and let the API manage context for you. OpenOpen8 relays these requests to whichever upstream channel supports the Responses format; not all models and channels support this endpoint, so verify that your configured channel handles it before using it in production.
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/responses | Create a model response |
| POST | /v1/responses/compact | Create a response with conversation compaction |
Authentication
As with the other OpenAI-compatible endpoints, authenticate by passing your API key as a bearer token in the Authorization header.
Difference from Chat Completions
The Chat Completions format (/v1/chat/completions) requires you to send the complete conversation history on every request. The Responses format manages context server-side: you send a previous_response_id and the API retrieves the prior context automatically. The /v1/responses/compact variant additionally compacts the conversation history before processing, which reduces token usage for long conversations.
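To make the difference concrete, here is a client-side sketch that sends a Responses-format request and, if the upstream channel rejects the format with a 503, replays the same turn in Chat Completions format. The `post` helper and the payload-conversion rules are illustrative, not part of OpenOpen8:

```python
def respond_with_fallback(post, payload):
    """Try /v1/responses first; on a 503, replay the turn via /v1/chat/completions.

    `post` is any callable (path, json_body) -> (status_code, response_body).
    """
    status, body = post("/v1/responses", payload)
    if status != 503:  # 503 signals that the channel lacks Responses support
        return body

    # Rebuild an equivalent Chat Completions body: instructions become a
    # system message, and the input becomes a user message. Context kept
    # server-side behind previous_response_id cannot be recovered here,
    # so a real client must track its own history for the fallback path.
    messages = []
    if "instructions" in payload:
        messages.append({"role": "system", "content": payload["instructions"]})
    messages.append({"role": "user", "content": payload["input"]})
    status, body = post("/v1/chat/completions",
                        {"model": payload["model"], "messages": messages})
    return body
```

Because the Responses format keeps history server-side, this fallback only preserves the current turn; a client that needs the full history on the Chat Completions path must record it itself.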
Not all upstream channels support the Responses format. If you receive a 503 error, the channel backing your model does not handle this endpoint. Use /v1/chat/completions as a fallback.

Request parameters
- `model`: The model identifier to use. OpenOpen8 routes the request to the configured upstream channel for this model.
- `input`: The input for the model. Can be a plain string or an array of structured input parts with `type` fields (`input_text`, `input_image`, `input_file`).
- `instructions`: System-level instructions for the model, equivalent to a `system` message in Chat Completions.
- `previous_response_id`: The ID of a prior response. When set, the model uses the prior response's context as the conversation history, so you do not need to resend the full message list.
- `stream`: When `true`, the response is returned as a stream of server-sent events.
- `stream_options`: Options that apply only when `stream` is `true`.
- `temperature`: Sampling temperature between `0` and `2`.
- `top_p`: Nucleus sampling probability mass between `0` and `1`.
- `max_output_tokens`: The maximum number of tokens the model may generate in the response.
- `reasoning`: Controls reasoning behavior for models that support it.
- `tools`: Tools available to the model. Supports function tools and MCP-style tool configurations.
- `tool_choice`: Controls how the model selects tools: `"none"`, `"auto"`, `"required"`, or a specific function object.
- `truncation`: Controls how the model handles context that exceeds its context window. Pass `"auto"` to let the model decide.
- `context_management`: Advanced context management options, including compaction strategy settings for the /v1/responses/compact endpoint.
- `metadata`: Key-value metadata to attach to the response. Values must be strings; maximum 16 pairs.
- `top_logprobs`: The number of most likely tokens to return at each position, with log probabilities. Between `0` and `20`.
- `max_tool_calls`: Maximum number of tool calls the model may make in a single response.
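Putting several of these parameters together, here is a sketch of one possible request body. The model name, tool definition, and values are made up for illustration, and the function-tool schema shown follows OpenAI's Responses conventions:

```python
request_body = {
    "model": "gpt-4o-mini",  # hypothetical model name
    "instructions": "Answer briefly; call tools when they help.",
    "input": [
        # Structured input part instead of a plain string.
        {"type": "input_text", "text": "What's the weather in Oslo?"},
    ],
    "temperature": 0.7,        # allowed range 0 to 2
    "top_p": 0.9,              # allowed range 0 to 1
    "max_output_tokens": 512,
    "tools": [
        {
            "type": "function",
            "name": "get_weather",  # illustrative function tool
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    "tool_choice": "auto",
    "max_tool_calls": 2,
    "metadata": {"session": "demo-1"},  # string values only, up to 16 pairs
}
```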
Example
Continue a conversation by passing the ID of an earlier response as previous_response_id:
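A sketch in Python, assuming a local OpenOpen8 deployment at http://localhost:3000, a placeholder API key, and a hypothetical model name and response ID; any HTTP client works in place of the helper below:

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # assumed local deployment
API_KEY = "your-api-key"            # placeholder

def create_response(payload: dict) -> bytes:
    """POST a JSON payload to /v1/responses and return the raw response body."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Follow-up turn: no message history, just a pointer to the prior response.
payload = {
    "model": "gpt-4o-mini",                 # hypothetical model name
    "previous_response_id": "resp_abc123",  # ID returned by the first call
    "input": "And what about tomorrow?",
}
# create_response(payload)  # uncomment against a live deployment
```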
The /v1/responses/compact endpoint compacts the prior conversation context before generating the next response, reducing token usage for long multi-turn sessions.
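A sketch of a compacting request, assuming the body shape matches /v1/responses and only the path changes (model name and response ID are hypothetical):

```python
# Same body shape as /v1/responses; POST it to /v1/responses/compact instead,
# and the server compacts the referenced history before generating.
compact_payload = {
    "model": "gpt-4o-mini",                 # hypothetical model name
    "previous_response_id": "resp_abc123",  # long-running conversation to compact
    "input": "Given everything so far, what should we do next?",
}
```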