The /v1/responses endpoint implements OpenAI’s Responses API format — a newer alternative to Chat Completions that was designed with multi-turn context management in mind. Rather than sending the full conversation history on every request, you can reference a previous response by ID and let the API manage context for you. OpenOpen8 relays these requests to whichever upstream channel supports the Responses format; not all models and channels support this endpoint, so verify that your configured channel handles it before using it in production.

Endpoints

Method  Path                    Description
POST    /v1/responses           Create a model response
POST    /v1/responses/compact   Create a response with conversation compaction

Authentication

All requests must include a bearer token in the Authorization header:

Authorization: Bearer YOUR_TOKEN

Difference from Chat Completions

The Chat Completions format (/v1/chat/completions) requires you to send the complete conversation history on every request. The Responses format manages context server-side: you send a previous_response_id and the API retrieves the prior context automatically. The /v1/responses/compact variant additionally compacts the conversation history before processing, which reduces token usage for long conversations.
Not all upstream channels support the Responses format. If you receive a 503 error, the channel backing your model does not handle this endpoint. Use /v1/chat/completions as a fallback.
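The fallback described above can be automated. The sketch below (stdlib only) posts to /v1/responses and, on a 503, retries against /v1/chat/completions. The helper names and the payload conversion are illustrative assumptions, not part of the OpenOpen8 API; it only handles the simple case of a string input plus optional string instructions.

```python
# Hedged sketch: call /v1/responses, fall back to /v1/chat/completions on 503.
import json
import urllib.error
import urllib.request

BASE_URL = "https://openopen8.ai"  # base URL from the examples in this document

def to_chat_payload(responses_payload: dict) -> dict:
    """Convert a minimal Responses-style payload (string input, optional
    string instructions) into a Chat Completions payload."""
    messages = []
    if isinstance(responses_payload.get("instructions"), str):
        messages.append({"role": "system", "content": responses_payload["instructions"]})
    messages.append({"role": "user", "content": responses_payload["input"]})
    return {"model": responses_payload["model"], "messages": messages}

def create_response(payload: dict, token: str) -> dict:
    """POST to /v1/responses; if the channel returns 503 (no Responses
    support), resend the converted payload to /v1/chat/completions."""
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    req = urllib.request.Request(
        f"{BASE_URL}/v1/responses",
        data=json.dumps(payload).encode(),
        headers=headers,
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code != 503:
            raise
        fallback = urllib.request.Request(
            f"{BASE_URL}/v1/chat/completions",
            data=json.dumps(to_chat_payload(payload)).encode(),
            headers=headers,
        )
        with urllib.request.urlopen(fallback) as resp:
            return json.load(resp)
```

Note that the Chat Completions response shape differs from the Responses shape, so callers still need to handle both formats after a fallback.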

Request parameters

model
string
required
The model identifier to use. OpenOpen8 routes the request to the configured upstream channel for this model.
input
string | object[]
required
The input for the model. Can be a plain string or an array of structured input parts with type fields (input_text, input_image, input_file).
instructions
string | object
System-level instructions for the model, equivalent to a system message in Chat Completions.
previous_response_id
string
The ID of a prior response. When set, the model uses the prior response’s context as the conversation history, so you do not need to resend the full message list.
stream
boolean
default:"false"
When true, the response is returned as a stream of server-sent events.
stream_options
object
Options that apply only when stream is true.
temperature
number
Sampling temperature between 0 and 2.
top_p
number
Nucleus sampling probability mass between 0 and 1.
max_output_tokens
integer
The maximum number of tokens the model may generate in the response.
reasoning
object
Controls reasoning behavior for models that support it.
tools
object[]
Tools available to the model. Supports function tools and MCP-style tool configurations.
tool_choice
string | object
default:"auto"
Controls how the model selects tools. "none", "auto", "required", or a specific function object.
truncation
string | object
Controls how the model handles context that exceeds its context window. Pass "auto" to let the model decide.
context_management
object
Advanced context management options, including compaction strategy settings for the /v1/responses/compact endpoint.
metadata
object
Key-value metadata to attach to the response. Values must be strings; maximum 16 pairs.
top_logprobs
integer
The number of most likely tokens to return at each position, with log probabilities. Between 0 and 20.
max_tool_calls
integer
Maximum number of tool calls the model may make in a single response.
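All of the parameters above are sent together in one JSON request body. A minimal sketch of a streaming multimodal request, assuming the structured-input part shapes shown for the input parameter (the specific values and URLs are illustrative):

```python
import json

# Illustrative request body combining several parameters from the list above.
body = {
    "model": "gpt-4o",
    "input": [
        {"type": "input_text", "text": "Describe this image."},
        {"type": "input_image", "image_url": "https://example.com/cat.png"},
    ],
    "instructions": "Answer in one sentence.",
    "stream": True,
    "temperature": 0.7,
    "max_output_tokens": 256,
    "metadata": {"session": "demo-1"},  # string values only, max 16 pairs
}

# The body must serialize to valid JSON before being sent.
encoded = json.dumps(body)
```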

Example

cURL
curl https://openopen8.ai/v1/responses \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "Summarize the concept of entropy in thermodynamics.",
    "instructions": "You are a physics professor. Keep explanations concise."
  }'
Multi-turn with previous_response_id:
cURL
# First turn
curl https://openopen8.ai/v1/responses \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "What is the speed of light?"
  }'

# Second turn — reference the first response by its ID
curl https://openopen8.ai/v1/responses \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "How does that relate to Einstein'\''s theory of relativity?",
    "previous_response_id": "resp_abc123"
  }'
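The same two-turn flow can be sketched in code. The helper below builds the follow-up payload from the first turn's response object, assuming the response carries its ID in an "id" field (as the resp_abc123 example above suggests); the function name is illustrative.

```python
# Hedged sketch: chain turns via previous_response_id instead of resending
# the full message history.
def follow_up(model: str, new_input: str, prior: dict) -> dict:
    """Build the next-turn request body from the prior response object."""
    return {
        "model": model,
        "input": new_input,
        "previous_response_id": prior["id"],
    }

first = {"id": "resp_abc123"}  # stand-in for the first turn's parsed response
second = follow_up("gpt-4o", "How does that relate to relativity?", first)
```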
Conversation compaction:
cURL
curl https://openopen8.ai/v1/responses/compact \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "Continue our discussion.",
    "previous_response_id": "resp_abc123"
  }'
The /v1/responses/compact endpoint compacts the prior conversation context before generating the next response, reducing token usage for long multi-turn sessions.
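One way to use this in practice is to route only occasional turns through the compacting endpoint. The heuristic below is purely an illustration of client-side logic, not a behavior of the API; the turn interval is an arbitrary assumption.

```python
# Illustrative client-side heuristic: compact every Nth turn of a long session.
def endpoint_for_turn(turn: int, compact_every: int = 10) -> str:
    """Return the path for this turn number (0-based first turn)."""
    if turn > 0 and turn % compact_every == 0:
        return "/v1/responses/compact"  # periodically compact prior context
    return "/v1/responses"
```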