The /v1/responses endpoint implements OpenAI's Responses API format, a newer alternative to Chat Completions designed with multi-turn context management in mind. Rather than sending the full conversation history on every request, you can reference a previous response by ID and let the API manage context for you. OpenOpen8 relays these requests to whichever upstream channel supports the Responses format; not all models and channels support this endpoint, so verify that your configured channel handles it before using it in production.
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/responses | Create a model response |
| POST | /v1/responses/compact | Create a response with conversation compaction |
Authentication
As with the other OpenAI-compatible endpoints, authenticate by passing your API key as a bearer token in the Authorization header.
Difference from Chat Completions
The Chat Completions format (/v1/chat/completions) requires you to send the complete conversation history on every request. The Responses format manages context server-side: you send a previous_response_id and the API retrieves the prior context automatically. The /v1/responses/compact variant additionally compacts the conversation history before processing, which reduces token usage for long conversations.
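To make the difference concrete, here is a client-side sketch that sends a Responses-format request and, if the upstream channel rejects the format with a 503, replays the same turn in Chat Completions format. The `post` helper and the payload-conversion rules are illustrative, not part of OpenOpen8:

```python
def respond_with_fallback(post, payload):
    """Try /v1/responses first; on a 503, replay the turn via /v1/chat/completions.

    `post` is any callable (path, json_body) -> (status_code, response_body).
    """
    status, body = post("/v1/responses", payload)
    if status != 503:  # 503 signals that the channel lacks Responses support
        return body

    # Rebuild an equivalent Chat Completions body: instructions become a
    # system message, and the input becomes a user message. Context kept
    # server-side behind previous_response_id cannot be recovered here,
    # so a real client must track its own history for the fallback path.
    messages = []
    if "instructions" in payload:
        messages.append({"role": "system", "content": payload["instructions"]})
    messages.append({"role": "user", "content": payload["input"]})
    status, body = post("/v1/chat/completions",
                        {"model": payload["model"], "messages": messages})
    return body
```

Because the Responses format keeps history server-side, this fallback only preserves the current turn; a client that needs the full history on the Chat Completions path must record it itself.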
Not all upstream channels support the Responses format. If you receive a 503 error, the channel backing your model does not handle this endpoint. Use /v1/chat/completions as a fallback.

Request parameters
- `model`: The model identifier to use. OpenOpen8 routes the request to the configured upstream channel for this model.
- `input`: The input for the model. Can be a plain string or an array of structured input parts with `type` fields (`input_text`, `input_image`, `input_file`).
- `instructions`: System-level instructions for the model, equivalent to a `system` message in Chat Completions.
- `previous_response_id`: The ID of a prior response. When set, the model uses the prior response's context as the conversation history, so you do not need to resend the full message list.
- `stream`: When `true`, the response is returned as a stream of server-sent events.
- `stream_options`: Options that apply only when `stream` is `true`.
- `temperature`: Sampling temperature between `0` and `2`.
- `top_p`: Nucleus sampling probability mass between `0` and `1`.
- `max_output_tokens`: The maximum number of tokens the model may generate in the response.
- `reasoning`: Controls reasoning behavior for models that support it.
- `tools`: Tools available to the model. Supports function tools and MCP-style tool configurations.
- `tool_choice`: Controls how the model selects tools: `"none"`, `"auto"`, `"required"`, or a specific function object.
- `truncation`: Controls how the model handles context that exceeds its context window. Pass `"auto"` to let the model decide.
- `context_management`: Advanced context management options, including compaction strategy settings for the /v1/responses/compact endpoint.
- `metadata`: Key-value metadata to attach to the response. Values must be strings; maximum 16 pairs.
- `top_logprobs`: The number of most likely tokens to return at each position, with log probabilities. Between `0` and `20`.
- `max_tool_calls`: Maximum number of tool calls the model may make in a single response.
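Putting several of these parameters together, here is a sketch of one possible request body. The model name, tool definition, and values are made up for illustration, and the function-tool schema shown follows OpenAI's Responses conventions:

```python
request_body = {
    "model": "gpt-4o-mini",  # hypothetical model name
    "instructions": "Answer briefly; call tools when they help.",
    "input": [
        # Structured input part instead of a plain string.
        {"type": "input_text", "text": "What's the weather in Oslo?"},
    ],
    "temperature": 0.7,        # allowed range 0 to 2
    "top_p": 0.9,              # allowed range 0 to 1
    "max_output_tokens": 512,
    "tools": [
        {
            "type": "function",
            "name": "get_weather",  # illustrative function tool
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    "tool_choice": "auto",
    "max_tool_calls": 2,
    "metadata": {"session": "demo-1"},  # string values only, up to 16 pairs
}
```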
Example
Continue a conversation by passing the ID of an earlier response as previous_response_id:
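A sketch in Python, assuming a local OpenOpen8 deployment at http://localhost:3000, a placeholder API key, and a hypothetical model name and response ID; any HTTP client works in place of the helper below:

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # assumed local deployment
API_KEY = "your-api-key"            # placeholder

def create_response(payload: dict) -> bytes:
    """POST a JSON payload to /v1/responses and return the raw response body."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Follow-up turn: no message history, just a pointer to the prior response.
payload = {
    "model": "gpt-4o-mini",                 # hypothetical model name
    "previous_response_id": "resp_abc123",  # ID returned by the first call
    "input": "And what about tomorrow?",
}
# create_response(payload)  # uncomment against a live deployment
```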
The /v1/responses/compact endpoint compacts the prior conversation context before generating the next response, reducing token usage for long multi-turn sessions.
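A sketch of a compacting request, assuming the body shape matches /v1/responses and only the path changes (model name and response ID are hypothetical):

```python
# Same body shape as /v1/responses; POST it to /v1/responses/compact instead,
# and the server compacts the referenced history before generating.
compact_payload = {
    "model": "gpt-4o-mini",                 # hypothetical model name
    "previous_response_id": "resp_abc123",  # long-running conversation to compact
    "input": "Given everything so far, what should we do next?",
}
```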