Realtime requires a channel configured with a Realtime-capable provider (OpenAI or Azure OpenAI). Contact your OpenOpen8 administrator if the WebSocket connection is rejected.
Connecting
Endpoint: `GET /v1/realtime`
Upgrade an HTTP GET request to a WebSocket connection. You can pass your token as a query parameter or as an Authorization header in the WebSocket handshake.
Authentication
Pass your token in one of the following ways:

- Query parameter: `?token=YOUR_TOKEN`
- Authorization header: `Authorization: Bearer YOUR_TOKEN` (set during the HTTP upgrade handshake)
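As a sketch of the query-parameter form, the handshake URL can be built like this. The gateway host below is a placeholder; substitute your own deployment's address:

```javascript
// Build the Realtime handshake URL with the token as a query parameter.
// "wss://example.invalid" is a placeholder host, not a real endpoint.
function realtimeUrl(baseUrl, token) {
  const url = new URL("/v1/realtime", baseUrl);
  url.searchParams.set("token", token);
  return url.toString();
}

console.log(realtimeUrl("wss://example.invalid", "YOUR_TOKEN"));
// → wss://example.invalid/v1/realtime?token=YOUR_TOKEN
```

Browsers cannot attach custom headers to a WebSocket handshake, so the query-parameter form is the portable choice there; in Node.js, the `ws` package accepts a `headers` option for the `Authorization` form.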
Event types
Once connected, you exchange JSON event messages. Each message has a `type` field that identifies its purpose. The following are the core event types.
Client → server
| Event type | Description |
|---|---|
| `session.update` | Configure session parameters such as voice, audio format, tools, and instructions. |
| `input_audio_buffer.append` | Stream base64-encoded audio bytes to the model’s input buffer. |
| `conversation.item.create` | Add a text message to the conversation. |
| `response.create` | Prompt the model to generate a response based on the current conversation and buffer. |
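As an illustration, each of these client events is sent as one JSON text frame. The payload shapes below follow the usual OpenAI Realtime conventions and are a sketch, not an exhaustive schema:

```javascript
// Helpers that build the core client → server events.
const sessionUpdate = (session) => ({ type: "session.update", session });

const appendAudio = (base64Audio) => ({
  type: "input_audio_buffer.append",
  audio: base64Audio, // base64-encoded audio bytes
});

const userTextItem = (text) => ({
  type: "conversation.item.create",
  item: {
    type: "message",
    role: "user",
    content: [{ type: "input_text", text }],
  },
});

const createResponse = () => ({ type: "response.create" });

// Each event goes out as a single JSON text frame, e.g.:
//   ws.send(JSON.stringify(userTextItem("Hello!")));
```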
Server → client
| Event type | Description |
|---|---|
| `session.created` | Sent immediately after connecting, confirming the session is ready. |
| `session.updated` | Confirms that a `session.update` was applied. |
| `response.audio.delta` | A chunk of base64-encoded audio from the model’s response. |
| `response.audio_transcript.delta` | A chunk of the text transcript of the model’s audio output. |
| `response.function_call_arguments.delta` | Streamed function call arguments, when the model calls a tool. |
| `response.function_call_arguments.done` | Signals that function call arguments are complete. |
| `response.done` | Signals that the model has finished generating a response. Contains usage information. |
| `conversation.item.created` | Confirms that a conversation item was added. |
| `error` | An error occurred. Contains an error object with a message and code. |
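A typical client dispatches on the `type` field of each incoming frame. The sketch below handles a few of the events above; the `delta`, `response.usage`, and `error.message` field names follow the usual OpenAI Realtime event shapes:

```javascript
// Dispatch one incoming server event (a JSON text frame).
function handleServerEvent(raw) {
  const event = JSON.parse(raw);
  switch (event.type) {
    case "session.created":
      return "ready"; // session is usable; safe to send session.update
    case "response.audio.delta":
      return event.delta; // base64 audio chunk to queue for playback
    case "response.audio_transcript.delta":
      return event.delta; // text chunk of the spoken transcript
    case "response.done":
      return event.response?.usage; // token usage for the finished turn
    case "error":
      throw new Error(event.error?.message ?? "unknown error");
    default:
      return null; // ignore event types this sketch does not handle
  }
}
```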
Session configuration
After connecting, send a `session.update` event to configure the session:

| Field | Description |
|---|---|
| `modalities` | The interaction modes to enable. For example, `["text", "audio"]`. |
| `instructions` | System-level instructions that guide the model’s behavior for the session. |
| `voice` | The voice to use for audio output. For example, `alloy`, `echo`, `nova`, or `shimmer`. |
| `input_audio_format` | Format of the audio you send. For example, `pcm16`, `g711_ulaw`, or `g711_alaw`. |
| `output_audio_format` | Format of the audio the model returns. For example, `pcm16`. |
| `input_audio_transcription` | Configuration for transcribing your audio input. |
| `turn_detection` | Controls how the server detects end-of-turn in audio input. Set to `null` to disable automatic turn detection and manage it manually. |
| `tools` | A list of tool definitions available to the model during this session, following the OpenAI function-calling schema. |
| `tool_choice` | Controls when the model uses tools. `auto`, `none`, or a specific tool name. |
| `temperature` | Sampling temperature for the model. Defaults to `0.8`. |

Usage tracking
When the model finishes a response, the `response.done` event includes a usage object:

| Field | Description |
|---|---|
| `total_tokens` | Total tokens consumed by this response turn. |
| `input_tokens` | Tokens in the input (audio + text). |
| `output_tokens` | Tokens in the model’s output (audio + text). |
| `input_token_details` | Breakdown of input token types (e.g., cached, audio). |
| `output_token_details` | Breakdown of output token types (e.g., audio, text). |
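For illustration, a `response.done` event carrying such a usage object might look like the sketch below. The numbers are invented, and the nested detail fields follow the usual OpenAI Realtime shape:

```javascript
// Hypothetical response.done event; all counts are made up.
const done = {
  type: "response.done",
  response: {
    usage: {
      total_tokens: 120,
      input_tokens: 80,
      output_tokens: 40,
      input_token_details: { cached_tokens: 0, audio_tokens: 60, text_tokens: 20 },
      output_token_details: { audio_tokens: 30, text_tokens: 10 },
    },
  },
};

const { usage } = done.response;
console.log(
  `turn used ${usage.total_tokens} tokens ` +
  `(${usage.input_tokens} in, ${usage.output_tokens} out)`
);
```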
Example
The following JavaScript example connects to the Realtime endpoint, configures a session, and logs events as they arrive.
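A minimal sketch of such a script, assuming the global `WebSocket` available in browsers and recent Node.js versions; the gateway URL and token are placeholders, and the event shapes follow the conventions described above:

```javascript
const TOKEN = "YOUR_TOKEN"; // placeholder; use your real token
const ENDPOINT = `wss://example.invalid/v1/realtime?token=${TOKEN}`; // placeholder host

function connect() {
  const ws = new WebSocket(ENDPOINT);

  ws.addEventListener("open", () => {
    // Configure the session before sending any input.
    ws.send(JSON.stringify({
      type: "session.update",
      session: {
        modalities: ["text", "audio"],
        instructions: "You are a concise, friendly assistant.",
        voice: "alloy",
        input_audio_format: "pcm16",
        output_audio_format: "pcm16",
      },
    }));
    // Add a text message, then ask the model to respond.
    ws.send(JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [{ type: "input_text", text: "Hello!" }],
      },
    }));
    ws.send(JSON.stringify({ type: "response.create" }));
  });

  ws.addEventListener("message", (e) => {
    const event = JSON.parse(e.data);
    if (event.type === "response.audio_transcript.delta") {
      console.log(event.delta); // stream the transcript as it arrives
    } else if (event.type === "response.done") {
      console.log("usage:", event.response.usage);
      ws.close();
    } else if (event.type === "error") {
      console.error("server error:", event.error);
    }
  });

  ws.addEventListener("error", (e) => console.error("socket error", e));
  return ws;
}

// connect(); // uncomment to run against a live gateway
```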