The Realtime API lets you hold low-latency, bidirectional voice and text conversations with a model over a WebSocket connection. Instead of sending discrete HTTP requests, you open a persistent connection and exchange event messages in both directions: sending audio or text input, and receiving streamed audio or text responses as they are generated. OpenOpen8's Realtime endpoint is compatible with the OpenAI Realtime API format and also supports Azure OpenAI Realtime.
Realtime requires a channel configured with a Realtime-capable provider (OpenAI or Azure OpenAI) before you connect. Contact your OpenOpen8 administrator if the WebSocket connection is rejected.

Connecting

Endpoint: GET /v1/realtime
Upgrade an HTTP GET request to a WebSocket connection. You can pass your token as a query parameter or as an Authorization header in the WebSocket handshake.
ws://openopen8.ai/v1/realtime
For TLS-secured instances:
wss://openopen8.ai/v1/realtime

Authentication

Pass your token in one of the following ways:
  • Query parameter: ?token=YOUR_TOKEN
  • Authorization header: Authorization: Bearer YOUR_TOKEN (set during the HTTP upgrade handshake)
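As a minimal sketch, the two options look like this in Node.js. `YOUR_TOKEN` is a placeholder; the header variant assumes the third-party `ws` package, since browsers cannot set custom headers on a WebSocket handshake:

```javascript
// Placeholder token; substitute your real OpenOpen8 token.
const token = "YOUR_TOKEN";

// Option 1: token as a query parameter.
const url = `wss://openopen8.ai/v1/realtime?token=${encodeURIComponent(token)}`;

// Option 2: token as an Authorization header during the upgrade handshake.
// Browsers cannot set custom WebSocket headers, so this variant assumes the
// Node.js "ws" package:
//   const WebSocket = require("ws");
//   const ws = new WebSocket("wss://openopen8.ai/v1/realtime", {
//     headers: { Authorization: `Bearer ${token}` },
//   });

console.log(url);
```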

Event types

Once connected, you exchange JSON event messages. Each message has a type field that identifies its purpose. The following are the core event types.

Client → server

| Event type | Description |
| --- | --- |
| session.update | Configure session parameters such as voice, audio format, tools, and instructions. |
| input_audio_buffer.append | Stream base64-encoded audio bytes to the model's input buffer. |
| conversation.item.create | Add a text message to the conversation. |
| response.create | Prompt the model to generate a response based on the current conversation and buffer. |
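Because input_audio_buffer.append expects base64-encoded audio, each PCM16 chunk must be encoded before it is sent. A small sketch of building such an event, assuming Node.js (`Buffer`) for the base64 step:

```javascript
// Encode a chunk of 16-bit PCM samples as base64 and wrap it in an
// input_audio_buffer.append event. Assumes Node.js for Buffer; in a
// browser you would base64-encode the bytes another way (e.g. btoa).
function appendAudioEvent(samples /* Int16Array */) {
  const bytes = Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength);
  return JSON.stringify({
    type: "input_audio_buffer.append",
    audio: bytes.toString("base64"),
  });
}

// 10 samples of silence (20 bytes of PCM16).
const event = appendAudioEvent(new Int16Array(10));
```

You would call `ws.send(event)` repeatedly as audio frames arrive from your capture pipeline.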

Server → client

| Event type | Description |
| --- | --- |
| session.created | Sent immediately after connecting, confirming the session is ready. |
| session.updated | Confirms that a session.update was applied. |
| response.audio.delta | A chunk of base64-encoded audio from the model's response. |
| response.audio_transcript.delta | A chunk of the text transcript of the model's audio output. |
| response.function_call_arguments.delta | Streamed function call arguments, when the model calls a tool. |
| response.function_call_arguments.done | Signals that function call arguments are complete. |
| response.done | Signals that the model has finished generating a response. Contains usage information. |
| conversation.item.created | Confirms that a conversation item was added. |
| error | An error occurred. Contains an error object with a message and code. |
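To show how these events compose on the client, here is a sketch of a handler that accumulates response.audio_transcript.delta fragments until response.done arrives. The `delta` and `error` field shapes are assumed from the OpenAI Realtime format:

```javascript
// Accumulate streamed transcript fragments and report the full transcript
// when the response finishes.
function makeTranscriptCollector(onDone) {
  let transcript = "";
  return (msg) => {
    switch (msg.type) {
      case "response.audio_transcript.delta":
        transcript += msg.delta; // each delta carries a text fragment
        break;
      case "response.done":
        onDone(transcript);
        transcript = "";
        break;
      case "error":
        console.error("server error:", msg.error);
        break;
    }
  };
}

// Simulate a short stream of server events.
let result;
const handle = makeTranscriptCollector((t) => { result = t; });
[
  { type: "response.audio_transcript.delta", delta: "Hello" },
  { type: "response.audio_transcript.delta", delta: ", world" },
  { type: "response.done" },
].forEach(handle);
```

In a real client you would call `handle(JSON.parse(event.data))` from the WebSocket's message listener.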

Session configuration

After connecting, send a session.update event to configure the session:
modalities
string[]
The interaction modes to enable. For example, ["text", "audio"].
instructions
string
System-level instructions that guide the model’s behavior for the session.
voice
string
The voice to use for audio output. For example, alloy, echo, nova, or shimmer.
input_audio_format
string
Format of the audio you send. For example, pcm16, g711_ulaw, or g711_alaw.
output_audio_format
string
Format of the audio the model returns. For example, pcm16.
input_audio_transcription
object
Configuration for transcribing your audio input.
turn_detection
object | null
Controls how the server detects end-of-turn in audio input. Set to null to disable automatic turn detection and manage it manually.
tools
object[]
A list of tool definitions available to the model during this session, following the OpenAI function-calling schema.
tool_choice
string
Controls when the model uses tools. auto, none, or a specific tool name.
temperature
number
Sampling temperature for the model. Defaults to 0.8.

Usage tracking

When the model finishes a response, the response.done event includes a usage object:
total_tokens
integer
Total tokens consumed by this response turn.
input_tokens
integer
Tokens in the input (audio + text).
output_tokens
integer
Tokens in the model’s output (audio + text).
input_token_details
object
Breakdown of input token types (e.g., cached, audio).
output_token_details
object
Breakdown of output token types (e.g., audio, text).
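For example, a small helper that pulls the usage object out of a response.done event could look like this. The exact location of `usage` can vary by provider, so the sketch checks both the top level and the nested response object; the token counts below are illustrative, not real billing data:

```javascript
// Extract token usage from a response.done event, tolerating usage at the
// top level or nested inside the response object.
function usageFromDone(msg) {
  if (msg.type !== "response.done") return null;
  return msg.usage ?? (msg.response && msg.response.usage) ?? null;
}

// Illustrative event; the numbers are made up.
const usage = usageFromDone({
  type: "response.done",
  response: {
    usage: { total_tokens: 130, input_tokens: 50, output_tokens: 80 },
  },
});
```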

Example

The following JavaScript example connects to the Realtime endpoint, configures a session, and logs events as they arrive.
```javascript
const ws = new WebSocket(
  "wss://openopen8.ai/v1/realtime?token=YOUR_TOKEN"
);

ws.addEventListener("open", () => {
  // Configure the session
  ws.send(
    JSON.stringify({
      type: "session.update",
      session: {
        modalities: ["text", "audio"],
        instructions: "You are a helpful assistant. Respond concisely.",
        voice: "alloy",
        input_audio_format: "pcm16",
        output_audio_format: "pcm16",
        temperature: 0.8,
      },
    })
  );

  // Send a text message
  ws.send(
    JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [{ type: "input_text", text: "Hello, how are you?" }],
      },
    })
  );

  // Ask the model to respond
  ws.send(JSON.stringify({ type: "response.create" }));
});

ws.addEventListener("message", (event) => {
  const msg = JSON.parse(event.data);
  console.log(msg.type, msg);
});

ws.addEventListener("error", (err) => {
  console.error("WebSocket error", err);
});
```