OpenOpen8 provides three audio endpoints compatible with the OpenAI Audio API: transcribe speech to text, translate audio in any language into English, or generate spoken audio from text. OpenOpen8 automatically routes each request to the best available channel that serves the requested model.

POST /v1/audio/transcriptions

Convert an audio file to text in its original language. Compatible with OpenAI Whisper.

Request body

This endpoint uses multipart/form-data encoding.
model
string
required
The transcription model to use. For example, whisper-1. The available values depend on your configured channels.
file
file
required
The audio file to transcribe. Supported formats include mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm. Maximum file size is 25 MB.
language
string
The language of the audio, as an ISO-639-1 code (e.g., en, zh, fr). Providing this hint improves accuracy and speed. If omitted, the model detects the language automatically.
response_format
string
The format of the transcription output. One of json, text, srt, verbose_json, or vtt. Defaults to json.
instructions
string
Optional text to guide the model’s transcription style or vocabulary.
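The file constraints above can be checked client-side before uploading. A minimal sketch in Python; the helper name and constants are our own illustration, not part of the API:

```python
from typing import Optional

# Limits as documented for /v1/audio/transcriptions.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # 25 MB

def check_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return an error message if the upload would be rejected, else None."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in SUPPORTED_EXTENSIONS:
        return f"unsupported format: {ext or '(no extension)'}"
    if size_bytes > MAX_FILE_BYTES:
        return "file exceeds the 25 MB limit"
    return None
```

Running this check before the request avoids a round trip for uploads the endpoint would reject anyway.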

Response

For response_format: json, the response is:
text
string
The transcribed text.
For response_format: verbose_json, additional fields are returned:
task
string
Always "transcribe".
language
string
The detected or provided language.
duration
number
Duration of the audio in seconds.
segments
object[]
Time-aligned segments of the transcription.
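A verbose_json response can be flattened into a timestamped transcript. A sketch, assuming each segment carries start, end, and text keys (as in OpenAI's Whisper verbose output; the exact segment fields can vary by upstream model):

```python
def summarize_transcription(resp: dict) -> str:
    """Render a verbose_json transcription as timestamped lines."""
    lines = [f"[{resp['language']}] {resp['duration']:.1f}s"]
    for seg in resp.get("segments", []):
        lines.append(f"{seg['start']:6.1f}-{seg['end']:6.1f}  {seg['text'].strip()}")
    return "\n".join(lines)
```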

Example

curl
curl https://openopen8.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F model="whisper-1" \
  -F file="@recording.mp3" \
  -F language="en" \
  -F response_format="json"

POST /v1/audio/translations

Transcribe and translate an audio file into English, regardless of the source language.

Request body

This endpoint uses multipart/form-data encoding.
model
string
required
The model to use for translation. For example, whisper-1.
file
file
required
The audio file to translate. Same format restrictions as /v1/audio/transcriptions.
response_format
string
Output format: json, text, srt, verbose_json, or vtt. Defaults to json.
instructions
string
Optional text to guide the model’s translation style.
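Both transcription and translation can return srt output directly; for reference, the same framing can also be rebuilt from verbose_json segments. A sketch of the SRT block layout (the helper names and segment keys are our assumptions, not part of the API):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Build SubRip blocks (index, time range, text) from segment dicts."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```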

Response

text
string
The translated text in English.

Example

curl
curl https://openopen8.ai/v1/audio/translations \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F model="whisper-1" \
  -F file="@french-audio.mp3" \
  -F response_format="json"

POST /v1/audio/speech

Generate spoken audio from a text string (text-to-speech).

Request body

model
string
required
The TTS model to use. For example, tts-1 or tts-1-hd. tts-1-hd produces higher-quality audio at higher cost.
input
string
required
The text to convert to speech. Maximum length is 4,096 characters.
voice
string
required
The voice to use for synthesis. OpenAI TTS supports alloy, echo, fable, onyx, nova, and shimmer. The available voices depend on your configured provider.
response_format
string
The audio output format. One of mp3, opus, aac, or flac. Defaults to mp3.
speed
number
The playback speed of the generated audio. A value between 0.25 and 4.0. Defaults to 1.0.
instructions
string
Optional text instructions to control speaking style, tone, or pacing.
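Because input is capped at 4,096 characters, longer texts must be split into multiple requests. A sketch that cuts at sentence boundaries where possible (the splitting strategy is our own, not part of the API):

```python
def split_for_tts(text: str, limit: int = 4096) -> list:
    """Split text into chunks at most `limit` chars, preferring sentence ends."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind(". ", 0, limit)
        cut = cut + 1 if cut != -1 else limit  # keep the period; else hard cut
        chunks.append(text[:cut])
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```

Each chunk can then be sent as a separate /v1/audio/speech request and the resulting audio files concatenated.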

Response

The response body is raw audio binary data in the format specified by response_format. Set your HTTP client to save the response directly to a file.

Example

curl https://openopen8.ai/v1/audio/speech \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Welcome to OpenOpen8, your unified AI gateway.",
    "voice": "nova",
    "response_format": "mp3"
  }' \
  --output speech.mp3