Audio: transcription, translation, and speech synthesis

OpenOpen8 provides three audio endpoints compatible with the OpenAI Audio API. You can transcribe speech to text, translate audio in any language into English, or generate spoken audio from a text string. OpenOpen8 routes your requests to the best available audio model automatically.

POST /v1/audio/transcriptions

Convert an audio file to text in its original language. Compatible with OpenAI Whisper.

Request body

This endpoint uses multipart/form-data encoding.

model

string

required

The transcription model to use. For example, whisper-1. The available values depend on your configured channels.

file

file

required

The audio file to transcribe. Supported formats include mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm. Maximum file size is 25 MB.

language

string

The language of the audio, as an ISO-639-1 code (e.g., en, zh, fr). Providing this hint can improve accuracy and reduce latency. If omitted, the model detects the language automatically.

response_format

string

The format of the transcription output. One of json, text, srt, verbose_json, or vtt. Defaults to json.

instructions

string

Optional text to guide the model’s transcription style or vocabulary.
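The file constraints above can be checked on the client before uploading, which avoids a wasted round trip. The following is a minimal sketch, not part of the OpenOpen8 API: the helper name check_audio_file is hypothetical, and it assumes that "25 MB" means 25 × 1024 × 1024 bytes and that the format can be inferred from the file extension.

```python
import os

# Formats and size limit as listed for the `file` parameter.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" is read as 25 MiB


def check_audio_file(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks acceptable.

    Hypothetical client-side helper. The server remains the source of truth.
    """
    problems = []
    # Infer the format from the extension (assumption: extension matches content).
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


print(check_audio_file("recording.mp3", 1_000_000))  # prints []
print(check_audio_file("notes.txt", 1_000))          # prints ['unsupported format: txt']
```

In practice you would pass os.path.getsize(path) as size_bytes.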

Response

For response_format: json, the response contains a single field:

text

string

The transcribed text.

For response_format: verbose_json, additional fields are returned:

task

string

Always "transcribe".

language

string

The detected or provided language.

duration

number

Duration of the audio in seconds.

segments

object[]

Time-aligned segments of the transcription.

Segment properties:

id

integer

Segment index.

start

number

Start time in seconds.

end

number

End time in seconds.

text

string

Transcribed text for this segment.

temperature

number

Model temperature used for this segment.

avg_logprob

number

Average log probability for this segment.

compression_ratio

number

Compression ratio of the segment text.

no_speech_prob

number

Probability that this segment contains no speech.
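As an illustration of how the verbose_json fields might be consumed, the sketch below turns segments into timestamped lines and skips segments that are probably silence. The function names, the 0.5 threshold, and the sample response are all made up for this example; only the field names (segments, start, end, text, no_speech_prob) come from the table above.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def spoken_lines(response: dict, max_no_speech_prob: float = 0.5) -> list:
    """Return one timestamped line per segment that likely contains speech.

    The 0.5 default is an arbitrary illustrative threshold, not a documented value.
    """
    lines = []
    for seg in response.get("segments", []):
        if seg.get("no_speech_prob", 0.0) > max_no_speech_prob:
            continue  # probably silence or noise
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return lines


# Made-up sample shaped like a verbose_json response.
sample = {
    "task": "transcribe",
    "language": "en",
    "duration": 5.0,
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " Hello there.", "no_speech_prob": 0.01},
        {"start": 2.5, "end": 5.0, "text": " ...", "no_speech_prob": 0.93},
    ],
}

for line in spoken_lines(sample):
    print(line)  # prints "[00:00:00.000 -> 00:00:02.500] Hello there."
```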

Example

curl

curl https://openopen8.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F model="whisper-1" \
  -F file="@recording.mp3" \
  -F language="en" \
  -F response_format="json"
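For clients that cannot shell out to curl, a roughly equivalent request can be assembled with the Python standard library. This is a sketch under assumptions: the function name build_transcription_request is hypothetical, the host is copied from the curl example, and the multipart body is built by hand because urllib has no multipart helper. The function only builds the request; sending it is left to the caller.

```python
import uuid
import urllib.request

BASE_URL = "https://openopen8.ai"  # host taken from the curl example


def build_transcription_request(token, filename, audio_bytes, fields):
    """Build (but do not send) a multipart/form-data POST request.

    `fields` holds the text parameters, e.g. {"model": "whisper-1"}.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # One part per text field.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    # The audio itself goes in the `file` part.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + audio_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary

    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_transcription_request(
    "YOUR_TOKEN",
    "recording.mp3",
    b"\x00\x01",  # placeholder bytes; read the real file in practice
    {"model": "whisper-1", "language": "en", "response_format": "json"},
)
# To send it (requires network access and a valid token):
#   with urllib.request.urlopen(req) as resp:
#       body = resp.read()
```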

POST /v1/audio/translations

Transcribe and translate an audio file into English, regardless of the source language.

Request body

This endpoint uses multipart/form-data encoding.

model

string

required

The model to use for translation. For example, whisper-1.

file

file

required

The audio file to translate. Same format restrictions as /v1/audio/transcriptions.

response_format

string

Output format: json, text, srt, verbose_json, or vtt. Defaults to json.

instructions

string

Optional text to guide the model’s translation style.

Response

text

string

The translated text in English.

Example

curl

curl https://openopen8.ai/v1/audio/translations \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F model="whisper-1" \
  -F file="@french-audio.mp3" \
  -F response_format="json"
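How the response body should be read depends on response_format. The sketch below assumes, as with the OpenAI Audio API, that json and verbose_json return a JSON object with a text field while text, srt, and vtt return a plain-text body; this page does not confirm that for OpenOpen8. The helper name extract_text is hypothetical.

```python
import json


def extract_text(body: bytes, response_format: str = "json") -> str:
    """Return the translated (or transcribed) text from a raw response body.

    Assumes plain-text bodies for text/srt/vtt; verify against your deployment.
    """
    decoded = body.decode("utf-8")
    if response_format in ("json", "verbose_json"):
        return json.loads(decoded)["text"]
    if response_format in ("text", "srt", "vtt"):
        return decoded
    raise ValueError(f"unknown response_format: {response_format}")


# Made-up sample body for illustration.
print(extract_text(b'{"text": "Hello, how are you?"}'))  # prints "Hello, how are you?"
```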

POST /v1/audio/speech

Generate spoken audio from a text string (text-to-speech).

Request body

This endpoint uses application/json encoding.

model

string

required

The TTS model to use. For example, tts-1 or tts-1-hd. tts-1-hd produces higher-quality audio at higher cost.

input

string

required

The text to convert to speech. Maximum length is 4,096 characters.

voice

string

required

The voice to use for synthesis. OpenAI TTS supports alloy, echo, fable, onyx, nova, and shimmer. The available voices depend on your configured provider.

response_format

string

The audio output format. One of mp3, opus, aac, or flac. Defaults to mp3.

speed

number

The playback speed of the generated audio. A value between 0.25 and 4.0. Defaults to 1.0.

instructions

string

Optional text instructions to control speaking style, tone, or pacing.
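The documented limits on input, response_format, and speed can be enforced before a request is sent. The following is a minimal sketch with a hypothetical helper name, build_speech_payload; it assumes the 4,096-character limit counts Unicode code points, which this page does not specify, and it does not validate voice because the available voices depend on the provider.

```python
MAX_INPUT_CHARS = 4096                       # limit stated for `input`
AUDIO_FORMATS = {"mp3", "opus", "aac", "flac"}  # formats listed for `response_format`


def build_speech_payload(model, text, voice, response_format="mp3",
                         speed=1.0, instructions=None):
    """Validate the documented limits and return the JSON payload as a dict."""
    # Assumption: "characters" is interpreted as Python string length.
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} characters; maximum is {MAX_INPUT_CHARS}")
    if response_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"speed must be between 0.25 and 4.0, got {speed}")

    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    if instructions is not None:
        payload["instructions"] = instructions  # optional field, sent only if given
    return payload


payload = build_speech_payload("tts-1", "Welcome to OpenOpen8.", "nova")
```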

Response

The response body is raw audio binary data in the format specified by response_format. Set your HTTP client to save the response directly to a file.

Example

curl https://openopen8.ai/v1/audio/speech \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Welcome to OpenOpen8, your unified AI gateway.",
    "voice": "nova",
    "response_format": "mp3"
  }' \
  --output speech.mp3
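The same request can be made from Python using only the standard library. This is a sketch, assuming the host and payload from the curl example; the function names build_speech_request and save_speech are hypothetical. Building the request is separated from sending it so that the first step can be inspected without network access.

```python
import json
import urllib.request


def build_speech_request(token, payload):
    """Build (but do not send) the JSON POST request for /v1/audio/speech."""
    return urllib.request.Request(
        "https://openopen8.ai/v1/audio/speech",  # host taken from the curl example
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def save_speech(request, path):
    """Send the request and write the raw audio bytes to `path`.

    Requires network access and a valid token.
    """
    with urllib.request.urlopen(request) as resp, open(path, "wb") as out:
        out.write(resp.read())


speech_req = build_speech_request("YOUR_TOKEN", {
    "model": "tts-1",
    "input": "Welcome to OpenOpen8, your unified AI gateway.",
    "voice": "nova",
    "response_format": "mp3",
})
# save_speech(speech_req, "speech.mp3")  # uncomment to perform the call
```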
