OpenAI Audio Format¶

Official Documentation

📝 Introduction¶

The OpenAI Audio API provides three main functions:

Text-to-Speech (TTS) - Convert text to natural speech
Speech-to-Text (STT) - Transcribe audio to text
Audio Translation - Translate non-English audio to English text

💡 Request Examples¶

Text-to-Speech ✅¶

curl https://your-newapi-server-address/v1/audio/speech \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, world!",
    "voice": "alloy"
  }' \
  --output speech.mp3

Speech-to-Text ✅¶

curl https://your-newapi-server-address/v1/audio/transcriptions \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-1"

Response Example:

{
  "text": "Hello, world!"
}

Audio Translation ✅¶

curl https://your-newapi-server-address/v1/audio/translations \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/chinese.mp3" \
  -F model="whisper-1"

Response Example:

{
  "text": "Hello, world!"
}

📮 Request¶

Endpoints¶

Text-to-Speech¶

POST /v1/audio/speech

Convert text to speech.

Speech-to-Text¶

POST /v1/audio/transcriptions

Transcribe audio to text in the input language.

Audio Translation¶

POST /v1/audio/translations

Translate audio to English text.

Authentication Method¶

Include the following in the request header for API key authentication:

Authorization: Bearer $NEWAPI_API_KEY

Here, $NEWAPI_API_KEY is an environment variable that holds your API key.
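
The header can be attached in code the same way the curl examples do it. The following is a minimal sketch using only the Python standard library; the helper name `build_request` and the fallback key `sk-placeholder` are illustrative assumptions, and the server address is the same placeholder used in the curl examples.

```python
import os
import urllib.request


def build_request(base_url: str, path: str, api_key: str) -> urllib.request.Request:
    """Create a POST request that carries the Bearer token header.

    `build_request` is a hypothetical helper for this sketch, not part of any SDK.
    """
    url = base_url.rstrip("/") + path
    request = urllib.request.Request(url, method="POST")
    request.add_header("Authorization", f"Bearer {api_key}")
    return request


# Read the key from the environment, mirroring $NEWAPI_API_KEY in the curl examples.
api_key = os.environ.get("NEWAPI_API_KEY", "sk-placeholder")
request = build_request(
    "https://your-newapi-server-address", "/v1/audio/speech", api_key
)
```

Reading the key from the environment rather than hard-coding it keeps the secret out of source control.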

Request Body Parameters¶

Text-to-Speech¶

`model`¶

Type: String
Required: Yes
Allowed values: tts-1, tts-1-hd
Description: TTS model to use

`input`¶

Type: String
Required: Yes
Maximum length: 4096 characters
Description: Text to convert to speech

`voice`¶

Type: String
Required: Yes
Allowed values: alloy, echo, fable, onyx, nova, shimmer
Description: Voice to use when generating speech

`response_format`¶

Type: String
Required: No
Default: mp3
Allowed values: mp3, opus, aac, flac, wav, pcm
Description: Audio output format

`speed`¶

Type: Number
Required: No
Default: 1.0
Range: 0.25 - 4.0
Description: Speed of generated speech
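
The constraints listed above can be checked on the client before a request is sent. The sketch below builds the JSON body for `POST /v1/audio/speech`; the function name `build_speech_payload` and the constant names are assumptions made for illustration, and the allowed-value sets simply mirror the lists in this section (a particular deployment may accept more).

```python
import json

# Values copied from the parameter lists above; a given deployment may accept more.
TTS_MODELS = {"tts-1", "tts-1-hd"}
TTS_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
TTS_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096


def build_speech_payload(text, model="tts-1", voice="alloy",
                         response_format="mp3", speed=1.0):
    """Validate the documented constraints and return the JSON body as bytes."""
    if model not in TTS_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input must be 1 to {MAX_INPUT_CHARS} characters")
    if voice not in TTS_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in TTS_FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return json.dumps(payload).encode("utf-8")


body = build_speech_payload("Hello, world!")
```

The returned bytes are meant to be sent with `Content-Type: application/json`, as in the curl example for this endpoint.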

Speech-to-Text¶

`file`¶

Type: File
Required: Yes
Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
Description: Audio file to transcribe

`model`¶

Type: String
Required: Yes
Allowed values: whisper-1 (currently the only supported model)
Description: Model ID to use

`language`¶

Type: String
Required: No
Format: ISO-639-1 (e.g., "en")
Description: Language of the input audio. Providing it can improve accuracy.

`prompt`¶

Type: String
Required: No
Description: Text to guide the model's style or continue from a previous audio segment

`response_format`¶

Type: String
Required: No
Default: json
Allowed values: json, text, srt, verbose_json, vtt
Description: Output format

`temperature`¶

Type: Number
Required: No
Default: 0
Range: 0 - 1
Description: Sampling temperature. Higher values make the output more random.

`timestamp_granularities`¶

Type: Array
Required: No
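Default: segment
Allowed values: word, segment
Description: Granularity of the transcription timestamps. In the upstream OpenAI API, response_format must be verbose_json for timestamps to be returned.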
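
Because this endpoint takes a file upload, the parameters above travel as `multipart/form-data` fields rather than JSON. The sketch below encodes such a body with only the standard library; `encode_multipart` is a hypothetical helper, and the repeated `timestamp_granularities[]` key follows the form-field convention of OpenAI's own curl examples, which is an assumption about how this server parses arrays.

```python
import uuid


def encode_multipart(fields, file_name, file_bytes, file_field="file"):
    """Encode text fields plus one file as a multipart/form-data body.

    `fields` is a list of (name, value) pairs so that a name can repeat.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The file part carries a filename and a generic binary content type.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{file_name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart(
    [
        ("model", "whisper-1"),
        ("language", "en"),
        ("response_format", "verbose_json"),
        ("timestamp_granularities[]", "word"),  # assumed array-style key
    ],
    file_name="audio.mp3",
    file_bytes=b"...audio bytes...",  # placeholder instead of a real file
)
```

The second return value is what goes in the `Content-Type` header; the boundary it contains must match the one used inside the body.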

Audio Translation¶

`file`¶

Type: File
Required: Yes
Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
Description: Audio file to translate

`model`¶

Type: String
Required: Yes
Allowed values: whisper-1 (currently the only supported model)
Description: Model ID to use

`prompt`¶

Type: String
Required: No
Description: English text to guide the model's style

`response_format`¶

Type: String
Required: No
Default: json
Allowed values: json, text, srt, verbose_json, vtt
Description: Output format

`temperature`¶

Type: Number
Required: No
Default: 0
Range: 0 - 1
Description: Sampling temperature. Higher values make the output more random.
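
The translation endpoint documents a smaller parameter set than transcription: it lists no `language` or `timestamp_granularities`. A client can catch a mismatched field before uploading a large file. The sketch below is one way to do that; `check_form_fields` and the table names are hypothetical, and the rules only restate the parameter lists in this document (the `file` field is assumed to be handled separately).

```python
COMMON_FIELDS = {"model", "prompt", "response_format", "temperature"}

# Non-file fields documented for each endpoint in this section.
ALLOWED_FIELDS = {
    "transcriptions": COMMON_FIELDS | {"language", "timestamp_granularities"},
    "translations": COMMON_FIELDS,
}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def check_form_fields(endpoint, fields):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name in sorted(set(fields) - ALLOWED_FIELDS[endpoint]):
        problems.append(f"{name}: not accepted by /v1/audio/{endpoint}")
    if fields.get("model") != "whisper-1":
        problems.append("model: must be whisper-1")
    if fields.get("response_format", "json") not in RESPONSE_FORMATS:
        problems.append("response_format: unknown value")
    if not 0 <= fields.get("temperature", 0) <= 1:
        problems.append("temperature: must be between 0 and 1")
    return problems


# `language` is valid for transcription but is flagged for translation.
issues = check_form_fields(
    "translations", {"model": "whisper-1", "language": "zh"}
)
```

Returning a list instead of raising on the first problem lets a caller report every issue at once.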

📥 Response¶

Successful Response¶

Text-to-Speech¶

Returns binary audio file content.

Speech-to-Text¶

Basic JSON Format¶

{
  "text": "Transcribed text content"
}

Detailed JSON Format¶

{
  "task": "transcribe",
  "language": "english",
  "duration": 8.47,
  "text": "Complete transcribed text",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 3.32,
      "text": "Segmented transcribed text",
      "tokens": [50364, 440, 7534],
      "temperature": 0.0,
      "avg_logprob": -0.286,
      "compression_ratio": 1.236,
      "no_speech_prob": 0.009
    }
  ]
}
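
The `segments` array in the detailed format is what most clients use to build captions or jump to a position in the audio. The sketch below extracts the timing fields from a response shaped like the example above; `Segment`, `parse_segments`, and `format_timestamp` are names invented for this illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the audio
    end: float
    text: str


def parse_segments(raw):
    """Pull (start, end, text) out of a verbose_json response string."""
    data = json.loads(raw)
    return [
        Segment(float(s["start"]), float(s["end"]), s["text"].strip())
        for s in data.get("segments", [])
    ]


def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


raw = (
    '{"task": "transcribe", "duration": 8.47, "text": "Complete transcribed text",'
    ' "segments": [{"id": 0, "start": 0.0, "end": 3.32,'
    ' "text": "Segmented transcribed text"}]}'
)
for seg in parse_segments(raw):
    print(format_timestamp(seg.start), "-->", format_timestamp(seg.end), seg.text)
```

A basic `json` response has no `segments` key, so `parse_segments` returns an empty list for it instead of failing.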

Audio Translation¶

{
  "text": "Translated English text"
}

Error Response¶

When a request fails, the API returns an error object along with an HTTP status code in the 4XX-5XX range.

Common Error Status Codes¶

400 Bad Request: Invalid request parameters
401 Unauthorized: API key is invalid or missing
429 Too Many Requests: API rate limit exceeded
500 Internal Server Error: Unexpected error on the server

Error response example:

{
  "error": {
    "message": "Unsupported file format",
    "type": "invalid_request_error",
    "param": "file",
    "code": "invalid_file_format"
  }
}
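
A client usually wants to turn this object into an exception and decide whether a retry makes sense. The sketch below does that for a body shaped like the example above; `AudioAPIError` and `parse_error` are hypothetical names, and treating 429 and 500 as retryable is an assumption of this sketch rather than something the API specifies.

```python
import json

# Assumption: rate limits and server errors are worth retrying; 4XX input errors are not.
RETRYABLE_STATUSES = {429, 500}


class AudioAPIError(Exception):
    """Hypothetical exception type carrying the fields of the error object."""

    def __init__(self, status, message, error_type=None, param=None, code=None):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message
        self.error_type = error_type
        self.param = param
        self.code = code
        self.retryable = status in RETRYABLE_STATUSES


def parse_error(status, body):
    """Build an AudioAPIError from a status code and raw response body."""
    try:
        err = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        err = {}  # body was not JSON, or not a JSON object
    if not isinstance(err, dict):
        err = {"message": str(err)}
    return AudioAPIError(
        status,
        err.get("message", "unknown error"),
        err.get("type"),
        err.get("param"),
        err.get("code"),
    )


sample = (
    b'{"error": {"message": "Unsupported file format",'
    b' "type": "invalid_request_error", "param": "file",'
    b' "code": "invalid_file_format"}}'
)
error = parse_error(400, sample)
```

The fallback branches matter in practice because a proxy or gateway in front of the API may return a non-JSON body on failure.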
