OpenAI Audio Format
📝 Introduction
The OpenAI Audio API provides three main functions:
- Text-to-Speech (TTS) - Convert text to natural speech
- Speech-to-Text (STT) - Transcribe audio to text
- Audio Translation - Translate non-English audio to English text
💡 Request Examples
Text-to-Speech ✅
curl https://your-newapi-server-address/v1/audio/speech \
-H "Authorization: Bearer $NEWAPI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "Hello, world!",
"voice": "alloy"
}' \
--output speech.mp3
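The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names build_speech_request and save_speech are hypothetical, and BASE_URL is the placeholder address from the curl example above.

```python
import json
import urllib.request

# Placeholder address taken from the curl example; replace with your server.
BASE_URL = "https://your-newapi-server-address"


def build_speech_request(api_key, text, model="tts-1", voice="alloy"):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request, path="speech.mp3"):
    """Send the request and write the binary audio response to a file."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling save_speech(build_speech_request(api_key, "Hello, world!")) would mirror the curl command, writing the returned audio to speech.mp3.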
Speech-to-Text ✅
curl https://your-newapi-server-address/v1/audio/transcriptions \
-H "Authorization: Bearer $NEWAPI_API_KEY" \
-H "Content-Type: multipart/form-data" \
-F file="@/path/to/file/audio.mp3" \
-F model="whisper-1"
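For clients without a multipart helper, the form upload that curl -F performs can be approximated by hand. The sketch below uses only the Python standard library; encode_multipart and build_transcription_request are hypothetical helper names, BASE_URL is the placeholder from the curl example, and the file part is labeled application/octet-stream as an assumption.

```python
import urllib.request
import uuid

# Placeholder address taken from the curl example; replace with your server.
BASE_URL = "https://your-newapi-server-address"


def encode_multipart(fields, filename, file_bytes):
    """Encode (name, value) text fields plus one file part.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The audio itself goes in a part named "file", as in the curl example.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(api_key, filename, file_bytes, model="whisper-1"):
    """Build (but do not send) a POST request for /v1/audio/transcriptions."""
    body, content_type = encode_multipart([("model", model)], filename, file_bytes)
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/transcriptions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
```

Passing the resulting request to urllib.request.urlopen would send it; with the default json response format the body can then be decoded with json.loads.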
Response Example (default json format, placeholder text):
{
  "text": "Transcribed text"
}
Audio Translation ✅
curl https://your-newapi-server-address/v1/audio/translations \
-H "Authorization: Bearer $NEWAPI_API_KEY" \
-H "Content-Type: multipart/form-data" \
-F file="@/path/to/file/chinese.mp3" \
-F model="whisper-1"
Response Example (default json format, placeholder text):
{
  "text": "Translated English text"
}
📮 Request
Endpoints
Text-to-Speech
POST /v1/audio/speech
Convert text to speech.
Speech-to-Text
POST /v1/audio/transcriptions
Transcribe audio to text in the input language.
Audio Translation
POST /v1/audio/translations
Translate audio to English text.
Authentication Method
Include the following header in every request to authenticate with your API key:
Authorization: Bearer $NEWAPI_API_KEY
where $NEWAPI_API_KEY is your API key.
Request Body Parameters
Text-to-Speech
model
- Type: String
- Required: Yes
- Allowed values: tts-1, tts-1-hd
- Description: TTS model to use
input
- Type: String
- Required: Yes
- Maximum length: 4096 characters
- Description: Text to convert to speech
voice
- Type: String
- Required: Yes
- Allowed values: alloy, echo, fable, onyx, nova, shimmer
- Description: Voice to use when generating speech
response_format
- Type: String
- Required: No
- Default: mp3
- Allowed values: mp3, opus, aac, flac, wav, pcm
- Description: Audio output format
speed
- Type: Number
- Required: No
- Default: 1.0
- Range: 0.25 - 4.0
- Description: Speed of generated speech
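The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, assuming the allowed values are exactly those documented here (a given server may accept more models or voices) and that an empty input should be rejected; build_tts_payload is a hypothetical helper name.

```python
# Allowed values copied from the parameter lists above; a server may accept more.
TTS_MODELS = {"tts-1", "tts-1-hd"}
TTS_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
TTS_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096


def build_tts_payload(model, text, voice, response_format="mp3", speed=1.0):
    """Validate Text-to-Speech parameters and return the JSON payload as a dict."""
    if model not in TTS_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    # Assumption: a required input must also be non-empty.
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input must contain 1 to {MAX_INPUT_CHARS} characters")
    if voice not in TTS_VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    if response_format not in TTS_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

Failing early with a ValueError avoids spending a request on input that the API would answer with 400 Bad Request.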
Speech-to-Text
file
- Type: File
- Required: Yes
- Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
- Description: Audio file to transcribe
model
- Type: String
- Required: Yes
- Currently only supports: whisper-1
- Description: Model ID to use
language
- Type: String
- Required: No
- Format: ISO-639-1 (e.g., "en")
- Description: Language of the audio; providing it can improve accuracy
prompt
- Type: String
- Required: No
- Description: Text to guide the model's style or continue from a previous audio segment
response_format
- Type: String
- Required: No
- Default: json
- Allowed values: json, text, srt, verbose_json, vtt
- Description: Output format
temperature
- Type: Number
- Required: No
- Default: 0
- Range: 0 - 1
- Description: Sampling temperature; higher values make the output more random
timestamp_granularities
- Type: Array
- Required: No
- Default: segment
- Allowed values: word, segment
- Description: Granularity of transcription timestamps; takes effect only when response_format is verbose_json
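The optional Speech-to-Text parameters travel as extra form fields next to the file. The sketch below assembles them as (name, value) pairs under a few assumptions that are not stated on this page: the array parameter is sent as a repeated timestamp_granularities[] field (the name used in the upstream OpenAI reference), timestamp granularities require response_format verbose_json, and the language check only tests the two-letter shape of the code. build_transcription_fields is a hypothetical helper name.

```python
import re

RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
GRANULARITIES = {"word", "segment"}


def build_transcription_fields(
    model="whisper-1",
    language=None,
    prompt=None,
    response_format="json",
    temperature=None,
    timestamp_granularities=None,
):
    """Return the non-file form fields as a list of (name, value) pairs."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    fields = [("model", model), ("response_format", response_format)]

    if language is not None:
        # Shape check only: two lowercase letters, as in ISO-639-1 codes.
        if not re.fullmatch(r"[a-z]{2}", language):
            raise ValueError("language must be an ISO-639-1 code such as 'en'")
        fields.append(("language", language))

    if prompt is not None:
        fields.append(("prompt", prompt))

    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        fields.append(("temperature", str(temperature)))

    for granularity in timestamp_granularities or []:
        if granularity not in GRANULARITIES:
            raise ValueError(f"unsupported granularity: {granularity!r}")
        if response_format != "verbose_json":
            raise ValueError("timestamp granularities need response_format 'verbose_json'")
        # Assumed wire format: one repeated form field per array value.
        fields.append(("timestamp_granularities[]", granularity))

    return fields
```

Omitted optional parameters are simply left out of the form, so the server applies the defaults listed above.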
Audio Translation
file
- Type: File
- Required: Yes
- Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
- Description: Audio file to translate
model
- Type: String
- Required: Yes
- Currently only supports: whisper-1
- Description: Model ID to use
prompt
- Type: String
- Required: No
- Description: English text to guide the model's style
response_format
- Type: String
- Required: No
- Default: json
- Allowed values: json, text, srt, verbose_json, vtt
- Description: Output format
temperature
- Type: Number
- Required: No
- Default: 0
- Range: 0 - 1
- Description: Sampling temperature; higher values make the output more random
📥 Response
Successful Response
Text-to-Speech
Returns binary audio file content.
Speech-to-Text
Basic JSON Format
{
  "text": "Complete transcribed text"
}
Detailed JSON Format
{
  "task": "transcribe",
  "language": "english",
  "duration": 8.47,
  "text": "Complete transcribed text",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 3.32,
      "text": "Segmented transcribed text",
      "tokens": [50364, 440, 7534],
      "temperature": 0.0,
      "avg_logprob": -0.286,
      "compression_ratio": 1.236,
      "no_speech_prob": 0.009
    }
  ]
}
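To show how the segment fields in the detailed format can be consumed, the sketch below turns the start, end, and text of each segment into SRT-style subtitle blocks. It is only an illustration of reading the structure (the API can return srt directly via response_format); format_timestamp and segments_to_srt are hypothetical helper names.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS,mmm (the SRT timestamp style)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(verbose_response):
    """Build SRT-style text from the segments of a detailed JSON response."""
    blocks = []
    for index, segment in enumerate(verbose_response["segments"], start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    # Subtitle blocks are separated by a blank line.
    return "\n".join(blocks)
```

Applied to the example response above, the single segment becomes one block numbered 1 that spans 00:00:00,000 to 00:00:03,320.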
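Audio Translation
Returns the English translation in the format selected by response_format. With the default json format:
{
  "text": "Translated English text"
}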
Error Response
When a request fails, the API returns an error response object with an HTTP status code in the 4XX-5XX range.
Common Error Status Codes
- 400 Bad Request: Invalid request parameters
- 401 Unauthorized: Invalid API key, or no API key provided
- 429 Too Many Requests: API call limit exceeded
- 500 Internal Server Error: Internal server error
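A client can use these status codes to decide how to react. The sketch below is one possible policy, not something this API prescribes: it assumes that only rate limits (429) and server-side errors (5XX) are worth retrying, and that retries wait with exponential backoff. The names describe_status, should_retry, and backoff_delays are hypothetical.

```python
# Meanings taken from the status code list above.
STATUS_MEANINGS = {
    400: "Invalid request parameters",
    401: "Invalid or missing API key",
    429: "API call limit exceeded",
    500: "Internal server error",
}


def describe_status(status):
    """Return a short explanation for a documented status code."""
    return STATUS_MEANINGS.get(status, f"Unexpected HTTP status {status}")


def should_retry(status):
    """Assumed policy: retry rate limits and server errors, never client errors."""
    return status == 429 or 500 <= status <= 599


def backoff_delays(max_attempts, base_seconds=1.0):
    """Return the wait before each retry, doubling every time."""
    return [base_seconds * (2 ** attempt) for attempt in range(max_attempts)]
```

With three attempts and the default base, backoff_delays yields waits of 1, 2, and 4 seconds; a 400 or 401 response is reported immediately because repeating the same request would fail the same way.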
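Error response example (illustrative values):
{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}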