AI Voice: ElevenLabs & OpenAI TTS API Guide

Transform text into natural-sounding speech with our enterprise-ready Voice Synthesis API. Perfect for developers, content creators, and businesses looking to add professional voice capabilities to their applications. Our API supports leading AI models from ElevenLabs and OpenAI, offering 30+ premium voices, multilingual support, and advanced customization options with enterprise-grade reliability.

info

For detailed information about using models with APIpie, check out our Models Overview and Completions Guide.

🎙️ Voice Synthesis Overview

The Voice Synthesis API allows developers to convert text into high-quality speech using state-of-the-art AI models. The API supports both ElevenLabs and OpenAI voice models, providing:

  • 30+ premium voices across multiple languages and accents
  • Advanced voice customization and style control
  • Real-time voice generation with low latency
  • Enterprise-grade reliability and scalability
  • Comprehensive usage analytics and monitoring
  • Secure API access with rate limiting

🎯 Model Comparison and Capabilities

Model Feature Comparison

| Feature | ElevenLabs | OpenAI TTS |
|---|---|---|
| Voice Quality | High fidelity with emotion control | Professional studio quality |
| Language Support | 30+ languages | Primary focus on English |
| Generation Speed | Variable (Flash to Standard) | Consistently fast |
| Customization | Extensive voice settings | Basic voice selection |
| Cost Efficiency | Pay per character | Pay per character |
| Real-time Generation | Yes (with Flash models) | Yes |
| Voice Cloning | Available | Not available |
| Enterprise Support | Yes | Yes |

ElevenLabs Models

info

Visit our Models Overview for the most up-to-date list of supported voice models and their capabilities.

| Model | Description | Max Tokens | Provider |
|---|---|---|---|
| eleven_multilingual_v2 | Latest multilingual model with enhanced quality | 5000 | elevenlabs |
| eleven_multilingual_v1 | First-generation multilingual model | 5000 | elevenlabs |
| eleven_monolingual_v1 | English-optimized model | 5000 | elevenlabs |
| eleven_turbo_v2 | Fast generation model | 5000 | elevenlabs |
| eleven_turbo_v2_5 | Enhanced turbo model | 5000 | elevenlabs |
| eleven_flash_v2 | Ultra-fast generation | 5000 | elevenlabs |
| eleven_flash_v2_5 | Latest ultra-fast model | 5000 | elevenlabs |

OpenAI Models

| Model | Description | Provider |
|---|---|---|
| tts-1-hd | High-definition voice models | openai |
| tts-1-1106 | Standard voice models | openai |
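If your application routes requests by provider, a small client-side lookup built from the two model tables above can catch a mistyped model name before any request is sent. The sketch below is hypothetical Python: `MODELS`, `ModelInfo`, `provider_for` and `supports_voice_settings` are illustrative names, not part of the APIpie API, and the list should be kept in sync with the Models Overview.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    provider: str
    max_tokens: Optional[int]  # None where the tables list no limit


# Hypothetical registry copied from the model tables on this page.
MODELS = {
    "eleven_multilingual_v2": ModelInfo("elevenlabs", 5000),
    "eleven_multilingual_v1": ModelInfo("elevenlabs", 5000),
    "eleven_monolingual_v1": ModelInfo("elevenlabs", 5000),
    "eleven_turbo_v2": ModelInfo("elevenlabs", 5000),
    "eleven_turbo_v2_5": ModelInfo("elevenlabs", 5000),
    "eleven_flash_v2": ModelInfo("elevenlabs", 5000),
    "eleven_flash_v2_5": ModelInfo("elevenlabs", 5000),
    "tts-1-hd": ModelInfo("openai", None),
    "tts-1-1106": ModelInfo("openai", None),
}


def provider_for(model: str) -> str:
    """Return the provider of a listed model, or raise ValueError."""
    try:
        return MODELS[model].provider
    except KeyError:
        raise ValueError(f"unknown model: {model!r}") from None


def supports_voice_settings(model: str) -> bool:
    """The optional voice settings on this page are documented for ElevenLabs only."""
    return provider_for(model) == "elevenlabs"
```

Keeping the registry on the client is a trade-off: it gives fast, offline validation, but it goes stale when new models are added, so an unknown name should prompt a check of the Models Overview rather than a hard failure in production.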

🗣️ Available Voices

Technical Specifications

| Specification | Details |
|---|---|
| Audio Format | MP3, WAV |
| Sample Rate | 16 kHz to 48 kHz |
| Bit Depth | 16-bit, 24-bit |
| Channels | Mono, Stereo |
| Latency | 200 ms to 2000 ms |
| Max Input Length | 5000 tokens |
| Rate Limiting | Yes (configurable) |

ElevenLabs Voices

tip

Browse more voices in the ElevenLabs Voice Library

Professional Narration

  • Rachel: Young female, American accent, calm tone - ideal for narration
  • Drew: Middle-aged male, American accent - perfect for news reading
  • Antoni: Young male, American accent - well-rounded narrator
  • Thomas: Young male, American accent - calm meditation voice
  • Bill: Older male, American accent - trustworthy narration

Character Voices

  • Clyde: Middle-aged male, American accent - war veteran character
  • Dave: Young male, British-Essex accent - conversational gaming voice
  • Fin: Older male, Irish accent - sailor character
  • Glinda: Middle-aged female, American accent - witch character
  • Charlotte: Young female, Swedish accent - seductive character

News & Media

  • Paul: Middle-aged male, American accent - ground reporter
  • Sarah: Young female, American accent - soft news voice
  • Daniel: Middle-aged male, British accent - authoritative news
  • Alice: Middle-aged female, British accent - confident news
  • Joseph: Middle-aged male, British accent - field reporter

OpenAI Voices

HD Voices

  • Shimmer: Clear and expressive
  • Alloy: Versatile and balanced
  • Echo: Warm and natural
  • Fable: Engaging storyteller
  • Onyx: Deep and authoritative
  • Nova: Bright and energetic
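The example calls on this page pass ElevenLabs voices by display name ("Rachel") and OpenAI voices in lowercase ("shimmer"). A minimal sketch, assuming those casings are what the API expects, can normalize user input and reject a voice/provider mismatch locally, before the API returns an invalid_voice error. `VOICES` and `normalize_voice` are hypothetical names, and the list covers only the voices documented here.

```python
# Voices documented on this page. The ElevenLabs Voice Library has many more,
# so a miss here means "not listed", not necessarily "unavailable".
VOICES = {
    "elevenlabs": [
        "Rachel", "Drew", "Antoni", "Thomas", "Bill",
        "Clyde", "Dave", "Fin", "Glinda", "Charlotte",
        "Paul", "Sarah", "Daniel", "Alice", "Joseph",
    ],
    "openai": ["shimmer", "alloy", "echo", "fable", "onyx", "nova"],
}


def normalize_voice(provider: str, voice: str) -> str:
    """Return the listed spelling of `voice` for `provider` (case-insensitive match).

    Raises ValueError when the voice is not listed for that provider,
    a client-side counterpart of the API's invalid_voice error.
    """
    wanted = voice.strip().lower()
    for candidate in VOICES.get(provider, []):
        if candidate.lower() == wanted:
            return candidate
    raise ValueError(f"voice {voice!r} is not listed for provider {provider!r}")
```

For example, `normalize_voice("openai", "Shimmer")` returns `"shimmer"`, while `normalize_voice("openai", "Rachel")` raises because Rachel is an ElevenLabs voice.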

📝 API Parameters and Configuration

info

For detailed API documentation and integration guides, visit our API Reference.

Required Parameters

  • model: The AI model to use for voice generation
  • voice: The specific voice to use
  • input: The text to convert to speech

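Optional Parameters (ElevenLabs only, passed inside a voice_settings object)

  • stability (0-1): Controls how consistent the delivery is; higher values sound steadier, lower values more expressive
  • similarity_boost (0-1): Controls how closely the output adheres to the original voice
  • style (0-1): Adjusts speaking style intensity
  • use_speaker_boost (boolean): Enhances speaker clarity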
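The three required fields and the optional voice settings combine into a single JSON body, with the settings nested under voice_settings as in the ElevenLabs curl example. The Python sketch below (`build_speech_payload` is a hypothetical helper) enforces the documented 0-1 ranges on the client and omits voice_settings entirely when no setting is given, which keeps OpenAI requests free of ElevenLabs-only fields.

```python
from typing import Optional


def build_speech_payload(
    model: str,
    voice: str,
    text: str,
    *,
    stability: Optional[float] = None,
    similarity_boost: Optional[float] = None,
    style: Optional[float] = None,
    use_speaker_boost: Optional[bool] = None,
) -> dict:
    """Assemble the request body; optional settings go under "voice_settings"."""
    if not text.strip():
        raise ValueError("input must not be empty")
    payload = {"model": model, "voice": voice, "input": text}

    settings = {}
    # Every numeric setting on this page is documented with a 0-1 range.
    for name, value in (
        ("stability", stability),
        ("similarity_boost", similarity_boost),
        ("style", style),
    ):
        if value is None:
            continue
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
        settings[name] = value
    if use_speaker_boost is not None:
        settings["use_speaker_boost"] = bool(use_speaker_boost)

    if settings:  # omit the object entirely when nothing was set
        payload["voice_settings"] = settings
    return payload
```

`build_speech_payload("eleven_multilingual_v2", "Rachel", "Hello!", stability=0.5, similarity_boost=0.75)` produces the same structure as the ElevenLabs curl example.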

💡 Example API Calls

ElevenLabs Example

curl -X POST 'https://apipie.ai/v1/audio/speech' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  --data-raw '{
    "model": "eleven_multilingual_v2",
    "voice": "Rachel",
    "input": "Hello! This is a test of the ElevenLabs text to speech API.",
    "voice_settings": {
      "stability": 0.5,
      "similarity_boost": 0.75
    }
  }'

OpenAI Example

curl -X POST 'https://apipie.ai/v1/audio/speech' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  --data-raw '{
    "model": "tts-1-hd",
    "voice": "shimmer",
    "input": "Hello! This is a test of the OpenAI text to speech API."
  }'
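The same POST request can be sent from Python using only the standard library. This is a sketch, not an official client: `build_request` and `synthesize` are hypothetical names, the endpoint and headers are copied from the curl examples above, and it assumes the endpoint replies with the JSON shape shown under Response Examples.

```python
import json
import urllib.request

# Endpoint copied from the curl examples on this page.
API_URL = "https://apipie.ai/v1/audio/speech"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the same POST request as the curl examples, without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize(api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body.

    Assumes the JSON response shape documented on this page. If the endpoint
    returns raw audio instead, keep resp.read() as bytes and write it to a file.
    HTTP error statuses raise urllib.error.HTTPError, whose .read() returns the
    error body (for example the invalid_voice JSON).
    """
    with urllib.request.urlopen(build_request(api_key, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Call `synthesize(key, payload)` with either curl body as a dict. The 30-second timeout is an arbitrary choice that leaves generous headroom over the 200 ms to 2000 ms latency range listed in the technical specifications.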

📊 Response Examples

ElevenLabs Response

{
  "created": 1729535643,
  "audio": {
    "content_type": "audio/mpeg",
    "url": "https://example.com/generated-audio.mp3"
  },
  "usage": {
    "text_characters": 57,
    "cost": 0.004275,
    "latency_ms": 1200
  }
}

OpenAI Response

{
  "created": 1729535643,
  "audio": {
    "content_type": "audio/mpeg",
    "url": "https://example.com/generated-audio.mp3"
  },
  "usage": {
    "text_characters": 52,
    "cost": 0.003500,
    "latency_ms": 800
  }
}
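Both responses share one shape, so a single parser covers both providers. The sketch below (hypothetical `SpeechResult` and `parse_speech_response`) reads the field names from the sample responses and turns the usage block into a cost per 1,000 characters for monitoring. That rate is derived from whatever numbers come back, not from a published price list.

```python
import json
from dataclasses import dataclass


@dataclass
class SpeechResult:
    audio_url: str
    content_type: str
    characters: int
    cost: float
    latency_ms: int

    @property
    def cost_per_1k_chars(self) -> float:
        """Effective rate implied by the usage block of this one response."""
        return self.cost / self.characters * 1000 if self.characters else 0.0


def parse_speech_response(raw: str) -> SpeechResult:
    """Parse a response body shaped like the samples on this page."""
    data = json.loads(raw)
    if "error" in data:
        err = data["error"]
        raise RuntimeError(f"{err.get('code')}: {err.get('message')}")
    audio, usage = data["audio"], data["usage"]
    return SpeechResult(
        audio_url=audio["url"],
        content_type=audio["content_type"],
        characters=usage["text_characters"],
        cost=usage["cost"],
        latency_ms=usage["latency_ms"],
    )
```

Applied to the ElevenLabs sample, 0.004275 / 57 characters works out to 0.075 per 1,000 characters.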

🎯 Common Use Cases

  1. Content Creation

    • Audiobook production
    • Podcast generation
    • Video narration
    • E-learning content
  2. Entertainment

    • Game character voices
    • Animation dubbing
    • Interactive storytelling
    • Voice-enabled NPCs
  3. Business Applications

    • IVR systems
    • Virtual assistants
    • Customer service
    • Corporate training
  4. Accessibility

    • Screen readers
    • Text-to-speech for visually impaired users
    • Language learning tools
    • Reading assistance

⚡ Best Practices

  1. Model Selection

    • Use multilingual models for content in multiple languages
    • Use turbo/flash models for faster generation
    • Use HD models (tts-1-hd) for the highest-quality output
  2. Voice Selection

    • Choose voices based on the use case
    • Consider which accent and age suit the content
    • Test multiple voices to find the best fit
  3. Text Preparation

    • Use punctuation to control pacing
    • Break long text into natural segments
    • Include phonetic spelling for unusual words
  4. Performance Optimization

    • Cache frequently used audio
    • Implement proper error handling
    • Monitor usage and costs
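Caching pays off because the same model, voice, text and settings should not be billed twice. A minimal in-memory sketch: the cache key is a SHA-256 hash of the canonical JSON payload (sorted keys), so dicts that differ only in key order hit the same entry. `AudioCache`, `cache_key` and the `synthesize` callable are hypothetical names. Whether generated audio URLs expire is not stated on this page, so for long-lived caches it is safer to store the downloaded audio bytes rather than the URL.

```python
import hashlib
import json
from typing import Callable, Dict


def cache_key(payload: dict) -> str:
    """Stable key: identical requests serialize identically regardless of key order."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class AudioCache:
    """In-memory cache in front of any function that turns a payload into audio."""

    def __init__(self, synthesize: Callable[[dict], object]):
        self._synthesize = synthesize
        self._store: Dict[str, object] = {}
        self.misses = 0  # number of real (billable) synthesis calls

    def get(self, payload: dict) -> object:
        key = cache_key(payload)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._synthesize(payload)
        return self._store[key]
```

Counting misses also gives a cheap input for the "monitor usage and costs" practice, since each miss corresponds to one billed request.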

⚠️ Error Handling

Common errors and solutions:

{
  "error": {
    "code": "invalid_voice",
    "message": "The specified voice is not available for this model."
  }
}

Solution: Verify that the voice is compatible with the chosen model.

{
  "error": {
    "code": "text_too_long",
    "message": "Input text exceeds maximum length for selected model."
  }
}

Solution: Break text into smaller segments.
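The two errors call for different reactions: invalid_voice needs a fix to the request, while text_too_long can be recovered automatically by splitting the text. A minimal sketch of that policy: `split_text` packs whole sentences into segments of at most `max_chars` characters, and `synthesize_all` retries once per segment. `send`, `VoiceError` and both helpers are hypothetical names; characters are used as a conservative stand-in for the documented 5000-token limit, which is an assumption worth tuning.

```python
import re
from typing import Callable, List


class VoiceError(Exception):
    """Carries the `code` field of an error body like the ones above."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


def split_text(text: str, max_chars: int = 5000) -> List[str]:
    """Pack sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is hard-split, which may cut a word.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        while len(sentence) > max_chars:  # oversized sentence: flush, then hard-split
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments


def synthesize_all(text: str, send: Callable[[str], dict], max_chars: int = 5000) -> List[dict]:
    """Send text; on text_too_long, resend it segment by segment."""
    body = send(text)
    error = body.get("error")
    if not error:
        return [body]
    if error.get("code") != "text_too_long":
        # e.g. invalid_voice: retrying cannot help, so surface it to the caller
        raise VoiceError(error.get("code"), error.get("message"))
    results = []
    for segment in split_text(text, max_chars):
        part = send(segment)
        if "error" in part:
            raise VoiceError(part["error"].get("code"), part["error"].get("message"))
        results.append(part)
    return results
```

Splitting on sentence boundaries follows the "break long text into natural segments" advice, so the pauses between the resulting audio clips fall where a reader would pause anyway.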

🔒 Security and Ethics

  • Voice generation requires responsible use
  • Implement appropriate content filtering
  • Monitor for potential misuse
  • Secure API access and authentication
  • Respect voice rights and permissions

📚 Additional Resources