Overview

Tabbly TTS provides a streaming Text-to-Speech API that generates high-quality voice audio in real time for voice AI applications. Audio is streamed as it is generated, which keeps latency low.

Base URL

https://api.tabbly.io

Endpoint

POST /tts/stream

Authentication

All requests require an API key passed via the X-API-Key header.
  • X-API-Key (string, required): Your Tabbly TTS API key

Request

  • text (string, required): The text to convert to speech
  • voice_id (string, default: "Ashley"): Voice ID to use for synthesis
  • model_id (string, default: "tabbly-tts"): Model ID to use

Response

  • Content-Type: audio/wav or application/octet-stream
  • Format: LINEAR16 PCM, 48 kHz, mono
  • Streaming: Yes, audio is streamed as it is generated
  • Protocol: HTTP streaming with WAV-encoded audio chunks embedded in the stream

Example Request

curl -X POST 'https://api.tabbly.io/tts/stream' \
-H 'Content-Type: application/json' \
-H 'X-API-Key: your-api-key-here' \
-d '{
    "text": "Hello, this is a test of the Tabbly TTS streaming API",
    "voice_id": "Ashley",
    "model_id": "tabbly-tts"
}'
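
The same request can be built in Python. The following is a minimal sketch using only the standard library; the helper names build_tts_request and stream_to_file are illustrative and not part of the Tabbly API. Because the stream may contain embedded WAV headers, the bytes written by stream_to_file are the raw response, not necessarily a single valid WAV file.

```python
import json
import urllib.request

TTS_URL = "https://api.tabbly.io/tts/stream"


def build_tts_request(text, api_key, voice_id="Ashley", model_id="tabbly-tts"):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {"text": text, "voice_id": voice_id, "model_id": model_id}
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def stream_to_file(request, path, read_size=4096, timeout=30):
    """Send the request and write raw response bytes to `path` as they arrive."""
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(path, "wb") as out:
        while True:
            chunk = response.read(read_size)
            if not chunk:  # empty read means the server closed the stream
                break
            out.write(chunk)
```

A call such as stream_to_file(build_tts_request("Hello", "your-api-key-here"), "raw_response.bin") would perform the request; the read size and timeout are arbitrary choices.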

Audio Format

  • Sample Rate (integer): 48000 Hz (fixed)
  • Channels (integer): 1 (mono)
  • Bit Depth (integer): 16-bit
  • Format (string): LINEAR16 PCM
  • MIME Type (string): audio/wav (stream may contain embedded WAV files)
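
These parameters fix the byte rate of the PCM audio at 48000 samples/s × 2 bytes × 1 channel = 96000 bytes per second. A small sketch of the arithmetic follows; the helper names are illustrative only.

```python
SAMPLE_RATE_HZ = 48_000   # fixed by the API
CHANNELS = 1              # mono
BYTES_PER_SAMPLE = 2      # 16-bit samples

BYTES_PER_SECOND = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE


def bytes_for_duration(milliseconds):
    """Number of PCM bytes that hold `milliseconds` of audio."""
    return BYTES_PER_SECOND * milliseconds // 1000


def duration_ms(num_bytes):
    """Playback duration, in milliseconds, of `num_bytes` of PCM audio."""
    return num_bytes * 1000 / BYTES_PER_SECOND
```

For example, a 10 ms frame is 960 bytes and a 20 ms frame is 1920 bytes.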

WAV Header Processing

The response may include WAV files embedded in the stream. When processing the stream:
  1. Detect WAV Headers: Look for RIFF and WAVE markers
  2. Extract PCM Data: Find the data chunk and extract raw PCM audio
  3. Handle Multiple WAV Files: The stream may contain multiple WAV files
  4. Process Audio Chunks: Handle audio data as it arrives for real-time playback

WAV File Structure

RIFF (4 bytes) - "RIFF"
File Size (4 bytes)
WAVE (4 bytes) - "WAVE"
... (format chunk, etc.)
data (4 bytes) - "data"
Data Size (4 bytes)
[PCM Audio Data]
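
Given this layout, the PCM data can be located by walking the chunks that follow the WAVE marker until the data chunk is found. The sketch below assumes standard RIFF conventions (little-endian 32-bit sizes, chunks padded to an even length); find_pcm_data is a hypothetical helper name.

```python
import struct


def find_pcm_data(buf, start=0):
    """Return (data_offset, data_size) for the WAV beginning at `start`.

    Returns None when the buffer does not yet contain the full header.
    """
    if len(buf) < start + 12:
        return None
    if buf[start:start + 4] != b"RIFF" or buf[start + 8:start + 12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE header")
    pos = start + 12  # first chunk after the 12-byte RIFF/WAVE preamble
    while pos + 8 <= len(buf):
        chunk_id = bytes(buf[pos:pos + 4])
        (chunk_size,) = struct.unpack_from("<I", buf, pos + 4)
        if chunk_id == b"data":
            return pos + 8, chunk_size
        # Skip this chunk; RIFF chunks are padded to an even number of bytes
        pos += 8 + chunk_size + (chunk_size & 1)
    return None
```

Walking the chunks rather than assuming a fixed offset keeps the parser correct when extra chunks precede the audio data.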

Processing Tips

  • Skip Headers: The WAV header is typically the first 44 bytes, but extra chunks can make it longer
  • Extract Data Chunk: Look for data marker to find audio start
  • Handle Multiple Files: Stream may contain multiple WAV files sequentially
  • Frame Alignment: Ensure chunks are aligned to 16-bit sample boundaries (even bytes)
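
The tips above can be combined into an incremental extractor that accepts network chunks of any size. This is a sketch, not a reference implementation: the class name WavStreamExtractor is hypothetical, and it assumes the stream is a plain concatenation of WAV files whose data chunks declare accurate sizes.

```python
import struct


class WavStreamExtractor:
    """Incrementally strip WAV headers from a byte stream and return raw PCM."""

    def __init__(self):
        self._buf = bytearray()  # bytes received but not yet consumed
        self._remaining = 0      # PCM bytes still expected in the current data chunk
        self._carry = b""        # odd trailing byte held back for sample alignment

    def _parse_header(self):
        """Return (data_offset, data_size), or None if the header is incomplete."""
        buf = self._buf
        if len(buf) < 12:
            return None
        if buf[0:4] != b"RIFF" or buf[8:12] != b"WAVE":
            raise ValueError("expected a RIFF/WAVE header")
        pos = 12
        while pos + 8 <= len(buf):
            (size,) = struct.unpack_from("<I", buf, pos + 4)
            if buf[pos:pos + 4] == b"data":
                return pos + 8, size
            pos += 8 + size + (size & 1)  # skip non-data chunk (even-padded)
        return None

    def feed(self, chunk):
        """Add a network chunk; return any PCM that is now available."""
        self._buf.extend(chunk)
        pcm = bytearray(self._carry)
        while self._buf:
            if self._remaining == 0:
                header = self._parse_header()
                if header is None:
                    break  # incomplete header: keep it buffered until more arrives
                data_start, self._remaining = header
                del self._buf[:data_start]
                continue
            take = min(self._remaining, len(self._buf))
            pcm += self._buf[:take]
            del self._buf[:take]
            self._remaining -= take
        # Return only whole 16-bit samples; hold back an odd trailing byte
        if len(pcm) % 2:
            self._carry = bytes(pcm[-1:])
            del pcm[-1:]
        else:
            self._carry = b""
        return bytes(pcm)
```

Each call to feed returns an even number of bytes, so the output can be passed directly to a 16-bit playback buffer.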

Error Responses

  • 400 (object): Bad Request - Invalid parameters or missing required fields
  • 401 (object): Unauthorized - Invalid or missing API key
  • 402 (object): Payment Required - Insufficient wallet balance
  • 500 (object): Server error
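
One way to handle these codes on the client is to map each status to an exception. The sketch below is illustrative: TabblyTTSError and check_status are hypothetical names, and the retryable flag reflects an assumption (only server-side errors are worth retrying), not documented API behavior. The fields of the error object are not specified here, so the body is not parsed.

```python
class TabblyTTSError(Exception):
    """Raised for any non-200 response from the TTS endpoint."""

    def __init__(self, status, message, retryable):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.retryable = retryable


# Status code -> (description from the table above, assumed retryability)
_ERRORS = {
    400: ("Bad Request: invalid parameters or missing required fields", False),
    401: ("Unauthorized: invalid or missing API key", False),
    402: ("Payment Required: insufficient wallet balance", False),
    500: ("Server error", True),
}


def check_status(status):
    """Return normally for 200; raise TabblyTTSError for anything else."""
    if status == 200:
        return
    # Unlisted codes: treat 5xx as retryable, everything else as permanent
    message, retryable = _ERRORS.get(status, ("Unexpected status", status >= 500))
    raise TabblyTTSError(status, message, retryable)
```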

Rate Limits

API rate limits apply to prevent abuse. Contact support if you need higher limits.

Best Practices

Process audio chunks as they arrive rather than waiting for the complete response. This reduces latency and enables real-time playback.
The API may send WAV files embedded in the stream. Always extract PCM data from WAV chunks for proper playback. See WAV Header Processing above.
Always handle HTTP errors and network timeouts gracefully. Implement retry logic for transient failures.
Choose appropriate voice_id based on your use case. Different voices may have different characteristics and languages.
For very long texts, consider splitting into smaller chunks for better streaming performance and lower latency.
Reuse HTTP client connections when making multiple requests to improve performance and reduce connection overhead.
For real-time playback, implement buffering (10-20ms) to smooth out network jitter and prevent audio artifacts.
Ensure audio chunks are aligned to 16-bit sample boundaries (even number of bytes) to prevent audio clicks or pops.
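
For the advice on long texts, one possible approach is to split on sentence boundaries and send each piece as its own request. The sketch below is a heuristic: split_text is a hypothetical helper, the 300-character default is an arbitrary assumption rather than an API limit, and a single sentence longer than the limit is hard-wrapped, which may cut a word.

```python
import re


def split_text(text, max_chars=300):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate      # sentence still fits in the current chunk
        else:
            chunks.append(current)   # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```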

Troubleshooting

No Audio Output

  • Verify API key is correct and has sufficient wallet balance
  • Check network connectivity to https://api.tabbly.io
  • Review response status code (should be 200)
  • Verify WAV headers are being detected and processed correctly
  • Check logs for HTTP connection errors

Audio Quality Issues

  • Ensure sample rate matches (48000 Hz)
  • Verify WAV header parsing is working correctly
  • Check audio data format (should be LINEAR16 PCM)
  • Ensure frame alignment (even number of bytes per chunk)
  • Check for proper PCM extraction from WAV files
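
When debugging quality problems, it can help to wrap the extracted PCM in a single WAV container and listen to it in an ordinary audio player. A minimal sketch using Python's standard wave module follows; pcm_to_wav_bytes is a hypothetical helper, with defaults taken from the Audio Format section.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, sample_rate=48_000, channels=1, sample_width=2):
    """Wrap raw LINEAR16 PCM in a WAV container and return the file bytes."""
    if len(pcm) % (channels * sample_width):
        # A partial frame usually means PCM extraction or alignment went wrong
        raise ValueError("PCM length is not a whole number of frames")
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return out.getvalue()
```

If the resulting file plays at the wrong speed or contains periodic clicks, the sample rate assumption or the header stripping is the likely culprit.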

Performance Issues

  • Reuse HTTP client instances for better performance
  • Monitor API response times
  • Consider caching for repeated text
  • Implement proper buffering for real-time playback
  • Check network latency to API endpoint
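
Caching for repeated text can be sketched as an in-memory map keyed by the request parameters. Everything here is illustrative: AudioCache is a hypothetical class, the synthesize callable stands in for whatever function performs the API request and returns the audio bytes, and the cache is unbounded, so a real deployment would likely add an eviction policy.

```python
import hashlib
import json


class AudioCache:
    """In-memory cache of synthesized audio keyed by (text, voice_id, model_id)."""

    def __init__(self, synthesize):
        # Hypothetical callable: (text, voice_id, model_id) -> audio bytes
        self._synthesize = synthesize
        self._store = {}

    @staticmethod
    def _key(text, voice_id, model_id):
        raw = json.dumps([text, voice_id, model_id]).encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get(self, text, voice_id="Ashley", model_id="tabbly-tts"):
        """Return cached audio, calling the synthesizer only on a miss."""
        key = self._key(text, voice_id, model_id)
        if key not in self._store:
            self._store[key] = self._synthesize(text, voice_id, model_id)
        return self._store[key]
```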

WAV Processing Issues

  • Verify WAV headers are being detected (RIFF and WAVE markers)
  • Check if data chunk is being found correctly
  • Ensure multiple WAV files in stream are handled properly
  • Verify PCM data extraction is working
  • Check for incomplete WAV files (keep in buffer until complete)

Integration Examples

Real-Time Playback

For real-time playback, implement buffering to smooth out network jitter:
import asyncio
import httpx

async def stream_with_buffering(text: str, api_key: str, output_queue: asyncio.Queue) -> None:
    """Stream TTS audio and push fixed-size 10 ms frames to output_queue."""
    url = "https://api.tabbly.io/tts/stream"
    headers = {"Content-Type": "application/json", "X-API-Key": api_key}
    data = {"text": text, "voice_id": "Ashley", "model_id": "tabbly-tts"}

    buffer = bytearray()
    # 10 ms of 48 kHz, 16-bit mono audio: 480 samples x 2 bytes
    CHUNK_SIZE = 960

    async with httpx.AsyncClient() as client:
        async with client.stream("POST", url, json=data, headers=headers) as response:
            response.raise_for_status()

            async for chunk in response.aiter_bytes():
                buffer.extend(chunk)
                # Process WAV headers and extract PCM here. As written,
                # header bytes are forwarded along with the audio.
                while len(buffer) >= CHUNK_SIZE:
                    # Copy the frame out, then drop it from the buffer in place
                    await output_queue.put(bytes(buffer[:CHUNK_SIZE]))
                    del buffer[:CHUNK_SIZE]

    # Flush the final partial frame, trimmed to whole 16-bit samples
    tail = len(buffer) - (len(buffer) % 2)
    if tail:
        await output_queue.put(bytes(buffer[:tail]))
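
On the consuming side, a small pre-buffer gives playback a head start over the network. The sketch below is one possible consumer for the queue filled above. It assumes the producer puts None on the queue once the stream ends, which is a hypothetical convention that the example above does not implement, and prebuffered is an illustrative name.

```python
import asyncio


async def prebuffered(queue, prebuffer_frames=2):
    """Yield frames from `queue`, waiting until `prebuffer_frames` have arrived.

    With 10 ms frames, prebuffer_frames=2 corresponds to 20 ms of buffering.
    """
    pending = []
    primed = False
    while True:
        frame = await queue.get()
        if frame is None:  # assumed end-of-stream sentinel (hypothetical)
            break
        if primed:
            yield frame
            continue
        pending.append(frame)
        if len(pending) >= prebuffer_frames:
            primed = True
            for held in pending:
                yield held
            pending.clear()
    # Stream ended before the pre-buffer filled: release whatever was held
    for held in pending:
        yield held
```

A playback loop could then use async for frame in prebuffered(queue) and write each frame to the audio device.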

Next Steps

  • Learn how to integrate with LiveKit: LiveKit Integration
  • Review best practices: Best Practices
  • Get your API key from the Tabbly dashboard
  • Review example implementations in the code samples above