Text-to-Speech
Have your AI assistant speak responses aloud with 4 TTS providers, 5 verbosity levels, auto-speak, and per-message playback controls.
Overview
The TTS system reads AI responses aloud through your speakers. You can click the speaker icon on any assistant message for on-demand playback, or enable auto-speak to have every response spoken automatically. Verbosity levels let you control how much of each response gets read, from just the first sentence to the full text including code.
- 4 TTS providers
- 5 verbosity levels
- 6-8 voices per provider
- 0.5x-2.0x speech rate range
TTS Providers
Four providers are available, ranging from free offline synthesis to premium cloud voices.
Windows SAPI
Built-in Windows speech synthesis. No API key required, no internet connection needed. Quality varies by installed Windows voice packs.
Setup: Works out of the box on Windows. Install additional voice packs via Windows Settings > Time & Language > Speech.
ElevenLabs
High-quality neural voices with natural intonation. Supports voice cloning and custom voice creation. Best for dialogue and narrative content.
Setup: Get an API key from elevenlabs.io. Paste it in Settings > API Keys > ElevenLabs.
OpenAI TTS
Six distinct voices (alloy, echo, fable, onyx, nova, shimmer) with consistent quality. Uses the same API key as OpenAI LLM services.
Setup: Get an API key from platform.openai.com. Paste it in Settings > API Keys > OpenAI TTS.
Azure Cognitive Services
Enterprise-grade speech synthesis with SSML support. Large voice library across many languages and styles.
Setup: Create a Speech resource in the Azure Portal. Enter both the API key and region in Settings > API Keys > Azure TTS.
Verbosity Levels
Control how much of each AI response gets spoken aloud. Useful for keeping audio concise during rapid iteration.
| Level | What Gets Spoken | Best For |
|---|---|---|
| Brief | First sentence of the response only | Quick confirmations, rapid workflows |
| Actions | Tool call names and result summaries (e.g., "Created Blueprint PlayerShip") | Monitoring what the AI is doing without full explanations |
| Plan Briefs | Phase names, task lists, and plan summaries only | Following along with project planning sessions |
| Content | Full text minus code blocks and JSON | General use: hear explanations without code being read aloud |
| Full | Everything including code | Accessibility, hands-free workflows |
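As a rough illustration of how three of these levels could be derived from a response, here is a minimal Python sketch. The `Verbosity` enum, the `text_to_speak` function, and the regex-based sentence detection are assumptions made for illustration, not the application's actual implementation. Actions and Plan Briefs are omitted because they depend on structured tool-call and plan data, and JSON stripping for Content is not shown. The sketch also assumes Brief skips code before picking the first sentence.

```python
import re
from enum import Enum


class Verbosity(Enum):
    """Hypothetical names for three of the five levels in the table."""
    BRIEF = "brief"
    CONTENT = "content"
    FULL = "full"


# Matches fenced code blocks, including the fences themselves.
_CODE_FENCE = re.compile(r"```.*?```", re.DOTALL)


def text_to_speak(response: str, level: Verbosity) -> str:
    """Return the part of `response` that would be spoken at `level`."""
    if level is Verbosity.FULL:
        return response.strip()

    # Assumption: both remaining levels drop fenced code first.
    prose = _CODE_FENCE.sub(" ", response)
    prose = re.sub(r"\s+", " ", prose).strip()
    if level is Verbosity.CONTENT:
        return prose

    # BRIEF: keep text up to the first sentence-ending punctuation mark.
    match = re.search(r"[.!?](?=\s|$)", prose)
    return prose[: match.end()] if match else prose
```

The sentence detection here is deliberately naive (it would split on abbreviations such as "e.g."); a real implementation would likely need something more careful.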
Per-Message Playback
Every assistant message displays a speaker icon button. Click it to hear that specific message spoken aloud. The button shows three states:
- Idle (gray speaker): Click to start speaking
- Generating (amber spinner): Audio is being generated by the TTS provider
- Playing (blue waves): Audio is playing; click again to stop
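The three states can be read as a small state machine. The sketch below is one way to model it in Python; the event names (`click`, `audio_ready`, `finished`, `error`) and the `next_state` helper are hypothetical and only illustrate the transitions described on this page, including cancelling while audio is still being generated.

```python
from enum import Enum


class ButtonState(Enum):
    IDLE = "idle"              # gray speaker
    GENERATING = "generating"  # amber spinner
    PLAYING = "playing"        # blue waves


# (current state, event) -> next state. Event names are assumptions.
TRANSITIONS = {
    (ButtonState.IDLE, "click"): ButtonState.GENERATING,
    (ButtonState.GENERATING, "audio_ready"): ButtonState.PLAYING,
    (ButtonState.GENERATING, "click"): ButtonState.IDLE,  # cancel
    (ButtonState.GENERATING, "error"): ButtonState.IDLE,
    (ButtonState.PLAYING, "click"): ButtonState.IDLE,     # stop
    (ButtonState.PLAYING, "finished"): ButtonState.IDLE,
}


def next_state(state: ButtonState, event: str) -> ButtonState:
    """Return the next state, or stay put for events that do not apply."""
    return TRANSITIONS.get((state, event), state)
```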
Auto-Speak
When enabled, every new assistant response is automatically spoken aloud.
- Queuing: Responses that arrive while another is still playing are queued and played in order
- Interrupt on typing: Start typing in the chat input to immediately stop speech and clear the queue
- Q&A questions: When the AI asks a question with answer options, the question text is auto-spoken
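The queuing and interrupt behavior above can be sketched with a simple FIFO queue. The `SpeechQueue` class and its method names are hypothetical; the sketch tracks only which text is current and which is pending, and leaves actual audio playback out.

```python
from collections import deque
from typing import Deque, Optional


class SpeechQueue:
    """Hypothetical model of auto-speak queuing (no real audio involved)."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self.current: Optional[str] = None

    def enqueue(self, text: str) -> None:
        # Speak immediately if nothing is playing, otherwise wait in line.
        if self.current is None:
            self.current = text
        else:
            self._pending.append(text)

    def on_playback_finished(self) -> None:
        # Advance to the next queued response, if there is one.
        self.current = self._pending.popleft() if self._pending else None

    def on_user_typing(self) -> None:
        # Typing stops speech immediately and clears the queue.
        self._pending.clear()
        self.current = None

    def pending_count(self) -> int:
        return len(self._pending)
```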
Voice & Rate Settings
Fine-tune the speech output to your preference.
- Voice selection: Each provider offers 6-8 voices. Select your preferred voice in the TTS settings dropdown.
- Speech rate: Adjustable from 0.5x (half speed) to 2.0x (double speed). Default is 1.0x.
- Volume: 0% to 100%. Independent of your system volume.
- Fallback provider: If the primary provider fails (for example, an API error or rate limit), TTS automatically switches to the fallback provider you configured. SAPI is recommended as the fallback because it works offline.
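To make the ranges and the fallback rule concrete, here is a minimal Python sketch. `VoiceSettings`, `clamp`, and `synthesize_with_fallback` are hypothetical names, providers are modeled as plain callables that return audio bytes, and the 100% default volume is an assumption (only the 1.0x default rate is stated above).

```python
from dataclasses import dataclass
from typing import Callable


def clamp(value: float, low: float, high: float) -> float:
    """Limit `value` to the inclusive range [low, high]."""
    return max(low, min(high, value))


@dataclass
class VoiceSettings:
    rate: float = 1.0              # 0.5x to 2.0x, default 1.0x
    volume_percent: float = 100.0  # 0 to 100; default is an assumption

    def __post_init__(self) -> None:
        # Out-of-range values are pulled back into the documented ranges.
        self.rate = clamp(self.rate, 0.5, 2.0)
        self.volume_percent = clamp(self.volume_percent, 0.0, 100.0)


# A provider is modeled as a function from text to audio bytes.
Synthesizer = Callable[[str], bytes]


def synthesize_with_fallback(
    text: str, primary: Synthesizer, fallback: Synthesizer
) -> bytes:
    """Try the primary provider; on any failure, use the fallback."""
    try:
        return primary(text)
    except Exception:
        # For example an API error or rate limit from a cloud provider.
        return fallback(text)
```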
Speaking Animations
Visual feedback while TTS is playing helps you see which message is being spoken.
- Pulsing glow: The speaking message's border oscillates with the theme accent color
- Waveform bar: A 5-bar audio visualizer driven by playback amplitude
- Speaker rings: Expanding concentric circles on the speaker icon during playback
- Avatar pulse (Unity): Manny's avatar scales subtly in sync with speech
Themed Thinking Text
While the AI is processing your request, a themed status message is displayed instead of the default "Thinking..." text. Each engine has its own theme:
Circuit Theme (UE5)
Circuitry and data-flow phrases:
- "Routing signals..."
- "Traversing node graph..."
- "Compiling blueprints..."
- "Synchronizing clock cycles..."
Manny Theme (Unity)
Noir / Grim Fandango phrases:
- "Consulting the case files..."
- "Skulking through the shadows..."
- "Rattling some bones..."
- "Consulting the Department of Death..."
Toggle themed thinking text on or off in Settings > TTS > Themed Thinking.
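A sketch of how phrase selection might work, assuming a phrase is picked at random from the active engine's theme and the default text is used when the toggle is off. The `thinking_text` function and the `"ue5"` / `"unity"` keys are hypothetical; the phrases are the ones listed above.

```python
import random

DEFAULT_TEXT = "Thinking..."

THINKING_PHRASES = {
    "ue5": [  # Circuit theme
        "Routing signals...",
        "Traversing node graph...",
        "Compiling blueprints...",
        "Synchronizing clock cycles...",
    ],
    "unity": [  # Manny theme
        "Consulting the case files...",
        "Skulking through the shadows...",
        "Rattling some bones...",
        "Consulting the Department of Death...",
    ],
}


def thinking_text(engine: str, themed: bool = True) -> str:
    """Return a status message for `engine`, themed or default."""
    phrases = THINKING_PHRASES.get(engine)
    if not themed or not phrases:
        return DEFAULT_TEXT
    return random.choice(phrases)
```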
Troubleshooting
No sound from SAPI provider
Ensure your system volume is not muted and a default audio output device is selected. On Windows, check that at least one voice pack is installed in Settings > Time & Language > Speech. Try the "Test Voice" button in the TTS settings panel.
Cloud provider returns an error
Verify that your API key is correct and that your account has available credits. Check that the selected voice ID is valid for your provider tier. The ElevenLabs free tier has a monthly character limit. If the primary provider fails, the fallback provider is used automatically.
Audio cuts out during long responses
Long responses may hit the provider's maximum character limit. Try setting the verbosity to "Brief" or "Content" to reduce the text sent to the TTS provider. SAPI has no character limit but may sound less natural on very long text.
Speaker button stuck on "Generating"
The TTS provider may be experiencing high latency. Click the speaker button again to cancel and retry. Check your internet connection for cloud providers. Switch to SAPI as a fallback for instant offline generation.