Text-to-Speech

Have your AI assistant speak responses aloud with 4 TTS providers, 5 verbosity levels, auto-speak, and per-message playback controls.

Overview

The TTS system reads AI responses aloud through your speakers. You can click the speaker icon on any assistant message for on-demand playback, or enable auto-speak to have every response spoken automatically. Verbosity levels let you control how much of each response gets read — from just the first sentence to the full text including code.

At a glance: 4 TTS providers, 5 verbosity levels, 6-8 voices per provider, 0.5-2x speech rate range.

TTS Providers

Four providers are available, ranging from free offline synthesis to premium cloud voices.

Windows SAPI (offline)

Built-in Windows speech synthesis. No API key required, no internet connection needed. Quality varies by installed Windows voice packs.

Setup: Works out of the box on Windows. Install additional voice packs via Windows Settings > Time & Language > Speech.

ElevenLabs (cloud)

High-quality neural voices with natural intonation. Supports voice cloning and custom voice creation. Best for dialogue and narrative content.

Setup: Get an API key from elevenlabs.io. Paste it in Settings > API Keys > ElevenLabs.

OpenAI TTS (cloud)

Six distinct voices (alloy, echo, fable, onyx, nova, shimmer) with consistent quality. Uses the same API key as OpenAI LLM services.

Setup: Get an API key from platform.openai.com. Paste it in Settings > API Keys > OpenAI TTS.

Azure Cognitive Services (cloud)

Enterprise-grade speech synthesis with SSML support. Large voice library across many languages and styles.

Setup: Create a Speech resource in the Azure Portal. Enter both the API key and region in Settings > API Keys > Azure TTS.

Verbosity Levels

Control how much of each AI response gets spoken aloud. Useful for keeping audio concise during rapid iteration.

Level	What Gets Spoken	Best For
Brief	First sentence of the response only	Quick confirmations, rapid workflows
Actions	Tool call names and result summaries (e.g., "Created Blueprint PlayerShip")	Monitoring what the AI is doing without full explanations
Plan Briefs	Phase names, task lists, and plan summaries only	Following along with project planning sessions
Content	Full text minus code blocks and JSON	General use — hear explanations without code being read aloud
Full	Everything including code	Accessibility, hands-free workflows
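
To make the table concrete, here is a minimal sketch of how the Brief, Content, and Full levels could be derived from a response. It is an illustration, not the application's actual code: the names Verbosity and text_to_speak are hypothetical, code is assumed to arrive in Markdown-style fences, and only fenced JSON is stripped. Actions and Plan Briefs are left out because they depend on tool-call and plan metadata that plain text does not carry.

```python
import re
from enum import Enum


class Verbosity(Enum):
    BRIEF = "brief"
    CONTENT = "content"
    FULL = "full"


# Assumption: code appears in Markdown-style fences (three backticks).
# The fence string is built programmatically so this example can sit in a fence itself.
FENCE = "`" * 3
CODE_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def text_to_speak(response: str, level: Verbosity) -> str:
    """Return the part of a response that would be sent to the TTS provider."""
    if level is Verbosity.FULL:
        return response.strip()

    # Drop fenced code (including fenced JSON) and collapse leftover whitespace.
    prose = CODE_BLOCK.sub(" ", response)
    prose = re.sub(r"\s+", " ", prose).strip()
    if level is Verbosity.CONTENT:
        return prose

    # BRIEF: keep only the first sentence of the remaining prose.
    return SENTENCE_END.split(prose, maxsplit=1)[0]
```

Stripping code before taking the first sentence is a design choice in this sketch; it keeps Brief from reading out a code block when a response happens to open with one.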

Per-Message Playback

Every assistant message displays a speaker icon button. Click it to hear that specific message spoken aloud. The button shows three states:

Idle (gray speaker): Click to start speaking
Generating (amber spinner): Audio is being generated by the TTS provider
Playing (blue waves): Audio is playing — click again to stop
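
The three states above behave like a small state machine. The sketch below is one way to model it, under the assumption that clicking during generation cancels the request (as described under Troubleshooting). The names ButtonState, Event, and next_state are hypothetical, and the cached-audio shortcut is ignored for simplicity.

```python
from enum import Enum, auto


class ButtonState(Enum):
    IDLE = auto()        # gray speaker
    GENERATING = auto()  # amber spinner
    PLAYING = auto()     # blue waves


class Event(Enum):
    CLICK = auto()
    AUDIO_READY = auto()
    PLAYBACK_FINISHED = auto()


# (current state, event) -> next state. Pairs not listed leave the state unchanged.
TRANSITIONS = {
    (ButtonState.IDLE, Event.CLICK): ButtonState.GENERATING,
    (ButtonState.GENERATING, Event.AUDIO_READY): ButtonState.PLAYING,
    (ButtonState.GENERATING, Event.CLICK): ButtonState.IDLE,  # cancel
    (ButtonState.PLAYING, Event.CLICK): ButtonState.IDLE,     # stop
    (ButtonState.PLAYING, Event.PLAYBACK_FINISHED): ButtonState.IDLE,
}


def next_state(state: ButtonState, event: Event) -> ButtonState:
    """Look up the next button state, ignoring events that do not apply."""
    return TRANSITIONS.get((state, event), state)
```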

Audio caching: Once a message has been spoken, the audio is cached in memory. Clicking the speaker icon again plays the cached audio instantly without re-generating.
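
As a rough illustration of the caching behavior, the sketch below keeps generated audio in a dictionary keyed by message ID and calls the provider only on a cache miss. AudioCache and the synthesize callable are hypothetical names, and the real cache may use a different key (for example, one that also includes the voice and rate).

```python
from typing import Callable, Dict


class AudioCache:
    """In-memory cache: each message is synthesized at most once."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        self._synthesize = synthesize  # hypothetical provider call: text -> audio bytes
        self._audio: Dict[str, bytes] = {}

    def get_audio(self, message_id: str, text: str) -> bytes:
        # Generate only on a miss; repeat clicks reuse the stored bytes.
        if message_id not in self._audio:
            self._audio[message_id] = self._synthesize(text)
        return self._audio[message_id]
```

Because the cache lives in memory, it would be empty again after the application restarts.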

Auto-Speak

When enabled, every new assistant response is automatically spoken aloud. Messages are queued if they arrive while another is still playing.

Queuing: Rapid responses are queued and played sequentially
Interrupt on typing: Start typing in the chat input to immediately stop speech and clear the queue
Q&A questions: When the AI asks a question with answer options, the question text is auto-spoken
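
The queuing and interrupt rules above can be sketched as a small class. This is an assumption-laden illustration rather than the shipped implementation: AutoSpeakQueue, play, and stop are hypothetical names, and playback is modeled as callbacks instead of real audio.

```python
from collections import deque
from typing import Callable, Deque, Optional


class AutoSpeakQueue:
    """Plays responses one at a time; typing stops speech and clears the queue."""

    def __init__(self, play: Callable[[str], None], stop: Callable[[], None]) -> None:
        self._play = play  # hypothetical: start speaking a text
        self._stop = stop  # hypothetical: stop the current playback
        self._pending: Deque[str] = deque()
        self._current: Optional[str] = None

    def on_response(self, text: str) -> None:
        # New responses wait their turn if something is already playing.
        self._pending.append(text)
        if self._current is None:
            self._start_next()

    def on_playback_finished(self) -> None:
        self._current = None
        self._start_next()

    def on_user_typing(self) -> None:
        # Interrupt: drop everything queued and silence the current message.
        self._pending.clear()
        if self._current is not None:
            self._stop()
            self._current = None

    def _start_next(self) -> None:
        if self._pending:
            self._current = self._pending.popleft()
            self._play(self._current)
```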

Voice & Rate Settings

Fine-tune the speech output to your preference.

Voice selection: Each provider offers 6-8 voices. Select your preferred voice in the TTS settings dropdown.
Speech rate: Adjustable from 0.5x (half speed) to 2.0x (double speed). Default is 1.0x.
Volume: 0% to 100%. Independent of your system volume.
Fallback provider: If the primary provider fails (API error, rate limit), TTS automatically falls back to your configured fallback. SAPI is recommended as a fallback since it works offline.
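
The rate limits and the fallback rule can be illustrated with two short helpers. This is a sketch under stated assumptions: clamp_rate and synthesize_with_fallback are hypothetical names, providers are modeled as plain callables that raise on failure, and the actual error handling in the application may be more selective about which errors trigger the fallback.

```python
from typing import Callable, Optional

Synthesizer = Callable[[str], bytes]  # hypothetical provider call: text -> audio bytes

MIN_RATE, MAX_RATE = 0.5, 2.0


def clamp_rate(rate: float) -> float:
    """Keep a requested speech rate inside the documented 0.5x-2.0x range."""
    return max(MIN_RATE, min(MAX_RATE, rate))


def synthesize_with_fallback(
    text: str,
    primary: Synthesizer,
    fallback: Optional[Synthesizer] = None,
) -> bytes:
    """Try the primary provider; on any error, use the fallback if one is set."""
    try:
        return primary(text)
    except Exception:
        if fallback is None:
            raise  # nothing to fall back to, so surface the original error
        return fallback(text)
```

An offline provider such as SAPI suits the fallback slot because it cannot fail for network or quota reasons.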

Speaking Animations

Visual feedback while TTS is playing helps you see which message is being spoken.

Pulsing glow: The speaking message's border oscillates with the theme accent color
Waveform bar: A 5-bar audio visualizer driven by playback amplitude
Speaker rings: Expanding concentric circles on the speaker icon during playback
Avatar pulse (Unity): Manny's avatar scales subtly in sync with speech

Themed Thinking Text

While the AI is processing your request, a themed status message is displayed instead of the default "Thinking..." text. Each engine has its own theme:

Circuit Theme (UE5)

Circuitry and data-flow phrases:

"Routing signals..."
"Traversing node graph..."
"Compiling blueprints..."
"Synchronizing clock cycles..."

Manny Theme (Unity)

Noir / Grim Fandango phrases:

"Consulting the case files..."
"Skulking through the shadows..."
"Rattling some bones..."
"Consulting the Department of Death..."

Toggle themed thinking text on or off in Settings > TTS > Themed Thinking.
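
A minimal sketch of how a themed status message might be chosen, using the example phrases listed above. The engine keys "ue5" and "unity", the function name thinking_text, and the random selection are assumptions made for illustration; the application may pick phrases differently.

```python
import random

THINKING_PHRASES = {
    "ue5": [  # Circuit theme
        "Routing signals...",
        "Traversing node graph...",
        "Compiling blueprints...",
        "Synchronizing clock cycles...",
    ],
    "unity": [  # Manny theme
        "Consulting the case files...",
        "Skulking through the shadows...",
        "Rattling some bones...",
        "Consulting the Department of Death...",
    ],
}

DEFAULT_TEXT = "Thinking..."


def thinking_text(engine: str, themed: bool = True) -> str:
    """Pick a themed phrase, or the default text when theming is off or unknown."""
    phrases = THINKING_PHRASES.get(engine)
    if not themed or not phrases:
        return DEFAULT_TEXT
    return random.choice(phrases)
```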

Troubleshooting

No sound from SAPI provider

Ensure your system volume is not muted and a default audio output device is selected. On Windows, check that at least one voice pack is installed in Settings > Time & Language > Speech. Try the "Test Voice" button in the TTS settings panel.

Cloud provider returns an error

Verify that your API key is correct and has available credits. Check that the selected voice ID is valid for your provider tier. The ElevenLabs free tier has a monthly character limit. If the primary provider fails, the configured fallback provider is used automatically.

Audio cuts out during long responses

Long responses may hit the provider's maximum character limit. Try setting the verbosity to "Brief" or "Content" to reduce the text sent to the TTS provider. SAPI has no character limit but may sound less natural on very long text.

Speaker button stuck on "Generating"

The TTS provider may be experiencing high latency. Click the speaker button again to cancel and retry. Check your internet connection for cloud providers. Switch to SAPI as a fallback for instant offline generation.
