xAI launches Grok Voice API — TTS and STT with 5 named voices, WebSocket agents, and native MCP support

Source

Grok's Text to Speech API is now available. Start building with natural voices and expressive controls to bring your apps to life.

March 16, 2026

What Happened

xAI launched Grok Voice on March 16, 2026, a two-part voice API covering both Text-to-Speech (TTS) and Speech-to-Text (STT). The TTS product offers five named voices (Eve, Ara, Leo, Rex, and Sal) and expressive controls, including inline tags for whisper, pause, laugh, chuckle, giggle, and hum, with output targeting both telephony and web audio formats. The STT product accepts mic input or file upload (MP3, up to 25 MB). Beyond standalone TTS and STT, xAI is positioning Grok Voice as a voice agent platform, with real-time streaming over WebSocket, multilingual support, native tool calling, MCP server support, and web search integration. It is free to start, with no credit card required, and is available via the xAI API at x.ai/api/voice.
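As a rough illustration of the limits described above, the following Python sketch checks a file against the stated STT upload constraints and composes TTS text with inline expressive tags. The helper names, the bracketed tag syntax, and the reading of 25MB as 25 MiB are assumptions made for this example; none of them come from xAI's documentation, and the sketch makes no API calls.

```python
from pathlib import Path

# Limits taken from the announcement: MP3 uploads, up to 25MB.
# Assumption: "25MB" is read here as 25 MiB; the real cutoff may differ.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
ALLOWED_SUFFIXES = {".mp3"}

# Voice and tag names as listed in the announcement.
VOICES = {"Eve", "Ara", "Leo", "Rex", "Sal"}
EXPRESSIVE_TAGS = {"whisper", "pause", "laugh", "chuckle", "giggle", "hum"}


def check_stt_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported file type: {filename}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


def tag(name: str) -> str:
    """Render an expressive tag. The [name] syntax is hypothetical."""
    if name not in EXPRESSIVE_TAGS:
        raise ValueError(f"unknown tag: {name}")
    return f"[{name}]"


def compose(voice: str, *parts: str) -> dict:
    """Bundle a voice name and text parts into a plain dict (not a real request body)."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    return {"voice": voice, "text": " ".join(parts)}
```

For example, `compose("Eve", "Hold on.", tag("pause"), "Here it is.")` yields a dict whose text is `Hold on. [pause] Here it is.`. Validating voice and tag names locally catches typos before any network call; the actual request format should be taken from the xAI API reference.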

Why It Matters

MCP support is what separates this from a standard TTS API launch. A conventional TTS API, such as ElevenLabs, OpenAI TTS, or Google Cloud TTS, is a synthesis endpoint: text in, audio out. xAI is shipping voice as an agent layer: the voice interface can call tools, run searches, and connect to MCP servers within the same API. That means developers can build voice agents that speak, reason, and act in a single stack, without separately stitching together a TTS API, an LLM API, and a tool execution layer.