OmniVoice: Game-Changing Open-Source TTS Model Supports 600+ Languages with Zero-Shot Cloning
April 21, 2026
The intended audience spans solo developers, AI agents, multilingual content creators, and applications for low-resource languages. The project's long-term maintenance history, rooted in Kaldi, is highlighted as a positive for infrastructure decisions.
Production tips emphasize saving reference audio for voice consistency, caching prompts to boost throughput, and normalizing numbers to words to reduce output variability and cross-language accent issues.
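As an illustration of the number-normalization tip, the following is a minimal sketch of spelling out integers before text is sent to a TTS model. It is not OmniVoice's own preprocessing, which the summary does not describe. The function names `int_to_words` and `normalize_numbers` are hypothetical, and the sketch assumes English text and plain integers only (no decimals, thousands separators, or ordinals).

```python
import re

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]


def int_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999,999 in English."""
    if not 0 <= n <= 999_999:
        raise ValueError("only 0..999999 is handled in this sketch")
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("-" + _ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + int_to_words(rest) if rest else ""
        return _ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + int_to_words(rest) if rest else ""
    return int_to_words(thousands) + " thousand" + tail


def normalize_numbers(text: str) -> str:
    """Replace standalone runs of 1 to 6 digits with their spelled-out form."""
    return re.sub(r"\b\d{1,6}\b",
                  lambda m: int_to_words(int(m.group())),
                  text)


if __name__ == "__main__":
    print(normalize_numbers("Call 42 people at 7 pm"))
```

Spelling numbers out ahead of time means the model always sees the same tokens for the same quantity, which is the source of the reduced variability the tip refers to.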
OmniVoice is an open-source text-to-speech model released at the end of March 2026 by the k2-fsa team under the Apache 2.0 license. It supports 600+ languages with zero-shot voice cloning and claims synthesis at 40x real-time speed.
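To put the claimed 40x figure in concrete terms, the sketch below works through the arithmetic. It assumes that "40x real-time" means one second of compute produces 40 seconds of audio, which is the usual reading but is not spelled out in the summary. The helper names are hypothetical.

```python
def synthesis_time(audio_seconds: float, speedup: float) -> float:
    """Seconds of compute needed to produce `audio_seconds` of speech."""
    return audio_seconds / speedup


def real_time_factor(compute_seconds: float, audio_seconds: float) -> float:
    """RTF = compute time / audio duration; below 1.0 is faster than real time."""
    return compute_seconds / audio_seconds


if __name__ == "__main__":
    compute = synthesis_time(60.0, 40.0)       # one minute of audio at 40x
    print(compute)                             # seconds of compute
    print(real_time_factor(compute, 60.0))     # corresponding RTF
```

Under that assumption, one minute of speech takes 1.5 seconds to generate, which corresponds to a real-time factor of 0.025.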
Practical usage is demonstrated with Python API examples that load the model, generate audio from reference audio, and use design attributes and inline expression tokens for non-verbal cues, plus CLI tools for single and batch inference.
Additional features include pronunciation overrides (CMU notation), language-specific scripts (pinyin with tone numbers), inline tokens for expressions such as laughter or sighs, and notes on managing cross-lingual accent bleed to achieve a neutral accent.
The installation guide calls for a PyTorch 2.8.0 environment and provides platform-specific commands for NVIDIA CUDA and Apple Silicon, plus a quick Gradio web demo for validation.
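Before installing, it can help to confirm that the local environment matches the stated requirements. The following is a minimal sketch, not part of the OmniVoice installation guide: it checks the installed PyTorch version against 2.8.0 and picks between CUDA and Apple Silicon (MPS) backends. The helper names are hypothetical, and the CPU fallback is an assumption; the summary does not say whether OmniVoice runs on CPU.

```python
import importlib.util

REQUIRED_TORCH = "2.8.0"  # version named in the installation guide


def version_matches(installed: str, required: str = REQUIRED_TORCH) -> bool:
    """Compare versions, ignoring a local build suffix such as '+cu128'."""
    return installed.split("+")[0] == required


def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Prefer CUDA, then Apple Silicon (MPS), then CPU as an assumed fallback."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; report 'cpu' if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    return pick_device(torch.cuda.is_available(),
                       torch.backends.mps.is_available())


if __name__ == "__main__":
    print("device:", detect_device())
```

Keeping `pick_device` separate from the PyTorch query makes the selection logic easy to check on a machine without a GPU.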
The key technical detail is a hybrid diffusion language model architecture based on Qwen3-0.6B, which delivers high-quality speech efficiently enough for consumer GPUs. It runs on NVIDIA CUDA 12.8 or Apple Silicon, with baseline support for 600+ languages in zero-shot mode.
The model has rapidly drawn attention, accumulating thousands of GitHub stars and hundreds of thousands of HuggingFace downloads within weeks.
Resources for developers and researchers include a GitHub repository, a HuggingFace model card, an arXiv paper, and active community issues for ongoing discussion.
Summary based on 1 source
Source

DEV Community • Apr 21, 2026
OmniVoice: Open-Source TTS with 600+ Languages and Zero-Shot Voice Cloning