OmniVoice: Game-Changing Open-Source TTS Model Supports 600+ Languages with Zero-Shot Cloning

April 21, 2026
  • The intended audience spans solo developers, AI agents, multilingual content creators, and developers of applications for low-resource languages. The project's long maintenance history, rooted in Kaldi, is highlighted as a positive for infrastructure decisions.

  • Production tips emphasize saving reference audio for voice consistency, caching prompts to boost throughput, and normalizing numbers to words to reduce output variability and cross-language accent issues.

  • OmniVoice is an open-source text-to-speech model released at the end of March 2026 by the k2-fsa team under the Apache 2.0 license. It supports 600+ languages with zero-shot voice cloning and claims 40x real-time speed.

  • Python API examples demonstrate loading the model, generating audio from reference audio, and using design attributes and inline expression tokens for non-verbal cues. CLI tools cover single and batch inference.

  • Additional features include pronunciation overrides (CMU notation), language-specific scripts (pinyin with tone numbers), and inline tokens for expressions such as laughter or sighs. Notes on cross-lingual accent bleed explain how to achieve a neutral accent.

  • The installation guide calls for a PyTorch 2.8.0 environment and provides platform-specific commands for NVIDIA CUDA and Apple Silicon, plus a quick Gradio web demo for validation.

  • A key technical detail is the hybrid diffusion language model architecture based on Qwen3-0.6B, which delivers high-quality speech efficiently enough for consumer GPUs. It runs on NVIDIA CUDA 12.8 or Apple Silicon, with baseline zero-shot support for 600+ languages.

  • The model has rapidly drawn attention, accumulating thousands of GitHub stars and hundreds of thousands of HuggingFace downloads within weeks.

  • Resources for developers and researchers include a GitHub repository, a HuggingFace model card, an arXiv paper, and active community issues for ongoing discussion.
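
The production tips above (normalizing numbers to words and caching prompts) can be sketched in plain Python. This is a minimal illustration and not OmniVoice's own API: `number_to_words`, `normalize_numbers`, and `PromptCache` are hypothetical helpers, the number speller covers English integers up to 999,999 only, and the `encode` callable stands in for whatever step turns reference audio into a reusable prompt.

```python
import hashlib
import re
from typing import Callable, Dict

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999,999 in English."""
    if n < 0 or n > 999_999:
        raise ValueError("only 0..999,999 is supported in this sketch")
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("-" + _ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return _ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


# Match standalone integers only; decimals ("3.5"), grouped numbers
# ("1,000"), and digits glued to letters ("v2") are left untouched.
_STANDALONE_INT = re.compile(r"(?<![\w.,])\d+(?!\w|[.,]\d)")


def normalize_numbers(text: str) -> str:
    """Replace standalone integers with words before sending text to TTS."""
    def replace(match: re.Match) -> str:
        value = int(match.group())
        return number_to_words(value) if value <= 999_999 else match.group()
    return _STANDALONE_INT.sub(replace, text)


class PromptCache:
    """Cache encoded voice prompts, keyed by a hash of the reference audio."""

    def __init__(self, encode: Callable[[bytes], object]) -> None:
        self._encode = encode          # hypothetical prompt-encoding step
        self._store: Dict[str, object] = {}
        self.misses = 0                # how many times encode actually ran

    def get(self, audio_bytes: bytes) -> object:
        key = hashlib.sha256(audio_bytes).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._encode(audio_bytes)
        return self._store[key]
```

Keying the cache on a content hash means the same saved reference clip always maps to the same prompt, which is what keeps a voice consistent across requests while skipping repeated encoding work.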

Summary based on 1 source

