French AI company Kyutai has just released its latest innovation: „Unmute“. It’s been a while since their initial model, „Moshi,“ which, to be frank, wasn’t quite suitable for younger audiences. However, the Kyutai team clearly put their heads down and persevered, proving that sometimes great things happen with persistent effort (think of the incredible turnaround of No Man’s Sky!).
Their new approach to AI Voice, „Unmute,“ features a straightforward, cascaded system:
- A speech-to-text component transcribes what you say.
- An LLM (they’re using Gemma 3 12B) generates the text of the response.
- Kyutai’s text-to-speech model then vocalizes the response.
The improvements are significant. There’s virtually no latency between your input and the system’s answer. Furthermore, „Unmute“ intelligently waits for you to finish speaking and demonstrates a strong understanding of dialogue pacing.
As always, I’m rooting for the „underdog,“ and I truly hope „Unmute“ marks the beginning of Kyutai’s redemption arc. It would be fantastic to witness this transformation. A demo is available to experience „Unmute“ firsthand, and I’d love to hear your thoughts on their progress.