
Kyutai Releases Streaming 2B Parameter Text-to-Speech (TTS) with 220ms Latency and 2.5M Training Hours

Tech | By Gavin Wallace | 05/07/2025 | 3 Mins Read

Kyutai, an open AI research laboratory, has released a streaming Text-to-Speech (TTS) model with about 2 billion parameters. The model is designed for high-fidelity audio generation at ultra-low latency (220 milliseconds), was trained on an unprecedented 2.5 million hours of audio, and is released under the permissive CC BY 4.0 license, in keeping with Kyutai's commitment to openness. The release pushes large-scale speech generation toward practical real-time use, particularly for edge deployment and AI agents.

Unpacking Performance: Under 350ms Latency for 32 Concurrent Users on a Single L40 GPU

The streaming capability is the model’s most distinctive feature. On a single NVIDIA L40 GPU, the system supports up to 32 concurrent users while keeping latency below 350ms. For an individual user, generation latency can be as low as 220ms, enabling near real-time applications such as conversational agents and voice assistants. This performance comes from Kyutai’s Delayed Streams Modeling, a novel approach to speech generation that lets the model produce audio incrementally as text arrives.
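To make the latency figures concrete, here is a minimal Python sketch of how one might measure time-to-first-audio for a streaming synthesizer. The `stream_tts` generator is a stand-in, not Kyutai's actual API; it simulates audio chunks arriving over time so the timing logic runs as-is.

```python
# Minimal sketch: measuring time-to-first-audio for a streaming TTS system.
# `stream_tts` is a placeholder generator, NOT Kyutai's API; it only simulates
# chunked audio so the measurement logic below is runnable.
import time
from typing import Iterator

def stream_tts(text: str, chunk_ms: int = 80) -> Iterator[bytes]:
    """Placeholder synthesizer: yields fake 24 kHz, 16-bit mono PCM chunks."""
    for _ in range(max(1, len(text) // 10)):
        time.sleep(chunk_ms / 1000)  # pretend the model is generating
        yield b"\x00" * int(24_000 * chunk_ms / 1000) * 2

def time_to_first_audio(text: str) -> float:
    start = time.perf_counter()
    for _chunk in stream_tts(text):
        # The first yielded chunk is what a 220ms latency figure refers to:
        # playback can start before the rest of the utterance is generated.
        return time.perf_counter() - start
    return float("inf")

print(f"time to first audio: {time_to_first_audio('Hello from a streaming TTS demo.'):.3f}s")
```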

Key Technical Metrics

  • Model size: ~2B parameters
  • Training data: 2.5 million hours of speech
  • Latency: 220ms single-user; under 350ms with 32 concurrent users on one L40 GPU
  • Language support: English, French
  • License: CC BY 4.0 (open)

Delayed Streams Modeling: Architecting Real-Time Responsiveness

Kyutai’s key innovation is Delayed Streams Modeling, a method that begins speech synthesis before the entire input text is available. The approach is designed to balance prediction accuracy against response speed, enabling high-throughput TTS. Unlike conventional autoregressive models that suffer from response lag, this architecture maintains temporal coherence while achieving faster-than-real-time synthesis.
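The following toy sketch illustrates the "delayed streams" idea as described above: the audio stream trails the text stream by a fixed number of steps, so synthesis can start before the full input has arrived. This is a conceptual illustration only, not Kyutai's model code.

```python
# Conceptual sketch of a delayed-streams schedule: audio for a text token is
# emitted a fixed number of steps after that token arrives, so generation can
# begin before the full text is known. Not Kyutai's actual implementation.
from collections import deque
from typing import Iterable, Iterator, Tuple

def delayed_streams(text_tokens: Iterable[str], delay: int = 2) -> Iterator[Tuple[str, str]]:
    """Pair each incoming text token with the audio step emitted `delay` steps later."""
    buffer: deque = deque()
    for tok in text_tokens:
        buffer.append(tok)
        if len(buffer) > delay:
            lagging = buffer.popleft()
            yield tok, f"audio({lagging})"  # audio for an earlier token
        else:
            yield tok, "<pad>"              # still filling the delay window
    while buffer:                           # flush remaining audio after text ends
        yield "<eos>", f"audio({buffer.popleft()})"

for text_step, audio_step in delayed_streams(["Hel", "lo", " wor", "ld", "!"]):
    print(f"text in: {text_step:6s} | audio out: {audio_step}")
```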

Kyutai has published the source code and training recipe for this architecture in a GitHub repository, supporting full reproducibility and community contributions.

Model Availability and Open Research Commitment

Kyutai has released the model weights and inference scripts on Hugging Face, making them accessible to researchers, developers, and commercial teams. The permissive CC BY 4.0 license allows unrestricted integration and adaptation into applications, provided proper attribution is maintained.
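For reference, a minimal sketch of fetching released weights with the standard huggingface_hub client is shown below. The repository id is a placeholder, not a verified name; substitute the actual Kyutai TTS repository listed on the Hugging Face page referenced in this article.

```python
# Minimal sketch of downloading released weights with huggingface_hub.
# The repo_id is a placeholder, not a verified Kyutai repository name.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="kyutai/<tts-model-repo>",            # placeholder repo id
    allow_patterns=["*.safetensors", "*.json"],   # weights and configs only
)
print(f"model files downloaded to: {local_dir}")
```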

The model supports both batch and streaming inference, making this release a flexible foundation for applications ranging from voice cloning to real-time bots. With its pretrained English-French checkpoints, Kyutai also provides a starting point for multilingual TTS pipelines.
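The sketch below contrasts the two inference modes using a tiny stub in place of the real model; the class and its methods are illustrative assumptions, not Kyutai's API, and exist only to make the control flow concrete and runnable.

```python
# Stub illustrating batch vs. streaming usage patterns (NOT Kyutai's API).
class StubTTS:
    def synthesize(self, text: str) -> bytes:
        # Batch mode: the whole waveform comes back at once.
        return b"".join(self.stream(text))

    def stream(self, text: str):
        # Streaming mode: audio is yielded chunk by chunk as it is generated,
        # so playback can overlap with generation.
        for word in text.split():
            yield f"<audio for {word!r}>".encode()

tts = StubTTS()
print(len(tts.synthesize("batch mode returns one waveform")))  # one blob
for chunk in tts.stream("streaming mode yields chunks"):        # incremental
    print(chunk)
```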

Real-Time AI Applications and Implications

By cutting speech-generation latency to around 220ms, short enough to feel conversational, Kyutai’s model enables:

  • Conversational AI: human-like voice interfaces with fast turnaround
  • Assistive tech: faster screen readers and voice-feedback systems
  • Media production: voiceovers with rapid iteration cycles
  • Edge devices: optimized inference for low-power or on-device environments

The ability to serve many users concurrently on a single L40 GPU also makes the model attractive for cloud environments that need to scale speech services.

Conclusion: Open, Fast, and Ready for Deployment

Kyutai’s streaming TTS release is a landmark in speech AI. With high-quality, low-latency synthesis and a permissive license, it addresses the needs of both production teams and researchers. Its reproducibility and multilingual support make it a strong alternative to proprietary systems.

The model weights are available on Hugging Face, a technical explanation is on Kyutai’s site, and the implementation is on GitHub.

Sana Hassan, an intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying AI and technology to real-world challenges.
