Neuphonic has released NeuTTS Air, an open-source text-to-speech (TTS) speech language model designed to run locally, in real time, on CPUs. The Hugging Face model card lists a 748M-parameter checkpoint under the qwen2 architecture, ships Q4/Q8 GGUF quantizations that run via llama.cpp/llama-cpp-python with no cloud dependency, carries an Apache-2.0 license, and includes a runnable demo and examples.
What's new?
NeuTTS Air couples a 0.5B-class Qwen backbone with Neuphonic's NeuCodec audio codec. Neuphonic bills the system as a "super-realistic, on-device" TTS speech language model that clones a voice from about 3 seconds of reference audio. Both the model card and the repository emphasize privacy-sensitive voice agents, real-time generation on CPU, and small-footprint deployment.
Key Features
- Realism at sub-1B scale: a Qwen2-class text-to-speech backbone (~0.7B parameters as hosted) that preserves human-like prosody.
- On-device deployment: distributed in GGUF format, compatible with laptops and Raspberry Pi boards.
- Instant speaker cloning: style transfer from roughly 3 seconds of reference audio (a reference WAV plus its transcript).
- Compact LM+codec stack: a Qwen 0.5B backbone paired with NeuCodec (0.8 kbps / 24 kHz) to balance output quality and latency.
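To put the "small footprint" claim in perspective, here is a back-of-envelope estimate of the weight storage implied by the GGUF distribution. The bits-per-weight figures are typical for llama.cpp's Q4_K- and Q8_0-style quantization types, not numbers published by Neuphonic:

```python
# Back-of-envelope GGUF weight-size estimate for the 748M-parameter artifact.
# Bits-per-weight values are assumptions based on common llama.cpp quant
# types, not figures from the NeuTTS Air model card.

PARAMS = 748_000_000

def weight_mb(params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in megabytes."""
    return params * bits_per_weight / 8 / 1e6

q4 = weight_mb(PARAMS, 4.5)   # Q4_K-style: ~4.5 bits/weight
q8 = weight_mb(PARAMS, 8.5)   # Q8_0-style: ~8.5 bits/weight

print(f"Q4 ~ {q4:.0f} MB, Q8 ~ {q8:.0f} MB")
```

Under these assumptions the Q4 build would weigh in around 420 MB, which is consistent with the Raspberry Pi-class deployment targets.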
Model architecture and runtime path
- Backbone: Qwen 0.5B serves as the lightweight LM that conditions speech generation; the hosted artifact is reported as 748M parameters under the qwen2 architecture on Hugging Face.
- Codec: NeuCodec provides low-bitrate acoustic tokenization/decoding, targeting 0.8 kbps at 24 kHz output so that compact representations can be used efficiently on device.
- Quantization & format: prebuilt GGUF backbones (Q4/Q8) are provided, and the repository includes instructions for running them with llama-cpp-python plus an optional ONNX decoder path.
- Dependencies: eSpeak is used for phonemization; examples and a Jupyter notebook for end-to-end synthesis are included.
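The codec figures above imply a large compression ratio, which is worth making concrete. Raw 16-bit mono PCM at 24 kHz runs at 384 kbps, so a 0.8 kbps token stream is roughly a 480× reduction. This sketch derives the ratio purely from the stated figures, not from NeuCodec internals:

```python
# Compression ratio implied by NeuCodec's published figures:
# 24 kHz output audio vs. a 0.8 kbps acoustic-token stream.

SAMPLE_RATE = 24_000   # Hz, per the model card
BIT_DEPTH = 16         # bits; standard PCM assumed for the comparison
CODEC_KBPS = 0.8       # kbps, per the model card

raw_kbps = SAMPLE_RATE * BIT_DEPTH / 1000   # raw mono PCM bitrate
ratio = raw_kbps / CODEC_KBPS               # how much smaller the token stream is

print(f"raw: {raw_kbps:.0f} kbps, ratio: {ratio:.0f}x")
```

That compactness is what lets a sub-1B LM predict acoustic tokens fast enough for CPU-only real-time synthesis.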
On-device performance focus
NeuTTS Air advertises "real-time generation on mid-range devices" and ships CPU-first GGUF quantizations aimed at laptops and single-board computers. No RTF or latency figures have been published, but the card lists these distribution targets, and the provided Space and examples demonstrate a local-inference workflow that requires no GPU.
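"Real-time" is usually quantified as a real-time factor (RTF): synthesis wall-clock time divided by the duration of the audio produced, with RTF below 1.0 meaning the model keeps up with playback. A minimal helper for evaluating a local setup (illustrative only; Neuphonic has published no RTF numbers):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of generated audio.

    RTF < 1.0 means generation is faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds

# Hypothetical measurement: 4 s of CPU time to produce 10 s of speech.
print(real_time_factor(4.0, 10.0))  # -> 0.4, i.e. 2.5x faster than real time
```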
Voice cloning workflow
NeuTTS Air requires (1) a reference WAV and (2) its text transcript. It encodes the reference into style tokens, then synthesizes arbitrary text in the same speaker's timbre. The Neuphonic team recommends 3–15 s of clean mono audio and provides pre-encoded samples.
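A cloning pipeline typically sanity-checks the reference clip before encoding it. The stdlib sketch below enforces the recommended format (mono, 3–15 s); `validate_reference` is a hypothetical helper for illustration, not part of the NeuTTS Air API:

```python
import wave

def validate_reference(path: str,
                       min_s: float = 3.0,
                       max_s: float = 15.0) -> float:
    """Check that a reference WAV is mono and 3-15 s long; return its duration.

    Hypothetical pre-flight check mirroring Neuphonic's recommendation of
    3-15 s of clean mono audio; not taken from the NeuTTS Air codebase.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("reference audio must be mono")
        duration = wf.getnframes() / wf.getframerate()
    if not (min_s <= duration <= max_s):
        raise ValueError(f"reference should be {min_s}-{max_s} s, got {duration:.1f} s")
    return duration
```

A check like this fails fast on stereo or too-short clips instead of producing a poor clone.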
Watermarking, privacy, and responsibility
Neuphonic positions the model for privacy-preserving, on-device use. All generated audio embeds a Perth (Perceptual Threshold) watermark to support responsible usage and provenance.
How does it compare?
NeuTTS Air stands out for its packaging: a small LM plus neural codec with instant cloning, CPU-first quantizations, watermarking, and a permissive license. "World's first super-realistic, on-device speech LM" is a vendor claim; the verifiable facts are the model's size, formats, cloning procedure, license, and the provided runtimes.
The focus is on system trade-offs: a sub-1B Qwen-class backbone with GGUF quantization, paired with NeuCodec at 0.8 kbps / 24 kHz, is a pragmatic recipe for real-time, CPU-only TTS that preserves timbre from ~3–15 s style references while keeping latency and memory predictable. Apache-2.0 licensing and watermarking are deployment-friendly, but published curves for RTF/latency and for cloning quality versus reference length would allow rigorous benchmarking against existing pipelines. A minimal-dependency offline path (eSpeak, llama.cpp/ONNX) lowers privacy and compliance risk for edge agents without compromising intelligibility.
To learn more, see the Model Card on Hugging Face and the GitHub page, which includes tutorials, code, and notebooks.

