Neuphonic has released NeuTTS Air, an open-source text-to-speech (TTS) speech language model designed to run locally, in real time, on CPUs. The Hugging Face model card lists a 748M-parameter checkpoint under the qwen2 architecture, ships Q4/Q8 GGUF quantizations that run via llama.cpp/llama-cpp-python with no cloud dependency, carries an Apache-2.0 license, and includes a runnable demo and examples.
What's new?
NeuTTS Air couples a 0.5B-class Qwen backbone with Neuphonic's NeuCodec audio codec. Neuphonic bills the system as a "super-realistic, on-device" TTS speech language model that clones a voice from about 3 seconds of reference audio. Both the model card and the repository emphasize privacy-sensitive voice agents, real-time generation on CPU, and small-footprint deployment.
Key Features
- Realism at sub-1B scale: a Qwen2-class text-to-speech backbone (~0.7B parameters as hosted) that preserves human-like prosody.
- On-device deployment: distributed in GGUF format, compatible with laptops and Raspberry Pi boards.
- Instant speaker cloning: style transfer from roughly 3 seconds of reference audio (a reference WAV plus its transcript).
- Compact LM+codec stack: a Qwen 0.5B backbone paired with NeuCodec (0.8 kbps / 24 kHz) to balance output quality and latency.
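To put the "small footprint" claim in perspective, here is a back-of-envelope estimate of the weight storage implied by the GGUF distribution. The bits-per-weight figures are typical for llama.cpp's Q4_K- and Q8_0-style quantization types, not numbers published by Neuphonic:

```python
# Back-of-envelope GGUF weight-size estimate for the 748M-parameter artifact.
# Bits-per-weight values are assumptions based on common llama.cpp quant
# types, not figures from the NeuTTS Air model card.

PARAMS = 748_000_000

def weight_mb(params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in megabytes."""
    return params * bits_per_weight / 8 / 1e6

q4 = weight_mb(PARAMS, 4.5)   # Q4_K-style: ~4.5 bits/weight
q8 = weight_mb(PARAMS, 8.5)   # Q8_0-style: ~8.5 bits/weight

print(f"Q4 ~ {q4:.0f} MB, Q8 ~ {q8:.0f} MB")
```

Under these assumptions the Q4 build would weigh in around 420 MB, which is consistent with the Raspberry Pi-class deployment targets.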
Model architecture and runtime path
- Backbone: Qwen 0.5B serves as the lightweight LM that conditions speech generation; the hosted artifact is reported as 748M parameters under the qwen2 architecture on Hugging Face.
- Codec: NeuCodec provides low-bitrate acoustic tokenization/decoding, targeting 0.8 kbps at 24 kHz output so that compact representations can be used efficiently on device.
- Quantization & format: prebuilt GGUF backbones (Q4/Q8) are provided, and the repository includes instructions for running them with llama-cpp-python plus an optional ONNX decoder path.
- Dependencies: eSpeak is used for phonemization; examples and a Jupyter notebook for end-to-end synthesis are included.
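The codec figures above imply a large compression ratio, which is worth making concrete. Raw 16-bit mono PCM at 24 kHz runs at 384 kbps, so a 0.8 kbps token stream is roughly a 480× reduction. This sketch derives the ratio purely from the stated figures, not from NeuCodec internals:

```python
# Compression ratio implied by NeuCodec's published figures:
# 24 kHz output audio vs. a 0.8 kbps acoustic-token stream.

SAMPLE_RATE = 24_000   # Hz, per the model card
BIT_DEPTH = 16         # bits; standard PCM assumed for the comparison
CODEC_KBPS = 0.8       # kbps, per the model card

raw_kbps = SAMPLE_RATE * BIT_DEPTH / 1000   # raw mono PCM bitrate
ratio = raw_kbps / CODEC_KBPS               # how much smaller the token stream is

print(f"raw: {raw_kbps:.0f} kbps, ratio: {ratio:.0f}x")
```

That compactness is what lets a sub-1B LM predict acoustic tokens fast enough for CPU-only real-time synthesis.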
On-device performance focus
NeuTTS Air advertises "real-time generation on mid-range devices" and ships CPU-first GGUF quantizations aimed at laptops and single-board computers. No RTF or latency figures have been published, but the card lists these distribution targets, and the provided Space and examples demonstrate a local-inference workflow that requires no GPU.
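"Real-time" is usually quantified as a real-time factor (RTF): synthesis wall-clock time divided by the duration of the audio produced, with RTF below 1.0 meaning the model keeps up with playback. A minimal helper for evaluating a local setup (illustrative only; Neuphonic has published no RTF numbers):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of generated audio.

    RTF < 1.0 means generation is faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds

# Hypothetical measurement: 4 s of CPU time to produce 10 s of speech.
print(real_time_factor(4.0, 10.0))  # -> 0.4, i.e. 2.5x faster than real time
```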
Voice cloning workflow
NeuTTS Air requires (1) a reference WAV and (2) its text transcript. It encodes the reference into style tokens, then synthesizes arbitrary text in the same speaker's timbre. The Neuphonic team recommends 3–15 s of clean mono audio and provides pre-encoded samples.
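A cloning pipeline typically sanity-checks the reference clip before encoding it. The stdlib sketch below enforces the recommended format (mono, 3–15 s); `validate_reference` is a hypothetical helper for illustration, not part of the NeuTTS Air API:

```python
import wave

def validate_reference(path: str,
                       min_s: float = 3.0,
                       max_s: float = 15.0) -> float:
    """Check that a reference WAV is mono and 3-15 s long; return its duration.

    Hypothetical pre-flight check mirroring Neuphonic's recommendation of
    3-15 s of clean mono audio; not taken from the NeuTTS Air codebase.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("reference audio must be mono")
        duration = wf.getnframes() / wf.getframerate()
    if not (min_s <= duration <= max_s):
        raise ValueError(f"reference should be {min_s}-{max_s} s, got {duration:.1f} s")
    return duration
```

A check like this fails fast on stereo or too-short clips instead of producing a poor clone.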
Watermarking, privacy, and responsibility
Neuphonic positions the model for privacy-preserving, on-device use. All generated audio embeds a Perth (Perceptual Threshold) watermark to support responsible usage and provenance.
How does it compare?
NeuTTS Air stands out for its packaging: a small LM plus neural codec with instant cloning, CPU-first quantizations, watermarking, and a permissive license. "World's first super-realistic, on-device speech LM" is a vendor claim; the verifiable facts are the model's size, formats, cloning procedure, license, and the provided runtimes.
The focus is on system trade-offs: a sub-1B Qwen-class backbone with GGUF quantization, paired with NeuCodec at 0.8 kbps / 24 kHz, is a pragmatic recipe for real-time, CPU-only TTS that preserves timbre from ~3–15 s style references while keeping latency and memory predictable. Apache-2.0 licensing and watermarking are deployment-friendly, but published curves for RTF/latency and for cloning quality versus reference length would allow rigorous benchmarking against existing pipelines. A minimal-dependency offline path (eSpeak, llama.cpp/ONNX) lowers privacy and compliance risk for edge agents without compromising intelligibility.
To learn more, see the Model Card on Hugging Face and the GitHub page, which includes tutorials, code, and notebooks.

