Liquid AI has released LFM2-Audio-1.5B, a compact audio–language foundation model that both understands and generates speech and text through a single end-to-end stack. The model is aimed at low-latency, real-time assistants on resource-constrained devices, extending the LFM2 family into audio while maintaining a compact footprint.
What's new: a unified backbone with disentangled audio I/O
LFM2-Audio extends the 1.2B-parameter LFM2 language model to treat audio and text as first-class sequence tokens. Crucially, it disentangles its audio representations: inputs are continuous embeddings projected directly from raw waveform chunks (~80 ms), while outputs are discrete audio codes. This avoids discretization artifacts on the input side while keeping generation autoregressive in both modalities on the output side.
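The disentangled I/O can be made concrete with shapes. In the sketch below, the 80 ms chunking, the 8 codebooks, and the 2049-entry codebook size come from the article; the 16 kHz sample rate, the embedding width, and the random projection and codes are illustrative stand-ins for the real encoder and decoder.

```python
import numpy as np

SAMPLE_RATE = 16_000   # assumed; typical for speech encoders
CHUNK_MS = 80          # input chunk size from the article
EMBED_DIM = 2048       # hypothetical backbone embedding width
NUM_CODEBOOKS = 8      # Mimi codebooks on the output side
CODEBOOK_SIZE = 2049   # per-codebook vocabulary from the model card

def encode_input(waveform: np.ndarray) -> np.ndarray:
    """Continuous input path: chunk raw samples and project each
    80 ms chunk to an embedding (projection is a stand-in)."""
    chunk = SAMPLE_RATE * CHUNK_MS // 1000              # 1280 samples
    n = len(waveform) // chunk
    chunks = waveform[: n * chunk].reshape(n, chunk)
    proj = np.random.default_rng(0).standard_normal((chunk, EMBED_DIM))
    return chunks @ proj                                # (n, EMBED_DIM)

def decode_output(num_frames: int) -> np.ndarray:
    """Discrete output path: one code per codebook per frame."""
    rng = np.random.default_rng(1)
    return rng.integers(0, CODEBOOK_SIZE, size=(num_frames, NUM_CODEBOOKS))

wave = np.zeros(SAMPLE_RATE * 4)           # 4 s of silence as input
embeddings = encode_input(wave)
codes = decode_output(embeddings.shape[0])
print(embeddings.shape)                    # (50, 2048): continuous in
print(codes.shape)                         # (50, 8): discrete codes out
```

The asymmetry is the point: the input side never quantizes the waveform, while the output side stays token-based so the backbone can generate audio autoregressively.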
The released checkpoint implements:
- Backbone: LFM2 (hybrid conv + attention), 1.2B parameters (LM only)
- Audio encoder: FastConformer (~115M, canary-180m-flash)
- Audio decoder: RQ-Transformer predicting discrete Mimi codec tokens (8 codebooks)
- Context: 32,768 tokens; vocabulary: 65,536 (text) / 2049 × 8 (audio)
- Precision: bfloat16; license: LFM Open License v1.0; languages: English
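A back-of-envelope calculation puts the 32,768-token context in perspective. The ~12.5 Hz frame rate below is an assumption about the Mimi codec that the article does not state, and it further assumes each codebook entry occupies its own context slot (implementations may pack frames differently), so treat this strictly as an estimate.

```python
# Rough context budget for audio tokens.
FRAME_RATE_HZ = 12.5   # assumed Mimi frame rate, not from the article
NUM_CODEBOOKS = 8      # codebooks per frame, from the model card
CONTEXT = 32_768       # backbone context length

tokens_per_second = FRAME_RATE_HZ * NUM_CODEBOOKS   # audio tokens per second
seconds_of_audio = CONTEXT / tokens_per_second      # if codes fill the context
print(f"{tokens_per_second:.0f} audio tokens/s")
print(f"~{seconds_of_audio / 60:.1f} min of pure audio fits in context")
```

Under those assumptions, audio costs about 100 tokens per second, so a context filled entirely with audio codes holds roughly five and a half minutes of speech.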
Two generation modes for real-time agents
- Interleaved generation for live speech-to-speech, where the model alternates text and audio tokens to minimize perceived latency.
- Sequential generation for ASR and TTS, switching modalities turn by turn.
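The two scheduling patterns can be sketched as token streams. The run lengths and the (modality, token) representation below are illustrative, not the model's actual schedule; the point is only how interleaving lets audio start before the text response is finished.

```python
from itertools import islice
from typing import Iterator

def interleaved(text_tokens: list, audio_tokens: list,
                text_run: int = 1, audio_run: int = 4) -> Iterator[tuple]:
    """Interleaved mode: alternate short runs of text and audio so
    playback can begin before the full text response exists."""
    t, a = iter(text_tokens), iter(audio_tokens)
    while True:
        t_run = list(islice(t, text_run))
        a_run = list(islice(a, audio_run))
        if not t_run and not a_run:
            return
        yield from (("text", x) for x in t_run)
        yield from (("audio", x) for x in a_run)

def sequential(src_tokens: list, src: str,
               dst_tokens: list, dst: str) -> Iterator[tuple]:
    """Sequential mode (ASR/TTS): consume one modality fully, then
    emit the other (e.g. audio in, text out for ASR)."""
    yield from ((src, x) for x in src_tokens)
    yield from ((dst, x) for x in dst_tokens)

stream = list(interleaved(["Hi", "!"], [101, 102, 103, 104]))
print(stream[:3])   # [('text', 'Hi'), ('audio', 101), ('audio', 102)]
```

In the interleaved stream, the first audio token arrives right after the first text token, which is the property the latency numbers in the next section measure.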
Liquid AI ships a Python package, liquid-audio, together with a Gradio demo that reproduces these behaviors.
Latency:
Liquid AI reports end-to-end latency below 100 ms from a 4-second audio query to the first audible response, a proxy for perceived responsiveness in interactive use, and states that this is faster than other models smaller than 1.5B parameters under their setup.
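That time-to-first-audio metric is straightforward to measure for any streaming model. The harness below works on any iterator of (modality, token) pairs; the dummy stream and its 50 ms sleep are stand-ins, not the model's actual behavior.

```python
import time
from typing import Iterable, Optional

def time_to_first_audio(stream: Iterable[tuple]) -> Optional[float]:
    """Wall-clock seconds from the start of generation until the
    first audio token appears in the stream, or None if none does."""
    start = time.perf_counter()
    for modality, _token in stream:
        if modality == "audio":
            return time.perf_counter() - start
    return None

def dummy_stream():
    """Stand-in for a real streaming model."""
    yield ("text", "Sure")
    time.sleep(0.05)            # simulated compute before audio starts
    yield ("audio", 1234)

latency = time_to_first_audio(dummy_stream())
print(f"time to first audio: {latency * 1000:.0f} ms")
```

Plugging a real generation stream into `time_to_first_audio` gives a number directly comparable to the sub-100 ms figure Liquid AI cites.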
VoiceBench and ASR results
On VoiceBench, a suite of nine audio-assistant evaluations introduced in 2024 for LLM-based voice assistants, Liquid AI reports an overall score of 56.78 for LFM2-Audio-1.5B. The blog chart lists per-task numbers (e.g., AlpacaEval 3.71, CommonEval 3.49, WildVoice 3.17), and the comparison table sets these against larger models such as Qwen2.5-Omni-3B and Moshi-7B.
The model card on Hugging Face provides an additional VoiceBench table (with closely related, but not identical, per-task values) and includes classic ASR word error rates (WERs), where LFM2-Audio matches or improves on Whisper-large-v3-turbo on some datasets despite being a generalist speech–text model. For example (lower is better): AMI 15.36 vs. 16.13 for Whisper-large-v3-turbo, and LibriSpeech-clean 2.03 vs. 2.10.
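For readers unfamiliar with the metric, WER is word-level edit distance divided by the reference length, which is why values like 2.03 are percentages of reference words changed. A minimal implementation (the example sentences are made up, not from the benchmark):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Rolling-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,          # deletion
                      d[j - 1] + 1,      # insertion
                      prev + (r != h))   # substitution (free if equal)
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)

# One substitution ("the" -> "a") over six reference words.
print(round(wer("the cat sat on the mat",
                "the cat sat on a mat"), 3))   # 0.167
```

Production evaluations also normalize text (casing, punctuation) before scoring, which this sketch omits.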

Why does voice AI matter?
Typical “omni” stacks couple ASR → LLM → TTS, which adds latency and brittle interfaces. LFM2-Audio's single-backbone design, with continuous input embeddings and discrete output codes, reduces glue code and enables early audio emission. For developers, that means faster response times and simpler pipelines, with one model covering ASR, TTS, and conversational agents. Liquid AI provides code, demo entry points, and distribution through Hugging Face.
Take a look at the GitHub page and the Hugging Face model card for technical details, tutorials, code, and notebooks.
Asif Razzaq, the CEO of Marktechpost Media Inc., is a visionary engineer and entrepreneur dedicated to harnessing the potential of Artificial Intelligence for social good. His most recent venture is Marktechpost, an Artificial Intelligence media platform known for in-depth machine learning and deep learning coverage that is technically sound yet accessible to a broad audience. The platform draws over 2 million monthly views.

