
A new audio foundation model from Liquid AI: LFM2-Audio-1.5B, with response times under 100 milliseconds

Tech · By Gavin Wallace · 01/10/2025 · 4 Mins Read

Liquid AI has released LFM2-Audio-1.5B, a compact audio–language foundation model that both understands and generates speech and text through a single end-to-end stack. It is aimed at low-latency, real-time assistants on resource-constrained mobile devices, extending the LFM2 family into audio while maintaining a compact footprint.

https://www.liquid.ai/blog/lfm2-audio-an-end-to-end-audio-foundation-model

What’s new: a unified backbone with disentangled audio I/O

LFM2-Audio extends the LFM2 1.2B-parameter language backbone to treat audio and text as first-class sequence tokens. Crucially, it disentangles the audio representation: inputs are continuous embeddings projected directly from raw waveform chunks (about 80 ms each), while outputs are discrete audio codes. This avoids the artifacts of discretizing the input while keeping training and generation autoregressive in both modalities on the output side.
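The disentangled I/O scheme above can be sketched in a few lines. This is a toy illustration with made-up shapes and a stand-in "projection" (average pooling), not Liquid AI's actual code; only the 80 ms chunk size comes from the article, and the 16 kHz sampling rate is an assumption.

```python
SAMPLE_RATE = 16_000                          # assumed sampling rate (not stated in the article)
CHUNK_MS = 80                                 # chunk size from the article
CHUNK_LEN = SAMPLE_RATE * CHUNK_MS // 1000    # samples per chunk (1280 at 16 kHz)

def chunk_waveform(wave):
    """Split a raw waveform (list of floats) into fixed 80 ms chunks."""
    return [wave[i:i + CHUNK_LEN]
            for i in range(0, len(wave) - CHUNK_LEN + 1, CHUNK_LEN)]

def project_chunk(chunk, dim=8):
    """Toy continuous 'embedding': average-pool the chunk into `dim` buckets.
    A learned linear projection would take this place in the real model."""
    step = len(chunk) // dim
    return [sum(chunk[j * step:(j + 1) * step]) / step for j in range(dim)]

wave = [0.0] * SAMPLE_RATE                    # 1 second of silence
chunks = chunk_waveform(wave)
print(len(chunks))                            # 12 full 80 ms chunks fit in 1 s
```

The point of the sketch: the input path never quantizes the waveform, whereas the output path (not shown) emits discrete codebook indices.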

The released checkpoint implements:

  • Backbone: LFM2 (hybrid conv + attention), 1.2B parameters (LM only).
  • Audio encoder: FastConformer (~115M, canary-180m-flash).
  • Audio decoder: RQ-Transformer predicting discrete Mimi codec tokens (eight codebooks).
  • Context: 32,768 tokens; vocabulary: 65,536 (text) / 2049×8 (audio).
  • Precision: bfloat16; license: LFM Open License v1.0; language: English.
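The spec sheet above, collected into a plain config dict for reference. Values are from the article; the field names are my own and not part of any Liquid AI API.

```python
# Spec sheet from the article as a config dict (field names are hypothetical).
LFM2_AUDIO_CONFIG = {
    "backbone": "LFM2 (hybrid conv + attention)",
    "lm_params": 1_200_000_000,               # 1.2B, LM only
    "audio_encoder": "FastConformer (~115M, canary-180m-flash)",
    "audio_decoder": "RQ-Transformer over Mimi codec tokens",
    "codebooks": 8,
    "context_tokens": 32_768,
    "text_vocab": 65_536,
    "audio_vocab_per_codebook": 2_049,
    "precision": "bfloat16",
    "license": "LFM Open License v1.0",
}

# Total audio vocabulary entries across all codebooks:
print(LFM2_AUDIO_CONFIG["audio_vocab_per_codebook"]
      * LFM2_AUDIO_CONFIG["codebooks"])      # 16392
```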

Two generation modes for real-time agents

  • Interleaved generation: for live speech-to-speech, the model alternates audio and text tokens to minimize perceived latency.
  • Sequential generation: for ASR and TTS, modalities switch turn by turn.
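The two schedules can be contrasted with a toy decoding loop. This sketch only models the ordering of modalities; the token choice is a stub, not the real model.

```python
def decode(steps, mode):
    """Return the modality schedule for `steps` decoding steps (toy model)."""
    out = []
    for t in range(steps):
        if mode == "interleaved":
            # alternate modalities so audio starts flowing early
            out.append("text" if t % 2 == 0 else "audio")
        else:
            # sequential: finish one modality, then switch (e.g. ASR: audio in, text out)
            out.append("audio" if t < steps // 2 else "text")
    return out

print(decode(6, "interleaved"))  # ['text', 'audio', 'text', 'audio', 'text', 'audio']
print(decode(6, "sequential"))   # ['audio', 'audio', 'audio', 'text', 'text', 'text']
```

The interleaved schedule is what lets a speech-to-speech agent begin emitting audio before the full text response is planned out.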

Liquid AI ships a Python package, liquid-audio, plus a Gradio demo that reproduces these behaviors.

Latency:

Liquid AI reports end-to-end latency below 100 ms from a 4-second audio query to the first audible response, a proxy for perceived responsiveness in interactive use, and states the model is faster than comparable models below 1.5B parameters under their setup.
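"Time to first audible response" is easy to measure for any streaming generator. A minimal sketch, assuming the model exposes a stream of (modality, payload) pairs; the stream here is a stub, not the liquid-audio API.

```python
import time

def first_audio_latency_ms(stream):
    """Return ms elapsed until the first audio item appears in `stream`,
    or None if the stream ends without emitting audio."""
    start = time.perf_counter()
    for token_type, _payload in stream:
        if token_type == "audio":
            return (time.perf_counter() - start) * 1000.0
    return None

def fake_stream():
    # stand-in for a model's interleaved output stream
    yield ("text", "ok")
    yield ("audio", b"\x00" * 1280)   # first audio chunk

latency = first_audio_latency_ms(fake_stream())
print(latency is not None and latency >= 0.0)  # True
```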

VoiceBench and ASR results

On VoiceBench, a suite of nine audio-assistant evaluations, Liquid AI reports an overall score of 56.78 for LFM2-Audio-1.5B. The blog chart lists per-task numbers (e.g. AlpacaEval 3.71, CommonEval 3.49, WildVoice 3.17), and the comparison table sets these against larger models such as Qwen2.5-Omni-3B and Moshi-7B. (VoiceBench is an external benchmark introduced for LLM-based voice assistants in 2024.)

The Hugging Face model card provides an additional VoiceBench table (with closely related, but not identical, per-task values) and classic ASR word error rates, where LFM2-Audio matches or improves on Whisper-large-v3-turbo on some datasets despite being a generalist speech–text model. For example (lower is better): AMI 15.36 vs. 16.13 (Whisper-large-v3-turbo); LibriSpeech-clean 2.03 vs. 2.10.
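The ASR figures above are word error rates (WER). For reference, a minimal implementation via word-level edit distance; this is the standard definition, not Liquid AI's evaluation code.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(r)][len(h)] / len(r)

print(round(wer("the cat sat", "the cat sit"), 4))   # 0.3333 (one substitution in three words)
```

A reported WER of 2.03 on LibriSpeech-clean thus means roughly 2 word errors per 100 reference words.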

https://huggingface.co/LiquidAI/LFM2-Audio-1.5B

Why does this matter for voice AI?

Typical “omni” stacks couple ASR → LLM → TTS, which adds latency and brittle interfaces. With a single backbone, continuous input embeddings, and discrete output codes, LFM2-Audio reduces glue code and allows early audio emission. For developers, that means faster response times and simpler pipelines, with one model covering ASR, TTS, and conversational agents. Liquid AI provides code, demo entry points, and distribution through Hugging Face.


Take a look at the GitHub page and the Hugging Face model card for technical details, tutorials, code, and notebooks.


