Liquid AI has introduced LFM2.5, a family of small foundation models based on LFM2 and designed for device and edge deployments. The family includes LFM2.5-1.2B-Base and LFM2.5-1.2B-Instruct, along with Japanese, audio, and vision-language variants. Open weights are available on Hugging Face, and the models are also exposed via the LEAP platform.
Architecture and Training Recipe
LFM2.5 retains the hybrid LFM2 architecture, originally designed for fast inference and efficient memory usage on CPUs and GPUs, while scaling up both the training data and the post-training pipeline. Pretraining of the backbone is extended from 10T to 28T tokens. The instruct variant then receives supervised fine-tuning, preference alignment, and large-scale multi-stage reinforcement learning focused on math and knowledge reasoning, instruction following, and tool use.
Text model at the 1B scale
LFM2.5-1.2B-Instruct is the general-purpose text model. Liquid AI reports benchmark results on GPQA, MMLU Pro, IFEval, IFBench, and several function-calling and coding suites. The model reaches 38.89 on GPQA and 44.35 on MMLU Pro, while other 1B-class open models such as Llama-3.2-1B-Instruct and Gemma-3-1B-IT score lower.
On IFEval and IFBench, LFM2.5-1.2B-Instruct scores 86.23 and 48.33 respectively. Both benchmarks evaluate multi-step instruction following, a capability that also underpins reliable function calling. These values are higher than the 1B baselines shown in Liquid AI's comparison table.
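For quick reference, the headline numbers quoted above can be collected into a small table. The scores below are exactly those reported in this article; baseline values for the peer models are omitted rather than guessed:

```python
# Reported benchmark scores for LFM2.5-1.2B-Instruct (as quoted in the article).
scores = {
    "GPQA": 38.89,
    "MMLU Pro": 44.35,
    "IFEval": 86.23,
    "IFBench": 48.33,
}

def print_scores(scores: dict[str, float]) -> None:
    """Render the scores as a simple aligned text table."""
    width = max(len(name) for name in scores)
    for bench, value in scores.items():
        print(f"{bench:<{width}}  {value:>6.2f}")

print_scores(scores)
```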
Japanese-optimized variant
LFM2.5-1.2B-JP is a Japanese-optimized text model built on the same core. It targets Japanese tasks such as JMMLU and M-IFEval, as well as GSM8K. The checkpoint improves over the general instruct model on Japanese tasks, competing with other multilingual small models such as Qwen3-1.7B, Llama-3.2-1B-Instruct, and Gemma-3-1B-IT.
Multimodal Edge Workloads: Vision Language Model
LFM2.5-VL-1.6B is the latest version of the vision-language model. It uses LFM2.5-1.2B-Base as its language backbone and adds a vision tower for image understanding. The model is evaluated on a range of OCR and visual-reasoning benchmarks, including MMStar, MM-IFEval, BLINK, InfoVQA, OCRBench v2, RealWorldQA, MMMU, and Multilingual MMBench, and improves over LFM2-VL-1.6B on most metrics. It is intended for real-world tasks such as reading user interfaces, document comprehension, and multi-image reasoning.
Audio language model for native speech generation
LFM2.5-Audio-1.5B is an audio-native language model that supports both text and audio outputs. The model is presented as audio-to-audio and uses an audio tokenizer that is reported to be eight times faster on constrained hardware than previous Mimi-based detokenizers.
The model supports two major generation modes. Interleaved generation is intended for real-time speech-to-speech conversational agents, where latency dominates. Sequential generation is used for tasks such as automatic speech recognition and text-to-speech. Users can switch between the two modes without re-initializing the model. The audio stack is trained with quantization-aware training at low precision.
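The practical consequence of the dual-mode design is that mode selection is a per-request routing decision rather than a model reload. The sketch below illustrates that idea only; the class and method names are hypothetical and do not reflect Liquid AI's actual API:

```python
from enum import Enum, auto

class GenerationMode(Enum):
    INTERLEAVED = auto()  # real-time speech-to-speech, latency-sensitive
    SEQUENTIAL = auto()   # ASR / TTS style, full-sequence processing

class AudioLM:
    """Toy stand-in for a single loaded audio language model.

    The point of the sketch: one model instance serves both modes,
    so switching modes is a routing decision, not a reload.
    """
    def __init__(self) -> None:
        self.load_count = 1  # the model is loaded exactly once

    def generate(self, request: str, mode: GenerationMode) -> str:
        if mode is GenerationMode.INTERLEAVED:
            # emit text and audio tokens interleaved, chunk by chunk
            return f"interleaved({request})"
        # produce the full output sequence in one pass
        return f"sequential({request})"

model = AudioLM()
a = model.generate("hello", GenerationMode.INTERLEAVED)
b = model.generate("transcribe.wav", GenerationMode.SEQUENTIAL)
assert model.load_count == 1  # no re-initialization between modes
```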

Key Takeaways
- LFM2.5 models are 1.2B-scale hybrids built on the LFM2 architecture, released in Base, Instruct, Japanese, vision-language, and audio-language versions.
- Pretraining of LFM2.5 was extended from 10T to 28T tokens, and the instruct variant adds preference alignment and large-scale multi-stage reinforcement learning, pushing instruction following and tool use beyond other 1B-class baselines.
- LFM2.5-1.2B-Instruct posts strong text-benchmark results at the 1B level, with 38.89 on GPQA and 44.35 on MMLU Pro, and outscores peers such as Llama-3.2-1B-Instruct and Gemma-3-1B-IT on IFEval, and Granite-4.0-1B and Llama-3.2-1B-Instruct on IFBench.
- The family also includes regional and multimodal variants: LFM2.5-1.2B-JP delivers strong results on Japanese benchmarks, while LFM2.5-VL-1.6B and LFM2.5-Audio-1.5B cover vision-language and native-audio workloads.
Asif Razzaq is the CEO of Marktechpost Media Inc. An entrepreneur with a passion for harnessing Artificial Intelligence to benefit society, his latest venture is Marktechpost, a media platform focused on Artificial Intelligence and known for in-depth coverage of machine learning and deep learning news that is technically sound yet accessible to a broad audience. The platform draws over 2 million monthly views.

