AI-trends.today

The LiquidAI LFM2-ColBERT350M is a new small model that brings late interaction retrieval to multilingual and cross-linguistic RAG

Tech · By Gavin Wallace · 29/10/2025 · 4 Mins Read
Can a compact late-interaction retriever index documents once and still deliver fast, accurate cross-lingual results? Liquid AI has released LFM2-ColBERT-350M, a compact late-interaction retriever built for multilingual and cross-lingual search. Documents can be indexed in one language and retrieved with high accuracy from queries in others. Liquid AI's team reports inference speed on par with, or faster than, models 2.3× smaller, which it attributes to the LFM2 backbone. The release ships with a Hugging Face demo and a detailed model card for integration into retrieval-augmented generation (RAG) systems.

https://www.liquid.ai/blog/lfm2-colbert-350m-one-model-to-embed-them-all

Why late interaction matters

Most production systems use either bi-encoders for speed or cross-encoders for accuracy. Late interaction aims to deliver both. Queries and documents are encoded separately at the token level, and at query time the system compares token vectors with an operation such as MaxSim. This preserves fine-grained token-to-token interactions without the full cost of joint cross-attention, and it allows document embeddings to be precomputed, which improves ranking precision at low latency. The same model can therefore serve as both retriever and reranker in a single pass.
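To make the scoring operation concrete, here is a minimal NumPy sketch of MaxSim over token embeddings. This is an illustration of the general late-interaction formula, not Liquid AI's implementation; the 4-dimensional toy vectors stand in for the model's 128-dimensional token embeddings.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction MaxSim: for each query token, take the similarity of
    the best-matching document token, then sum over query tokens."""
    sim = query_emb @ doc_emb.T          # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # best doc token per query token

def normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Toy L2-normalized "token embeddings"
query = normalize(np.array([[1.0, 0.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0, 0.0]]))
doc_a = normalize(np.array([[1.0, 0.1, 0.0, 0.0],   # near query token 0
                            [0.0, 1.0, 0.2, 0.0],   # near query token 1
                            [0.0, 0.0, 0.0, 1.0]]))
doc_b = normalize(np.array([[0.0, 0.0, 1.0, 0.0],   # unrelated directions
                            [0.0, 0.0, 0.0, 1.0]]))

scores = {"doc_a": maxsim_score(query, doc_a),
          "doc_b": maxsim_score(query, doc_b)}
# doc_a has tokens aligned with both query tokens, so it ranks higher
```

Because `doc_a` and `doc_b` are encoded independently of the query, their matrices can be computed offline once, which is exactly what makes late interaction cheap at query time.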

Model specification

LFM2-ColBERT-350M has 350 million total parameters. Its 25 layers comprise 18 convolution blocks, 6 attention blocks, and 1 dense layer. Context length is 32k tokens, vocabulary size is 65,536, the similarity function is MaxSim, and the output dimension is 128. Training precision is BF16, and the model is released under the LFM Open License v1.0.

https://huggingface.co/LiquidAI/LFM2-ColBERT-350M

Supported and evaluated languages

The model card lists 8 supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish. Italian and Portuguese also appear in the evaluation, which brings the cross-lingual comparisons between document languages and query languages to 9 languages in total. This distinction matters when deploying systems that must cover specific markets.
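To make the "index once, query in any language" pattern concrete, here is a toy sketch of the late-interaction pipeline: documents are embedded offline into per-token matrices, and queries score against the precomputed index. The `embed` function is a deterministic hashing stand-in for the real LFM2-ColBERT encoder (an assumption for illustration), so only identical tokens match; the actual multilingual model also matches translated tokens across languages.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 128) -> np.ndarray:
    """Stand-in encoder: one deterministic pseudo-embedding per token."""
    vecs = []
    for tok in text.lower().split():
        seed = int.from_bytes(hashlib.sha256(tok.encode()).digest()[:4], "little")
        v = np.random.default_rng(seed).standard_normal(dim)
        vecs.append(v / np.linalg.norm(v))
    return np.stack(vecs)

def maxsim(q: np.ndarray, d: np.ndarray) -> float:
    return float((q @ d.T).max(axis=1).sum())

# Offline: index each document once, each in its own language.
corpus = {
    "en": "colbert uses late interaction retrieval",
    "de": "colbert nutzt late interaction",
    "fr": "les pommes sont rouges",
}
index = {doc_id: embed(text) for doc_id, text in corpus.items()}

# Online: a query scores against the precomputed index without re-encoding docs.
query = embed("what is colbert late interaction")
ranked = sorted(index, key=lambda doc_id: maxsim(query, index[doc_id]),
                reverse=True)
# The unrelated French document should rank last
```

The key design point is the separation of phases: the document side of the index is built once, and adding a new query language requires no re-indexing.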

https://www.liquid.ai/blog/lfm2-colbert-350m-one-model-to-embed-them-all

Setting up the evaluation and key findings

Liquid AI extended the NanoBEIR benchmark with Japanese and Korean and published the extension for reproducibility. LFM2-ColBERT-350M shows stronger multilingual performance than GTE-ModernColBERT-v1 (150M parameters), with the largest gains on German, Arabic, Korean, and Japanese.

The Key Takeaways

  1. MaxSim’s token-level scoring preserves fine-grained interactions while keeping the query and document encoders separate. Document token embeddings can be precomputed, so MaxSim scoring at query time stays fast.
  2. Documents can be indexed in multiple languages and retrieved from any of them. The model card lists 8 supported languages, while the evaluations cover 9 languages for cross-lingual pairs.
  3. On the NanoBEIR multilingual extension, LFM2-ColBERT-350M outperforms the prior late-interaction baseline (GTE-ModernColBERT-v1 at 150M) and maintains English performance.
  4. Inference speed is reported on par with models 2.3× smaller across batch sizes, attributed to the LFM2 backbone.

Editor’s Notes

Liquid AI’s LFM2-ColBERT-350M uses late interaction with MaxSim: it encodes queries and documents separately, then scores token vectors at query time, preserving token-level interactions. The system targets multilingual and cross-lingual retrieval: index once, query in many languages. Evaluations use an extended NanoBEIR benchmark. According to Liquid AI’s team, inference speed is on par with models 2.3× smaller, attributable in part to the LFM2 backbone. This small late-interaction model looks ready for production trials in multilingual RAG.


Take a look at the model weights, demo, and technical details for more.


