LongCat-Flash Omni: An Open-Source SOTA Model for Real-Time Audio and Visual Interaction. 560B Parameters, 27B Activated.

Tech · By Gavin Wallace · 02/11/2025 · 4 Mins Read

How can a model listen, watch, read, and respond in real time across audio, text, images, and video without losing efficiency? Meituan's LongCat team has released LongCat-Flash Omni, an open-source Mixture-of-Experts model with 560 billion total parameters, of which roughly 27 billion are activated per token. It extends the LongCat-Flash text backbone to audio, vision, and video while keeping a 128K-token context window, so real-time conversation and document-level understanding run in a single stack.

https://github.com/meituan-longcat/LongCat-Flash-Omni?tab=readme-ov-file

The Architecture of Modular Attachments

LongCat-Flash Omni keeps the same language model and attaches perception modules around it. A LongCat ViT encoder processes both images and video frames, eliminating the need for a separate video tower. An audio encoder together with the LongCat audio codec turns speech into discrete audio tokens, which means speech can be emitted directly from the LLM's output stream. This is what enables real-time audio-visual interaction.
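The modular-attachment idea can be sketched as follows. This is an illustrative toy, not LongCat's actual API: the encoder functions, field names, and token vocabulary size are all assumptions; the point is only that perception encoders map raw inputs to token/embedding sequences that are spliced into the LLM input alongside text.

```python
# Illustrative sketch (NOT LongCat's real API): perception encoders map raw
# inputs to sequences that join the LLM token stream.

def vit_encode(frames):
    """Stand-in ViT encoder: one embedding vector per image/video frame."""
    return [[float(p) for p in frame] for frame in frames]

def audio_encode(samples, codec_rate=5):
    """Stand-in audio encoder + codec: groups raw samples into discrete
    audio tokens (vocabulary size 1024 is an arbitrary assumption)."""
    return [hash(tuple(samples[i:i + codec_rate])) % 1024
            for i in range(0, len(samples), codec_rate)]

def build_llm_input(text_tokens, frames, audio_samples):
    """Combine text tokens, visual embeddings, and audio tokens into one
    multimodal input for the (unchanged) language model."""
    return {
        "text": text_tokens,
        "vision": vit_encode(frames),
        "audio": audio_encode(audio_samples),
    }

inputs = build_llm_input(
    text_tokens=[101, 2054, 102],        # e.g. a tokenized question
    frames=[[0.1, 0.2], [0.3, 0.4]],     # two dummy video frames
    audio_samples=list(range(10)),       # ten dummy PCM samples
)
print(len(inputs["vision"]), len(inputs["audio"]))  # 2 2
```

Because speech comes back out as discrete audio tokens, the same decoding loop that emits text can emit audio, which is why no separate speech head is required.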

Watching Streaming Content with Feature Interleaving

The researchers describe chunk-wise audio-visual feature interleaving: audio features, video features, and timestamps are packed into segments of one second. The report does not tie the sampling rules to whether the user or the model is speaking, so duration-conditioned sampling is the more accurate description. This keeps latency low while preserving spatial context for GUI, OCR, and video-QA tasks.
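A minimal sketch of the chunking step, under stated assumptions: the feature rates (10 audio features/s, 2 video frames/s) and the chunk layout are illustrative placeholders, since the article only specifies the 1-second segment size.

```python
# Hypothetical sketch of chunk-wise audio-visual interleaving: pack every
# feature whose timestamp falls in the same 1-second window into one segment.
# Feature rates and field names are assumptions, not from the paper.

def interleave(audio_feats, video_feats, audio_hz=10.0, video_fps=2.0):
    """Group audio and video features into 1-second chunks by timestamp."""
    duration = max(len(audio_feats) / audio_hz, len(video_feats) / video_fps)
    chunks = []
    for sec in range(int(duration + 0.999)):  # ceil to whole seconds
        a = [f for i, f in enumerate(audio_feats) if sec <= i / audio_hz < sec + 1]
        v = [f for i, f in enumerate(video_feats) if sec <= i / video_fps < sec + 1]
        chunks.append({"t": sec, "audio": a, "video": v})
    return chunks

# 3 seconds of dummy features: 30 audio features at 10/s, 6 frames at 2 fps.
chunks = interleave(list(range(30)), list(range(6)))
print(len(chunks), len(chunks[0]["audio"]), len(chunks[0]["video"]))  # 3 10 2
```

Packing per second rather than per utterance is what keeps latency bounded: the model can start attending to a segment as soon as that second of input has arrived.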

From Text to Omni

Training follows a staged curriculum, starting from the text-only LongCat-Flash backbone and progressively adding audio, image, and video data until all modalities are handled jointly.

System Design Modality Decoupled Paradigm

Meituan employs modality-decoupled parallelism because the encoders differ structurally from the LLM. The LLM combines pipeline, context, expert, and hybrid-sharding parallelism, while the audio and vision encoders use hybrid sharding. A ModalityBridge aligns the embeddings between the two. According to the research team, multimodal supervised fine-tuning retains more than 90 percent of the text-only training throughput.
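The core idea of modality-decoupled parallelism can be captured as a per-module sharding plan rather than one global strategy. The sketch below is purely conceptual: the strategy names follow the article, but the mapping itself (and the treatment of the bridge) is an assumption for illustration.

```python
# Conceptual sketch: each module family gets its own parallelism plan
# instead of a single global one. The mapping is illustrative only.

PARALLELISM = {
    "llm":             ["pipeline", "context", "expert", "hybrid_sharding"],
    "vision_encoder":  ["hybrid_sharding"],
    "audio_encoder":   ["hybrid_sharding"],
    "modality_bridge": ["replicated"],  # assumption: small module, replicate it
}

def strategies_for(module):
    """Look up the sharding plan for a module; unknown modules default to
    plain replication."""
    return PARALLELISM.get(module, ["replicated"])

print(strategies_for("llm"))             # full hybrid plan for the LLM
print(strategies_for("vision_encoder"))  # encoders only need hybrid sharding
```

Decoupling matters because encoders are small and activation-heavy while the MoE LLM is parameter-heavy; forcing both under one plan would waste devices on whichever side fits it worse.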

https://github.com/meituan-longcat/LongCat-Flash-Omni?tab=readme-ov-file

Benchmarks & Positioning

LongCat-Flash Omni scores 61.4 on OmniBench, slightly ahead of Qwen3-Omni-Instruct (58.5) and Qwen2.5-Omni (55.0) but below Gemini 2.5 Pro at 66.8. On Video-MME it reaches 78.2, comparable to GPT-4o and Gemini 2.5 Flash. On VoiceBench it scores 88.7, slightly above GPT-4o Audio.

What you need to know

  1. LongCat-Flash Omni is an open-source model built on Meituan's 560B-parameter MoE backbone; it activates roughly 27B parameters per token via shortcut-connected MoE with zero-computation experts.
  2. The model attaches unified image/video and streaming-audio paths to the existing LongCat-Flash LLM. Video is sampled at 2 frames per second, adjusted by duration-conditioned sampling.
  3. LongCat-Flash Omni scores 61.4 on OmniBench, above Qwen3-Omni-Instruct at 58.5, while Gemini 2.5 Pro leads at 66.8.
  4. Meituan uses modality-decoupled parallelism: the audio and vision encoders use hybrid sharding, while the LLM uses pipeline, context, and expert parallelism.
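The 2 fps sampling with duration conditioning (point 2 above) can be sketched as follows. The frame cap and the uniform-spreading fallback are assumptions made for illustration; the article only states the base rate and that sampling is adjusted by clip duration.

```python
# Illustrative sketch of duration-conditioned frame sampling: sample video
# at a base 2 fps, but cap total frames for long clips so the token budget
# stays bounded. The cap of 64 frames is an assumption, not from the paper.

def sample_frame_times(duration_s, base_fps=2.0, max_frames=64):
    """Return timestamps (in seconds) of the frames to keep for a clip."""
    n = int(duration_s * base_fps)
    if n <= max_frames:
        # Short clip: plain fixed-rate sampling at base_fps.
        return [i / base_fps for i in range(n)]
    # Long clip: spread max_frames uniformly across the whole duration.
    return [i * duration_s / max_frames for i in range(max_frames)]

print(len(sample_frame_times(10)))   # 20 -> a 10 s clip at 2 fps
print(len(sample_frame_times(300)))  # 64 -> a 5 min clip hits the cap
```

The effect is that short clips keep full temporal resolution while hour-long videos still fit the 128K context.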

Meituan's latest release is a practical attempt to bring omni-modal interaction into the mainstream rather than a one-off experiment. It keeps the 560B shortcut-connected MoE backbone with roughly 27B activated parameters, so the LongCat language backbone stays compatible. Streaming audio-visual perception relies on 2 fps video sampling with duration-conditioned adjustment, which keeps latency low without sacrificing spatial grounding. Through modality-decoupled parallelism, the team reports retaining more than 90 percent of text-only training throughput.



