How does a large language model with a trillion parameters achieve enterprise-grade performance while cutting its parameter count by 33.3 percent and improving pre-training efficiency by 49 percent? Yuan Lab AI released Yuan3.0 Ultra, a Mixture-of-Experts (MoE) large language model with 1T total parameters, of which 68.8B are activated per token. The architecture was designed to maximize performance on enterprise-specific tasks while maintaining general-purpose capability. By relying on sparsity instead of a dense architecture, Yuan3.0 Ultra can scale without a linear rise in compute cost.
Layer-Adaptive Expert Pruning (LAEP)
Yuan3.0 Ultra takes a new approach to training with its Layer-Adaptive Expert Pruning (LAEP) algorithm. Whereas expert pruning typically occurs after training, LAEP moves it directly into the pre-training phase.
Research on expert load distribution revealed that pre-training divides into two distinct phases:
- Transition phase: expert loads are highly variable as a result of random initialization.
- Stable phase: expert loads have converged, and the relative rankings of experts remain largely unchanged.
Once the stable phase is reached, LAEP prunes experts based on two criteria:
- Individual load constraint (α): experts whose token loads fall significantly below the layer average.
- Cumulative load constraint (β): the subset of experts that together contribute the least to total token processing.
Applying LAEP with β = 0.1 and a varying α pruned the model from an initial 1.5T parameters down to 1T. This 33.3% reduction in total parameters preserves multi-domain performance while dramatically reducing deployment memory requirements. At 1T, the number of experts per layer was reduced from 64 to 48.
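The two constraints above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the thresholds, the exact way α and β interact (here, an expert must violate both constraints to be pruned), and all function names are assumptions.

```python
import numpy as np

def laep_prune(token_loads, alpha=0.1, beta=0.1):
    """Sketch of Layer-Adaptive Expert Pruning for one MoE layer.

    token_loads: per-expert token counts observed during the stable phase.
    alpha: individual load constraint -- flags experts whose load falls
           below alpha * (mean layer load).
    beta:  cumulative load constraint -- flags the least-loaded experts
           whose combined share of processed tokens stays under beta.
    Returns the indices of experts to keep.
    """
    loads = np.asarray(token_loads, dtype=float)
    mean_load = loads.mean()

    # Criterion 1: individual load well below the layer average.
    under_alpha = loads < alpha * mean_load

    # Criterion 2: the lowest-loaded experts whose cumulative share
    # of all processed tokens stays under beta.
    order = np.argsort(loads)                       # ascending by load
    cum_share = np.cumsum(loads[order]) / loads.sum()
    under_beta = np.zeros_like(under_alpha)
    under_beta[order[cum_share <= beta]] = True

    prune = under_alpha & under_beta                # violate both constraints
    return np.flatnonzero(~prune)
```

For example, with loads `[100, 90, 80, 1, 2, 0.5]`, the three near-idle experts fall below both thresholds and are pruned, keeping experts 0, 1, and 2.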
Expert Rearrangement for Hardware Efficiency
MoE models often suffer device-level load imbalance when experts are distributed across a GPU cluster. Yuan3.0 implements an Expert Rearrangement algorithm that ranks experts by token load and then greedily distributes them across GPUs to minimize the variance in per-device token load.
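The rank-then-greedy placement described above can be sketched with a standard min-heap scheduler. This is an illustrative reconstruction under the article's description, not the actual implementation; the function name and data layout are assumptions.

```python
import heapq

def rearrange_experts(expert_loads, num_gpus):
    """Greedy expert placement: sort experts by token load (heaviest
    first), then assign each to the currently least-loaded GPU so that
    per-device token loads stay balanced."""
    # Min-heap of (current_load, gpu_id); the lightest GPU is always on top.
    heap = [(0.0, g) for g in range(num_gpus)]
    heapq.heapify(heap)
    placement = {g: [] for g in range(num_gpus)}

    # Place the heaviest experts first so large loads are spread early.
    for expert, load in sorted(enumerate(expert_loads), key=lambda kv: -kv[1]):
        cur, gpu = heapq.heappop(heap)
        placement[gpu].append(expert)
        heapq.heappush(heap, (cur + load, gpu))
    return placement
```

With loads `[10, 9, 2, 1]` on 2 GPUs, the heaviest and lightest experts pair up (GPU 0 gets experts 0 and 3, GPU 1 gets 1 and 2), giving each device a total load of 11.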
| Method | TFLOPS per GPU |
| --- | --- |
| Base Model (1515B) | 62.14 |
| DeepSeek-V3 Loss-Free Balancing | 80.82 |
| Yuan3.0 Ultra (LAEP) | 92.60 |
Pre-training throughput improved by 49% overall. The improvement can be attributed to two factors:
- Model pruning: contributes a 32.4% efficiency gain by reducing the number of active experts.
- Expert rearrangement: contributes a further 15.9% efficiency gain by balancing device loads.
A Refined RIRM to Reduce Overthinking
At the reinforcement learning (RL) stage, the model uses a refined Reflection-Inhibition Reward Mechanism (RIRM) to avoid excessively long reasoning chains on simple problems.
The reflection reward, $R_{ver}$, is calculated with a threshold-based penalty system:
- $r_{min} = 0$: the ideal number of reflection steps for a direct response.
- $r_{max} = 3$: the maximum tolerated number of reflection steps.
As the number of reflection steps approaches $r_{max}$, the reward for correct samples decreases, while samples that "overthink" (exceeding $r_{max}$) receive the maximum penalty. Training with RIRM improved accuracy by 16.33% and reduced output token length by 14.38%.
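The thresholding behavior can be sketched as below. The exact shape of $R_{ver}$ is not given in the article, so the linear decay, the reward and penalty magnitudes, and the zero reward for in-budget incorrect answers are all assumptions; only the $r_{min}$/$r_{max}$ thresholds and the decay-then-penalize pattern come from the source.

```python
def reflection_reward(is_correct, r, r_min=0, r_max=3):
    """Sketch of a threshold-based reflection reward.

    r: number of reflection steps in the sampled response.
    Correct answers earn the full reward at r_min and linearly less as
    r approaches r_max; anything exceeding r_max gets the maximum penalty.
    """
    if r > r_max:
        return -1.0          # overthinking: maximum penalty
    if not is_correct:
        return 0.0           # wrong but within budget: neutral (assumption)
    # Correct: full reward at r_min, decaying linearly toward r_max.
    return 1.0 - (r - r_min) / (r_max - r_min + 1)
```

Under these assumed values, a correct direct answer ($r = 0$) scores 1.0, a correct answer at the tolerance limit ($r = 3$) scores only 0.25, and any response with more than three reflection steps is penalized at -1.0 regardless of correctness.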

Enterprise Benchmark Performance
Yuan3.0 Ultra was evaluated against leading industry models, including GPT-5.2 and Gemini 3.1 Pro, across specialized enterprise benchmarks.
| Benchmark | Task Category | Yuan3.0 Ultra Score | Top Competitor Score |
| --- | --- | --- | --- |
| Docmatix | Multimodal RAG | 67.4% | 48.4% (GPT-5.2) |
| ChatRAG | Text Retrieval (Avg.) | 68.2% | 53.6% (Kimi K2.5) |
| MMTab | Table Reasoning | 62.3% | 66.2% (Kimi K2.5) |
| SummaryEval | Text Summarization | 62.8% | 49.9% (Claude Opus 4.6) |
| Spider 1.0 | Text-to-SQL | 83.9% | 82.7% (Kimi K2.5) |
| BFCL V3 | Tool Calling | 67.8% | 78.8% (Gemini 3.1 Pro) |
Yuan3.0 Ultra achieves the highest accuracy in multimodal retrieval (Docmatix), long-context retrieval (ChatRAG), summarization (SummaryEval), and text-to-SQL (Spider 1.0), while remaining competitive on table reasoning and tool calling.
Check out the Paper and the Repo.

