How does a large language model with a trillion parameters achieve enterprise-grade performance while cutting its parameter count by 33.3 percent and improving pre-training efficiency by 49 percent? Yuan Lab AI released Yuan3.0 Ultra, a Mixture-of-Experts (MoE) large language model with 1T total parameters, of which 68.8B are activated per token. The architecture was designed to maximize performance on enterprise-specific tasks while maintaining general-purpose capability. By relying on sparsity instead of a dense architecture, Yuan3.0 Ultra can scale without a linear rise in compute cost.
Layer-Adaptive Expert Pruning (LAEP)
Yuan3.0 Ultra takes a new approach to training with its Layer-Adaptive Expert Pruning (LAEP) algorithm. Whereas expert pruning typically occurs after training, LAEP moves it directly into the pre-training phase.
Research on expert load distribution revealed that pre-training divides into two distinct phases:
- Transition phase: expert loads are highly variable as a result of random initialization.
- Stable phase: expert loads have converged, and the relative rankings of experts remain largely unchanged.
Once the stable phase is reached, LAEP prunes experts based on two criteria:
- Individual load constraint (α): experts whose token loads fall significantly below the layer average.
- Cumulative load constraint (β): the subset of experts that together contribute the least to total token processing.
Applying LAEP with β = 0.1 and a varying α pruned the model from an initial 1.5T parameters down to 1T. This 33.3% reduction in total parameters preserves multi-domain performance while dramatically reducing deployment memory requirements. At 1T, the number of experts per layer was reduced from 64 to 48.
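The two constraints above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the thresholds, the exact way α and β interact (here, an expert must violate both constraints to be pruned), and all function names are assumptions.

```python
import numpy as np

def laep_prune(token_loads, alpha=0.1, beta=0.1):
    """Sketch of Layer-Adaptive Expert Pruning for one MoE layer.

    token_loads: per-expert token counts observed during the stable phase.
    alpha: individual load constraint -- flags experts whose load falls
           below alpha * (mean layer load).
    beta:  cumulative load constraint -- flags the least-loaded experts
           whose combined share of processed tokens stays under beta.
    Returns the indices of experts to keep.
    """
    loads = np.asarray(token_loads, dtype=float)
    mean_load = loads.mean()

    # Criterion 1: individual load well below the layer average.
    under_alpha = loads < alpha * mean_load

    # Criterion 2: the lowest-loaded experts whose cumulative share
    # of all processed tokens stays under beta.
    order = np.argsort(loads)                       # ascending by load
    cum_share = np.cumsum(loads[order]) / loads.sum()
    under_beta = np.zeros_like(under_alpha)
    under_beta[order[cum_share <= beta]] = True

    prune = under_alpha & under_beta                # violate both constraints
    return np.flatnonzero(~prune)
```

For example, with loads `[100, 90, 80, 1, 2, 0.5]`, the three near-idle experts fall below both thresholds and are pruned, keeping experts 0, 1, and 2.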
Expert Rearrangement for Hardware Efficiency
MoE models often suffer device-level load imbalance when experts are distributed across a GPU cluster. Yuan3.0 implements an Expert Rearrangement algorithm that ranks experts by token load and then greedily distributes them across GPUs to minimize the variance in per-device token load.
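The rank-then-greedy placement described above can be sketched with a standard min-heap scheduler. This is an illustrative reconstruction under the article's description, not the actual implementation; the function name and data layout are assumptions.

```python
import heapq

def rearrange_experts(expert_loads, num_gpus):
    """Greedy expert placement: sort experts by token load (heaviest
    first), then assign each to the currently least-loaded GPU so that
    per-device token loads stay balanced."""
    # Min-heap of (current_load, gpu_id); the lightest GPU is always on top.
    heap = [(0.0, g) for g in range(num_gpus)]
    heapq.heapify(heap)
    placement = {g: [] for g in range(num_gpus)}

    # Place the heaviest experts first so large loads are spread early.
    for expert, load in sorted(enumerate(expert_loads), key=lambda kv: -kv[1]):
        cur, gpu = heapq.heappop(heap)
        placement[gpu].append(expert)
        heapq.heappush(heap, (cur + load, gpu))
    return placement
```

With loads `[10, 9, 2, 1]` on 2 GPUs, the heaviest and lightest experts pair up (GPU 0 gets experts 0 and 3, GPU 1 gets 1 and 2), giving each device a total load of 11.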
| Method | TFLOPS per GPU |
| --- | --- |
| Base Model (1515B) | 62.14 |
| DeepSeek-V3 Loss-Free Balancing | 80.82 |
| Yuan3.0 Ultra (LAEP) | 92.60 |
Pre-training throughput improved by 49% overall. The improvement can be attributed to two factors:
- Model pruning: contributes a 32.4% efficiency gain by reducing the number of active experts.
- Expert rearrangement: contributes a further 15.9% efficiency gain by balancing device loads.
A Refined RIRM to Reduce Overthinking
At the reinforcement learning (RL) stage, the model uses a refined Reflection-Inhibition Reward Mechanism (RIRM) to avoid excessively long reasoning chains on simple problems.
The reflection reward, $R_{ver}$, is calculated with a threshold-based penalty system:
- $r_{min} = 0$: the ideal number of reflection steps for a direct response.
- $r_{max} = 3$: the maximum tolerated number of reflection steps.
As the number of reflection steps approaches $r_{max}$, the reward for correct samples decreases, while samples that "overthink" (exceeding $r_{max}$) receive the maximum penalty. Training with RIRM improved accuracy by 16.33% and reduced output token length by 14.38%.
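The thresholding behavior can be sketched as below. The exact shape of $R_{ver}$ is not given in the article, so the linear decay, the reward and penalty magnitudes, and the zero reward for in-budget incorrect answers are all assumptions; only the $r_{min}$/$r_{max}$ thresholds and the decay-then-penalize pattern come from the source.

```python
def reflection_reward(is_correct, r, r_min=0, r_max=3):
    """Sketch of a threshold-based reflection reward.

    r: number of reflection steps in the sampled response.
    Correct answers earn the full reward at r_min and linearly less as
    r approaches r_max; anything exceeding r_max gets the maximum penalty.
    """
    if r > r_max:
        return -1.0          # overthinking: maximum penalty
    if not is_correct:
        return 0.0           # wrong but within budget: neutral (assumption)
    # Correct: full reward at r_min, decaying linearly toward r_max.
    return 1.0 - (r - r_min) / (r_max - r_min + 1)
```

Under these assumed values, a correct direct answer ($r = 0$) scores 1.0, a correct answer at the tolerance limit ($r = 3$) scores only 0.25, and any response with more than three reflection steps is penalized at -1.0 regardless of correctness.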

Enterprise Benchmark Performance
Yuan3.0 Ultra was evaluated against leading industry models, including GPT-5.2 and Gemini 3.1 Pro, across specialized enterprise benchmarks.
| Benchmark | Task Category | Yuan3.0 Ultra Score | Top Competitor Score |
| --- | --- | --- | --- |
| Docmatix | Multimodal RAG | 67.4% | 48.4% (GPT-5.2) |
| ChatRAG | Text Retrieval (Avg.) | 68.2% | 53.6% (Kimi K2.5) |
| MMTab | Table Reasoning | 62.3% | 66.2% (Kimi K2.5) |
| SummaryEval | Text Summarization | 62.8% | 49.9% (Claude Opus 4.6) |
| Spider 1.0 | Text-to-SQL | 83.9% | 82.7% (Kimi K2.5) |
| BFCL V3 | Tool Calling | 67.8% | 78.8% (Gemini 3.1 Pro) |
Yuan3.0 Ultra achieves the highest accuracy in multimodal retrieval (Docmatix), long-context retrieval (ChatRAG), summarization (SummaryEval), and text-to-SQL (Spider 1.0), while remaining competitive on table reasoning and tool calling.
Check out the Paper and the Repo.

