Open-source MoEs can power agentic workflows at a fraction of the cost of flagship models while sustaining long-horizon tool use across shell, browser, and code via MCP. MiniMax has released MiniMax-M2, a mixture-of-experts (MoE) model optimized for coding and agentic workflows, with weights published on Hugging Face under the MIT license. The model is designed for long-horizon planning and multi-file editing: it has 229B total parameters with about 10B activated per token, which keeps latency and memory in check.
Why does activation size matter in the architecture?
MiniMax-M2 is a compact MoE that routes about 10B active parameters per token. Smaller activations allow more concurrent runs of CI, browse, and retrieval chains, and they reduce tail latency and memory pressure in plan, act, and verify loops. This compute budget is what underpins the model's speed and cost claims relative to dense models of comparable quality.
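A back-of-envelope sketch makes the activation budget concrete. The figures below are the article's own (229B total, ~10B active); the per-token FLOP ratio is a rough approximation that ignores attention and routing overhead:

```python
# Back-of-envelope sparsity math for MiniMax-M2 (figures from the article).
TOTAL_PARAMS_B = 229   # total parameters, in billions
ACTIVE_PARAMS_B = 10   # parameters activated per token, in billions

# Fraction of the model touched on any single token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"Active fraction per token: {active_fraction:.1%}")

# A dense model of the same total size spends compute on every weight each
# token; the MoE spends it only on the routed ~10B slice.
dense_flop_ratio = TOTAL_PARAMS_B / ACTIVE_PARAMS_B
print(f"Per-token FLOP ratio vs. an equally sized dense model: ~{dense_flop_ratio:.0f}x")
```

The roughly 4% activation rate is why an agent orchestrator can afford several of these loops in flight at once.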
MiniMax-M2 is an interleaved thinking model. Internal reasoning is wrapped in `<think>...</think>` blocks, and users are told to retain these blocks across turns: removing them degrades quality on multi-step tasks and tool chains. This requirement is stated explicitly on the Hugging Face model page.
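A minimal sketch of what this requirement means for client code, assuming the common OpenAI-style message format; the `strip_think` helper is hypothetical and only appropriate for final display, never for history sent back to the model:

```python
# Sketch: retaining interleaved <think> blocks in conversation history,
# as the M2 model page instructs. Message fields follow the common
# OpenAI-style chat format; strip_think is a hypothetical display helper.
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_think(text: str) -> str:
    """For end-user display only. Stripping these segments from the
    history sent back to the model degrades multi-step quality."""
    return THINK_RE.sub("", text).strip()

history = [
    {"role": "user", "content": "Fix the failing test in utils.py"},
    {
        "role": "assistant",
        # The raw model output, reasoning included, goes back into history.
        "content": "<think>First inspect the traceback...</think>I'll run the tests.",
    },
]

# Correct: send the full history, <think> blocks intact, on the next turn.
next_request = {"model": "MiniMax-M2", "messages": history}

# Display path: strip only what the end user sees.
display_text = strip_think(history[-1]["content"])
print(display_text)  # -> "I'll run the tests."
```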
Benchmarks for coding and agents
MiniMax reports a suite of coding and agent evaluations that sit closer to developer workflows than static QA: 46.3 on Terminal-Bench, 36.2 on Multi-SWE-Bench, and 44.0 on BrowseComp.

MiniMax’s official announcement emphasizes pricing at about 8% of Claude Sonnet’s and roughly 2x the speed, along with a window of free access. The same page lists the exact token prices and the deadline for the free trial.
Comparing M1 and M2
| Aspect | MiniMax M1 | MiniMax M2 |
|---|---|---|
| Total parameters | 456B | 229B in model card metadata; model card text says 230B |
| Active parameters per token | 45.9B | ~10B |
| Core design | Mixture of experts with Lightning Attention | Sparse mixture of experts focused on coding and agentic workflows |
| Thinking format | No think-tag protocol; thinking-budget variants of 40k and 80k tokens used in RL training | Interleaved `<think>...</think>` segments that must be retained across turns |
| Benchmarks highlighted | AIME, LiveCodeBench, SWE-bench Verified, TAU-bench, long-context MRCR, MMLU-Pro | BrowseComp (text only), GAIA (text), Artificial Analysis intelligence suite |
| Inference defaults | temperature 1.0, top-p 1.0 | Model card lists temperature 1.0, top-p 0.95, top-k 40 |
| Serving guide | vLLM recommended; Transformers route also described | vLLM and SGLang recommended; tool-calling guide included |
| Primary focus | Long-context reasoning and efficient test-time compute, trained with CISPO reinforcement learning | Native agentic workflows across shell, browser, retrieval, and code runners |
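The M2 inference defaults above translate directly into a request payload. A minimal sketch, assuming an OpenAI-compatible endpoint such as those exposed by vLLM or SGLang; the endpoint URL is a placeholder, and `top_k` is a server-side extension rather than part of the core OpenAI schema:

```python
# Sketch: a chat request using the sampling defaults from the M2 model card
# (temperature 1.0, top-p 0.95, top-k 40). Endpoint URL is a placeholder.
import json

payload = {
    "model": "MiniMax-M2",
    "messages": [{"role": "user", "content": "Plan the next refactor step."}],
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 40,  # accepted by vLLM/SGLang OpenAI-compatible servers
}

# e.g. POST this JSON to http://localhost:8000/v1/chat/completions
print(json.dumps(payload, indent=2))
```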
Key Takeaways
- Weights for M2 ship as safetensors in F32 and BF16, plus an FP8 variant (F8_E4M3), all under MIT.
- The compact MoE design, 229B total parameters with about 10B active per token, ties the card's claims of lower memory use and lower tail latency to the plan, act, and verify loops typical of agents.
- Internal reasoning outputs are wrapped in `<think>...</think>` blocks, and the model card explicitly instructs users to retain these segments in conversation history, warning that removing them hurts performance on multi-step and tool-use tasks.
- Results are reported for Terminal-Bench, Multi-SWE-Bench, BrowseComp, and other benchmarks, along with reproducibility notes and scaffolds. Day-0 serving is documented for SGLang and vLLM with concrete deployment guides.
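The precision variants matter for deployment planning. A rough weight-memory estimate from standard bytes-per-parameter sizes and the 229B total; this counts weights only, not KV cache or activations:

```python
# Rough weight-memory estimate for the published precision variants,
# using 229B total parameters. Weights only; excludes KV cache.
BYTES_PER_PARAM = {"F32": 4, "BF16": 2, "F8_E4M3": 1}
TOTAL_PARAMS = 229e9

for fmt, nbytes in BYTES_PER_PARAM.items():
    gib = TOTAL_PARAMS * nbytes / 2**30
    print(f"{fmt}: ~{gib:,.0f} GiB of weights")
```

Only the routed ~10B parameters are active per token, but all expert weights must still reside in (possibly sharded) memory, which is why the FP8 variant is attractive for smaller clusters.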
Editor’s Notes
MiniMax-M2 is an open-weight, MIT-licensed mixture-of-experts design with 229B total parameters and about 10B active per token, aimed at agent loops and coding with lower latency and memory. Safetensors are available on Hugging Face in FP32 and BF16, along with deployment notes and a chat template. The API lists Anthropic-compatible endpoints and pricing, with a short free-evaluation period, and recipes for vLLM or SGLang cover local serving and benchmarking. MiniMax-M2 is a strong open release.
Take a look at the API Doc, the Weights, and the Repo for examples of how to get started.


