Open-source MoEs can power agentic workflows at a fraction of the cost of flagship models while sustaining long-horizon tool use across shell, browser, and code via MCP. MiniMax has released MiniMax-M2, a mixture-of-experts (MoE) model optimized for coding and agentic workflows, with weights published on Hugging Face under the MIT license. The model is designed for long-horizon planning and multi-file editing: it has 229B total parameters with about 10B activated per token, which keeps latency and memory in check.
Why does activation size matter in the architecture?
MiniMax-M2 is a compact MoE that routes about 10B active parameters per token. Smaller activations allow more concurrent runs of CI, browse, and retrieval chains, and they reduce tail latency and memory pressure in plan, act, and verify loops. This compute budget is what underpins the model's speed and cost claims relative to dense models of comparable quality.
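A back-of-envelope sketch makes the activation budget concrete. The figures below are the article's own (229B total, ~10B active); the per-token FLOP ratio is a rough approximation that ignores attention and routing overhead:

```python
# Back-of-envelope sparsity math for MiniMax-M2 (figures from the article).
TOTAL_PARAMS_B = 229   # total parameters, in billions
ACTIVE_PARAMS_B = 10   # parameters activated per token, in billions

# Fraction of the model touched on any single token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"Active fraction per token: {active_fraction:.1%}")

# A dense model of the same total size spends compute on every weight each
# token; the MoE spends it only on the routed ~10B slice.
dense_flop_ratio = TOTAL_PARAMS_B / ACTIVE_PARAMS_B
print(f"Per-token FLOP ratio vs. an equally sized dense model: ~{dense_flop_ratio:.0f}x")
```

The roughly 4% activation rate is why an agent orchestrator can afford several of these loops in flight at once.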
MiniMax-M2 is an interleaved thinking model. Internal reasoning is wrapped in `<think>...</think>` blocks, and users are told to retain these blocks across turns: removing them degrades quality on multi-step tasks and tool chains. This requirement is stated explicitly on the Hugging Face model page.
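A minimal sketch of what this requirement means for client code, assuming the common OpenAI-style message format; the `strip_think` helper is hypothetical and only appropriate for final display, never for history sent back to the model:

```python
# Sketch: retaining interleaved <think> blocks in conversation history,
# as the M2 model page instructs. Message fields follow the common
# OpenAI-style chat format; strip_think is a hypothetical display helper.
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_think(text: str) -> str:
    """For end-user display only. Stripping these segments from the
    history sent back to the model degrades multi-step quality."""
    return THINK_RE.sub("", text).strip()

history = [
    {"role": "user", "content": "Fix the failing test in utils.py"},
    {
        "role": "assistant",
        # The raw model output, reasoning included, goes back into history.
        "content": "<think>First inspect the traceback...</think>I'll run the tests.",
    },
]

# Correct: send the full history, <think> blocks intact, on the next turn.
next_request = {"model": "MiniMax-M2", "messages": history}

# Display path: strip only what the end user sees.
display_text = strip_think(history[-1]["content"])
print(display_text)  # -> "I'll run the tests."
```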
Benchmarks for coding and agents
MiniMax reports a suite of coding and agent evaluations that sit closer to developer workflows than static QA: 46.3 on Terminal-Bench, 36.2 on Multi-SWE-Bench, and 44.0 on BrowseComp.

MiniMax’s official announcement emphasizes pricing at about 8% of Claude Sonnet’s and roughly 2x the speed, along with a window of free access. The same page lists the exact token prices and the deadline for the free trial.
Comparing M1 and M2
| Aspect | MiniMax M1 | MiniMax M2 |
|---|---|---|
| Total parameters | 456B | 229B in model card metadata; model card text says 230B |
| Active parameters per token | 45.9B | ~10B |
| Core design | Mixture of experts with Lightning Attention | Sparse mixture of experts focused on coding and agentic workflows |
| Thinking format | No think-tag protocol; thinking-budget variants of 40k and 80k tokens used in RL training | Interleaved `<think>...</think>` segments that must be retained across turns |
| Benchmarks highlighted | AIME, LiveCodeBench, SWE-bench Verified, TAU-bench, long-context MRCR, MMLU-Pro | BrowseComp (text only), GAIA (text), Artificial Analysis intelligence suite |
| Inference defaults | temperature 1.0, top-p 1.0 | Model card lists temperature 1.0, top-p 0.95, top-k 40 |
| Serving guide | vLLM recommended; Transformers route also described | vLLM and SGLang recommended; tool-calling guide included |
| Primary focus | Long-context reasoning and efficient test-time compute, trained with CISPO reinforcement learning | Native agentic workflows across shell, browser, retrieval, and code runners |
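The M2 inference defaults above translate directly into a request payload. A minimal sketch, assuming an OpenAI-compatible endpoint such as those exposed by vLLM or SGLang; the endpoint URL is a placeholder, and `top_k` is a server-side extension rather than part of the core OpenAI schema:

```python
# Sketch: a chat request using the sampling defaults from the M2 model card
# (temperature 1.0, top-p 0.95, top-k 40). Endpoint URL is a placeholder.
import json

payload = {
    "model": "MiniMax-M2",
    "messages": [{"role": "user", "content": "Plan the next refactor step."}],
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 40,  # accepted by vLLM/SGLang OpenAI-compatible servers
}

# e.g. POST this JSON to http://localhost:8000/v1/chat/completions
print(json.dumps(payload, indent=2))
```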
Key Takeaways
- Weights for M2 ship as safetensors in F32 and BF16, plus an FP8 variant (F8_E4M3), all under MIT.
- The compact MoE design, 229B total parameters with about 10B active per token, ties the card's claims of lower memory use and lower tail latency to the plan, act, and verify loops typical of agents.
- Internal reasoning outputs are wrapped in `<think>...</think>` blocks, and the model card explicitly instructs users to retain these segments in conversation history, warning that removing them hurts performance on multi-step and tool-use tasks.
- Results are reported for Terminal-Bench, Multi-SWE-Bench, BrowseComp, and other benchmarks, along with reproducibility notes and scaffolds. Day-0 serving is documented for SGLang and vLLM with concrete deployment guides.
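The precision variants matter for deployment planning. A rough weight-memory estimate from standard bytes-per-parameter sizes and the 229B total; this counts weights only, not KV cache or activations:

```python
# Rough weight-memory estimate for the published precision variants,
# using 229B total parameters. Weights only; excludes KV cache.
BYTES_PER_PARAM = {"F32": 4, "BF16": 2, "F8_E4M3": 1}
TOTAL_PARAMS = 229e9

for fmt, nbytes in BYTES_PER_PARAM.items():
    gib = TOTAL_PARAMS * nbytes / 2**30
    print(f"{fmt}: ~{gib:,.0f} GiB of weights")
```

Only the routed ~10B parameters are active per token, but all expert weights must still reside in (possibly sharded) memory, which is why the FP8 variant is attractive for smaller clusters.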
Editor’s Notes
MiniMax-M2 is an open-weight, MIT-licensed mixture-of-experts design with 229B total parameters and about 10B active per token, aimed at agent loops and coding with lower latency and memory. Safetensors are available on Hugging Face in FP32 and BF16, along with deployment notes and a chat template. The API lists Anthropic-compatible endpoints and pricing, with a short free-evaluation period, and recipes for vLLM or SGLang cover local serving and benchmarking. MiniMax-M2 is a strong open release.
Take a look at the API Doc, the Weights, and the Repo for examples of how to get started.


