The gap between frontier proprietary models and open, highly transparent models is narrowing faster than ever. NVIDIA is now revealing the secrets behind its flagship open model, Nemotron 3 Super: a 120-billion-parameter reasoning model designed specifically for complex multi-agent applications.
Released today, Nemotron 3 Super is the bridge between the lightweight 30-billion-parameter Nemotron 3 Nano and the 500-billion-parameter Nemotron 3 Ultra due in 2026. Delivering up to 7x higher throughput and double the accuracy of its predecessor, the new model represents a huge leap in efficiency and intelligence for developers who don’t want to compromise.
The ‘Five Miracles’ of Nemotron 3 Super
Five major technological advances are responsible for the performance of Nemotron 3 Super.
- Hybrid MoE Architecture: The model intelligently blends memory-efficient Mamba (SSM) layers with high-precision Transformer layers. Only a small fraction of the parameters is activated to generate each token, and the hybrid design achieves a 4x improvement in KV/SSM cache efficiency.
- Multi-Token Prediction (MTP): By predicting several tokens in a single forward pass, the model reasons up to 3x faster on complicated tasks.
- 1-Million-Token Context Window: The context is 7x larger than in previous generations, allowing developers to drop large technical reports, or even entire codebases, directly into the model’s memory, eliminating the need to re-chunk context across multi-step workflows.
- Latent MoE: By compressing experts into a shared latent space, the model can activate four experts for the same compute cost as one, an innovation NVIDIA credits with a substantial accuracy gain.
- NeMo RL Gym Integration: Trained with interactive reinforcement-learning pipelines rather than static text, the model learns from dynamic feedback loops, effectively doubling its intelligence index.
Together, these breakthroughs translate into remarkable output-token efficiency per GPU.
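To make the sparse-activation idea behind the MoE design concrete, here is a minimal sketch of top-k expert routing. The expert counts, sizes, and router here are invented for illustration and are not Nemotron’s actual configuration.

```python
import random

# Illustrative top-k mixture-of-experts router: a sparse MoE activates
# only a small fraction of its parameters per token. The expert count
# and top-k value below are hypothetical, not Nemotron's real config.

NUM_EXPERTS = 64   # hypothetical total experts in one MoE layer
TOP_K = 4          # hypothetical experts activated per token

def route(router_scores: list[float], k: int = TOP_K) -> list[int]:
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return ranked[:k]

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]  # stand-in router logits
active = route(scores)

print("active experts:", active)
print(f"fraction of experts used per token: {TOP_K / NUM_EXPERTS:.2%}")
```

Only 4 of 64 experts fire per token in this toy setup, which is the essence of how MoE models keep per-token compute low while total capacity stays high.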
Nemotron 3 Super: The Ultimate AI Engine?
Nemotron 3 Super is not just another large language model. It is positioned as a reasoning engine that can plan, execute, and verify complex tasks within a system of models. Here is why its architecture changes the game for multi-agent workflows.
- Deeper Reasoning Through Higher Throughput: The model’s 7x higher throughput physically expands its search space. Because it can process and generate tokens faster, it can explore more routes and evaluate better responses. Developers get more reasoning out of the same compute budget, which is crucial when building intelligent, autonomous agents.
- Zero “Re-Reasoning” in Long Workflows: Multi-agent systems constantly pass context between agents. The 1-million-token context window lets the model hold massive amounts of information, such as entire codebases and long multi-step conversation histories between agents, in its own memory, eliminating the latency and cost of forcing the model to reprocess context at every step.
- Agent-Specific Training Environments: The training pipeline was extended with more than 15 interactive reinforcement-learning environments, so Nemotron 3 Super learned optimal paths for autonomous task completion through dynamic simulation loops.
- Advanced Tool Calling: In real-world multi-agent applications, models need to act, not just respond with text. Nemotron 3 Super has proven highly proficient at tool calling, successfully navigating massive pools of available functions, such as dynamically selecting from over 100 different tools in complex cybersecurity workflows.
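To picture the tool-calling pattern, here is a hypothetical sketch of the host-side dispatcher: the model emits a JSON tool call, and the host validates it against a registry before executing. The tool names and call format below are invented for illustration; production systems typically use an OpenAI-compatible function-calling schema.

```python
import json

# Hypothetical sketch of the dispatcher side of tool calling. The model
# chooses one tool from a pool and emits a JSON call; the host validates
# the name against a registry and executes it. Tool names are made up.

TOOL_REGISTRY = {
    "scan_ports": lambda host: f"open ports on {host}: [22, 443]",
    "lookup_cve": lambda cve_id: f"{cve_id}: details not found",
    "block_ip":   lambda ip: f"blocked {ip}",
}

def dispatch(model_output: str) -> str:
    """Parse a model-emitted tool call and run the matching tool."""
    call = json.loads(model_output)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOL_REGISTRY:
        return f"error: unknown tool '{name}'"
    return TOOL_REGISTRY[name](**args)

# Simulated model output selecting one tool out of the pool:
print(dispatch('{"name": "scan_ports", "arguments": {"host": "10.0.0.5"}}'))
```

In a real cybersecurity workflow the registry would hold 100+ tools, and the model’s job is exactly the hard part this sketch glosses over: picking the right `name` and `arguments` for the task.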
Open Sourcing the Entire Stack
NVIDIA isn’t just releasing model weights; it is open-sourcing the model’s entire stack, including training datasets, libraries, and reinforcement-learning environments.
This level of transparency is why Artificial Analysis places Nemotron 3 Super squarely in the ‘most attractive quadrant,’ noting that it achieves the highest openness score while maintaining accuracy competitive with leading proprietary models. That intelligence is built on a completely new pipeline trained on 10 trillion tokens, including an additional 9-10 billion tokens focused solely on advanced reasoning and coding tasks.

Developer Control: Introducing ‘Reasoning Budgets’
NVIDIA understands that developers need precise control over latency, compute cost, and user experience. To solve the classic intelligence-versus-speed dilemma, Nemotron 3 Super introduces highly flexible reasoning controls that give developers unprecedented granular command over the model.
Rather than forcing a single output that fits everyone, developers can dynamically adapt how hard the model ‘thinks’ based on the specific task at hand:
- Full Reasoning (Default): The model is unleashed to maximize its capabilities, exploring a deep search space and multi-step trajectories to solve complex agentic problems.
- The ‘Reasoning Budget’: For latency-sensitive applications, developers can explicitly cap the model’s thinking time and compute allowance. Within that strict constraint, the model intelligently maximizes its search and delivers the most accurate answer the budget allows.
- ‘Low-Effort Mode’: Some prompts do not require deep, multi-agent analysis. When a user just needs a simple, concise answer (standard summarization or basic Q&A) without the overhead of deep reasoning, this toggle turns Nemotron 3 Super into a lightning-fast responder, saving massive amounts of compute and time.
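The three modes above can be pictured as a simple token-budget policy. The sketch below is conceptual only: the budget numbers and the idea of mapping each mode to a maximum count of “thinking tokens” are assumptions for illustration, not Nemotron’s actual API.

```python
# Conceptual sketch of the three reasoning modes as a thinking-token
# budget policy. Mode names mirror the article; all numbers are invented.

BUDGETS = {
    "full":       None,    # uncapped: think as long as the problem needs
    "budgeted":   2_048,   # hard cap on thinking tokens (example value)
    "low_effort": 0,       # skip extended thinking entirely
}

def think(problem_steps: int, mode: str, tokens_per_step: int = 256) -> int:
    """Return how many thinking tokens would be spent under a given mode."""
    wanted = problem_steps * tokens_per_step
    cap = BUDGETS[mode]
    return wanted if cap is None else min(wanted, cap)

for mode in BUDGETS:
    print(mode, "->", think(problem_steps=20, mode=mode), "thinking tokens")
```

The same 20-step problem costs 5,120 thinking tokens uncapped, 2,048 under the example budget, and 0 in low-effort mode, which is the latency/accuracy trade-off the modes are designed to expose.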
The ‘Golden’ Configuration
NVIDIA’s team has simplified tuning reasoning models for this release. Rather than asking developers to re-tune hyperparameters for each of these dynamic modes, NVIDIA recommends a single global configuration: Temperature 1.0 and Top P 0.95.
NVIDIA says locking in these precise hyperparameter settings keeps the model at the ideal mathematical balance of creative exploration, logical precision, and reasoning depth, whether it’s running in a constrained low-effort mode or an uncapped deep reasoning dive.
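To see what those two knobs actually do, here is a toy implementation of temperature scaling followed by nucleus (top-p) filtering over a made-up four-token vocabulary; real inference stacks implement this inside the sampler, but the math is the same.

```python
import math

# Toy temperature + nucleus (top-p) sampling filter, illustrating the
# recommended Temperature=1.0 / Top-P=0.95 settings. Vocabulary and
# logits are made up for the example.

def top_p_filter(logits: dict[str, float],
                 temperature: float, top_p: float) -> dict[str, float]:
    """Apply temperature, keep the smallest token set whose cumulative
    probability reaches top_p, and renormalize what remains."""
    scaled = {t: l / temperature for t, l in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {t: math.exp(v) / z for t, v in scaled.items()}
    kept, cum = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[tok] = p
        cum += p
        if cum >= top_p:
            break
    norm = sum(kept.values())
    return {t: p / norm for t, p in kept.items()}

logits = {"the": 3.0, "a": 2.0, "cat": 0.5, "zzz": -4.0}
print(top_p_filter(logits, temperature=1.0, top_p=0.95))
```

With top_p = 0.95 the vanishingly unlikely tail token (`zzz`) is cut from the candidate set while the plausible tokens survive; temperature 1.0 leaves the model’s raw distribution untouched, which is why the pairing preserves both precision and exploration.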
Real-World Applications and Availability
Nemotron 3 Super has already proven its worth in demanding enterprise applications.
- Software Development: The model can handle junior-level pull requests, and in issue localization it outperforms leading proprietary models, pinpointing a bug’s exact location in source code.
- Cybersecurity: With its sophisticated tool-calling logic, the model navigates complex ISV security workflows.
- Sovereign AI: Nemotron’s architecture is being used to build specialized, localized models for organizations around the world, including in India, Vietnam, South Korea, and Europe.
Nemotron 3 Super is released in BF16, FP8, and NVFP4 quantizations, with the NVFP4 version able to run on a DGX Spark.
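A quick back-of-the-envelope calculation shows why the quantization choice matters for a 120-billion-parameter model. This counts raw weight memory only (no KV cache, activations, or runtime overhead), so actual requirements are higher.

```python
# Weight-memory arithmetic for a 120B-parameter model in the three
# released precisions. NVFP4 is a 4-bit format, so half a byte per
# parameter; this ignores KV cache and activation memory entirely.

PARAMS = 120e9
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "NVFP4": 0.5}

def weight_gb(precision: str) -> float:
    """Raw weight footprint in decimal gigabytes."""
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9

for p in BYTES_PER_PARAM:
    print(f"{p}: ~{weight_gb(p):.0f} GB of weights")
```

At roughly 240 GB for BF16, 120 GB for FP8, and 60 GB for NVFP4, the 4-bit build is the only one whose weights fit comfortably within the 128 GB of unified memory on a DGX Spark, which is presumably why NVFP4 is the variant tied to that device.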
Check out the models on Hugging Face. Details can be found in the research paper, and the technical/developer blog shows how to get started.
Thanks to the NVIDIA AI team for the thought leadership and resources for this article. The NVIDIA AI team has supported and sponsored this content.


