
NVIDIA Releases Nemotron 3, a Hybrid 120B Parameter Mamba-Attention MoE Open-Source Model that Delivers 5x Higher Throughput for Agentic AI

Tech · By Gavin Wallace · 11/03/2026 · 6 Mins Read

The gap between frontier proprietary models and open-source, highly transparent models is narrowing faster than ever. NVIDIA has officially unveiled its flagship Nemotron 3 Super, a reasoning model with 120 billion parameters designed specifically for complex multi-agent applications.

Released today, Nemotron 3 Super bridges the lightweight 30-billion-parameter Nemotron 3 Nano and the 500-billion-parameter Nemotron 3 Ultra slated for later in 2026. Delivering up to 7x higher throughput and double the accuracy of its predecessor, the new model represents a major leap in efficiency and intelligence for developers who refuse to compromise.

The ‘Five Miracles’ of Nemotron 3 Super

Five major technological advances are responsible for the performance of Nemotron 3 Super.

  • Hybrid MoE Architecture: The model blends memory-efficient Mamba layers with high-precision Transformer layers. Only a small fraction of the parameters are activated to generate each token, and the Mamba SSM layers deliver a 4x improvement in KV-cache efficiency.
  • Multi-Token Prediction (MTP): By predicting several tokens per step, the model achieves up to 3x faster reasoning on complicated tasks.
  • 1-Million-Token Context Window: The context is 7x larger than previous generations, letting developers drop large technical reports, or even entire codebases, directly into the model’s memory and eliminating the need for re-reasoning in multi-step workflows.
  • Latent MoE: By compressing expert computation into a shared latent space, the model can activate four experts for the same compute cost as one, a technique NVIDIA credits for a substantial share of the model’s accuracy gains.
  • NeMo RL Gym Integration: Training on interactive reinforcement-learning pipelines rather than static text lets the model learn from dynamic feedback loops, effectively doubling its intelligence index.
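To make the first point concrete, here is a toy sketch of top-k Mixture-of-Experts gating in plain NumPy. This is an illustration of the general technique, not NVIDIA's implementation; all shapes and names are assumptions. It shows why only a small fraction of the parameters touch any given token.

```python
# Toy top-k MoE gating sketch (illustrative, NOT Nemotron's actual code).
import numpy as np

def moe_forward(x, expert_weights, gate_weights, k=2):
    """Route input vector x to the top-k experts by gate score.

    expert_weights: (num_experts, dim, dim) -- one weight matrix per expert
    gate_weights:   (dim, num_experts)      -- linear gating layer
    """
    scores = x @ gate_weights                # one score per expert
    top_k = np.argsort(scores)[-k:]          # indices of the k best experts
    probs = np.exp(scores[top_k] - scores[top_k].max())
    probs /= probs.sum()                     # softmax over selected experts only
    # Only the k selected expert matrices are multiplied; the remaining
    # experts' parameters stay idle for this token.
    return sum(p * (x @ expert_weights[i]) for p, i in zip(probs, top_k)), top_k

rng = np.random.default_rng(0)
dim, num_experts = 8, 16
out, active = moe_forward(rng.normal(size=dim),
                          rng.normal(size=(num_experts, dim, dim)),
                          rng.normal(size=(dim, num_experts)), k=2)
print(f"active experts: {len(active)} of {num_experts}")  # active experts: 2 of 16
```

With 2 of 16 experts active, only about an eighth of the expert parameters are exercised per token, which is the source of the throughput and cache-efficiency wins the bullets describe.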

These breakthroughs translate into exceptional output-token efficiency per GPU.

Nemotron 3 Super: The Ultimate AI Engine?

Nemotron 3 Super is not just another large language model. It is positioned as a reasoning engine that can plan, execute, and verify complex tasks within a system of models, and that is why its architecture changes the game for multi-agent workflows.

  • Deeper Reasoning Through Higher Throughput: The model’s 7x higher throughput physically expands its search space. Because it can process and generate tokens faster, it can explore and evaluate more routes toward better responses. Developers get more reasoning out of the same compute budget, which is crucial when building intelligent, autonomous agents.
  • Zero “Re-Reasoning” in Long Workflows: Multi-agent systems constantly pass context between agents. The 1-million-token context window lets the model hold massive amounts of information, such as entire codebases and long multi-step conversation histories between agents, in memory, eliminating the latency and cost of forcing the model to reprocess context at every step.
  • Agent-Specific Training Environments: The training pipeline was extended with over 15 interactive reinforcement-learning environments, letting Nemotron 3 Super learn optimal paths for autonomous task completion through dynamic simulation loops.
  • Advanced Tool Calling: In real-world multi-agent applications, models need to act, not just respond with text. Nemotron 3 Super has proven highly proficient at tool calling, successfully navigating massive pools of available functions, such as dynamically selecting from over 100 different tools in complex cybersecurity workflows.
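The tool-calling pattern in the last bullet can be sketched host-side in a few lines. This is a generic illustration, assuming the model emits a JSON object naming a registered function; the `scan_ports` tool and the JSON shape are hypothetical, not NVIDIA's API.

```python
# Hedged sketch of host-side tool dispatch for a tool-calling model.
# Registry, tool, and wire format are illustrative assumptions.
import json

TOOLS = {}  # the registry the model selects from -- can hold 100+ entries

def tool(fn):
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def scan_ports(host: str) -> list:
    """Hypothetical security tool; result stubbed for illustration."""
    return [22, 443]

def dispatch(model_output: str):
    """Execute the tool call the model emitted as a JSON string."""
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["arguments"])

# A model response requesting a tool, as it might arrive on the wire:
result = dispatch('{"name": "scan_ports", "arguments": {"host": "10.0.0.1"}}')
print(result)  # [22, 443]
```

The model's job in this loop is the hard part: choosing the right `name` and `arguments` out of a pool of a hundred candidates, which is exactly the capability the article highlights.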

Open Sourced: More Than Just the Weights

NVIDIA isn’t just releasing the model weights; it is open-sourcing the model’s entire stack, including training datasets, libraries, and reinforcement-learning environments.

Because of this level of transparency, Artificial Analysis places Nemotron 3 Super squarely in the ‘most attractive quadrant,’ noting that it achieves the highest openness score while maintaining accuracy that leads alongside proprietary models. This intelligence is built on a completely new pipeline trained on 10 trillion tokens, plus an additional 9–10 billion tokens focused solely on advanced reasoning and coding tasks.

Developer Control: Introducing ‘Reasoning Budgets’

NVIDIA understands the need for precise control over latency, compute cost, and user experience. To solve the classic intelligence-versus-speed dilemma, Nemotron 3 Super introduces highly flexible reasoning controls, giving developers unprecedented granular control through the API.

Developers no longer have to force a single output that fits everyone. They can dynamically adapt how hard the model ‘thinks’ based on the specific task at hand:

  • Full Reasoning (Default): The model is unleashed to maximize capability, exploring a deep search space and multi-step trajectories to solve complex agentic problems.
  • ‘Reasoning Budget’: For latency-sensitive applications, the model’s thinking time and compute allowance can be explicitly capped. Within that strict limit, the model intelligently maximizes its search and delivers the most accurate answer it can.
  • ‘Low Effort Mode’: Some prompts don’t require deep, multi-agent analysis. When a user just needs a simple, concise answer (like standard summarization or basic Q&A) without the overhead of deep reasoning, this toggle turns Nemotron 3 Super into a lightning-fast responder, saving massive amounts of compute and time.

The ‘Golden’ Configuration

For this release, NVIDIA’s team has simplified tuning the reasoning model: rather than optimizing hyperparameters separately for each dynamic mode, NVIDIA recommends a single global configuration of Temperature 1.0 and Top-P 0.95.

NVIDIA says locking in these hyperparameter settings ensures the model retains the ideal mathematical balance of creative exploration, logical precision, and reasoning depth, whether it’s running in a constrained low-effort mode or an uncapped deep-reasoning dive.
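For readers unfamiliar with what Top-P 0.95 actually does: it restricts sampling to the smallest set of tokens whose cumulative probability reaches 0.95 (nucleus sampling). A minimal, serving-stack-independent sketch:

```python
# Minimal nucleus (top-p) filtering sketch -- generic technique, not tied
# to any particular inference framework.
def top_p_filter(probs: list[float], top_p: float = 0.95) -> list[int]:
    """Return the token indices kept by nucleus sampling."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= top_p:  # stop once cumulative mass reaches top_p
            break
    return kept

# Four-token toy distribution: the 0.01 tail token is cut at top_p=0.95.
print(top_p_filter([0.6, 0.3, 0.09, 0.01]))  # [0, 1, 2]
```

Combined with Temperature 1.0 (the raw model distribution, neither sharpened nor flattened), this trims only the improbable tail while leaving the model's natural exploration intact, which matches the balance NVIDIA describes.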

Real-World Applications and Availability

Nemotron 3 Super has already proven its worth in demanding enterprise applications.

  • Software Development: The model handles junior-level pull requests, and at issue localization it outperforms leading proprietary models, pinpointing a bug’s exact location in source code.
  • Cybersecurity: With its sophisticated tool-calling logic, the model can navigate complex ISV security workflows.
  • Sovereign AI: Nemotron’s architecture is being used to develop specialized, localized models for organizations around the world, including in India, Vietnam, South Korea, and Europe.

Nemotron 3 Super is released in BF16, FP8, and NVFP4 quantizations; the NVFP4 variant can run on a DGX Spark.

Check out the models on Hugging Face. Details can be found in the research paper, and the technical/developer blog shows how to get started.


Thanks to the NVIDIA AI team for the thought leadership and resources for this article. The NVIDIA AI team has supported and sponsored this content.


Jean-Marc is a highly successful AI executive who leads and accelerates the growth of AI solutions. He founded a computer-vision company in 2006, holds an MBA from Stanford, and has spoken at AI events.
