NVIDIA Launches Nemotron 3 Super Open MoE Model for Agentic AI

Details

NVIDIA launched Nemotron 3 Super, a 120-billion-parameter hybrid mixture-of-experts model with 12 billion active parameters, optimized for NVIDIA Blackwell to tackle context explosion and thinking costs in multi-agent workflows.
Companies involved include AI-native firms such as Perplexity, CodeRabbit, Factory, Greptile, Edison Scientific, and Lila Sciences, and enterprise platforms including Amdocs, Palantir, Cadence, Dassault Systèmes, and Siemens. The model is available via build.nvidia.com, Hugging Face, Perplexity, OpenRouter, Dell, HPE, Google Cloud, Oracle, AWS Bedrock (soon), and Microsoft Azure.
Features a hybrid Mamba-transformer MoE architecture, a 1M-token context window, a latent MoE design that activates four experts for the cost of one, multi-token prediction for 3x faster inference, and NVFP4 precision for 4x faster inference on Blackwell than FP8 on Hopper.
Delivers 5x higher throughput and 2x higher accuracy than the prior Nemotron Super; tops the Artificial Analysis efficiency leaderboard and powers NVIDIA AI-Q to No. 1 on the DeepResearch Benches; follows the Nemotron 3 Nano launch in December 2025.
Released with open weights under a permissive license, along with full training recipes and datasets of more than 10 trillion tokens; verified leading speed of 452 tokens/second and strong results on benchmarks such as GPQA Diamond, AIME 2025, and SWE-Bench; deployable on a single GPU such as a B200 or H100.
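
To put the headline figures in perspective, here is a minimal Python sketch of the arithmetic behind them. It uses only the numbers quoted above (120 billion total parameters, 12 billion active, 452 tokens/second). The function names are hypothetical. The router is a generic top-k illustration of sparse expert selection with made-up scores, not NVIDIA's latent MoE implementation.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the text.
TOTAL_PARAMS = 120e9       # total parameters (from the text)
ACTIVE_PARAMS = 12e9       # parameters active per token (from the text)
TOKENS_PER_SECOND = 452    # reported generation speed (from the text)


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights used for each generated token."""
    return active / total


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a constant decode rate (an idealization)."""
    return num_tokens / tokens_per_second


def top_k_route(scores: list[float], k: int) -> list[int]:
    """Toy top-k router: indices of the k highest-scoring experts.

    Generic sparse-MoE illustration only; the real routing scheme is
    not described in the text.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
    print(f"10,000 tokens: {generation_seconds(10_000, TOKENS_PER_SECOND):.1f} s")
    # Hypothetical router scores for 8 experts; 4 are selected per token.
    print(top_k_route([0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6], k=4))
```

Under these assumptions, about one tenth of the weights are exercised per token, and a 10,000-token reasoning trace would take roughly 22 seconds at the quoted rate. Actual latency depends on hardware, batch size, and context length, none of which this sketch models.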

Impact

Nemotron 3 Super advances agentic AI by enabling scalable multi-agent systems at lower cost and with higher accuracy, and it outpaces models such as Qwen3.5-122B in throughput. Its open release makes efficient reasoning more accessible to enterprises moving beyond chatbots and intensifies competition among NVIDIA, OpenAI, and Anthropic around hybrid architectures. Expect broader adoption in software development, cybersecurity, and life sciences, accelerating AI workflow automation.