The AI race never stops! NVIDIA recently unveiled its Nemotron Nano 2 model series, built on an innovative hybrid Mamba-Transformer architecture. It not only surpasses peer models on complex reasoning tasks but also delivers up to 6 times their throughput, and a compressed variant runs 128K-token long-context inference on a single GPU. Even more exciting, NVIDIA has taken the unprecedented step of open-sourcing the massive 6.6-trillion-token pre-training dataset behind it, injecting powerful momentum into the entire AI community.
The pace of AI development is breathtaking. Just as everyone was debating the merits of various models, NVIDIA dropped another bombshell. This time, they’re bringing not just a new model, but a whole new ecosystem—the NVIDIA Nemotron Nano 2 series and the massive pre-training dataset behind it.
Simply put, this is not just a technological leap, but a huge contribution to the entire open-source community. Let’s see what goodies NVIDIA has brought to the table this time.
What Makes Nemotron Nano 2 So Powerful? It’s Not Just Fast, It’s Accurate!
If you find existing language models a bit slow for complex tasks or too demanding on hardware, then Nemotron Nano 2 will definitely catch your eye.
The core model introduced this time, NVIDIA-Nemotron-Nano-9B-v2, matches or even surpasses top open-source models in its class, such as Qwen3-8B, on several complex reasoning benchmarks. In NVIDIA's published comparisons, Nemotron Nano 2 maintains a leading edge in accuracy across fields like math (AIME24, AIME25), science (GPQA-D), and long-context understanding (RULER 128k).
But the real highlight is throughput. When processing long text sequences, Nemotron Nano 2 can be up to 6.3 times faster than Qwen3-8B!
What does this mean? It means developers can complete inference tasks at a lower cost and in less time. For applications requiring real-time responses (like chatbots and real-time code generation), this is fantastic news.
This is all thanks to its innovative Mamba-Transformer hybrid architecture. You can think of it as combining the strengths of two engines: the Transformer architecture excels at deep reasoning, like a powerful analytical brain, while the Mamba architecture is known for its high efficiency and ability to handle long sequences, like an unobstructed highway. The combination makes the model both smart and fast.
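To make the hybrid idea concrete, here is a minimal, purely illustrative sketch of how such a stack might interleave many linear-time Mamba blocks with a few attention blocks. The layer counts and the `attention_every` ratio are hypothetical illustration values, not NVIDIA's actual layer layout:

```python
# Illustrative sketch of a Mamba-Transformer hybrid layer stack.
# The `attention_every` ratio is a hypothetical example, not
# Nemotron Nano 2's real configuration.

def hybrid_layer_plan(num_layers: int, attention_every: int = 6) -> list:
    """Return a layer-type sequence: mostly Mamba blocks (linear-time
    sequence mixing), with a sparse sprinkling of attention blocks
    (quadratic-time, but strong at global reasoning)."""
    plan = []
    for i in range(num_layers):
        # Place an attention block every `attention_every` layers;
        # every other position uses a Mamba (state-space) block.
        plan.append("attention" if (i + 1) % attention_every == 0 else "mamba")
    return plan

# A 12-layer toy stack: only two layers pay the quadratic attention cost,
# which is where the long-context throughput advantage comes from.
print(hybrid_layer_plan(12, attention_every=6))
```

The design intuition is that a small number of attention layers preserves the "analytical brain", while the Mamba majority keeps per-token cost roughly constant as the sequence grows.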
Not Just a Model, But a Treasure Trove of Data
In the past, the training datasets for top AI models were usually top-secret. But this time, NVIDIA made a surprising decision: they open-sourced the vast majority of the dataset used for pre-training—the Nemotron-Pre-Training-Dataset-v1.
How big is this dataset? A whopping 6.6 trillion tokens! The content covers high-quality web crawls, math, code, and question-and-answer data in multiple languages. NVIDIA has organized it into four main categories:
- Nemotron-CC-v2: Contains a large amount of processed web data and uses synthetic data techniques to generate question-and-answer pairs translated into 15 languages, significantly enhancing the model’s multilingual capabilities.
- Nemotron-CC-Math-v1: A dataset focused on mathematics. NVIDIA developed a unique process to accurately extract and preserve mathematical equations and code snippets from the web, addressing the long-standing issue of lost or corrupted math formulas in previous datasets.
- Nemotron-Pretraining-Code-v1: A large-scale code dataset from GitHub that has undergone multi-stage deduplication, license filtering, and quality checks to ensure the code’s usability and compliance.
- Nemotron-Pretraining-SFT-v1: A synthetically generated dataset covering STEM (Science, Technology, Engineering, Mathematics), academia, reasoning, and multiple languages, specifically designed to improve the model’s instruction-following and reasoning abilities.
The release of this dataset not only allows researchers to reproduce and validate NVIDIA’s results but also provides an invaluable resource for the entire AI community, which will undoubtedly accelerate future AI innovation.
Tech Deep Dive: The Secrets Behind the Magic
Such a powerful model didn’t just appear out of thin air. NVIDIA also shared some key training highlights in its technical report:
- Efficient Pre-training: The base model, Nemotron-Nano-12B-v2-Base, was trained on over 20 trillion tokens in FP8 precision, followed by a continued pre-training phase that extends the context window to 128k tokens without sacrificing performance elsewhere.
- Fine-grained Post-training: The model was then fine-tuned using a combination of techniques, including Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF), to ensure it can accurately understand and execute complex instructions.
- Extreme Compression Techniques: Most impressively, NVIDIA used a Minitron-style compression strategy to prune the model so that it can handle 128k-token long-context inference on a single NVIDIA A10G GPU. This significantly lowers the hardware barrier for deploying high-performance large language models.
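A back-of-the-envelope calculation shows why long context is so memory-hungry for pure Transformers, and why replacing most attention layers with Mamba blocks helps. All model dimensions below are hypothetical illustration values, not Nemotron's actual configuration:

```python
# Rough KV-cache size for a Transformer decoder at long context.
# Every dimension here is a made-up example, chosen only to show scaling.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Each attention layer caches one key and one value vector
    # per token per KV head (hence the factor of 2).
    return seq_len * n_layers * n_kv_heads * head_dim * 2 * bytes_per_elem

# Hypothetical 9B-class pure-Transformer config at 128k tokens, fp16:
full = kv_cache_bytes(seq_len=128 * 1024, n_layers=36, n_kv_heads=8, head_dim=128)
print(f"pure Transformer: {full / 2**30:.1f} GiB of KV cache")

# If only a handful of layers use attention (Mamba layers keep a small,
# fixed-size state per sequence instead), the cache shrinks proportionally:
hybrid = kv_cache_bytes(seq_len=128 * 1024, n_layers=6, n_kv_heads=8, head_dim=128)
print(f"hybrid with 6 attention layers: {hybrid / 2**30:.1f} GiB of KV cache")
```

Under these toy numbers the KV cache alone would consume around 18 GiB at 128k tokens, while the hybrid variant needs a small fraction of that, which is consistent with the claim of fitting long-context inference into a single GPU's memory budget.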
How to Get Started with Nemotron Nano 2?
NVIDIA has released three core models on Hugging Face, available for anyone to download and use:
- NVIDIA-Nemotron-Nano-9B-v2: The fully aligned and pruned final inference model with the strongest performance.
- NVIDIA-Nemotron-Nano-9B-v2-Base: The pruned base model.
- NVIDIA-Nemotron-Nano-12B-v2-Base: The original base model without alignment or pruning.
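As a quick-start sketch, loading one of these checkpoints with the Hugging Face `transformers` library might look like the following. The repo ids are assumed to live under the `nvidia` organization and mirror the names above; verify them on the actual model cards before running:

```python
# Hedged quick-start sketch for Nemotron Nano 2 via Hugging Face
# transformers. Repo ids below are assumptions based on the model names;
# check the model cards on huggingface.co before use.

MODELS = {
    "chat": "nvidia/NVIDIA-Nemotron-Nano-9B-v2",            # aligned + pruned
    "base-9b": "nvidia/NVIDIA-Nemotron-Nano-9B-v2-Base",    # pruned base
    "base-12b": "nvidia/NVIDIA-Nemotron-Nano-12B-v2-Base",  # original base
}

RUN_DOWNLOAD = False  # set True to actually download and run the model

def pick_model(purpose: str) -> str:
    """Map a use case to the matching checkpoint from the list above."""
    return MODELS[purpose]

if RUN_DOWNLOAD:
    # Requires `pip install transformers torch`, a network connection,
    # and a GPU; trust_remote_code may be needed for hybrid architectures.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = pick_model("chat")
    tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        repo, trust_remote_code=True, device_map="auto"
    )
    inputs = tokenizer("Solve: 12 * 7 = ?", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```

The fully aligned 9B model is the natural starting point for applications, while the two base models are intended for fine-tuning and research.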
For researchers and developers who want to delve into all the technical details, NVIDIA has also provided a complete technical report.
In conclusion, the launch of NVIDIA Nemotron Nano 2 not only sets a new benchmark in model performance but also paves the way for the future development of AI with its open data strategy. A faster, more accurate, and more accessible AI era is rapidly approaching.
Frequently Asked Questions (FAQ)
Q1: What exactly is NVIDIA Nemotron Nano 2? A: Nemotron Nano 2 is a series of high-performance, high-accuracy hybrid Mamba-Transformer architecture language models from NVIDIA. They significantly improve computational speed and efficiency while maintaining powerful reasoning capabilities.
Q2: How is Nemotron Nano 2 faster than other models? A: Thanks to its innovative hybrid architecture, Nemotron Nano 2 has a significantly higher throughput when processing long text sequences. In specific tests, it can be up to 6.3 times faster than comparable models, which means faster response times and lower computational costs.
Q3: What’s unique about the Nemotron pre-training dataset? A: This is the first time a leading company has open-sourced such a large-scale (6.6 trillion tokens) high-quality pre-training dataset. The most special part is its Nemotron-CC-Math-v1 subset, which uses a unique technical process to successfully preserve mathematical formulas and code from the web, with quality far exceeding previous datasets.
Q4: What kind of hardware do I need to run this model? A: According to NVIDIA’s report, the compressed Nemotron Nano 2 model can handle up to 128k token long-text inference on a single NVIDIA A10G GPU (with 22 GiB of memory), which greatly lowers the hardware barrier for high-performance AI.


