Llama Nemotron Ultra 253B
NVIDIA flagship reasoning model — top GPQA and AIME in the open-weight class.
- Input
- $0.60 / 1M tokens
- Output
- $1.80 / 1M tokens
- Cached read
- — / 1M tokens
- Cached write
- — / 1M tokens
- Batch discount
- —%
- Source
- Llama Nemotron Ultra 253B pricing
- Verified
- Apr 5, 2026 (High)
Capabilities
- Modalities
- text→text
- Capabilities
- reasoningbatchSupportpromptCachingfunctionCallingstructuredOutputs
- Strengths
- Frontier reasoning quality open-weight, Top GPQA in class
- Tradeoffs
- Needs 4×B100 or 8×H100 to self-host
Benchmark Coverage
| Benchmark | Version | Score | Date | Source | Notes |
|---|---|---|---|---|---|
| GPQA | 2024 | 76.01 % | Apr 1, 2025 | NVIDIA | Reasoning ON, vendor-reported |
| AIME 2025 | 2025 | 72.5 % | Apr 1, 2025 | NVIDIA | Reasoning ON, vendor-reported |
| MATH-500 | 2024 | 97 % | Apr 1, 2025 | NVIDIA | Vendor-reported |
Release History
| Release | Alias | Lifecycle | Release Date | Deprecation | Shutdown | Summary |
|---|---|---|---|---|---|---|
| Llama Nemotron Ultra 253B | nemotron-ultra-253b | Active | Apr 7, 2025 | — | — | Current published model family snapshot. |
Host Coverage
| Host | Type | Context | Pricing Note | Differences |
|---|---|---|---|---|
| NVIDIA NIM | first-party | 131.1K | $0.60/$1.80 per MTok. | Thinking mode toggle; Multilingual |
Migration Guidance
Best open-weight reasoning model for quality-first workloads.
Change Events
| Date | Type | Title | Description | Source |
|---|---|---|---|---|
| Apr 7, 2025 | family_added | Llama Nemotron Ultra 253B published | Initial public model family launch. | Llama Nemotron Ultra 253B release notes |