Model · NVIDIA

Llama Nemotron Ultra 253B

NVIDIA's flagship open-weight reasoning model, with vendor-reported GPQA and AIME results.

Pricing

Model Overview

Tracked token-pricing fields for this model family. Empty pricing fields stay hidden until the source publishes them.

Model pricing
Price field	Value
Input	$0.60 / 1M tokens
Output	$1.80 / 1M tokens
Source	Llama Nemotron Ultra 253B pricing
Verified	Apr 5, 2026 (High)
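As a rough illustration of how the per-token prices above translate into a request cost, the following is a minimal sketch. The function name `estimate_cost_usd` and the example token counts are hypothetical, and the prices are copied from the table as of the verification date shown, so they may be out of date.

```python
from decimal import Decimal

# Prices copied from the pricing table above (USD per 1M tokens).
# They are a snapshot and may have changed since the "Verified" date.
INPUT_PRICE_PER_MTOK = Decimal("0.60")
OUTPUT_PRICE_PER_MTOK = Decimal("1.80")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts.

    Hypothetical helper for illustration only; it ignores any batch or
    prompt-caching discounts a host may apply.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_MTOK
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_MTOK
    )
    return total / TOKENS_PER_MTOK


# Example: 200,000 input tokens and 50,000 output tokens.
# 200,000 * 0.60 / 1e6 = 0.12 and 50,000 * 1.80 / 1e6 = 0.09, so 0.21 in total.
print(estimate_cost_usd(200_000, 50_000))
```

Decimal arithmetic is used here so that small per-request costs do not pick up binary floating-point rounding noise when summed over many requests.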

Surface

Input and output modalities, enabled feature flags, strengths, and tradeoffs.

Model capabilities
Attribute	Values
Modalities	text to text
Capabilities	reasoning, batchSupport, promptCaching, functionCalling, structuredOutputs
Strengths	Frontier-level reasoning quality among open-weight models; vendor-reported GPQA result
Tradeoffs	Needs 4×B100 or 8×H100 to self-host

References

Canonical launch, documentation, pricing, and release-note URLs.

Official model links
Reference	URL
Intro	https://developer.nvidia.com/blog/llama-nemotron-ultra-an-open-advanced-reasoning-model/
Docs	https://developer.nvidia.com/nemotron
Pricing	Verify current pricing on the provider's page
Release note	https://developer.nvidia.com/blog/

Coverage

Reported benchmark families, versions, scores, sources, and notes.

Benchmark coverage
Benchmark	Version	Score	Date	Source	Notes
GPQA	2024	76.01 %	Apr 1, 2025	NVIDIA	Reasoning ON, vendor-reported
AIME 2025	2025	72.5 %	Apr 1, 2025	NVIDIA	Reasoning ON, vendor-reported
MATH-500	2024	97 %	Apr 1, 2025	NVIDIA	Vendor-reported

Lifecycle

Lifecycle transitions and release timeline for this model family.

Release history
Release	Alias	Lifecycle	Release Date	Deprecation	Shutdown	Summary
Llama Nemotron Ultra 253B	nemotron-ultra-253b	Active	Apr 7, 2025	—	—	Current published model family snapshot.

Surfaces

Provider-specific hosting, context, pricing notes, feature differences, and provider-status context.

Host coverage
Host	Type	Context	Pricing Note	Differences
NVIDIA NIM	first-party	131.1K	$0.60 input / $1.80 output per 1M tokens	Thinking mode toggle; multilingual
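The context figure above can be used for a quick budget check before sending a long prompt. This is a sketch under two assumptions that the table does not confirm: that "131.1K" stands for 131,072 tokens (2**17), and that prompt and output tokens share the same window. The helper name `max_output_budget` is hypothetical.

```python
# Assumption: "131.1K" in the host table is read as 131,072 tokens (2**17).
ASSUMED_CONTEXT_TOKENS = 131_072


def max_output_budget(
    prompt_tokens: int, context_tokens: int = ASSUMED_CONTEXT_TOKENS
) -> int:
    """Return how many tokens remain for output after the prompt.

    Assumes prompt and output share one context window; returns 0 when
    the prompt alone already fills or exceeds the window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_tokens - prompt_tokens, 0)


# A 100,000-token prompt leaves 31,072 tokens under the assumed window.
print(max_output_budget(100_000))
```

With reasoning enabled, the thinking tokens would presumably count against this remaining budget as well, so a real check should leave headroom.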

Migration

Documented migration summary, successor families, and known breaking changes.

Migration guidance
Topic	Details
Summary	Open-weight reasoning model for quality-first workloads.

Timeline

Cataloged model-family updates and source references.

Change events
Date	Type	Title	Description	Source
Apr 7, 2025	family_added	Llama Nemotron Ultra 253B published	Initial public model family launch.	Llama Nemotron Ultra 253B release notes
