losclouds

Llama Nemotron Ultra 253B

NVIDIA's flagship open-weight reasoning model, positioned as top of its class on GPQA and AIME (vendor-reported scores).

Creator: NVIDIA
Lifecycle: Active
Context: 131.1K tokens
Max output: 32.8K tokens
Released: Apr 7, 2025
Status: unknown
Input: $0.60 / 1M tokens
Output: $1.80 / 1M tokens
Cached read: not listed
Cached write: not listed
Batch discount: not listed
Source: Llama Nemotron Ultra 253B pricing
Verified: Apr 5, 2026 (High)

Capabilities

Modalities: text (input), text (output)
Capabilities: reasoning, batchSupport, promptCaching, functionCalling, structuredOutputs
Strengths: Frontier-level reasoning quality among open-weight models; top GPQA score in its class
Tradeoffs: Needs 4×B100 or 8×H100 to self-host
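
A rough arithmetic sketch of why the self-hosting requirement above is plausible. It assumes BF16 weights (2 bytes per parameter), 80 GB of memory per H100, and 192 GB per B100. None of these figures appear on this page, and the estimate covers weights only, ignoring KV cache and activation memory.

```python
PARAMS = 253_000_000_000  # 253B parameters, from the model name
BYTES_PER_PARAM = 2       # assumption: BF16 weights

# Assumed per-GPU memory in GB (not stated on this page).
GPU_MEMORY_GB = {"H100": 80, "B100": 192}


def weights_gb(params: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def total_memory_gb(gpu: str, count: int) -> int:
    """Combined memory of `count` GPUs of the given type, in GB."""
    return GPU_MEMORY_GB[gpu] * count


need = weights_gb(PARAMS, BYTES_PER_PARAM)
for gpu, count in (("H100", 8), ("B100", 4)):
    have = total_memory_gb(gpu, count)
    print(f"{count}x{gpu}: {have} GB total, {have - need:.0f} GB left after weights")
```

Under these assumptions the weights need about 506 GB, which fits in both listed configurations with headroom left for runtime state.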

Benchmark Coverage

Benchmark | Version | Score | Date | Source | Notes
GPQA | 2024 | 76.01% | Apr 1, 2025 | NVIDIA | Reasoning ON, vendor-reported
AIME 2025 | 2025 | 72.5% | Apr 1, 2025 | NVIDIA | Reasoning ON, vendor-reported
MATH-500 | 2024 | 97% | Apr 1, 2025 | NVIDIA | Vendor-reported

Release History

Release | Alias | Lifecycle | Release Date | Deprecation | Shutdown | Summary
Llama Nemotron Ultra 253B | nemotron-ultra-253b | Active | Apr 7, 2025 | - | - | Current published model family snapshot.

Host Coverage

Host | Type | Context | Pricing Note | Differences
NVIDIA NIM | first-party | 131.1K | $0.60 / $1.80 per MTok | Thinking mode toggle; multilingual

Migration Guidance

Positioned as the strongest open-weight reasoning option for quality-first workloads.

Change Events

Date | Type | Title | Description | Source
Apr 7, 2025 | family_added | Llama Nemotron Ultra 253B published | Initial public model family launch. | Llama Nemotron Ultra 253B release notes

Other models from NVIDIA

Llama Nemotron Super 49B