Self-host Nemotron 3 Super 120B A12B: GPU, VRAM, and rental cost

Last updated July 28, 2026 · ByteCosts

Self-hosting Nemotron 3 Super 120B A12B (123.6B parameters, 12B active) in FP8 needs about 163 GB of VRAM at 8K context and 8 concurrent requests. Among tracked GPUs, 1 holds it on a single card: the B200 (HGX, per GPU), from about $5.89 per GPU-hour. Larger context or higher concurrency grows the KV cache and can push the model past one card into tensor parallelism. This is a fit and rental-rate estimate, not a throughput quote; use the calculators below for cost per token.

Estimate cost per 1M tokens - Self-host Nemotron 3 Super 120B A12B serving cost →
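
To show how a rental rate turns into a cost per token, here is a minimal Python sketch. Only the $5.89 per GPU-hour rate comes from this page; the function name and the throughput values are hypothetical placeholders, because this page does not quote throughput. Measure your own tokens per second before relying on any result.

```python
def cost_per_million_tokens(gpu_hourly_usd: float,
                            tokens_per_second: float,
                            num_gpus: int = 1) -> float:
    """Convert a per-GPU hourly rental rate into USD per 1M generated tokens.

    tokens_per_second is the aggregate throughput across all concurrent
    requests on the deployment. It is an input you must measure yourself.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd * num_gpus / tokens_per_hour * 1_000_000


B200_RATE_USD = 5.89  # cheapest tracked rate from this page

# Hypothetical throughputs, for illustration only (not benchmarks).
for tps in (500, 1000, 2000):
    usd = cost_per_million_tokens(B200_RATE_USD, tps)
    print(f"{tps} tok/s -> ${usd:.2f} per 1M tokens")
```

Because the rate is fixed per hour, cost per token falls in direct proportion to sustained throughput, so an idle or lightly loaded GPU is the main driver of high per-token cost.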

GPUs that hold Nemotron 3 Super 120B A12B on one card

Single-GPU fit in FP8 at 8K context, 8 concurrent requests, with the cheapest tracked rental rate.

Nemotron 3 Super 120B A12B single-GPU fit and cheapest rental
GPU	VRAM	Cheapest /GPU-hr	Provider
B200 (HGX, per GPU)	180 GB	$5.89	RunPod

Frequently asked questions

What GPU do I need to run Nemotron 3 Super 120B A12B?

In FP8 at 8K context and 8 concurrent requests, Nemotron 3 Super 120B A12B needs about 163 GB of VRAM. The cheapest single GPU that holds it is the 180 GB B200 (HGX, per GPU), from around $5.89 per GPU-hour. Higher context or concurrency needs more VRAM or tensor parallelism.

How is the VRAM figure calculated?

The estimate sums the model weights (parameter count times bytes per weight at the chosen precision), the KV cache (derived from the model's real layers, KV heads, head dimension, and attention pattern), activation memory, and a safety margin. Architecture details come from the model's Hugging Face config; GPU VRAM figures come from the NVIDIA datasheet.
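
The sum described above can be sketched in Python as follows. This is a rough illustration, not the calculator ByteCosts uses. The parameter count (123.6B), FP8 precision, 8 concurrent requests, and the 180 GB card come from this page. Everything in `KVConfig`, the activation figure, the margin, and the way the margin is applied are hypothetical placeholders, and 8K is assumed to mean 8192 tokens. The output will therefore not reproduce the 163 GB figure; read the real values from the model's Hugging Face config.

```python
from dataclasses import dataclass

GB = 1e9  # decimal gigabytes


@dataclass
class KVConfig:
    """Attention shape used for KV cache sizing (placeholder values below)."""
    attention_layers: int    # only layers that keep a KV cache
    kv_heads: int
    head_dim: int
    bytes_per_value: float   # e.g. 2 for a 16-bit KV cache


def weights_gb(params: float, bytes_per_weight: float) -> float:
    """Parameters times bytes per weight for the chosen precision."""
    return params * bytes_per_weight / GB


def kv_cache_gb(cfg: KVConfig, context_tokens: int, concurrent_requests: int) -> float:
    """Keys and values (factor 2) for every cached token of every request."""
    per_token_bytes = (2 * cfg.attention_layers * cfg.kv_heads
                       * cfg.head_dim * cfg.bytes_per_value)
    return per_token_bytes * context_tokens * concurrent_requests / GB


def total_vram_gb(weights: float, kv: float, activation: float,
                  margin_fraction: float) -> float:
    """Assumption: the safety margin is a fraction applied on top of the sum."""
    return (weights + kv + activation) * (1 + margin_fraction)


# From this page: 123.6B parameters, FP8 (1 byte per weight).
w = weights_gb(123.6e9, 1.0)

# Hypothetical architecture and overheads, for illustration only.
placeholder = KVConfig(attention_layers=10, kv_heads=8, head_dim=128, bytes_per_value=2)
kv = kv_cache_gb(placeholder, context_tokens=8192, concurrent_requests=8)
total = total_vram_gb(w, kv, activation=4.0, margin_fraction=0.10)

print(f"weights {w:.1f} GB, KV cache {kv:.2f} GB, total {total:.1f} GB")
print("fits on a 180 GB card:", total <= 180)
```

The weights term is the only part fixed by the model and precision. The KV cache scales linearly with both context length and concurrency, which is why raising either can push the total past a single card.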

Self-host Nemotron 3 Super 120B A12B: GPU and VRAM. ByteCosts. Updated July 28, 2026. https://bytecosts.com/gpu/self-host/nemotron-3-super-120b-a12b/
