
Self-host GPT-OSS 20B: GPU, VRAM, and rental cost

Direct answer

Self-hosting GPT-OSS 20B (21.5B total parameters, 3.6B active per token) in FP8 needs about 31 GB of VRAM per GPU at 8K context with 8 concurrent requests. It fits on a single card on 11 of the tracked GPUs; the cheapest is the L40S 48GB, from about $0.60 per GPU-hour. Longer context or higher concurrency grows the KV cache and can push the model past one card into tensor parallelism. This is a fit and rental-rate estimate, not a throughput quote; use the calculators below for cost per token.
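To show why context length and concurrency drive the KV cache, here is a minimal sketch, not ByteCosts' calculator. The architecture constants (24 layers alternating full and sliding-window attention, 8 KV heads, head dimension 64, a 128-token window) are assumptions based on our reading of the public Hugging Face config and should be checked against config.json; the 16-bit KV cache is also an assumption.

```python
# Sketch: KV-cache growth with context length and concurrency for GPT-OSS 20B.
# ASSUMPTION: architecture values reflect our reading of the public Hugging Face
# config.json (verify before use); this is not ByteCosts' calculator.
NUM_LAYERS = 24        # assumed: alternating full-attention and sliding-window layers
NUM_KV_HEADS = 8       # assumed number of KV heads (grouped-query attention)
HEAD_DIM = 64          # assumed head dimension
SLIDING_WINDOW = 128   # assumed window (tokens) on sliding-window layers
KV_BYTES = 2           # assumed 16-bit KV cache; set to 1 for an FP8 KV cache


def kv_cache_gb(context_tokens: int, concurrency: int) -> float:
    """Approximate KV-cache size in GB (1 GB = 1e9 bytes) for `concurrency` sequences."""
    # Keys + values: two vectors of NUM_KV_HEADS * HEAD_DIM elements per token per layer.
    bytes_per_token_layer = 2 * NUM_KV_HEADS * HEAD_DIM * KV_BYTES
    full_layers = NUM_LAYERS // 2
    sliding_layers = NUM_LAYERS - full_layers
    # Full-attention layers keep every token; sliding-window layers keep at most the window.
    cached_tokens = (full_layers * context_tokens
                     + sliding_layers * min(context_tokens, SLIDING_WINDOW))
    return bytes_per_token_layer * cached_tokens * concurrency / 1e9


for ctx, users in [(8_192, 8), (32_768, 8), (8_192, 32), (131_072, 8)]:
    print(f"{ctx:>7} tokens x {users:>2} requests: {kv_cache_gb(ctx, users):6.2f} GB")
```

Under these assumptions the 8K × 8 case is roughly 1.6 GB, so most of the ~31 GB figure is weights, activations, and margin. The full-attention layers are what make the cache grow linearly with both context length and the number of concurrent requests.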

Estimate cost per 1M tokens: Self-host GPT-OSS 20B serving cost
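Cost per token depends on throughput, which this page does not quote. The arithmetic that turns an hourly rental rate into a per-1M-token cost can be sketched as follows; the function name is hypothetical and the 1,000 tokens/s figure is a placeholder, not a measurement.

```python
# Sketch: convert a GPU rental rate into USD per 1M tokens.
# ASSUMPTION: tokens_per_second is a placeholder you must measure yourself;
# this page does not quote throughput.


def cost_per_million_tokens(usd_per_gpu_hour: float, num_gpus: int,
                            tokens_per_second: float) -> float:
    """USD per 1M tokens, assuming the GPUs are busy for the whole rented hour."""
    if num_gpus < 1 or tokens_per_second <= 0:
        raise ValueError("num_gpus must be >= 1 and tokens_per_second must be > 0")
    tokens_per_hour = tokens_per_second * 3600
    hourly_cost = usd_per_gpu_hour * num_gpus
    return hourly_cost / tokens_per_hour * 1_000_000


# Hypothetical: one L40S at $0.60/hr sustaining 1,000 tokens/s across all requests.
print(f"${cost_per_million_tokens(0.60, 1, 1_000):.3f} per 1M tokens")
```

Idle time raises the effective cost, because the hour is billed whether or not the GPU is generating tokens.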

GPUs that hold GPT-OSS 20B on one card

Single-GPU fit in FP8 at 8K context and 8 concurrent requests, with the cheapest tracked rental rate for each GPU.

GPT-OSS 20B single-GPU fit and cheapest rental
GPU | VRAM | Cheapest rate per GPU-hour | Provider
L40S 48GB | 48 GB | $0.60 | Vast.ai
A100 SXM 40GB | 40 GB | $0.67 | Vast.ai
L40 48GB | 48 GB | $0.82 | RunPod
A100 PCIe 80GB | 80 GB | $1.39 | RunPod
A100 SXM 80GB | 80 GB | $1.49 | RunPod
H100 SXM 80GB | 80 GB | $1.80 | Vast.ai
GH200 Grace Hopper 96GB HBM3 | 96 GB | $2.29 | Lambda
V100 PCIe 32GB | 32 GB | $2.30 | Paperspace
H200 SXM 141GB | 141 GB | $2.35 | Vast.ai
H100 NVL 94GB | 94 GB | $3.19 | RunPod
B200 (HGX, per GPU) | 180 GB | $5.89 | RunPod
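Picking the cheapest card from a list like this can be sketched as a filter on VRAM followed by a minimum on price. This is a simplified rule of our own: it only checks that VRAM is at least the requirement, it does not check whether a card supports FP8 natively, and the page's own fit rule may include headroom we do not model. The `Offer` type and `cheapest_single_gpu` function are hypothetical names; the rows are copied from the table on this page.

```python
# Sketch: pick the cheapest single GPU whose VRAM covers a requirement.
# Rows mirror the table on this page; rental rates change, so re-check them.
from typing import NamedTuple, Optional, Sequence


class Offer(NamedTuple):
    gpu: str
    vram_gb: int
    usd_per_gpu_hour: float
    provider: str


OFFERS: Sequence[Offer] = (
    Offer("L40S 48GB", 48, 0.60, "Vast.ai"),
    Offer("A100 SXM 40GB", 40, 0.67, "Vast.ai"),
    Offer("L40 48GB", 48, 0.82, "RunPod"),
    Offer("A100 PCIe 80GB", 80, 1.39, "RunPod"),
    Offer("A100 SXM 80GB", 80, 1.49, "RunPod"),
    Offer("H100 SXM 80GB", 80, 1.80, "Vast.ai"),
    Offer("GH200 Grace Hopper 96GB HBM3", 96, 2.29, "Lambda"),
    Offer("V100 PCIe 32GB", 32, 2.30, "Paperspace"),
    Offer("H200 SXM 141GB", 141, 2.35, "Vast.ai"),
    Offer("H100 NVL 94GB", 94, 3.19, "RunPod"),
    Offer("B200 (HGX, per GPU)", 180, 5.89, "RunPod"),
)


def cheapest_single_gpu(required_gb: float,
                        offers: Sequence[Offer] = OFFERS) -> Optional[Offer]:
    """Cheapest offer with at least `required_gb` of VRAM, or None if none fits."""
    fitting = [o for o in offers if o.vram_gb >= required_gb]
    return min(fitting, key=lambda o: o.usd_per_gpu_hour, default=None)


best = cheapest_single_gpu(31)
print(best.gpu, best.usd_per_gpu_hour, best.provider)
```

Raising `required_gb` (for example, after increasing context or concurrency) shows where the cheapest option jumps from a 48 GB card to an 80 GB one.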

Frequently asked questions

What GPU do I need to run GPT-OSS 20B?

In FP8 at 8K context and 8 concurrent requests, GPT-OSS 20B needs about 31 GB of VRAM per GPU. The cheapest single GPU that holds it is the L40S 48GB, from about $0.60 per GPU-hour. Longer context or higher concurrency needs more VRAM or tensor parallelism across several GPUs.
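When the total no longer fits on one card, a rough way to find the smallest tensor-parallel (TP) degree is sketched below. It assumes weights and KV cache split evenly across ranks while each rank pays its own fixed overhead, and it only tries powers of two. The function name and every number in the example are hypothetical, apart from the 21.5 GB weight figure, which is simply 21.5B parameters at 1 byte each in FP8.

```python
# Sketch: smallest tensor-parallel (TP) degree that fits a per-GPU VRAM budget.
# ASSUMPTIONS: weights and KV cache shard evenly across TP ranks; each rank also
# pays a fixed overhead (activations, runtime, safety margin); only powers of two
# are tried. All numbers in the example are hypothetical.
from typing import Optional


def min_tensor_parallel(weights_gb: float, kv_cache_gb: float,
                        per_gpu_overhead_gb: float, gpu_vram_gb: float,
                        max_tp: int = 8) -> Optional[int]:
    """Return the smallest power-of-two TP degree that fits, or None."""
    tp = 1
    while tp <= max_tp:
        per_gpu_gb = (weights_gb + kv_cache_gb) / tp + per_gpu_overhead_gb
        if per_gpu_gb <= gpu_vram_gb:
            return tp
        tp *= 2
    return None


# Hypothetical long-context, high-concurrency load: 21.5 GB FP8 weights
# (21.5B params x 1 byte), 26 GB of KV cache, 4 GB overhead per GPU.
for card, vram in [("48 GB card", 48), ("80 GB card", 80)]:
    print(card, "->", min_tensor_parallel(21.5, 26.0, 4.0, vram))
```

In practice sharding is not perfectly even, and serving engines commonly require the TP degree to divide the model's attention-head count, which is why the sketch steps through powers of two.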

How is the VRAM figure calculated?

The estimate is the model weights (parameter count times bytes per weight at the chosen precision), plus the KV cache (computed from the model's actual layer count, KV heads, head dimension, and attention pattern), plus activation memory and a safety margin. Architecture details come from the model's Hugging Face config; GPU VRAM comes from NVIDIA datasheets.
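The sum described above can be written as one formula. The symbols are our own notation, and the activation and margin terms are left symbolic because their exact values are not stated here.

```latex
\mathrm{VRAM} \approx
  \underbrace{P \, b_w}_{\text{weights}}
  + \underbrace{2 \, B \, H_{kv} \, d_h \, b_{kv} \sum_{\ell=1}^{L} \min(T,\, W_\ell)}_{\text{KV cache}}
  + M_{\text{act}} + M_{\text{margin}}
```

Here P is the parameter count, b_w the bytes per weight (1 for FP8), B the number of concurrent requests, T the context length in tokens, L the number of layers, H_kv the number of KV heads, d_h the head dimension, and b_kv the bytes per cached element. W_ℓ is the number of tokens layer ℓ attends to: effectively unbounded for a full-attention layer (so the term is T) and the window size for a sliding-window layer. The factor 2 counts keys and values. For GPT-OSS 20B in FP8 the weights term alone is 21.5 × 10^9 × 1 byte ≈ 21.5 GB.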

Cite this page

Self-host GPT-OSS 20B: GPU and VRAM. ByteCosts. Updated June 18, 2026. https://bytecosts.com/gpu/self-host/gpt-oss-20b/

Sources