Self-host an open model

Self-host DeepSeek V4 Pro: GPU, VRAM, and rental cost

Direct answer

Self-hosting DeepSeek V4 Pro (861.6B, 49B active) in FP8 needs about 1139 GB of VRAM per GPU at 8K context and 8 concurrent requests, which exceeds every single tracked GPU, so it requires tensor parallelism across multiple GPUs. Lower the context, concurrency, or precision to fit fewer cards.

Estimate cost per 1M tokens - Self-host DeepSeek V4 Pro serving cost →

Frequently asked questions

What GPU do I need to run DeepSeek V4 Pro?

DeepSeek V4 Pro needs about 1139 GB of VRAM per GPU in FP8 at 8K context, which is more than any single tracked GPU, so it requires splitting across multiple GPUs (tensor parallelism).

How is the VRAM figure calculated?

Model weights (parameters times bytes per weight for the precision) plus the KV cache (from the model's real layers, KV heads, head dimension, and attention pattern) plus activation and a safety margin. Architecture comes from the model's Hugging Face config; GPU VRAM from the NVIDIA datasheet.

Cite this page

Self-host DeepSeek V4 Pro: GPU and VRAM. ByteCosts. Updated June 18, 2026. https://bytecosts.com/gpu/self-host/deepseek-v4-pro/

Sources