Self-host Qwen3.5 9B: GPU, VRAM, and rental cost

Last updated July 28, 2026 · ByteCosts

Self-hosting Qwen3.5 9B (9.7B parameters) in FP8 needs about 16 GB of VRAM per GPU at 8K context and 8 concurrent requests. At that setting it fits on a single card for 14 tracked GPUs; the cheapest is the L4 24GB, from about $0.39 per GPU-hour. Larger context or higher concurrency grows the KV cache and can push the footprint past one card, which then requires tensor parallelism. This is a fit and rental-rate estimate, not a throughput quote; use the calculators below for cost per token.

Estimate cost per 1M tokens - Self-host Qwen3.5 9B serving cost →
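
As a rough illustration of how a rental rate turns into cost per token, the sketch below divides the hourly rate by tokens served per hour. The function name and the 500 tokens/s throughput are hypothetical placeholders, not measurements; this page does not quote throughput, so substitute a figure from your own benchmark.

```python
def cost_per_million_tokens(rate_per_gpu_hour: float,
                            tokens_per_second: float,
                            num_gpus: int = 1) -> float:
    """Dollars per 1M tokens at a sustained aggregate throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    hourly_cost = rate_per_gpu_hour * num_gpus
    return hourly_cost / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # $0.39/GPU-hr is the L4 rate listed on this page.
    # 500 tokens/s is an ASSUMED throughput, not a benchmark result.
    usd = cost_per_million_tokens(0.39, 500.0)
    print(f"${usd:.3f} per 1M tokens")
```

The result scales inversely with throughput, so a more expensive GPU can still be cheaper per token if it serves enough more tokens per second.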

GPUs that hold Qwen3.5 9B on one card

Single-GPU fit in FP8 at 8K context, 8 concurrent requests, with the cheapest tracked rental rate.

Qwen3.5 9B single-GPU fit and cheapest rental
GPU	VRAM	Cheapest /GPU-hr	Provider
L4 24GB	24 GB	$0.39	RunPod
L40S 48GB	48 GB	$0.60	Vast.ai
A100 SXM 40GB	40 GB	$0.67	Vast.ai
L40 48GB	48 GB	$0.82	RunPod
A10G 24GB (AWS)	24 GB	$1.01	AWS
A10 24GB	24 GB	$1.29	Lambda
A100 PCIe 80GB	80 GB	$1.39	RunPod
A100 SXM 80GB	80 GB	$1.49	RunPod
H100 SXM 80GB	80 GB	$1.80	Vast.ai
GH200 Grace Hopper 96GB HBM3	96 GB	$2.29	Lambda
V100 PCIe 32GB	32 GB	$2.30	Paperspace
H200 SXM 141GB	141 GB	$2.35	Vast.ai
H100 NVL 94GB	94 GB	$3.19	RunPod
B200 (HGX, per GPU)	180 GB	$5.89	RunPod
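
The table can also be filtered programmatically. The sketch below copies the rows above into a list and returns the cards whose VRAM meets a given requirement, cheapest first. The `Gpu` type and `single_card_fits` helper are illustrative names, and the 30 GB requirement in the second call is a hypothetical larger-context footprint, not a figure from this page.

```python
from typing import NamedTuple


class Gpu(NamedTuple):
    name: str
    vram_gb: int
    usd_per_hour: float
    provider: str


# Rows copied from the table above.
TRACKED = [
    Gpu("L4 24GB", 24, 0.39, "RunPod"),
    Gpu("L40S 48GB", 48, 0.60, "Vast.ai"),
    Gpu("A100 SXM 40GB", 40, 0.67, "Vast.ai"),
    Gpu("L40 48GB", 48, 0.82, "RunPod"),
    Gpu("A10G 24GB (AWS)", 24, 1.01, "AWS"),
    Gpu("A10 24GB", 24, 1.29, "Lambda"),
    Gpu("A100 PCIe 80GB", 80, 1.39, "RunPod"),
    Gpu("A100 SXM 80GB", 80, 1.49, "RunPod"),
    Gpu("H100 SXM 80GB", 80, 1.80, "Vast.ai"),
    Gpu("GH200 Grace Hopper 96GB HBM3", 96, 2.29, "Lambda"),
    Gpu("V100 PCIe 32GB", 32, 2.30, "Paperspace"),
    Gpu("H200 SXM 141GB", 141, 2.35, "Vast.ai"),
    Gpu("H100 NVL 94GB", 94, 3.19, "RunPod"),
    Gpu("B200 (HGX, per GPU)", 180, 5.89, "RunPod"),
]


def single_card_fits(gpus: list[Gpu], required_gb: float) -> list[Gpu]:
    """GPUs with at least required_gb of VRAM, sorted by hourly rate."""
    fits = [g for g in gpus if g.vram_gb >= required_gb]
    return sorted(fits, key=lambda g: g.usd_per_hour)


if __name__ == "__main__":
    # 16 GB is the page's estimate; 30 GB is a hypothetical heavier workload.
    for need in (16, 30):
        fits = single_card_fits(TRACKED, need)
        print(need, "GB:", len(fits), "cards, cheapest", fits[0].name)
```

Raising the requirement past 24 GB drops the three 24 GB cards from the list, which is how a longer context or more concurrent requests changes the cheapest option.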

Frequently asked questions

What GPU do I need to run Qwen3.5 9B?

In FP8 at 8K context and 8 concurrent requests, Qwen3.5 9B needs about 16 GB of VRAM per GPU. The cheapest single GPU that holds it is the L4 24GB, from around $0.39 per GPU-hour. Longer context or higher concurrency needs more VRAM, or tensor parallelism across several cards.

How is the VRAM figure calculated?

The estimate is the sum of three parts: model weights (parameter count times bytes per weight at the chosen precision), the KV cache (derived from the model's actual layers, KV heads, head dimension, and attention pattern), and activation memory plus a safety margin. Architecture values come from the model's Hugging Face config; GPU VRAM comes from the NVIDIA datasheet.
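
The sum described above can be sketched as follows. This is a simplified illustration, not ByteCosts' actual calculator: the architecture values (32 KV-caching layers, 8 KV heads, head dimension 128, 1-byte KV entries) and the 15% overhead are hypothetical placeholders rather than values from the Qwen3.5 9B config, and activation memory and safety margin are folded into a single assumed fraction. Do not read the output as a reproduction of the 16 GB figure.

```python
def estimate_vram_gb(params_billion: float,
                     bytes_per_weight: float,
                     kv_layers: int,
                     kv_heads: int,
                     head_dim: int,
                     kv_bytes: float,
                     context_tokens: int,
                     concurrency: int,
                     overhead_fraction: float) -> float:
    """Weights + KV cache, scaled by an assumed overhead, in GB (1e9 bytes)."""
    weights = params_billion * 1e9 * bytes_per_weight
    # Each cached token stores one K and one V vector per KV head per layer.
    # kv_layers counts only layers that keep a full KV cache, which is how
    # the attention pattern enters the estimate.
    kv_per_token = 2 * kv_layers * kv_heads * head_dim * kv_bytes
    kv_cache = kv_per_token * context_tokens * concurrency
    return (weights + kv_cache) * (1 + overhead_fraction) / 1e9


if __name__ == "__main__":
    # 9.7B parameters and FP8 (1 byte per weight) are from this page;
    # every architecture value below is a PLACEHOLDER.
    for ctx in (8_192, 32_768):
        gb = estimate_vram_gb(9.7, 1, kv_layers=32, kv_heads=8, head_dim=128,
                              kv_bytes=1, context_tokens=ctx, concurrency=8,
                              overhead_fraction=0.15)
        print(f"{ctx} tokens: {gb:.1f} GB")
```

The weights term is fixed, while the KV cache term grows linearly with both context length and concurrency, which is why heavier workloads can outgrow a single card.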

Self-host Qwen3.5 9B: GPU and VRAM. ByteCosts. Updated July 28, 2026. https://bytecosts.com/gpu/self-host/qwen3-5-9b/
