How the Major LLM Providers Price Prompt Caching

Last updated 2026-07-19 · ByteCosts

Cache reads can be much cheaper than uncached input, but prompt caching does not have one universal break-even point. The result depends on the write premium, repeated prefix tokens, measured hit rate, TTL, and provider eligibility rules, plus any storage charge. Compare the full cached path against ordinary input for the same workload.

Summary

In the named model snapshots below, Anthropic, OpenAI, and Google price a cache read at about a tenth of fresh text input. DeepSeek prices a V4-Flash cache hit at roughly a fiftieth of its cache miss. Those read discounts are not the whole bill: Anthropic charges a write premium, Google adds context-cache storage, OpenAI requires eligible exact-prefix reuse, and DeepSeek's blended rate depends on observed hits and misses. If you need the mechanism before the commercial comparison, start with what prompt caching is.

Assumptions for the worked break-even example below, which uses only the Anthropic Claude Sonnet 4.6 rates: The workload sends the same eligible 1-million-token prefix twice inside the 5-minute TTL. The first request writes the prefix at $3.75/MTok and the second gets a full cache read at $0.30/MTok. There are no expiry, eviction, or eligibility failures, so the repeated-prefix hit rate after the write is 100%. Variable input and output tokens are excluded because they are identical on both sides of this comparison.

Quick answer

The pricing pages, side by side

Provider	Model or plan	Uncached input price	Cached input or cache-read price	Write / storage fee	Eligibility / minimum prompt length	Cache mechanics	Best-fit workload	Official source	Last checked
Anthropic	Claude Sonnet 4.6	$3.00/MTok	$0.30/MTok cache read	5-minute cache write $3.75/MTok; 1-hour write $6.00/MTok	Model-specific minimums and explicit cache breakpoints	Cache reads are 0.1x base input; writes cost more and expire by TTL	Reused system prompts, tools, examples, and long reference context	Anthropic pricing and prompt caching	2026-06-15
OpenAI	gpt-5.5	$5.00/MTok	$0.50/MTok cached input	No separate cache-write fee listed	Automatic for prompts of at least 1,024 tokens	Requires exact prefix matches; static content should come first; cached prompts still count toward rate limits and do not change output tokens	Long, repeated prefixes with static content first and variable content last	OpenAI pricing and prompt caching	2026-06-15
Google	Gemini 2.5 Flash, Standard	$0.30/MTok text/image/video	$0.03/MTok context caching	$1.00 per 1M tokens per hour of context-cache storage	Context cache must be created and kept warm	Storage time is billed separately, so slow reuse can reduce savings	Large shared context reused across many calls inside the cache window	Gemini API pricing	2026-06-15
DeepSeek	DeepSeek-V4-Flash	$0.14/MTok cache miss	$0.0028/MTok cache hit	No separate write/storage fee listed on the pricing table	Provider bills hit and miss input separately	Blended cost depends on observed cache-hit rate; deepseek-chat and deepseek-reasoner are compatibility aliases until deprecation	Very repetitive prefixes where cache hits are common	DeepSeek pricing	2026-06-15

Worked break-even example

This example uses only the Anthropic Claude Sonnet 4.6 rates in the table above. It illustrates the arithmetic; it is not a universal savings forecast.

Without caching: 2 × $3.00 = $6.00 for the repeated prefix.

With caching: $3.75 + $0.30 = $4.05 for one write and one read.

Break-even check: At one request, the $3.75 write versus $3.00 ordinary input does not break even. At two requests, this modeled cache path passes the break-even threshold and moves below the $6.00 uncached path. Algebraically, $3.75 + (N − 1) × $0.30 < N × $3.00 gives N > 1.28; the first whole-request count that clears the threshold is two.
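The break-even arithmetic can be sketched in a few lines of Python. This is a minimal illustration, assuming one cache write followed by full cache reads with no expiry or eviction; the function names are hypothetical and not part of any provider SDK, and all prices are dollars per 1M prefix tokens taken from the Anthropic row above.

```python
import math


def cached_cost(n_requests: int, write: float, read: float) -> float:
    """Cost of the cached path: one write, then a cache read on every later request."""
    if n_requests < 1:
        return 0.0
    return write + (n_requests - 1) * read


def uncached_cost(n_requests: int, base: float) -> float:
    """Cost of sending the same prefix as ordinary input on every request."""
    return n_requests * base


def break_even_requests(base: float, write: float, read: float) -> int:
    """Smallest whole request count at which the cached path is strictly cheaper.

    Solves write + (N - 1) * read < N * base for N.
    Assumes base > read; otherwise caching never pays off in this model.
    """
    if base <= read:
        raise ValueError("cache read must be cheaper than base input")
    threshold = (write - read) / (base - read)
    return math.floor(threshold) + 1


# Anthropic Claude Sonnet 4.6 snapshot rates, 5-minute TTL
n = break_even_requests(base=3.00, write=3.75, read=0.30)
print(n, cached_cost(n, 3.75, 0.30), uncached_cost(n, 3.00))
```

Under the same assumptions, swapping in the 1-hour write rate of $6.00 moves the first break-even request count to three, because the larger write premium takes one more read to amortize.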

Your result can reverse with a lower hit rate, a prefix change, or expiry before reuse. Put your own token volume, hit rate, and request count into the prompt-cache savings calculator, then compare complete workload variants in Scenario Studio.

Where they mostly agree

Three of the four land cache reads at about a tenth of fresh text input in the example rows above. Anthropic states it as a multiplier (0.1x), while Google ($0.03 against $0.30) and OpenAI ($0.50 against $5.00) state it as a discounted per-token rate. DeepSeek goes further still, pricing a V4-Flash cache hit at roughly a fiftieth of a cache miss. Either way, if your workload rereads a large, stable context, the blended cache-read line can dominate the input economics.
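The "tenth" and "fiftieth" figures follow directly from the snapshot rows. As a quick check, the sketch below divides each cached price by its uncached price; the dictionary layout and names are illustrative only, and the numbers are the planning snapshots from the table, not live provider data.

```python
# (uncached input, cached read) in dollars per 1M tokens, from the snapshot table
SNAPSHOT_RATES = {
    "Anthropic": (3.00, 0.30),
    "OpenAI": (5.00, 0.50),
    "Google": (0.30, 0.03),
    "DeepSeek": (0.14, 0.0028),
}


def read_ratio(uncached: float, cached: float) -> float:
    """Cached-read price as a fraction of the uncached input price."""
    return cached / uncached


ratios = {
    provider: round(read_ratio(uncached, cached), 4)
    for provider, (uncached, cached) in SNAPSHOT_RATES.items()
}

for provider, ratio in ratios.items():
    print(f"{provider}: cache read is {ratio:.2%} of uncached input")
```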

Where they differ

Anthropic charges to write. Reads are cheap at 0.1x, but writing tokens into the cache costs 1.25x base input for the 5-minute window and 2x for the 1-hour window. You pay up front, so caching pays off only when the same context is reused enough times to amortize the write.

Google charges rent. Cached input is cheap at $0.03, but Google bills $1.00 per 1M tokens per hour to keep the cache warm. A large context held open across a slow session can cost more in storage than it saves on reads.

OpenAI charges nothing extra, but eligibility matters. Its docs say caching works automatically and has no additional fees, but only eligible long prompts are cached. Caching is available for prompts of 1,024 tokens or more, cache hits require exact prefix matches, cached prompts still count toward TPM rate limits, and caching does not change output-token generation. Static content should sit at the beginning of the prompt and variable content at the end.

DeepSeek prices the gap. The cache hit at $0.0028 against a $0.14 miss is the steepest spread of the four, rewarding reuse heavily and punishing a cold cache.
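To make the storage-rent point concrete, here is a rough sketch of how many reads per hour the Gemini 2.5 Flash row needs before read savings cover storage. It is a simplification under stated assumptions: it counts only the hourly storage charge and the per-read discount, ignores whatever it costs to create the cache in the first place, and uses hypothetical function names.

```python
import math


def net_saving_per_hour(
    reads_per_hour: float,
    cached_mtok: float,
    base: float = 0.30,     # $/MTok uncached input (snapshot)
    cached: float = 0.03,   # $/MTok cached input (snapshot)
    storage: float = 1.00,  # $ per 1M tokens per hour of storage (snapshot)
) -> float:
    """Read savings minus storage rent for one hour of keeping the cache warm."""
    read_saving = reads_per_hour * cached_mtok * (base - cached)
    storage_cost = cached_mtok * storage
    return read_saving - storage_cost


def min_reads_per_hour(base: float, cached: float, storage: float) -> int:
    """Smallest whole number of reads per hour whose savings exceed storage rent."""
    if base <= cached:
        raise ValueError("cached input must be cheaper than base input")
    return math.floor(storage / (base - cached)) + 1


print(min_reads_per_hour(0.30, 0.03, 1.00))
print(net_saving_per_hour(reads_per_hour=3, cached_mtok=1.0))
print(net_saving_per_hour(reads_per_hour=4, cached_mtok=1.0))
```

In this simplified model the cached context size cancels out of the break-even count, so reuse frequency, not context size, decides whether the rent is covered.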

How to read this for your own bill

The right comparison is never the input sticker price; it is the blended rate your workload actually hits once cache writes, reads, and any storage are counted at your reuse ratio. Pull the current per-token rates from the ByteCosts AI Provider Pricing Index and model your own cache-hit ratio in the AI cost calculator before you pick a provider on its headline number.
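For rows with no separate write or storage fee listed (OpenAI and DeepSeek in the snapshot table), the blended input rate reduces to a weighted average of hit and miss prices. The sketch below assumes a single measured token-level hit rate and a hypothetical helper name; write premiums and storage rent would need to be added on top for Anthropic and Google.

```python
def blended_input_rate(hit_rate: float, miss_price: float, hit_price: float) -> float:
    """Effective $/MTok for input when a fraction `hit_rate` of tokens is served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# Snapshot rates, with an assumed 80% hit rate for illustration only
deepseek = blended_input_rate(0.80, miss_price=0.14, hit_price=0.0028)
openai = blended_input_rate(0.80, miss_price=5.00, hit_price=0.50)

print(f"DeepSeek blended: ${deepseek:.5f}/MTok")
print(f"OpenAI blended:   ${openai:.2f}/MTok")
```

The 80% figure is a placeholder; replace it with the hit rate observed in your own usage data before drawing any conclusion.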

Key takeaways

For repeated long-prefix workloads, Anthropic, OpenAI, and Google price cache reads at about a tenth of fresh text input in the examples above; DeepSeek goes further, to roughly a fiftieth (cache hit versus miss).

They recover the cost differently: Anthropic adds a write premium (1.25x base input for the 5-minute TTL, 2x for the 1-hour TTL), Google adds storage ($1.00 per 1M tokens per hour), OpenAI adds nothing, and DeepSeek prices a steep hit-miss gap ($0.0028 vs $0.14).

Compare on the blended rate at your reuse ratio, token length, TTL, and provider eligibility rules, not the input sticker price.

The prices in this article are committed planning snapshots, not live provider data. Provider pricing, plan limits, regions, cache rules, model aliases, discounts, taxes, and marketplace billing can change after the last-checked date. Where this article shows a comparison, the comparison is scoped to the named model rows and provider pages. It is not a billing guarantee.

Sources by claim area

Claim area	Source type	Source	Last checked	Notes
Anthropic cache read/write multipliers	Official provider pricing and caching docs	Claude API pricing and prompt caching	2026-06-15	Write/read prices vary with model base input price and TTL.
OpenAI cached input and mechanics	Official provider pricing and caching docs	OpenAI pricing and prompt caching	2026-06-15	Caching requires eligible long prompts and exact prefix reuse; no separate write fee found.
Google context caching	Official provider pricing	Gemini API pricing	2026-06-15	Gemini 2.5 Flash Standard text/image/video row; storage rent is separate.
DeepSeek cache hit/miss pricing	Official provider pricing	DeepSeek pricing	2026-06-15	V4 Flash and V4 Pro rows expose cache-hit and cache-miss prices directly.

What this article covers

Quick answer
The pricing pages, side by side
Worked break-even example
Where they mostly agree
Where they differ

Use it with ByteCosts calculators

After reading the research note, open the related calculator and replace the example assumptions with your own users, requests, tokens, seats, or platform usage.

The goal is to convert the article's cost pattern into a concrete monthly run-rate, per-user margin, or break-even point your team can discuss.

Frequently asked questions

Can I model the article's scenario with my own assumptions?

Yes. Use the related ByteCosts calculators to replace the article's example numbers with your own workload, usage, and pricing assumptions.

How the Major LLM Providers Price Prompt Caching. ByteCosts. Updated 2026-07-19. https://bytecosts.com/blog/how-llm-providers-price-prompt-caching/
