How the Major LLM Providers Price Prompt Caching

Direct answer

We read the prompt-caching pages of Anthropic, OpenAI, Google, and DeepSeek. Three of the four discount cache reads to about a tenth of fresh input, DeepSeek goes further, and each recovers the cost in a very different place. This ByteCosts research article explains the cost mechanics behind the headline, turns the pattern into budgeting questions, and points readers toward calculators that can model the same issue with their own workload. Read it when you need a finance-readable explanation of AI economics before choosing a model, cloud platform, subscription, or optimization path.

Article body

Prompt caching is the single biggest lever on a context-heavy bill: in any agent loop that resends the same instructions and history on every step, most input tokens are repeats. So we read the prompt-caching pages of four major providers to see what they actually charge. They land in a similar place on the headline and disagree on almost everything else.

Quick answer

Anthropic, OpenAI, and Google all price a cache read at about a tenth of fresh input; DeepSeek goes further, pricing a cache hit at roughly a fiftieth of its cache miss. But each provider recovers the cost somewhere different. Anthropic prices cache reads at 0.1x base input yet charges a write premium to put tokens in the cache: 1.25x for the 5-minute window, 2x for the 1-hour window. Google discounts cached input to $0.03 per 1M tokens against $0.30 input on Gemini 2.5 Flash, then adds $1.00 per 1M tokens per hour of storage. OpenAI's caching is automatic with no extra fee: gpt-5.5 cached input is $0.50 against $5.00 input. DeepSeek splits the input price into a cache hit of $0.0028 and a cache miss of $0.14 on V4-Flash.

The pricing pages, side by side

Provider                 | Cache read                          | What else you pay                         | Source
-------------------------|-------------------------------------|-------------------------------------------|----------
Anthropic                | 0.1x base input                     | write premium: 1.25x (5-min), 2x (1-hour) | Anthropic
OpenAI, gpt-5.5          | $0.50 cached input (vs $5.00 input) | nothing (automatic)                       | OpenAI
Google, Gemini 2.5 Flash | $0.03 cached input (vs $0.30 input) | $1.00 per 1M tokens per hour storage      | Google
DeepSeek, V4-Flash       | $0.0028 cache hit                   | $0.14 on a cache miss                     | DeepSeek

Where they mostly agree

Three of the four land cache reads at about a tenth of fresh input. Anthropic states it as a multiplier (0.1x), while Google ($0.03 against $0.30) and OpenAI ($0.50 against $5.00) state it as a discounted per-token rate. DeepSeek goes further still, pricing a cache hit at roughly a fiftieth of a cache miss. Either way, if your workload rereads a large, stable context, the cache-read line, not the input line, is where most of the bill lands.
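As a quick arithmetic check on "a tenth" and "a fiftieth", the sketch below divides each quoted cached rate by the matching fresh-input rate. The dictionary and the function name `read_ratio` are illustrative only; the prices are the ones quoted in this article.

```python
# Cached vs fresh input prices as quoted in this article (cached, fresh).
QUOTED_PRICES = {
    "OpenAI": (0.50, 5.00),
    "Google": (0.03, 0.30),
    "DeepSeek": (0.0028, 0.14),  # cache hit vs cache miss
}


def read_ratio(cached: float, fresh: float) -> float:
    """Return the cached price as a fraction of the fresh-input price."""
    return cached / fresh


for provider, (cached, fresh) in QUOTED_PRICES.items():
    ratio = read_ratio(cached, fresh)
    print(f"{provider}: cache read costs {ratio:.2f}x fresh input")
```

Anthropic is left out of the dictionary because it publishes the ratio directly as the 0.1x multiplier.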

Where they differ

  • Anthropic charges to write. Reads are cheap at 0.1x, but writing tokens into the cache costs 1.25x base input for the 5-minute window and 2x for the 1-hour window. You pay up front, so caching pays off only when the same context is reused enough times to amortize the write.
  • Google charges rent. Cached input is cheap at $0.03, but Google bills $1.00 per 1M tokens per hour to keep the cache warm. A large context held open across a slow session can cost more in storage than it saves on reads.
  • OpenAI charges nothing extra. Its docs say caching "works automatically on all your API requests (no code changes required) and has no additional fees associated with it", so the cached rate ($0.50 against $5.00 for gpt-5.5) is the whole story. Simplest to reason about, least to tune.
  • DeepSeek prices the gap. The cache hit at $0.0028 against a $0.14 miss is the steepest spread of the four, rewarding reuse heavily and punishing a cold cache.
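The two break-even points implied above can be worked out in a few lines. This is a rough sketch, not a billing model: it assumes one cache write followed by reads that all hit, ignores output tokens and cache expiry, and for Google ignores whatever the initial cache creation costs. The function names are hypothetical.

```python
import math


def min_uses_to_beat_no_cache(write_mult: float, read_mult: float = 0.1) -> int:
    """Smallest number of total sends n at which caching is strictly cheaper.

    Assumption: the first send is a cache write, the other n - 1 are reads.
    Cached cost (in units of base input):  write_mult + (n - 1) * read_mult
    Uncached cost:                         n
    Caching wins when n > (write_mult - read_mult) / (1 - read_mult).
    """
    threshold = (write_mult - read_mult) / (1 - read_mult)
    return math.floor(threshold) + 1


def min_reads_per_hour_to_cover_storage(
    fresh: float, cached: float, storage_per_hour: float
) -> int:
    """Reads per hour needed so read savings cover storage rent.

    All prices are per 1M tokens; each read is assumed to reread the
    whole cached context.
    """
    saving_per_read = fresh - cached
    return math.ceil(storage_per_hour / saving_per_read)


print(min_uses_to_beat_no_cache(1.25))  # 5-minute write premium
print(min_uses_to_beat_no_cache(2.0))   # 1-hour write premium
print(min_reads_per_hour_to_cover_storage(0.30, 0.03, 1.00))
```

Under these assumptions the 1.25x write pays for itself on the second use and the 2x write on the third, while a Gemini 2.5 Flash cache needs about four full rereads per hour before the read savings outrun the storage charge.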

How to read this for your own bill

The right comparison is never the input sticker price; it is the blended rate your workload actually hits once cache writes, reads, and any storage are counted at your reuse ratio. Pull the current per-token rates from the ByteCosts AI Provider Pricing Index and model your own cache-hit ratio in the AI cost calculator before you pick a provider on its headline number.
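One simple way to express that blended rate is a weighted average of the hit and miss prices, plus any storage cost spread over the tokens processed. The sketch below is an illustration under stated assumptions, not the ByteCosts calculator: `blended_input_rate` and its parameters are hypothetical names, the 80% hit ratio is an example value, and the Anthropic line assumes every miss is billed as a 1.25x cache write.

```python
def blended_input_rate(
    hit_ratio: float,
    read_rate: float,
    miss_rate: float,
    storage_per_unit: float = 0.0,
) -> float:
    """Blended input price at a given cache-hit ratio.

    hit_ratio:        fraction of input tokens served from cache (0 to 1)
    read_rate:        price of a cached token (same unit as miss_rate)
    miss_rate:        price of an uncached token, including any write premium
    storage_per_unit: storage rent allocated to each unit of input processed
                      (how you allocate it is your own assumption)
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * read_rate + (1.0 - hit_ratio) * miss_rate + storage_per_unit


HIT_RATIO = 0.8  # example value only; measure your own workload

# Dollar prices as quoted in this article.
print(f"OpenAI:   ${blended_input_rate(HIT_RATIO, 0.50, 5.00):.4f}")
print(f"DeepSeek: ${blended_input_rate(HIT_RATIO, 0.0028, 0.14):.4f}")

# Anthropic is quoted as multipliers, so the result is in units of base input.
print(f"Anthropic: {blended_input_rate(HIT_RATIO, 0.1, 1.25):.2f}x base input")
```

At an 80% hit ratio the OpenAI figures blend to about $1.40 against the $5.00 sticker price, which is why the hit ratio, not the headline rate, drives the comparison.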

Key takeaways

  • Anthropic, OpenAI, and Google price cache reads at about a tenth of fresh input; DeepSeek goes further, to roughly a fiftieth (cache hit versus miss).
  • They recover the cost differently: Anthropic adds a write premium (1.25x), Google adds storage ($1.00 per 1M tokens per hour), OpenAI adds nothing, DeepSeek prices a steep hit-miss gap ($0.0028 vs $0.14).
  • Compare on the blended rate at your reuse ratio, not the input sticker price.

Sources

  • Prompt caching. Anthropic. Accessed June 10, 2026.
  • API pricing. OpenAI. Accessed June 10, 2026.
  • Prompt caching. OpenAI. Accessed June 10, 2026.
  • Gemini API pricing. Google. Accessed June 10, 2026.
  • Models and pricing. DeepSeek. Accessed June 10, 2026.

What this article covers

  • Quick answer
  • The pricing pages, side by side
  • Where they mostly agree
  • Where they differ
  • How to read this for your own bill

Use it with ByteCosts calculators

After reading the research note, open the related calculator and replace the example assumptions with your own users, requests, tokens, seats, or platform usage.

The goal is to convert the article's cost pattern into a concrete monthly run-rate, per-user margin, or break-even point your team can discuss.

Frequently asked questions

Can I model the article's scenario with my own assumptions?

Yes. Use the related ByteCosts calculators to replace the article's example numbers with your own workload, usage, and pricing assumptions.

Cite this page

How the Major LLM Providers Price Prompt Caching. ByteCosts. Updated 2026-06-10. https://bytecosts.com/blog/how-llm-providers-price-prompt-caching/
