# How to Calculate LLM API Cost per Request, User, and Month

> Canonical: https://bytecosts.com/blog/how-to-calculate-llm-api-cost/ · Last updated 2026-06-21

**Direct answer.** To calculate LLM API cost, multiply each measured token category by its published rate, add model calls and non-token charges, then multiply by successful and failed request volume. Keep input, cached input, output, retries, routing, and fixed product costs separate. Use provider-reported usage and current source rates, then reconcile the estimate against billing exports instead of hiding differences in a generic multiplier.

**[Use the related calculator - AI App Cost Calculator: Estimate LLM API Cost Per User →](https://bytecosts.com/tools/ai-cost-calculator/)**

## Step 1: define the billing unit

Start with one unit that maps to a product decision. Useful units include:

- One model request
- One completed task
- One conversation
- One active user per month
- One customer account per month
- One thousand documents processed

A request is useful for technical accounting. A completed task is often better for workflows that retry or call several models. Cost per active user is useful for pricing and gross-margin decisions.

Do not begin with total monthly spend alone. A total cannot explain whether growth, longer prompts, power users, or failures caused the change.

## Step 2: collect current rates by category

For each exact model and processing mode, record the published unit price for every category the workload uses:

- Ordinary input tokens
- Cached-input reads
- Cache creation or writes when priced separately
- Output tokens
- Batch, priority, or realtime processing
- Audio, image, search, tool, or other feature charges
- Fine-tuned model rates when applicable

Store the source URL, currency, unit, model identifier, and date checked. Do not copy a family-level headline price onto a different model version. Provider terms and price tiers can change.
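One way to keep those fields together is a small immutable record per model and category. The sketch below is illustrative only: the `RateRecord` class, its field names, the model identifier, the URL, and the price are all hypothetical placeholders, not a provider schema or a real quote.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class RateRecord:
    """One published price for one exact model and one billing category."""
    model_id: str       # exact model identifier, not the family name
    category: str       # e.g. "input", "cached_input_read", "output"
    price: Decimal      # price per `unit_tokens` tokens
    unit_tokens: int    # 1_000_000 when the rate is quoted per million
    currency: str
    source_url: str
    checked_on: date

    def cost(self, tokens: int) -> Decimal:
        """Cost of `tokens` tokens at this rate."""
        return Decimal(tokens) / Decimal(self.unit_tokens) * self.price


# Hypothetical entry with placeholder values, not a real provider quote.
example_rate = RateRecord(
    model_id="example-model-v1",
    category="input",
    price=Decimal("2.00"),
    unit_tokens=1_000_000,
    currency="USD",
    source_url="https://example.com/pricing",
    checked_on=date(2026, 6, 21),
)
print(example_rate.cost(2_400))
```

Using `Decimal` avoids binary floating-point drift when many small per-request costs are summed. Keeping `checked_on` on each record lets an older estimate be reproduced after prices change.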

The [provider pricing index](https://bytecosts.com/tools/ai-provider-pricing/) helps compare source-backed rates, but the provider's current documentation remains the final contract for billing.

## Step 3: measure input and output separately

input cost = input tokens ÷ 1,000,000 × input price per million

output cost = output tokens ÷ 1,000,000 × output price per million

Use actual API usage fields when available. Input can include system instructions, user text, conversation history, tools, examples, and retrieved documents. Output is what the model generates under the provider's accounting rules. Read [input tokens versus output tokens](https://bytecosts.com/blog/input-tokens-vs-output-tokens/) before building the ledger.

Do not multiply total tokens by a blended rate unless the input-output ratio is fixed and clearly documented. Separate categories make the model auditable when behavior changes.
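The two formulas translate directly into code. The following sketch uses placeholder rates of $2 and $8 per million tokens and hypothetical function names. It also shows how a blended rate calibrated on one input-output ratio can misprice a request with a different ratio.

```python
def token_cost(tokens: int, price_per_million: float) -> float:
    """Cost of one token category at a per-million-token rate."""
    return tokens / 1_000_000 * price_per_million


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Price input and output separately, then sum them."""
    return (token_cost(input_tokens, input_price)
            + token_cost(output_tokens, output_price))


# Placeholder rates in dollars per million tokens, not provider quotes.
INPUT_PRICE, OUTPUT_PRICE = 2.0, 8.0

# A blended rate calibrated on a 4:1 input-to-output request.
calibration = request_cost(2_400, 600, INPUT_PRICE, OUTPUT_PRICE)
blended_per_million = calibration / 3_000 * 1_000_000

# The same 3,000 total tokens, but weighted toward output.
actual = request_cost(1_000, 2_000, INPUT_PRICE, OUTPUT_PRICE)
blended = token_cost(3_000, blended_per_million)
print(f"separate: ${actual:.4f}  blended: ${blended:.4f}")
```

With these placeholder numbers the blended estimate comes out lower than the separately priced cost, because the second request produces more output than the calibration request did.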

## Step 4: add prompt-cache categories

When prompt caching is enabled, use the categories reported by the API:

cache cost = cache-write tokens × write rate + cache-read tokens × read rate

total input-side cost = uncached input cost + cache cost

A request hit rate is not enough. Savings depend on the number of tokens reused. One small cache hit and one large miss should not be averaged as a 50 percent token saving.

Model cache expiry, misses, and prefix changes explicitly. The [prompt-caching definition](https://bytecosts.com/blog/what-is-prompt-caching/) explains the difference between cache writes, reads, and response caching.
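A short sketch can make the difference between request hit rate and token reuse concrete. The `CacheUsage` fields below are hypothetical names, since each provider reports cache usage under its own field names, and the rates are placeholders.

```python
from dataclasses import dataclass


@dataclass
class CacheUsage:
    """Token counts for one request. Field names are illustrative."""
    uncached_input_tokens: int
    cache_write_tokens: int
    cache_read_tokens: int


def input_side_cost(usage: CacheUsage, input_rate: float,
                    write_rate: float, read_rate: float) -> float:
    """Uncached input cost plus cache cost; rates are per million tokens."""
    uncached = usage.uncached_input_tokens * input_rate
    cache = (usage.cache_write_tokens * write_rate
             + usage.cache_read_tokens * read_rate)
    return (uncached + cache) / 1_000_000


def cached_token_share(usages) -> float:
    """Share of all input-side tokens that were served as cache reads."""
    read = sum(u.cache_read_tokens for u in usages)
    total = sum(u.uncached_input_tokens + u.cache_write_tokens
                + u.cache_read_tokens for u in usages)
    return read / total if total else 0.0


requests = [
    CacheUsage(uncached_input_tokens=100, cache_write_tokens=0,
               cache_read_tokens=500),      # small cache hit
    CacheUsage(uncached_input_tokens=20_000, cache_write_tokens=0,
               cache_read_tokens=0),        # large cache miss
]
request_hit_rate = sum(1 for u in requests if u.cache_read_tokens) / len(requests)
token_share = cached_token_share(requests)
print(f"request hit rate: {request_hit_rate:.0%}, cached token share: {token_share:.1%}")
```

Half of the requests hit the cache here, yet only a small fraction of the tokens were reused, so the token-weighted figure is the one that belongs in the cost model.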

## Step 5: calculate calls per completed task

A user action may create more than one billable request. Include:

- Validation or classification calls
- Main generation
- Tool-selection calls
- Tool-result follow-up calls
- Safety or moderation calls when separately charged
- Reranking
- Fallback models
- Retries after errors or invalid output
- Evaluation or judge calls

task model cost = sum(request count_i × average request cost_i)

If 10 percent of tasks require one extra attempt, the average call count is not one. It is at least 1.10 before other workflow stages are included.

Track successful and failed calls. A request that does not produce a usable customer result can still consume tokens.
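As a rough sketch, the task formula is a weighted sum over workflow stages. The stage names, call counts, and per-call costs below are invented for illustration; the 1.10 figure mirrors the 10 percent extra-attempt case described above.

```python
def average_calls(extra_attempt_rate: float, base_calls: float = 1.0) -> float:
    """Average calls per task when a share of tasks needs one extra attempt."""
    return base_calls + extra_attempt_rate


def task_model_cost(stages) -> float:
    """Sum of (average calls per task x average cost per call) over stages."""
    return sum(calls * cost for _name, calls, cost in stages)


# (stage name, average calls per task, average cost per call in dollars)
# All values are placeholders, not measurements.
stages = [
    ("classification", 1.00, 0.0004),
    ("main generation", average_calls(0.10), 0.0096),
    ("judge", 0.20, 0.0010),   # only 20% of tasks are sampled for evaluation
]
print(f"model cost per completed task: ${task_model_cost(stages):.5f}")
```

Counting failed attempts in the `calls` column, rather than only the successful ones, keeps the per-task figure aligned with what the provider actually bills.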

## Step 6: use distributions instead of one average

Calculate at least three workload profiles:

| Profile | Purpose |
| --- | --- |
| Median | Represents ordinary usage |
| 95th percentile | Reveals margin pressure from heavy users |
| Worst credible case | Tests caps, abuse controls, and bill shock |

For each profile, measure requests, input tokens, output tokens, cache reuse, retries, and model routing. A single average can hide a small group that creates most of the cost.
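The profiles can be derived from per-user data with the standard library. This is a minimal sketch: the cost list is made up (monthly cost per user, in cents), and the worst credible case is passed in as a modeled assumption because it usually cannot be read from observed data.

```python
import statistics


def cost_profiles(costs_per_user, worst_credible_cost):
    """Median and 95th percentile from data; the worst case is a modeled input."""
    cuts = statistics.quantiles(costs_per_user, n=100, method="inclusive")
    return {
        "median": statistics.median(costs_per_user),
        "p95": cuts[94],  # 95th of the 99 percentile cut points
        "worst_credible": worst_credible_cost,
    }


def top_user_share(costs_per_user, fraction=0.05):
    """Share of total cost produced by the most expensive `fraction` of users."""
    ranked = sorted(costs_per_user, reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)


# Made-up monthly cost per user, in cents.
costs = [10] * 17 + [30, 60, 810]
profiles = cost_profiles(costs, worst_credible_cost=5_000)
print(profiles)
print("mean:", statistics.mean(costs), "top 5% share:", round(top_user_share(costs), 3))
```

In this invented sample the mean sits far above the median because one user accounts for most of the spend, which is the pattern a single average hides.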

The [Scenario Studio](https://bytecosts.com/tools/scenario-studio/) is designed for these side-by-side assumptions.

## Step 7: convert request cost into monthly cost

monthly route cost = monthly requests × average cost per request

monthly inference cost = sum(monthly volume_i × unit cost_i)

Then add non-inference product costs that scale with the feature:

- Embeddings and vector search
- Storage and data transfer
- Speech or image processing
- Observability and trace retention
- Evaluation runs
- Queueing or serverless execution
- Human review and support

Keep fixed costs and variable costs separate. Variable cost supports unit economics; fixed cost supports budget and break-even analysis.
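A small sketch of the monthly roll-up, keeping variable and fixed totals in separate fields. The route volumes, unit costs, and cost-line names are hypothetical placeholders.

```python
def monthly_inference_cost(routes) -> float:
    """Sum of monthly volume x unit cost over every route or model."""
    return sum(volume * unit_cost for volume, unit_cost in routes)


def monthly_cost_summary(routes, variable_extras, fixed_costs):
    """Report variable and fixed costs separately, plus their total."""
    variable = monthly_inference_cost(routes) + sum(variable_extras.values())
    fixed = sum(fixed_costs.values())
    return {"variable": variable, "fixed": fixed, "total": variable + fixed}


# (monthly requests, average cost per request in dollars); placeholder values.
routes = [(80_000, 0.004), (20_000, 0.020)]

summary = monthly_cost_summary(
    routes,
    variable_extras={"vector search": 90.0, "trace retention": 60.0},
    fixed_costs={"platform subscription": 200.0},
)
print(summary)
```

Only the `variable` figure should feed per-user or per-task unit economics; the `fixed` figure belongs in the budget and break-even view.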

## Worked example with illustrative rates

Assume a text feature has the following modeled workload. These values are illustrative, not current provider quotes.

| Assumption | Value |
| --- | --- |
| Monthly completed tasks | 100,000 |
| Average calls per task | 1.15 |
| Input tokens per call | 2,400 |
| Output tokens per call | 600 |
| Illustrative input rate | $2 per million tokens |
| Illustrative output rate | $8 per million tokens |

cost per call = 2,400 ÷ 1,000,000 × $2 + 600 ÷ 1,000,000 × $8 = $0.0096

model cost per task = 1.15 × $0.0096 = $0.01104

Now suppose retrieval, observability, and storage add $0.0018 per task:

monthly variable AI cost = 100,000 × ($0.01104 + $0.0018) = $1,284

The example demonstrates the method only. Replace every rate and usage assumption with current source data and measured product behavior.
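The arithmetic can be checked with a few lines of Python. The sketch below reuses the illustrative assumptions from the table; the variable names are arbitrary, and `Decimal` is used so the intermediate values stay exact.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Illustrative assumptions from the worked example, not provider quotes.
monthly_tasks = Decimal(100_000)
calls_per_task = Decimal("1.15")
input_tokens, output_tokens = Decimal(2_400), Decimal(600)
input_rate, output_rate = Decimal("2"), Decimal("8")  # dollars per million tokens
non_model_cost_per_task = Decimal("0.0018")

cost_per_call = (input_tokens / MILLION * input_rate
                 + output_tokens / MILLION * output_rate)
model_cost_per_task = calls_per_task * cost_per_call
monthly_variable_cost = monthly_tasks * (model_cost_per_task + non_model_cost_per_task)

print("cost per call:", cost_per_call)
print("model cost per task:", model_cost_per_task)
print("monthly variable AI cost:", monthly_variable_cost)
```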

## Step 8: calculate cost per active user

If the feature has U monthly active users:

AI cost per active user = monthly variable AI cost ÷ U

gross margin = (revenue − AI COGS − other COGS) ÷ revenue

Calculate by plan or customer segment. Enterprise accounts, free users, and automated users can have very different request distributions.
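A minimal sketch of both formulas, treating gross margin as revenue minus AI COGS minus other COGS, divided by revenue. The plan figures are hypothetical placeholders; in practice the same functions would run once per plan or segment.

```python
def cost_per_active_user(monthly_variable_ai_cost: float, active_users: int) -> float:
    """Monthly variable AI cost divided by monthly active users."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return monthly_variable_ai_cost / active_users


def gross_margin(revenue: float, ai_cogs: float, other_cogs: float) -> float:
    """(revenue - AI COGS - other COGS) / revenue, as a fraction."""
    if revenue <= 0:
        raise ValueError("gross margin is undefined without revenue")
    return (revenue - ai_cogs - other_cogs) / revenue


# Hypothetical paid plan; every figure is a placeholder.
pro_plan = {"users": 1_000, "revenue": 20_000.0,
            "ai_cost": 1_284.0, "other_cogs": 3_000.0}

per_user = cost_per_active_user(pro_plan["ai_cost"], pro_plan["users"])
margin = gross_margin(pro_plan["revenue"], pro_plan["ai_cost"], pro_plan["other_cogs"])
print(f"AI cost per active user: ${per_user:.3f}, gross margin: {margin:.1%}")
```

A free tier has no revenue, so its margin is undefined; the sketch raises an error there instead of returning a misleading number, and the free tier is better tracked as a cost budget.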

Use the [AI cost calculator](https://bytecosts.com/tools/ai-cost-calculator/) for a fast monthly estimate and the per-user margin calculator when connecting usage to pricing.

## Step 9: model routing and fallback explicitly

routed cost = sum(traffic share_i × cost per request_i)

Add router-classification cost and fallback probability. Shares should sum to one for the initial route, while retries and fallbacks are additional conditional calls.

A cheaper model does not save money if it fails often enough to trigger premium fallbacks. Measure accepted-result cost:

cost per accepted result = total workflow cost ÷ accepted results
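One way to sketch routing with conditional fallbacks is shown below. The `Route` class, the shares, costs, and fallback probabilities are hypothetical, and the model assumes at most one fallback call per request.

```python
import math
from dataclasses import dataclass


@dataclass
class Route:
    share: float                       # share of initial traffic on this route
    cost_per_request: float
    fallback_probability: float = 0.0  # chance the request needs a fallback call
    fallback_cost: float = 0.0         # cost of that extra call


def routed_cost(routes, router_cost: float = 0.0) -> float:
    """Expected cost per request, including router and conditional fallbacks."""
    if not math.isclose(sum(r.share for r in routes), 1.0):
        raise ValueError("initial route shares must sum to one")
    return router_cost + sum(
        r.share * (r.cost_per_request + r.fallback_probability * r.fallback_cost)
        for r in routes
    )


def cost_per_accepted_result(total_workflow_cost: float, accepted_results: int) -> float:
    if accepted_results <= 0:
        raise ValueError("accepted_results must be positive")
    return total_workflow_cost / accepted_results


# Placeholder numbers: a cheap route that falls back to a premium model 25% of the time.
routes = [
    Route(share=0.8, cost_per_request=0.002,
          fallback_probability=0.25, fallback_cost=0.02),
    Route(share=0.2, cost_per_request=0.02),
]
expected = routed_cost(routes, router_cost=0.0002)
accepted = cost_per_accepted_result(expected * 1_000, accepted_results=950)
print(f"expected cost per request: ${expected:.4f}, per accepted result: ${accepted:.4f}")
```

In this example the fallback calls cost more in expectation than the cheap route's own requests, which is why the fallback probability needs to be measured, not assumed.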

## Step 10: reconcile estimates with invoices

An estimate becomes reliable through reconciliation:

1. Sum logged provider usage by day and model.
2. Apply the rate table used by the model.
3. Compare the result with provider billing exports.
4. Investigate taxes, credits, discounts, rounding, unlogged calls, and non-token products.
5. Version rate changes rather than rewriting history.
6. Update workload percentiles from production data.

Do not force the model to match an invoice by hiding unexplained differences inside a generic multiplier. Keep a reconciliation adjustment visible until the cause is known.
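The first three reconciliation steps can be sketched as a join between logged usage and a billing export. The row shapes, keys, model name, and amounts below are hypothetical; real billing exports differ by provider and usually need mapping into a common shape first.

```python
from collections import defaultdict


def estimate_by_day_model(usage_rows, rates):
    """Sum estimated cost per (day, model) from logged token usage.

    `rates` maps (model, category) to a price per million tokens.
    """
    totals = defaultdict(float)
    for row in usage_rows:
        price = rates[(row["model"], row["category"])]
        totals[(row["day"], row["model"])] += row["tokens"] / 1_000_000 * price
    return dict(totals)


def reconcile(estimated, billed):
    """Keep the unexplained difference as its own visible field."""
    report = {}
    for key in sorted(set(estimated) | set(billed)):
        est = estimated.get(key, 0.0)
        inv = billed.get(key, 0.0)
        report[key] = {"estimated": est, "billed": inv, "unexplained": inv - est}
    return report


# Placeholder data, not real usage or invoices.
rates = {("example-model-v1", "input"): 2.0, ("example-model-v1", "output"): 8.0}
usage_rows = [
    {"day": "2026-06-01", "model": "example-model-v1", "category": "input", "tokens": 1_000_000},
    {"day": "2026-06-01", "model": "example-model-v1", "category": "output", "tokens": 500_000},
]
billed = {("2026-06-01", "example-model-v1"): 6.25}

report = reconcile(estimate_by_day_model(usage_rows, rates), billed)
for key, line in report.items():
    print(key, line)
```

The `unexplained` field stays in the report until a cause such as tax, rounding, or an unlogged call is identified, so the gap is never absorbed into the rates.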

## Launch checklist

- Exact model identifiers and source URLs are recorded.
- Input, cached input, and output are separate.
- Hidden application context is included.
- Retries and fallbacks are measured.
- Median and tail users are modeled.
- Non-token services are included.
- Cost caps and alerts exist.
- The free tier has a worst-case budget.
- Logs can be reconciled with billing.
- Product pricing is tested against per-user COGS.

## What this article covers

- Step 1: define the billing unit
- Step 2: collect current rates by category
- Step 3: measure input and output separately
- Step 4: add prompt-cache categories
- Step 5: calculate calls per completed task

## Use it with ByteCosts calculators

After reading the research note, open the related calculator and replace the example assumptions with your own users, requests, tokens, seats, or platform usage.

The goal is to convert the article's cost pattern into a concrete monthly run-rate, per-user margin, or break-even point your team can discuss.

## Frequently asked questions

### How do I calculate cost per LLM request?

Multiply each input, cached-input, and output token count by its matching unit rate, then add any feature charges and conditional calls. Use provider-reported usage when possible.

### Should failed requests be included in LLM cost?

Yes. A failed or rejected product outcome may still contain billable model work. Track provider errors, application retries, invalid structured output, and user-visible failures separately.

### Can I use an average token count for budgeting?

Use an average for a first estimate, but also model percentiles and worst credible cases. Heavy users, long contexts, and automation can dominate the monthly bill.

### How often should API prices be updated?

Check on a defined schedule and when providers announce model or pricing changes. Keep effective dates so historical estimates and invoices remain reproducible.

## Related pricing pages

- [AI App Cost Calculator: Estimate LLM API Cost Per User](https://bytecosts.com/tools/ai-cost-calculator/)
- [AI Cost Scenario Studio: Tokens, Caching & Subscriptions](https://bytecosts.com/tools/scenario-studio/)
- [Input Tokens vs Output Tokens: What Counts, What Costs, and Why](https://bytecosts.com/blog/input-tokens-vs-output-tokens/)
- [What Is Prompt Caching? How Reused LLM Context Saves Time and Cost](https://bytecosts.com/blog/what-is-prompt-caching/)
- [AI Model Pricing: Compare LLM Token Costs](https://bytecosts.com/pricing/)
- [OpenAI pricing: API cost per model, user & month](https://bytecosts.com/pricing/openai/)
- [Model routing cost calculator](https://bytecosts.com/use-cases/model-routing-cost-calculator/)

## Model this research

- [AI App Cost Calculator](https://bytecosts.com/tools/ai-cost-calculator/)
- [Scenario Studio](https://bytecosts.com/tools/scenario-studio/)
- [Provider Pricing Index](https://bytecosts.com/tools/ai-provider-pricing/)

## Cite this page

How to Calculate LLM API Cost per Request, User, and Month. ByteCosts. Updated 2026-06-21. https://bytecosts.com/blog/how-to-calculate-llm-api-cost/

**Sources**

- [OpenAI API pricing](https://developers.openai.com/api/docs/pricing)
- [Claude API pricing](https://platform.claude.com/docs/en/about-claude/pricing)
- [Gemini Developer API pricing](https://ai.google.dev/gemini-api/docs/pricing)
