
Input Tokens vs Output Tokens: What Counts, What Costs, and Why

Last updated 2026-06-21 · ByteCosts

Direct answer

Input tokens are the tokenized content a model receives, including application-added instructions and context. Output tokens are the tokens the model generates. Providers may price the two categories differently, and both can consume context-window capacity. Accurate cost models must therefore record them separately instead of multiplying one combined token total by a blended price.


Summary

Input tokens are not limited to the text typed into a chat box. They can include every token sent to the model as part of the request:

System and developer instructions
The current user message
Earlier conversation turns
Few-shot examples
Tool or function definitions
Retrieved documents in a RAG pipeline
Structured data, markup, or image-related text metadata
Previous tool results returned to an agent
Any application wrapper added before transmission

The exact accounting fields and names vary by API. Some providers use terms such as prompt tokens, input tokens, cache creation tokens, or cached input tokens. The stable principle is that the model must process the supplied context before it can generate a response.

A short visible question can therefore produce a large input count. A customer may enter ten words while the application adds a long system prompt, a tool schema, chat history, and several retrieved passages.
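To make that gap concrete, the following minimal sketch tallies the input side of a single request by component. The component names and token counts are invented for illustration; real counts would come from the provider's tokenizer or usage metadata.

```python
# Hypothetical per-component token counts for one request.
# Real values would come from the provider's tokenizer or usage report.
request_components = {
    "system_prompt": 900,
    "tool_schema": 650,
    "chat_history": 2400,
    "retrieved_passages": 3100,
    "user_message": 14,
}


def input_breakdown(components: dict[str, int]) -> dict[str, float]:
    """Return each component's share of the total input tokens."""
    total = sum(components.values())
    return {name: count / total for name, count in components.items()}


total_input = sum(request_components.values())
shares = input_breakdown(request_components)

print(f"total input tokens: {total_input}")
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {share:6.1%}")
```

With these made-up numbers, the text the user actually typed is well under one percent of the input the model has to process.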

What counts as input tokens

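Input tokens are not limited to the text typed into a chat box. They can include every token sent to the model as part of the request, from system instructions and tool definitions to conversation history, retrieved documents, and earlier tool results.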

What counts as output tokens

Output tokens are generated by the model. For a text response, they represent the tokenized completion returned by the model. Tool-call arguments, structured output, reasoning-related usage, or other model-specific categories may be reported separately or included according to the provider's API contract.

Output length is usually controlled by a maximum-output setting, a stop condition, or the model's own end-of-sequence decision. A maximum is a ceiling, not a prediction. The model may stop earlier.

Because autoregressive generation produces one token after another, longer outputs increase generation time. They also raise cost in proportion to the output rate, which matters most when that rate is higher than the input rate.

The basic cost equation

For a provider with separate input and output rates:

cost = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price)

Suppose an illustrative request uses 8,000 input tokens and 1,000 output tokens. If the illustrative rates are $2 per million input tokens and $8 per million output tokens:

Input cost: 8,000 ÷ 1,000,000 × $2 = $0.016
Output cost: 1,000 ÷ 1,000,000 × $8 = $0.008
Total: $0.024

These numbers are examples, not current provider quotes. Always substitute the exact model and price tier from the provider's current pricing documentation. The LLM API cost guide covers cached tokens, retries, and monthly volume.
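The same arithmetic can be written as a small function. This is a sketch using the illustrative figures above; the function and parameter names are our own, and the blended-rate comparison uses a simple average of the two example prices purely to show how far a single combined rate can drift.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one request when input and output have separate per-million rates."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Illustrative figures from the example above, not provider quotes.
cost = request_cost(
    8_000, 1_000, input_price_per_million=2.0, output_price_per_million=8.0
)
print(f"${cost:.3f}")  # prints $0.024

# The shortcut this article warns against: one combined total times one blended rate.
blended = (8_000 + 1_000) / 1_000_000 * ((2.0 + 8.0) / 2)
print(f"${blended:.3f}")  # prints $0.045
```

For this input-heavy request, the blended shortcut nearly doubles the estimate. On an output-heavy request it would err in the other direction.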

How caching changes input accounting

Prompt caching can create additional input categories. A repeated prompt prefix may be billed or reported differently from uncached input. Some systems distinguish cache creation from cache reads. Others automatically identify reusable prefixes and report cached tokens in usage metadata.

Do not subtract cached tokens from total input without understanding the provider's fields. A sound ledger records:

uncached input + cache writes + cache reads + output

Each category is multiplied by its own published rate. Read what prompt caching is before modeling savings.
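One way to keep those four categories apart is sketched below. The field names and every rate are placeholders chosen for this example; an actual provider's usage schema may name, split, or nest these categories differently.

```python
from dataclasses import dataclass

CATEGORIES = ("uncached_input", "cache_writes", "cache_reads", "output")


@dataclass
class Usage:
    """Token counts for one request, one field per billing category."""
    uncached_input: int
    cache_writes: int
    cache_reads: int
    output: int


@dataclass
class Rates:
    """Prices per million tokens. Every value used below is a placeholder."""
    uncached_input: float
    cache_writes: float
    cache_reads: float
    output: float


def ledger_cost(usage: Usage, rates: Rates) -> dict[str, float]:
    """Price each category at its own rate and keep the lines separate."""
    lines = {
        name: getattr(usage, name) / 1_000_000 * getattr(rates, name)
        for name in CATEGORIES
    }
    lines["total"] = sum(lines.values())
    return lines


# Hypothetical request: most of the prompt prefix is served from cache.
usage = Usage(uncached_input=1_000, cache_writes=0, cache_reads=7_000, output=1_000)
rates = Rates(uncached_input=2.0, cache_writes=2.5, cache_reads=0.2, output=8.0)

for name, amount in ledger_cost(usage, rates).items():
    print(f"{name:<15} ${amount:.4f}")
```

Keeping the per-category lines in the result, rather than returning only a total, makes it easier to see whether a change in spend came from cache behavior or from output length.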

Input-heavy and output-heavy workloads

Different products have different token shapes.

Input-heavy workloads include:

Summarizing long documents
RAG over several retrieved passages
Reviewing a large code diff
Multi-turn conversations with extensive history
Classifying records with long source text

Output-heavy workloads include:

Generating articles or reports
Producing long code files
Multi-step reasoning responses
Creating synthetic data
Expanding outlines into detailed prose

The ratio matters. A model that is economical for short classification may be expensive for long-form generation if output pricing is high. Compare models against the actual distribution of input and output, not a blended token number.
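The sketch below shows how the cheaper model can flip with workload shape. Both models, their prices, and the token counts are hypothetical and exist only to illustrate the comparison.

```python
# Hypothetical (input price, output price) per million tokens.
MODELS = {
    "model_a": (3.0, 6.0),
    "model_b": (1.0, 12.0),
}

# Hypothetical (input tokens, output tokens) for a typical request.
WORKLOADS = {
    "summarize_long_doc": (20_000, 500),  # input-heavy
    "generate_report": (500, 4_000),      # output-heavy
}


def cost(tokens: tuple[int, int], prices: tuple[float, float]) -> float:
    """Per-request cost with separate input and output rates."""
    (tokens_in, tokens_out), (price_in, price_out) = tokens, prices
    return tokens_in / 1_000_000 * price_in + tokens_out / 1_000_000 * price_out


cheapest = {}
for workload, tokens in WORKLOADS.items():
    costs = {model: cost(tokens, prices) for model, prices in MODELS.items()}
    cheapest[workload] = min(costs, key=costs.get)
    print(workload, {m: round(c, 4) for m, c in costs.items()}, "->", cheapest[workload])
```

Under these assumptions the model with the lower input price wins the summarization workload and loses the report-generation workload, which a single blended figure would hide.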

Measure the hidden input

Production logging should capture more than total tokens. At minimum, record:

Model and provider
Route or product feature
Input tokens
Cached-input categories
Output tokens
Request success or retry status
Latency
Tenant or plan using a non-PII identifier

Then examine median, 90th percentile, and 95th percentile usage. Large contexts and automated users can dominate spend even when the average appears safe.
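As a rough illustration of that analysis, the sketch below computes nearest-rank percentiles over a list of logged input-token counts. The sample values are fabricated for the example, and nearest-rank is only one of several percentile conventions; a metrics system may interpolate instead.

```python
def percentile(values: list[int], p: int) -> int:
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceiling of p% of n
    return ordered[rank - 1]


# Fabricated input-token counts for 20 logged requests.
input_tokens = [
    800, 900, 950, 1000, 1000, 1100, 1100, 1200, 1200, 1300,
    1300, 1400, 1500, 1500, 1600, 1800, 2000, 2500, 40000, 90000,
]

mean = sum(input_tokens) / len(input_tokens)
top_two_share = sum(sorted(input_tokens)[-2:]) / sum(input_tokens)

print("median:", percentile(input_tokens, 50))
print("p90:   ", percentile(input_tokens, 90))
print("p95:   ", percentile(input_tokens, 95))
print("mean:  ", mean)
print(f"share of tokens from the two largest requests: {top_two_share:.0%}")
```

In this sample the median and 90th percentile look modest, while two requests account for most of the tokens. That pattern only becomes visible when the tail is inspected directly.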

The context-window cost calculator helps isolate the price of additional prompt context. Use the AI cost calculator to convert request-level usage into monthly COGS.

Common accounting mistakes

Counting only user text. This misses system prompts, tools, history, and retrieval.

Multiplying all tokens by one rate. Input, cached input, and output may have different prices.

Using the maximum output as actual usage. A configured limit is not the number generated.

Ignoring failed attempts. A retry can create another billable request even if the user sees one answer.

Combining models into one average. Routing and fallbacks should be calculated per model before totals are combined.
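The last two mistakes can be avoided by pricing every attempt against its own model before summing. The sketch below assumes failed attempts are billed for the tokens they consumed, which depends on the provider and the failure type; the model names, rates, and log records are hypothetical.

```python
from collections import defaultdict

# Hypothetical (input price, output price) per million tokens, per model.
RATES = {
    "primary_model": (2.0, 8.0),
    "fallback_model": (0.5, 1.5),
}

# One record per attempt, not per user-visible answer.
attempts = [
    {"request_id": "r1", "model": "primary_model", "input": 8_000, "output": 300, "ok": False},
    {"request_id": "r1", "model": "primary_model", "input": 8_000, "output": 1_000, "ok": True},
    {"request_id": "r2", "model": "fallback_model", "input": 4_000, "output": 500, "ok": True},
]


def cost_by_model(records: list[dict], rates: dict) -> dict[str, float]:
    """Sum the cost of every attempt, priced at the rates of the model that served it."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        price_in, price_out = rates[record["model"]]
        totals[record["model"]] += (
            record["input"] / 1_000_000 * price_in
            + record["output"] / 1_000_000 * price_out
        )
    return dict(totals)


all_attempts = cost_by_model(attempts, RATES)
successes_only = cost_by_model([a for a in attempts if a["ok"]], RATES)

print("all attempts:  ", all_attempts, "total", round(sum(all_attempts.values()), 5))
print("successes only:", successes_only, "total", round(sum(successes_only.values()), 5))
```

The difference between the two totals is the spend that disappears from a report that counts only the answers users saw.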


Use it with ByteCosts calculators

After reading the research note, open the related calculator and replace the example assumptions with your own users, requests, tokens, seats, or platform usage.

The goal is to convert the article's cost pattern into a concrete monthly run-rate, per-user margin, or break-even point your team can discuss.

Frequently asked questions

Are prompt tokens the same as input tokens?

They usually describe the same broad side of the request, but field names and subcategories vary by provider. Follow the usage schema for the specific API and preserve cached or specialized categories separately.

Does conversation history count as input tokens?

Yes, when the application sends that history back to the model. A user interface may display an ongoing conversation, but the model only sees the context included in the current request or maintained by the provider's conversation mechanism.
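A minimal sketch of that growth, assuming the application resends the full history on every turn with no truncation, summarization, or caching, and that every message has the same hypothetical size:

```python
def input_tokens_per_turn(
    system_tokens: int, user_tokens: int, reply_tokens: int, turns: int
) -> list[int]:
    """Input size of each request when the full history is resent every turn."""
    history = 0
    sizes = []
    for _ in range(turns):
        # Each request carries the system prompt, all prior turns, and the new message.
        sizes.append(system_tokens + history + user_tokens)
        # After the reply, both the user message and the reply join the history.
        history += user_tokens + reply_tokens
    return sizes


sizes = input_tokens_per_turn(system_tokens=200, user_tokens=50, reply_tokens=150, turns=5)
print(sizes)       # prints [250, 450, 650, 850, 1050]
print(sum(sizes))  # prints 3250
```

The user typed 250 tokens in total across five turns, yet the cumulative input processed is 3,250 tokens, because earlier turns are counted again each time they are resent.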

Do tool calls create output tokens?

Tool-call names and arguments are model-generated content and can contribute to output usage according to the API's accounting rules. Tool results sent back to the model can then contribute to a later request's input usage.

Which token category usually costs more?

There is no universal rule. Many providers publish different input and output rates, but the relationship depends on the exact model, processing mode, and price tier. Verify the rates against the provider's current pricing page.

Cite this page

Input Tokens vs Output Tokens: What Counts, What Costs, and Why. ByteCosts. Updated 2026-06-21. https://bytecosts.com/blog/input-tokens-vs-output-tokens/
