Input Tokens vs Output Tokens: What Counts, What Costs, and Why
Direct answer
Input tokens are the tokenized content a model receives, including application-added instructions and context. Output tokens are the tokens the model generates. Providers may price the two categories differently, and both can consume context-window capacity. Accurate cost models must therefore record them separately instead of multiplying one combined token total by a blended price.
What counts as input tokens
Input tokens are not limited to the text typed into a chat box. They can include every token sent to the model as part of the request:
- System and developer instructions
- The current user message
- Earlier conversation turns
- Few-shot examples
- Tool or function definitions
- Retrieved documents in a RAG pipeline
- Structured data, markup, or image-related text metadata
- Previous tool results returned to an agent
- Any application wrapper added before transmission
The exact accounting fields and names vary by API. Some providers use terms such as prompt tokens, input tokens, cache creation tokens, or cached input tokens. The stable principle is that the model must process the supplied context before it can generate a response.
A short visible question can therefore produce a large input count. A customer may enter ten words while the application adds a long system prompt, a tool schema, chat history, and several retrieved passages.
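To make the gap between visible text and billed input concrete, the following minimal Python sketch adds up the parts of a single request. The component names and every token count are hypothetical placeholders chosen for illustration; real counts come from the provider's tokenizer or the usage fields in its API response.

```python
# Hypothetical token counts for one request. Real values come from the
# provider's tokenizer or the usage metadata returned with a response.
request_components = {
    "system_prompt": 1200,
    "tool_definitions": 850,
    "conversation_history": 2400,
    "retrieved_passages": 3100,
    "user_message": 14,
}


def total_input_tokens(components: dict[str, int]) -> int:
    """Sum every component that is sent to the model in the request."""
    return sum(components.values())


def visible_share(components: dict[str, int], visible_key: str = "user_message") -> float:
    """Fraction of the input that the user actually typed."""
    return components[visible_key] / total_input_tokens(components)


print(total_input_tokens(request_components))
print(f"{visible_share(request_components):.2%}")
```

With these assumed numbers, the user's 14 tokens are well under 1% of the 7,564 input tokens in the request.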
What counts as output tokens
Output tokens are generated by the model. For a text response, they represent the tokenized completion returned by the model. Tool-call arguments, structured output, reasoning-related usage, or other model-specific categories may be reported separately or included according to the provider's API contract.
Output length is usually controlled by a maximum-output setting, a stop condition, or the model's own end-of-sequence decision. A maximum is a ceiling, not a prediction. The model may stop earlier.
Because autoregressive generation produces one token after another, longer outputs increase generation time. They also affect cost when the output rate differs from the input rate.
The basic cost equation
For a provider with separate input and output rates:
cost = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price)
Suppose an illustrative request uses 8,000 input tokens and 1,000 output tokens. If the illustrative rates are $2 per million input tokens and $8 per million output tokens:
- Input cost: 8,000 ÷ 1,000,000 × $2 = $0.016
- Output cost: 1,000 ÷ 1,000,000 × $8 = $0.008
- Total: $0.024
These numbers are examples, not current provider quotes. Always substitute the exact model and price tier from the provider's current pricing documentation. The LLM API cost guide covers cached tokens, retries, and monthly volume.
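The same arithmetic can be written as a small Python function. This is a sketch under the article's illustrative rates, not a provider's billing logic; the function and parameter names are assumptions made for the example.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one request when input and output have separate rates."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Illustrative rates from the example above, not current provider quotes.
cost = request_cost(
    input_tokens=8_000,
    output_tokens=1_000,
    input_price_per_million=2.0,
    output_price_per_million=8.0,
)
print(f"${cost:.3f}")  # $0.024
```

Keeping the two token counts as separate arguments makes it hard to fall back on a single blended rate by accident.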
How caching changes input accounting
Prompt caching can create additional input categories. A repeated prompt prefix may be billed or reported differently from uncached input. Some systems distinguish cache creation from cache reads. Others automatically identify reusable prefixes and report cached tokens in usage metadata.
Do not subtract cached tokens from total input without understanding the provider's fields. A sound ledger records:
uncached input + cache writes + cache reads + output
Each category is multiplied by its own published rate. Read what prompt caching is before modeling savings.
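A ledger of this shape might look like the Python sketch below. The four category names mirror the line above, but they are not any provider's actual field names, and the rates are invented for illustration. Map each key to the fields in the specific API's usage schema before relying on it.

```python
# Ledger categories from the text above. These are illustrative keys,
# not the field names of any particular provider's API.
CATEGORIES = ("uncached_input", "cache_writes", "cache_reads", "output")


def ledger_cost(usage: dict[str, int], rates: dict[str, float]) -> dict[str, float]:
    """Multiply each token category by its own per-million rate."""
    costs = {c: usage[c] / 1_000_000 * rates[c] for c in CATEGORIES}
    costs["total"] = sum(costs.values())
    return costs


# Hypothetical usage for one request with a mostly cached prefix.
usage = {"uncached_input": 2_000, "cache_writes": 0, "cache_reads": 6_000, "output": 1_000}

# Hypothetical USD rates per million tokens.
rates = {"uncached_input": 2.0, "cache_writes": 2.5, "cache_reads": 0.2, "output": 8.0}

for category, amount in ledger_cost(usage, rates).items():
    print(f"{category}: ${amount:.4f}")
```

The function raises a `KeyError` if a category is missing, which is a deliberate choice in this sketch: an absent category has to be recorded as an explicit zero instead of being silently dropped.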
Input-heavy and output-heavy workloads
Different products have different token shapes.
Input-heavy workloads include:
- Summarizing long documents
- RAG over several retrieved passages
- Reviewing a large code diff
- Multi-turn conversations with extensive history
- Classifying records with long source text
Output-heavy workloads include:
- Generating articles or reports
- Producing long code files
- Multi-step reasoning responses
- Creating synthetic data
- Expanding outlines into detailed prose
The ratio matters. A model that is economical for short classification may be expensive for long-form generation if output pricing is high. Compare models against the actual distribution of input and output, not a blended token number.
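The effect of the ratio can be checked with a short Python comparison. Both models, their rates, and both workload shapes are hypothetical, chosen only so that the cheaper model changes with the workload.

```python
# Hypothetical (input, output) USD rates per million tokens.
MODELS = {
    "model_a": (1.0, 10.0),  # cheap input, expensive output
    "model_b": (3.0, 6.0),   # pricier input, cheaper output
}

# Hypothetical (input_tokens, output_tokens) per request.
WORKLOADS = {
    "classification": (3_000, 20),
    "long_form_generation": (500, 4_000),
}


def cost(input_tokens: int, output_tokens: int, price_in: float, price_out: float) -> float:
    """Per-request cost with separate input and output rates."""
    return input_tokens / 1_000_000 * price_in + output_tokens / 1_000_000 * price_out


def cheapest_model(workload: str) -> str:
    """Name of the model with the lowest per-request cost for a workload."""
    tokens = WORKLOADS[workload]
    return min(MODELS, key=lambda name: cost(*tokens, *MODELS[name]))


for workload in WORKLOADS:
    print(workload, "->", cheapest_model(workload))
```

Under these assumptions, `model_a` is cheaper for the input-heavy classification shape and `model_b` is cheaper for the output-heavy generation shape, which a single blended rate would hide.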
Common accounting mistakes
Counting only user text. This misses system prompts, tools, history, and retrieval.
Multiplying all tokens by one rate. Input, cached input, and output may have different prices.
Using the maximum output as actual usage. A configured limit is not the number generated.
Ignoring failed attempts. A retry can create another billable request even if the user sees one answer.
Combining models into one average. Routing and fallbacks should be calculated per model before totals are combined.
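Two of these mistakes, ignoring failed attempts and averaging across models, can be avoided by logging every attempt and totaling per model. The Python sketch below assumes a hypothetical log format and invented rates. Whether a given failed attempt is actually billed depends on the provider and the failure mode, so the token counts here stand in for whatever usage the provider reported for each attempt.

```python
from collections import defaultdict

# Hypothetical (input, output) USD rates per million tokens, per model.
PRICES_PER_MILLION = {"primary": (2.0, 8.0), "fallback": (0.5, 1.5)}

# Hypothetical attempt log. The first attempt produced unusable output and
# was retried, so the user saw one answer but two requests reported usage.
attempts = [
    {"model": "primary", "input_tokens": 8_000, "output_tokens": 300, "succeeded": False},
    {"model": "primary", "input_tokens": 8_000, "output_tokens": 1_000, "succeeded": True},
    {"model": "fallback", "input_tokens": 8_000, "output_tokens": 900, "succeeded": True},
]


def attempt_cost(attempt: dict) -> float:
    """Cost of one attempt at that attempt's own model rates."""
    price_in, price_out = PRICES_PER_MILLION[attempt["model"]]
    return (
        attempt["input_tokens"] / 1_000_000 * price_in
        + attempt["output_tokens"] / 1_000_000 * price_out
    )


def cost_by_model(records: list[dict]) -> dict[str, float]:
    """Total cost per model across all attempts, successful or not."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["model"]] += attempt_cost(record)
    return dict(totals)


totals = cost_by_model(attempts)
grand_total = sum(totals.values())
print(totals)
print(f"${grand_total:.5f}")
```

In this example, counting only the successful `primary` attempt would report $0.024 for that model, while the ledger with the retry included reports $0.0424.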
What this article covers
- What counts as input tokens
- What counts as output tokens
- The basic cost equation
- How caching changes input accounting
- How input and output share context capacity
Frequently asked questions
Are prompt tokens the same as input tokens?
They usually describe the same broad side of the request, but field names and subcategories vary by provider. Follow the usage schema for the specific API and preserve cached or specialized categories separately.
Does conversation history count as input tokens?
Yes, when the application sends that history back to the model. A user interface may display an ongoing conversation, but the model only sees the context included in the current request or maintained by the provider's conversation mechanism.
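As a rough illustration, the Python sketch below estimates the input tokens of each request in a conversation. It assumes the application resends the full history on every turn without truncation, summarization, or caching, and it ignores any per-message formatting overhead. The function name and all token counts are hypothetical.

```python
def input_tokens_per_request(
    turns: list[tuple[int, int]], system_tokens: int
) -> list[int]:
    """Input tokens for each request when the full history is resent.

    Each turn is (user_tokens, assistant_tokens). The assistant's reply
    to one turn becomes part of the input for the next turn.
    """
    per_request = []
    history_tokens = 0
    for user_tokens, assistant_tokens in turns:
        per_request.append(system_tokens + history_tokens + user_tokens)
        history_tokens += user_tokens + assistant_tokens
    return per_request


# Hypothetical three-turn conversation with a 500-token system prompt.
turns = [(20, 150), (30, 200), (25, 180)]
per_request = input_tokens_per_request(turns, system_tokens=500)
print(per_request)       # [520, 700, 925]
print(sum(per_request))  # 2145
```

The user typed only 75 tokens across the three turns, yet the requests carried 2,145 input tokens in total because the system prompt and earlier turns were sent again each time.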
Do tool calls create output tokens?
Tool-call names and arguments are model-generated content and can contribute to output usage according to the API's accounting rules. Tool results sent back to the model can then contribute to a later request's input usage.
Which token category usually costs more?
There is no universal rule. Many providers publish different input and output rates, but the relationship depends on the exact model, processing mode, and price tier. Verify the current source page.
Cite this page
Input Tokens vs Output Tokens: What Counts, What Costs, and Why. ByteCosts. Updated 2026-06-21. https://bytecosts.com/blog/input-tokens-vs-output-tokens/
Sources
- OpenAI API pricing
- Claude API context-window documentation
- NVIDIA: LLM inference benchmarking fundamentals