# Input Tokens vs Output Tokens: What Counts, What Costs, and Why

> Canonical: https://bytecosts.com/blog/input-tokens-vs-output-tokens/ · Last updated 2026-06-21

**Direct answer.** Input tokens are the tokenized content a model receives, including application-added instructions and context. Output tokens are the tokens the model generates. Providers may price the two categories differently, and both can consume context-window capacity. Accurate cost models must therefore record them separately instead of multiplying one combined token total by a blended price.

**[Apply this concept - Context Window Cost Calculator →](https://bytecosts.com/tools/context-window-cost/)**

## What counts as input tokens

Input tokens are not limited to the text typed into a chat box. They can include every token sent to the model as part of the request:

- System and developer instructions
- The current user message
- Earlier conversation turns
- Few-shot examples
- Tool or function definitions
- Retrieved documents in a RAG pipeline
- Structured data, markup, or image-related text metadata
- Previous tool results returned to an agent
- Any application wrapper added before transmission

The exact accounting fields and names vary by API. Some providers use terms such as prompt tokens, input tokens, cache creation tokens, or cached input tokens. The stable principle is that the model must process the supplied context before it can generate a response.

A short visible question can therefore produce a large input count. A customer may enter ten words while the application adds a long system prompt, a tool schema, chat history, and several retrieved passages.
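As a rough illustration of the point above, the following sketch adds up the parts of one request and shows how small the user's share can be. The component names and token counts are hypothetical; real counts would come from the provider's tokenizer or the usage metadata returned with the response.

```python
# Hypothetical per-component token counts for a single request.
# These are illustrative numbers, not measurements from any provider.
request_components = {
    "system_prompt": 850,
    "tool_definitions": 1_200,
    "chat_history": 2_400,
    "retrieved_passages": 3_500,
    "user_message": 14,
}


def summarize_input(components: dict[str, int]) -> tuple[int, float]:
    """Return total input tokens and the fraction contributed by the user message."""
    total = sum(components.values())
    user_share = components.get("user_message", 0) / total if total else 0.0
    return total, user_share


total_input, user_share = summarize_input(request_components)
print(f"total input tokens: {total_input}")
print(f"user message share of input: {user_share:.2%}")
```

Breaking the input down by component, rather than logging one number, makes it easier to see which part of the application is driving the count.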

## What counts as output tokens

Output tokens are generated by the model. For a text response, they represent the tokenized completion returned by the model. Tool-call arguments, structured output, reasoning-related usage, or other model-specific categories may be reported separately or included according to the provider's API contract.

Output length is usually controlled by a maximum-output setting, a stop condition, or the model's own end-of-sequence decision. A maximum is a ceiling, not a prediction. The model may stop earlier.

Because autoregressive generation produces one token after another, longer outputs increase generation time. They also affect cost when the output rate differs from the input rate.

## The basic cost equation

For a provider with separate input and output rates:

cost = input tokens ÷ 1,000,000 × input price + output tokens ÷ 1,000,000 × output price

Suppose an illustrative request uses 8,000 input tokens and 1,000 output tokens. If the illustrative rates are $2 per million input tokens and $8 per million output tokens:

- Input cost: 8,000 ÷ 1,000,000 × $2 = $0.016
- Output cost: 1,000 ÷ 1,000,000 × $8 = $0.008
- Total: $0.024

These numbers are examples, not current provider quotes. Always substitute the exact model and price tier from the provider's current pricing documentation. The LLM API cost guide covers cached tokens, retries, and monthly volume.
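The cost equation above can be sketched as a small function. The function and parameter names are illustrative, and the rates passed in are the same illustrative figures used in the worked example, not provider quotes.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one request when input and output are priced separately."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Illustrative request: 8,000 input tokens at $2/M, 1,000 output tokens at $8/M.
cost = request_cost(8_000, 1_000, 2.0, 8.0)
print(f"${cost:.3f}")  # prints $0.024
```

Keeping the two token counts as separate arguments prevents the blended-price shortcut: there is no way to call the function with a single combined total.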

## How caching changes input accounting

Prompt caching can create additional input categories. A repeated prompt prefix may be billed or reported differently from uncached input. Some systems distinguish cache creation from cache reads. Others automatically identify reusable prefixes and report cached tokens in usage metadata.

Do not subtract cached tokens from total input without understanding the provider's fields. A sound ledger records:

uncached input + cache writes + cache reads + output

Each category is multiplied by its own published rate. Understand how the provider's prompt caching works before modeling savings.
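One way to keep the four ledger categories apart is sketched below. The field names, the `ledger_cost` helper, and every rate are assumptions made for illustration; a real implementation would map them to the usage fields and prices of the specific provider.

```python
from dataclasses import dataclass

CATEGORIES = ("uncached_input", "cache_write", "cache_read", "output")


@dataclass(frozen=True)
class Usage:
    """Token counts per category for one request (hypothetical field names)."""
    uncached_input: int
    cache_write: int
    cache_read: int
    output: int


@dataclass(frozen=True)
class Rates:
    """Price per million tokens for each category (illustrative values only)."""
    uncached_input: float
    cache_write: float
    cache_read: float
    output: float


def ledger_cost(usage: Usage, rates: Rates) -> dict[str, float]:
    """Multiply each category by its own rate and report a per-category ledger."""
    lines = {
        name: getattr(usage, name) / 1_000_000 * getattr(rates, name)
        for name in CATEGORIES
    }
    lines["total"] = sum(lines.values())
    return lines


# Illustrative request: most of the prompt is served from cache.
usage = Usage(uncached_input=1_000, cache_write=0, cache_read=7_000, output=1_000)
rates = Rates(uncached_input=2.0, cache_write=2.5, cache_read=0.2, output=8.0)
ledger = ledger_cost(usage, rates)
for name, amount in ledger.items():
    print(f"{name}: ${amount:.4f}")
```

The ledger keeps cached tokens as their own line items instead of subtracting them from total input, so the result stays correct whichever way a provider reports them.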

## How input and output share context capacity

The context window defines how much tokenized information the model can use under its API rules. Input and generated output are related because the model must attend to the supplied context and the tokens generated so far.

A request that uses nearly all available capacity for input may leave less room for output, depending on the model and endpoint. Applications should validate both:

input tokens + requested output allowance + required overhead <= supported limit

Provider documentation is the authority for the exact constraint. Do not assume every model treats advertised context and maximum output identically.
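The validation above can be expressed as a pre-flight check. This is a minimal sketch that assumes the simplest case, where input, requested output, and overhead all draw on one shared limit; the function names and the 128,000-token limit are hypothetical, and some models apply a separate maximum-output cap that this check does not model.

```python
def fits_context(
    input_tokens: int,
    max_output_tokens: int,
    overhead_tokens: int,
    context_limit: int,
) -> bool:
    """True when input + requested output allowance + overhead fits the limit."""
    return input_tokens + max_output_tokens + overhead_tokens <= context_limit


def output_headroom(input_tokens: int, overhead_tokens: int, context_limit: int) -> int:
    """Tokens left for output after input and overhead, never below zero."""
    return max(0, context_limit - input_tokens - overhead_tokens)


LIMIT = 128_000  # hypothetical supported limit; check the provider's documentation

print(fits_context(120_000, 4_000, 500, LIMIT))  # True: 124,500 <= 128,000
print(fits_context(126_000, 4_000, 500, LIMIT))  # False: 130,500 > 128,000
print(output_headroom(126_000, 500, LIMIT))      # 1500
```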

## Input-heavy and output-heavy workloads

Different products have different token shapes.

Input-heavy workloads include:

- Summarizing long documents
- RAG over several retrieved passages
- Reviewing a large code diff
- Multi-turn conversations with extensive history
- Classifying records with long source text

Output-heavy workloads include:

- Generating articles or reports
- Producing long code files
- Multi-step reasoning responses
- Creating synthetic data
- Expanding outlines into detailed prose

The ratio matters. A model that is economical for short classification may be expensive for long-form generation if output pricing is high. Compare models against the actual distribution of input and output, not a blended token number.
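To see why the ratio matters, the sketch below prices two workload shapes against two models. The model names, prices, and token counts are all invented for illustration and do not describe any real provider.

```python
# Hypothetical prices per million tokens.
MODELS = {
    "model_a": {"input": 1.0, "output": 10.0},  # cheap input, expensive output
    "model_b": {"input": 3.0, "output": 6.0},   # pricier input, cheaper output
}

# Hypothetical per-request token shapes.
WORKLOADS = {
    "classification": {"input": 6_000, "output": 20},   # input-heavy
    "long_form": {"input": 500, "output": 4_000},       # output-heavy
}


def workload_cost(model: str, workload: str) -> float:
    """Per-request cost using separate input and output rates."""
    price, shape = MODELS[model], WORKLOADS[workload]
    return (shape["input"] * price["input"] + shape["output"] * price["output"]) / 1_000_000


def cheapest_model(workload: str) -> str:
    """Name of the model with the lowest per-request cost for this workload."""
    return min(MODELS, key=lambda model: workload_cost(model, workload))


for name in WORKLOADS:
    print(name, "->", cheapest_model(name))
```

With these assumed numbers the cheaper model flips between the two workloads, which a single blended per-token price would hide.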

## Measure the hidden input

Production logging should capture more than total tokens. At minimum, record:

- Model and provider
- Route or product feature
- Input tokens
- Cached-input categories
- Output tokens
- Request success or retry status
- Latency
- Tenant or plan using a non-PII identifier

Then examine median, 90th percentile, and 95th percentile usage. Large contexts and automated users can dominate spend even when the average appears safe.
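The percentile review described above might look like the following sketch. It uses the nearest-rank percentile definition, which is one of several common conventions, and the sample of input-token counts is fabricated purely to show how a few large requests can distort the average.

```python
def percentile(values: list[int], pct: int) -> int:
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("percentile requires at least one value")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, clamped so the rank is at least 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Fabricated sample: input tokens for 20 logged requests.
input_tokens = [1_000] * 10 + [2_000] * 8 + [40_000, 90_000]

mean = sum(input_tokens) / len(input_tokens)
p50 = percentile(input_tokens, 50)
p90 = percentile(input_tokens, 90)
p95 = percentile(input_tokens, 95)

# Share of all input tokens consumed by the two largest requests.
top_two_share = sum(sorted(input_tokens)[-2:]) / sum(input_tokens)

print(f"mean={mean:.0f} p50={p50} p90={p90} p95={p95}")
print(f"top two requests: {top_two_share:.0%} of input tokens")
```

In this sample the median and 90th percentile look modest while two requests account for most of the tokens, which is the pattern the tail percentiles are meant to expose.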

The context-window cost calculator helps isolate the price of additional prompt context. Use the AI cost calculator to convert request-level usage into monthly COGS.

## Common accounting mistakes

**Counting only user text.** This misses system prompts, tools, history, and retrieval.

**Multiplying all tokens by one rate.** Input, cached input, and output may have different prices.

**Using the maximum output as actual usage.** A configured limit is not the number generated.

**Ignoring failed attempts.** A retry can create another billable request even if the user sees one answer.

**Combining models into one average.** Routing and fallbacks should be calculated per model before totals are combined.
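The last two mistakes can be avoided by costing every attempt, per model, before summing. The sketch below assumes that a failed attempt is billed for the tokens it reports, which depends on the provider and the failure mode; the model names, prices, and attempt records are hypothetical.

```python
from collections import defaultdict

# Hypothetical (input, output) prices per million tokens for each model.
PRICES = {"primary": (2.0, 8.0), "fallback": (0.5, 2.0)}

# Hypothetical attempt log: request r1 failed on the primary model and was
# retried on the fallback, so the user saw one answer but two attempts ran.
attempts = [
    {"request_id": "r1", "model": "primary", "input": 8_000, "output": 0, "ok": False},
    {"request_id": "r1", "model": "fallback", "input": 8_000, "output": 1_000, "ok": True},
    {"request_id": "r2", "model": "primary", "input": 3_000, "output": 500, "ok": True},
]


def cost_by_model(attempts: list[dict], prices: dict) -> dict[str, float]:
    """Sum the cost of every attempt, failed or not, under its own model's rates."""
    totals: dict[str, float] = defaultdict(float)
    for attempt in attempts:
        input_price, output_price = prices[attempt["model"]]
        totals[attempt["model"]] += (
            attempt["input"] * input_price + attempt["output"] * output_price
        ) / 1_000_000
    return dict(totals)


per_model = cost_by_model(attempts, PRICES)
total = sum(per_model.values())
for model, amount in per_model.items():
    print(f"{model}: ${amount:.3f}")
print(f"total: ${total:.3f}")
```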

## What this article covers

- What counts as input tokens
- What counts as output tokens
- The basic cost equation
- How caching changes input accounting
- How input and output share context capacity

## Use it with ByteCosts calculators

After reading the research note, open the related calculator and replace the example assumptions with your own users, requests, tokens, seats, or platform usage.

The goal is to convert the article's cost pattern into a concrete monthly run-rate, per-user margin, or break-even point your team can discuss.

## Frequently asked questions

### Are prompt tokens the same as input tokens?

They usually describe the same broad side of the request, but field names and subcategories vary by provider. Follow the usage schema for the specific API and preserve cached or specialized categories separately.

### Does conversation history count as input tokens?

Yes, when the application sends that history back to the model. A user interface may display an ongoing conversation, but the model only sees the context included in the current request or maintained by the provider's conversation mechanism.

### Do tool calls create output tokens?

Tool-call names and arguments are model-generated content and can contribute to output usage according to the API's accounting rules. Tool results sent back to the model can then contribute to a later request's input usage.

### Which token category usually costs more?

There is no universal rule. Many providers publish different input and output rates, but the relationship depends on the exact model, processing mode, and price tier. Verify the current source page.

## Related pricing pages

- [What Is an AI Token? A Practical Definition for LLM Cost and Context](https://bytecosts.com/blog/what-is-an-ai-token/)
- [What Is an LLM Context Window? Tokens, Limits, and Cost](https://bytecosts.com/blog/what-is-an-llm-context-window/)
- [How to Calculate LLM API Cost per Request, User, and Month](https://bytecosts.com/blog/how-to-calculate-llm-api-cost/)
- [Context Window Cost Calculator](https://bytecosts.com/tools/context-window-cost/)
- [AI Model Pricing: Compare LLM Token Costs](https://bytecosts.com/pricing/)
- [OpenAI pricing: API cost per model, user & month](https://bytecosts.com/pricing/openai/)
- [LLM API cost calculator](https://bytecosts.com/use-cases/llm-api-cost-calculator/)

## Model this research

- [AI App Cost Calculator](https://bytecosts.com/tools/ai-cost-calculator/)
- [Scenario Studio](https://bytecosts.com/tools/scenario-studio/)
- [Provider Pricing Index](https://bytecosts.com/tools/ai-provider-pricing/)

## Cite this page

Input Tokens vs Output Tokens: What Counts, What Costs, and Why. ByteCosts. Updated 2026-06-21. https://bytecosts.com/blog/input-tokens-vs-output-tokens/

**Sources**

- [OpenAI API pricing](https://developers.openai.com/api/docs/pricing)
- [Claude API context-window documentation](https://platform.claude.com/docs/en/build-with-claude/context-windows)
- [NVIDIA: LLM inference benchmarking fundamentals](https://developer.nvidia.com/blog/llm-benchmarking-fundamental-concepts/)
