
Retry Loop Tax Calculator

Retry loop tax is the extra AI spend caused by failed requests, parser repairs, timeout retries, and fallback model calls. In the flaky tool-calling agent example, 120,000 monthly requests at 25% per-attempt failure and 5 max attempts turn a $2,952 retry-free bill into $3,932, a 1.33x multiplier with $980/month of retry waste and a P99 band of $8,856.

Open the live Retry Loop Tax calculator - Retry multiplier →

Why this matters now

A 2026 budget-overrun incident catalog documents agent loops, repeated tool calls, and failed automation as recurring AI cost failure modes.

Reported internal token limits and price pressure at major AI companies make retry waste a planning risk, not just an engineering nuisance.

Example scenario

Worked example: the first preset, Flaky tool-calling agent, runs 120,000 monthly requests at 4,200 input and 800 output tokens per attempt on the UI default model, anthropic:claude-sonnet-4-6. With a 25% per-attempt failure rate, 5 max attempts, and full re-billing, the retry-free bill is $2,952 and the real bill is $3,932. That works out to a 1.33x multiplier, $980/month of retry waste, 1.33 expected attempts per request, and a P99 monthly band of $8,856.
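The arithmetic behind this example can be reproduced with a minimal sketch of the core formulas from the methodology section. The rates of $3 and $15 per million input and output tokens are an assumption: they are not stated on this page, but they reproduce the $2,952 retry-free figure. The function names are illustrative, not the calculator's actual module API.

```typescript
// Assumed list prices in USD per million tokens (not stated on this page).
const INPUT_RATE = 3;
const OUTPUT_RATE = 15;

// Cost of one attempt, with negative token counts clamped to zero.
function perAttemptCost(inputTokens: number, outputTokens: number): number {
  const billable =
    Math.max(0, inputTokens) * INPUT_RATE + Math.max(0, outputTokens) * OUTPUT_RATE;
  return billable / 1e6;
}

// Expected attempts per request with truncated retries: (1 - p^N) / (1 - p).
function expectedAttempts(p: number, maxAttempts: number): number {
  if (p <= 0) return 1;
  if (p >= 1) return maxAttempts;
  return (1 - Math.pow(p, maxAttempts)) / (1 - p);
}

function retryLoopTax(
  requests: number,
  inputTokens: number,
  outputTokens: number,
  p: number,
  maxAttempts: number,
  w: number
) {
  const retryMultiplier = 1 + (expectedAttempts(p, maxAttempts) - 1) * w;
  const retryFreeMonthly = requests * perAttemptCost(inputTokens, outputTokens);
  const realMonthly = retryFreeMonthly * retryMultiplier;
  return {
    retryMultiplier,
    retryFreeMonthly,
    realMonthly,
    monthlyRetryWaste: realMonthly - retryFreeMonthly,
  };
}

// First preset: 120,000 requests, 25% failure, 5 max attempts, full re-billing.
const example = retryLoopTax(120000, 4200, 800, 0.25, 5, 1.0);
console.log(example.retryMultiplier.toFixed(2));   // "1.33"
console.log(example.retryFreeMonthly.toFixed(2));  // "2952.00"
console.log(example.realMonthly.toFixed(2));       // "3932.16"
console.log(example.monthlyRetryWaste.toFixed(2)); // "980.16"
```

The page rounds these values to whole dollars, which gives the $3,932 and $980 figures quoted above.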

What the inputs mean

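Failure rate: the probability that any single attempt fails and needs another attempt (25% in the worked example).
Retry count: the maximum number of attempts per request, counting the first, before the workflow stops (5 in the worked example).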
Fallback path: any more expensive model used after cheaper attempts fail.
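The methodology section describes how these inputs are bounded before any cost is computed. The sketch below illustrates that clamping; the `RetryInputs` shape, its field names, and `normalizeInputs` are hypothetical and chosen for illustration only, and behavior for non-numeric input is not covered.

```typescript
// Hypothetical input shape; field names are illustrative, not the calculator's.
interface RetryInputs {
  requestsPerMonth: number;
  inputTokens: number;
  outputTokens: number;
  failureProbability: number; // p, per attempt
  maxAttempts: number;        // N, counts the first attempt
  rebillFactor: number;       // w, share of an attempt re-billed on retry
}

function clamp(value: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, value));
}

// Applies the bounds stated in the methodology: tokens and requests are
// floored at zero, p and w are clamped to [0,1], N is floored to an
// integer and clamped to 1..10.
function normalizeInputs(raw: RetryInputs): RetryInputs {
  return {
    requestsPerMonth: Math.max(0, raw.requestsPerMonth),
    inputTokens: Math.max(0, raw.inputTokens),
    outputTokens: Math.max(0, raw.outputTokens),
    failureProbability: clamp(raw.failureProbability, 0, 1),
    maxAttempts: clamp(Math.floor(raw.maxAttempts), 1, 10),
    rebillFactor: clamp(raw.rebillFactor, 0, 1),
  };
}

// Out-of-range values are pulled back into the supported range.
const normalized = normalizeInputs({
  requestsPerMonth: 120000,
  inputTokens: -5,
  outputTokens: 800,
  failureProbability: 1.4,
  maxAttempts: 12.7,
  rebillFactor: -0.2,
});
console.log(normalized);
```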

What the result means

You get the effective cost multiplier, added monthly spend, and the retry or fallback step that contributes most to the overage.

Assumptions

The example uses the same token shape on every attempt.
The failure probability is per attempt; success after N attempts is modeled separately from cost.
Partial work re-billed is 1.0 in the first preset, so every additional attempt is counted at full per-attempt cost.
Provider list prices come from the committed model row, not a live API call.
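To see how much the full re-billing assumption matters, the sketch below evaluates the multiplier formula from the methodology section at several re-bill factors. Only w = 1.0 comes from the first preset; the values 0.5 and 0.0 are hypothetical and included purely for comparison.

```typescript
// Expected attempts per request with truncated retries.
function expectedAttempts(p: number, maxAttempts: number): number {
  if (p <= 0) return 1;
  if (p >= 1) return maxAttempts;
  return (1 - Math.pow(p, maxAttempts)) / (1 - p);
}

// retryMultiplier = 1 + (E[A] - 1) x w
function retryMultiplier(p: number, maxAttempts: number, w: number): number {
  return 1 + (expectedAttempts(p, maxAttempts) - 1) * w;
}

// 25% per-attempt failure, 5 max attempts, varying the re-bill factor.
const rebillFactors = [1.0, 0.5, 0.0]; // 0.5 and 0.0 are hypothetical
for (const w of rebillFactors) {
  console.log("w=" + w + " multiplier=" + retryMultiplier(0.25, 5, w).toFixed(3));
}
// w=1   multiplier=1.332
// w=0.5 multiplier=1.166
// w=0   multiplier=1.000
```

Because the multiplier is linear in w, halving the re-billed share halves the retry overhead above 1.0.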

Where the prices come from

This worked example uses the committed anthropic:claude-sonnet-4-6 pricing row selected by the live calculator. The row carries source URL, last-checked timestamp, and confidence grade in the pricing index; the static page only reads that committed data.

Formula and methodology

Per-attempt cost: perAttemptCost = (inputTokens x inputRate + outputTokens x outputRate) / 1e6, with negative token inputs clamped to zero.

Input bounds: failure probability p and re-bill factor w are clamped to [0,1]. Max attempts N is floored to an integer and clamped to 1..10 in the top-level calculator.

Expected attempts per request with truncated retries:
E[A] = (1 - p^N) / (1 - p) for 0 < p < 1
E[A] = 1 when p = 0
E[A] = N when p = 1

Core cost formulas:
retryMultiplier = 1 + (E[A] - 1) x w
retryFreeMonthly = requests x perAttemptCost
realMonthly = retryFreeMonthly x retryMultiplier
monthlyRetryWaste = realMonthly - retryFreeMonthly

Success and ultimate failure: success probability after N attempts = 1 - p^N. Requests that ultimately fail are requests x p^N, and their failed cost is ultimateFailedRequests x perAttemptCost x (1 + (N - 1) x w).

Attempt quantiles: these use the same truncated geometric convention as the code. For percentile q, tail = clamp(1 - q, Number.EPSILON, 1), attempts = floor(log(tail) / log(p)), then clamp to 1..N; p = 0 returns 1 and p = 1 returns N.

Monthly bands: the monthly P50/P95/P99 bands are display detail from the shared stochasticBands helper, not part of the reproducible core retry formula. multiplierForQuantile = 1 + (attemptQuantile - 1) x w; stochasticBands(retryFreeMonthly, p50Multiplier, p95Multiplier, p99Multiplier) returns {p50: base x p50Multiplier, p90: base x p95Multiplier, p99: base x p99Multiplier}, and the page labels that p90 value as P95.

Engine cross-check: the module also returns engineSingleStepRetryFactor, a cross-check value from the shared cost engine's reliabilityRetryFactor helper; it is display detail outside the reproducible core retry formula.

Kill-after table: this is reproducible display data. Build 10 rows for attempts 1..10; each row uses successProbability(p, attempts), expectedAttempts(p, attempts), multiplier = 1 + (expectedAttempts - 1) x w, and monthlyCost = max(0, requestsPerMonth) x perAttemptCost x multiplier. For rows 1..9, marginalSuccessGain = p^attempts x (1 - p) and marginalCost = p^attempts x perAttemptCost x w; row 10 has null marginal fields. If valuePerSuccess is a finite number, marginalValue = marginalSuccessGain x max(0, valuePerSuccess); otherwise marginalValue is null. killAfter is the first row where marginalCost and marginalValue are both non-null and marginalCost > marginalValue.
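The attempt-quantile rule and the monthly bands can be sketched as follows. The `stochasticBands` function here is a stand-in written to match the return shape described above, not the shared helper itself, and `EPSILON` is defined locally with the same value as `Number.EPSILON`. The $2,952 base is the retry-free bill from the worked example.

```typescript
// Same value as Number.EPSILON (2^-52), defined locally for portability.
const EPSILON = Math.pow(2, -52);

// Attempt count at percentile q under the truncated geometric convention.
function attemptQuantile(p: number, maxAttempts: number, q: number): number {
  if (p <= 0) return 1;
  if (p >= 1) return maxAttempts;
  const tail = Math.min(1, Math.max(EPSILON, 1 - q));
  const attempts = Math.floor(Math.log(tail) / Math.log(p));
  return Math.min(maxAttempts, Math.max(1, attempts));
}

// multiplierForQuantile = 1 + (attemptQuantile - 1) x w
function multiplierForQuantile(p: number, maxAttempts: number, q: number, w: number): number {
  return 1 + (attemptQuantile(p, maxAttempts, q) - 1) * w;
}

// Stand-in for the shared helper; note the p90 key carries the P95 value.
function stochasticBands(base: number, m50: number, m95: number, m99: number) {
  return { p50: base * m50, p90: base * m95, p99: base * m99 };
}

// Worked example: p = 0.25, N = 5, w = 1.0, retry-free bill of $2,952.
const bands = stochasticBands(
  2952,
  multiplierForQuantile(0.25, 5, 0.5, 1.0),
  multiplierForQuantile(0.25, 5, 0.95, 1.0),
  multiplierForQuantile(0.25, 5, 0.99, 1.0)
);
console.log(bands); // { p50: 2952, p90: 5904, p99: 8856 }
```

At q = 0.99 the rule gives floor(log(0.01) / log(0.25)) = floor(3.32) = 3 attempts, so the P99 multiplier is 3 and the band is $8,856, matching the worked example.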

Interpretation guide

Compare alternatives with the same workload assumptions.
Stress-test output-heavy, retry-heavy, cache-miss, and power-user cases before committing budget.
Verify source links and production logs before using the estimate for billing decisions.

Limitations before production billing decisions

Treat ByteCosts calculations as planning estimates, not final billing totals. Real invoices can differ because token mix, retry rate, cache hit rate, rate limits, taxes, gateway fees, regional pricing, and negotiated discounts change the effective cost.

Verify the provider source before production billing decisions, then compare the estimate with your own logs or invoice once production traffic is live.

Frequently asked questions

What counts as retry loop tax?

Any billable extra attempt caused by failures, invalid output, timeout retries, repair prompts, or fallback models counts as retry loop tax.

Can retries be worth the extra cost?

Yes, when they recover valuable work. The point of the calculator is to make the cost visible so reliability and product teams can decide where retries are justified.
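The kill-after table from the methodology section is one way to make that trade-off concrete. The sketch below builds the 10 rows and finds the first row where marginal cost exceeds marginal value. The names `KillAfterRow`, `buildKillAfterTable`, and `findKillAfter` are illustrative, and the value-per-success figures of $0.02 and $0.10 are hypothetical.

```typescript
interface KillAfterRow {
  attempts: number;
  successProbability: number;
  expectedAttempts: number;
  multiplier: number;
  monthlyCost: number;
  marginalSuccessGain: number | null;
  marginalCost: number | null;
  marginalValue: number | null;
}

function expectedAttempts(p: number, n: number): number {
  if (p <= 0) return 1;
  if (p >= 1) return n;
  return (1 - Math.pow(p, n)) / (1 - p);
}

function buildKillAfterTable(
  requestsPerMonth: number,
  perAttemptCost: number,
  p: number,
  w: number,
  valuePerSuccess: number | null
): KillAfterRow[] {
  const rows: KillAfterRow[] = [];
  for (let attempts = 1; attempts <= 10; attempts++) {
    const expected = expectedAttempts(p, attempts);
    const multiplier = 1 + (expected - 1) * w;
    // Rows 1..9 carry marginal fields; row 10 leaves them null.
    const hasMarginal = attempts < 10;
    const gain = hasMarginal ? Math.pow(p, attempts) * (1 - p) : null;
    const cost = hasMarginal ? Math.pow(p, attempts) * perAttemptCost * w : null;
    const value =
      gain !== null && valuePerSuccess !== null && isFinite(valuePerSuccess)
        ? gain * Math.max(0, valuePerSuccess)
        : null;
    rows.push({
      attempts,
      successProbability: 1 - Math.pow(p, attempts),
      expectedAttempts: expected,
      multiplier,
      monthlyCost: Math.max(0, requestsPerMonth) * perAttemptCost * multiplier,
      marginalSuccessGain: gain,
      marginalCost: cost,
      marginalValue: value,
    });
  }
  return rows;
}

// First row where both marginal fields exist and cost exceeds value.
function findKillAfter(rows: KillAfterRow[]): KillAfterRow | null {
  for (const row of rows) {
    if (row.marginalCost !== null && row.marginalValue !== null && row.marginalCost > row.marginalValue) {
      return row;
    }
  }
  return null;
}

// Worked-example workload ($0.0246 per attempt) with hypothetical values per success.
const lowValue = findKillAfter(buildKillAfterTable(120000, 0.0246, 0.25, 1.0, 0.02));
const highValue = findKillAfter(buildKillAfterTable(120000, 0.0246, 0.25, 1.0, 0.1));
console.log(lowValue ? lowValue.attempts : null);   // 1
console.log(highValue ? highValue.attempts : null); // null
```

With a constant failure probability, both marginal fields scale by the same p^attempts factor, so the comparison comes down to whether perAttemptCost x w exceeds (1 - p) x valuePerSuccess. In this example, retries stop paying for themselves when a success is worth less than about $0.033.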

Does the calculator assume every retry uses the same model?

No. The final calculator can model same-model retries and fallback paths separately because fallback calls often have a different token shape or price.

Retry Loop Tax Calculator. ByteCosts. https://bytecosts.com/tools/retry-loop-tax/
