
Free LLM Token Counter for GPT, Claude & Gemini Prompts

Paste any prompt and instantly estimate token count, API cost, and context-window utilization across GPT-4 Turbo, GPT-4o, Claude Opus 4.7, Claude Sonnet, Claude Haiku, and Gemini 1.5 Pro / Flash. Heuristic tokenizer tuned to match real BPE within ±10%.

Tokens (est.): 52 (~4 chars per token)
Characters: 159
Words: 27
UTF-8 bytes: 159

Expected response length: 300 tokens out
Cost & context-window utilization per model

Model | Input cost | Output cost | Total | Context use
GPT-4 Turbo (OpenAI · 128k context) | $0.520/k | $0.0090 | $0.0095 | 0.04%
GPT-4o (OpenAI · 128k context) | $0.130/k | $0.0030 | $0.0031 | 0.04%
GPT-4o mini (OpenAI · 128k context) | $0.008/k | $0.180/k | $0.188/k | 0.04%
Claude Opus 4.7 (Anthropic · 1,000k context) | $0.780/k | $0.0225 | $0.0233 | 0.01%
Claude Sonnet 4.6 (Anthropic · 200k context) | $0.156/k | $0.0045 | $0.0047 | 0.03%
Claude Haiku 4.5 (Anthropic · 200k context) | $0.013/k | $0.375/k | $0.388/k | 0.03%
Gemini 1.5 Pro (Google · 2,000k context) | $0.065/k | $0.0015 | $0.0016 | 0.00%
Gemini 1.5 Flash (Google · 1,000k context) | $0.004/k | $0.090/k | $0.094/k | 0.01%
Costs are estimates based on Q1 2026 published pay-as-you-go list prices. Volume discounts, prompt caching, and batch APIs can reduce real cost significantly.

Token counts use a heuristic estimator that matches real BPE tokenizers (tiktoken, Claude's, Gemini's) within ±10% for typical prose. For exact billing, use the provider's official tokenizer endpoint or library.
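The ~4-chars-per-token rule of thumb is easy to reproduce. A minimal sketch of such a heuristic (the page's actual estimator is not published, so this is an illustration of the idea, not its code):

```javascript
// Rough token estimate for English prose: ~4 characters per token.
// A sketch of the heuristic idea, not the tool's exact estimator.
function estimateTokens(text) {
  if (!text) return 0;
  return Math.max(1, Math.round(text.length / 4));
}
```

Expect it to drift on code, CJK text, and long numbers, where real BPE density differs from prose.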

Instant Token Estimate

Type or paste prompt text and the token count updates as you type. No submit, no API call, no waiting.

Cost Estimate Across Models

Per-model input + output cost in USD based on 2026 list pricing. Compare what GPT-4o vs Claude Sonnet would charge for the same prompt at a glance.

Context Window Bar

Visual indicator showing how much of each model's context window your prompt occupies — 200K for Claude, 128K for GPT-4 Turbo, 1M for Gemini Pro.
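The bar itself is a one-line ratio. A sketch, reproducing the percentages shown in the table above from the sample 52-token prompt:

```javascript
// Percent of a model's context window a prompt occupies, as a display string.
function contextUtilization(promptTokens, contextWindow) {
  return ((100 * promptTokens) / contextWindow).toFixed(2) + "%";
}
```

For example, 52 tokens against GPT-4 Turbo's 128k window comes out to 0.04%.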

100% Client-Side

Your prompts never leave the browser. No fetch, no analytics event, no API key required. Works offline once the page is loaded.

Token Counts Drive Every Decision in LLM Pricing & Context Planning

Every API call to an LLM is metered in tokens — the subword units a Byte-Pair Encoding tokenizer produces from your prompt. Billing, context-window limits, latency, and rate quotas are all expressed in tokens. Yet most prompt engineering happens by feel: paste, run, see what it costs. Our Free LLM Token Counter brings that math forward, so you can see an estimate before you hit submit. It covers the eight frontier models you are most likely to be choosing between in 2026 — OpenAI's GPT-4 Turbo / 4o / 4o mini, Anthropic's Claude Opus 4.7 / Sonnet 4.6 / Haiku 4.5, and Google's Gemini 1.5 Pro / Flash — with per-model cost and context-window utilization.

Pair this with our Word Counter (text statistics for human readability), JSON Formatter (inspect API request and response bodies), Regex Tester (build pre-processing patterns for prompt cleanup), and the JWT Decoder (debug Bearer-token issues when calling LLM APIs).

Token Density by Content Type

Content Type | Typical Density | Notes
English prose | ~4 chars/token | Most accurate region; estimates land within ±5-10% of a real tokenizer.
Code (Python, JS) | ~3-3.5 chars/token | Symbols and indentation mean more tokens per character; use a code-tuned estimate when needed.
CJK languages | ~1-1.5 chars/token | Chinese, Japanese, and Korean characters often each become individual tokens.
JSON / structured | ~3.5 chars/token | Brackets, quotes, and field separators add tokens.
Numbers | ~3 digits/token | Modern tokenizers split long numbers; "2026" is typically 1-2 tokens.
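These densities can parameterize the same chars-per-token heuristic. A sketch, with midpoint densities assumed for the ranged rows:

```javascript
// Approximate chars-per-token densities (midpoints of the ranges above).
const CHARS_PER_TOKEN = {
  prose: 4,
  code: 3.25, // midpoint of 3-3.5
  cjk: 1.25,  // midpoint of 1-1.5
  json: 3.5,
};

// Density-aware token estimate; defaults to the prose density.
function estimateByType(text, type = "prose") {
  const density = CHARS_PER_TOKEN[type] ?? 4;
  return Math.ceil(text.length / density);
}
```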

Token Budget by Prompt Component

Component | Typical Range | Notes
System prompt | 500-2000 tokens | Fixed overhead per request; counts toward input cost.
User message | 50-5000 tokens | Varies wildly; long-context tasks (whole-document summarization) easily exceed 10k.
Assistant response | 100-2000 tokens | Output tokens are typically 3-5× more expensive than input.
Few-shot examples | 500-5000 tokens | Three good examples often outperform ten mediocre ones.
Retrieved chunks (RAG) | 500-50000 tokens | The biggest variable in most production systems.

Total prompt cost = input tokens × input rate + output tokens × output rate. Output rate is typically 3-5× higher than input rate, so brevity in the response is the highest-leverage cost optimization.
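That formula, with rates expressed in USD per 1,000 tokens (the rate values in the test below are illustrative, not quotes):

```javascript
// total = inputTokens × inputRate + outputTokens × outputRate
// Rates are USD per 1,000 tokens.
function promptCost(inputTokens, outputTokens, inRatePerK, outRatePerK) {
  return (inputTokens / 1000) * inRatePerK + (outputTokens / 1000) * outRatePerK;
}
```

A short prompt with a long response can easily cost more on the output side: at an assumed $0.01/k in and $0.03/k out, 52 input tokens cost about $0.0005 while 300 output tokens cost $0.009.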

Token-Efficient Prompt Engineering

1. Compress System Prompts

Every request reads the full system prompt — even a 500-token reduction across millions of calls compounds. Strip filler words, use abbreviations the model already knows, prefer bullet lists over prose.

2. Cap Response Length

Output is 3-5× more expensive than input. Use the max_tokens parameter to prevent runaway responses; instruct the model explicitly: "Answer in 1-2 sentences."
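In practice the cap is a single request parameter. A sketch of an OpenAI-style chat request body (the model name and prompt text are illustrative):

```javascript
// Request body capping output at 300 tokens and nudging the model to be brief.
const requestBody = {
  model: "gpt-4o-mini",   // illustrative model choice
  max_tokens: 300,        // hard ceiling on billed output tokens
  messages: [
    { role: "system", content: "Answer in 1-2 sentences." },
    { role: "user", content: "What does max_tokens control?" },
  ],
};
```

The explicit instruction and the hard cap work together: the instruction keeps answers short, the cap bounds the worst case.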

3. Cache Static Context

Anthropic and OpenAI both support prompt caching at a 50-90% discount for repeated system prompts and few-shot examples. Use it when the same prefix repeats across many requests.
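With Anthropic's Messages API, caching is opt-in per content block via a cache_control marker. A sketch of marking a static system prompt cacheable (model id and prompt text are illustrative):

```javascript
// Anthropic-style request body with the static system prompt marked for caching.
// The prefix up to and including the cache_control block is cached, so repeat
// requests with the same prefix read it back at the discounted rate.
const cachedRequest = {
  model: "claude-haiku-4-5", // illustrative model id
  max_tokens: 300,
  system: [
    {
      type: "text",
      text: "You are a support assistant. <long static policy text here>",
      cache_control: { type: "ephemeral" }, // cache everything up to this block
    },
  ],
  messages: [{ role: "user", content: "How do I reset my password?" }],
};
```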

4. Right-Size the Model

GPT-4o mini, Claude Haiku 4.5, and Gemini Flash are 10-60× cheaper than the frontier-tier models and often sufficient for classification, extraction, and simple Q&A. Measure quality on your task before defaulting to Opus / GPT-4 Turbo.
