Free LLM Token Counter for GPT, Claude & Gemini Prompts
Paste any prompt and instantly estimate token count, API cost, and context-window utilization across GPT-4 Turbo, GPT-4o, GPT-4o mini, Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4.5, and Gemini 1.5 Pro / Flash. The heuristic tokenizer is tuned to match real BPE output within ±10%.
| Metric | Value |
|---|---|
| Tokens (est.) | 52 (~4 chars per token heuristic) |
| Characters | 159 |
| Words | 27 |
| UTF-8 bytes | 159 |
| Model | Input cost | Output cost | Total | Context use |
|---|---|---|---|---|
| GPT-4 Turbo (OpenAI · 128k context) | $0.000520 | $0.009000 | $0.009520 | 0.04% |
| GPT-4o (OpenAI · 128k context) | $0.000130 | $0.003000 | $0.003130 | 0.04% |
| GPT-4o mini (OpenAI · 128k context) | $0.000008 | $0.000180 | $0.000188 | 0.04% |
| Claude Opus 4.7 (Anthropic · 1,000k context) | $0.000780 | $0.022500 | $0.023280 | 0.01% |
| Claude Sonnet 4.6 (Anthropic · 200k context) | $0.000156 | $0.004500 | $0.004656 | 0.03% |
| Claude Haiku 4.5 (Anthropic · 200k context) | $0.000013 | $0.000375 | $0.000388 | 0.03% |
| Gemini 1.5 Pro (Google · 2,000k context) | $0.000065 | $0.001500 | $0.001565 | 0.00% |
| Gemini 1.5 Flash (Google · 1,000k context) | $0.000004 | $0.000090 | $0.000094 | 0.01% |

All costs are per request in USD for the 52-token sample prompt above, assuming a ~300-token response.
Token counts use a heuristic estimator that matches real BPE tokenizers (OpenAI's tiktoken, Anthropic's, and Google's) within ±10% for typical prose. For exact billing, use the provider's official tokenizer endpoint or library.
Instant Token Estimate
Type or paste prompt text and the token count updates as you type. No submit, no API call, no waiting.
Cost Estimate Across Models
Per-model input + output cost in USD based on 2026 list pricing. Compare what GPT-4o vs Claude Sonnet would charge for the same prompt at a glance.
Context Window Bar
Visual indicator showing how much of each model's context window your prompt occupies: 128K for GPT-4 Turbo, 200K for Claude Sonnet and Haiku, 1M for Claude Opus and Gemini Flash, and 2M for Gemini 1.5 Pro.
100% Client-Side
Your prompts never leave the browser. No fetch, no analytics event, no API key required. Works offline once the page is loaded.
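For the curious, a fully client-side counter of this sort needs nothing more than an input listener. Below is a minimal sketch, assuming hypothetical element IDs and the ~4 chars/token heuristic; it is an illustration, not this page's actual source.

```typescript
// Hypothetical element IDs; the real page's markup may differ.
const input = document.querySelector<HTMLTextAreaElement>("#prompt")!;
const out = document.querySelector<HTMLSpanElement>("#token-count")!;

// Recompute on every keystroke: no submit, no API call, no network round trip.
input.addEventListener("input", () => {
  const tokens = Math.ceil(input.value.length / 4); // ~4 chars per token
  out.textContent = String(tokens);
});
```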
Token Counts Drive Every Decision in LLM Pricing & Context Planning
Every API call to an LLM is metered in tokens — the subword units a Byte-Pair Encoding tokenizer produces from your prompt. Billing, context-window limits, latency, and rate quotas are all expressed in tokens. Yet most prompt engineering happens by feel: paste, run, see what it costs. Our Free LLM Token Counter brings that math forward, so you can see an estimate before you hit submit. It covers the eight frontier models you are most likely to be choosing between in 2026 — OpenAI's GPT-4 Turbo / 4o / 4o mini, Anthropic's Claude Opus 4.7 / Sonnet 4.6 / Haiku 4.5, and Google's Gemini 1.5 Pro / Flash — with per-model cost and context-window utilization.
Pair this with our Word Counter (text statistics for human readability), JSON Formatter (inspect API request and response bodies), Regex Tester (build pre-processing patterns for prompt cleanup), and the JWT Decoder (debug Bearer-token issues when calling LLM APIs).
Token Density by Content Type
| Content Type | Typical Density | Notes |
|---|---|---|
| English prose | ~4 chars/token | Most accurate region; estimates land within ±5-10% of real tokenizers. |
| Code (Python, JS) | ~3-3.5 chars/token | Symbols and indentation mean more tokens per character; use a code-tuned estimate for code-heavy prompts. |
| CJK languages | ~1-1.5 chars/token | Chinese/Japanese/Korean characters often each become individual tokens. |
| JSON / structured | ~3.5 chars/token | Brackets, quotes, and field separators add tokens. |
| Numbers | ~3 digits/token | Modern tokenizers split long numbers; "2026" is typically 1-2 tokens. |
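Those densities translate directly into a simple estimator. Here is a minimal TypeScript sketch of that kind of heuristic, using the densities from the table above; the names and fixed density values are illustrative, not this tool's actual source.

```typescript
// Chars-per-token densities, taken from the table above.
type ContentType = "prose" | "code" | "cjk" | "json";

const CHARS_PER_TOKEN: Record<ContentType, number> = {
  prose: 4.0,  // English prose
  code: 3.25,  // midpoint of the 3-3.5 range for Python/JS
  cjk: 1.25,   // CJK characters often become one token each
  json: 3.5,   // brackets, quotes, and separators add tokens
};

/** Heuristic estimate: character count divided by the content-type density. */
function estimateTokens(text: string, type: ContentType = "prose"): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN[type]);
}

// A 159-character English prompt estimates to ~40 tokens at prose density.
console.log(estimateTokens("a".repeat(159), "prose")); // 40
```

Expect the result to land within roughly ±10% of a real BPE count for typical prose, and to drift further for mixed-language or symbol-heavy text.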
Token Budget by Prompt Component
| Component | Typical Range | Notes |
|---|---|---|
| System prompt | 500-2000 tokens | Fixed overhead per request; counts toward input cost. |
| User message | 50-5000 tokens | Varies wildly. Long context tasks (whole-document summarization) easily exceed 10k. |
| Assistant response | 100-2000 tokens | Output tokens are typically 3-5× more expensive than input. |
| Few-shot examples | 500-5000 tokens | Three good examples often outperform ten mediocre ones. |
| Retrieved chunks (RAG) | 500-50000 tokens | The biggest variable in most production systems. |
Total prompt cost = (input tokens × input rate) + (output tokens × output rate). The output rate is typically 3-5× the input rate, so brevity in the response is the highest-leverage cost optimization.
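As a worked example, here is that formula in TypeScript using the GPT-4o list rates implied by the table above ($2.50 per million input tokens, $10 per million output tokens); the rates and names are the only assumptions.

```typescript
// GPT-4o list rates in USD per token ($2.50/M input, $10/M output).
const INPUT_RATE = 2.5 / 1_000_000;
const OUTPUT_RATE = 10 / 1_000_000;

// Total prompt cost = (input tokens × input rate) + (output tokens × output rate).
function promptCost(inputTokens: number, outputTokens: number): number {
  return inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE;
}

// The 52-token sample prompt with a 300-token response:
console.log(promptCost(52, 300).toFixed(6)); // 0.003130 USD, as in the table
```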
Token-Efficient Prompt Engineering
1. Compress System Prompts
Every request re-sends the full system prompt, so even a 500-token reduction compounds across millions of calls: at GPT-4o's $2.50-per-million input rate, trimming 500 tokens saves roughly $1,250 per million requests. Strip filler words, use abbreviations the model already knows, and prefer bullet lists over prose.
2. Cap Response Length
Output is 3-5× more expensive than input. Use the max_tokens parameter to prevent runaway responses, and instruct the model explicitly ("Answer in 1-2 sentences"), as in the sketch below.
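A minimal sketch with the OpenAI Node SDK; the model name and the 150-token cap are placeholders, and newer models may expect max_completion_tokens instead.

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Cap output hard with max_tokens and soft with an explicit instruction,
// so the model aims for brevity instead of getting truncated mid-sentence.
const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  max_tokens: 150,
  messages: [
    { role: "system", content: "Answer in 1-2 sentences." },
    { role: "user", content: "Why do output tokens cost more than input tokens?" },
  ],
});

console.log(completion.choices[0].message.content);
```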
3. Cache Static Context
Anthropic and OpenAI both support prompt caching at a 50-90% discount for repeated system prompts and few-shot examples. Use it when the same prefix repeats across many requests; see the sketch below.
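With Anthropic's SDK, the cacheable prefix is marked per content block via cache_control. A minimal sketch, assuming a placeholder model id and a hypothetical system-prompt constant; the exact discount depends on the provider's current terms.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Hypothetical large, static system prompt reused across many requests.
const STATIC_SYSTEM_PROMPT = "You are a support-ticket triage assistant. ...";

const msg = await anthropic.messages.create({
  model: "claude-haiku-4-5", // placeholder model id
  max_tokens: 300,
  system: [
    {
      type: "text",
      text: STATIC_SYSTEM_PROMPT,
      // Mark the prefix cacheable; identical prefixes on later requests
      // are billed at the provider's discounted cache-read rate.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Classify this ticket: refund not received." }],
});

console.log(msg.content);
```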
4. Right-Size the Model
GPT-4o mini, Claude Haiku 4.5, and Gemini Flash are 10-60× cheaper than the frontier-tier models and often sufficient for classification, extraction, and simple Q&A. Measure quality on your own task before defaulting to Opus or GPT-4 Turbo; a minimal comparison harness is sketched below.
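A hypothetical harness along these lines is usually enough to tell whether the cheap tier holds up; the labels, prompts, and exact-match scoring are all assumptions for illustration.

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Tiny labeled sample; in practice, draw 50-200 cases from real traffic.
const cases = [
  { input: "Refund not received after 10 days", label: "billing" },
  { input: "App crashes on launch", label: "bug" },
];

// Exact-match accuracy for a one-word classification prompt.
async function accuracy(model: string): Promise<number> {
  let correct = 0;
  for (const c of cases) {
    const res = await openai.chat.completions.create({
      model,
      max_tokens: 5,
      messages: [
        { role: "system", content: "Reply with one word: billing, bug, or other." },
        { role: "user", content: c.input },
      ],
    });
    const answer = res.choices[0].message.content?.trim().toLowerCase();
    if (answer === c.label) correct++;
  }
  return correct / cases.length;
}

// If the cheap tier matches the frontier tier here, default to the cheap tier.
console.log("gpt-4o-mini:", await accuracy("gpt-4o-mini"));
console.log("gpt-4-turbo:", await accuracy("gpt-4-turbo"));
```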