LLM Token Counter for GPT, Claude & Gemini Prompts
Paste a prompt to estimate token count, API cost, and context-window usage across GPT-4 Turbo, GPT-4o, Claude Opus 4.7, Claude Sonnet, Claude Haiku, and Gemini 1.5 Pro / Flash. Free, with a heuristic tokenizer tuned to match real BPE within ±10%.
Tokens (est.): 52 (~4 chars per token)
Characters: 159
Words: 27
UTF-8 bytes: 159
| Model | Input cost | Output cost | Total | Context use |
|---|---|---|---|---|
| GPT-4 Turbo (OpenAI · 128k context) | $0.520/k | $0.0090 | $0.0095 | 0.04% |
| GPT-4o (OpenAI · 128k context) | $0.130/k | $0.0030 | $0.0031 | 0.04% |
| GPT-4o mini (OpenAI · 128k context) | $0.008/k | $0.180/k | $0.188/k | 0.04% |
| Claude Opus 4.7 (Anthropic · 1,000k context) | $0.780/k | $0.0225 | $0.0233 | 0.01% |
| Claude Sonnet 4.6 (Anthropic · 200k context) | $0.156/k | $0.0045 | $0.0047 | 0.03% |
| Claude Haiku 4.5 (Anthropic · 200k context) | $0.013/k | $0.375/k | $0.388/k | 0.03% |
| Gemini 1.5 Pro (Google · 2,000k context) | $0.065/k | $0.0015 | $0.0016 | 0.00% |
| Gemini 1.5 Flash (Google · 1,000k context) | $0.004/k | $0.090/k | $0.094/k | 0.01% |
Token counts use a heuristic estimator that matches real BPE tokenizers (tiktoken, Claude's, Gemini's) within ±10% for typical prose. For exact billing, use the provider's official tokenizer endpoint or library.
Instant Token Estimate
Type or paste prompt text and the token count updates as you type. No submit, no API call, no waiting.
Cost Estimate Across Models
Per-model input + output cost in USD based on 2026 list pricing. Compare what GPT-4o vs Claude Sonnet would charge for the same prompt at a glance.
Context Window Bar
Visual indicator showing how much of each model's context window your prompt occupies: 128K for GPT-4 Turbo, 200K for Claude Sonnet, up to 2M for Gemini 1.5 Pro.
100% Client-Side
Your prompts never leave the browser. No fetch, no API key required. Works offline once the page is loaded.
LLM Token Counter: Estimate Prompt Tokens & API Cost
A token counter estimates how many tokens a prompt will consume before you send it to an LLM API. Paste text and this tool returns an estimated token count, the input and output cost in USD, and the share of each model's context window your prompt fills, across GPT-4 Turbo, GPT-4o, GPT-4o mini, Claude Opus 4.7, Sonnet 4.6, Haiku 4.5, and Gemini 1.5 Pro / Flash. It runs 100% in your browser, free, with no upload.
How to count tokens in a prompt
- Paste or type your prompt into the textarea — the token, character, word, and byte counts update as you type, with no submit and no API call.
- Read the per-model cost table to compare what GPT-4o, Claude Sonnet, or Gemini Flash would charge for the same input.
- Drag the expected-response-length slider to project end-to-end cost, since output tokens are billed at a higher rate than input.
- Check the context-window utilization bar to see whether your prompt fits — 128K for GPT-4 Turbo, 200K for Claude Sonnet, up to 2M for Gemini 1.5 Pro.
- For exact billing, confirm the figure against the provider's official tokenizer; for budgeting and model selection, the estimate here is enough.
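The counts and the context-window share described in the steps above can be sketched in a few lines. This is an illustrative Python version, not the tool's own JavaScript; the function names `text_stats` and `context_use_percent` are hypothetical, and the window sizes are the ones quoted on this page.

```python
def text_stats(text: str) -> dict:
    """Return the character, word, and UTF-8 byte counts for a prompt."""
    return {
        "characters": len(text),                 # Unicode code points
        "words": len(text.split()),              # whitespace-separated words
        "utf8_bytes": len(text.encode("utf-8")), # bytes on the wire
    }


def context_use_percent(tokens: int, context_window: int) -> float:
    """Share of a model's context window that the prompt occupies, in percent."""
    return 100.0 * tokens / context_window


# Context windows as quoted on this page (in tokens).
WINDOWS = {"GPT-4 Turbo": 128_000, "Claude Sonnet": 200_000, "Gemini 1.5 Pro": 2_000_000}

for model, window in WINDOWS.items():
    print(f"{model}: {context_use_percent(52, window):.2f}% of context")
```

For plain ASCII text the character and byte counts are equal; they diverge as soon as the prompt contains accented or CJK characters, which take more than one byte each in UTF-8.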
What is a token, and why do counts differ between models?
A token is a subword unit, not a character and not a word. Most models split text with Byte-Pair Encoding (BPE): during training, the algorithm repeatedly merges the most frequent adjacent pair of symbols until it reaches a fixed vocabulary size, and at inference it applies those learned merges to new text. For English prose, the rule of thumb is roughly 4 characters or 0.75 words per token: a common word like "the" is a single token, while a rarer word like "tokenization" splits into several. Billing, context-window limits, rate quotas, and latency all scale with tokens, not characters, which is why every API expresses its limits in tokens.
The same text gets different counts on each model because each vendor ships a different tokenizer. GPT-4 and GPT-3.5 use cl100k_base (~100K-token vocabulary); GPT-4o moved to o200k_base (~200K), which adds code-friendly merges and can report 10–20% fewer tokens on a source file. Anthropic's Claude uses its own proprietary tokenizer, and Google's Gemini uses a SentencePiece tokenizer with a ~256K vocabulary. You can confirm OpenAI's exact behavior in the open-source tiktoken BPE tokenizer repository.
This tool is a heuristic estimator, not a vocabulary-driven tokenizer. It scans the text in segments — words (≤4 chars = 1 token, 5–8 chars = 2 tokens, longer = 1 token per 4 chars), digit runs (1 token per ~3 digits), newlines (1 token each), and punctuation (1–2 tokens) — tuned to land within ±10% of the real tokenizers for typical prose. Use it for cost and context planning, not exact billing.
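To make the segment rules concrete, here is a minimal Python sketch that applies them literally. It is an illustration under stated assumptions, not the tool's actual code: the regular expression, the function name `estimate_tokens`, and the choice to charge exactly 1 token per punctuation mark (the rule above allows 1–2) are all assumptions, and the tool's real tuning may produce different numbers.

```python
import math
import re

# One alternative per segment type. Whitespace other than newlines is skipped,
# i.e. assumed to cost nothing on its own (an assumption of this sketch).
_SEGMENT = re.compile(
    r"(?P<word>[^\W\d_]+)"      # runs of letters
    r"|(?P<digits>\d+)"         # runs of digits
    r"|(?P<newline>\n)"         # each newline
    r"|(?P<punct>[^\w\s]|_)"    # any single punctuation or symbol character
)


def estimate_tokens(text: str) -> int:
    """Heuristic token estimate following the segment rules described above."""
    total = 0
    for match in _SEGMENT.finditer(text):
        kind, segment = match.lastgroup, match.group()
        if kind == "word":
            if len(segment) <= 4:
                total += 1                             # short word: 1 token
            elif len(segment) <= 8:
                total += 2                             # medium word: 2 tokens
            else:
                total += math.ceil(len(segment) / 4)   # long word: 1 per 4 chars
        elif kind == "digits":
            total += math.ceil(len(segment) / 3)       # 1 token per ~3 digits
        else:
            total += 1                                 # newline or punctuation
    return total


print(estimate_tokens("1234567890"))  # 4: ten digits in runs of up to three
```

Because the rules only look at segment length, a sketch like this cannot distinguish a common eight-letter word from a rare one; a vocabulary-driven tokenizer can, which is one source of the error margin.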
Worked examples: text → estimated tokens
Short English sentence
"The quick brown fox jumps over the lazy dog." → ~10 tokens (44 chars, 9 words)
A long number
"1234567890" → ~4 tokens (10 digits, split into ~3-digit runs — not 1 token)
Source code
"const x = arr.map((i) => i * 2);" → denser than prose; symbols and operators each cost tokens
Edge case · chat-message framing
This tool counts the text body only. A real chat-completion call adds ~3 framing tokens per message (role markers and separators) that are billed but not shown here. For batch-cost planning, multiply the estimate by 1.05–1.10 to cover that overhead.
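A rough way to fold that framing overhead into a conversation-level estimate is sketched below. The function name `with_framing` is hypothetical, and the default of 3 tokens per message is the approximate figure quoted above, not an exact value for any particular API.

```python
def with_framing(body_tokens_per_message: list[int], framing_per_message: int = 3) -> int:
    """Add an assumed fixed framing cost to each message's body-token estimate."""
    body_total = sum(body_tokens_per_message)
    framing_total = framing_per_message * len(body_tokens_per_message)
    return body_total + framing_total


# A system prompt, one user message, and one prior assistant turn (body estimates).
print(with_framing([800, 52, 120]))  # 981: 972 body tokens + 3 messages x 3
```

The relative overhead is largest for conversations made of many short messages and negligible for a single long document.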
Token density by content type
These ratios drive the estimate and explain why the same character count costs different token amounts. Code and structured data pack more tokens per character than prose; CJK scripts cost the most.
| Content Type | Typical Density | Notes |
|---|---|---|
| English prose | ~4 chars/token | Most accurate region; estimate within ±5-10% of real tokenizer. |
| Code (Python, JS) | ~3-3.5 chars/token | Symbols and indentation push token density higher. Use the code-tuned estimate when needed. |
| CJK languages | ~1-1.5 chars/token | Chinese/Japanese/Korean characters often each become individual tokens. |
| JSON / structured | ~3.5 chars/token | Brackets, quotes, and field separators add tokens. |
| Numbers | ~3 digits/token | Modern tokenizers split long numbers; "2026" is typically 1-2 tokens. |
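The density ratios in the table translate directly into a coarse length-based estimate. The sketch below is an illustration only: the name `estimate_by_density` is hypothetical, and where the table gives a range the code assumes its midpoint.

```python
import math

# Characters per token, taken from the table above.
# Midpoints are an assumption of this sketch where the table gives a range.
CHARS_PER_TOKEN = {
    "prose": 4.0,
    "code": 3.25,  # midpoint of ~3-3.5
    "cjk": 1.25,   # midpoint of ~1-1.5
    "json": 3.5,
}


def estimate_by_density(text: str, content_type: str) -> int:
    """Estimate tokens from character count and a per-content-type density."""
    return math.ceil(len(text) / CHARS_PER_TOKEN[content_type])


sample = "x" * 1000
for content_type in CHARS_PER_TOKEN:
    print(content_type, estimate_by_density(sample, content_type))
```

For the same 1,000 characters, the CJK ratio yields more than three times the tokens of the prose ratio, which is the gap the table is meant to highlight.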
Token budget by prompt component
| Component | Typical Range | Notes |
|---|---|---|
| System prompt | 500-2000 tokens | Fixed overhead per request; counts toward input cost. |
| User message | 50-5000 tokens | Varies wildly. Long context tasks (whole-document summarization) easily exceed 10k. |
| Assistant response | 100-2000 tokens | Output tokens are 3-5× more expensive than input. |
| Few-shot examples | 500-5000 tokens | Three good examples often outperform ten mediocre ones. |
| Retrieved chunks (RAG) | 500-50000 tokens | The biggest variable in most production systems. |
Total prompt cost = input tokens × input rate + output tokens × output rate. Output is typically 3–5× more expensive than input, so capping the response is the highest-leverage cost optimization.
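As a worked instance of that formula, the sketch below prices a single request using rates expressed in USD per million tokens. The function name `request_cost_usd` is hypothetical; the GPT-4o rates are the ones quoted on this page, and the 300-token response length is an assumption chosen for illustration.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_m: float,
    output_rate_per_m: float,
) -> float:
    """Cost of one request: input and output tokens priced at separate rates."""
    return (input_tokens * input_rate_per_m + output_tokens * output_rate_per_m) / 1_000_000


# GPT-4o at $2.50 / $10.00 per 1M tokens, 52 input tokens, 300 output tokens (assumed).
cost = request_cost_usd(52, 300, 2.50, 10.00)
print(f"${cost:.4f}")  # $0.0031

# Halving the response length removes almost half of the total cost here.
print(f"${request_cost_usd(52, 150, 2.50, 10.00):.4f}")
```

In this example the output side accounts for $0.0030 of the roughly $0.0031 total, so the prompt itself is almost a rounding error next to the response.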
The pricing trap most cost calculators miss
Output tokens are not priced like input tokens — they cost 3–5× more. As of Q1 2026, GPT-4o is $2.50/1M input but $10.00/1M output (4×), and Claude Opus 4.7 is $15.00/1M input versus $75.00/1M output (5×). A calculator that prices a whole conversation at the input rate can understate the bill by several times. That is why this tool keeps input and output separate and exposes a response-length slider — the response is usually where the money goes.
The second trap is language. English averages ~4 chars/token, but Chinese, Japanese, and Korean often run ~1–1.5 chars/token because each CJK character tends to become its own token. The same content can cost 2–3× more tokens — and dollars — in CJK than in English. Budget for the language you actually ship in, not the one you prototyped in.
How to lower token cost
Compress system prompts
Every request re-reads the full system prompt. Trimming 500 tokens compounds across millions of calls. Prefer bullet lists over prose.
Cap response length
Output costs 3–5× input. Set max_tokens and instruct the model: "Answer in 1–2 sentences."
Cache static context
Anthropic and OpenAI both offer prompt caching at a 50–90% discount for repeated prefixes and few-shot examples.
Right-size the model
GPT-4o mini, Claude Haiku 4.5, and Gemini Flash are 10–60× cheaper and often enough for extraction or classification.
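Two of these levers are easy to quantify. The sketch below is a simplified model with hypothetical function names: it treats caching as a flat discount on the repeated prefix and ignores any cache-write surcharge, expiry, or minimum prefix length a provider may apply, so check the provider's pricing page before relying on the numbers.

```python
def trim_savings_usd(tokens_trimmed: int, calls: int, input_rate_per_m: float) -> float:
    """Total saved by removing tokens from a prompt that is sent on every call."""
    return tokens_trimmed * calls * input_rate_per_m / 1_000_000


def cached_input_cost_usd(
    prefix_tokens: int,
    fresh_tokens: int,
    input_rate_per_m: float,
    cache_discount: float,
) -> float:
    """Input cost of one call when the repeated prefix is billed at a discount."""
    cached_rate = input_rate_per_m * (1.0 - cache_discount)
    return (prefix_tokens * cached_rate + fresh_tokens * input_rate_per_m) / 1_000_000


# Trimming 500 tokens from a system prompt, over 1M calls at $2.50 per 1M input tokens.
print(trim_savings_usd(500, 1_000_000, 2.50))  # 1250.0

# A 2,000-token cached prefix plus 100 fresh tokens, assuming a 50% cache discount.
print(cached_input_cost_usd(2_000, 100, 2.50, 0.5))
```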
Runs 100% in your browser
Your prompt never leaves your device. The entire calculation (token estimate, cost, byte count, and context-window bar) runs in JavaScript inside your tab, with no fetch, no XHR, and no API key. You can verify this by opening DevTools → Network and typing into the box: no requests fire. I tested the estimator against short sentences, long numbers, symbol-heavy code, and multi-thousand-token documents; it tracks tiktoken closely on prose and drifts further on dense code, which is exactly where a heuristic should be honest about its margin.
Frequently asked questions
Is this LLM token counter free?
Yes — 100% free with no signup, no API key, and no usage cap. Every estimate runs in your browser, so there is nothing to pay for and no quota to hit.
How accurate is the estimate?
For typical English prose it lands within ±5–10% of the real tokenizer. Accuracy is best for natural-language text, looser for symbol-heavy code (±15%), and least reliable for non-Latin scripts and very short prompts. Use the provider's official tokenizer for exact billing.
Why does the same text show different counts in different tools?
Each model family ships a different tokenizer — GPT-4 uses cl100k_base, GPT-4o uses o200k_base, Claude and Gemini use their own vocabularies. Some tools also include chat-message framing tokens; others count only the body.
Are input and output tokens priced the same?
No. Output almost always costs more — GPT-4o is 4× and Claude Opus 4.7 is 5× the input rate. The response length matters as much as the prompt, so this tool separates the two and adds a response-length slider.
Is my prompt sent anywhere?
No. The whole calculation runs in JavaScript in your browser tab — no fetch, no API call. Confirm it in DevTools → Network: typing fires zero requests. The same guarantee holds for confidential business prompts.
Last updated: June 2, 2026 · Runs 100% in your browser — no uploads, nothing leaves your device.