LLM Token Counter for GPT, Claude & Gemini Prompts

Paste a prompt to estimate token count, API cost, and context-window usage across GPT-4 Turbo, GPT-4o, Claude Opus 4.7, Claude Sonnet, Claude Haiku, and Gemini 1.5 Pro / Flash. Free, with a heuristic tokenizer tuned to match real BPE within ±10%.

Tokens (est.): 52 (~4 chars per token)
Characters: 159
Words: 27
UTF-8 bytes: 159
Expected response length: 300 tokens out

Cost & context-window utilization per model
Model | Provider · context | Input cost | Output cost | Total | Context use
GPT-4 Turbo | OpenAI · 128k context | $0.520/k | $0.0090 | $0.0095 | 0.04%
GPT-4o | OpenAI · 128k context | $0.130/k | $0.0030 | $0.0031 | 0.04%
GPT-4o mini | OpenAI · 128k context | $0.008/k | $0.180/k | $0.188/k | 0.04%
Claude Opus 4.7 | Anthropic · 1,000k context | $0.780/k | $0.0225 | $0.0233 | 0.01%
Claude Sonnet 4.6 | Anthropic · 200k context | $0.156/k | $0.0045 | $0.0047 | 0.03%
Claude Haiku 4.5 | Anthropic · 200k context | $0.013/k | $0.375/k | $0.388/k | 0.03%
Gemini 1.5 Pro | Google · 2,000k context | $0.065/k | $0.0015 | $0.0016 | 0.00%
Gemini 1.5 Flash | Google · 1,000k context | $0.004/k | $0.090/k | $0.094/k | 0.01%
Costs are per-call estimates based on list prices published as of Q1 2026. Volume discounts, prompt caching, and batch APIs can reduce the real cost significantly.

Token counts use a heuristic estimator that matches real BPE tokenizers (tiktoken, Claude's, Gemini's) within ±10% for typical prose. For exact billing, use the provider's official tokenizer endpoint or library.

Instant Token Estimate

Type or paste prompt text and the token count updates as you type. No submit, no API call, no waiting.

Cost Estimate Across Models

Per-model input + output cost in USD based on 2026 list pricing. Compare what GPT-4o vs Claude Sonnet would charge for the same prompt at a glance.

Context Window Bar

Visual indicator showing how much of each model's context window your prompt occupies: 128K for GPT-4 Turbo, 200K for Claude Sonnet, 2M for Gemini 1.5 Pro.

100% Client-Side

Your prompts never leave the browser. No fetch, no API key required. Works offline once the page is loaded.

LLM Token Counter: Estimate Prompt Tokens & API Cost

A token counter estimates how many tokens a prompt costs before you send it to an LLM API. Paste text and this tool returns an estimated token count, the input and output cost in USD, and the share of each model's context window your prompt fills — across GPT-4 Turbo, GPT-4o, GPT-4o mini, Claude Opus 4.7, Sonnet 4.6, Haiku 4.5, and Gemini 1.5 Pro / Flash. It runs 100% in your browser, free, with no upload.

How to count tokens in a prompt

  1. Paste or type your prompt into the textarea — the token, character, word, and byte counts update as you type, with no submit and no API call.
  2. Read the per-model cost table to compare what GPT-4o, Claude Sonnet, or Gemini Flash would charge for the same input.
  3. Drag the expected-response-length slider to project end-to-end cost, since output tokens are billed at a higher rate than input.
  4. Check the context-window utilization bar to see whether your prompt fits — 128K for GPT-4 Turbo, 200K for Claude Sonnet, up to 2M for Gemini 1.5 Pro.
  5. For exact billing, confirm the figure against the provider's official tokenizer; for budgeting and model selection, the estimate here is enough.
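The context-window check in step 4 is a single division. Here is a minimal sketch of it, assuming the bar counts prompt tokens only (which is what the percentages in the table above suggest). The function names and the optional output reserve are illustrative, not the tool's own code.

```python
# Context-window sizes in tokens, as listed on this page.
CONTEXT_WINDOWS = {
    "GPT-4 Turbo": 128_000,
    "Claude Sonnet 4.6": 200_000,
    "Gemini 1.5 Pro": 2_000_000,
}


def context_use_percent(prompt_tokens: int, window_tokens: int) -> float:
    """Share of the context window taken by the prompt, in percent."""
    return prompt_tokens / window_tokens * 100


def fits(prompt_tokens: int, window_tokens: int, reserve_for_output: int = 0) -> bool:
    """True if the prompt plus a reserved response budget stays inside the window.

    reserve_for_output is a hypothetical knob: set it to the response length
    you expect so the reply is not squeezed out of the window.
    """
    return prompt_tokens + reserve_for_output <= window_tokens


if __name__ == "__main__":
    for model, window in CONTEXT_WINDOWS.items():
        print(f"{model}: {context_use_percent(52, window):.2f}%")
```

A 52-token prompt is about 0.04% of a 128K window, in line with the GPT-4 Turbo row above.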

What is a token, and why do counts differ between models?

A token is a subword unit, not a character and not a word. Most models split text into tokens with a Byte-Pair Encoding (BPE) algorithm, which builds a fixed vocabulary by iteratively merging the most frequent adjacent symbol pairs. For English prose, the rule of thumb is roughly 4 characters or 0.75 words per token: a common word like "the" is a single token, while a rarer word like "tokenization" splits into several. Billing, context-window limits, rate quotas, and latency all scale with tokens, not characters, which is why every API expresses its limits in tokens.
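To see why frequent words collapse into a single token, here is a toy sketch of the merge loop behind BPE. It is a teaching example only: production tokenizers work on bytes, use a pre-tokenization step, and ship a fixed merge table. The function names are made up for this illustration.

```python
from collections import Counter


def count_pairs(words: dict) -> Counter:
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def apply_merge(words: dict, pair: tuple) -> dict:
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged_words = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged_words[key] = merged_words.get(key, 0) + freq
    return merged_words


def learn_merges(corpus: str, num_merges: int):
    """Start from single characters and merge the most frequent pair each round."""
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        words = apply_merge(words, best)
    return merges, words


if __name__ == "__main__":
    merges, words = learn_merges("the the the then", 2)
    print(merges)
    print(words)
```

On the tiny corpus "the the the then", two merges are enough for "the" to become one symbol, while "then" still needs two.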

The same text gets different counts on each model because each vendor ships a different tokenizer. GPT-4 and GPT-3.5 use cl100k_base (~100K-token vocabulary); GPT-4o moved to o200k_base (~200K), which adds code-friendly merges and can report 10–20% fewer tokens on a source file. Anthropic's Claude uses its own proprietary tokenizer, and Google's Gemini uses a SentencePiece-based tokenizer with a ~256K vocabulary. You can confirm OpenAI's exact behavior in the open-source tiktoken BPE tokenizer repository.

This tool is a heuristic estimator, not a vocabulary-driven tokenizer. It scans the text in segments — words (≤4 chars = 1 token, 5–8 chars = 2 tokens, longer = 1 token per 4 chars), digit runs (1 token per ~3 digits), newlines (1 token each), and punctuation (1–2 tokens) — tuned to land within ±10% of the real tokenizers for typical prose. Use it for cost and context planning, not exact billing.
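The segment rules above can be sketched in a few lines. This is a literal reading of that description, not the tool's actual source: the regular expression, the choice of 1 token per punctuation mark (the description says 1–2), and ignoring spaces are all assumptions.

```python
import math
import re

# One alternative per segment type: a run of letters, a run of digits,
# a newline, or a single punctuation/symbol character (underscore included).
SEGMENT = re.compile(r"[^\W\d_]+|\d+|\n|[^\w\s]|_")


def estimate_tokens(text: str) -> int:
    """Heuristic token estimate following the word/digit/newline/punctuation rules."""
    total = 0
    for seg in SEGMENT.findall(text):
        if seg == "\n":
            total += 1                           # newline: 1 token each
        elif seg.isdigit():
            total += math.ceil(len(seg) / 3)     # digit run: ~3 digits per token
        elif seg.isalpha():
            n = len(seg)
            if n <= 4:
                total += 1                       # short word
            elif n <= 8:
                total += 2                       # medium word
            else:
                total += math.ceil(n / 4)        # long word: 1 token per 4 chars
        else:
            total += 1                           # punctuation: assumed 1 token
    return total


if __name__ == "__main__":
    print(estimate_tokens("1234567890"))
    print(estimate_tokens("const x = arr.map((i) => i * 2);"))
```

Under these assumptions the ten-digit string "1234567890" comes out at 4 tokens. Prose can land somewhat above or below the tool's own figure, since the exact tuning is not spelled out here.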

Worked examples: text → estimated tokens

Short English sentence

"The quick brown fox jumps over the lazy dog." → ~10 tokens (44 chars, 9 words)

A long number

"1234567890" → ~4 tokens (10 digits, split into ~3-digit runs — not 1 token)

Source code

"const x = arr.map((i) => i * 2);" → denser than prose; symbols and operators each cost tokens

Edge case · chat-message framing

This tool counts the text body only. A real chat-completion call adds ~3 framing tokens per message (role markers and separators) that are billed but not shown here. For batch-cost planning, multiply the estimate by 1.05–1.10 to cover that overhead.
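Both corrections are simple enough to script. Here is a minimal sketch, assuming the ~3 framing tokens per message quoted above and treating the 1.05–1.10 multiplier as a low/high band. The helper names are illustrative.

```python
# Approximate per-message overhead (role markers and separators) quoted above.
FRAMING_TOKENS_PER_MESSAGE = 3


def with_framing(body_token_estimates: list[int]) -> int:
    """Add per-message framing to a list of body-only token estimates."""
    return sum(body_token_estimates) + FRAMING_TOKENS_PER_MESSAGE * len(body_token_estimates)


def padded_range(estimate: int, low: float = 1.05, high: float = 1.10) -> tuple[int, int]:
    """Low and high token budgets after applying the batch-planning multiplier."""
    return round(estimate * low), round(estimate * high)


if __name__ == "__main__":
    # A system prompt, a user message, and an earlier assistant turn.
    print(with_framing([800, 52, 300]))
    print(padded_range(1000))
```

The per-message form suits short chats where you know the message count; the multiplier is the quicker choice when you only have a batch total.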

Token density by content type

These ratios drive the estimate and explain why the same character count costs different token amounts. Code and structured data pack more tokens per character than prose; CJK scripts cost the most.

Content Type | Typical Density | Notes
English prose | ~4 chars/token | Most accurate region; estimate within ±5–10% of the real tokenizer.
Code (Python, JS) | ~3–3.5 chars/token | Symbols and indentation push token density higher. Use a code-tuned estimate when needed.
CJK languages | ~1–1.5 chars/token | Chinese, Japanese, and Korean characters often each become individual tokens.
JSON / structured | ~3.5 chars/token | Brackets, quotes, and field separators add tokens.
Numbers | ~3 digits/token | Modern tokenizers split long numbers; "2026" is typically 1–2 tokens.
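For a quick back-of-the-envelope figure, the densities in this table can be applied directly to a character count. The sketch below does that and returns a low/high range. The dictionary keys and function name are made up for the example, and the numbers row is left out because it is measured in digits rather than characters.

```python
import math

# (low, high) characters per token, taken from the density table above.
CHARS_PER_TOKEN = {
    "english_prose": (4.0, 4.0),
    "code": (3.0, 3.5),
    "cjk": (1.0, 1.5),
    "json": (3.5, 3.5),
}


def token_range(char_count: int, content_type: str) -> tuple[int, int]:
    """Return (fewest, most) tokens expected for the given character count."""
    low_density, high_density = CHARS_PER_TOKEN[content_type]
    # More characters per token means fewer tokens, so the bounds swap.
    return math.ceil(char_count / high_density), math.ceil(char_count / low_density)


if __name__ == "__main__":
    for kind in CHARS_PER_TOKEN:
        print(kind, token_range(1200, kind))
```

For 1,200 characters this gives 300 tokens for English prose but 800 to 1,200 for CJK text, which is the gap the table is pointing at.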

Token budget by prompt component

Component | Typical Range | Notes
System prompt | 500–2000 tokens | Fixed overhead per request; counts toward input cost.
User message | 50–5000 tokens | Varies wildly. Long-context tasks (whole-document summarization) easily exceed 10k.
Assistant response | 100–2000 tokens | Output tokens are 3–5× more expensive than input.
Few-shot examples | 500–5000 tokens | Three good examples often outperform ten mediocre ones.
Retrieved chunks (RAG) | 500–50000 tokens | The biggest variable in most production systems.

Total prompt cost = input tokens × input rate + output tokens × output rate. Output is typically 3–5× more expensive than input, so capping the response is the highest-leverage cost optimization.
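The formula translates directly into code. This sketch uses the Q1 2026 list prices this page quotes for GPT-4o and Claude Opus 4.7, in USD per 1M tokens. The dictionary layout and function name are illustrative.

```python
# (input rate, output rate) in USD per 1M tokens, as quoted on this page.
RATES_PER_MILLION = {
    "GPT-4o": (2.50, 10.00),
    "Claude Opus 4.7": (15.00, 75.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD: input tokens x input rate + output tokens x output rate."""
    input_rate, output_rate = RATES_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    for model in RATES_PER_MILLION:
        print(f"{model}: ${call_cost(model, 52, 300):.4f}")
```

With a 52-token prompt and a 300-token response, this comes to about $0.0031 for GPT-4o and $0.0233 for Claude Opus 4.7, matching the totals in the cost table.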

The pricing trap most cost calculators miss

Output tokens are not priced like input tokens — they cost 3–5× more. As of Q1 2026, GPT-4o is $2.50/1M input but $10.00/1M output (4×), and Claude Opus 4.7 is $15.00/1M input versus $75.00/1M output (5×). A calculator that prices a whole conversation at the input rate can understate the bill by several times. That is why this tool keeps input and output separate and exposes a response-length slider — the response is usually where the money goes.
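How large the understatement gets depends on the mix of input and output tokens. Here is a small sketch that works in units of the input rate, so it needs no dollar prices. The function name is made up for this example.

```python
def understatement_factor(input_tokens: int, output_tokens: int,
                          output_to_input_ratio: float) -> float:
    """How many times larger the real bill is than one priced at the input rate.

    output_to_input_ratio is the output price divided by the input price,
    for example 4 for GPT-4o or 5 for Claude Opus 4.7.
    """
    naive = input_tokens + output_tokens                           # every token at 1x
    actual = input_tokens + output_tokens * output_to_input_ratio  # output at its own rate
    return actual / naive


if __name__ == "__main__":
    print(understatement_factor(1000, 1000, 5))
    print(round(understatement_factor(52, 300, 4), 2))
```

An even split of 1,000 input and 1,000 output tokens at a 5× ratio is billed at 3 times the naive figure. The more the response dominates, the closer the factor gets to the full output ratio.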

The second trap is language. English averages ~4 chars/token, but Chinese, Japanese, and Korean often run ~1–1.5 chars/token because each CJK character tends to become its own token. The same content can cost 2–3× more tokens — and dollars — in CJK than in English. Budget for the language you actually ship in, not the one you prototyped in.

How to lower token cost

Compress system prompts

Every request re-reads the full system prompt. Trimming 500 tokens compounds across millions of calls. Prefer bullet lists over prose.
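To put a number on that, here is a minimal sketch of the savings from a shorter system prompt. The one-million-call volume is an illustrative assumption, and the rate is the GPT-4o input price quoted on this page.

```python
def trimming_savings(tokens_trimmed: int, calls: int, input_rate_per_million: float) -> float:
    """USD saved by removing `tokens_trimmed` input tokens from every call."""
    return tokens_trimmed * calls * input_rate_per_million / 1_000_000


if __name__ == "__main__":
    # 500 fewer tokens per request, one million requests, $2.50 per 1M input tokens.
    print(trimming_savings(500, 1_000_000, 2.50))
```

At those figures, the 500-token trim is worth $1,250 per million calls.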

Cap response length

Output costs 3–5× input. Set max_tokens and instruct the model: "Answer in 1–2 sentences."

Cache static context

Anthropic and OpenAI both offer prompt caching at a 50–90% discount for repeated prefixes and few-shot examples.
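A rough way to budget for caching is to price the repeated prefix at the discounted rate and everything else at the normal input rate. The sketch below assumes the discount applies only to cached reads and ignores any cache-write surcharge or expiry rules, which vary by provider. The names are illustrative.

```python
def input_cost_with_cache(cached_tokens: int, fresh_tokens: int,
                          input_rate_per_million: float, cache_discount: float) -> float:
    """Input cost in USD when `cached_tokens` are billed at a discounted rate.

    cache_discount is a fraction between 0 and 1, for example 0.5 or 0.9 for
    the 50-90% range mentioned above.
    """
    if not 0 <= cache_discount <= 1:
        raise ValueError("cache_discount must be between 0 and 1")
    cached_rate = input_rate_per_million * (1 - cache_discount)
    return (cached_tokens * cached_rate + fresh_tokens * input_rate_per_million) / 1_000_000


if __name__ == "__main__":
    # 2,000-token cached system prompt plus 500 fresh tokens at $2.50 per 1M.
    print(input_cost_with_cache(2000, 500, 2.50, 0.0))
    print(input_cost_with_cache(2000, 500, 2.50, 0.9))
```

The larger the static prefix is relative to the fresh part of the prompt, the more of the discount shows up in the final bill.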

Right-size the model

GPT-4o mini, Claude Haiku 4.5, and Gemini Flash are 10–60× cheaper and often enough for extraction or classification.

Runs 100% in your browser

Your prompt never leaves your device. The entire calculation (token estimate, cost, byte count, and context-window bar) runs in JavaScript inside your tab, with no fetch, no XHR, and no API key. You can verify this by opening DevTools → Network and typing into the box: no requests fire. I tested the estimator against short sentences, long numbers, symbol-heavy code, and multi-thousand-token documents; it tracks tiktoken closely on prose and drifts further on dense code, which is exactly where a heuristic should be honest about its margin.

Frequently asked questions

Is this LLM token counter free?

Yes — 100% free with no signup, no API key, and no usage cap. Every estimate runs in your browser, so there is nothing to pay for and no quota to hit.

How accurate is the estimate?

For typical English prose it lands within ±5–10% of the real tokenizer. Accuracy is best for natural-language text, looser for symbol-heavy code (±15%), and least reliable for non-Latin scripts and very short prompts. Use the provider's official tokenizer for exact billing.

Why does the same text show different counts in different tools?

Each model family ships a different tokenizer — GPT-4 uses cl100k_base, GPT-4o uses o200k_base, Claude and Gemini use their own vocabularies. Some tools also include chat-message framing tokens; others count only the body.

Are input and output tokens priced the same?

No. Output almost always costs more — GPT-4o is 4× and Claude Opus 4.7 is 5× the input rate. The response length matters as much as the prompt, so this tool separates the two and adds a response-length slider.

Is my prompt sent anywhere?

No. The whole calculation runs in JavaScript in your browser tab — no fetch, no API call. Confirm it in DevTools → Network: typing fires zero requests. The same guarantee holds for confidential business prompts.

Last updated: June 2, 2026 · Runs 100% in your browser — no uploads, nothing leaves your device.
