Free LLM Token Counter for GPT, Claude & Gemini Prompts
Paste any prompt and instantly estimate token count, API cost, and context-window utilization across GPT-4 Turbo, GPT-4o, GPT-4o mini, Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4.5, and Gemini 1.5 Pro / Flash. The heuristic tokenizer is tuned to match real BPE output within ±10%.
| Metric | Value |
|---|---|
| Tokens (est.) | 52 (~4 chars per token heuristic) |
| Characters | 159 |
| Words | 27 |
| UTF-8 bytes | 159 |
| Model | Input cost | Output cost | Total | Context use |
|---|---|---|---|---|
| GPT-4 Turbo (OpenAI · 128k context) | $0.000520 | $0.009000 | $0.009520 | 0.04% |
| GPT-4o (OpenAI · 128k context) | $0.000130 | $0.003000 | $0.003130 | 0.04% |
| GPT-4o mini (OpenAI · 128k context) | $0.000008 | $0.000180 | $0.000188 | 0.04% |
| Claude Opus 4.7 (Anthropic · 1,000k context) | $0.000780 | $0.022500 | $0.023280 | 0.01% |
| Claude Sonnet 4.6 (Anthropic · 200k context) | $0.000156 | $0.004500 | $0.004656 | 0.03% |
| Claude Haiku 4.5 (Anthropic · 200k context) | $0.000013 | $0.000375 | $0.000388 | 0.03% |
| Gemini 1.5 Pro (Google · 2,000k context) | $0.000065 | $0.001500 | $0.001565 | 0.00% |
| Gemini 1.5 Flash (Google · 1,000k context) | $0.000004 | $0.000090 | $0.000094 | 0.01% |

All costs are per request in USD for the 52-token sample prompt above, assuming a ~300-token response.
Token counts use a heuristic estimator that matches real BPE tokenizers (OpenAI's tiktoken, Anthropic's, and Google's) within ±10% for typical prose. For exact billing, use the provider's official tokenizer endpoint or library.
Instant Token Estimate
Type or paste prompt text and the token count updates as you type. No submit, no API call, no waiting.
Cost Estimate Across Models
Per-model input + output cost in USD based on 2026 list pricing. Compare what GPT-4o vs Claude Sonnet would charge for the same prompt at a glance.
Context Window Bar
Visual indicator showing how much of each model's context window your prompt occupies: 128K for GPT-4 Turbo, 200K for Claude Sonnet and Haiku, 1M for Claude Opus and Gemini Flash, and 2M for Gemini 1.5 Pro.
100% Client-Side
Your prompts never leave the browser. No fetch, no analytics event, no API key required. Works offline once the page is loaded.
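For the curious, a fully client-side counter of this sort needs nothing more than an input listener. Below is a minimal sketch, assuming hypothetical element IDs and the ~4 chars/token heuristic; it is an illustration, not this page's actual source.

```typescript
// Hypothetical element IDs; the real page's markup may differ.
const input = document.querySelector<HTMLTextAreaElement>("#prompt")!;
const out = document.querySelector<HTMLSpanElement>("#token-count")!;

// Recompute on every keystroke: no submit, no API call, no network round trip.
input.addEventListener("input", () => {
  const tokens = Math.ceil(input.value.length / 4); // ~4 chars per token
  out.textContent = String(tokens);
});
```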
Token Counts Drive Every Decision in LLM Pricing & Context Planning
Every API call to an LLM is metered in tokens — the subword units a Byte-Pair Encoding tokenizer produces from your prompt. Billing, context-window limits, latency, and rate quotas are all expressed in tokens. Yet most prompt engineering happens by feel: paste, run, see what it costs. Our Free LLM Token Counter brings that math forward, so you can see an estimate before you hit submit. It covers the eight frontier models you are most likely to be choosing between in 2026 — OpenAI's GPT-4 Turbo / 4o / 4o mini, Anthropic's Claude Opus 4.7 / Sonnet 4.6 / Haiku 4.5, and Google's Gemini 1.5 Pro / Flash — with per-model cost and context-window utilization.
Pair this with our Word Counter (text statistics for human readability), JSON Formatter (inspect API request and response bodies), Regex Tester (build pre-processing patterns for prompt cleanup), and the JWT Decoder (debug Bearer-token issues when calling LLM APIs).
Token Density by Content Type
| Content Type | Typical Density | Notes |
|---|---|---|
| English prose | ~4 chars/token | Most accurate region; estimates land within ±5-10% of real tokenizers. |
| Code (Python, JS) | ~3-3.5 chars/token | Symbols and indentation mean more tokens per character; use a code-tuned estimate for code-heavy prompts. |
| CJK languages | ~1-1.5 chars/token | Chinese/Japanese/Korean characters often each become individual tokens. |
| JSON / structured | ~3.5 chars/token | Brackets, quotes, and field separators add tokens. |
| Numbers | ~3 digits/token | Modern tokenizers split long numbers; "2026" is typically 1-2 tokens. |
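Those densities translate directly into a simple estimator. Here is a minimal TypeScript sketch of that kind of heuristic, using the densities from the table above; the names and fixed density values are illustrative, not this tool's actual source.

```typescript
// Chars-per-token densities, taken from the table above.
type ContentType = "prose" | "code" | "cjk" | "json";

const CHARS_PER_TOKEN: Record<ContentType, number> = {
  prose: 4.0,  // English prose
  code: 3.25,  // midpoint of the 3-3.5 range for Python/JS
  cjk: 1.25,   // CJK characters often become one token each
  json: 3.5,   // brackets, quotes, and separators add tokens
};

/** Heuristic estimate: character count divided by the content-type density. */
function estimateTokens(text: string, type: ContentType = "prose"): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN[type]);
}

// A 159-character English prompt estimates to ~40 tokens at prose density.
console.log(estimateTokens("a".repeat(159), "prose")); // 40
```

Expect the result to land within roughly ±10% of a real BPE count for typical prose, and to drift further for mixed-language or symbol-heavy text.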
Token Budget by Prompt Component
| Component | Typical Range | Notes |
|---|---|---|
| System prompt | 500-2000 tokens | Fixed overhead per request; counts toward input cost. |
| User message | 50-5000 tokens | Varies wildly. Long context tasks (whole-document summarization) easily exceed 10k. |
| Assistant response | 100-2000 tokens | Output tokens are typically 3-5× more expensive than input. |
| Few-shot examples | 500-5000 tokens | Three good examples often outperform ten mediocre ones. |
| Retrieved chunks (RAG) | 500-50000 tokens | The biggest variable in most production systems. |
Total prompt cost = (input tokens × input rate) + (output tokens × output rate). The output rate is typically 3-5× the input rate, so brevity in the response is the highest-leverage cost optimization.
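As a worked example, here is that formula in TypeScript using the GPT-4o list rates implied by the table above ($2.50 per million input tokens, $10 per million output tokens); the rates and names are the only assumptions.

```typescript
// GPT-4o list rates in USD per token ($2.50/M input, $10/M output).
const INPUT_RATE = 2.5 / 1_000_000;
const OUTPUT_RATE = 10 / 1_000_000;

// Total prompt cost = (input tokens × input rate) + (output tokens × output rate).
function promptCost(inputTokens: number, outputTokens: number): number {
  return inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE;
}

// The 52-token sample prompt with a 300-token response:
console.log(promptCost(52, 300).toFixed(6)); // 0.003130 USD, as in the table
```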
Token-Efficient Prompt Engineering
1. Compress System Prompts
Every request re-sends the full system prompt, so even a 500-token reduction compounds across millions of calls: at GPT-4o's $2.50-per-million input rate, trimming 500 tokens saves roughly $1,250 per million requests. Strip filler words, use abbreviations the model already knows, and prefer bullet lists over prose.
2. Cap Response Length
Output is 3-5× more expensive than input. Use the max_tokens parameter to prevent runaway responses, and instruct the model explicitly ("Answer in 1-2 sentences"), as in the sketch below.
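A minimal sketch with the OpenAI Node SDK; the model name and the 150-token cap are placeholders, and newer models may expect max_completion_tokens instead.

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Cap output hard with max_tokens and soft with an explicit instruction,
// so the model aims for brevity instead of getting truncated mid-sentence.
const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  max_tokens: 150,
  messages: [
    { role: "system", content: "Answer in 1-2 sentences." },
    { role: "user", content: "Why do output tokens cost more than input tokens?" },
  ],
});

console.log(completion.choices[0].message.content);
```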
3. Cache Static Context
Anthropic and OpenAI both support prompt caching at a 50-90% discount for repeated system prompts and few-shot examples. Use it when the same prefix repeats across many requests; see the sketch below.
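With Anthropic's SDK, the cacheable prefix is marked per content block via cache_control. A minimal sketch, assuming a placeholder model id and a hypothetical system-prompt constant; the exact discount depends on the provider's current terms.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Hypothetical large, static system prompt reused across many requests.
const STATIC_SYSTEM_PROMPT = "You are a support-ticket triage assistant. ...";

const msg = await anthropic.messages.create({
  model: "claude-haiku-4-5", // placeholder model id
  max_tokens: 300,
  system: [
    {
      type: "text",
      text: STATIC_SYSTEM_PROMPT,
      // Mark the prefix cacheable; identical prefixes on later requests
      // are billed at the provider's discounted cache-read rate.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Classify this ticket: refund not received." }],
});

console.log(msg.content);
```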
4. Right-Size the Model
GPT-4o mini, Claude Haiku 4.5, and Gemini Flash are 10-60× cheaper than the frontier-tier models and often sufficient for classification, extraction, and simple Q&A. Measure quality on your own task before defaulting to Opus or GPT-4 Turbo; a minimal comparison harness is sketched below.
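A hypothetical harness along these lines is usually enough to tell whether the cheap tier holds up; the labels, prompts, and exact-match scoring are all assumptions for illustration.

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Tiny labeled sample; in practice, draw 50-200 cases from real traffic.
const cases = [
  { input: "Refund not received after 10 days", label: "billing" },
  { input: "App crashes on launch", label: "bug" },
];

// Exact-match accuracy for a one-word classification prompt.
async function accuracy(model: string): Promise<number> {
  let correct = 0;
  for (const c of cases) {
    const res = await openai.chat.completions.create({
      model,
      max_tokens: 5,
      messages: [
        { role: "system", content: "Reply with one word: billing, bug, or other." },
        { role: "user", content: c.input },
      ],
    });
    const answer = res.choices[0].message.content?.trim().toLowerCase();
    if (answer === c.label) correct++;
  }
  return correct / cases.length;
}

// If the cheap tier matches the frontier tier here, default to the cheap tier.
console.log("gpt-4o-mini:", await accuracy("gpt-4o-mini"));
console.log("gpt-4-turbo:", await accuracy("gpt-4-turbo"));
```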