Security

Base64 Encoding: Complete Developer's Guide with Examples

Everything you need to know about Base64 encoding and decoding. Learn when to use it, performance considerations, and security implications in modern web applications.

Security Team · 10 min read

Base64 is one of the most quietly important formats on the modern web. It's what lets you embed a 32-byte cryptographic key inside a JSON body, paste a 4 KB icon into a CSS file, or stream a binary photo through an XML payload that refuses to carry anything but text. It is also one of the most misunderstood — frequently confused with encryption, blamed for performance problems that are not its fault, and used in places where a raw byte stream would do the job better.

This guide goes deeper than the textbook explanation. You'll learn the byte-level mechanics, the variants you'll actually encounter in production (standard, URL-safe, MIME), the real cost of using Base64 in HTTP/JSON pipelines, and the security pitfalls — including the one that has shipped silently inside thousands of codebases for a decade.

Understanding Base64 Encoding

Base64 is a binary-to-text encoding that maps every 3 bytes of input to 4 ASCII characters drawn from a 64-character alphabet (A-Z, a-z, 0-9, +, /), with = used as a trailing pad. It is defined formally in RFC 4648.

The mechanics are deliberately simple:

  1. Take 3 input bytes — that's 24 bits.
  2. Split those 24 bits into four 6-bit groups.
  3. Look up each 6-bit group in the alphabet table to get one ASCII character.

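Each 6-bit group has 2⁶ = 64 possible values, and each value maps to one printable ASCII character, so the encoded text survives any system that can move ASCII: email bodies, JSON strings, HTTP headers, XML attributes, terminal output. URL query parameters are the one exception, because +, /, and = have special meaning there; the URL-safe variant covered below exists for that case.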
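The three numbered steps above can be sketched in a few lines of JavaScript. This is a minimal illustration for a single full 3-byte group, not a complete encoder; encodeTriplet and ALPHABET are names invented for this example.

```javascript
// Standard Base64 alphabet (RFC 4648 §4): index 0 is "A", index 63 is "/".
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode exactly three bytes into four Base64 characters.
// encodeTriplet is an illustrative name, not a library function.
function encodeTriplet(b0, b1, b2) {
  // Step 1: pack the three bytes into one 24-bit integer.
  const bits = (b0 << 16) | (b1 << 8) | b2;
  // Steps 2 and 3: peel off four 6-bit groups, high bits first,
  // and look each one up in the alphabet.
  return (
    ALPHABET[(bits >> 18) & 0x3f] +
    ALPHABET[(bits >> 12) & 0x3f] +
    ALPHABET[(bits >> 6) & 0x3f] +
    ALPHABET[bits & 0x3f]
  );
}

console.log(encodeTriplet(0x4d, 0x61, 0x6e)); // bytes of "Man" -> TWFu
```

In practice you would call btoa or Buffer rather than hand-rolling this; the sketch only makes the bit regrouping visible.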

A worked example

Encode the three bytes of the ASCII string Cat:

'C' = 0x43 = 01000011
'a' = 0x61 = 01100001
't' = 0x74 = 01110100

Concatenated:   01000011 01100001 01110100
Re-grouped (6): 010000  110110  000101  110100
Decimal:        16      54      5       52
Base64 chars:   Q       2       F       0

So Cat encodes to Q2F0. You can verify this in two lines:

btoa("Cat");                 // "Q2F0"
Buffer.from("Cat").toString("base64"); // "Q2F0"

When the input length isn't a multiple of 3, Base64 pads the encoded output with = so the output length is always a multiple of 4. Encoding Ca produces Q2E=; encoding C produces Qw==. The padding is a structural marker, not data — but, as you'll see in the security section below, it has tripped up a remarkable number of implementations.
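To see the padding rule in action, the sketch below encodes inputs of one to four bytes with Node's Buffer and compares the result against the length formula. paddedLength is a helper name made up for this example.

```javascript
// Encoded length for n input bytes with padding: 4 characters per 3-byte
// group, rounding the final partial group up.
const paddedLength = (n) => 4 * Math.ceil(n / 3);

for (const text of ["C", "Ca", "Cat", "Cats"]) {
  const encoded = Buffer.from(text, "utf8").toString("base64");
  console.log(text, "->", encoded, `(${encoded.length} chars)`);
}
// C -> Qw== (4 chars)
// Ca -> Q2E= (4 chars)
// Cat -> Q2F0 (4 chars)
// Cats -> Q2F0cw== (8 chars)
```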

The three variants you'll actually meet

| Variant | Characters for values 62 / 63 | Where it lives |
|---------|-------------------------------|----------------|
| Standard (RFC 4648 §4) | + / | JSON bodies, MIME, data: URLs, localStorage |
| URL-safe (RFC 4648 §5) | - _ | JWTs, query strings, filenames, OAuth tokens |
| MIME (RFC 2045) | + / | Email; lines wrap at 76 chars with CRLF |

Standard and MIME share an alphabet; the URL-safe variant differs only in the last two characters, so swapping those two characters converts between them. If you've ever passed a JWT segment to atob() and gotten a "malformed" error, this is why: JWTs use URL-safe Base64 and omit the = padding. To turn a JWT segment into something atob() accepts:

const fromBase64Url = (s) =>
  s.replace(/-/g, "+").replace(/_/g, "/").padEnd(s.length + ((4 - s.length % 4) % 4), "=");

atob(fromBase64Url("eyJhbGciOiJIUzI1NiJ9")); // '{"alg":"HS256"}'
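Going the other direction is the same substitution in reverse. The sketch below assumes Node's Buffer for the byte handling; toBase64Url is a name chosen for this example, and the input bytes are picked so that the standard encoding contains all three characters that need attention.

```javascript
// Convert standard Base64 to Base64URL: swap the two differing characters
// and drop the trailing padding.
const toBase64Url = (b64) =>
  b64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");

// Bytes chosen so the standard form contains "+", "/" and "=".
const bytes = Buffer.from([0xfb, 0xff, 0xfe, 0xff]);
const standard = bytes.toString("base64");

console.log(standard);              // +//+/w==
console.log(toBase64Url(standard)); // -__-_w
```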

If you'd rather not roll your own, the JWT decoder handles header, payload, and signature splitting plus Base64URL normalisation in-browser.

When to Use Base64

The cleanest mental model: Base64 is a tax you pay to move binary data through a text-only channel. Use it when the channel demands text. Don't use it otherwise.

Good fits:

  • JSON payloads carrying binary. REST and GraphQL bodies are JSON, JSON values are strings, strings are UTF-8. Any time you need to ship a file digest, a thumbnail, a public key, or an encrypted blob inside a JSON body, Base64 is the path of least resistance.
  • data: URLs for tiny inline assets. A 600-byte SVG icon as data:image/svg+xml;base64,… saves a round trip and is fine inline. Above ~4 KB the cost of the inflated CSS file usually outweighs the saved request — measure before you commit. The Image to Base64 tool produces correctly-typed data: URLs you can paste straight into CSS.
  • Tokens in URLs. JWTs, OAuth state parameters, signed download links — Base64URL keeps the token URL-safe without percent-encoding overhead.
  • Email attachments. SMTP was designed as a 7-bit protocol; MIME Base64 has been the standard binary carrier since the early 1990s (RFC 2045 restated it in 1996) and is not going away.
  • HTTP headers. Authorization: Basic <base64>, Content-Security-Policy nonces, and PEM-encoded certificates all rely on Base64.
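As a rough illustration of the first two cases, here is a sketch that carries a 32-byte key through a JSON body and builds a data: URL for a tiny SVG. The field names (keyId, key) and the SVG markup are invented for the example, and it assumes Node's Buffer.

```javascript
// A 32-byte key (fixed bytes 0..31 so the example is reproducible).
const key = Buffer.from(Array.from({ length: 32 }, (_, i) => i));

// Sender: JSON has no binary type, so the key travels as a Base64 string.
const body = JSON.stringify({ keyId: "example", key: key.toString("base64") });

// Receiver: parse the JSON, then decode the string back to bytes.
const received = Buffer.from(JSON.parse(body).key, "base64");
console.log(received.equals(key)); // true

// A data: URL for a small SVG is built the same way: MIME type, then payload.
const svg = '<svg xmlns="http://www.w3.org/2000/svg" width="8" height="8"/>';
const dataUrl =
  "data:image/svg+xml;base64," + Buffer.from(svg, "utf8").toString("base64");
console.log(dataUrl.length);
```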

Bad fits:

  • Anywhere you have a binary channel. Don't Base64 a file body before uploading it to a multipart/form-data endpoint — the multipart envelope is already binary-safe. You'll add 33% to the request size for nothing.
  • As a "lightweight" obfuscation layer. It is not. See the security section below.
  • As a compression step. It is not compression; it is the opposite. Encoding grows the payload.
  • For large media. A 4 MB JPEG inside a JSON body becomes a roughly 5.3 MB string that has to be parsed, allocated, and copied. Use a presigned-URL upload or multipart/form-data instead.

For one-off encoding and decoding tasks in the browser, the Base64 encoder and Base64 decoder handle Unicode strings, files, and URL-safe variants without sending data to a server.

Performance Considerations

Base64 adds two costs: size and CPU. The headline number — "33% larger" — only tells half the story.

Wire-size overhead

For every 3 input bytes you produce 4 output bytes, so the encoded payload is 4/3 ≈ 1.333× the original. Padding adds up to 2 more bytes at the tail; MIME adds CRLF every 76 characters, pushing the overhead to about 1.37×. On a 1 MB image, that's an extra 330–370 KB.
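The arithmetic is easy to check. The sketch below computes both figures for a 1,000,000-byte input; base64Length and mimeLength are illustrative helpers, and the MIME version assumes every line, including the last, ends with CRLF.

```javascript
// Padded Base64 length for n raw bytes: 4 characters per 3-byte group.
const base64Length = (n) => 4 * Math.ceil(n / 3);

// MIME (RFC 2045) length: the same characters plus a 2-byte CRLF after each
// 76-character line. Assumes the final line is also CRLF-terminated.
const mimeLength = (n) => {
  const chars = base64Length(n);
  return chars + 2 * Math.ceil(chars / 76);
};

const raw = 1000000;
console.log(base64Length(raw), (base64Length(raw) / raw).toFixed(3)); // 1333336 1.333
console.log(mimeLength(raw), (mimeLength(raw) / raw).toFixed(3));     // 1368424 1.368
```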

Over the network the picture is usually better than that. HTTP/1.1 and HTTP/2 responses are commonly compressed with gzip or brotli (when the server enables it and the client advertises support in Accept-Encoding), and Base64 strings compress respectably: typical real-world ratios are 0.75–0.85×, which claws back roughly half the encoding penalty. The encoded-then-compressed result is still larger than the raw-then-compressed binary, but the gap closes from "33% worse" to closer to "10–20% worse".

Practical implication: for assets transferred many times, the encoding overhead is a tax paid on every transfer. For a thumbnail rendered inline once, the saved round trip is often worth it. Measure with a real profile, not a back-of-envelope estimate.

CPU and memory overhead

btoa / atob and Buffer.from(..., "base64") are extremely fast — typically hundreds of megabytes per second on modern CPUs. The real cost shows up in three places:

  1. JSON parsing. A 5 MB Base64 string inside a JSON body has to be tokenised by the JSON parser before you can even start decoding it. JSON parsers handle long strings worse than long arrays.
  2. String allocation. JavaScript strings are immutable; encoding a 10 MB buffer produces a brand-new ~13.3 MB string in memory, while the original buffer is still alive. Peak memory more than doubles.
  3. Round-trip work. If your pipeline is read file → encode → embed in JSON → decode → write file, you've spent CPU on encode/decode and memory on the inflated intermediate purely to push bytes through a text channel.

If you find yourself optimising any of this, the cheapest fix is almost always "stop putting binary inside JSON" — switch to a binary transport.
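When a text channel is unavoidable, one way to soften the allocation cost described above is to encode in chunks whose size is a multiple of 3 bytes, so each piece can be written out and released before the next is produced. The following is a sketch under that assumption, using Node's Buffer; encodeInChunks and its default chunk size are made up for the example.

```javascript
// Encode a large buffer piece by piece. Because each chunk is a multiple of
// 3 bytes, only the last piece can contain "=" padding, and the pieces
// concatenate into exactly what a one-shot encode would produce.
function* encodeInChunks(buffer, chunkBytes = 3 * 1024) {
  if (chunkBytes % 3 !== 0) {
    throw new RangeError("chunkBytes must be a multiple of 3");
  }
  for (let offset = 0; offset < buffer.length; offset += chunkBytes) {
    // subarray returns a view over the same memory, not a copy.
    yield buffer.subarray(offset, offset + chunkBytes).toString("base64");
  }
}

const data = Buffer.from(Array.from({ length: 10000 }, (_, i) => i % 256));
const pieces = [...encodeInChunks(data)];
console.log(pieces.join("") === data.toString("base64")); // true
```

The join here is only to demonstrate equivalence; a real pipeline would write each piece to a stream instead of collecting them.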

Security Best Practices

Base64 is not encryption. It is not even obfuscation in any meaningful sense: the alphabet is public, the algorithm is trivial, every browser has shipped atob() for well over a decade, and your terminal can decode it with base64 -d. Treating Base64 as a privacy boundary is the single most common security mistake associated with the format.

There are three real-world security concerns worth knowing about:

1. Don't store secrets "encoded"

Putting a Base64 string in a config file, an environment variable comment, or a code review attachment does not protect it. If the underlying data is sensitive (API key, credential, private key, signed token with PII in the payload), Base64 buys you nothing. Encrypt it, or — better — keep it in a secrets manager and never put the plaintext anywhere it can leak.

This applies to JWTs in particular. The payload of a JWT is Base64URL-encoded JSON, not encrypted. Anyone with the token can read the claims. The signature only proves the token wasn't tampered with — it does not hide the contents.
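A short sketch makes the point concrete. It builds a made-up token (real header and payload encoding, placeholder signature, invented claims) and then reads the claims back without any key. The helper names are illustrative and the code assumes Node's Buffer.

```javascript
// Helpers for unpadded Base64URL, the encoding JWT segments use.
const toBase64Url = (text) =>
  Buffer.from(text, "utf8").toString("base64")
    .replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
const fromBase64Url = (segment) =>
  Buffer.from(segment.replace(/-/g, "+").replace(/_/g, "/"), "base64")
    .toString("utf8");

// A made-up token: the signature is a placeholder, not a real HMAC.
const token = [
  toBase64Url(JSON.stringify({ alg: "HS256", typ: "JWT" })),
  toBase64Url(JSON.stringify({ sub: "user-123", email: "alice@example.com" })),
  "signature-placeholder",
].join(".");

// No key is involved in reading the claims.
const claims = JSON.parse(fromBase64Url(token.split(".")[1]));
console.log(claims.email); // alice@example.com
```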

2. Validate canonical encoding on input

Many decoders are lenient: they accept missing or extra padding and ignore the unused bits in the final character, so an attacker can construct multiple distinct Base64 strings that decode to the same bytes (the "non-canonical encoding" problem). RFC 4648 §3.5 requires encoders to set those unused bits to zero and allows decoders to reject input that doesn't, but most decoders accept all of them. If you use the Base64 string itself as an identifier (for example, as a cache key, a deduplication token, or a database primary key), two equivalent strings will produce different keys, and you'll get duplicate entries or checks that can be bypassed.

The fix: decode first, then re-encode, and compare the canonical re-encoded form. Or use the decoded bytes as the key, not the string.
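A minimal sketch of the decode-then-re-encode check, assuming Node's Buffer (whose Base64 decoder is lenient) and padded standard Base64 as the canonical form. isCanonicalBase64 is a name chosen for this example.

```javascript
// True only if `input` is exactly what a standard, padded encoder would
// emit for the bytes it decodes to.
function isCanonicalBase64(input) {
  const bytes = Buffer.from(input, "base64"); // lenient decode
  return bytes.toString("base64") === input;  // strict comparison
}

// Both strings decode to the single byte "C"...
console.log(Buffer.from("Qw==", "base64").toString()); // C
console.log(Buffer.from("Qx==", "base64").toString()); // C

// ...but only one of them is the canonical spelling.
console.log(isCanonicalBase64("Qw==")); // true
console.log(isCanonicalBase64("Qx==")); // false (non-zero trailing bits)
console.log(isCanonicalBase64("Qw"));   // false (missing padding)
```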

3. Treat decoded output as untrusted

Whatever you decode is still under attacker control. The classic exploit chain is:

  1. App accepts a Base64-encoded "file" upload via JSON.
  2. App calls Buffer.from(input, "base64") to recover the bytes.
  3. App writes the bytes straight to disk, or parses them as an image, or evals them as a script.

The encoding step does nothing to sanitise the input. Validate content type, scan for known-bad signatures, run image decoders in a sandbox, and never eval decoded bytes.
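As one possible shape for that validation, the sketch below checks size, canonical encoding, and the file signature before returning the bytes. It assumes the endpoint only accepts PNG images; decodePngUpload and maxBytes are hypothetical names, and a signature check is a first filter, not a substitute for sandboxed decoding.

```javascript
// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// decodePngUpload and maxBytes are hypothetical names for this sketch.
function decodePngUpload(base64Input, maxBytes = 1024 * 1024) {
  // Reject oversized input before allocating the decoded buffer.
  if (base64Input.length > 4 * Math.ceil(maxBytes / 3)) {
    throw new Error("upload too large");
  }
  const bytes = Buffer.from(base64Input, "base64");
  // Reject anything that is not canonical Base64 (stray characters, odd padding).
  if (bytes.toString("base64") !== base64Input) {
    throw new Error("not canonical Base64");
  }
  // Check the leading bytes rather than trusting a client-supplied content type.
  if (!bytes.subarray(0, PNG_SIGNATURE.length).equals(PNG_SIGNATURE)) {
    throw new Error("not a PNG");
  }
  return bytes; // still untrusted: pass to a sandboxed decoder, never eval
}
```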

Beyond Base64

For most binary-in-text needs, Base64 is the right answer. But it's worth knowing the neighbours:

  • Base32 is case-insensitive and avoids + and /, which makes it friendly to URLs, filenames, and humans reading values aloud. It is 60% larger than raw bytes (vs. Base64's 33%).
  • Base58 drops visually confusable characters (0, O, I, l) as well as + and /, and is the format behind legacy Bitcoin addresses and many short-ID schemes.
  • Hex is 100% larger than raw bytes but the simplest to read and compare by eye — preferred for hashes, fingerprints, and machine identifiers.
  • CBOR and MessagePack are binary serialisation formats that solve the same problem Base64-in-JSON solves, without the encoding tax. Worth reaching for when you control both ends of the pipe.
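The size differences are easy to compare for a concrete value. The sketch below uses 32 bytes (the size of a SHA-256 digest) and Node's Buffer; since Node has no built-in Base32, that length is computed from the padded RFC 4648 rule of 8 characters per 5 bytes rather than measured.

```javascript
const bytes = Buffer.alloc(32, 0xab); // 32 arbitrary bytes

const hexChars = bytes.toString("hex").length;       // 2 chars per byte
const base64Chars = bytes.toString("base64").length; // 4 chars per 3 bytes, padded
// Computed, not measured: padded Base32 emits 8 chars per 5-byte group.
const base32Chars = 8 * Math.ceil(bytes.length / 5);

console.log({ hexChars, base32Chars, base64Chars });
// { hexChars: 64, base32Chars: 56, base64Chars: 44 }
```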

Quick reference

| Task | Tool |
|------|------|
| Encode a string or file to Base64 | Base64 encoder |
| Decode Base64 back to text | Base64 decoder |
| Convert an image to a data: URL | Image to Base64 |
| Render a data: URL back to an image | Base64 to image |
| Inspect a JWT (Base64URL payload) | JWT decoder |

Every one of these runs entirely in your browser — your input is never sent to a server. That matters when the thing you're decoding turns out to be a session token or an internal API response that you should never have pasted into a third-party site.

TL;DR

Base64 is a binary-to-text encoding, not a security primitive. Use it when the channel you're shipping bytes through can only carry text — JSON bodies, URLs, headers, email. Expect a ~33% size tax, recovered partially by HTTP compression. Use the URL-safe variant inside tokens and query strings. Never treat the encoded string as private, never use it as a cache key without canonicalising, and never trust the decoded output.

Get those four things right and Base64 is invisible plumbing. Get them wrong and it's a silent bug waiting for the next pentest.
