Complete Guide to JSON: Validation, Formatting, and Best Practices
Master JSON handling with our comprehensive guide covering validation techniques, formatting best practices, and advanced optimization strategies for web developers.
JSON looks deceptively simple. Curly braces, quoted keys, a handful of value types, and a strict grammar that fits on a single page. Yet the volume of production incidents caused by JSON — trailing commas in config files, silently-truncated number precision, schemas drifting away from the data they're meant to describe — is enormous. Every senior backend engineer has at least one war story that begins with a missing closing brace.
This guide is the working developer's reference. We'll cover the syntax precisely (including the parts the textbooks gloss over), the validation strategies that actually scale beyond JSON.parse, the formatting rules that keep diffs reviewable, and the advanced patterns — streaming, partial parsing, schema evolution — that production systems rely on.
Understanding JSON Fundamentals
JSON is specified by RFC 8259 and the equivalent ECMA-404 standard. It defines six value types — object, array, string, number, boolean, null — and that is the entire data model. Nothing else is allowed.
What surprises people most often is what JSON doesn't have:
- No date type. Dates are strings, by convention ISO 8601 ("2024-01-10T14:32:00Z"), but that's only a convention.
- No binary type. Binary blobs are encoded as Base64 strings; see our Base64 encoding guide for the size and security trade-offs.
- No comments. This is the single most common cause of "invalid JSON" errors in hand-written config files.
- No trailing commas. [1, 2, 3,] is not valid JSON. Many parsers tolerate it; the spec does not.
- No undefined. Only null. JavaScript's JSON.stringify silently drops object properties whose value is undefined (and writes null for undefined array elements), a frequent source of "missing field" bugs.
- No NaN, Infinity, or -Infinity. JSON.stringify serialises them as null. Roundtripping numeric data with these values requires a custom replacer and a matching reviver.
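A quick way to see these gaps is to round-trip a value through JSON.stringify and JSON.parse. The sketch below uses a made-up profile object (every field name is illustrative, not taken from a real API) and relies only on built-in behaviour:

```javascript
// Hypothetical input: every field name here is illustrative.
const profile = {
  name: "Ada",
  nickname: undefined,                                 // dropped entirely
  score: NaN,                                          // becomes null
  limit: Infinity,                                     // becomes null
  joined: new Date(Date.UTC(2024, 0, 10, 14, 32, 0)),  // becomes an ISO 8601 string
};

const wire = JSON.stringify(profile);
console.log(wire);
// {"name":"Ada","score":null,"limit":null,"joined":"2024-01-10T14:32:00.000Z"}

const roundTripped = JSON.parse(wire);
console.log("nickname" in roundTripped);  // false
console.log(typeof roundTripped.joined);  // "string", not a Date

// Inside arrays, undefined cannot be dropped, so it is written as null.
const arrayWire = JSON.stringify([1, undefined, 3]);
console.log(arrayWire);                   // [1,null,3]
```

Nothing in the output tells the reader that joined was once a Date or that score was once NaN, so any such meaning has to be restored by the consumer.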
JSON Syntax and Structure
The grammar is small enough to recite from memory:
value = object | array | string | number | true | false | null
object = "{" [ string ":" value { "," string ":" value } ] "}"
array = "[" [ value { "," value } ] "]"
string = '"' { char } '"' // chars escaped per RFC 8259
number = [-] int [frac] [exp] // no leading + sign, no leading zeros
A few exact rules that matter in practice:
- Object keys are always strings. {1: "x"} is invalid; you need {"1": "x"}. JavaScript's object literal syntax allows unquoted keys, JSON does not.
- Strings use double quotes only. Single quotes are invalid JSON, even though they're legal JavaScript.
- Numbers cannot have leading zeros, and the integer part is mandatory. 0.5 is fine; 007 and .5 are not.
- Numbers cannot be written as +5. Only the negative sign is allowed in front.
- Whitespace between tokens is free. Whitespace inside a string is significant, and whitespace inside a number is a syntax error.
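These rules are easy to confirm against a strict parser. The following sketch wraps JSON.parse in a hypothetical isValidJson helper (the name is ours) and runs it over one sample per rule:

```javascript
// Returns true when the text is syntactically valid JSON, false otherwise.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// One sample per rule: [input, what we expect a strict parser to say].
const samples = [
  ['{"1": "x"}', true],             // quoted key
  ['{1: "x"}', false],              // unquoted key
  ["{'a': 1}", false],              // single quotes
  ["0.5", true],
  [".5", false],                    // missing integer part
  ["007", false],                   // leading zeros
  ["-5", true],
  ["+5", false],                    // leading plus sign
  ["1 2", false],                   // whitespace inside a number
  ["[1, 2, 3,]", false],            // trailing comma
  ['{"a": 1 /* note */}', false],   // comment
  [' { "a" : [ 1 , 2 ] } ', true],  // whitespace between tokens
];

for (const [text, expected] of samples) {
  console.log(JSON.stringify(text), isValidJson(text) === expected ? "as expected" : "SURPRISE");
}
```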
Most parsing errors come from JavaScript-ish habits sneaking into JSON. The JSON formatter flags the exact line and column of the first parser error, which is faster than reading a stack trace.
The number-precision problem (the bug everyone hits eventually)
JSON numbers have no defined precision. JavaScript parses every JSON number into a 64-bit IEEE 754 double, which can exactly represent integers up to 2^53 − 1 (9_007_199_254_740_991). Anything larger silently loses precision:
JSON.parse('{"id":9007199254740993}'); // { id: 9007199254740992 } ← off by one
If you ship Postgres BIGINT IDs, Twitter snowflake IDs, or any 64-bit integer across a JSON boundary, this bug is waiting for you. The standard workaround is to serialise large IDs as strings ("id": "9007199254740993"), which is what most public APIs with 64-bit IDs do. Alternatively, use a parser that can return BigInt for large integers, such as the json-bigint package or, in runtimes that implement the TC39 "JSON.parse source text access" proposal, a reviver that reads the raw digits from its context argument.
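A minimal sketch of the string-ID workaround, assuming the producer controls the wire format. The toWire helper is a hypothetical name; it uses a replacer because JSON.stringify throws a TypeError when it meets a BigInt directly:

```javascript
// Detecting the problem: the parsed value is no longer a safe integer.
const lossy = JSON.parse('{"id":9007199254740993}');
console.log(Number.isSafeInteger(lossy.id));  // false

// Consumer side: the ID travels as a string and is widened to BigInt on demand.
const safe = JSON.parse('{"id":"9007199254740993"}');
const id = BigInt(safe.id);
console.log(id === 9007199254740993n);        // true

// Producer side: convert BigInt values to strings while serialising.
function toWire(value) {
  return JSON.stringify(value, (_key, v) => (typeof v === "bigint" ? v.toString() : v));
}

const wireId = toWire({ id: 9007199254740993n });
console.log(wireId);                          // {"id":"9007199254740993"}
```

Number.isSafeInteger is also a cheap guard to run on incoming numeric IDs when you cannot change the producer.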
Validation Techniques
There are three layers of JSON validation, and most production bugs come from confusing them:
| Layer | Question it answers | Tool |
|-------|---------------------|------|
| Syntax | Is this valid JSON at all? | JSON.parse, language-native parsers |
| Structure | Does it have the expected shape? | JSON Schema, TypeScript types, Zod, ajv |
| Semantics | Is the data internally consistent? | Custom business-logic validators |
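To make the three layers concrete, here is a hand-rolled sketch for a hypothetical booking payload (checkIn, checkOut, and guests are invented field names). In a real project the structure layer would normally be a schema validator; it is written out by hand here only to keep the example dependency-free:

```javascript
function validateBooking(text) {
  // Layer 1, syntax: is this JSON at all?
  let doc;
  try {
    doc = JSON.parse(text);
  } catch (err) {
    return { ok: false, layer: "syntax", error: err.message };
  }

  // Layer 2, structure: does it have the expected shape?
  const isObject = doc !== null && typeof doc === "object" && !Array.isArray(doc);
  if (
    !isObject ||
    typeof doc.checkIn !== "string" ||
    typeof doc.checkOut !== "string" ||
    !Number.isInteger(doc.guests)
  ) {
    return { ok: false, layer: "structure", error: "expected { checkIn: string, checkOut: string, guests: integer }" };
  }

  // Layer 3, semantics: is the data internally consistent?
  const checkIn = Date.parse(doc.checkIn);
  const checkOut = Date.parse(doc.checkOut);
  if (Number.isNaN(checkIn) || Number.isNaN(checkOut) || checkOut <= checkIn || doc.guests < 1) {
    return { ok: false, layer: "semantics", error: "checkOut must be after checkIn, with at least one guest" };
  }

  return { ok: true, value: doc };
}
```

Reporting which layer rejected the input makes it clear whether the fix belongs in the sender's serialiser, the contract, or the business logic.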
Syntax validation
JSON.parse(input) is the canonical syntax check. If it throws, the input isn't JSON. It's worth wrapping every untrusted parse in a try/catch and surfacing a specific error:
function parseJson(input) {
  try {
    return { ok: true, value: JSON.parse(input) };
  } catch (err) {
    // In V8 (Node, Chrome) err.message usually includes "at position N",
    // which is useful for editor highlighting; other engines word it differently.
    return { ok: false, error: err.message };
  }
}
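If you want a line and column rather than a raw character offset, the offset has to be mapped back onto the input. The sketch below is one way to do that; the regular expression assumes a V8-style message containing "position N", which is engine-specific, so the location is null whenever no offset can be found. Both function names are our own:

```javascript
// Convert a 0-based character offset into a 1-based line and column.
function offsetToLineColumn(text, offset) {
  const lines = text.slice(0, offset).split("\n");
  return { line: lines.length, column: lines[lines.length - 1].length + 1 };
}

// Returns null for valid JSON, otherwise the message plus a best-effort location.
function locateJsonError(input) {
  try {
    JSON.parse(input);
    return null;
  } catch (err) {
    // Assumption: the engine reports "position N" in the message (V8 usually does).
    const match = /position (\d+)/.exec(err.message);
    return {
      message: err.message,
      location: match ? offsetToLineColumn(input, Number(match[1])) : null,
    };
  }
}

console.log(locateJsonError('{\n  "a": 1,\n}'));
```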
For ad-hoc debugging, paste the payload into the JSON formatter — it processes locally in your browser, highlights syntax errors with exact line and column numbers, and pretty-prints the result so you can spot the structural mistake.
Structural validation with JSON Schema
JSON.parse succeeding only tells you the bytes are valid JSON. It tells you nothing about whether the parsed object has the fields, types, or constraints your code expects. That is what JSON Schema exists for.
A minimal user schema:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["id", "email"],
  "properties": {
    "id": { "type": "string", "pattern": "^u_[a-zA-Z0-9]{12}$" },
    "email": { "type": "string", "format": "email" },
    "age": { "type": "integer", "minimum": 13 }
  },
  "additionalProperties": false
}
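In Node, validate it with Ajv, one of the fastest JSON Schema validators in the ecosystem. Because the schema declares draft 2020-12, import Ajv's 2020 build; the default export targets draft-07 and rejects the "$schema" value above:
import Ajv2020 from "ajv/dist/2020";
import addFormats from "ajv-formats";

// userSchema is the schema object shown above; data is the parsed payload.
const ajv = addFormats(new Ajv2020({ allErrors: true }));
const validate = ajv.compile(userSchema);
if (!validate(data)) {
  // Each entry carries the failing path and the violated keyword.
  console.error(validate.errors);
}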
Rather than writing the schema by hand from a sample payload, let the JSON schema generator produce a clean starting point, then tighten it with constraints (required fields, patterns, formats).
Type-system validation
If you're working in TypeScript, the most ergonomic option is to define a runtime validator with Zod or Valibot and infer the static type from it. The type and the validator stay in sync because they're the same definition:
import { z } from "zod";

const User = z.object({
  id: z.string().regex(/^u_[a-zA-Z0-9]{12}$/),
  email: z.string().email(),
  age: z.number().int().min(13).optional(),
});
type User = z.infer<typeof User>;

// safeParse never throws: { success: true, data } or { success: false, error }
const result = User.safeParse(payload);
If you already have a JSON sample but no types, the JSON to TypeScript converter generates an interface from the payload, which you can then port to Zod for runtime checking.
Cross-format validation: YAML, CSV, JWT
A surprising amount of "JSON validation" work is actually format-translation. A few common cases:
- YAML configs. YAML is a superset of JSON; if your CI pipeline reads YAML and your app reads JSON, drift is inevitable. The YAML validator catches indentation and anchor errors before they ship; the JSON ↔ YAML converter makes round-trips painless.
- CSV imports. Spreadsheets are the most common upstream of dirty data. The CSV ↔ JSON converter normalises encodings and produces validatable JSON in one step.
- JWTs. A JWT payload is a Base64URL-encoded JSON object. Decoding it without verifying the signature is a security mistake — but inspecting the claims is a routine debugging task. Use the JWT decoder.
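For the JWT case, inspecting the claims takes only a few lines. The sketch below decodes the middle segment and deliberately does not verify the signature, so its output is suitable for debugging only and must never be used for authorization decisions. decodeJwtPayload is a hypothetical helper name; it assumes a runtime with global atob and TextDecoder (browsers and current Node.js):

```javascript
function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("expected three dot-separated segments: header.payload.signature");
  }
  // Base64URL -> Base64: swap the URL-safe characters and restore the padding.
  const base64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64.padEnd(Math.ceil(base64.length / 4) * 4, "=");
  // atob yields one character per byte; decode those bytes as UTF-8.
  const bytes = Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}
```

Because the payload is ordinary JSON once decoded, the same syntax and structure checks apply to it as to any other untrusted input.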
Formatting Best Practices
Formatting is where reasonable engineers disagree the most, and where most of that disagreement doesn't matter. What does matter:
- Pick a style. Enforce it with a tool. Prettier, jq, dprint, json-stable-stringify: any of them work as long as everyone on the team uses the same one. Manually formatting JSON in PRs is dead time.
- 2-space indentation is the de facto default. GitHub, npm, VS Code, and most public APIs use it.
- Sort keys when the JSON is data, not config. Sorted keys produce stable diffs and reproducible hashes. Use JSON.stringify(obj, Object.keys(obj).sort(), 2) only for flat objects: the array replacer is an allow-list applied at every nesting level, so nested keys that are not in it are silently dropped. For nested data, use json-stable-stringify.
- Strip whitespace for production payloads. A minified JSON response typically saves 20–40% over the indented version on the wire. Most HTTP frameworks (Express, Fastify, FastAPI) emit compact JSON by default.
- Use a stable line-ending convention. LF, not CRLF: every JSON tool and parser handles LF; not all handle CRLF cleanly.
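The key-sorting advice is easy to get subtly wrong, so here is a dependency-free sketch. sortKeysDeep and stableStringify are hypothetical helpers that assume plain JSON-compatible data (for example the result of JSON.parse); they are not a substitute for a maintained library:

```javascript
// Pitfall: an array replacer is an allow-list for every level, not a sort order.
const pitfall = JSON.stringify({ b: 1, a: { d: 2 } }, ["a", "b"]);
console.log(pitfall);  // {"a":{},"b":1}  (the nested key "d" is gone)

// Recursively rebuild objects with their keys in sorted order.
// Assumption: the value contains only plain objects, arrays, and primitives.
function sortKeysDeep(value) {
  if (Array.isArray(value)) return value.map(sortKeysDeep);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.keys(value).sort().map((key) => [key, sortKeysDeep(value[key])])
    );
  }
  return value;
}

function stableStringify(value, indent = 2) {
  return JSON.stringify(sortKeysDeep(value), null, indent);
}

const stable = stableStringify({ b: 1, a: { d: 2, c: 3 } }, 0);
console.log(stable);   // {"a":{"c":3,"d":2},"b":1}
```

One caveat: JavaScript engines always enumerate integer-like keys such as "10" before other string keys, so output ordering for those keys follows numeric order regardless of the sort.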
Diffing JSON
JSON diffs in git are notoriously noisy: re-ordered keys, added trailing commas (in JSON5/YAML), or a single deeply-nested change all produce visually huge diffs. Two strategies make this manageable:
- Normalise before committing. Sort keys and apply consistent indentation, so the only diffs are semantic changes.
- Use a structural differ. git diff is line-based and doesn't understand JSON. The JSON diff tool computes an object-level diff (added / removed / changed by path), which is what you actually want during code review. For general text diffs, diff-checker handles arbitrary content side-by-side.
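To illustrate what "object-level diff by path" means, the sketch below compares two parsed documents and reports added, removed, and changed paths. It is a simplified illustration, not how any particular tool works: arrays are compared index by index, and paths are joined with "/" without the escaping that JSON Pointer requires. diffJson is a hypothetical name:

```javascript
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
const isContainer = (v) => v !== null && typeof v === "object";

function diffJson(before, after, path = "") {
  if (Object.is(before, after)) return [];
  // Primitives, or an object on one side and an array on the other: report a change.
  if (!isContainer(before) || !isContainer(after) || Array.isArray(before) !== Array.isArray(after)) {
    return [{ path: path || "/", kind: "changed", before, after }];
  }
  const changes = [];
  const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
  for (const key of keys) {
    const childPath = `${path}/${key}`;
    if (!hasOwn(after, key)) {
      changes.push({ path: childPath, kind: "removed", before: before[key] });
    } else if (!hasOwn(before, key)) {
      changes.push({ path: childPath, kind: "added", after: after[key] });
    } else {
      changes.push(...diffJson(before[key], after[key], childPath));
    }
  }
  return changes;
}

const changes = diffJson(
  { a: 1, b: { c: 2, d: 3 } },
  { b: { d: 3, c: 5 }, e: true }
);
console.log(changes.map((c) => `${c.kind} ${c.path}`));
// [ 'removed /a', 'changed /b/c', 'added /e' ]
```

Note that re-ordering keys produces no entries at all, which is exactly the noise a line-based diff cannot filter out.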
Advanced JSON Handling
Streaming and partial parsing
JSON.parse is a one-shot operation: it needs the entire input in memory before producing any output. For very large inputs (say, a multi-GB log export, or an API response with 100k records), this is a non-starter: peak memory usage can reach 2–3× the file size.
The standard workaround is line-delimited JSON (NDJSON / JSON Lines):
{"id":1,"event":"login"}
{"id":2,"event":"logout"}
{"id":3,"event":"login"}
Each line is an independent, complete JSON document. You stream the file line-by-line, parse each line individually, and process records as they arrive — bounded memory regardless of file size. Most log shippers (Fluent Bit, Logstash), analytics exports (BigQuery, Snowflake), and search engines (Elasticsearch bulk API) speak NDJSON natively.
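The line-by-line approach can be sketched as a small push parser that works with any source of text chunks. createNdjsonParser is a hypothetical helper; it assumes the chunks are already decoded strings (for example from a stream with its encoding set to UTF-8), since splitting raw bytes could cut a multi-byte character in half:

```javascript
function createNdjsonParser(onRecord) {
  let buffer = "";  // holds the trailing, not-yet-complete line

  const emitLine = (line) => {
    const trimmed = line.trim();  // also removes a trailing "\r" from CRLF input
    if (trimmed !== "") onRecord(JSON.parse(trimmed));
  };

  return {
    // Feed the next chunk; every complete line in it is parsed immediately.
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop();  // the last piece is incomplete (or empty)
      lines.forEach(emitLine);
    },
    // Call once at end of input to flush a final line without a newline.
    end() {
      emitLine(buffer);
      buffer = "";
    },
  };
}

// Usage: chunk boundaries do not have to line up with record boundaries.
const records = [];
const parser = createNdjsonParser((record) => records.push(record));
parser.push('{"id":1,"event":"login"}\n{"id":2,');
parser.push('"event":"logout"}\n{"id":3,"event":"login"}');
parser.end();
console.log(records.length);  // 3
```

Memory use is bounded by the longest single line rather than by the size of the file.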
For genuinely streaming JSON (a single huge document received progressively), event-based parsers such as stream-json, clarinet, and jsonparse emit tokens as the bytes arrive. They're more complex than JSON.parse, but they become the practical choice above ~100 MB.
Canonical JSON for hashing and signing
If you sign or hash JSON payloads (HMAC for webhook verification, content-addressable storage, blockchain), you need a canonical encoding — one where the same logical object always produces the same bytes. The standard is RFC 8785 (JCS): sort object keys lexicographically, normalise numeric representation, and use a fixed UTF-8 string-escape policy.
Don't roll your own canonicaliser. The smallest mismatch — a difference in how 1e2 vs 100 are encoded, or whether the / character is escaped — produces a different hash, and signature verification silently fails.
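The mismatch is easy to reproduce with nothing but JSON.stringify. This sketch shows two logically identical payloads that serialise to different text, which is why their hashes would differ too (the field names are illustrative):

```javascript
// Same logical object, different key insertion order.
const fromProducer = JSON.stringify({ amount: 100, currency: "EUR" });
const fromConsumer = JSON.stringify({ currency: "EUR", amount: 100 });
console.log(fromProducer === fromConsumer);  // false

// Same logical number, different textual encoding.
const plainForm = '{"n":100}';
const exponentForm = '{"n":1e2}';
console.log(JSON.parse(plainForm).n === JSON.parse(exponentForm).n);  // true
console.log(plainForm === exponentForm);                              // false
```

For webhook verification this usually means computing the HMAC over the raw request body exactly as received, rather than over a re-serialised copy of the parsed object.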
Schema evolution
Production schemas change. The pattern that works:
- Treat every field as optional in your reader, and ignore fields you don't recognise. Old clients reading new payloads must not crash on unknown fields.
- Never repurpose a field. If you change the meaning of status from boolean to enum, give the new field a new name (status_v2) and deprecate the old one over a release cycle.
- Use additionalProperties: true during transitions. Tighten it back to false once you're confident all producers have migrated.
- Version the envelope, not the payload. A single "$schemaVersion": 3 at the top of the document is cheaper to evolve than versioning every nested field.
These rules apply equally to internal microservice contracts, external public APIs, and persisted state in databases that store JSON columns.
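A tolerant reader that follows these rules might look like the sketch below. The order document, its field names, and the mapping from the old boolean status to the new enum are all invented for illustration:

```javascript
// Hypothetical mapping from the deprecated boolean field to the new enum.
function legacyStatus(flag) {
  if (flag === true) return "active";
  if (flag === false) return "inactive";
  return "unknown";
}

// Reads any version of the document; only known fields are copied out,
// so unknown fields added by newer producers are ignored rather than fatal.
function readOrder(doc) {
  return {
    version: doc.$schemaVersion ?? 1,  // documents written before versioning count as v1
    id: String(doc.id ?? ""),
    status: doc.status_v2 ?? legacyStatus(doc.status),  // prefer the new field when present
    note: doc.note ?? null,            // optional field with an explicit default
  };
}

console.log(readOrder({ id: 7, status: true }));
console.log(readOrder({ $schemaVersion: 3, id: "8", status: false, status_v2: "suspended", giftWrap: true }));
```

The reader never branches on the version number for individual fields; the envelope version is carried along for logging and for the rare case where a whole-document migration is needed.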
TL;DR
JSON's value is its strictness. Honour the grammar (no comments, no trailing commas, double-quoted strings, no undefined), validate in three layers (syntax → structure → semantics), normalise format and key order before committing, and reach for NDJSON or schema versioning before you reach for "let's just put it all in one big object". Most of the bugs people blame on JSON are bugs in how they're using it.
Pair this with the right tooling — the JSON formatter, schema generator, TypeScript generator, and diff — and JSON stays what it was supposed to be: boring, predictable plumbing.