Web Development

JSON vs YAML vs XML: When to Use Each (with Conversion Examples)

JSON for APIs and data interchange, YAML for human-edited config, XML for documents and enterprise. The exact tradeoffs, footguns, and conversion gotchas.

MM H Tawfik · 11 min read

Three formats carry almost all structured text on the internet, and developers waste an absurd amount of time arguing about which one is "best." Wrong question. They were designed for different jobs, and using the wrong one is how you end up with whitespace-sensitive config that breaks on a stray tab, or a payload several times larger than it needs to be because someone wrapped a REST API in XML.

This guide draws the lines exactly: what each format is specified to do, the footguns each one ships with, and the same data shown in all three so you can see the verbosity tradeoff with your own eyes.

TL;DR: JSON vs YAML vs XML — which should you use?

Use JSON for APIs and data interchange. It is compact, unambiguous, parses fast everywhere, and is the lingua franca of the web (RFC 8259). Use YAML for config a human edits by hand (CI pipelines, Kubernetes, Docker Compose) because it is the most readable, but know that it is whitespace-sensitive and full of type-coercion footguns (YAML 1.2). Use XML for documents, legacy/enterprise systems, and anything needing mixed content, namespaces, or rich schema (W3C XML 1.0). It is verbose, but neither JSON nor YAML handles marked-up prose.

The one-line decision rule: machine-to-machine → JSON; human-to-machine → YAML; document-or-enterprise → XML.

The same data in all three formats

Here is one small record — a service config with a list and a nested object — written identically in each format. Read top to bottom and the verbosity/readability tradeoff is obvious.

JSON (compact, ubiquitous, no comments):

{
  "service": "auth-api",
  "port": 8080,
  "enabled": true,
  "regions": ["us-east", "eu-west"],
  "limits": {
    "rps": 500,
    "burst": 1000
  }
}

YAML (the most readable for humans, but indentation is the syntax):

service: auth-api
port: 8080
enabled: true
regions:
  - us-east
  - eu-west
limits:
  rps: 500
  burst: 1000

XML (verbose; element values sit between opening and closing tags, and every value is a string):

<service name="auth-api" port="8080" enabled="true">
  <regions>
    <region>us-east</region>
    <region>eu-west</region>
  </regions>
  <limits rps="500" burst="1000"/>
</service>

The XML version is the longest and the JSON version is the most rigid; the YAML version is the easiest to scan but the most fragile (dedent burst by two spaces and it silently becomes a top-level sibling of limits instead of its child). That fragility is the entire story of when to use which.
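
One consequence of the comparison above can be checked directly. The following is a minimal sketch in Python (standard library only; the variable names are illustrative) that parses the JSON and XML records shown earlier. It suggests that JSON hands back native types, while XML attributes and element text arrive as strings and need explicit conversion.

```python
import json
import xml.etree.ElementTree as ET

json_text = """
{"service": "auth-api", "port": 8080, "enabled": true,
 "regions": ["us-east", "eu-west"],
 "limits": {"rps": 500, "burst": 1000}}
"""

xml_text = """
<service name="auth-api" port="8080" enabled="true">
  <regions>
    <region>us-east</region>
    <region>eu-west</region>
  </regions>
  <limits rps="500" burst="1000"/>
</service>
"""

record = json.loads(json_text)
root = ET.fromstring(xml_text.strip())

# JSON carries the type with the value.
json_port = record["port"]            # an int

# XML attributes are always text; the caller decides how to convert them.
xml_port = root.get("port")           # a str
xml_regions = [region.text for region in root.iter("region")]
xml_limits = {key: int(value) for key, value in root.find("limits").attrib.items()}
```

After explicit conversion the two records agree, but the conversion rules live in your code rather than in the format.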

JSON: the data interchange default

JSON is the right answer whenever two programs talk to each other. It is specified by RFC 8259 and ECMA-404, which define the same grammar, and its data model is exactly six types: object, array, string, number, boolean, and null. Nothing else exists.

What trips people up is what JSON deliberately omits:

  • No comments. This is the number-one cause of "invalid JSON" in hand-written files. If you want comments, you want YAML (or JSON5, which is not JSON).
  • No trailing commas. [1, 2, 3,] is invalid. Some lenient parsers tolerate it; the spec and strict parsers such as JSON.parse do not.
  • No date type. Dates are strings, ISO 8601 by convention only.
  • No undefined, NaN, or Infinity. JSON.stringify omits object properties whose value is undefined (inside arrays they become null) and turns NaN/Infinity into null.
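
The omissions listed above can be exercised with a strict parser. This is a minimal sketch using Python's standard json module; is_strict_json and dumps_strict are hypothetical helper names, not part of any library. Note that Python's parser accepts NaN and Infinity by default, so the sketch rejects them explicitly through the parse_constant hook.

```python
import json


def _reject_constant(name: str):
    # json.loads calls this hook for NaN, Infinity, and -Infinity.
    raise ValueError(f"{name} is not valid JSON")


def is_strict_json(text: str) -> bool:
    """Return True only if text parses as spec-compliant JSON."""
    try:
        json.loads(text, parse_constant=_reject_constant)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def dumps_strict(value) -> str:
    """Serialize, raising ValueError instead of emitting NaN/Infinity."""
    return json.dumps(value, allow_nan=False)


checks = {
    "comment": is_strict_json('{\n  // port to listen on\n  "port": 8080\n}'),
    "trailing_comma": is_strict_json("[1, 2, 3,]"),
    "nan": is_strict_json('{"ratio": NaN}'),
    "plain": is_strict_json('{"port": 8080, "tags": [1, 2, 3]}'),
}
```

Only the "plain" entry comes back True; the other three inputs are rejected.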

The trap that bites production hardest is number precision. JSON numbers have no defined precision, and JavaScript parses every one into a 64-bit IEEE 754 double, in which integers are only guaranteed exact up to 2^53 − 1 (9007199254740991, Number.MAX_SAFE_INTEGER). Larger integers can silently lose precision:

JSON.parse('{"id":9007199254740993}'); // { id: 9007199254740992 }  ← off by one

If you ship Postgres BIGINT IDs, Twitter snowflake IDs, or any 64-bit integer over JSON, serialize them as strings ("id": "9007199254740993"), which is what many public APIs do (Twitter's API, for example, returns id_str alongside the numeric id). We cover this trap in depth in the JSON validation and formatting guide.
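
The same trap can be reproduced outside the browser. The sketch below uses Python, whose integers are arbitrary precision, so the loss only appears once the value passes through a double, which is how a JavaScript consumer would store it. The variable names are illustrative.

```python
import json

big_id = 9007199254740993  # 2**53 + 1, one past the safe-integer limit

# Python keeps the integer exact through a JSON round-trip...
exact = json.loads(json.dumps({"id": big_id}))["id"]

# ...but any consumer that stores numbers as IEEE 754 doubles does not.
as_double = int(float(big_id))

# Sending the ID as a string sidesteps the number type entirely.
payload = json.dumps({"id": str(big_id)})
restored = int(json.loads(payload)["id"])
```

The point of the string form is that it stays correct regardless of which language sits on the other end of the wire.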

To pretty-print, validate, and find the exact line of a syntax error, use the JSON formatter; to compare two payloads structurally rather than line-by-line, use the JSON diff.

YAML: the human-edited config format

YAML wins when a person edits the file by hand. It is the most readable of the three: no braces, no closing tags, comments with #, and indentation that mirrors the data shape. That readability is why Kubernetes, GitHub Actions, Docker Compose, and Ansible all chose it. The current spec is YAML 1.2, which was designed as a superset of JSON, so in practice any valid JSON file is also valid YAML 1.2.

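YAML's killer feature for large configs is anchors and aliases, which let you define a block once and reuse it. The example below also uses the << merge key, a YAML 1.1 extension that is not part of the 1.2 core spec but that many parsers still support: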

defaults: &defaults
  retries: 3
  timeout: 30

staging:
  <<: *defaults
  host: staging.example.com

production:
  <<: *defaults
  host: example.com
  retries: 5   # overrides the merged default
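
To make the merge semantics concrete, here is a rough Python equivalent of what a parser that supports the << merge key would hand back for the config above. This is a sketch of the resulting data, not of any parser's internals: the anchored mapping is copied in, and keys written explicitly in the same mapping take precedence.

```python
import json

defaults = {"retries": 3, "timeout": 30}

# `<<: *defaults` behaves like unpacking the anchored mapping first,
# then applying the keys written next to it.
staging = {**defaults, "host": "staging.example.com"}
production = {**defaults, "host": "example.com", "retries": 5}

config = {"defaults": defaults, "staging": staging, "production": production}

# Converting to JSON shows the aliases expanded inline: the reuse is gone.
as_json = json.dumps(config, indent=2)
```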

But YAML's readability is paid for with real footguns:

  • Indentation is the syntax. Misalign one space and the structure changes silently — there is no closing brace to catch the mistake.
  • Tabs are illegal for indentation. The spec forbids them outright; a tab used to indent a line is a parse error, and editors that insert tabs cause maddening bugs.
  • The Norway problem. YAML 1.1 coerces unquoted no, yes, off, on, false, true (plus y, n, and their capitalized variants) into booleans. So a list of country codes containing NO (Norway) becomes false:

# Looks fine. Is a bug.
countries:
  - SE
  - NO   # ← parsed as boolean false, not the string "NO"
  - DK

The fix is always: quote your scalars when the value could be misread ("NO", "3.10", "on"); unquoted, 3.10 is read as the float 3.1. YAML 1.2 narrowed boolean coercion to only true/false, but many widely used parsers (PyYAML among them) and a lot of tooling still apply 1.1 semantics, so do not rely on the version saving you.
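
Two of the footguns above (tabs in indentation and YAML 1.1 boolean words) can be caught with a very small pre-commit check. The sketch below is a naive line scanner, not a YAML parser: lint_yaml_text is a hypothetical helper, it ignores multi-line scalars and flow syntax, and it does not handle a # inside quotes. The word list follows the YAML 1.1 boolean type, minus true/false, which are usually intentional.

```python
import re
from typing import List

# Unquoted words that YAML 1.1 resolves to a boolean (true/false left out).
AMBIGUOUS_BOOL = re.compile(r"^(y|Y|yes|Yes|YES|n|N|no|No|NO|on|On|ON|off|Off|OFF)$")


def lint_yaml_text(text: str) -> List[str]:
    warnings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            warnings.append(f"line {lineno}: tab used for indentation")

        body = line.split(" #", 1)[0].strip()   # drop a trailing comment (naive)
        if body.startswith("- "):               # list item
            body = body[2:].strip()
        if ": " in body:                        # key: value, keep the value
            body = body.split(": ", 1)[1].strip()

        if AMBIGUOUS_BOOL.match(body):
            warnings.append(
                f"line {lineno}: unquoted {body!r} is a boolean under YAML 1.1; quote it"
            )
    return warnings


sample = "countries:\n  - SE\n  - NO   # Norway\n  - DK\n"
sample_warnings = lint_yaml_text(sample)
```

Running it on the country list flags only the NO entry on line 3; quoting the value as "NO" makes the warning go away.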

To catch indentation and anchor errors before they ship, run files through the YAML validator. To move config between the two worlds — a YAML pipeline reading a JSON-derived value, or vice versa — the JSON ↔ YAML converter handles the round-trip.

XML: documents, namespaces, and enterprise legacy

XML is the verbose one, and that verbosity buys capabilities the other two structurally cannot match. It is defined by W3C XML 1.0, and unlike JSON or YAML it was designed for marked-up documents, not just data records — which is why it is still the backbone of office formats and publishing.

The first thing to understand is that XML has two ways to attach a value: child elements and attributes.

<!-- value as an element -->
<book>
  <title>The Pragmatic Programmer</title>
</book>

<!-- value as an attribute -->
<book title="The Pragmatic Programmer"/>

The loose convention: attributes for metadata/identifiers, elements for the data itself. There is no universal rule, which is exactly the impedance mismatch that makes XML↔JSON conversion lossy (more below).
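
The two shapes are read through different calls, which is where the mismatch starts to matter in code. A minimal sketch with Python's xml.etree.ElementTree follows; book_title is a hypothetical helper that tolerates either shape.

```python
import xml.etree.ElementTree as ET

as_element = ET.fromstring("<book><title>The Pragmatic Programmer</title></book>")
as_attribute = ET.fromstring('<book title="The Pragmatic Programmer"/>')

# Child element: look up the child and read its text.
title_from_element = as_element.findtext("title")

# Attribute: read it from the element itself.
title_from_attribute = as_attribute.get("title")


def book_title(book: ET.Element):
    """Return the title whether it was stored as an attribute or a child element."""
    return book.get("title") or book.findtext("title")
```

A consumer that only handles one shape will silently get None for the other, so it helps to decide the convention once per vocabulary.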

XML's genuinely unique features:

  • Namespaces. xmlns prefixes let two vocabularies coexist in one document without collision — essential when you merge schemas from different vendors.
  • Mixed content. XML can interleave text and markup (<p>Hello <b>world</b>!</p>). JSON and YAML have no native way to express prose with inline tags. This is the reason document formats use XML.
  • CDATA sections. <![CDATA[ ... ]]> holds raw text, including <, >, and &, without escaping (the only sequence it cannot contain is ]]>), which makes it handy for embedding code or HTML.
  • Rich schema and transformation. XSD provides a mature type system (built-in data types, cardinality, type derivation) that in some areas goes beyond what JSON Schema expresses, and XSLT transforms one XML document into another, into HTML, or into text.
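
Three of the features above (mixed content, CDATA, and namespaces) can be seen through Python's xml.etree.ElementTree. This is a small sketch; the urn:example:... namespace URIs and the order document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Mixed content: text before a child lives in .text, text after it in the child's .tail.
p = ET.fromstring("<p>Hello <b>world</b>!</p>")
b = p.find("b")
mixed_parts = (p.text, b.text, b.tail)
flat_text = "".join(p.itertext())

# CDATA: the parser returns the raw characters, no entity escaping needed in the source.
snippet = ET.fromstring("<code><![CDATA[if (a < b && b > c) {}]]></code>")
cdata_text = snippet.text

# Namespaces: two vocabularies both define <id> without colliding.
doc = ET.fromstring(
    '<order xmlns:a="urn:example:vendor-a" xmlns:b="urn:example:vendor-b">'
    "<a:id>1</a:id><b:id>X-9</b:id>"
    "</order>"
)
ns = {"a": "urn:example:vendor-a", "b": "urn:example:vendor-b"}
vendor_a_id = doc.findtext("a:id", namespaces=ns)
vendor_b_id = doc.findtext("b:id", namespaces=ns)
```

The text/tail split is exactly the part that has no natural counterpart in a JSON object or a YAML mapping.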

When XML still wins outright: SOAP web services (still pervasive in banking, healthcare, and government), document formats (DOCX and XLSX are zipped XML; SVG, RSS/Atom, and XHTML are XML vocabularies), and publishing/transformation pipelines built on XSLT. If your problem is "structured prose" or "I must integrate with a 2008-era enterprise system," XML is not legacy baggage; it is the correct tool.

To indent, collapse, and validate XML structure, use the XML formatter.

JSON vs YAML vs XML: the comparison table

| Dimension | JSON | YAML | XML |
|-----------|------|------|-----|
| Readability | Good | Best | Poor (verbose) |
| Verbosity | Compact | Most compact | Heaviest |
| Comments | No | Yes (#) | Yes (<!-- -->) |
| Data types | 6 (object, array, string, number, bool, null) | Scalars + collections (loosely typed, coercion-heavy) | Text only (types via XSD) |
| Schema | JSON Schema | None native (uses JSON Schema externally) | XSD / DTD / RelaxNG (richest) |
| Whitespace-sensitive | No | Yes (indentation = syntax) | No |
| Mixed content / markup | No | No | Yes |
| Namespaces | No | No | Yes |
| Parse speed | Fastest | Slowest | Moderate |
| Typical use | APIs, data interchange | Human-edited config | Documents, SOAP, enterprise |

Conversion notes and the gotchas that bite

Because YAML 1.2 was designed as a superset of JSON, JSON → YAML is lossless and trivial: a JSON document is, in practice, already valid YAML. The reverse (YAML → JSON) is where you lose things: comments, anchors/aliases (they get expanded inline), and any non-JSON scalar tags. Convert in the JSON ↔ YAML converter and expect comments to vanish on the way to JSON.

XML ↔ JSON is the genuinely hard one, because of the attributes-vs-elements impedance mismatch. There is no canonical mapping. Consider:

<user id="42" active="true">
  <name>Ada</name>
</user>

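A converter must invent a convention for the id/active attributes versus the name element. Conventions differ by tool: BadgerFish prefixes attributes with @ and puts text under $, xmltodict uses @ and #text, and xml-js (compact mode) uses _attributes and _text. An @-prefix mapping that keeps text-only elements as plain strings (as xmltodict does) produces: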

{
  "user": {
    "@id": "42",
    "@active": "true",
    "name": "Ada"
  }
}

Three things break in this round-trip: (1) the element-vs-attribute distinction is not recoverable without a convention both sides agree on, and every value comes back as a string; (2) in many converters repeated elements turn into arrays only when there are 2+, so a single <region> deserializes to a string or object while two become an array, a notorious bug source; and (3) XML namespaces and mixed content have no clean JSON home at all. Never assume XML → JSON → XML returns the original bytes.
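
The single-versus-repeated element problem is easy to reproduce. Below is a minimal converter sketch in Python that follows an xmltodict-like convention (@ prefix for attributes, #text for text next to attributes or children). element_to_dict and its force_list parameter are hypothetical names written for this example, and the sketch deliberately ignores namespaces and mixed-content tails.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem, force_list=()):
    """Convert an Element into plain dicts, lists, and strings."""
    node = {"@" + key: value for key, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child, force_list)
        if child.tag in node:
            # Second or later occurrence: promote to a list and append.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        elif child.tag in force_list:
            # Caller declared this tag repeatable: always use a list.
            node[child.tag] = [value]
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        return text          # leaf element with no attributes: plain string
    if text:
        node["#text"] = text
    return node


user = element_to_dict(ET.fromstring('<user id="42" active="true"><name>Ada</name></user>'))

one = ET.fromstring("<regions><region>us-east</region></regions>")
two = ET.fromstring("<regions><region>us-east</region><region>eu-west</region></regions>")

single_naive = element_to_dict(one)                          # region is a string
double_naive = element_to_dict(two)                          # region is a list
single_fixed = element_to_dict(one, force_list=("region",))  # region is a list
```

Declaring the repeatable tags up front (or driving the converter from a schema) is one way to keep the JSON shape stable regardless of how many elements a particular document happens to contain.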

For adjacent conversions you will reach for constantly: the CSV ↔ JSON converter for spreadsheet imports, the JSON to TypeScript converter to generate interfaces from a sample payload, and the JSON schema generator to derive a starting JSON Schema you then tighten by hand.

Which should you use, by use case

  • REST / GraphQL API responses → JSON. Compact, fast, universally supported. Non-negotiable.
  • Config a human edits (Kubernetes, GitHub Actions, Compose, Ansible) → YAML. Readability wins; just quote ambiguous scalars and never use tabs.
  • Config a program generates and never a human reads → JSON. No footguns, faster parse, no quoting discipline required.
  • SOAP services, government/banking/health integrations → XML. Often the only option the counterparty accepts.
  • Documents with inline markup (publishing, SVG, office formats, RSS) → XML. Mixed content is the deciding feature.
  • Inter-service messaging, logs, event streams → JSON (or NDJSON for streaming).
  • Anything signed or hashed → JSON with a canonical encoding (RFC 8785, the JSON Canonicalization Scheme); YAML's flexibility makes byte-stable output nearly impossible.
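
For the signed-or-hashed case, the idea is that two payloads with the same data should produce the same bytes. The sketch below approximates that with Python's json module by sorting keys and removing optional whitespace. It is not a full RFC 8785 implementation (the RFC also fixes number formatting and sorts keys by UTF-16 code units), so treat it as an illustration for simple records with ASCII keys and integer values. canonical_bytes and digest are hypothetical helper names.

```python
import hashlib
import json


def canonical_bytes(value) -> bytes:
    """Serialize with sorted keys and no optional whitespace (approximation only)."""
    text = json.dumps(
        value,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,
    )
    return text.encode("utf-8")


def digest(value) -> str:
    return hashlib.sha256(canonical_bytes(value)).hexdigest()


# Same data, different key order and whitespace on the wire.
a = json.loads('{"service": "auth-api", "port": 8080}')
b = json.loads('{ "port":8080,\n  "service":"auth-api" }')

same_hash = digest(a) == digest(b)
```

For anything security-sensitive, a library that implements RFC 8785 in full is the safer choice over a hand-rolled serializer like this one.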

A safe default: start with JSON. Switch to YAML only when humans must hand-edit it, and to XML only when you need documents, namespaces, or a system that demands it.

TL;DR

JSON, YAML, and XML are not competitors — they are specialists. JSON is the data-interchange default: compact, six types, no comments, parses everywhere, but watch the 2^53 bigint precision trap. YAML is for human-edited config: the most readable, with anchors for reuse, but indentation is the syntax, tabs are illegal, and the Norway problem will coerce NO into false unless you quote it. XML is for documents and enterprise: verbose, but the only format with mixed content, namespaces, CDATA, and XSD/XSLT — still correct for SOAP, office formats, and publishing.

The decision rule fits on one line: machine-to-machine → JSON, human-to-machine → YAML, document-or-enterprise → XML. Convert between them with the right tool — the JSON ↔ YAML converter, the XML formatter, the YAML validator, and the JSON formatter — and remember that XML↔JSON never round-trips cleanly. Browse the full set at Toolk's tools.