Web Development
CSV to JSON and Back: Handling Headers, Nesting, and Edge Cases
Convert CSV to JSON without corrupting data: how to handle quoting, delimiters, type inference, and nesting, plus a worked round-trip showing what survives.
CSV is a flat grid of rows and columns. JSON is a nested, typed tree. To convert CSV to JSON you map each data row to an object keyed by the header row; to go back, you flatten each object into a column layout. The mechanical part takes ten lines of code. The part that ruins your data is the rest: quoting rules, delimiter ambiguity, when "123" should become a number, and how nesting collapses in one direction but not the other.
This guide covers exactly those failure points, with the relevant rules from RFC 4180 (the CSV memo) and RFC 8259 (the JSON standard), and a full round-trip you can verify yourself.
What does converting CSV to JSON actually mean?
Converting CSV to JSON turns a tabular file into an array of objects. The first row supplies the keys; every following row becomes one object whose values are looked up by column position. So name,age over Ada,36 becomes [{"name":"Ada","age":"36"}]. The reverse — JSON to CSV — takes an array of objects, collects the union of keys as a header, and writes one row per object. The data models do not match, so three things never translate cleanly: quoting, types, and nested structure.
CSV (flat, all strings) JSON (tree, six typed values)
┌──────┬─────┐
│ name │ age │ ──convert──▶ [ { "name": "Ada", "age": 36 } ]
├──────┼─────┤
│ Ada │ 36 │ ◀──flatten── object keys become columns
└──────┴─────┘
Keep that asymmetry in mind. A flat grid maps up into a tree with no information loss. A tree only maps down into a grid if it is already shallow — the moment you nest, you have to choose how to flatten, and that choice is lossy or ambiguous.
The RFC 4180 quoting rules everyone gets wrong
The single biggest cause of corrupted conversions is splitting on every comma. CSV is not "values separated by commas" — it is values separated by commas unless the value is quoted. RFC 4180 §2 defines exactly three rules that matter:
- Fields with a comma, double quote, or line break must be wrapped in double quotes.
- A double quote inside a quoted field is escaped by doubling it (
""). - Records are separated by CRLF, but in practice you must accept bare LF too.
Here is a deliberately nasty row that breaks naive parsers:
id,name,note
1,"Park, Min","She said ""hi"" then left"
2,"line one
line two",ok
Three traps in three lines: a comma inside a value, a doubled quote that decodes to a single ", and a literal newline inside a quoted field. A converter that splits on , and \n will read this as garbage. The correct JSON is:
[
{ "id": "1", "name": "Park, Min", "note": "She said \"hi\" then left" },
{ "id": "2", "name": "line one\nline two", "note": "ok" }
]
Note the translation of escapes: CSV doubles a quote (""), JSON backslash-escapes it (\"), and the embedded newline becomes \n. If your output still contains a stray "" or a row that broke at the newline, your parser is not RFC 4180-compliant. A correct parser is a small state machine that tracks whether it is currently inside quotes — when it is, commas and newlines are literal characters, not separators. Paste the row above into the CSV to JSON converter and confirm it returns two objects, not four.
Header rows and delimiter variations
The header row is a convention, not part of the format — RFC 4180 calls it "an optional header line." So a converter needs an explicit toggle. With a header, row one becomes the keys. Without one, you synthesize keys (col_1, col_2, …) so the JSON objects still have stable names. Guessing wrong silently shifts every value up or down by a row.
Delimiters are the other landmine. The "C" in CSV says comma, but real files use whatever the producing locale chose:
| Delimiter | Name | Where it comes from |
|-----------|------|---------------------|
| , | Comma | Standard CSV (RFC 4180), US/UK exports |
| ; | Semicolon | European locales — see below |
| \t | Tab | TSV; Excel and BI-tool copy-paste |
| \| | Pipe | Legacy database and mainframe dumps |
The semicolon case is not a quirk — it is the European decimal-comma problem. In German, French, and most continental locales, 1.234,56 means "one thousand two hundred thirty-four point five six": the comma is the decimal point. A spreadsheet there cannot also use the comma as a field separator, so it switches the delimiter to ;. Feed such a file to a comma-splitting converter and every 3,14 becomes two broken columns. Always detect or ask for the delimiter; never assume comma.
Type inference: the "123" problem
This is where most data quietly breaks. CSV has exactly one type — text. JSON has six (RFC 8259: object, array, string, number, boolean, null). When you convert, you must decide whether "123" stays a string or becomes the number 123, whether "true" becomes a boolean, and whether "" becomes null. Reasonable type inference looks like this:
// "" / "null" -> null
// "true" -> true, "false" -> false
// "123" -> 123 (only if it is a safe integer)
// "3.14" -> 3.14
// everything else stays a string
Inference is convenient and dangerous in equal measure. Two pitfalls bite hard:
Leading zeros and phone numbers. A ZIP code 01730, a phone number 0044…, or an order ID 00042 are strings that look like numbers. Coerce them and 01730 becomes 1730 — a different, wrong value. The same applies to ISBNs, SKUs, and account numbers. Rule: identifiers are strings, even when they are all digits.
The 2^53 precision cliff. A safe converter only coerces an integer when it passes Number.isSafeInteger — i.e. it fits within 2^53 − 1 (9007199254740991). A 19-digit Twitter/Postgres snowflake ID like 1234567890123456789 exceeds that, so JavaScript's IEEE 754 doubles round it. The fix is to leave large integers as strings:
JSON.parse('{"id":1234567890123456789}');
// { id: 1234567890123456800 } ← silently wrong, off by 19
The pragmatic default: type inference on for analytics and quick inspection, off whenever the column is an identifier. If you cannot be sure, keep everything a string — a string never loses data, a bad coercion always does. After conversion, the JSON formatter lets you eyeball which values came out as numbers versus strings at a glance.
Nesting: dot-notation columns and the reverse ambiguity
JSON nests; CSV does not. To put a nested object into a grid you flatten its path into a column name, conventionally with dots:
{ "user": { "name": "Ada", "addr": { "city": "London" } } }
user.name,user.addr.city
Ada,London
This flattening (a.b.c) is lossless and reversible for objects. The trouble starts on the way back, because the dot is ambiguous. Given a column literally named user.name, did the producer mean a nested key name inside user, or a flat key whose name happens to contain a dot? CSV cannot tell you. Arrays make it worse — tags.0, tags.1 is one common encoding, but nothing standardizes it, so two tools disagree about whether tags.0 rebuilds an array or an object with a "0" key.
The blunt alternative that most converters take — including ours — is to not flatten at all: when a JSON value is itself an object or array, serialize it into a single cell as a JSON string.
[{ "name": "Ada", "tags": ["admin", "dev"] }]
name,tags
Ada,"[""admin"",""dev""]"
The tags cell is a JSON array, quoted and with its inner quotes doubled per RFC 4180. This round-trips perfectly back to JSON because you just JSON.parse the cell — but spreadsheet users see an ugly string instead of separate columns. Dot-flattening is prettier for humans; embedded JSON is safer for machines. Pick based on who reads the CSV. The JSON to CSV converter takes the embedded-JSON approach so that nested data survives the trip intact.
Worked round-trip: CSV to JSON to CSV
Here is the full loop, so you can see exactly what survives. Start with a tricky CSV:
sku,price,in_stock,note
00042,19.99,true,"Ships in 1, maybe 2 days"
00043,,false,"He said ""sold out"""
Convert to JSON with type inference on:
[
{ "sku": "00042", "price": 19.99, "in_stock": true, "note": "Ships in 1, maybe 2 days" },
{ "sku": "00043", "price": null, "in_stock": false, "note": "He said \"sold out\"" }
]
Read what the inference did, value by value. 00042 kept its leading zero and stayed a string — correct, it is an identifier. 19.99 became a number. true/false became booleans. The empty price cell became null. The commas and doubled quotes inside the note fields were preserved as literal text. Now convert back to CSV:
sku,price,in_stock,note
00042,19.99,true,"Ships in 1, maybe 2 days"
00043,,false,"He said ""sold out"""
Byte-for-byte identical. That is the test of a correct converter pair: a clean CSV survives a full round trip unchanged. What does not survive is type distinction on the way back — 19.99 the number and "19.99" the string both serialize to the same CSV cell, because CSV has no types. The grid forgets what JSON remembered. That is the fundamental, unavoidable asymmetry; everything else is just parser correctness. If you want both directions in one place, the bidirectional CSV ↔ JSON converter runs the whole loop client-side. For a deeper look at how these formats differ from config-oriented ones, see JSON vs YAML vs XML.
Quick reference
| Concern | CSV side | JSON side |
|---------|----------|-----------|
| Value type | Always text | object, array, string, number, boolean, null |
| Comma in value | Wrap field in " | Plain string |
| Quote in value | Double it: "" | Backslash: \" |
| Newline in value | Allowed inside quotes | Becomes \n |
| Missing value | Empty cell | null or omitted key |
| Nested data | Dot columns or embedded JSON string | Native nesting |
| Big integer ID | Plain digits | Keep as string past 2^53−1 |
When you have JSON and need types you can trust
Once a CSV import is clean JSON, lock down its shape before it flows into code. The JSON to TypeScript converter turns a sample payload into an interface, so the string-vs-number decisions you made during type inference become compile-time guarantees instead of runtime surprises. Browse the full set in the tools directory.
Everything above runs in your browser — your spreadsheet exports, customer lists, and financial CSVs never leave your device. No uploads, no servers.
TL;DR
CSV is a flat grid of strings; JSON is a typed tree. Converting CSV to JSON maps each row to an object keyed by the header; JSON to CSV flattens objects back into columns. The three things that break are quoting, types, and nesting. Honour RFC 4180 (wrap fields containing commas, quotes, or newlines in double quotes; escape an inner quote by doubling it), detect the real delimiter (the European semicolon is a decimal-comma artifact, not a quirk), and never coerce identifiers — leading-zero codes and 64-bit IDs must stay strings. A flat grid round-trips losslessly; a tree only flattens if it is shallow. Get the parser right and CSV↔JSON becomes boring plumbing, which is exactly what you want.
Keep reading
Related posts
JSON vs YAML vs XML: When to Use Each (with Conversion Examples)
JSON for APIs and data interchange, YAML for human-edited config, XML for documents and enterprise. The exact tradeoffs, footguns, and conversion gotchas.
Read post7 Common JSON Errors and How to Fix Them
The 7 JSON errors that break parsers every day — trailing commas, single quotes, unquoted keys, comments, bad brackets, wrong types — with the exact fix.
Read postComplete Guide to JSON: Validation, Formatting, and Best Practices
Master JSON handling with our comprehensive guide covering validation techniques, formatting best practices, and advanced optimization strategies for web developers.
Read post