Flattening JSON with pandas (like a pancake, but for data)

A light‑hearted guide to turning nested JSON into tidy DataFrames
pandas
json
jupyter
Author

Deepak Ramani

Published

September 24, 2022

Modified

May 5, 2026

JSON — the data format that loves to nest things like a Russian doll.
Beautiful for APIs, terrible for analysis. Enter json_normalize, the flattening hammer.

import pandas as pd
import json

1. The simplest case: one lonely dict

viv = {
    "player_id": 15623,
    "player_name": "Vivianne Miedema",
    "jersey_number": 11
}
pd.json_normalize(viv)
player_id player_name jersey_number
0 15623 Vivianne Miedema 11

Take‑away: Keys → columns, values → rows. No drama.

2. When null sneaks in

Python doesn’t know null from a hole in the ground. Two quick fixes:

# Trick A: Pretend null is None
null = None
viv1 = { "player_id": 15623, "player_nickname": null }

# Trick B: Let json.loads handle it
import json
viv2 = json.loads('{ "player_id": 15623, "player_nickname": null }')

Both give you a clean dict ready for json_normalize.

3. A whole list of dicts

player_list = [
    { "player_id": 15623, "player_name": "Vivianne Miedema", "jersey_number": 11 },
    { "player_id": 10658, "player_name": "Danielle van de Donk" }  # missing jersey
]
pd.json_normalize(player_list)
player_id player_name jersey_number
0 15623 Vivianne Miedema 11.0
1 10658 Danielle van de Donk NaN

Missing keys become NaN. No whining, just numbers.
(The same happens if you used None above.)

4. Multi‑level JSON: the nesting begins

at_kickoff = {
    "id": "d712fb93",
    "lineup": {
        "player": {"id": 15623, "name": "Vivianne Miedema"},
        "position": {"id": 23, "name": "Center Forward"},
        "jersey_number": 11
    }
}

# Fully flattened – like a steamrolled box
pd.json_normalize(at_kickoff)

# Keep the top layer intact with max_level=1
pd.json_normalize(at_kickoff, max_level=1)
id lineup.player lineup.position lineup.jersey_number
0 d712fb93 {'id': 15623, 'name': 'Vivianne Miedema'} {'id': 23, 'name': 'Center Forward'} 11

Mental model: max_level is the number of nested layers you allow pandas to smash.
max_level=0 means “don’t even look inside”.

5. Nested lists: the record_path extraction

Sometimes a JSON object has a key that holds a list of sub‑records (e.g., players).
We want each list item as a row, not a mangled single column.

awfc = {
    'team': 'AWFC',
    'location': 'London',
    'info': {
        'manager': 'Joe',
        'contacts': {'email': 'joe@afc.com', 'tel': '123456789'}
    },
    'players': [
        {'name': 'Viv', 'goals': 101},
        {'name': 'DvD', 'goals': 60}
    ]
}

# Only the players, stripped of team context
pd.json_normalize(awfc, record_path=['players'])

# Bring along some metadata so we know whose players these are
pd.json_normalize(
    awfc,
    record_path=['players'],
    meta=['team', ['info', 'contacts', 'tel'], ['info', 'manager']]
)
name goals team info.contacts.tel info.manager
0 Viv 101 AWFC 123456789 Joe
1 DvD 60 AWFC 123456789 Joe

Syntax rule: meta paths are like breadcrumbs.
['info', 'contacts', 'tel']info.contacts.tel in the output.

6. A whole gang of records (list of dicts)

Same idea, but now the outer JSON is a list. Pandas happily loops over each dict.

json_list = [
    {
        "team": "arsenal",
        "colour": "red-white",
        "info": {"staff": {"physio": "xxxx", "doctor": "yyyy"}},
        "players": [
            {"name": "Viv", "stats": {"goals": 101, "assists": 40}},
            {"name": "Beth", "stats": {"goals": 60, "assists": 25}},
        ],
    },
    {
        "team": "city",
        "colour": "blue",
        "info": {"staff": {"physio": "aaaa", "doctor": "bbbb"}},
        "players": [
            {"name": "Steph", "stats": {"goals": 30, "assists": 15}},
            {"name": "Lucy", "stats": {"goals": 25, "assists": 20}},
        ],
    },
]

pd.json_normalize(
    json_list,
    record_path=["players"],
    meta=["team", "colour", ["info", "staff", "physio"]],
)
name stats.goals stats.assists team colour info.staff.physio
0 Viv 101 40 arsenal red-white xxxx
1 Beth 60 25 arsenal red-white xxxx
2 Steph 30 15 city blue aaaa
3 Lucy 25 20 city blue aaaa

Every player gets their team, colour, and physio stamped on them. Lovely.

7. Missing meta keys? Don’t panic

What if city forgot to hire a physio? (It happens.)
Without errors='ignore', pandas throws a KeyError.

json_list = [
    {
        "team": "arsenal",
        "info": {"staff": {"physio": "xxxx"}},
        "players": [{"name": "Viv"}],
    },
    {
        "team": "city",
        "info": {"staff": {}},
        "players": [{"name": "Steph"}],
    },  # no physio!
]

# This will error out (uncomment to see the KeyError):
# pd.json_normalize(json_list, record_path=['players'], meta=[['info', 'staff', 'physio']])

# Safer: ignore absent keys
pd.json_normalize(
    json_list,
    record_path=["players"],
    meta=[["info", "staff", "physio"]],
    errors="ignore",  # "Missing? Just put NaN and carry on."
)
name info.staff.physio
0 Viv xxxx
1 Steph NaN

8. Styling the output: separators & prefixes

Dot‑separated column names (info.staff.physio) can be a pain in pandas later.
Swap the dot, and add some leading labels.

json_list = [
    {
        "team": "arsenal",
        "info": {"staff": {"physio": "xxxx"}},
        "players": [{"name": "Viv"}],
    },
    {
        "team": "city",
        "info": {"staff": {"physio": "aaa"}},
        "players": [{"name": "Steph"}],
    },
]
pd.json_normalize(
    json_list,
    record_path=["players"],
    meta=["team", ["info", "staff", "physio"]],
    sep="_",  # goodbye dots, hello underscores
    record_prefix="pl_",  # every record column starts with pl_
    meta_prefix="meta_",  # every meta column starts with meta_
)
# Now we get: pl_name, pl_stats.goals, meta_team, meta_info_staff_physio, etc.
pl_name meta_team meta_info_staff_physio
0 Viv arsenal xxxx
1 Steph city aaa

Pro tip: sep changes the inner connector, while record_prefix/meta_prefix are global labels for the two column families.

9. Real data: files & URLs

# From a local file (like the movies.json you keep in the same folder)
with open('movies.json') as f:
    data = json.load(f)
pd.json_normalize(data)

# From the wild web
import requests
URL = 'https://vega.github.io/vega-datasets/data/cars.json'
data = json.loads(requests.get(URL).text)
pd.json_normalize(data)
Name Miles_per_Gallon Cylinders Displacement Horsepower Weight_in_lbs Acceleration Year Origin
0 chevrolet chevelle malibu 18.0 8 307.0 130.0 3504 12.0 1970-01-01 USA
1 buick skylark 320 15.0 8 350.0 165.0 3693 11.5 1970-01-01 USA
2 plymouth satellite 18.0 8 318.0 150.0 3436 11.0 1970-01-01 USA
3 amc rebel sst 16.0 8 304.0 150.0 3433 12.0 1970-01-01 USA
4 ford torino 17.0 8 302.0 140.0 3449 10.5 1970-01-01 USA
... ... ... ... ... ... ... ... ... ...
401 ford mustang gl 27.0 4 140.0 86.0 2790 15.6 1982-01-01 USA
402 vw pickup 44.0 4 97.0 52.0 2130 24.6 1982-01-01 Europe
403 dodge rampage 32.0 4 135.0 84.0 2295 11.6 1982-01-01 USA
404 ford ranger 28.0 4 120.0 79.0 2625 18.6 1982-01-01 USA
405 chevy s-10 31.0 4 119.0 82.0 2720 19.4 1982-01-01 USA

406 rows × 9 columns

No surprises – the same flattening magic.

Quick Refresher Cheat Sheet

I want to… I type…
Flatten a simple dict pd.json_normalize(dict)
Flatten a list of dicts pd.json_normalize(list)
Stop after one level of nesting pd.json_normalize(data, max_level=1)
Extract a list inside a key pd.json_normalize(data, record_path=['key'])
Keep parent info alongside meta=['key1', ['key2', 'key3']]
Survive missing meta keys errors='ignore'
Use underscores instead of dots sep='_'
Label record columns differently record_prefix='r_'
Label meta columns differently meta_prefix='m_'

That’s it. Now go flatten some JSON and make your future self proud.

Want to support my blog?

Buy Me A Coffee