I talk about this more explicitly in the PyCon talk (https://pythonspeed.com/pycon2025/slides/ - video soon), though that's not specifically about Pydantic. Basically:

1. Inefficient parser implementation. It's just... very easy to allocate way too much memory if you don't think about large-scale documents, and very difficult to measure. Common problem with many (but not all) JSON parsers.

2. CPython's in-memory representation is large compared to compiled languages. For example, a 4-digit integer is 5-6 bytes in JSON (digits plus separator), 8 bytes in Rust if you use i64, and roughly 28 bytes in 64-bit CPython. An empty dictionary is 64 bytes.
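
To make both points concrete, here is a rough sketch using only the standard library. It assumes CPython; the synthetic document and variable names are made up for illustration, and the exact numbers vary by Python version and platform. `sys.getsizeof` reports per-object sizes, and `tracemalloc` gives one way to measure peak allocation during a parse:

```python
import json
import sys
import tracemalloc

# Point 2: per-object sizes as reported by CPython.
n = 1234
json_bytes = len(json.dumps(n)) + 1  # four digits plus a separating comma
int_bytes = sys.getsizeof(n)         # size of the CPython int object
empty_dict_bytes = sys.getsizeof({})
print(f"JSON: {json_bytes} B, CPython int: {int_bytes} B, "
      f"empty dict: {empty_dict_bytes} B")

# Point 1: peak memory while parsing a synthetic document
# made of 180,000 four-digit integers (illustrative data only).
document = json.dumps(list(range(1000, 10000)) * 20)

tracemalloc.start()
parsed = json.loads(document)
_, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
tracemalloc.stop()

print(f"document: {len(document)} B, peak during parse: {peak} B")
```

Note that `tracemalloc` only sees allocations made through Python's allocator, so a parser written in Rust or C may need an external measurement (e.g. process RSS) instead.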

cozzyd, 2 months ago:

Funny to see Awkward Array in this context! (And... do people really store giant datasets in JSON?!)

chao-, 2 months ago:

Often it's the legacy of an engineer (or team) who "did what they had to do" to meet a deadline and who, when they wanted to migrate to something better post-launch, weren't allowed to allocate the time to go back and do so.

At least JSON or CSV is better than the ad hoc homegrown formats you'd find at medium-sized companies that came out of the '90s and '00s.

ljm, 2 months ago:

Some people even use AI-generated JSON as a semantic layer over their SQL.

jfb, 2 months ago:

My sweet summer child