itamarst 21 hours ago

I cover this more explicitly in my PyCon talk (https://pythonspeed.com/pycon2025/slides/, video soon), though the talk isn't specifically about Pydantic. Basically:

1. Inefficient parser implementation. It's just... very easy to allocate way too much memory if you don't think about large-scale documents, and very difficult to measure. Common problem with many (but not all) JSON parsers.

2. CPython's in-memory representation is large compared to compiled languages. For example, a 4-digit integer is 5-6 bytes in JSON (digits plus separator), 8 in Rust if you use i64, and around 28 in 64-bit CPython. An empty dictionary is 64 bytes.
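
Both points can be checked roughly with the standard library. This is a minimal sketch, not taken from the talk: `object_sizes` and `peak_parse_memory` are hypothetical helper names, the exact byte counts depend on the CPython version and platform, and `tracemalloc` only sees allocations made through Python's allocators.

```python
import json
import sys
import tracemalloc


def object_sizes() -> dict:
    """Compare the JSON text size of a 4-digit integer with CPython object sizes."""
    return {
        # Digits only; a separator such as "," adds another byte or two.
        "json_bytes": len(json.dumps(1234)),
        # Size of the int object itself (version and platform dependent).
        "int_bytes": sys.getsizeof(1234),
        # Size of an empty dict object (version and platform dependent).
        "empty_dict_bytes": sys.getsizeof({}),
    }


def peak_parse_memory(document: str) -> int:
    """Return the peak number of bytes tracemalloc observed while parsing."""
    tracemalloc.start()
    try:
        json.loads(document)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print(object_sizes())

    # Hypothetical test document: 100,000 integers of 4 to 6 digits each.
    doc = json.dumps(list(range(1000, 101000)))
    peak = peak_parse_memory(doc)
    print(f"document: {len(doc)} bytes, peak while parsing: {peak} bytes")
```

Comparing the document length to the peak gives a rough sense of how much larger the parsed representation is than the JSON text it came from.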

cozzyd 19 hours ago | parent [-]

Funny to see Awkward Array in this context! (And... do people really store giant datasets in JSON?!?)

chao- 17 hours ago | parent | next [-]

Often the legacy of an engineer (or team) who "did what they had to do" to meet a deadline, and if they wanted to migrate to something better post-launch, weren't allowed to allocate time to go back and do so.

At least JSON or CSV is better than the ad hoc homegrown formats you find at medium-sized companies that came out of the '90s and '00s.

ljm 13 hours ago | parent | prev | next [-]

Some people even use AI-generated JSON as a semantic layer over their SQL.

jfb 18 hours ago | parent | prev [-]

My sweet summer child