▲ | CJefferson 11 hours ago
To take 2GB to parse a 100MB file, we only need a 20x blow-up over the file size. Let's imagine the file is mostly full of single-digit numbers with no spaces (so lists like 2,4,1,0,9,3...). Then we need to spend only 40 bytes storing each number to hit that. Make a minimal-sized class to store an integer:
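A quick illustration of the claim (a sketch, not the commenter's actual code; `Num` is a hypothetical wrapper name, and exact sizes vary by CPython version and platform):

```python
import sys

class Num:
    """A minimal wrapper class holding one parsed JSON number."""
    def __init__(self, value):
        self.value = value

# On 64-bit CPython, the wrapper instance alone is roughly 48 bytes,
# and that is before counting the boxed number it points to.
print(sys.getsizeof(Num(1)))   # instance overhead (~48 bytes)
print(sys.getsizeof(1.0))      # a boxed float: 24 bytes on 64-bit CPython
print(sys.getsizeof(1))        # a boxed small int: 28 bytes on 64-bit CPython
```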
That object's size is already 48 bytes. Usually we store floats from JSON; the size of 1.0 as a float in Python is 24 bytes. Now, you can get smaller, but as soon as you introduce any kind of class structure, or delay parsing numbers until they are used (in case you want people to be able to interpret them as ints or floats), you blow through a 20x memory increase.
▲ | fidotron 10 hours ago | parent [-]
> We need to spend 40 bytes storing a number.

But... why? Assuming they aren't BigInts or similar, these are at most 8 bytes of actual data. This overhead is ridiculous. Using classes should enable you to be much smaller than the JSON representation, not larger. For example, V8 does it like https://v8.dev/docs/hidden-classes

> not parsing numbers until they are used

Doesn't this defeat the point of pydantic? It's supposed to be checking the model is valid as it's loaded using jiter. If the data is valid it can be loaded into an efficient representation, and if it's not, the errors can be emitted while iterating over it.
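A sketch of what "an efficient representation" could mean in Python terms, assuming the numbers are known to be plain doubles (this uses the stdlib `array` module for illustration; it is not what pydantic or jiter actually does):

```python
from array import array

# Unboxed storage: each double occupies exactly 8 bytes of payload,
# instead of one ~48-byte wrapper object plus a boxed number apiece.
nums = array('d', [2, 4, 1, 0, 9, 3])

print(nums.itemsize)               # 8 bytes per element
print(len(nums) * nums.itemsize)   # 48 bytes of payload for 6 numbers
```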