shoo 6 days ago

i can see how it'd be possible to transform the input tabular format to the json format by streaming record by record, using a small constant amount of memory, provided the size of each input record is bounded independently of the record count. the only state to carry across records is the current offset into the input, and that's about it
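a rough sketch of that record-by-record idea, assuming the tabular input is CSV with a header row (the post may use a different format). `stream_csv_to_json` is a hypothetical name for illustration, not something from the article. only one row is held in memory at a time, and the file object itself tracks the offset into the input:

```python
import csv
import io
import json


def stream_csv_to_json(src, dst):
    """Copy rows from a CSV text stream into a JSON array, one record at a time.

    Assumes the first row is a header. Returns the number of records written.
    """
    reader = csv.reader(src)
    header = next(reader, None)  # None when the input is empty
    dst.write("[")
    count = 0
    if header is not None:
        for row in reader:
            if count:
                dst.write(",")  # separator goes before every record but the first
            # Only the current row is materialised; earlier rows are already flushed.
            dst.write(json.dumps(dict(zip(header, row))))
            count += 1
    dst.write("]")
    return count


if __name__ == "__main__":
    out = io.StringIO()
    stream_csv_to_json(io.StringIO("name,qty\na,1\nb,2\n"), out)
    print(out.getvalue())
```

all values come out as strings here, since the csv module does no type conversion; a real converter would need a schema for that.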

but we'd need to know more about how the output data is consumed to tell whether this would actually help much in the real application. if the next stage of processing wants to randomly access records using Get(int i), where i is the index of the item, then even if we transform the input to JSON with a constant amount of RAM, we still have to store the output JSON somewhere so we can Get those items.

the blog post mentioned "padding". i didn't immediately understand what that was referring to (padding in the output format?), but i guess it must be talking about struct padding: the items were previously stored as an array of structs, while the code in the article transposed everything into homogeneous arrays, eliminating the overhead of padding

vrnvu 6 days ago | parent

Padding in the post refers to memory alignment.

If we had an "array of structs" instead of a "struct of arrays", each element would take: string(8) + long(8) + int(4) + padding(4) = 24 bytes, of which only 20 bytes are actual data.
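One way to check that arithmetic is with Python's `ctypes`, which lays out structures using the platform's C alignment rules. This is only a sketch: the `Record` type and its field names are made up for illustration, and `c_void_p` stands in for an 8-byte string reference. On a typical 64-bit platform the fields sum to 20 bytes while the struct occupies 24.

```python
import ctypes


class Record(ctypes.Structure):
    # Hypothetical field names; the types are chosen to match the sizes above.
    _fields_ = [
        ("name", ctypes.c_void_p),  # stand-in for an 8-byte string reference
        ("id", ctypes.c_int64),     # long(8)
        ("count", ctypes.c_int32),  # int(4)
    ]


# Bytes of real data per record: the sum of the individual field sizes.
payload = sum(ctypes.sizeof(field_type) for _, field_type in Record._fields_)
# Bytes per element in an array of structs, including trailing padding.
padded = ctypes.sizeof(Record)
padding = padded - payload


def bytes_saved(n):
    """Padding avoided for n records by keeping one array per field instead."""
    return n * padding


if __name__ == "__main__":
    print(payload, padded, padding)
```

The trailing padding exists so that every element of a `Record` array starts at an address aligned for its widest field. With one homogeneous array per field, each array is naturally aligned and no per-element padding is needed.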