lowbloodsugar 4 days ago

When working with your own datasets, v2 is a must. If you are willing to make trade-offs, you can get insane compression and speed.

ted_dunning 4 days ago | parent [-]

Why doesn't this show in the examples in the article? Do you have examples?

lowbloodsugar a day ago | parent [-]

A couple of examples I can think of off the top of my head, from recording logs for analysis. 1. It might be better to buffer up logs, then sort on a column other than time. You may benefit from delta encoding or prefix encoding. 2. If you have tracking info, which is usually something random like a UUID, then ditch it. You’re not debugging with this dataset, so don’t waste the space on a crazy-high-noise column. Shit like that.
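
A rough way to see both effects is the sketch below. It is not the format discussed in the article: it fakes a columnar layout by compressing each column separately with zlib, and the log schema (`ts`, `host`, `trace_id`), the host names, and the row count are all made up for illustration. It compares per-column compressed sizes for the same rows kept in time order versus sorted by host, and shows how much of the total the random ID column takes up.

```python
import random
import zlib


def column_size(values):
    """Compressed size in bytes of one column, stored newline-separated."""
    return len(zlib.compress("\n".join(values).encode(), 9))


def make_logs(n=5000, seed=0):
    """Synthetic log rows: (timestamp, host, random 128-bit trace id)."""
    rng = random.Random(seed)
    # Hypothetical host names, purely for illustration.
    hosts = [f"web-{i:02d}.example.internal" for i in range(20)]
    rows = []
    ts = 1_700_000_000
    for _ in range(n):
        ts += rng.randint(0, 3)  # timestamps arrive in non-decreasing order
        trace_id = f"{rng.getrandbits(128):032x}"  # UUID-like noise
        rows.append((str(ts), rng.choice(hosts), trace_id))
    return rows


def sizes(rows):
    """Compress each column on its own, as a columnar format would."""
    ts, host, trace_id = zip(*rows)
    return {
        "ts": column_size(ts),
        "host": column_size(host),
        "trace_id": column_size(trace_id),
    }


rows = make_logs()
by_time = sizes(rows)                                # original arrival order
by_host = sizes(sorted(rows, key=lambda r: r[1]))    # buffered, then sorted by host

print("time order:", by_time)
print("host order:", by_host)
```

Sorting by host turns the host column into long runs, so it shrinks a lot, while the timestamp column loses its global order and may get somewhat worse; that is the trade-off. The trace ID column is essentially incompressible in either order, so dropping it is where most of the space goes away.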