srean 3 hours ago

This.

I also think that a lot of the waste can be done away with by using application-specific codecs. Yes, even gzip compresses logs and metrics by a lot, but one can go further with specialized codecs that home in on the redundancy much quicker than a generic lossless compressor eventually would.

However, to build these one can't have a "throw it over the 3rd-party wall" mode of development.

One way to do this for stable services would be to build high-fidelity (mathematical/statistical) models of the logs and metrics, then serialize only what is non-redundant. This applies particularly well to numeric data, where gzip does not do as well. What we need is the analogue of JPEG for log data.
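To make the "model, then serialize the residual" idea concrete, here is a minimal sketch in Python. It uses the simplest possible statistical model of a metric stream, predicting each sample from the previous one (delta coding), and then hands the residuals to zlib. The synthetic random-walk gauge is a stand-in, not real telemetry; the point is only that modeling first shrinks the compressed size versus compressing the raw samples.

```python
import random
import struct
import zlib

# Hypothetical metric stream: a slowly drifting gauge, one sample per tick.
random.seed(0)
values = [1000]
for _ in range(9999):
    values.append(values[-1] + random.randint(-2, 2))

# Baseline: serialize raw 32-bit samples, then apply generic compression.
raw = b"".join(struct.pack("<i", v) for v in values)
baseline = len(zlib.compress(raw, 9))

# Model + residuals: predict each sample from its predecessor and
# store only the prediction errors (the deltas), then compress those.
deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
residual = b"".join(struct.pack("<i", d) for d in deltas)
modeled = len(zlib.compress(residual, 9))

print(baseline, modeled)
```

A real codec would go further (variable-length residuals, per-series models, dictionary-coded log templates), but even this one-line "model" exposes redundancy that a generic compressor only finds slowly.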

At my workplace there has been political buy-in for the idea that if a log / metric stream has not been used in 2~3 years, it should be thrown away and collection stopped. This rubs me the wrong way, because so many times I have wished there were some historic data for my data-science projects. You never know what data you might need in the future. You do, however, know that you do not need redundant data.