rlupi 2 months ago

Almost 8 years ago, when I was working as a Monitoring SRE at Google, I wrote a proposal to use compressed sensing to reduce storage and transmission costs from linear to logarithmic. (The proposal is also publicly available as a defensive publication, after lawyers complicated it beyond recognition: https://www.tdcommons.org/dpubs_series/954/)

I believe it should be possible now, with AI, to train tiny models of how systems behave in production, online, and then ship those models to the edge and use them to compress wide-event and metrics data. Capturing higher-level behavior can also be very powerful for anomaly and outlier detection. (A rough sketch of the residual-compression idea is at the end of this comment.)

For systems that can afford the compute cost (i.e., systems that are I/O- or network-bound rather than CPU-bound), this approach may be useful.

This approach should work particularly well for mobile observability.
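
Here is a minimal sketch of what such an edge-side, model-based compressor could look like, assuming a single scalar metric and a trivial online predictor (an EWMA). The class names, thresholds, and keyframe scheme are illustrative assumptions, not part of the original proposal.

    # Minimal sketch: an edge agent keeps a tiny online predictor of a metric
    # and ships only the residuals the model fails to predict, plus periodic
    # keyframes so the backend can resynchronize. All names/parameters here
    # are illustrative assumptions.
    from dataclasses import dataclass


    @dataclass
    class Update:
        t: int          # sample index
        value: float    # raw value (keyframe) or residual
        keyframe: bool  # True when the full value is shipped


    class EdgeCompressor:
        def __init__(self, alpha: float = 0.1, threshold: float = 2.0,
                     keyframe_every: int = 1000):
            self.alpha = alpha                  # EWMA learning rate
            self.threshold = threshold          # residual magnitude worth shipping
            self.keyframe_every = keyframe_every
            self.prediction = None              # current model state
            self.t = 0

        def observe(self, value: float):
            """Return an Update to ship, or None if the model explains the sample."""
            self.t += 1
            if self.prediction is None or self.t % self.keyframe_every == 0:
                self.prediction = value
                return Update(self.t, value, keyframe=True)
            residual = value - self.prediction
            # Online update of the tiny model (here just an EWMA of the metric).
            self.prediction += self.alpha * residual
            if abs(residual) > self.threshold:
                return Update(self.t, residual, keyframe=False)
            return None  # sample is "explained" by the model; nothing to send

A backend that receives the keyframes and shipped residuals, and applies the same model-update rule, can reconstruct the stream to within the residual threshold; samples the model predicts well cost nothing on the wire.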

pas 2 months ago | parent | next [-]

I guess many people had this idea! (My thesis proposal around 2011-2012 was almost the same [endpoint- and service-specific models to filter/act/remediate], but it turned out to be a bit too big of a bite.)

The LHC already used hierarchical filtering/aggregation, which probably inspired some of these ideas - at least in my case.

killme2008 2 months ago | parent | prev | next [-]

Interesting idea. Edge AI for initial anomaly detection before sending data to the central system makes sense. However, how do we handle global-level anomalies that require a broader perspective?

rlupi 2 months ago | parent [-]

Centrally. If your system has K finite modes of behavior (degrees of freedom), then you can compress its state as some combination of those modes.

Due to the Donoho-Tanner phase transition theorem, you can almost surely reconstruct a lower-dimensional (K-dimensional) manifold immersed in an N-dimensional space from O(K log N) points, and for many real-world systems K << N. Here K is the number of degrees of freedom of your system, and N is the size of your sample, i.e. the inverse of the frequency resolution you need to capture anomalies (that would be your sampling rate if you were sampling metrics at regular intervals, but here we are not).

So you can capture random projections of your system, compare the results to the predictions of a pre-computed compression model of your system, and ship only the changes. Low-dimensional projection preserves correlations (and introduces spurious ones), which can already be used, in compressed form, for some central anomaly detection (e.g. how many replicas are affected by the same traffic).
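
A minimal, self-contained illustration of the compressed-sensing step (a sketch, not the original proposal): a length-N signal that is K-sparse is recovered from roughly O(K log N) random projections by L1 minimization. The library choices (numpy, scikit-learn's Lasso) and all parameters are assumptions for the example.

    # Compressed-sensing sketch: recover a K-sparse, length-N signal from
    # M = O(K log N) random projections via L1-penalized least squares.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)

    N, K = 1024, 8                      # ambient dimension, degrees of freedom
    M = int(4 * K * np.log(N))          # number of random projections, O(K log N)

    # A K-sparse "system state": only K of N coefficients are nonzero.
    x = np.zeros(N)
    x[rng.choice(N, K, replace=False)] = rng.normal(size=K)

    # Random measurement matrix: each row is one random projection shipped
    # from the edge instead of the full N-dimensional sample.
    A = rng.normal(size=(M, N)) / np.sqrt(M)
    y = A @ x                           # the compressed observations

    # Central reconstruction: L1-penalized least squares (basis pursuit denoising).
    lasso = Lasso(alpha=1e-3, fit_intercept=False, max_iter=10000)
    lasso.fit(A, y)
    x_hat = lasso.coef_

    print("measurements:", M, "of", N)
    print("relative reconstruction error:",
          np.linalg.norm(x_hat - x) / np.linalg.norm(x))

In practice the sparsifying basis would come from the pre-computed model of the system's modes of behavior, and only the coefficients that deviate from the model's predictions would need to be shipped.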

remram 2 months ago | parent | prev | next [-]

If you train a model on your filtered data and then use that model to filter the data you'll train on... it might become impossible to know what your data actually represents.

tomrod 2 months ago | parent | prev [-]

How funny, I wrote a use case for something similar last week. It is ridiculously simple to build a baseline AI monitor.