Observability 2.0 and the Database for It

jillesvangurp 2 months ago | parent | next [-]

Opensearch and Elasticsearch do most/all of what this proposes. And then some.

The mistake many teams make is to worry about storage but not querying. Storing data is the easy part. Querying is the hard part. Some columnar data format stored in S3 doesn't solve querying. You need to have some system that loads all those files, creates indices or performs some map reduce logic to get answers out of those files. If you get this wrong, stuff gets really expensive quickly.

What you indeed want is a database (probably a columnar one) that provides fast access and that can query across your data efficiently at scale. That's not observability 2.0 but observability 101. Without that, you have no observability. You just have a lot of data that is hard to query and that provides no observability unless you somehow manage to solve that. Yahoo figured that out 20 years or so ago when they created Hadoop, HDFS, and all the rest.

The article is right to call out the fragmented landscape here. Many products only provide partial/simplistic solutions and they don't integrate well with each other.

I started out doing some of this stuff more than 10 years ago using Elasticsearch and Kibana. Grafana was a fork that hadn't happened yet. This combination is still a good solution for logging, metrics, and traces. These days, Opensearch (the Elasticsearch fork) is a good alternative. Basically the blob of json used in the article with a nice mapping would work fine in either. That's more or less what I did around 2014.

Create a data stream, define some life cycle policies (data retention, rollups, archive/delete, etc.), and start sending data. Both Opensearch and Elasticsearch have stateless versions now that store in S3 (or similar bucket based storage). Exactly like the article proposes. I'd recommend going with Elasticsearch. It's a bit richer in features. But Opensearch will do the job.
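
As a rough sketch of the steps above, the two request bodies involved could look like this. The policy name `o11y-30d`, the template name `wide-events`, the index pattern `events-*`, the field names, and the rollover/retention thresholds are all made-up values. The code only builds the JSON and does not talk to a cluster. The bodies follow the shape of Elasticsearch's ILM policy and composable index template APIs; OpenSearch expresses lifecycle rules through ISM policies, which use a different schema.

```python
import json

# Hypothetical names and thresholds, chosen only for illustration.
POLICY_NAME = "o11y-30d"
TEMPLATE_NAME = "wide-events"

# Body for: PUT _ilm/policy/o11y-30d
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Start a new backing index daily or at 50 GB per primary shard.
                    "rollover": {"max_age": "1d", "max_primary_shard_size": "50gb"}
                }
            },
            "delete": {
                # Drop backing indices 30 days after rollover.
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}

# Body for: PUT _index_template/wide-events
index_template = {
    "index_patterns": ["events-*"],
    "data_stream": {},  # marks matching names as data streams
    "template": {
        "settings": {"index.lifecycle.name": POLICY_NAME},
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "trace_id": {"type": "keyword"},
                "duration_ms": {"type": "long"},
            }
        },
    },
}


def render(body):
    """Serialize a request body the way it would be sent over HTTP."""
    return json.dumps(body, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(ilm_policy))
    print(render(index_template))
```

With both in place, documents that carry an `@timestamp` field and are indexed into a name matching `events-*` would land in a data stream governed by the policy.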

This is not the only solution in this space but it works well enough.

	piterrro 2 months ago | parent | next [-]
		> The mistake many teams make is to worry about storage but not querying. Storing data is the easy part. Querying is the hard part. Some columnar data format stored in S3 doesn't solve querying. You need to have some system that loads all those files, creates indices or performs some map reduce logic to get answers out of those files.

		That's a nice callout. There's a lack of awareness in our space that producing logs is one thing, but doing it at scale gets pretty tricky. Storing for effective querying becomes crucial, and this is what the most popular OSS solutions seem to forget. Their approach seems to be: we'll index everything and put it into memory for fast and efficient querying.

		I'm currently building a storage system just for logs[1] (or any timestamped data, since you can store events too, whatever is written once and indexed by a timestamp), which focuses on data compression and query performance. There's just so much to squeeze out if you think things through carefully and pay attention to details. This can translate to massive savings.

		Seeing how much money is spent on observability tools at the company I'm currently working for (they probably spend well over $500k per year on Datadog, Sumo Logic, New Relic, Sentry, Observe) for approximately 40-50TB of data produced per month just amazes me. The data could easily be compressed to something like 2-3TB and stored for pennies on S3.

		[1] https://logdy.dev/logdy-pro
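
To get a feel for why log data compresses so well, here is a minimal sketch using general-purpose zlib on synthetic lines. The log format and the generator are invented for the example, and synthetic lines are more regular than real logs, so the printed ratio says nothing about the 40-50TB to 2-3TB estimate above or about any particular product.

```python
import zlib


def synthetic_log_lines(n):
    """Generate repetitive access-log-like lines (made up, not real traffic)."""
    return [
        f"2025-05-01T12:00:{i % 60:02d}Z INFO GET /api/users/{i % 50} 200 {10 + i % 7}ms"
        for i in range(n)
    ]


def compression_ratio(lines, level=6):
    """Return raw_size / compressed_size for the newline-joined lines."""
    raw = "\n".join(lines).encode("utf-8")
    packed = zlib.compress(raw, level)
    return len(raw) / len(packed)


if __name__ == "__main__":
    ratio = compression_ratio(synthetic_log_lines(10_000))
    print(f"compression ratio: {ratio:.1f}x")
```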
	valyala 2 months ago | parent | prev [-]
		> The mistake many teams make is to worry about storage but not querying

		The amount of logs, wide events and traces that must be stored and queried is frequently measured in hundreds of terabytes or petabytes. A petabyte of data on S3 costs about $20000/month. Storage costs usually exceed compute costs at such a scale, so it is important to compress the ingested observability data efficiently, so that it occupies less disk space and costs less to store.

		Efficient storage schemes also help improve the performance of heavy queries, which need to process hundreds of terabytes of logs. For example, if you need to calculate the 95th percentile of request duration among hundreds of billions of nginx logs, the database needs to read all these logs. The query performance in this case is limited by storage read bandwidth, since these amounts of logs do not fit in RAM, so the processed logs cannot be cached there (either by the database itself or by the operating system page cache).

		Let's estimate the time needed for processing 100TB of logs on storage with 10GB/s read bandwidth (high-performance S3 and/or SSD): 100TB / 10GB/s = 10000 seconds = 2 hours 47 minutes. Such query performance is unacceptable in most cases. How can query performance be optimized in this case?

		1. Compress the data stored on disk. Typical logs compress by 10x or more, so they occupy 10x less storage space than their original size. This improves heavy query performance in the case above by 10x, to 1000 seconds or around 17 minutes.

		2. Use column-oriented storage, i.e. store the data for every column separately. This usually improves the compression rate, since the data from a single column usually has lower randomness compared to the per-log-entry data. This also improves heavy query performance a lot, since the database can read only the data for the requested columns while skipping the data for the rest of the columns. Suppose the request duration is stored in a uint64 field with nanosecond precision. Then 100 billion request durations can be stored in 100 billion * 8 bytes = 800GB. 800GB can be read in 800GB / 10GB/s = 80 seconds on storage with 10GB/s read bandwidth. The query duration can be reduced even more if the column data is compressed.

		There are other options that help optimize query performance over petabytes of logs (wide events):

		- Partitioning the data by time (for example, storing it in per-day partitions). This improves performance for queries with time range filters (the majority of practical queries over logs / wide events) by efficiently skipping the partitions outside the requested time range.
		- Using bloom filters for skipping logs without the given rarely seen words / phrases (such as trace_id, user_id, ip, etc). See https://itnext.io/how-do-open-source-solutions-for-logs-work... for details.
		- Using min/max indexes for skipping logs without the given numeric values.

		All these techniques are implemented in VictoriaLogs, so it achieves high performance even in a single-node setup when running on your laptop / Raspberry Pi. See https://docs.victoriametrics.com/victorialogs/
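
The back-of-the-envelope numbers in this comment can be reproduced with a few lines. This is only the arithmetic, using decimal units (1TB = 10^12 bytes) and the figures quoted above; the constant and function names are invented for the example, and it models nothing beyond raw read bandwidth.

```python
TB = 10**12
GB = 10**9


def scan_seconds(bytes_to_read, bandwidth_bytes_per_s):
    """Time to stream the given number of bytes at a fixed read bandwidth."""
    return bytes_to_read / bandwidth_bytes_per_s


BANDWIDTH = 10 * GB      # 10GB/s, as in the comment
RAW_LOGS = 100 * TB      # 100TB of uncompressed logs
COMPRESSION = 10         # assumed 10x compression ratio
ROWS = 100 * 10**9       # 100 billion log entries
DURATION_BYTES = 8       # one uint64 per entry

full_scan = scan_seconds(RAW_LOGS, BANDWIDTH)                     # read everything
compressed_scan = scan_seconds(RAW_LOGS // COMPRESSION, BANDWIDTH)  # read compressed rows
column_scan = scan_seconds(ROWS * DURATION_BYTES, BANDWIDTH)      # read one column only

if __name__ == "__main__":
    for label, seconds in [
        ("full scan", full_scan),
        ("10x compressed", compressed_scan),
        ("duration column only", column_scan),
    ]:
        print(f"{label}: {seconds:.0f} s ({seconds / 60:.1f} min)")
```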

rlupi 2 months ago | parent | prev | next [-]

Almost 8 years ago, when I was working as a Monitoring SRE at Google, I wrote a proposal to use compressed sensing to reduce storage and transmission costs from linear to logarithmic. (The proposal is also available publicly, as a defensive publication, after lawyers complicated it beyond recognition https://www.tdcommons.org/dpubs_series/954/)

I believe it should be possible now, with AI, to train online tiny models of how systems behave in production and then ship those models to the edge to use to compress wide-event and metrics data. Capturing higher-level behavior can also be very powerful for anomaly and outlier detection.

For systems that can afford the compute cost (I/O or network bound), this approach may be useful.

This approach should work particularly well for mobile observability.

pas 2 months ago | parent | next [-]

I guess many people had this idea! (My thesis proposal around ~2011-2012 was almost the same [endpoint and service specific models to filter/act/remediate], but turned out to be a bit too big of a bite.)

The LHC already used a hierarchical filtering/aggregation before, that probably inspired some of it - at least in my case.

killme2008 2 months ago | parent | prev | next [-]

Interesting idea. Edge AI for initial anomaly detection before sending data to the central system makes sense. However, how do we handle global-level anomalies that require a broader perspective?

	rlupi 2 months ago | parent [-]
		Centrally. If your system has K finite modes of behavior (degrees of freedom), then you can compress it as some combination of these effects. Due to the Donoho-Tanner phase transition theorem, you can almost surely reconstruct a lower-dimensional (K-dimensional) manifold immersed in an N-dimensional space with O(K log N) points. For many real-world systems, K << N.

		Here K is the number of degrees of freedom of your system, and N is the size of your sample, or the inverse of the frequency resolution you need to capture anomalies (that's your sampling rate if you were sampling metrics at regular intervals, but here we are not).

		So you can capture random projections of your system, compare the results to the predictions of a pre-computed compression model of your system, and only ship the changes. Low-dimensional projection maintains correlations (and introduces spurious ones), which can already be used in compressed form for some central anomaly detection (e.g. how many replicas are affected by the same traffic).
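
A minimal sketch of the edge-side half of that idea, under loud assumptions: the "model" is just a predicted vector, the projection is a seeded Gaussian random matrix shared by both ends, and all function names and the tolerance are invented here. It shows only "project, compare with the prediction, ship the difference if it is significant". It does not implement the central reconstruction step (e.g. an L1-minimization solver), and it is not the method from the linked publication.

```python
import math
import random


def projection_matrix(m, n, seed=0):
    """m x n Gaussian random matrix; the same seed gives the same matrix on both ends."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def project(matrix, x):
    """Compute y = matrix @ x with plain lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def residual_to_ship(matrix, observed, predicted, tolerance=1e-6):
    """Return the projected residual if it is significant, else None (nothing to ship)."""
    y_obs = project(matrix, observed)
    y_pred = project(matrix, predicted)
    residual = [o - p for o, p in zip(y_obs, y_pred)]
    norm = math.sqrt(sum(r * r for r in residual))
    return residual if norm > tolerance else None


if __name__ == "__main__":
    n, m = 256, 32                      # N samples compressed to m << N numbers
    phi = projection_matrix(m, n, seed=42)
    predicted = [math.sin(i / 10) for i in range(n)]   # stand-in for the model output

    print(residual_to_ship(phi, list(predicted), predicted))   # behaves as predicted

    anomalous = list(predicted)
    anomalous[100] += 5.0                                       # one unexpected spike
    shipped = residual_to_ship(phi, anomalous, predicted)
    print(len(shipped), "numbers shipped instead of", n)
```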

remram 2 months ago | parent | prev | next [-]

If you train a model on your filtered data and then use that model to filter the data you'll train on... it might become impossible to know what your data actually represents.

tomrod 2 months ago | parent | prev [-]

How funny, I wrote a use case for something similar last week. It's ridiculously simple to build a baseline AI monitor.

jsumrall 2 months ago | parent | prev | next [-]

I'm biased because I recently introduced ClickHouse at my company, but everything I've seen so far makes me think analytical and observability use cases like this "just work" in ClickHouse.

Just like Postgres became the default choice for operational/relational workloads, I think ClickHouse is quickly becoming (or should become) the standard for analytical workloads. In both cases, they "just work". Postgres even has columnar storage extensions, but I still think ClickHouse is a better choice if you don't need transactions.

A rule of thumb I think devs should follow would be: use Postgres for operational cases, and ClickHouse for analytical ones. That should cover most scenarios well, at least until you encounter something unique enough to justify deeper research.

	ubolonton_ 2 months ago | parent [-]
		I introduced ClickHouse at my company 2 years ago and came to the same conclusion. It seems to have become the dominant storage choice for new observability startups. And the newly introduced JSON type should help it win even harder.

sunng 2 months ago | parent | prev | next [-]

Author here. Thanks @todsacerdoti for posting this.

I am a big fan of keeping the original data and context as much as possible. With previous metrics systems, we lost too much information through pre-aggregation and eventually ran into the high-cardinality metrics issue by overloading the labels. For teams that own hundreds of millions to billions of time series, this o11y 2.0/wide event approach is really worth it. And we are determined to build an open-source database that can deal with the challenges of wide events for users from small teams to large organizations.

Of course, the database is not the only issue. We need full tooling, from instrumentation to data transport. We already have the opentelemetry-arrow project for larger-scale transmission, which may work for wide events. We will continue to work in this ecosystem.

zaptheimpaler 2 months ago | parent | prev | next [-]

At my company we seem to have moved a little in the opposite direction of observability 2.0. We moved away from the paid observability tools to something built on OSS with the usual split between metrics, logs and traces. It seems to be mostly for cost reasons. The sheer amount of observability data you can collect in wide events grows incredibly fast and most of it ends up never being read. It sucks but I imagine most companies do the same over time?

wavemode 2 months ago | parent | next [-]

> The sheer amount of observability data you can collect in wide events grows incredibly fast and most of it ends up never being read.

That just means you have to be smart about retention. You don't need permanent logs of every request that hits your application. (And, even if you do for some reason, archiving logs older than X days to colder, cheaper storage still probably makes sense.)

motorest 2 months ago | parent [-]

> That just means you have to be smart about retention.

It's not a problem of retention. It's a problem caused by the sheer volume of data. Telemetry data must be stored for over N days in order to be useful, and if you decide to track telemetry data of all types involved in "wide events" throughout this period, then you need to make room to persist it. If you're bundling efficient telemetry types like metrics with data-intensive telemetry like logs in events, then the data you need to store quickly adds up.

killme2008 2 months ago | parent [-]

Agree. The new wide event pipeline should fully utilize cheaper storage options, i.e. object storage like S3, for both cold and hot data while maintaining performance.

gchamonlive 2 months ago | parent | next [-]

I'm totally in favor of cold storage. Just beware of how you are storing data, the granularity of the files, and how frequently you think you'd want to access that data in the future, because what kills you in these services is the API cost. Oh, and deleting data also triggers API costs AFAIK, so there is that too...

	thewisenerd 2 months ago | parent [-]
		deleting data has a cost. deleting data early after moving it to cold storage has additional costs.

valyala 2 months ago | parent | prev [-]

HDD-based persistent disks usually have much lower IO latency compared to S3 (milliseconds vs tens to hundreds of milliseconds). This may help improve query performance a lot.

sc1 HDD-based volumes are cheaper than S3, while st1-based volumes are only 2x more expensive than S3 ( https://aws.amazon.com/ebs/pricing/ ). So there is little economic sense in using S3 over HDD-based persistent volumes.
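
For a quick sanity check of that price comparison, here is a small calculation. The per-GB prices are assumptions (roughly the us-east-1 list prices at the time of writing) and should be replaced with current figures from the pricing pages. The sketch covers storage only: it ignores request and transfer charges, and the fact that EBS is billed on provisioned rather than used capacity.

```python
# Assumed list prices in USD per GB-month (us-east-1); check the current
# pricing pages before relying on them.
PRICE_PER_GB_MONTH = {
    "s3_standard": 0.023,
    "ebs_st1": 0.045,
    "ebs_sc1": 0.015,
}


def monthly_cost(gigabytes, tier):
    """Storage-only monthly cost; ignores request, transfer and IOPS charges."""
    return gigabytes * PRICE_PER_GB_MONTH[tier]


def relative_to_s3(tier):
    """How many times more (or less) expensive a tier is than S3 Standard."""
    return PRICE_PER_GB_MONTH[tier] / PRICE_PER_GB_MONTH["s3_standard"]


if __name__ == "__main__":
    for tier in PRICE_PER_GB_MONTH:
        cost = monthly_cost(100_000, tier)          # 100TB
        print(f"{tier}: ${cost:,.0f}/month, {relative_to_s3(tier):.2f}x S3")
```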

NitpickLawyer 2 months ago | parent | prev | next [-]

> The sheer amount of observability data you can collect in wide events grows incredibly fast and most of it ends up never being read.

Yes! I know of at least 3 anecdotal "oh shit" stories w/ teams being chewed by upper management when bills from SaaS observability tools get into hundreds of thousands because of logging. Turns out that uploading a full stack dump on error can lead to TBs of data that, as you said, most likely no-one will look at ever again.

incangold 2 months ago | parent [-]

I agree with the broad point- as an industry we still fail to think of logging as a feature to be specified and tested like everything else. We use logging frameworks to indiscriminately and redundantly dump everything we can think of, instead of adopting a pattern of apps and libraries that produce thoughtful, structured event streams. It’s too easy to just chuck another log.info in; having to consider the type and information content of an event results in lower volumes and higher quality of observability data.

A small nit pick but having loads of data that “most likely no-one will look at ever again” is ok to an extent, for the data that are there to diagnose incidents. It’s not useful most of the time, until it’s really really useful. But it’s a matter of degree, and dumping the same information redundantly is pointless and infuriating.

This is one reason why it’s nice to create readable specs from telemetry, with traces/spans initiated from test drivers and passed through the stack (rather than trying to make natural language executable the way Cucumber does it- that’s a lot of work and complexity for non-production code). Then our observability data get looked at many times before there’s a production incident, in order to diagnose test failures. And hopefully the attributes we added to diagnose tests are also useful for similar diagnostics in prod.

	openWrangler 2 months ago | parent [-]
		I'm currently working with Coroot, which is an open source project trying to create a solution for this issue of logs and other telemetry sources being too much for any team to reasonably have time to parse manually. Data is automatically imported using eBPF and Coroot will provide insights into RCA (with things like mapped incident timeframes) to help with anything overlooked in dumps. GitHub here - hope the tool can help some folks in this thread: https://github.com/coroot/coroot

kushalkamra 2 months ago | parent | prev | next [-]

you’re correct

i believe, we can identify patterns and highlight the variations, so this data can be put to good use.

by aggregating the historical data beyond a certain point, we can also reduce the quantum of it

magic_hamster 2 months ago | parent | prev [-]

Should be easily solved with some kind of retention policy.

awoimbee 2 months ago | parent | prev | next [-]

It looks like what the grafana stack does but it's linking specialized tools instead of building one big tool (eg linking traces [0]).

The only thing then is that there is no link between logs and metrics, but I guess since they created alloy [1] they could make it so logs and metrics labels match, so we could select/see both at once ?

Oh ok here's a blog post from 2020 saying exactly this: https://grafana.com/blog/2020/03/31/how-to-successfully-corr...

[0]: https://grafana.com/docs/grafana/latest/datasources/tempo/tr... [1]: https://grafana.com/docs/alloy/latest/

	killme2008 2 months ago | parent [-]
		Yes, that's the LGTM (Loki, Grafana, Tempo, and Mimir) stack. First, the main issue with this stack is maintenance: managing multiple storage clusters increases complexity and resource consumption. Consolidating resources can improve utilization. Second, differences in APIs (such as query languages) and data models across these systems increase adoption costs for monitoring applications. While Grafana manages these differences, custom applications do not.

nexo-v1 2 months ago | parent | prev | next [-]

This sounds a lot like structured logging with a fresh coat of paint. Wide events are a nice conceptual model, but if you’ve been doing structured logs seriously, especially with something like Loki or the ELK stack, you’re already capturing rich context per event, including things like user info, request paths, even DB queries if needed.

I’ve been using Loki recently and really like the approach: it stores log data in object storage and supports on-the-fly processing and extraction. You can build alerts and dashboards off it without needing to pre-aggregate or force everything into a metrics pipeline.

The real friction in all of these systems is instrumentation. You still need to get that structured event data out of your app code in a consistent way, and that part is rarely seamless unless your runtime or framework does most of it for free. So while wide events are a clean unification model, the dev overhead to emit them with enough fidelity is still very real.

wbh1 2 months ago | parent | next [-]

You are correct — wide events are essentially equivalent to structured logs. Charity Majors says as much in her blog post linked at the top of this article.

> The building block of o11y 2.0 is wide, structured log events

Wide events and structured logs are often used interchangeably. One caveat is that in "wide, structured log events" you're only emitting one [giant] log for each request coming through your service. In contrast, I still see many people using structured logs but in the "old fashioned" way of emitting multiple log lines per request.
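
The difference between "one wide event per request" and "several log lines per request" is easy to show in code. The sketch below is a hypothetical helper, not an API from any logging library: the class name, the handler, and every field name are invented for illustration.

```python
import json
import time


class WideEvent:
    """Accumulates context during a request and emits one JSON line at the end."""

    def __init__(self, **initial):
        self.fields = dict(initial)
        self._start = time.monotonic()

    def add(self, **fields):
        """Attach more context; later values overwrite earlier ones."""
        self.fields.update(fields)

    def emit(self):
        """Return the single structured log line for this request."""
        elapsed_ms = (time.monotonic() - self._start) * 1000
        self.fields["duration_ms"] = round(elapsed_ms, 3)
        return json.dumps(self.fields, sort_keys=True)


def handle_request(path, user_id):
    """Hypothetical handler: every step adds fields instead of logging a line."""
    event = WideEvent(path=path, method="GET")
    event.add(user_id=user_id, plan="free")      # auth step
    event.add(db_rows=3, db_ms=1.8)              # database step
    event.add(cache_hit=False, status=200)       # response step
    return event.emit()


if __name__ == "__main__":
    print(handle_request("/api/orders", user_id=42))
```

In the "old fashioned" style each of the three steps would have written its own log line, and correlating them afterwards would require a shared request ID.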

valyala 2 months ago | parent | prev | next [-]

Loki doesn't work well with structured logs and wide events because it has weak support for log fields with many unique values such as trace_id, span_id, user_id, etc. (aka high-cardinality fields). The recommended way to store structured logs with such fields in Loki is to put them into a big JSON and store it as the log message. Later this JSON must be parsed at query time in order to apply various filters and aggregations on log fields. Such an approach doesn't scale well, since Loki needs to read all the log messages with all the log fields encoded inside JSON log messages during query execution. This requires a lot of additional read IO and CPU for reading, unpacking and parsing the log messages. This also worsens data compression at the storage level, which slows down query execution even more.

The much better approach is to store data per every log field into column-based storage. This significantly improves query performance, since only the data for the requested columns must be read from the storage, and this per-column data usually has much better compression rate, so it occupies less storage space.
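
A toy illustration of the per-field layout, assuming nothing about how any real database lays data out on disk: rows are pivoted into one list per field, and an aggregate is computed by touching a single column. Function names and the sample events are invented for the example.

```python
def to_columns(rows):
    """Pivot a list of event dicts into one list per field (missing values become None)."""
    names = sorted({name for row in rows for name in row})
    return {name: [row.get(name) for row in rows] for name in names}


def percentile(values, q):
    """Nearest-rank percentile of the non-None values; q is an integer in 1..100."""
    data = sorted(v for v in values if v is not None)
    rank = max(1, (q * len(data) + 99) // 100)   # integer ceil(q * n / 100)
    return data[rank - 1]


# Made-up events: 100 requests with durations 1..100 ms.
rows = [
    {"trace_id": f"t{i}", "path": "/api/orders", "duration_ms": i + 1}
    for i in range(100)
]
columns = to_columns(rows)

# Only the duration column is touched; trace_id and path are never read.
p95 = percentile(columns["duration_ms"], 95)

if __name__ == "__main__":
    print("p95 duration:", p95, "ms")
```

In a row-oriented or JSON-per-message layout the same query would have to read and parse every field of every event to get at `duration_ms`.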

rbranson 2 months ago | parent | prev [-]

No, wide events are quasi-relational. Read the literature please. I’ve never seen a structured log implementation that is anything remotely relational, it’s just a bunch of random junk thrown into a pile with maybe some common tags.

esafak 2 months ago | parent [-]

Can you expand? What makes the event relational; consistency of tags?

jiggawatts 2 months ago | parent [-]

Relational would be to not repeat yourself — written by jiggawatts on Hacker News, 2025 May.

It causes write amplification and slow queries.

Some would argue that it’s too hard to solve, but those same people are ingesting a petabyte of logs per hour, which is a bigger problem if you ask me.

	polynomial 2 months ago | parent [-]
		This seems related to normalization, just going by your example.

fuzzy2 2 months ago | parent | prev | next [-]

This article leaves me confused. The “wide event” example presented is a mishmash of all the different concerns involved with a business operation: HTTP request, SQL query, business objects, caches, …. How is this any better than collecting most of this information as separate events on a technical level (with minimal, if any, code changes: interceptors, middleware etc) and then aggregating afterwards?

From my perspective, this is just structured logging. It doesn’t cover tracing and metrics, at all.

> This process requires no code changes—metric are derived directly from the raw event data through queries, eliminating the need for pre-aggregation or prior instrumentation.

“requires no code changes”? Well certainly, because by the time you send events like that your code has already bent over backwards to enable them.

Surely I must be missing something.

sunng 2 months ago | parent [-]

Yes, this is a common point of confusion between structured logging and wide events. The Wide Event 101 article I referenced has a clear explanation:

> Structured logs could be wide events, but not all structured logs are wide events. A structured log with 5 fields is not a wide event. A structured log with no context is not a wide event.

And this is also why it requires no code changes to extract more metrics from wide events. The context can carry enough information, and you just write a new query to retrieve it. With current metrics tooling, you would have to make a code change to define new labels or add new metrics for that.

lnenad 2 months ago | parent [-]

> And this is also why it requires no code changes to extract more metrics from wide events.

I think the point of OP's comment is that while you're not paying a code tax to parse/aggregate the data, as it's all in one place, you're paying a code tax for actually generating the event with everything in it.

	sunng 2 months ago | parent [-]
		Sure, you still need to code, but instead of defining concrete metrics one by one, you instrument the context and the state. The OpenTelemetry trace API can save you a lot of work. But I agree there is still potential to improve auto-instrumentation.

algorithmsRcool 2 months ago | parent | prev | next [-]

If I understand correctly the "big idea" here is:

1. Juice up your traces with every attribute possible.

2. Use a telemetry backend that relies on cheap object storage so that your costs don't explode.

3. ...profit?

Ok, but now we are exporting and storing everything about every request just so we can derive some previously cheap metrics like server CPU consumption? I guess for most applications the overhead of buffering, formatting and sending all of this telemetry data doesn't matter for folks?

jiggawatts 2 months ago | parent [-]

Yes, it is absurdly expensive no matter what the marketing says. It’s only “cheap” if you’re setting VC cash on fire.

The benefit is that you can retroactively extract reports filtered with very complex predicates.

Sure, aggregated metrics are cheap and efficient, but trivial metrics like CPU usage just tell you that there is a problem, not what the problem is. If you need to “deep dive”, you can’t, not without a Time Machine to go back and configure a filtered metric looking for the specific info you need.

Most sysadmins at this point would just configure a new filtered metric and start collecting data… for a month. While the system is broken. Wrong needle? Start looking through the haystack again with another new custom metric for another month.

As a random example, many systems will track 5xx errors per minute. Great, but are those timeouts or instant failures? I want to group 5xx errors per time bucket! Are those correlated by app release version? By server memory free bytes? By instance? Kernel version? Etc…

Wide events let you do all those and more, trivially and quickly: seconds instead of months.
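
As an illustration of the 5xx example above, here is a small sketch that answers those questions after the fact from raw events. The events, their field names, and the 10-second timeout threshold are invented for the example. A real backend would run the equivalent as a query rather than a Python loop.

```python
from collections import Counter

# Hypothetical wide events; field names are illustrative only.
events = [
    {"ts": 0,  "status": 500, "duration_ms": 30000, "app_version": "1.4.2", "instance": "a"},
    {"ts": 12, "status": 200, "duration_ms": 35,    "app_version": "1.4.1", "instance": "b"},
    {"ts": 61, "status": 503, "duration_ms": 4,     "app_version": "1.4.2", "instance": "a"},
    {"ts": 75, "status": 502, "duration_ms": 30000, "app_version": "1.4.2", "instance": "c"},
    {"ts": 90, "status": 500, "duration_ms": 6,     "app_version": "1.4.1", "instance": "b"},
]


def count_5xx(events, group_by, bucket_seconds=60, timeout_ms=10_000):
    """Count 5xx events per (time bucket, failure kind, group_by value)."""
    counts = Counter()
    for e in events:
        if 500 <= e["status"] < 600:
            bucket = e["ts"] // bucket_seconds * bucket_seconds
            kind = "timeout" if e["duration_ms"] >= timeout_ms else "instant"
            counts[(bucket, kind, e[group_by])] += 1
    return counts


# The same raw events answer two different questions without re-instrumenting.
by_version = count_5xx(events, "app_version")
by_instance = count_5xx(events, "instance")

if __name__ == "__main__":
    for key, n in sorted(by_version.items()):
        print(key, n)
```

Grouping by kernel version or free memory would only need those fields to be present on the events, which is where the cost discussed in this thread comes from.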

The downside is the cost.

algorithmsRcool 2 months ago | parent [-]

> Most sysadmins at this point would just configure a new filtered metric and start collecting data… for a month. While the system is broken. Wrong needle? Start looking through the haystack again with another new custom metric for another month.

In this example I feel like it is treating metrics as the only telemetry signal that operators have access to. Once the metrics indicate an issue, we can pull existing logs, traces and profiles to dig into it and eventually capture dumps.

I'm totally onboard with the idea of rich trace metadata, but it seems more evolutionary than revolutionary

jiggawatts 2 months ago | parent [-]

If the logs and traces contain enough info to reproduce the metric, then you don't need a separate metric! That's basically the point here: you can derive arbitrary metrics from wide logs.

valyala 2 months ago | parent [-]

You can't derive system metrics such as the usage of CPU, RAM, disk IO, disk space and network, from wide events.

	algorithmsRcool 2 months ago | parent [-]
		Well, I could just enrich the trace/event with sample data from CPU, RAM, Disk I/O, etc...
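
One way that enrichment could look, as a sketch only: the function names and the `sys.` prefix are invented, and the two readings used here (process CPU time and the 1-minute load average) are just the ones the Python standard library exposes without extra dependencies. RAM and disk I/O would need platform-specific sources that are left out.

```python
import os
import time


def system_sample():
    """Collect a few cheap host/process readings; unavailable fields are skipped."""
    sample = {"proc_cpu_seconds": time.process_time()}
    try:
        load1, _, _ = os.getloadavg()      # not available on every platform
        sample["load_avg_1m"] = load1
    except (AttributeError, OSError):
        pass
    return sample


def enrich(event):
    """Return a copy of the event with system readings under a 'sys.' prefix."""
    enriched = dict(event)
    for key, value in system_sample().items():
        enriched[f"sys.{key}"] = value
    return enriched


if __name__ == "__main__":
    print(enrich({"path": "/api/orders", "status": 200}))
```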

teleforce 2 months ago | parent | prev | next [-]

> We believe raw data based approach will transform how we use observability data and extract value from it.

Perhaps we need a generic database framework that properly and seamlessly caters for both raw and cooked (processed) data for observability, something similar to D4M [1].

[1] D4M: Dynamic Distributed Dimensional Data Model:

https://www.mit.edu/~kepner/D4M/

	killme2008 2 months ago | parent [-]
		The extraction of raw data is the cooking or processing, and the results are ingested back into the same database. I think it's the approach described in this article.

Drahflow 2 months ago | parent | prev | next [-]

The point that the trinity of logs, metrics and traces wastes a lot of engineering effort on pre-selecting the right metrics (and labels), and wastes storage (by keeping much of the information in triplicate), is a good one.

> We believe raw data based approach will transform how we use observability data and extract value from it.

Yep. We have built quuxLogging on the same premise, but with more emphasis on "raw": instead of parsing events (wide or not), we treat the data fundamentally as a very large set of (usually text) lines and have optimized hard for the querying-lots-of-text part. Basically a horizontally scaled (extremely fast) regex engine with data aggregation support.

Having a decent way to get metrics from logs ad-hoc completely solves the metric cardinality explosion.
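
A single-process sketch of what "regex plus aggregation over raw lines" means in practice. The log format, the pattern, and the function name are assumptions made up for this example. It says nothing about how quuxLogging is implemented or scaled.

```python
import re
from collections import defaultdict

# Made-up raw lines; the format is an assumption for this example.
LINES = [
    "2025-05-01T12:00:01Z GET /api/orders 200 12ms",
    "2025-05-01T12:00:02Z GET /api/orders 500 30ms",
    "2025-05-01T12:00:02Z POST /api/login 200 48ms",
    "2025-05-01T12:00:03Z GET /api/orders 200 18ms",
    "this line does not match and is skipped",
]

PATTERN = re.compile(
    r"(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<ms>\d+)ms$"
)


def avg_duration_by(lines, key):
    """Scan raw lines and return the mean duration in ms per value of `key`."""
    totals = defaultdict(lambda: [0, 0])    # key value -> [sum_ms, count]
    for line in lines:
        match = PATTERN.search(line)
        if match:
            bucket = totals[match.group(key)]
            bucket[0] += int(match.group("ms"))
            bucket[1] += 1
    return {value: total / count for value, (total, count) in totals.items()}


if __name__ == "__main__":
    print(avg_duration_by(LINES, "path"))
    print(avg_duration_by(LINES, "status"))
```

The grouping key is chosen at query time, so a new "metric" is a new query rather than a new label in the application.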

wvh 2 months ago | parent | next [-]

Many companies are having trouble even keeping Prometheus running without it getting OOM killed, though.

I understand and agree with the problem this is trying to solve; but the solution will rival the actual business software it is observing in cost and resource usage. And hence, just like in quantum mechanics, observing it will drastically impact the event.

Drahflow 2 months ago | parent | next [-]

> in cost and resource usage

Nah, it's fine. Storage of raw logs is pretty cheap (and I think this is widely assumed). For querying, two problems arise:

1. Query latency, i.e. we need enough CPUs to quickly return a result. This is solved by horizontal scaling. All the idle time can be amortized across customers in the SaaS setting (not everyone is looking at the same time).

2. Query cost, i.e. the total amount of CPU time (and other resources) spent per data scanned must be reasonable. This ultimately depends on the speed of the regex engine. We're currently at $0.05/TB scanned. And metric queries on multi-TB datasets can usually be sampled without impacting result quality much.
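
To put rough numbers on point 2, here is a sketch of the cost arithmetic and of answering a metric query from a sample. Only the $0.05/TB figure comes from the comment. The function names, the dataset, and the assumption that sampling reduces the bytes scanned proportionally are made up for illustration.

```python
import random

PRICE_PER_TB_SCANNED = 0.05    # USD, the figure quoted above


def scan_cost(terabytes, sample_fraction=1.0):
    """Cost of scanning a dataset, optionally reading only a random fraction of it."""
    return terabytes * sample_fraction * PRICE_PER_TB_SCANNED


def sampled_mean(values, sample_fraction, seed=0):
    """Estimate the mean from a uniform random sample of the values."""
    rng = random.Random(seed)
    k = max(1, int(len(values) * sample_fraction))
    sample = rng.sample(values, k)
    return sum(sample) / len(sample)


if __name__ == "__main__":
    durations = [10 + (i % 100) for i in range(100_000)]   # synthetic data
    exact = sum(durations) / len(durations)
    estimate = sampled_mean(durations, 0.01)
    print(f"exact mean {exact:.2f}, 1% sample estimate {estimate:.2f}")
    print(f"full scan of 40TB: ${scan_cost(40):.2f}")
    print(f"1% sample of 40TB: ${scan_cost(40, 0.01):.4f}")
```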

	wvh 2 months ago | parent [-]
		It's not the storage cost; it's the computational load (memory, CPU, sometimes network) of gathering thousands and thousands of metrics by default, most of which go unused.

thewisenerd 2 months ago | parent | prev [-]

> observing it will drastically impact the event

this presumes 'metrics' are 'cheaper' than 'traces' / observability 2.0 from a setup standpoint; purely from an implementation perspective?

	wvh 2 months ago | parent [-]
		Wide events seem like they would require more memory and CPU to combine and more bandwidth due to size. I've implemented services with loggers that gather data and statistics and write out just one combined log line at the end. It's certainly more economical in regard to dev time; I'm not sure how "one large" compares to "many small" in reality resource-wise.

thewisenerd 2 months ago | parent | prev [-]

> having a decent way to get metrics from logs ad-hoc completely solves the metric cardinality explosion.

last i checked, the span metrics connector[1] was supposed to "solve" this in otel; but i'm not particularly inclined, as configurations are fixed.

any data analytics platform worth its money should be able to do this at runtime (for specified data volume constraints, in reasonable time).

in general, structured logging should also help with this; as much as i love regex, i do not think extracting "data" from raw logs is lossless.

[1] https://github.com/open-telemetry/opentelemetry-collector-co...

the_duke 2 months ago | parent | prev | next [-]

There are a whole bunch of attempts to unify metrics, logs and traces into a single DB now.

* InfluxDB (the newest Rust rewrite)

* http://openobserve.ai/

* https://uptrace.dev/

* Clickhouse powered solutions (eg https://signoz.io)

* ... ?

I'm quite skeptical about the "store raw data" approach. It makes querying much more complex and slower, storage much more expensive, etc.

Columnar databases that can store the data very efficiently are the way to go, IMO. They can still benefit from cheap long-term storage like S3.

	pas 2 months ago | parent [-]
		In the article, the "Materialized view for data derivation" part does the heavy lifting. I assume this means they are creating time series (indices) on the fly (with eventual backfill of the data). For the "exploratory analytics", the techniques developed for Dremel/Drill/Impala [0] are sufficient, and for anything else raw data crunching speeds are really impressive nowadays. (And they claim they can ingest 1B JSON records in ~10-30 seconds [1].)

		[0] https://en.wikipedia.org/wiki/Dremel_(software)
		[1] https://greptime.com/blogs/2025-03-18-jsonbench-greptimedb-p...
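
To make "derive a time series on the fly, with eventual backfill" concrete, here is a toy in-memory version. It is a guess at the general idea only, not how GreptimeDB or any other database implements materialized views. The class and field names are invented.

```python
from collections import defaultdict


def _empty_row():
    return {"requests": 0, "errors": 0}


class MinuteRollup:
    """A toy 'materialized view': per-minute request and error counts,
    kept up to date as raw events are ingested."""

    def __init__(self):
        self.raw = []                          # raw events stay queryable
        self.view = defaultdict(_empty_row)    # minute start (s) -> counters

    @staticmethod
    def _apply(view, event):
        row = view[event["ts"] // 60 * 60]
        row["requests"] += 1
        row["errors"] += 1 if event["status"] >= 500 else 0

    def ingest(self, event):
        """Store the raw event and update the derived row for its minute."""
        self.raw.append(event)
        self._apply(self.view, event)

    def backfill(self):
        """Rebuild the view from raw data, e.g. after the view definition changes."""
        rebuilt = defaultdict(_empty_row)
        for event in self.raw:
            self._apply(rebuilt, event)
        self.view = rebuilt
        return rebuilt


if __name__ == "__main__":
    rollup = MinuteRollup()
    for ts, status in [(5, 200), (42, 503), (70, 200)]:
        rollup.ingest({"ts": ts, "status": status})
    print(dict(rollup.view))
```

Dashboards would read the small derived table, while exploratory queries still have the raw events to fall back on.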

▲

PeterZaitsev 2 months ago | parent | prev | next [-]

I'm not sure the whole "Observability 2.0" idea focuses on the most pressing problem, which is complexity, both in deployment and usage. At Coroot we focus not on the format of the data and whether it should be one thing or several "pillars", but rather on how easy it is to install and to get value from it by resolving issues quickly when you need to. https://coroot.com/

▲

agounaris 2 months ago | parent | prev | next [-]

This model seems super expensive! I interpret it as traces on steroids, which will make queries complex and slow!

A lot of businesses haven't even nailed simple histograms with prometheus. I wouldn't like observability to become a full set of problems on its own!

Also, time series are powerful in observability because a lot of issues can be represented as cheap counters, gauges and distributions. I want to see a paradigm complementary to this simple principle instead of producing nested documents with nested objects.
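
Why distributions are cheap as time series, in a minimal sketch: a fixed-bucket histogram keeps one integer per bucket no matter how many observations it sees. The `Histogram` class and the bucket bounds are illustrative and not the Prometheus client API.

```python
import bisect


class Histogram:
    """Fixed-bucket histogram: memory use depends on bucket count only."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One slot per upper bound, plus a final overflow slot.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0

    def observe(self, value):
        # bisect_left finds the first bound that is >= value.
        index = bisect.bisect_left(self.bounds, value)
        self.counts[index] += 1
        self.total += value


latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.1, 0.3, 2.0):
    latency.observe(seconds)
```

A wide event, by contrast, stores every observation in full, which is the cost trade-off raised above.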

▲

arkh 2 months ago | parent | prev | next [-]

After reading this post I'm left wondering: you want to capture events. You want to have different views of them. Why don't you use Kafka and create a consumer per "view"?

	▲	killme2008 2 months ago \| parent \| next [-]
		That's a good question. First of all, Kafka is still an event streaming platform and lacks database capabilities such as indexing and query optimization. Although ksql/Kafka Streams can perform computations based on consuming data, they require repeatedly pulling data, and there are no technologies like indexing to accelerate queries. Secondly, dashboards and alerts in monitoring scenarios require a large number of views—these are the “known unknowns”. When dealing with “unknown unknowns” during exploration, it’s necessary to create views dynamically, which may result in a significant increase in the number of views. I’m not sure whether Kafka can handle such situations. Because monitoring requires greater real-time performance, it’s difficult to tolerate delays.
	▲	v0y4g3r 2 months ago \| parent \| prev [-]
		[dead]
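
A rough in-memory simulation of the "consumer per view" idea and the cost raised in the reply above. This is plain Python, not the Kafka client API; `EventLog` and `ViewConsumer` are made-up names. Each view has to read every event from the log, because there is no index to skip non-matching records.

```python
class EventLog:
    """Append-only list standing in for a single topic partition."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset):
        return self._events[offset:]


class ViewConsumer:
    """One consumer per view, each tracking its own offset into the log."""

    def __init__(self, log, predicate):
        self.log = log
        self.predicate = predicate
        self.offset = 0
        self.scanned = 0  # how many events this view had to pull
        self.matches = 0  # how many of them it actually cared about

    def poll(self):
        batch = self.log.read_from(self.offset)
        self.offset += len(batch)
        self.scanned += len(batch)
        self.matches += sum(1 for event in batch if self.predicate(event))


log = EventLog()
for status in (200, 200, 500, 404):
    log.append({"status": status})

errors_view = ViewConsumer(log, lambda event: event["status"] >= 500)
not_found_view = ViewConsumer(log, lambda event: event["status"] == 404)
for view in (errors_view, not_found_view):
    view.poll()
```

Total work grows with the number of views times the number of events, which is the concern for ad hoc exploration where views are created dynamically.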

▲

ralphael 2 months ago | parent | prev | next [-]

Thanks to all who commented, I learnt a lot from just reading the comments.

▲

QuiCasseRien 2 months ago | parent | prev [-]

just one word : uptrace, https://uptrace.dev/

a very satisfied user here: traces, metrics and logs, all handled well

▲

remram 2 months ago | parent | next [-]

Maybe add some more words, what is good about it? Looks just like ELK.

	▲	QuiCasseRien 2 months ago \| parent [-]
		I discovered it 2 years ago. It's open source, and I currently use the SaaS offer: $30 for 300 GB and 30k time series. The underlying database is ClickHouse; nearly all charts/tables render incredibly fast, even though 180 GB of traces/logs/metrics sit in a 6-week rolling window. I like the UI and how you can create or copy things; everything is YAML based. Unlike Grafana, the user experience is simple, and it also offers templates for the most used metrics (host, service...). It's one of the few to handle the three pillars (trace, log, metric) in the same product. Monitors and alerts are well designed and easy to set up. Try it and adopt it. I am absolutely not related to Uptrace, but their product solves a big problem for me. It's my second most critical daily tool, just after OneDev (onedev.io) for dev.

▲

piterrro 2 months ago | parent | prev [-]

hosted or on-prem version? How much data per month with what retention? What's the final bill?

	▲	QuiCasseRien 2 months ago \| parent [-]
		Both. It's open source, and there is a full-featured docker-compose setup to host it on your own. Or you can use their hosting service, which is highly available and well priced: $30/month for our 380 GB + 30k time series. Perfect.