Show HN: Okapi – a metrics engine based on open data formats (github.com)
12 points by kushal2048 2 days ago | 5 comments
Hi all, I wanted to share an early preview of Okapi, an in-memory metrics engine that also integrates with existing data lakes.

Modern software systems produce a mammoth amount of telemetry. We can debate whether all of it is necessary, but we can agree that it happens. Most metrics engines today store data in proprietary formats and don't separate storage from compute. Okapi changes that by using open data formats and integrating with existing data lakes. This makes it possible to use standard OLAP tools like Snowflake, Databricks, DuckDB, or even Jupyter/Polars to run analysis workflows (such as anomaly detection), and it avoids vendor lock-in in two ways: you can bring your own workflows, and the compute engine is swappable. Disaggregation also reduces the ops burden of maintaining your own storage, and compute can be scaled up and down on demand.

Not all data can live in a data lake or object store, though; that doesn't work for recent data. To keep real-time queries fast, Okapi first writes all metrics to an in-memory store, and reads on recent data are served from there. Metrics are rolled up as they arrive, which eases memory pressure. They are held in memory for a configurable retention period, after which they are shipped out to object storage / the data lake (currently only Parquet export is supported). This gives fast reads on recent data while offloading query processing for older data.

In benchmarks, queries on in-memory data finish in under a millisecond, with write throughput of ~280k samples per second. On a real deployment there'd be network delays, so YMMV.

Okapi is still early — feedback, critiques, and contributions are welcome. Cheers!
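To give a feel for the "bring your own workflow" part, here's a rough sketch of reading the Parquet export with Polars. The export path and the column names (name, ts, value) are illustrative placeholders, not the actual schema:

    import polars as pl

    # Hypothetical export path and columns; adjust to whatever Okapi writes out.
    hourly = (
        pl.scan_parquet("okapi_export/*.parquet")
          .filter(pl.col("name") == "http_request_latency_ms")
          .group_by(pl.col("ts").dt.truncate("1h").alias("hour"))
          .agg(
              pl.col("value").mean().alias("mean"),
              pl.col("value").quantile(0.99).alias("p99"),
          )
          .sort("hour")
          .collect()
    )
    print(hourly)

The same files could just as easily be pointed at DuckDB, Snowflake, or Databricks, which is the whole point of keeping the storage format open.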
laxmansharma a day ago
Really like the direction: hot data in memory for dashboards, cold data in Parquet so you can use normal lake/OLAP tools and avoid lock-in. The streaming rollups + open formats story makes sense for cost and flexibility.

A few focused questions to understand how it behaves in the real world:

Core design & reliability

- What protects recent/hot in-memory data if a node dies? Is there a write-ahead log (on disk, or an external log like Kafka) or replication, or do we accept some loss between snapshots?
- How do sharding and failover work? If a shard is down, can reads fan out to replicas, and how are writes handled?
- When memory gets tight, what's the backpressure plan: slow senders, drop data, or something smarter?
- How do you handle late or out-of-order samples after a rollup/export? Can you backfill/compact Parquet to fix history, or are there plans to?

Queries

- Are there plans for data models and different metric types (gauge, counter, etc.) in the hot in-memory store?

Performance & sizing

- The sub-ms reads are great. Is there a Linux version of the performance reports so it's easier to compare with other products?
- Along with the throughput/latency details I found on GitHub, can you share the memory/CPU overhead and GC details for the benchmarks?
- What is the rough recommended RAM/CPU sizing for different ingestion rates, in terms of bytes per sample or a traffic estimate?

Lake/Parquet details

- Considering most people already run something like Prometheus at this point, will Okapi offer an easier migration strategy?
- Will Okapi be able to serve a single query across hot (memory) + cold (Parquet) seamlessly, or should older ranges be pushed entirely to the lake side and analyzed through OLAP systems? (I've sketched what I mean at the end of this comment.)

Ops & security

- Snapshots can slow ingest; are those pauses tunable and bounded?
- Are there any metrics/alerts for export lag, memory pressure, or cardinality spikes?
- A couple of end-to-end query examples and a Helm chart/Terraform module would make trials much easier.
- Is there any monitoring and observability for Okapi itself, or plans for it?

Overall: promising approach with a strong cost/flexibility angle. If you share Linux + concurrency benchmarks, ingest compatibility, and lake table format plans (Iceberg/Delta), I think a lot of folks here will try it out.
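To make the hot + cold question concrete, this is the kind of pattern I'd want to be able to express. It is not Okapi's API, just DuckDB over a made-up in-memory Arrow table of recent samples unioned with a hypothetical Parquet export:

    import duckdb
    import pyarrow as pa
    from datetime import datetime, timedelta

    # Pretend "hot" tier: the last few samples, held in memory as an Arrow table.
    now = datetime.utcnow()
    hot = pa.table({
        "name": ["http_requests_total"] * 3,
        "ts": [now - timedelta(seconds=20), now - timedelta(seconds=10), now],
        "value": [41.0, 42.0, 43.0],
    })

    con = duckdb.connect()
    con.register("hot_metrics", hot)

    # Pretend "cold" tier: older data already exported to Parquet (path is a placeholder).
    rows = con.execute("""
        SELECT name, date_trunc('minute', ts) AS minute, avg(value) AS avg_value
        FROM (
            SELECT name, ts, value FROM hot_metrics
            UNION ALL
            SELECT name, ts, value FROM read_parquet('okapi_export/*.parquet')
        )
        GROUP BY 1, 2
        ORDER BY minute
    """).fetchall()

If Okapi did this fan-out internally, dashboards could stay pointed at one endpoint; if not, users would have to stitch the tiers together themselves like this.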
shaheeraslam a day ago
This is amazing work, man! Can we set up a demo with you guys?
|