Show HN: OnPair – String compression with fast random access (Rust, C++) (github.com)
5 points by gargiulof 5 days ago

I’ve been working on a compression algorithm for fast random access to individual strings in large collections.

The problem came up when working with large in-memory database columns (emails, URLs, product titles, etc.), where low-latency point queries are essential. With short strings, LZ77-based compressors don’t perform well. Block compression helps, but block size forces a trade-off between ratio and access speed.

Some existing options:

- BPE: good ratios, but slow and memory-heavy
- FSST (discussed here: https://news.ycombinator.com/item?id=41489047): very fast, but weaker compression

This solution provides an interesting balance (more details in the paper):

- Compression ratio: similar to BPE
- Compression speed: 100–200 MiB/s
- Decompression speed: 6–7 GiB/s

I’d love to hear your thoughts, whether it’s workloads you think this could help with, ideas for API improvements, or just general discussion. Always happy to chat here on HN or by email.

---

Resources:

- Paper: https://arxiv.org/pdf/2508.02280
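
P.S. For anyone who wants to see the block-size trade-off mentioned above in concrete terms, here is a rough sketch. It uses Python's zlib as a stand-in general-purpose compressor, not OnPair, and the sample data and helper names (compress_blocks, get, compressed_size) are made up for illustration. Larger blocks compress better, but every point lookup has to decompress the whole block that holds the string.

```python
import zlib

def compress_blocks(strings, block_size):
    """Compress byte strings in groups of block_size; returns a list of compressed blocks."""
    blocks = []
    for i in range(0, len(strings), block_size):
        chunk = strings[i:i + block_size]
        # Join with a NUL separator (assumes the strings contain no NUL byte).
        blocks.append(zlib.compress(b"\0".join(chunk)))
    return blocks

def get(blocks, block_size, index):
    """Point query: decompress the whole block holding `index`, then pick one string."""
    block = zlib.decompress(blocks[index // block_size])
    return block.split(b"\0")[index % block_size]

def compressed_size(blocks):
    """Total compressed payload in bytes (ignores any index overhead)."""
    return sum(len(b) for b in blocks)

# Hypothetical sample column: short, similar strings such as email addresses.
emails = [f"user{i}@example.com".encode() for i in range(10_000)]

small = compress_blocks(emails, 1)      # one string per block: cheap access, poor ratio
large = compress_blocks(emails, 1000)   # 1000 strings per block: better ratio, costlier access

print("block_size=1:   ", compressed_size(small), "bytes")
print("block_size=1000:", compressed_size(large), "bytes")
print(get(large, 1000, 1234))  # decompresses 1000 strings to return one
```

With one string per block, each short string is compressed on its own and gains nothing from its neighbours. With 1000 strings per block the total size drops, but a single lookup pays for decompressing all 1000.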