| ▲ | ozgrakkurt 2 days ago | |
Some more links that are inside the article: - More info about skiplists: https://arxiv.org/pdf/2403.04582 - Performance comparison with B-tree ?: https://db.cs.cmu.edu/papers/2018/mod342-wangA.pdf - Other blog from Anthithesis about writing their own db: https://antithesis.com/blog/2025/testing_pangolin/ Also I find it a bit hard to understand the performance outcome of this setup. I know formats like parquet and databases like ClickHouse work better when duplicating data instead of doing joins. I guess BigQuery is similar. The article is great but would be also interesting to learn how performance actually worked out with this. | ||
| ▲ | nz 2 days ago | parent [-] | |
Back in 2014, I did an analysis of (single threaded) CPU-efficiency and RAM-efficiency of various data-structures (skiplists, slablists, avl-trees, rb-trees, b-trees): https://nickziv.wordpress.com/wp-content/uploads/2014/02/vis... I used whatever I could find on the internet at the time, so the comparison compares both algorithm and implementation (they were all written in C, but even slight changes to the C code can change performance -- uuavl performs much better than all other avl variants, for example). I suspect that a differently-programmed skip-list would not have performed quite so poorly. The general conclusion from all this, is that any data-structure that can organize itself _around_ page-sizes and cache-sizes, will perform very well compared to structures that cannot. | ||