| ▲ | koverstreet 2 hours ago | |||||||
I think their claims of write amplification reduction are a bit overstated given more realistic workloads. It is true that b-trees aren't ideal in that respect, and you will see some amount of write amplification, but not enough that it should be a major consideration, in my experience You really have to take into account workingset size and cache size to make any judgements there; your b-tree writes should be given by journal/WAL reclaim, which will buffer up updates. A purely random update workload will kill a conventional b-tree on write amplification - like I mentioned, that's the absolute worst case scenario for a b-tree. But it just doesn't happen in the real world. For the data I can give you, that would be bcachefs's hybrid b-tree - large btree nodes (256k, typically) which are internally log structured; I would consider it a minor variation on a classical b-tree. The log structuring mean that we can incrementally write only the dirty keys in a node, at the cost of some compaction overhead (drastically less than a conventional LSM). In actual real world usage, when I've looked at the numbers (not recently, so this may have changed) we're always able to do giant highly efficient b-tree writes - the journal and in-memory cache are batching things up as much as we want - which means write amplification is negligible. | ||||||||
| ▲ | fuzzybear3965 2 hours ago | parent [-] | |||||||
Of course mileage may vary with different workloads, but are there any good benchmarks/suites to use for comparison in cases like these? They used YCSB but I don't know if those workloads ([1]) are relevant to modern/typical access patterns nor if they're applicable to SQL databases. You thinking about running some benchmarks in a bcachefs branch (:pray:)? I want to see this data structure prototyped in PostgreSQL. [1]: https://github.com/brianfrankcooper/YCSB/tree/master/workloa... | ||||||||
| ||||||||