fuzzybear3965 2 hours ago
Sure, but in principle, looking at the paper, I'd expect it to outperform B-trees, since write amplification is generally reduced. Are you thinking about cases requiring ordering of writes to a given record (lock contention)?
koverstreet 2 hours ago
I think their claims of write amplification reduction are a bit overstated for more realistic workloads. It's true that b-trees aren't ideal in that respect, and you will see some amount of write amplification, but in my experience not enough that it should be a major consideration.

You really have to take working-set size and cache size into account to make any judgements there; your b-tree writes should be driven by journal/WAL reclaim, which will buffer up updates. A purely random update workload will kill a conventional b-tree on write amplification - like I mentioned, that's the absolute worst case scenario for a b-tree. But it just doesn't happen in the real world.

For the data I can give you, that would be bcachefs's hybrid b-tree: large btree nodes (256k, typically) which are internally log structured; I would consider it a minor variation on a classical b-tree. The log structuring means that we can incrementally write only the dirty keys in a node, at the cost of some compaction overhead (drastically less than a conventional LSM).

In actual real world usage, when I've looked at the numbers (not recently, so this may have changed), we're always able to do giant, highly efficient b-tree writes - the journal and in-memory cache batch things up as much as we want - which means write amplification is negligible.
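To make "internally log structured" concrete, here's a toy sketch of the idea - my own illustration, not actual bcachefs code, and every name and size in it is made up:

    /* Toy sketch of an internally log-structured b-tree node:
     * a sorted base set plus an append-only log of recent updates.
     * Not bcachefs code - names and sizes here are invented. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define BASE_MAX 4096           /* keys in the sorted base set */
    #define LOG_MAX  256            /* dirty keys buffered per node */

    struct kv { uint64_t key, val; };

    struct ls_node {
        struct kv base[BASE_MAX];   /* sorted, written long ago */
        size_t    base_nr;
        struct kv log[LOG_MAX];     /* unsorted, appended as keys dirty */
        size_t    log_nr;           /* == number of dirty keys */
    };

    /* An update only touches the log: writeback appends just these
     * log_nr entries to the node on disk, not the whole node. */
    static int ls_insert(struct ls_node *n, uint64_t key, uint64_t val)
    {
        if (n->log_nr == LOG_MAX)
            return -1;              /* caller compacts, then retries */
        n->log[n->log_nr++] = (struct kv){ key, val };
        return 0;
    }

    static int kv_cmp(const void *l, const void *r)
    {
        const struct kv *a = l, *b = r;
        return a->key < b->key ? -1 : a->key > b->key;
    }

    /* Reads check the log newest-first (it holds the latest versions),
     * then binary search the sorted base. */
    static struct kv *ls_lookup(struct ls_node *n, uint64_t key)
    {
        for (size_t i = n->log_nr; i-- > 0; )
            if (n->log[i].key == key)
                return &n->log[i];
        struct kv k = { key, 0 };
        return bsearch(&k, n->base, n->base_nr, sizeof k, kv_cmp);
    }

    /* The compaction overhead mentioned above: fold the log into the
     * base and rewrite the node in one large sequential write.
     * Simplified - real code would merge in place, drop superseded
     * versions of a key, and split the node if the base is full. */
    static void ls_compact(struct ls_node *n)
    {
        memcpy(&n->base[n->base_nr], n->log,
               n->log_nr * sizeof(struct kv));
        n->base_nr += n->log_nr;
        n->log_nr = 0;
        qsort(n->base, n->base_nr, sizeof(struct kv), kv_cmp);
    }

The trade-off is visible right in the sketch: writeback touches only log_nr entries instead of the whole node, reads pay a small scan of the log, and compaction amortizes the re-sort over many updates.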