▲ | latency-guy2 7 months ago |
Well, tombstoning is fundamentally punting the operation: the data is still there, taking up space and costing computation, if the flagged entries don't get filtered out at the various levels of the query plan. I agree that it meets the requirements for batched DELETE, and that's likely the best we can make it. But I wonder if there was a better way. I know there are research DBs out there that experimented with reusing tombstone entries for new INSERT/UPDATE operations, but these suck when you want to do a batched INSERT/UPDATE over a range, since the reused slots are scattered all about the table and you lose the ordering + monotonic properties.
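A minimal sketch of the "punting" point above, in toy Python rather than any real engine's API (TOMBSTONE, store, delete, and get are all hypothetical names):

    # Hypothetical sketch: a tombstoned delete writes a marker instead of
    # reclaiming space, so the dead entry still occupies storage and every
    # read still has to check for it until some later cleanup pass runs.
    TOMBSTONE = object()

    store = {"a": 1, "b": 2, "c": 3}

    def delete(key):
        store[key] = TOMBSTONE  # punt: mark the entry dead, reclaim nothing

    def get(key):
        v = store.get(key)
        return None if v is TOMBSTONE else v  # every read pays this check

    delete("b")
    assert get("b") is None
    assert len(store) == 3  # "b" is gone logically but still takes up space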
▲ | mike_hearn 7 months ago | parent |
The way tombstones work in a sorted KV store like RocksDB is that queries walk the levels from newest to oldest, and the moment a tombstone is hit the walk stops, because the key is known to be deleted. Then, when the levels are compacted, the live data is evacuated into new SST files, so it's like generational GC: the cost scales with the live data, not with how much data there is in total. The problem, of course, is that you pay that cost over and over again.
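A toy Python sketch of the read path and compaction described above, assuming a simplified LSM with levels ordered newest-first; this illustrates the idea only and is not RocksDB's actual API:

    TOMBSTONE = object()

    levels = [
        {"k1": TOMBSTONE},            # newest level: k1 was deleted
        {"k1": "old", "k2": "live"},  # older level still holds the dead bytes
    ]

    def get(key):
        # Walk the levels from newest to oldest; the first hit wins, and a
        # tombstone stops the walk because the key is known to be deleted.
        for level in levels:
            if key in level:
                v = level[key]
                return None if v is TOMBSTONE else v
        return None

    def compact(newer, older):
        # Newer entries shadow older ones. Live data is copied into the new
        # level; tombstones can be dropped once there is nothing below that
        # could resurrect the key (here: compacting into the bottom level).
        merged = {**older, **newer}
        return {k: v for k, v in merged.items() if v is not TOMBSTONE}

    assert get("k1") is None
    levels = [compact(levels[0], levels[1])]
    assert levels == [{"k2": "live"}]  # the cost was copying live data only

The repeated cost mike_hearn mentions shows up here as the copy inside compact(): every compaction pays again to move the surviving live entries.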