pepperoni_pizza 4 hours ago
I think the answer is "all of the above". Columnar storage compresses very effectively, so one "page" actually contains a lot of data (Parquet row groups default to somewhere around 100k records, IIRC). Writing usually means replacing the whole table once a day, or appending a large block, not making many small updates. And reading is usually a full scan with smart skipping based on predicate pushdown, not chasing indexes around. So the same two-million-row table that a traditional database would scatter across many pages might be four files on S3, each holding one month of data or whatnot. But people in this space are also more tolerant of latency. The design goal is not "make operations over thousands of rows fast" but "make operations over billions of rows possible", with "not slow" as a second priority.
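The "smart skipping" works off per-row-group min/max statistics: a reader checks each group's stats against the predicate and never decodes groups that cannot match. Here's a minimal pure-Python sketch of that idea; the names (`build_row_groups`, `scan_ge`) and the group size are illustrative, not a real Parquet API.

```python
# Sketch of min/max "zone map" skipping, the idea behind Parquet
# row-group statistics and predicate pushdown. Illustrative only.

ROW_GROUP_SIZE = 100_000

def build_row_groups(rows):
    """Split rows into fixed-size groups, recording per-group min/max."""
    groups = []
    for start in range(0, len(rows), ROW_GROUP_SIZE):
        chunk = rows[start:start + ROW_GROUP_SIZE]
        groups.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return groups

def scan_ge(groups, threshold):
    """Scan with the predicate `value >= threshold`, skipping any group
    whose max is below the threshold -- no row in it can match."""
    hits, groups_read = [], 0
    for g in groups:
        if g["max"] < threshold:
            continue  # whole group skipped via statistics
        groups_read += 1
        hits.extend(v for v in g["rows"] if v >= threshold)
    return hits, groups_read

groups = build_row_groups(list(range(400_000)))   # 4 row groups
hits, groups_read = scan_ge(groups, 350_000)
# Only the last group is decoded; the other three are skipped outright.
```

With sorted (or time-partitioned) data the stats are tight, which is why "one file per month" layouts skip so well: a one-month predicate touches one file and ignores the rest.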
atombender an hour ago | parent
Good points. I don't have much experience with DuckDB in a production setting, but my team uses ClickHouse, where we ingest log and instrumentation data into materialized views at high volume. What saves the segmented/layered architecture there (ClickHouse calls them parts, but it's fundamentally the same thing) is that it's append-only: the "layers" never move backwards, existing parts are only merged forward into new parts, and a single row never appears in more than one layer. With a B+tree, by contrast, the entire tree is mutable.
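The append-only parts layout can be sketched in a few lines. This is an illustrative toy (the class and method names are made up, and real MergeTree parts hold sorted rows with much more metadata): each insert creates a new immutable sorted run, and a background merge replaces old parts with one new part rather than editing anything in place.

```python
# Toy sketch of an append-only "parts" table (the ClickHouse MergeTree /
# LSM idea). Parts are immutable once written; merges only go "forward",
# producing new parts -- unlike a B+tree, where any page may be mutated.
import heapq

class PartsTable:
    def __init__(self):
        self.parts = []  # list of immutable sorted runs

    def insert(self, rows):
        # Each insert becomes its own sorted, immutable part.
        self.parts.append(sorted(rows))

    def merge(self):
        # Background merge: combine all parts into one new part.
        # The old parts are dropped, never modified in place.
        self.parts = [list(heapq.merge(*self.parts))]

    def scan(self):
        # A read merges the sorted parts on the fly.
        return list(heapq.merge(*self.parts))

t = PartsTable()
t.insert([3, 1, 2])
t.insert([6, 5, 4])
t.merge()  # two parts collapse into one; reads see the same rows
```

The B+tree contrast is that here a crash or concurrent reader can only ever see whole parts that were fully written, which is what makes the layering safe without fine-grained page locking.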