Remix.run Logo
yencabulator an hour ago

Creating a new record batch for a single row is also a huge kludge leading to lot of write amplification. At that point, you're better off storing rows than pretending it's columnar.

I actually wrote a row storage format reusing Arrow data types (not Feather), just laying them out row-wise not columnar. Validity bits of the different columns collected into a shared per-row bitmap, fixed offsets within a record allow extracting any field in a zerocopy fashion. I store those in RocksDB, for now.

https://git.kantodb.com/kantodb/kantodb/src/branch/main/crat...

https://git.kantodb.com/kantodb/kantodb/src/branch/main/crat...

https://git.kantodb.com/kantodb/kantodb/src/branch/main/crat...