vouwfietsman 3 days ago
Although I welcome a Parquet successor, I am not particularly interested in a more complicated format. Random access time improvements are nice, but what I would really like is just to store multiple tables in a single Parquet file. When I read "possible extension through embedded wasm encoders" I can already imagine the C++ linker hell required to get this thing included in my project. I also don't think a lot of people need "ai scale".
drdaeman 3 days ago
Storing multiple tables in a single file would be trivially solvable by putting multiple Parquet files in the most basic plain uncompressed tarball (to retain the ability to access any part of any file without downloading the whole thing). Or maybe ar or cpio - tar has too many features (such as support for links) that are unnecessary here. Basically, anything well-standardized that implements a very basic directory structure, with a simple index located at a predictable offset. If only any tools supported that.
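A minimal sketch of the tarball idea, using only Python's standard `tarfile` module. The helper names (`build_archive`, `index_archive`, `read_range`) and the byte blobs standing in for real Parquet files are assumptions made for illustration. Note that tar has no central index, so this sketch builds one by scanning the member headers once. With a remote file, the final slice would instead be a ranged read against the archive.

```python
import io
import tarfile


def build_archive(tables: dict[str, bytes]) -> bytes:
    """Pack named byte blobs into an uncompressed tar held in memory."""
    buf = io.BytesIO()
    # Mode "w" writes a plain tar with no compression, so member data
    # stays byte-for-byte addressable inside the archive.
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in tables.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def index_archive(archive: bytes) -> dict[str, tuple[int, int]]:
    """Map each member name to (data offset, size) within the archive."""
    # Mode "r:" only accepts an uncompressed tar.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        # offset_data is the attribute CPython's tarfile sets to the
        # position where a member's data begins.
        return {m.name: (m.offset_data, m.size) for m in tar.getmembers()}


def read_range(archive: bytes, index: dict[str, tuple[int, int]],
               name: str, start: int, length: int) -> bytes:
    """Read `length` bytes at `start` inside one member, without unpacking."""
    offset, size = index[name]
    end = min(start + length, size)  # never read past the member's end
    return archive[offset + start:offset + end]


if __name__ == "__main__":
    # Placeholder payloads; real usage would pass actual Parquet file bytes.
    demo = build_archive({"users.parquet": b"PAR1...PAR1",
                          "orders.parquet": b"PAR1......PAR1"})
    idx = index_archive(demo)
    # The last 4 bytes of a member play the role of a Parquet footer read.
    tail_start = idx["orders.parquet"][1] - 4
    print(read_range(demo, idx, "orders.parquet", tail_start, 4))
```

The design choice here is that the index is just name, offset and size: enough for a reader to seek straight to one table's footer and row groups, which is the property the uncompressed container is meant to preserve.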
nylonstrung 3 days ago
Lance already exists to solve Parquet's problems, with drastically faster random access.
gcr 3 days ago
If you want "several tables and database-like semantics in one file," then what you want is DuckDB. If you want modern Parquet, then you want the Lance format (or LanceDB for DB-like CRUD semantics).
alfalfasprout 3 days ago
Also, what does "ai scale" even mean?