oa335 6 hours ago

Vortex is a file format, whereas Delta Lake and Iceberg are table formats. It should be compared to Parquet rather than to Delta Lake and Iceberg. This guest lecture by a maintainer of Vortex provides a good overview of the file format, the motivations for its creation, and its key features.

https://www.youtube.com/watch?v=zyn_T5uragA

ks2048 6 hours ago | parent | next [-]

The website could use a comparison with Parquet and a statement of motivation (beyond just stating it's 100x better).

3eb7988a1663 3 hours ago | parent [-]

Agreed, it really needs a tl;dr here, because Parquet is boring technology. It's going to require quite the sales pitch to move. At minimum, I assume it will be years before I could expect native integration in pandas/polars/etc., which would make it low-effort enough to consider.

Parquet is... fine, I guess. It is good enough. Why invite churn? Sell me on the vision.

bsder 4 minutes ago | parent | next [-]

> Going to require quite the sales pitch to move.

Mutability would be one such pitch I would like to see ...

frisbm 2 hours ago | parent | prev [-]

DuckDB just added support for Vortex in their last release using the Vortex Python package, so hopefully other tools won't be too far behind.

sys13 6 hours ago | parent | prev [-]

I think it would still make sense to compare with those table formats, or is the idea that you would only use this if you could not use a table format?

bz_bz_bz 5 hours ago | parent [-]

That’s like comparing words with characters.

Vortex is, roughly, how you save data to files and Iceberg is the database-like manager of those files. You’ll soon be able to run Iceberg using Vortex because they are complementary, not competing, technologies.
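
To make the split concrete, here is a toy sketch of the two layers. It uses neither the Vortex nor the Iceberg API: CSV stands in for the file format, a JSON manifest stands in for the table format, and `write_data_file`, `read_data_file`, and `ToyTable` are all made-up names for illustration only.

```python
import csv
import json
import tempfile
from pathlib import Path


# "File format" layer: how one batch of rows is serialized into a single file.
# CSV is a stand-in here for Parquet or Vortex (assumption for illustration).
def write_data_file(path, rows):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def read_data_file(path):
    with open(path, newline="") as f:
        return [row for row in csv.reader(f)]


# "Table format" layer: a manifest recording which files make up the table
# at each snapshot. It tracks files and does not care how they are encoded.
class ToyTable:
    def __init__(self, root):
        self.root = Path(root)
        self.manifest = self.root / "manifest.json"
        if not self.manifest.exists():
            # Snapshot 0 is the empty table.
            self.manifest.write_text(json.dumps({"snapshots": [[]]}))

    def append(self, rows):
        state = json.loads(self.manifest.read_text())
        name = f"data-{len(state['snapshots'])}.csv"
        write_data_file(self.root / name, rows)
        # A new snapshot is the previous file list plus the new file.
        state["snapshots"].append(state["snapshots"][-1] + [name])
        self.manifest.write_text(json.dumps(state))

    def scan(self, snapshot=-1):
        state = json.loads(self.manifest.read_text())
        rows = []
        for name in state["snapshots"][snapshot]:
            rows.extend(read_data_file(self.root / name))
        return rows


table = ToyTable(tempfile.mkdtemp())
table.append([["a", "1"]])
table.append([["b", "2"]])
print(table.scan())            # rows from the latest snapshot
print(table.scan(snapshot=1))  # rows as of the first append
```

Swapping the two data-file functions for a different encoding would leave `ToyTable` untouched, which is the sense in which the two layers are complementary.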