Vortex: An extensible, state-of-the-art columnar file format (github.com)
58 points by tanelpoder 5 days ago | 10 comments
kipukun 31 minutes ago
The cuDF interop in the roadmap [1] will be huge for my workloads. XGBoost has the fastest inference time on GPUs, so a fast path straight from these Vortex files to GPU memory seems promising.
[1] https://github.com/vortex-data/vortex/issues/2116
|
nahnahno 3 hours ago
How does this compare to Arrow IPC / Feather v2?
|
sys13 5 hours ago
How does this compare with Delta Lake and Iceberg?
oa335 5 hours ago
Vortex is a file format, whereas Delta Lake and Iceberg are table formats. It should be compared to Parquet rather than to Delta Lake and Iceberg.
This guest lecture by a maintainer of Vortex gives a good overview of the file format, the motivations for its creation, and its key features: https://www.youtube.com/watch?v=zyn_T5uragA
ks2048 4 hours ago
The website could use a comparison with Parquet that explains the motivation (beyond just stating it's 100x better).
3eb7988a1663 an hour ago
Agreed, it really needs a tl;dr, because Parquet is boring technology. It's going to take quite the sales pitch to get people to move. At minimum, I assume it will be years before I can expect native integration in pandas/polars/etc., which is what would make it low-effort enough to consider. Parquet is... fine, I guess. It is good enough. Why invite churn? Sell me on the vision.
frisbm 30 minutes ago
DuckDB just added support for Vortex in their last release using the Vortex Python package, so hopefully other tools won't be too far behind.
|
sys13 5 hours ago
I think it would still make sense to compare it with those table formats. Or is the idea that you would only use this if you could not use a table format?
bz_bz_bz 4 hours ago
That’s like comparing words with characters. Vortex is, roughly, how you save data to files, and Iceberg is the database-like manager of those files. You’ll soon be able to run Iceberg using Vortex because they are complementary, not competing, technologies.
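To make the file-format versus table-format split concrete, here is a toy Python sketch. It is an illustration under loose assumptions, not how Vortex, Parquet, Iceberg, or Delta Lake actually work: JSON stands in for the columnar file format, and the hypothetical `ToyTable` class stands in for a table format. The point it shows is that the table layer only records which data files make up each snapshot, so the file-format layer underneath could be swapped without touching it.

```python
import json
import os
import tempfile


def write_data_file(directory, name, columns):
    """File-format layer (toy): encode one batch of columns into one file.

    JSON is only a placeholder; a real system would write Parquet or Vortex here.
    """
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        json.dump(columns, f)
    return path


def read_data_file(path):
    """File-format layer (toy): decode one file back into a dict of columns."""
    with open(path) as f:
        return json.load(f)


class ToyTable:
    """Table-format layer (hypothetical, loosely Iceberg/Delta-like).

    It never looks at how columns are encoded inside a file; it only tracks
    which data files belong to each snapshot of the table.
    """

    def __init__(self, directory):
        self.directory = directory
        self.snapshots = [[]]  # snapshot 0 is the empty table
        self._next_file_id = 0

    def append(self, columns):
        """Write a new data file and commit a snapshot that includes it."""
        name = f"part-{self._next_file_id}.json"
        self._next_file_id += 1
        path = write_data_file(self.directory, name, columns)
        # New snapshot = previous file list + the new file; older snapshots stay readable.
        self.snapshots.append(self.snapshots[-1] + [path])
        return len(self.snapshots) - 1

    def scan(self, snapshot=-1):
        """Concatenate the columns of every file in the chosen snapshot."""
        result = {}
        for path in self.snapshots[snapshot]:
            for col, values in read_data_file(path).items():
                result.setdefault(col, []).extend(values)
        return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        table = ToyTable(d)
        table.append({"id": [1, 2], "x": [0.5, 1.5]})
        table.append({"id": [3], "x": [2.5]})
        print(table.scan())   # {'id': [1, 2, 3], 'x': [0.5, 1.5, 2.5]}
        print(table.scan(1))  # {'id': [1, 2], 'x': [0.5, 1.5]}
```

Replacing `write_data_file`/`read_data_file` with a different columnar encoder would leave `ToyTable` unchanged, which is the sense in which a file format and a table format are complementary rather than competing.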
|
cpard 3 hours ago
As others said, Vortex is complementary to the table formats you mentioned. There are other formats, though, that it can be compared to. The Lance columnar format is one: https://github.com/lancedb/lancedb
And Nimble from Meta is another: https://github.com/facebookincubator/nimble
Parquet is so core to data infra and so widespread that removing it from its throne is a really, really hard task. The people behind these projects who are willing to try have my total respect.
|