aynyc 4 hours ago

What's the difference between feather and parquet in terms of usage? I get the design philosophy, but how would you use them differently?

tosh 4 hours ago | parent | next [-]

parquet is optimized for storage and compresses well (=> smaller files)

feather is optimized for fast reading

aynyc 3 hours ago | parent | next [-]

Given that storage keeps getting cheaper, wouldn't most firms want to use feather for analytic performance? Yet everyone uses parquet.

yencabulator 2 hours ago | parent | next [-]

You can still gain a lot of performance by doing less I/O.

outside1234 3 hours ago | parent | prev [-]

What people have done in the face of cheaper storage is store more data.

twic an hour ago | parent | prev [-]

And now there's Lance! https://lance.org/

dionian 4 hours ago | parent | prev [-]

https://stackoverflow.com/questions/48083405/what-are-the-di...

aynyc 4 hours ago | parent [-]

I read that. But afaik, feather format is stable now. Hence my confusion. I use parquet at work a lot, where we store a lot of time series financial data. We like it. Creating the Parquet data is a pain since it's not append-able.

yencabulator 2 hours ago | parent | next [-]

Generally Parquet files are combined in an LSM style, compacting smaller files into larger ones. Parquet isn't really meant for the "journal" of level-0 append-one-record style storage, it's meant for the levels that follow.
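
A rough sketch of that pattern, in case it helps. This is a toy model, not how any particular engine does it: `TinyLSM`, its thresholds, and the `(timestamp, value)` record shape are all made up for illustration, and the sorted tuples are only stand-ins for immutable Parquet files.

```python
import heapq
from typing import List, Tuple

Record = Tuple[int, float]  # hypothetical schema: (timestamp, value)


class TinyLSM:
    """Toy model: an appendable journal plus immutable sorted segments."""

    def __init__(self, flush_threshold: int = 4, compact_threshold: int = 3):
        self.flush_threshold = flush_threshold
        self.compact_threshold = compact_threshold
        self.journal: List[Record] = []               # level 0: one-record appends
        self.segments: List[Tuple[Record, ...]] = []  # stand-ins for Parquet files

    def append(self, record: Record) -> None:
        # Cheap write path: only the journal is ever mutated.
        self.journal.append(record)
        if len(self.journal) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Freeze the journal into a new sorted, immutable segment.
        if not self.journal:
            return
        self.segments.append(tuple(sorted(self.journal)))
        self.journal = []
        if len(self.segments) >= self.compact_threshold:
            self.compact()

    def compact(self) -> None:
        # Merge the small sorted segments into one larger sorted segment.
        self.segments = [tuple(heapq.merge(*self.segments))]

    def scan(self) -> List[Record]:
        # Readers see the segments plus whatever is still in the journal.
        return list(heapq.merge(*self.segments, sorted(self.journal)))
```

With these defaults, every 4 appends produce a small segment and every 3rd small segment triggers a merge, so nothing ever has to append to an existing file.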

aynyc an hour ago | parent [-]

So feather for journaling and parquet for long-term processing?

dionian 42 minutes ago | parent | prev [-]

Have you considered something like iceberg tables?