331c8c71 4 hours ago

It's certainly quite pleasant to work with... but I would rather use SQL for ETL and let the backend be whatever it needs to be...

Real-world data transformations can get gnarly very quickly, and SQL is the perfect common denominator compared to dplyr, which is still niche...

How do you feel about polars?

mjhay 4 hours ago

I’m a big fan of Polars. It’s really fast and memory efficient. With the lazy streaming functionality, I’ve been able to easily process 1 TB+ of data on a single machine (in that case you do have to be careful not to do any operation that would cause the whole DataFrame to materialize).

It’s certainly miles better than pandas, which has a terrible API in addition to being comically inefficient. In my group, we generally use Polars for any new work, and have also swapped out pandas for Polars in critical spots of our existing code, with the latter giving a huge benefit relative to the amount of work it took.

I largely agree with you on SQL being the common denominator, but there are some things that are just awkward in SQL, and much easier to do in Python or another general-purpose language.