willvarfar 2 hours ago

My experience was that data science was doable but clunky and ugly with pandas. It got slightly better with polars. Only really slightly better. Then, for me at least, it jumped light-years ahead with duckdb.

These days I run some big query on an OLAP database, download the results to parquet files stored on the local disk of a cloud notebook VM, and then mine them to bits with duckdb reading straight from those parquet files.

The notebooks end up with very clear SQL queries and results (most notebook servers support SQL cells with highlighting, completion, etc.), and small pockets of python cells for the corner cases that an imperative language makes easier.

So when I get to the bottom of the article where it shows the difference between Python and R, I'm screaming "wouldn't that look better in SQL?!" :)

mettamage an hour ago

Huh, as a frequent polars user, I'll try duckdb.

goatlover an hour ago

So you're saying you prefer SQL to dataframes. I prefer dataframes and staying in the native language.

willvarfar 32 minutes ago

Duckdb can see and manipulate dataframes too. Duckdb has its own storage, but other table storage (e.g. the parquet files I mentioned, or even csv files, or even dataframes from pandas and polars) is a first-class citizen. Duckdb lets you query them quickly and expressively.