wood_spirit 9 hours ago

I’m also a DuckDB convert. All my notebooks have moved from Pandas and Polars to DuckDB. It is faster to write and faster to read (when you return to a notebook after time away), and often faster to run. Certainly not slower to run.

My current habit is to suck down big datasets into parquet shards and then just query them with a wildcard in DuckDB. I move to BigQuery when doing true “big data”, but for a few GB extracted from BQ to a notebook VM’s disk, DuckDB is super ergonomic and performant most of the time.
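The wildcard pattern is a one-liner. A minimal sketch, assuming the shards were extracted to a hypothetical `extract/` directory:

```sql
-- Query every parquet shard at once; DuckDB infers the schema
-- and pushes the aggregation down into the parquet reader.
SELECT category, count(*) AS n
FROM read_parquet('extract/*.parquet')
GROUP BY category
ORDER BY n DESC;
```

A bare string literal `FROM 'extract/*.parquet'` works too; `read_parquet` just makes the intent explicit.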

It’s the SQL that I like. As a veteran of the era when the world went mad for NoSQL, it is just so nice to experience the revenge of SQL.

theLiminator 9 hours ago | parent [-]

I personally find Polars easier to read and write than SQL, especially once you start doing UDFs with NumPy et al. For me, DuckDB’s clear edge is the CLI experience.

> It is faster to write and faster to read

At least on ClickBench, Polars and DuckDB are roughly comparable (with Polars edging out DuckDB).

erikcw 3 hours ago | parent | next [-]

I use them both, depending on which feels more natural for the task, often within the same project. The interop is easy and very high performance thanks to Apache Arrow: `df = duckdb.sql(sql).pl()` converts a DuckDB result into a Polars DataFrame, and `result = duckdb.sql("SELECT * FROM df")` queries that DataFrame right back.
