mritchie712 2 days ago

> For instance, it can handle >100k single row inserts/sec.

DuckLake already has data inlining for the DuckDB catalog; it seems this will be possible once it's supported in the pg catalog.

> Postgres also has a more natural persistence & continuous processing story, so you can set up pg_cron jobs and use PL/pgSQL (with heap tables for bookkeeping) to do orchestration.

This is true, but it's not clear where I'd use this in practice. e.g. if I need to run a complex ETL job, I probably wouldn't do it in pg_cron.

derefr 2 days ago

> This is true, but it's not clear where I'd use this in practice. e.g. if I need to run a complex ETL job, I probably wouldn't do it in pg_cron.

Think "tiered storage."

See the example under https://github.com/Snowflake-Labs/pg_lake/blob/main/docs/ice...:

   select cron.schedule('flush-queue', '* * * * *', $$
     with new_rows as (
       delete from measurements_staging returning *
     )
     insert into measurements select * from new_rows;
   $$);
The "continuous ETL" process the GP is talking about would be exactly this kind of thing, and just as trivial. (In fact it would be this exact same code, just with your mental model flipped around from "promoting data from a staging table into a canonical iceberg table" to "evicting data from a canonical table into a historical-archive table".)