boshomi | 2 days ago
Why not just use DuckLake? [1] That reduces complexity since only DuckDB and PostgreSQL with pg_duckdb are required.

[1] DuckLake - The SQL-Powered Lakehouse Format for the Rest of Us, by Prof. Hannes Mühleisen: https://www.youtube.com/watch?v=YQEUkFWa69o
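(For reference, a minimal sketch of the kind of setup being described, using Postgres as the DuckLake metadata catalog and DuckDB as the engine; the connection string, bucket, and table names are placeholders, not a confirmed configuration:)

    -- in DuckDB; dbname/host and DATA_PATH are placeholders
    INSTALL ducklake;
    INSTALL postgres;
    ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=localhost'
        AS lake (DATA_PATH 's3://my-bucket/lake/');
    CREATE TABLE lake.events (id BIGINT, payload JSON);
    INSERT INTO lake.events VALUES (1, '{"msg": "hello"}');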
mslot | 2 days ago
DuckLake is pretty cool, and we obviously love everything the DuckDB team is doing. It's what made pg_lake possible, and what motivated part of our team to step away from Microsoft/Citus. DuckLake can do things that pg_lake cannot do with Iceberg, and DuckDB can do things Postgres absolutely can't (e.g. query data frames).

On the other hand, Postgres can do a lot of things that DuckDB cannot. For instance, it can handle >100k single-row inserts/sec; transactions don't come for free. Embedding the engine in the catalog, rather than the catalog in the engine, enables transactions that span analytical and operational tables. That way you can sustain a very high rate of writes in a heap table and transactionally move data into an Iceberg table.

Postgres also has a more natural persistence & continuous-processing story, so you can set up pg_cron jobs and use PL/pgSQL (with heap tables for bookkeeping) to do orchestration. There's also the interoperability aspect of Iceberg being supported by other query engines.
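A rough sketch of that heap-to-Iceberg pattern, assuming pg_lake-style Iceberg tables (the USING iceberg DDL and all names here are illustrative, not pg_lake's confirmed API):

    -- plain heap table: absorbs a high rate of single-row inserts
    CREATE TABLE events_staging (
        id bigint GENERATED ALWAYS AS IDENTITY,
        payload jsonb,
        created_at timestamptz DEFAULT now()
    );

    -- analytical table (hypothetical DDL; check the pg_lake docs)
    CREATE TABLE events_iceberg (
        id bigint,
        payload jsonb,
        created_at timestamptz
    ) USING iceberg;

    -- drain staged rows once a minute; DELETE ... RETURNING keeps the
    -- move inside one transaction, so each row lands in exactly one place
    SELECT cron.schedule('drain-events', '* * * * *', $$
        WITH moved AS (
            DELETE FROM events_staging
            RETURNING id, payload, created_at
        )
        INSERT INTO events_iceberg SELECT * FROM moved
    $$);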
pgguru | 2 days ago
Boils down to design decisions; see: https://news.ycombinator.com/item?id=45813631
swasheck | a day ago
I have so desperately wanted to love and use DuckLake, but I've come across some issues with it in practice (Postgres catalog). They seem to be related to maintenance activities: DuckLake suddenly starts throwing HTTP 400 errors on files it created itself. I'm not sure if it's due to my write patterns into the partitioned tables (gather data from sources into a Polars dataframe, then insert into the DuckLake table from the df) or something else. It's OK in dev/test, and for me as the person on the team who's enamored with DuckDB, but it's made the team experience challenging, so I've reverted to hive-partitioned parquet files with a DuckDB file that has views created on top of the parquet: attach that file as read-only and query away. I may work up a full example to submit as an issue, but up until now too many other things have dominated my time.
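(If anyone wants the shape of that fallback, roughly; paths and names are placeholders:)

    -- writer side: build a DuckDB catalog file of views over the parquet
    ATTACH 'catalog.duckdb' AS cat;
    CREATE VIEW cat.events AS
        SELECT *
        FROM read_parquet('s3://bucket/events/*/*.parquet',
                          hive_partitioning = true);

    -- reader side: attach the catalog read-only and query away
    ATTACH 'catalog.duckdb' AS cat (READ_ONLY);
    SELECT count(*) FROM cat.events;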