max_streese a day ago

Two questions:

(1) Are there any plans to make this compatible with the DuckLake specification? Meaning: instead of using Iceberg in the background, you would use DuckLake with its SQL tables? My knowledge is very limited, but to me, besides leveraging DuckDB, another big point of DuckLake is that it uses SQL for the catalog instead of a confusing mixture of files, which offers a bunch of advantages like not having to care about the number of snapshots and better concurrent writes.

(2) Might pg_duckdb achieve the same thing at some point, or do things not work like that?

mslot a day ago

(1) We've thought about it, but there are no current plans. We'd ideally reimplement DuckLake in Postgres directly so that we can preserve Postgres transaction boundaries, rather than reuse the DuckLake implementation, which would run in a separate process. The trade-off is that there's a bunch of complexity around things like inlined data and passing the inlined data into DuckDB at query time, though if we can do that then you can get pretty high transaction performance.

(2) In principle, it's a bit easier for pg_duckdb to reuse the existing DuckLake implementation because DuckDB sits in every Postgres process and they can call into each other, but we feel that architecture is less appropriate in terms of resource management and stability.
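
For readers unfamiliar with the idea in (1), here is a rough sketch of why a SQL-backed catalog with inlined data fits naturally inside database transaction boundaries. It is only an illustration under loud assumptions: SQLite stands in for Postgres, and the `snapshot` and `inlined_row` tables and the helper functions are hypothetical names, not the DuckLake schema or any real extension's API. The point is that the snapshot record and the small inlined write commit or roll back together.

```python
import sqlite3


def open_catalog():
    """Create an in-memory catalog. Table names are hypothetical, not DuckLake's."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE snapshot (
            snapshot_id INTEGER PRIMARY KEY,
            note        TEXT NOT NULL
        );
        CREATE TABLE inlined_row (
            snapshot_id INTEGER NOT NULL REFERENCES snapshot(snapshot_id),
            payload     TEXT NOT NULL
        );
        """
    )
    return conn


def commit_small_write(conn, note, rows):
    """Record a new snapshot and its inlined rows in ONE transaction."""
    # `with conn` commits on success and rolls back if an exception escapes,
    # so a failed write never leaves a snapshot without its data.
    with conn:
        cur = conn.execute("INSERT INTO snapshot (note) VALUES (?)", (note,))
        snapshot_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO inlined_row (snapshot_id, payload) VALUES (?, ?)",
            [(snapshot_id, row) for row in rows],
        )
    return snapshot_id


def rows_at(conn, snapshot_id):
    """Return the inlined rows visible as of the given snapshot."""
    cur = conn.execute(
        "SELECT payload FROM inlined_row WHERE snapshot_id <= ? ORDER BY rowid",
        (snapshot_id,),
    )
    return [payload for (payload,) in cur]
```

A usage sketch: calling `commit_small_write(conn, "first", ["a", "b"])` and then `commit_small_write(conn, "second", ["c"])` yields two snapshots, and `rows_at(conn, first_id)` still sees only the first two rows. A write that violates a constraint partway through leaves no new snapshot behind. A real implementation would also have to hand the inlined rows to the query engine at read time, which is the complexity mentioned above and is not modeled here.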