oulipo2 3 days ago

Nice, what would be your typical setup?

You keep like 1 year's worth of data in your "business database", then archive the rest to S3 as Parquet and query it with DuckDB?

And if you want to sync everything, even "current data", to do data science/analytics, can you just write the recent data (e.g. the last week of data or whatever) to S3 every hour/day to stay relatively up to date? And doesn't that cause the S3 data to grow needlessly (i.e. does it replace the recent data each hour, rather than store an additional copy)?

Do you have some kind of "starter project" for a Postgres + DuckLake integration that I could look at to see how it's used in practice, and how it makes some operations easier?

mritchie712 3 days ago

Once you have Meltano installed, it's just:

    meltano run tap-postgres target-ducklake
Setting up Meltano is a bit more involved[0].
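For a rough idea of what that setup involves, a minimal meltano.yml for this pipeline might look something like the sketch below. The plugin names come from the command above; the variant and config keys are illustrative assumptions, so check the tap/target docs for the exact names:

    plugins:
      extractors:
        - name: tap-postgres
          variant: meltanolabs     # assumed variant
          config:
            # illustrative connection settings for the source Postgres
            host: localhost
            port: 5432
            user: analytics
            database: appdb
      loaders:
        - name: target-ducklake
          config:
            # illustrative: where the target writes the DuckLake data
            storage_path: s3://my-bucket/ducklake

With the plugins configured, `meltano run tap-postgres target-ducklake` executes the extract-load pipeline end to end.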

0 - https://www.notion.so/luabase/Postgres-to-DuckLake-example-2...