mritchie712 13 hours ago
TL;DR: this caches your S3 data in EFS. We run data lakes using DuckLake, and this sounds really useful. GCP should follow suit quickly.
hiyer 11 hours ago
I was thinking of using it with DuckDB as well, but it seems it would be of limited benefit. Parquet objects are in the MBs, so they would be streamed directly from S3. With raw Parquet objects, it might help with S3 listing if you have a lot of them (shaving a couple of seconds off the query). If you are already on DuckLake, DuckDB will use that to get the list of relevant objects anyway.
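To put rough numbers on the listing point: S3's ListObjectsV2 returns at most 1,000 keys per request and pages sequentially, so the number of round trips grows with the object count. A back-of-the-envelope sketch in Python follows; the function names and the 100 ms per-request latency are assumptions for illustration, not measurements.

```python
import math

# S3 ListObjectsV2 returns at most 1,000 keys per response page.
S3_LIST_PAGE_SIZE = 1000


def list_requests(num_objects: int, page_size: int = S3_LIST_PAGE_SIZE) -> int:
    """Sequential list requests needed to enumerate num_objects keys."""
    if num_objects < 0:
        raise ValueError("num_objects must be non-negative")
    # Even an empty prefix costs one request to discover that it is empty.
    return max(1, math.ceil(num_objects / page_size))


def estimated_list_seconds(num_objects: int, latency_s: float = 0.1) -> float:
    """Rough listing time; latency_s is an assumed per-request latency."""
    return list_requests(num_objects) * latency_s


if __name__ == "__main__":
    for n in (500, 20_000, 1_000_000):
        print(n, list_requests(n), estimated_list_seconds(n))
```

Under those assumptions, listing only costs seconds once you are well into the tens of thousands of objects, and a catalog that already knows the file list skips the cost entirely.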
anentropic 12 hours ago
I am curious about this use case. How do you see it helping with DuckLake?