Quack-Cluster: A Serverless Distributed SQL Query Engine with DuckDB and Ray (github.com)
63 points by tanelpoder 4 days ago | 12 comments
fodkodrasz 7 hours ago
So DuckDB was developed to finally allow queries on biggish data without the need for a cluster, to simplify data analysis... and now we put it on a cluster? I think there are already solutions for that scale of data, and simplicity is the best feature of DuckDB (at least for me).
asyncadventure 4 hours ago
Interesting take on extending DuckDB beyond single-machine limits. The discussion about "over-engineering" vs real scale needs resonates with a project I worked on recently - sometimes you hit that awkward middle ground where single-node DuckDB maxes out but full Spark feels like bringing a cannon to a knife fight. The Ray abstraction here is clever for bridging that gap, though the serverless claims seem overstated given Ray's infrastructure requirements.
rfonseca 5 hours ago
What is the lifetime of the Ray workers, or, in other words, what is the scalability / scale-to-zero story that makes this serverless?
nevalainen 7 hours ago
feels like a missed opportunity to call it cluster-quack xD
pickleballcourt 2 hours ago
Reminds me of smallpond from DeepSeek
mgaunard 7 hours ago
In my experience Ray clusters don't scale well and end up costing you more money. You need to run permanent per-user instances, etc. What you need is a shared multi-tenant infrastructure that is elastic.
dogman123 7 hours ago
neat. i'm pretty novice in the guts of this kind of stuff, but how does this work under the hood for blocking operators where they "cannot output a single row until the last row of their input has been seen"? i think this is where spark shuffling comes in? but how does it work here? https://duckdb.org/docs/stable/guides/performance/how_to_tun...
thenaturalist 4 hours ago
> "Forget about managing complex server infrastructure for your database needs." So what does this run on then? There are no docs, and I can't find any deployment guides for Ray on serverless solutions like Lambda, Cloud Functions, or even your own Firecracker. Instead, every other post mentions EKS or EC2. The Ray team even expressly rejected Lambda support as far back as 2020 [0]. Uuuuuugh. No thanks! *shiver* I'd rather cut complexity for practically the same benefit and either do it on a single machine or have a thin, manageable layer on top of truly serverless infra, like in this talk [1], "Processing Trillions of Records at Okta with Mini Serverless Databases".