drej 6 days ago
As a user? No, I don't have to choose. What I'm saying is that analysts (who this Polars Cloud targets, just like Coiled or Databricks) shouldn't worry about instance types, shuffling performance, join strategies, JVM versions, cross-AZ pricing etc. In most cases, they should just get a connection string and/or a web UI to run their queries, everything abstracted from them. Sure, Python code is more testable and composable (and I do love that). Have I seen _any_ analysts write tests or compose their queries? I'm not saying these people don't exist, but I have yet to bump into any.
robertkoss 6 days ago
You were talking about data engineering. If you do not write tests as a data engineer, what are you doing then? Just hoping that you don't fuck up editing a 1000-line SQL script? If you use Athena you still have to worry about shuffling and joining, it is just hidden. It is Trino / Presto under the hood, and if you click explain you can see the execution plan, which is essentially the same as looking into the SparkUI. Who cares about JVM versions nowadays? No one is hosting Spark themselves. Literally every tool now supports DataFrame AND SQL APIs, and to me there is no reason to pick up SQL if you are familiar with a little bit of Python.
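A minimal sketch of what a test for a SQL transformation can look like when it is driven from Python. It assumes an in-memory SQLite database (standard-library `sqlite3`) standing in for the real warehouse; the `orders` table, the revenue query, and `revenue_per_customer` are hypothetical names for illustration, not anything from Athena, Trino, or Polars.

```python
import sqlite3

# Hypothetical transformation: total revenue per customer, excluding refunds.
REVENUE_SQL = """
SELECT customer_id, SUM(amount) AS revenue
FROM orders
WHERE status != 'refunded'
GROUP BY customer_id
ORDER BY customer_id
"""


def revenue_per_customer(conn: sqlite3.Connection) -> list:
    """Run the transformation against whatever connection is passed in."""
    return conn.execute(REVENUE_SQL).fetchall()


def test_revenue_excludes_refunds() -> None:
    # A tiny in-memory fixture stands in for the production table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10.0, "paid"), (1, 5.0, "refunded"), (2, 7.5, "paid")],
    )
    # The refunded row for customer 1 must not be counted.
    assert revenue_per_customer(conn) == [(1, 10.0), (2, 7.5)]
    conn.close()


if __name__ == "__main__":
    test_revenue_excludes_refunds()
    print("ok")
```

Keeping the query behind a function that takes a connection is what makes it testable: the same code runs against a fixture in CI and against the real database in production.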
ritchie46 6 days ago
With Polars Cloud you don't have to choose those either. You can pick CPU/memory, and we will offer autoscaling in a few months. Cluster configuration is optional if you want this control. Anyhow, this doesn't have much to do with the query API, be it SQL or DataFrame.
ayhanfuat 6 days ago
I really doubt that Polars Cloud targets analysts doing ad-hoc analyses. It is much more likely aimed at people who build data pipelines for downstream tasks (ML, etc.).
riku_iki 5 days ago
> analysts (who this Polars Cloud targets, just like Coiled or Databricks) shouldn't worry about instance types, shuffling performance, join strategies,

I think this part (query optimization) is, in general, not solved or even solvable, and it is sometimes or often (depending on the domain) necessary to dig into the details to make a data transformation work.
mr_toad 6 days ago
Analysts don’t because it’s not part of the training & culture. If you’re writing tests, you’re doing engineering. That said, the last Python code I wrote as a data engineer was to run tests on an SQL database, because the equivalent in SQL would have been tens of thousands of lines of wallpaper code.
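A minimal sketch of why generating SQL checks from Python can replace "wallpaper" SQL: a small table of expectations is expanded into one query per rule in a loop. It assumes SQLite via the standard-library `sqlite3` module; the `EXPECTATIONS` mapping, the table names, and the `run_checks` helper are hypothetical, not the commenter's actual code.

```python
import sqlite3

# Hypothetical expectations: which columns must be non-null and which must be unique.
EXPECTATIONS = {
    "customers": {"not_null": ["id", "email"], "unique": ["id"]},
    "orders": {"not_null": ["id", "customer_id"], "unique": ["id"]},
}


def run_checks(conn: sqlite3.Connection, expectations: dict) -> list:
    """Generate one SQL check per rule and return a list of failure messages.

    Table and column names are interpolated directly into the SQL, so this
    must only be used with trusted, hard-coded identifiers.
    """
    failures = []
    for table, rules in expectations.items():
        for column in rules.get("not_null", []):
            (count,) = conn.execute(
                f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
            ).fetchone()
            if count:
                failures.append(f"{table}.{column}: {count} NULL value(s)")
        for column in rules.get("unique", []):
            # COUNT(col) ignores NULLs, so this counts only non-null duplicates.
            (count,) = conn.execute(
                f"SELECT COUNT({column}) - COUNT(DISTINCT {column}) FROM {table}"
            ).fetchone()
            if count:
                failures.append(f"{table}.{column}: {count} duplicate value(s)")
    return failures


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER, email TEXT);
        CREATE TABLE orders (id INTEGER, customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'a@example.com'), (2, NULL);
        INSERT INTO orders VALUES (10, 1), (10, 2);
        """
    )
    for failure in run_checks(conn, EXPECTATIONS):
        print(failure)
```

Adding a new table or column check is one more entry in the mapping rather than another hand-written query; the trade-off is that the identifiers must come from trusted configuration.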
gigatexal 6 days ago
Again, the issue you’re having is with the skill level of the audience you keep bringing up, not the tool.