jochem9 3 days ago
One thing that I don't see mentioned but that does bug me: data engineers often use a lot of Python and SQL, even the ones that have heavily adopted software engineering best practices. Yet neither language is great for this.

Python is dynamically typed, which you can patch a bit with type hints, but it's still easy to go to production with incompatible types, leading to crashes in prod. Its interpreted nature also makes it very slow.

SQL is pretty much impossible to unit test, yet you will often end up with logic that you want to test, for example when you rewrite a query to optimize it. For SQL I don't have a solution. It's a 50-year-old language that lacks a lot of features you would expect. It's also the de facto standard for database access.

For Python I would say that we should start adopting statically typed, compiled languages. Rust has Polars as a dataframe package, but the language itself isn't that easy to pick up. Go is very easy to learn, but has no serious dataframe package, so you end up doing a lot of that work yourself in goroutines. Maybe there are better options out there.
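To make the type hint point concrete, here is a minimal sketch of how a wrong type can slip through. The function names and the JSON record are made up for illustration. Because `json.loads` is typed as returning `Any`, a static checker would typically not complain about these calls, and Python itself never checks the hints at runtime.

```python
import json


def double(amount: float) -> float:
    # The hint says float, but Python does not enforce it at runtime.
    return amount * 2


def add_tax(amount: float, rate: float) -> float:
    return amount * (1 + rate)


# Hypothetical input record: the amount arrives as a string, not a number.
record = json.loads('{"amount": "100"}')

# No error here: str * int is string repetition, so the result is silently wrong.
silently_wrong = double(record["amount"])

# Here the mistake only surfaces at runtime: str * float raises TypeError.
try:
    add_tax(record["amount"], 0.2)
    failed = False
except TypeError:
    failed = True

print(repr(silently_wrong), failed)
```

The first call is arguably the worse of the two, since it produces a plausible-looking value instead of an error.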
orochimaaru 3 days ago
If you’re using some variety of Spark for your data engineering, then Scala is an option too. In general, the choice of language isn’t that important: if you’re using Spark, the DataFrame schema defines the structure of your data whether you write Python or not. Most folks confuse pandas with “data engineering”. It’s not. Most data engineering is Spark.
sbrother 3 days ago
When I was most recently at Google (2021-ish) my team owned a bunch of SQL pipelines that had fairly effective SQL tests. Not my favorite thing to work on, but it was a productive way to transform data. There are lots of open source versions of the same idea, but I have yet to see them accompanied by ergonomic testing. Any recommendations or pointers to open source SQL testing frameworks?
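Not a framework recommendation, but for reference, one common low-tech approach is to run the query against a small in-memory database with hand-written fixture rows. The sketch below uses Python's built-in `sqlite3` module. The `orders` table, the query, and the helper names are all hypothetical, and SQLite's dialect differs from most warehouse engines, so this only covers logic that is portable between them.

```python
import sqlite3

# Hypothetical query under test: daily revenue from paid orders only.
DAILY_REVENUE_SQL = """
SELECT order_date, SUM(amount) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY order_date
ORDER BY order_date
"""


def run_query(rows):
    """Load fixture rows into a throwaway in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (order_date TEXT, amount REAL, status TEXT)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(DAILY_REVENUE_SQL).fetchall()
    finally:
        conn.close()


def test_daily_revenue_ignores_unpaid_orders():
    rows = [
        ("2024-01-01", 10.0, "paid"),
        ("2024-01-01", 5.0, "paid"),
        ("2024-01-01", 99.0, "refunded"),  # must not count towards revenue
        ("2024-01-02", 7.5, "paid"),
    ]
    assert run_query(rows) == [("2024-01-01", 15.0), ("2024-01-02", 7.5)]


test_daily_revenue_ignores_unpaid_orders()
```

The same fixture can be reused to check that an optimized rewrite of the query returns identical rows to the original.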
greekorich 3 days ago
I've been a professional Java dev for a decade. I've written a little Python, Clojure, lots of JS/TS/Node. SQL is the most beautiful, expressive, get-stuff-done language I've used. It is perfect for whatever data engineering is defined as.