mynameisash 4 days ago
The comments here are... interesting, as they indicate a strong split between analysts and the engineers who can operationalize things. I see another dimension to it all. My title is senior data engineer at GAMMA/FAANG/whatever we're calling them. I have a CS degree and am firmly in the engineering camp. My passion, though, is in using software engineering and computer science principles to make very large-scale data processing as stupid fast as we can. To the extent I can ignore them, I don't personally care much about the tooling and frameworks and such (CI/CD, Airflow, Kafka, whatever). I care about how we're affinitizing our data, how we index it, whether and when we can use data sketches to achieve a good tradeoff between accuracy and compute/memory, and so on. While there are plenty of folks in this thread bashing analysts, one could also bash other "proper" engineers that can do the CI/CD but don't know shit about how to be efficient with petabyte-scale processing.
kentm 3 days ago
People who can utilize the tooling to process petabytes of data efficiently aren’t the ones that are catching flak. The people I’m thinking of basically run massive, inefficient SQL queries and then throw their hands up when a query runs slowly or hits an OOM error. They don’t even know how to run an explain plan. And if you try to explain things like partitioning, indexes, sketches, etc. to them, they can’t follow, and they argue that it’s not their job to learn and that it’s the “proper engineers’” job to scale the processing.
VirusNewbie 4 days ago
> one could also bash other "proper" engineers that can do the CI/CD but don't know shit about how to be efficient with petabyte-scale processing.
But that would be SWEs, no? I was a 'data engineer' (until they changed the terrible title) at a startup, and I ended up having to fight with Spark and Apache Beam at times, eventually contributing back to improve throughput for our use cases. That's not the same thing as a Business Analyst who can run a little PySpark query.
tdb7893 4 days ago
I mean this very sincerely, but I'm a little lost on how data engineering is distinct from software engineering. It seems like just a subset of it; my title was software engineer and I've done what sounds like very similar work.