benrutter 6 hours ago

This times a zillion! I think there's been a huge industry push to convince managers and more junior engineers that spark and distributed tools are the correct way to do data engineering.

I think it's a similar pattern to how web dev influencers have convinced everyone to build huge hydrated-spa-framework-craziness where a static site would do.

My advice to get out of this mess:

- Managers, don't ask for specific solutions (spark, react). Ask for clever engineers to solve problems and optimise / track what you care about (cost, performance, etc.). You hired them to know best, and they probably do.

- Technical leads, if your manager is saying "what about hyperscale?", you don't have to say "our existing solution will scale forever". It's fine to say, "our pipelines handle datasets up to 20GB, we don't expect to see anything larger soon, and if we do we'll do x/y/z to meet that scale". Your manager probably just wants to know scaling isn't going to crash everything, not that you've optimised the hell out of everything for your excel spreadsheet processing pipeline.

woeirua 3 hours ago | parent | next [-]

Here’s the thing though, most companies work with small data. The distribution of data set sizes follows a power law, which means that few engineers get to work with petabyte-sized datasets. That said, the job market definitely incentivizes people to have that kind of experience on their resume if they want to keep progressing in salary. This incentivizes over-engineering.

zug_zug 5 hours ago | parent | prev [-]

Absolutely, when I worked at (semi-well-known unicorn) a half-dozen years ago on the data-engineering team, the manager told me "Hey we want to use spark next quarter, that's a huge initiative."

And I immediately asked, "in what capacity?" And the answer was don't-know/doesn't-matter, it's just important that we can say we're using it. I really wish I understood where that was coming from (his manager resume-building? somebody getting a kickback?)

thwarted 6 minutes ago | parent | next [-]

The most interesting part is that you can say you're doing/using something entirely independent of whether you actually are. Sure, that's a lie, but so is only using something so you can say you're using it (sure, they admitted to you that was the reason, but that won't be the reason they put on LinkedIn).

coliveira an hour ago | parent | prev | next [-]

They'll never say it's resume building or kickbacks, they'll invent some technical-sounding and/or business reason to achieve the same result.

spauldo 2 hours ago | parent | prev [-]

That's when you rewrite your codebase in the SPARK dialect of Ada and play innocent when your management questions you about it.