faxmeyourcode 4 days ago

This is insane to read as a data engineer who actually builds software. To be perfectly honest, these sound like amateurs, not experienced data engineers.

There are plenty of us out here with many repos, dozens of contributors, and thousands of lines of Terraform, Python, custom GitHub Actions, k8s deployments running Airflow, internal full-stack web apps that we're building, EMR Spark clusters, etc. All of it lives in our own Snowflake/AWS accounts, which we manage ourselves.

The data scientists we support use notebooks extensively, but it's my team's job to clean that code up and make it testable and efficient. You can't develop real software in a notebook. It sounds like they need to upskill into a real orchestration platform like Airflow and run everything through it.

Unit-test the utility functions and helpers, and run data quality tests on the data flowing in and out. Build diff reports to understand big swings in the data before signing off on changes.
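A minimal sketch of those three practices in plain Python, standard library only. Everything here is a hypothetical illustration: the `normalize_email` helper, the `id`/`email` columns, and the 10% swing threshold are assumptions, not anyone's real pipeline. In practice you'd likely run the unit test under pytest and use a dedicated data quality tool, but the idea is the same.

```python
from collections import Counter


# Hypothetical helper of the kind that gets pulled out of a notebook.
def normalize_email(raw: str) -> str:
    return raw.strip().lower()


# 1. Unit test for the helper (pytest would pick this up by its name).
def test_normalize_email() -> None:
    assert normalize_email("  Alice@Example.COM ") == "alice@example.com"


# 2. Data quality checks on a batch of rows: the key column must be unique
#    and the required columns must be non-null. Returns failure messages.
def check_quality(rows: list[dict], key: str, required: list[str]) -> list[str]:
    failures = []
    counts = Counter(r.get(key) for r in rows)
    dupes = [k for k, n in counts.items() if n > 1]
    if dupes:
        failures.append(f"duplicate {key}: {sorted(dupes, key=str)}")
    for col in required:
        nulls = sum(1 for r in rows if r.get(col) is None)
        if nulls:
            failures.append(f"{nulls} null(s) in {col}")
    return failures


# 3. Diff report: compare a per-group metric between the old and new output
#    and flag groups whose relative change exceeds the (assumed) threshold.
def diff_report(
    old: dict[str, float], new: dict[str, float], threshold: float = 0.10
) -> dict[str, tuple[float, float, float]]:
    report = {}
    for group in sorted(set(old) | set(new)):
        before = old.get(group, 0.0)
        after = new.get(group, 0.0)
        if before == 0:
            # A group appearing from zero is always worth a look.
            change = float("inf") if after else 0.0
        else:
            change = (after - before) / abs(before)
        if abs(change) > threshold:
            report[group] = (before, after, change)
    return report


if __name__ == "__main__":
    test_normalize_email()
    rows = [
        {"id": 1, "email": "a@x.com"},
        {"id": 2, "email": None},
        {"id": 2, "email": "b@x.com"},
    ]
    print(check_quality(rows, key="id", required=["email"]))
    # ['duplicate id: [2]', '1 null(s) in email']
    print(diff_report({"US": 100.0, "EU": 50.0}, {"US": 104.0, "EU": 80.0}))
    # {'EU': (50.0, 80.0, 0.6)}
```

The diff report only surfaces the groups that moved more than the threshold, so a reviewer can sign off on a change by looking at a handful of rows instead of the whole output.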

My email is in my profile; I'm happy to discuss further! :-)