getnormality 4 days ago
It's not hard to do data engineering to the standards of software engineering, and many people already do, provided that:

1. You use a real programming language that supports all the abstractions software engineers rely on, not (just) SQL.
2. The data is not so big that the feedback cycle becomes horrendously slow.

#2 can never be fully solved, but in my experience, testing a data pipeline on randomly subsampled data helps a lot.
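A minimal Python sketch of both points, under stated assumptions: a pipeline step written as a pure, typed function (point 1), checked against invariants on a reproducible random subsample (point 2). The `Order` record, the invariants, and the table sizes are hypothetical, chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    # Hypothetical record type standing in for one row of a source table.
    order_id: int
    customer_id: int
    amount_cents: int


def total_by_customer(orders: list[Order]) -> dict[int, int]:
    """Pure transformation step: sum order amounts per customer."""
    totals: dict[int, int] = {}
    for o in orders:
        totals[o.customer_id] = totals.get(o.customer_id, 0) + o.amount_cents
    return totals


def subsample(rows: list, fraction: float, seed: int = 0) -> list:
    """Keep each row independently with probability `fraction`.

    A fixed seed makes every test run see the same sample.
    """
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < fraction]


def check_invariants(orders: list[Order], totals: dict[int, int]) -> None:
    # No money is created or lost by the aggregation.
    assert sum(totals.values()) == sum(o.amount_cents for o in orders)
    # Every customer in the input shows up in the output, and no others.
    assert set(totals) == {o.customer_id for o in orders}


if __name__ == "__main__":
    # Synthetic stand-in for a large production table.
    full = [Order(i, i % 1_000, (i * 37) % 10_000) for i in range(200_000)]
    sample = subsample(full, fraction=0.01, seed=42)
    totals = total_by_customer(sample)
    check_invariants(sample, totals)
    print(f"checked {len(sample)} of {len(full)} rows, {len(totals)} customers")
```

Row-level sampling like this splits groups, so if you need to compare per-customer results against a full run, sampling by key (e.g. keeping only customers whose id falls in a chosen bucket) is likely the better choice.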
sdairs 4 days ago
In your experience, how are folks doing (1)? The post is about a framework that adds things like type safety and schema-as-code on top of data infrastructure assets, in a way that mirrors what's already common with Postgres. I'm not aware of much else out there that does that; am I missing something?
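For readers unfamiliar with the term, here is a rough Python sketch of the general "schema-as-code" idea: one typed definition serves as the source of truth for both the table DDL and load-time row validation. This is not the framework from the post; `Event`, `SQL_TYPES`, `create_table_ddl`, and `validate_row` are hypothetical names, and the type mapping is deliberately tiny.

```python
from dataclasses import dataclass, fields
from datetime import datetime

# Hypothetical Python-to-SQL type mapping; a real tool would cover far more types.
SQL_TYPES = {
    int: "BIGINT",
    float: "DOUBLE PRECISION",
    str: "TEXT",
    datetime: "TIMESTAMP",
}


@dataclass(frozen=True)
class Event:
    # The schema lives in code: one definition drives both DDL and validation.
    event_id: int
    user_id: int
    name: str
    occurred_at: datetime


def create_table_ddl(cls, table_name: str) -> str:
    """Render a CREATE TABLE statement from a dataclass's fields."""
    cols = ",\n  ".join(f"{f.name} {SQL_TYPES[f.type]} NOT NULL" for f in fields(cls))
    return f"CREATE TABLE {table_name} (\n  {cols}\n);"


def validate_row(cls, row: dict):
    """Check a raw row against the schema before loading; return a typed instance."""
    expected = {f.name: f.type for f in fields(cls)}
    if set(row) != set(expected):
        raise ValueError(f"columns {sorted(row)} do not match schema {sorted(expected)}")
    for name, typ in expected.items():
        if not isinstance(row[name], typ):
            raise TypeError(f"{name}: expected {typ.__name__}, got {type(row[name]).__name__}")
    return cls(**row)


if __name__ == "__main__":
    print(create_table_ddl(Event, "events"))
    row = {"event_id": 1, "user_id": 7, "name": "signup", "occurred_at": datetime(2024, 1, 1)}
    print(validate_row(Event, row))
```

Because the DDL and the validator are derived from the same class, a schema change made in one place shows up in both, and a type checker can flag code that uses a field that no longer exists.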