Show HN: PyBujia, Easy Unit Testing for PySpark Jobs (github.com)
2 points by jpgerek 11 hours ago
As a Data Engineer, I've often wondered why so many companies don't unit test their Spark jobs. In my experience, the main reasons are:

- Creating DataFrame fixtures (data and schemas) takes too much time
- Debugging across multiple tables is complicated
- Boilerplate code is verbose and repetitive

To address these pain points, I built PyBujia, a framework that:

- Lets you define table fixtures in Markdown, which makes DataFrames easier to create, debug, and read
- Generalizes the boilerplate, saving setup time

It's made testing Spark jobs much easier for me (I now do TDD), and I hope it helps other Data Engineers as well. Feedback is very welcome!
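
For anyone unfamiliar with the idea of Markdown table fixtures, here is a minimal sketch of how a Markdown table might be turned into typed rows for a test. This is not PyBujia's actual API or fixture syntax: the function name `parse_markdown_table` and the `name:type` header convention are assumptions made up for illustration.

```python
from typing import Any

# Hypothetical fixture format: each header cell is "name:type".
# This convention is an assumption for illustration, not PyBujia's syntax.
FIXTURE = """
| id:int | name:str | amount:float |
|--------|----------|--------------|
| 1      | alice    | 10.5         |
| 2      | bob      | 3.0          |
"""

# Map the type names used in the header to Python casting functions.
CASTS = {"int": int, "str": str, "float": float}


def _cells(line: str) -> list[str]:
    """Split one Markdown table line into trimmed cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_markdown_table(text: str) -> list[dict[str, Any]]:
    """Parse a Markdown table into a list of typed row dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    # First line is the header; each cell splits into (column name, type name).
    header = [cell.split(":") for cell in _cells(lines[0])]
    rows = []
    for line in lines[2:]:  # skip the header and the |---| separator line
        values = _cells(line)
        rows.append(
            {name: CASTS[typ](value) for (name, typ), value in zip(header, values)}
        )
    return rows


rows = parse_markdown_table(FIXTURE)
```

In a real PySpark test, rows like these could then be handed to `SparkSession.createDataFrame` to build the input DataFrame, ideally with an explicit schema derived from the header.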