#22 Why testing data pipelines can be so challenging - and how to tackle it

In this episode of the Plumbers of Data Science podcast, I look at why testing can be so challenging for data engineers. The inspiration came from one of my recent Coaching sessions, where the question of test-driven development (TDD) came up during a Q&A. It stuck with me, so I thought it deserved a deeper dive. I’ll explain the key benefits of TDD, like improved code quality and easier refactoring, and why, despite its advantages, it’s not always adopted, especially in fast-paced environments where time constraints dominate. We’ll also talk about the specific challenges data engineers face with TDD, such as handling large, unpredictable data, integrating with external systems, and adapting to data that keeps changing.

About the podcast

Data Engineering is the plumbing of data science: almost invisible, but super important, and a big mess when done wrong. We talk about interesting Data Engineering trends and topics. I also teach Data Engineering in my Data Engineering Academy at LearnDataEngineering.com.