33: Katharine Jarmul - Testing in Data Science

A discussion with Katharine Jarmul, aka kjam, about some of the challenges of testing in data science.

Some of the topics we discuss:

  • experimentation vs testing
  • testing pipelines and pipeline changes
  • automating data validation
  • property based testing
  • schema validation and detecting schema changes
  • using unit test techniques to test data pipeline stages
  • testing nodes and transitions in DAGs
  • testing expected and unexpected data
  • missing data and non-signals
  • corrupting a dataset with noise
  • fuzz testing for both data pipelines and web APIs
  • datafuzz
  • hypothesis
  • testing internal interfaces
  • documenting and sharing domain expertise to build a good sense of reasonableness
  • intermediary data and stages
  • neural networks
  • speaking at conferences
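
To make a few of the topics above concrete (schema validation, unit testing a pipeline stage, and testing expected and unexpected data), here is a minimal sketch in plain Python. It is not taken from the episode: the schema, the column names, and the functions `validate_schema` and `clean_stage` are all hypothetical names chosen for illustration.

```python
# Hypothetical schema for illustration: column name -> expected Python type.
EXPECTED_SCHEMA = {"user_id": int, "amount": float}


def validate_schema(record):
    """Return a list of human-readable schema problems (empty if none)."""
    problems = []
    # Check every expected column for presence and type.
    for column, expected_type in EXPECTED_SCHEMA.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            actual = type(record[column]).__name__
            problems.append(f"wrong type for {column}: {actual}")
    # Flag columns that the schema does not know about (a schema change).
    for column in record:
        if column not in EXPECTED_SCHEMA:
            problems.append(f"unexpected column: {column}")
    return problems


def clean_stage(records):
    """Hypothetical pipeline stage: keep only records that match the schema."""
    return [record for record in records if not validate_schema(record)]
```

A stage written this way can be unit tested like any other function: feed it one well-formed record plus records with a missing column, a wrong type, or an extra column, and assert on what comes out. A property-based tool such as hypothesis could generate those inputs instead of hand-written cases.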

Special Guest: Katharine Jarmul.

Support Test & Code: Python Software Testing & Engineering

About the Podcast

Topics include automated testing, testing strategy, software engineering practices, packaging, Python, pytest, data science, TDD, continuous integration, and software methodologies. Also anything I think helps make the daily life of a software developer more fun and rewarding. Hosted by Brian Okken.