If you’re Google or Netflix, and a recommendation or search system is part of your bread and butter, what’s the best way to test improvements to your algorithm? A/B testing is the canonical answer for testing how users respond to software changes, but it gets tricky fast when you try to pin down what an A/B test even means for an algorithm that returns a ranked list. That’s why we’re talking about interleaving this week: it’s a simple modification to A/B testing that makes it much easier to race two algorithms against each other and find the winner, and it lets you do so with much less data than a traditional A/B test.
Relevant links:
https://medium.com/netflix-techblog/interleaving-in-online-experiments-at-netflix-a04ee392ec55
https://www.microsoft.com/en-us/research/publication/predicting-search-satisfaction-metrics-with-interleaved-comparisons/
https://www.cs.cornell.edu/people/tj/publications/joachims_02b.pdf
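To make the idea of "racing" two rankers concrete, here is a minimal Python sketch of team-draft interleaving, one common interleaving variant. Choosing this variant is our assumption, and the function names `team_draft_interleave` and `credit_clicks` are hypothetical. The sketch also assumes clicks are the only feedback signal. The two algorithms take turns "drafting" their highest-ranked item not already picked, like captains choosing teams. The user sees one merged list, and each click is credited to whichever algorithm placed the clicked item.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Merge two rankings with team-draft interleaving.

    Returns the merged list shown to the user and, for each position,
    which algorithm ("A" or "B") contributed that item.
    """
    rng = rng or random.Random()
    merged, teams, seen = [], [], set()
    picks_a = picks_b = 0
    i_a = i_b = 0  # next candidate index in each ranking

    while len(merged) < length:
        # Skip items that are already in the merged list.
        while i_a < len(ranking_a) and ranking_a[i_a] in seen:
            i_a += 1
        while i_b < len(ranking_b) and ranking_b[i_b] in seen:
            i_b += 1
        a_left, b_left = i_a < len(ranking_a), i_b < len(ranking_b)
        if not (a_left or b_left):
            break  # both rankings are exhausted

        # The team with fewer picks drafts next; ties are broken by a coin flip.
        a_turn = picks_a < picks_b or (picks_a == picks_b and rng.random() < 0.5)
        # If the chosen team has nothing left to offer, the other team picks.
        if a_turn and not a_left:
            a_turn = False
        elif not a_turn and not b_left:
            a_turn = True

        if a_turn:
            item, team = ranking_a[i_a], "A"
            picks_a += 1
        else:
            item, team = ranking_b[i_b], "B"
            picks_b += 1
        merged.append(item)
        teams.append(team)
        seen.add(item)

    return merged, teams


def credit_clicks(teams, clicked_positions):
    """Credit each click to the team that placed the clicked item."""
    clicks_a = sum(1 for p in clicked_positions if teams[p] == "A")
    clicks_b = sum(1 for p in clicked_positions if teams[p] == "B")
    if clicks_a > clicks_b:
        return "A"
    if clicks_b > clicks_a:
        return "B"
    return "tie"


if __name__ == "__main__":
    merged, teams = team_draft_interleave(
        ["a", "b", "c"], ["b", "d", "a"], length=4, rng=random.Random(0)
    )
    print(merged, teams)
    print(credit_clicks(teams, clicked_positions=[0]))
```

Each impression produces a small "A wins", "B wins", or "tie" verdict, and tallying those verdicts over many sessions tells you which algorithm users prefer. One intuition for why this needs less data than a classic A/B test is that every user judges both algorithms side by side, rather than each user seeing only one of them.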
In each episode, your hosts explore machine learning and data science through interesting (and often very unusual) applications.