Shreya Shankar: Machine Learning in the Real World

In episode 89 of The Gradient Podcast, Daniel Bashir speaks to Shreya Shankar.

Shreya is a computer scientist pursuing her PhD in databases at UC Berkeley. Her research interest is in building end-to-end systems for people to develop production-grade machine learning applications. She was previously the first ML engineer at Viaduct, did research at Google Brain, and worked in software engineering at Facebook. She graduated from Stanford with a B.S. and M.S. in computer science with concentrations in systems and artificial intelligence. At Stanford, she helped run SHE++, an organization that helps empower underrepresented minorities in technology.

Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at editor@thegradient.pub

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:
* (00:00) Intro
* (02:22) Shreya’s background and journey into ML / MLOps
* (04:51) ML advances in 2013-2016
* (05:45) Shift in Stanford undergrad class ecosystems, accessibility of deep learning research
* (09:10) Why Shreya left her job as an ML engineer
* (13:30) How Shreya became interested in databases, data quality in ML
* (14:50) Daniel complains about things
* (16:00) What makes ML engineering uniquely difficult
* (16:50) Being a “historian of the craft” of ML engineering
* (22:25) Levels of abstraction, what ML engineers do/don’t have to think about
* (24:16) Observability for Production ML Pipelines
* (28:30) Metrics for real-time ML systems
* (31:20) Proposed solutions
* (34:00) Moving Fast with Broken Data
* (34:25) Existing data validation measures and where they fall short
* (36:31) Partition summarization for data validation
* (38:30) Small data and quantitative statistics for data cleaning
* (40:25) Streaming ML Evaluation
* (40:45) What makes a metric actionable
* (42:15) Differences in streaming ML vs. batch ML
* (45:45) Delayed and incomplete labels
* (49:23) Operationalizing Machine Learning
* (49:55) The difficult life of an ML engineer
* (53:00) Best practices, tools, pain points
* (55:56) Pitfalls in current MLOps tools
* (1:00:30) LLMOps / FMOps
* (1:07:10) Thoughts on ML Engineering, MLE through the lens of data engineering
* (1:10:42) Building products, user expectations for AI products
* (1:15:50) Outro

Links:
* Papers
  * Towards Observability for Production Machine Learning Pipelines
  * Rethinking Streaming ML Evaluation
  * Operationalizing Machine Learning
  * Moving Fast With Broken Data
* Blog posts
  * The Modern ML Monitoring Mess
  * Thoughts on ML Engineering After a Year of my PhD

Get full access to The Gradient at thegradientpub.substack.com/subscribe
