46. Ihab Ilyas - Data cleaning is finally being automated

It’s cliché to say that data cleaning accounts for 80% of a data scientist’s job, but it’s directionally true.

That’s too bad, because fun things like data exploration, visualization and modelling are the reason most people get into data science. So it’s a good thing that there’s a major push underway in industry to automate data cleaning as much as possible.

One of the leaders of that effort is Ihab Ilyas, a professor at the University of Waterloo and founder of two companies, Tamr and Inductiv, both of which focus on the early stages of the data science lifecycle: data cleaning and data integration. Ihab knows an awful lot about data cleaning and data engineering, and has some really great insights to share about the future direction of the space, including what work will be left for data scientists once data cleaning is automated away.

About the Podcast

Note: The TDS podcast's current run has ended. Researchers and business leaders at the forefront of the field unpack the most pressing questions around data science and AI.