Exposing the Rotten Reality of AI Training Data

In a report released December 20, 2023, the Stanford Internet Observatory said it had detected more than 1,000 instances of verified child sexual abuse imagery in a significant dataset utilized for training generative AI systems such as Stable Diffusion 1.5. This troubling discovery builds on prior research into the “dubious curation” of large-scale datasets used to train AI systems, and raises concerns that such content may contributed to the capability of AI image generators in producing realistic counterfeit images of child sexual exploitation, in addition to other harmful and biased material. Justin Hendrix spoke the report’s author, Stanford Internet Observatory Chief Technologist David Thiel.

Om Podcasten